Datasets
Mehvin edited this page Aug 1, 2018 · 13 revisions
Below are the datasets used in this project:
- Main Dataset
- English News Articles
- Each File Contains an Article and its Respective Gold-standard
- Total of 312,085 Files
- Link to download
- Different Domain of Dataset - Legal Cases
- Used for Generalization Test
- Total of 3,887 Legal Documents
- Link to download
- Contains Short Texts and its Respective Gold-standard
- Total of 2,400,591 Short Texts
- Link to download - Requires application to obtain corpus
- Done via Python
- Pre-processing done includes:
- Cleaning
- Re-formatting
- Splitting
- Pre-processing files for:
Completed by Melvin and Joe
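
The three pre-processing steps listed above (cleaning, re-formatting, splitting) could look roughly like the sketch below. This is only an illustration, not the project's actual scripts: the function names, the `article`/`gold` record keys, the whitespace-only cleaning rule, and the 80/10/10 split ratios are all assumptions made for the example.

```python
import random
import re


def clean(text):
    """Cleaning (assumed rule): collapse whitespace runs and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def reformat(article, gold):
    """Re-formatting (assumed layout): pair an article with its gold-standard."""
    return {"article": clean(article), "gold": clean(gold)}


def split(records, train=0.8, valid=0.1, seed=0):
    """Splitting (assumed ratios): shuffle, then cut into train/valid/test."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = int(len(shuffled) * train)
    n_valid = int(len(shuffled) * valid)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_valid],
        shuffled[n_train + n_valid:],
    )


if __name__ == "__main__":
    # Toy records standing in for real article / gold-standard pairs.
    records = [reformat(f" article  {i}\n", f"gold {i} ") for i in range(10)]
    train_set, valid_set, test_set = split(records)
    print(len(train_set), len(valid_set), len(test_set))
```

Fixing the shuffle seed means the same files land in the same split on every run, which helps when comparing results across experiments.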