Datasets

Mehvin edited this page Aug 1, 2018 · 13 revisions

The datasets used in this project are listed below.

DeepMind Q&A Dataset (CNN & DailyMail)

  • Main Dataset
  • English News Articles
  • Each File Contains an Article and its Respective Gold-standard
  • Total of 312,085 Files
  • Link to download
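
As a rough illustration of the "article plus gold-standard" layout, the sketch below splits one file's text into its two parts. It assumes the publicly distributed `.story` format, in which the article body is followed by one or more `@highlight` markers, each introducing a gold-standard sentence. The helper name `split_story` and the sample text are hypothetical and are not taken from this project's code.

```python
from typing import List, Tuple

# Assumption: gold-standard sentences are introduced by this marker,
# as in the commonly distributed CNN/DailyMail .story files.
HIGHLIGHT_MARKER = "@highlight"


def split_story(text: str) -> Tuple[str, List[str]]:
    """Split one file's text into (article, list of gold-standard highlights)."""
    parts = text.split(HIGHLIGHT_MARKER)
    # Everything before the first marker is the article; collapse whitespace.
    article = " ".join(parts[0].split())
    # Each remaining chunk is one highlight; skip empty chunks.
    highlights = [" ".join(p.split()) for p in parts[1:] if p.strip()]
    return article, highlights


# Made-up sample in the assumed layout (not real dataset content).
sample = (
    "First sentence.\n\nSecond sentence.\n\n"
    "@highlight\n\nKey point one\n\n"
    "@highlight\n\nKey point two\n"
)
article, highlights = split_story(sample)
print(article)
print(highlights)
```

If the downloaded files use a different marker or layout, only `HIGHLIGHT_MARKER` and the splitting logic would need to change.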

Australian Legal Case Reports Dataset

  • Dataset from a Different Domain: Legal Cases
  • Used for Generalization Test
  • Total of 3,887 Legal Documents
  • Link to download

Large Scale Chinese Short Text Summarization (LCSTS) Dataset

  • Contains Short Texts and their Respective Gold-standards
  • Total of 2,400,591 Short Texts
  • Link to download - Requires application to obtain corpus

Datasets Pre-processing