Datasets for data-driven summarization of scientific articles: generating the title of a paper from its abstract (title-gen), or the abstract of a paper from its full body (body-gen). title-gen was constructed from the MEDLINE dataset, whereas body-gen was constructed from the PubMed Open Access Subset. Some statistics on the datasets:
You can download all datasets from here. Paper to appear in the 7th International Workshop on Mining Scientific Publications, Miyazaki, Japan.
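To make the two tasks concrete, here is a minimal sketch of how a single article maps to a (source, target) pair in each setting. The `Article` class, its field names, and the two helper functions are hypothetical and only illustrate the task definitions above; the released files may be organised differently.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Article:
    # Hypothetical record layout, assumed for illustration only.
    title: str
    abstract: str
    body: str


def title_gen_pair(article: Article) -> Tuple[str, str]:
    """title-gen: the abstract is the source, the title is the target."""
    return article.abstract, article.title


def body_gen_pair(article: Article) -> Tuple[str, str]:
    """body-gen: the full body is the source, the abstract is the target."""
    return article.body, article.abstract
```

Note that the abstract plays both roles: it is the input in title-gen and the output in body-gen.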
Citation:
@InProceedings{NIKOLOV18.2,
author = {Nikola Nikolov and Michael Pfeiffer and Richard Hahnloser},
title = {Data-driven Summarization of Scientific Articles},
booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
year = {2018},
month = {may},
date = {7-12},
location = {Miyazaki, Japan},
editor = {},
publisher = {European Language Resources Association (ELRA)},
address = {Paris, France},
isbn = {979-10-95546-20-7},
language = {english}
}