About

Extracts an abstract and title dataset from arXiv articles.

Contents

  • Requirements
  • Code
  • How to Cite

Requirements

Code

  • Domain of articles: set by search_query (e.g. Artificial Intelligence), case-insensitive
  • Excludes articles that have a URL or "Proceeding of the" in the title or abstract
  • Results filename format:
    <QUERY>_<START_INDEX>_<MAX_NUMBER_ARTICLES_IN_PAGING>_<ACTUAL_NUMBER_ARTICLES>_<TOTAL_MAX_NUMBER_ARTICLES>_<MIN_NUMBER_WORDS_ABS>
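
The exclusion rule and the filename format above can be sketched in Python as follows. This is a minimal illustration, not the repository's actual code: the function names `keep_article` and `results_filename`, the URL regex, the case-insensitive phrase match, the lowercase underscore-joined form of `<QUERY>`, and the use of `MIN_NUMBER_WORDS_ABS` as a minimum abstract word count are all assumptions made for the example.

```python
import re

# Assumption: a "URL" is anything starting with http(s):// or www.
URL_PATTERN = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)
# Phrase taken from the README; matching it case-insensitively is an assumption.
EXCLUDED_PHRASE = "proceeding of the"


def keep_article(title, abstract, min_words_abs=0):
    """Return True if the article passes the exclusion rules (hypothetical helper)."""
    for text in (title, abstract):
        if URL_PATTERN.search(text) or EXCLUDED_PHRASE in text.lower():
            return False
    # Assumption: MIN_NUMBER_WORDS_ABS is a lower bound on abstract length in words.
    return len(abstract.split()) >= min_words_abs


def results_filename(query, start_index, max_per_page, actual_count,
                     total_max, min_words_abs):
    """Join the six fields in the order given by the README's filename format."""
    # Assumption: the query is lowercased and its spaces become underscores.
    query_slug = "_".join(query.lower().split())
    fields = [query_slug, start_index, max_per_page,
              actual_count, total_max, min_words_abs]
    return "_".join(str(field) for field in fields)


if __name__ == "__main__":
    # Illustrative values only; they do not come from a real run.
    print(keep_article("Code at https://example.com", "We propose a model."))
    print(results_filename("Artificial Intelligence", 0, 100, 87, 1000, 20))
```

The filter checks the title and the abstract separately, so a match in either field is enough to drop the article.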
    

Acknowledgement

Please star or fork if this code was useful for you. If you use it in a paper, please cite as:

@software{cunha_sergio2019arxiv_abstract2title,
    author       = {Gwenaelle Cunha Sergio},
    title        = {{gcunhase/ArXivAbsTitleDataset: Extracting Abstract and Title Dataset from arXiv articles}},
    month        = oct,
    year         = 2019,
    doi          = {10.5281/zenodo.3496527},
    version      = {v1.0},
    publisher    = {Zenodo},
    url          = {https://github.com/gcunhase/ArXivAbsTitleDataset}
}