This repo contains the data for the ACL 2021 paper "DESCGEN: A Distantly Supervised Dataset for Generating Abstractive Entity Descriptions".
The dataset can be downloaded from the link: https://drive.google.com/drive/folders/1Nu3Y4lcau93hpJCJi6rhKOwEuLGrsHEc?usp=sharing

- The entity2context file contains all source documents for writing descriptions. Each key is an entity name and each value is the list of documents that mention that entity.
- The entity2summary_train file contains the wiki summaries for the training entities.
- The entity2summary_dev_distant file contains the wiki summaries for the dev entities.
- The entity2summary_dev_verified file contains the human-written summaries for the dev entities.
- Likewise, entity2summary_test_distant and entity2summary_test_verified contain the wiki summaries and the human-written summaries for the test entities.

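
The files described above can be joined on the entity name to build (source documents, summary) pairs. The following is a minimal sketch, not the authors' loading code. It assumes each file is a JSON object keyed by entity name, which this README does not confirm; if the files use another serialization, only `load_mapping` needs to change. The helper names `load_mapping` and `pair_contexts_with_summaries` and the toy data are hypothetical.

```python
import json
from pathlib import Path


def load_mapping(path):
    """Load one dataset file.

    ASSUMPTION: the file is a JSON object whose keys are entity names.
    The README does not state the on-disk format.
    """
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def pair_contexts_with_summaries(entity2context, entity2summary):
    """Yield (entity, documents, summary) for entities found in both mappings."""
    for entity, summary in entity2summary.items():
        documents = entity2context.get(entity)
        if documents is not None:
            yield entity, documents, summary


# Toy stand-ins for the real files (made-up content, not taken from the dataset).
toy_context = {
    "Example Entity": [
        "First document mentioning the entity.",
        "Second document mentioning the entity.",
    ],
    "Unlabeled Entity": ["A document for an entity without a summary."],
}
toy_summary = {"Example Entity": "A short description of the entity."}

examples = list(pair_contexts_with_summaries(toy_context, toy_summary))
for entity, documents, summary in examples:
    print(entity, len(documents), summary)
```

With the downloaded data, the toy dictionaries would be replaced by `load_mapping(...)` calls on the entity2context file and one of the entity2summary files, using whatever file names and extensions the download actually provides.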
in progress