Document-Level Simplification

This repository contains the code for the ACL 2023 paper: SWiPE: A Dataset for Document-Level Simplification of Wikipedia Pages.

The SWiPE dataset

All dataset files are in the data/ folder. We release both the manually annotated portion of the data (~5k samples) and the full dataset (roughly 140k document pairs).

The SWiPE_Dataset.ipynb notebook goes over how to load the dataset and process/visualize annotations.


We release three model cards on the HuggingFace hub:

  • Salesforce/bart-large-swipe: A BART-large model finetuned on the SWiPE dataset which can generate document-level edits.
  • Salesforce/bart-large-swipe-clean: A BART-large model finetuned on the cleaned version of the SWiPE dataset, which can generate document-level edits with a reduced proportion of (undesirable) extraneous information edits. We recommend using this model for future comparisons.
  • Salesforce/bic_simple_edit_id: The BIC model, a RoBERTa-large model finetuned on the task of edit group identification. BIC achieved the highest performance in our experiments on edit identification by jointly grouping and categorizing edits using a BIO tagging label set.
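To make the BIO idea concrete, here is a minimal sketch of how a sequence of BIO tags can be decoded into groups that each carry a category. It is a generic illustration of BIO decoding, not the BIC implementation: the function name `decode_bio` and the category names (`lexical`, `deletion`) are hypothetical, and the sketch assumes one tag per edit.

```python
from typing import List, Tuple


def decode_bio(tags: List[str]) -> List[Tuple[str, List[int]]]:
    """Turn per-edit BIO tags into (category, edit_indices) groups."""
    groups: List[Tuple[str, List[int]]] = []
    current = None  # the group currently being extended, or None
    for i, tag in enumerate(tags):
        if tag == "O":
            # "O" marks an edit that belongs to no group.
            current = None
            continue
        prefix, _, category = tag.partition("-")
        # "B-" always opens a new group; a stray "I-" (no open group,
        # or a different category) is treated as opening one too.
        if prefix == "B" or current is None or current[0] != category:
            current = (category, [i])
            groups.append(current)
        else:
            current[1].append(i)
    return groups


# Hypothetical category names, for illustration only.
tags = ["B-lexical", "I-lexical", "O", "B-deletion", "B-lexical"]
print(decode_bio(tags))
```

A single tag sequence therefore encodes both which edits belong together (the B/I boundaries) and what kind of group they form (the suffix), which is what lets one tagger do grouping and categorization jointly.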

The Generation_and_Identification.ipynb notebook provides an example of generating simplified text for a Wikipedia page and identifying the edits using the BIC model.
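For readers who want a starting point outside the notebook, the following is a rough sketch of loading one of the generators with the `transformers` library. It assumes the model accepts the original page text as plain input and works with the standard `AutoTokenizer` / `AutoModelForSeq2SeqLM` classes; the helper names `load_generator` and `simplify`, the beam size, and the length limits are illustrative choices, so the notebook and the model card remain the reference for the exact input and output format.

```python
def load_generator(name="Salesforce/bart-large-swipe-clean"):
    """Download the tokenizer and model from the HuggingFace hub."""
    # Imported lazily so the module can be read without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSeq2SeqLM.from_pretrained(name)
    return tokenizer, model


def simplify(text, tokenizer, model, max_input_tokens=1024, max_new_tokens=512):
    """Generate a simplified version of `text` (assumed plain-text input)."""
    # BART-large handles at most 1024 input positions, so longer pages
    # are truncated here rather than raising an error.
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=max_input_tokens,
    )
    # Beam size 4 is an assumed default, not a setting taken from the paper.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Usage (downloads the model weights on first call):
#   tokenizer, model = load_generator()
#   print(simplify("Text of a Wikipedia page ...", tokenizer, model))
```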

Generator Data

Outputs from the models included in Section 6 of the paper are provided in data/swipe_generator_data.json. The Generation_Data.ipynb notebook explains how to inspect the data.
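Before opening the notebook, it can help to get a quick overview of the file's structure. The sketch below makes no assumption about the schema: it only reports the top-level JSON type, the number of records, and how often each key appears across records. The helper name `summarize_json` is hypothetical.

```python
import json
from collections import Counter


def summarize_json(path):
    """Report the top-level type, record count, and record keys of a JSON file."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)

    if isinstance(data, list):
        records = data
    elif isinstance(data, dict):
        records = list(data.values())
    else:
        records = []  # a bare scalar has no records to inspect

    # Count how often each key appears across dict-shaped records.
    key_counts = Counter(k for r in records if isinstance(r, dict) for k in r)
    return {
        "top_level_type": type(data).__name__,
        "size": len(records),
        "record_keys": dict(key_counts),
    }


# Usage, run from the repository root:
#   print(summarize_json("data/swipe_generator_data.json"))
```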

Cite the work

If you make use of the code, models, or dataset, please cite our paper:

  title={SWiPE: A Dataset for Document-Level Simplification of Wikipedia Pages},
  author={Philippe Laban and Jesse Vig and Wojciech Kryscinski and Shafiq Joty and Caiming Xiong and Chien-Sheng Jason Wu},
  booktitle={Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics},


If you'd like to contribute, or have questions or suggestions, please contact us. All contributions are welcome!