Sum4Simp

Code and dataset for the paper "Exploiting Summarization Data to Help Text Simplification" (https://aclanthology.org/2023.eacl-main.3.pdf)

The S4S dataset is the standard sentence simplification dataset described in the paper. You can also mix all of the data together for data augmentation.

If you want to obtain the aligned sentence pairs yourself, first download the CNN and DM datasets, then run 'python align.py'.
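To give a rough idea of what aligning summary sentences with article sentences means, here is a minimal sketch. It is not the method used in align.py: the similarity function (token-level Jaccard overlap), the function names `jaccard` and `align`, and the `min_sim` threshold are all assumptions chosen only for illustration.

```python
from typing import List, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard overlap (a stand-in similarity, not the one in align.py)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def align(summary_sents: List[str], article_sents: List[str],
          min_sim: float = 0.3) -> List[Tuple[str, str]]:
    """Pair each summary sentence with its most similar article sentence.

    Returns (article sentence, summary sentence) pairs whose similarity
    is at least `min_sim` (a hypothetical threshold).
    """
    pairs = []
    for summ in summary_sents:
        best = max(article_sents, key=lambda art: jaccard(summ, art), default=None)
        if best is not None and jaccard(summ, best) >= min_sim:
            pairs.append((best, summ))
    return pairs


# Toy example (made-up sentences, not taken from CNN or DM).
article = [
    "The city council voted on Tuesday to approve the new budget after a long debate .",
    "Rain is expected over the weekend .",
]
summary = ["The council approved the new budget ."]

aligned = align(summary, article)
for complex_sent, simple_sent in aligned:
    print(complex_sent, "->", simple_sent)
```

Each resulting pair treats the article sentence as the complex side and the summary sentence as the simple side.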

If you want to filter suitable sentence pairs from the aligned pairs, first calculate the attribute values. We have uploaded some example files (for WikiLarge); you can run 'python filter.py' and check the total scores. You can then set a threshold to keep only the pairs you need.
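The thresholding step can be sketched as follows. This is only an illustration under stated assumptions: it assumes you already have one total score per aligned pair and that a higher score means a more suitable pair. The function name `filter_pairs`, the data layout, and the threshold value are hypothetical and do not describe the output format of filter.py.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (complex sentence, simple sentence)


def filter_pairs(pairs: List[Pair], scores: List[float], threshold: float) -> List[Pair]:
    """Keep the pairs whose total score is at least `threshold`.

    Assumes scores[i] belongs to pairs[i] and that higher is better.
    """
    if len(pairs) != len(scores):
        raise ValueError("pairs and scores must have the same length")
    return [pair for pair, score in zip(pairs, scores) if score >= threshold]


# Toy data (made up for illustration).
pairs = [
    ("The committee reached a unanimous decision .", "The committee all agreed ."),
    ("He went home .", "He went home ."),
    ("The medication alleviates symptoms .", "The medicine eases symptoms ."),
]
scores = [0.82, 0.35, 0.67]

kept = filter_pairs(pairs, scores, threshold=0.6)
print(len(kept), "of", len(pairs), "pairs kept")
```

Raising the threshold keeps fewer pairs, so a reasonable value depends on how much training data you need.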

If you have any questions, please contact us: sunrenliangpku@gmail.com
