Text Simplification Evaluation

Source code for the papers:

Tag v1.0 → Investigating Text Simplification Evaluation, accepted in Findings of ACL-IJCNLP 2021
Tag v2.0 → The Role of Text Simplification Operations in Evaluation, accepted at the CTTS-2021 workshop

By @lmvasquezr, @MattShardlow, Piotr Przybyła and @SAnaniadou.

If you have any questions, please don't hesitate to contact us. Feel free to submit issues or enhancement requests on GitHub as well.

Features

Analysis of Text Simplification corpora based on simplification operations, using the edit distance measure.
Creation of better-distributed datasets (random, and with our heuristic for reducing incorrect alignments).
Technical details and modifications made for performance evaluation using the EditNTS model.

Dependencies

1. Datasets Analysis & 2. Better-distributed datasets

You will need Python 3.7+ and Java (tested on 15.0.1)

git clone https://github.com/lmvasque/ts-explore.git
cd ts-explore
pip install -r requirements.txt

3. Model Evaluation

We adapted the EditNTS model code to run in our setting. The adaptation is available as a fork of the original repository and includes the following changes:

Code migration to Python 3
Scripts for data preprocessing
Other minor fixes

Usage

1. Datasets Analysis

Configure your datasets

Create a JSON file with the location of the dataset files:

{
  "wikismall": {
    "test": "<data_dir>/wikismall/PWKP_108016.tag.80.aner.ori.test",
    "dev": "<data_dir>/wikismall/PWKP_108016.tag.80.aner.ori.valid",
    "train": "<data_dir>/wikismall/PWKP_108016.tag.80.aner.ori.train",
    "tag": ["src", "dst"]
  }
}

This example, wikismall.json, describes a dataset located in <data_dir>/wikismall/ whose subset files start with PWKP_108016.tag.80.aner.ori and end with .src and .dst.
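
As a rough illustration of how such a configuration maps to files on disk, the sketch below expands each subset prefix into one path per tag. It assumes that every file is named `<prefix>.<tag>`, which is inferred from the description above. `expand_dataset_paths` is a hypothetical helper and is not part of ts_eval.py.

```python
import json


def expand_dataset_paths(config):
    """Map each dataset and subset to its per-tag file paths.

    Assumption: every file is named "<subset prefix>.<tag>".
    """
    paths = {}
    for dataset, entry in config.items():
        tags = entry["tag"]
        paths[dataset] = {
            subset: [f"{prefix}.{tag}" for tag in tags]
            for subset, prefix in entry.items()
            if subset != "tag"
        }
    return paths


# Inline copy of the example configuration, shortened to one subset.
EXAMPLE = json.loads("""
{
  "wikismall": {
    "test": "<data_dir>/wikismall/PWKP_108016.tag.80.aner.ori.test",
    "tag": ["src", "dst"]
  }
}
""")

for dataset, subsets in expand_dataset_paths(EXAMPLE).items():
    for subset, files in subsets.items():
        print(dataset, subset, files)
```

Running a check like this before the analysis can help spot a wrong `<data_dir>` or a missing tag early.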

Run the Java Server

Edit-distance calculations are performed in Java. Open a new terminal and run the following commands:

cd ts-explore/java
/bin/bash run.sh

Run the analysis

In a new terminal, run the following from the root of the cloned repository:

python ts_eval.py --analysis --datasets examples/wikismall.json --output_dir output

2. Better-distributed datasets

To create randomly distributed datasets:

python ts_eval.py --create random --datasets examples/wikismall.json --seed 324 --output_dir output

To create datasets with fewer poor alignments (sentence pairs that are aligned incorrectly):

python ts_eval.py --create unaligned --datasets examples/wikismall.json --sample 0.95 --seed 324 --output_dir output
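
This README does not describe the internals of the two `--create` modes. Purely as a conceptual sketch, the code below assumes (a) that `random` pools all subsets and redraws splits of the original sizes using the seed, and (b) that `unaligned` keeps the `--sample` fraction of pairs with the lowest edit-distance score. Both assumptions and both function names are hypothetical. None of this is taken from ts_eval.py.

```python
import random


def random_resplit(subsets, seed):
    """Pool all pairs, shuffle with a fixed seed, and cut the pool
    back into subsets of the original sizes (assumed behaviour)."""
    names = list(subsets)
    pooled = [pair for name in names for pair in subsets[name]]
    random.Random(seed).shuffle(pooled)
    result, start = {}, 0
    for name in names:
        size = len(subsets[name])
        result[name] = pooled[start:start + size]
        start += size
    return result


def keep_best_aligned(scored_pairs, sample):
    """Keep the `sample` fraction of (score, complex, simple) tuples
    with the lowest score (assumed heuristic)."""
    ranked = sorted(scored_pairs, key=lambda item: item[0])
    keep = round(len(ranked) * sample)
    return ranked[:keep]


# Toy data: integers stand in for sentence pairs.
toy = {"train": [1, 2, 3, 4, 5, 6], "dev": [7, 8], "test": [9, 10]}
print(random_resplit(toy, seed=324))

scored = [(float(i), f"complex {i}", f"simple {i}") for i in range(100)]
print(len(keep_best_aligned(scored, sample=0.95)))
```

Fixing the seed makes the shuffle repeatable, which is why the same `--seed` value should be reused when comparing runs.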

3. Model Evaluation

We adapted the original EditNTS model and documented our changes here. Then, we trained our model as follows:

python main.py --vocab_path vocab_data/ --device 0 --data_path datasets/<dataset_dir>/<dataset_train_dev> --store_dir <output_dir> --batch_size 64 --lr 0.001 --vocab_size 30000 --run_training

To run model evaluation:

python main.py --vocab_path vocab_data/ --device 0 --data_path datasets/<dataset_dir>/<dataset_test> --store_dir output/ --load_model output/<model>/checkpoints/<checkpoints_dir> --batch_size 64 --lr 0.001 --vocab_size 30000 --run_eval

📝 Note: This model requires a preprocessing step. We used the setting for no duplicate sentences. Refer to the original documentation for further details.

4. Calculate simplification operations

To use our edit-distance algorithm to extract simplification operations, proceed as follows.

In a separate terminal, run the following commands to start the Java server:

git clone https://github.com/lmvasque/ts-explore.git
cd ts-explore/java
./run.sh

Run the script to obtain the list of operations needed to transform the source sentence into the target sentence.

python count_operations.py --source "The house was painted last week by John ." --target "John painted the house last week ."

The output is a list of operations, each with the source and target tokens involved (lowercased, as in this sample):

REPLACE,the,john
REPLACE,house,painted
REPLACE,was,the
REPLACE,painted,house
DELETE,by,null
DELETE,john,null
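
If you want to post-process this output, a minimal sketch such as the following parses the lines and counts operations by type. It assumes the `OPERATION,source,target` layout shown above and treats the literal `null` as "no token". `parse_operations` is a hypothetical helper and is not part of count_operations.py.

```python
from collections import Counter

SAMPLE_OUTPUT = """\
REPLACE,the,john
REPLACE,house,painted
REPLACE,was,the
REPLACE,painted,house
DELETE,by,null
DELETE,john,null
"""


def parse_operations(text):
    """Turn "OPERATION,source,target" lines into tuples.

    Splitting on the first and last comma keeps a source token that is
    itself a comma intact; a comma as target token is not handled.
    """
    operations = []
    for line in text.splitlines():
        if not line.strip():
            continue
        operation, rest = line.split(",", 1)
        source, target = rest.rsplit(",", 1)
        operations.append((
            operation,
            None if source == "null" else source,
            None if target == "null" else target,
        ))
    return operations


operations = parse_operations(SAMPLE_OUTPUT)
counts = Counter(operation for operation, _, _ in operations)
print(counts["REPLACE"], counts["DELETE"])  # prints: 4 2
```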

Reproducibility Details

Data

To replicate our results, please download or request the following resources:

WikiLarge & WikiSmall: from (Zhang and Lapata, 2017) splits.
Turk Corpus: from (Xu, 2016) splits.
ASSET: from (Alva-Manchego, 2020) splits. To be consistent with the other datasets, which have spaces around punctuation marks, we applied minor transformations to this dataset. This is the list of replacements applied:
```
regex = [(",", " ,"), (".", " . "), ("(", " ( "), (")", " ) ")]
```
WikiManual: from (Jiang, 2020) splits. We limited our analysis to sentences labeled as "aligned", which we filtered as follows:
```
grep -E "^aligned" <file>
```
MSD: from (Cao, 2020) splits. The original dataset comes in JSON format; we extracted the "text" field from each sentence. We kept every even line as the complex sentence and the corresponding odd line as its simple sentence.
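
The ASSET and WikiManual steps above can be sketched in Python as follows. This is an illustration under assumptions, not the exact script we ran: the pairs are applied as literal string replacements (not regular expressions, despite the variable name in the list above), and the sample lines are invented for the example.

```python
# Replacement pairs copied from the ASSET entry above.
REPLACEMENTS = [(",", " ,"), (".", " . "), ("(", " ( "), (")", " ) ")]


def space_punctuation(sentence):
    """Apply each (old, new) pair in order as a literal replacement."""
    for old, new in REPLACEMENTS:
        sentence = sentence.replace(old, new)
    return sentence


def keep_aligned(lines):
    """Python equivalent of: grep -E "^aligned" <file>"""
    return [line for line in lines if line.startswith("aligned")]


# Invented sentence; the result may contain doubled spaces.
print(space_punctuation("Hello, world (test)."))

# Invented label/sentence rows; the real column layout may differ.
rows = [
    "aligned\tcomplex sentence\tsimple sentence",
    "notAligned\tcomplex sentence\tsimple sentence",
]
print(keep_aligned(rows))
```

Note that literal replacement also splits periods inside numbers or abbreviations, so results on other corpora should be inspected before use.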

Analysis

We have created a sample configuration file to replicate our TS datasets analysis. Please use this file and update it with the locations of your data files. You can run the datasets analysis as follows:

python ts_eval.py --analysis --datasets examples/ts_datasets.json --output_dir output

You will see the following outputs:

Edit-distance plots under <output_dir>/imgs
KL divergences between the subsets of each dataset, reported in the console:

Distribution divergences between Test/Dev subsets
   Dataset    Value
wikimanual 0.102053
 wikilarge 0.462257
 wikismall 0.069603

Distribution divergences between Test/Train subsets
   Dataset    Value
wikimanual 0.017596
 wikilarge 0.463852
 wikismall 0.057977

📝 Note: For ASSET and TurkCorpus, the KL-divergences were calculated in a different way since these datasets have multiple references. In our experiments, we merged all the references into a single file for each subset (test, dev and train) and then calculated the divergences.

Dataset files (complex and simple sentences in separate files) under <output_dir>/txt
Text files with edit-distance calculations under <output_dir>/txt

# Edit distance calculations: Score, Complex, Simple (tab-separated)
4.3478260869565215	She performed for President Reagan in 1988's Great Performances at the White House series , which aired on the Public Broadcasting Service .	She performed for Reagan in 1988's Great Performances at the White House series , which aired on the Public Broadcasting Service .
4.545454545454546	This was demonstrated in the Miller-Urey experiment by Stanley L .  Miller and Harold C .  Urey in 1953 .	This was shown in the Miller-Urey experiment by Stanley L .  Miller and Harold C .  Urey in 1953 .
4.545454545454546	This was substantially complete when Messiaen died , and Yvonne Loriod undertook the final movement's orchestration with advice from George Benjamin .	This was mostly complete when Messiaen died , and Yvonne Loriod undertook the final movement's orchestration with advice from George Benjamin .
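
To relate these files to the divergences reported above, the following sketch reads the score column and computes a KL divergence between two score distributions. The binning (10 equal-width bins over 0 to 100), the additive smoothing, and the direction KL(P || Q) are all assumptions made for illustration, so the values it produces will not necessarily match those printed by ts_eval.py.

```python
import math


def read_scores(lines):
    """Extract the score from "Score<TAB>Complex<TAB>Simple" lines,
    skipping blank lines and "#" comment lines."""
    scores = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        scores.append(float(line.split("\t", 1)[0]))
    return scores


def histogram(scores, bins=10, low=0.0, high=100.0, smoothing=1e-6):
    """Smoothed relative frequencies over equal-width bins."""
    counts = [0] * bins
    width = (high - low) / bins
    for score in scores:
        index = int((score - low) / width)
        counts[min(max(index, 0), bins - 1)] += 1
    total = sum(counts) + smoothing * bins
    return [(count + smoothing) / total for count in counts]


def kl_divergence(p, q):
    """KL(P || Q) for two distributions defined over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Invented scores standing in for two subsets of one dataset.
test_scores = read_scores(["# Score, Complex, Simple", "4.5\ta\tb", "35.0\tc\td"])
dev_scores = [4.3, 60.0, 62.5, 90.0]
print(kl_divergence(histogram(test_scores), histogram(dev_scores)))
```

The smoothing term avoids a division by zero when a bin is empty in one subset but not in the other.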

Better-distributed datasets (Wiki Random, 98% and 95%)

Use the following command lines to reproduce our datasets.

# Supported values (evaluated in our paper)
# sample: 0.98, 0.95, 0.90, 0.85 and 0.80
# seed: 155, 324, 393, 728, 989 

# Wikilarge Random
python ts_eval.py --create random --seed 324 --datasets examples/datasets.wikilarge.json --output_dir output

# Wikilarge 98%
python ts_eval.py --create unaligned --datasets examples/datasets.wikilarge.json --sample 0.98 --seed 324 --output_dir output

# Wikilarge 95%
python ts_eval.py --create unaligned --datasets examples/datasets.wikilarge.json --sample 0.95 --seed 324 --output_dir output
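
To cover all the supported values without typing each command, a small sketch like the one below prints the corresponding command lines. It only uses the flags shown above. Pairing every sample value with every seed is an assumption about how you may want to run the grid, and `build_commands` is a hypothetical helper.

```python
from itertools import product

SAMPLES = [0.98, 0.95, 0.90, 0.85, 0.80]
SEEDS = [155, 324, 393, 728, 989]
CONFIG = "examples/datasets.wikilarge.json"


def build_commands(config=CONFIG, output_dir="output"):
    """Return one argument list per random and unaligned run."""
    base = ["python", "ts_eval.py"]
    commands = []
    for seed in SEEDS:
        commands.append(base + [
            "--create", "random", "--seed", str(seed),
            "--datasets", config, "--output_dir", output_dir,
        ])
    for sample, seed in product(SAMPLES, SEEDS):
        commands.append(base + [
            "--create", "unaligned", "--datasets", config,
            "--sample", f"{sample:.2f}", "--seed", str(seed),
            "--output_dir", output_dir,
        ])
    return commands


for command in build_commands():
    print(" ".join(command))
```

Whether successive runs overwrite each other in the same output directory is not covered here, so you may prefer a separate `--output_dir` per run.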

The datasets.wikilarge.json file referenced in these commands looks like this:

{
  "wikilarge": {
    "test": "<data_dir>/wikilarge/wiki.full.aner.ori.test",
    "dev": "<data_dir>/wikilarge/wiki.full.aner.ori.dev",
    "train": "<data_dir>/wikilarge/wiki.full.aner.ori.train",
    "tag": ["src", "dst"]
  }
}

The same steps apply to the WikiSmall dataset; just update the .json file.

📝 Note: The scripts above will recreate the datasets from scratch. We recommend this method, since the scripts fix minor limitations found in the data after publication. If you still want to use the original datasets, you can download them from here.

Hardware & Runtimes

For the datasets analysis and creation, we used the following setup:

Processor Name: 2 GHz Quad-Core Intel Core i5
Memory: 16 GB

Analysis duration: ~5 minutes for all datasets presented in this paper.

For model training, we used a different setup with 1 GPU with the following specs:

Tesla V100-SXM2-16GB
CUDA Driver Version = 11.2

Model training duration: ~3-4 hours for WikiSmall and ~17-22 hours for WikiLarge experiments.

Citation

If you use our results and scripts in your research, please cite our work:

Investigating Text Simplification Evaluation: this includes the evaluation of KL-divergences of Wikipedia-based TS datasets and our random (single seed) and poor-alignment (98% and 95%) analysis. These scenarios are evaluated together.

@inproceedings{vasquez-rodriguez-etal-2021-investigating,
    title = "Investigating Text Simplification Evaluation",
    author = "V{\'a}squez-Rodr{\'\i}guez, Laura  and
      Shardlow, Matthew  and
      Przyby{\l}a, Piotr  and
      Ananiadou, Sophia",
    booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-acl.77",
    pages = "876--882",
}

The Role of Text Simplification Operations in Evaluation: our analysis is extended by adding multiple seeds (5) for random, more poor-alignment scenarios (98%, 95%, 90%, 85%, 80%) and Monte Carlo algorithm analysis. These scenarios are evaluated independently.

@inproceedings{vasquez-rodriguez-etal-2021-the-role,
    title = "The Role of Text Simplification Operations in Evaluation",
    author = "V{\'a}squez-Rodr{\'\i}guez, Laura  and
      Shardlow, Matthew  and
      Przyby{\l}a, Piotr  and
      Ananiadou, Sophia",
    booktitle = "First Workshop on Current Trends in Text Simplification (CTTS 2021)",
    month = sep,
    year = "2021",
    address = "Online",
    publisher = "CEUR Workshop Proceedings (CEUR-WS.org)",
    url = "http://ceur-ws.org/Vol-2944/paper4.pdf",
    pages = "57--69",
}
