BET 🤞

We bet on it and it worked. This repository includes the scripts to train transformers (finetuneutils.py and torchutils.py) from Huggingface library with Pytorch. We load the data from module datautils.py. We prepared the main.py script to run the training and evaluation. Finally, we use the run_loop.py to run iteratively all the configuration for BET experiment: datasets, transformers and different augmentation languages.

Clustering Languages

We clustered all the Google Translation languages into the related language families based on the information provided in the Wikipedia info-boxes. The Romance branch is illustrated in the following figure.

Backtranslation process

Results

The bold values in the table show how BET increased the performance. "all" stands for backtranslation from all the languages in our set. We also report the top-performing languages too.

Model	Data	Accuracy	F1-score	Precision	Recall
BERT	baseline	0.802	0.858	0.820	0.899
BERT	all	0.824	0.877	0.819	0.945
BERT	Spanish	0.835	0.882	0.840	0.929
XLNet	baseline	0.845	0.886	0.868	0.905
XLNet	all	0.837	0.883	0.840	0.932
XLNet	Japanese	0.860	0.897	0.877	0.919
RoBERTa	baseline	0.874	0.906	0.898	0.914
RoBERTa	all	0.872	0.907	0.877	0.939
RoBERTa	Vietnamese	0.886	0.915	0.906	0.925
ALBERT	baseline	0.853	0.890	0.885	0.895
ALBERT	all	0.841	0.886	0.847	0.929
ALBERT	Yoruba	0.867	0.902	0.884	0.922

Install dependencies

pip install -r requirements.txt

How do I cite BET?

For now, cite ... ,

@article{jphabdibet,
  title={BET: A Backtranslation Approach for Easy Data Augmentation in Transformer-based Paraphrase Identification Context},
  author={Corbeil, Jean-Philippe and Abdi Ghavidel, Hadi},
  journal={arXiv preprint},
  year={2020}
}

Disclaimer

Some parts of the code were inspired by HuggingFace Transformers Implementations.

Name		Name	Last commit message	Last commit date
Latest commit jpcorb20 Merge branch 'master' of https://github.com/jpcorb20/emnlp2020-gt-bac… Oct 19, 2020 a14834c · Oct 19, 2020 History 34 Commits
datasets		datasets
google-translate-backtranslation-da		google-translate-backtranslation-da
img		img
.gitignore		.gitignore
README.md		README.md
__init__.py		__init__.py
datautils.py		datautils.py
finetuneutils.py		finetuneutils.py
main.py		main.py
requirements.txt		requirements.txt
run_loop.py		run_loop.py
torchutils.py		torchutils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

BET 🤞

Clustering Languages

Backtranslation process

Results

Install dependencies

How do I cite BET?

Disclaimer

About

Releases

Packages

Contributors 3

Languages

jpcorb20/bet-backtranslation-paraphrase-experiment

Folders and files

Latest commit

History

Repository files navigation

BET 🤞

Clustering Languages

Backtranslation process

Results

Install dependencies

How do I cite BET?

Disclaimer

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages