In this repository you will find the data and the code for the experiments reported in the paper "A logical-based corpus for cross-lingual evaluation".
To install all the libraries run:
$ bash install.sh
The code to generate the synthetic dataset is in the folder clcd/text_generation. In the same folder you can find a detailed description of all the templates used to generate sentence pairs.
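As a rough illustration of the idea, a template can be instantiated into a labelled sentence pair along the following lines. This is only a minimal sketch: the vocabulary, the template, and the label name below are illustrative assumptions, not the actual templates defined in clcd/text_generation.

import random

# Hypothetical vocabulary and template; the real ones live in clcd/text_generation.
PEOPLE = ["John", "Mary", "Paul"]
VERBS = ["works", "sleeps", "travels"]

def instantiate_boolean_coordination():
    # Pick two distinct people and a verb, then fill a coordination template.
    p1, p2 = random.sample(PEOPLE, 2)
    verb = random.choice(VERBS)
    premise = f"{p1} {verb} and {p2} {verb}"
    hypothesis = f"{p1} does not {verb[:-1]}"  # e.g. "John does not work"
    label = "contradiction"  # hypothetical label name
    return premise, hypothesis, label

print(instantiate_boolean_coordination())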
To create all the datasets, in both English and Portuguese, just run:
$ bash gen.sh
The resulting datasets will be stored in the folder text_gen_output.
A list of all the templates used is given in the file templates.pdf.
All data used in the paper can be found in the folder clcd_datasets/.
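For a quick look at one of these datasets you can load it with pandas, for example as below. The file path is one of the files shipped in clcd_datasets/; the exact column layout (e.g. premise / hypothesis / label) is an assumption and should be checked against the file itself.

import pandas as pd

# Inspect one of the Portuguese splits used in the paper.
df = pd.read_csv("clcd_datasets/Portuguese/boolean_coordination_pt_train.csv")
print(df.columns.tolist())
print(df.head())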
To train the different models, use the file basic_RNN_BERT_train.py. It requires the paths to the training and test datasets, as well as a name under which to store all the results. For example:
$ python basic_RNN_BERT_train.py clcd_datasets/Portuguese/boolean_coordination_pt_train.csv clcd_datasets/Portuguese/boolean_coordination_pt_test.csv pt_experiments
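To train on several splits in a row, a small driver script along these lines can help. This is a sketch: it assumes every split follows the *_train.csv / *_test.csv naming seen in the example above and that basic_RNN_BERT_train.py accepts exactly these three positional arguments.

import glob
import os
import subprocess
import sys

# Run the training script on every Portuguese train/test pair found in clcd_datasets/.
for train_path in sorted(glob.glob("clcd_datasets/Portuguese/*_train.csv")):
    test_path = train_path.replace("_train.csv", "_test.csv")
    if not os.path.exists(test_path):
        continue
    name = "pt_" + os.path.basename(train_path).replace("_train.csv", "")
    subprocess.run(
        [sys.executable, "basic_RNN_BERT_train.py", train_path, test_path, name],
        check=True,
    )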
If you use this code or the datasets, please cite:

@misc{clcd2019,
author = {Felipe Salvatore},
title = {Cross-Lingual Contradiction Detection},
year = {2019},
howpublished = {\url{https://github.com/felipessalvatore/CLCD}},
note = {commit xxxxxxx}
}