
Add XNLI train set #781

Merged
merged 3 commits into from
Nov 9, 2020

@lhoestq (Member) commented Oct 30, 2020

I added the train set, which was built from the translated MNLI.
Now you can load the dataset by specifying a language:

```python
from datasets import load_dataset

xnli_en = load_dataset("xnli", "en")
print(xnli_en["train"][0])
# {'hypothesis': 'Product and geography are what make cream skimming work .', 'label': 1, 'premise': 'Conceptually cream skimming has two basic dimensions - product and geography .'}
print(xnli_en["test"][0])
# {'hypothesis': 'I havent spoken to him again.', 'label': 2, 'premise': "Well, I wasn't even thinking about that, but I was so frustrated, and, I ended up talking to him again."}
```
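The `label` field in the records above is an integer. A minimal offline sketch for turning it into a readable class name, assuming the label ids follow the order entailment, neutral, contradiction (the `ClassLabel` names used by the `datasets` XNLI features). `XNLI_LABELS` and `describe` are hypothetical helpers, and the two records are copied verbatim from the output above so the snippet runs without downloading anything:

```python
# Assumption: XNLI label ids map to names in this order (0, 1, 2).
XNLI_LABELS = ("entailment", "neutral", "contradiction")


def describe(example: dict) -> str:
    """Hypothetical helper: one-line summary of an XNLI record with a named label."""
    label_name = XNLI_LABELS[example["label"]]
    return f"{label_name}: {example['premise']!r} -> {example['hypothesis']!r}"


# The two records printed above, copied so this runs offline.
train_0 = {
    "hypothesis": "Product and geography are what make cream skimming work .",
    "label": 1,
    "premise": "Conceptually cream skimming has two basic dimensions - product and geography .",
}
test_0 = {
    "hypothesis": "I havent spoken to him again.",
    "label": 2,
    "premise": "Well, I wasn't even thinking about that, but I was so frustrated, and, I ended up talking to him again.",
}

for record in (train_0, test_0):
    print(describe(record))
```

With the library installed and the dataset loaded, the authoritative names should be available from the features rather than a hard-coded tuple, e.g. `xnli_en["train"].features["label"].names` or `.int2str(1)` on that `ClassLabel`.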

Cc @sgugger

@yjernite added this to In progress in Datasets to Add via automation Oct 30, 2020
@lhoestq merged commit 1abf805 into master Nov 9, 2020
Datasets to Add automation moved this from In progress to Done Nov 9, 2020
@lhoestq deleted the add-xnli-train-set branch November 9, 2020 18:22
@YifanYangEbidanko commented

Hi! Thanks for adding the translated MNLI! Do you know what translation system or model you used when you created the datasets in the other languages?

@lhoestq (Member, Author) commented Jun 9, 2022

According to the paper it's the result of the work of professional translators ;)

@YifanYangEbidanko commented Jun 9, 2022 via email

@lhoestq (Member, Author) commented Jun 9, 2022

> The training data is not from translators.

What makes you think that? The paper literally says:

> we hire translators to translate the resulting sentences into 15 languages using the One Hour Translation platform.

@YifanYangEbidanko commented Jun 9, 2022 via email
