Training/test splits of the BUCC dataset #13

Closed
MGithubGA opened this issue Apr 24, 2020 · 3 comments

@MGithubGA

Hi,

Congratulations on your paper!

I would like to know which training/test splits of the BUCC dataset were used.
The paper says "we evaluate representations on the test sets directly", but the training data is renamed to the test set here:

for f in $base_dir/*training*; do mv $f ${f/training/test}; done

So which split is used as the BUCC test set in XTREME: the training set or the test set?

Thanks.

@sebastianruder
Collaborator

Hi! Thanks for the question and the close look at the code. Good catch! :)
We used the training set of the BUCC task for evaluation in XTREME because the BUCC test set is private. We might update the evaluation in the future if we get access to the original test data. This is not a problem in practice, as no new parameters are learned for this task.
@JunjieHu, do you have anything to add?

@JunjieHu
Contributor

Thanks for the question! Indeed, as Sebastian said, the original test data is private. So we use the released training set as our evaluation test set and the released sample set as our dev set, which participants can use, for example, to find a threshold for mining the bitext.
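
For readers following the thread, the dev-set threshold search mentioned above can be sketched roughly as follows. This is only an illustration, not the XTREME implementation: it assumes each candidate sentence pair already has a similarity score, that gold pairs are available for the dev (sample) split, and that F1 is the selection criterion. The function names, pair IDs, and scores are all hypothetical.

```python
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # (source sentence id, target sentence id), hypothetical ids


def f1_at_threshold(scored: List[Tuple[Pair, float]], gold: Set[Pair], threshold: float) -> float:
    """F1 of the candidate pairs whose score is at least `threshold`."""
    predicted = {pair for pair, score in scored if score >= threshold}
    correct = len(predicted & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scored: List[Tuple[Pair, float]], gold: Set[Pair]) -> Tuple[float, float]:
    """Try every observed score as a cut-off and keep the one with the highest dev F1."""
    candidates = sorted({score for _, score in scored})
    best = max(candidates, key=lambda t: f1_at_threshold(scored, gold, t))
    return best, f1_at_threshold(scored, gold, best)


# Toy dev data (made up for illustration): scored candidate pairs and gold pairs.
dev_scored = [
    (("en-1", "de-1"), 0.92),
    (("en-2", "de-2"), 0.81),
    (("en-4", "de-4"), 0.75),
    (("en-3", "de-7"), 0.40),
    (("en-5", "de-9"), 0.35),
]
dev_gold = {("en-1", "de-1"), ("en-2", "de-2"), ("en-4", "de-4")}

threshold, dev_f1 = best_threshold(dev_scored, dev_gold)
print(threshold, dev_f1)
```

The threshold chosen on the dev split would then be applied unchanged to the scored pairs of the evaluation split (here, the released BUCC training set), so no quantity is tuned on the data used for reporting.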

@MGithubGA
Author

Thanks for your quick reply! 👍
