Training/test splits of the BUCC dataset #13

MGithubGA · 2020-04-24T09:15:48Z

Hi,

Congratulations on your paper!

I would like to know which training/test splits of the BUCC dataset are used.
The paper says "we evaluate representations on the test sets directly", but the training data are renamed as the test set here:

for f in $base_dir/*training*; do mv $f ${f/training/test}; done

So which split is used as the test set of BUCC in XTREME: the training set or the test set?

Thanks.

sebastianruder · 2020-04-26T13:35:47Z

Hi! Thanks for the question and the close look at the code. Good catch! :)
We used the training set of the BUCC task for evaluation in XTREME as the BUCC test set is private. We might update the evaluation in the future if we get access to the original test data. This is not a problem in practice as no new parameters are learned for this task.
@JunjieHu, do you have anything to add?

JunjieHu · 2020-04-26T20:37:03Z

Thanks for the question! Indeed, as Sebastian said, the original test data is private. So we use the released training set as our evaluation test set and the released sample set as our dev set, which participants can use, for example, to find a threshold for mining the bitext.
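
For readers wondering what "finding a threshold" on the dev set could look like, here is a minimal sketch. It is not the XTREME evaluation code: the function name `best_f1_threshold` and its inputs are hypothetical, and it assumes every candidate sentence pair already has a similarity score and a 0/1 gold label, with all gold pairs present among the candidates.

```python
from typing import List, Tuple


def best_f1_threshold(scores: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (threshold, f1) that maximizes F1 when pairs scoring >= threshold are kept.

    Hypothetical helper for illustration only.
    scores: similarity score of each candidate pair.
    labels: 1 if the pair is a gold translation pair, else 0.
    """
    total_gold = sum(labels)
    # Rank candidates from the highest to the lowest score.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)

    best_thr, best_f1 = float("inf"), 0.0
    true_pos = 0
    for kept, (score, label) in enumerate(ranked, start=1):
        true_pos += label
        # With tied scores, only evaluate once the whole tied group is included.
        if kept < len(ranked) and ranked[kept][0] == score:
            continue
        precision = true_pos / kept
        recall = true_pos / total_gold if total_gold else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        if f1 > best_f1:
            best_thr, best_f1 = score, f1
    return best_thr, best_f1


if __name__ == "__main__":
    # Toy dev-set scores and labels, made up for this example.
    dev_scores = [0.9, 0.8, 0.7, 0.4, 0.3]
    dev_labels = [1, 1, 0, 1, 0]
    print(best_f1_threshold(dev_scores, dev_labels))
```

The threshold chosen on the dev (sample) set would then be applied unchanged to the evaluation split, so nothing is tuned on the data used for reporting results.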

MGithubGA · 2020-04-29T05:59:21Z

Thanks for your quick reply! 👍

MGithubGA closed this as completed Apr 29, 2020
