Reproducibility question #24

944721805 · 2020-05-11T03:06:55Z

Hi, I can't reproduce the results reported in the paper, especially on XNLI. I just ran the scripts with bash, without any other changes.

944721805 · 2020-05-11T04:15:45Z

The DEV results look normal, but something may be wrong with the TEST results.

melvinjosej · 2020-05-12T18:06:22Z

Hi,

Can you post the numbers you're obtaining? We'd like to know the magnitude of the difference.

Also your command lines will help us debug.

JunjieHu · 2020-05-13T21:33:31Z

@944721805 I believe you observed test scores close to 0.33. This is because we replaced the true labels in all the test sets with a placeholder label. The rationale is to prevent people from simply modifying the ground-truth labels and submitting them to our platform. Instead, we encourage participants to submit their results to our platform for evaluation.
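To illustrate why a placeholder label produces a score near 0.33 on a three-class task such as XNLI, here is a minimal sketch. The label names, the choice of placeholder value, and the helpers `accuracy` and `looks_like_placeholder` are assumptions made for illustration; they are not taken from the repository's scripts.

```python
# Hypothetical 3-way NLI label set; the actual placeholder value used in the
# released test files is an assumption here.
LABELS = ["entailment", "neutral", "contradiction"]
PLACEHOLDER = "contradiction"


def accuracy(predictions, gold):
    """Return the fraction of positions where the prediction matches gold."""
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and equal in length")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def looks_like_placeholder(gold):
    """Return True if every gold label is identical, which hints at a placeholder."""
    return len(set(gold)) == 1


# Toy predictions spread evenly over the three classes.
predictions = LABELS * 100
# A "test set" whose true labels were all replaced by one placeholder label.
placeholder_gold = [PLACEHOLDER] * len(predictions)

score = accuracy(predictions, placeholder_gold)
print(f"{score:.2f}")  # 0.33
print(looks_like_placeholder(placeholder_gold))  # True
```

If a local test score sits near one third while the dev score looks normal, checking whether the test file contains only one distinct label is a quick way to confirm this cause.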

944721805 closed this as completed May 14, 2020

niwic mentioned this issue Sep 19, 2021

Unable to reproduce the reported numbers for XNLI and PAWS-X with mBERT. #65

Closed
