Reproducibility question #24

Closed
944721805 opened this issue May 11, 2020 · 3 comments
Comments

@944721805

Hi, I found that I can't reproduce the results reported in the paper, especially on XNLI. I just ran the provided bash scripts without any other changes.

@944721805
Author

The DEV results look normal, but something may be wrong with the TEST results.

@melvinjosej
Collaborator

Hi,

Can you post the numbers you're obtaining? We'd like to know the magnitude of the difference.

Your command lines would also help us debug.

@JunjieHu
Contributor

@944721805 I believe you observed test scores close to 0.33. This is because we replaced the true labels in all the test sets with a placeholder label. The rationale is to prevent people from simply modifying the ground-truth labels and submitting them to our platform. Instead, we encourage participants to submit their results to our platform for evaluation.
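
As a rough illustration of why the score lands near 0.33 (a sketch, not the project's evaluation code): XNLI is a three-class task, so if every gold test label is the same placeholder and a model's predictions are spread roughly evenly over the three classes, only about a third of them will match. The placeholder value `contradiction`, the uniformly random predictions, and the function names below are all assumptions made for illustration.

```python
import random

# The three NLI classes used by XNLI.
LABELS = ["entailment", "neutral", "contradiction"]


def accuracy(predictions, gold):
    """Fraction of predictions that exactly match the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


def placeholder_accuracy(num_examples=5000, placeholder="contradiction", seed=0):
    """Score roughly balanced predictions against an all-placeholder gold set.

    Assumption: the predictions are simulated as uniform random draws, standing
    in for a model whose outputs are spread evenly over the three classes.
    """
    rng = random.Random(seed)
    predictions = [rng.choice(LABELS) for _ in range(num_examples)]
    gold = [placeholder] * num_examples  # every "true" label is the placeholder
    return accuracy(predictions, gold)


if __name__ == "__main__":
    print(f"accuracy against placeholder labels: {placeholder_accuracy():.3f}")
```

Under these assumptions the local test accuracy says nothing about model quality. Only the DEV score, computed against real labels, is meaningful locally.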
