Can't reproduce the result of "Benchmark 2: Fine-Tuning RoBERTa on GLUE tasks" #25
I am happy to help @CrazyElements
Thanks @jiaweizzhao. For mrpc, I just used the hyperparameters listed in the README. For the other tasks, I modified …
I tried it and it works as expected. The issue might be that we report the F1 score for mrpc in the paper, which causes the confusion. I will change it back to accuracy in the new revision.
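The F1-vs-accuracy mix-up above is easy to hit on MRPC because the two metrics can differ noticeably on the same predictions. A minimal, dependency-free sketch (the labels below are made up for illustration, not MRPC data):

```python
# Accuracy vs. F1 on the same binary predictions (toy labels, not MRPC).

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]

print(accuracy(y_true, y_pred))  # 4/6, about 0.667
print(f1(y_true, y_pred))        # 0.75
```

So a run scored with F1 and the same run scored with accuracy can legitimately disagree, which matches the confusion described in this thread.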
Thank you for your response, but I'm still unable to replicate the results. The final F1 score for mrpc is 91.93, and the matthews_correlation for cola is 59.6.
This might be due to the choice of the random seed. I did a quick sweep using my previous setup (based on the config you provided):
So I wonder if you used a different seed (not 1234)? Maybe I mistakenly assumed that the example script in the README would yield the same results. If you did use a different seed, would it be possible for you to consider open-sourcing the fine-tuning script?
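The seed question above is the crux: with everything else fixed, only an identical seed reproduces a result exactly. A toy sketch of that behavior, using the stdlib `random` module as a stand-in for real training stochasticity (1234 is the seed mentioned in this thread; the score function is hypothetical):

```python
import random

def noisy_score(seed):
    # Toy stand-in for a fine-tuning run: a base metric plus seed-dependent noise.
    rng = random.Random(seed)
    return 90.0 + rng.uniform(-1.0, 1.0)

print(noisy_score(1234) == noisy_score(1234))  # same seed -> identical result
print(noisy_score(1234) == noisy_score(42))    # different seed -> different result
```

In a real run the seed would additionally need to be propagated to NumPy, PyTorch, and the data loader for exact reproduction.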
And here I used the hyperparameters listed in Table 7.
We use the average score of repeated runs. We will release the fine-tuning scripts later, along with a few more fine-tuning experiments.
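Reporting the average over repeated runs, as described above, is straightforward to reproduce locally. A minimal sketch (the seeds and scores below are hypothetical placeholders, not the paper's numbers):

```python
from statistics import mean, stdev

# seed -> hypothetical MRPC score from one fine-tuning run each
runs = {1234: 92.1, 42: 91.6, 7: 92.4}

avg = mean(runs.values())
sd = stdev(runs.values())
print(f"avg = {avg:.2f} +/- {sd:.2f} over {len(runs)} seeds")
```

Comparing a single-seed result against a multi-seed average like this is one plausible source of the gap reported in this issue.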
Has anyone successfully replicated the results of fine-tuning tasks?
I followed the hyperparameters outlined in the README and the paper, and tried the cola and mrpc tasks on a single GPU without gradient accumulation. However, the results I obtained differed from those reported in the paper.
And here are the best performances of my runs, where the numbers in parentheses are the results from the paper.
I would appreciate any assistance from someone who can provide insights on this matter.