Wrong paraphrase in the TF2/PyTorch README example. #1942
Comments
Hi, I'm investigating. For now, I can confirm the issue you observe. I've tested on both CPU and GPU and get the same result. I've tested with PyTorch and TF models too, same result. Now, let's track down the cause!
Hi again,
So it's not crazy high, but not near random either. I then retested:
I took a more complex sentence from the training set.
Here we see that it's not robust. I then took sentences from the test set:
Here we see that sometimes it works, sometimes not. I might be wrong, but I haven't seen anything in the code that could explain this issue (83% is the final accuracy on the dev set... ok, but that still leaves one error in five cases). A priori, I'd say that basic BERT trained like this on this tiny dataset is simply not that robust for this task in the general case, and would need more data or at least more data augmentation. Do you share my conclusion, or do you see something different?
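For reference, the sequence classifier's paraphrase/not-paraphrase decision in this example reduces to a softmax and argmax over two MRPC logits. A minimal sketch of that decision step (NumPy only; the logit values and the `is_paraphrase` helper are illustrative, not taken from the model):

```python
import numpy as np

def is_paraphrase(logits):
    """Return (decision, probs) from a pair of MRPC logits.

    MRPC is binary: index 0 = not paraphrase, index 1 = paraphrase.
    """
    # Numerically stable softmax over the two logits.
    probs = np.exp(logits - np.max(logits))
    probs /= probs.sum()
    return int(np.argmax(probs)) == 1, probs

# Illustrative logits, e.g. as returned for one sentence pair.
decision, probs = is_paraphrase(np.array([0.2, 2.1]))
```

A model can be "right" by argmax while still being close to 50/50 in probability, which is consistent with the fragile behavior described above.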
Thanks for the investigation. Was the performance ever different at the time when that example was put into the README?
TBH, personally I wasn't there, so I don't know...
MRPC is a very small dataset (the smallest in the GLUE benchmark, which is why we use it as an example). It should not be expected to generalize well or be usable in real-life settings.
Sounds like we don't think there's an actionable issue here. |
🐛 Bug
Model I am using (Bert, XLNet....): TFBertForSequenceClassification
Language I am using the model on (English, Chinese....): English
The problem arises when using:
The task I am working on is:
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Environment