Description
I am trying to reproduce the CoLA results reported in the BERT paper, but my numbers are far from the reported ones. My best MCC with BERT large is 64.79% on dev and 56.9% on test, while the reported test result is 60.5%. The learning rate is 2e-5 and the total number of epochs is 5. For BERT base, the results are also 3-5% lower.
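For reference, this is how I score the dev set. A minimal sketch: the arrays here are illustrative placeholders, and in practice `preds` is the argmax over the classifier logits on the CoLA dev set while `labels` holds the gold acceptability judgments (0/1) from `dev.tsv`.

```python
import numpy as np
from sklearn.metrics import matthews_corrcoef

# Illustrative placeholders: in the real run these come from the fine-tuned
# model's predictions and the gold labels of the CoLA dev set.
preds = np.array([1, 0, 1, 1, 0, 1])
labels = np.array([1, 0, 0, 1, 0, 1])

# CoLA is evaluated with the Matthews correlation coefficient (MCC).
mcc = matthews_corrcoef(labels, preds)
print(f"dev MCC: {mcc * 100:.2f}%")
```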
As the paper says:
> for BERT_LARGE we found that fine-tuning was sometimes unstable on small data sets (i.e., some runs would produce degenerate results), so we ran several random restarts and selected the model that performed best on the Dev set.
I also tried several restarts with different learning rates and random seeds, but there seems to be no improvement. I'm quite confused about this reproduction. Any suggestions would be greatly appreciated.
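For completeness, this is roughly how I run the restarts: a sketch assuming a hypothetical `train_and_evaluate(seed, lr)` helper that fine-tunes BERT large on CoLA and returns the dev MCC (the fine-tuning loop itself is omitted); the seed and learning-rate values below are just examples.

```python
import random
import numpy as np
import torch

def set_seed(seed):
    # Make each restart as reproducible as possible for a given seed.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

def train_and_evaluate(seed, lr):
    # Hypothetical helper: fine-tune BERT large on CoLA with this seed and
    # learning rate, then return the Matthews correlation on the dev set.
    set_seed(seed)
    ...  # fine-tuning loop omitted
    return 0.0

best_mcc, best_config = -1.0, None
for lr in (2e-5, 3e-5):
    for seed in (12, 22, 32, 42, 52):
        mcc = train_and_evaluate(seed, lr)
        if mcc > best_mcc:
            best_mcc, best_config = mcc, (seed, lr)

print(f"best dev MCC: {best_mcc:.4f} with (seed, lr) = {best_config}")
```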