Potentially redundant learning rate scheduling #195
Comments
Hmm, could be the case indeed. What do you think about this @tholor?
As far as I can tell, this was introduced in c8ea286 as a byproduct of adding float16 support, and was then copied to other example files as well.
I agree, there seems to be double LR scheduling, so the applied LR is lower than intended. A quick plot of the LR being set in the outer scope (i.e. in run_squad or run_lm_finetuning) vs. the one applied inside BertAdam shows this (see the sketch below for one way to reproduce such a plot).

In addition, I have noticed two further parts for potential clean-up.
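For reference, here is a rough, self-contained sketch of how such a comparison can be regenerated. `warmup_linear` below mirrors the schedule defined in `pytorch_pretrained_bert/optimization.py`; the base LR, warmup proportion and step count are illustrative values, not the ones behind the plot above.

```python
# Illustrative sketch only: compare the LR set by the example script (outer scope)
# with the LR effectively applied once BertAdam scales it again internally.
import matplotlib.pyplot as plt

def warmup_linear(x, warmup=0.002):
    # linear warm-up to 1.0, then linear decay (as in optimization.py)
    if x < warmup:
        return x / warmup
    return 1.0 - x

base_lr = 3e-5      # illustrative
t_total = 1000      # illustrative number of optimization steps
warmup = 0.1        # illustrative warmup proportion

steps = list(range(t_total))
# LR written into optimizer.param_groups by run_squad / run_lm_finetuning
outer = [base_lr * warmup_linear(s / t_total, warmup) for s in steps]
# LR effectively applied: BertAdam multiplies the param_group lr by
# warmup_linear(...) once more inside its step()
applied = [o * warmup_linear(s / t_total, warmup) for s, o in zip(steps, outer)]

plt.plot(steps, outer, label="set in outer scope")
plt.plot(steps, applied, label="effectively applied (outer * BertAdam)")
plt.xlabel("optimization step")
plt.ylabel("learning rate")
plt.legend()
plt.show()
```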
There is also an additional problem that causes the learning rate to not be set correctly in run_classifier.py. I created a pull request for that (and the double warmup problem): #218
Has anything been done about this double warmup bug?
Yes, @matej-svejda worked on this in #218.
I see that, but it isn't merged yet?
No, not yet. As you can see in the PR, it's still WIP and he committed only 4 hours ago. If you need the fix urgently, you can easily apply the changes locally. It's quite a small fix.
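If you want to avoid the double warmup locally in the meantime, one option is to drop the manual `param_group['lr']` update from the training loop and let BertAdam do the scheduling on its own, since it is constructed with `warmup` and `t_total`. A minimal sketch of that idea (not necessarily the exact change made in #218, and only for the non-fp16 / BertAdam path):

```python
# Sketch: rely solely on BertAdam's internal warmup_linear schedule.
# The Linear model, step count and hyperparameters are placeholders.
import torch
from pytorch_pretrained_bert.optimization import BertAdam

model = torch.nn.Linear(10, 2)        # stand-in for the BERT model
num_train_optimization_steps = 100    # illustrative
optimizer = BertAdam(model.parameters(),
                     lr=3e-5,
                     warmup=0.1,
                     t_total=num_train_optimization_steps)

for step in range(num_train_optimization_steps):
    loss = model(torch.randn(4, 10)).sum()
    loss.backward()
    # No manual "param_group['lr'] = lr * warmup_linear(...)" here;
    # BertAdam applies the warmup schedule itself because warmup/t_total were given.
    optimizer.step()
    optimizer.zero_grad()
```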
Sorry, I forgot to check the time :)
By the way, how can I draw a plot of the BERT LR schedule like yours? I see that if I use
I have plotted it. Your code above should actually throw an exception because
Ok, this should be fixed in master now!
In the two code snippets below:
https://github.com/huggingface/pytorch-pretrained-BERT/blob/647c98353090ee411e1ef9016b2a458becfe36f9/examples/run_lm_finetuning.py#L570-L573
https://github.com/huggingface/pytorch-pretrained-BERT/blob/647c98353090ee411e1ef9016b2a458becfe36f9/examples/run_lm_finetuning.py#L611-L613
it appears that learning rate warmup is being done twice: once in the example file and once inside the BertAdam class. Am I reading this wrong? I'm pretty sure the BertAdam class performs its own warm-up when initialized with those arguments.
Here is an excerpt from the BertAdam class, where warm-up is also applied:
https://github.com/huggingface/pytorch-pretrained-BERT/blob/647c98353090ee411e1ef9016b2a458becfe36f9/pytorch_pretrained_bert/optimization.py#L146-L150
This also applies to other examples, e.g.
https://github.com/huggingface/pytorch-pretrained-BERT/blob/647c98353090ee411e1ef9016b2a458becfe36f9/examples/run_squad.py#L848-L851
https://github.com/huggingface/pytorch-pretrained-BERT/blob/647c98353090ee411e1ef9016b2a458becfe36f9/examples/run_squad.py#L909-L911
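If I am reading those snippets right, the net effect is that the warm-up factor is applied twice, i.e. the LR during warm-up is roughly `lr * warmup_linear(x)**2` instead of `lr * warmup_linear(x)`. A back-of-the-envelope check (with `warmup_linear` as defined in optimization.py and made-up numbers):

```python
# Rough numeric check of the double scheduling; all values are illustrative.
def warmup_linear(x, warmup=0.002):
    if x < warmup:
        return x / warmup
    return 1.0 - x

base_lr = 3e-5
warmup_proportion = 0.1
progress = 0.05                 # halfway through the warm-up phase

factor = warmup_linear(progress, warmup_proportion)  # ~0.5 at this point
lr_outer = base_lr * factor            # set by the example script
lr_applied = lr_outer * factor         # BertAdam scales it a second time

print(lr_outer)    # ~1.5e-05 : what a single warm-up would give
print(lr_applied)  # ~7.5e-06 : what actually gets applied (half the intended LR)
```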