Potentially redundant learning rate scheduling #195

Closed · nikitakit opened this issue Jan 14, 2019 · 12 comments

@nikitakit commented Jan 14, 2019

In the two code snippets below:

https://github.com/huggingface/pytorch-pretrained-BERT/blob/647c98353090ee411e1ef9016b2a458becfe36f9/examples/run_lm_finetuning.py#L570-L573

https://github.com/huggingface/pytorch-pretrained-BERT/blob/647c98353090ee411e1ef9016b2a458becfe36f9/examples/run_lm_finetuning.py#L611-L613

it appears that learning rate warmup is being done twice: once in the example file and once inside the BertAdam class. Am I reading this wrong? I'm fairly sure the BertAdam class performs its own warm-up when initialized with those arguments.

Here is an excerpt from the BertAdam class, where warm-up is also applied:
https://github.com/huggingface/pytorch-pretrained-BERT/blob/647c98353090ee411e1ef9016b2a458becfe36f9/pytorch_pretrained_bert/optimization.py#L146-L150

This also applies to other examples, e.g.
https://github.com/huggingface/pytorch-pretrained-BERT/blob/647c98353090ee411e1ef9016b2a458becfe36f9/examples/run_squad.py#L848-L851
https://github.com/huggingface/pytorch-pretrained-BERT/blob/647c98353090ee411e1ef9016b2a458becfe36f9/examples/run_squad.py#L909-L911
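
For concreteness, here is a minimal sketch of how the two applications would compound, assuming warmup_linear is the simple x/warmup ramp followed by a linear decay, as in optimization.py (the hyperparameters below are illustrative, not the example defaults):

```python
# Sketch of the compounding effect: the example script scales the base LR by
# warmup_linear, and BertAdam then scales the stored LR by the same schedule again.
def warmup_linear(x, warmup=0.002):
    if x < warmup:
        return x / warmup
    return 1.0 - x

base_lr = 3e-5   # illustrative values
warmup = 0.1
t_total = 10000

for step in (100, 500, 1000, 5000):
    progress = step / t_total
    lr_outer = base_lr * warmup_linear(progress, warmup)       # written into optimizer.param_groups by the script
    lr_effective = lr_outer * warmup_linear(progress, warmup)  # scaled again inside BertAdam.step()
    print(f"step {step}: script sets {lr_outer:.2e}, effective LR is {lr_effective:.2e}")
```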

@thomwolf (Member)

Hmm, could be the case indeed. What do you think about this, @tholor?

@nikitakit (Author)

As far as I can tell this was introduced in c8ea286 as a byproduct of adding float16 support, and was then copied to other example files as well.

@tholor (Contributor) commented Jan 17, 2019

I agree, there seems to be double LR scheduling, so the applied LR is lower than intended. A quick plot of the LR set in the outer scope (i.e. in run_squad or run_lm_finetuning) vs. the inner one (in BertAdam) shows this:

[Image: lr_schedule_debug — plot comparing the LR set in the outer scope with the LR applied inside BertAdam]
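
For reference, curves like these can also be reproduced analytically. The following is only a sketch with illustrative hyperparameters, assuming warmup_linear is the x/warmup ramp followed by a linear decay from optimization.py:

```python
# Sketch reproducing the outer vs. inner LR curves analytically.
import matplotlib.pyplot as plt

def warmup_linear(x, warmup=0.002):
    if x < warmup:
        return x / warmup
    return 1.0 - x

base_lr, warmup, t_total = 3e-5, 0.1, 10000
steps = range(t_total)
outer = [base_lr * warmup_linear(s / t_total, warmup) for s in steps]             # set in the example script
inner = [lr * warmup_linear(s / t_total, warmup) for s, lr in zip(steps, outer)]  # applied again in BertAdam

plt.plot(steps, outer, label="LR set in the example script")
plt.plot(steps, inner, label="effective LR inside BertAdam")
plt.xlabel("training step")
plt.ylabel("learning rate")
plt.legend()
plt.show()
```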

In addition, I have noticed two further candidates for cleanup:

  1. I don't see a reason why the function warmup_linear() is implemented in two places: in optimization.py and in each example script.
  2. Is the method optimizer.get_lr() ever called? It applies yet another copy of the LR schedule (see the sketch below this list):
    https://github.com/huggingface/pytorch-pretrained-BERT/blob/f040a43cb3954e14dc47a815de012ac3f87a85d0/pytorch_pretrained_bert/optimization.py#L79-L92
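
As a small sketch of that difference, using a dummy parameter and assuming the BertAdam constructor arguments used in the examples (lr, warmup, t_total): get_lr() reports the internally scheduled value, while param_groups keeps the base LR.

```python
# Dummy-parameter sketch: compare the stored base LR with the value reported
# by get_lr() after BertAdam's internal schedule has been applied.
import torch
from pytorch_pretrained_bert.optimization import BertAdam

param = torch.nn.Parameter(torch.zeros(10))
optimizer = BertAdam([param], lr=3e-5, warmup=0.1, t_total=1000)

for _ in range(5):
    param.grad = torch.zeros_like(param)  # dummy gradient so step() advances the internal state
    optimizer.step()
    print(optimizer.param_groups[0]["lr"],  # stays at the base LR (3e-05)
          optimizer.get_lr()[0])            # base LR scaled by BertAdam's own warmup schedule
```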

@matej-svejda (Contributor)

There is also an additional problem that causes the learning rate not to be set correctly in run_classifier.py. I created a pull request for that (and the double warmup problem): #218

@kugwzk commented Jan 27, 2019

Has anything been done about this double warmup bug?

@tholor (Contributor) commented Jan 27, 2019

Yes, @matej-svejda worked on this in #218

@kugwzk commented Jan 27, 2019

I see that, but it isn't merged yet?

@tholor (Contributor) commented Jan 27, 2019

No, not yet. As you can see in the PR, it's still WIP and he committed only 4 hours ago. If you need the fix urgently, you can easily apply the changes locally; it's quite a small fix.
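
If anyone wants to patch this locally in the meantime, one possible workaround (only a sketch with variable names modeled on the example scripts, not necessarily what #218 ends up doing) is to apply the manual warmup only on the fp16 path, which does not use BertAdam and its internal schedule:

```python
# Sketch of a local workaround in the example training loop (illustrative
# variable names; not necessarily the fix implemented in #218).
if args.fp16:
    # The fp16 optimizer has no internal warmup, so keep the manual schedule here only.
    lr_this_step = args.learning_rate * warmup_linear(global_step / t_total,
                                                      args.warmup_proportion)
    for param_group in optimizer.param_groups:
        param_group["lr"] = lr_this_step
optimizer.step()
optimizer.zero_grad()
global_step += 1
```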

@kugwzk commented Jan 27, 2019

Sorry, I forgot to check the time :)

@kugwzk commented Jan 28, 2019

By the way, how can I plot the LR schedule for BERT like yours? If I use print(optimizer.param_groups['lr']), the learning rate always stays at the value I initialized it with.

@tholor (Contributor) commented Jan 29, 2019

I have plotted optimizer.param_groups[0]["lr"] from here:
https://github.com/huggingface/pytorch-pretrained-BERT/blob/f040a43cb3954e14dc47a815de012ac3f87a85d0/examples/run_lm_finetuning.py#L610-L616

and lr_scheduled from here:
https://github.com/huggingface/pytorch-pretrained-BERT/blob/f040a43cb3954e14dc47a815de012ac3f87a85d0/pytorch_pretrained_bert/optimization.py#L145-L152

Your code above should actually throw an exception, because optimizer.param_groups is a list. Try optimizer.param_groups[0]["lr"] or lr_this_step.
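
To make the indexing point concrete, here is a tiny self-contained sketch with a plain torch optimizer (any torch optimizer exposes param_groups as a list of parameter-group dicts):

```python
# param_groups is a list of dicts, one per parameter group, so index it first.
import torch

param = torch.nn.Parameter(torch.zeros(3))
optimizer = torch.optim.SGD([param], lr=3e-5)

# optimizer.param_groups["lr"]          # raises TypeError: list indices must be integers or slices, not str
print(optimizer.param_groups[0]["lr"])  # 3e-05
```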

@thomwolf (Member) commented Feb 5, 2019

Ok this should be fixed in master now!
