[BC-breaking] Feature/#2465 lr scheduler attach events #2496
Conversation
Thanks a lot for the PR @yuta0821 !
I left a few comments and will check the modifications in the tests in detail.
ignite/handlers/param_scheduler.py
@@ -1015,7 +1019,7 @@ def print_lr():
     if init_lr != param_group_warmup_end_value:
         milestones_values.append((warmup_duration, init_lr))

-    lr_scheduler = LRScheduler(lr_scheduler, save_history=save_history)
+    lr_scheduler = LRScheduler(lr_scheduler, save_history=save_history, use_legacy=True)
Can we check if we could use use_legacy=False? Or what is blocking here?
If use_legacy=False, the results look like the examples below:
- example1
import torch
from torch.optim.lr_scheduler import StepLR

from ignite.engine import Engine, Events
from ignite.handlers import create_lr_scheduler_with_warmup

tensor = torch.zeros([1], requires_grad=True)
optimizer = torch.optim.SGD([tensor], lr=0.01)
torch_lr_scheduler = StepLR(optimizer, step_size=3, gamma=0.1)

def dummy_update(engine, batch):
    pass

trainer = Engine(dummy_update)
scheduler = create_lr_scheduler_with_warmup(torch_lr_scheduler,
                                            warmup_start_value=0.0,
                                            warmup_end_value=0.1,
                                            warmup_duration=3)
trainer.add_event_handler(Events.ITERATION_STARTED, scheduler)

@trainer.on(Events.ITERATION_COMPLETED)
def print_lr():
    print(optimizer.param_groups[0]["lr"])

_ = trainer.run([0] * 8, max_epochs=1)
- output1
0.0
0.05
0.1
0.01
0.01
0.01
0.01
0.001
- example2
import torch
from torch.optim.lr_scheduler import StepLR

from ignite.engine import Engine, Events
from ignite.handlers import create_lr_scheduler_with_warmup

tensor = torch.zeros([1], requires_grad=True)
optimizer = torch.optim.SGD([tensor], lr=0.1)
torch_lr_scheduler = StepLR(optimizer, step_size=3, gamma=0.1)

def dummy_update(engine, batch):
    pass

trainer = Engine(dummy_update)
scheduler = create_lr_scheduler_with_warmup(torch_lr_scheduler,
                                            warmup_start_value=0.0,
                                            warmup_end_value=0.1,
                                            warmup_duration=3)
trainer.add_event_handler(Events.ITERATION_STARTED, scheduler)

@trainer.on(Events.ITERATION_COMPLETED)
def print_lr():
    print(optimizer.param_groups[0]["lr"])

_ = trainer.run([0] * 8, max_epochs=1)
- output2
0.0
0.05
0.1
0.1
0.1
0.1
0.010000000000000002
0.010000000000000002
The initial value of StepLR (= 0.1 or 0.01) is held for one more iteration than step_size (= 3) allows.
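For comparison, a minimal stand-alone loop (my addition, not part of the thread) with a plain torch StepLR and the same step_size/gamma shows the value being held for exactly step_size iterations:

import torch
from torch.optim.lr_scheduler import StepLR

tensor = torch.zeros([1], requires_grad=True)
optimizer = torch.optim.SGD([tensor], lr=0.01)
scheduler = StepLR(optimizer, step_size=3, gamma=0.1)

for _ in range(8):
    # print the lr used for the current iteration, then advance the scheduler
    print(optimizer.param_groups[0]["lr"])
    optimizer.step()
    scheduler.step()
# prints 0.01 three times, 0.001 three times (up to floating point), then 0.0001 twice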
Thanks for the details @yuta0821. Can you please debug this a bit more and say explicitly which scheduler is responsible for adding the extra LR value (0.1 or 0.01)? I don't quite understand why exactly this happens. Thanks!
@vfdev-5 Thanks a lot for your comment!
Consider the case where warmup_start_value=0.0, warmup_end_value=0.1, warmup_duration=3. In this case, milestones_values = [(0, 0.0), (2, 0.1)].
If the initial value of lr in the optimizer is different from warmup_end_value, it is necessary to append the initial value to the end of the milestones, so milestones_values = [(0, 0.0), (2, 0.1), (3, initial value of lr)]. This is because the LRScheduler updates the lr starting from the last value of milestones_values.
After that, the following code is executed, resulting in the initial value of lr being repeated:
super(LRScheduler, self).__call__(engine, name)
self.lr_scheduler.last_epoch += 1 # type: ignore[attr-defined]
Even if the initial value of lr in the optimizer is equal to warmup_end_value, the initial value of lr is still applied once too often.
In the end, since the first __call__ of LRScheduler runs with reference to the last value of milestones_values, the last value of milestones_values plus the initial value of LRScheduler are duplicated.
If we fix this while keeping use_legacy=False, we may have to change a lot of code, such as the code related to PiecewiseLinear, which is beyond the scope of this PR.
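To make the warmup milestones above concrete, here is a small stand-alone check (my addition, not from the thread; it assumes PiecewiseLinear is importable from ignite.handlers and that passing engine=None is fine when save_history is off):

import torch
from ignite.handlers import PiecewiseLinear

tensor = torch.zeros([1], requires_grad=True)
optimizer = torch.optim.SGD([tensor], lr=0.01)

# warmup_start_value=0.0, warmup_end_value=0.1, warmup_duration=3
# -> milestones_values = [(0, 0.0), (2, 0.1)]
warmup = PiecewiseLinear(optimizer, param_name="lr", milestones_values=[(0, 0.0), (2, 0.1)])

for _ in range(3):
    warmup(engine=None)
    print(optimizer.param_groups[0]["lr"])
# prints 0.0, 0.05, 0.1 -- the first three values of output1/output2 above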
@yuta0821 sorry for the delay and thanks for the explanation! I haven't checked it in detail but will do so as soon as possible on my side (~4-5 days).
@sdesrozis can you help with that if you have some bandwidth?
It performs the same operation as use_legacy, but wouldn't it be preferable to add an argument skip_initial_value as a variable used only by the internal function?
Yeah, expected rather than excepted 😅
As far as I understand, with use_legacy=False the first lr comes from the optimizer. Whatever schedulers are used, concatenating schedulers will produce a repetition at each joint.
Having an internal option as you suggested sounds good to me. I mean, renaming use_legacy to skip_initial_value is fine, although we have to keep use_legacy for the users.
What do you think?
@sdesrozis Sorry for bothering you with this.
It seems that this problem can be solved by setting the internal variable keep_first_lr=True only when create_lr_scheduler_with_warmup is called, as shown below, to store the initial value in the LRScheduler.
class LRScheduler(ParamScheduler):
    def __init__(self, lr_scheduler: _LRScheduler, save_history: bool = False,
                 use_legacy: bool = False, keep_first_lr: bool = False):
        # ... existing __init__ code ...
        if keep_first_lr:
            # emulate the "get_lr called within step" context manager so get_lr() does not warn
            self.lr_scheduler._get_lr_called_within_step = True  # type: ignore[attr-defined]
            self.first_lr = self.lr_scheduler.get_lr()
            self.lr_scheduler._get_lr_called_within_step = False  # type: ignore[attr-defined]

    def get_param(self) -> Union[float, List[float]]:
        """Method to get current optimizer's parameter value"""
        # Emulate context manager for pytorch>=1.4
        if hasattr(self, "first_lr"):
            lr_list = self.first_lr
            del self.first_lr
        else:
            ...  # existing logic querying the wrapped scheduler (elided here)


def create_lr_scheduler_with_warmup( ):  # arguments elided
    if isinstance(lr_scheduler, _LRScheduler):
        init_lr = param_group["lr"]
        lr_scheduler = LRScheduler(lr_scheduler, save_history=save_history, keep_first_lr=True)
    else:
        init_lr = lr_scheduler.get_param()
I am running the existing tests now and will commit once they all pass. -> Done!
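As a side note, a tiny torch-only check (my addition; assuming torch >= 1.4, where get_lr() warns when called outside of step()) of what the _get_lr_called_within_step toggling in the sketch above does:

import torch
from torch.optim.lr_scheduler import StepLR

optimizer = torch.optim.SGD([torch.zeros([1], requires_grad=True)], lr=0.01)
torch_scheduler = StepLR(optimizer, step_size=3, gamma=0.1)

# toggle the flag so get_lr() does not warn outside of step(), then grab the first value
torch_scheduler._get_lr_called_within_step = True
first_lr = torch_scheduler.get_lr()
torch_scheduler._get_lr_called_within_step = False

print(first_lr)  # [0.01]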
It seems that the tests are ko...
Sorry, I added type: ignore[attr-defined]!
@yuta0821 Thanks very much for this work. I have to think about the API though. The …

@sdesrozis Yes, I agree, I don't want to add more args without any thought. However, I haven't yet figured out how to control the …

My thoughts are that the target would be having the following code working fine:

from torch.optim.lr_scheduler import StepLR

scheduler1 = StepLR(optimizer, step_size=3, gamma=0.1)
scheduler1 = LRScheduler(scheduler1)

scheduler2 = StepLR(optimizer, step_size=3, gamma=0.01)
scheduler2 = LRScheduler(scheduler2)

scheduler = ConcatScheduler(schedulers=[scheduler1, scheduler2], durations=[4, ])

Whether something like

scheduler2 = LRScheduler(scheduler2)
scheduler2.keep_first_lr = True

is needed, I don't know yet... Extra question: what happens if …
@sdesrozis I'd like to confirm the behavior of the following code using use_legacy=True:

import torch
from torch.optim.lr_scheduler import StepLR

from ignite.engine import Engine, Events
from ignite.handlers import ConcatScheduler, LRScheduler

tensor = torch.zeros([1], requires_grad=True)
optimizer = torch.optim.SGD([tensor], lr=0.01)

torch_scheduler1 = StepLR(optimizer, step_size=3, gamma=0.1)
scheduler1 = LRScheduler(torch_scheduler1, use_legacy=True)
torch_scheduler2 = StepLR(optimizer, step_size=3, gamma=0.01)
scheduler2 = LRScheduler(torch_scheduler2, use_legacy=True)
scheduler = ConcatScheduler(schedulers=[scheduler1, scheduler2], durations=[4, ])

def dummy_update(engine, batch):
    pass

trainer = Engine(dummy_update)

@trainer.on(Events.ITERATION_COMPLETED)
def print_lr():
    print(optimizer.param_groups[0]["lr"])

trainer.add_event_handler(Events.ITERATION_COMPLETED, scheduler)
_ = trainer.run([0] * 9, max_epochs=1)
OK, I understand what the message means. Thank you for the clear explanation. Surely, we must enable …
@yuta0821 the output looks correct to me. We have …
What happens if using use_legacy=False?
@vfdev-5
If use_legacy=False is set:

import torch
from torch.optim.lr_scheduler import StepLR

from ignite.engine import Engine, Events
from ignite.handlers import ConcatScheduler, LRScheduler

tensor = torch.zeros([1], requires_grad=True)
optimizer = torch.optim.SGD([tensor], lr=0.01)

torch_scheduler1 = StepLR(optimizer, step_size=3, gamma=0.1)
scheduler1 = LRScheduler(torch_scheduler1, use_legacy=False)
torch_scheduler2 = StepLR(optimizer, step_size=3, gamma=0.01)
scheduler2 = LRScheduler(torch_scheduler2, use_legacy=False)
scheduler = ConcatScheduler(schedulers=[scheduler1, scheduler2], durations=[4, ])

def dummy_update(engine, batch):
    pass

trainer = Engine(dummy_update)
trainer.add_event_handler(Events.ITERATION_STARTED, scheduler)

@trainer.on(Events.ITERATION_COMPLETED)
def print_lr():
    print(optimizer.param_groups[0]["lr"])

_ = trainer.run([0] * 9, max_epochs=1)
@yuta0821 sorry for the delay on this PR, I'm still missing a complete understanding of the problem here. I'll try to figure out what happens exactly on my side and suggest a way to handle it. Adding an argument for internal usage is not a good design.
No, sir, not at all. Thanks a lot for your help. If I can contribute in any way, I'll be happy to do so.
@yuta0821 I understand the problem and why you need to introduce … I see here two ways to move forward: 1) … or 2) the following:

lr_scheduler = LRScheduler(lr_scheduler, save_history=save_history)
lr_scheduler.lr_scheduler.last_epoch += 1
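For what it's worth, a torch-only sketch (my addition, not from the PR) of what advancing last_epoch by one does to a freshly created StepLR:

import torch
from torch.optim.lr_scheduler import StepLR

tensor = torch.zeros([1], requires_grad=True)
optimizer = torch.optim.SGD([tensor], lr=0.01)
scheduler = StepLR(optimizer, step_size=3, gamma=0.1)

# pretend one lr value has already been consumed (e.g. by the warmup phase)
scheduler.last_epoch += 1

for _ in range(5):
    optimizer.step()
    scheduler.step()
    print(optimizer.param_groups[0]["lr"])
# prints 0.01 once, then 0.001 (up to floating point) three times, then 0.0001;
# without the last_epoch bump it would print 0.01 twice before decaying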
Thank you for your analysis. Your summary is exactly what I was thinking.
@yuta0821 I'm trying to figure out how to merge your work into master step by step.
@vfdev-5 Thank you for your reply!
@yuta0821 I pushed a few commits to get this PR merged. Basically, it is one of your suggestions. Currently, I do not have a better way to fix it. Let's keep it as it is.
Thank you for your review.
I understand. If I come up with a better way to fix it, I will create a new issue.
Fixes #2465
Description:
Based on the discussion in #2465, I updated the code so that the LRScheduler will attach to Events.STARTED.
Main Contribution
- Added use_legacy to class LRScheduler and an explanation of it
- Changed the __call__ method and simulate_values, such as changing the timing of the scheduler's __call__ method
- Checked that use_legacy works

Matters for consultation
Currently the following part is implemented using use_legacy = True, but I feel it is not a very smart implementation. However, in the current implementation we need to set the initial learning rate at the end of each milestone, and the change for issue #2465 will probably result in one extra initial learning rate. I couldn't think of an easy way to fix this. I think that if we don't take use_legacy=True, we probably need to change the code in many places.

I ran the following tests and they pass.
Check list: