[BC-breaking] Feature/#2465 lr scheduler attach events #2496
Can we check if we could use `use_legacy=False`? Or what is blocking here?
If `use_legacy=False`, the result looks like the example below. The initial value of StepLR (= 0.1 or 0.01) is counted one extra time than `step_size` (= 3) would imply.
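The duplication can be reproduced with a small pure-Python simulation (no torch or ignite involved; `warmup_values` and `step_lr_values` are hypothetical helpers that only mimic a linear warmup followed by a StepLR-style decay under the parameters above):

```python
# Simulation of the behaviour described above, assuming a linear warmup
# from 0.0 to 0.1 over 3 events followed by a StepLR-like decay
# (gamma=0.1, step_size=3). Illustrative only, not real library output.

def warmup_values(start, end, duration):
    # Linear interpolation over `duration` events, inclusive of the end value.
    return [start + (end - start) * i / (duration - 1) for i in range(duration)]

def step_lr_values(initial_lr, gamma, step_size, num_events):
    # StepLR-like schedule: lr = initial_lr * gamma ** (event // step_size).
    return [initial_lr * gamma ** (i // step_size) for i in range(num_events)]

warmup = warmup_values(0.0, 0.1, 3)     # ends at 0.1
decay = step_lr_values(0.1, 0.1, 3, 6)  # starts from 0.1 again

# Naive concatenation repeats the joint value: the warmup already ends
# at 0.1 and the decay schedule emits it step_size more times, so 0.1
# appears one extra time than step_size alone would suggest.
combined = warmup + decay
print(combined)
```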
Thanks for the details @yuta0821. Can you please debug this a bit more and say explicitly which scheduler is responsible for adding the LR value (0.1 or 0.01)? I'm not quite sure I understand why exactly this happens. Thanks!
@vfdev-5 Thanks a lot for your comment!

Consider the case where `warmup_start_value=0.0, warmup_end_value=0.1, warmup_duration=3`. In this case, `milestones_values = [(0, 0.0), (2, 0.1)]`. If the initial value of `lr` in the `optimizer` differs from `warmup_end_value`, it is necessary to append the initial value to the end of the milestones, so `milestones_values = [(0, 0.0), (2, 0.1), (3, initial value of lr)]`. This is because `LRScheduler` updates the `lr` starting from the last value of `milestones_values`. After that, the following code is executed, resulting in the initial value of `lr` being repeated. Even if the initial value of `lr` in the `optimizer` is equal to `warmup_end_value`, the initial value of `lr` is still applied one extra time.

In the end, since the first `__call__` of `LRScheduler` runs with reference to the last value of `milestones_values`, the last value of `milestones_values` and the initial value of `LRScheduler` are duplicated. If we fix this bug without `use_legacy=False`, we may have to change a lot of code, such as the code related to `PiecewiseLinear`, which is beyond the scope of this PR.
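The milestone construction described above can be sketched as follows (`build_warmup_milestones` is a hypothetical helper written for illustration, not the real ignite internals):

```python
# Simplified sketch of how the warmup milestones are built, and why the
# last milestone duplicates the value the wrapped scheduler starts from.

def build_warmup_milestones(warmup_start_value, warmup_end_value,
                            warmup_duration, optimizer_lr):
    milestones_values = [(0, warmup_start_value),
                         (warmup_duration - 1, warmup_end_value)]
    # If the optimizer's initial lr differs from warmup_end_value, the
    # initial lr is appended as one extra milestone; the wrapped
    # LRScheduler later starts from that same value, repeating it.
    if optimizer_lr != warmup_end_value:
        milestones_values.append((warmup_duration, optimizer_lr))
    return milestones_values

print(build_warmup_milestones(0.0, 0.1, 3, 0.01))
# [(0, 0.0), (2, 0.1), (3, 0.01)]
```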
@yuta0821 sorry for the delay and thanks for the explanation! I haven't checked it in detail but will do so as soon as possible from my side (~4-5 days).
@sdesrozis can you help with this if you have some bandwidth?
It performs the same operation as `use_legacy`, but wouldn't it be preferable to add the argument `skip_initial_value` as a variable to be used by the internal function?
Yeah, "expected" rather than "excepted" 😅

As far as I understand, using `use_legacy=False`, the first `lr` comes from the optimizer. Whatever schedulers are used, the scheduler concatenation will produce a repetition at each joint.

Having an internal option as you suggested sounds good to me. I mean, renaming `use_legacy` to `skip_initial_value` is fine, although we have to keep `use_legacy` for the users. What do you think?
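The repetition at each joint, and what an internal skip-first-value option could do about it, can be sketched like this (the flag name `skip_initial_value` follows the suggestion in this thread; it is not an existing library API):

```python
# Toy model of scheduler concatenation: each schedule is just a list of
# lr values. Joining them naively repeats the value at every joint,
# because each schedule starts from the value the previous one ended on.

def concat_schedules(schedules, skip_initial_value=False):
    values = []
    for i, sched in enumerate(schedules):
        # After the first schedule, optionally drop the leading value,
        # which is the one duplicated at the joint.
        if skip_initial_value and i > 0:
            sched = sched[1:]
        values.extend(sched)
    return values

warmup = [0.0, 0.05, 0.1]
decay = [0.1, 0.01, 0.001]

print(concat_schedules([warmup, decay]))
# [0.0, 0.05, 0.1, 0.1, 0.01, 0.001]  <- 0.1 repeated at the joint
print(concat_schedules([warmup, decay], skip_initial_value=True))
# [0.0, 0.05, 0.1, 0.01, 0.001]
```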
@sdesrozis Sorry for having to consult you on this. It seems that this problem can be solved by setting the internal variable `keep_first_value=True` only when `create_lr_scheduler_with_warmup` is called, as shown below, to store the initial value in `LRScheduler`.

I am running the existing tests now. I will commit once all tests pass! -> Done!
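A toy illustration of the idea (`ToyLRScheduler` is hypothetical, not the real `ignite.handlers.LRScheduler`; it only sketches how storing the first value separately could avoid emitting the duplicate):

```python
# Toy wrapper: on construction it can record the schedule's first value
# under a keep_first_value flag, so the warmup builder can read it back
# without the schedule itself emitting that value a second time.

class ToyLRScheduler:
    def __init__(self, schedule, keep_first_value=False):
        self._schedule = list(schedule)
        self.first_value = None
        if keep_first_value:
            # Record the initial value for the warmup builder; the
            # remaining schedule then starts one step later.
            self.first_value = self._schedule.pop(0)

    def get_values(self):
        return list(self._schedule)

sched = ToyLRScheduler([0.1, 0.1, 0.1, 0.01], keep_first_value=True)
print(sched.first_value)   # 0.1
print(sched.get_values())  # [0.1, 0.1, 0.01]
```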
It seems that the tests are failing...
Sorry, I added `# type: ignore[attr-defined]`!