[Fix] Delete frozen parameters when using paramwise_cfg
#1441
How about moving the logic that deletes frozen parameters into mmengine/mmengine/optim/optimizer/default_constructor.py (lines 206 to 267 in ba5eed8)?
Good idea! Shall we delete L216? See mmengine/mmengine/optim/optimizer/default_constructor.py (lines 215 to 217 in ba5eed8).
Yes, we can delete it.
Hi, @zhouzaida
Please fix the unit tests.
f4d4d9c to 3661fbd
Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help it get feedback more easily. If you do not understand some items, don't worry: just open the pull request and seek help from the maintainers. By the way, if you're not familiar with how to use pre-commit to fix lint issues or how to add unit tests, please refer to Contributing to OpenMMLab.
Motivation
Initializing the DeepSpeed optimizer fails when paramwise_cfg is used to set a different lr or weight_decay for different parameters.
This is because, when paramwise_cfg is set, mmengine treats each parameter (including frozen parameters) as a separate group, which leads to an empty list of trainable_parameters in the code below.
https://github.com/microsoft/DeepSpeed/blob/2afa1c7f2f961ef18042a88467ff5d3373c22c07/deepspeed/runtime/zero/stage_1_and_2.py#L308-L313
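The failure mode can be illustrated with a minimal sketch in plain Python. Everything in it is hypothetical: `Param` is a stand-in for a framework parameter that carries only a `requires_grad` flag, and `build_param_groups`, `trainable_per_group` and `drop_frozen` are illustrative helpers, not mmengine or DeepSpeed APIs. It only shows why one group per parameter yields a group with no trainable parameters when a parameter is frozen, and how filtering frozen parameters avoids that.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Hypothetical stand-in for a framework parameter (only the flag we need)."""
    name: str
    requires_grad: bool = True


def build_param_groups(params, base_lr, lr_mults):
    """Create one optimizer group per parameter, frozen ones included."""
    return [
        {"params": [p], "lr": base_lr * lr_mults.get(p.name, 1.0)}
        for p in params
    ]


def trainable_per_group(groups):
    """For each group, list the parameters that still require gradients."""
    return [[p for p in g["params"] if p.requires_grad] for g in groups]


def drop_frozen(groups):
    """Remove frozen parameters, and any group left empty as a result."""
    kept = []
    for g in groups:
        trainable = [p for p in g["params"] if p.requires_grad]
        if trainable:
            kept.append({**g, "params": trainable})
    return kept


params = [
    Param("backbone.weight", requires_grad=False),  # frozen
    Param("head.weight"),
    Param("head.bias"),
]
groups = build_param_groups(params, base_lr=0.01, lr_mults={"head.bias": 2.0})

before = trainable_per_group(groups)              # first entry is an empty list
after = trainable_per_group(drop_frozen(groups))  # no empty entries remain
```

In this sketch the group built for the frozen `backbone.weight` has no trainable parameters, which mirrors the empty-list situation described above; after `drop_frozen`, every remaining group holds at least one trainable parameter and keeps its own lr.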
Modification
mmengine/_strategy/deepspeed.py
BC-breaking (Optional)
Does the modification introduce changes that break the backward-compatibility of the downstream repos?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.
Use cases (Optional)
If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.
Checklist