
[Fix] Delete frozen parameters when using paramwise_cfg #1441

Merged
6 commits merged into open-mmlab:main on Apr 22, 2024

Conversation

@LZHgrla (Contributor) commented Nov 27, 2023

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and easier to get feedback on. If you do not understand some items, don't worry, just open the pull request and ask the maintainers for help. By the way, if you're not familiar with how to use pre-commit to fix lint issues or add unit tests, please refer to Contributing to OpenMMLab.

Motivation

Initializing the DeepSpeed optimizer fails when both of the following hold:

  1. Some parameters of the model are frozen.
  2. paramwise_cfg is set for the optimizer to assign different lr or weight_decay values to different parameters.

This is because, when paramwise_cfg is set, mmengine treats each parameter (including frozen parameters) as a separate group, which leads to an empty trainable_parameters list in the code below.

https://github.com/microsoft/DeepSpeed/blob/2afa1c7f2f961ef18042a88467ff5d3373c22c07/deepspeed/runtime/zero/stage_1_and_2.py#L308-L313
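
For context, here is a minimal, hypothetical sketch (plain PyTorch, not the actual mmengine or DeepSpeed code) of how one-group-per-parameter behavior combined with frozen parameters produces empty trainable-parameter lists:

import torch.nn as nn

# Hypothetical two-layer model; freeze the first layer.
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
for p in model[0].parameters():
    p.requires_grad = False

# Roughly what the constructor builds when paramwise_cfg is set:
# one param group per parameter, frozen parameters included.
param_groups = [{'params': [p]} for p in model.parameters()]

# DeepSpeed's ZeRO stage 1/2 keeps only trainable parameters per group,
# so groups holding frozen parameters collapse to empty lists.
trainable_per_group = [
    [p for p in g['params'] if p.requires_grad] for g in param_groups
]
print([len(ps) for ps in trainable_per_group])  # [0, 0, 1, 1]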

Modification

mmengine/_strategy/deepspeed.py
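
The change is roughly of the following shape; this is an illustrative sketch with a hypothetical helper name, not the actual diff in mmengine/_strategy/deepspeed.py:

def drop_frozen_param_groups(param_groups):
    # Drop frozen parameters so the DeepSpeed optimizer never receives
    # param groups without trainable parameters (illustrative only).
    filtered = []
    for group in param_groups:
        trainable = [p for p in group['params'] if p.requires_grad]
        if trainable:
            filtered.append({**group, 'params': trainable})
    return filtered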

BC-breaking (Optional)

Does the modification introduce changes that break backward compatibility for downstream repos?
If so, please describe how it breaks compatibility and how downstream projects should modify their code to remain compatible with this PR.

Use cases (Optional)

If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.

Checklist

  1. Pre-commit or other linting tools are used to fix potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  3. If the modification has potential influence on downstream projects, this PR should be tested with downstream projects, like MMDetection or MMPretrain.
  4. The documentation has been modified accordingly, like docstring or example tutorials.

@LZHgrla changed the title from "[Fix] Delete the freezing parameters from the DeepSpeed optimizer" to "[Fix] Delete the frozen parameters from the DeepSpeed optimizer" on Nov 27, 2023
@LZHgrla changed the title from "[Fix] Delete the frozen parameters from the DeepSpeed optimizer" to "[Fix] Delete frozen parameters from the DeepSpeed optimizer" on Nov 27, 2023
@LZHgrla mentioned this pull request on Nov 27, 2023
@LZHgrla marked this pull request as draft on November 27, 2023 16:20
@LZHgrla marked this pull request as ready for review on February 4, 2024 05:29
@zhouzaida (Member)

How about moving the logic for deleting frozen parameters to DefaultOptimWrapperConstructor?

for name, param in module.named_parameters(recurse=False):
    param_group = {'params': [param]}
    if bypass_duplicate and self._is_in(param_group, params):
        print_log(
            f'{prefix} is duplicate. It is skipped since '
            f'bypass_duplicate={bypass_duplicate}',
            logger='current',
            level=logging.WARNING)
        continue
    if not param.requires_grad:
        params.append(param_group)
        continue

    # if the parameter match one of the custom keys, ignore other rules
    is_custom = False
    for key in sorted_keys:
        if key in f'{prefix}.{name}':
            is_custom = True
            lr_mult = custom_keys[key].get('lr_mult', 1.)
            param_group['lr'] = self.base_lr * lr_mult
            if self.base_wd is not None:
                decay_mult = custom_keys[key].get('decay_mult', 1.)
                param_group['weight_decay'] = self.base_wd * decay_mult
            # add custom settings to param_group
            for k, v in custom_keys[key].items():
                param_group[k] = v
            break

    if not is_custom:
        # bias_lr_mult affects all bias parameters
        # except for norm.bias dcn.conv_offset.bias
        if name == 'bias' and not (
                is_norm or is_dcn_module) and bias_lr_mult is not None:
            param_group['lr'] = self.base_lr * bias_lr_mult
        if (prefix.find('conv_offset') != -1 and is_dcn_module
                and dcn_offset_lr_mult is not None
                and isinstance(module, torch.nn.Conv2d)):
            # deal with both dcn_offset's bias & weight
            param_group['lr'] = self.base_lr * dcn_offset_lr_mult

        # apply weight decay policies
        if self.base_wd is not None:
            # norm decay
            if is_norm and norm_decay_mult is not None:
                param_group[
                    'weight_decay'] = self.base_wd * norm_decay_mult
            # bias lr and decay
            elif (name == 'bias' and not is_dcn_module
                  and bias_decay_mult is not None):
                param_group[
                    'weight_decay'] = self.base_wd * bias_decay_mult
            # depth-wise conv
            elif is_dwconv and dwconv_decay_mult is not None:
                param_group[
                    'weight_decay'] = self.base_wd * dwconv_decay_mult
            # flatten parameters except dcn offset
            elif (param.ndim == 1 and not is_dcn_module
                  and flat_decay_mult is not None):
                param_group[
                    'weight_decay'] = self.base_wd * flat_decay_mult
    params.append(param_group)
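
For readers less familiar with this constructor: the loop above is driven by the paramwise_cfg of the optim_wrapper config. An illustrative configuration (the 'backbone' key and all values are made up for this example) looks like this:

optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=1e-4, weight_decay=0.05),
    paramwise_cfg=dict(
        custom_keys={'backbone': dict(lr_mult=0.1)},  # lower lr for the backbone
        norm_decay_mult=0.0,   # no weight decay on norm parameters
        bias_decay_mult=0.0))  # no weight decay on biases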

@LZHgrla (Contributor, Author) commented Feb 18, 2024

How about moving the logic for deleting frozen parameters to DefaultOptimWrapperConstructor?


Good idea!

Shall we delete L216, i.e. the block below?

if not param.requires_grad:
    params.append(param_group)
    continue

@zhouzaida (Member)

Yes, we can delete it.
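
In other words, the agreed direction is to skip frozen parameters inside DefaultOptimWrapperConstructor.add_params instead of giving each of them its own param group. A sketch of the idea (not the exact merged diff):

if not param.requires_grad:
    # Skip frozen parameters so that optimizers such as DeepSpeed ZeRO
    # never receive param groups without trainable parameters.
    continue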


@LZHgrla changed the title from "[Fix] Delete frozen parameters from the DeepSpeed optimizer" to "[Fix] Delete frozen parameters when using paramwise_cfg" on Feb 19, 2024
@LZHgrla (Contributor, Author) commented Feb 19, 2024

Hi, @zhouzaida
I have fixed it!
Ready for review and merge.

@zhouzaida (Member)


Please fix the unit tests.

zhouzaida merged commit acbc5e4 into open-mmlab:main on Apr 22, 2024
16 of 20 checks passed