
Why not save frozen params unless: self.zero_optimization_stage() >= ZeroStageEnum.gradients? #5439

Open · freckletonj opened this issue Apr 19, 2024 · 3 comments

@freckletonj

I've spent 2 days drilling into why my frozen params aren't getting saved, and it comes down to this line:

https://github.com/microsoft/DeepSpeed/blob/c632ea09f8d107d10f76aa2b776e4df3c1ccf98a/deepspeed/runtime/engine.py#L3297C1-L3297C107

        save_frozen_param = self.zero_optimization_partition_gradients() and not exclude_frozen_parameters

exclude_frozen_parameters is therefore misleading, since that is not the only determinant of whether frozen params get saved.
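
For context, my reading of the surrounding code (combining that line with the stage check in the title) is roughly the sketch below; the body of zero_optimization_partition_gradients here is my paraphrase, not a verbatim copy:

    # My paraphrase of the gate in engine.py, as I understand it: frozen
    # params are only written out when gradient partitioning (ZeRO stage >= 2)
    # is active AND the caller did not opt out.
    def zero_optimization_partition_gradients(self):
        return self.zero_optimization_stage() >= ZeroStageEnum.gradients  # gradients == 2

    save_frozen_param = self.zero_optimization_partition_gradients() and not exclude_frozen_parameters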

To make matters more confusing, I am using deepspeed 2, but if I set a breakpoint inside that zero_optimization_partition_gradients function, I see:

(Pdb) self.zero_optimization_stage()
1
(Pdb) ZeroStageEnum.gradients
<ZeroStageEnum.gradients: 2>

Why is this, and is there a straightforward non-hacky solution to get frozen params to save?

@tjruwase
Contributor

@freckletonj, thanks for reporting this issue. I agree it is quite confusing, sorry about that. Unfortunately, I can't remember the rationale for including self.zero_optimization_partition_gradients() in the conditional logic.

Can you please clarify what you mean by "deepspeed 2"? Do you mean you are using zero stage 2? Can you please share your ds_config? Your breakpoint printout suggests that you are running zero stage 1.

@freckletonj
Author

@tjruwase thanks for the fast response!

Yes, I'm using ZeRO stage 2 via PyTorch Lightning, with a config.yaml:

trainer:
  accelerator: gpu
  devices: auto
  num_nodes: 1
  strategy: deepspeed_stage_2
...
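
(For reference, I believe the programmatic equivalent of that strategy string is roughly the sketch below; this assumes Lightning's DeepSpeedStrategy API and is not my actual training script.)

    # Rough programmatic equivalent of strategy: deepspeed_stage_2 (a sketch,
    # assuming Lightning's DeepSpeedStrategy; not my actual training script).
    import lightning.pytorch as pl
    from lightning.pytorch.strategies import DeepSpeedStrategy

    trainer = pl.Trainer(
        accelerator="gpu",
        devices="auto",
        num_nodes=1,
        strategy=DeepSpeedStrategy(stage=2),  # expected to run ZeRO stage 2
    )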

I was surprised to see the breakpoint print that I'm in stage 1, but I think that's a separate issue from the confusing conditional logic.

And there's a chance I'm just going about this all wrong; I'm new to both Lightning and DeepSpeed, so forgive me if I'm overlooking something important :)

To clarify, my only concern is how to save frozen params along with the model.
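
What I was hoping would be enough is roughly this (a sketch; I'm assuming the engine's save_checkpoint accepts the same exclude_frozen_parameters flag that appears in the conditional above):

    # Sketch of what I expected to work; `engine` stands for the DeepSpeed
    # engine that Lightning wraps, and exclude_frozen_parameters is the same
    # flag referenced in the conditional above.
    engine.save_checkpoint(
        "checkpoints/",
        exclude_frozen_parameters=False,  # explicitly ask to keep frozen params
    )
    # ...but with zero_optimization_partition_gradients() returning False
    # (ZeRO stage < 2), the frozen params are still dropped.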


Some more background: I'm working on a fork of the RWKV project, where the weights are saved with a copy of zero_to_fp32.py.

I've added a hack to this file to keep the params, which do live under the state dict's 'module' key but not under FROZEN_PARAM_SHAPES, where they'd get picked up automatically: RWKV/RWKV-infctx-trainer@51f9173#diff-d1b1e811618e950083898fd2b934639a17307a0d339ee61aa96f3d7539463e26R142
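
The gist of that hack is roughly the sketch below; the mp_rank_00_model_states.pt filename and the 'module' key are my observations of the checkpoint layout, not something I pulled from the DeepSpeed docs:

    # Simplified sketch of the workaround in my patched zero_to_fp32.py.
    # Assumption: the full set of tensors (frozen + trainable) lives under the
    # 'module' key of mp_rank_00_model_states.pt, as observed in my checkpoints.
    import os
    import torch

    def merge_frozen_params(checkpoint_dir, fp32_state_dict):
        model_states = torch.load(
            os.path.join(checkpoint_dir, "mp_rank_00_model_states.pt"),
            map_location="cpu",
        )
        for name, tensor in model_states["module"].items():
            # Copy back anything the ZeRO conversion dropped (i.e. params that
            # were not recorded under FROZEN_PARAM_SHAPES).
            if name not in fp32_state_dict:
                fp32_state_dict[name] = tensor.clone()
        return fp32_state_dict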

I've also tried lightning's version of this function, but it also drops the frozen params: https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.utilities.deepspeed.html#lightning.pytorch.utilities.deepspeed.convert_zero_checkpoint_to_fp32_state_dict

tjruwase assigned jomayeri and samadejacobs and unassigned jomayeri on May 13, 2024
@tjruwase
Contributor

Some more background: I'm working on a fork of the RWKV project, where the weights are saved with a copy of zero_to_fp32.py.

@freckletonj, apologies for the delayed response here. Is this the RWKV project? https://github.com/BlinkDL/RWKV-LM.

Can you please share your current status? Can you provide repro steps for us? Thanks!
