@freckletonj, thanks for reporting this issue. I agree it is quite confusing, sorry about that. Unfortunately, I can't remember the rationale for including `self.zero_optimization_partition_gradients()` in the conditional logic.
Can you please clarify what you mean by "deepspeed 2"? Do you mean you are using zero stage 2? Can you please share your ds_config? Your breakpoint printout suggests that you are running zero stage 1.
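For reference, a minimal ds_config that actually enables stage 2 looks roughly like this (placeholder values, not your actual config; it can be passed as a dict via `deepspeed.initialize(config=...)`):

```python
# Illustrative ZeRO stage 2 config (placeholder values only).
# Stage 1 partitions only optimizer states; stage 2 additionally partitions
# gradients, which is what zero_optimization_partition_gradients() checks for.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 2,
    },
}
```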
I was surprised to see the breakpoint print that I'm in stage 1, but I think that's a separate issue from the confusing conditional logic.
And there's a chance I'm just going about this all wrong. I'm new to both Lightning and DeepSpeed, so forgive me if I'm overlooking something important :)
To clarify, my only concern is how to save frozen params along with the model.
Some more background: I'm working on a fork of the RWKV project, which saves the weights with a copy of zero_to_fp32.py.
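In case it helps, this is roughly how I've been inspecting what actually gets saved; it's the programmatic counterpart of running the zero_to_fp32.py script inside the checkpoint directory (the path below is a placeholder):

```python
# Programmatic counterpart of running zero_to_fp32.py in a checkpoint
# directory; the path is a placeholder.
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

state_dict = get_fp32_state_dict_from_zero_checkpoint("path/to/checkpoint_dir")

# The frozen parameters' keys are simply absent from this dict.
print(sorted(state_dict.keys()))
```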
I've spent 2 days drilling into why my frozen params aren't getting saved, and it comes down to this line:
https://github.com/microsoft/DeepSpeed/blob/c632ea09f8d107d10f76aa2b776e4df3c1ccf98a/deepspeed/runtime/engine.py#L3297C1-L3297C107
The name `exclude_frozen_parameters` is therefore misleading, since that flag is not the only determinant of whether frozen params get saved. To make matters more confusing, I am using deepspeed 2, but if I set a breakpoint in that `zero_optimization_partition_gradients` function, I see output indicating I'm running stage 1. Why is this, and is there a straightforward, non-hacky way to get frozen params to save?
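To make this concrete, here's a minimal sketch of my situation (a toy model standing in for my real setup, illustrative config values, and it assumes launching under the deepspeed launcher):

```python
import torch
import deepspeed

# Toy stand-in for the real model; "frozen" just means requires_grad=False.
model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.Linear(8, 8))
for p in model[0].parameters():
    p.requires_grad = False  # these are the params that vanish from the checkpoint

# Illustrative stage 2 config, not my actual ds_config.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    "zero_optimization": {"stage": 2},
}

engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=[p for p in model.parameters() if p.requires_grad],
    config=ds_config,
)

# This is where my breakpoint suggests stage 1 despite the stage 2 config:
print(engine.zero_optimization_stage())
print(engine.zero_optimization_partition_gradients())

# Even explicitly asking to keep frozen params doesn't save them,
# because of the conditional at the engine.py line linked above.
engine.save_checkpoint("ckpt", exclude_frozen_parameters=False)
```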