
[bugfix] fix seq_cls zero3#9190

Merged
Jintao-Huang merged 10 commits into modelscope:main from Jintao-Huang:fix_seq_cls_zero3
Apr 23, 2026

Conversation

@Jintao-Huang
Collaborator

No description provided.

Contributor

@gemini-code-assist (Bot) left a comment

Code Review

This pull request introduces DeepSpeed ZeRO-3 support for weight initialization in the sequence classification patching logic within swift/model/patcher.py. A review comment identifies a potential issue where the 'meta' device check is bypassed when DeepSpeed is enabled, which could lead to failures if the model is loaded on the meta device. The feedback suggests moving the device check outside the conditional block to ensure proper allocation and reduce code duplication.
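The suggested fix can be sketched as follows. This is an illustrative reconstruction, not the actual `patcher.py` code: the function name `init_score_head` and its parameters are hypothetical, and only the structure of the fix (hoisting the `'meta'` device check out of the DeepSpeed-only branch) follows the review summary.

```python
# Hypothetical sketch: move the 'meta' device check outside the
# DeepSpeed conditional so both code paths materialize real storage
# before initializing the classification head's weights.
import torch.nn as nn


def init_score_head(model, hidden_size: int, num_labels: int, use_deepspeed: bool):
    """Attach and initialize a sequence-classification head (illustrative)."""
    score = nn.Linear(hidden_size, num_labels, bias=False)
    # Device check first, in both paths: a model loaded on the 'meta'
    # device has no storage, so initialization would otherwise fail.
    if score.weight.device.type == 'meta':
        score = score.to_empty(device='cpu')
    if use_deepspeed:
        import deepspeed
        # Under ZeRO-3 parameters are sharded across ranks; gather the
        # full tensor before writing initial values into it.
        with deepspeed.zero.GatheredParameters(score.weight, modifier_rank=0):
            nn.init.normal_(score.weight, std=0.02)
    else:
        nn.init.normal_(score.weight, std=0.02)
    model.score = score
    return model
```

Hoisting the check also removes the duplicated allocation logic that the review comment objected to.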

Comment thread on swift/model/patcher.py (Outdated)
@Jintao-Huang
Collaborator Author

/gemini review

Contributor

@gemini-code-assist (Bot) left a comment

Code Review

This pull request introduces a monkey-patch for ModulesToSaveWrapper so that the ds_grads_remaining attribute is correctly propagated to the underlying modules when DeepSpeed ZeRO-3 is enabled. A review comment suggests hardening _patch_modules_to_save_zero3 in two ways: avoid shadowing the __setattr__ magic-method name with a local function, and verify that modules_to_save exists before iterating over it.

Comment thread on swift/pipelines/train/tuner.py (Outdated)
@Jintao-Huang
Collaborator Author

/gemini review

Contributor

@gemini-code-assist (Bot) left a comment

Code Review

This pull request updates dependency versions for transformers and peft across documentation and requirement files, clarifies the min_lr calculation logic, and adjusts the WandB save directory. Additionally, it implements a patch for ModulesToSaveWrapper to support DeepSpeed ZeRO-3. A review comment pointed out a markdown formatting error in the documentation that should be addressed.
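For context on the min_lr clarification, the usual convention is that min_lr acts as a floor the decaying learning rate converges to rather than a hard clamp applied afterward. A hedged sketch under that assumption (the PR only clarifies the logic; swift's exact scheduler implementation is not shown here):

```python
# Illustrative cosine decay with a min_lr floor; the schedule
# interpolates between lr and min_lr rather than clipping.
import math


def cosine_lr(step: int, total_steps: int, lr: float, min_lr: float) -> float:
    progress = min(step / max(total_steps, 1), 1.0)
    # cos goes 1 -> -1, so the factor goes 1 -> 0: lr at step 0,
    # exactly min_lr at the final step.
    return min_lr + 0.5 * (lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

Under this reading, min_lr participates in the decay formula itself, which is the distinction such a clarification typically needs to spell out.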

Comment thread on docs/source/Megatron-SWIFT/Command-line-parameters.md (Outdated)
Jintao-Huang merged commit a3127e4 into modelscope:main on Apr 23, 2026
1 of 3 checks passed
Jintao-Huang added a commit that referenced this pull request Apr 23, 2026
2 participants