[bugfix] fix seq_cls zero3 #9190
Conversation
Code Review
This pull request introduces DeepSpeed ZeRO-3 support for weight initialization in the sequence-classification patching logic in swift/model/patcher.py. A review comment notes that the 'meta' device check is bypassed when DeepSpeed is enabled, which could cause failures if the model is loaded on the meta device, and suggests moving the device check outside the conditional block to ensure the weights are properly allocated and to avoid code duplication; a sketch of that suggestion follows below.
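For illustration only, here is a minimal sketch of the reviewer's suggestion. The helper name `_init_seq_cls_score_head` and the `score` attribute are assumptions, not the actual code in swift/model/patcher.py: the point is that the meta-device check runs unconditionally, and only the weight initialization is wrapped in ZeRO-3 parameter gathering.

```python
# Sketch (assumed names): hoist the meta-device check out of the DeepSpeed branch.
import torch
from transformers.integrations import is_deepspeed_zero3_enabled


def _init_seq_cls_score_head(model):
    """Hypothetical helper: (re)initialize the sequence-classification head."""
    score = model.score  # assumed attribute name for the classification head

    # Device check first, regardless of DeepSpeed, so meta tensors are
    # materialized before any in-place initialization.
    if score.weight.is_meta:
        score.to_empty(device='cpu')

    if is_deepspeed_zero3_enabled():
        import deepspeed
        # Under ZeRO-3 the parameter is partitioned across ranks; gather it
        # (writable on rank 0) before assigning initial values.
        with deepspeed.zero.GatheredParameters(score.weight, modifier_rank=0):
            torch.nn.init.normal_(score.weight, std=0.02)
    else:
        torch.nn.init.normal_(score.weight, std=0.02)
```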
/gemini review
Code Review
This pull request introduces a monkey-patch for ModulesToSaveWrapper so that the ds_grads_remaining attribute is correctly propagated to the wrapped modules when DeepSpeed ZeRO-3 is enabled. The review feedback suggests making _patch_modules_to_save_zero3 more robust by not shadowing the __setattr__ magic-method name and by checking that modules_to_save exists before iterating over it; a sketch incorporating both points follows below.
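As an illustration only, this is a minimal sketch (assumed function body, not the exact patch) of how _patch_modules_to_save_zero3 could propagate `ds_grads_remaining` while following the review feedback: the patched function gets a distinct name instead of shadowing `__setattr__`, and a guard verifies that `modules_to_save` exists before iteration.

```python
# Sketch (assumed implementation) of the ModulesToSaveWrapper monkey-patch.
from peft.utils.other import ModulesToSaveWrapper


def _patch_modules_to_save_zero3() -> None:
    origin_setattr = ModulesToSaveWrapper.__setattr__

    def _zero3_setattr(self, name, value):  # distinct name, no shadowing
        origin_setattr(self, name, value)
        # DeepSpeed ZeRO-3 tracks backward progress via `ds_grads_remaining`
        # on each module; mirror it onto the wrapped modules so their hooks
        # see a consistent counter.
        if name == 'ds_grads_remaining' and hasattr(self, 'modules_to_save'):
            for module in self.modules_to_save.values():
                module.ds_grads_remaining = value

    ModulesToSaveWrapper.__setattr__ = _zero3_setattr
```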
/gemini review
Code Review
This pull request updates the transformers and peft dependency versions across the documentation and requirements files, clarifies the min_lr calculation logic, adjusts the WandB save directory, and patches ModulesToSaveWrapper to support DeepSpeed ZeRO-3. A review comment points out a Markdown formatting error in the documentation that should be fixed.
No description provided.