Support zero3 hierarchical gather in the ref sync callback#9170

Merged
hjh0119 merged 4 commits into modelscope:main from hjh0119:sync-ref on Apr 25, 2026

Conversation

hjh0119 (Collaborator) commented Apr 21, 2026

Fixes #8095

hjh0119 changed the title from "Support hierarchical gather in the ref sync callback" to "Support zero3 hierarchical gather in the ref sync callback" on Apr 21, 2026

gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request implements reference model weight synchronization for RLHF trainers. It introduces a SyncRefModelCallback and a _sync_ref_model_weights method within the RolloutTrainerMixin to handle weight mixing, including support for DeepSpeed ZeRO-3. Feedback indicates that the initialization of parameter groups happens too early, which may lead to incorrect configurations when LoRA is enabled. Additionally, the synchronization method contains performance inefficiencies due to redundant dictionary creations and iterations inside loops.
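
The weight mixing and the loop inefficiency described in this review can be illustrated with a minimal sketch. It is not the code from `rollout_mixin.py`: plain Python lists stand in for tensors, `mix_ref_weights` and the parameter names are hypothetical, and the update rule `ref = alpha * policy + (1 - alpha) * ref` is an assumed convention. The name lookups are built once, before the loop over parameter groups, rather than on every iteration.

```python
from typing import List, Tuple

# Hypothetical stand-in for named_parameters(): (name, values) pairs.
NamedParams = List[Tuple[str, List[float]]]


def mix_ref_weights(policy_named: NamedParams,
                    ref_named: NamedParams,
                    param_groups: List[List[str]],
                    alpha: float) -> None:
    """Update ref in place, one parameter group at a time.

    Assumed rule: ref = alpha * policy + (1 - alpha) * ref.
    """
    # Build the name lookups once, outside the group loop.
    policy_by_name = dict(policy_named)
    ref_by_name = dict(ref_named)

    for group in param_groups:
        # Only the parameters of the current group are touched here,
        # which is what keeps a per-group gather small.
        for name in group:
            src = policy_by_name[name]
            dst = ref_by_name[name]
            for i, value in enumerate(src):
                dst[i] = alpha * value + (1.0 - alpha) * dst[i]


policy = [("layer0.weight", [1.0, 2.0]), ("layer1.weight", [3.0])]
ref = [("layer0.weight", [0.0, 0.0]), ("layer1.weight", [1.0])]
mix_ref_weights(policy, ref, [["layer0.weight"], ["layer1.weight"]], alpha=0.5)
```

Processing one group per iteration means that, under a sharded setup, only that group's full weights would need to be materialized at a time.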

Comment thread swift/rlhf_trainers/rollout_mixin.py Outdated
Comment thread swift/rlhf_trainers/rollout_mixin.py Outdated
hjh0119 (Collaborator, Author) commented Apr 21, 2026

/gemini review

gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request introduces a custom SyncRefModelCallback and a _sync_ref_model_weights method within RolloutTrainerMixin to support reference model weight synchronization during training. It also refactors the initialization of parameter groups to ensure they are available for this process. The review feedback suggests improving the robustness of the synchronization logic by ensuring empty parameter groups result in a no-op rather than defaulting to all parameters, and by enhancing error diagnostics to identify specific missing parameters when using DeepSpeed.
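
The two robustness suggestions in this review can be sketched as follows. This is an illustration under assumptions, not the merged implementation: `resolve_sync_names` is a hypothetical helper, and parameter groups are modeled as lists of names. Empty groups yield nothing to sync instead of falling back to every parameter, and a lookup failure names the specific missing parameters.

```python
from typing import Iterable, List


def resolve_sync_names(param_groups: List[List[str]],
                       available: Iterable[str]) -> List[str]:
    """Return the parameter names to sync, validating them up front."""
    # Empty groups mean "sync nothing", not "sync everything".
    if not param_groups:
        return []

    requested = [name for group in param_groups for name in group]
    missing = sorted(set(requested) - set(available))
    if missing:
        # Name the offending parameters (capped) so the failure is diagnosable.
        shown = ", ".join(missing[:5])
        raise ValueError(
            f"{len(missing)} parameter(s) not found in the model: {shown}")
    return requested


names = resolve_sync_names([["a.weight"], ["b.weight"]],
                           ["a.weight", "b.weight", "c.weight"])
```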

Comment thread swift/rlhf_trainers/rollout_mixin.py Outdated
Comment thread swift/rlhf_trainers/rollout_mixin.py Outdated
hjh0119 (Collaborator, Author) commented Apr 21, 2026

/gemini review

gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request introduces a custom SyncRefModelCallback and the _sync_ref_model_weights method within RolloutTrainerMixin to support reference model weight synchronization. It also moves the initialization of parameter groups to prepare_rollout to ensure availability regardless of vLLM usage. A review comment suggests refactoring the _sync_ref_model_weights method to reduce code duplication between the DeepSpeed ZeRO-3 and standard execution paths.
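
One common way to remove this kind of duplication is to choose a context manager per group and share the loop body. The sketch below assumes that shape; it is not taken from the PR. `fake_gather` is a hypothetical stand-in that only records calls. In a real ZeRO-3 run its place would presumably be taken by something like `deepspeed.zero.GatheredParameters`, though the conversation does not show which call the PR uses.

```python
from contextlib import contextmanager, nullcontext
from typing import Callable, Iterator, List, Tuple

Log = List[Tuple[str, Tuple[str, ...]]]


@contextmanager
def fake_gather(names: List[str], log: Log) -> Iterator[None]:
    """Hypothetical stand-in for a ZeRO-3 gather context; records calls only."""
    log.append(("gather", tuple(names)))
    try:
        yield
    finally:
        log.append(("release", tuple(names)))


def sync_groups(groups: List[List[str]],
                use_zero3: bool,
                mix_fn: Callable[[List[str]], None],
                log: Log) -> None:
    """Run the same mixing step for both paths; only the context differs."""
    for group in groups:
        ctx = fake_gather(group, log) if use_zero3 else nullcontext()
        with ctx:
            mix_fn(group)


log_zero3: Log = []
mixed_zero3: List[List[str]] = []
sync_groups([["a"], ["b"]], True, mixed_zero3.append, log_zero3)

log_plain: Log = []
mixed_plain: List[List[str]] = []
sync_groups([["a"], ["b"]], False, mixed_plain.append, log_plain)
```

With this shape the mixing logic exists once, and each group is gathered and released before the next group starts.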

Comment thread swift/rlhf_trainers/rollout_mixin.py Outdated
hjh0119 merged commit 9681664 into modelscope:main on Apr 25, 2026
2 of 3 checks passed
hjh0119 deleted the sync-ref branch on April 25, 2026 06:41
Development

Successfully merging this pull request may close these issues.

Qwen3-VL-32B, GRPO training: OOM when sync_ref_model is enabled

2 participants