[bugfix] fix gdn sharded_state_dict lora #23

Merged
Jintao-Huang merged 8 commits into modelscope:main from Jintao-Huang:fix_gdn_sharded_state_dict_lora on Apr 14, 2026

Conversation

@Jintao-Huang
Collaborator

No description provided.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request implements the sharded_state_dict method in the GatedDeltaNet module to support distributed checkpointing. The implementation includes logic for sharding parameters and submodules, specifically handling conv1d and in_proj layers with tensor parallel sharding and tensor splitting. A critical issue was identified regarding the use of an undefined attribute self.conv_dim_local_tp in assertions, which should be replaced with a locally calculated variable.
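
The flagged issue can be illustrated with a minimal sketch. This is not the code from this PR: the class `TinyGatedDeltaNet`, the attributes `conv_dim`, `tp_size`, and `conv1d_weight_shape`, and the assumed weight layout are all hypothetical stand-ins. Only the name `conv_dim_local_tp` comes from the review. The idea is that the assertion derives the per-rank size locally instead of reading an attribute that was never set.

```python
class TinyGatedDeltaNet:
    """Hypothetical stand-in for illustration; not the real GatedDeltaNet."""

    def __init__(self, conv_dim: int, tp_size: int, kernel_size: int = 4):
        if conv_dim % tp_size != 0:
            raise ValueError("conv_dim must be divisible by tp_size")
        self.conv_dim = conv_dim  # assumed: full (unsharded) conv channel count
        self.tp_size = tp_size    # assumed: tensor-parallel world size
        # Assumed layout of this rank's conv1d weight shard.
        self.conv1d_weight_shape = (conv_dim // tp_size, 1, kernel_size)
        # Note: self.conv_dim_local_tp is deliberately never set here.

    def check_conv1d_shard(self) -> int:
        # Compute the per-rank size locally instead of reading an
        # attribute that __init__ never defines.
        conv_dim_local_tp = self.conv_dim // self.tp_size
        assert self.conv1d_weight_shape[0] == conv_dim_local_tp
        return conv_dim_local_tp


module = TinyGatedDeltaNet(conv_dim=8192, tp_size=4)
local_dim = module.check_conv1d_shard()
```

Had the assertion read `self.conv_dim_local_tp` instead, this sketch would raise `AttributeError` before the shape check ever ran, which is the failure mode the review describes.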

Comment thread src/mcore_bridge/model/modules/gated_delta_net.py Outdated
@Jintao-Huang Jintao-Huang merged commit abc0a18 into modelscope:main Apr 14, 2026
1 check passed

3 participants