[model] support qwen3_next gdn by Jintao-Huang · Pull Request #76 · modelscope/mcore-bridge

Jintao-Huang · 2026-05-14T02:56:58Z

No description provided.

gemini-code-assist

Code Review

This pull request introduces Gated Delta Network (GDN) support for Qwen3 models by adding Qwen3NextGDNBridge and Qwen3NextLoader, and refactoring qwen3_5_gdn to inherit from these new components. Review feedback suggests removing the redundant Qwen3NextGatedDeltaNet subclass in favor of using GatedDeltaNet directly and centralizing the USE_MCORE_GDN environment variable retrieval to reduce code duplication across multiple files.

Jintao-Huang · 2026-05-14T07:20:31Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces Gated Delta Net (GDN) support for Qwen3 models, including new bridge and loader implementations and a refactor of Qwen3_5 models to utilize shared GDN logic. Additionally, the codebase is updated to rename CustomTransformerBlock and CustomTransformerLayer to TransformerBlock and TransformerLayer for consistency. Review feedback identifies a bug in the TransformerLayer initialization where an incorrect super() call skips the parent class's constructor, and suggests using torch.equal for more robust tensor comparisons in the GDN bridge.
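To illustrate why `torch.equal` can be the more robust comparison, here is a minimal sketch. The tensors are made up for the example and are not taken from the PR. An elementwise `==` followed by `.all()` broadcasts its operands, so a shape mismatch can pass unnoticed, whereas `torch.equal` also requires the sizes to match.

```python
import torch

# Hypothetical tensors: same values, different (but broadcastable) shapes.
a = torch.ones(2, 3)
b = torch.ones(3)

# `==` broadcasts `b` across the rows of `a`, so every element compares equal.
elementwise_all = bool((a == b).all())

# torch.equal returns True only if both size and elements match.
strict_equal = torch.equal(a, b)

print(elementwise_all, strict_equal)
```

For two LoRA weights that are expected to be identical, the stricter check reports a shape mismatch as "not equal" instead of comparing broadcast values.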

Jintao-Huang · 2026-05-14T07:47:52Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces support for the Qwen3 Gated Delta Net (GDN) architecture, including new bridge and loader classes for both text and multimodal variants. It also refactors several core modules by renaming 'Custom' prefixed classes to more standard names and streamlining imports. Key feedback includes a critical bug in TransformerLayer where the use of super() with the wrong class name skips the parent's initialization logic. Additionally, it is recommended to replace runtime assert statements with explicit exception handling and to avoid hardcoding 'cuda' as a device to ensure better environment compatibility.

gemini-code-assist · 2026-05-14T07:49:32Z

    ):
        self.submodules_config = submodules
-        super(TransformerLayer, self).__init__(config=config, vp_stage=vp_stage)
+        super(McoreTransformerLayer, self).__init__(config=config, vp_stage=vp_stage)


The call super(McoreTransformerLayer, self).__init__ invokes the constructor of the grandparent class (the parent of McoreTransformerLayer), effectively skipping the initialization logic of McoreTransformerLayer. This is problematic because McoreTransformerLayer performs essential setup (such as setting self.config, self.layer_number, etc.). Furthermore, if the grandparent class (typically MegatronModule) does not accept the vp_stage keyword argument, this will result in a TypeError at runtime. If the intention is to initialize the parent class, use super().__init__(...) or super(TransformerLayer, self).__init__(...).
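The behaviour described in this comment can be reproduced with plain Python classes. The class names below are hypothetical stand-ins, not the real Megatron or mcore-bridge classes. The sketch only shows how the two-argument form of `super()` starts the method lookup after the named class. Whether skipping the parent is intentional in this PR is not stated in the thread.

```python
class Grandparent:
    def __init__(self, config):
        self.config = config
        self.initialized_by = ["Grandparent"]


class Parent(Grandparent):
    def __init__(self, config, vp_stage=None):
        super().__init__(config)
        self.vp_stage = vp_stage
        self.initialized_by.append("Parent")


class SkipsParent(Parent):
    def __init__(self, config, vp_stage=None):
        # Lookup starts *after* Parent in the MRO, so Parent.__init__ never runs.
        super(Parent, self).__init__(config)


class CallsParent(Parent):
    def __init__(self, config, vp_stage=None):
        # Zero-argument super() resolves to Parent.__init__ here.
        super().__init__(config, vp_stage=vp_stage)


class PassesUnknownKwarg(Parent):
    def __init__(self, config, vp_stage=None):
        # Grandparent.__init__ does not accept vp_stage, so this raises TypeError.
        super(Parent, self).__init__(config, vp_stage=vp_stage)


skipped = SkipsParent(config={"hidden_size": 8}, vp_stage=0)
normal = CallsParent(config={"hidden_size": 8}, vp_stage=0)

print(skipped.initialized_by, hasattr(skipped, "vp_stage"))
print(normal.initialized_by, normal.vp_stage)
```

In the first case the attributes that `Parent` would have set are missing. The third class shows the `TypeError` path that the comment warns about when the grandparent does not take the extra keyword.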

gemini-code-assist · 2026-05-14T07:49:32Z

+                assert (lora_A == hf_state_dict['in_proj_ba.lora_A.weight'].load()).all(), \
+                       'Need to ensure QKVZBA\'s lora_A are consistent'


Using assert for runtime validation is discouraged as assertions can be optimized away in production environments (e.g., when running Python with -O). It is better to use an explicit if check and raise a ValueError with a descriptive message.

Suggested change

-                assert (lora_A == hf_state_dict['in_proj_ba.lora_A.weight'].load()).all(), \
-                       'Need to ensure QKVZBA\'s lora_A are consistent'
+                if not (lora_A == hf_state_dict['in_proj_ba.lora_A.weight'].load()).all():
+                    raise ValueError('Need to ensure QKVZBA\'s lora_A are consistent')
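The point about assertions being optimized away can be checked without restarting the interpreter. The built-in `compile()` accepts an `optimize` argument, where level 1 corresponds to running Python with `-O`. The sketch below uses a trivial stand-in condition (`1 == 2`) rather than the real tensor comparison, and the `run` helper is a name introduced only for this example.

```python
def run(source, optimize):
    """Compile and execute `source` at the given optimization level.

    Returns the name of the raised exception type, or "no error".
    """
    try:
        exec(compile(source, "<demo>", "exec", optimize=optimize), {})
        return "no error"
    except Exception as exc:
        return type(exc).__name__


# Stand-in for the consistency check; the condition is deliberately false.
ASSERT_CHECK = "assert 1 == 2, 'lora_A mismatch'"
EXPLICIT_CHECK = (
    "if not (1 == 2):\n"
    "    raise ValueError('lora_A mismatch')\n"
)

results = {
    "assert, optimize=0": run(ASSERT_CHECK, 0),
    "assert, optimize=1": run(ASSERT_CHECK, 1),
    "explicit, optimize=0": run(EXPLICIT_CHECK, 0),
    "explicit, optimize=1": run(EXPLICIT_CHECK, 1),
}

for name, outcome in results.items():
    print(f"{name}: {outcome}")
```

At optimization level 1 the `assert` version passes silently even though the condition is false, while the explicit `if`/`raise` version fails at both levels.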

gemini-code-assist · 2026-05-14T07:49:32Z

+            qkvz_dim = key_dim * 2 + value_dim * 2
+            is_lora = False if mg_attn is None else isinstance(mg_attn.in_proj,
+                                                               LoraParallelLinear) and self._peft_format
+            is_lora = torch.tensor([is_lora], dtype=torch.bool, device='cuda')


Hardcoding device='cuda' can cause issues in environments without GPU support or when running on other accelerators (e.g., NPU). It is safer to determine the device dynamically from the environment or existing tensors.

Suggested change

-            is_lora = torch.tensor([is_lora], dtype=torch.bool, device='cuda')
+            is_lora = torch.tensor([is_lora], dtype=torch.bool, device=torch.cuda.current_device() if torch.cuda.is_available() else 'cpu')
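One way to "determine the device dynamically from existing tensors" is sketched below. The helper name `pick_device` and the `weight` tensor are assumptions made for illustration and do not appear in the PR. Whether a CPU fallback is acceptable depends on how `is_lora` is used afterwards (for example in a collective operation), which this excerpt does not show.

```python
import torch


def pick_device(like=None):
    """Return the device of `like` if given, else the current CUDA device or CPU.

    Hypothetical helper, not part of mcore-bridge.
    """
    if like is not None:
        # Reusing an existing tensor's device also covers non-CUDA accelerators.
        return like.device
    if torch.cuda.is_available():
        return torch.device("cuda", torch.cuda.current_device())
    return torch.device("cpu")


# Stand-in for a parameter that already lives on the right device.
weight = torch.zeros(4)

is_lora = torch.tensor([True], dtype=torch.bool, device=pick_device(like=weight))
fallback = torch.tensor([False], dtype=torch.bool, device=pick_device())

print(is_lora.device, fallback.device)
```

Taking the device from a tensor that is already placed correctly avoids naming any backend in the code at all.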

gemini-code-assist · 2026-05-14T07:49:32Z

+        lm_model = model.language_model if hasattr(model, 'language_model') else model
+        for layer in lm_model.decoder.layers:
+            if hasattr(layer.self_attention, 'out_norm'):
+                assert hasattr(layer.self_attention.out_norm, 'zero_centered_gamma')


Avoid using assert for critical runtime checks in model building logic, as it can be optimized away. Use an explicit check and raise an appropriate exception.

Suggested change

-                assert hasattr(layer.self_attention.out_norm, 'zero_centered_gamma')
+                if not hasattr(layer.self_attention.out_norm, 'zero_centered_gamma'):
+                    raise AttributeError("layer.self_attention.out_norm missing 'zero_centered_gamma' attribute")

support qwen3_next gdn

0d5e495

gemini-code-assist Bot reviewed May 14, 2026

Comment thread src/mcore_bridge/model/gpts/qwen3_next_gdn.py Outdated

Comment thread src/mcore_bridge/model/gpts/qwen3_next.py

Jintao-Huang added 2 commits May 14, 2026 13:32

update

ee0c5c5

fix

24c758a

gemini-code-assist Bot reviewed May 14, 2026

Comment thread src/mcore_bridge/model/modules/transformer_layer.py

Comment thread src/mcore_bridge/model/gpts/qwen3_next_gdn.py

hjh0119 approved these changes May 14, 2026

fix

b3ea4ca

gemini-code-assist Bot reviewed May 14, 2026

fix

194f423

Jintao-Huang merged commit 85df30c into modelscope:main May 14, 2026
1 check passed

Jintao-Huang mentioned this pull request May 25, 2026

Is the Qwen3-Next model series architecturally unable to support packing? modelscope/ms-swift#7981

Closed

1 task

Jintao-Huang merged 5 commits into modelscope:main from Jintao-Huang:support_qwen3_next_gdn

2 participants
