[megatron] support gemma4 megatron by Jintao-Huang · Pull Request #9296 · modelscope/ms-swift

Jintao-Huang · 2026-05-09T07:51:30Z

gemini-code-assist

Code Review

This pull request updates the documentation to reflect support for Gemma 4 models and refactors embedding handling in Megatron utilities to support multiple modules during device conversion. In swift/model/models/gemma.py, a suggestion was made to use inputs_embeds.device instead of multimodal_mask.device when moving the pad_embedding tensor to ensure better robustness and consistency across operands in the subsequent torch.where call.

gemini-code-assist · 2026-05-09T07:53:12Z


        if self.config.get_text_config().hidden_size_per_layer_input:
            pad_embedding = self.language_model.embed_tokens.weight[self.config.text_config.pad_token_id, :]
+            pad_embedding = pad_embedding.to(multimodal_mask.device)


Using inputs_embeds.device as the target for the .to() call is generally more robust than multimodal_mask.device. Since inputs_embeds is the primary tensor representing the hidden states in this operation, it serves as the most reliable reference for the execution device, ensuring consistency across all operands in the subsequent torch.where call.

Suggested change

-            pad_embedding = pad_embedding.to(multimodal_mask.device)
+            pad_embedding = pad_embedding.to(inputs_embeds.device)

Jintao-Huang · 2026-05-19T08:38:19Z

/gemini review

gemini-code-assist

Code Review

This pull request updates the documentation to reflect support for Gemma-4 models, adds a Megatron SFT training script, and refactors device conversion utilities. Review feedback identifies a potential runtime error in the training script due to inconsistent parallelism configurations where the Expert Parallel size exceeds the Data Parallel size. Additionally, improvements are suggested for the conversion utility to prevent a potential UnboundLocalError by initializing the data type variable and to enhance the readability of boolean logic through the use of parentheses.

gemini-code-assist · 2026-05-19T08:40:02Z

+    --tensor_model_parallel_size 2 \
+    --expert_model_parallel_size 4 \
+    --pipeline_model_parallel_size 2 \


The parallelism configuration appears inconsistent with the total number of GPUs (NPROC_PER_NODE=8). With tensor_model_parallel_size=2 and pipeline_model_parallel_size=2, the Data Parallel (DP) size is calculated as 8 / (2 * 2) = 2. In Megatron-Core, the Expert Parallel (EP) size (expert_model_parallel_size) must typically be less than or equal to the DP size (EP <= DP). Setting EP=4 while DP=2 will likely result in a runtime error during model initialization.
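
The arithmetic in this comment can be checked with a minimal sketch. The helper names `data_parallel_size` and `ep_fits_in_dp` are hypothetical and are not part of ms-swift or Megatron-Core. The `EP <= DP` rule is taken from the review comment as an assumption; whether a given Megatron-Core version enforces it is not confirmed here.

```python
def data_parallel_size(world_size: int, tp: int, pp: int) -> int:
    """Return DP = world_size / (TP * PP), as computed in the review comment."""
    model_parallel = tp * pp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by TP*PP={model_parallel}"
        )
    return world_size // model_parallel


def ep_fits_in_dp(world_size: int, tp: int, pp: int, ep: int) -> bool:
    """Check the reviewer's assumed constraint EP <= DP (not a verified rule)."""
    return ep <= data_parallel_size(world_size, tp, pp)


if __name__ == "__main__":
    # Values from the reviewed script: 8 GPUs, TP=2, PP=2, EP=4.
    print(data_parallel_size(8, tp=2, pp=2))   # 2
    print(ep_fits_in_dp(8, tp=2, pp=2, ep=4))  # False
```

Under that assumed rule, lowering `expert_model_parallel_size` to 2 would pass the check for the same 8 GPU layout.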

gemini-code-assist · 2026-05-19T08:40:02Z

+    for module in modules:
+        try:
+            origin_torch_dtype = next(module.parameters()).dtype
+        except StopIteration:
+            pass
+        else:
+            break


The variable origin_torch_dtype is not initialized before the loop. If the modules list is empty or if none of the modules contain parameters (causing StopIteration in every iteration), origin_torch_dtype will remain undefined. This will lead to an UnboundLocalError when it is accessed later in the _to_cpu_hook function (line 85). Initializing it to None provides a safe fallback, as module.to(dtype=None) is a no-op in PyTorch.

Suggested change

-    for module in modules:
-        try:
-            origin_torch_dtype = next(module.parameters()).dtype
-        except StopIteration:
-            pass
-        else:
-            break
+    origin_torch_dtype = None
+    for module in modules:
+        try:
+            origin_torch_dtype = next(module.parameters()).dtype
+            break
+        except StopIteration:
+            pass
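
The failure mode described in this comment can be reproduced without PyTorch. The sketch below is illustrative only: `FakeModule`, `first_dtype_unsafe`, and `first_dtype_safe` are hypothetical names, the "dtypes" are plain strings, and the variable is read in the same function rather than in a hook as in the reviewed code.

```python
class FakeModule:
    """Hypothetical stand-in for torch.nn.Module; only parameters() is modeled."""

    def __init__(self, dtypes):
        self._dtypes = list(dtypes)

    def parameters(self):
        # Real modules yield tensors; here we yield dtype names directly.
        return iter(self._dtypes)


def first_dtype_unsafe(modules):
    # Mirrors the reviewed loop: no initialization before iterating.
    for module in modules:
        try:
            origin_torch_dtype = next(module.parameters())
        except StopIteration:
            pass
        else:
            break
    return origin_torch_dtype  # unbound if no module had parameters


def first_dtype_safe(modules):
    # Mirrors the suggested change: initialize to None first.
    origin_torch_dtype = None
    for module in modules:
        try:
            origin_torch_dtype = next(module.parameters())
            break
        except StopIteration:
            pass
    return origin_torch_dtype


if __name__ == "__main__":
    empty = [FakeModule([]), FakeModule([])]
    try:
        first_dtype_unsafe(empty)
    except UnboundLocalError as exc:
        print("unsafe version failed:", type(exc).__name__)
    print(first_dtype_safe(empty))
```

Both versions return the same value as soon as any module in the list has at least one parameter; they differ only when every module is parameterless or the list is empty.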

gemini-code-assist · 2026-05-19T08:40:02Z


    def _to_cpu_hook(module, args, output):
-        if share_embedding and module in embeddings:
+        if share_embedding and module in embeddings or 'rotaryemb' in module.__class__.__name__.lower():


The boolean expression relies on operator precedence (and before or), which can be error-prone and harder to read. Additionally, checking for 'rotaryemb' in the class name is a bit fragile. While string matching is often used in this context to avoid circular imports, adding parentheses would at least clarify the intended logic.

Suggested change

-        if share_embedding and module in embeddings or 'rotaryemb' in module.__class__.__name__.lower():
+        if (share_embedding and module in embeddings) or 'rotaryemb' in module.__class__.__name__.lower():
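
A short sketch can confirm that the suggested parentheses do not change behavior, and show which inputs would behave differently under the other possible grouping. The function names are hypothetical; `a`, `b`, and `c` stand in for `share_embedding`, `module in embeddings`, and the `'rotaryemb'` class-name check.

```python
from itertools import product


def unparenthesized(a, b, c):
    # Python parses this as (a and b) or c, because `and` binds tighter than `or`.
    return a and b or c


def parenthesized(a, b, c):
    return (a and b) or c


def alternative(a, b, c):
    # The grouping a reader might mistakenly assume.
    return a and (b or c)


cases = list(product([False, True], repeat=3))

# True for all eight combinations: the parentheses are purely cosmetic.
same = all(unparenthesized(*x) == parenthesized(*x) for x in cases)

# Inputs where the two groupings disagree.
differs = [x for x in cases if parenthesized(*x) != alternative(*x)]

if __name__ == "__main__":
    print(same)
    print(differs)
```

The two groupings disagree only when `a` is false and `c` is true, which corresponds to a rotary embedding module being handled even when `share_embedding` is off.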

support gemma4 megatron

152593c

gemini-code-assist Bot reviewed May 9, 2026

View reviewed changes

Jintao-Huang added 3 commits May 11, 2026 14:27

Merge branch 'main' into support_gemma4_megatron

5086cdb

Merge branch 'main' into support_gemma4_megatron

5c9f86b

Merge branch 'main' into support_gemma4_megatron

e732b71

hjh0119 approved these changes May 18, 2026

View reviewed changes

Jintao-Huang added 5 commits May 19, 2026 01:59

Merge branch 'main' into support_gemma4_megatron

f37f967

update

d7c2f28

update

96ff166

update

fd783fc

update

c8aa1c1

Jintao-Huang mentioned this pull request May 19, 2026

[model] Support gemma4 modelscope/mcore-bridge#56

Merged

update

845ad34

gemini-code-assist Bot reviewed May 19, 2026

View reviewed changes

update

72a258d

Jintao-Huang merged commit 30c6799 into modelscope:main May 19, 2026
1 of 3 checks passed

Jintao-Huang merged 11 commits into modelscope:main from Jintao-Huang:support_gemma4_megatron

2 participants
