[docs] update docs by Jintao-Huang · Pull Request #89 · modelscope/mcore-bridge

Jintao-Huang · 2026-05-21T12:53:56Z

No description provided.

gemini-code-assist

Code Review

This pull request updates the English and Chinese README files by removing environment-specific comments and adding a new 'Minimal forward example' section. This section provides a Python script demonstrating how to perform a forward pass and compute loss using Mcore-Bridge. The review feedback correctly identifies a bug in the example code: the get_mcore_model function returns a list of models, so attempting to call .cuda() directly on the list or passing it to the forward function will result in an AttributeError. The reviewer suggests extracting the first model from the list to resolve this issue.

Jintao-Huang · 2026-05-21T13:51:06Z

/gemini review

gemini-code-assist

Code Review

This pull request adds a 'Minimal forward example' section to both the English and Chinese README files, providing a Python script that demonstrates how to perform a forward pass and compute loss using Mcore-Bridge. The review feedback identifies several necessary improvements to the example code: replacing torch.roll with a proper concatenation and padding approach for label shifting in causal language modeling, improving the robustness of the LOCAL_RANK environment variable retrieval, and correcting the parallel configuration parameters to ensure the script runs correctly on the specified two-GPU setup.

gemini-code-assist · 2026-05-21T13:52:57Z

+        mg_inputs.pop(key, None)
+    if template.padding_free:
+        mg_inputs['packed_seq_params'] = get_packed_seq_params(text_position_ids)
+    mg_inputs['labels'] = torch.roll(mg_inputs['labels'], -1, dims=-1)


Using torch.roll to shift labels for causal language modeling is problematic because it wraps the first token's label to the last position. This results in an incorrect loss calculation for the final token of the sequence. It is better to shift the labels and pad the last position with the ignore index (-100).

Suggested change

mg_inputs['labels'] = torch.roll(mg_inputs['labels'], -1, dims=-1)

mg_inputs['labels'] = torch.cat([mg_inputs['labels'][..., 1:], torch.full_like(mg_inputs['labels'][..., :1], -100)], dim=-1)

gemini-code-assist · 2026-05-21T13:52:57Z

+    return loss.sum() / loss_mask.sum()
+
+
+torch.cuda.set_device(f"cuda:{os.getenv('LOCAL_RANK')}")


os.getenv('LOCAL_RANK') returns a string and can be None if the environment variable is missing. It is safer to use os.environ.get with a default value and cast it to an integer, which is the expected type for torch.cuda.set_device.

Suggested change

torch.cuda.set_device(f"cuda:{os.getenv('LOCAL_RANK')}")

torch.cuda.set_device(int(os.environ.get('LOCAL_RANK', 0)))

gemini-code-assist · 2026-05-21T13:52:58Z

+
+torch.cuda.set_device(f"cuda:{os.getenv('LOCAL_RANK')}")
+dist.init_process_group(backend='nccl')
+TP, PP, EP, ETP = 2, 1, 2, 1


The parallel configuration TP=2, PP=1, EP=2 requires at least 4 GPUs because the expert model parallel size (EP) must be less than or equal to the data parallel size (DP), and with 2 GPUs and TP=2, the DP size is 1. To match the execution command provided earlier (--nproc_per_node=2), EP should be set to 1.

Suggested change

TP, PP, EP, ETP = 2, 1, 2, 1

TP, PP, EP, ETP = 2, 1, 1, 1

gemini-code-assist · 2026-05-21T13:52:58Z

+        mg_inputs.pop(key, None)
+    if template.padding_free:
+        mg_inputs['packed_seq_params'] = get_packed_seq_params(text_position_ids)
+    mg_inputs['labels'] = torch.roll(mg_inputs['labels'], -1, dims=-1)


Using torch.roll to shift labels for causal language modeling is problematic because it wraps the first token's label to the last position. This results in an incorrect loss calculation for the final token of the sequence. It is better to shift the labels and pad the last position with the ignore index (-100).

Suggested change

mg_inputs['labels'] = torch.roll(mg_inputs['labels'], -1, dims=-1)

mg_inputs['labels'] = torch.cat([mg_inputs['labels'][..., 1:], torch.full_like(mg_inputs['labels'][..., :1], -100)], dim=-1)

gemini-code-assist · 2026-05-21T13:52:58Z

+    return loss.sum() / loss_mask.sum()
+
+
+torch.cuda.set_device(f"cuda:{os.getenv('LOCAL_RANK')}")


os.getenv('LOCAL_RANK') returns a string and can be None if the environment variable is missing. It is safer to use os.environ.get with a default value and cast it to an integer, which is the expected type for torch.cuda.set_device.

Suggested change

torch.cuda.set_device(f"cuda:{os.getenv('LOCAL_RANK')}")

torch.cuda.set_device(int(os.environ.get('LOCAL_RANK', 0)))

gemini-code-assist · 2026-05-21T13:52:58Z

+
+torch.cuda.set_device(f"cuda:{os.getenv('LOCAL_RANK')}")
+dist.init_process_group(backend='nccl')
+TP, PP, EP, ETP = 2, 1, 2, 1


The parallel configuration TP=2, PP=1, EP=2 requires at least 4 GPUs because the expert model parallel size (EP) must be less than or equal to the data parallel size (DP), and with 2 GPUs and TP=2, the DP size is 1. To match the execution command provided earlier (--nproc_per_node=2), EP should be set to 1.

Suggested change

TP, PP, EP, ETP = 2, 1, 2, 1

TP, PP, EP, ETP = 2, 1, 1, 1

update docs

9a1fbfc

tastelikefeet approved these changes May 21, 2026

View reviewed changes

gemini-code-assist Bot reviewed May 21, 2026

View reviewed changes

Comment thread README.md Outdated

Comment thread README_zh.md Outdated

hjh0119 approved these changes May 21, 2026

View reviewed changes

update

c8a3fa7

gemini-code-assist Bot reviewed May 21, 2026

View reviewed changes

Jintao-Huang merged commit 2a96684 into modelscope:main May 21, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[docs] update docs#89

[docs] update docs#89
Jintao-Huang merged 2 commits into
modelscope:mainfrom
Jintao-Huang:update_docs_0521

Jintao-Huang commented May 21, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Jintao-Huang commented May 21, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 21, 2026

Uh oh!

gemini-code-assist Bot May 21, 2026

Uh oh!

gemini-code-assist Bot May 21, 2026

Uh oh!

gemini-code-assist Bot May 21, 2026

Uh oh!

gemini-code-assist Bot May 21, 2026

Uh oh!

gemini-code-assist Bot May 21, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	mg_inputs['labels'] = torch.roll(mg_inputs['labels'], -1, dims=-1)
	mg_inputs['labels'] = torch.cat([mg_inputs['labels'][..., 1:], torch.full_like(mg_inputs['labels'][..., :1], -100)], dim=-1)

		return loss.sum() / loss_mask.sum()


		torch.cuda.set_device(f"cuda:{os.getenv('LOCAL_RANK')}")

	torch.cuda.set_device(f"cuda:{os.getenv('LOCAL_RANK')}")
	torch.cuda.set_device(int(os.environ.get('LOCAL_RANK', 0)))

Conversation

Jintao-Huang commented May 21, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Jintao-Huang commented May 21, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants