compat megatron.core 0.18 #77

Merged
Jintao-Huang merged 2 commits into modelscope:main from Jintao-Huang:compat_megatron_core_018
May 14, 2026
Conversation

@Jintao-Huang
Collaborator

No description provided.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces support for partial objects within the _build_mlp method of the transformer_layer.py module. A review comment identifies a potential TypeError because the implementation unconditionally passes arguments like pg_collection and is_mtp_layer to the partial object, regardless of the underlying module's signature. It also notes that additional_mlp_kwargs is redundant in the new branch and suggests a more robust way to handle different module types using build_module.

Comment on lines +236 to +243
    if isinstance(mlp_spec, partial):
        return mlp_spec(
            config=self.config,
            pg_collection=pg_collection,
            is_mtp_layer=self.is_mtp_layer,
            **additional_mlp_kwargs)
    else:
        return build_module(mlp_spec, config=self.config, **additional_mlp_kwargs)
medium

The current implementation for handling partial objects is problematic for two reasons:

  1. Correctness: It unconditionally passes pg_collection and is_mtp_layer to the partial object. However, if the partial wraps a standard MLP or TEFusedMLP, it will likely fail with a TypeError because these modules expect tp_group instead of pg_collection, and do not accept is_mtp_layer in their constructors.
  2. Redundancy: additional_mlp_kwargs is only populated if mlp_spec is a ModuleSpec (lines 222-235). Since a partial is not a ModuleSpec, additional_mlp_kwargs is guaranteed to be empty when line 241 is reached, making **additional_mlp_kwargs redundant in that branch.
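
The correctness concern in point 1 can be reproduced without Megatron. The following is a minimal sketch, assuming two hypothetical stand-in classes (`FakeMLP`, `FakeMoELayer`) whose constructors mimic the signatures described above; they are not the real Megatron modules, and `build_unconditionally` is an illustrative helper, not code from the PR.

```python
from functools import partial


class FakeMLP:
    """Hypothetical stand-in for a dense MLP that expects tp_group."""

    def __init__(self, config, tp_group=None):
        self.config = config
        self.tp_group = tp_group


class FakeMoELayer:
    """Hypothetical stand-in for an MoE layer that expects pg_collection."""

    def __init__(self, config, pg_collection=None, is_mtp_layer=False):
        self.config = config
        self.pg_collection = pg_collection
        self.is_mtp_layer = is_mtp_layer


def build_unconditionally(mlp_spec, config, pg_collection, is_mtp_layer):
    # Mirrors the partial branch under review: the same keyword arguments
    # are passed regardless of which class the partial wraps.
    return mlp_spec(config=config, pg_collection=pg_collection, is_mtp_layer=is_mtp_layer)


# Works: the wrapped class accepts both keyword arguments.
moe = build_unconditionally(partial(FakeMoELayer), {}, "pg", True)

# Fails: the wrapped class accepts neither pg_collection nor is_mtp_layer.
try:
    build_unconditionally(partial(FakeMLP), {}, "pg", True)
    dense_error = None
except TypeError as exc:
    dense_error = exc
```

Whether the real `MLP` or `TEFusedMLP` constructors fail in the same way depends on their actual signatures in the installed megatron.core version.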

It is better to determine the underlying module type and populate additional_mlp_kwargs accordingly, then use build_module to perform the instantiation consistently.
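
Determining the underlying module type relies on the attributes that `functools.partial` objects expose: `func`, `args`, and `keywords`. A small sketch, assuming a hypothetical `FakeMoELayer` class and an illustrative `submodules` argument:

```python
from functools import partial


class FakeMoELayer:
    """Hypothetical stand-in; not the real Megatron MoELayer."""

    def __init__(self, config, submodules=None):
        self.config = config
        self.submodules = submodules


# A spec given as a partial rather than as a ModuleSpec.
mlp_spec = partial(FakeMoELayer, submodules="experts")

# The wrapped callable and the pre-bound keyword arguments are inspectable
# without instantiating anything.
wrapped = mlp_spec.func
bound_kwargs = dict(mlp_spec.keywords)
is_moe = wrapped is FakeMoELayer
```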

Suggested change
-    if isinstance(mlp_spec, partial):
-        return mlp_spec(
-            config=self.config,
-            pg_collection=pg_collection,
-            is_mtp_layer=self.is_mtp_layer,
-            **additional_mlp_kwargs)
-    else:
-        return build_module(mlp_spec, config=self.config, **additional_mlp_kwargs)
+    if isinstance(mlp_spec, partial):
+        module = mlp_spec.func
+        if module in (MoELayer, TEGroupedMLP, SequentialMLP):
+            additional_mlp_kwargs['pg_collection'] = pg_collection
+            if module == MoELayer and 'is_mtp_layer' in inspect.signature(MoELayer).parameters:
+                additional_mlp_kwargs['is_mtp_layer'] = self.is_mtp_layer
+        elif module in (MLP, Glm4MLP) or (TEFusedMLP is not None and module == TEFusedMLP):
+            additional_mlp_kwargs['tp_group'] = pg_collection.tp
+    return build_module(mlp_spec, config=self.config, **additional_mlp_kwargs)
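
The dispatch in the suggestion can be exercised in isolation. The sketch below is an approximation under stated assumptions: `FakeMLP`, `FakeMoELayer`, `FakePGCollection`, `kwargs_for`, and `build` are hypothetical names introduced for illustration, and the final call invokes the partial directly rather than going through Megatron's `build_module`, whose handling of partial objects is not shown in this thread.

```python
import inspect
from functools import partial


class FakeMLP:
    """Hypothetical dense MLP stand-in that expects tp_group."""

    def __init__(self, config, tp_group=None):
        self.config = config
        self.tp_group = tp_group


class FakeMoELayer:
    """Hypothetical MoE stand-in that expects pg_collection and is_mtp_layer."""

    def __init__(self, config, pg_collection=None, is_mtp_layer=False):
        self.config = config
        self.pg_collection = pg_collection
        self.is_mtp_layer = is_mtp_layer


class FakePGCollection:
    """Hypothetical process-group container exposing a tp attribute."""

    tp = "tp-group"


def kwargs_for(mlp_spec, pg_collection, is_mtp_layer):
    """Choose constructor kwargs based on the class the spec wraps."""
    module = mlp_spec.func if isinstance(mlp_spec, partial) else mlp_spec
    kwargs = {}
    if module is FakeMoELayer:
        kwargs["pg_collection"] = pg_collection
        # Only pass is_mtp_layer when the constructor declares it.
        if "is_mtp_layer" in inspect.signature(FakeMoELayer).parameters:
            kwargs["is_mtp_layer"] = is_mtp_layer
    elif module is FakeMLP:
        kwargs["tp_group"] = pg_collection.tp
    return kwargs


def build(mlp_spec, config, pg_collection, is_mtp_layer):
    # Assumption: the spec is called directly here instead of via build_module.
    return mlp_spec(config=config, **kwargs_for(mlp_spec, pg_collection, is_mtp_layer))


pg = FakePGCollection()
dense = build(partial(FakeMLP), {}, pg, True)
moe = build(partial(FakeMoELayer), {}, pg, True)
```

Because the suggestion calls `inspect.signature`, the target module would also need `import inspect` if it does not already have it.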

@Jintao-Huang Jintao-Huang merged commit 5e286f5 into modelscope:main May 14, 2026
1 check passed