[megatron] remove megatron core 0.12-0.14 #9260

Merged
Jintao-Huang merged 3 commits into modelscope:main from Jintao-Huang:update_megatron_core_015
May 5, 2026

Conversation

@Jintao-Huang
Collaborator

No description provided.

Contributor

@gemini-code-assist (bot) left a comment

Code Review

This pull request updates the minimum version requirements for megatron-core to 0.15 and mcore-bridge to 1.2.0, removing legacy version checks and patches for older Megatron-Core versions. Feedback identifies a NameError in base.py caused by the removal of track_moe_kwargs while it is still being referenced, and a logic regression in trainer.py where context parallel all-reduction is now incorrectly enabled for newer mcore versions. Additionally, documentation updates contain potential typos regarding CUDA version labels.
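
The NameError described above is easy to miss because Python resolves global names only when a function runs, not when its module is imported. The following is a minimal sketch of that behaviour and not the actual code in base.py: `build_log_kwargs` and `call_safely` are hypothetical names, and `track_moe_kwargs` is used only as a stand-in for a name that was deleted while a reference to it remained.

```python
def build_log_kwargs():
    # Hypothetical call site: `track_moe_kwargs` is never defined in this
    # module, mimicking a helper that was removed while a use remained.
    return {"step": 0, **track_moe_kwargs}


def call_safely(fn):
    """Run fn and report a NameError instead of letting it propagate."""
    try:
        return "ok", fn()
    except NameError as exc:
        return "NameError", str(exc)


# Defining build_log_kwargs succeeded; the failure only appears on the call.
status, detail = call_safely(build_log_kwargs)
print(status, "-", detail)
```

Because the error surfaces only on the code path that executes the stale reference, a static linter or a test that exercises that path is the usual way to catch it before merge.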

Comment thread swift/megatron/trainers/base.py Outdated
Comment thread swift/megatron/trainers/trainer.py Outdated
# transformer_engine
# If an installation error occurs, refer to this issue for a fix: https://github.com/modelscope/ms-swift/issues/3793
pip install --no-build-isolation transformer-engine[pytorch] --no-cache-dir
# cuda13

medium

Potential typo: # cuda13. The latest stable CUDA version is 12.8, and the environment table below specifies cuda12.8/12.9. This should likely be # cuda12 or # cuda12.x.

Suggested change
- # cuda13
+ # cuda12

# transformer_engine
# If an installation error occurs, you can refer to this issue for resolution: https://github.com/modelscope/ms-swift/issues/3793
pip install --no-build-isolation transformer-engine[pytorch] --no-cache-dir
# cuda13

medium

Potential typo: # cuda13. The latest stable CUDA version is 12.8, and the environment table below specifies cuda12.8/12.9. This should likely be # cuda12 or # cuda12.x.

Suggested change
- # cuda13
+ # cuda12

@Jintao-Huang
Collaborator Author

/gemini review

Contributor

@gemini-code-assist (bot) left a comment

Code Review

This pull request updates the minimum version requirements for megatron-core to 0.15 and mcore-bridge to 1.2.0. Consequently, it removes legacy version checks and patches for older Megatron-Core versions (0.12 to 0.14) across the codebase, simplifying the logic in trainers and utility modules. Documentation has been updated to reflect these new requirements and includes a command to install TransformerEngine from its stable branch. A typo was noted in the documentation where # cuda13 is used instead of # cuda12.
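
Once the legacy branches are gone, a single up-front check can replace the scattered version guards. The sketch below is one way such a check might look and is not taken from the PR: the helper names are hypothetical, the distribution names are assumed to match the names used in the summary above, and the comparison is deliberately simplified (it ignores pre-release suffixes, which `packaging.version.Version` would handle properly).

```python
import re
from importlib import metadata

# Minimums quoted in the review summary; distribution names are an assumption.
MINIMUM_VERSIONS = {"megatron-core": "0.15", "mcore-bridge": "1.2.0"}


def parse_release(version):
    """Return the leading numeric release segment, e.g. '0.15.0rc1' -> (0, 15, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed, minimum):
    """Compare release tuples after zero-padding both to the same length."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(installed_versions=None):
    """Return a list of problems; an empty list means every minimum is met.

    Pass a {name: version} dict to check without touching the environment.
    """
    problems = []
    for name, minimum in MINIMUM_VERSIONS.items():
        if installed_versions is not None:
            installed = installed_versions.get(name)
        else:
            try:
                installed = metadata.version(name)
            except metadata.PackageNotFoundError:
                installed = None
        if installed is None:
            problems.append(f"{name} is not installed (need >= {minimum})")
        elif not meets_minimum(installed, minimum):
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems
```

Calling `check_requirements()` with no argument inspects the current environment; passing a dict such as `{"megatron-core": "0.14.2", "mcore-bridge": "1.2.0"}` reports only the outdated megatron-core entry.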

# transformer_engine
# If an installation error occurs, refer to this issue for a fix: https://github.com/modelscope/ms-swift/issues/3793
pip install --no-build-isolation transformer-engine[pytorch] --no-cache-dir
# cuda13

medium

The comment # cuda13 appears to be a typo for # cuda12. CUDA 13 has not been released yet, and the stable branch of TransformerEngine is typically used with CUDA 12 environments.

Suggested change
- # cuda13
+ # cuda12

# transformer_engine
# If an installation error occurs, you can refer to this issue for resolution: https://github.com/modelscope/ms-swift/issues/3793
pip install --no-build-isolation transformer-engine[pytorch] --no-cache-dir
# cuda13

medium

The comment # cuda13 appears to be a typo for # cuda12. CUDA 13 has not been released yet, and the stable branch of TransformerEngine is typically used with CUDA 12 environments.

Suggested change
- # cuda13
+ # cuda12

@Jintao-Huang changed the title from "remove megatron core 0.12-0.14" to "[megatron] remove megatron core 0.12-0.14" on May 5, 2026
@Jintao-Huang merged commit a9e3e23 into modelscope:main on May 5, 2026
2 of 3 checks passed

3 participants