Fix megatron save OOM #9166
Conversation
Code Review
This pull request updates the minimum Python version requirement from 3.9 to 3.10 across multiple documentation files and introduces a garbage collection call (`gc_collect()`) in the Megatron trainer's checkpointing logic to reduce memory pressure. A review comment suggests that an additional garbage collection call may be needed before saving weights when LoRA merging is enabled, since that process is memory-intensive and could still lead to out-of-memory errors.
```python
    model = []
else:
    model = self.wrapped_models
gc_collect()
```
While adding `gc_collect()` here helps mitigate OOM issues during the initial checkpointing phase, it might also be beneficial to call it again before the `save_safetensors` block (around line 743), especially when `merge_lora` is enabled. Merging LoRA adapters can be memory-intensive and may leave fragmented memory that causes an OOM during the subsequent `save_weights` call.
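A minimal sketch of the pattern being suggested. The `gc_collect` name matches the call in the diff, but its body and the `save_checkpoint` flow below are assumptions for illustration, not the repository's actual implementation (`merge_lora_adapters` and `save_weights` are hypothetical method names):

```python
import gc


def gc_collect() -> None:
    """Reclaim unreachable Python objects and, if CUDA is present, cached GPU blocks."""
    gc.collect()  # free host-side garbage first
    try:
        import torch  # assumed available in the trainer; guarded for portability
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # return cached allocator blocks to the driver
    except ImportError:
        pass


def save_checkpoint(trainer, merge_lora: bool = False) -> None:
    """Hypothetical checkpointing flow: collect before the merge, and again
    before saving so fragmentation left by the merge is released."""
    gc_collect()
    if merge_lora:
        trainer.merge_lora_adapters()  # memory-intensive merge step
        gc_collect()                   # the extra call the review suggests
    trainer.save_weights()             # the save_safetensors path
```

The second `gc_collect()` matters because the merged full-precision weights created during the merge can leave large freed-but-cached allocations that the subsequent save would otherwise have to allocate around.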
#8228