
[bugfix] Fix lora llm resume from checkpoint #9225

Merged: Jintao-Huang merged 20 commits into modelscope:main from Jintao-Huang:fix_lora_llm_resume_from_checkpoint on Apr 28, 2026

Conversation

@Jintao-Huang (Collaborator)

No description provided.

@gemini-code-assist (Bot, Contributor) left a comment


Code Review

This pull request removes an outdated comment about checkpoint support in the SFT script and modifies the LoRA LLM tuner plugin to ensure that vision tower and aligner modules are set to require gradients after loading. Feedback was provided to improve the robustness of the new logic by adding checks for the existence of model architecture metadata and its attributes to avoid potential attribute errors.
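
A minimal sketch of the kind of change the review describes, for illustration only: the attribute names vision_tower and aligner below are placeholders, and the real lookup in swift/tuner_plugin/lora_llm.py goes through the model's architecture metadata, which the feedback suggests checking before access.

def enable_multimodal_grads(model) -> None:
    # Placeholder attribute names; the actual modules come from the architecture metadata.
    for attr in ('vision_tower', 'aligner'):
        module = getattr(model, attr, None)  # guard against a missing attribute
        if module is not None:
            module.requires_grad_(True)  # re-enable gradients after loading the checkpoint

Guarding with getattr mirrors the robustness feedback: if a model has no aligner, the loop simply skips it instead of raising an AttributeError.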

Comment thread swift/tuner_plugin/lora_llm.py
@Jintao-Huang (Collaborator, Author)

/gemini review

@gemini-code-assist (Bot, Contributor) left a comment


Code Review

This pull request enables support for resuming from checkpoints in multimodal LoRA training by updating the DeepSpeed integration. Key changes include removing the training status check in Qwen model forwards to ensure consistency with DeepSpeed, and implementing a more robust parameter loading mechanism in LoRALLMTuner that handles DeepSpeed ZeRO-3's partitioned parameters. A performance improvement was suggested to gather all parameters at once when using ZeRO-3 to reduce communication overhead during state dict loading.

Comment thread swift/tuner_plugin/lora_llm.py Outdated
Comment on lines +34 to +40
params_dict = dict(model.named_parameters())
for name, tensor in state_dict.items():
    if name in params_dict:
        param = params_dict[name]
        with deepspeed.zero.GatheredParameters([param], modifier_rank=0):
            if deepspeed.comm.get_rank() == 0:
                param.data.copy_(tensor)
@gemini-code-assist (Bot, Contributor) commented:

Severity: medium

For better performance when using DeepSpeed ZeRO-3, it's more efficient to gather all relevant parameters at once rather than iterating and gathering them one by one. This minimizes communication overhead.

Suggested change
-params_dict = dict(model.named_parameters())
-for name, tensor in state_dict.items():
-    if name in params_dict:
-        param = params_dict[name]
-        with deepspeed.zero.GatheredParameters([param], modifier_rank=0):
-            if deepspeed.comm.get_rank() == 0:
-                param.data.copy_(tensor)
+params_dict = dict(model.named_parameters())
+params_to_load = {name: params_dict[name] for name in state_dict if name in params_dict}
+if params_to_load:
+    with deepspeed.zero.GatheredParameters(list(params_to_load.values()), modifier_rank=0):
+        if deepspeed.comm.get_rank() == 0:
+            for name, param in params_to_load.items():
+                param.data.copy_(state_dict[name])
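
For context, a hedged sketch of how the batched gather above might be used when resuming training; the helper name, checkpoint path, and call site are hypothetical, while deepspeed.zero.GatheredParameters and deepspeed.comm.get_rank are real DeepSpeed APIs.

import torch
import deepspeed

def load_state_dict_zero3(model, state_dict):
    # Gather every matching ZeRO-3 partitioned parameter once, then copy on rank 0.
    params_dict = dict(model.named_parameters())
    params_to_load = {n: params_dict[n] for n in state_dict if n in params_dict}
    if params_to_load:
        with deepspeed.zero.GatheredParameters(list(params_to_load.values()), modifier_rank=0):
            if deepspeed.comm.get_rank() == 0:
                for name, param in params_to_load.items():
                    param.data.copy_(state_dict[name])

# Hypothetical call site when resuming (path is illustrative):
# state_dict = torch.load('checkpoint-100/pytorch_model.bin', map_location='cpu')
# load_state_dict_zero3(model, state_dict)

A single GatheredParameters context un-partitions all listed parameters at once and re-partitions them on exit, so the per-parameter collectives of the original loop collapse into one round of communication, which is the performance point raised above.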

Jintao-Huang merged commit ecf00d4 into modelscope:main on Apr 28, 2026
2 of 3 checks passed

Labels: none yet
Projects: none yet
2 participants