
[bugfix] fix ignore_data_skip#9220

Merged
Jintao-Huang merged 1 commit into modelscope:main from Jintao-Huang:fix_resume_from_checkpoint_0427
Apr 27, 2026

Conversation

@Jintao-Huang (Collaborator)

@gemini-code-assist Bot (Contributor) left a comment


Code Review

This pull request modifies the _get_resume_checkpoint method in swift/pipelines/train/sft.py to read configuration from trainer.args instead of self.args. The review notes that trainer.args (typically a TrainingArguments object) may not carry custom attributes such as use_flash_ckpt or callbacks, which can cause an AttributeError or a silent logic error. The recommendation is to use trainer.args only for standard fields and to keep self.args for custom pipeline-level flags.

Comment on lines +246 to 258

      args = trainer.args
      if args.resume_from_checkpoint:
          return args.resume_from_checkpoint
      resume_checkpoint = None
      # If flash checkpoint is enabled, try to resume from the last complete checkpoint.
      # If the previous training finished, resume_checkpoint stays None.
  -   if self.args.use_flash_ckpt:
  +   if args.use_flash_ckpt:
          # resume_checkpoint = <resume_dir>/checkpoint-<step>
          resume_checkpoint = trainer.get_resume_checkpoint()

      # Elastic runs require a universal checkpoint; fall back when missing or incomplete.
  -   callbacks = set(getattr(self.args, 'callbacks', []))
  +   callbacks = set(getattr(args, 'callbacks', []))
      elastic_enabled = 'deepspeed_elastic' in callbacks
Severity: high

Using trainer.args for custom attributes like use_flash_ckpt and callbacks is risky. trainer.args (which is self.args.training_args) is typically a sanitized TrainingArguments object that may not contain the extended attributes defined in SftArguments or BaseArguments.

  1. Potential AttributeError: If use_flash_ckpt is not defined in the TrainingArguments class used by the trainer, line 252 will raise an AttributeError.
  2. Logic Error: TrainingArguments has no callbacks attribute by default, so getattr(args, 'callbacks', []) on trainer.args will likely return an empty list. That silently disables the deepspeed_elastic check even when it was requested in the original configuration.

It is recommended to use trainer.args only for standard fields like resume_from_checkpoint and continue using self.args for custom pipeline-level flags.
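The two failure modes can be sketched with minimal stand-in classes (hypothetical, not the actual swift/transformers classes): a bare TrainingArguments that only carries standard fields, and an extended pipeline-args class that adds use_flash_ckpt and callbacks. Direct attribute access on the sanitized object raises AttributeError, while getattr with a default hides the problem and quietly disables the elastic check.

```python
# Hypothetical stand-ins for the review's scenario: trainer.args is a
# sanitized object with only standard fields; self.args is the extended
# pipeline-level configuration.

class TrainingArguments:
    """Standard trainer args: core fields only."""
    resume_from_checkpoint = None

class SftArguments(TrainingArguments):
    """Pipeline-level args extending the standard ones (hypothetical)."""
    use_flash_ckpt = True
    callbacks = ['deepspeed_elastic']

trainer_args = TrainingArguments()   # what trainer.args would hold
pipeline_args = SftArguments()       # what self.args would hold

# Failure mode 1: direct access to a custom flag on the sanitized object
# raises AttributeError.
try:
    _ = trainer_args.use_flash_ckpt
    flash_attr_present = True
except AttributeError:
    flash_attr_present = False

# Failure mode 2: getattr with a default masks the missing attribute, so
# the elastic check silently evaluates to False on trainer.args even
# though the pipeline-level config requested it.
elastic_via_trainer = 'deepspeed_elastic' in set(getattr(trainer_args, 'callbacks', []))
elastic_via_pipeline = 'deepspeed_elastic' in set(getattr(pipeline_args, 'callbacks', []))

print(flash_attr_present, elastic_via_trainer, elastic_via_pipeline)
# → False False True
```

This is why the review suggests splitting the reads: standard fields like resume_from_checkpoint from trainer.args, custom flags from self.args.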

Suggested change

      args = trainer.args
      if args.resume_from_checkpoint:
          return args.resume_from_checkpoint
      resume_checkpoint = None
      # If flash checkpoint is enabled, try to resume from the last complete checkpoint.
      # If the previous training finished, resume_checkpoint stays None.
  -   if args.use_flash_ckpt:
  +   if self.args.use_flash_ckpt:
          # resume_checkpoint = <resume_dir>/checkpoint-<step>
          resume_checkpoint = trainer.get_resume_checkpoint()
      # Elastic runs require a universal checkpoint; fall back when missing or incomplete.
  -   callbacks = set(getattr(args, 'callbacks', []))
  +   callbacks = set(getattr(self.args, 'callbacks', []))
      elastic_enabled = 'deepspeed_elastic' in callbacks

@Jintao-Huang Jintao-Huang merged commit 16e7376 into modelscope:main Apr 27, 2026
2 of 3 checks passed