[bugfix] sync template.padding_free with args after prepare_model for… #9031
Merged
Jintao-Huang merged 1 commit into modelscope:main on Apr 8, 2026
Conversation
Contributor
Code Review
This pull request introduces a synchronization step in the MegatronTrainer base class to ensure that template.padding_free is consistent with args.padding_free after the model is prepared. This change accounts for potential overrides of the padding configuration during model-specific initialization, such as for DSA attention models. I have no feedback to provide.
Collaborator
thanks!
Jintao-Huang approved these changes on Apr 8, 2026
Collaborator
please run:
Jintao-Huang pushed a commit that referenced this pull request on Apr 13, 2026
PR type
PR information
Problem
When running Megatron DPO/KTO training with DSA-based models (e.g. DeepSeek-V3.2),
training produces NaN loss on every step, regardless of
the --packing setting.

Root Cause
In the Megatron training path:
- MegatronArguments defines padding_free: bool = True as the default.
- The template object is created before model initialization, so template.padding_free = True.
- During BaseMegatronTrainer.prepare_model(), _check_padding_free() in swift/megatron/model/utils.py detects DSA attention and forces args.padding_free = False.
- However, template.padding_free is never updated; it remains True.

This mismatch causes:
- Data collation (template.padding_free=True): packs chosen+rejected into a single row → labels.shape = [1, N]
- _prepare_batch (args.padding_free=False): does not create packed_seq_params → packed_seq_params = None
- loss_func: computes num_samples = labels.shape[0] // 2 = 1 // 2 = 0 → empty chosen_logps → loss = NaN (a minimal sketch of this failure mode follows below)

This bug affects all DSA models by default (since Megatron defaults to padding_free=True). Non-DSA models (e.g. Qwen) are not affected because _check_padding_free does not override args.padding_free for them.
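A minimal, runnable sketch of the failure mode described above. The collate and dpo_loss functions, the tensor sizes, and the variable names are illustrative stand-ins, not the actual swift/megatron code; only the num_samples arithmetic mirrors the report.

```python
import torch
import torch.nn.functional as F

def collate(chosen_labels, rejected_labels, padding_free):
    if padding_free:
        # padding-free path: pack chosen + rejected into a single row -> shape [1, N]
        return torch.cat([chosen_labels, rejected_labels], dim=-1).unsqueeze(0)
    # padded path: one row per sequence -> shape [2, N]
    return torch.stack([chosen_labels, rejected_labels], dim=0)

def dpo_loss(labels, per_token_logps):
    # Mirrors the reported computation: the first half of the rows are "chosen".
    num_samples = labels.shape[0] // 2                     # 1 // 2 == 0 when everything is packed into one row
    chosen_logps = per_token_logps[:num_samples].sum(-1)   # empty tensor when num_samples == 0
    rejected_logps = per_token_logps[num_samples:2 * num_samples].sum(-1)
    return -F.logsigmoid(chosen_logps - rejected_logps).mean()  # mean over an empty tensor -> NaN

chosen = torch.zeros(8, dtype=torch.long)
rejected = torch.zeros(8, dtype=torch.long)

# template.padding_free is still True (stale) even though args.padding_free was forced to False:
labels = collate(chosen, rejected, padding_free=True)      # shape [1, 16] instead of [2, 8]
logps = torch.randn(labels.shape, dtype=torch.float)
print(dpo_loss(labels, logps))                             # tensor(nan)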
Fix

Sync template.padding_free to args.padding_free right after self.prepare_model() in BaseMegatronTrainer.__init__(), before the data collator is created.
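A rough sketch of where the sync lands, assuming the surrounding structure of BaseMegatronTrainer. The names prepare_model, template.padding_free, and args.padding_free come from the description above; the Args/Template dataclasses and the stubbed prepare_model body are simplified placeholders that only simulate the DSA override.

```python
from dataclasses import dataclass

@dataclass
class Args:
    padding_free: bool = True          # MegatronArguments default

@dataclass
class Template:
    padding_free: bool = True          # created before model init, so it starts as True

class BaseMegatronTrainer:
    def __init__(self, args: Args, template: Template):
        self.args = args
        self.template = template

        # prepare_model() may override args.padding_free, e.g. _check_padding_free()
        # forces it to False for DSA attention models.
        self.prepare_model()

        # Fix: re-sync the template with the (possibly overridden) argument
        # before the data collator is built from the template.
        self.template.padding_free = self.args.padding_free

    def prepare_model(self):
        # Stand-in for the real model preparation; simulates the DSA override.
        self.args.padding_free = False

trainer = BaseMegatronTrainer(Args(), Template())
assert trainer.template.padding_free == trainer.args.padding_free == False
```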
Scope

- padding_free (default: True)
- --padding_free true
- --packing true
- --padding_free false

Verification
Tested with DeepSeek-V3.2 Megatron DPO training:
- Before the fix, loss was NaN (confirmed chosen_logps.shape=[0] via logging).
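One plausible form of the check mentioned above; the exact logging statement used for the verification is not shown in the PR, so this helper and its expected "after the fix" shapes are assumptions for illustration only.

```python
import logging

logger = logging.getLogger(__name__)

def log_dpo_shapes(labels, chosen_logps):
    # Before the fix: labels.shape == (1, N) and chosen_logps.shape == (0,), i.e. no chosen samples.
    # After the fix (assumed): labels.shape == (2, N) and chosen_logps.shape == (1,).
    logger.info('labels.shape=%s chosen_logps.shape=%s',
                tuple(labels.shape), tuple(chosen_logps.shape))
```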