[bugfix] fix jinja-backend train↔infer parity for Qwen3.6 #9277
ArvinZhuang wants to merge 1 commit into
Conversation
Code Review
This pull request implements a mechanism to split ChatML-rendered text into alternating chunks, allowing for selective supervision of assistant turns when using the Jinja backend during training. A regression test for Qwen3.6 is included to ensure consistency between training and inference. Reviewers suggested setting load_model=False in the test to prevent OOM errors and adjusting the answer_len logic for encoder-decoder models to maintain correct data partitioning.
```python
    rendered text, so the labels for system / user / tool tokens were not
    masked — silently training on non-assistant tokens.
    """
    engine = TransformersEngine('Qwen/Qwen3.6-35B-A3B')
```
The model Qwen/Qwen3.6-35B-A3B is very large. Loading it during a template regression test is unnecessary and can lead to OOM errors or significantly slow down the CI environment. Since this test only requires the tokenizer and template logic, it is recommended to set load_model=False.
Suggested change:
```diff
-    engine = TransformersEngine('Qwen/Qwen3.6-35B-A3B')
+    engine = TransformersEngine('Qwen/Qwen3.6-35B-A3B', load_model=False)
```
```python
        if assistant_contexts is not None:
            return assistant_contexts, [1. if i % 2 == 1 else 0.
                                        for i in range(len(assistant_contexts))], answer_len
```
For encoder-decoder models, the answer_len is used to slice the res_context_list into prompt and answer parts (see lines 1450-1453). Previously, _jinja_encode returned a single context with answer_len=1, which meant the entire rendered text was treated as the answer (target). With the new splitting logic, len(assistant_contexts) is greater than 1, but answer_len remains 1. This will cause the prompt/answer split to be incorrect for encoder-decoder models that use ChatML templates (the prompt will contain all but the last chunk). To maintain the previous behavior for encoder-decoder models, answer_len should be adjusted to include all chunks in the decoder part.
Suggested change:
```diff
         if assistant_contexts is not None:
+            if self.is_encoder_decoder:
+                answer_len = len(assistant_contexts)
             return assistant_contexts, [1. if i % 2 == 1 else 0.
                                         for i in range(len(assistant_contexts))], answer_len
```
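For reference, a small illustration of the prompt/answer slicing this comment describes (names and values are hypothetical, not the exact code around `base.py:1450-1453`):

```python
# Hypothetical illustration of the encoder-decoder partitioning described above.
res_context_list = ['<non-assistant chunk>', '<assistant chunk>', '<trailing chunk>']
answer_len = 1  # previous behavior: a single context, so the whole render was the answer

prompt_part = res_context_list[:-answer_len]  # encoder input  -> first two chunks (wrong)
answer_part = res_context_list[-answer_len:]  # decoder target -> only the last chunk

# With the new splitting, setting answer_len = len(assistant_contexts) keeps every
# chunk in the decoder part, preserving the pre-change behavior.
```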
Please do not modify the base Template, as this will affect all models.
…lscope#9276)

Closes modelscope#9276.

## Problem

`Qwen/Qwen3.6-*-A3B` is bound to `TemplateType.qwen3_5` (no separate qwen3_6 template registration), and Qwen3.6's bundled `chat_template.jinja` differs from Qwen3.5's in two places:

- assistant-turn `<think>` reasoning preservation gating (Qwen3.6 honors a `preserve_thinking` kwarg)
- tool_call argument value rendering

Both also share `|trim` filters on user/assistant content that the `agent_template=qwen3_5` swift backend's prompt format does NOT apply. For agent-shaped data where the user prompt ends in `\n` (common for prompts read from a file), the swift backend emits an extra `\n` between the user content and `<|im_end|>` — so train↔inference input_ids drift by ~1 token per user turn.

The cleanest path is to make `template_backend='jinja'` actually usable for SFT — currently `_jinja_encode` returns `loss_scale=[1.]` wholesale, which means under jinja the trainer supervises *every* token (system / user / tool / role markers). That's why most users default to the swift backend even though they want the HF chat_template render.

## Fix (scoped to Qwen3_5Template — does NOT modify base Template)

Override `_jinja_encode` only in `Qwen3_5Template` (which Qwen3.5 + Qwen3.6 all use today). When training, split the rendered text along `<|im_start|>role\n` markers into alternating `[non_assist, assist, ..., trailing_non_assist]` chunks. The trainer's existing `_encode_context_list` then assigns `loss_scale=0` to non-assistant chunks and `loss_scale=1` to assistant chunks. Recombination is byte-exact, so train and inference render identically.

Inference behavior is unchanged (gated on `self.is_training`). The base `Template._jinja_encode` is untouched, so all other models are unaffected.

## Test

Adds `test_qwen3_6_jinja_train_infer_parity` in `tests/test_align/test_template/test_agent.py`. Uses agent-shape data: multi-turn `<think>` + tool_call + tool, user content ending in `\n`. Loads with `load_model=False` (only tokenizer + template needed). Asserts:

1. train input_ids byte-equal to inference apply_chat_template tokens
2. labels mask non-assistant tokens (n_supervised > 0, n_masked > 0)
3. user-prompt content does not leak into supervised labels

Without the fix, Qwen3.6-35B-A3B fails (1) on agent data with `\n`-ended user content, and fails (2) entirely (every token supervised under template_backend=jinja).
8d20e31 to 3c1c41a
@Jintao-Huang Thanks for the quick review — you're right, the base Template should stay untouched; the override is now scoped to `Qwen3_5Template` only.
Also addressed the gemini-code-assist suggestions (including `load_model=False` in the test).
Diff stat:
Thank you for your PR, the issue has been fixed.
Closes #9276.
Problem
`Qwen/Qwen3.6-*-A3B` is bound to `TemplateType.qwen3_5` (no separate `qwen3_6` template registration in `swift/model/models/qwen.py:1158-1161`), and Qwen3.6's bundled `chat_template.jinja` differs from Qwen3.5's in two places (assistant `<think>` preservation gating + tool_call args rendering). Both also apply `|trim` filters on message content that the `agent_template=qwen3_5` swift backend's `prompt=['<|im_start|>user\n{{QUERY}}<|im_end|>\n']` format string does NOT apply.

For agent data where the user prompt ends in `\n` (common when prompts are read from a file), the swift backend emits an extra `\n` between the user content and `<|im_end|>` — so train↔inference `input_ids` drift by ~1 token per user turn.
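To make the drift concrete, here is a hedged illustration (the query string is invented; the swift format string is the one quoted above, and Jinja's `|trim` behaves like Python's `str.strip()`):

```python
query = 'What is the weather in Paris?\n'  # user prompt read from a file, trailing '\n'

# swift backend: prompt = ['<|im_start|>user\n{{QUERY}}<|im_end|>\n'], no trimming
swift_render = '<|im_start|>user\n' + query + '<|im_end|>\n'
# '<|im_start|>user\nWhat is the weather in Paris?\n<|im_end|>\n'   <- extra '\n'

# jinja backend: Qwen3.6's chat_template applies |trim to the message content
jinja_render = '<|im_start|>user\n' + query.strip() + '<|im_end|>\n'
# '<|im_start|>user\nWhat is the weather in Paris?<|im_end|>\n'

assert swift_render != jinja_render  # roughly one extra token per user turn
```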
The byte-equality test added in PR #8161 (`test_qwen3_5`) only covers Qwen3.5-35B-A3B with the `function-calling-chatml` dataset (whose user content has no trailing whitespace), so it does not catch any of the above on Qwen3.6.

The cleanest path is to make `template_backend='jinja'` actually usable for SFT. Currently `_jinja_encode` returns `loss_scale=[1.]` wholesale (`base.py:1078`), so the trainer supervises every token rendered by `apply_chat_template` — system, user, tool, role markers, everything. That's why users default to the swift backend even though they want the HF chat_template render.

Fix
In `_jinja_encode` (training path), split the rendered text along `<|im_start|>role\n` markers into alternating `[non_assist_0, assist_0, non_assist_1, assist_1, ..., trailing_non_assist]` chunks. The existing `_encode_context_list` then assigns `loss_scale=0` to non-assistant chunks and `loss_scale=1` to assistant chunks. Recombination is byte-exact, so train and inference still render identically.

This works for any ChatML-format chat_template (Qwen, ChatGLM, etc.). For non-ChatML templates the helper returns `None` and the caller falls back to the legacy wholesale `loss_scale=[1.]` path (no behavior change for those models).

After this PR, `--template_backend jinja` becomes the recommended setting for Qwen3.6 SFT, fully eliminating the swift-vs-jinja byte drift documented in #9276.
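A minimal sketch of the splitting idea (the function name, regex, and the choice to keep role headers inside the non-assistant chunks are assumptions, not the exact code in this PR):

```python
import re
from typing import List, Optional


def _split_chatml_by_assistant(text: str) -> Optional[List[str]]:
    """Split ChatML-rendered text into alternating
    [non_assist_0, assist_0, ..., trailing_non_assist] chunks.

    Odd indices are assistant turns (loss_scale 1); even indices are everything
    else (loss_scale 0). ''.join(chunks) == text, so recombination is byte-exact.
    Returns None for non-ChatML templates so the caller can fall back to the
    legacy wholesale loss_scale=[1.] path.
    """
    if '<|im_start|>' not in text:
        return None
    # A supervised span starts right after '<|im_start|>assistant\n' and runs
    # through its closing '<|im_end|>' (kept supervised so EOS is learned).
    pattern = re.compile(r'<\|im_start\|>assistant\n(.*?<\|im_end\|>)', re.DOTALL)
    chunks: List[str] = []
    pos = 0
    for m in pattern.finditer(text):
        chunks.append(text[pos:m.start(1)])  # non-assistant text, incl. the role header
        chunks.append(m.group(1))            # assistant content (supervised)
        pos = m.end(1)
    chunks.append(text[pos:])                # trailing non-assistant text (may be '')
    return chunks


# The caller then builds the loss scale exactly as in the diff above:
# loss_scale = [1. if i % 2 == 1 else 0. for i in range(len(chunks))]
```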
Test

Adds `test_qwen3_6_jinja_train_infer_parity` in `tests/test_align/test_template/test_agent.py` using agent-shape data: multi-turn `<think>` reasoning + tool_call + tool response, user content ending in `\n` (which exercises Qwen3.6 chat_template's `|trim` filter). Asserts:
1. train `input_ids` byte-equal to inference `apply_chat_template(...)` tokens
2. labels mask non-assistant tokens (`n_supervised > 0`, `n_masked > 0`)
3. user-prompt content does not leak into supervised labels

Without the fix, Qwen3.6-35B-A3B fails (1) on agent data with `\n`-ended user content, and fails (2) entirely (every token supervised under `template_backend=jinja`).
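Roughly how the three assertions line up with the splitting sketch above (a hedged outline, not the actual test; `tokenizer`, `messages`, and `user_prompt` are placeholders, and the real test drives swift's Template encode path instead):

```python
# Hedged outline of the parity invariants; placeholders: tokenizer, messages, user_prompt.
rendered = tokenizer.apply_chat_template(messages, tokenize=False)  # inference-side render
chunks = _split_chatml_by_assistant(rendered)

# (1) byte-exact recombination -> identical input_ids at train and inference time
assert ''.join(chunks) == rendered

# (2) labels mask non-assistant tokens: some chunks supervised, some masked
loss_scale = [1. if i % 2 == 1 else 0. for i in range(len(chunks))]
assert any(s == 1. for s in loss_scale) and any(s == 0. for s in loss_scale)

# (3) user-prompt content never leaks into a supervised (odd-index) chunk
assert all(user_prompt not in chunks[i] for i in range(1, len(chunks), 2))
```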
Backward compatibility

- Inference behavior is unchanged (the new splitting is gated on `self.is_training`).
- Non-ChatML templates are unaffected (`'<|im_start|>' not in text` → return `None` → wholesale `[1.]` path).
- Encoder-decoder models are handled (`answer_len` is still emitted correctly).