
Add {% generation %} support to training chat templates #5470

Merged
qgallouedec merged 25 commits into main from generation-tags
Apr 9, 2026

Conversation

@qgallouedec
Member

@qgallouedec qgallouedec commented Apr 7, 2026

SFT with assistant_only_loss=True requires the chat template to include {% generation %} / {% endgeneration %} markers so that return_assistant_tokens_mask=True produces correct masks. Very few models ship these markers natively. Users currently hit a cryptic error when trying assistant_only_loss=True with e.g. Qwen3.
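For context, the markers wrap only the assistant turn's output inside the Jinja chat template. A minimal illustrative fragment (not the actual Qwen3 template; role tags are assumptions):

```jinja
{%- for message in messages -%}
{%- if message.role == 'assistant' -%}
<|im_start|>assistant
{% generation %}{{ message.content }}<|im_end|>{% endgeneration %}
{%- else -%}
<|im_start|>{{ message.role }}
{{ message.content }}<|im_end|>
{%- endif -%}
{%- endfor -%}
```

Everything inside {% generation %} / {% endgeneration %} is what return_assistant_tokens_mask=True marks as assistant tokens.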

Changes

This PR provides a base structure for future patched chat templates. The first one added is Qwen3. Here are the changes:

  • In qwen3_training.jinja: added {% generation %} / {% endgeneration %} around the assistant message output block.
  • get_training_chat_template now returns None only when the template is both prefix-preserving and already contains {% generation %} markers. (Previously it returned None for any prefix-preserving template.)
  • In the SFT trainer, when assistant_only_loss=True, automatically swap in the training chat template if the current one lacks {% generation %} markers, and pass self.chat_template through _tokenize to apply_chat_template.
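The selection rule in the second bullet can be sketched in plain Python. This is a hypothetical illustration, not the real TRL function: the signature, the QWEN3_TRAINING_TEMPLATE constant, and the error handling are all assumptions.

```python
# Stand-in for the contents of qwen3_training.jinja (assumption for illustration).
QWEN3_TRAINING_TEMPLATE = "...{% generation %}{{ message.content }}{% endgeneration %}..."


def get_training_chat_template(chat_template, prefix_preserving):
    """Return None only when the current template needs no patching."""
    template = chat_template or ""
    if prefix_preserving and "{% generation %}" in template:
        return None  # template is already safe for assistant-only loss
    # Otherwise swap in the patched training template
    # (the real function errors for unsupported templates).
    return QWEN3_TRAINING_TEMPLATE
```

Note that both conditions must hold for None to be returned; a prefix-preserving template without markers still gets patched.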

This PR requires #5459


Note

Medium Risk
Changes how chat templates are selected/patched for both SFT (assistant_only_loss) and GRPO/tooling flows; incorrect detection or patching could affect tokenization, masking, and therefore training loss/behavior for supported models.

Overview
Enables SFT assistant_only_loss=True to work out-of-the-box for supported models by ensuring the training chat template includes {% generation %} / {% endgeneration %} markers and passing the patched template through tokenization.

Updates get_training_chat_template to only return None when the current template is both prefix-preserving and generation-tagged; otherwise it returns the patched Qwen3 training template (or errors for unsupported templates). The Qwen3 training template is updated to wrap assistant output in generation tags, and tests/docs are extended to validate return_assistant_tokens_mask=True behavior and clarify the requirements.
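Conceptually, the mask from return_assistant_tokens_mask=True is used to restrict the loss to assistant tokens by setting every other label to the ignore index (-100 is the standard Hugging Face convention; the helper name below is illustrative):

```python
def mask_non_assistant_labels(input_ids, assistant_masks, ignore_index=-100):
    """Copy input_ids into labels, ignoring every token outside assistant spans."""
    return [tok if keep else ignore_index for tok, keep in zip(input_ids, assistant_masks)]
```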

Separately tightens GRPO/async GRPO initialization so the training template is only swapped in when tools are enabled and the original template is not prefix-preserving, avoiding unnecessary patching.
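The tightened GRPO condition reduces to a simple conjunction; a minimal sketch (predicate names are assumptions):

```python
def should_swap_template(tools_enabled, prefix_preserving):
    # Only patch the chat template when tools are enabled AND the
    # original template is not prefix-preserving.
    return tools_enabled and not prefix_preserving
```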

Reviewed by Cursor Bugbot for commit 31e640f. Bugbot is set up for automated code reviews on this repo.

class TestGetTrainingChatTemplate:
    def test_new_chat_template_is_prefix_preserving(self, tokenizer_name):
        tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
        assert is_chat_template_prefix_preserving(tokenizer) is False
Member Author


For future models, get_training_chat_template will be called with any chat template, not only non-prefix-preserving ones.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3c2fc8e38f



@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


Reviewed by Cursor Bugbot for commit 9d4d57f.


# When assistant_only_loss is enabled, swap in a training chat template with {% generation %} markers
# if the current template doesn't already have them.
if args.assistant_only_loss and "{% generation %}" not in processing_class.chat_template:


Potential crash when chat_template is None

Low Severity

The in check "{% generation %}" not in processing_class.chat_template will raise a TypeError if processing_class.chat_template is None. This can happen with tokenizers that have no chat template set. A guard like processing_class.chat_template and "{% generation %}" not in processing_class.chat_template would prevent the crash and provide a clearer path to the downstream ValueError from get_training_chat_template.
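A minimal sketch of the guard Bugbot suggests, factored into a predicate so the None case is explicit (the helper name is hypothetical):

```python
def needs_training_template(assistant_only_loss, chat_template):
    # Treat a missing template like one without markers,
    # instead of raising TypeError on `"..." not in None`.
    return bool(assistant_only_loss) and "{% generation %}" not in (chat_template or "")
```

With this guard, a tokenizer whose chat_template is None falls through to get_training_chat_template, which can then raise its clearer ValueError.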



@qgallouedec qgallouedec merged commit dd071d7 into main Apr 9, 2026
16 checks passed
@qgallouedec qgallouedec deleted the generation-tags branch April 9, 2026 02:46
