Add {% generation %} support to training chat templates #5470
qgallouedec merged 25 commits into main from …sages + async grpo
Conversation
class TestGetTrainingChatTemplate:
    def test_new_chat_template_is_prefix_preserving(self, tokenizer_name):
        tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
        assert is_chat_template_prefix_preserving(tokenizer) is False
For future models, `get_training_chat_template` will be called with any chat template, not only non-prefix-preserving ones.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit 9d4d57f.
# When assistant_only_loss is enabled, swap in a training chat template with {% generation %} markers
# if the current template doesn't already have them.
if args.assistant_only_loss and "{% generation %}" not in processing_class.chat_template:
Potential crash when chat_template is None (Low Severity)

The check `"{% generation %}" not in processing_class.chat_template` will raise a TypeError if `processing_class.chat_template` is None, which can happen with tokenizers that have no chat template set. A guard like `processing_class.chat_template and "{% generation %}" not in processing_class.chat_template` would prevent the crash and provide a clearer path to the downstream ValueError from `get_training_chat_template`.
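A minimal sketch of the suggested guard (the helper name `needs_training_template` is hypothetical, introduced only to illustrate the condition; the snippet above inlines it in the trainer):

```python
def needs_training_template(chat_template, assistant_only_loss):
    """Return True when a {% generation %}-tagged template should be swapped in.

    Checking that `chat_template` is truthy avoids the TypeError that
    `"..." not in None` would raise for tokenizers with no template set.
    """
    return bool(
        assistant_only_loss
        and chat_template
        and "{% generation %}" not in chat_template
    )

# Tokenizer with no template set: no crash, no swap here; downstream code
# can then raise a clearer ValueError instead.
print(needs_training_template(None, True))               # False
# Template without generation markers: swap in the training template.
print(needs_training_template("{{ messages }}", True))   # True
```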


SFT with `assistant_only_loss=True` requires the chat template to include `{% generation %}`/`{% endgeneration %}` markers so that `return_assistant_tokens_mask=True` produces correct masks. Very few models ship these markers natively. Users currently hit a cryptic error when trying `assistant_only_loss=True` with e.g. Qwen3.

Changes

This PR aims to provide a base structure for future patched chat templates. The first one to be added is Qwen3. Here are the changes:
- `qwen3_training.jinja`: added `{% generation %}`/`{% endgeneration %}` around the assistant message output block.
- `get_training_chat_template` now returns `None` only when the template is both prefix-preserving and already contains `{% generation %}` markers. (Previously it returned `None` for any prefix-preserving template.)
- With `assistant_only_loss=True`, the trainer automatically swaps in the training chat template if the current one lacks `{% generation %}` markers, and passes `self.chat_template` through `_tokenize` → `apply_chat_template`.

This PR requires #5459
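To illustrate why the markers matter: `return_assistant_tokens_mask=True` masks exactly the spans the template wraps in `{% generation %}`/`{% endgeneration %}`. The following stand-alone sketch (a hand-rolled character-level illustration, not the Transformers implementation) builds a mask from marker positions in an already-rendered string:

```python
import re

def generation_mask(rendered_with_markers):
    """Strip {% generation %} markers from a rendered chat string and return
    (text, mask) where mask[i] == 1 iff text[i] came from a generation span."""
    text_parts, mask_parts = [], []
    inside = False
    # Split on the markers while keeping them, so we know span boundaries.
    for piece in re.split(r"(\{% generation %\}|\{% endgeneration %\})",
                          rendered_with_markers):
        if piece == "{% generation %}":
            inside = True
        elif piece == "{% endgeneration %}":
            inside = False
        else:
            text_parts.append(piece)
            mask_parts.extend([1 if inside else 0] * len(piece))
    return "".join(text_parts), mask_parts

text, mask = generation_mask(
    "<|user|>Hi<|assistant|>{% generation %}Hello!{% endgeneration %}"
)
print(text)       # <|user|>Hi<|assistant|>Hello!
print(mask[-6:])  # [1, 1, 1, 1, 1, 1]  (only the "Hello!" span is masked in)
```

With `assistant_only_loss=True`, the loss is then computed only over positions where the mask is 1, which is why a template without these markers cannot produce a usable mask.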
Note

Medium Risk: Changes how chat templates are selected/patched for both SFT (`assistant_only_loss`) and GRPO/tooling flows; incorrect detection or patching could affect tokenization, masking, and therefore training loss/behavior for supported models.

Overview

Enables SFT `assistant_only_loss=True` to work out of the box for supported models by ensuring the training chat template includes `{% generation %}`/`{% endgeneration %}` markers and passing the patched template through tokenization.

Updates `get_training_chat_template` to only return `None` when the current template is both prefix-preserving and generation-tagged; otherwise it returns the patched Qwen3 training template (or errors for unsupported templates). The Qwen3 training template is updated to wrap assistant output in generation tags, and tests/docs are extended to validate `return_assistant_tokens_mask=True` behavior and clarify the requirements.

Separately tightens GRPO/async GRPO initialization so the training template is only swapped in when tools are enabled and the original template is not prefix-preserving, avoiding unnecessary patching.
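The selection rule described above can be sketched as follows (the function signature and helper names are assumptions for illustration, not TRL's exact API):

```python
def select_training_template(chat_template, prefix_preserving, patched_template):
    """Return None to keep the current template, or a patched template to swap in.

    Rule from the PR: keep the current template only when it is BOTH
    prefix-preserving AND already generation-tagged; otherwise return the
    patched training template, or raise for unsupported models.
    """
    if prefix_preserving and "{% generation %}" in (chat_template or ""):
        return None  # current template is already safe for assistant-only loss
    if patched_template is None:
        raise ValueError("No training chat template available for this model")
    return patched_template
```

Under this rule a prefix-preserving but untagged template (the previous `None` case) now gets the patched template, which is the behavioral change the PR makes.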
Reviewed by Cursor Bugbot for commit 31e640f.