feat(gpt-oss): Add {% generation %} markers for training chat template #5484

Merged
qgallouedec merged 6 commits into huggingface:main from casinca:gpt-oss-gen-support-chat-template on Apr 10, 2026

@casinca (Contributor) commented Apr 9, 2026

What does this PR do?

This PR adds {% generation %} markers to the GPT-OSS chat template. It is part of #5471.

  • gptoss_training.jinja is a copy of the existing gptoss.jinja with the assistant output wrapped in {%- generation %} / {%- endgeneration %}, following the same approach as Qwen3.
Diff: gptoss.jinja vs gptoss_training.jinja
@@ -1,4 +1,9 @@
 {#-
+  Training variant of the GPT-OSS chat template (see gptoss.jinja for the original).
+  Modifications vs the original:
+    - Added {% generation %} / {% endgeneration %} around assistant message output to support
+      assistant-only loss masking in SFT training.
+
   In addition to the normal inputs of `messages` and `tools`, this template also accepts the
   following kwargs:
   - "builtin_tools": A list, can contain "browser" and/or "python".        
@@ -270,6 +275,7 @@
                 {{- raise_exception("You have passed a message containing <|channel|> tags in the thinking field. Instead of doing this, you should pass analysis messages (the string between '<|message|>' and '<|end|>') in the 'thinking' field, and final messages (the string between '<|message|>' and '<|end|>') in the 'content' field.") }}
             {%- endif %}
         {%- endif %}
+        {%- generation %}
         {%- if "tool_calls" in message %}
             {#- We need very careful handling here - we want to drop the tool call analysis message if the model #}
             {#- has output a later <|final|> message, but otherwise we want to retain it. This is the only case #}
@@ -314,6 +320,7 @@
             {{- "<|start|>assistant<|channel|>final<|message|>" + message.content + "<|end|>" }}
             {%- set last_tool_call.name = none %}
         {%- endif %}
+        {%- endgeneration %}
     {%- elif message.role == 'tool' -%}
         {%- if last_tool_call.name is none %}
             {{- raise_exception("Message has tool role, but there was no previous assistant message with a tool call!") }}

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

AI writing disclosure

We welcome the use of AI tools to help with contributions. For transparency and to help us improve our review process, please indicate the level of AI involvement in this PR.

  • No AI usage: the PR was written entirely by a human.
  • AI-assisted: some parts were suggested or improved by AI, but the PR was written and reviewed by a human.
  • AI-generated: the PR was mostly or fully generated by an AI tool.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

👋 @qgallouedec I was waiting for your #5470 to be merged, to get a clean template for this PR.


Note

Medium Risk
Adds a new GPT-OSS training chat template and updates template-selection logic, which can change rendered prompts and assistant token masks during SFT/GRPO when the patch is applied. Risk is limited to GPT-OSS/Qwen3 identity-matched templates but could affect training correctness if the template diverges from the original.

Overview
Adds GPT-OSS support to get_training_chat_template by introducing a new gptoss_training.jinja template and returning it when the tokenizer’s template matches gptoss.jinja.
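The identity-based selection described above can be sketched in plain Python. This is only an illustration under stated assumptions, not TRL's implementation: `select_training_template`, `TRAINING_TEMPLATES`, and the placeholder template strings are hypothetical names and values, and returning `None` for an unmatched template is an assumption, since the PR does not say what the real function does in that case.

```python
# Hypothetical placeholder strings standing in for the contents of
# gptoss.jinja and gptoss_training.jinja (the real files are much longer).
ORIGINAL_GPTOSS = "{{- '<|start|>assistant' + message.content + '<|end|>' }}"
TRAINING_GPTOSS = "{%- generation %}" + ORIGINAL_GPTOSS + "{%- endgeneration %}"

# Map each known original template to its training variant.
TRAINING_TEMPLATES = {ORIGINAL_GPTOSS: TRAINING_GPTOSS}


def select_training_template(chat_template):
    """Return the training variant only on an exact match with a known original.

    Returning None for unknown templates is an assumption of this sketch.
    """
    return TRAINING_TEMPLATES.get(chat_template)


matched = select_training_template(ORIGINAL_GPTOSS)
unmatched = select_training_template(ORIGINAL_GPTOSS + " ")
```

Exact matching means a tokenizer whose template was edited, even by one character, is left untouched, which is consistent with the risk note limiting the change to identity-matched templates.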

The new GPT-OSS training template wraps assistant rendering with {% generation %} / {% endgeneration %} to enable correct return_assistant_tokens_mask=True behavior for assistant-only loss. Documentation is updated to describe the new training template, and the existing TestGetTrainingChatTemplate parametrization is extended to cover GPT-OSS.
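As a rough illustration of why the assistant token mask matters for assistant-only loss, the sketch below turns a 0/1 mask into training labels. It assumes the common convention of marking ignored positions with -100 (the default `ignore_index` of PyTorch's cross-entropy loss). The helper name `labels_from_assistant_mask` and the token IDs are made up for the example and are not taken from TRL.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def labels_from_assistant_mask(input_ids, assistant_mask):
    """Keep the token ID where the mask is 1, otherwise use IGNORE_INDEX."""
    if len(input_ids) != len(assistant_mask):
        raise ValueError("input_ids and assistant_mask must have the same length")
    return [
        token if flag == 1 else IGNORE_INDEX
        for token, flag in zip(input_ids, assistant_mask)
    ]


# Made-up token IDs: three prompt tokens followed by four assistant tokens.
input_ids = [11, 12, 13, 21, 22, 23, 24]
assistant_mask = [0, 0, 0, 1, 1, 1, 1]
labels = labels_from_assistant_mask(input_ids, assistant_mask)
```

Only the positions rendered inside a generation block get a mask value of 1, so only those tokens contribute to the loss.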

Reviewed by Cursor Bugbot for commit ece2009. Bugbot is set up for automated code reviews on this repo.

@casinca changed the title from "feat(gpt-oss): Add {% generation %} chat template" to "feat(gpt-oss): Add {% generation %} markers for training chat template" Apr 9, 2026
@casinca marked this pull request as ready for review April 9, 2026 13:23
@casinca (Contributor, Author) commented Apr 9, 2026

Note

On {% generation %} placement (see the diff in the initial PR description):

The Qwen3 training template is cleaner: the role prefix (<|im_start|>assistant\n) is easily placed outside the {% generation %} block.
In GPT-OSS, the <|start|>assistant prefix is included inside the block for simplicity, because the template uses varying prefixes across branches (<|channel|>analysis, <|channel|>final...). Rigorously matching Qwen3 1:1 would require refactoring each branch.

I preferred to keep things simple at first, but I'll refactor if necessary.
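To make the placement difference concrete, here is a toy sketch that does not use Jinja at all. The `[GEN]` / `[/GEN]` sentinels, the `split_generation` helper, and both example strings are hypothetical simplifications: they mimic where a generation block would start and end in an already rendered conversation, not the actual Qwen3 or GPT-OSS templates.

```python
GEN_OPEN, GEN_CLOSE = "[GEN]", "[/GEN]"  # stand-ins for generation/endgeneration


def split_generation(marked):
    """Return (text without sentinels, list of texts that were inside a block)."""
    plain, spans, pos = [], [], 0
    while True:
        start = marked.find(GEN_OPEN, pos)
        if start == -1:
            plain.append(marked[pos:])  # trailing text outside any block
            break
        end = marked.find(GEN_CLOSE, start)
        if end == -1:
            raise ValueError("unclosed generation block")
        inner = marked[start + len(GEN_OPEN):end]
        plain.append(marked[pos:start])  # text before the block
        plain.append(inner)              # block content is still rendered
        spans.append(inner)              # ...and is also what the loss sees
        pos = end + len(GEN_CLOSE)
    return "".join(plain), spans


# Qwen3-like placement: the role prefix sits outside the block.
qwen_text, qwen_spans = split_generation(
    "<|im_start|>assistant\n[GEN]Hello<|im_end|>[/GEN]"
)
# GPT-OSS-like placement: the role and channel prefix sit inside the block.
gptoss_text, gptoss_spans = split_generation(
    "[GEN]<|start|>assistant<|channel|>final<|message|>Hello<|end|>[/GEN]"
)
```

In the second case the prefix text ends up inside the span that is trained on, which is the trade-off accepted here in exchange for not refactoring every branch of the template.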

@qgallouedec (Member)

lgtm!

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Swish!


@qgallouedec (Member)

> Note
>
> On {% generation %} placement (see the diff in the initial PR description):
>
> The Qwen3 training template is cleaner, the role (<|im_start|>assistant\n) is easily placed outside the {% generation %} block. In GPT-OSS, the <|start|>assistant prefix is included inside the block, for simplicity, because it uses varying prefixes across branches (<|channel|>analysis, <|channel|>final...). Rigorously matching 1:1 Qwen3 would require refactoring each branch.
> I preferred to keep things simple in the first place but if necessary I'll refactor.

Yes, I just checked the template. We would need a good amount of refactoring, and I don't think this is something we want. You'll get one or two unwanted tokens in the loss; I think that should be fine.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec merged commit 6f6440b into huggingface:main Apr 10, 2026
10 of 12 checks passed
@casinca deleted the gpt-oss-gen-support-chat-template branch April 10, 2026 09:37
3 participants