Background
While using the gpt-oss model series, I discovered an inconsistency between the official OpenAI Harmony rendering logic and the chat_template.jinja provided in the Hugging Face repository.
According to the OpenAI Harmony specification (as referenced in Harmony Issue #58), structured outputs and tool calls require the content_type to be set to <|constrain|>json. However, the current Jinja template in the HF tokenizer_config.json does not correctly render this token, even when tools are provided to the template.
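For reference, the official openai-harmony renderer makes this content_type explicit. Below is a minimal sketch that renders a single tool-call turn the way the spec describes; the method names follow the openai-harmony README as I recall them (the decode_utf8 helper in particular is my assumption), so double-check against the harmony repo.

from openai_harmony import (
    HarmonyEncodingName, load_harmony_encoding,
    Conversation, Message, Role,
)

enc = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)

# One assistant tool-call turn with content_type set to "<|constrain|>json",
# mirroring the tool-call example from the Harmony docs
convo = Conversation.from_messages([
    Message.from_role_and_content(Role.ASSISTANT, '{"code": "print(2 + 3)"}')
    .with_channel("commentary")
    .with_recipient("functions.python_executor")
    .with_content_type("<|constrain|>json"),
])
tokens = enc.render_conversation_for_completion(convo, Role.ASSISTANT)
print(enc.decode_utf8(tokens))  # should contain "commentary <|constrain|>json"

The rendered text is what the HF chat template should reproduce token-for-token when it serializes conversation history.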
The Problem
- Inconsistent Inference: The model is trained to expect and produce <|constrain|>json for tool arguments. When the chat template omits this token during few-shot prompting or conversation-history reconstruction, it breaks the model's structural consistency and may degrade tool-calling performance (see the tokenization check below).
- Template Implementation Gap: The current Jinja implementation appears to hardcode json or omit the <|constrain|> modifier entirely, which is inconsistent with the Harmony framework's requirements for proper tool-call formatting.
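This matters at the token level: <|constrain|> is a dedicated special token in the gpt-oss vocabulary, so the two renderings are genuinely different token sequences rather than a cosmetic difference. A quick check (exact token splits may vary):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")
plain = tok.encode("<|channel|>commentary json", add_special_tokens=False)
constrained = tok.encode("<|channel|>commentary <|constrain|>json", add_special_tokens=False)
# The second sequence should contain the single <|constrain|> special-token id;
# the first spells " json" out of ordinary text tokens instead.
print(plain)
print(constrained)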
Steps to Reproduce
Run the following script with openai/gpt-oss-20b (or gpt-oss-120b):
from transformers import AutoTokenizer

# Using the official GPT-OSS tokenizer
tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b", use_fast=False)

obj = {
    "tools": [
        {
            "function": {
                "name": "python_executor",
                "description": "Executes Python code and returns the output or error message.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "code": {"type": "string", "description": "Python code to execute"}
                    },
                    "required": ["code"]
                }
            }
        }
    ],
    "messages": [
        {"role": "system", "content": "You are Qwen, a helpful AI assistant that can execute Python code using available tools."},
        {"role": "user", "content": "Please run this code: print(2 + 3)"},
        {
            "role": "assistant",
            "tool_calls": [
                {
                    "function": {
                        "name": "python_executor",
                        "arguments": {"code": "print(2 + 3)"}
                    }
                }
            ]
        },
        {"role": "tool", "content": "{\"stdout\": \"5\\n\", \"stderr\": \"\"}"}
    ]
}

text = tokenizer.apply_chat_template(
    obj["messages"],
    tokenize=False,
    add_generation_prompt=True,
    tools=obj["tools"]
)
print(text)

Actual vs Expected Output
Actual Output (Current Template)
...<|start|>assistant to=functions.python_executor<|channel|>commentary json<|message|>{"code": "print(2 + 3)"}<|call|>...
Expected Output (Per Harmony Spec)
...<|start|>assistant to=functions.python_executor<|channel|>commentary <|constrain|>json<|message|>{"code": "print(2 + 3)"}<|call|>...
Key Difference
Actual: <|channel|>commentary json
Expected: <|channel|>commentary <|constrain|>json
The <|constrain|> special token is missing between commentary and json.
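Appending the following to the repro script makes the check programmatic (the string literals assume the exact renderings shown above):

# text comes from apply_chat_template in the repro script
expected = "<|channel|>commentary <|constrain|>json<|message|>"
buggy = "<|channel|>commentary json<|message|>"
if expected in text:
    print("OK: tool-call turn carries <|constrain|>")
elif buggy in text:
    print("BUG: tool-call turn rendered without <|constrain|>")
else:
    print("Neither pattern found; the template may have changed")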
Impact
This discrepancy means:
- Tool-calling examples in conversation history are not properly formatted
- Few-shot prompting with tools may not work as intended
- The model may not correctly recognize tool call formatting during inference
- Behavior diverges from the official OpenAI Harmony implementation
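Until the template is fixed, one possible stopgap (an untested sketch, not a proper fix) is to patch the rendered prompt string before tokenization; the real fix belongs in chat_template.jinja:

# Rewrite history turns to match the Harmony spec; brittle by design,
# since it assumes the exact substring the current template emits
patched = text.replace(
    "<|channel|>commentary json<|message|>",
    "<|channel|>commentary <|constrain|>json<|message|>",
)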