[Bug] fix render - tool formatting #76
Conversation
I noticed this issue as well; in vLLM I get an error that I believe is related (related gh issue/comment: vllm-project/vllm#22515 (comment)).

UPDATE: I was able to fix this by fetching the latest model files from HF, since this commit fixed the generation_config.json file. My saved files were from before that commit.
@dkundel-openai |
@andresC98 |
I encountered the issue that tool call parameters were sometimes incorrectly generated inside "reasoning_text" instead of "function_call". I can also confirm that updating generation_config.json did not help with this issue.
Hi team! @dkundel-openai @scott-oai Just a gentle ping on this PR. I understand you might be busy, but I'd appreciate any feedback when you get a chance. This fix addresses some encoding issues that affect the stability of tool calling. Happy to make any adjustments if needed!
Merge this pls
@levunet Although you experimentally found that removing <|constrain|> tends to stabilize tool calling, your assumption that 'the <|constrain|> token does not appear to have been used during the training of the model' sounds too strong to me. Can you elaborate on this? Do you have any other observations to support your argument?
@levunet Perhaps a smaller diff like this may work, what do you think? Let me test this as well...

```diff
index 6a9305b..d04aad7 100644
--- a/src/encoding.rs
+++ b/src/encoding.rs
@@ -823,7 +823,8 @@ impl Render<Message> for HarmonyEncoding {
         // next render the header recipient, if there is one
         if let Some(recipient) = &message.recipient {
             if recipient != "all" {
-                self.render_text_into(format!(" to={recipient}"), into)?;
+                self.render_text_into(" to", into)?;
+                self.render_text_into(format!("={recipient}"), into)?;
             }
         }
@@ -844,7 +845,7 @@
             self.render_text_into(" ", into)?;
             self.render_formatting_token_into(FormattingToken::ConstrainedFormat, into)?;
             if !rest.is_empty() {
-                self.render_text_into(rest, into)?;
+                self.render_text_into(format!(" {rest}"), into)?;
             }
         } else {
             self.render_text_into(format!(" {content_type}"), into)?;
```
Since I organized this based on my experimental results, I think my opinion may have come across as unintentionally strong. To explain my earlier argument in more detail: I conducted hundreds of tests using multiple tool calls with dozens of tools, and during that process I experimented with the <|constrain|> token while trying various token combinations for stabilization.

The key peculiarity was this: when using the <|constrain|> token, there was a very high probability of the model outputting ' json', which it does not use otherwise. In the opposite case, when not using the <|constrain|> token, the probability of the model outputting '<|constrain|>' was very low; in my experiments it was never output at all, though I assume the probability is merely very low rather than zero.

Additionally, normal responses were generated only when the ' to' and ' json' tokens (with leading spaces) were used. When the unspaced 'to' and 'json' tokens were used by mistake, I confirmed that model errors accumulated and the response structure broke down. Based on these results, I concluded that the model tends to reproduce the structure it learned during training as-is, which led me to think that the <|constrain|> token was likely not used in training.
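To make the spacing claim easy to reproduce, here is a minimal sketch of a token-id check. It assumes tiktoken-rs's `o200k_base` vocabulary shares these base tokens with gpt-oss's o200k_harmony tokenizer; the expected ids come from my investigation above, so verify against your own tokenizer files:

```rust
// Minimal sketch: check that spaced and unspaced variants map to
// different single token ids (expected per the investigation:
// ' to' -> 316, 'to' -> 935, ' json' -> 5701, 'json' -> 4108).
// Assumption: tiktoken-rs's o200k_base shares these base-token ids
// with gpt-oss's o200k_harmony tokenizer.
use tiktoken_rs::o200k_base;

fn main() {
    let bpe = o200k_base().expect("failed to load o200k vocabulary");
    for text in [" to", "to", " json", "json"] {
        // encode_ordinary ignores special tokens, which is fine here
        let ids = bpe.encode_ordinary(text);
        println!("{text:?} -> {ids:?}");
    }
}
```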
I'm reporting that this smaller patch was NOT enough to make vllm-hosted gpt-oss-120b work with codex. So it looks like your original patch is necessary!
I found the main encoding issues, and the fixes are as follows:

1. 'to' and ' to' encoding issue

There was an issue where the 'to' token (id 935) was sometimes encoded instead of the ' to' token (id 316). During training the model appears to have been conditioned to use id 316 when using tools, so when encoding produces 935 instead, there is a high probability the model will generate abnormal tokens. To resolve this, I modified the renderer to consistently encode ' to'.

2. '<|constrain|>' encoding issue

It appears that this token was not used during model training; instead, tool execution requests were trained with the ' json' token (with a leading space), similar to the ' to' token. I therefore removed '<|constrain|>' and made the leading spaces mandatory. This solves the issue where including '<|constrain|>' makes the model's function output extremely unstable and generates abnormal tokens as tool executions are repeated.
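To illustrate the invariant these two fixes enforce without reading the full diff, here is a simplified, hypothetical helper (this is not the actual `HarmonyEncoding` code; `render_tool_header` is an illustrative name): the leading space must stay attached to "to" and "json" so the tokenizer emits ids 316 and 5701 rather than 935 and 4108.

```rust
/// Hypothetical sketch of the header-rendering invariant after the fix:
/// keep the space glued to "to" and "json" so they tokenize as the
/// single tokens ' to' (316) and ' json' (5701).
fn render_tool_header(recipient: Option<&str>, content_type: Option<&str>) -> String {
    let mut header = String::new();
    if let Some(recipient) = recipient {
        if recipient != "all" {
            // render " to={recipient}" in one piece: the space belongs to "to"
            header.push_str(&format!(" to={recipient}"));
        }
    }
    if let Some(content_type) = content_type {
        // render " json" (with the space) instead of "<|constrain|>json"
        header.push_str(&format!(" {content_type}"));
    }
    header
}

fn main() {
    assert_eq!(
        render_tool_header(Some("functions.get_weather"), Some("json")),
        " to=functions.get_weather json"
    );
    println!("ok");
}
```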
Main token investigation details (token id):
- 220 = ' '
- 12606 = 'comment'
- 815 = 'ary'
- 316 = ' to' (presumed to be a token trained specifically for tool requests)
- 935 = 'to'
- 28 = '='
- 6961 = '??'
- 4108 = 'json'
- 5701 = ' json' (presumed to be a token trained specifically for tool requests)
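The ids in this list can be spot-checked by decoding them directly. A minimal sketch, again assuming tiktoken-rs's `o200k_base` shares the base vocabulary with gpt-oss's tokenizer (note that the token-id integer type differs across tiktoken-rs versions):

```rust
// Decode each investigated id back to its string form to verify the
// list above. Assumption: o200k_base covers these non-special ids.
use tiktoken_rs::o200k_base;

fn main() {
    let bpe = o200k_base().expect("failed to load o200k vocabulary");
    for id in [220u32, 316, 935, 28, 4108, 5701] {
        let text = bpe.decode(vec![id]).expect("decode failed");
        println!("{id} -> {text:?}");
    }
}
```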
openai/gpt-oss-20b
For those using vLLM: by applying the two additional PRs below and using the model configuration values from the test code, you can use the tool execution functionality with almost 100% reliability.
vllm-project/vllm#24954
vllm-project/vllm#24768