
Conversation

@levunet levunet commented Sep 16, 2025

I found the main encoding issues; the fixes are as follows:

  1. 'to' vs ' to' encoding issue
    There was an issue where the recipient header was sometimes encoded with the 'to' token (id 935) instead of the ' to' token (id 316). The model appears to have been trained to see id 316 when calling tools, so when encoding produces 935 instead, there is a high probability that it will generate abnormal tokens. To resolve this, I modified the rendering to consistently encode ' to'.

  2. '<|constrain|>' encoding issue
    It appears that this token was not used during model training; instead, tool execution requests seem to have been trained with the ' json' token (leading space included), similar to ' to'. I therefore removed '<|constrain|>' and made the leading space mandatory. This fixes the issue where including '<|constrain|>' makes the model's function output extremely unstable and produces abnormal tokens once tool executions are repeated several times.

Main token investigation details (token id):
220 = ' '
12606 = 'comment'
815 = 'ary'
316 = ' to' (presumed to be a token trained specifically for tool requests)
935 = 'to'
28 = '='
6961 = '??'
4108 = 'json'
5701 = ' json' (presumed to be a token trained specifically for tool requests)
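
To make the difference concrete, here is a minimal sketch of the kind of check I used (it assumes tiktoken's o200k_base encoding, which harmony's o200k_harmony tokenizer extends; the tool name is a placeholder, and the ids above are from my own investigation, the script just prints whatever your tokenizer produces):

import tiktoken

# o200k_base is the base vocabulary that harmony's o200k_harmony extends.
enc = tiktoken.get_encoding("o200k_base")

# Compare how the recipient header encodes with and without the leading space.
for text in ["to=functions.get_weather", " to=functions.get_weather"]:
    ids = enc.encode(text)
    pieces = [enc.decode([i]) for i in ids]
    print(repr(text), "->", list(zip(ids, pieces)))

# When the leading space is dropped before encoding, the header starts with the
# standalone 'to' token instead of ' to', which is the mismatch described above.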

openai/gpt-oss-20b


For those using vLLM: by applying the two additional PRs below and using the model configuration values from the test code, tool execution works with almost 100% reliability.

vllm-project/vllm#24954
vllm-project/vllm#24768

andresC98 commented Sep 16, 2025

I noticed this issue as well; in vLLM I get this error, which I believe is related:

openai.InternalServerError: Error code: 500 - {'error': {'message': 'Unexpected token 12606 while expecting start token 200006', 'type': 'Internal Server Error', 'param': None, 'code': 500}}

(related gh issue/comment: vllm-project/vllm#22515 (comment))

UPDATE: I was able to fix this by fetching the latest model files from HF, since this commit fixed the generation_config.json file. My saved files were from before that commit.

levunet commented Sep 17, 2025

@dkundel-openai
Hello! I've submitted a PR addressing some critical encoding issues in the harmony library that were causing incorrect token generation. If you have time, I would greatly appreciate your review and feedback. Thank you!

levunet commented Sep 23, 2025

@andresC98
I have just checked the updated content. I double-checked as well, but that commit did not resolve the issue for me. The problem I'm experiencing occurs with heavy tool usage in the gpt-oss model: encoding errors accumulate and cause token mistakes after to=functions.name. That is why this PR, which fixes the issue, is really necessary for me...

levunet commented Sep 23, 2025

gpt-oss_test.py
messages.txt

With the changes in this PR applied, tool requests are never missing, even after multiple test runs. Without them, responses sometimes come back without tool requests when the test is executed repeatedly. (Photo attached)

(This test code was written quickly to reproduce the issue I encountered, so some parts may look odd.)

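For reference, the snippet below is only a rough sketch of the kind of repeated check gpt-oss_test.py performs; the endpoint, model name, and tool schema here are placeholders, not the actual test configuration.

from openai import OpenAI

# Placeholder endpoint / model / tool; the attached gpt-oss_test.py is the real test.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

missing = 0
runs = 20
for _ in range(runs):
    resp = client.chat.completions.create(
        model="openai/gpt-oss-20b",
        messages=[{"role": "user", "content": "What's the weather in Seoul?"}],
        tools=tools,
    )
    # Count runs where the model answered without issuing a tool call.
    if not resp.choices[0].message.tool_calls:
        missing += 1

print(f"runs without a tool call: {missing}/{runs}")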

sa411022 commented Sep 23, 2025

I encountered an issue where tool-call parameters were sometimes incorrectly generated inside "reasoning_text" instead of "function_call".
After applying this PR, the issue seems to be solved.
@levunet Thanks for your good work.

I can also confirm that updating generation_config.json did not help with this issue.

levunet commented Sep 30, 2025

Hi team! @dkundel-openai @scott-oai

Just a gentle ping on this PR. I understand you might be busy, but I'd appreciate any feedback when you get a chance.

This fix addresses some encoding issues that could affect tool functionality stability. Happy to make any adjustments if needed!

@amal5haji

Merge this pls

borishim commented Oct 10, 2025

@levunet Although you experimentally found that removing <|constrain|> tends to stabilize tool calling, your assumption that the <|constrain|> token was not used during training of the model sounds too strong for me to believe. Can you elaborate on this? Do you have any other observations to support your argument?

borishim commented Oct 10, 2025

@levunet Perhaps a smaller diff like this may work, what do you think? Let me test this as well...

index 6a9305b..d04aad7 100644
--- a/src/encoding.rs
+++ b/src/encoding.rs
@@ -823,7 +823,8 @@ impl Render<Message> for HarmonyEncoding {
         // next render the header recipient, if there is one
         if let Some(recipient) = &message.recipient {
             if recipient != "all" {
-                self.render_text_into(format!(" to={recipient}"), into)?;
+                self.render_text_into(" to", into)?;
+                self.render_text_into(format!("={recipient}"), into)?;
             }
         }
 
@@ -844,7 +845,7 @@ impl Render<Message> for HarmonyEncoding {
                     self.render_text_into(" ", into)?;
                     self.render_formatting_token_into(FormattingToken::ConstrainedFormat, into)?;
                     if !rest.is_empty() {
-                        self.render_text_into(rest, into)?;
+                        self.render_text_into(format!(" {rest}"), into)?;
                     }
                 } else {
                     self.render_text_into(format!(" {content_type}"), into)?;

levunet commented Oct 10, 2025

Since I wrote this up based on my experimental results, my opinion may have come across as stronger than intended. To explain my earlier argument in more detail: I ran hundreds of tests involving multiple tool calls across dozens of tools, and during that process I experimented with the <|constrain|> token while testing various tokens for stabilization.

The key peculiarity was that when this token was used, there was a very high probability of the model outputting ' json', which is not used otherwise; in the opposite case, when the <|constrain|> token was not used, the probability of the model outputting '<|constrain|>' was very low. In my experiments it was never output at all, though I assumed the probability was simply very low rather than zero.

Additionally, normal responses were generated only when the ' to' and ' json' tokens (with leading spaces) were used; when the 'to' and 'json' tokens without spaces slipped in due to some mistake, I confirmed that model errors accumulated and the response structure broke down. Based on these results, I concluded that the model tends to reproduce the trained data structure as-is, which led me to think that the <|constrain|> token was likely not used during training.
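
If you want to check this yourself, here is a minimal sketch along the lines of the harmony README's tool-call example (the tool name and arguments are placeholders, the builder methods are used as I understand them from the README, and the token ids are the ones from my investigation above):

from openai_harmony import (
    Conversation,
    HarmonyEncodingName,
    Message,
    Role,
    load_harmony_encoding,
)

enc = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)

# Render the same assistant tool-call message with and without the
# '<|constrain|>' marker in the content type, then inspect the header tokens.
for content_type in ("json", "<|constrain|>json"):
    convo = Conversation.from_messages([
        Message.from_role_and_content(Role.ASSISTANT, '{"city": "Seoul"}')
        .with_channel("commentary")
        .with_recipient("functions.get_weather")
        .with_content_type(content_type),
    ])
    tokens = enc.render_conversation_for_completion(convo, Role.ASSISTANT)
    # 316 = ' to', 935 = 'to', 5701 = ' json', 4108 = 'json' (ids observed above).
    print(content_type, "->", {tid: tid in tokens for tid in (316, 935, 5701, 4108)})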

borishim commented Oct 11, 2025

I'm reporting that this smaller patch was NOT enough to make vLLM-hosted gpt-oss-120b work with codex. So it looks like your original patch is necessary!
