Skip to content

common : gracefully handle incomplete output#20191

Merged
pwilkin merged 4 commits intoggml-org:masterfrom
aldehir:fix-utf8-peg-parser
Mar 8, 2026
Merged

common : gracefully handle incomplete output#20191
pwilkin merged 4 commits intoggml-org:masterfrom
aldehir:fix-utf8-peg-parser

Conversation

@aldehir
Copy link
Collaborator

@aldehir aldehir commented Mar 7, 2026

When reaching the max context, or max_tokens, the current parse function will fail if there are still pieces to parse when is_partial = false. This PR loosens the requirements and allows partially parsed output even at EOG. This behavior is the same as when streaming, except there is no subsequent parse attempt for the next token.

#19869 (comment)
supercedes #19992
fixes #20193
fixes #20229

@trshimizu @akreal please test to see if this fixes the issues you are seeing.

re: invalid UTF-8 characters. I haven't seen this happen, but if you have a good reproducer then I'll be glad to test.

@aldehir aldehir requested a review from pwilkin March 7, 2026 10:13
@akreal
Copy link

akreal commented Mar 7, 2026

Thanks! It does fix the case when there is an incomplete UTF-8 character at the end:

tst.test("Hello, world!\nWhat's up?\xE2\x9C").expect(message_assist).run();

However, it still fails when an incomplete UTF-8 is in the middle:

tst.test("Hello, \xE2\x9Cworld!\nWhat's up?\xE2\x9C").expect(message_assist).run();

Would it be possible to handle the mid-string case too?

I know it is very unlikely for a model to generate such sequence, since it should never appear in the training data. However, it does happen with Ministral-3 models sometimes, unfortunately.

@trshimizu
Copy link

Thank you for the fix. However, a quick check on my side didn't eliminate the error.

Even if rest() now stops before the incomplete UTF-8 bytes and returns success, the subsequent end() assertion will still fail because there are unconsumed bytes remaining in the input. This causes the same runtime_error to be thrown and the same 500 error to occur.

The root problem seems to be that there is no error recovery at the common_chat_parse() level for PEG formats. When is_partial=false and the final token cuts off mid-UTF-8 character (which happens when max_tokens forces an early stop), the parse fails with no fallback.

@aldehir
Copy link
Collaborator Author

aldehir commented Mar 7, 2026

Even if rest() now stops before the incomplete UTF-8 bytes and returns success, the subsequent end() assertion will still fail because there are unconsumed bytes remaining in the input. This causes the same runtime_error to be thrown and the same 500 error to occur.

Model?

Is it happening during reasoning or when given the final response?

@aldehir aldehir requested a review from ggerganov as a code owner March 7, 2026 19:09
@aldehir
Copy link
Collaborator Author

aldehir commented Mar 7, 2026

@trshimizu please try the latest commit.

@akreal the scope of this PR has deviated away from just UTF-8, so I will address the invalid UTF-8 codepoints in a later PR.

@aldehir aldehir changed the title common : handle incomplete UTF-8 at end of input in PEG parser common : gracefully handle incomplete output Mar 7, 2026
@aldehir
Copy link
Collaborator Author

aldehir commented Mar 7, 2026

Need to evaluate this in the context of parsing the model ini, which should have stricter conditions.

@pwilkin
Copy link
Collaborator

pwilkin commented Mar 7, 2026

This now supersedes #20204 right?

@github-actions github-actions bot added the testing Everything test related label Mar 7, 2026
@aldehir
Copy link
Collaborator Author

aldehir commented Mar 7, 2026

This now supersedes #20204 right?

Yes.

@trshimizu
Copy link

trshimizu commented Mar 8, 2026

Model?

Qwen3.5-397B-A17B in the non-reasoning mode

Is it happening during reasoning or when given the final response?

Happening in the final response.

@trshimizu please try the latest commit.

The error has disappeared on the latest commit!

@aldehir aldehir requested a review from pwilkin March 8, 2026 07:11
@aldehir
Copy link
Collaborator Author

aldehir commented Mar 8, 2026

@pwilkin I renamed partial to lenient and refactored into flags. It carries the same semantics as partial, but we always enable it when parsing model output to capture a partial AST.

The name change is to differentiate between is_partial and convey parser configuration vs. input state.

@pwilkin pwilkin merged commit 451ef08 into ggml-org:master Mar 8, 2026
76 of 78 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

testing Everything test related

Projects

None yet

4 participants