common : gracefully handle incomplete output by aldehir · Pull Request #20191 · ggml-org/llama.cpp

aldehir · 2026-03-07T10:12:28Z

When generation stops at the max context or at max_tokens, the current parse function fails if pieces remain to be parsed and is_partial = false. This PR loosens that requirement and accepts partially parsed output even at EOG. The behavior matches streaming, except that no subsequent parse attempt follows for the next token.

#19869 (comment)
supersedes #19992
fixes #20193
fixes #20229

@trshimizu @akreal please test to see if this fixes the issues you are seeing.

re: invalid UTF-8 characters. I haven't seen this happen, but if you have a good reproducer then I'll be glad to test.

akreal · 2026-03-07T11:01:40Z

Thanks! It does fix the case when there is an incomplete UTF-8 character at the end:

tst.test("Hello, world!\nWhat's up?\xE2\x9C").expect(message_assist).run();

However, it still fails when an incomplete UTF-8 is in the middle:

tst.test("Hello, \xE2\x9Cworld!\nWhat's up?\xE2\x9C").expect(message_assist).run();

Would it be possible to handle the mid-string case too?

I know it is very unlikely for a model to generate such a sequence, since it should never appear in the training data. Unfortunately, it does sometimes happen with Ministral-3 models.
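To make the difference between the two test inputs concrete, here is a minimal sketch of a tail-only check. The helper name `complete_utf8_prefix_len` is hypothetical and is not part of llama.cpp; it only illustrates why trimming a truncated sequence at the end of the input does nothing for stray bytes in the middle.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of llama.cpp: length of the longest prefix of
// `s` that does not end in a truncated UTF-8 sequence. Only the tail is
// inspected, so malformed bytes earlier in the string pass through untouched.
static size_t complete_utf8_prefix_len(const std::string & s) {
    const size_t n = s.size();
    // A sequence is at most 4 bytes: walk back over up to 3 continuation
    // bytes looking for the lead byte.
    for (size_t back = 1; back <= 4 && back <= n; ++back) {
        const unsigned char c = static_cast<unsigned char>(s[n - back]);
        if ((c & 0xC0) == 0x80) {
            continue; // 10xxxxxx: continuation byte, keep looking
        }
        size_t expected;
        if      ((c & 0x80) == 0x00) expected = 1; // ASCII
        else if ((c & 0xE0) == 0xC0) expected = 2;
        else if ((c & 0xF0) == 0xE0) expected = 3;
        else if ((c & 0xF8) == 0xF0) expected = 4;
        else return n; // not a valid lead byte; leave it to the caller
        // Truncated only if the lead byte promises more bytes than are present.
        return expected > back ? n - back : n;
    }
    return n;
}

// Usage sketch: the trailing "\xE2\x9C" is cut off, the mid-string one is not.
static void demo_complete_utf8_prefix_len() {
    const std::string tail = "What's up?\xE2\x9C";
    assert(complete_utf8_prefix_len(tail) == tail.size() - 2);
    const std::string mid = "Hello, \xE2\x9Cworld!";
    assert(complete_utf8_prefix_len(mid) == mid.size());
}
```

With the second test string above, such a check would drop only the final two bytes; the `\xE2\x9C` before `world!` would still reach the parser.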

trshimizu · 2026-03-07T12:52:27Z

Thank you for the fix. However, a quick check on my side didn't eliminate the error.

Even if rest() now stops before the incomplete UTF-8 bytes and returns success, the subsequent end() assertion will still fail because there are unconsumed bytes remaining in the input. This causes the same runtime_error to be thrown and the same 500 error to occur.

The root problem seems to be that there is no error recovery at the common_chat_parse() level for PEG formats. When is_partial=false and the final token cuts off mid-UTF-8 character (which happens when max_tokens forces an early stop), the parse fails with no fallback.
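The failure mode described here can be sketched with a toy model. Everything below (`toy_rest`, `toy_parse`, `toy_chat_parse`, the status names) is hypothetical and does not mirror the actual PEG parser in llama.cpp; it only shows how a successful rest() followed by a strict end-of-input check still turns leftover bytes into an exception, and how a lenient mode could hand back the parsed prefix instead.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

enum class toy_status { success, need_more_input, fail };

struct toy_result {
    toy_status status;
    size_t     end; // number of bytes consumed
};

// Toy rest(): consume whole UTF-8 characters, stop at the first malformed or
// truncated sequence.
static size_t toy_rest(const std::string & in) {
    size_t pos = 0;
    while (pos < in.size()) {
        const unsigned char c = static_cast<unsigned char>(in[pos]);
        const size_t len = c < 0x80           ? 1
                         : (c & 0xE0) == 0xC0 ? 2
                         : (c & 0xF0) == 0xE0 ? 3
                         : (c & 0xF8) == 0xF0 ? 4
                         : 0;
        bool ok = len != 0 && pos + len <= in.size();
        for (size_t i = 1; ok && i < len; ++i) {
            ok = (static_cast<unsigned char>(in[pos + i]) & 0xC0) == 0x80;
        }
        if (!ok) {
            break;
        }
        pos += len;
    }
    return pos;
}

// Toy rest() followed by end(): strict mode fails on leftover bytes, lenient
// mode reports that more input would have been needed.
static toy_result toy_parse(const std::string & in, bool lenient) {
    const size_t pos = toy_rest(in);
    assert(pos <= in.size());
    if (pos == in.size()) {
        return { toy_status::success, pos };
    }
    return { lenient ? toy_status::need_more_input : toy_status::fail, pos };
}

// Toy caller: a hard failure becomes an exception, anything else yields the
// content parsed so far.
static std::string toy_chat_parse(const std::string & in, bool lenient) {
    const toy_result r = toy_parse(in, lenient);
    if (r.status == toy_status::fail) {
        throw std::runtime_error("toy parse failed at pos " + std::to_string(r.end));
    }
    return in.substr(0, r.end);
}
```

In this toy, `toy_chat_parse("Hi\xE2\x9C", false)` throws, while the same call with `lenient = true` returns `"Hi"`.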

aldehir · 2026-03-07T18:00:39Z

> Even if rest() now stops before the incomplete UTF-8 bytes and returns success, the subsequent end() assertion will still fail because there are unconsumed bytes remaining in the input. This causes the same runtime_error to be thrown and the same 500 error to occur.

Model?

Is it happening during reasoning or when given the final response?


aldehir · 2026-03-07T19:10:23Z

@trshimizu please try the latest commit.

@akreal the scope of this PR has moved beyond UTF-8 alone, so I will address the invalid UTF-8 codepoints in a later PR.

aldehir · 2026-03-07T20:58:59Z

Need to evaluate this in the context of parsing the model ini, which should have stricter conditions.

pwilkin · 2026-03-07T21:46:44Z

This now supersedes #20204 right?

aldehir · 2026-03-07T21:59:13Z

> This now supersedes #20204 right?

Yes.

trshimizu · 2026-03-08T00:09:23Z

> Model?

Qwen3.5-397B-A17B in the non-reasoning mode

> Is it happening during reasoning or when given the final response?

Happening in the final response.

> @trshimizu please try the latest commit.

The error has disappeared on the latest commit!

aldehir · 2026-03-08T07:15:12Z

@pwilkin I renamed partial to lenient and refactored it into flags. It carries the same semantics as partial, but we always enable it when parsing model output so that we capture a partial AST.

The rename distinguishes the flag from is_partial: lenient conveys parser configuration, whereas is_partial describes input state.
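The split between parser configuration and input state could look roughly like the sketch below. All identifiers (`toy_parse_flags`, `toy_parse_context`, the two factory functions) are made up for illustration; the real flag and context names in llama.cpp may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag set: parser configuration, independent of the input.
enum toy_parse_flags : uint32_t {
    TOY_PARSE_FLAG_NONE    = 0,
    TOY_PARSE_FLAG_LENIENT = 1u << 0, // tolerate input that ends early
};

struct toy_parse_context {
    uint32_t flags = TOY_PARSE_FLAG_NONE;

    bool is_lenient() const { return (flags & TOY_PARSE_FLAG_LENIENT) != 0; }
};

// Model output: lenient is always on, whether or not more tokens may follow.
// is_partial describes the input state and does not influence the flag.
static toy_parse_context make_context_for_model_output(bool is_partial) {
    (void) is_partial;
    toy_parse_context ctx;
    ctx.flags |= TOY_PARSE_FLAG_LENIENT;
    return ctx;
}

// Other inputs that should be held to stricter conditions keep the default.
static toy_parse_context make_strict_context() {
    return toy_parse_context{};
}

// Usage sketch.
static void demo_parse_flags() {
    assert(make_context_for_model_output(false).is_lenient());
    assert(!make_strict_context().is_lenient());
}
```

Keeping the flag in a bitmask leaves room for further parser options without changing function signatures, which is one plausible reason for the refactor into flags.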

common : handle incomplete UTF-8 at end of input in PEG parser

7562f38

aldehir requested a review from pwilkin March 7, 2026 10:13

cont : if reached end prematurely, emit needs_more_input to propagate partial output

a10559c

aldehir requested a review from ggerganov as a code owner March 7, 2026 19:09

aldehir changed the title ~~common : handle incomplete UTF-8 at end of input in PEG parser~~ common : gracefully handle incomplete output Mar 7, 2026

aldehir mentioned this pull request Mar 7, 2026

Misc. bug: llama-server responds with error code 500 and "Failed to parse input at pos ..." message when max_tokens is reached #20193

Closed

pwilkin approved these changes Mar 7, 2026


github-actions bot added the testing Everything test related label Mar 7, 2026

cont: refactor peg parse context to add lenient flag

8e311ee

aldehir requested a review from pwilkin March 8, 2026 07:11

cont : remove partial flag, keep lenient flag

66cdd03

pwilkin approved these changes Mar 8, 2026


pwilkin merged commit 451ef08 into ggml-org:master Mar 8, 2026
76 of 78 checks passed

pwilkin merged 4 commits into ggml-org:master from aldehir:fix-utf8-peg-parser
