common : gracefully handle incomplete output#20191
Conversation
|
Thanks! It does fix the case when there is an incomplete UTF-8 character at the end: tst.test("Hello, world!\nWhat's up?\xE2\x9C").expect(message_assist).run();However, it still fails when an incomplete UTF-8 is in the middle: tst.test("Hello, \xE2\x9Cworld!\nWhat's up?\xE2\x9C").expect(message_assist).run();Would it be possible to handle the mid-string case too? I know it is very unlikely for a model to generate such sequence, since it should never appear in the training data. However, it does happen with Ministral-3 models sometimes, unfortunately. |
|
Thank you for the fix. However, a quick check on my side didn't eliminate the error. Even if rest() now stops before the incomplete UTF-8 bytes and returns success, the subsequent end() assertion will still fail because there are unconsumed bytes remaining in the input. This causes the same runtime_error to be thrown and the same 500 error to occur. The root problem seems to be that there is no error recovery at the common_chat_parse() level for PEG formats. When is_partial=false and the final token cuts off mid-UTF-8 character (which happens when max_tokens forces an early stop), the parse fails with no fallback. |
Model? Is it happening during reasoning or when given the final response? |
|
@trshimizu please try the latest commit. @akreal the scope of this PR has deviated away from just UTF-8, so I will address the invalid UTF-8 codepoints in a later PR. |
|
Need to evaluate this in the context of parsing the model ini, which should have stricter conditions. |
|
This now supersedes #20204 right? |
Yes. |
Qwen3.5-397B-A17B in the non-reasoning mode
Happening in the final response.
The error has disappeared on the latest commit! |
|
@pwilkin I renamed partial to lenient and refactored into flags. It carries the same semantics as partial, but we always enable it when parsing model output to capture a partial AST. The name change is to differentiate between |
When reaching the max context, or
max_tokens, the current parse function will fail if there are still pieces to parse whenis_partial = false. This PR loosens the requirements and allows partially parsed output even at EOG. This behavior is the same as when streaming, except there is no subsequent parse attempt for the next token.#19869 (comment)
supercedes #19992
fixes #20193
fixes #20229
@trshimizu @akreal please test to see if this fixes the issues you are seeing.
re: invalid UTF-8 characters. I haven't seen this happen, but if you have a good reproducer then I'll be glad to test.