
Conversation

@aldehir commented Nov 10, 2025 (Collaborator)

Putting this out there as a proof-of-concept and to gather feedback. It is still a WIP.

cc @pwilkin

Problem

Each model currently requires a custom parser to handle reasoning and tool calls. XML-based models are especially tricky to parse. For example, Qwen3-Coder outputs:

<tool_call>
<function={name}>
<parameter={arg-name}>{arg_value as json or string}</parameter>
...
</function>
</tool_call>

The main issue is the typed arguments. A raw string looks the same as JSON until you try to parse it. One workaround is to treat it as a string only if JSON parsing fails. A better approach is to let the argument types drive the parser directly.
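To make the ambiguity concrete, consider a hypothetical set_flag tool whose note argument is typed as a string:

<tool_call>
<function=set_flag>
<parameter=note>true</parameter>
</function>
</tool_call>

A fallback heuristic happily parses the value as the JSON boolean true, even though the schema says it is the literal string "true". Only the declared argument type can resolve this, which is why the types should drive the parser.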

Proposal

I propose using parser combinators to simplify parsing. We can compose parsers suitable for PEG grammars, which should make parsing model output much easier. With this approach, we can generate specialized parsers on the fly that match the argument type specifications. This PR implements a proof-of-concept.

Here's an example of what that currently looks like:

auto parser = build_peg_parser([](common_chat_peg_parser_builder & p) {
    auto thinking = p.add_rule("raw-reasoning",
        "<think>" << p.add_rule("reasoning-content", p.until("</think>")) << "</think>");

    auto content = p.add_rule("content", p.until("<tool_call>"));

    auto arg_name = p.add_rule("arg-start", "<parameter=" + p.capture("arg-name", p.chars("[a-zA-Z0-9_]")) + ">");
    auto arg_end = p.add_rule("arg-end", "</parameter>" + p.peek(p.literal("<parameter=") | "</function>"));

    auto string_arg_content = p.add_rule("arg-string-content",
        p.until_one_of({"</parameter><parameter=", "</parameter></function>"}));

    auto string_arg = p.add_rule("arg-string", arg_name + string_arg_content + arg_end);

    auto json = p.json();

    auto json_arg = p.add_rule("arg-json", arg_name + p.add_rule("arg-json-content", json) + arg_end);

    auto function = p.add_rule("function",
            p.add_rule("function-start", "<function=" + p.capture("tool-name", p.chars("[a-zA-Z0-9_]")) + ">")
            // NOTE: This will accept JSON if it can be parsed, otherwise fall back to a raw string.
            // In practice, the rule should be derived from the argument types of the tool.
            + p.one_or_more(json_arg | string_arg)
            + "</function>");

    auto tool_call = p.trigger(p.add_rule("tool-call",
        "<tool_call>" + p.one_or_more(function) + "</tool_call>"));

    return thinking + p.optional(p.space() + content) + p.zero_or_more(p.space() + tool_call);
});

// GBNF grammar rules derived from the PEG parser.
// 
// NOTE: GBNF describes a context-free grammar (CFG). PEG and CFG are similar, but they do
// have some differences that make them incompatible. We address this by
// converting a PEG grammar to a CFG equivalent when possible (e.g. until() is
// converted to an exclusion pattern).
//
// Ordered vs. unordered choice is irrelevant since the underlying sampling
// effectively forces ordered choice. This is my understanding of how it works,
// but feel free to fact check me on this.

auto grammar = build_grammar([&](const common_grammar_builder & builder) {
    parser.build_grammar(builder);
});

auto lazy_grammar = build_grammar([&](const common_grammar_builder & builder) {
    parser.build_grammar(builder, true);
});

// SAX-style parsing allows us to define the logic independent of parser construction.
auto handler = [&](const common_chat_parse_event & ev, common_chat_parse_semantics & semantics) {
    // Most of these can be built into a common handler and reused across models.
    if (ev.rule == "reasoning-content" && ev.ending()) {
        semantics.reasoning_content = ev.text;
    }

    if (ev.rule == "content" && ev.ending()) {
        semantics.content = ev.text;
    }

    if (ev.rule == "function-start" && ev.ending() && ev.success()) {
        semantics.tool_calls.emplace_back();
        auto & tc = semantics.tool_calls.back();
        tc.name = semantics.captures["tool-name"];
    }

    if (ev.rule == "arg-start" && ev.ending() && ev.success()) {
        auto & tc = semantics.tool_calls.back();
        auto name = semantics.captures["arg-name"];
        if (tc.arguments.empty()) {
            tc.arguments += "{";
        } else {
            tc.arguments += ", ";
        }
        tc.arguments += "\"" + name + "\": ";
    }

    if (ev.rule == "arg-string-content" && ev.ending() && ev.success()) {
        auto & tc = semantics.tool_calls.back();
        tc.arguments += "\"" + std::string(ev.text);
    }

    if (ev.rule == "arg-string" && ev.ending() && ev.success()) {
        auto & tc = semantics.tool_calls.back();
        tc.arguments += "\"";
    }

    if (ev.rule == "arg-json-content" && ev.ending() && (ev.success() || ev.need_more_input())) {
        auto & tc = semantics.tool_calls.back();
        tc.arguments += std::string(ev.text);
    }
};

common_chat_parse_semantics semantics;
common_chat_parse_context ctx(in, &semantics, handler, /* is_input_complete = */ true);

// The parser could be reused instead of being constructed every time,
// although there does not appear to be an easy way to do this right now.
auto result = parser.parse(ctx);

std::cout << "Reasoning: " << semantics.reasoning_content << "\n";
std::cout << "Content:   " << semantics.content << "\n";
if (!semantics.tool_calls.empty()) {
    std::cout << "Tool Calls:\n";
    for (const auto & tc : semantics.tool_calls) {
        std::cout << "  ID  : " << tc.id << "\n";
        std::cout << "  Name: " << tc.name << "\n";
        std::cout << "  Args: " << tc.arguments << "\n\n";
    }
}

The generated parse tree can be used to produce a GBNF grammar. The plan is to build the parser during chat param initialization and derive grammar rules with support for lazy triggers. This should support both tool_choice = auto and tool_choice = required.

array ::= "[" space ( value ("," space value)* )? "]" space
boolean ::= ("true" | "false") space
char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
content ::= ([^<] | "<" [^t] | "<t" [^o] | "<to" [^o] | "<too" [^l] | "<tool" [^_] | "<tool_" [^c] | "<tool_c" [^a] | "<tool_ca" [^l] | "<tool_cal" [^l] | "<tool_call" [^>])*
decimal-part ::= [0-9]{1,16}
get-weather ::= object
integral-part ::= [0] | [1-9] [0-9]{0,15}
null ::= "null" space
number ::= ("-"? integral-part) ("." decimal-part)? ([eE] [-+]? integral-part)? space
object ::= "{" space ( string ":" space value ("," space string ":" space value)* )? "}" space
reasoning ::= "<think>" space ([^<] | "<" [^/] | "</" [^t] | "</t" [^h] | "</th" [^i] | "</thi" [^n] | "</thin" [^k] | "</think" [^>])* space "</think>"
root ::= reasoning space content? space tool-call?
space ::= | " " | "\n"{1,2} [ \t]{0,20}
string ::= "\"" char* "\"" space
tool-call ::= "<tool_call>" space tool-call-name space tool-call-args space "</tool_call>"
tool-call-args ::= "<args>" space get-weather space "</args>"
tool-call-name ::= "<name>" space ([^<] | "<" [^/] | "</" [^n] | "</n" [^a] | "</na" [^m] | "</nam" [^e] | "</name" [^>])* space "</name>"
value ::= object | array | string | number | boolean | null

Specifics

NOTE: This is still a WIP. I am iterating over the parsers and seeing what works well.

This PR implements parser combinators for PEG grammars. It uses caching to implement packrat parsing. The following are implemented:
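To illustrate the packrat idea, here is a minimal sketch of the memoization it implies (illustrative only, not the PR's actual data structures):

#include <cstddef>
#include <map>
#include <utility>

// Packrat parsing memoizes the outcome of trying rule R at input offset i,
// so backtracking never re-runs the same (rule, offset) pair.
struct memo_entry {
    bool        success;
    std::size_t end; // offset just past the match, when success is true
};
std::map<std::pair<int, std::size_t>, memo_entry> memo; // key: (rule id, offset)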

Basic Parsers

  • literal(string) - Matches an exact literal string. S -> "hello"
  • any() - Matches any single character. S -> .
  • one(classes) - Matches a single character from a character class or range. S -> [a-z] or S -> [^0-9]
  • chars(classes, min, max) - Matches between min and max repetitions of characters from a character class. S -> [a-z]{m,n}. Use -1 for max to represent unbounded repetition {m,}
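For instance, written inside the build_peg_parser lambda shown earlier, a sketch using only these basic parsers:

auto ident = p.chars("[a-zA-Z0-9_]", 1, -1);          // [a-zA-Z0-9_]{1,}
auto digit = p.one("[0-9]");                          // a single digit
auto item  = p.literal("<item>") + ident + "</item>"; // exact tags around an identifier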

Operators

Parsers can be combined using operator overloading for convenient syntax:

  • ~p - Negative lookahead, equivalent to negate(p). S -> !A
  • p1 + p2 - Sequence, matches p1 followed by p2, equivalent to sequence({p1, p2}). S -> A B
  • p1 | p2 - Choice, matches p1 or p2, equivalent to choice({p1, p2}). S -> A | B
  • p1 << p2 - Sequence with whitespace in between, equivalent to sequence({p1, space(), p2}). S -> A [ \t\n]* B

Operators also work with string literals on the left side:

  • "literal" + p - Sequence starting with a literal string
  • "literal" | p - Choice with a literal string as first alternative
  • "literal" << p - Literal followed by whitespace then parser

Combinators

  • sequence(parsers) - Matches a sequence of parsers in order, all must succeed. S -> A B C
  • choice(parsers) - Matches the first parser that succeeds from a list of alternatives. S -> A | B | C
  • one_or_more(p) - Matches one or more repetitions of a parser. S -> A+
  • zero_or_more(p) - Matches zero or more repetitions of a parser, always succeeds. S -> A*
  • optional(p) - Matches zero or one occurrence of a parser, always succeeds. S -> A?
  • repeat(p, min, max) - Matches between min and max repetitions of a parser (inclusive). S -> A{m,n}. Use -1 for max to represent unbounded repetition {m,}
  • repeat(p, n) - Matches exactly n repetitions of a parser. S -> A{n}
  • negate(p) - Negative lookahead: succeeds if child parser fails, consumes no input. S -> !A
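For example, a comma-separated argument list built from these combinators (a sketch):

auto ident  = p.chars("[a-zA-Z0-9_]", 1, -1);
auto list   = ident + p.zero_or_more("," + ident); // ident ("," ident)*
auto args   = p.optional(list);                    // the whole list may be absent
auto triple = p.repeat(p.one("[0-9]"), 3);         // exactly three digits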

Utility Parsers

  • space() - Matches zero or more whitespace characters (space, tab, newline). S -> [ \t\n]*
  • until(delimiter, consume_spaces) - Matches all characters until a delimiter is found (delimiter not consumed). S -> (!delim .)*
  • rule(name) - References a named rule for recursive or reusable grammar definitions. expr -> term | expr "+" term
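For example, rule() allows self-reference, as in this nested-parentheses sketch:

auto parens = p.add_rule("parens",
    "(" + p.optional(p.rule("parens")) + ")"); // matches (), (()), ((())), ...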

JSON Parsers

  • json() - Creates a complete JSON parser supporting objects, arrays, strings, numbers, booleans, and null. value -> object | array | string | number | true | false | null
  • json_string() - Specialized single-pass JSON string parser with escape sequence handling
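A sketch of a typed argument whose value must be JSON:

auto data_arg = "<parameter=data>"
              + p.add_rule("data-json", p.json())
              + "</parameter>";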

GBNF Integration

  • schema(p, name, schema) - Wraps a parser with JSON schema metadata for grammar generation. Used internally to convert JSON schemas to GBNF grammar rules.

  • trigger(p) - Mark the parser as the start of a trigger rule.

Rule Management

  • add_rule(name, p) - Adds a named rule to the grammar for reuse and recursion

The operators +, |, and ~ construct sequence, choice, and negate parsers respectively. The << operator includes a space rule between parsers.

Drawbacks

  • Parsers that match content while excluding certain patterns, such as end tags, have a less obvious syntax. For example, p.zero_or_more(~(p.space() + p.literal("</think>")) + p.any()) consumes characters one at a time as long as the remaining input does not start with (optional whitespace followed by) </think>. The p.until("</think>") parser is intended to simplify this.

  • Packrat parsing requires caching all intermediate parse results, which introduces memory overhead proportional to input size and grammar complexity

  • Each model still requires a custom parser, though they share a common framework that simplifies implementation

  • Parser combinators may offer less flexibility for handling malformed model output compared to hand-written parsers, though constrained decoding should prevent malformed tool calls

To do

  • Basic implementation
  • Support parsing of partial input for streaming
  • Implement a JSON parser using parser combinators to replace the current healing system
  • Implement append_content() and append_reasoning() semantic actions to populate content/reasoning fields. Removed in favor of SAX parsing
  • Implement add_tool_call(), capture_tool_call_name(), capture_tool_call_args() semantic actions to handle tool calls. Removed in favor of SAX parsing
  • Construct a GBNF grammar from the final parser
  • Construct a lazy GBNF grammar from the final parser
  • Implement json-schema-to-grammar support. The JSON parser will parse any JSON, but the generated GBNF grammar should still be constructed from the user-provided schema.
  • Implement SAX-style parsing
  • Add streaming unicode support
  • Expose parser generation in chat params and parse functions
  • Re-evaluate implementation and reduce as much as possible.

@pwilkin commented Nov 10, 2025 (Collaborator)

Yes! This is exactly what I was thinking about :) can you give me push rights to your repo so I can contribute without doing PRs to PRs?

@aldehir commented Nov 10, 2025 (Collaborator Author)

Yes! This is exactly what I was thinking about :) can you give me push rights to your repo so I can contribute without doing PRs to PRs?

Sure. I've never managed permissions on a GitHub repo, but let me know if you can't push.

The interface isn't solidified, so hammer away. I do want to clean up the header and move stuff into the source file. Figured I'd handle that as I get further along.

The partial parsing works, but does require careful attention when editing. The idea is to "succeed" if the parse tree is partially traversed and the input is marked as incomplete, with one caveat: if a literal is partially matched, it will propagate a result indicating we need more input. I intend to add a regex parser that uses the built-in partial regex matching support, which should do the same thing. This allows us to collect the results when sending a streaming response.
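Concretely, a sketch against the interface from the PR description (the exact result reporting is an assumption, as it is still in flux):

common_chat_parse_semantics semantics;
common_chat_parse_context ctx("<think>partial reas", &semantics, handler,
                              /* is_input_complete = */ false);
// A partially matched literal or rule reports "need more input" rather than
// failing, so whatever landed in semantics so far can be streamed to the client.
auto result = parser.parse(ctx);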

I need to clean up the caching. Initially I thought we could reuse the cache as more and more input arrives, but I'm finding it very difficult to determine the correct time to cache. So I'm thinking about nixing that idea and just providing a cache per parsing run--as the packrat algorithm originally intended. Then we can profile whether caching is beneficial on a real example. I suspect there shouldn't be a whole lot of backtracking, so the memory cost might not be worth it if the gains are minuscule.

@pwilkin commented Nov 10, 2025 (Collaborator)

Aight, let me bounce my original idea - what if we just created a GBNF parser builder and used that to parse the messages? Then we have both problems (tool call / reasoning and compatibility with normal parsing) done in one go. Unless (haven't looked into it) it would just be too inefficient for normal content parsing?

Because right now it feels like we're adding another intermediate abstraction while GBNF is already implemented in GGML - so maybe just use a builder as an abstraction layer to create all the needed objects and add any missing partial parse support?

This is just an idea, not very fixated on it, just thought I'd share it. Regarding memory costs and the packrat parser, I think O(n) with typical LLM inputs is negligible; even with super long contexts we're looking at a few MB of overhead at most.

@aldehir commented Nov 10, 2025 (Collaborator Author)

Sounds like you're thinking of a parser generator, something like yacc, bison, or ANTLR. The problem I see with those solutions is that they require building a parse table upfront, which is less intuitive than building a parse tree as in this PR. You could create a recursive descent parser, but that would have to be done at compile time. If you did it at runtime, I think the solution would look a lot like this!

I haven't examined the GBNF code with a scalpel, but from a brief look it seems to use a pushdown automaton, and extracting content from it may be challenging. Not that we would want to, since it is part of the core and not common. I believe there is a desire to keep the chat parsing isolated in common.

I also think you lose the expressiveness of being able to define the grammar in C++. For example, with this solution we could add an execute() parser that takes a user lambda and runs it when the parse subtree succeeds. You could define prune() to remove parts of the tree on a condition, such as when no tools are provided. Not saying we want to do that, just demonstrating the flexibility offered.
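As a purely hypothetical sketch, since neither execute() nor prune() exists in this PR:

auto tc   = p.execute(tool_call, [](common_chat_parse_semantics & s) {
    // arbitrary user code, run when the tool-call subtree succeeds
});
auto root = p.prune(tc, /* keep = */ !tools.empty()); // drop the subtree when no tools are given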

The solutions I mentioned above do this by defining their own language to insert code--not pretty in my experience.

That said, I am open to ideas. If you have a clearer picture of what that looks like, I'm happy to review. I understand inserting a new abstraction is a tough ask. I wanted to roll out a PoC to hopefully show value.

@pwilkin commented Nov 10, 2025 (Collaborator)

@aldehir Nah, you're probably right. I looked at the GBNF code and in fact it would take too much effort to extract the parsed content from there. We're better off just doing it your way. I'll try to code some of the missing pieces.

@aldehir commented Nov 10, 2025 (Collaborator Author)

@pwilkin great! If you have any questions, feel free to ask.

@pwilkin commented Nov 12, 2025 (Collaborator)

Aight, I'm done with the hybrid ops and convert_hf_to_gguf refactoring cleanup, so I'll probably finally look at this tomorrow :>

@aldehir commented Nov 12, 2025 (Collaborator Author)

No rush. I am getting closer to a set of parsing functions that I'm happy with. The unfortunate part is I had to roll specialized parsers to maintain comparable performance with the existing parsing. A lexer would likely help, but optimized parsers for certain use cases are enough for now.

I added a benchmark in the test that implements the Command R7B parser and compares it to the existing one. It seemed like a good one to illustrate.

// Benchmarks are over 100 iterations
Reasoning + Content:
   New parser avg: 23 us
Legacy parser avg: 450 us

Reasoning + Tool Call:
   New parser avg: 263 us
Legacy parser avg: 151 us

The existing parsing has a leg up with JSON. That said, it's still a fraction of a millisecond for a full prompt. I think most of the cost will go into the constrained decoding anyway. I'll have to benchmark larger JSON documents. Worst case, we can fall back to the implementation in json-partial.cpp. The intent here is to better support streaming JSON.

@aldehir commented Nov 16, 2025 (Collaborator Author)

I'm thinking we should put the helpers in a separate file. The parser implementation is pretty big. It feels complete, though.

@pwilkin commented Nov 16, 2025 (Collaborator)

@aldehir Yeah, I split off the helpers as a subclass of the main builder and will add any further helpers there; that should avoid overfilling the main parser class.

I also reverted the old explicit Qwen3 parser builder and added the new helper alongside it. Restructured the test a bit to make it clearer. Now I'm going to try and add as many of the old parsers as possible to see how well it'll go and potentially get good patterns for the helpers.

@pwilkin commented Nov 16, 2025 (Collaborator)

Aight, Minimax M2 and Seed-OSS are up. With the first one, I made the stupid mistake of writing tool definitions that differed from the tool calls, so I couldn't get a proper parse; I added some debugging prints along with a live example of how to use them :)

BTW, currently if an incorrect function call is detected, it's still marked as a success since zero_or_more always succeeds. I'm not sure if we want to propagate a failure somehow (as in, zero_or_more only trivially succeeds if the rest of the input is empty?)

@aldehir commented Nov 16, 2025 (Collaborator Author)

Thanks!

I just found a case for keeping the tests in one source file: it's a little hard to test in isolation :). If they were in a single source file, you could run ctest -V -R test-chat-peg-parser to run all tests, or -R test-chat-peg-parser-example-qwen3 to run one (or however many contain that prefix).

By incorrect, do you mean the model generated an invalid tool call? I don't think that should happen in practice. With constrained decoding, we enforce the grammar, so the output should be parseable. If there are no tools, then we shouldn't constrain, and we should make the reasoning/content parsing as permissive as possible. We also shouldn't build a parser with tool-calling support in that case; it should just consume all content until the end.

You can add p.end() to the end to ensure that everything is consumed, but I found a bug when min repetitions == 0. I'll push out a fix here in a bit.
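For example, the root expression from the description would become (a sketch):

return thinking + p.optional(p.space() + content)
     + p.zero_or_more(p.space() + tool_call)
     + p.end(); // fail unless the entire input was consumed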

Comment on lines +61 to +64
if (ev.rule.find("arg-string") != std::string::npos && ev.ending() && ev.success()) {
    auto & tc = semantics.tool_calls.back();
    tc.arguments += "\"";
}
@aldehir commented Nov 16, 2025 (Collaborator Author)

This was causing me some grief, because it matches the "arg-string-content" above and was adding two quotes to the end.

I don't think we should do a search here. I assume you mean to match arg-string-<param>. We can still produce those rules, but wrap them with an arg-string rule and use direct string comparison. Just to avoid tiny little bugs like this.

@aldehir (Collaborator Author)

Actually, never mind. I see why you did that. Ok, I have to rethink this.

@aldehir commented Nov 16, 2025 (Collaborator Author)

Ok, to better support writing custom helpers and simplify a few things, I'm going to:

  1. Introduce ref() to reference a rule. The rule() function will be the actual rule definition. This replaces add_rule(). At the end we can resolve the rule references by traversing the parse tree. With that in place, helpers don't have to subclass the builder; they can just use it to generate a subtree of rules, and users of helpers can attach that subtree to their own parser. (See the sketch after this list.)
  2. Remove trigger(), instead add it as an attribute to rules. I think it's ok to say only rules can be triggers.
  3. Add an annotation property to rules. I noticed that we need to perform the same logic in the event handler for certain nodes, but they can't be named the same. We can use the annotation field instead.
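A hypothetical sketch of point 1 (the ref()/rule() split is proposed here, not implemented):

auto fn = p.ref("function"); // forward reference, resolved later
p.rule("tool-call", "<tool_call>" + p.one_or_more(fn) + "</tool_call>");
p.rule("function", "<function=" + p.until(">") + ">"); // definition can come after the reference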

@pwilkin commented Nov 16, 2025 (Collaborator)

Yeah, I was thinking something similar: either add an extra property or make the rule name itself structured somehow (as in "category" and "name").

- Refactor all messaging to use LOG_ERR
- Fix lack of argument / tool name capturing
- Temporary fix for double event capture
@pwilkin commented Nov 16, 2025 (Collaborator)

Alright, I added some stuff. Besides doing a temporary workaround for the double-event problem (I renamed arg-string-content to arg-str-content to fix the double match):

  • I added an option for selective testing: you can run all tests with test-chat-peg-parser, or enumerate specific tests to run only those; --help lists the available tests
  • I refactored all the printouts to use LOG_ERR, to avoid interleaved output caused by mixing buffered C and C++ printouts
  • I fixed the helpers to correctly capture argument and function names

Besides that, I fixed in the other tests the one thing that I already fixed in the Minimax-M2 test but forgot to mention: the logic for determining whether you should do a complete parse was wrong. std::accumulate, like most range-based functions, treats the end iterator as exclusive, so you actually have to pass it + 1 instead of it, and likewise check it + 1 == tokens.end() instead of it == tokens.end().
