Fix grammar parsing issues to prevent stack overflow and hangs #18604

Merged
pwilkin merged 5 commits into ggml-org:master from aagit:grammar-fixes
Mar 21, 2026

@aagit
Contributor

@aagit aagit commented Jan 5, 2026

This pull request addresses some issues in the grammar parsing system that could lead to stack overflow and hangs when processing certain GBNF grammars. The fixes include:

  1. Stack overflow prevention: Added cycle detection in llama_grammar_advance_stack to prevent infinite recursion when processing grammars with nullable symbols that could lead to infinite derivations of empty strings.

  2. Iterative implementation: Converted the recursive llama_grammar_advance_stack function to an iterative approach using explicit stacks, which eliminates the risk of stack overflow from deep recursion.

  3. Repetition threshold checking: Added a maximum repetition threshold to prevent excessive rule expansion during grammar parsing of deeply nested repetition patterns like {m,n}.

The repetition threshold value hasn't changed, but it now applies across all nested rules, so some grammars that were previously accepted are now rejected. Presumably those grammars would have hung or overflowed the stack anyway.
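To illustrate how points 1 and 2 fit together, here is a minimal sketch on a toy grammar model. The types, the symbol encoding, and the function name `advance_stacks` are all assumptions made for this example and are not llama.cpp's actual data structures. It only shows the general idea: an explicit worklist replaces recursion, and a set of already-queued stacks stops nullable cycles from being expanded forever. The sketch assumes the grammar has no left recursion.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Toy model, not llama.cpp's real types: a symbol >= 0 refers to a rule by
// index, a symbol < 0 is a terminal the parser has to wait for.
using Symbol      = int;
using Alternative = std::vector<Symbol>;
using Rule        = std::vector<Alternative>;
using Grammar     = std::vector<Rule>;
using Stack       = std::vector<Symbol>; // pending symbols, top at back()

// Expand rule references until every stack is empty or has a terminal on top.
std::vector<Stack> advance_stacks(const Grammar & grammar, const Stack & start) {
    std::vector<Stack> todo{start}; // explicit worklist instead of recursion
    std::set<Stack>    seen{start}; // every stack that was ever queued
    std::vector<Stack> done;

    while (!todo.empty()) {
        Stack cur = std::move(todo.back());
        todo.pop_back();

        if (cur.empty() || cur.back() < 0) {
            done.push_back(std::move(cur)); // nothing left to expand here
            continue;
        }

        const std::size_t rule_id = static_cast<std::size_t>(cur.back());
        assert(rule_id < grammar.size());
        cur.pop_back();

        for (const Alternative & alt : grammar[rule_id]) {
            Stack next = cur;
            // Push in reverse so the first symbol of the alternative ends up on top.
            next.insert(next.end(), alt.rbegin(), alt.rend());
            // Cycle detection: a stack that was already queued is skipped.
            if (seen.insert(next).second) {
                todo.push_back(std::move(next));
            }
        }
    }
    return done;
}
```

Encoding `( [x]* )*` as rule 0 with alternatives `{1, 0}` and `{}`, and rule 1 with alternatives `{-'x', 1}` and `{}`, the call `advance_stacks(g, Stack{0})` terminates with two stacks. Without the `seen` check, the empty alternative of rule 1 would keep re-queuing the stack `{0}` and the loop would never finish.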

Testing

The changes have been tested with:

  • Existing test suites (test-llama-grammar, test-grammar-integration, test-grammar-parser)
  • The two llama-server curl reproducers mentioned in the commits
  • Manual verification with some ripgrep-edit sessions with GBNF enabled

New test cases have been added to verify:

  • The stack overflow case with ( [x]* )* grammar is fixed
  • The hang case with deeply nested repetition patterns is rejected

@fiesh

fiesh commented Feb 26, 2026

This fixes #19845

@0cc4m
Contributor

0cc4m commented Mar 10, 2026

@ggerganov This has been stuck for a while, can you take a look and let us know how to proceed?

@ggerganov
Member

@pwilkin Would you like to take a look and review?

@pwilkin pwilkin self-assigned this Mar 10, 2026
@pwilkin
Contributor

pwilkin commented Mar 10, 2026

@ggerganov Aye, can look.

Contributor

@pwilkin pwilkin left a comment

Please run editorchecker and fix the indentation issues; otherwise it looks fine. See also my changes to the `seen` set and let me know if you approve.

aagit added 5 commits March 11, 2026 14:48
Reproduce stack overflow (or OOM) with ( [x]* )* found while adding
GBNF support to ripgrep-edit.

llama-server reproducer:

curl \
  -X POST \
  -d '{
    "messages": [{ "role": "user", "content": "write yes" }],
    "grammar": "root ::= ( [x]* )*"
  }' \
  -H "Content-Type: application/json" \
  http://localhost:8811/v1/chat/completions

Fix a potential stack overflow in llama_grammar_advance_stack that
could occur when processing grammars with nullable symbols that lead
to infinite derivations of empty strings. The fix introduces cycle
detection by tracking visited stacks to prevent infinite recursion.

rg-edit regexp: llama_grammar_advance_stack
rg-edit extra-args: -A20
rg-edit directive: """Rewrite: fix the following segfault:

[..]
⚫ Testing segfault. Grammar:
            root ::= ( [x]* )*

            root ::= ( [x]* )*

Segmentation fault         build/bin/test-grammar-integration"""

gptel-context:
(("~/llama.cpp/src/llama-grammar.cpp")
 ("~/llama.cpp/tests/test-grammar-integration.cpp")
 ("~/llama.cpp/grammars/./list.gbnf")
 ("~/llama.cpp/grammars/./json_arr.gbnf")
 ("~/llama.cpp/grammars/./json.gbnf")
 ("~/llama.cpp/grammars/./japanese.gbnf")
 ("~/llama.cpp/grammars/./english.gbnf")
 ("~/llama.cpp/grammars/./chess.gbnf")
 ("~/llama.cpp/grammars/./c.gbnf")
 ("~/llama.cpp/grammars/./arithmetic.gbnf")
 ("~/llama.cpp/grammars/./README.md"))

This change converts the function to an iterative approach using
explicit stacks, which prevents deep recursion and eliminates the risk
of stack overflow.

rg-edit regexp: llama_grammar_advance_stack
rg-edit extra-args: -A30
rg-edit directive: """Rewrite: fix the following segfault:

[..]
⚫ Testing segfault. Grammar:
            root ::= ( [x]* )*

            root ::= ( [x]* )*

Segmentation fault         build/bin/test-grammar-integration

convert from recursive to iterative"""

gptel-context:
(("~/llama.cpp/src/llama-grammar.cpp")
 ("~/llama.cpp/tests/test-grammar-integration.cpp")
 ("~/llama.cpp/grammars/./list.gbnf")
 ("~/llama.cpp/grammars/./json_arr.gbnf")
 ("~/llama.cpp/grammars/./json.gbnf")
 ("~/llama.cpp/grammars/./japanese.gbnf")
 ("~/llama.cpp/grammars/./english.gbnf")
 ("~/llama.cpp/grammars/./chess.gbnf")
 ("~/llama.cpp/grammars/./c.gbnf")
 ("~/llama.cpp/grammars/./arithmetic.gbnf")
 ("~/llama.cpp/grammars/./README.md"))

v2: Added a `std::set` of visited stacks so that each lookup is a
tree-based O(log N) search (O(N log N) over N stacks) instead of a
linear scan of a vector. Testing with a parallel run of
`test-grammar-integration` shows a runtime increase of roughly 13%.
An `unordered_set` with O(1) average-case lookups was also evaluated,
but the overhead of constructing hash keys from the pointers in each
stack made it significantly slower than the rbtree implementation,
which only requires an ordering operator. The performance regression
in the test suite appears justified by the overall reduction in
algorithmic complexity.
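
As a rough illustration of why the ordered set needs so little setup, the sketch below keys a `std::set` on a vector of pointers with a lexicographic comparator. `Element`, `PtrStack`, `StackLess`, `SeenSet` and `mark_seen` are hypothetical names chosen for this example and are not taken from the patch. An `unordered_set` would need a custom hash that walks every pointer in the vector on each lookup.

```cpp
#include <algorithm>
#include <functional>
#include <set>
#include <vector>

// Hypothetical stand-in for a grammar element.
struct Element {
    int type;
    int value;
};

using PtrStack = std::vector<const Element *>;

// Lexicographic order over the pointer values. std::less is used because it
// guarantees a total order even for pointers into different objects.
struct StackLess {
    bool operator()(const PtrStack & a, const PtrStack & b) const {
        return std::lexicographical_compare(a.begin(), a.end(),
                                            b.begin(), b.end(),
                                            std::less<const Element *>());
    }
};

using SeenSet = std::set<PtrStack, StackLess>;

// Returns true the first time a stack is recorded, false on a repeat.
bool mark_seen(SeenSet & seen, const PtrStack & stack) {
    return seen.insert(stack).second;
}
```

The comparator can stop at the first differing pointer, whereas a hash has to consume the whole vector before it can answer. That may be part of why the hashed variant measured slower here.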

Co-developed-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>

This commit adds a new test case to the grammar integration tests that
specifically targets a hang scenario in the repetition grammar parser
found while adding GBNF support to ripgrep-edit.

llama-server reproducer:

curl \
  -X POST \
  -d '{
    "messages": [{ "role": "user", "content": "write yes" }],
    "grammar": "root ::= (([^x]*){0,99}){0,99}"
  }' \
  -H "Content-Type: application/json" \
  http://localhost:8811/v1/chat/completions

The change introduces a maximum repetition threshold to avoid
excessive rule expansion during grammar parsing. When parsing
repetition patterns like {m,n}, the parser now calculates the
potential number of rules that would be generated and throws an error
if the product of previous rules and new rules exceeds the threshold.

A test case was added to verify the threshold is properly enforced for
deeply nested repetition patterns that would otherwise cause hangs.
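
The product check described above can be sketched roughly as follows. The constant name `kMaxRepetitions`, its value of 2000, and the function `checked_expansion` are assumptions for illustration only. How the real parser counts the rules generated by a `{m,n}` pattern may differ.

```cpp
#include <cstdint>
#include <stdexcept>

// Illustrative limit; the real constant and its value are assumptions here.
constexpr std::uint64_t kMaxRepetitions = 2000;

// `outer` is the expansion count accumulated by the enclosing repetitions,
// `inner` is the number of rules the new {m,n} pattern would add on its own.
// Returns the combined count, or throws if it exceeds the limit.
std::uint64_t checked_expansion(std::uint64_t outer, std::uint64_t inner) {
    // Same test as outer * inner > kMaxRepetitions, written as a division
    // so that the product itself cannot overflow.
    if (inner != 0 && outer > kMaxRepetitions / inner) {
        throw std::runtime_error("repetition would expand past the threshold");
    }
    return outer * inner;
}
```

With these assumed numbers, a single `{0,99}` passes (`checked_expansion(1, 99)` returns 99), while nesting a second `{0,99}` inside it throws, because 99 * 99 = 9801 is above the limit.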
@aagit
Contributor Author

aagit commented Mar 11, 2026

"Please run editorchecker and fix the indentation issues; otherwise it looks fine. See also my changes to the `seen` set and let me know if you approve."

Sure, the set addition looks good. I would have kept it as an incremental commit, but the UI seems to suggest folding it, so I folded it into the patch; I don't mind either way. I also tried an unordered_set, but that requires building a key from all the pointers in the stack vector, and with test-grammar-integration as the benchmark it was slower than the rbtree in the set. The set is also slower than the original linear vector, but by less (around a 13% increase in runtime).

@pwilkin
Contributor

pwilkin commented Mar 11, 2026

"I also tried an unordered_set, but that requires building a key from all the pointers in the stack vector, and with test-grammar-integration as the benchmark it was slower than the rbtree in the set."

Yeah tried that as well but it required too much setup with the hash function.

13% is fine if it helps us prevent catastrophic times with some very big grammars.

@aagit aagit requested a review from pwilkin March 21, 2026 17:30
@pwilkin
Contributor

pwilkin commented Mar 21, 2026

Oh, I'm sorry, should've pinged me earlier :)

@pwilkin pwilkin merged commit 990e4d9 into ggml-org:master Mar 21, 2026
66 of 78 checks passed

Labels

testing Everything test related

5 participants