Introduce GBNF grammar parameter: run flags, REPL and API #2754

Open · wants to merge 1 commit into main
Conversation

@zopieux commented Feb 26, 2024

This patch adds full support for the llama.cpp "grammar" option, which defaults to "" (i.e., no grammar constraint).

For backwards compatibility, the existing format parameter remains available in flags, the REPL and the API. Only one of grammar or format may be set at a time, since format: json is just a shortcut for the hard-coded JSON grammar.
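
Since the grammar is a plain string, it can also be sent through the REST API. A sketch of what a request might look like under this patch (the grammar field name in the JSON body is my assumption, mirroring the flag name):

$ curl http://localhost:11434/api/generate -d '{
    "model": "mistral-openorca",
    "prompt": "Is 3+5 equal to 8?",
    "grammar": "root ::= \"yes\" | \"no\"",
    "stream": false
  }'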

Tested locally:

$ ./ollama run mistral-openorca --grammar-file json-two-string-array.bnf \
  'Write the result of "3+5" followed by the english spelling of the number, as two JSON strings'
["8", "Eight"]

$ ./ollama run mistral-openorca --grammar ' root ::= "[" [0-9]+ ", \"" [a-z]+ "\"]" ' \
  'Write the result of "3+5" followed by the english spelling of the number, as a json number and a json string'
[8, "eight"]
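
The contents of json-two-string-array.bnf aren't shown here; for illustration, a minimal GBNF grammar producing output like the first test could look like this (a sketch, not the actual file used above):

root   ::= "[" string ", " string "]"
string ::= "\"" [A-Za-z0-9]+ "\""

The second test passes the same kind of constraint inline via --grammar instead of a file.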

This more or less conflicts with #2404, which instead went with an Options entry; AFAICT that approach is more geared towards creating derived models with an embedded grammar, whereas this patch is about a runtime override. It would be nice to eventually converge on something that does both, but today the concept of "format", which is very limiting (a list of well-known formats, and only JSON today), is a top-level parameter rather than an Options entry, and I wasn't sure how to make this play nice with #2404 without too many changes.

@i404788 commented Mar 6, 2024

Bug report when testing:

When provided with an invalid grammar (for the llama.cpp backend), it stops responding to new generate requests. It seems like ollama does not detect the failure and waits for tokens indefinitely.

@zopieux (Author) commented Mar 6, 2024

Bug report when testing:

When provided with an invalid grammar (for the llama.cpp backend), it stops responding to new requests

Yeah, in fact it literally segfaults and dumps core. Fixing this is beyond the scope of this PR, as ollama itself can't validate the grammar, but it should gracefully handle the failure.

@clevcode commented Mar 9, 2024

I would suggest making a simple standalone GBNF grammar validator based on the llama.cpp sources (as a C-exported function that can be linked and used directly by ollama through a cgo binding) and pre-validating the grammar before invoking llama.cpp. Simple to do, and essential if llama.cpp does not gracefully handle invalid grammars itself.
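
For illustration, a minimal sketch of such a binding, assuming a hypothetical C shim exporting gbnf_validate built from llama.cpp's grammar parser (the shim, its name, and the link flags are assumptions, not code that exists in this PR):

package grammar

/*
// Assumed: a shim library built from llama.cpp's grammar parser.
#cgo LDFLAGS: -lgbnf_shim
#include <stdbool.h>
#include <stdlib.h>
// Hypothetical C export; returns false if the grammar fails to parse.
bool gbnf_validate(const char *grammar);
*/
import "C"

import (
	"errors"
	"unsafe"
)

// Validate pre-checks a GBNF grammar string so an invalid grammar can
// fail the request up front instead of crashing the llama.cpp runner.
func Validate(grammar string) error {
	cs := C.CString(grammar)
	defer C.free(unsafe.Pointer(cs))
	if !C.gbnf_validate(cs) {
		return errors.New("invalid GBNF grammar")
	}
	return nil
}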
