
Added support for specifying an arbitrary GBNF compatible grammar #1606

Open · wants to merge 2 commits into main

Conversation


@clevcode clevcode commented Dec 19, 2023

This adds support for specifying an arbitrary GBNF-compatible grammar in the Modelfile, for models running on the llama.cpp backend.

Note that this is basically just the same PR as the one submitted by SyrupThinker in September (#565), and that has been mentioned in issue #1507 and #808 since then.

There are plenty of users that would appreciate this feature, so I really hope that it can get merged.

It's great that support for JSON output specifically has been added, by setting the corresponding GBNF grammar when JSON format is requested, but giving the user the ability to specify an arbitrary grammar opens up far more possibilities than that.

Pull request #830 adds support for specifying JSON schemas, which is another great convenience feature for a specific and common use case, but with support for arbitrary GBNF grammars, any model could output data in any format, including custom DSLs and text-based file formats in general.

This is a tremendously useful thing to have when building various types of automation-related applications, so I really hope that this can get merged to avoid having to maintain separate forks. Ollama is a great project; let's keep making it even better.
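As a concrete illustration of what this enables (my own sketch, not taken from the PR), a GBNF grammar that forces output into a tiny key=value configuration format:

```
root  ::= line+
line  ::= key "=" value "\n"
key   ::= [a-z] [a-z0-9_]*
value ::= [a-zA-Z0-9 ./:-]+
```

Dropped into a Modelfile via the `PARAMETER grammar """..."""` syntax this PR proposes, every token the model samples would be constrained to match that shape.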

in the Modelfile, for models running on the llama.cpp backend

Note that this is basically just the same PR as the one submitted
by SyrupThinker in September (ollama#565), and that has been mentioned
in issue ollama#1507 and ollama#808 since then.

There are plenty of users that would appreciate this feature, so
I really hope that it can get merged.

It's great that support for JSON grammar specifically has been
added, by setting the GBNF grammar in question when JSON format
is requested, but by providing the user with the ability to
specify an arbitrary grammar opens up for a lot more possibilities
than that

Pull request ollama#830 adds support for specifying JSON schemas, which
is yet another great convenience feature for a specific and common
usecase, but by adding support for arbitrary GBNF grammar it would
be possible to have any model outputting data in any type of format,
including custom DSLs and text-based file formats in general

This is a tremendously useful thing to have when building various
types of automation related applications, so I really hope that this
can get merged to avoid having to maintain separate forks. Ollama
is a great project, let's keep making it even better

clevcode commented Dec 19, 2023

A really simple Modelfile example, to ensure that a model only answers with a Python code block 😄

FROM deepseek-coder
PARAMETER grammar """
root ::= "\x60\x60\x60python3\n" [^\x60]+ "\n\x60\x60\x60"
"""

@clevcode

Theoretically, it would even be possible to enforce that the actual code produced is syntactically valid Python. Adding a good SYSTEM prompt helps a lot, of course.

Here's an example of it in action:

$ cat Modelfile 
FROM deepseek-coder:33b-instruct-q6_K

TEMPLATE """{{ .System }}
### Instruction:
{{ .Prompt }}
### Response:
"""

SYSTEM """You are an expert coding assistant, striving for excellence in everything you do.

Respond concisely, but without leaving out important details. Skip caveats and explanations that are obvious to advanced users though. Step-by-step thinking, logically and analytically. Strive to provide the best possible solutions.

Use complete markdown-based code blocks that can be passed directly to a Python interpreter.  Always try to provide complete solutions to whatever is being asked without anything similar to "TODO" comments. That being said, only address the specific request provided and assume that anything else has already been taken care of.
"""

PARAMETER grammar """
root ::= "\x60\x60\x60python3\n" [^\x60]+ "\n\x60\x60\x60"
"""

PARAMETER num_ctx 16384
$ ollama create coder-python
transferring model data 
reading model metadata 
creating template layer 
creating system layer 
creating parameters layer 
creating config layer 
using already created layer sha256:cee2b20336444a7fc764ae4a31d7c3ca135a2fab233714b15dd230aff93a7010 
using already created layer sha256:a3a0e9449cb691a12f4de1d03725fd41326614fdeaf5d80b28c51187da0bed0e 
using already created layer sha256:602d4199b3b775f993839cf879c0633c266b8e3dd07f18c51ce68754abd609dd 
using already created layer sha256:8893e08fa9f91f7dc39e24d27bdfaece4e9c86bb3269293ff8cea6cba98c872d 
using already created layer sha256:584fd87f75335d530f3f26e6f27c38cb4d98204ffd5161acf710d22d17b68e31 
using already created layer sha256:179c66e0d123a43313f24669830090abc1981994ef663e6720d4d5b862cd6201 
using already created layer sha256:0667a8032296b8d28afab2b222f7aaf91bd6dbac28cd05910aef5bf901e3b4ad 
writing manifest 
success 
$ ollama run coder-python print the 100th fibonacci number | grep -v '^```' | python3
218922995834555169026


clevcode commented Dec 19, 2023

Another example:

$ wget https://raw.githubusercontent.com/ggerganov/llama.cpp/master/examples/json-schema-to-grammar.py
$ cat > movie-schema.json << EOF
{
  "type": "object",
  "required": ["title", "director", "releaseDate"],
  "properties": {
    "title": {
      "type": "string"
    },
    "director": {
      "type": "string"
    },
    "releaseDate": {
      "type": "string",
      "format": "date"
    },
    "genre": {
      "type": "string",
      "enum": ["Action", "Comedy", "Drama", "Science Fiction"]
    },
    "duration": {
      "type": "string"
    },
    "cast": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "additionalItems": false
    }
  }
}
EOF
$ cat > Modelfile << EOF
FROM deepseek-coder:33b-instruct-q6_K

TEMPLATE """{{ .System }}
### Instruction:
{{ .Prompt }}
### Response:
"""

SYSTEM """You are an AI developed by OpenAI. You process data and respond with JSON"""

PARAMETER grammar """
$(python3 json-schema-to-grammar.py movie-schema.json)
"""

PARAMETER num_ctx 16384
EOF
$ ollama create movie-info
...
$ pip install strip-tags
$ curl -sSL -A Chromium -s https://www.imdb.com/title/tt1375666 | strip-tags | ollama run movie-info 
{ "cast": ["Leonardo DiCaprio", "Joseph Gordon-Levitt", "Elliot Page"], "director": "Christopher 
Nolan", "duration": "2 hours 28 minutes", "genre": "Action", "releaseDate": "July 16, 2010 (United 
Kingdom)", "title": "Inception" } 

@clevcode

PS. The reason I'm telling deepseek-coder that it's an AI developed by OpenAI is this pretty hilarious result 😄

"Telling mixtral that it is "ChatGPT developed by OpenAI" boosts humaneval score by 6%"

https://www.reddit.com/r/MistralAI/comments/18lhila/telling_mixtral_that_it_is_chatgpt_developed_by/

Telling non-OpenAI models that they were developed by OpenAI might actually boost their performance; "self-belief" matters ;)


shroominic commented Dec 23, 2023

LGTM!

@Yu-Vitaqua-fer-Chronos

Would love for this to be merged, it'd be very, very useful to be able to get responses formatted in JSON following a specific format, or as people have said here, to confine it to a language's grammar reliably :p

@nfsecurity

This would be awesome, particularly when you need a specific output like "yes or no".
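For a case like that, the grammar is trivial. A sketch using the `PARAMETER grammar` syntax proposed in this PR:

```
PARAMETER grammar """
root ::= "yes" | "no"
"""
```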

@DeNeutoy

This would enable a whole host of powerful applications if merged. Would love to see this get in!

@jayfalls

Please can we get this merged, what still needs to happen?

@ryanpeach

The diff is so small! Is this really all that is needed to enable it?


nperez commented Apr 1, 2024

It would be awesome to get this merged but apparently it has some bitrot now. Maybe @clevcode can resolve those so we can make it trivial for maintainers to merge and renew our calls for it?

Just want to add my voice to the others wanting this merged in. Grammar constrained generation is a must for building reliable tools on top of LLMs, and the hard work has already been done by the GGML folks.


jayfalls commented Apr 1, 2024

I'm willing to do it with the latest code, but @jquesnelle already did that over a month ago with #2404, and still no merge

No feedback from core team, so I'm wondering if they're just not seeing this pull in the ocean of pulls they have


nperez commented Apr 1, 2024

Ooof, yeah, now that I look more closely, you're right that there are multiple PRs specifically for this feature.
[screenshot: the list of open PRs for this feature]


oeway commented Apr 4, 2024

+1 Please! This is a great feature to have!

@mishushakov

Would love to use this for my project

@thiswillbeyourgithub

This is an awesome feature! Imagine what we could do if llama3 could output Python list formats! I'm eagerly waiting for this to up my "auto labelling" workflows in various apps.


qkxie commented Apr 23, 2024

Three months have passed, and this feature still hasn't been merged.

@alexclaydon

We love Ollama and have been using it for months, but we can’t wait much longer for GBNF, particularly now with the launch of viable small models like Llama3:8b. Would be really helpful to hear something from the core team on the roadmap for this - it seems such a glaring omission. We’re on the verge of throwing in the towel and switching over to llama.cpp.

@UmutAlihan

I am really looking forward to seeing support for GBNF, so that we can force models to produce proper JSON using Ollama. I really want to utilize Ollama further for use cases like this.


tionis commented Apr 30, 2024

Just a small note: this PR does not apply cleanly anymore. I merged it with some small modifications, just for testing purposes, and the grammars didn't seem to apply correctly.
I did, however, only do two very quick tests, so maybe I misused the API.

@trustedtomato

I applied this patch to an earlier version of Ollama, and it works nicely :) The only problem before it gets into upstream is that an incorrect grammar crashes the server. It would be much nicer if the server handled the error and just returned an error code along with the error message coming from llama.cpp.

@jacopofar

@trustedtomato I think llama.cpp does not return any error message; it just segfaults when it gets a wrong grammar file

@trustedtomato

@trustedtomato I think llama.cpp does not return any error message; it just segfaults when it gets a wrong grammar file

In the end something definitely segfaults, but there is an error message before that, which can be traced back to: https://github.com/ggerganov/llama.cpp/blob/1fd9c1741d864d01cd7ec6d67227b92d7bfabf22/common/grammar-parser.cpp#L299 and https://github.com/ggerganov/llama.cpp/blob/master/common/grammar-parser.cpp#L258
Here are the ollama serve logs just before the crash, showing that there is indeed an error message:

{"function":"update_slots","level":"INFO","line":1636,"msg":"slot released","n_cache_tokens":1176,"n_ctx":2048,"n_past":1175,"n_system_tokens":0,"slot_id":0,"task_id":0,"tid":"127570538399424","timestamp":1715168697,"truncated":false}
parse: error parsing grammar: expecting ::= at := [0-9] | [1-9] [0-9]*
natintarray ::= "[" ws (natint ("," ws natint)*)? ws "]"
string ::=
  "\"" (
    [^"\\\x7F\x00-\x1F] |
    "\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F]) # escapes
  )* "\"" ws
stringarray ::= "[" ws (string ("," ws string)*)? ws "]"
answerprefix ::= "{" ws "\"answer\":" ws
answerpostfix ::= ws "}"
root ::=  answerprefix natintarray answerpostfix
llama_sampling_init: failed to parse grammar
{"function":"launch_slot_with_data","level":"INFO","line":827,"msg":"slot is processing task","slot_id":0,"task_id":100,"tid":"127570538399424","timestamp":1715168697}
Segmentation fault (core dumped)
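For what it's worth, the parse error in that log is self-explanatory: `expecting ::= at :=` indicates a rule was written with `:=` instead of GBNF's `::=`. Presumably it is the `natint` rule referenced by `natintarray`, so the fix (a guess, since the log truncates the rule name) would be:

```
natint ::= [0-9] | [1-9] [0-9]*
```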
