# Exposing grammar as a request parameter in completion/chat with go-side grammar validation #4525

Open · wants to merge 4 commits into base: main
6 changes: 6 additions & 0 deletions api/types.go
@@ -67,6 +67,9 @@ type GenerateRequest struct {
// Format specifies the format to return a response in.
Format string `json:"format"`

// Grammar specifies the GBNF grammar string to constrain generation output.
Grammar string `json:"grammar"`

// KeepAlive controls how long the model will stay loaded in memory following
// this request.
KeepAlive *Duration `json:"keep_alive,omitempty"`
@@ -94,6 +97,9 @@ type ChatRequest struct {
// Format is the format to return the response in (e.g. "json").
Format string `json:"format"`

// Grammar specifies the GBNF grammar string to constrain generation output.
Grammar string `json:"grammar"`

// KeepAlive controls how long the model will stay loaded into memory
// following the request.
KeepAlive *Duration `json:"keep_alive,omitempty"`
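The PR title mentions go-side grammar validation. As a minimal, hedged sketch of what such a pre-flight check might look like — the helper name and the rules it enforces are hypothetical illustrations, not taken from this diff:

```go
package main

import (
	"fmt"
	"strings"
)

// validateGrammar is a hypothetical sketch of a go-side check: reject
// empty grammars and grammars that never mention a root rule before the
// request is handed to the llama.cpp runner. The actual validation in
// the PR may be stricter (e.g. a full GBNF parse).
func validateGrammar(grammar string) error {
	if strings.TrimSpace(grammar) == "" {
		return fmt.Errorf("grammar must not be empty")
	}
	if !strings.Contains(grammar, "root") {
		return fmt.Errorf("grammar must define a root rule")
	}
	return nil
}

func main() {
	fmt.Println(validateGrammar(`root ::= "yes" | "no"`)) // <nil>
	fmt.Println(validateGrammar("digit ::= [0-9]"))       // error: no root rule
}
```

Rejecting bad grammars in Go keeps malformed input from reaching the runner, where a parse failure would be harder to surface as a clean API error.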
16 changes: 16 additions & 0 deletions docs/api.md
@@ -44,6 +44,7 @@ Generate a response for a given prompt with a provided model. This is a streamin
Advanced parameters (optional):

- `format`: the format to return a response in. Currently the only accepted value is `json`
- `grammar`: the [GBNF grammar](https://github.com/ggerganov/llama.cpp/tree/master/grammars) to constrain generated output to
- `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.md#valid-parameters-and-values) such as `temperature`
- `system`: the system message (overrides what is defined in the `Modelfile`)
- `template`: the prompt template to use (overrides what is defined in the `Modelfile`)
@@ -162,6 +163,21 @@ curl http://localhost:11434/api/generate -d '{
}'
```

#### Request (GBNF mode)

> When `grammar` is set to a [GBNF grammar](https://github.com/ggerganov/llama.cpp/tree/master/grammars), output is constrained to the grammar's rules. Unlike prompt-based approaches, this does not rely on the prompt describing the desired output format.

##### Request

```shell
curl http://localhost:11434/api/generate -d '{
"model": "llama3",
"prompt": "Are llamas amazing?",
"grammar": "root ::= \"yes\" | \"no\"",
"stream": false
}'
```

##### Response
