
Improved json grammar #3785

Closed
wants to merge 4 commits

Conversation

hughescr
Contributor

This PR includes #3782 #3783 #3784 but also completely removes all whitespace generation when applying format=json.

Whitespace is completely unnecessary in generated JSON - the spec allows it, but nothing requires us to emit it when we're generating rather than parsing. Since every token costs time, money, and CO2, removing all whitespace is highly recommended.

It will also fix a buglet where models could get "stuck" generating infinite whitespace, since the definition of ws places no limit on how long it can be.
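
For illustration, here is roughly what a whitespace-free grammar looks like. This is a minimal sketch that assumes ollama's format=json constraint is a llama.cpp-style GBNF grammar (like the json.gbnf that ships with llama.cpp); rule names and character classes here are illustrative, not the exact contents of this patch:

# Sketch only: an assumed whitespace-free JSON grammar, not the literal diff in this PR.
root   ::= object
value  ::= object | array | string | number | "true" | "false" | "null"
object ::= "{" ( string ":" value ("," string ":" value)* )? "}"
array  ::= "[" ( value ("," value)* )? "]"
string ::= "\"" ( [^"\\\x00-\x1F] | "\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F]) )* "\""
number ::= "-"? ("0" | [1-9] [0-9]*) ("." [0-9]+)? ([eE] [-+]? [0-9]+)?

With no ws rule anywhere, the sampler can never wander into emitting spaces, tabs, or newlines, so the "infinite whitespace" failure mode is ruled out by construction.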

@not-nullptr

could we please have a contributor review this PR? json is currently unusable with the llama3 family of models, and i think this PR should fix it

@hughescr
Contributor Author

hughescr commented May 5, 2024

I think #3782 and #3784 are the bugfixes needed to fix llama3 (or any other model that likes generating bad JSON if unconstrained) -- one prevents output of malformed JSON strings (with unescaped characters in the middle of the strings); the other prevents endless whitespace at the end of the generated JSON. #3783 is more of a feature patch, allowing more flexibility in the JSON which is generated, so that you don't always require an object wrapper at the top level of the JSON - you can just have bare literal strings/numbers/arrays. That's sometimes good, but sometimes bad; I've actually seen llama3 generate worse JSON with #3783 than without it, because it'll do something like:

"{\"key\":\"value\"}"

i.e. it starts off with a literal string, then realizes it wants to emit an object, so you end up with a string that contains JSON...

I didn't include the whitespace change here as a separate PR, but it's a good performance boost: removing interior whitespace from the JSON means the LLM skips emitting all the whitespace characters that aren't needed.

My recommendation to whoever's deciding here would be to either accept this PR as-is (i.e. all the changes), or accept it and then back out #3783 if you think allowing non-objects at the top level of the JSON is too much.

@coder543

coder543 commented May 5, 2024

Personally, I think it is better to allow interior whitespace, but limit it to up to one new line and up to a small number of spaces between fields. Papers like “let’s think dot by dot” show that every token of output can help the models perform better, so if the models “want” more “time” to think, a little flexibility for the “comfort” of the model isn’t a terrible thing.

On the other hand, whitespace after the emitted JSON has already become syntactically valid is a total waste of time.

Regarding excessive flexibility on the top level type causing issues, I agree with that as well. I wish we could just supply either a custom grammar or a JSON Schema or something, but since we can’t… I wish we could at least specify whether we want an object or an array as the top level type. I do not see any possible reason to allow a string or a null as the top level type, but I could see some argument for a number or a Boolean. In the absence of being allowed to specify a top level type at all, I’d choose either to allow only an object, or to allow an object or an array.
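
For what it's worth, here is a sketch of that middle ground, again assuming a llama.cpp-style GBNF grammar (rule names and the exact whitespace budget are illustrative): bound interior whitespace to at most one newline plus a few spaces, and restrict the top level to an object or an array.

# Sketch only: bounded interior whitespace plus a restricted top-level type.
root   ::= object | array
ws     ::= "\n"? " "? " "? " "? " "?
object ::= "{" ws ( string ":" ws value ("," ws string ":" ws value)* )? ws "}"
array  ::= "[" ws ( value ("," ws value)* )? ws "]"
# string, number, and value as in the whitespace-free sketch earlier in the thread

Because neither root alternative ends in ws, there is still nowhere for trailing whitespace to go once the closing brace or bracket has been emitted.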

@coder543

coder543 commented May 5, 2024

I also wish we could get a maintainer to look at these changes… it has been a while. @jmorganca ?

@joliss

joliss commented May 16, 2024

It might be worth benchmarking whether or not the response quality suffers if you disallow whitespace.

It's conceivable that models might benefit from whitespace, either because the indentation helps them keep track of where they are in the object hierarchy, or because of the "let's think dot by dot" result that @coder543 mentions [1]. If that's the case, then perhaps there should be an option to toggle whitespace in the output.

But it's also conceivable that it makes little to no difference, and then we might as well default to never generating whitespace, to save on tokens.

I'll be away from my development machine for a while, but if I have anything to report in the future, I'll leave a comment here.

[1] Update: Reading the "Let's Think Dot by Dot" paper, their method requires bespoke model finetuning and doesn't work on existing models. So I don't think this is a concern here.

@not-nullptr

my reply isn't gonna be very constructive, but does it matter if quality is a tad worse? for any sort of real-time production use case, json mode is unusable. waiting 10+ seconds for a response because the LLM keeps outputting whitespace after the answer is not a feature, it's a bug. it would usually freeze, but there's a mechanism in ollama that cuts off generation if a character is repeated too many times (which also breaks the ollama JS library, since it throws an error if done is false on a non-streamed request).

@joliss

joliss commented May 16, 2024

I agree, and to be clear, my comment was only about the part where we remove all whitespace generation. Quoting the OP:

This PR includes #3782 #3783 #3784 but also completely removes all whitespace generation when applying format=json.

The changes in #3782 #3783 #3784 seem like unambiguous improvements, and I think they're all worth merging.

@not-nullptr

removing all whitespace seems like an unambiguous improvement too. i don't see why generation quality would be hindered, and every token is more CO2 (like OP said)

@coder543

@not-nullptr because of the Let’s Think Dot by Dot paper I linked? And in my previous reply, I was extremely clear that whitespace after the end of the valid JSON is a “total waste of time.” I’m not suggesting to keep that, at all. Why would anyone?

Regardless, I’d rather have these changes as-is than keep the current (broken) state of things, even if I believe forcibly removing all interior whitespace goes against the currently understood research on output quality.

@richardanaya

I'd offer that with the ability to override the grammar on API calls, as in PR #4525, we wouldn't have to debate which grammar to use for JSON.

@hughescr
Contributor Author

> It might be worth benchmarking whether or not the response quality suffers if you disallow whitespace.
>
> It's conceivable that models might benefit from whitespace, either because the indentation helps them keep track of where they are in the object hierarchy, or because of the "let's think dot by dot" result that @coder543 mentions [1]. If that's the case, then perhaps there should be an option to toggle whitespace in the output.
>
> But it's also conceivable that it makes little to no difference, and then we might as well default to never generating whitespace, to save on tokens.
>
> I'll be away from my development machine for a while, but if I have anything to report in the future, I'll leave a comment here.
>
> [1] Update: Reading the "Let's Think Dot by Dot" paper, their method requires bespoke model finetuning and doesn't work on existing models. So I don't think this is a concern here.

I have actually noticed that Phi-3 generates garbage JSON unless it's allowed to first output a single space and then begin the {... I've tried adjusting prompts to put the leading space in the reply and hope for continuation, but it just outputs another space and then the JSON. Constraining the output to not include that initial space basically completely cripples Phi-3.
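
A minimal sketch of the workaround described above, again assuming a llama.cpp-style GBNF grammar: permit a single optional space before the root object and nothing else, so a model like Phi-3 gets its leading space without reopening the door to unbounded whitespace.

# Sketch only: one optional leading space; everything below root stays whitespace-free.
root ::= " "? object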

@hughescr
Contributor Author

> my reply isn't gonna be very constructive, but does it matter if quality is a tad worse? for any sort of real-time production use case, json mode is unusable. waiting 10+ seconds for a response because the LLM keeps outputting whitespace after the answer is not a feature, it's a bug. it would usually freeze, but there's a mechanism in ollama that cuts off generation if a character is repeated too many times (which also breaks the ollama JS library, since it throws an error if done is false on a non-streamed request).

Trailing whitespace in particular ought to always be truncatable. There's NO REASON ever to generate it: the rest has already been generated, so JSON quality can't drop from removing trailing whitespace. As mentioned above, though, some models definitely suffer when leading or interior whitespace is forced out; how much will vary from model to model.

@coder543

Also, I just saw the edit from @joliss, but my conclusion was that the paper’s findings were broadly applicable. My reading was that they fine-tuned the final layer for a particular task to make the results clearer. But I could always be wrong.

To quote the paper’s abstract summary:

In summary, our results show that additional tokens can provide computational benefits independent of token choice. The fact that intermediate tokens can act as filler tokens raises concerns about large language models engaging in unauditable, hidden computations that are increasingly detached from the observed chain-of-thought tokens.

@hughescr
Contributor Author

Phi3 is also super sensitive to interior whitespace, I guess. These runs were with a JSON grammar that allows ws before the root object but removes all other whitespace (like this PR, #3785, but allowing the initial space that Phi3 requires). As can be seen, this completely breaks the LLM, which just stops generating very early. When we don't force that tight JSON grammar, the model does just fine on its own (with a single trailing '\n'):

[craig@Megamind:~]$ ollama run phi3:medium-128k-instruct-q8_0 'Please return a JSON object representing an address book entry for some invented person' --format=json                                                                              [√][!21742]
 {"name": {"firstName":


[craig@Megamind:~]$ ollama run phi3:medium-128k-instruct-q8_0 'Please return a JSON object representing an address book entry for some invented person'                                                                                            [√][!21743]
 {
  "name": "Jane Doe",
  "email": "jane.doe@example.com",
  "phone_number": "+1-555-0123456",
  "address": {
    "street": "123 Main St.",
    "city": "Metropolis",
    "state": "NY",
    "postal_code": "10001"
  },
  "notes": "Close friend and colleague. Enjoys hiking and photography."
}

...so probably this patch is not a good one in general, but #3782 and #3784 definitely are unambiguous improvements. With #3783 I've had occasional issues: some LLMs seem to like generating stringified JSON (i.e. JSON inside a string) with it, so if what you want is objects, you might often not get them with that patch, depending on the LLM.

I'm closing this PR since it doesn't seem ideal after using it with a variety of models for a while now. I'm leaving the others open, and I'll add some notes to #3783.

hughescr closed this May 28, 2024