llama.cpp now supports grammars:
https://til.simonwillison.net/llms/llama-cpp-python-grammars

Is that something that will come to candle?

It sounds like the approach taken in this Python library would be straightforward:
https://github.com/1rgs/jsonformer/blob/main/jsonformer/main.py

Basically, since you know the JSON schema, you can emit the structural tokens (braces, keys, commas) directly from control flow, and only call the model for values, constraining the logits to tokens that are valid for each value's type.

I started to work on this approach in a demo codebase... I'll report back on any progress.

Curious to hear from others about how feasible the approach is.
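To make the control-flow/logit-masking split concrete, here is a minimal sketch in plain Rust. The stubbed `fake_logits` and the byte-level "tokenizer" are assumptions for illustration only; a real implementation would sit on top of candle's tokenizer and logits tensors.

```rust
// Minimal sketch of the jsonformer-style idea: structure comes from
// control flow, values come from constrained sampling. The model and
// tokenizer here are stand-ins, not candle's real API.

fn fake_logits(_prompt: &str, vocab: usize) -> Vec<f32> {
    // Stand-in for a forward pass: uniform logits over the vocabulary.
    vec![0.0; vocab]
}

/// Greedily pick the highest-logit token, considering only the allowed set.
fn sample_constrained(logits: &[f32], allowed: &[usize]) -> usize {
    *allowed
        .iter()
        .max_by(|&&a, &&b| logits[a].partial_cmp(&logits[b]).unwrap())
        .expect("the schema must always allow at least one token")
}

fn main() {
    // Toy vocabulary: one token per byte, so token id == byte value.
    let vocab = 256;
    // Schema: { "age": <number> } — field name and type known up front.
    let mut out = String::new();

    // Structural tokens come straight from the schema (control flow);
    // no model call is needed for them.
    out.push_str("{\"age\": ");

    // Typed value: ask the model, but mask the logits down to digits.
    let digits: Vec<usize> = (b'0'..=b'9').map(|b| b as usize).collect();
    for _ in 0..2 {
        let logits = fake_logits(&out, vocab);
        let tok = sample_constrained(&logits, &digits);
        out.push(tok as u8 as char);
    }

    // Close the object structurally again.
    out.push('}');
    println!("{out}"); // with the uniform stub: {"age": 99}
}
```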
👋 I wrote an implementation of constrained sampling with candle here that might be useful as a reference. A few things I found important:

- Parsing must be incremental if you want reasonable speeds on longer sequences, which makes an FSM a good choice (first sketch below).
- You can accelerate text generation by eagerly sampling the grammar: whenever it forces the next tokens, feed them into the LLM in one batch instead of one token at a time (second sketch below).
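As a sketch of the incremental-parsing point: an FSM only needs its current state to decide which tokens may come next, so each step costs O(token length) rather than a reparse of the whole generated prefix. The character-level tokens and the toy integer grammar below are assumptions for illustration.

```rust
// Incremental grammar checking with an FSM for a JSON-ish integer.
// Only the current state is kept between steps.

#[derive(Clone, Copy, PartialEq)]
enum State {
    Start,  // nothing consumed yet
    Sign,   // consumed a leading '-'
    Digits, // consumed at least one digit (accepting)
    Dead,   // no continuation can ever be valid
}

fn step(s: State, c: char) -> State {
    match (s, c) {
        (State::Start, '-') => State::Sign,
        (State::Start | State::Sign | State::Digits, '0'..='9') => State::Digits,
        _ => State::Dead,
    }
}

/// Advance a copy of the state over a candidate token; used to build the
/// set of token ids whose logits stay unmasked this step.
fn token_is_allowed(s: State, token: &str) -> bool {
    token.chars().fold(s, step) != State::Dead
}

fn main() {
    let mut state = State::Start;
    for tok in ["-", "12", "7"] {
        assert!(token_is_allowed(state, tok));
        state = tok.chars().fold(state, step); // incremental: O(|tok|)
    }
    assert!(!token_is_allowed(state, "abc")); // letters get masked out
}
```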
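And a sketch of the fast-forwarding idea, assuming a hypothetical `forced_next` (the grammar admits exactly one continuation from its current state) and `model_extend` (a stand-in for one batched forward pass): forced tokens are collected and fed to the model together instead of costing one decode step each.

```rust
// Eagerly walk the grammar and batch the forced tokens.

fn forced_next(generated: &str) -> Option<&'static str> {
    // Toy grammar: after a key's closing quote, `": "` is the only legal
    // continuation; a real FSM would answer this from its current state.
    if generated.ends_with("\"age\"") {
        Some(": ")
    } else {
        None
    }
}

fn model_extend(prompt: &mut String, forced: &str) {
    // Stand-in for one batched forward pass over all forced positions,
    // instead of |forced| separate single-token decode steps.
    prompt.push_str(forced);
}

fn main() {
    let mut out = String::from("{\"age\"");
    // Collect every token the grammar forces before touching the model.
    let mut forced = String::new();
    while let Some(tok) = forced_next(&(out.clone() + &forced)) {
        forced.push_str(tok);
    }
    model_extend(&mut out, &forced);
    println!("{out}"); // {"age":  — the value is sampled normally next
}
```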