
feat: grammar-based sampling in llama-cpp #9712

Merged: 13 commits into langchain-ai:master on Aug 28, 2023

Conversation

@eryk-dsai (Contributor) commented Aug 24, 2023

Description

This PR enables grammar-based sampling in the llama-cpp LLM.

In short, loading a file with a formal grammar definition will constrain model outputs. For instance, one can force the model to generate valid JSON or only Python lists.
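For example, here is a minimal sketch of the new API (the model and grammar paths below are placeholders): once a .gbnf file is passed via grammar_path, every completion is constrained by that grammar.

from langchain.llms import LlamaCpp

# Illustrative paths; point these at your own model and grammar files.
llm = LlamaCpp(
    model_path="/path/to/llama-2-13b-chat.bin",
    grammar_path="/path/to/json.gbnf",  # formal grammar in GBNF format
)
result = llm("Describe a person in JSON format:")  # output is forced to be valid JSON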

In a follow-up PR we will add:

  • docs with a description of why it is useful and how it works
  • possibly a code sample for a concrete task, such as those in the llama.cpp repo


@dosubot added labels Ɑ: models and 🤖:enhancement on Aug 24, 2023
@rlancemartin (Collaborator) left a comment

We should add some usage examples to the notebook:
https://python.langchain.com/docs/integrations/llms/llamacpp

Edit: oh, I see you want to add that in a follow-up PR? If it's easy, maybe just consolidate here?

A collaborator left a comment

@baskaryan feel free to weigh in here, but I think we should make this a classmethod or handle it in the init:

from_grammar_path()

It's not great to hit the file system every time we call the model, and I'd assume people may want to provide the grammar directly as a string rather than exclusively via a file.
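A rough sketch of the suggested API (the classmethod names and internals here are hypothetical, not the merged implementation):

from llama_cpp import LlamaGrammar

class LlamaCpp:
    """Sketch only; the real class lives in langchain.llms."""

    def __init__(self, grammar=None, **kwargs):
        self.grammar = grammar  # parsed grammar, reused across calls

    @classmethod
    def from_grammar_path(cls, grammar_path: str, **kwargs):
        # Parse the .gbnf file once at construction time,
        # instead of hitting the file system on every call.
        return cls(grammar=LlamaGrammar.from_file(grammar_path), **kwargs)

    @classmethod
    def from_grammar_string(cls, grammar: str, **kwargs):
        # Allow supplying the grammar directly as a string.
        return cls(grammar=LlamaGrammar.from_string(grammar), **kwargs)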

@rlancemartin (Collaborator) commented Aug 24, 2023

I'm testing now:

  • Using grammars here

Example init:

# Imports per the llama.cpp integration notebook.
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.llms import LlamaCpp

callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

n_gpu_layers = 1  # Metal set to 1 is enough.
n_batch = 512  # Should be between 1 and n_ctx, consider the amount of RAM of your Apple Silicon Chip.

llm = LlamaCpp(
    model_path="/Users/rlm/Desktop/Code/llama.cpp/llama-2-13b-chat.ggmlv3.q4_0.bin",
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    f16_kv=True,  # MUST set to True, otherwise you will run into problems after a couple of calls
    callback_manager=callback_manager,
    verbose=True,
    grammar_path="/Users/rlm/Desktop/json.gbnf",
)

Result:

question = "Request: schedule a call at 8pm; Command:"
llm(question)
root ::= object 
object ::= [{] ws object_11 [}] ws 
value ::= object | array | string | number | value_6 ws 
array ::= [[] ws array_15 []] ws 
string ::= ["] string_18 ["] ws 
number ::= number_19 number_25 number_29 ws 
value_6 ::= [t] [r] [u] [e] | [f] [a] [l] [s] [e] | [n] [u] [l] [l] 
ws ::= ws_31 
object_8 ::= string [:] ws value object_10 
object_9 ::= [,] ws string [:] ws value 
object_10 ::= object_9 object_10 | 
object_11 ::= object_8 | 
array_12 ::= value array_14 
array_13 ::= [,] ws value 
array_14 ::= array_13 array_14 | 
array_15 ::= array_12 | 
string_16 ::= [^"\] | [\] string_17 
string_17 ::= ["\/bfnrt] | [u] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] 
string_18 ::= string_16 string_18 | 
number_19 ::= number_20 number_21 
number_20 ::= [-] | 
number_21 ::= [0-9] | [1-9] number_22 
number_22 ::= [0-9] number_22 | 
number_23 ::= [.] number_24 
number_24 ::= [0-9] number_24 | [0-9] 
number_25 ::= number_23 | 
number_26 ::= [eE] number_27 number_28 
number_27 ::= [-+] | 
number_28 ::= [0-9] number_28 | [0-9] 
number_29 ::= number_26 | 
ws_30 ::= [ <U+0009><U+000A>] ws 
ws_31 ::= ws_30 | 
{ "scheduleCall": { "date": "2018-09-25", "time": "20:00" } }

But many newlines are appended after the JSON, plausibly because the grammar's ws rule permits unbounded trailing whitespace (including newlines):

'{"schedule": {"date": "2018-09-14T20:00:00.000Z", "duration": 60}}\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n'

@rlancemartin (Collaborator) commented Aug 24, 2023

Possible bug/issue: generations contain excessive newlines.

Also seen in the LangSmith trace.

It seems specific to when I supply the grammar_path.

Separately, the Ollama folks report similar behavior w/ the Code Llama model @jmorganca.

Possibly a common issue w/ incorrect LLaMA prompting in both cases?

@rlancemartin (Collaborator) commented

This is fixed if we manually supply a STOP token in the prompt. See here.

@rlancemartin (Collaborator) commented

Full demo:

n_gpu_layers = 1 
n_batch = 512 
llm = LlamaCpp(
    model_path="/Users/rlm/Desktop/Code/llama.cpp/llama-2-13b-chat.ggmlv3.q4_0.bin",
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    f16_kv=True,  # MUST set to True, otherwise you will run into problems after a couple of calls
    callback_manager=callback_manager,
    verbose=True,
    grammar_path="/Users/rlm/Desktop/json.gbnf",
    stop=["STOP"]
)

Run w/ STOP token specified:

template = """Print 'STOP' when you are finished answering the question. Question: {question}"""
prompt = PromptTemplate(template=template, input_variables=["question"])
llm_chain = LLMChain(prompt=prompt, llm=llm)
question = "Request: schedule a call at 8pm; Command:"
llm_chain.run(question)

Result:

'{"type": "request", "message": "Hello! I would like to request your availability for a call tonight at 8pm. Would you be available?", "tones": [{"id": "polite", "name": "Polite"}]}'

@rlancemartin (Collaborator) commented

@eryk-dsai overall, this is promising but quite finicky/tricky.

We will definitely need a good, crisp prompting guide.

In addition, we should check in some hardened .gbnf files that "just work."

I started here, but there is still work to be done since reliability is not there yet.

In particular, the default .gbnf files that ship with llama.cpp (here) spew excessive newlines (at least the JSON one did).

I tried to fix that in the linked PR.

@eryk-dsai (Contributor, Author) commented Aug 25, 2023

Hi @rlancemartin,

I played with the list grammar a little more and I think I've got something that works reasonably well for lists consisting of multi-word strings. Obviously Python lists can be much more diverse: empty lists, lists with mixed types, lists of lists... but I think the example list and json grammars we've provided should be enough.

Feel free to update the model and grammar path in the list example if you want.
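For reference, a grammar along these lines handles such lists (a hedged GBNF sketch, not necessarily the exact list.gbnf checked in here):

root ::= "[" ws item ("," ws item)* ws "]"
item ::= "\"" word (" " word)* "\""   # quoted, possibly multi-word string
word ::= [a-zA-Z0-9]+
ws   ::= [ ]*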

@rlancemartin (Collaborator) commented


Thanks for adding! I'll have a look now.

@rlancemartin (Collaborator) commented


Pretty cool! Just tested w/ -

llama.cpp/models/openorca-platypus2-13b.gguf.q4_0.bin

Note: the gguf format is needed for llama-cpp-python v0.1.79.

Converted my model (openorca-platypus2-13b) via -

llama.cpp % python ./convert-llama-ggmlv3-to-gguf.py --eps 1e-5 --input models/openorca-platypus2-13b.ggmlv3.q4_0.bin --output models/openorca-platypus2-13b.gguf.q4_0.bin

Init -

llm = LlamaCpp(
    model_path="/Users/rlm/Desktop/Code/llama.cpp/models/openorca-platypus2-13b.gguf.q4_0.bin",
    n_gpu_layers=1,
    n_batch=512,
    f16_kv=True, 
    callback_manager=callback_manager,
    verbose=True,
    grammar_path="/Users/rlm/Desktop/Code/langchain-main/langchain/libs/langchain/langchain/llms/grammars/json.gbnf",
)

Ran llm("Describe a person in JSON format:") -

{
  "name": "John Doe",
  "age": 30,
  "": "Engineer",
  "country": {
    "name": "United States"
  },
  "likes": [
    "Sports",
    "Music",
    "Movies"
  ]}

Lists still need some work (at least for me).

result=llm("List of top-3 my favourite books:")
{"data":[{"book_name":"The Alchemist","author":"Paul Coelho"},{"book_name":"The Secret","author":"Rhonda Byrne"},{"book_name":"Think and Grow Rich","author":"Napoleon Hill"}]}

@eryk-dsai (Contributor, Author) commented

@rlancemartin thank you for mentioning gguf.

Your most recent output appears to be correct JSON. Is it possible that you used the path json.gbnf rather than list.gbnf? I tested the list grammar again locally, and it always produced a correct Python list.

@rlancemartin (Collaborator) commented


I was able to reproduce! Nice. Going to merge this.

@rlancemartin (Collaborator) left a comment

I can reproduce this:

# Path to LLaMA
llm_path = "/Users/rlm/Desktop/Code/llama.cpp/models/openorca-platypus2-13b.gguf.q4_0.bin"
# Path to Langchain repo
langchain_path = "/Users/rlm/Desktop/Code/langchain-main"

JSON -

n_gpu_layers = 1
n_batch = 512
llm = LlamaCpp(
    model_path=llm_path,
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    f16_kv=True,  # MUST set to True, otherwise you will run into problem after a couple of calls
    callback_manager=callback_manager,
    verbose=True,
    grammar_path=langchain_path+"/langchain/libs/langchain/langchain/llms/grammars/json.gbnf",
)
result=llm("Describe a person in JSON format:")

Result -

{"name":"John Smith", "age":32, "":"Software Engineer"}

Results are also as expected using the list grammar.

@rlancemartin merged commit 7f5713b into langchain-ai:master on Aug 28, 2023. 27 checks passed.