
llama-cpp-python support #70

Closed
Maximilian-Winter wants to merge 45 commits

Conversation

Maximilian-Winter

I have added llama-cpp-python support. I also created an example notebook on how to use it!

@Maximilian-Winter
Author

@microsoft-github-policy-service agree

@Maximilian-Winter changed the title from "I have added llama-cpp-python support." to "llama-cpp-python support" on May 20, 2023
@alxspiker

Thank you!

@Maximilian-Winter
Author

@alxspiker I found a couple of problems with my implementation and am fixing them right now!

@alxspiker

Any way to support mmap? Seems like it's not supported.

@alxspiker

llama_print_timings:        load time =  4772.70 ms
llama_print_timings:      sample time =     3.01 ms /     1 runs   (    3.01 ms per run)
llama_print_timings: prompt eval time = 11246.46 ms /    23 tokens (  488.98 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 12235.80 ms
Traceback (most recent call last):
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\_program_executor.py", line 94, in run
    await self.visit(self.parse_tree)
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\_program_executor.py", line 428, in visit
    visited_children.append(await self.visit(child, inner_next_node, inner_next_next_node, inner_prev_node, node, parent_node))
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\_program_executor.py", line 428, in visit
    visited_children.append(await self.visit(child, inner_next_node, inner_next_next_node, inner_prev_node, node, parent_node))
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\_program_executor.py", line 395, in visit
    command_output = await command_function(*positional_args, **named_args)
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\library\_select.py", line 158, in select
    option_logprobs = await recursive_select("")
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\library\_select.py", line 131, in recursive_select
    sub_logprobs = await recursive_select(rec_prefix, allow_token_extension=False)
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\library\_select.py", line 131, in recursive_select
    sub_logprobs = await recursive_select(rec_prefix, allow_token_extension=False)
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\library\_select.py", line 131, in recursive_select
    sub_logprobs = await recursive_select(rec_prefix, allow_token_extension=False)
  [Previous line repeated 477 more times]
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\library\_select.py", line 107, in recursive_select
    gen_obj = await parser.llm_session(
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\llms\_llama_cpp.py", line 244, in __call__
    key = self._cache_key(locals())
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\llms\_llm.py", line 76, in _cache_key
    key = self._gen_key(args_dict)
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\llms\_llm.py", line 69, in _gen_key
    return "_---_".join([str(v) for v in ([args_dict[k] for k in var_names] + [self.llm.model_name, self.llm.__class__.__name__, self.llm.cache_version])])
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\llms\_llm.py", line 69, in <listcomp>
    return "_---_".join([str(v) for v in ([args_dict[k] for k in var_names] + [self.llm.model_name, self.llm.__class__.__name__, self.llm.cache_version])])
RecursionError: maximum recursion depth exceeded while getting the repr of an object

Error in program:  maximum recursion depth exceeded while getting the repr of an object
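
For anyone hitting the same trace: the error comes from recursive_select recursing hundreds of times before a cache key is built, which exhausts Python's default recursion limit. A possible stopgap (a sketch only, not a fix for the underlying select/caching code) is to raise the limit before running the program:

import sys

# Stopgap only: allow the deep recursive_select chain above to finish.
# Python's default limit is usually 1000; this does not address the root cause.
sys.setrecursionlimit(10_000)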

@Maximilian-Winter
Author

@alxspiker I have fixed all the errors on my side but couldn't reproduce your error. I also added mmap to the settings!
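
For context, toggling that setting might look roughly like this; the use_mmap attribute name is an assumption that mirrors llama-cpp-python's own use_mmap constructor argument, not something confirmed in this thread:

import guidance

settings = guidance.llms.LlamaCppSettings()
settings.model = "path/to/model"
settings.use_mmap = True  # assumed flag name, mirroring llama-cpp-python's use_mmap
llama = guidance.llms.LlamaCpp(settings=settings)
guidance.llm = llama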

@Maximilian-Winter
Author

@alxspiker At the moment you have to use my fork of llama-cpp-python to use guidance.
You will find the fork here:
https://github.com/Maximilian-Winter/llama-cpp-python

@Mihaiii
Contributor

Mihaiii commented May 20, 2023

Related PR in llama-cpp-python: abetlen/llama-cpp-python#252

It would be awesome to use guidance with llama.cpp! I'm excited :)

@slundberg
Contributor

@Maximilian-Winter this is great, thanks! It will probably be Monday before I can review it properly. Are there any basic unit tests we can add for this? (with small LMs that don't slow down the test process too much) ...might not be possible with LLaMA, but even a file with tests that only run locally would be good so we can make sure this stays working :)

@slundberg
Contributor

(I also just approved the unit tests to run for this)

@Maximilian-Winter
Author

@slundberg I have added a test file in the tests/llms folder called "test_llamacpp.py".
I used the test_transformers file as a template.
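
For readers wondering what a local-only test could look like, here is a rough sketch; the environment variable and assertion are illustrative assumptions, and the actual test_llamacpp.py in this PR may differ:

# tests/llms/test_llamacpp.py -- illustrative sketch of a local-only test
import os

import guidance
import pytest

MODEL_PATH = os.environ.get("LLAMACPP_TEST_MODEL")  # assumed: points to a small local model file

@pytest.mark.skipif(MODEL_PATH is None, reason="no local llama.cpp model configured")
def test_basic_generation():
    settings = guidance.llms.LlamaCppSettings()
    settings.model = MODEL_PATH
    llm = guidance.llms.LlamaCpp(settings=settings)
    program = guidance("Hello, my name is{{gen 'name' max_tokens=5}}", llm=llm)
    out = program()
    assert len(out["name"]) > 0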

@DanielusG

After many attempts I could not get role-based chat to work.

I used this code:

import re
import guidance

# define the model we will use

settings = guidance.llms.LlamaCppSettings()
settings.n_gpu_layers = 14
settings.n_threads = 16
settings.n_ctx = 1024
settings.use_mlock = True
settings.model = "path/to/model"
# Create a LlamaCpp instance and pass the settings to it.
llama = guidance.llms.LlamaCpp(settings=settings)
guidance.llm = llama
def parse_best(prosandcons, options):
    best = int(re.findall(r'Best=(\d+)', prosandcons)[0])
    return options[best]

create_plan = guidance('''
{{#system~}}
You are a helpful assistant.
{{~/system}}

{{! generate five potential ways to accomplish a goal }}
{{#block hidden=True}}
{{#user~}}
I want to {{goal}}.
{{~! generate potential options ~}}
Can you please generate one option for how to accomplish this?
Please make the option very short, at most one line.
{{~/user}}

{{#assistant~}}
{{gen 'options' n=5 temperature=1.0 max_tokens=500}}
{{~/assistant}}
{{/block}}

{{! generate pros and cons for each option and select the best option }}
{{#block hidden=True}}
{{#user~}}
I want to {{goal}}.

Can you please comment on the pros and cons of each of the following options, and then pick the best option?
---{{#each options}}
Option {{@index}}: {{this}}{{/each}}
---
Please discuss each option very briefly (one line for pros, one for cons), and end by saying Best=X, where X is the best option.
{{~/user}}

{{#assistant~}}
{{gen 'prosandcons' temperature=0.0 max_tokens=500}}
{{~/assistant}}
{{/block}}

{{! generate a plan to accomplish the chosen option }}
{{#user~}}
I want to {{goal}}.
{{~! Create a plan }}
Here is my plan:
{{parse_best prosandcons options}}
Please elaborate on this plan, and tell me how to best accomplish it.
{{~/user}}

{{#assistant~}}
{{gen 'plan' max_tokens=500}}
{{~/assistant}}''')
out = create_plan(
    goal='read more books',
    parse_best=parse_best # a custom python function we call in the program
)

@Maximilian-Winter
Author

@slundberg I have implemented proper role_end handling again, and also implemented streaming support.

@Maximilian-Winter
Author

@slundberg I think the best way would be to test just locally. The smallest model right now is a 7B-parameter model, which already takes 3.8 GB of memory.

@slundberg
Contributor

Just a note here, I was still getting some tokenization issues and realized it is going to be hard to maintain so much code that is similar between transformers and llamacpp, so I am going to try and push a proposal to share more code tonight.

@slundberg
Contributor

I pushed a proposal in the form of LlamaCpp2, along with lots of updates to Transformers that are related because we will want to depend on them. I think we need to inherit from the Transformers LLM class because otherwise we duplicate lots of code that is tricky and should only live in one place :)

LlamaCpp2 does not work fully yet, but I am pushing it to see what you think @Maximilian-Winter.

thanks again for all the hard work pushing on this :)
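
Roughly, the shape of the proposal is to subclass the shared Transformers LLM and override only the llama.cpp-specific plumbing; the sketch below is illustrative and not the exact contents of the pushed LlamaCpp2:

# Illustrative sketch: inherit the shared Transformers code path (token healing,
# caching, session handling) and supply only the llama.cpp-specific pieces.
from guidance.llms import Transformers

class LlamaCpp2(Transformers):
    llm_name: str = "llama-cpp"

    def __init__(self, model, tokenizer=None, **kwargs):
        # "model" here would be a llama_cpp.Llama instance wrapped so it exposes
        # the interfaces the Transformers code path expects.
        super().__init__(model, tokenizer=tokenizer, **kwargs)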

@Maximilian-Winter
Author

@slundberg Will take a look later today

@Maximilian-Winter
Author

@slundberg Looks good to me! But I think the tokenizer of llama.cpp is bugged because it refuses to give me the EOS or BOS token.
It is always empty when I try to decode it from the ID!

@slundberg
Contributor

slundberg commented May 31, 2023

@slundberg Looks good to me! But I think the tokenizer of llama.cpp is bugged because it refuses to give me the EOS or BOS token.
It is always empty when I try to decode it from the ID!

Yeah, I think we can just return </s> directly for now. I just pushed a few more fixes. Can I hand this back over to you to wrap up? There is some difference in the way the logprobs are returned that does not quite match how transformers returns them yet, but otherwise I think we are close!

I also noticed that the logit bias processor inside llama-cpp-python seems to save the bias values after the local logits variable is already set:
https://github.com/abetlen/llama-cpp-python/blob/232880cbc677db1998afa240c25e58090f399072/llama_cpp/llama.py#L373-L383
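
A minimal sketch of that EOS fallback, assuming llama-cpp-python's Llama constructor, token_eos() and detokenize() methods; the hard-coded </s> is the stopgap, since llama.cpp's detokenizer returns an empty string for special tokens:

from llama_cpp import Llama  # llama-cpp-python

model_obj = Llama(model_path="path/to/model.bin")  # placeholder path
eos_id = model_obj.token_eos()
eos_text = model_obj.detokenize([eos_id]).decode("utf-8", errors="ignore")
if not eos_text:
    eos_text = "</s>"  # fall back to the literal EOS string for now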

@Maximilian-Winter
Author

@slundberg I will try to make it work later!

        return self.model_obj.detokenize(tokens).decode("utf-8", errors="ignore")  # errors="ignore" is copied from llama-cpp-python

    def convert_ids_to_tokens(self, ids):
        return [self.decode([id]) for id in ids]

return [self.decode(ids)]?
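
For context, the two forms are not equivalent: the original returns one decoded string per id, while the suggested version returns a single string for the whole sequence. A toy illustration with a purely hypothetical decode():

# Toy decode() standing in for the real tokenizer, to show the difference in shape.
def decode(ids):
    vocab = {1: "Hello", 2: " world"}  # hypothetical vocabulary
    return "".join(vocab[i] for i in ids)

ids = [1, 2]
per_token = [decode([i]) for i in ids]  # ["Hello", " world"] -- one entry per id
joined = [decode(ids)]                  # ["Hello world"]     -- a single entry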

@Jchang4

Jchang4 commented Jun 15, 2023

Please merge this, Microsoft, or at least help support it. This would be HUGE for guidance.

@vmajor

vmajor commented Jun 16, 2023

Is there progress with this? Oh I see there is already a message. Yes, there are a few of us spamming refresh on this...

@Maximilian-Winter
Author

Sorry, was very busy with other stuff at work! Will look into this!

@kongjiellx

Any progress?

@Blueoctopusinc

Any updates on this?

@charles-dyfis-net

charles-dyfis-net commented Jul 28, 2023

Hmm. Looks like there's a conflict with 47b1cd4. Trying to use a519012, a merge which brings the former commit into the PR...

>>> import guidance
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/nix/store/0j6zxl1kvqkvhcw0i2chdqxx22xsi5sf-python3-3.10.11-env/lib/python3.10/site-packages/guidance/__init__.py", line 7, in <module>
    from ._program import Program
  File "/nix/store/0j6zxl1kvqkvhcw0i2chdqxx22xsi5sf-python3-3.10.11-env/lib/python3.10/site-packages/guidance/_program.py", line 17, in <module>
    from .llms import _openai
  File "/nix/store/0j6zxl1kvqkvhcw0i2chdqxx22xsi5sf-python3-3.10.11-env/lib/python3.10/site-packages/guidance/llms/__init__.py", line 7, in <module>
    from ._llama_cpp import LlamaCpp
  File "/nix/store/0j6zxl1kvqkvhcw0i2chdqxx22xsi5sf-python3-3.10.11-env/lib/python3.10/site-packages/guidance/llms/_llama_cpp.py", line 17, in <module>
    class LlamaCpp(LLM):
  File "/nix/store/0j6zxl1kvqkvhcw0i2chdqxx22xsi5sf-python3-3.10.11-env/lib/python3.10/site-packages/guidance/llms/_llama_cpp.py", line 21, in LlamaCpp
    cache = LLM._open_cache("_llama_cpp.diskcache")
AttributeError: type object 'LLM' has no attribute '_open_cache'

@Jchang4

Jchang4 commented Jul 29, 2023

Yeah, this needs to be updated. I've tried forking Max's branch and pulling Microsoft's main branch, but there have been a lot of changes since June, so lots of things need tweaking.

@talhalatifkhan

Any updates on this?

@nielsrolf

Any plans on merging this at some point?

@freckletonj

guidance's templating is miles more friendly to use than lmql.

But... guidance, are you still alive?

@akashAD98

Any update on this?

@slundberg
Contributor

@Maximilian-Winter thank you so much for all your hard work on this! Due to some external circumstances over the summer, I couldn't come back to push it over the finish line until this fall (with v0.1). This PR strongly informed the design decisions we made for llama.cpp support in v0.1, though, so it was very useful.

I am closing this now since we have full llama.cpp support in v0.1 :)

@slundberg slundberg closed this Dec 11, 2023