
llama-cpp-python support #70

Closed
Maximilian-Winter wants to merge 45 commits

Conversation

Maximilian-Winter

I have added llama-cpp-python support. I also created an example notebook on how to use it!

@Maximilian-Winter
Author

@microsoft-github-policy-service agree

@Maximilian-Winter changed the title from "I have added llama-cpp-python support." to "llama-cpp-python support" on May 20, 2023
@alxspiker

Thank you!

@Maximilian-Winter
Author

@alxspiker I found a couple of problems with my implementation and am fixing them right now!

@alxspiker

Any way to support mmap? Seems like it's not supported.

@alxspiker

llama_print_timings:        load time =  4772.70 ms
llama_print_timings:      sample time =     3.01 ms /     1 runs   (    3.01 ms per run)
llama_print_timings: prompt eval time = 11246.46 ms /    23 tokens (  488.98 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 12235.80 ms
Traceback (most recent call last):
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\_program_executor.py", line 94, in run
    await self.visit(self.parse_tree)
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\_program_executor.py", line 428, in visit
    visited_children.append(await self.visit(child, inner_next_node, inner_next_next_node, inner_prev_node, node, parent_node))
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\_program_executor.py", line 428, in visit
    visited_children.append(await self.visit(child, inner_next_node, inner_next_next_node, inner_prev_node, node, parent_node))
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\_program_executor.py", line 395, in visit
    command_output = await command_function(*positional_args, **named_args)
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\library\_select.py", line 158, in select
    option_logprobs = await recursive_select("")
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\library\_select.py", line 131, in recursive_select
    sub_logprobs = await recursive_select(rec_prefix, allow_token_extension=False)
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\library\_select.py", line 131, in recursive_select
    sub_logprobs = await recursive_select(rec_prefix, allow_token_extension=False)
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\library\_select.py", line 131, in recursive_select
    sub_logprobs = await recursive_select(rec_prefix, allow_token_extension=False)
  [Previous line repeated 477 more times]
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\library\_select.py", line 107, in recursive_select
    gen_obj = await parser.llm_session(
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\llms\_llama_cpp.py", line 244, in __call__
    key = self._cache_key(locals())
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\llms\_llm.py", line 76, in _cache_key
    key = self._gen_key(args_dict)
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\llms\_llm.py", line 69, in _gen_key
    return "_---_".join([str(v) for v in ([args_dict[k] for k in var_names] + [self.llm.model_name, self.llm.__class__.__name__, self.llm.cache_version])])
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\llms\_llm.py", line 69, in <listcomp>
    return "_---_".join([str(v) for v in ([args_dict[k] for k in var_names] + [self.llm.model_name, self.llm.__class__.__name__, self.llm.cache_version])])
RecursionError: maximum recursion depth exceeded while getting the repr of an object

Error in program:  maximum recursion depth exceeded while getting the repr of an object
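
For anyone hitting the same trace: the error comes from recursive_select recursing hundreds of times before a cache key is built, which exhausts Python's default recursion limit. A possible stopgap (a sketch only, not a fix for the underlying select/caching code) is to raise the limit before running the program:

import sys

# Stopgap only: allow the deep recursive_select chain above to finish.
# Python's default limit is usually 1000; this does not address the root cause.
sys.setrecursionlimit(10_000)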

@Maximilian-Winter
Author

@alxspiker I have fixed all the errors on my side but couldn't reproduce your error. I also added mmap to the settings!
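
For context, toggling that setting might look roughly like this; the use_mmap attribute name is an assumption that mirrors llama-cpp-python's own use_mmap constructor argument, not something confirmed in this thread:

import guidance

settings = guidance.llms.LlamaCppSettings()
settings.model = "path/to/model"
settings.use_mmap = True  # assumed flag name, mirroring llama-cpp-python's use_mmap
llama = guidance.llms.LlamaCpp(settings=settings)
guidance.llm = llama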

@Maximilian-Winter
Author

@alxspiker At the moment you have to use my fork of llama-cpp-python to use guidance.
You will find the fork here:
https://github.com/Maximilian-Winter/llama-cpp-python

@Mihaiii
Contributor

Mihaiii commented May 20, 2023

Related PR in llama-cpp-python: abetlen/llama-cpp-python#252

It would be awesome to use guidance with llama.cpp! I'm excited :)

@slundberg
Contributor

@Maximilian-Winter this is great, thanks! It will probably be Monday before I can review it properly. Are there any basic unit tests we can add for this? (with small LMs that don't slow down the test process too much) ...might not be possible with LLaMA, but even a file with tests that only run locally would be good so we can make sure this stays working :)

@slundberg
Contributor

(I also just approved the unit tests to run for this)

@Maximilian-Winter
Author

@slundberg I have added a test file in the tests/llms folder called "test_llamacpp.py".
I used the test_transformers file as a template.
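
For readers wondering what a local-only test could look like, here is a rough sketch; the environment variable and assertion are illustrative assumptions, and the actual test_llamacpp.py in this PR may differ:

# tests/llms/test_llamacpp.py -- illustrative sketch of a local-only test
import os

import guidance
import pytest

MODEL_PATH = os.environ.get("LLAMACPP_TEST_MODEL")  # assumed: points to a small local model file

@pytest.mark.skipif(MODEL_PATH is None, reason="no local llama.cpp model configured")
def test_basic_generation():
    settings = guidance.llms.LlamaCppSettings()
    settings.model = MODEL_PATH
    llm = guidance.llms.LlamaCpp(settings=settings)
    program = guidance("Hello, my name is{{gen 'name' max_tokens=5}}", llm=llm)
    out = program()
    assert len(out["name"]) > 0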

@DanielusG

After many attempts I could not get role-based chat to work.

I used this code:

import re
import guidance

# define the model we will use

settings = guidance.llms.LlamaCppSettings()
settings.n_gpu_layers = 14
settings.n_threads = 16
settings.n_ctx = 1024
settings.use_mlock = True
settings.model = "path/to/model"
# Create a LlamaCpp instance and pass the settings to it.
llama = guidance.llms.LlamaCpp(settings=settings)
guidance.llm = llama
def parse_best(prosandcons, options):
    best = int(re.findall(r'Best=(\d+)', prosandcons)[0])
    return options[best]

create_plan = guidance('''
{{#system~}}
You are a helpful assistant.
{{~/system}}

{{! generate five potential ways to accomplish a goal }}
{{#block hidden=True}}
{{#user~}}
I want to {{goal}}.
{{~! generate potential options ~}}
Can you please generate one option for how to accomplish this?
Please make the option very short, at most one line.
{{~/user}}

{{#assistant~}}
{{gen 'options' n=5 temperature=1.0 max_tokens=500}}
{{~/assistant}}
{{/block}}

{{! generate pros and cons for each option and select the best option }}
{{#block hidden=True}}
{{#user~}}
I want to {{goal}}.

Can you please comment on the pros and cons of each of the following options, and then pick the best option?
---{{#each options}}
Option {{@index}}: {{this}}{{/each}}
---
Please discuss each option very briefly (one line for pros, one for cons), and end by saying Best=X, where X is the best option.
{{~/user}}

{{#assistant~}}
{{gen 'prosandcons' temperature=0.0 max_tokens=500}}
{{~/assistant}}
{{/block}}

{{! generate a plan to accomplish the chosen option }}
{{#user~}}
I want to {{goal}}.
{{~! Create a plan }}
Here is my plan:
{{parse_best prosandcons options}}
Please elaborate on this plan, and tell me how to best accomplish it.
{{~/user}}

{{#assistant~}}
{{gen 'plan' max_tokens=500}}
{{~/assistant}}''')
out = create_plan(
    goal='read more books',
    parse_best=parse_best # a custom python function we call in the program
)

@Maximilian-Winter
Author

@slundberg I have implemented proper role_end handling again, and also implemented streaming support.

@Maximilian-Winter
Author

@slundberg I think the best way would be to test just locally. The smallest model right now is a 7B-parameter model, which already takes 3.8 GB of memory.

@slundberg
Contributor

Just a note here, I was still getting some tokenization issues and realized it is going to be hard to maintain so much code that is similar between transformers and llamacpp, so I am going to try and push a proposal to share more code tonight.

@slundberg
Contributor

I pushed a proposal in the form of LlamaCpp2, along with lots of updates to Transformers that are related because we will want to depend on them. I think we need to inherit from the Transformers LLM class because otherwise we duplicate lots of code that is tricky and should only live in one place :)

LlamaCpp2 does not work fully yet, but I am pushing it to see what you think @Maximilian-Winter.

thanks again for all the hard work pushing on this :)
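
Roughly, the shape of the proposal is to subclass the shared Transformers LLM and override only the llama.cpp-specific plumbing; the sketch below is illustrative and not the exact contents of the pushed LlamaCpp2:

# Illustrative sketch: inherit the shared Transformers code path (token healing,
# caching, session handling) and supply only the llama.cpp-specific pieces.
from guidance.llms import Transformers

class LlamaCpp2(Transformers):
    llm_name: str = "llama-cpp"

    def __init__(self, model, tokenizer=None, **kwargs):
        # "model" here would be a llama_cpp.Llama instance wrapped so it exposes
        # the interfaces the Transformers code path expects.
        super().__init__(model, tokenizer=tokenizer, **kwargs)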

@Maximilian-Winter
Author

@slundberg Will take a look later today

@Maximilian-Winter
Author

@slundberg Looks good to me! But I think the tokenizer of llama.cpp is bugged because it refuses to give me the EOS or BOS token.
It is always empty when I try to decode it from the ID!

@slundberg
Contributor

slundberg commented May 31, 2023

@slundberg Looks good to me! But I think the tokenizer of llama.cpp is bugged because it refuses to give me the EOS or BOS token.
It is always empty when I try to decode it from the ID!

Yeah, I think we can just return </s> directly for now. I just pushed a few more fixes. Can I hand this back over to you to wrap up? There is some difference in the way the logprobs are returned that does not quite match how transformers returns them yet, but otherwise I think we are close!

I also noticed that the logit bias processor inside llama-cpp-python seems to save the bias values after the local logits variable is already set:
https://github.com/abetlen/llama-cpp-python/blob/232880cbc677db1998afa240c25e58090f399072/llama_cpp/llama.py#L373-L383
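
A minimal sketch of that EOS fallback, assuming llama-cpp-python's Llama constructor, token_eos() and detokenize() methods; the hard-coded </s> is the stopgap, since llama.cpp's detokenizer returns an empty string for special tokens:

from llama_cpp import Llama  # llama-cpp-python

model_obj = Llama(model_path="path/to/model.bin")  # placeholder path
eos_id = model_obj.token_eos()
eos_text = model_obj.detokenize([eos_id]).decode("utf-8", errors="ignore")
if not eos_text:
    eos_text = "</s>"  # fall back to the literal EOS string for now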

@Maximilian-Winter
Author

@slundberg I will try to make it work later!

        return self.model_obj.detokenize(tokens).decode("utf-8", errors="ignore")  # errors="ignore" is copied from llama-cpp-python

    def convert_ids_to_tokens(self, ids):
        return [self.decode([id]) for id in ids]

return [self.decode(ids)]?
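
For context, the two forms are not equivalent: the original returns one decoded string per id, while the suggested version returns a single string for the whole sequence. A toy illustration with a purely hypothetical decode():

# Toy decode() standing in for the real tokenizer, to show the difference in shape.
def decode(ids):
    vocab = {1: "Hello", 2: " world"}  # hypothetical vocabulary
    return "".join(vocab[i] for i in ids)

ids = [1, 2]
per_token = [decode([i]) for i in ids]  # ["Hello", " world"] -- one entry per id
joined = [decode(ids)]                  # ["Hello world"]     -- a single entry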

@Jchang4

Jchang4 commented Jun 15, 2023

Please merge this, Microsoft, or at least help support it. This would be HUGE for guidance.

@vmajor

vmajor commented Jun 16, 2023

Is there progress with this? Oh I see there is already a message. Yes, there are a few of us spamming refresh on this...

@Maximilian-Winter
Author

Sorry, was very busy with other stuff at work! Will look into this!

@kongjiellx

Any progress?

@Blueoctopusinc

Any updates on this?

@charles-dyfis-net

charles-dyfis-net commented Jul 28, 2023

Hmm. Looks like there's a conflict with 47b1cd4. Trying to use a519012, a merge which brings the former commit into the PR...

>>> import guidance
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/nix/store/0j6zxl1kvqkvhcw0i2chdqxx22xsi5sf-python3-3.10.11-env/lib/python3.10/site-packages/guidance/__init__.py", line 7, in <module>
    from ._program import Program
  File "/nix/store/0j6zxl1kvqkvhcw0i2chdqxx22xsi5sf-python3-3.10.11-env/lib/python3.10/site-packages/guidance/_program.py", line 17, in <module>
    from .llms import _openai
  File "/nix/store/0j6zxl1kvqkvhcw0i2chdqxx22xsi5sf-python3-3.10.11-env/lib/python3.10/site-packages/guidance/llms/__init__.py", line 7, in <module>
    from ._llama_cpp import LlamaCpp
  File "/nix/store/0j6zxl1kvqkvhcw0i2chdqxx22xsi5sf-python3-3.10.11-env/lib/python3.10/site-packages/guidance/llms/_llama_cpp.py", line 17, in <module>
    class LlamaCpp(LLM):
  File "/nix/store/0j6zxl1kvqkvhcw0i2chdqxx22xsi5sf-python3-3.10.11-env/lib/python3.10/site-packages/guidance/llms/_llama_cpp.py", line 21, in LlamaCpp
    cache = LLM._open_cache("_llama_cpp.diskcache")
AttributeError: type object 'LLM' has no attribute '_open_cache'

@Jchang4

Jchang4 commented Jul 29, 2023

Yeah, this needs to be updated. I've tried forking Max's branch and pulling Microsoft's main branch, but there have been a lot of changes since June, so lots of things need tweaking.

@talhalatifkhan

Any updates on this?

@nielsrolf

Any plans on merging this at some point?

@freckletonj

guidance's templating is miles more friendly to use than lmql.

But... guidance, are you still alive?

@akashAD98

Any update on this?

@slundberg
Contributor

@Maximilian-Winter thank you so much for all your hard work on this! Due to some external circumstances over the summer, I couldn't come back to push it over the finish line until this fall (with v0.1). This PR strongly informed the design decisions we made for llama.cpp support in v0.1, though, so it was very useful.

I am closing this now since we have full llama.cpp support in v0.1 :)

@slundberg slundberg closed this Dec 11, 2023