llama-cpp-python support #70
Conversation
@microsoft-github-policy-service agree
Thank you!
@alxspiker I found a couple of problems with my implementation and am fixing them right now!
Any way to support mmap? Seems like it's not supported.
llama_print_timings: load time = 4772.70 ms
llama_print_timings: sample time = 3.01 ms / 1 runs ( 3.01 ms per run)
llama_print_timings: prompt eval time = 11246.46 ms / 23 tokens ( 488.98 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 12235.80 ms
Traceback (most recent call last):
File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\_program_executor.py", line 94, in run
await self.visit(self.parse_tree)
File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\_program_executor.py", line 428, in visit
visited_children.append(await self.visit(child, inner_next_node, inner_next_next_node, inner_prev_node, node, parent_node))
File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\_program_executor.py", line 428, in visit
visited_children.append(await self.visit(child, inner_next_node, inner_next_next_node, inner_prev_node, node, parent_node))
File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\_program_executor.py", line 395, in visit
command_output = await command_function(*positional_args, **named_args)
File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\library\_select.py", line 158, in select
option_logprobs = await recursive_select("")
File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\library\_select.py", line 131, in recursive_select
sub_logprobs = await recursive_select(rec_prefix, allow_token_extension=False)
File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\library\_select.py", line 131, in recursive_select
sub_logprobs = await recursive_select(rec_prefix, allow_token_extension=False)
File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\library\_select.py", line 131, in recursive_select
sub_logprobs = await recursive_select(rec_prefix, allow_token_extension=False)
[Previous line repeated 477 more times]
File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\library\_select.py", line 107, in recursive_select
gen_obj = await parser.llm_session(
File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\llms\_llama_cpp.py", line 244, in __call__
key = self._cache_key(locals())
File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\llms\_llm.py", line 76, in _cache_key
key = self._gen_key(args_dict)
File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\llms\_llm.py", line 69, in _gen_key
return "_---_".join([str(v) for v in ([args_dict[k] for k in var_names] + [self.llm.model_name, self.llm.__class__.__name__, self.llm.cache_version])])
File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\llms\_llm.py", line 69, in <listcomp>
return "_---_".join([str(v) for v in ([args_dict[k] for k in var_names] + [self.llm.model_name, self.llm.__class__.__name__, self.llm.cache_version])])
RecursionError: maximum recursion depth exceeded while getting the repr of an object
Error in program: maximum recursion depth exceeded while getting the repr of an object
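As an aside (not from the original thread): since the failure comes from recursive_select nesting roughly 480 calls deep, a common stopgap while the underlying recursion is addressed is to raise CPython's recursion limit before running the program; a minimal sketch:

import sys

# CPython defaults to a limit of 1000 frames; the traceback above shows
# recursive_select recursing ~480 times, with several real frames per call.
# Raising the limit only works around the symptom, not the cause.
sys.setrecursionlimit(10000)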
@alxspiker I have fixed all the errors on my side; I couldn't reproduce your error, but I added mmap to the settings!
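For reference (my addition, not from the thread): llama-cpp-python exposes memory-mapped loading through the use_mmap flag on the Llama constructor, so wiring it into the settings looks roughly like this sketch (the model path is a placeholder):

from llama_cpp import Llama

# use_mmap=True memory-maps the weights file instead of reading it fully
# into RAM up front, which usually cuts load time substantially.
llm = Llama(model_path="./models/7B/ggml-model.bin", use_mmap=True)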
@alxspiker At the moment you have to use my fork of llama-cpp-python to use guidance.
Related PR in llama-cpp-python: abetlen/llama-cpp-python#252 It would be awesome to use guidance with llama.cpp! I'm excited :)
@Maximilian-Winter this is great, thanks! It will probably be Monday before I can review it properly. Are there any basic unit tests we can add for this? (with small LMs that don't slow down the test process too much) ...might not be possible with LLaMA, but even a file with tests that only run locally would be good so we can make sure this stays working :)
(I also just approved the unit tests to run for this)
@slundberg I have added a test file in the tests/llms folder called "test_llamacpp.py".
After many attempts I could not get the role chat to work. I've used this code:
@slundberg I have implemented proper role_end again, and also implemented streaming support.
@slundberg I think the best way would be to test just locally. The smallest model right now is a 7B parameter model, which is already 3.8 GB of memory.
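A local-only test can be guarded with a skip condition so CI without the weights simply passes over it; a sketch (the class name guidance.llms.LlamaCpp is inferred from this PR's guidance/llms/_llama_cpp.py path, and the model path is a placeholder):

import os
import pytest

# Hypothetical location of the local 7B GGML weights.
MODEL_PATH = os.environ.get("LLAMA_MODEL_PATH", "models/7B/ggml-model.bin")

@pytest.mark.skipif(not os.path.exists(MODEL_PATH),
                    reason="llama.cpp weights not available locally")
def test_llamacpp_gen():
    import guidance
    llm = guidance.llms.LlamaCpp(MODEL_PATH)  # class name assumed from this PR
    program = guidance("1 + 1 = {{gen 'answer' max_tokens=2}}", llm=llm)
    out = program()
    assert len(out["answer"]) > 0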
Just a note here, I was still getting some tokenization issues and realized it is going to be hard to maintain so much code that is similar between transformers and llamacpp, so I am going to try and push a proposal to share more code tonight.
I pushed a proposal in the form of LlamaCpp2, along with lots of updates to Transformers that are related because we will want to depend on them. I think we need to inherit from the Transformers LLM class because otherwise we duplicate lots of code that is tricky and should only live in one place :) LlamaCpp2 does not work fully yet, but I am pushing it to see what you think @Maximilian-Winter. thanks again for all the hard work pushing on this :)
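To illustrate the shape of the proposal (my sketch, not the actual diff; the wiring details are elided), the point is that only the backend-specific pieces get overridden while the tricky token and caching logic stays in the Transformers base class:

import guidance

class LlamaCpp2(guidance.llms.Transformers):
    # Hypothetical sketch: inherit the shared machinery from the Transformers
    # LLM class and swap in a llama.cpp backend. The real class would also
    # adapt the tokenizer and call super().__init__ appropriately.
    def __init__(self, model_path, **llama_kwargs):
        from llama_cpp import Llama
        self.model_obj = Llama(model_path=model_path, **llama_kwargs)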
@slundberg Will take a look later today
@slundberg Looks good to me! But I think the tokenizer of llama.cpp is bugged because it refuses to give me the eos or bos token.
Yeah, I think we can just return … I also noticed that the logit bias processor inside llama-cpp-python seems to save the bias values after the local …
@slundberg I will try to make it work later!
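For context (my addition): llama-cpp-python does report the special token ids via Llama.token_bos() and Llama.token_eos(), but detokenize() renders them as empty bytes, which is likely what "refuses to give me the eos or bos token" refers to. A workaround sketch, assuming the standard LLaMA vocabulary where BOS is "<s>" and EOS is "</s>":

def decode_with_specials(model_obj, tokens):
    # detokenize() returns b"" for BOS/EOS, so substitute literal strings.
    specials = {model_obj.token_bos(): "<s>", model_obj.token_eos(): "</s>"}
    pieces = []
    for t in tokens:
        if t in specials:
            pieces.append(specials[t])
        else:
            pieces.append(model_obj.detokenize([t]).decode("utf-8", errors="ignore"))
    return "".join(pieces)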
return self.model_obj.detokenize(tokens).decode("utf-8", errors="ignore")  # errors="ignore" is copied from llama-cpp-python

def convert_ids_to_tokens(self, ids):
    return [self.decode([id]) for id in ids]
return [self.decode(ids)]
?
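The difference matters for multi-byte UTF-8 characters that llama.cpp splits across tokens: decoding each id separately with errors="ignore" silently drops the partial bytes, while decoding the concatenated bytes keeps them. A small self-contained illustration (my example, with hand-picked byte pieces standing in for token outputs):

# Two byte pieces that together encode one character, "€" (U+20AC).
pieces = [b"\xe2\x82", b"\xac"]

per_token = "".join(p.decode("utf-8", errors="ignore") for p in pieces)
whole = b"".join(pieces).decode("utf-8", errors="ignore")

print(repr(per_token))  # '' - each half is invalid alone and gets dropped
print(repr(whole))      # '€'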
Please merge this, Microsoft, or at least help support it. This would be HUGE for guidance.
Is there progress with this? Oh, I see there is already a message. Yes, there are a few of us spamming refresh on this...
Sorry, was very busy with other stuff at work! Will look into this!
Any progress?
Any updates on this?
Hmm. Looks like there's a conflict with 47b1cd4. Trying to use a519012, a merge which brings the former commit into the PR...
Yeah, this needs to be updated. I've tried forking Max's repo and git-pulling Microsoft's main branch, but there have been a lot of changes since June, so lots of things need tweaking.
Any updates on this?
Any plans on merging this at some point?
But...
Any update on this???
@Maximilian-Winter thank you so much for all your hard work on this! Due to some external circumstances over the summer I couldn't come back to push it over the finish line until this fall (with v0.1). This PR strongly informed the design decisions we made for llama.cpp support in v0.1, so it was very useful. I am closing this now since we now have full llama.cpp support in v0.1 :)
I have added llama-cpp-python support. I also created an example notebook on how to use it!
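For readers arriving now: PR-era usage looked roughly like the sketch below (the class name is inferred from this PR's guidance/llms/_llama_cpp.py path; the model path and template are placeholders). In guidance v0.1+, llama.cpp support ships as guidance.models.LlamaCpp instead.

import guidance

# Sketch of the v0.0.x-style program API used at the time of this PR.
llm = guidance.llms.LlamaCpp("models/7B/ggml-model.bin")
program = guidance("The capital of France is {{gen 'answer' max_tokens=5}}", llm=llm)
out = program()
print(out["answer"])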