
llama.cpp and gguf files #1760

Open
denijane opened this issue Apr 20, 2024 · 8 comments
Labels
question Further information is requested

Comments

@denijane

I'm trying to create a flow using a locally run llama3 model. I tried using ollama to run the llama3 model, but I'm getting strange responses. (The response simply doesn't stop, and I'm watching the AI talk to itself.) Also, it is very slow, about 10 times slower than what I get from running ollama directly.

Then I decided to use the downloaded model directly with LlamaCpp to see if it works better. First thing: LLM->LlamaCPP accepts only .bin files, while the newer format is .gguf.

Then I downloaded an older model that is in .bin format, and I'm getting:
"ValueError: Error building node LlamaCpp(ID:LlamaCpp-BzhwI): Could not import llama-cpp-python library. Please install the llama-cpp-python library to use this embedding model: pip install llama-cpp-python"

I spent the night debugging this and I'm scratching my head. llama-cpp-python imports Llama, while LlamaCpp, the class the error mentions, should be imported from langchain_community, not from llama-cpp-python.

I made a test in Python, and after importing LlamaCpp from langchain_community, I was able to run Meta-Llama-3-8B-Instruct-Q6_K.gguf fine, but not llama-2-7b-chat.ggmlv3.q3_K_L.bin, which returns error(type=value_error).
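
A minimal sketch of that test (the model paths are whatever you have locally):

```python
# Sketch: loading both formats through langchain_community's LlamaCpp.
from langchain_community.llms import LlamaCpp

# The GGUF model loads and answers normally:
llm = LlamaCpp(model_path="./Meta-Llama-3-8B-Instruct-Q6_K.gguf")
print(llm.invoke("What does GGUF stand for?"))

# The older GGML .bin model fails pydantic validation with
# error(type=value_error), since llama.cpp dropped GGML support:
# llm = LlamaCpp(model_path="./llama-2-7b-chat.ggmlv3.q3_K_L.bin")
```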

I also made a test with from llama_cpp import Llama - again, it works with .gguf files but not with .bin files.
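
The same check against the bindings directly, again as a sketch:

```python
# Sketch: calling llama-cpp-python itself, bypassing langchain entirely.
from llama_cpp import Llama

llm = Llama(model_path="./Meta-Llama-3-8B-Instruct-Q6_K.gguf")
out = llm("Q: What is GGUF? A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])

# The same call with the .ggmlv3 .bin file raises an error at load time.
```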

So I'm not sure which library LangFlow uses. Maybe it's just a naming convention, calling it LlamaCPP when it's actually calling Llama, or it really is LlamaCpp and the error message about the library is wrong. Either way, the file format restriction is definitely wrong, and LlamaCPP simply doesn't work.

@YamonBot
Contributor

I recommend using Ollama, which has focused community support. Recently, I have been working on improvements to this component and expect to complete them within 2-3 days; I have verified that it works correctly. In the meantime, I suggest you review my draft and make your own modifications to get it running temporarily. (In my draft, remove the "buildConfig" section before using it, due to an incorrect implementation of the buildConfig method.)

#1701

@denijane
Author

Hi, I managed to make LlamaCpp work by editing the Python code in the LlamaCpp component to allow .gguf files (I also played with some source files, but I don't think that's what did it). So it runs.

Now the problem is similar to the one with using ollama: 1) it's very slow (compared to just doing >ollama run llama3 and talking to it), and 2) it doesn't stop. It starts generating human responses and then replying to them, and it goes on forever.

So if you fixed at least 2) in the new version, that would be a significant update.
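
For what it's worth, issue 2) is usually a matter of missing stop sequences: without them, the model happily completes both sides of the conversation. A hedged sketch of the workaround (the stop strings and GPU offload setting here are assumptions for a Llama 3 instruct GGUF, not the component's actual defaults):

```python
# Sketch: stop sequences cut generation at turn boundaries; n_gpu_layers=-1
# offloads all layers to the GPU, which also addresses the speed complaint.
from langchain_community.llms import LlamaCpp

llm = LlamaCpp(
    model_path="./Meta-Llama-3-8B-Instruct-Q6_K.gguf",  # assumed local path
    n_gpu_layers=-1,                # -1 = offload every layer
    stop=["<|eot_id|>", "Human:"],  # Llama 3 end-of-turn token + a fallback
)
print(llm.invoke("Explain GGUF in one sentence."))
```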

@anovazzi1
Contributor

Hello @denijane,
Sorry for the delay. Did you try using the new version? Does the error still persist?

@carlosrcoelho
Contributor

Hi @denijane

We hope you're doing well. Just a friendly reminder that if we do not hear back from you within the next 3 days, we will close this issue. If you need more time or further assistance, please let us know.


Thank you for your understanding!

@denijane
Author

Hi, I've been on work trips, so I can't update right now and do proper testing. I'll return home this Friday and I'll have the time to test the issue during the weekend, if this is ok with you.
I have one test script with CustomLlama which seems to be working with both LlamaCpp and Llama from langflow, but I can't test more than this today (moreover, my nvidia GPU seems not to be in the mood to work today; I probably need to restart).

@carlosrcoelho
Contributor

@denijane

No worries at all, we totally understand! Take your time to get everything set up and test. We'll keep the issue open and look forward to your update.

Thanks for letting us know!

@carlosrcoelho carlosrcoelho added the question Further information is requested label Jul 22, 2024
@carlosrcoelho
Contributor

Hi @denijane

We hope you're doing well. Just a friendly reminder that if we do not hear back from you within the next 3 days, we will close this issue. If you need more time or further assistance, please let us know.


Thank you for your understanding!

@denijane
Author

Hi, just a quick note. I tested a RAG flow with ollama - it worked quite nicely. There was a bug when reading ollama models that didn't let me select the model and kept asking for llama2, but after I modified the code to set the URL and the model, everything worked, and it's quite quick. I had a chat about a text document and it worked well.

I'm not sure where the llamacpp option went in this version of langflow or how you use gguf models, so if you can give me a hint, I could test that as well. But ollama with llama3.1:8b worked very well.
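
For anyone hitting the same model-selection bug, a sketch of the manual fix described above (the base_url matches ollama's default local port; the model name is whatever you have pulled):

```python
# Sketch: pointing langchain's Ollama wrapper at a local server and a
# specific model, analogous to hard-coding the url and model in the component.
from langchain_community.llms import Ollama

llm = Ollama(base_url="http://localhost:11434", model="llama3.1:8b")
print(llm.invoke("Summarize retrieval-augmented generation in two sentences."))
```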
