
Integration with llama.cpp #898

Closed

0x090909 opened this issue Mar 26, 2023 · 8 comments

Comments

@0x090909

Hello,

I'm reading the documentation, and it seems that this indexer cannot be used with https://github.com/ggerganov/llama.cpp.

Am I correct?

If so, will it be integrated in the future?

Thanks

@logan-markewich
Collaborator

logan-markewich commented Mar 26, 2023

@0x090909 the documentation provides an example of using a custom LLM here: https://gpt-index.readthedocs.io/en/latest/how_to/custom_llms.html#example-using-a-custom-llm-model

It will be up to you to handle passing the text to the model and returning the newly generated tokens.
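A minimal sketch of that pattern, assuming the LangChain LLM base class that the linked docs used at the time; my_llama_cpp_generate is a placeholder for however you call llama.cpp:

from typing import List, Optional
from langchain.llms.base import LLM

class LlamaCppCustomLLM(LLM):
    @property
    def _llm_type(self) -> str:
        return "custom-llama-cpp"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        # Pass the prompt to your llama.cpp process/binding and return only
        # the newly generated text (not the echoed prompt).
        return my_llama_cpp_generate(prompt)  # placeholder function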

@0x090909
Author

@logan-markewich Thank you

@ianscrivener

@logan-markewich .. sorry, but that link is dead now

@ianscrivener

I'm using llama.cpp a lot. C++ inference of 4-bit quantized and optimised models is VERY performant, significantly faster than FP16 Python code.

One approach to using llama.cpp with llama_index would be to use abetlen's https://github.com/abetlen/llama-cpp-python pip package, which replicates OpenAI's API... though this (I assume) would require running two processes to get the job done.
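A hedged sketch of that two-process approach: start llama-cpp-python's OpenAI-compatible server in a separate process (e.g. python3 -m llama_cpp.server --model /path/to/ggml-model-q4_0.bin) and then point the openai client at it; the model path and model name below are placeholders.

import openai

openai.api_key = "sk-no-key-needed"            # the local server does not check the key
openai.api_base = "http://localhost:8000/v1"   # default llama_cpp.server address

resp = openai.Completion.create(
    model="local-llama",                       # placeholder; the local server accepts any name
    prompt="Q: What is llama.cpp? A:",
    max_tokens=32,
)
print(resp["choices"][0]["text"])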

Ideally, there would be a llama_index custom LLM module that used an existing llama.cpp install, e.g. llama_cpp-bin=/somewhere/llama.cpp/bin/main, leveraging the Python/C++ bindings as per llama-cpp-python.
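For the "existing llama.cpp install" idea, a rough sketch that shells out to the llama.cpp main binary, assuming its -m/-p/-n flags as of mid-2023; the binary and model paths are placeholders:

import subprocess

def llama_cpp_complete(prompt: str,
                       binary: str = "/somewhere/llama.cpp/bin/main",   # placeholder path
                       model: str = "/path/to/ggml-model-q4_0.bin",     # placeholder path
                       n_predict: int = 256) -> str:
    # Run the llama.cpp binary; it prints the prompt followed by the
    # completion to stdout (diagnostic logs go to stderr).
    result = subprocess.run(
        [binary, "-m", model, "-p", prompt, "-n", str(n_predict)],
        capture_output=True, text=True, check=True,
    )
    # Strip the echoed prompt; exact output formatting can vary between builds.
    return result.stdout.split(prompt, 1)[-1]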

@ianscrivener

ianscrivener commented Jul 10, 2023

Another good, fast C++ local LLM inference engine is https://github.com/OpenNMT/CTranslate2... which can use Llama-family models as well as LLMs from other families. (However, it does not support macOS Metal GPUs, which is what I have.)

@logan-markewich
Collaborator

logan-markewich commented Jul 10, 2023

You can use any LLM that langchain offers (which happens to include llama.cpp)

Using v0.7.4

from llama_index import ServiceContext, set_global_service_context

llm = <setup langchain llm>
service_context = ServiceContext.from_defaults(llm=llm, context_window=<context window of llm>, chunk_size=<some value about 25% smaller than the context window of the llm>)
set_global_service_context(service_context)
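A concrete (hedged) version of the snippet above, assuming LangChain's LlamaCpp wrapper and the llama_index ~0.7.x ServiceContext API; the model path and sizes are placeholders:

from langchain.llms import LlamaCpp
from llama_index import ServiceContext, set_global_service_context

llm = LlamaCpp(
    model_path="/path/to/ggml-model-q4_0.bin",  # placeholder path
    n_ctx=2048,                                 # the model's context window
    temperature=0.1,
)
service_context = ServiceContext.from_defaults(
    llm=llm,
    context_window=2048,
    chunk_size=1500,   # roughly 25% smaller than the context window
)
set_global_service_context(service_context)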

Ngl, but the last time I tried it, llama.cpp was still not as fast as I'd like, and I think the context window was only 512, which is quite tiny for llama_index.

@ianscrivener

ianscrivener commented Jul 10, 2023

thanks @logan-markewich 🙏

So langchain supports llama.cpp via the llama-cpp-python library... which is fine; it's usually just one release version behind llama.cpp. llama-cpp-python and llama.cpp happily run on Mac Arm64 & Metal.

BTW: llama.cpp currently supports context sizes up to 2048; the C++ devs are working on extending the context size via RoPE scaling.

llama.cpp is by far the best & fastest self-hosted LLM inference I have found for Apple Silicon (Metal).

@ianscrivener

Here's how I upgraded llama-cpp-python to support the macOS Metal GPU:

pip uninstall llama-cpp-python
CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install -U llama-cpp-python --no-cache-dir

BTW: Xcode is required to compile the llama.cpp binary.
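A quick (hedged) way to check the Metal build is actually being used, via llama-cpp-python's n_gpu_layers option; the model path is a placeholder, and the Metal initialisation logs appear on stderr when layers are offloaded:

from llama_cpp import Llama

llm = Llama(
    model_path="/path/to/ggml-model-q4_0.bin",  # placeholder path
    n_gpu_layers=1,                             # any value > 0 triggers Metal offload on this build
)
out = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(out["choices"][0]["text"])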
