
feature: support for exllama and AutoGPTQ #796

Closed
mudler opened this issue Jul 24, 2023 · Discussed in #763 · 3 comments · Fixed by #881
Assignees: mudler
Labels: enhancement (New feature or request), high prio

Comments

@mudler
Owner

mudler commented Jul 24, 2023

Discussed in #763

Originally posted by yarray July 17, 2023
Although llama.cpp can now support GPU via cuBLAS, it seems that exllama runs several times faster given a good enough GPU (a 3090, for example). Is there any plan to support exllama or, more generally, other loaders for LLMs?
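For readers unfamiliar with the requested loaders, here is a rough, hypothetical sketch of how a GPTQ-quantized model is loaded with the standalone auto-gptq Python package (the checkpoint name and generation settings are purely illustrative); exllama fills the same role with its own, faster CUDA kernels for GPTQ weights:

```python
# Hypothetical example: loading a GPTQ checkpoint with auto-gptq outside LocalAI.
# The model id below is a placeholder; any GPTQ-quantized checkpoint would do.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "TheBloke/WizardLM-7B-uncensored-GPTQ"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    device="cuda:0",        # GPTQ inference runs on the GPU
    use_safetensors=True,
)

prompt = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**prompt, max_new_tokens=32)[0]))
```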

@mudler added the enhancement label Jul 24, 2023
@localai-bot
Contributor

⚠️⚠️⚠️⚠️⚠️

Hi! I'm a bot running with LocalAI ( a crazy experiment of @mudler ) - please beware that I might hallucinate sometimes!

but... I can also be funny or helpful 😸 and I can generally give good tips on where to look in the documentation or in the code, based on what you wrote in the issue.

Don't engage in conversation with me,
I don't support (yet) replying!

⚠️⚠️⚠️⚠️⚠️

Yes, we are planning to add support for more loaders in the future. Currently, we are working on improving the llama.cpp backend to make it faster and more efficient. We also plan to add support for other loaders such as exllama and others. However, this will require some time to implement and test properly. In the meantime, you can use exllama or other loaders to load LLM models and use them with LocalAI.
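For context, LocalAI picks the inference backend per model through its YAML model config. A hypothetical sketch of what such an entry could look like once these backends exist (the `autogptq`/`exllama` backend names and the model reference are assumptions pending #881, not something LocalAI supported at the time of this comment):

```yaml
# Hypothetical model definition, e.g. models/wizardlm-gptq.yaml
name: wizardlm-gptq
backend: autogptq            # assumption: or "exllama", once either backend lands
parameters:
  model: TheBloke/WizardLM-7B-uncensored-GPTQ   # placeholder GPTQ checkpoint
```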

Sources:

@mudler self-assigned this Jul 24, 2023
@mudler
Owner Author

mudler commented Jul 26, 2023

⚠️⚠️⚠️⚠️⚠️

Hi! I'm a bot running with LocalAI ( a crazy experiment of @mudler ) - please beware that I might hallucinate sometimes!

but... I can also be funny or helpful 😸 and I can generally give good tips on where to look in the documentation or in the code, based on what you wrote in the issue.

Don't engage in conversation with me, I don't support (yet) replying!

⚠️⚠️⚠️⚠️⚠️

Yes, we are planning to add support for more loaders in the future. Currently, we are working on improving the llama.cpp backend to make it faster and more efficient. We also plan to add support for other loaders such as exllama and others. However, this will require some time to implement and test properly. In the meantime, you can use exllama or other loaders to load LLM models and use them with LocalAI.

Sources:

* https://localai.io/features/embeddings/index.html

* https://localai.io/advanced/index.html

* https://localai.io/basics/news/index.html

* https://localai.io/basics/getting_started/index.html

lol 🤣

@mudler changed the title from "feature: support for exllama" to "feature: support for exllama and AutoGPTQ" Jul 26, 2023
@mudler
Owner Author

mudler commented Jul 26, 2023

maybe we can split the two - but for now keeping it here, open for discussion. any takers here? or I'll likely start to have a look at it sooner or later
