Support for model quantization #249
Comments
This is a feature that will be added directly to Nx/EXLA. It is currently a work in progress, so stay tuned. :)
Hi there! I was wondering what the current status of this is; I found https://elixirforum.com/t/high-scale-performance-of-llms-needed-features/58562 but couldn't find any tracking issues for Bumblebee or Nx. I think quantization support is critical to making Bumblebee a viable option for developers and deployment; it makes it actually tractable to run larger models on consumer hardware. Thanks in advance!
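For readers unfamiliar with the idea, here is a minimal sketch of symmetric per-tensor int8 quantization written with plain Nx ops. This is only an illustration of why quantization shrinks memory use (1 byte per weight instead of 4, plus one scale); it is not the Nx/EXLA API being discussed, and the module and function names are made up:

```elixir
defmodule QuantSketch do
  import Nx.Defn

  # Symmetric per-tensor int8 quantization: choose a scale so the largest
  # absolute weight maps to 127, then round and cast down to signed 8-bit.
  defn quantize(w) do
    scale = Nx.reduce_max(Nx.abs(w)) / 127.0
    q = w |> Nx.divide(scale) |> Nx.round() |> Nx.as_type(:s8)
    {q, scale}
  end

  # Dequantize back to f32 before ops that need float inputs.
  defn dequantize(q, scale) do
    Nx.multiply(Nx.as_type(q, :f32), scale)
  end
end

# Usage sketch:
#   w = Nx.tensor([[0.12, -1.5], [0.7, 0.03]])
#   {q, scale} = QuantSketch.quantize(w)
#   w_approx = QuantSketch.dequantize(q, scale)
```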
We are currently updating XLA in Nx, and the new version supports quantization. :) So hopefully sooner rather than later!
Is there an issue tracking that somewhere?
Search for MLIR in the Nx project. Once that is done, we can start thinking about quantization!
Just to clarify, @josevalim, do you think Nx will be able to run GGUF models, even after the MLIR updates? I don't believe XLA will work with GGUF models out of the box, since that's a quantized model file format for llama.cpp.
Those are two separate problems. Once we support quantization, then we may be able to run GGUF, as long as someone writes a deserializer for it.
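On the deserializer point: as a rough sketch of the first step, GGUF files begin with a fixed header (per the llama.cpp GGUF spec: the ASCII magic "GGUF", a little-endian uint32 version, then two little-endian uint64 counts for tensors and metadata key/value pairs), which maps naturally onto Elixir binary pattern matching. `GGUFHeader` is a hypothetical module name, not an existing library:

```elixir
defmodule GGUFHeader do
  # Parse the fixed-size GGUF header from the front of the file binary.
  # Everything after the header (metadata KVs, tensor infos, tensor data)
  # is returned untouched in `rest`.
  def parse(<<"GGUF", version::little-32, n_tensors::little-64,
              n_kv::little-64, rest::binary>>) do
    {:ok, %{version: version, tensor_count: n_tensors, metadata_kv_count: n_kv}, rest}
  end

  def parse(_other), do: {:error, :not_gguf}
end

# Usage sketch:
#   {:ok, header, rest} = "model.gguf" |> File.read!() |> GGUFHeader.parse()
```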
Blocking on elixir-nx/nx#1452. |
Running larger models on Bumblebee is difficult for people with lower-tier hardware.
Adding GPTQ and GGUF/GGML support would greatly boost model accessibility in the Elixir ecosystem.