
Support for model quantization #249

Closed
a-alhusaini opened this issue Sep 19, 2023 · 8 comments


@a-alhusaini

Running larger models on Bumblebee is difficult for people with lower-tier hardware.

Adding GPTQ and GGUF/GGML support would greatly boost model accessibility in the Elixir ecosystem.
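
For readers new to the topic, here is a minimal sketch of the idea using Nx (symmetric int8 quantization with a single scale per tensor). This is not Bumblebee's API, and real schemes such as GPTQ are considerably more sophisticated; it only illustrates the memory/precision trade-off being discussed:

```elixir
# Symmetric int8 quantization of a weight tensor: store 1 byte per weight
# instead of 4 (f32), at the cost of some precision.
weights = Nx.tensor([[0.12, -1.30, 0.75], [2.10, -0.05, 0.98]])

# One scale per tensor, chosen so the largest magnitude maps to 127.
scale = Nx.divide(Nx.reduce_max(Nx.abs(weights)), 127)

quantized =
  weights
  |> Nx.divide(scale)
  |> Nx.round()
  |> Nx.as_type(:s8)

# Approximate reconstruction used at inference time.
dequantized = Nx.multiply(Nx.as_type(quantized, :f32), scale)
```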

@josevalim (Contributor)

This is a feature that will be added directly to Nx/EXLA. Currently WIP so stay tuned. :)

@philpax commented Oct 25, 2023

Hi there! I was wondering what the current status of this was; I found https://elixirforum.com/t/high-scale-performance-of-llms-needed-features/58562 but couldn't find any tracking issues for Bumblebee or Nx.

I think quantization support is critical to making Bumblebee a viable option for developers and deployments; it makes running larger models on consumer hardware actually tractable.

Thanks in advance!

@josevalim (Contributor)

We are currently updating XLA in Nx, and the new version supports quantization. :) So hopefully sooner rather than later!

@benbot commented Dec 2, 2023

Is there an issue tracking that somewhere?

@josevalim (Contributor)

Search for MLIR in the Nx project. Once that is done, we can start thinking about quantization!

@shaqq commented Feb 11, 2024

> Adding GPTQ and GGUF/GGML support would greatly boost model accessibility in the Elixir ecosystem.

Just to clarify, @josevalim, do you think Nx will be able to run GGUF models, even after the MLIR updates? I don't believe XLA will work with GGUF models out of the box, since that's a quantized model file format for llama.cpp:

https://github.com/ggerganov/ggml/blob/master/docs/gguf.md

@josevalim (Contributor)

Those are two separate problems. Once we support quantization, then we may be able to run GGUF, as long as someone writes a deserializer for it.
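
For anyone wondering what such a deserializer would start with, here is a hypothetical sketch (the module and field names are my own; the fixed header layout, the magic bytes `GGUF` followed by a little-endian 32-bit version and 64-bit tensor/metadata counts, follows the spec linked above for GGUF v2+). A real loader would continue by parsing the metadata key/value pairs and tensor descriptors, then dequantize into Nx tensors:

```elixir
defmodule GGUFSketch do
  # Reads only the fixed-size GGUF header (24 bytes): magic (4) + version (4) +
  # tensor_count (8) + metadata_kv_count (8), all little-endian.
  def read_header(path) do
    {:ok, io} = File.open(path, [:read, :binary])
    bytes = IO.binread(io, 24)
    :ok = File.close(io)

    case bytes do
      <<"GGUF", version::little-unsigned-size(32),
        tensor_count::little-unsigned-size(64),
        metadata_kv_count::little-unsigned-size(64)>> ->
        {:ok, %{version: version, tensor_count: tensor_count, metadata_kv_count: metadata_kv_count}}

      _ ->
        {:error, :not_a_gguf_file}
    end
  end
end
```

The hard part, of course, is not the header but deserializing the quantized tensor blocks into something Nx can execute.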

@SichangHe

Blocking on elixir-nx/nx#1452.
