
AQLM Quantization #5465

Closed

CamiloMM opened this issue Feb 7, 2024 · 6 comments

Labels: enhancement (New feature or request), stale

CamiloMM commented Feb 7, 2024

Description

AQLM (GitHub, Paper, Reddit discussion) is a novel quantization method that focuses on the 2-2.5 bit range. It claims to surpass QuiP# and even 3-bit GPTQ, and allows a 70B model to run on a 3090 with surprisingly good PPL (allegedly).
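For context, AQLM checkpoints can be loaded straight through transformers once the aqlm package is installed. A minimal sketch, assuming a CUDA GPU; the model id below is one of ISTA-DASLab's published AQLM repos and should be treated as a placeholder:

```python
# Minimal sketch of loading an AQLM-quantized checkpoint via transformers.
# Assumes `pip install aqlm[gpu] transformers accelerate` and a CUDA GPU;
# the model id is a placeholder for an ISTA-DASLab AQLM repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf"  # placeholder AQLM repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # AQLM inference kernels run on the GPU
)

inputs = tokenizer("AQLM packs weights into", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```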

Additional Context

According to my high-accuracy crystal ball, which I bought from The Onion a decade ago, TheBloke will ignore this and continue to release half a dozen GPTQ quants per model until late 2026, no matter what else dethrones it.

CamiloMM added the enhancement (New feature or request) label on Feb 7, 2024
oobabooga (Owner) commented

#5466

Performance is pretty bad for now, but it may improve once huggingface/transformers#27931 lands.

Tedy50 commented Feb 12, 2024

I think that at this time the exl2 format is the most attractive and may become the most popular.

GPTQ is currently unavoidable because it's the only format that supports training, but comparing performance with transformers is pointless.
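To illustrate the training point: a GPTQ checkpoint loaded through transformers can serve as a frozen base for LoRA fine-tuning via peft. A rough sketch, assuming auto-gptq and peft are installed; the repo name and target modules are placeholders for a typical Llama-style model:

```python
# Hypothetical sketch: LoRA fine-tuning on top of a GPTQ base model.
# Assumes `pip install auto-gptq optimum peft transformers accelerate`.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ",  # placeholder GPTQ repo
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)  # freeze base weights

lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # assumption: Llama attention proj names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the LoRA adapters train
```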

CamiloMM (Author) commented Feb 12, 2024

> I think that at this time the exl2 format is the most attractive and may become the most popular.

Dunno about that, man. Have you checked the new IQ2/IQ3_XXS quants?

(They're at an early stage and I haven't tried them, but the PPL looks promising! Though for now I use exl2 because it's just so fast.)
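As a point of reference, one way to try those IQ-series GGUF quants outside the webui is llama-cpp-python, the same backend the webui's llama.cpp loader wraps. A minimal sketch; the model path is a placeholder, and IQ2/IQ3_XXS files are produced by llama.cpp's quantize tool, usually with an importance matrix (imatrix):

```python
# Quick sanity check of a sub-3-bit GGUF quant via llama-cpp-python.
# Assumes `pip install llama-cpp-python` built with GPU support;
# the model path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-70b.IQ3_XXS.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload all layers to the GPU
    n_ctx=4096,
)
out = llm("The advantage of sub-3-bit quants is", max_tokens=48)
print(out["choices"][0]["text"])
```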

Tedy50 commented Feb 12, 2024

> Though for now I use exl2 because it's just so fast.

Yes, that's exactly why it's the most attractive: it runs on the ExLlama engine, which is an order of magnitude faster than anything else and allows huge context sizes. At this time I just don't see any usable alternatives.
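For anyone who hasn't used it outside the webui, loading an EXL2 checkpoint through the exllamav2 Python API looks roughly like this. A sketch based on the library's own examples; the model directory is a placeholder:

```python
# Rough sketch of the exllamav2 loading path (model dir is a placeholder).
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "./models/Llama2-70B-exl2-2.4bpw"  # placeholder EXL2 dir
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # KV cache allocated during load
model.load_autosplit(cache)               # split layers across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

print(generator.generate_simple("exl2 is fast because", settings, 48))
```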

github-actions bot added the stale label on Apr 12, 2024

github-actions bot commented

This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

dsent commented May 21, 2024

It seems that AQLM is supported by now, but there's no way to use it on Windows because the aqlm[gpu] dependency requires triton, and triton is not available on Windows. @oobabooga, can you confirm?
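A quick way to see the failure mode dsent describes (a hypothetical probe, not part of the webui):

```python
# Hypothetical probe: aqlm's GPU extra depends on triton, which has no
# official Windows wheels, so the import below fails on Windows even
# when the aqlm package itself installs fine.
import platform

try:
    import triton
    print(f"triton {triton.__version__} found; AQLM GPU kernels can load")
except ImportError:
    print(f"no triton on {platform.system()}; aqlm[gpu] cannot be used")
```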
