
AQLM Quantization #5465

Closed

CamiloMM opened this issue Feb 7, 2024 · 6 comments

Labels: enhancement (New feature or request), stale

CamiloMM commented Feb 7, 2024

Description

AQLM (GitHub, Paper, Reddit discussion) is a novel quantization method that focuses on the 2-2.5 bit range. It claims to surpass QuiP# and even 3-bit GPTQ, and allows a 70B model to run on a 3090 with surprisingly good PPL (allegedly).
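For context, AQLM checkpoints can be loaded straight through transformers once the aqlm package is installed. A minimal sketch, assuming a CUDA GPU; the model id below is one of ISTA-DASLab's published AQLM repos and should be treated as a placeholder:

```python
# Minimal sketch of loading an AQLM-quantized checkpoint via transformers.
# Assumes `pip install aqlm[gpu] transformers accelerate` and a CUDA GPU;
# the model id is a placeholder for an ISTA-DASLab AQLM repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf"  # placeholder AQLM repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # AQLM inference kernels run on the GPU
)

inputs = tokenizer("AQLM packs weights into", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```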

Additional Context

According to my high-accuracy crystal ball, which I bought from The Onion a decade ago, TheBloke will ignore this and continue to release half a dozen GPTQ quants per model until late 2026, no matter what else dethrones it.

CamiloMM added the enhancement (New feature or request) label on Feb 7, 2024
oobabooga (Owner) commented

#5466

Performance is pretty bad for now, but it may improve once huggingface/transformers#27931 lands.

Tedy50 commented Feb 12, 2024

I think that at this time the exl2 format is the most attractive and may become the most popular.

GPTQ is currently unavoidable because it's the only format that supports training, but comparing performance with transformers is pointless.
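To illustrate the training point: a GPTQ checkpoint loaded through transformers can serve as a frozen base for LoRA fine-tuning via peft. A rough sketch, assuming auto-gptq and peft are installed; the repo name and target modules are placeholders for a typical Llama-style model:

```python
# Hypothetical sketch: LoRA fine-tuning on top of a GPTQ base model.
# Assumes `pip install auto-gptq optimum peft transformers accelerate`.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ",  # placeholder GPTQ repo
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)  # freeze base weights

lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # assumption: Llama attention proj names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the LoRA adapters train
```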

CamiloMM (Author) commented Feb 12, 2024

> I think that at this time the exl2 format is the most attractive and may become the most popular.

Dunno about that, man. Have you checked the new IQ2/IQ3_XXS quants?

(They're at an early stage and I haven't tried them, but the PPL looks promising! Though for now I use exl2 because it's just so fast.)
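As a point of reference, one way to try those IQ-series GGUF quants outside the webui is llama-cpp-python, the same backend the webui's llama.cpp loader wraps. A minimal sketch; the model path is a placeholder, and IQ2/IQ3_XXS files are produced by llama.cpp's quantize tool, usually with an importance matrix (imatrix):

```python
# Quick sanity check of a sub-3-bit GGUF quant via llama-cpp-python.
# Assumes `pip install llama-cpp-python` built with GPU support;
# the model path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-70b.IQ3_XXS.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload all layers to the GPU
    n_ctx=4096,
)
out = llm("The advantage of sub-3-bit quants is", max_tokens=48)
print(out["choices"][0]["text"])
```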

Tedy50 commented Feb 12, 2024

> Though for now I use exl2 because it's just so fast.

Yes, that's exactly why it's the most attractive: it runs on the ExLlama engine, which is an order of magnitude faster than anything else and allows huge context sizes. At this time I just don't see any usable alternatives.
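For anyone who hasn't used it outside the webui, loading an EXL2 checkpoint through the exllamav2 Python API looks roughly like this. A sketch based on the library's own examples; the model directory is a placeholder:

```python
# Rough sketch of the exllamav2 loading path (model dir is a placeholder).
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "./models/Llama2-70B-exl2-2.4bpw"  # placeholder EXL2 dir
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # KV cache allocated during load
model.load_autosplit(cache)               # split layers across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

print(generator.generate_simple("exl2 is fast because", settings, 48))
```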

github-actions bot added the stale label on Apr 12, 2024

github-actions bot commented

This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

dsent commented May 21, 2024

It seems that AQLM is supported by now, but there's no way to use it on Windows because the aqlm[gpu] dependency requires triton, and triton is not available on Windows. @oobabooga, can you confirm?
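A quick way to see the failure mode dsent describes (a hypothetical probe, not part of the webui):

```python
# Hypothetical probe: aqlm's GPU extra depends on triton, which has no
# official Windows wheels, so the import below fails on Windows even
# when the aqlm package itself installs fine.
import platform

try:
    import triton
    print(f"triton {triton.__version__} found; AQLM GPU kernels can load")
except ImportError:
    print(f"no triton on {platform.system()}; aqlm[gpu] cannot be used")
```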
