We are very interested in two post-training quantization papers from Han Lab!
SmoothQuant uses W8A8 for efficient GPU computation, while AWQ uses W4/3A16 for lower memory requirements and higher effective memory throughput.
But which one is faster in actual production? If you have any data on this, could you share it with us?
W4A16 is the fastest. I believe this is discussed in the paper, something along the lines of “loading the weights makes up the majority of the latency”. Most layers in transformers are linear layers, so naturally you will see a large benefit from quantizing their weights.
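As a rough back-of-envelope illustration of why weight loading dominates at small batch sizes (the layer shape and the `linear_layer_bytes` helper below are hypothetical, not from the papers), compare the bytes of weights versus activations a single linear layer has to move per decode step:

```python
# Sketch: bytes moved by one linear layer during a single decode step.
# The 4096x4096 shape is an assumed Llama-7B-style projection, and FP16
# activations are assumed throughout; the exact numbers are illustrative only.

def linear_layer_bytes(in_features, out_features, batch, weight_bits, act_bits=16):
    """Approximate bytes read/written for one forward pass of a linear layer."""
    weight_bytes = in_features * out_features * weight_bits / 8
    act_bytes = batch * (in_features + out_features) * act_bits / 8
    return weight_bytes, act_bytes

for bits in (16, 8, 4):
    w, a = linear_layer_bytes(4096, 4096, batch=1, weight_bits=bits)
    print(f"W{bits}A16: weights {w / 1e6:.1f} MB vs activations {a / 1e3:.1f} KB per token")
```

At batch size 1 the weights are megabytes while the activations are kilobytes, which is why cutting weight precision from 16 to 4 bits translates almost directly into lower per-token latency.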
I don’t have benchmarks comparing it against SmoothQuant; the authors seem to prefer AWQ for its usability and its speed with TinyChat.
Hi @codertimo, usually W8A8 (SmoothQuant) is better for compute-bound scenarios (e.g., large batch size, targeting high throughput), and W4A16 (AWQ) is better for memory-bound scenarios (smaller batch size, lower latency). Let me know if you have more questions.
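A simple roofline-style sketch makes the compute-bound vs. memory-bound split concrete. The GPU numbers below (roughly A100-class peak TFLOPS and bandwidth), the layer shape, and the `bound_for_batch` helper are all assumptions for illustration, not measurements from either paper:

```python
# Sketch: decide whether a GEMM of a given batch size is limited by compute
# or by weight-loading bandwidth. Assumed: ~312 TFLOPS peak, ~2 TB/s HBM,
# a 4096x4096 FP16 weight matrix; weight traffic dominates memory time.

def bound_for_batch(batch, in_f=4096, out_f=4096,
                    weight_bits=16, peak_tflops=312, bandwidth_gbs=2000):
    flops = 2 * batch * in_f * out_f                 # multiply-add count for the GEMM
    bytes_moved = in_f * out_f * weight_bits / 8     # weights dominate the traffic
    compute_time = flops / (peak_tflops * 1e12)
    memory_time = bytes_moved / (bandwidth_gbs * 1e9)
    return "memory-bound" if memory_time > compute_time else "compute-bound"

for b in (1, 8, 64, 512):
    print(f"batch {b:>3}: {bound_for_batch(b)}")
```

With these assumed numbers, small batches come out memory-bound (so 4-bit weights as in AWQ cut the dominant cost), while large batches cross over to compute-bound (so INT8 compute as in SmoothQuant pays off).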