SmoothQuant vs AWQ which one is faster? #56

Closed
codertimo opened this issue Aug 2, 2023 · 2 comments

@codertimo

Question

We are very interested in two post-training quantization papers from the Han Lab!

SmoothQuant uses W8A8 for efficient GPU computation.
AWQ uses W4/3A16 for lower memory requirements and higher memory throughput.

But which one is faster in actual production?
If you have any data about this, could you share it with us?

@casper-hansen
Contributor

W4A16 is the fastest. I believe this is discussed in the paper, something along the lines of “weights make up the majority of delay”. Most layers in transformers are linear layers, so naturally you will see a large benefit from quantizing them.
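To put rough numbers on the "weights dominate" point, here is a back-of-the-envelope sketch. It is not taken from either paper: the 7B parameter count and the 1000 GB/s memory bandwidth are assumed values, and the helper names are made up for illustration. It only computes how long it takes to stream every weight from memory once, which is a lower bound on per-token latency at batch size 1.

```python
def weight_bytes(num_params: int, bits_per_weight: int) -> float:
    """Bytes needed to store the weights (ignores scales and zero-points)."""
    return num_params * bits_per_weight / 8


def min_decode_latency_ms(num_params: int, bits_per_weight: int,
                          bandwidth_gb_s: float) -> float:
    """Lower bound on per-token latency when every weight is read once.

    Assumes batch size 1 and that weight traffic is the only cost.
    """
    seconds = weight_bytes(num_params, bits_per_weight) / (bandwidth_gb_s * 1e9)
    return seconds * 1e3


# Assumed values for illustration only.
NUM_PARAMS = 7_000_000_000   # hypothetical 7B-parameter model
BANDWIDTH_GB_S = 1000.0      # hypothetical GPU memory bandwidth

for name, bits in [("FP16", 16), ("W8", 8), ("W4", 4)]:
    gb = weight_bytes(NUM_PARAMS, bits) / 1e9
    ms = min_decode_latency_ms(NUM_PARAMS, bits, BANDWIDTH_GB_S)
    print(f"{name}: {gb:.1f} GB of weights, >= {ms:.1f} ms per token")
```

Under these assumptions the bound scales linearly with bits per weight, which is why 4-bit weights help most when weight traffic is the bottleneck.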

I don’t have benchmarks to compare against SmoothQuant as it seems AWQ is preferred by the authors due to usability and speed with TinyChat.

@tonylins
Contributor

tonylins commented Aug 6, 2023

Hi @codertimo, usually W8A8 (SmoothQuant) is better for compute-bound scenarios (e.g., large batch size, targeting high throughput), and W4A16 (AWQ) is better for memory-bound scenarios (smaller batch size, lower latency). Let me know if you have more questions.
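The compute-bound versus memory-bound split can be illustrated with a simple roofline-style estimate for a single linear layer. This is only a sketch under stated assumptions, not a benchmark: the hardware numbers (300 TFLOPS FP16, 600 TOPS INT8, 1 TB/s bandwidth) are hypothetical, the function names are invented for this example, W4A16 is assumed to dequantize weights and compute at the FP16 rate, and W8A8 is assumed to run at the INT8 rate. Real kernels add overheads this model ignores.

```python
def linear_layer_time_s(batch_tokens: int, d_in: int, d_out: int,
                        weight_bits: int, act_bits: int,
                        peak_ops_per_s: float,
                        bandwidth_bytes_per_s: float) -> float:
    """Roofline estimate: time is the larger of compute time and memory time."""
    ops = 2 * batch_tokens * d_in * d_out              # multiply-accumulates x 2
    weight_bytes = d_in * d_out * weight_bits / 8      # weights read once
    act_bytes = batch_tokens * (d_in + d_out) * act_bits / 8  # input + output
    compute_time = ops / peak_ops_per_s
    memory_time = (weight_bytes + act_bytes) / bandwidth_bytes_per_s
    return max(compute_time, memory_time)


# Hypothetical hardware numbers, chosen only for illustration.
FP16_PEAK = 300e12   # FP16 operations per second
INT8_PEAK = 600e12   # INT8 operations per second
BANDWIDTH = 1e12     # bytes per second
D_IN = D_OUT = 4096  # assumed layer shape


def w4a16_time(batch_tokens: int) -> float:
    # Assumption: 4-bit weights are dequantized and multiplied at the FP16 rate.
    return linear_layer_time_s(batch_tokens, D_IN, D_OUT, 4, 16,
                               FP16_PEAK, BANDWIDTH)


def w8a8_time(batch_tokens: int) -> float:
    # Assumption: 8-bit weights and activations use the INT8 rate.
    return linear_layer_time_s(batch_tokens, D_IN, D_OUT, 8, 8,
                               INT8_PEAK, BANDWIDTH)


for batch in (1, 16, 256, 4096):
    faster = "W4A16" if w4a16_time(batch) < w8a8_time(batch) else "W8A8"
    print(f"batch={batch}: W4A16={w4a16_time(batch):.2e}s "
          f"W8A8={w8a8_time(batch):.2e}s -> {faster}")
```

With these assumed numbers, the estimate favors W4A16 at batch size 1, where weight traffic dominates, and W8A8 at batch size 4096, where arithmetic dominates. The crossover point depends entirely on the hardware and kernel, so measure on your own setup before choosing.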
