Does this framework support "what you serve is what you train" for weight only quantization? #179
Comments
Yes, in the config you can select it individually.
Thanks for the speedy reply. Would you mind pointing me to the config field that toggles this?
https://github.com/google/aqt/blob/main/aqt/common/aqt_config.py#L362 As you can see here.
Yes, but from what I see in the underlying dot-product code, if the activation is not quantized, a float * float dot product is used: https://github.com/google/aqt/blob/main/aqt/jax/aqt_dot_general.py#L91 Is this fake quantization, or is this also the arithmetic used at serving time?
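For context, here is a minimal sketch of what weight-only "fake" quantization looks like: the weights are snapped to the int8 grid and immediately dequantized, so the dot product itself still runs in float * float. This is a generic illustration, not AQT's actual implementation; the function names are hypothetical.

```python
import jax.numpy as jnp

def fake_quant_int8(w):
    """Symmetric per-tensor int8 'fake' quantization:
    round onto the int8 grid, then immediately dequantize back to float."""
    scale = jnp.max(jnp.abs(w)) / 127.0
    w_q = jnp.clip(jnp.round(w / scale), -127, 127)  # int8 grid, still float dtype
    return w_q * scale  # dequantized back to float

def weight_only_fake_quant_dot(x, w):
    # Activations stay float; only the weights pass through quantization.
    # The multiply that actually executes is float * float.
    return x @ fake_quant_int8(w)
```

This reproduces the rounding error the quantized weights would introduce, which is what "what you serve is what you train" is meant to guarantee.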
Sorry that I missed your comment. |
Keeping activations as float while quantizing the weights to int8 or int4. From what I can see, this is currently done with a float * float dot product. Is there an int * float dot product available?
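As a point of comparison: hardware dot-product units generally require both operands to share a type, so a common weight-only serving pattern is to store the weights as int8 (for the memory savings) and cast them back to float just before the matmul. A minimal sketch under that assumption (the function names here are hypothetical, not AQT APIs):

```python
import jax.numpy as jnp

def quantize_weights_int8(w):
    """Quantize once, offline: store int8 values plus one float scale."""
    scale = jnp.max(jnp.abs(w)) / 127.0
    w_int8 = jnp.clip(jnp.round(w / scale), -127, 127).astype(jnp.int8)
    return w_int8, scale

def serve_dot(x, w_int8, scale):
    # The int8 weights are dequantized to the activation dtype right before
    # the matmul; the savings come from storing w_int8, not from int arithmetic.
    return x @ (w_int8.astype(x.dtype) * scale)
```

Numerically this matches the fake-quantized training-time dot product, since both see the same rounded weight values.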