Does this framework support "what you serve is what you train" for weight only quantization? #179
Comments
Yes, in the config you can select it individually.
Thanks for the speedy reply. Would you mind pointing me to the config field that toggles this?
https://github.com/google/aqt/blob/main/aqt/common/aqt_config.py#L362 As you can see here.
Yes, but from what I see in the underlying dot-product code, if the activation is not quantized, a float * float dot product is used: https://github.com/google/aqt/blob/main/aqt/jax/aqt_dot_general.py#L91 Is this fake quantization, or is this also the arithmetic used at serving time?
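For context, here is a minimal sketch of what weight-only "fake" quantization looks like: the weights are snapped to the int8 grid and immediately dequantized, so the dot product itself still runs in float * float. This is a generic illustration, not AQT's actual implementation; the function names are hypothetical.

```python
import jax.numpy as jnp

def fake_quant_int8(w):
    """Symmetric per-tensor int8 'fake' quantization:
    round onto the int8 grid, then immediately dequantize back to float."""
    scale = jnp.max(jnp.abs(w)) / 127.0
    w_q = jnp.clip(jnp.round(w / scale), -127, 127)  # int8 grid, still float dtype
    return w_q * scale  # dequantized back to float

def weight_only_fake_quant_dot(x, w):
    # Activations stay float; only the weights pass through quantization.
    # The multiply that actually executes is float * float.
    return x @ fake_quant_int8(w)
```

This reproduces the rounding error the quantized weights would introduce, which is what "what you serve is what you train" is meant to guarantee.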
Sorry that I missed your comment. |
Keeping activations as float while quantizing the weights to int8 or int4. From what I can see, this is currently done with a float * float dot product. Is there an int * float dot product available?
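As a point of comparison: hardware dot-product units generally require both operands to share a type, so a common weight-only serving pattern is to store the weights as int8 (for the memory savings) and cast them back to float just before the matmul. A minimal sketch under that assumption (the function names here are hypothetical, not AQT APIs):

```python
import jax.numpy as jnp

def quantize_weights_int8(w):
    """Quantize once, offline: store int8 values plus one float scale."""
    scale = jnp.max(jnp.abs(w)) / 127.0
    w_int8 = jnp.clip(jnp.round(w / scale), -127, 127).astype(jnp.int8)
    return w_int8, scale

def serve_dot(x, w_int8, scale):
    # The int8 weights are dequantized to the activation dtype right before
    # the matmul; the savings come from storing w_int8, not from int arithmetic.
    return x @ (w_int8.astype(x.dtype) * scale)
```

Numerically this matches the fake-quantized training-time dot product, since both see the same rounded weight values.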