Selective dequantization #5375
@microsoft-github-policy-service agree company="Snowflake
@JamesTheZ could I please get your review on this PR too? Thanks!
Hi @loadams, can we please have someone review this so we can merge it soon? Thanks :)
@microsoft-github-policy-service agree [company="Snowflake"]
This PR adds new functionality to the dequantizer: a function called `selective_dequantize`, which partially dequantizes a 3-dimensional matrix for cases where not all of the data needs to be converted from a lower-bit format (such as fp8/fp6) to bf16. I also added a unit test to check its functionality.

---------

Co-authored-by: Reza Yazdani <reza.yazdani@snowflake.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
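To illustrate the idea of dequantizing only part of a 3-dimensional matrix, here is a minimal pure-Python sketch. It is not the PR's implementation: the function name `selective_dequantize_sketch`, the `[slice][row][col]` layout, and the scheme (symmetric int8 codes with one float scale per row, producing Python floats) are all hypothetical stand-ins for the real fp8/fp6-to-bf16 kernels. The point it shows is that only the slices named in `indices` are read and converted, so the work scales with the number of selected slices rather than with the full first dimension.

```python
from typing import List, Sequence

# Hypothetical layout (an assumption, not DeepSpeed's format):
#   q[i][r][c]   -> int8 code for slice i, row r, column c
#   scales[i][r] -> one float scale per row (symmetric quantization)
Int3D = List[List[List[int]]]
Float2D = List[List[float]]
Float3D = List[List[List[float]]]


def selective_dequantize_sketch(
    q: Int3D, scales: Float2D, indices: Sequence[int]
) -> Float3D:
    """Dequantize only the slices q[i] for i in `indices`, in that order.

    Slices that are not listed are never touched, so both the work done and
    the size of the output depend on len(indices), not on len(q).
    """
    num_slices = len(q)
    out: Float3D = []
    for i in indices:
        if not 0 <= i < num_slices:
            raise IndexError(f"slice index {i} out of range [0, {num_slices})")
        # Multiply every code in row r of slice i by that row's scale.
        out.append([[code * scales[i][r] for code in row] for r, row in enumerate(q[i])])
    return out


if __name__ == "__main__":
    q = [[[1, -2], [3, 4]], [[10, 20], [-30, 40]], [[5, 5], [5, 5]]]
    scales = [[0.5, 0.25], [0.5, 2.0], [2.0, 1.0]]
    # Only slices 2 and 0 are dequantized; slice 1 is skipped entirely.
    print(selective_dequantize_sketch(q, scales, [2, 0]))
    # [[[10.0, 10.0], [5.0, 5.0]], [[0.5, -1.0], [0.75, 1.0]]]
```

A natural unit-test property for this kind of function is that the selective result equals a full dequantization followed by indexing with the same slice indices, while avoiding the cost of converting the unused slices.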