
Allow Int4WeightOnlyQuantizer to set different dtype for scales_and_zeros #479

Merged: 2 commits into main from quant_dtype on Jul 5, 2024

Conversation

larryliu0820 (Contributor) commented:

As titled. Currently `Int4WeightOnlyQuantizer` is hardcoded to return `scales_and_zeros` with dtype `torch.bfloat16`. This PR adds a `dtype` argument to the flow so that a different dtype can be used.
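For context, the packing step is where the hardcoded cast lives. A minimal sketch of the idea (not the PR's actual diff; the function name and the exact cast location here are assumptions) looks like this:

```python
import torch

# Minimal sketch (not the PR's actual diff): thread a caller-supplied dtype
# through packing instead of a hardcoded torch.bfloat16 cast.
def pack_scales_and_zeros(
    scales: torch.Tensor,   # (out_features, n_groups)
    zeros: torch.Tensor,    # (out_features, n_groups)
    dtype: torch.dtype = torch.bfloat16,  # new argument; bfloat16 keeps the old behavior
) -> torch.Tensor:
    # Interleave per-group scales and zero points into the
    # (n_groups, out_features, 2) layout the int4 kernel consumes.
    packed = torch.cat(
        [
            scales.reshape(scales.size(0), scales.size(1), 1),
            zeros.reshape(zeros.size(0), zeros.size(1), 1),
        ],
        dim=2,
    )
    return packed.transpose(0, 1).contiguous().to(dtype)
```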

pytorch-bot bot commented Jul 5, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/479

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit f3c320a with merge base a35a1cd:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Jul 5, 2024
The review thread below is anchored on this snippet:

    ) -> None:
        super().__init__()
        self.padding = not _check_linear_int4_k(in_features, groupsize, inner_k_tiles)
        if self.padding:
            from model import find_multiple
larryliu0820 (Contributor, Author) commented:

I don't think there's a module called `model`.

A member replied:

Thanks, I think this is a relic of when GPTQ was more deeply coupled with gpt-fast.
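For reference, `find_multiple` is a tiny rounding helper inherited from gpt-fast (in current torchao it lives in the library's own utils rather than a `model` module); a sketch:

```python
def find_multiple(n: int, k: int) -> int:
    # Round n up to the nearest multiple of k,
    # e.g. find_multiple(1000, 1024) == 1024.
    # Here it pads in_features so the int4 kernel's shape constraints
    # (checked by _check_linear_int4_k) are satisfied.
    if n % k == 0:
        return n
    return n + k - (n % k)
```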

@msaroufim (Member) commented:

This seems fine to merge, although I do worry that most of our GPTQ tests are disabled right now in test/quantization/test_quant_api.py.

@msaroufim self-requested a review July 5, 2024 20:56
@msaroufim (Member) left a review comment:

Mostly looks fine, but FYI we don't really have anyone maintaining the GPTQ example, so if there's a use case for it please let me know.

@larryliu0820 (Contributor, Author) replied:

> Mostly looks fine, but FYI we don't really have anyone maintaining the GPTQ example, so if there's a use case for it please let me know

I'm migrating torchchat to use these APIs, to prepare for shared kernels across ET (ExecuTorch) and PyTorch eager/compile.
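A hypothetical usage sketch of the new argument on the quantizer; the parameter name (`precision`) and import path are assumptions, not taken from the PR diff:

```python
import torch
import torch.nn as nn
from torchao.quantization import Int4WeightOnlyQuantizer  # import path is an assumption

# Hypothetical: request float32 scales_and_zeros instead of the old
# hardcoded bfloat16 (the parameter name here is an assumption).
model = nn.Sequential(nn.Linear(4096, 4096)).to(torch.bfloat16).cuda()
quantizer = Int4WeightOnlyQuantizer(groupsize=128, precision=torch.float32)
model = quantizer.quantize(model)  # scales_and_zeros now carry torch.float32
```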

@larryliu0820 merged commit 9f85488 into main Jul 5, 2024
13 checks passed
@msaroufim deleted the quant_dtype branch July 5, 2024 21:55