Avoid normalization layers in HF's quantization_config #3030
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3030
Note: Links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit 1572b34 with merge base 8525185. This comment was automatically generated by Dr. CI and updates every 15 minutes.
I think that's the intended behavior, but just wanted to cc @jerryzh168 to confirm. There are two ways to skip quantizing certain layers today (see the sketch below).
However, neither of these accepts regexes today, so you'll have to specify all the modules you want to skip manually, which may be a bit brittle.
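The original list items did not survive extraction; based on the rest of this thread, the two options are presumably `modules_to_not_convert` (an argument to transformers' `TorchAoConfig`) and the `filter_fn` argument to torchao's `quantize_`. A minimal sketch of both, with a placeholder quant type and module name (exact config class names vary by torchao version):

```python
import torch
from torchao.quantization import Int8WeightOnlyConfig, quantize_
from transformers import TorchAoConfig

# Option 1 (HF side): list the fully-qualified module names to skip.
# "model.norm" is a hypothetical name; real names depend on the architecture.
hf_config = TorchAoConfig(
    quant_type="int8_weight_only",
    modules_to_not_convert=["model.norm"],
)

# Option 2 (torchao side): pass a filter_fn to quantize_ so that only
# nn.Linear modules are converted (mirroring torchao's default filter).
toy_model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.LayerNorm(8))
quantize_(
    toy_model,
    Int8WeightOnlyConfig(),
    filter_fn=lambda module, fqn: isinstance(module, torch.nn.Linear),
)
```

Note that both options take literal module names or types, which is why regex support would make skipping many similarly named layers less brittle.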
self.embedding = embedding
self.tied_weights = tied_weights

class PreTrainedM(M, PreTrainedModel):
Oh, I meant just using some specific model defined in transformers and going through the public APIs. Just making sure: would the tests work for existing models in transformers?
Yes, existing transformers models also inherit from PreTrainedModel. AutoModelForCausalLM.from_pretrained(..., quantization_config=quantization_config) can be tested in the same way.
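For illustration, a rough sketch of such a test against an existing transformers model, assuming a small placeholder checkpoint and an int8 weight-only quant type (both are illustrative, not what this PR's tests necessarily use):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TorchAoConfig

model_id = "facebook/opt-125m"  # placeholder checkpoint for illustration
quantization_config = TorchAoConfig(quant_type="int8_weight_only")

# Load an existing transformers model through the public API with the
# torchao quantization config applied at load time.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    torch_dtype="auto",
)

# Run a quick smoke test to confirm the quantized model still generates.
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```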
LG, see comments inline. Just want to make sure that the APIs we are using are applicable to all Hugging Face models.
This is a follow-up to #3015. I found that setting model.config.quantization_config with a "_default" key will quantize not just linear layer weights, but all weights, including normalization layers. The culprit is still this line from TorchAoHfQuantizer.create_quantized_param:

In order to avoid quantizing non-linear weights, I had to manually add all the module names for normalization layers to modules_to_not_convert, which is an argument to TorchAoConfig.

@andrewor14 Do you know if this is the intended behavior? I don't see why they aren't using the default filter_fn for quantize_.
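For concreteness, a rough sketch of the workaround described above, assuming a placeholder checkpoint and an int8 weight-only quant type; the normalization-layer names are collected by type rather than hard-coded, since they differ per architecture (models using RMSNorm would need that type in the check instead):

```python
import torch
from transformers import AutoModelForCausalLM, TorchAoConfig

model_id = "facebook/opt-125m"  # placeholder checkpoint for illustration

# First load the model unquantized to collect the fully-qualified names of
# all normalization layers (the ones whose weights should stay unquantized).
model = AutoModelForCausalLM.from_pretrained(model_id)
norm_fqns = [
    name
    for name, module in model.named_modules()
    if isinstance(module, torch.nn.LayerNorm)
]

# Reload with quantization, explicitly excluding the normalization layers.
quantization_config = TorchAoConfig(
    quant_type="int8_weight_only",
    modules_to_not_convert=norm_fqns,
)
quantized_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
)
```

Having to enumerate these names manually is exactly the brittleness discussed above; using torchao's default filter_fn (linear-only) would make this list unnecessary.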