Conversation

@lisjin lisjin (Contributor) commented Sep 18, 2025

This is a follow-up to #3015. I found that setting model.config.quantization_config with a "_default" key will quantize not just linear layer weights, but all weights, including normalization layers.

The culprit is still this line from TorchAoHfQuantizer.create_quantized_param:

quantize_(module, c, filter_fn=lambda x, fqn: True)

In order to avoid quantizing non-linear weights, I had to manually add all module names for normalization layers to modules_to_not_convert, which is an argument to TorchAoConfig.
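
A rough sketch of that workaround is below; it assumes transformers' TorchAoConfig forwards modules_to_not_convert to the quantizer, and the checkpoint name, quant type, and module FQNs are illustrative:

import torch
from transformers import AutoModelForCausalLM, TorchAoConfig

# Every normalization layer has to be listed by its fully qualified name (illustrative FQNs for OPT).
skip_modules = [
    "model.decoder.final_layer_norm",
    "model.decoder.layers.0.self_attn_layer_norm",
    "model.decoder.layers.0.final_layer_norm",
]

quantization_config = TorchAoConfig(
    "int8_weight_only",                    # placeholder quant type
    modules_to_not_convert=skip_modules,
)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",                   # placeholder checkpoint
    torch_dtype=torch.bfloat16,
    quantization_config=quantization_config,
)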

@andrewor14 Do you know if this is the intended behavior? I don't see why they aren't using the default filter_fn for quantize_.
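
For reference, a minimal contrast of the two filters, assuming quantize_'s default filter only matches nn.Linear and using Int8WeightOnlyConfig as a stand-in config:

import torch.nn as nn
from torchao.quantization import Int8WeightOnlyConfig, quantize_

model = nn.Sequential(nn.Linear(16, 16), nn.LayerNorm(16))

# With the default filter_fn (None), only the nn.Linear weight is swapped for a quantized tensor.
quantize_(model, Int8WeightOnlyConfig())

# The quoted filter_fn opts every module in, which is how the LayerNorm weight gets quantized too:
# quantize_(model, Int8WeightOnlyConfig(), filter_fn=lambda x, fqn: True)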

pytorch-bot bot commented Sep 18, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3030

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 1572b34 with merge base 8525185:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 18, 2025
@lisjin lisjin added the topic: bug fix Use this tag for PRs that fix bugs label Sep 18, 2025
@andrewor14 (Contributor) commented

I think that's the intended behavior but just wanted to cc @jerryzh168 to confirm.

There are two ways to skip quantizing certain layers today:

  1. ModuleFqnToConfig, more flexible but also a bit more verbose
  2. modules_to_not_convert, just specify a list of module names to not convert

However, neither of these accepts regex today, so you'll have to specify all the modules you want to skip manually, which may be a bit brittle.
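
A rough sketch of option 1 on a toy model, assuming "_default" in torchao's ModuleFqnToConfig acts as the fallback config and an explicit None entry means "leave this module unquantized" (the import path and keyword name are assumptions):

import torch.nn as nn
from torchao.quantization import Int8WeightOnlyConfig, ModuleFqnToConfig, quantize_

model = nn.Sequential(nn.Linear(16, 16), nn.Linear(16, 16))

# Keys are module FQNs; without regex support, each skipped module is spelled out explicitly.
config = ModuleFqnToConfig(module_fqn_to_config={
    "_default": Int8WeightOnlyConfig(),  # applied to every matched module not listed below
    "1": None,                           # skip the second linear (FQN "1" in this Sequential)
})
quantize_(model, config)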

@lisjin lisjin force-pushed the lvj/hf-quant-config branch 3 times, most recently from 80e9c51 to 9b781c6 on September 21, 2025 at 17:01
@lisjin lisjin force-pushed the lvj/hf-quant-config branch 4 times, most recently from e6c994c to 1697fee on September 21, 2025 at 18:41
self.embedding = embedding
self.tied_weights = tied_weights

class PreTrainedM(M, PreTrainedModel):
@jerryzh168 jerryzh168 (Contributor) commented Sep 22, 2025

oh, I meant just using some specific model defined in transformers and the public APIs. Just making sure: would the tests work for existing models in transformers?

@lisjin lisjin (Contributor, Author) commented

Yes, existing transformers models also inherit from PreTrainedModel. AutoModelForCausalLM.from_pretrained(..., quantization_config=quantization_config) can be tested in the same way.
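
For instance, a minimal end-to-end sketch of that kind of check against an existing checkpoint (the model id, quant type, and attribute paths in the spot-check are illustrative and specific to OPT):

import torch
from transformers import AutoModelForCausalLM, TorchAoConfig

quantization_config = TorchAoConfig("int8_weight_only")
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",  # any existing PreTrainedModel checkpoint should exercise the same path
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=quantization_config,
)

# Spot-check: the linear weight should now be a torchao tensor subclass,
# while the layer-norm weight should still be a plain parameter.
layer = model.model.decoder.layers[0]
print(type(layer.self_attn.q_proj.weight))
print(type(layer.self_attn_layer_norm.weight))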

@jerryzh168 jerryzh168 (Contributor) left a comment

lg, see comments inline. Just want to make sure that the APIs we are using are applicable to all Hugging Face models.

@lisjin lisjin merged commit fb7c837 into main Sep 22, 2025
18 checks passed
@lisjin lisjin deleted the lvj/hf-quant-config branch September 22, 2025 18:41