[transforms] TransformScheme.block_size, deprecate head_dim #466
Conversation
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
Looks good
Looks good! I think pydantic also has an alias option, which might be an alternative to the custom_validator. That might also enforce that both values aren't set. But it's also fine this way, and I'm not sure how well alias works with the deprecation warnings.
Thanks, yeah I played around with …
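For reference, the deprecation path being discussed might look roughly like this. This is a minimal sketch assuming pydantic v2; the validator body is illustrative, not the exact code merged in compressed-tensors.

```python
# Minimal sketch (pydantic v2) of deprecating `head_dim` in favor of `block_size`.
import warnings
from typing import Optional

from pydantic import BaseModel, model_validator


class TransformScheme(BaseModel):
    block_size: Optional[int] = None
    head_dim: Optional[int] = None  # deprecated in favor of block_size

    @model_validator(mode="after")
    def _deprecate_head_dim(self) -> "TransformScheme":
        if self.head_dim is not None:
            if self.block_size is not None:
                # also enforces that both values aren't set
                raise ValueError("Cannot set both `head_dim` and `block_size`")
            warnings.warn(
                "`head_dim` is deprecated, use `block_size` instead",
                DeprecationWarning,
            )
            self.block_size = self.head_dim
            self.head_dim = None
        return self
```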
…field (#1806)

SUMMARY: Resolves `INFERENG-1882`

The research community [has pointed out](https://github.com/IST-DASLab/FP-Quant?tab=readme-ov-file#fp-format-quantization-harness) that the rotation/transform block size is important when performing transforms:

> Key to efficiency is that the Hadamard block size matches the microscaling format group size (16 or 32)

This exposes a new field on SpinQuantModifier and QuIPModifier to allow the user to set it to an arbitrary value, as long as the model's hidden_size and head_dim are both evenly divisible by it.

- [x] Add to SpinQuantModifier. Option to allow for different `transform_block_size`s for R1 vs. R2 can be added at a future time.
- [x] Add to QuIPModifier. Option to allow for different `transform_block_size`s for U vs. V can be added at a future time.

Merge in conjunction with:
* neuralmagic/compressed-tensors#466

TEST PLAN: `transform_block_size` added to parameterized `tests/llmcompressor/modifiers/transform/(test_correctness.py|test_serialization.py)`

---------

Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
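A hedged usage sketch of the field described above: only `transform_block_size` and its divisibility constraint come from this work; the import path, the `rotations` argument, and the helper function are assumptions and may differ across llm-compressor versions.

```python
# Hedged sketch: setting the new `transform_block_size` on the llm-compressor
# modifiers. Import path and other constructor arguments are assumptions.
from llmcompressor.modifiers.transform import QuIPModifier, SpinQuantModifier

# block_size should evenly divide both the model's hidden_size and head_dim,
# e.g. 16 or 32 to match the microscaling format group size.
spinquant = SpinQuantModifier(rotations=["R1", "R2"], transform_block_size=32)
quip = QuIPModifier(transform_block_size=32)


def check_transform_block_size(hidden_size: int, head_dim: int, block_size: int) -> None:
    """Hypothetical helper mirroring the divisibility constraint described above."""
    for name, dim in (("hidden_size", hidden_size), ("head_dim", head_dim)):
        if dim % block_size != 0:
            raise ValueError(
                f"transform_block_size={block_size} must evenly divide {name}={dim}"
            )
```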
This deprecates `TransformScheme.head_dim` in favor of `TransformScheme.block_size`, which is more meaningful now that we want to apply block-diagonal transforms with a user-configured block size.

To be merged in conjunction with:
* `transform_block_size` field vllm-project/llm-compressor#1806
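For intuition, a block-diagonal transform with a user-configured block size can be pictured as below. This is a minimal sketch using SciPy and PyTorch, not the compressed-tensors implementation; the 128/32 sizes are arbitrary examples.

```python
# Illustrative sketch: a (hidden_size x hidden_size) rotation built from
# repeated (block_size x block_size) Hadamard blocks.
import torch
from scipy.linalg import hadamard

hidden_size, block_size = 128, 32
assert hidden_size % block_size == 0

# Normalized Hadamard block so each block rotation is orthonormal
block = torch.tensor(hadamard(block_size), dtype=torch.float32) / block_size**0.5

# Block-diagonal rotation: an independent Hadamard rotation per group of channels
rotation = torch.block_diag(*[block] * (hidden_size // block_size))
assert torch.allclose(rotation @ rotation.T, torch.eye(hidden_size), atol=1e-6)
```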