Today QAT supports these base configs:
- Int8DynamicActivationInt4WeightConfig
- Int4WeightOnlyConfig
- Float8DynamicActivationFloat8WeightConfig
- Float8DynamicActivationInt4WeightConfig
- NVFP4InferenceConfig
We should add:
(for lowering to ExecuTorch)
- Int8DynamicActivationIntxWeightConfig
- IntxWeightOnlyConfig
(requested by Axolotl, see axolotl-ai-cloud/axolotl#3107)
- Int8DynamicActivationInt8WeightConfig
- Int4DynamicActivationInt4WeightConfig
- Int8WeightOnlyConfig
Registration happens here today:
def _infer_fake_quantize_configs(
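To make the ask concrete, here is a rough, self-contained sketch of what registering a new base config could look like: a function that dispatches on the base config type and returns an (activation, weight) pair of fake-quantize settings. Everything in it is a hypothetical stand-in (the dataclasses, `FakeQuantizeSpec`, `infer_fake_quantize_specs`, and the chosen granularities are assumptions for illustration), not the actual torchao code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


# Hypothetical stand-ins for the real config classes; fields are illustrative only.
@dataclass
class Int8WeightOnlyConfig:
    group_size: Optional[int] = None


@dataclass
class Int8DynamicActivationInt8WeightConfig:
    pass


@dataclass
class FakeQuantizeSpec:
    """Hypothetical description of one fake quantizer (not a torchao class)."""
    dtype: str
    granularity: str


def infer_fake_quantize_specs(
    base_config: object,
) -> Tuple[Optional[FakeQuantizeSpec], FakeQuantizeSpec]:
    """Return (activation_spec, weight_spec) for a base config.

    The activation spec is None for weight-only configs.
    Unknown config types raise ValueError, so each new config needs its own branch.
    """
    if isinstance(base_config, Int8WeightOnlyConfig):
        if base_config.group_size is None:
            granularity = "per_channel"
        else:
            granularity = f"per_group({base_config.group_size})"
        return None, FakeQuantizeSpec("int8", granularity)
    if isinstance(base_config, Int8DynamicActivationInt8WeightConfig):
        # Assumed granularities: per-token activations, per-channel weights.
        act = FakeQuantizeSpec("int8", "per_token")
        weight = FakeQuantizeSpec("int8", "per_channel")
        return act, weight
    raise ValueError(f"Unsupported base config: {type(base_config).__name__}")
```

Supporting one of the requested configs would then amount to adding one more branch (plus tests) for that config type.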
Example usage:
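from torchao.quantization import Int8DynamicActivationIntxWeightConfig, quantize_
from torchao.quantization.qat import QATConfig

# Not supported yet
base_config = Int8DynamicActivationIntxWeightConfig(group_size=32)
# Prepare: insert fake quantization for training
quantize_(model, QATConfig(base_config, step="prepare"))
train(model)
# Convert: replace fake quantization with real quantization
quantize_(model, QATConfig(base_config, step="convert"))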
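For readers less familiar with the prepare step: during training the weights are "fake quantized", meaning they are rounded to the target integer grid and immediately dequantized back to float. Below is a minimal pure-Python sketch of symmetric per-group fake quantization. The function name, the symmetric scheme, and the default `bits=4` are assumptions for illustration; this is not how torchao implements it.

```python
from typing import List


def fake_quantize_per_group(
    values: List[float], group_size: int, bits: int = 4
) -> List[float]:
    """Symmetric per-group fake quantization: quantize, then dequantize."""
    if group_size <= 0 or len(values) % group_size != 0:
        raise ValueError("len(values) must be a positive multiple of group_size")
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    out: List[float] = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        max_abs = max(abs(v) for v in group)
        # One scale per group; fall back to 1.0 for an all-zero group.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        for v in group:
            q = min(max(round(v / scale), qmin), qmax)  # round, then clamp
            out.append(q * scale)  # dequantize back to float
    return out
```

With `group_size=32`, each run of 32 consecutive weights shares one scale, so an outlier only degrades the precision of its own group.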