Description:
When running quantization on an FP8 model with the --ignore_layers argument, the specified layers are not ignored as expected. The log shows not_to_quantized_layers: [], indicating the argument is not being applied to FP8 models.
AR_LOG_LEVEL=TRACE auto_round --model /models/Qwen3-8B-FP8 --ignore_layers "mlp"

Loading checkpoint shards: 100%|████████████████████████| 2/2 [00:00<00:00, 16.29it/s]
2026-01-14 22:03:32 WARNING model.py L323: the support for fp8 model as input is experimental, please use with caution.
2026-01-14 22:03:32 INFO base.py L442: using torch.bfloat16 for quantization tuning
2026-01-14 22:03:32 INFO base.py L708: 'enable_torch_compile' is set to `False` by default. Enabling it can reduce tuning cost by 20%, but it might throw an exception.
2026-01-14 22:03:32 DEBUG replace_modules.py L152: Scanning for modules to replace
2026-01-14 22:03:32 DEBUG replace_modules.py L174: No modules found for replacement
2026-01-14 22:03:32 TRACE utils.py L889: not_to_quantized_layers: [] <--------------------
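For context, below is a minimal sketch of the kind of substring matching I would expect --ignore_layers to perform over the model's module names; the helper and its name are illustrative only, not AutoRound's actual implementation. On this FP8 checkpoint the resulting skip list stays empty, whereas matching "mlp" against the module names of Qwen3-8B should pick up every MLP projection.

```python
# Hypothetical illustration (not AutoRound's code) of the substring matching
# that --ignore_layers is expected to apply to the model's module names.
import torch.nn as nn


def collect_ignored_layers(model: nn.Module, ignore_patterns: list[str]) -> list[str]:
    """Return qualified names of Linear layers matching any ignore pattern."""
    return [
        name
        for name, module in model.named_modules()
        if isinstance(module, nn.Linear) and any(p in name for p in ignore_patterns)
    ]


# With --ignore_layers "mlp" on Qwen3-8B, names such as
# "model.layers.0.mlp.gate_proj" should end up in the skip list,
# so not_to_quantized_layers: [] suggests the patterns are never
# matched against the FP8 model's modules.
```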