diff --git a/docs/source/backends/nxp/nxp-quantization.md b/docs/source/backends/nxp/nxp-quantization.md
index 7406eea41cb..095f9e29e52 100644
--- a/docs/source/backends/nxp/nxp-quantization.md
+++ b/docs/source/backends/nxp/nxp-quantization.md
@@ -103,3 +103,120 @@ quantized_graph_module = calibrate_and_quantize(
 ```
 See [PyTorch 2 Export Post Training Quantization](https://docs.pytorch.org/ao/main/tutorials_source/pt2e_quant_ptq.html) for more information.
+
+### Quantization Aware Training
+
+The NeutronQuantizer supports two modes of quantization: *Post-Training Quantization (PTQ)* and *Quantization Aware Training (QAT)*.
+PTQ uses a calibration phase to tune quantization parameters on an already-trained model, producing a model with integer weights.
+While this optimization reduces model size, it introduces quantization noise and can degrade the model's accuracy.
+QAT, by contrast, lets the model adapt its weights to the introduced quantization noise:
+instead of calibration, it runs training that optimizes the quantization parameters and the model weights at the same time.
+
+See the [Quantization Aware Training blog post](https://pytorch.org/blog/quantization-aware-training/) for an introduction to the QAT method.
+
+To use QAT with the Neutron backend, set the `is_qat` parameter to `True`:
+
+```python
+from executorch.backends.nxp.backend.neutron_target_spec import NeutronTargetSpec
+from executorch.backends.nxp.quantizer.neutron_quantizer import NeutronQuantizer
+
+target_spec = NeutronTargetSpec(target="imxrt700")
+neutron_quantizer = NeutronQuantizer(neutron_target_spec=target_spec, is_qat=True)
+```
+
+The rest of the quantization pipeline works similarly to the PTQ workflow.
+The most significant change is that the calibration step is replaced by training.
+
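The quantization noise mentioned above can be pictured with a small quantize-dequantize round trip. This is an illustrative sketch only (plain Python, symmetric per-tensor int8), not the Neutron backend's implementation:

```python
def fake_quantize(values: list[float], num_bits: int = 8) -> list[float]:
    """Quantize to integers and immediately dequantize, as QAT does during training."""
    qmax = 2 ** (num_bits - 1) - 1                # 127 for int8
    scale = max(abs(v) for v in values) / qmax    # symmetric per-tensor scale
    return [round(v / scale) * scale for v in values]

weights = [0.6, -1.0, 0.123, 0.25]
noisy = fake_quantize(weights)

# Every value moves by at most half a quantization step; this rounding
# error is the noise that QAT lets the network adapt to during training.
step = max(abs(v) for v in weights) / 127
assert all(abs(a - b) <= step / 2 + 1e-12 for a, b in zip(weights, noisy))
```

During QAT, this quantize-dequantize step is applied on every forward pass, so the gradients steer the weights toward values that survive rounding well.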
+Note: QAT uses the `prepare_qat_pt2e` prepare function instead of `prepare_pt2e`.
+
+```python
+import torch
+from torch.utils.data import DataLoader
+import torchvision.models as models
+import torchvision.datasets as datasets
+from torchvision.models.mobilenetv2 import MobileNet_V2_Weights
+from executorch.backends.nxp.quantizer.neutron_quantizer import NeutronQuantizer
+from executorch.backends.nxp.backend.neutron_target_spec import NeutronTargetSpec
+from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_qat_pt2e
+from torchao.quantization.pt2e import (
+    move_exported_model_to_eval,
+    move_exported_model_to_train,
+    disable_observer,
+)
+
+model = models.mobilenetv2.mobilenet_v2(weights=MobileNet_V2_Weights.DEFAULT).eval()
+
+neutron_target_spec = NeutronTargetSpec(target="imxrt700")
+quantizer = NeutronQuantizer(neutron_target_spec, is_qat=True)  # (1)
+
+sample_inputs = (torch.randn(1, 3, 224, 224),)
+training_ep = torch.export.export(model, sample_inputs).module()  # (2)
+
+## Steps different from PTQ (3–6)
+prepared_model = prepare_qat_pt2e(training_ep, quantizer)  # (3) !!! Different prepare function
+prepared_model = move_exported_model_to_train(prepared_model)  # (4)
+
+# ---------------- Training phase (5) ----------------
+criterion = torch.nn.CrossEntropyLoss()
+optimizer = torch.optim.SGD(prepared_model.parameters(), lr=1e-2, momentum=0.9)
+
+train_data = datasets.ImageNet("./", split="train", transform=...)
+train_loader = DataLoader(train_data, batch_size=5)
+
+num_epochs = 10  # choose to fit your training budget
+
+# Training replaces calibration in QAT
+for epoch in range(num_epochs):
+    for imgs, labels in train_loader:
+        optimizer.zero_grad()
+        outputs = prepared_model(imgs)
+        loss = criterion(outputs, labels)
+        loss.backward()
+        optimizer.step()
+
+    # It is recommended to stop updating the quantization parameters
+    # after a few epochs of training.
+    if epoch >= num_epochs / 3:
+        prepared_model.apply(disable_observer)
+# --------------- End of training phase ---------------
+
+prepared_model = move_exported_model_to_eval(prepared_model)  # (6)
+quantized_model = convert_pt2e(prepared_model)  # (7)
+
+# Optional step - fixes biasless convolution (see Known limitations of QAT)
+quantized_model = QuantizeFusedConvBnBiasAtenPass(
+    default_zero_bias=True
+)(quantized_model).graph_module
+
+...
+```
+
+Checklist for moving from PTQ to QAT:
+- Set `is_qat=True` in `NeutronQuantizer`
+- Use `prepare_qat_pt2e` instead of `prepare_pt2e`
+- Call `move_exported_model_to_train()` before training
+- Train the model instead of calibrating
+- Call `move_exported_model_to_eval()` after training
+
+#### Known limitations of QAT
+
+In the current ExecuTorch/TorchAO implementation, there is an issue when quantizing biasless convolutions during QAT.
+The pipeline produces a non-quantized empty bias, which causes the Neutron Converter to fail.
+To mitigate this issue, apply the `QuantizeFusedConvBnBiasAtenPass` after quantization:
+
+```python
+...
+
+# training
+
+prepared_model = move_exported_model_to_eval(prepared_model)  # (6)
+quantized_model = convert_pt2e(prepared_model)  # (7)
+
+quantized_model = QuantizeFusedConvBnBiasAtenPass(
+    default_zero_bias=True
+)(quantized_model).graph_module
+
+...
+```
diff --git a/docs/source/quantization-overview.md b/docs/source/quantization-overview.md
index 81b15f6c8bb..b05c03026e7 100644
--- a/docs/source/quantization-overview.md
+++ b/docs/source/quantization-overview.md
@@ -25,12 +25,14 @@ These quantizers usually support configs that allow users to specify quantizatio
 * Precision (e.g., 8-bit or 4-bit)
 * Quantization type (e.g., dynamic, static, or weight-only quantization)
 * Granularity (e.g., per-tensor, per-channel)
+* Post-Training Quantization vs. Quantization Aware Training
 
 Not all quantization options are supported by all backends.
 Consult backend-specific guides for supported quantization modes and configuration, and how to initialize the backend-specific PT2E quantizer:
 * [XNNPACK quantization](backends/xnnpack/xnnpack-quantization.md)
 * [CoreML quantization](backends/coreml/coreml-quantization.md)
 * [QNN quantization](backends-qualcomm.md#step-2-optional-quantize-your-model)
+* [NXP quantization](backends/nxp/nxp-quantization.md)
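The granularity option listed above (per-tensor vs. per-channel) can be made concrete with a small backend-independent sketch of how symmetric quantization scales are typically derived:

```python
weights = [
    [0.10, -0.20, 0.05],  # output channel 0: small-magnitude weights
    [4.00, -8.00, 2.00],  # output channel 1: large-magnitude weights
]

QMAX = 127  # symmetric int8 range

# Per-tensor granularity: one scale shared by the whole tensor,
# dominated by the largest-magnitude channel.
per_tensor_scale = max(abs(w) for row in weights for w in row) / QMAX

# Per-channel granularity: one scale per output channel.
per_channel_scales = [max(abs(w) for w in row) / QMAX for row in weights]

# Channel 0 gets a roughly 40x finer scale per-channel, so its small
# weights keep far more resolution than under the shared scale.
assert per_channel_scales[0] < per_tensor_scale
```

This is why per-channel weight quantization usually preserves accuracy better than per-tensor, at the cost of storing one scale per channel.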