# Export Quantized PyTorch Model

[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/pytorch/export/example_pytorch_export.ipynb)


To export a PyTorch model as a quantized model, you first need to quantize it with MCT (Model Compression Toolkit). Start by installing MCT:

```shell
pip install -q mct-nightly
```

To export the quantized model to ONNX format and run inference with it, some additional packages are needed. They are required only for ONNX export, so you can skip this step if you do not plan to export to ONNX:

```shell
pip install -q onnx onnxruntime onnxruntime-extensions
```
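You may want to confirm which of these optional packages are importable before running the ONNX cells. The standard library is enough for that check. The sketch below uses a hypothetical helper, `missing_packages`, that is not part of MCT. Note that the pip package `onnxruntime-extensions` is imported as `onnxruntime_extensions`.

```python
import importlib.util

# Import names of the packages needed only for the ONNX export and inference steps.
ONNX_PACKAGES = ("onnx", "onnxruntime", "onnxruntime_extensions")


def missing_packages(names=ONNX_PACKAGES):
    """Hypothetical helper: return the subset of `names` that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("ONNX steps will fail; missing:", ", ".join(missing))
else:
    print("All ONNX-related packages are available.")
```

If the list is non-empty, either install the packages above or skip the ONNX sections and go straight to TorchScript.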

Now, let's start the export demonstration by quantizing the model using MCT:

```python
import model_compression_toolkit as mct
import numpy as np
import torch
from torchvision.models.mobilenetv2 import mobilenet_v2

# Create a model
float_model = mobilenet_v2()


# Notice that here the representative dataset is random for demonstration only.
def representative_data_gen():
    yield [np.random.random((1, 3, 224, 224))]


quantized_exportable_model, _ = mct.ptq.pytorch_post_training_quantization_experimental(
    float_model,
    representative_data_gen=representative_data_gen)
```
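The generator above yields a single random batch, which is enough for a demonstration. For real calibration you would normally yield several batches of preprocessed images from your own dataset. The sketch below is a possible drop-in replacement under some assumptions. `NUM_BATCHES` and `BATCH_SIZE` are illustrative values, not MCT settings. Random data still stands in for real images. How many batches MCT actually consumes may depend on the MCT version. As in the original, each yielded item is a list with one array per model input, in channels-first layout.

```python
import numpy as np

# Illustrative settings (assumptions); tune them for your own dataset.
NUM_BATCHES = 10
BATCH_SIZE = 1
INPUT_SHAPE = (3, 224, 224)  # channels-first, matching the (1, 3, 224, 224) input used above

rng = np.random.default_rng(seed=0)


def representative_data_gen():
    """Yield NUM_BATCHES calibration batches, each wrapped in a list (one entry per model input)."""
    for _ in range(NUM_BATCHES):
        # Replace this random batch with real, preprocessed images in practice.
        batch = rng.random((BATCH_SIZE, *INPUT_SHAPE), dtype=np.float32)
        yield [batch]
```

Keeping `BATCH_SIZE = 1` leaves the later inference cell unchanged. That cell feeds `next(representative_data_gen())[0]` to the exported model.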



### ONNX

In ONNX format, the exported model represents weights and activations as float. Note that the `onnx` package must be installed to export the model to ONNX.

Two quantization formats are available: MCTQ and FAKELY_QUANT.

#### MCTQ Quantization Format

By default, `mct.exporter.pytorch_export_model` exports the quantized PyTorch model to an ONNX model that uses custom quantizers from the `mct_quantizers` module:

```python
# Path of exported model
onnx_file_path = 'model_format_onnx_mctq.onnx'

# Export an ONNX model with MCTQ quantizers.
mct.exporter.pytorch_export_model(model=quantized_exportable_model,
                                  save_model_path=onnx_file_path,
                                  repr_dataset=representative_data_gen)
```

Note that the exported model has the same size as the quantized exportable model, because its weights are stored as float.

#### ONNX opset version

By default, ONNX opset version 15 is used. You can change it with the `onnx_opset_version` argument:

```python
# Export an ONNX model with MCTQ quantizers, using ONNX opset 16.
mct.exporter.pytorch_export_model(model=quantized_exportable_model,
                                  save_model_path=onnx_file_path,
                                  repr_dataset=representative_data_gen,
                                  onnx_opset_version=16)
```

#### Use exported model for inference

To load the exported MCTQ-format ONNX model and run inference with it, pass the session options returned by `mct_quantizers.get_ort_session_options()` when you create the ONNX Runtime session. This lets ONNX Runtime resolve the custom MCT quantizer operators:

```python
import mct_quantizers as mctq
import onnxruntime as ort

sess = ort.InferenceSession(onnx_file_path,
                            mctq.get_ort_session_options(),
                            providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])

_input_data = next(representative_data_gen())[0].astype(np.float32)
_model_output_name = sess.get_outputs()[0].name
_model_input_name = sess.get_inputs()[0].name

# Run inference
predictions = sess.run([_model_output_name], {_model_input_name: _input_data})
```
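`sess.run` returns a list with one array per requested output. Here that is a single array of logits, assumed to have shape `(1, 1000)` because torchvision's `mobilenet_v2()` defaults to 1000 ImageNet classes. The sketch below shows one way to turn those logits into top-5 class indices and probabilities with NumPy. `top_k_classes` is a hypothetical helper, not part of MCT or ONNX Runtime. The model in this tutorial uses randomly initialized weights, so its actual predictions carry no meaning.

```python
import numpy as np


def top_k_classes(logits, k=5):
    """Hypothetical helper: return (class_indices, probabilities) of the k best classes per row."""
    logits = np.asarray(logits, dtype=np.float64)
    # Numerically stable softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)
    # Indices of the k largest probabilities, ordered from most to least likely.
    top_idx = np.argsort(probs, axis=-1)[..., ::-1][..., :k]
    top_probs = np.take_along_axis(probs, top_idx, axis=-1)
    return top_idx, top_probs


# A random (1, 1000) array stands in for predictions[0] so the snippet runs on its own.
example_logits = np.random.default_rng(0).standard_normal((1, 1000)).astype(np.float32)
indices, probabilities = top_k_classes(example_logits, k=5)
print(indices[0], probabilities[0])
```

In the notebook you would call `top_k_classes(predictions[0])` instead of using the random stand-in.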

#### Fakely-Quantized Format

To export a fakely-quantized model, set `quantization_format=mct.exporter.QuantizationFormat.FAKELY_QUANT`:

```python
import tempfile

# Path of exported model
_, onnx_file_path = tempfile.mkstemp('.onnx')

# Use QuantizationFormat.FAKELY_QUANT for fakely-quantized weights and activations.
mct.exporter.pytorch_export_model(model=quantized_exportable_model,
                                  save_model_path=onnx_file_path,
                                  repr_dataset=representative_data_gen,
                                  quantization_format=mct.exporter.QuantizationFormat.FAKELY_QUANT)
```


Note that the fakely-quantized model also has the same size as the quantized exportable model, because its weights are stored as float.
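To see why a fakely-quantized model is no smaller than the float one, consider what fake quantization does to a single tensor. Values are snapped to an integer grid and then mapped back to float. Their precision changes, but their storage type does not. The sketch below uses a simplified symmetric, per-tensor, 8-bit scheme. `fake_quantize` is a hypothetical helper for illustration only and is not MCT's actual quantizer, which may use different thresholds or per-channel parameters.

```python
import numpy as np


def fake_quantize(x, num_bits=8):
    """Hypothetical helper: symmetric per-tensor fake quantization (quantize, then dequantize)."""
    x = np.asarray(x, dtype=np.float32)
    q_max = 2 ** (num_bits - 1) - 1                      # 127 for 8 bits
    scale = float(np.abs(x).max()) / q_max               # assumes x is not all zeros
    q = np.clip(np.round(x / scale), -q_max - 1, q_max)  # integer grid values, still a float array
    return (q * scale).astype(np.float32), scale


# A toy weight tensor standing in for one convolution layer.
weights = np.random.default_rng(0).standard_normal((64, 32, 3, 3)).astype(np.float32)
fq_weights, scale = fake_quantize(weights)

# Same dtype and byte size as the original: the values change, the storage does not.
print(weights.nbytes, fq_weights.nbytes)                    # 73728 73728
# Storing the integer grid indices as int8 instead would take a quarter of the space.
print(np.round(fq_weights / scale).astype(np.int8).nbytes)  # 18432
```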

### TorchScript

In TorchScript format, the exported model's weights and activations are quantized but represented as float (fake quantization).

```python
import tempfile

# Path of exported model
_, torchscript_file_path = tempfile.mkstemp('.pt')

# Use PytorchExportSerializationFormat.TORCHSCRIPT to export a TorchScript model,
# and QuantizationFormat.FAKELY_QUANT for fakely-quantized weights and activations.
mct.exporter.pytorch_export_model(model=quantized_exportable_model,
                                  save_model_path=torchscript_file_path,
                                  repr_dataset=representative_data_gen,
                                  serialization_format=mct.exporter.PytorchExportSerializationFormat.TORCHSCRIPT,
                                  quantization_format=mct.exporter.QuantizationFormat.FAKELY_QUANT)
```

Note that the fakely-quantized TorchScript model also has the same size as the quantized exportable model, because its weights are stored as float.
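The exports in this tutorial are all meant to represent the same quantized network. A quick sanity check is therefore to run one input through two of them and compare the outputs. The sketch below is a small NumPy helper for that comparison. `compare_outputs` is hypothetical, and this tutorial does not state an expected tolerance. Small differences between runtimes or operator implementations are plausible, so judge the numbers against your own accuracy requirements.

```python
import numpy as np


def compare_outputs(reference, candidate):
    """Hypothetical helper: max absolute difference and cosine similarity of two outputs.

    Assumes neither output is all zeros (cosine similarity is undefined in that case).
    """
    ref = np.asarray(reference, dtype=np.float64).ravel()
    cand = np.asarray(candidate, dtype=np.float64).ravel()
    if ref.shape != cand.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {cand.shape}")
    max_abs_diff = float(np.max(np.abs(ref - cand)))
    cosine = float(ref @ cand / (np.linalg.norm(ref) * np.linalg.norm(cand)))
    return max_abs_diff, cosine


# Toy example; in the notebook you could pass, e.g., predictions[0] and another export's output.
diff, cos = compare_outputs([1.0, 2.0, 3.0], [1.0, 2.0, 3.5])
print(f"max |diff| = {diff:.3f}, cosine similarity = {cos:.4f}")
```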

Copyright 2024 Sony Semiconductor Israel, Inc. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
