# PyTorch and ONNX Flow for Training


## Goals

* Learn how to retrain a model using PyTorch

* Learn how to export a trained model to ONNX

* Learn how to quantize an ONNX model to run inference on the NPU

## References

**[Ryzen AI Software Platform](https://ryzenai.docs.amd.com/en/latest/getstartex.html)**

**[Vitis AI Execution Provider](https://onnxruntime.ai/docs/execution-providers/Vitis-AI-ExecutionProvider.html)**

**[CIFAR10](https://github.com/EN10/CIFAR)**

---


<div class="alert alert-box alert-warning"> 
    
Running this re-training notebook generates model files that overwrite the existing trained and quantized model files in the `onnx` folder.

To keep any existing model files in the `onnx` folder, rename them before you run the notebook.

The following model files are written:

1. The ResNet-50 model trained on the CIFAR-10 dataset: `onnx/resnet_trained_for_cifar10.pt`.
2. The ResNet-50 model trained on the CIFAR-10 dataset, in ONNX format: `onnx/resnet_trained_for_cifar10.onnx`.
3. The quantized ResNet-50 model trained on the CIFAR-10 dataset, in ONNX format: `onnx/resnet.qdq.U8S8.onnx`.
</div>   
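
One way to follow the renaming advice in the warning above is to move existing files aside before running the notebook. The following is a minimal sketch using only the standard library; the helper name `backup_existing_models` and the timestamped `.bak` naming scheme are illustrative assumptions, not part of the notebook.

```python
import time
from pathlib import Path

# File names taken from the warning above.
MODEL_FILES = (
    "resnet_trained_for_cifar10.pt",
    "resnet_trained_for_cifar10.onnx",
    "resnet.qdq.U8S8.onnx",
)


def backup_existing_models(models_dir="onnx", file_names=MODEL_FILES):
    """Rename existing model files so a new run cannot overwrite them.

    Hypothetical helper: each existing file `name` becomes
    `name.<timestamp>.bak` in the same folder. Returns the new paths.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    renamed = []
    for name in file_names:
        source = Path(models_dir) / name
        if source.is_file():
            target = source.with_name(f"{name}.{stamp}.bak")
            source.rename(target)
            renamed.append(target)
    return renamed
```

Calling `backup_existing_models()` from the notebook's working directory returns the list of renamed files, which is empty if nothing needed protecting.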


---


## Step 1: Import Packages

In [1]:
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset

import torchvision
import torchvision.transforms as transforms
from torchvision.models import ResNet50_Weights, resnet50
from torchvision.datasets import CIFAR10

In [2]:
import onnx
import onnxruntime
from onnxruntime.quantization import CalibrationDataReader, QuantType, QuantFormat, CalibrationMethod, quantize_static

from quark.onnx.quantization.config import (Config, get_default_config)
from quark.onnx import ModelQuantizer
import random

[QUARK-INFO]: Custom Op compilation start.
[QUARK-INFO]: The custom_op already exists.
[QUARK-INFO]: Custom Op compilation already complete.



---


## Step 2: Prepare the Model

Let us retrain the [ResNet-50 model](https://arxiv.org/pdf/1512.03385.pdf) provided by `torchvision` using the CIFAR-10 dataset.

The default pre-trained model is retrained on CIFAR-10 using the [transfer learning technique](https://www.youtube.com/watch?v=BqqfQnyjmgg&list=PLo2EIpI_JMQtNtKNFFSMNIZwspj8H7-sQ&index=3).


<div class="alert alert-box alert-warning">
Make sure that the CIFAR-10 dataset is downloaded. For the steps, refer to the previous notebook.
</div>
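
A quick way to confirm that the dataset is in place before training is to look for the files that `torchvision`'s `CIFAR10` class reads from its `root` folder. This is a minimal sketch; the helper name `missing_cifar10_files` is hypothetical, and the `onnx/data` location is assumed to match the data directory used in this notebook.

```python
from pathlib import Path

# Folder and file names that torchvision's CIFAR10 class looks for under `root`.
CIFAR10_FOLDER = "cifar-10-batches-py"
CIFAR10_FILES = [f"data_batch_{i}" for i in range(1, 6)] + ["test_batch", "batches.meta"]


def missing_cifar10_files(data_dir):
    """Return the expected CIFAR-10 files that are absent from `data_dir`."""
    base = Path(data_dir) / CIFAR10_FOLDER
    return [name for name in CIFAR10_FILES if not (base / name).is_file()]


# Assumed location; adjust if your dataset lives elsewhere.
missing = missing_cifar10_files(Path("onnx") / "data")
if missing:
    print("CIFAR-10 is incomplete; missing:", ", ".join(missing))
else:
    print("CIFAR-10 dataset found.")
```

The check only verifies that the files exist; it does not validate their checksums.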

### Load model for re-training using transfer learning

By default, the ResNet-50 model pre-trained on the 1,000-class ImageNet dataset has a fully connected (FC) layer with an output size of 1,000. It therefore produces a 1,000-dimensional vector, where each dimension corresponds to one class in the ImageNet dataset.

We use transfer learning: we start from a set of pre-trained weights and customize the model's classifier by replacing its FC layer. The replacement consists of a linear layer with 2,048 input features and 64 output features, a ReLU activation function, and a second linear layer with 64 input features and 10 output features. This adapts the ResNet-50 model into a classifier for the 10 CIFAR-10 classes.
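
To get a feel for how small the replacement classifier is compared with the original FC layer, the parameter counts can be worked out by hand. The sketch below is plain arithmetic on the layer sizes quoted above; the helper `linear_params` is an illustrative name, and the counts assume each linear layer has a bias term (the PyTorch default).

```python
def linear_params(in_features, out_features):
    """Number of trainable parameters in a linear layer (weights plus biases)."""
    return in_features * out_features + out_features


original_head = linear_params(2048, 1000)
new_head = linear_params(2048, 64) + linear_params(64, 10)

print(f"Original FC layer: {original_head:,} parameters")  # 2,049,000
print(f"Replacement head:  {new_head:,} parameters")       # 131,786
```

The ReLU between the two linear layers adds no parameters, so the new head is roughly 15 times smaller than the layer it replaces.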

In [2]:
# License 1 (see end of notebook)

def load_resnet_model():
    weights = ResNet50_Weights.DEFAULT
    resnet = resnet50(weights=weights)
    resnet.fc = torch.nn.Sequential(torch.nn.Linear(2048, 64), torch.nn.ReLU(inplace=True), torch.nn.Linear(64, 10))
    return resnet


# For updating learning rate
def update_lr(optimizer, lr):
    for param_group in optimizer.param_groups:
        param_group["lr"] = lr

### Model re-training

Define the model directory and the CIFAR-10 dataset directory:

In [3]:
global models_dir, data_dir
models_dir = ".\\onnx"
data_dir= ".\\onnx\\data"

With a `batch_size` of 100, each epoch runs 500 training steps, i.e., one pass over all 50,000 images in the training set.

Each epoch takes approximately 10 minutes to complete. The number of epochs can be varied to improve the accuracy of the model.

At the end of this process, we will save the trained model, export it as an ONNX model, and then quantize it.
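
The step count and the learning-rate schedule can be checked with a few lines of arithmetic. The sketch below assumes the values used by this notebook's training cell (base learning rate 0.001, divided by 3 after every 20 completed epochs); the function names are illustrative.

```python
def steps_per_epoch(num_images, batch_size):
    """Number of batches per epoch when the last partial batch is kept."""
    return -(-num_images // batch_size)  # ceiling division


def learning_rate_for_epoch(epoch, base_lr=0.001, decay_every=20, factor=3.0):
    """Learning rate in effect during a zero-based `epoch`.

    Mirrors the training cell's schedule: the rate is divided by `factor`
    after every `decay_every` completed epochs.
    """
    return base_lr / factor ** (epoch // decay_every)


print(steps_per_epoch(50_000, 100))  # 500
for epoch in (0, 19, 20, 40):
    print(epoch, learning_rate_for_epoch(epoch))
```

With a single epoch, as in the run shown in this notebook, the decay never triggers; it only matters for runs of 20 epochs or more.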

In [4]:
# License 1 (see end of notebook)

def prepare_model(num_epochs=0):
    # Seed everything to 0
    random.seed(0)
    torch.manual_seed(0)
    torch.cuda.manual_seed(0)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Hyper-parameters
    num_epochs = num_epochs
    learning_rate = 0.001

    # Image preprocessing modules
    transform = transforms.Compose(
        [transforms.Pad(4), transforms.RandomHorizontalFlip(), transforms.RandomCrop(32), transforms.ToTensor()]
    )

    # CIFAR-10 dataset
    train_dataset = torchvision.datasets.CIFAR10(root=data_dir, train=True, transform=transform, download=False)
    test_dataset = torchvision.datasets.CIFAR10(root=data_dir, train=False, transform=transforms.ToTensor())

    # Data loader
    train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=100, shuffle=True)
    test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=100, shuffle=False)

    model = load_resnet_model().to(device)

    # Loss and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

    # Train the model
    total_step = len(train_loader)
    curr_lr = learning_rate
    for epoch in range(num_epochs):
        for i, (images, labels) in enumerate(train_loader):
            images = images.to(device)
            labels = labels.to(device)
            # Forward pass
            outputs = model(images)
            loss = criterion(outputs, labels)
            # Backward and optimize
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if (i + 1) % 100 == 0:
                print(
                    "Epoch [{}/{}], Step [{}/{}] Loss: {:.4f}".format(
                        epoch + 1, num_epochs, i + 1, total_step, loss.item()
                    )
                )
        # Decay learning rate
        if (epoch + 1) % 20 == 0:
            curr_lr /= 3
            update_lr(optimizer, curr_lr)

    # Test the model
    model.eval()
    if num_epochs:
        with torch.no_grad():
            correct = 0
            total = 0
            for images, labels in test_loader:
                images = images.to(device)
                labels = labels.to(device)
                outputs = model(images)
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

            accuracy = 100 * correct / total
            print("Accuracy of the model on the test images: {} %".format(accuracy))
    return model

In [5]:
# Run training
model = prepare_model(num_epochs=1)

Downloading: "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" to C:\Users\aup-a/.cache\torch\hub\checkpoints\resnet50-11ad3fa6.pth
100%|█████████████████████████████████████████████████████████████████████████████| 97.8M/97.8M [00:05<00:00, 19.2MB/s]


Epoch [1/1], Step [100/500] Loss: 1.0639
Epoch [1/1], Step [200/500] Loss: 0.9532
Epoch [1/1], Step [300/500] Loss: 0.6795
Epoch [1/1], Step [400/500] Loss: 0.5647
Epoch [1/1], Step [500/500] Loss: 0.7307
Accuracy of the model on the test images: 75.29 %


Save the trained PyTorch model by running the following cell:

In [6]:
model.to("cpu")
model_path = f"{models_dir}/resnet_trained_for_cifar10.pt"
torch.save(model, model_path)

After completing the training process, observe the following output:   

* The trained ResNet-50 model on the CIFAR-10 dataset is saved at the following location: `onnx/resnet_trained_for_cifar10.pt`.
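
The cell above pickles the whole model object. A commonly used alternative is to save only the model's `state_dict()` and rebuild the architecture before loading, which avoids depending on the pickled class definitions. The following is a minimal sketch of that alternative, not what this notebook does: it uses a tiny stand-in network and a temporary file, and all names in it are hypothetical.

```python
import os
import tempfile

import torch
import torch.nn as nn


def build_tiny_model():
    """Tiny stand-in network; the notebook's model is a ResNet-50."""
    return nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))


model = build_tiny_model().eval()

with tempfile.TemporaryDirectory() as tmp_dir:
    weights_path = os.path.join(tmp_dir, "weights_only.pt")
    # Save the parameters only, not the pickled module object.
    torch.save(model.state_dict(), weights_path)

    # Rebuild the same architecture, then load the saved parameters into it.
    restored = build_tiny_model()
    restored.load_state_dict(torch.load(weights_path, map_location="cpu"))
    restored.eval()

sample = torch.randn(3, 8)
with torch.no_grad():
    same_output = torch.allclose(model(sample), restored(sample))
print("Restored model matches:", same_output)
```

If you keep the full-object format used in this notebook, note that recent PyTorch releases may require passing `weights_only=False` to `torch.load` when reading the `.pt` file back.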

---

## Step 3: Convert Model to ONNX Format

Run the following cell to save the trained model as an ONNX model:

In [10]:
def save_onnx_model(model):
    dummy_inputs = torch.randn(1, 3, 32, 32)
    input_names = ['input']
    output_names = ['output']
    dynamic_axes = {'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}}
    onnx_model_path = f"{models_dir}/resnet_trained_for_cifar10.onnx"
    torch.onnx.export(
        model,
        dummy_inputs,
        onnx_model_path,
        export_params=True,
        opset_version=13,
        input_names=input_names,
        output_names=output_names,
        dynamic_axes=dynamic_axes,
    )

In [11]:
# Save model
save_onnx_model(model)

After completing this process, observe the following output:

* The trained ResNet-50 model on the CIFAR-10 dataset is saved at the following location in ONNX format: `onnx/resnet_trained_for_cifar10.onnx`.
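
Before quantizing, it can be worth checking that the exported file is a valid ONNX graph and that the dynamic batch axis behaves as intended. The helper below is a sketch under stated assumptions: `verify_onnx_export` is a hypothetical name, the tensor names `input` and `output` are the ones passed to `torch.onnx.export` above, and inference runs on the CPU execution provider.

```python
import os


def verify_onnx_export(onnx_model_path, batch_sizes=(1, 4)):
    """Check an exported model and run it on random inputs.

    Returns a dict mapping batch size to output shape, or None if the
    file does not exist.
    """
    if not os.path.isfile(onnx_model_path):
        return None

    # Imported lazily so a missing file is reported before the ONNX stack is loaded.
    import numpy as np
    import onnx
    import onnxruntime

    # Structural validation of the ONNX graph.
    onnx.checker.check_model(onnx.load(onnx_model_path))

    session = onnxruntime.InferenceSession(onnx_model_path, providers=["CPUExecutionProvider"])
    shapes = {}
    for batch_size in batch_sizes:
        # Random data only exercises the graph; the predictions are meaningless.
        images = np.random.rand(batch_size, 3, 32, 32).astype(np.float32)
        (logits,) = session.run(["output"], {"input": images})
        shapes[batch_size] = logits.shape
    return shapes


print(verify_onnx_export("onnx/resnet_trained_for_cifar10.onnx"))
```

If the export worked as intended, each batch size should map to an output with that many rows and 10 columns, one per CIFAR-10 class.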

### Visualize the ONNX model

Generated and adapted using Netron
>Netron is a viewer for neural network, deep learning and machine learning models.

<div class="alert alert-box alert-warning">

<strong>Note:</strong> This is an image of the default model we are using. If you have modified or re-trained your model, please visit [Netron](https://netron.app/) to generate a graph for your model.

</div>

In [19]:
from IPython.display import IFrame

notebook_url = "https://netron.app/"

iframe = IFrame(notebook_url, width=800, height=600)

display(iframe)



---


## Step 4: Quantize the Model

Quantizing AI models from floating-point to 8-bit integers reduces the computational power and memory footprint required for inference. For model quantization, you can use either [AMD Quark](https://quark.docs.amd.com/latest/index.html) or [Microsoft Olive](https://ryzenai.docs.amd.com/en/latest/olive_quant.html). This example uses the AMD Quark quantizer workflow.

This generates a quantized model using the QDQ quant format with the UInt8 activation type and the Int8 weight type. After the run completes, the quantized ONNX model is saved to `onnx/resnet.qdq.U8S8.onnx`.

For more information on the representation of quantized ONNX models (e.g., QDQ quant format, UInt8 activation type, and Int8 weight type), see the [ONNX Runtime quantization documentation](https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html#onnx-quantization-representation-format).
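
To make the QDQ idea concrete, the arithmetic behind a quantize/dequantize pair for a UInt8 tensor can be sketched in plain Python. This follows the affine mapping used by ONNX's `QuantizeLinear` and `DequantizeLinear` operators; the scale, zero point, and sample values are illustrative assumptions, not values taken from this notebook's model.

```python
def quantize_uint8(values, scale, zero_point):
    """Map floats to UInt8: round(x / scale) + zero_point, clamped to [0, 255]."""
    return [min(255, max(0, round(v / scale) + zero_point)) for v in values]


def dequantize(quantized, scale, zero_point):
    """Map integers back to floats: (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in quantized]


scale, zero_point = 1 / 64, 128  # illustrative values with a power-of-two scale
activations = [-3.0, -0.5, 0.0, 0.26, 1.0, 5.0]

q = quantize_uint8(activations, scale, zero_point)
restored = dequantize(q, scale, zero_point)
print(q)         # [0, 96, 128, 145, 192, 255]
print(restored)  # [-2.0, -0.5, 0.0, 0.265625, 1.0, 1.984375]
```

Values inside the representable range come back with a small rounding error (0.26 becomes 0.265625), while values outside it saturate (-3.0 and 5.0 are clipped). Calibration exists to choose a scale and zero point that keep both effects small.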
   
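The `QuantizationConfig` class is used to configure the quantization parameters for the model. The following snippet illustrates its parameters; the quantization cell later in this step uses `get_default_config("XINT8")` instead.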

```python
import quark.onnx  # makes the fully qualified quark.onnx.* names below resolvable
from quark.onnx import ModelQuantizer, PowerOfTwoMethod, QuantType
from quark.onnx.quantization.config.config import Config, QuantizationConfig

quant_config = QuantizationConfig(
    quant_format=quark.onnx.QuantFormat.QDQ,
    calibrate_method=quark.onnx.PowerOfTwoMethod.MinMSE,
    input_nodes=[],
    output_nodes=[],
    op_types_to_quantize=[],
    per_channel=False,
    reduce_range=False,
    activation_type=quark.onnx.QuantType.QInt8,
    weight_type=quark.onnx.QuantType.QInt8,
    nodes_to_quantize=[],
    nodes_to_exclude=[],
    subgraphs_to_exclude=[],
    optimize_model=True,
    use_dynamic_quant=False,
    use_external_data_format=False,
    execution_providers=['CPUExecutionProvider'],
    enable_npu_cnn=False,
    enable_npu_transformer=False,
    convert_fp16_to_fp32=False,
    convert_nchw_to_nhwc=False,
    include_cle=False,
    include_sq=False,
    extra_options={},)
config = Config(global_quant_config=quant_config)

quantizer = ModelQuantizer(config)
quantizer.quantize_model(input_model_path, output_model_path, calibration_data_reader=None)
```



Run the following cell to define the calibration data reader (`resnet_calibration_reader`):

In [12]:
# License 2 (see end of notebook)

class CIFAR10DataSet:
    def __init__(
        self,
        data_dir,
        **kwargs,
    ):
        super().__init__()
        self.train_path = data_dir
        self.vld_path = data_dir
        self.setup("fit")

    def setup(self, stage: str):
        transform = transforms.Compose(
            [transforms.Pad(4), transforms.RandomHorizontalFlip(), transforms.RandomCrop(32), transforms.ToTensor()]
        )
        self.train_dataset = CIFAR10(root=self.train_path, train=True, transform=transform, download=False)
        self.val_dataset = CIFAR10(root=self.vld_path, train=True, transform=transform, download=False)


class PytorchResNetDataset(Dataset):
    def __init__(self, dataset):
        self.dataset = dataset

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        sample = self.dataset[index]
        input_data = sample[0]
        label = sample[1]
        return input_data, label


def create_dataloader(data_dir, batch_size):
    cifar10_dataset = CIFAR10DataSet(data_dir)
    _, val_set = torch.utils.data.random_split(cifar10_dataset.val_dataset, [49000, 1000])
    benchmark_dataloader = DataLoader(PytorchResNetDataset(val_set), batch_size=batch_size, drop_last=True)
    return benchmark_dataloader


class ResnetCalibrationDataReader(CalibrationDataReader):
    def __init__(self, data_dir: str, batch_size: int = 16):
        super().__init__()
        self.iterator = iter(create_dataloader(data_dir, batch_size))

    def get_next(self) -> dict:
        try:
            images, labels = next(self.iterator)
            return {"input": images.numpy()}
        except Exception:
            return None


def resnet_calibration_reader(data_dir, batch_size=16):
    return ResnetCalibrationDataReader(data_dir, batch_size=batch_size)
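
The contract behind the reader above is simple: each call to `get_next` returns one feed dictionary keyed by the model's input name, and `None` signals that the data is exhausted. The sketch below illustrates that protocol with a toy reader and plain lists; `ListCalibrationReader` and `count_calibration_batches` are hypothetical names, and the toy class deliberately does not inherit from ONNX Runtime's `CalibrationDataReader`.

```python
class ListCalibrationReader:
    """Toy reader following the `get_next` protocol of a calibration data reader.

    Hypothetical stand-in: it serves pre-built batches from a list instead of
    a DataLoader and does not inherit from ONNX Runtime's base class.
    """

    def __init__(self, batches, input_name="input"):
        self.input_name = input_name
        self.iterator = iter(batches)

    def get_next(self):
        # Return one feed dict per call, then None once the data is exhausted.
        batch = next(self.iterator, None)
        return None if batch is None else {self.input_name: batch}


def count_calibration_batches(reader):
    """Drain a reader the way a calibrator would and count its batches."""
    count = 0
    while reader.get_next() is not None:
        count += 1
    return count


# 1,000 calibration samples in batches of 16 with the last partial batch dropped.
num_samples, batch_size = 1000, 16
batches = [list(range(i, i + batch_size)) for i in range(0, num_samples - batch_size + 1, batch_size)]
print(count_calibration_batches(ListCalibrationReader(batches)))  # 62
```

The count of 62 matches the number of calibration iterations reported in the quantization log of this notebook, since 1,000 samples split into full batches of 16 leave 8 samples unused.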

Run the following cell to quantize and save the model:

In [13]:
# License 2 (see end of notebook)

# `input_model_path` is the path to the original, unquantized ONNX model.
input_model_path = "onnx/resnet_trained_for_cifar10.onnx"

# `output_model_path` is the path where the quantized model will be saved.
output_model_path = "onnx/resnet.qdq.U8S8.onnx"

# `calibration_dataset_path` is the path to the dataset used for calibration during quantization.
calibration_dataset_path = "onnx/data/"

# `dr` (data reader) is an instance of ResnetCalibrationDataReader, a utility class that
# reads the calibration dataset and prepares it for the quantization process.
dr = resnet_calibration_reader(calibration_dataset_path)

# Quantization with Quark
    
# Get quantization configuration
quant_config = get_default_config("XINT8")
config = Config(global_quant_config=quant_config)
print(f"The configuration for quantization is {config}")

# Create an ONNX quantizer
quantizer = ModelQuantizer(config)

# Quantize the ONNX model
quantizer.quantize_model(input_model_path, output_model_path, dr)

print('Calibrated and quantized model saved at:', output_model_path)

[QUARK-INFO]: The input ONNX model onnx/resnet_trained_for_cifar10.onnx can create InferenceSession successfully

[QUARK_INFO]: Time information:
2025-02-25 16:51:18.617728
[QUARK_INFO]: OS and CPU information:
                                        system --- Windows
                                          node --- AUP11
                                       release --- 10
                                       version --- 10.0.26100
                                       machine --- AMD64
                                     processor --- AMD64 Family 26 Model 36 Stepping 0, AuthenticAMD
[QUARK_INFO]: Tools version information:
                                        python --- 3.10.16
                                          onnx --- 1.16.1
                                   onnxruntime --- 1.19.0
                                    quark.onnx --- 0.6.0+dba9ca364
[QUARK_INFO]: Quantized Configuration information:
                                   model_input --- onnx/resnet_trained_for_cifar10.onnx
                                  model_output --- onnx/resnet.qdq.U8S8.onnx

[QUARK-INFO]: Obtained calibration data with 62 iters
[QUARK-INFO]: Removed initializers from input
[QUARK-INFO]: Simplified model sucessfully
[QUARK-INFO]: Loading model...
[QUARK-INFO]: The input ONNX model C:/Users/aup-a/AppData/Local/Temp/vai.simp.l08qdde9/model_simp.onnx can run inference successfully
[QUARK-INFO]: optimize the model for better hardware compatibility.
[QUARK-INFO]: Start calibration...
[QUARK-INFO]: Start collecting data, runtime depends on your model size and the number of calibration dataset.
[QUARK-INFO]: Finding optimal threshold for each tensor using PowerOfTwoMethod.MinMSE algorithm ...
[QUARK-INFO]: Use all calibration data to calculate min mse
Computing range: 100%|███████████████████████████████████████████████████████████| 125/125 [00:22<00:00,  5.56tensor/s]
[QUARK-INFO]: Finished the calibration of PowerOfTwoMethod.MinMSE which costs 24.8s

The operation types and their corresponding quantities of the input float model is shown in the table below.


The quantized information for all operation types is shown in the table below.
The discrepancy between the operation types in the quantized model and the float model is due to the application of graph optimization.


Calibrated and quantized model saved at: onnx/resnet.qdq.U8S8.onnx


After completing the quantization process, observe the following output:

* The quantized ResNet-50 model on the CIFAR-10 dataset is saved at the following location in ONNX format: `onnx/resnet.qdq.U8S8.onnx`.
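
A simple sanity check after quantization is to compare the size of the float and quantized files on disk. The sketch below uses only the standard library; the helper names are illustrative, and the paths are the ones written by this notebook. Because Int8 weights occupy one byte instead of four, a noticeably smaller file is expected, although the exact ratio depends on the graph.

```python
import os


def size_in_mb(path):
    """File size in MiB, or None if the file does not exist."""
    return os.path.getsize(path) / (1024 * 1024) if os.path.isfile(path) else None


def compare_model_sizes(float_path, quantized_path):
    """Return (float_mb, quantized_mb, ratio), or None if either file is missing or empty."""
    float_mb, quantized_mb = size_in_mb(float_path), size_in_mb(quantized_path)
    if float_mb is None or not quantized_mb:
        return None
    return float_mb, quantized_mb, float_mb / quantized_mb


result = compare_model_sizes("onnx/resnet_trained_for_cifar10.onnx", "onnx/resnet.qdq.U8S8.onnx")
if result is None:
    print("Model files not found; run the export and quantization cells first.")
else:
    print("float: {:.1f} MB, quantized: {:.1f} MB, ratio: {:.1f}x".format(*result))
```

File size is only a proxy for the memory footprint at inference time, but a quantized file that is not smaller than the float one usually indicates that quantization did not run as intended.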


---


## Step 5: Deploy the Model on NPU for Inference

<div class="alert alert-box alert-warning">

To run inference using the model generated in this notebook, please refer to the [Pytorch_ONNX_Inference](5_1_pytorch_onnx_inference.ipynb) notebook.

</div>


---


## Licenses

License 1

```python
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
```

License 2

```python
#################################################################################  
# License
# Ryzen AI is licensed under `MIT License <https://github.com/amd/ryzen-ai-documentation/blob/main/License>`_ . Refer to the `LICENSE File <https://github.com/amd/ryzen-ai-documentation/blob/main/License>`_ for the full license text and copyright notice.
```


---

<center>
Copyright &copy; 2023 AMD, Inc.
</center>
<center>
SPDX-License-Identifier: MIT
</center>