# YOLOv6 Quantization Compression Example

This example uses [ACT](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/auto_compression) from [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim) to quantize YOLOv6.
The quantized model can then be deployed with TensorRT.

- Benchmark

| Model | Base mAP<sup>val</sup><br>0.5:0.95 | Quant mAP<sup>val</sup><br>0.5:0.95 | Latency<br>FP32 | Latency<br>FP16 | Latency<br>INT8 | Download |
| :-------- |:-------- |:--------: | :--------: | :---------------------: | :----------------: | :----------------: |
| YOLOv6s |  42.4   | 41.3  |  9.06ms  |   2.90ms   |  **1.83ms**  | [ONNX](https://github.com/meituan/YOLOv6/releases/download/0.1.0/yolov6s.onnx) &#124; [Quant ONNX](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov6s_quant_onnx.tar) |
| YOLOv6-Tiny  |  40.8   | 39.0 |  5.06ms  |   2.32ms   |  **1.68ms** | [ONNX](https://github.com/meituan/YOLOv6/releases/download/0.1.0/yolov6t.onnx) &#124; [Quant ONNX](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov6_tiny_quant_onnx.tar) |
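
To read the table as relative gains, the latency and mAP columns can be turned into speedup ratios and accuracy deltas. The snippet below is a small sketch that uses only the numbers from the table; the `rows` structure and the `summarize` helper are illustrative names, not part of PaddleSlim.

```python
# Benchmark numbers copied from the table above (latency in ms).
rows = {
    "YOLOv6s": {"base_map": 42.4, "quant_map": 41.3, "fp32": 9.06, "fp16": 2.90, "int8": 1.83},
    "YOLOv6-Tiny": {"base_map": 40.8, "quant_map": 39.0, "fp32": 5.06, "fp16": 2.32, "int8": 1.68},
}

def summarize(row):
    """Return (mAP drop, INT8 speedup over FP32, INT8 speedup over FP16)."""
    map_drop = round(row["base_map"] - row["quant_map"], 1)
    speedup_fp32 = round(row["fp32"] / row["int8"], 2)
    speedup_fp16 = round(row["fp16"] / row["int8"], 2)
    return map_drop, speedup_fp32, speedup_fp16

for name, row in rows.items():
    drop, s32, s16 = summarize(row)
    print(f"{name}: mAP drop {drop}, {s32}x vs FP32, {s16}x vs FP16")
```

For YOLOv6s this gives a 1.1 mAP drop for roughly a 4.95x speedup over FP32 and 1.58x over FP16.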

### Experiment

(1) Environment Dependencies Installation:
  - paddlepaddle>=2.3.2
  - paddleslim>=2.3.4

In [None]:
# GPU example for Ubuntu with CUDA 11.2. For other environments, follow the installation guide on the Paddle website:
#  https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html

!python -m pip install paddlepaddle-gpu==2.3.2.post112 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html

# CPU
#!pip install paddlepaddle==2.3.2

!pip install paddleslim==2.3.4
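
After installation, it can help to confirm that the installed versions satisfy the minimums listed above. The following is a minimal sketch using only the standard library; `version_tuple` and `meets_minimum` are hypothetical helpers, and the simple parsing (first three numeric components, `.post` suffix dropped) is an assumption that may not cover every version scheme.

```python
import re
from importlib import metadata

def version_tuple(version):
    """Turn '2.3.2.post112' into (2, 3, 2); the '.post' suffix is ignored."""
    numbers = re.findall(r"\d+", version.split(".post")[0])
    return tuple(int(n) for n in numbers[:3])

def meets_minimum(package, minimum):
    """Return True/False for installed packages, None if the package is missing."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    return version_tuple(installed) >= version_tuple(minimum)

# The GPU wheel is distributed as `paddlepaddle-gpu`, the CPU wheel as `paddlepaddle`.
requirements = [("paddlepaddle-gpu", "2.3.2"), ("paddlepaddle", "2.3.2"), ("paddleslim", "2.3.4")]
for package, minimum in requirements:
    print(package, ">=", minimum, "->", meets_minimum(package, minimum))
```

Only one of the two Paddle wheels is expected to be present, so a `None` for the other one is normal.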

(2) Model Preparation: export the YOLOv6 ONNX model (currently only models exported without NMS are supported)

In [None]:
# export yolov6s.onnx
!python ./deploy/ONNX/export_onnx.py \
    --weights yolov6s.pt \
    --img 640 \
    --batch 1

# Alternatively, download the already exported ONNX model directly
# !wget https://github.com/meituan/YOLOv6/releases/download/0.1.0/yolov6s.onnx

(3) Dataset Preparation (some unlabeled pictures of real scenes):

The directory format is as follows:
```
image_dir
├── 000000000139.jpg
├── 000000000285.jpg
├── ...
```

Here the images of COCO's official `val2017` set are used, so `image_dir` points to that folder.

In [None]:
image_dir = './dataset/coco/val2017/'
model_dir = './yolov6s.onnx'
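
Before starting compression, a quick look at the image folder can catch a wrong path early. This is a small sketch with the standard library only; `list_images` and the extension list are illustrative assumptions, not part of Paddle or PaddleSlim.

```python
import os

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")  # assumed set; extend as needed

def list_images(directory):
    """Return the sorted image file names found directly inside `directory`."""
    return sorted(
        name for name in os.listdir(directory)
        if name.lower().endswith(IMAGE_EXTENSIONS)
        and os.path.isfile(os.path.join(directory, name))
    )

# Same path as used in this example; the check is skipped if the folder is absent.
if os.path.isdir("./dataset/coco/val2017/"):
    images = list_images("./dataset/coco/val2017/")
    print(f"found {len(images)} images, e.g. {images[:2]}")
```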

(4) Dependency Packages Import:

In [None]:
import cv2
import os
import numpy as np
import sys
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import paddle
from paddleslim.auto_compression import AutoCompression

paddle.set_device('gpu')

(5) Definition of Data Preprocessing:

In [None]:
def _generate_scale(im, target_shape, keep_ratio=True):
    origin_shape = im.shape[:2]
    im_size_min = np.min(origin_shape)
    im_size_max = np.max(origin_shape)
    target_size_min = np.min(target_shape)
    target_size_max = np.max(target_shape)
    im_scale = float(target_size_min) / float(im_size_min)
    if np.round(im_scale * im_size_max) > target_size_max:
        im_scale = float(target_size_max) / float(im_size_max)
    im_scale_x = im_scale
    im_scale_y = im_scale
    return im_scale_y, im_scale_x

def image_preprocess(img, target_shape=[640,640]):
    # Resize image
    im_scale_y, im_scale_x = _generate_scale(img, target_shape)
    img = cv2.resize(
        img,
        None,
        None,
        fx=im_scale_x,
        fy=im_scale_y,
        interpolation=cv2.INTER_LINEAR)
    # Pad
    im_h, im_w = img.shape[:2]
    h, w = target_shape[:]
    if h != im_h or w != im_w:
        canvas = np.ones((h, w, 3), dtype=np.float32)
        canvas *= np.array([114.0, 114.0, 114.0], dtype=np.float32)
        canvas[0:im_h, 0:im_w, :] = img.astype(np.float32)
        img = canvas
    img = np.transpose(img / 255, [2, 0, 1])
    return img.astype(np.float32)
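
To make the resize-and-pad behaviour concrete, the sketch below mirrors the scale logic of `_generate_scale` in plain Python and reports the resulting sizes. `letterbox_geometry` is a hypothetical helper for illustration, and it assumes that the resized size is the rounded product of the original size and the scale, which is how `cv2.resize` is expected to behave when called with `fx`/`fy`.

```python
def letterbox_geometry(height, width, target_shape=(640, 640)):
    """Mirror the scale logic of `_generate_scale` and report the resulting sizes.

    Returns (scale, resized_h, resized_w, pad_bottom, pad_right).
    """
    target_min, target_max = min(target_shape), max(target_shape)
    scale = target_min / min(height, width)
    # If the longer side would overflow the target, scale by the longer side instead.
    if round(scale * max(height, width)) > target_max:
        scale = target_max / max(height, width)
    # Assumption: the resized size is the rounded product, as with cv2.resize(fx=, fy=).
    resized_h, resized_w = round(height * scale), round(width * scale)
    pad_bottom = target_shape[0] - resized_h
    pad_right = target_shape[1] - resized_w
    return scale, resized_h, resized_w, pad_bottom, pad_right

print(letterbox_geometry(480, 640))    # landscape image that already fits the width
print(letterbox_geometry(1080, 1920))  # larger image that must be shrunk
```

With a square 640x640 target, the longer side ends up at 640 pixels and the remaining area at the bottom or right is filled with the gray value 114, because the image is copied into the top-left corner of the canvas.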

(6) Definition of Configuration for AutoCompression:

In [None]:
run_config = {
    'Distillation': {
        'alpha': 1.0,
        'loss': 'soft_label'},
    'Quantization': {
        'onnx_format': True,
        'activation_quantize_type': 'moving_average_abs_max',
        'quantize_op_types': ['conv2d', 'depthwise_conv2d']},
    'TrainConfig': {
        'train_iter': 2000,
        'eval_iter': 1000,
        'learning_rate': 0.00003,
        'optimizer_builder': {'optimizer': {'type': 'SGD'}, 'weight_decay': 4e-05}}
}

(7) Auto Compression:

In [None]:
def reader_wrapper(reader, input_name='x2paddle_image_arrays'):
    def gen():
        for data in reader:
            yield {input_name: data[0]}
    return gen

paddle.vision.image.set_image_backend('cv2')
train_dataset = paddle.vision.datasets.ImageFolder(image_dir, transform=image_preprocess)
train_loader = paddle.io.DataLoader(train_dataset, batch_size=1, shuffle=True, drop_last=True, num_workers=0)

ac = AutoCompression(
    model_dir=model_dir,
    train_dataloader=reader_wrapper(train_loader),
    save_dir='output',
    config=run_config,
    eval_callback=None)
ac.compress()
# convert to ONNX
ac.export_onnx()
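
The role of `reader_wrapper` is easier to see with a stand-in loader. The sketch below repeats the wrapper from the cell above and feeds it a plain list instead of a `DataLoader`; `fake_loader` is a hypothetical placeholder whose batches mimic the structure the wrapper reads via `data[0]`.

```python
def reader_wrapper(reader, input_name='x2paddle_image_arrays'):
    def gen():
        for data in reader:
            # Map the first element of each batch to the model's input name.
            yield {input_name: data[0]}
    return gen

# Stand-in for the DataLoader: each batch is a list whose first item is the image batch.
fake_loader = [["batch_0"], ["batch_1"]]

feed_fn = reader_wrapper(fake_loader)
feeds = list(feed_fn())
print(feeds)
# [{'x2paddle_image_arrays': 'batch_0'}, {'x2paddle_image_arrays': 'batch_1'}]
```

Because `reader_wrapper` returns a function rather than a generator, calling it again starts a fresh pass over the data. The `input_name` must match the input name of the model being compressed.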

After executing the program, output files will be generated in the output folder as shown below:
```shell
├── model.pdiparams         # Paddle inference model weights
├── model.pdmodel           # Paddle inference model file
├── calibration_table.txt   # Paddle calibration table after quantization
├── ONNX
│   ├── quant_model.onnx      # ONNX model after quantization
│   ├── calibration.cache     # Calibration table that TensorRT can load directly
```

- Speed Test:

In [None]:
!trtexec --onnx=output/ONNX/quant_model.onnx --avgRuns=1000 --workspace=1024 --calib=output/ONNX/calibration.cache --int8

- Python test:

With `quant_model.onnx` and `calibration.cache` loaded, the TensorRT test script can be used directly for verification. For the detailed code, refer to [TensorRT deployment](/TensorRT).


In [None]:
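!git clone https://github.com/PaddlePaddle/PaddleSlim.git
# `!cd` runs in a subshell and does not persist between commands, so the `%cd` magic is used instead.
# The model and image paths below are relative to this directory; adjust them if `output/` lives elsewhere.
%cd PaddleSlim/example/auto_compression/pytorch_yolo_series/TensorRT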
!python trt_eval.py --onnx_model_file=output/ONNX/quant_model.onnx \
                   --calibration_file=output/ONNX/calibration.cache \
                   --image_file=../images/000000570688.jpg \
                   --precision_mode=int8

You can also evaluate COCO mAP:

In [None]:
!python trt_eval.py --onnx_model_file=output/ONNX/quant_model.onnx \
                   --calibration_file=output/ONNX/calibration.cache \
                   --precision_mode=int8 \
                   --dataset_dir=dataset/coco/ \
                   --val_image_dir=val2017 \
                   --val_anno_path=annotations/instances_val2017.json