# AWS Neuron compilation of YOLOv8

This notebook shows how to compile a YOLOv8 PyTorch model for AWS Inferentia (Inf1 instances) using the AWS Neuron SDK.

Reference:
- [Model Prediction with Ultralytics YOLO](https://docs.ultralytics.com/modes/predict/)

In [2]:
%load_ext autoreload
%autoreload 2

import sys, os
print(os.getcwd())
sys.path.append(os.path.abspath(".."))

# for i in sys.path:
#     print(i)


The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
/home/ubuntu/lab/03-yolo8-inf1/notebook


## 1. Neuron compilation using the native Neuron SDK

### Load the YOLOv8 model using the Ultralytics library

In [3]:
from ultralytics import YOLO

model = YOLO("../model/yolov8n.pt", task="detect")




### Compile the PyTorch model to a Neuron model
- The first compilation, when no compiled Neuron model file exists yet, fails with the error below. Running the cell a second time compiles the model successfully.
    - The values for attribute 'shape' do not match: torch.Size([]) != torch.Size([1, 8400]).
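
The "run it again" workaround described above could be wrapped in a small helper. This is a minimal sketch, not part of the notebook: `call_with_retry` is a hypothetical name, and it accepts any zero-argument callable so it can be tried without Neuron hardware. The exception type raised by the first trace is not stated in the notebook, so the sketch retries on any `Exception` by default.

```python
import time


def call_with_retry(fn, attempts=2, delay_s=0.0, retry_on=(Exception,)):
    """Call fn() up to `attempts` times and return the first successful result.

    Hypothetical helper: re-raises the last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on as err:
            last_error = err
            print(f"attempt {attempt}/{attempts} failed: {err}")
            if attempt < attempts and delay_s > 0:
                time.sleep(delay_s)
    raise last_error
```

With the notebook's objects, the call might look like `call_with_retry(lambda: torch_neuron.trace(model.model.eval(), inputs_example))`.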



In [4]:
from utils.local_util import * 

In [5]:
import torch
import torch_neuron
import os

pt_model_path = '../model/yolov8n.pt'
neuron_model_path = "../model/traced_yolo8_model_neuron.pt"

# generate dummy input example
batch_sizes = 1
input_shape = (batch_sizes, 3, 640, 640)
inputs_example = torch.ones(input_shape)  # or numpy array for TF, MX
print("input example shape: ", inputs_example.shape)


if os.path.exists(neuron_model_path):
    # Load the existing model
    neuron_model = load_neuron_model(neuron_model_path)
    print(f"Loaded existing model from {neuron_model_path}")
else:
    # Trace (compile) the PyTorch model for Neuron, then save the compiled model
    neuron_model = torch_neuron.trace(model.model.eval(), inputs_example)
    save_neuron_model(model=neuron_model, path=neuron_model_path)
    print(f"Compiled the PyTorch model, {pt_model_path}, into a Neuron model")
    print(f"Neuron model is saved at {neuron_model_path}")

input example shape:  torch.Size([1, 3, 640, 640])
/home/ubuntu/lab/03-yolo8-inf1/model/traced_yolo8_model_neuron.pt is given
Loaded existing model from ../model/traced_yolo8_model_neuron.pt


### Inference on the Neuron model

##### Inference on dummy data

In [10]:
# result_neuron = neuron_model(inputs_example)

batch_sizes = 64
input_shape = (batch_sizes, 3, 640, 640)
inputs_example = torch.ones(input_shape)  # or numpy array for TF, MX
result_neuron = neuron_model(inputs_example)

print("result_neuron: ", len(result_neuron), ", shape: ", result_neuron[0].shape)

result_neuron:  2 , shape:  torch.Size([64, 84, 8400])
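
The output shape `[64, 84, 8400]` can be reproduced with simple arithmetic. The sketch below assumes the standard YOLOv8 detection head trained on COCO: 4 box coordinates plus 80 class scores per prediction, and one prediction per grid cell at strides 8, 16, and 32. `yolov8_detect_output_shape` is a hypothetical helper name used only for illustration.

```python
def yolov8_detect_output_shape(batch_size, img_size=640, num_classes=80, strides=(8, 16, 32)):
    """Return the expected (batch, channels, predictions) shape of the detect head output."""
    # 4 box values (x, y, w, h) plus one score per class
    channels = 4 + num_classes
    # One prediction per cell of each feature map: 80x80 + 40x40 + 20x20 for a 640 input
    num_predictions = sum((img_size // stride) ** 2 for stride in strides)
    return (batch_size, channels, num_predictions)


print(yolov8_detect_output_shape(64))  # (64, 84, 8400)
```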


##### Inference on the bus image and post-processing

In [18]:
import cv2
import numpy as np
from ultralytics import YOLO

# Convert the image to a numpy array of shape [1, 3, 640, 640]
image_path = "../test_image/bus.jpg"
preprocessed_image, original_size = preprocess_image(image_path)

print("preprocessed_image: ", preprocessed_image.shape)
print("original_size: ", original_size)

preprocessed_image_torch = torch.from_numpy(preprocessed_image)

# inference on neuron model
result_neuron = neuron_model(preprocessed_image_torch)
print("result_neuron: ", len(result_neuron), ", shape:", result_neuron[0].shape)

# convert tensor to numpy array, [1,84,8400]
result_np = result_neuron[0].numpy()
print(result_np.shape)

# Post-process the raw output to extract and show the bounding boxes
post_process_ultralytics(input_image=image_path, outputs=result_np)

preprocessed_image:  (1, 3, 640, 640)
original_size:  (1080, 810)
result_neuron:  2 , shape: torch.Size([1, 84, 8400])
(1, 84, 8400)


[{'class_id': 0,
  'class_name': 'person',
  'confidence': 0.8887587785720825,
  'box': [478.0, 226.0, 84.0, 296.0],
  'scale': 1.6875},
 {'class_id': 0,
  'class_name': 'person',
  'confidence': 0.8807970881462097,
  'box': [210.75, 241.0, 72.5, 266.0],
  'scale': 1.6875},
 {'class_id': 0,
  'class_name': 'person',
  'confidence': 0.8774768114089966,
  'box': [109.25, 236.0, 115.5, 300.0],
  'scale': 1.6875},
 {'class_id': 5,
  'class_name': 'bus',
  'confidence': 0.8459424376487732,
  'box': [97.0, 137.0, 458.0, 322.0],
  'scale': 1.6875},
 {'class_id': 0,
  'class_name': 'person',
  'confidence': 0.4234580993652344,
  'box': [79.875, 326.0, 34.25, 188.0],
  'scale': 1.6875}]
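
The `box` and `scale` fields above can be mapped back to pixel coordinates of the original image. The sketch below is not the notebook's `post_process_ultralytics`, which is not shown. It assumes that `box` is `[left, top, width, height]` in the 640x640 model input, that `original_size` is `(height, width)`, and that `preprocess_image` letterboxes the image with centered padding. The printed values appear consistent with these assumptions, but they are unverified. `box_to_original_xyxy` is a hypothetical helper name.

```python
def box_to_original_xyxy(box, original_size, input_size=640):
    """Map a [left, top, width, height] box from model-input space to original pixels.

    Assumes centered letterbox padding (an assumption, not confirmed by the notebook).
    """
    left, top, width, height = box
    orig_h, orig_w = original_size
    # Scale from the 640 input back to the original image (1080 / 640 = 1.6875 here)
    scale = max(orig_h, orig_w) / input_size
    # Padding added on each side of the shorter dimension, in model-input pixels
    pad_x = (input_size - orig_w / scale) / 2
    pad_y = (input_size - orig_h / scale) / 2
    x1 = (left - pad_x) * scale
    y1 = (top - pad_y) * scale
    x2 = (left + width - pad_x) * scale
    y2 = (top + height - pad_y) * scale

    def clamp(value, upper):
        # Keep coordinates inside the original image
        return min(max(value, 0.0), float(upper))

    return (clamp(x1, orig_w), clamp(y1, orig_h), clamp(x2, orig_w), clamp(y2, orig_h))


# First detection from the output above
print(box_to_original_xyxy([478.0, 226.0, 84.0, 296.0], (1080, 810)))
```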

## 2. Compilation and inference using the Ultralytics library

### Load the YOLOv8 PyTorch model and compile it to a Neuron model

In [19]:
from ultralytics import YOLO

import os

pt_model_path = '../model/yolov8n.pt'
neuron_model_path = '../model/yolov8n.neuron'

if os.path.exists(neuron_model_path):
    # Load the existing compiled model
    # m_inf = YOLO("../model/traced_yolo8_model_neuron.pt", task="detect")
    m_inf = YOLO(neuron_model_path, task="detect")
    print(f"Loaded existing model from {neuron_model_path}")
else:
    # Export (compile) the PyTorch model to Neuron format, then load the result
    mx = YOLO(pt_model_path)
    mx.export(format="neuron")
    # m_inf = YOLO("model/yolov8n.neuron", task="detect")
    m_inf = YOLO(neuron_model_path, task="detect")
    print(f"Compile and Load model from pytorch model, {pt_model_path}, and neuron model, {neuron_model_path}")



Ultralytics YOLOv8.2.0 🚀 Python-3.10.15 torch-1.13.1+cu117 CPU (Intel Xeon Platinum 8275CL 3.00GHz)
YOLOv8n summary (fused): 168 layers, 3151904 parameters, 0 gradients, 8.7 GFLOPs

[34m[1mPyTorch:[0m starting from '../model/yolov8n.pt' with input shape (1, 3, 640, 640) BCHW and output shape(s) (1, 84, 8400) (6.2 MB)

[34m[1mNeuron:[0m starting export with torch_neuron 1.13.1.2.11.7.0 and neuron-cc 1.24.0.0+d58fa6134...


INFO:Neuron:All operators are compiled by neuron-cc (this does not guarantee that neuron-cc will successfully compile)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 186, fused = 186, percent fused = 100.0%
INFO:Neuron:Compiler args type is <class 'list'> value is ['--fast-math', 'none']
INFO:Neuron:Compiling function _NeuronGraph$1524 with neuron-cc
INFO:Neuron:Compiling with command line: '/home/ubuntu/miniconda3/envs/yolo8-conda-py310/bin/neuron-cc compile /tmp/tmpg2lzlp3k/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpg2lzlp3k/graph_def.neff --io-config {"inputs": {"0:0": [[1, 3, 640, 640], "float32"]}, "outputs": ["Detect_74/aten_cat_5/concat:0"]} --fast-math none --verbose 35'


....
Compiler status PASS


INFO:Neuron:Number of arithmetic operators (post-compilation) before = 186, compiled = 186, percent compiled = 100.0%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 1 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 100.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 7
INFO:Neuron: => aten::_convolution: 64
INFO:Neuron: => aten::add: 8
INFO:Neuron: => aten::cat: 19
INFO:Neuron: => aten::chunk: 1
INFO:Neuron: => aten::div: 1
INFO:Neuron: => aten::max_pool2d: 3
INFO:Neuron: => aten::mul: 1
INFO:Neuron: => aten::sigmoid: 1
INFO:Neuron: => aten::silu_: 57
INFO:Neuron: => aten::size: 3
INFO:Neuron: => aten::softmax: 1
INFO:Neuron: => aten::split_with_sizes: 9
INFO:Neuron: => aten::sub: 2
INFO:Neuron: => aten::transpose: 1
INFO:Neuron: => aten::unsqueeze: 1
INFO:Neuron: => aten::upsample_nearest2d: 2
INFO:Neuron: => aten::view: 5


[34m[1mNeuron:[0m export success ✅ 84.6s, saved as '../model/yolov8n.neuron' (11.3 MB)

Export complete (86.0s)
Results saved to /home/ubuntu/lab/03-yolo8-inf1/model
Predict:         yolo predict task=detect model=../model/yolov8n.neuron imgsz=640  
Validate:        yolo val task=detect model=../model/yolov8n.neuron imgsz=640 data=coco.yaml  
Visualize:       https://netron.app
Compile and Load model from pytorch model, ../model/yolov8n.pt, and neuron model, ../model/yolov8n.neuron


### Inference on the Neuron model

In [20]:
results = m_inf.predict("../test_image/bus.jpg", 
                            # show=True,
                            save=True, 
                            save_txt=True, 
                            save_crop=True, 
                            save_conf=True,
                            project='result_image')


Loading ../model/yolov8n.neuron for Neuron (NeuronCore-v1) inference...

image 1/1 /home/ubuntu/lab/03-yolo8-inf1/notebook/../test_image/bus.jpg: 640x640 4 persons, 1 bus, 27.0ms
Speed: 2.1ms preprocess, 27.0ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 640)
Results saved to result_image/predict4
1 label saved to result_image/predict4/labels


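### Bounding box information
Refer to the following link for the structure of the returned results:
- [Model Prediction with Ultralytics YOLO](https://docs.ultralytics.com/modes/predict/#working-with-results)

Note: the cell below depends on `results` from the prediction cell above. The `NameError` in its output most likely comes from running it before that cell (its execution count is `In [10]`, while the prediction cell is `In [20]`). Rerun it after the prediction cell to print the boxes.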

In [10]:
# View results
for r in results:
    print(r.boxes)  # print the Boxes object containing the detection bounding boxes

NameError: name 'results' is not defined
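
Once `results` has been produced by the prediction cell, each result's `boxes` object exposes `xyxy`, `conf`, and `cls` tensors, and the result carries a `names` mapping from class id to label, as described in the Ultralytics documentation linked above. The sketch below flattens them into plain dictionaries. `boxes_to_records` is a hypothetical helper name, and it only relies on those four attributes.

```python
def boxes_to_records(result):
    """Convert one Ultralytics result into a list of plain-Python detection dicts."""
    boxes = result.boxes
    records = []
    # xyxy rows are [x1, y1, x2, y2] in original-image pixels
    for xyxy, conf, cls_id in zip(boxes.xyxy.tolist(), boxes.conf.tolist(), boxes.cls.tolist()):
        class_id = int(cls_id)
        records.append({
            "class_id": class_id,
            "class_name": result.names[class_id],
            "confidence": float(conf),
            "xyxy": [float(v) for v in xyxy],
        })
    return records
```

A possible usage, assuming `results` comes from `m_inf.predict(...)`: `for r in results: print(boxes_to_records(r))`.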