# AWS Neuron compilation of YOLOv8

This notebook shows how to compile a YOLOv8 PyTorch model for AWS Inferentia (inf1 instances) using the AWS Neuron SDK.

Reference: 
- Model Prediction with Ultralytics YOLO
    - https://docs.ultralytics.com/modes/predict/

## 1. Neuron compilation using the native Neuron SDK

### Load the YOLOv8 model using the Ultralytics library

In [12]:
from ultralytics import YOLO

model = YOLO("model/yolov8n.pt", task="detect")


Downloading https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8n.pt to 'model/yolov8n.pt'...


100%|██████████| 6.23M/6.23M [00:00<00:00, 356MB/s]


### Compile the PyTorch model to a Neuron model
- If this cell raises an error, skip it.

In [9]:
import torch
import torch_neuron

# generate dummy input example
example = torch.rand([1, 3, 640, 640])
print("input example shape: ", example.shape)
# trace the model forward
trace = torch_neuron.trace(model.model, example)



input example shape:  torch.Size([1, 3, 640, 640])


INFO:Neuron:All operators are compiled by neuron-cc (this does not guarantee that neuron-cc will successfully compile)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 186, fused = 186, percent fused = 100.0%
INFO:Neuron:Compiling function _NeuronGraph$650 with neuron-cc
INFO:Neuron:Compiling with command line: '/home/ubuntu/miniconda3/envs/yolo8-conda-py310/bin/neuron-cc compile /tmp/tmp62li8sr0/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp62li8sr0/graph_def.neff --io-config {"inputs": {"0:0": [[1, 3, 640, 640], "float32"]}, "outputs": ["Detect_74/aten_cat_5/concat:0", "Detect_74/aten_cat/concat:0", "Detect_74/aten_cat_1/concat:0", "Detect_74/aten_cat_2/concat:0"]} --verbose 35'


....
Compiler status PASS


INFO:Neuron:Number of arithmetic operators (post-compilation) before = 186, compiled = 186, percent compiled = 100.0%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 1 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 100.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 7
INFO:Neuron: => aten::_convolution: 64
INFO:Neuron: => aten::add: 8
INFO:Neuron: => aten::cat: 19
INFO:Neuron: => aten::chunk: 9
INFO:Neuron: => aten::div: 1
INFO:Neuron: => aten::max_pool2d: 3
INFO:Neuron: => aten::mul: 1
INFO:Neuron: => aten::sigmoid: 1
INFO:Neuron: => aten::silu_: 57
INFO:Neuron: => aten::size: 3
INFO:Neuron: => aten::softmax: 1
INFO:Neuron: => aten::split_with_sizes: 1
INFO:Neuron: => aten::sub: 2
INFO:Neuron: => aten::transpose: 1
INFO:Neuron: => aten::unsqueeze: 1
INFO:Neuron: => aten::upsample_nearest2d: 2
INFO:Neuron: => aten::view: 5
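
The per-operator counts in the log can be cross-checked against the reported total of 186 arithmetic operators. The snippet below is a minimal sketch that parses the operator lines copied from the output above. The helper name `parse_operator_counts` and the regular expression are illustrative assumptions, not part of the Neuron SDK.

```python
import re

# Operator summary copied verbatim from the compilation log above.
LOG = """\
INFO:Neuron: => aten::Int: 7
INFO:Neuron: => aten::_convolution: 64
INFO:Neuron: => aten::add: 8
INFO:Neuron: => aten::cat: 19
INFO:Neuron: => aten::chunk: 9
INFO:Neuron: => aten::div: 1
INFO:Neuron: => aten::max_pool2d: 3
INFO:Neuron: => aten::mul: 1
INFO:Neuron: => aten::sigmoid: 1
INFO:Neuron: => aten::silu_: 57
INFO:Neuron: => aten::size: 3
INFO:Neuron: => aten::softmax: 1
INFO:Neuron: => aten::split_with_sizes: 1
INFO:Neuron: => aten::sub: 2
INFO:Neuron: => aten::transpose: 1
INFO:Neuron: => aten::unsqueeze: 1
INFO:Neuron: => aten::upsample_nearest2d: 2
INFO:Neuron: => aten::view: 5
"""

# Matches lines such as "INFO:Neuron: => aten::cat: 19".
OP_LINE = re.compile(r"^INFO:Neuron: => (aten::\w+): (\d+)$")


def parse_operator_counts(log_text):
    """Return {operator name: count} for every operator line in the log (hypothetical helper)."""
    counts = {}
    for line in log_text.splitlines():
        match = OP_LINE.match(line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


counts = parse_operator_counts(LOG)
total = sum(counts.values())
print(f"{len(counts)} operator types, {total} operators in total")
```

The total matches the 186 operators that the log reports as compiled, which is consistent with the whole graph running on Neuron in a single sub-graph.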


### Run inference on the Neuron model

In [11]:
result_neuron = trace(example)
print("result_neuron: ", len(result_neuron), ", shape: ", result_neuron[0].shape)

result_neuron:  2 , shape:  torch.Size([1, 84, 8400])
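
The shape `[1, 84, 8400]` is consistent with a standard YOLOv8 detection head trained on the 80 COCO classes: 84 channels are 4 box coordinates plus 80 class scores, and 8400 is the number of grid cells summed over the three detection scales. The sketch below reproduces that arithmetic. The strides `(8, 16, 32)` are the usual YOLOv8 values and are an assumption here, since the notebook does not print them, and `yolov8_output_shape` is a hypothetical helper.

```python
def yolov8_output_shape(img_size=640, num_classes=80, strides=(8, 16, 32), batch=1):
    """Expected raw detection output shape (hypothetical helper for illustration)."""
    # 4 box coordinates followed by one score per class.
    channels = 4 + num_classes
    # One prediction per grid cell at each detection scale.
    predictions = sum((img_size // stride) ** 2 for stride in strides)
    return (batch, channels, predictions)


shape = yolov8_output_shape()
print(shape)  # (1, 84, 8400): 80*80 + 40*40 + 20*20 = 8400 cells
```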


## 2. Compile and run inference using the Ultralytics library

### Load the YOLOv8 PyTorch model and compile it to a Neuron model

In [14]:
import os

from ultralytics import YOLO

pt_model_path = 'model/yolov8n.pt'
neuron_model_path = 'model/yolov8n.neuron'

if os.path.exists(neuron_model_path):
    # Reuse the previously compiled Neuron model
    m_inf = YOLO(neuron_model_path, task="detect")
    print(f"Loaded existing model from {neuron_model_path}")
else:
    # Export the PyTorch model to Neuron format, then load the compiled result
    mx = YOLO(pt_model_path)
    mx.export(format="neuron")
    m_inf = YOLO(neuron_model_path, task="detect")
    print(f"Compiled {pt_model_path} and loaded the Neuron model from {neuron_model_path}")



Loaded existing model from model/yolov8n.neuron
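
The existence check above reuses any compiled model it finds, even if the `.pt` weights have changed since. One possible refinement, sketched here with the standard library only, is to also compare modification times. `needs_recompile` is a hypothetical helper, not an Ultralytics or Neuron API, and the demo uses throwaway files rather than the real model paths.

```python
import os
import tempfile


def needs_recompile(source_path, compiled_path):
    """Return True if the compiled artifact is missing or older than its source."""
    if not os.path.exists(compiled_path):
        return True
    return os.path.getmtime(compiled_path) < os.path.getmtime(source_path)


# Demo on empty placeholder files with explicitly set timestamps.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "yolov8n.pt")
    out = os.path.join(tmp, "yolov8n.neuron")

    open(src, "w").close()
    missing = needs_recompile(src, out)   # compiled file does not exist yet

    open(out, "w").close()
    os.utime(src, (1000, 1000))
    os.utime(out, (2000, 2000))
    fresh = needs_recompile(src, out)     # compiled file is newer than source

    os.utime(src, (3000, 3000))
    stale = needs_recompile(src, out)     # source changed after compilation

print(missing, fresh, stale)  # True False True
```

In the cell above, the `os.path.exists(neuron_model_path)` condition could be swapped for `not needs_recompile(pt_model_path, neuron_model_path)` if stale artifacts are a concern.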


### Run inference on the Neuron model

In [15]:
result = m_inf.predict(
    "test_image/bus.jpg",
    # show=True,
    save=True,
    save_txt=True,
    save_crop=True,
    save_conf=True,
    project='result_image',
)



Loading model/yolov8n.neuron for Neuron (NeuronCore-v1) inference...

image 1/1 /home/ubuntu/lab/03-yolo8-inf1/test_image/bus.jpg: 640x640 4 persons, 1 bus, 28.3ms
Speed: 2.2ms preprocess, 28.3ms inference, 73.9ms postprocess per image at shape (1, 3, 640, 640)
Results saved to result_image/predict2
1 label saved to result_image/predict2/labels
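
With `save_txt=True` and `save_conf=True`, each line of the saved label file is expected to hold `class x_center y_center width height confidence`, with box values normalized to the image size. That layout is stated here as an assumption about the Ultralytics detection output, so check it against a real file before relying on it. The sketch below converts one such line to pixel corner coordinates. The sample line and image size are made up for illustration, and `parse_label_line` is a hypothetical helper.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    class_id: int
    x1: float
    y1: float
    x2: float
    y2: float
    confidence: float


def parse_label_line(line, img_w, img_h):
    """Convert one normalized 'cls xc yc w h conf' line to pixel corner coordinates."""
    parts = line.split()
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    xc, yc, w, h, conf = (float(p) for p in parts[1:])
    return Detection(
        class_id=class_id,
        x1=(xc - w / 2) * img_w,
        y1=(yc - h / 2) * img_h,
        x2=(xc + w / 2) * img_w,
        y2=(yc + h / 2) * img_h,
        confidence=conf,
    )


# Made-up sample: class 5 ("bus" in the COCO names) centered in an 800x400 image.
sample = "5 0.5 0.5 0.5 0.25 0.9"
det = parse_label_line(sample, img_w=800, img_h=400)
print(det)
```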


In [16]:
print("result: \n", result)

result: 
 [ultralytics.engine.results.Results object with attributes:

boxes: ultralytics.engine.results.Boxes object
keypoints: None
masks: None
names: {0: 'person', 1: 'bicycle', 2: 'car', 3: 'motorcycle', 4: 'airplane', 5: 'bus', 6: 'train', 7: 'truck', 8: 'boat', 9: 'traffic light', 10: 'fire hydrant', 11: 'stop sign', 12: 'parking meter', 13: 'bench', 14: 'bird', 15: 'cat', 16: 'dog', 17: 'horse', 18: 'sheep', 19: 'cow', 20: 'elephant', 21: 'bear', 22: 'zebra', 23: 'giraffe', 24: 'backpack', 25: 'umbrella', 26: 'handbag', 27: 'tie', 28: 'suitcase', 29: 'frisbee', 30: 'skis', 31: 'snowboard', 32: 'sports ball', 33: 'kite', 34: 'baseball bat', 35: 'baseball glove', 36: 'skateboard', 37: 'surfboard', 38: 'tennis racket', 39: 'bottle', 40: 'wine glass', 41: 'cup', 42: 'fork', 43: 'knife', 44: 'spoon', 45: 'bowl', 46: 'banana', 47: 'apple', 48: 'sandwich', 49: 'orange', 50: 'broccoli', 51: 'carrot', 52: 'hot dog', 53: 'pizza', 54: 'donut', 55: 'cake', 56: 'chair', 57: 'couch', 58