# AWS Neuron Benchmark on YOLOv8

This notebook shows how to compile a YOLOv8 PyTorch model for AWS Inferentia (Inf1 instances) with the AWS Neuron SDK and benchmark it with NeuronPerf.

References:
- [NeuronPerf (Beta)](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuronperf/index.html)
- [NeuronPerf Examples](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuronperf/neuronperf_examples.html#neuronperf-examples)

## 1. Neuron Compilation Using the Native Neuron SDK

### Load the YOLOv8 model using the ultralytics library

In [1]:
from ultralytics import YOLO

model = YOLO("model/yolov8n.pt", task="detect")




### Compile and Benchmark the Model
- [NeuronPerf Examples](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuronperf/neuronperf_examples.html#neuronperf-examples)

In [2]:
import torch  # or tensorflow, mxnet

import neuronperf as npf
import neuronperf.torch  # or tensorflow, mxnet


neuron_model_path = "model/benchmark_yolo8_model_neuron.pt"

# Construct dummy inputs
batch_sizes = 1
input_shape = (batch_sizes, 3, 640, 640)
inputs = torch.ones(input_shape)  # or numpy array for TF, MX

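# Compile the model for Neuron and save it to `neuron_model_path`.
# Note: `model` is the ultralytics `YOLO` wrapper, and the log below shows this call
# falling into the ultralytics training path. Passing the underlying torch module
# (`model.model`) instead is a possible fix, but that is an assumption not verified here.
npf.torch.compile(
    model,
    inputs,
    batch_sizes=batch_sizes,
    filename=neuron_model_path,
)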

INFO:neuronperf.compiling - Compiling batch size 1 for 1 NeuronCore(s) with performance level -1/-1. [1/1]


New https://pypi.org/project/ultralytics/8.3.13 available 😃 Update with 'pip install -U ultralytics'
Ultralytics YOLOv8.2.0 🚀 Python-3.10.15 torch-1.13.1+cu117 CPU (Intel Xeon Platinum 8275CL 3.00GHz)
engine/trainer: task=detect, mode=train, model=model/yolov8n.pt, data=coco.yaml, epochs=100, time=None, patience=100, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=train3, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, sh

ERROR:neuronperf.compiling - Failed to compile input=0, batch_size=1, pipeline_size=1, perf_level=-1.
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/yolo8-conda-py310/lib/python3.10/site-packages/ultralytics/engine/trainer.py", line 514, in get_dataset
    data = check_det_dataset(self.args.data)
  File "/home/ubuntu/miniconda3/envs/yolo8-conda-py310/lib/python3.10/site-packages/ultralytics/data/utils.py", line 269, in check_det_dataset
    file = check_file(dataset)
  File "/home/ubuntu/miniconda3/envs/yolo8-conda-py310/lib/python3.10/site-packages/ultralytics/utils/checks.py", line 499, in check_file
    raise FileNotFoundError(f"'{file}' does not exist")
FileNotFoundError: 'coco.yaml' does not exist

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/yolo8-conda-py310/lib/python3.10/site-packages/neuronperf/compiling.py", line 256, in compile
    model_filename = compi

'model/benchmark_yolo8_model_neuron.pt.json'

In [4]:


reports = npf.torch.benchmark(
    neuron_model_path,
    inputs,
    batch_sizes,
    duration=10,  # 10 seconds
)

npf.print_reports(reports)
npf.write_json(reports)

INFO:neuronperf.benchmarking - Benchmarking 'model/traced_yolo8_model_neuron.pt', ~8.0 minutes remaining.
ERROR:neuronperf.benchmarking - Benchmarker 6 encountered an error during prep: 
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/yolo8-conda-py310/lib/python3.10/site-packages/neuronperf/benchmarking.py", line 345, in run
    self.setup()
  File "/home/ubuntu/miniconda3/envs/yolo8-conda-py310/lib/python3.10/site-packages/neuronperf/benchmarking.py", line 244, in setup
    self.load()
  File "/home/ubuntu/miniconda3/envs/yolo8-conda-py310/lib/python3.10/site-packages/neuronperf/benchmarking.py", line 181, in load
    self.model = self.load_fn(self.model_filename, device_id=self.device_id)
  File "/home/ubuntu/miniconda3/envs/yolo8-conda-py310/lib/python3.10/site-packages/neuronperf/torch/torch.py", line 61, in _neuron_load_fn
    model = torch.jit.load(model_filename)
  File "/home/ubuntu/miniconda3/envs/yolo8-conda-py310/lib/python3.10/site-packages/torch_ne

config_tag         cost_per_1m_inf    throughput_avg     latency_ms_p50     latency_ms_p99     n_models           workers_per_model  batch_size         
id_nlqv2b3a        3.266              44.647             44.76              44.907             1                  2                  1                  
id_hgk03wvj        4.041              36.092             27.665             27.96              1                  1                  1                  


'model/traced_yolo8_model_neuron.pt.results-20241015-012448.json'
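
The two rows of the report can be cross-checked with simple arithmetic. The sketch below assumes that each worker sends requests back to back, so throughput is roughly the number of in-flight inferences divided by the median latency. This relation is an assumption used for illustration, not NeuronPerf's documented formula, and `estimated_throughput` is a hypothetical helper name. The numbers are copied from the report above.

```python
# Rows copied from the NeuronPerf report above (only the columns used here).
rows = [
    {"config_tag": "id_nlqv2b3a", "throughput_avg": 44.647, "latency_ms_p50": 44.76,
     "n_models": 1, "workers_per_model": 2, "batch_size": 1},
    {"config_tag": "id_hgk03wvj", "throughput_avg": 36.092, "latency_ms_p50": 27.665,
     "n_models": 1, "workers_per_model": 1, "batch_size": 1},
]


def estimated_throughput(row):
    """Approximate inferences per second, assuming workers issue requests back to back."""
    in_flight = row["n_models"] * row["workers_per_model"] * row["batch_size"]
    return in_flight * 1000.0 / row["latency_ms_p50"]


for row in rows:
    est = estimated_throughput(row)
    print(f'{row["config_tag"]}: estimated {est:.2f} inf/s, '
          f'reported {row["throughput_avg"]:.3f} inf/s')
```

Under this assumption the estimate lands within about 1% of the reported `throughput_avg` for both rows. It also illustrates the trade-off visible in the table: the second worker raises throughput but increases per-request latency.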

## 2. Compile and Run Inference Using the ultralytics Library

### Load the PyTorch YOLOv8 model and compile it to a Neuron model

In [6]:
import os

from ultralytics import YOLO

pt_model_path = 'model/yolov8n.pt'
neuron_model_path = 'model/yolov8n.neuron'

if os.path.exists(neuron_model_path):
    # Reuse the previously compiled Neuron model
    m_inf = YOLO(neuron_model_path, task="detect")
    print(f"Loaded existing model from {neuron_model_path}")
else:
    # Export the PyTorch model to Neuron format, then load the exported model
    mx = YOLO(pt_model_path)
    mx.export(format="neuron")
    m_inf = YOLO(neuron_model_path, task="detect")
    print(f"Compiled {pt_model_path} and loaded Neuron model {neuron_model_path}")



Loaded existing model from model/yolov8n.neuron


### Inference on the Neuron model

In [9]:
results = m_inf.predict(
    "test_image/bus.jpg",
    # show=True,
    save=True,
    save_txt=True,
    save_crop=True,
    save_conf=True,
    project='result_image',
)




image 1/1 /home/ubuntu/lab/03-yolo8-inf1/test_image/bus.jpg: 640x640 4 persons, 1 bus, 30.4ms
Speed: 2.1ms preprocess, 30.4ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 640)
Results saved to result_image/predict10
1 label saved to result_image/predict10/labels
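
The `Speed:` line above reports three per-image stages. As a rough sketch, they can be summed into an end-to-end latency and a single-stream frame rate. This assumes the stages run sequentially for one image at a time, and the variable names are illustrative only.

```python
# Per-image timings in milliseconds, copied from the "Speed:" line above.
preprocess_ms, inference_ms, postprocess_ms = 2.1, 30.4, 0.9

# End-to-end latency, assuming the three stages run one after another.
total_ms = preprocess_ms + inference_ms + postprocess_ms

# Single-stream frame rate and the share of time spent in model inference.
fps = 1000.0 / total_ms
inference_share = inference_ms / total_ms

print(f"total: {total_ms:.1f} ms, about {fps:.1f} images/s, "
      f"inference share: {inference_share:.0%}")
```

This is a single-image measurement, so it is not directly comparable to the NeuronPerf throughput figures, which come from a sustained benchmark run.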


### Bounding box information
See the Ultralytics documentation on working with prediction results:
- [Model Prediction with Ultralytics YOLO](https://docs.ultralytics.com/modes/predict/#working-with-results)

In [10]:
# View results
for r in results:
    print(r.boxes)  # print the Boxes object containing the detection bounding boxes

ultralytics.engine.results.Boxes object with attributes:

cls: tensor([0., 0., 0., 5., 0.])
conf: tensor([0.8909, 0.8833, 0.8779, 0.8442, 0.4408])
data: tensor([[6.7083e+02, 3.8008e+02, 8.0986e+02, 8.7969e+02, 8.9086e-01, 0.0000e+00],
        [2.2162e+02, 4.0706e+02, 3.4353e+02, 8.5626e+02, 8.8332e-01, 0.0000e+00],
        [5.0671e+01, 3.9760e+02, 2.4420e+02, 9.0507e+02, 8.7790e-01, 0.0000e+00],
        [3.1541e+01, 2.3063e+02, 8.0153e+02, 7.7584e+02, 8.4424e-01, 5.0000e+00],
        [4.2298e-01, 5.4981e+02, 5.7900e+01, 8.6834e+02, 4.4076e-01, 0.0000e+00]])
id: None
is_track: False
orig_shape: (1080, 810)
shape: torch.Size([5, 6])
xywh: tensor([[740.3431, 629.8870, 139.0354, 499.6159],
        [282.5750, 631.6615, 121.9024, 449.1991],
        [147.4372, 651.3355, 193.5327, 507.4696],
        [416.5346, 503.2327, 769.9878, 545.2150],
        [ 29.1616, 709.0754,  57.4772, 318.5244]])
xywhn: tensor([[0.9140, 0.5832, 0.1716, 0.4626],
        [0.3489, 0.5849, 0.1505, 0.4159],
        [0.18
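
The `data`, `xywh`, and `xywhn` tensors above describe the same boxes in different coordinate conventions. The sketch below reproduces the conversion for the first detection in plain Python. It assumes each `data` row is laid out as `(x1, y1, x2, y2, conf, cls)` and that `orig_shape` is `(height, width)`. `xyxy_to_xywh` and `normalize_xywh` are hypothetical helpers written for this example, not ultralytics functions.

```python
def xyxy_to_xywh(box):
    """Convert corner format (x1, y1, x2, y2) to center format (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


def normalize_xywh(xywh, orig_shape):
    """Scale (cx, cy, w, h) to the 0..1 range using the image (height, width)."""
    height, width = orig_shape
    cx, cy, w, h = xywh
    return (cx / width, cy / height, w / width, h / height)


orig_shape = (1080, 810)  # (height, width), from the output above

# First four values of the first `data` row, as printed (rounded to 5 digits).
first_box_xyxy = (670.83, 380.08, 809.86, 879.69)

xywh = xyxy_to_xywh(first_box_xyxy)
xywhn = normalize_xywh(xywh, orig_shape)

print("xywh :", [round(v, 2) for v in xywh])
print("xywhn:", [round(v, 4) for v in xywhn])
```

Up to the rounding in the printed tensors, the results agree with the first rows of `xywh` and `xywhn` above. In the same output, `cls` values 0 and 5 correspond to the "4 persons, 1 bus" reported by the prediction log.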
