# Convert Model

There are two ways to convert a YOLO model to a TensorRT engine. <br>
> 1. Use the Ultralytics API
> 2. Use the TensorRT API or the `trtexec` command
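
Method #2 is not used in this tutorial. For reference, the sketch below assembles a `trtexec` command for the ONNX file that the Ultralytics export also writes (`best.onnx`). Some values are assumptions chosen to mirror `dynamic=True, half=True`: the input name `images`, the batch range 1 to 8, and FP16. Adjust them to your model. Build the engine with the same TensorRT version as the Triton server that will load it.

```python
import shlex
from pathlib import Path


def trtexec_command(onnx_path, engine_path, input_name="images",
                    imgsz=640, min_batch=1, opt_batch=1, max_batch=8, fp16=True):
    """Build a trtexec command for a YOLO ONNX model with a dynamic batch axis.

    ASSUMPTION: input name, image size and batch range are illustrative defaults.
    """
    def shape(batch):
        # trtexec expects shapes as <tensor>:<d0>x<d1>x<d2>x<d3>
        return f"{input_name}:{batch}x3x{imgsz}x{imgsz}"

    cmd = [
        "trtexec",
        f"--onnx={Path(onnx_path)}",
        f"--saveEngine={Path(engine_path)}",
        f"--minShapes={shape(min_batch)}",
        f"--optShapes={shape(opt_batch)}",
        f"--maxShapes={shape(max_batch)}",
    ]
    if fp16:
        cmd.append("--fp16")  # allow FP16 kernels, like half=True in the Ultralytics export
    return cmd


print(shlex.join(trtexec_command("best.onnx", "model.plan")))
```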

During validation, method #1 did not work with Triton Inference Server.<br>
~~So we will use the TensorRT engine converted by the trtexec command.~~ <br>
However, with some workarounds, the Ultralytics API can convert the model from `.pt` to a TensorRT engine. <br>
In this tutorial, we walk through an end-to-end example of **serving a YOLO model as a TensorRT engine with Triton Inference Server**.

## Prerequisites
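* Install the Python packages listed in `requirements.txt`
* ~~Create a Persistent Volume (RWX permission) via Kubeflow to share the model between pods, and create a Notebook with it. (The PV is mounted to **/mnt/yolo-model** in this example.)~~
* A GPU is required to build the TensorRT engine. An engine is tied to the GPU architecture and TensorRT version it was built with, so build it with the same TensorRT version that your Triton server uses.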
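
Before starting, a quick check like the following can confirm that the key packages are installed and that PyTorch sees a GPU. The package list is an assumption based on the tools used in this notebook, because the actual `requirements.txt` is not shown. Distribution names may also differ depending on how TensorRT was installed.

```python
import importlib.metadata
import importlib.util


def environment_report():
    """Return installed versions (None if missing) plus whether PyTorch can see a CUDA GPU."""
    report = {}
    # ASSUMPTION: these distribution names match how the packages were installed.
    for pkg in ("ultralytics", "torch", "tensorrt", "onnx", "mlflow"):
        try:
            report[pkg] = importlib.metadata.version(pkg)
        except importlib.metadata.PackageNotFoundError:
            report[pkg] = None
    # Only import torch if it is installed, so the check itself never raises.
    if importlib.util.find_spec("torch") is not None:
        import torch
        report["cuda_available"] = torch.cuda.is_available()
    else:
        report["cuda_available"] = False
    return report


for key, value in environment_report().items():
    print(f"{key:15} {value}")
```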

### Check the output shape

```python
from ultralytics import YOLO
import torch

model_path = "/mnt/user/LPD/finetune/artifact_downloads/14be0ff37e03403b9f50b3cd4e8dd7a5/weights/"
model_name = "best.pt"

# Load the fine-tuned weights and take the underlying PyTorch module
torch_model = YOLO(model_path + model_name).model
torch_model.eval()

# Run one dummy 640x640 RGB image through the network to inspect the output shapes
dummy_input = torch.randn(1, 3, 640, 640)
with torch.no_grad():
    outputs = torch_model(dummy_input)

for out in outputs:
    # The first element is the decoded prediction tensor; the second is a list of per-scale feature maps
    if isinstance(out, torch.Tensor):
        print(out.shape)
    else:
        print("List output:", [o.shape for o in out if hasattr(o, 'shape')])
```

```
torch.Size([1, 5, 8400])
List output: [torch.Size([1, 65, 80, 80]), torch.Size([1, 65, 40, 40]), torch.Size([1, 65, 20, 20])]
```

The `(1, 5, 8400)` tensor is the shape that goes into `config.pbtxt` as the `output0` dims.
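
The printed numbers follow from the detection head layout. The sketch below reproduces them under two assumptions: the default YOLOv8/YOLO11 head settings (strides 8, 16 and 32, and `reg_max=16` distribution bins per box side), and a single class for the license detector, which is inferred from 5 = 4 + 1. The same formula with the 80 COCO classes gives the `84` used later for the vehicle model.

```python
def yolo_output_layout(num_classes, imgsz=640, strides=(8, 16, 32), reg_max=16):
    """Relate YOLO detection head shapes to input size and class count (assumed defaults)."""
    grids = [imgsz // s for s in strides]        # 80, 40, 20 for a 640x640 input
    num_anchors = sum(g * g for g in grids)      # one prediction per grid cell
    decoded_channels = 4 + num_classes           # box (cx, cy, w, h) + one score per class
    raw_channels = 4 * reg_max + num_classes     # DFL bins for 4 box sides + class logits
    return {
        "grids": grids,
        "num_anchors": num_anchors,
        "decoded_shape": (1, decoded_channels, num_anchors),
        "raw_shapes": [(1, raw_channels, g, g) for g in grids],
    }


print(yolo_output_layout(num_classes=1))
print(yolo_output_layout(num_classes=80)["decoded_shape"])
```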


## 1. Convert the YOLO model to a TensorRT engine
#### Before converting the model directly with the YOLO API, two points need to be handled
* Comment out some lines in `ultralytics/engine/exporter.py` to avoid a well-known issue with Triton Inference Server (https://github.com/ultralytics/ultralytics/issues/4597#issuecomment-1694948850)
* Make sure the snippet below is commented out in your environment
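
As we understand the linked issue, the Ultralytics exporter writes a length-prefixed JSON metadata block in front of the serialized engine, and Triton's TensorRT backend cannot deserialize that file. If editing `exporter.py` is not an option, a sketch like the one below can strip the header from an already exported file instead. It assumes this layout: a 4-byte little-endian signed length, then the UTF-8 JSON metadata, then the engine bytes. Verify the layout against the exporter of your Ultralytics version before relying on it. `split_engine_header` and `strip_engine_header` are hypothetical helpers, not part of Ultralytics.

```python
import json
from pathlib import Path


def split_engine_header(data: bytes):
    """Split raw .engine bytes into (metadata, plan_bytes); metadata is None if no header is found.

    ASSUMED layout: 4-byte little-endian signed length, UTF-8 JSON metadata, then the engine.
    """
    if len(data) < 4:
        return None, data
    length = int.from_bytes(data[:4], byteorder="little", signed=True)
    if length <= 0 or 4 + length > len(data):
        return None, data
    try:
        meta = json.loads(data[4:4 + length].decode("utf-8"))
    except ValueError:  # covers both UnicodeDecodeError and json.JSONDecodeError
        return None, data
    if not isinstance(meta, dict):
        return None, data
    return meta, data[4 + length:]


def strip_engine_header(src, dst):
    """Write a header-free plan file to `dst` and return the removed metadata (or None)."""
    meta, plan = split_engine_header(Path(src).read_bytes())
    Path(dst).write_bytes(plan)
    return meta
```

For example, `strip_engine_header("best.engine", "model.plan")` writes a plan file without the header. It returns the removed metadata, or `None` when the file already starts with the engine itself.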

```python
from ultralytics import YOLO

model_path = "/mnt/user/LPD/finetune/artifact_downloads/14be0ff37e03403b9f50b3cd4e8dd7a5/weights/"
model_name = "best.pt"

# Load the fine-tuned license plate model
model = YOLO(model_path + model_name)

# Retrieve metadata during export. It is embedded in config.pbtxt below.
metadata = []

def export_cb(exporter):
    metadata.append(exporter.metadata)

model.add_callback("on_export_end", export_cb)

# Export the model: dynamic input shapes, FP16 precision, built on GPU 0
engine_file = model.export(format="engine", dynamic=True, half=True, device=0)

data = """
# Add metadata
parameters {
  key: "metadata"
  value {
    string_value: "%s"
  }
}


name: "license_detector"
platform: "tensorrt_plan"
max_batch_size : 0
input [
  {
    name: "images"
    dims: [ -1, 3, 640, 640 ]
  }
]
output [
  {
    name: "output0"
    dims: [ -1, 5, 8400 ]
  }
]
""" % metadata[0]  # noqa

# Written to the working directory; moved into the Triton repository in section 2
with open("config.pbtxt", "w") as f:
    f.write(data)
```

```
Ultralytics 8.3.144 🚀 Python-3.11.9 torch-2.7.0+cu126 CUDA:0 (NVIDIA L40S, 45589MiB)
YOLO11s summary (fused): 100 layers, 9,413,187 parameters, 0 gradients, 21.3 GFLOPs

PyTorch: starting from '/mnt/user/LPD/finetune/artifact_downloads/14be0ff37e03403b9f50b3cd4e8dd7a5/weights/best.pt' with input shape (1, 3, 640, 640) BCHW and output shape(s) (1, 5, 8400) (18.3 MB)

ONNX: starting export with onnx 1.17.0 opset 19...
ONNX: slimming with onnxslim 0.1.53...
ONNX: export success ✅ 3.3s, saved as '/mnt/user/LPD/finetune/artifact_downloads/14be0ff37e03403b9f50b3cd4e8dd7a5/weights/best.onnx' (36.1 MB)

TensorRT: starting export with TensorRT 10.11.0.33...
[05/28/2025-06:00:22] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU -1567, GPU +0, now: CPU 5898, GPU 1460 (MiB)
[05/28/2025-06:00:22] [TRT] [I] ----------------------------------------------------------------
[05/28/2025-06:00:22] [TRT] [I] Input filename:   /mnt
```

```python
from ultralytics import YOLO

# Load the official pretrained YOLO11s (COCO, 80 classes) as the vehicle detector
vehicle = YOLO('yolo11s')

# Retrieve metadata during export. It is embedded in config.pbtxt below.
metadata = []

def export_cb(exporter):
    metadata.append(exporter.metadata)

vehicle.add_callback("on_export_end", export_cb)

# Export the model: dynamic input shapes, FP16 precision, built on GPU 0
engine_file = vehicle.export(format="engine", dynamic=True, half=True, device=0)

data = """
# Add metadata
parameters {
  key: "metadata"
  value {
    string_value: "%s"
  }
}


name: "vehicle_detector"
platform: "tensorrt_plan"
max_batch_size : 0
input [
  {
    name: "images"
    dims: [ -1, 3, 640, 640 ]
  }
]
output [
  {
    name: "output0"
    dims: [ -1, 84, 8400 ]
  }
]
""" % metadata[0]  # noqa

# Note: this overwrites ./config.pbtxt, so move the license_detector config first (section 2)
with open("config.pbtxt", "w") as f:
    f.write(data)
```

```
Ultralytics 8.3.144 🚀 Python-3.11.9 torch-2.7.0+cu126 CUDA:0 (NVIDIA L40S, 45589MiB)
YOLO11s summary (fused): 100 layers, 9,443,760 parameters, 0 gradients, 21.5 GFLOPs

PyTorch: starting from 'yolo11s.pt' with input shape (1, 3, 640, 640) BCHW and output shape(s) (1, 84, 8400) (18.4 MB)

ONNX: starting export with onnx 1.17.0 opset 19...
ONNX: slimming with onnxslim 0.1.53...
ONNX: export success ✅ 3.4s, saved as 'yolo11s.onnx' (36.2 MB)

TensorRT: starting export with TensorRT 10.11.0.33...
[05/28/2025-06:38:25] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU -1567, GPU +0, now: CPU 5887, GPU 1566 (MiB)
[05/28/2025-06:38:25] [TRT] [I] ----------------------------------------------------------------
[05/28/2025-06:38:25] [TRT] [I] Input filename:   yolo11s.onnx
[05/28/2025-06:38:25] [TRT] [I] ONNX IR version:  0.0.9
[05/28/2025-06:38:25] [TRT] [I] Opset version:    19
[05/28/2025-06:38:25] [TRT] [I] Producer 
```
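
The two export cells differ only in the model name, the number of output channels (5 vs. 84), and the metadata. A hypothetical helper like `make_triton_config` below can render both configs from one template. It keeps the Python-literal metadata string that `"%s" % metadata[0]` produces, and escapes backslashes and double quotes so they cannot break the pbtxt string. The tensor names `images` and `output0` come from the configs above. The anchor count of 8400 is derived assuming the default strides 8, 16 and 32.

```python
def make_triton_config(name, num_outputs, metadata, imgsz=640, strides=(8, 16, 32)):
    """Render a config.pbtxt in the same shape as the two export cells above (hypothetical helper)."""
    # One prediction per grid cell at each stride: 80*80 + 40*40 + 20*20 = 8400 for 640x640
    num_anchors = sum((imgsz // s) ** 2 for s in strides)
    # Keep the Python-literal form that "%s" % metadata produces, escaping the
    # characters that would otherwise end or corrupt the pbtxt string literal.
    meta_str = str(metadata).replace("\\", "\\\\").replace('"', '\\"')
    return f"""parameters {{
  key: "metadata"
  value {{
    string_value: "{meta_str}"
  }}
}}

name: "{name}"
platform: "tensorrt_plan"
max_batch_size: 0
input [
  {{
    name: "images"
    dims: [ -1, 3, {imgsz}, {imgsz} ]
  }}
]
output [
  {{
    name: "output0"
    dims: [ -1, {num_outputs}, {num_anchors} ]
  }}
]
"""


print(make_triton_config("vehicle_detector", 84, {"stride": 32, "names": {0: "person"}}))
```

Writing the result straight to `triton_engines/<model-name>/config.pbtxt` avoids the shared `./config.pbtxt` that each export cell overwrites.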

## 2. Create a Triton-compatible directory
**Triton model repository layout**<br> Refer to: https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/model_repository.html#repository-layout 
```
  <model-repository-path>/
    <model-name>/
      [config.pbtxt]
      [<output-labels-file> ...]
      [configs]/
        [<custom-config-file> ...]
      <version>/
        <model-definition-file>
```
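
The same layout can also be created from Python. The hypothetical `stage_model` helper below writes `config.pbtxt`, creates the numeric version directory, and moves the engine to `model.plan`, the default file name that Triton's TensorRT backend looks for. It fails early with a clear message if the engine file is missing.

```python
import shutil
from pathlib import Path


def stage_model(engine_path, repo, model_name, config_text, version=1):
    """Place an engine and its config.pbtxt into <repo>/<model_name>/ (hypothetical helper)."""
    engine_path = Path(engine_path)
    if not engine_path.is_file():
        # Fail early instead of a later "mv: cannot stat ..." error
        raise FileNotFoundError(f"engine not found: {engine_path}")
    model_dir = Path(repo) / model_name
    version_dir = model_dir / str(version)
    version_dir.mkdir(parents=True, exist_ok=True)
    (model_dir / "config.pbtxt").write_text(config_text)
    # Triton's TensorRT backend loads <version>/model.plan by default
    target = version_dir / "model.plan"
    shutil.move(str(engine_path), str(target))
    return target
```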

```
!mkdir -p triton_engines/license_detector/1
!mv config.pbtxt triton_engines/license_detector
!mv /mnt/user/LPD/finetune/artifact_downloads/14be0ff37e03403b9f50b3cd4e8dd7a5/weights/best.engine triton_engines/license_detector/1/model.plan
```

```
mv: cannot stat '/mnt/user/LPD/finetune/artifact_downloads/14be0ff37e03403b9f50b3cd4e8dd7a5/weights/best.engine': No such file or directory
```

The `mv` error means `best.engine` was not at that path when this cell ran. Check the path returned by `model.export()` (stored in `engine_file`) before moving the engine.


```
!mkdir -p triton_engines/vehicle_detector/1
!mv config.pbtxt triton_engines/vehicle_detector
!mv yolo11s.engine triton_engines/vehicle_detector/1/model.plan
```

```
!ls -rlt triton_engines/*
```
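
Instead of only eyeballing the `ls` output, a small check like the hypothetical `validate_repository` below can flag common layout mistakes before uploading. It assumes this tutorial's conventions. Every model directory carries its own `config.pbtxt`, even though that file is optional in general, as the layout above shows. Each numeric version directory contains `model.plan`.

```python
from pathlib import Path


def validate_repository(repo, model_file="model.plan"):
    """Return a list of layout problems in a Triton model repository (empty list means OK)."""
    repo = Path(repo)
    if not repo.is_dir():
        return [f"{repo} does not exist"]
    problems = []
    for model_dir in sorted(p for p in repo.iterdir() if p.is_dir()):
        if not (model_dir / "config.pbtxt").is_file():
            problems.append(f"{model_dir.name}: missing config.pbtxt")
        # Triton version directories are named with integers (1, 2, ...)
        versions = sorted(v for v in model_dir.iterdir() if v.is_dir() and v.name.isdigit())
        if not versions:
            problems.append(f"{model_dir.name}: no numeric version directory")
        for v in versions:
            if not (v / model_file).is_file():
                problems.append(f"{model_dir.name}/{v.name}: missing {model_file}")
    return problems


for problem in validate_repository("triton_engines") or ["repository layout looks OK"]:
    print(problem)
```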

## 3. Upload the engine to MLflow

In this part, we use the example files from the Triton Inference Server GitHub repository. <br>
**Ref**: https://github.com/triton-inference-server/server/tree/main/deploy/mlflow-triton-plugin#triton-flavor

```
!cat mlflow_scripts/upload.sh
```

```
#!/bin/bash

export MLFLOW_TRACKING_URI='https://mlflow.ingress.pcai0103.sy6.hpecolo.net'
export MLFLOW_TRACKING_TOKEN=$(cat /etc/secrets/ezua/.auth_token)
export MLFLOW_S3_ENDPOINT_URL='http://local-s3-service.ezdata-system.svc.cluster.local:30000'
python mlflow_scripts/publish_model_to_mlflow.py --model_name license-detector --model_directory ./triton_engines --flavor triton
```
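
`upload.sh` exports three MLflow variables and then calls the plugin's `publish_model_to_mlflow.py`. If the token file is missing, for example, the failure only surfaces inside the publish step. The hypothetical `publish_command` below is a Python pre-flight check. It verifies those variables and the model directory first, then returns the same command as the script as a list for `subprocess.run`. The variable names and arguments are copied from `upload.sh`.

```python
import os

# Variables exported by upload.sh
REQUIRED_VARS = ("MLFLOW_TRACKING_URI", "MLFLOW_TRACKING_TOKEN", "MLFLOW_S3_ENDPOINT_URL")


def publish_command(model_name, model_directory, env=None):
    """Check the MLflow settings and return the publish command from upload.sh (hypothetical helper)."""
    env = os.environ if env is None else env
    # An empty value (e.g. an unreadable token file) counts as missing
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    if not os.path.isdir(model_directory):
        raise FileNotFoundError(f"model directory not found: {model_directory}")
    return [
        "python", "mlflow_scripts/publish_model_to_mlflow.py",
        "--model_name", model_name,
        "--model_directory", model_directory,
        "--flavor", "triton",
    ]
```

Usage would then be `subprocess.run(publish_command("license-detector", "./triton_engines"), check=True)`, with the three variables already set in the environment.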



```
!sh mlflow_scripts/upload.sh
```

```
<module 'triton_flavor' from '/mnt/user/LPD/finetune/mlflow_scripts/triton_flavor.py'>
Registered model 'license-detector' already exists. Creating a new version of this model...
2025/05/28 14:40:16 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: license-detector, version 2
Created version '2' of model 'license-detector'.
s3://mlflow.sy6s171r10/16/b73cfe58b2a54d7a8dd96b157957d06d/artifacts
2025/05/28 14:40:16 INFO mlflow.tracking._tracking_service.client: 🏃 View run receptive-asp-793 at: https://mlflow.ingress.pcai0103.sy6.hpecolo.net/#/experiments/16/runs/b73cfe58b2a54d7a8dd96b157957d06d.
2025/05/28 14:40:16 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: https://mlflow.ingress.pcai0103.sy6.hpecolo.net/#/experiments/16.
```
