# Introduction
This consolidated notebook shows an off-the-shelf Faster R-CNN model in action on a simple video, demo.mp4. First, we fetch the Faster R-CNN model from the OpenMMLab GitHub repository and run the video frames through it to detect objects, measuring the latency and throughput of this local inference. Then we walk through the steps to serve the same model with Triton Inference Server: converting the model to ONNX (one of the formats Triton supports natively), organizing the converted model into a Triton model repository, and loading it into Triton for serving. Finally, we measure the latency and throughput of inference through Triton.

The goal is not to compare local inference performance against Triton, but to highlight the steps involved in converting a model so that Triton can serve it. In fact, the latency comparison below is not a fair one: the local inference call has no transport overhead and is asynchronous, whereas each Triton inference call is sent as an HTTP payload and is synchronous.

## Download the Simple Video
In the following cell, we download the video from the OpenMMLab GitHub repository (mmdetection).

In [24]:
!wget "https://github.com/open-mmlab/mmdetection/blob/main/demo/demo.mp4?raw=true" -O demo.mp4

--2025-07-15 22:58:25--  https://github.com/open-mmlab/mmdetection/blob/main/demo/demo.mp4?raw=true
Resolving github.com (github.com)... 140.82.114.4
Connecting to github.com (github.com)|140.82.114.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github.com/open-mmlab/mmdetection/raw/refs/heads/main/demo/demo.mp4 [following]
--2025-07-15 22:58:25--  https://github.com/open-mmlab/mmdetection/raw/refs/heads/main/demo/demo.mp4
Reusing existing connection to github.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/open-mmlab/mmdetection/refs/heads/main/demo/demo.mp4 [following]
--2025-07-15 22:58:25--  https://raw.githubusercontent.com/open-mmlab/mmdetection/refs/heads/main/demo/demo.mp4
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443...

In the following cell, we make a 'temp' directory and move the downloaded .mp4 video to that 'temp' directory.

In [25]:
!mkdir -p temp
!mv demo.mp4 temp/


## Decode the .mp4 Video

The following snippet decodes the .mp4 video frame by frame and writes each frame as a JPEG to the output directory, 'video_frames'. The cell after it lists the extracted frames.

In [3]:
import cv2
import os

video_path = 'temp/demo.mp4'
output_dir = 'video_frames'
os.makedirs(output_dir, exist_ok=True)

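# Open the video and write each decoded frame as a zero-padded JPEG
cap = cv2.VideoCapture(video_path)
i = 0
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:  # no more frames (or a read error)
        break
    cv2.imwrite(f"{output_dir}/frame_{i:03d}.jpg", frame)
    i += 1
cap.release()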

Here we list all the frames we wrote into the video_frames directory.

In [4]:
!ls video_frames

frame_000.jpg  frame_014.jpg  frame_028.jpg  frame_042.jpg  frame_056.jpg
frame_001.jpg  frame_015.jpg  frame_029.jpg  frame_043.jpg  frame_057.jpg
frame_002.jpg  frame_016.jpg  frame_030.jpg  frame_044.jpg  frame_058.jpg
frame_003.jpg  frame_017.jpg  frame_031.jpg  frame_045.jpg  frame_059.jpg
frame_004.jpg  frame_018.jpg  frame_032.jpg  frame_046.jpg  frame_060.jpg
frame_005.jpg  frame_019.jpg  frame_033.jpg  frame_047.jpg  frame_061.jpg
frame_006.jpg  frame_020.jpg  frame_034.jpg  frame_048.jpg  frame_062.jpg
frame_007.jpg  frame_021.jpg  frame_035.jpg  frame_049.jpg  frame_063.jpg
frame_008.jpg  frame_022.jpg  frame_036.jpg  frame_050.jpg  frame_064.jpg
frame_009.jpg  frame_023.jpg  frame_037.jpg  frame_051.jpg  frame_065.jpg
frame_010.jpg  frame_024.jpg  frame_038.jpg  frame_052.jpg  frame_066.jpg
frame_011.jpg  frame_025.jpg  frame_039.jpg  frame_053.jpg
frame_012.jpg  frame_026.jpg  frame_040.jpg  frame_054.jpg
frame_013.jpg  frame_027.jpg  frame_041.jpg  frame_055.jpg
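
The listing is in frame order because the index is zero-padded to three digits (`{i:03d}`), so lexicographic and numeric order agree; the later cells rely on this through `sorted(os.listdir(...))`. As a small illustration (the file names and the helper `frame_index` below are made up for the example), unpadded names sort incorrectly, and a numeric sort key is one way to stay robust if a video ever has 1000 or more frames:

```python
import re


def frame_index(name: str) -> int:
    """Extract the integer frame index from a name like 'frame_012.jpg'."""
    match = re.search(r"(\d+)", name)
    if match is None:
        raise ValueError(f"no frame index in {name!r}")
    return int(match.group(1))


padded = [f"frame_{i:03d}.jpg" for i in (10, 2, 1)]
unpadded = [f"frame_{i}.jpg" for i in (10, 2, 1)]

padded_sorted = sorted(padded)                      # plain sort is already in frame order
unpadded_sorted = sorted(unpadded)                  # 'frame_10' lands before 'frame_2'
numeric_sorted = sorted(unpadded, key=frame_index)  # numeric key restores frame order

print(padded_sorted)
print(unpadded_sorted)
print(numeric_sorted)
```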


## Fetch Faster RCNN Model

In the following cell, we download the base configuration files, the Faster R-CNN model configuration, and the pretrained checkpoint.


In [5]:
# Create folders
!mkdir -p configs/_base_/models
!mkdir -p configs/_base_/datasets
!mkdir -p configs/_base_/schedules

# Download the required base files
!wget https://raw.githubusercontent.com/open-mmlab/mmdetection/main/configs/_base_/models/fast-rcnn_r50_fpn.py -P configs/_base_/models/
!wget https://raw.githubusercontent.com/open-mmlab/mmdetection/main/configs/_base_/datasets/coco_detection.py -P configs/_base_/datasets/
!wget https://raw.githubusercontent.com/open-mmlab/mmdetection/main/configs/_base_/schedules/schedule_1x.py -P configs/_base_/schedules/
!wget https://raw.githubusercontent.com/open-mmlab/mmdetection/main/configs/_base_/default_runtime.py -P configs/_base_/

!wget https://raw.githubusercontent.com/open-mmlab/mmdetection/main/configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py -P configs/faster_rcnn/
!wget https://raw.githubusercontent.com/open-mmlab/mmdetection/main/configs/_base_/models/faster-rcnn_r50_fpn.py -P configs/_base_/models/
!wget https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth -P checkpoints/

--2025-07-15 19:48:41--  https://raw.githubusercontent.com/open-mmlab/mmdetection/main/configs/_base_/models/fast-rcnn_r50_fpn.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2256 (2.2K) [text/plain]
Saving to: ‘configs/_base_/models/fast-rcnn_r50_fpn.py.1’


2025-07-15 19:48:42 (57.8 MB/s) - ‘configs/_base_/models/fast-rcnn_r50_fpn.py.1’ saved [2256/2256]

--2025-07-15 19:48:42--  https://raw.githubusercontent.com/open-mmlab/mmdetection/main/configs/_base_/datasets/coco_detection.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Leng

## Run Inference and Visualize Detections

The following cell initializes the detector, runs it on every extracted frame, and saves the annotated frames to 'output_frames'.



In [6]:
import os
import mmcv
from mmdet.apis import init_detector, inference_detector
from mmdet.visualization import DetLocalVisualizer

# Initialize the model
config = 'configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py'
checkpoint = 'checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth'
model = init_detector(config, checkpoint, device='cuda:0')

# Setup input/output directories
input_dir = './video_frames'
output_dir = './output_frames'
os.makedirs(output_dir, exist_ok=True)

# Set up the visualizer
visualizer = DetLocalVisualizer(name='my_vis')
visualizer.dataset_meta = model.dataset_meta

# Process images
for img_name in sorted(os.listdir(input_dir)):
    if not img_name.lower().endswith('.jpg'):
        continue
    img_path = os.path.join(input_dir, img_name)
    image = mmcv.imread(img_path)  # Read the actual image array
    result = inference_detector(model, image)

    out_file = os.path.join(output_dir, img_name)
    visualizer.add_datasample(
        name=img_name,
        image=image,
        data_sample=result,
        draw_gt=False,
        draw_pred=True,
        show=False,
        wait_time=0,
        out_file=out_file
    )


Loads checkpoint by local backend from path: checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth




![f1](./output_frames/frame_000.jpg)

## Measure Local Inference Latency and Throughput

The following cell times the local inference loop and reports the average latency per frame and the resulting throughput.

In [9]:
import time
import torch

# Initialize the model
config = 'configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py'
checkpoint = 'checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth'
model = init_detector(config, checkpoint, device='cuda:0')


# Setup input/output directories
input_dir = './video_frames'
output_dir = './output_frames'
os.makedirs(output_dir, exist_ok=True)

# Process images
img_cnt = 1
warmup_cnt = 6
for img_name in sorted(os.listdir(input_dir)):
    if not img_name.lower().endswith('.jpg'):
        continue
    img_cnt += 1
    # Leading frames are skipped entirely (they are not run through the model)
    if img_cnt <= warmup_cnt:
        continue
    # Start the clock on the first timed frame
    if img_cnt == (warmup_cnt+1):
        start = time.time()
    img_path = os.path.join(input_dir, img_name)
    image = mmcv.imread(img_path)  # Read the actual image array (inside the timed interval)
    result = inference_detector(model, image)

# Wait for all queued GPU work to finish before stopping the clock
torch.cuda.synchronize()
end = time.time()

n_runs = (img_cnt-warmup_cnt-1)
print('number of runs:', n_runs)

latency_ms = (end - start) / n_runs * 1000
throughput = n_runs / (end - start)

print('latency_ms:', latency_ms)
print('throughput:', throughput)

Loads checkpoint by local backend from path: checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth
number of runs: 61
latency_ms: 89.80352761315517
throughput: 11.135420028349884
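
The loop above skips its leading frames rather than running them, and it derives the number of runs from counters. A variant that actually executes the warm-up calls and counts the timed calls directly could look like the sketch below. The helper name `benchmark` and its parameters are hypothetical, and the workload is a CPU stand-in so the block runs without a GPU.

```python
import time
from typing import Callable, Iterable, Optional


def benchmark(fn: Callable, inputs: Iterable, warmup: int = 5,
              sync: Optional[Callable[[], None]] = None) -> dict:
    """Run `fn` on every input; time only the calls after `warmup` untimed ones."""
    items = list(inputs)
    if len(items) <= warmup:
        raise ValueError("need more inputs than warm-up iterations")

    # Warm-up calls are executed but not timed
    for item in items[:warmup]:
        fn(item)
    if sync is not None:
        sync()  # make sure warm-up work has finished before starting the clock

    timed = items[warmup:]
    start = time.perf_counter()
    for item in timed:
        fn(item)
    if sync is not None:
        sync()  # wait for outstanding asynchronous work before stopping the clock
    elapsed = time.perf_counter() - start

    n_runs = len(timed)
    return {
        "n_runs": n_runs,
        "latency_ms": elapsed / n_runs * 1000,
        "throughput": n_runs / elapsed,
    }


# Stand-in workload so the sketch is runnable anywhere
stats = benchmark(lambda _: sum(range(10_000)), range(20), warmup=5)
print(stats)
```

In this notebook, `fn` could be `lambda img: inference_detector(model, img)` with `sync=torch.cuda.synchronize`, applied to frames that were read into memory beforehand so that disk reads are not part of the timed interval.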


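## Convert the Model to ONNX

The following cell uses the MMDeploy torch2onnx.py tool to export the PyTorch model to ONNX, with the first video frame as the sample input.

In [10]: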
!python /mmdeploy/tools/torch2onnx.py /mmdeploy/configs/mmdet/detection/detection_onnxruntime_dynamic.py configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth video_frames/frame_000.jpg --device cuda --work-dir output_models

07/15 19:55:00 - mmengine - INFO - torch2onnx: 
	model_cfg: configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py 
	deploy_cfg: /mmdeploy/configs/mmdet/detection/detection_onnxruntime_dynamic.py
Loads checkpoint by local backend from path: checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth
07/15 19:55:01 - mmengine - INFO - Export PyTorch model to ONNX: output_models/end2end.onnx.
  ys_shape = tuple(int(s) for s in ys.shape)
  assert cls_score.size()[-2:] == bbox_pred.size()[-2:]
  k = torch.tensor(k, device=input.device, dtype=torch.long)
  assert pred_bboxes.size(0) == bboxes.size(0)
  assert pred_bboxes.size(1) == bboxes.size(1)
  assert len(max_shape) == 2, '`max_shape` should be [h, w]'
  iou_threshold = torch.tensor([iou_threshold], dtype=torch.float32)
  score_threshold = torch.tensor([score_threshold], dtype=torch.float32)
  score_threshold = float(score_threshold)
  iou_threshold = float(iou_threshold)
  assert boxes.size(1) == 4
  assert boxe

In [11]:
!ls output_models

end2end.onnx


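## Create the Triton Model Repository

We rename the exported file to model.onnx, the default file name Triton looks for in an ONNX model directory, and make the repository script executable.

In [12]:
!mv output_models/end2end.onnx output_models/model.onnx
!chmod 744 create_model_repo.sh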

In [13]:
from PIL import Image

img = Image.open("video_frames/frame_000.jpg")
print(img.size)  # (width, height)

(1280, 720)


In [14]:
import onnx

# Use a separate name so the detector held in 'model' is not overwritten
onnx_model = onnx.load("output_models/model.onnx")
print("Inputs:")
for inp in onnx_model.graph.input:
    # Dynamic dimensions have no fixed dim_value and print as 0
    print(inp.name, [dim.dim_value for dim in inp.type.tensor_type.shape.dim])

print("Outputs:")
for out in onnx_model.graph.output:
    print(out.name, [dim.dim_value for dim in out.type.tensor_type.shape.dim])

Inputs:
input [0, 3, 0, 0]
Outputs:
dets [0, 0, 0]
labels [0, 0]


In [15]:
!./create_model_repo.sh

✅ Model repository created successfully in: model_repository
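
The contents of create_model_repo.sh are not shown in this notebook. As a rough sketch of what such a step typically produces, Triton expects the layout `<repository>/<model name>/<version>/model.onnx`, optionally with a `config.pbtxt` next to the version directory. The function `create_model_repo` and the config values below are assumptions, not the script's actual contents: the tensor names come from the ONNX inspection above, all dynamic dimensions are written as -1, and the INT64 type for labels is a guess that should be checked against the exported model.

```python
import shutil
import tempfile
from pathlib import Path

# Assumed configuration; verify data types and dims against the exported model
CONFIG_TEMPLATE = """name: "{name}"
backend: "onnxruntime"
max_batch_size: 0
input [
  {{
    name: "input"
    data_type: TYPE_FP32
    dims: [ -1, 3, -1, -1 ]
  }}
]
output [
  {{
    name: "dets"
    data_type: TYPE_FP32
    dims: [ -1, -1, -1 ]
  }},
  {{
    name: "labels"
    data_type: TYPE_INT64
    dims: [ -1, -1 ]
  }}
]
"""


def create_model_repo(onnx_path, repo_dir, name="faster_rcnn", version=1) -> Path:
    """Copy the ONNX file into <repo>/<name>/<version>/model.onnx and write config.pbtxt."""
    model_dir = Path(repo_dir) / name
    version_dir = model_dir / str(version)
    version_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(onnx_path, version_dir / "model.onnx")
    (model_dir / "config.pbtxt").write_text(CONFIG_TEMPLATE.format(name=name))
    return model_dir


# Demo in a scratch directory, with a placeholder file standing in for the export
scratch = Path(tempfile.mkdtemp())
placeholder = scratch / "model.onnx"
placeholder.write_bytes(b"placeholder, not a real ONNX model")
model_dir = create_model_repo(placeholder, scratch / "model_repository")
for path in sorted(model_dir.rglob("*")):
    print(path.relative_to(scratch))
```

The model name faster_rcnn matches the name the client cells use when calling Triton.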


### Now go ahead and start the Triton server before running the next cell
```
tritonserver --model-repository=/workspace/model_repository/ --log-verbose=1
```
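
The server takes a moment to load the model, so the client cell can fail if it runs too early. One way to guard against that is to poll a readiness check with a timeout. The helper `wait_until` below is a hypothetical, dependency-free sketch; the stand-in check simulates a server that refuses the first two attempts.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout_s: float = 60.0,
               interval_s: float = 1.0) -> bool:
    """Poll `check` until it returns True or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if check():
                return True
        except Exception:
            pass  # e.g. connection refused while the server is still starting
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Stand-in readiness check: fails twice, then reports ready
attempts = {"n": 0}


def fake_ready() -> bool:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("server not up yet")
    return True


ready = wait_until(fake_ready, timeout_s=5.0, interval_s=0.01)
print(ready, attempts["n"])
```

With the Triton HTTP client, the check could be `client.is_server_ready` followed by `lambda: client.is_model_ready("faster_rcnn")`.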

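The following cell sends each frame to Triton over HTTP and measures the latency and throughput of the inference requests.

In [17]:
import os
import time
import mmcv
import numpy as np
import tritonclient.http as httpclient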

# Create Triton client
client = httpclient.InferenceServerClient("localhost:8000")

# Setup input/output directories
input_dir = './video_frames'
output_dir = './output_frames'
os.makedirs(output_dir, exist_ok=True)

# Model name in Triton
model_name = "faster_rcnn"

# Processing variables
img_cnt = 1
warmup_cnt = 6

total_end_minus_start = 0

# Loop through images
for img_name in sorted(os.listdir(input_dir)):
    if not img_name.lower().endswith('.jpg'):
        continue

    img_cnt += 1
    if img_cnt <= warmup_cnt:
        continue

    img_path = os.path.join(input_dir, img_name)
    image = mmcv.imread(img_path)  # Read image, shape (H, W, C)

    # Preprocess: keep original size
    input_data = image.astype(np.float32)
    input_data = np.transpose(input_data, (2, 0, 1))  # HWC -> CHW
    input_data = np.expand_dims(input_data, axis=0)   # Add batch dim: (1, C, H, W)

    # Create Triton input
    inputs = [httpclient.InferInput("input", input_data.shape, "FP32")]
    inputs[0].set_data_from_numpy(input_data)

    # Create output request
    outputs = [httpclient.InferRequestedOutput("dets"),
               httpclient.InferRequestedOutput("labels")]

    # Inference timing
    start = time.time()
    response = client.infer(model_name, inputs=inputs, outputs=outputs)
    # If you do any local torch work, synchronize here:
    # torch.cuda.synchronize()
    end = time.time()

    total_end_minus_start += (end - start)

    # Extract results
    dets = response.as_numpy("dets")
    labels = response.as_numpy("labels")

    #print(f"Image {img_name}: dets shape={dets.shape}, labels shape={labels.shape}")

# Compute statistics
n_runs = (img_cnt - warmup_cnt - 1)
print("number of runs:", n_runs)

latency_ms = total_end_minus_start / n_runs * 1000
throughput = n_runs / total_end_minus_start

print(f"Latency per image: {latency_ms:.2f} ms")
print(f"Throughput: {throughput:.2f} FPS")


number of runs: 61
Latency per image: 132.03 ms
Throughput: 7.57 FPS
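
Because both loops issue one request at a time, throughput is simply the reciprocal of the mean latency. A quick check with the numbers reported in the two cell outputs (the function name `fps_from_latency` is introduced here for illustration):

```python
# Latencies copied from the local and Triton cell outputs
local_latency_ms = 89.80352761315517
triton_latency_ms = 132.03


def fps_from_latency(latency_ms: float) -> float:
    """Frames per second for strictly sequential requests."""
    return 1000.0 / latency_ms


local_fps = fps_from_latency(local_latency_ms)
triton_fps = fps_from_latency(triton_latency_ms)
ratio = triton_latency_ms / local_latency_ms

print(f"local:  {local_fps:.2f} FPS")      # local:  11.14 FPS
print(f"triton: {triton_fps:.2f} FPS")     # triton: 7.57 FPS
print(f"latency ratio: {ratio:.2f}x")      # latency ratio: 1.47x
```

This relationship stops holding once requests are issued concurrently or batched, where throughput can rise without a matching drop in per-request latency.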


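The following cell repeats the Triton inference with normalized input and draws the returned detections on each frame.

In [18]:
from mmengine.structures import InstanceData
from mmdet.structures import DetDataSample
from mmdet.visualization import DetLocalVisualizer
from mmdet.datasets import CocoDataset

import mmcv
import tritonclient.http as httpclient
import numpy as np
import os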

# Create Triton client
client = httpclient.InferenceServerClient("localhost:8000")

# Initialize visualizer with COCO classes
visualizer = DetLocalVisualizer()
visualizer.dataset_meta = CocoDataset.METAINFO

input_dir = "./video_frames"
output_dir = "./output_frames1"
os.makedirs(output_dir, exist_ok=True)

# Normalization values (ImageNet mean and std, in RGB order)
mean = np.array([123.675, 116.28, 103.53], dtype=np.float32).reshape(3,1,1)
std = np.array([58.395, 57.12, 57.375], dtype=np.float32).reshape(3,1,1)

# Process images
for img_name in sorted(os.listdir(input_dir)):
    if not img_name.lower().endswith(".jpg"):
        continue
    img_path = os.path.join(input_dir, img_name)
    image = mmcv.imread(img_path)

    print(f"Processing image: {img_name}, shape: {image.shape}")

    # Normalize input image as model expects
    # (mmcv.imread returns BGR, while the mean/std values are in RGB order)
    input_data = (image.transpose(2,0,1).astype(np.float32) - mean) / std
    input_data = input_data[None]  # add batch dimension

    print(f"Input data shape (NCHW): {input_data.shape}")

    # Prepare input for Triton
    inputs = [httpclient.InferInput("input", input_data.shape, "FP32")]
    inputs[0].set_data_from_numpy(input_data)

    # Prepare output requests
    outputs = [
        httpclient.InferRequestedOutput("dets"),
        httpclient.InferRequestedOutput("labels")
    ]

    # Perform inference
    response = client.infer(
        model_name="faster_rcnn",
        inputs=inputs,
        outputs=outputs
    )

    # Extract output arrays
    dets = response.as_numpy("dets")
    labels = response.as_numpy("labels")

    #print("Raw dets from Triton:", dets)
    #print("Raw labels from Triton:", labels)

    # Flatten batch and detection dims
    dets = dets.reshape(-1, dets.shape[-1])
    labels = labels.reshape(-1)

    print("After reshape dets shape:", dets.shape)
    print("After reshape labels shape:", labels.shape)

    # Keep only detections with a score above 0.35
    valid_mask = dets[:, 4] > 0.35
    dets = dets[valid_mask]
    labels = labels[valid_mask].astype(np.int64)

    #print("Filtered dets:", dets)
    #print("Filtered labels:", labels)

    pred_instances = InstanceData()

    if dets.size == 0:
        print("No valid detections for this image.")
        pred_instances.bboxes = np.empty((0,4), dtype=np.float32)
        pred_instances.scores = np.empty((0,), dtype=np.float32)
        pred_instances.labels = np.empty((0,), dtype=np.int64)
    else:
        pred_instances.bboxes = dets[:, :4]
        pred_instances.scores = dets[:, 4]
        pred_instances.labels = labels

    data_sample = DetDataSample()
    data_sample.pred_instances = pred_instances

    out_file = os.path.join(output_dir, img_name)
    visualizer.add_datasample(
        name=img_name,
        image=image,
        data_sample=data_sample,
        draw_gt=False,
        draw_pred=True,
        show=False,
        out_file=out_file
    )


Processing image: frame_000.jpg, shape: (720, 1280, 3)
Input data shape (NCHW): (1, 3, 720, 1280)
After reshape dets shape: (43, 5)
After reshape labels shape: (43,)
Processing image: frame_001.jpg, shape: (720, 1280, 3)
Input data shape (NCHW): (1, 3, 720, 1280)
After reshape dets shape: (53, 5)
After reshape labels shape: (53,)
Processing image: frame_002.jpg, shape: (720, 1280, 3)
Input data shape (NCHW): (1, 3, 720, 1280)
After reshape dets shape: (41, 5)
After reshape labels shape: (41,)
Processing image: frame_003.jpg, shape: (720, 1280, 3)
Input data shape (NCHW): (1, 3, 720, 1280)
After reshape dets shape: (46, 5)
After reshape labels shape: (46,)
Processing image: frame_004.jpg, shape: (720, 1280, 3)
Input data shape (NCHW): (1, 3, 720, 1280)
After reshape dets shape: (56, 5)
After reshape labels shape: (56,)
Processing image: frame_005.jpg, shape: (720, 1280, 3)
Input data shape (NCHW): (1, 3, 720, 1280)
After reshape dets shape: (53, 5)
After reshape labels shape: (53,)
Processing image: frame_006.jpg, shape: (720, 1280, 3)
Input data shape (NCHW): (1, 3, 720, 1280)
After reshape dets shape: (56, 5)
After reshape labels shape: (56,)
Processing image: frame_007.jpg, shape: (720, 1280, 3)
Input data shape (NCHW): (1, 3, 720, 1280)
Afte

![f1](./output_frames1/frame_009.jpg)

In [20]:
!pwd

/workspace


In [21]:
!ls

benchmark.ipynb  configs	       model_repository  output_models
checkpoints	 consolidated.ipynb    optimize1.ipynb	 temp
config.pbtxt	 create_model_repo.sh  output_frames	 video_frames
config.yaml	 example.ipynb	       output_frames1
