# Evaluation of Yolo TensorRT Zoo

For [ENGALL-2424](https://aquabyte.atlassian.net/browse/ENGALL-2424) we seek to understand the TensorRT-accelerated inference time (latency) of YoloV3 on the Jetson TX2.  

This page was generated from [this notebook](https://github.com/aquabyte-new/research-exploration/blob/pwais/tx2-yolo-zoo-0/pwais/mft-pg/notebooks/darknet_yolo_trt_zoo_eval.ipynb).

The results below study:
 * A zoo of YoloV3 models [with architectures that are nearly identical to the production detector](https://github.com/aquabyte-new/research-exploration/blob/ba35c6356ed44ddd1145008f2e0eeaaf7d958378/pwais/mft-pg/detection/models/README.md#available-models).  We only vary the input size.  All models have square inputs.
 * Each model has been trained and tested on ~750 images from a GoPro video feed. [Details](https://github.com/aquabyte-new/research-exploration/tree/ba35c6356ed44ddd1145008f2e0eeaaf7d958378/pwais/mft-pg/datasets#mft-pg-datasets).
 * The `GeForce.RTX.2070` platform is a Lambda Labs Tensorbook with an RTX 2070 (8GB memory).
 * The `Tegra.X2` platform is an NVIDIA Jetson TX2 Developer Kit.  [Photo](https://drive.google.com/file/d/1Ycx2VLA5_y-s6nEoseHPstOnLElBulXj/view?usp=sharing)
 
Notes on TX2 Tests:
 * The TX2 generally remained under 40 °C for the duration of the study.  [Pic of JTOP](https://drive.google.com/file/d/1nFlXunwwnOoIz_LA2ydk3BapYSFy4Ucj/view?usp=sharing).
 * During the inference test, the GPU had time to rest between inferences while the next image was being resized on the CPU.  [Pic of JTOP GPU during an inference run](https://drive.google.com/file/d/1aFW6kTwcp5JbHZEkXUT0vWs2Gq7dPL28/view?usp=sharing).
 * System memory usage during the inference test was about 4GB; the actual GPU memory usage is unknown.  [Pic of JTOP during an inference run](https://drive.google.com/file/d/123iIBGbByS2FNNb2P3TErex8AzYMtIqV/view?usp=sharing).
 

Reproducing these results:
 * From scratch: use the `pwais/tx2-yolo-zoo-0` branch of [this repo](https://github.com/aquabyte-new/research-exploration).
   See also this pull request: https://github.com/aquabyte-new/research-exploration/pull/3
 * This report: use the `mlruns` directory persisted to S3:
    ```
    aws s3 sync --size-only s3://aquabyte-research/pwais/mlruns-tx2-yolo-zoo-0/ ./mlruns/
    ```
   Then start Jupyter using the `mft-cli`: `$ ./mft-cli --jupyter-ui`
    

In [None]:
import sys
sys.path.append('/opt/mft-pg')

# The first Yolo zoo didn't have run ID tracking set up correctly, so the
# MLflow run IDs for that zoo are listed here by hand.
FIRST_YOLO_ZOO_RUN_IDS = (
  '90fafa4fb2194c70ab5e99bf94964587',
  '3771d3357f534c689a31f7278f2fe60e',
  '86d0fbfae7d9432bb15b17113bf3f291',
  '058730c05f8f4dd8a3299fb10dece255',
  '32779f411bbd4c1dafe483ca8a636601',
  'e10cae25a3884f75919c2b6212e4824f',
  '3b33e4f6bc45466c9a1d35ee839a0c75',
  'b18e9351ef4b4da9b788cc135339f457',
  'e235ff91d5c043979f31b8ce92c1e7a5',
  'c114c649c02349389b54810c526bdc7f',
  '95a388e4809d4e24870e4d147140c356',
  '95d86a448acc4d86ba2d3714795453a2',
  'ea283b03af834744a616fa740b4f303a',
  'f231ff83d1dd494fa6269916cc0b1ab3',
  '1ab6658a60a7402ba0398cf445ba22d2',
  '2de468aed4e445a6a352db4b15fd703c',
  '71058625298a4b99a91f5099fd8c7cd4',
  '2176e2dd712949c5bb7e1c25abc3b278',
  '4ff330ca02054a1db9eff79abb06841a',
  '3a10b05bc3344922b0b953f27e61b07b',
  'd7e6b25641e34edd913c66d0e6725a6b',
  'f0af1e307acc4d19b13b89ab3025ccf6',
  '81fe7f4ff374481198a0aa1e94ebe3a1',
  '10a236b73ea64e84b0f7628fa3c6f0f6',
  'e4297a980b7e43ac8099454601895c0a',
  '0053813569944f2ca919993ed3bea4f9',
  '5f7c3d1b1e0c49e1a52b25ec0bf6d316',
  '4e3232e14ae6472a86717be9ce8572a7',
)

print('len(FIRST_YOLO_ZOO_RUN_IDS)', len(FIRST_YOLO_ZOO_RUN_IDS))


In [None]:
import mlflow
mlflow.set_tracking_uri('/opt/mft-pg/mlruns')
mlflow_df = mlflow.search_runs()
mlflow_df
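
The `artifact_uri` column of `mlflow_df` can hold `file://` URIs rather than plain filesystem paths. Below is a minimal sketch of one way to convert such a value using only the Python standard library. The helper name `artifact_uri_to_path` is hypothetical (it is not part of MLflow or `mft_utils`), and the example paths are made up.

```python
from urllib.parse import unquote, urlparse


def artifact_uri_to_path(artifact_uri):
    """Convert a local artifact URI to a filesystem path.

    Hypothetical helper: 'file://' URIs are parsed and percent-decoded;
    anything else (e.g. an already-plain path) is returned unchanged.
    """
    parsed = urlparse(artifact_uri)
    if parsed.scheme == 'file':
        return unquote(parsed.path)
    return artifact_uri


# Made-up example values, for illustration only
print(artifact_uri_to_path('file:///tmp/mlruns/0/abc/artifacts'))
print(artifact_uri_to_path('/tmp/mlruns/0/abc/artifacts'))
```

Unlike a plain string replacement, this leaves non-`file` values untouched and decodes percent-escapes such as `%20`.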

In [None]:
import os

import numpy as np
import pandas as pd

from mft_utils import misc as mft_misc
from mft_utils.bbox2d import BBox2D

# Map the short platform name used in the plots to the device name that
# appears in the artifact file names.
PLATFORM_TO_DEVICE_NAME = {
    'Tegra.X2': 'NVIDIA.Tegra.X2',
    'GeForce.RTX.2070': 'GeForce.RTX.2070.with.Max-Q.Design',
}

stat_df_rows = []

artifact_uris = mlflow_df[mlflow_df['run_id'].isin(FIRST_YOLO_ZOO_RUN_IDS)]['artifact_uri']

for artifact_dir in artifact_uris:
    # Trim the 'file://' prefix so that os.path can use the directory
    artifact_dir = artifact_dir.replace('file://', '')

    # All models have square inputs, so the width alone identifies the input size
    yolo_config_path = os.path.join(artifact_dir, 'yolov3.cfg')
    img_width, _ = mft_misc.darknet_get_yolo_input_wh(yolo_config_path)

    for platform, device_name in PLATFORM_TO_DEVICE_NAME.items():
        trt_engine_path = os.path.join(artifact_dir, 'yolov3.%s.trt' % device_name)
        det_df_path = os.path.join(
            artifact_dir, 'YoloTRTRunner.%s.detections_df.pkl' % device_name)

        # Skip platforms that have no saved detections for this run
        if not os.path.exists(det_df_path):
            continue

        det_df = pd.read_pickle(det_df_path)
        extras = det_df['extra']
        stat_df_rows.append({
            'platform': platform,
            'img_width': img_width,
            'latencies': det_df['latency_sec'].to_numpy(),
            'trt_engine_size_bytes': os.path.getsize(trt_engine_path),
            # The engine load time is read from the first detection record
            'trt_load_time': extras.iloc[0]['trt_engine_load_time_sec'],
            'mean_resize_time_ms': 1e3 * np.mean(
                [float(extra['resize_time_sec']) for extra in extras]),
        })


results_df = pd.DataFrame(stat_df_rows)
print('len(results_df)', len(results_df))
results_df
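
Each row of `results_df` holds a raw array of per-image latencies, which is hard to read in a table. Below is a minimal sketch of one way to reduce those arrays to summary statistics per platform and input width. The function name `summarize_latencies` and the choice of statistics (mean, median, 95th percentile) are assumptions, not part of the original study, and `demo_df` contains made-up numbers that only stand in for `results_df`.

```python
import numpy as np
import pandas as pd


def summarize_latencies(results_df):
    """Return one row per input row with latency statistics in milliseconds.

    Hypothetical helper: expects 'platform', 'img_width', and 'latencies'
    (an array of seconds) columns, as built in the cell above.
    """
    columns = ['platform', 'img_width', 'n', 'mean_ms', 'p50_ms', 'p95_ms']
    rows = []
    for row in results_df.to_dict(orient='records'):
        lat_ms = 1e3 * np.asarray(row['latencies'], dtype=float)
        rows.append({
            'platform': row['platform'],
            'img_width': row['img_width'],
            'n': len(lat_ms),
            'mean_ms': lat_ms.mean(),
            'p50_ms': np.percentile(lat_ms, 50),
            'p95_ms': np.percentile(lat_ms, 95),
        })
    summary = pd.DataFrame(rows, columns=columns)
    return summary.sort_values(['platform', 'img_width']).reset_index(drop=True)


# Made-up latencies for illustration only; these are NOT measured results.
demo_df = pd.DataFrame([
    {'platform': 'Tegra.X2', 'img_width': 416,
     'latencies': np.array([0.10, 0.12, 0.14])},
    {'platform': 'GeForce.RTX.2070', 'img_width': 416,
     'latencies': np.array([0.010, 0.020, 0.030])},
])
demo_summary = summarize_latencies(demo_df)
print(demo_summary)
```

In the notebook, calling `summarize_latencies(results_df)` would give the same table for the real measurements.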


In [None]:
from bokeh.plotting import figure 
from bokeh.io import output_notebook, show
output_notebook()


In [None]:
fig = figure(
        title="Inference Latencies for YoloV3 Fish Detector",
        plot_width=950,
        y_axis_label="Latency (milliseconds)",
        x_axis_label="Network input width (pixels)")

# Draw one vertical strip of points per (platform, input width) pair
for row in results_df.to_dict(orient='records'):
    ys = 1e3 * row['latencies']  # seconds -> milliseconds
    xs = [row['img_width']] * len(ys)
    color = 'blue' if 'Tegra' in row['platform'] else 'orange'
    fig.scatter(xs, ys, fill_alpha=0.25, color=color, legend_label=row['platform'])

show(fig)


In [None]:
from bokeh.models import ColumnDataSource
from bokeh.transform import factor_cmap

results_src = ColumnDataSource(results_df)
# Add a derived column with the engine size in megabytes (1 MByte = 1e6 bytes)
results_src.data['trt_engine_size_MBytes'] = 1e-6*results_src.data['trt_engine_size_bytes']

fig2 = figure(
        title="TensorRT Engine Sizes for YoloV3 Fish Detector",
        plot_width=950,
        y_axis_label="Engine size (MBytes)",
        x_axis_label="Network input width (pixels)")
fig2.scatter(
    source=results_src,
    x='img_width',
    y='trt_engine_size_MBytes',
    fill_alpha=0.25,
    color=factor_cmap(
            field_name='platform',
            palette=['blue', 'orange'],
            factors=results_df['platform'].unique()),
    legend_field='platform')
show(fig2)

In [None]:
fig3 = figure(
        title="TensorRT Engine Load Time for YoloV3 Fish Detector",
        plot_width=950,
        y_axis_label="Load Time (seconds)",
        x_axis_label="Network input width (pixels)")
fig3.scatter(
    source=results_src,
    x='img_width',
    y='trt_load_time',
    fill_alpha=0.25,
    color=factor_cmap(
            field_name='platform',
            palette=['blue', 'orange'],
            factors=results_df['platform'].unique()),
    legend_field='platform')
show(fig3)

In [None]:
fig4 = figure(
        title="Mean Image Resize Time (OpenCV CPU) for YoloV3 Fish Detector",
        plot_width=950,
        y_axis_label="Resize Time (milliseconds)",
        x_axis_label="Network input width (pixels)")
fig4.scatter(
    source=results_src,
    x='img_width',
    y='mean_resize_time_ms',
    fill_alpha=0.25,
    color=factor_cmap(
            field_name='platform',
            palette=['blue', 'orange'],
            factors=results_df['platform'].unique()),
    legend_field='platform')
show(fig4)