gitctrlx/xtrt

A lightweight, high-performance deep learning inference tool.
📄 Table of Contents

  • 🎉 What's New
  • 📚 Introduction
  • ⚙️ Installation
  • ✏️ Tutorial
  • 💻 Overview of Benchmark and Model Zoo
  • 📖 Documentation
  • ❓ FAQ
  • 🧾 License
  • 🎯 Reference

🎉 What's New

[2024.03.08] First release, v0.1.0:

  • Enables rapid deployment of various deep learning models, converting them into TensorRT engine files.
  • Supports multi-precision model deployment, including int8 quantization, with fine-tuning capabilities during quantization.
  • Supports Quantize-Dequantize (QDQ) nodes for quantizing ONNX models.
  • Facilitates model inference accuracy validation across multiple platforms (PyTorch, ONNXRuntime, TensorRT); see the sketch after this list.
  • Includes support for the Efficient NMS plugin, Rotated Object Detection NMS plugin, and YOLO decode plugin.
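
As a rough illustration of the cross-platform accuracy validation listed above, here is a minimal sketch that exports a stand-in PyTorch model to ONNX and compares the PyTorch and ONNXRuntime outputs. It is not part of the X-TRT tools; the tiny model and the file name "check.onnx" are placeholders.

# Minimal consistency check between PyTorch and ONNX Runtime outputs.
# The tiny conv model and "check.onnx" are placeholders; in practice you would
# compare the detector you plan to deploy against its exported ONNX file.
import numpy as np
import torch
import onnxruntime as ort

model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3, padding=1), torch.nn.ReLU())
model.eval()

dummy = torch.rand(1, 3, 640, 640)
torch.onnx.export(model, dummy, "check.onnx", input_names=["images"], output_names=["out"])

with torch.no_grad():
    torch_out = model(dummy).numpy()

sess = ort.InferenceSession("check.onnx", providers=["CPUExecutionProvider"])
ort_out = sess.run(None, {"images": dummy.numpy()})[0]

# Small tolerances absorb floating-point rounding differences between runtimes.
print("max abs diff:", np.abs(torch_out - ort_out).max())
print("allclose:", np.allclose(torch_out, ort_out, atol=1e-4))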

📚 Introduction

X-TRT is a lightweight, highly efficient deep learning model inference platform built on TensorRT C++.

  • Lightweight and Scalable: The most notable advantages of X-TRT are its light weight and scalability. It offers a variety of decoupled tool classes, such as model accuracy verification and tuning tools, ONNX model quantization tools, ONNX model modification tools, and many code examples. These features significantly enhance the platform's flexibility and usability, making X-TRT suitable both for advanced users who need deep customization and optimization and for beginners who are learning the stack.
  • High Efficiency and Versatility: X-TRT combines a lightweight design with high efficiency, minimizing resource use and maximizing processing speed, making it ideal for a wide range of hardware. It is built with TensorRT C++, enhancing deep learning inference performance through optimized model execution.
  • Broad Compatibility and Deployment Options: Designed for wide operating system compatibility, X-TRT supports major platforms such as Ubuntu and Windows, and also supports deployment on edge devices such as the NVIDIA Jetson series for high-performance edge computing.

⚙️ Installation

💡 Prerequisites

🔖 Docker

(Coming soon!) We recommend deploying with Docker for the quickest project startup. Docker images for both the x86 architecture and the NVIDIA Jetson (ARM) architecture are provided.

docker build -f docker/[dockerfile] .

If you prefer to manually configure the environment, please continue reading the section below.

🔖 NVIDIA Jetson Devices

🔖 Windows or Linux (x86)

The detailed environment configuration is described below.

To build the X-TRT components, you will first need the following software packages.

TensorRT

DeepStream

System Packages

PyTorch (Optional)

  • You need the CUDA version of PyTorch. If your device is Jetson, please refer to the Jetson Models Zoo for installation.
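
If you plan to use the Python tooling, the following is a quick, X-TRT-agnostic sketch to confirm that the installed PyTorch build is CUDA-enabled:

# Quick check that the installed PyTorch build can see a CUDA device.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))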

🛠️ Build

Building the Basic Inference Framework:

git clone https://github.com/gitctrlx/xtrt.git
cd xtrt
cmake -S . -B build
cmake --build build

If you wish to compile the eval tool, add the option -DBUILD_TOOLS_EVAL=ON when configuring with CMake. If you are unsure of your GPU's CUDA SM (compute capability) version, you can run tools/cudasm.sh to check. For more details, please see the FAQ.

(Optional) If you would like to use the complete set of tools developed in Python, please install the following:

python3 -m pip install -r requirements.txt

✏️ Tutorial

🧨 Quick Start

1. Data Preparation

This data is used for calibration during quantization. We use the COCO val2017 dataset for the calibration work: place the downloaded val2017 dataset (and its annotations) under the data/coco directory, as shown in the layout below. A sketch for generating the calibration file list follows the directory layout.

├── cmake
├── common
├── config
├── data
│   └── coco
│       ├── annotations
│       └── val2017
├── weights
├── engine
├── media
├── output
├── plugins
├── scripts
├── tools
└── docker
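
The calibration step (see scripts/build.sh below) reads a plain-text image list such as ./data/coco/filelist.txt. The following is a minimal sketch for generating it; whether the build tool expects bare file names or full paths is an assumption here, so adjust as needed:

# Write one entry per val2017 image into data/coco/filelist.txt for calibration.
# The exact format expected by the build tool (file name vs. full path) is an
# assumption; adjust as needed.
from pathlib import Path

val_dir = Path("data/coco/val2017")
images = sorted(p.name for p in val_dir.glob("*.jpg"))

Path("data/coco/filelist.txt").write_text("\n".join(images) + "\n")
print(f"wrote {len(images)} entries to data/coco/filelist.txt")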

2. Model Preparation

Place the prepared ONNX file into the weights folder. You can directly download the ONNX weights we have exported from HuggingFace; all of them originate from mmyolo pre-trained weights. Alternatively, you can configure mmyolo to export weights yourself, or export ONNX from other object detection models. The related code can be found in tools/modify_onnx.

mmyolo exports ONNX in two formats. One is an end-to-end ONNX with TensorRT 8's EfficientNMS node added; the other is the bare model with the decode part removed, which exposes three raw outputs. For details, please see the detailed tutorial document. You can either use the ONNX model that already includes EfficientNMS, or take the decode-free model and manually add plugins for acceleration. The related code can be found in tools/modify_onnx.
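
If you are unsure which of the two formats a given ONNX file uses, one quick check is to inspect its graph with the onnx package. This is only a sketch, and the file path is just an example:

# Inspect an exported ONNX file: the end-to-end variant contains an
# EfficientNMS_TRT node, while the bare variant exposes the raw output heads.
# "weights/yolov5s_trt8.onnx" is only an example path.
import onnx

model = onnx.load("weights/yolov5s_trt8.onnx")

print("graph outputs:", [o.name for o in model.graph.output])
has_nms = any(node.op_type == "EfficientNMS_TRT" for node in model.graph.node)
print("EfficientNMS_TRT present:", has_nms)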

3. Building the Engine

Once the dataset is ready, the next step is to construct the engine. Below is an example for building a YOLOv5s TensorRT engine, with the corresponding code located in scripts/build.sh:

./build/build \
    "./weights/yolov5s_trt8.onnx" \    # ONNX Model File Path
    "./engine/yolo.plan" \             # TensorRT Engine Save Path
    "int8" \                           # Quantization Precision
    3 \                                # TRT Optimization Level
    1 1 1 \                            # Dynamic Shape: Batch (min/opt/max)
    3 3 3 \                            # Dynamic Shape: Channels (min/opt/max)
    640 640 640 \                      # Dynamic Shape: Height (min/opt/max)
    640 640 640 \                      # Dynamic Shape: Width (min/opt/max)
    550 \                              # Calibration Iterations
    "./data/coco/val2017" \            # Calibration Dataset Path
    "./data/coco/filelist.txt" \       # Calibration Image List
    "./engine/int8Cache/int8.cache" \  # Calibration File Save Path
    true \                             # Timing Cache Usage
    false \                            # Ignore Timing Cache Mismatch
    "./engine/timingCache/timing.cache"# Timing Cache Save Path

For a detailed analysis of the code's parameters, please see the detailed documentation.
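
As an optional sanity check (not part of the X-TRT workflow), you can deserialize the generated plan file with the TensorRT Python bindings and list its I/O bindings. This sketch assumes the Python bindings matching your installed TensorRT; the TensorRT 8.x API is shown.

# Deserialize the generated engine and print its I/O bindings (TensorRT 8.x API).
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(logger, "")  # register plugins such as EfficientNMS_TRT

with open("engine/yolo.plan", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

assert engine is not None, "engine failed to deserialize"
for i in range(engine.num_bindings):  # use num_io_tensors on TensorRT >= 8.5
    print(engine.get_binding_name(i), engine.get_binding_shape(i))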

4. Executing Inference (Demo)

Run inference on a single image using the built YOLO TensorRT engine:

./build/yolo_det_img \
    "engine/yolo_m.plan" \   # TensorRT Engine Path
    "media/demo.jpg" \       # Input Image Path
    "output/output.jpg" \    # Output Image Path
    2 \                      # Pre-processing Pipeline
    1 3 640 640              # Input Tensor Shape (NCHW)

Run inference on a video using the built YOLO TensorRT engine:

./build/yolo_det \
    "engine/yolo_trt8.plan" \ # TensorRT Engine Path
    "media/c3.mp4" \          # Input Video Path
    "output/output.mp4" \     # Output Video Path
    2 \                       # Pre-processing Pipeline
    1 3 640 640               # Input Tensor Shape (NCHW)

For a detailed analysis of the code's parameters, please see the detailed documentation.

💻 Overview of Benchmark and Model Zoo

MMYOLO COCO Dataset Evaluation

The evaluation results are from the MMYOLO models under FP16 precision. "TRT-FP16-GPU-Latency(ms)" refers to the GPU compute time for the model forward pass only, measured on an NVIDIA Tesla T4 with TensorRT 8.4, a batch size of 1, and a test shape of 640x640 (416x416 for YOLOX-tiny).

In practical tests, we found that on Jetson platforms the available memory can affect the resulting model accuracy, because TensorRT needs sufficient memory during the engine-building phase to try certain optimization tactics. Across different platforms, this can lead to an accuracy difference of about 0.2%-0.4%.

📖 Documentation

For more detailed tutorials about the project, please refer to the detailed documentation.

❓ FAQ

Please refer to the FAQ for frequently asked questions.

🧾 License

This project is released under the GPL 3.0 license.

🎯 Reference

This project builds on many excellent prior works; some useful repository links are listed below.

  1. TensorRT
  2. mmyolo
  3. Lidar_AI_Solution
  4. PPQ
  5. DeepStream-Yolo
  6. CVprojects
  7. efficientTensorRTNMSPluginForRotateBox