
TensorRT C++

A simple program that implements the NVIDIA TensorRT SDK for high-performance deep learning inference, written in C++
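
As a rough orientation, the core of such a program follows TensorRT's load → deserialize → execute flow. The sketch below uses the standard TensorRT 10 C++ API; the tensor names, shapes, and file name are illustrative assumptions, not necessarily this repo's exact code:

    #include <NvInfer.h>
    #include <cuda_runtime_api.h>
    #include <cstdio>
    #include <fstream>
    #include <iterator>
    #include <vector>

    // Minimal logger required by the TensorRT runtime
    class Logger : public nvinfer1::ILogger {
        void log(Severity severity, const char* msg) noexcept override {
            if (severity <= Severity::kWARNING) std::printf("%s\n", msg);
        }
    };

    int main() {
        Logger logger;

        // Read the serialized .trt engine from disk
        std::ifstream file("model.trt", std::ios::binary);
        std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                               std::istreambuf_iterator<char>());

        // Deserialize the engine and create an execution context
        auto* runtime = nvinfer1::createInferRuntime(logger);
        auto* engine = runtime->deserializeCudaEngine(blob.data(), blob.size());
        auto* context = engine->createExecutionContext();

        // Bind device buffers by tensor name ("input"/"output" are assumptions);
        // sizes assume a 1x3x128x128 fp32 input and a 4x upscale model
        void *input = nullptr, *output = nullptr;
        cudaMalloc(&input, 1 * 3 * 128 * 128 * sizeof(float));
        cudaMalloc(&output, 1 * 3 * 512 * 512 * sizeof(float));
        context->setTensorAddress("input", input);
        context->setTensorAddress("output", output);

        // Run inference on a CUDA stream
        cudaStream_t stream;
        cudaStreamCreate(&stream);
        context->enqueueV3(stream);
        cudaStreamSynchronize(stream);

        // Cleanup (TensorRT 10 objects are destroyed with delete)
        cudaStreamDestroy(stream);
        cudaFree(input);
        cudaFree(output);
        delete context;
        delete engine;
        delete runtime;
        return 0;
    }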

Features

  • Caption
    • Generate a caption for the image using Booru tags
  • Upscale
    • Upscale the image using a super-resolution model
  • more coming soon...?

Getting Started

(for Windows)

Requirements

  1. Nvidia RTX GPU
  2. TensorRT 10.0 SDK

    An Nvidia Developer account is needed

  3. CUDA Toolkit

    Be sure to download the release specified by your TensorRT version

  4. OpenCV 4.10.0

    It needs to be exactly this version, unless you're planning to build from source

It is recommended to add the OpenCV bin folder to your system PATH; otherwise, you have to manually place opencv_world4100.dll next to the .exe. The TensorRT and CUDA Toolkit bin folders should already have been added to PATH during their installations.
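
For example, assuming the default layout of the prebuilt OpenCV package (the vc16 folder name and destination path are illustrative):

    copy "C:\opencv\build\x64\vc16\bin\opencv_world4100.dll" "C:\Path\To\TensorRT-Cpp\"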

Models

For optional arguments during engine conversion, refer to the trtexec section

  • Caption:

    1. Go to SmilingWolf's HuggingFace
    2. Select a tagger model of choice

      This program was built and tested on WD SwinV2 Tagger v3

    3. Download both the .onnx and the .csv files
    4. Convert the .onnx model to a .trt engine
      • Example
        trtexec --onnx=model.onnx --saveEngine=model.trt --fp16
    5. Modify the config.json file accordingly (see below)
  • Upscale:

    1. Go to OpenModelDB
    2. Expand the Advanced tag selector, and filter the Platform to ONNX format
    3. Download a model of choice

      This program was built and tested on 4x-Nomos8kDAT

    4. Convert the .onnx model to a .trt engine
      • Example
        trtexec --onnx=4xNomos8kDAT.onnx --saveEngine=4xNomos8kDAT.trt --shapes=input:1x3x128x128 --inputIOFormats=fp32:chw --outputIOFormats=fp32:chw
    5. Modify the config.json file accordingly (see below)

Configs

The config.json file needs to contain the following fields:

  • Required

    • deviceID: The ID of the CUDA device

      Should be 0 if you only have one GPU

    • mode: "caption" or "upscale"
    • modelPath: The path to the .trt engine

      Use an absolute path so that drag & drop works

    • inputResolution: Should be 448 for most tagger models; 64 or 128 for most upscale models
    • fp16: Enable to use half precision I/O
  • Caption

    • tagsPath: The path to the .csv tags spreadsheet

      Use an absolute path so that drag & drop works

    • threshold: The score needed for a tag to be included
  • Upscale

    • overlap: The overlap between adjacent tiles

      This helps prevent visible seams

    • upscaleRatio: The scale factor of your upscale model
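
As an illustrative example, a config.json for upscale mode could look like the following (the paths and values are placeholders); for caption mode, you would swap overlap and upscaleRatio for tagsPath and threshold:

    {
        "deviceID": 0,
        "mode": "upscale",
        "modelPath": "C:/Models/4xNomos8kDAT.trt",
        "inputResolution": 128,
        "fp16": false,
        "overlap": 16,
        "upscaleRatio": 4
    }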

Deployment

If you simply want to run the program:

  1. Download the built .exe from Releases
  2. Place the config.json next to the .exe
  3. Launch the .exe

Development

If you want to build from source:

  1. Install Visual Studio with the C++ workload
  2. git clone this repo
  3. Open the .vcxproj project
  4. Modify the CUDA.props to point to the correct paths
    • TensorRT
    • CUDA Toolkit
    • OpenCV
  5. Download the Json for C++ package, and add the single-file json.hpp
  6. Download the CSV for C++ package, and add the single-file rapidcsv.h
  7. Configure the solution to Release (instead of Debug)
  8. Build
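
As a rough sketch of how the two single-header libraries are typically consumed (the field names follow the Configs section above; the "name" CSV column is an assumption about the tag spreadsheet):

    #include <fstream>
    #include <string>
    #include <vector>
    #include "json.hpp"
    #include "rapidcsv.h"

    int main() {
        // Parse config.json with the single-header JSON library
        std::ifstream f("config.json");
        nlohmann::json config = nlohmann::json::parse(f);
        int deviceID = config["deviceID"].get<int>();
        std::string mode = config["mode"].get<std::string>();

        // Load the tag spreadsheet with rapidcsv
        rapidcsv::Document tags(config["tagsPath"].get<std::string>());
        std::vector<std::string> names = tags.GetColumn<std::string>("name");
        return 0;
    }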

For other operating systems, you will need to modify path_util.cpp to use a platform-specific implementation

Command-Line Arguments

The program can take 2 arguments:

  • The first one is the path to an image or to a folder of images, which means you can simply drag and drop them onto the .exe to process. If empty, the program will ask for a path instead.

  • The second one is the path to the config, allowing you to easily switch between different models and modes. If empty, it defaults to config.json in the same folder as the .exe.
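
For example (the executable name here is illustrative):

    TensorRT-Cpp.exe "C:\Images\photo.png" "C:\Configs\upscale.json"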

Benchmark

Running 4xNomos8kDAT at fp32, with an input size of 128 and an overlap of 16, on an RTX 3060:

  • Upscale a 512x512 image:

    • Using ComfyUI: ~11.6s
    • Using Forge: ~12.8s
    • Using TensorRT: ~6.2s
  • Upscale a 1024x1024 image:

    • Using ComfyUI: ~36.5s
    • Using Forge: ~36.9s
    • Using TensorRT: ~19.24s
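
To put these numbers in perspective: assuming a straightforward sliding-window tiler (a sketch, not necessarily this repo's exact scheme), a 512x512 image at tile size 128 with overlap 16 requires about 25 inference passes:

    #include <cmath>
    #include <cstdio>

    // Tiles needed along one side, assuming a simple sliding-window scheme
    int tilesPerSide(int imageSize, int tileSize, int overlap) {
        int stride = tileSize - overlap; // 128 - 16 = 112
        return (int)std::ceil((double)(imageSize - tileSize) / stride) + 1;
    }

    int main() {
        int n = tilesPerSide(512, 128, 16);      // 5 tiles per side
        std::printf("%d tiles total\n", n * n);  // 25 inference passes
        return 0;
    }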

Roadmap

  • Upgrade to TensorRT 10
  • Upgrade to OpenCV 4.10.0
  • Seamless Tiling
  • Support Folder Processing
  • Support Half Precision I/O
  • Support Batch Size

trtexec

Extract the trtexec.exe from the downloaded TensorRT .zip

Parameters

  • --onnx: Path to the model to convert
  • --saveEngine: Path to save the converted engine

Optional

  • --shapes: The shape of the model's input

    This is only needed for models with dynamic inputs (i.e. the upscale models)

    • The first number is batch size

      This program currently only supports 1

    • The second number is the channel count

      This program currently only supports 3 (RGB)

    • The third and fourth numbers are the input dimensions of your model

      Refer to the model page

  • --inputIOFormats: Specify the precision of the inputs and the channel order

    upscale mode supports fp32 and fp16 I/O; caption mode only supports fp32 I/O

    Most upscale models are chw; the tagger models are hwc

  • --outputIOFormats: Same as above

Precision

Specify the precision to store the engine weights in

  • (default): When omitted, defaults to fp32 full precision

    Largest in size; slowest in performance

  • --bf16: Brain-float half precision, with the same dynamic range as fp32

    Second largest in size; similar performance to fp32

    Requires RTX 30 series or newer GPU

  • --fp16: Half precision

    Almost half in size; almost double in performance

    Some models may not work properly (e.g. the DAT upscale models do not work in fp16)

  • --best: Let trtexec determine the precision to use for each layer, including fp8

    May cause inaccuracies (e.g. generating artifacts for upscale models)

I/O precision and Weight precision are independent
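
For example, combining the flags above, the following builds an engine with both half-precision weights (--fp16) and half-precision I/O; dropping --fp16 would instead keep fp32 weights while retaining the fp16 I/O (the file names are placeholders):

    trtexec --onnx=model.onnx --saveEngine=model.trt --shapes=input:1x3x128x128 --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --fp16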
