A simple program that implements the NVIDIA TensorRT SDK for high-performance deep learning inference, written in C++
- Caption
  - Generate a caption of the image using Booru tags
- Upscale
  - Super-resolve the image using a model
- More coming soon...?
Requirements (for Windows):
- Nvidia RTX GPU
- TensorRT 10.0 SDK
  - An Nvidia Developer account is needed
- CUDA Toolkit
  - Be sure to download the release specified by your TensorRT version
- OpenCV 4.10.0
  - It needs to be exactly this version, unless you're planning to build from source
  - It is recommended to add the OpenCV `bin` folder to your system PATH; otherwise, you have to manually place `opencv_world4100.dll` next to the `.exe`. The TensorRT and CUDA Toolkit `bin` folders should already be in PATH after installation.
For optional arguments during engine conversion, refer to the `trtexec` section below.
Caption:
- Go to SmilingWolf's HuggingFace
- Select a tagger model of choice
  - This program was built and tested on WD SwinV2 Tagger v3
- Download both the `.onnx` and the `.csv` files
- Convert the `.onnx` model to a `.trt` engine
  - Example: `trtexec --onnx=model.onnx --saveEngine=model.trt --fp16`
- Modify the `config.json` file accordingly (see below)
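At inference time, the tag names from the `.csv` file are paired with the scores the tagger model produces, and only tags scoring above the configured threshold make it into the caption. The idea can be sketched as follows (a generic illustration, not the program's actual code; `buildCaption` is a hypothetical name):

```cpp
#include <string>
#include <utility>
#include <vector>

// Build a Booru-style caption from (tag, score) pairs produced by a
// tagger model: keep tags whose score meets the threshold and join
// them with commas.
std::string buildCaption(const std::vector<std::pair<std::string, float>>& scored,
                         float threshold) {
    std::string caption;
    for (const auto& [tag, score] : scored) {
        if (score < threshold) continue;   // drop low-confidence tags
        if (!caption.empty()) caption += ", ";
        caption += tag;
    }
    return caption;
}
```

Raising the threshold yields fewer but more confident tags; lowering it yields a longer, noisier caption.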
Upscale:
- Go to OpenModelDB
- Expand the Advanced tag selector, and filter the Platform to `ONNX` format
- Download a model of choice
  - This program was built and tested on 4x-Nomos8kDAT
- Convert the `.onnx` model to a `.trt` engine
  - Example: `trtexec --onnx=4xNomos8kDAT.onnx --saveEngine=4xNomos8kDAT.trt --shapes=input:1x3x128x128 --inputIOFormats=fp32:chw --outputIOFormats=fp32:chw`
- Modify the `config.json` file accordingly (see below)
Inside the `config.json` file, you need to have the following fields:
- Required
  - deviceID: The ID of the CUDA device
    - Should be `0` if you only have one GPU
  - mode: `"caption"` or `"upscale"`
  - modelPath: The path to the `.trt` engine
    - Use an absolute path so it supports drag & drop
  - inputResolution: Should be `448` for most tagger models; `64` or `128` for most upscale models
  - fp16: Enable to use half precision I/O
- Caption
  - tagsPath: The path to the `.csv` tags spreadsheet
    - Use an absolute path so it supports drag & drop
  - threshold: The score needed for a tag to be included
- Upscale
  - overlap: The overlap between each tile
    - This is to prevent seams
  - upscaleRatio: The `Scale` of your upscale model
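Putting the fields together, a `config.json` for upscale mode might look like this (a sketch based on the field list above — the path is a placeholder for your own file, and the exact value types are assumptions):

```json
{
  "deviceID": 0,
  "mode": "upscale",
  "modelPath": "C:/models/4xNomos8kDAT.trt",
  "inputResolution": 128,
  "fp16": false,
  "overlap": 16,
  "upscaleRatio": 4
}
```

For caption mode, swap `overlap` and `upscaleRatio` for `tagsPath` and `threshold`, and set `mode` to `"caption"`.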
If you simply want to run the program:
- Download the built `.exe` from Releases
- Place the `config.json` next to the `.exe`
- Launch the `.exe`
If you want to build from source:
- Install Visual Studio with the C++ module
- `git clone` this repo
- Open the `.vcxproj` project
- Modify the `CUDA.props` to point to the correct paths:
  - TensorRT
  - CUDA Toolkit
  - OpenCV
- Download the Json for C++ package, and add the single-file `json.hpp`
- Download the CSV for C++ package, and add the single-file `rapidcsv.h`
- Configure the solution to `Release` (instead of `Debug`)
- Build

For other OSes, you will need to modify `path_util.cpp` to use a platform-specific implementation.
The program can take 2 arguments:
- The first one is the path to an image or a folder of images, which means you can simply drag and drop onto the `.exe` to process. If empty, it will ask for a path instead.
- The second one is the path to the config, allowing you to easily switch between different models and modes. If empty, it defaults to `config.json` in the same folder as the `.exe`.
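The argument handling described above can be sketched as follows (a minimal illustration under the stated behavior, not the program's actual code; `resolveArgs` is a hypothetical name):

```cpp
#include <filesystem>
#include <iostream>
#include <string>

// Resolve the two optional arguments: an input path and a config path.
// Mirrors the behavior described above: prompt if the input path is
// missing, and fall back to config.json next to the executable.
static void resolveArgs(int argc, char* argv[],
                        std::string& inputPath, std::string& configPath) {
    if (argc > 1) {
        inputPath = argv[1];   // image file or folder of images (drag & drop)
    } else {
        std::cout << "Enter a path to an image or folder: ";
        std::getline(std::cin, inputPath);
    }
    if (argc > 2) {
        configPath = argv[2];  // explicit config, for switching models/modes
    } else {
        // Default: config.json in the same folder as the executable
        std::filesystem::path exeDir =
            std::filesystem::path(argv[0]).parent_path();
        configPath = (exeDir / "config.json").string();
    }
}
```

A real `main` would then pass `inputPath` and `configPath` on to the loading code.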
Running 4xNomos8kDAT at fp32, with an input size of 128 and an overlap of 16, on an RTX 3060:
- Upscale a `512x512` image:
- Upscale a `1024x1024` image:
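The tile count implied by these settings can be worked out with a short sketch (an illustration of overlapped tiling in general, not the program's actual tiling code): each tile advances by `inputResolution - overlap` pixels, and the overlapping regions can then be blended to prevent seams.

```cpp
#include <algorithm>
#include <vector>

// Compute the top-left origins of overlapping tiles along one axis.
// Each tile is `tile` pixels wide, consecutive tiles overlap by
// `overlap` pixels, and the last tile is clamped to the image edge.
std::vector<int> tileOrigins(int length, int tile, int overlap) {
    std::vector<int> origins;
    int step = std::max(1, tile - overlap);  // guard against overlap >= tile
    for (int pos = 0;; pos += step) {
        if (pos + tile >= length) {
            origins.push_back(std::max(0, length - tile));  // clamp last tile
            break;
        }
        origins.push_back(pos);
    }
    return origins;
}
```

With `tile = 128` and `overlap = 16` (the benchmark settings above), a 512-pixel axis yields origins 0, 112, 224, 336, 384 — five tiles per axis, so 25 tiles total for a `512x512` image and 81 for `1024x1024`, which is why the larger image takes disproportionately longer.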
- Upgrade to TensorRT 10
- Upgrade to OpenCV 4.10.0
- Seamless Tiling
- Support Folder Processing
- Support Half Precision I/O
- Support Batch Size
Extract the `trtexec.exe` from the downloaded TensorRT `.zip`
Parameters
- `--onnx`: Path to the model to convert
- `--saveEngine`: Path to save the converted engine
Optional
- `--shapes`: The shape of the model's input
  - This is only needed for models with dynamic inputs (i.e. the upscale models)
  - The first number is the batch size
    - This program currently only supports `1`
  - The second number is the channel count
    - This program currently only supports `3` (RGB)
  - The third and fourth numbers are the input dimensions of your model
    - Refer to the model page
- `--inputIOFormats`: Specify the precision of the inputs and the channel order
  - Upscale mode supports `fp32` and `fp16` I/O; caption mode only supports `fp32` I/O
  - Most upscale models are `chw`; the tagger models are `hwc`
- `--outputIOFormats`: Same as above
Precision
Specify the precision to store the engine weights in:
- (default): When omitted, defaults to `fp32` full precision
  - Largest in size; slowest in performance
- `--bf16`: More advanced half precision
  - Second largest in size; similar performance to `fp32`
  - Requires an RTX 30 series or newer GPU
- `--fp16`: Half precision
  - Almost half the size; almost double the performance
  - Some models may not work properly (e.g. the `DAT` upscale models do not work in `fp16`)
- `--best`: Let `trtexec` determine the precision to use for each layer, including `fp8`
  - May cause inaccuracy (e.g. generate artifacts for upscale models)

I/O precision and weight precision are independent.