
ptaxom/pnn


pnn

pnn is a Darknet-compatible neural network inference engine implemented in Rust. Careful optimization yields a significant performance increase over Darknet (especially in FP16 mode). pnn provides both CUDNN-based and TensorRT-based inference engines.

FPS Performance

Performance was measured on an RTX 3070 Ti with TensorRT v8.2.1, CUDNN v8.3.0, NVCC/CUDA Runtime 11.5, SM=80. For a fair comparison, tkDNN was built from its tensorrt8 branch.

  • YOLOv4 CSP 512x512

    Configuration   Darknet   tkDNN             pnn + CUDNN       pnn + TensorRT
    BS=1, FP32      87.8      98.2 (112.9**)    98.1 (107.0**)    108.7 (119.6**)
    BS=1, FP16      99.9*     221.2 (359.0**)   159.0 (183.7**)   197.3 (238.0**)
    BS=4, FP32      N/A       121.0 (129.3**)   117.4 (517.0**)   130.1 (590.0**)
    BS=4, FP16      N/A       268.3 (493.4**)   193.2 (869.0**)   230.7 (1150.5**)

  • YOLOv4 416x416 [WIP]

    Configuration   Darknet   tkDNN             pnn + CUDNN   pnn + TensorRT
    BS=1, FP32      60.3      121.7 (133.8**)   N/A           N/A
    BS=1, FP16      69.7*     290.5 (455.1**)   N/A           N/A
    BS=4, FP32      N/A       161.4 (179.8**)   N/A           N/A
    BS=4, FP16      N/A       365.1 (632.1**)   N/A           N/A

* - Darknet has no true FP16 mode; it operates in mixed precision.

** - The main value is end-to-end FPS, including reading, preprocessing, and postprocessing; the value in brackets is pure inference FPS. During the benchmark, none of Darknet, tkDNN, or pnn rendered video to screen or file. With rendering enabled, expect a 3-5% FPS drop for pnn/Darknet (multithreaded loader/renderer) and ~30% for tkDNN (single-threaded renderer).
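The split between the two figures can be illustrated with a short sketch (Python; the function names and timings are illustrative, not part of pnn):

```python
# Toy model (hypothetical timings, not pnn code) of the two FPS figures
# reported above: the main number covers the whole pipeline, the
# bracketed number covers only the inference stage.

def end_to_end_fps(frames: int, total_seconds: float) -> float:
    # Full pipeline: reading + preprocessing + inference + postprocessing.
    return frames / total_seconds

def inference_fps(frames: int, inference_seconds: float) -> float:
    # Inference stage only; never below the end-to-end FPS of the same run.
    return frames / inference_seconds

# Example: 1213 frames, 8.2 s total, of which 7.0 s is inference.
print(end_to_end_fps(1213, 8.2))  # ~147.9
print(inference_fps(1213, 7.0))   # ~173.3
```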

Usage

  • To build a TensorRT engine, use
    $ ./pnn build --help    
    pnn-build 
    
    Build TensorRT engine file
    
    USAGE:
        pnn build [OPTIONS] --weights <WEIGHTS> --config <CONFIG>
    
    OPTIONS:
        -b, --batchsize <BATCHSIZE>    Batchsize. [default: 1]
        -c, --config <CONFIG>          Path to config
        -h, --help                     Print help information
            --half                     Build HALF precision engine
        -o, --output <OUTPUT>          Output engine
        -w, --weights <WEIGHTS>        Path to weights
    # For example
    $ ./pnn build -b 4 -c ../../cfgs/tests/yolov4-csp.cfg -w ../../../models/yolov4-csp.weights -o ../../yolo_fp16_bs4.engine --half
  • To run/benchmark/render, use
    $ ./pnn benchmark --help
    
    Do performance benchmark
    
    USAGE:
        pnn benchmark [OPTIONS] --weights <WEIGHTS> --config <CONFIG> --input <INPUT>
    
    OPTIONS:
        -b, --batchsize <BATCHSIZE>          Batchsize [default: 1]
        -c, --config <CONFIG>                Path to config
            --classes-file <CLASSES_FILE>    Path to file with class names [default: ./cfgs/tests/coco.names]
        -h, --help                           Print help information
            --half                           Build HALF precision engine
        -i, --input <INPUT>                  Input file
            --iou-tresh <IOU_TRESH>          IoU threshold [default: 0.45]
        -o, --output <OUTPUT>                Output render file
        -s, --show                           Render window during work
            --threshold <THRESHOLD>          Confidence threshold [default: 0.45]
            --trt                            Load as TensorRT engine
        -w, --weights <WEIGHTS>              Path to weights
    # For example
    $ ./pnn benchmark -w ~/Sources/models/yolov4-p6.weights -c ~/Sources/models/yolov4-p6.cfg -s -b 1 -i ~/Sources/models/yolo_test.mp4 # Run yolov4-p6 from Darknet files with an FP32, BS=1 engine and render the result to screen
    
    $ ./pnn benchmark --trt --weights yolo_fp16_bs4.engine -c cfgs/tests/yolov4-csp.cfg --input ../models/yolo_test.mp4 --output res.avi # Run the yolo_fp16_bs4.engine engine with the settings predefined at build time and save the result to res.avi
    The output will look like this:
    Stats for      ../models/yolo_test.mp4
    Data type:     FP16
    Batchsize:     1
    Total frames:  1213
    FPS:           147.48 # END-TO-END FPS, including reading/preprocessing/rendering time
    INF+NMS FPS:   159.13 # Inference time + post processing FPS
    Inference FPS: 173.68 # Only inference measured. For bs != 1, computed as bs * per-batch inference FPS
    
  • To show the model architecture, use
    $ ./pnn dot --help
    Build dot graph of model
    
    USAGE:
        pnn dot --config <CONFIG> --output <OUTPUT>
    
    OPTIONS:
        -c, --config <CONFIG>    Path to config
        -h, --help               Print help information
        -o, --output <OUTPUT>    Output dot file
    # For further conversion use 
    $ dot -Tpng path.dot > path.png
    YOLOv4 CSP architecture
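The stats block printed by pnn benchmark (shown above) is easy to consume from a script. Here is a minimal sketch assuming the exact key/value format shown; the parse_stats helper is hypothetical, not part of pnn:

```python
import re

def parse_stats(text: str) -> dict:
    """Parse the `Key:   value` lines of the pnn benchmark stats block."""
    stats = {}
    for line in text.splitlines():
        # Keys may contain letters, spaces, '+' and '-' (e.g. "INF+NMS FPS").
        m = re.match(r"([A-Za-z+\- ]+):\s+(\S+)", line)
        if m:
            stats[m.group(1).strip()] = m.group(2)
    return stats

sample = """Data type:     FP16
Batchsize:     1
Total frames:  1213
FPS:           147.48
INF+NMS FPS:   159.13
Inference FPS: 173.68"""

print(parse_stats(sample)["FPS"])  # prints 147.48
```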

Requirements

  • Rust 2021 edition
  • Clang ≥ 13.0
  • GCC ≥ 9.0
  • NVCC ≥ 10
  • CUDNN ≥ 8
  • TensorRT ≥ 8
  • OpenCV ≥ 4.4

Roadmap

  • CUDNN FP32 Support
  • dot files render
  • TensorRT FP32 Support
  • FP16 mode
  • Python bindings
  • Releases & packages for Rust/C++/Python
  • INT8 support
  • Refitting engine
