Releases: NVIDIA/TensorRT

TensorRT OSS v10.1.0

18 Jun 00:26
9db1508

Key Features and Updates:

  • Parser changes
    • Added supportsModelV2 API
    • Added support for DeformConv operation
    • Added support for PluginV3 TensorRT Plugins
    • Marked all IParser and IParserRefitter APIs as noexcept
  • Plugin changes
    • Added version 2 of the ROIAlign_TRT plugin, which implements the IPluginV3 plugin interface. When an ONNX model containing the RoiAlign op is imported, this new version of the plugin is inserted into the TensorRT network.
  • Samples changes
  • Updated tooling
    • Polygraphy v0.49.12
    • ONNX-GraphSurgeon v0.5.3
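
The supportsModelV2 parser API listed above can be exercised from Python roughly as follows. This is a minimal sketch, not taken from the release itself: it assumes the Python bindings expose the API as `OnnxParser.supports_model_v2`, together with `num_subgraphs`, `is_subgraph_supported`, and `get_subgraph_nodes`, so check those names against the API reference for your installed TensorRT version. The helper name `report_onnx_support` and the model path in the usage line are hypothetical.

```python
def report_onnx_support(onnx_path):
    """Return (fully_supported, subgraphs) for an ONNX model file.

    Assumption: the Python binding names used below mirror the C++
    supportsModelV2 API; verify them for your TensorRT version.
    """
    # Imported lazily so this module still loads where TensorRT is absent.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(0)
    parser = trt.OnnxParser(network, logger)

    with open(onnx_path, "rb") as f:
        model_bytes = f.read()

    # True only if the parser can handle the whole model.
    fully_supported = parser.supports_model_v2(model_bytes)

    # Collect each partition as (is_supported, node_indices).
    subgraphs = []
    for i in range(parser.num_subgraphs):
        nodes = list(parser.get_subgraph_nodes(i))
        subgraphs.append((parser.is_subgraph_supported(i), nodes))
    return fully_supported, subgraphs
```

A call such as `report_onnx_support("model.onnx")` would then show which parts of the graph the parser considers unsupported before you attempt a full build.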

TensorRT OSS v10.0.1

30 Apr 18:05
d2f4ef7

Key Features and Updates:

  • Parser changes
    • Added support for building with protobuf-lite.
    • Fixed issue when parsing and refitting models with nested BatchNormalization nodes.
    • Added support for empty inputs in custom plugin nodes.
  • Demo changes
    • The following demos have been removed: Jasper, Tacotron2, HuggingFace Diffusers notebook
  • Updated tooling
    • Polygraphy v0.49.10
    • ONNX-GraphSurgeon v0.5.2
  • Build Containers
    • Updated the default CUDA version to 12.4.0.
    • Added Rocky Linux 8 and Rocky Linux 9 build containers

TensorRT v10.0.0

03 Apr 21:45

Key Features and Updates:

  • Samples changes
    • Added a sample showcasing weight-stripped engines.
    • Added a sample demonstrating the use of custom tactics with IPluginV3.
    • Added a sample to showcase plugins with data-dependent output shapes, using IPluginV3.
  • Parser changes
    • Added a new class IParserRefitter that can be used to refit a TensorRT engine with the weights of an ONNX model.
    • kNATIVE_INSTANCENORM is now set to ON by default.
    • Added support for IPluginV3 interfaces from TensorRT.
    • Added support for INT4 quantization.
    • Added support for the reduction attribute in ScatterElements.
    • Added support for wrap padding mode in Pad
  • Plugin changes
    • A new plugin has been added in compliance with ONNX ScatterElements.
    • The TensorRT plugin library no longer has a load-time link dependency on cuBLAS or cuDNN libraries.
    • All plugins which relied on cuBLAS/cuDNN handles passed through IPluginV2Ext::attachToContext() have moved to use cuBLAS/cuDNN resources initialized by the plugin library itself. This works by dynamically loading the required cuBLAS/cuDNN library. Additionally, plugins which independently initialized their cuBLAS/cuDNN resources have also moved to dynamically loading the required library. If the respective library is not discoverable through the library path(s), these plugins will not work.
    • bertQKVToContextPlugin: Version 2 of this plugin now supports head sizes less than or equal to 32.
    • reorgPlugin: Added a version 2 which implements IPluginV2DynamicExt.
    • disentangledAttentionPlugin: Fixed a kernel bug.
  • Demo changes
    • HuggingFace demos have been removed. Users who accelerate Large Language Model inference with TensorRT should use TensorRT-LLM instead.
  • Updated tooling
    • Polygraphy v0.49.9
    • ONNX-GraphSurgeon v0.5.1
    • TensorRT Engine Explorer v0.1.8
  • Build Containers
    • RedHat/CentOS 7.x are no longer officially supported starting with TensorRT 10.0. The corresponding container has been removed from TensorRT-OSS.
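
The IParserRefitter class mentioned under the parser changes above can be used from Python along these lines. This is a hedged sketch rather than the project's own sample: it assumes the engine was built as refittable, that the ONNX file has the same graph structure as the one the engine was built from, and that the Python binding is `trt.OnnxParserRefitter`. The function name `refit_engine_from_onnx` and the file paths are hypothetical.

```python
def refit_engine_from_onnx(engine_path, onnx_path):
    """Refit a serialized, refittable engine with weights from an ONNX file.

    Assumption: the engine at engine_path was built with refitting enabled.
    """
    # Imported lazily so this module still loads where TensorRT is absent.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)

    with open(engine_path, "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())
    if engine is None:
        raise RuntimeError(f"could not deserialize engine: {engine_path}")

    refitter = trt.Refitter(engine, logger)
    parser_refitter = trt.OnnxParserRefitter(refitter, logger)

    # Map the ONNX initializers onto the engine's refittable weights.
    if not parser_refitter.refit_from_file(onnx_path):
        raise RuntimeError(f"could not read weights from: {onnx_path}")

    # Apply the new weights to the engine.
    if not refitter.refit_cuda_engine():
        raise RuntimeError("refit_cuda_engine() reported failure")

    # Return the runtime as well so it stays alive alongside the engine.
    return runtime, engine
```

This pairs naturally with the weight-stripped engine sample listed above, where the weights are supplied again from the original ONNX model at deployment time.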

TensorRT OSS v9.3.0

09 Feb 22:30
6d1397e

TensorRT OSS release corresponding to TensorRT 9.3.0.1 release.

Updates since TensorRT 9.2.0 release.

Key Features and Updates:

  • Faster text-to-image generation using SDXL and INT8 quantization with AMMO
  • Updated Polygraphy to v0.49.7

TensorRT OSS v9.2.0

05 Dec 00:30

TensorRT OSS release corresponding to TensorRT 9.2.0.5 release.

Updates since TensorRT 9.1.0 release.

Key Features and Updates:

  • trtexec enhancement: Added --weightless flag to mark the engine as weightless.
  • Parser changes
    • Added support for Hardmax operator.
    • Changed a few operator importers so that TensorRT preserves the precision of operations when using strongly typed mode.
  • Plugin changes
    • Explicit INT8 support added to bertQKVToContextPlugin.
    • Various bug fixes.
  • Updated HuggingFace demo to use transformers v4.31.0 and PyTorch v2.1.0.
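
As an illustration of the trtexec enhancement above, the following sketch assembles a command line that builds an engine with the `--weightless` flag. The `--onnx` and `--saveEngine` options are standard trtexec options, while the helper name and the file names are hypothetical. The flag is taken from the 9.2.0 notes only, so confirm it with `trtexec --help` for the version you have installed.

```python
import shlex


def weightless_trtexec_command(onnx_path, engine_path):
    """Build the argv list for a trtexec run that marks the engine as weightless."""
    return [
        "trtexec",
        f"--onnx={onnx_path}",          # input ONNX model
        "--weightless",                 # flag named in the 9.2.0 release notes
        f"--saveEngine={engine_path}",  # where to write the serialized engine
    ]


# Hypothetical file names, for illustration only.
cmd = weightless_trtexec_command("model.onnx", "model.weightless.engine")
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` on a machine where trtexec is on the PATH.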

TensorRT OSS v9.1.0

20 Oct 00:34
b8ada01

TensorRT OSS release corresponding to TensorRT 9.1.0.4 GA release.

Updates since TensorRT 8.6.1 GA release.

Key Features and Updates:

  • Updated the trt_python_plugin sample.
    • The Python plugins API reference is now part of the official TRT Python API.
  • Added samples demonstrating the usage of the progress monitor API.
  • Removed dependencies related to Python < 3.8 from the Python samples, which no longer support Python < 3.8.
  • Demo changes
    • Added LAMBADA dataset accuracy checks in the HuggingFace demo.
    • Enabled structured sparsity and FP8 quantized batch matrix multiplications (BMMs) in attention in the NeMo demo.
    • Replaced deprecated APIs in the BERT demo.
  • Updated tooling
    • Polygraphy v0.49.1

23.08

07 Aug 17:09

What's Changed

  • Fix python bindings build and README
  • Add kNATIVE_INSTANCENORM flag to demoDiffusion
  • Update demoDiffusion to support torch 2.x and fix typo in README
  • Add HuggingFace Stable Diffusion pipeline demo
  • Upgrade pytorch-quantization to 2.1.3

Full Changelog: v8.6.1...23.08

TensorRT OSS v8.6.1

05 May 00:34
a25ca8b

TensorRT OSS release corresponding to TensorRT 8.6.1.6 GA release.

Key Features and Updates:

  • Added a new flag --use-cuda-graph to demoDiffusion to improve performance.
  • Optimized GPT2 and T5 HuggingFace demos to use fp16 I/O tensors for fp16 networks.

TensorRT OSS v8.6.0

17 Mar 04:06

TensorRT OSS release corresponding to TensorRT 8.6.0.12 EA release.

Key Features and Updates:

  • demoDiffusion acceleration is now supported out of the box in TensorRT without requiring plugins.
    • The following plugins have been removed accordingly: GroupNorm, LayerNorm, MultiHeadCrossAttention, MultiHeadFlashAttention, SeqLen2Spatial, and SplitGeLU.
  • Added a new sample called onnx_custom_plugin.

We needed to force-push the main and release/8.6 branches and the v8.6.0 release. If you cloned or pulled the repo recently, please rebase the affected branches. We apologize for the inconvenience.

TensorRT OSS v8.5.3

03 Feb 20:28
b0c259a

TensorRT OSS release corresponding to TensorRT 8.5.3.1 GA release.

Key Features and Updates:

  • Added the following HuggingFace demos: GPT-J-6B, GPT2-XL, and GPT2-Medium
  • Added nvinfer1::plugin namespace
  • Optimized KV Cache performance for T5