Releases · NVIDIA/TensorRT

18 Jun 00:26

akhilg-nv

v10.1.0

9db1508

TensorRT OSS v10.1.0 (Latest)

Key Features and Updates:

Parser changes
- Added supportsModelV2 API
- Added support for DeformConv operation
- Added support for PluginV3 TensorRT Plugins
- Marked all IParser and IParserRefitter APIs as noexcept
Plugin changes
- Added version 2 of the ROIAlign_TRT plugin, which implements the IPluginV3 plugin interface. When an ONNX model containing the RoiAlign op is imported, this new version of the plugin is inserted into the TensorRT network.
Samples changes
- Added a new sample non_zero_plugin, which is a Python version of the C++ sample sampleNonZeroPlugin.
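
NonZero is a typical case of an operation whose output size depends on the input values rather than only on the input shape. The following is a minimal pure-Python sketch of that behaviour for a 2-D input. It assumes the ONNX NonZero convention (one row of indices per input dimension, entries in row-major order). The function name `non_zero_2d` is hypothetical and is not taken from the sample.

```python
from typing import List, Sequence


def non_zero_2d(matrix: Sequence[Sequence[float]]) -> List[List[int]]:
    """Return the indices of non-zero entries as [row_indices, col_indices].

    Assumption: follows the ONNX NonZero layout, i.e. one index list per
    input dimension, with entries visited in row-major order.
    """
    rows: List[int] = []
    cols: List[int] = []
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            if value != 0:
                rows.append(r)
                cols.append(c)
    return [rows, cols]


# Two inputs of identical shape (2x2) give outputs of different lengths,
# so the output shape cannot be computed from the input shape alone.
sparse = non_zero_2d([[1, 0], [0, 0]])
dense = non_zero_2d([[1, 2], [0, 3]])
print(len(sparse[0]), len(dense[0]))
```

Because the number of returned indices is only known after the data has been inspected, a plugin that implements such an operation has to report its output size at run time.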
Updated tooling
- Polygraphy v0.49.12
- ONNX-GraphSurgeon v0.5.3


30 Apr 18:05

asfiyab-nvidia

v10.0.1

d2f4ef7

TensorRT OSS v10.0.1

Key Features and Updates:

Parser changes
- Added support for building with protobuf-lite.
- Fixed issue when parsing and refitting models with nested BatchNormalization nodes.
- Added support for empty inputs in custom plugin nodes.
Demo changes
- The following demos have been removed: Jasper, Tacotron2, HuggingFace Diffusers notebook
Updated tooling
- Polygraphy v0.49.10
- ONNX-GraphSurgeon v0.5.2
Build Containers
- Updated default CUDA versions to 12.4.0.
- Added Rocky Linux 8 and Rocky Linux 9 build containers


03 Apr 21:45

asfiyab-nvidia

v10.0.0

28733f0

TensorRT v10.0.0

Key Features and Updates:

Samples changes
- Added a sample showcasing weight-stripped engines.
- Added a sample demonstrating the use of custom tactics with IPluginV3.
- Added a sample to showcase plugins with data-dependent output shapes, using IPluginV3.
Parser changes
- Added a new class IParserRefitter that can be used to refit a TensorRT engine with the weights of an ONNX model.
- kNATIVE_INSTANCENORM is now set to ON by default.
- Added support for IPluginV3 interfaces from TensorRT.
- Added support for INT4 quantization.
- Added support for the reduction attribute in ScatterElements.
- Added support for wrap padding mode in Pad
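
For readers unfamiliar with the two ONNX features named in the last two items, the sketch below shows their semantics on 1-D lists in plain Python. It is a reference illustration only, not the parser's implementation. The reduction names come from the ONNX ScatterElements specification, and which of them a given TensorRT version accepts is not stated in these notes. The function names `scatter_elements_1d` and `pad_wrap_1d` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

# Reduction modes as named in the ONNX ScatterElements specification.
_REDUCTIONS: Dict[str, Callable[[float, float], float]] = {
    "none": lambda old, new: new,   # plain overwrite
    "add": lambda old, new: old + new,
    "mul": lambda old, new: old * new,
    "max": max,
    "min": min,
}


def scatter_elements_1d(
    data: Sequence[float],
    indices: Sequence[int],
    updates: Sequence[float],
    reduction: str = "none",
) -> List[float]:
    """Scatter `updates` into a copy of `data` at `indices` (1-D case)."""
    combine = _REDUCTIONS[reduction]
    out = list(data)
    for idx, upd in zip(indices, updates):
        if idx < 0:  # negative indices count from the end
            idx += len(out)
        out[idx] = combine(out[idx], upd)
    return out


def pad_wrap_1d(data: Sequence[float], before: int, after: int) -> List[float]:
    """Pad a non-empty 1-D sequence by wrapping around, as if it were circular."""
    n = len(data)
    return [data[(i - before) % n] for i in range(n + before + after)]


print(scatter_elements_1d([1, 2, 3, 4, 5], [1, 1], [10, 20], reduction="add"))
print(pad_wrap_1d([1, 2, 3], before=2, after=2))
```

With `reduction="none"` and duplicate indices, this sketch lets the last update win. The ONNX specification leaves the order undefined in that case, so real implementations may differ.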
Plugin changes
- A new plugin has been added in compliance with ONNX ScatterElements.
- The TensorRT plugin library no longer has a load-time link dependency on cuBLAS or cuDNN libraries.
- All plugins which relied on cuBLAS/cuDNN handles passed through IPluginV2Ext::attachToContext() have moved to use cuBLAS/cuDNN resources initialized by the plugin library itself. This works by dynamically loading the required cuBLAS/cuDNN library. Additionally, plugins which independently initialized their cuBLAS/cuDNN resources have also moved to dynamically loading the required library. If the respective library is not discoverable through the library path(s), these plugins will not work.
- bertQKVToContextPlugin: Version 2 of this plugin now supports head sizes less than or equal to 32.
- reorgPlugin: Added a version 2 which implements IPluginV2DynamicExt.
- disentangledAttentionPlugin: Fixed a kernel bug.
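
Since the plugins now load cuBLAS/cuDNN dynamically, it can help to check ahead of time whether those libraries are discoverable. The following is a rough diagnostic sketch using only the Python standard library. It assumes the short library names `cublas` and `cudnn`, and the helper name `find_libraries` is hypothetical. Note that `ctypes.util.find_library` uses its own search rules, which may not match the loader behaviour of the plugin library exactly, so treat the result as a hint rather than a guarantee.

```python
import ctypes.util
from typing import Dict, Iterable, Optional


def find_libraries(
    names: Iterable[str] = ("cublas", "cudnn"),
) -> Dict[str, Optional[str]]:
    """Map each short library name to what find_library reports, or None.

    Assumption: the short names "cublas" and "cudnn" are used for illustration;
    the plugin library may look for specific versioned file names instead.
    """
    return {name: ctypes.util.find_library(name) for name in names}


for name, location in find_libraries().items():
    status = location if location else "not found"
    print(f"{name}: {status}")
```

If a library is reported as not found, extending the platform's library search path (for example `LD_LIBRARY_PATH` on Linux) to include its directory is the usual first step.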
Demo changes
- HuggingFace demos have been removed. Users who rely on TensorRT to accelerate Large Language Model inference should use TensorRT-LLM instead.
Updated tooling
- Polygraphy v0.49.9
- ONNX-GraphSurgeon v0.5.1
- TensorRT Engine Explorer v0.1.8
Build Containers
- RedHat/CentOS 7.x are no longer officially supported starting with TensorRT 10.0. The corresponding container has been removed from TensorRT-OSS.


09 Feb 22:30

rajeevsrao

v9.3.0

6d1397e

TensorRT OSS v9.3.0

TensorRT OSS release corresponding to TensorRT 9.3.0.1 release.

Updates since TensorRT 9.2.0 release.

Key Features and Updates:

Faster Text-to-image using SDXL & INT8 quantization using AMMO
Updated Polygraphy v0.49.7


05 Dec 00:30

rajeevsrao

v9.2.0

a1820ec

TensorRT OSS v9.2.0

TensorRT OSS release corresponding to TensorRT 9.2.0.5 release.

Updates since TensorRT 9.1.0 release.

Key Features and Updates:

trtexec enhancement: Added --weightless flag to mark the engine as weightless.
Parser changes
- Added support for Hardmax operator.
- Changes to a few operator importers to ensure that TensorRT preserves the precision of operations when using strongly typed mode.
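
As a reminder of what the Hardmax operator computes, here is a minimal pure-Python sketch for a single 1-D row, following the ONNX definition (1 at the position of the first maximum, 0 elsewhere). It illustrates the semantics only and says nothing about how the parser maps the operator to TensorRT layers. The function name `hardmax` is chosen for illustration.

```python
from typing import List, Sequence


def hardmax(row: Sequence[float]) -> List[int]:
    """Return a one-hot list marking the first maximum of a non-empty row."""
    # list.index returns the first occurrence, which matches the ONNX rule
    # that ties are resolved in favour of the first maximum.
    first_max = list(row).index(max(row))
    return [1 if i == first_max else 0 for i in range(len(row))]


print(hardmax([3, 0, 1, 2]))
print(hardmax([3, 3, 3, 1]))  # ties: only the first maximum is marked
```

For a multi-dimensional tensor, the same rule is applied independently to each slice along the chosen axis.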
Plugin changes
- Explicit INT8 support added to bertQKVToContextPlugin.
- Various bug fixes.
Updated HuggingFace demo to use transformers v4.31.0 and PyTorch v2.1.0.


20 Oct 00:34

SimengLiu-nv

v9.1.0

b8ada01

TensorRT OSS v9.1.0

TensorRT OSS release corresponding to TensorRT 9.1.0.4 GA release.

Updates since TensorRT 8.6.1 GA release.

Key Features and Updates:

Updated the trt_python_plugin sample.
- The Python plugins API reference is part of the official TRT Python API.
Added samples demonstrating the usage of the progress monitor API.
- Check sampleProgressMonitor for the C++ sample.
- Check simple_progress_monitor for the Python sample.
Removed dependencies related to Python < 3.8 from the Python samples, which no longer support Python < 3.8.
Demo changes
- Added LAMBADA dataset accuracy checks in the HuggingFace demo.
- Enabled structured sparsity and FP8 quantized batch matrix multiplications (BMMs) in attention in the NeMo demo.
- Replaced deprecated APIs in the BERT demo.
Updated tooling
- Polygraphy v0.49.1


07 Aug 17:09

ttyio

23.08

35477bd

23.08

What's Changed

Fix python bindings build and README
Add kNATIVE_INSTANCENORM flag to demoDiffusion
Update demoDiffusion to support torch 2.x and fix typo in README
Add HuggingFace Stable Diffusion pipeline demo
Upgrade pytorch-quantization to 2.1.3

Full Changelog: v8.6.1...23.08


05 May 00:34

ilyasher

v8.6.1

a25ca8b

TensorRT OSS v8.6.1

TensorRT OSS release corresponding to TensorRT 8.6.1.6 GA release.

Updates since TensorRT 8.6.0 EA release.
Please refer to the TensorRT 8.6.1.6 GA release notes for more information.

Key Features and Updates:

Added a new flag --use-cuda-graph to demoDiffusion to improve performance.
Optimized GPT2 and T5 HuggingFace demos to use fp16 I/O tensors for fp16 networks.


17 Mar 04:06

rajeevsrao

v8.6.0

4af32ba

TensorRT OSS v8.6.0

TensorRT OSS release corresponding to TensorRT 8.6.0.12 EA release.

Updates since TensorRT 8.5.3 GA release.
Please refer to the TensorRT 8.6.0.12 EA release notes for more information.

Key Features and Updates:

demoDiffusion acceleration is now supported out of the box in TensorRT without requiring plugins.
- The following plugins have been removed accordingly: GroupNorm, LayerNorm, MultiHeadCrossAttention, MultiHeadFlashAttention, SeqLen2Spatial, and SplitGeLU.
Added a new sample called onnx_custom_plugin.

We needed to force-push main and release/8.6 branches and v8.6.0 release. If you cloned/pulled the repo recently, please rebase the affected branches. Our apologies for this inconvenience.


03 Feb 20:28

rajeevsrao

8.5.3

b0c259a

TensorRT OSS v8.5.3

TensorRT OSS release corresponding to TensorRT 8.5.3.1 GA release.

Updates since TensorRT 8.5.2 GA release.
Please refer to the TensorRT 8.5.3 GA release notes for more information.

Key Features and Updates:

Added the following HuggingFace demos: GPT-J-6B, GPT2-XL, and GPT2-Medium
Added nvinfer1::plugin namespace
Optimized KV Cache performance for T5
