Release Torch-TensorRT v2.2.0 · pytorch/TensorRT

Dynamo Frontend for Torch-TensorRT, PyTorch 2.2, CUDA 12.1, TensorRT 8.6

Torch-TensorRT 2.2.0 targets PyTorch 2.2, CUDA 12.1 (builds for CUDA 11.8 are available via the PyTorch package index: https://download.pytorch.org/whl/cu118), and TensorRT 8.6. This is the second major release of Torch-TensorRT: the default frontend has changed from TorchScript to Dynamo, which lets users control and customize the compiler more easily in Python.

The Dynamo frontend supports both JIT workflows through torch.compile and AOT workflows through torch.export + torch_tensorrt.compile. It targets the Core ATen Opset (https://pytorch.org/docs/stable/torch.compiler_ir.html#core-aten-ir) and currently has 82% coverage. As in the TorchScript frontend, graphs are partitioned based on whether operators can be mapped to TensorRT, in addition to any graph surgery done in Dynamo.
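The partitioning idea can be illustrated with a small, library-free sketch. It is not the Torch-TensorRT partitioner, which operates on FX graphs rather than flat lists; the SUPPORTED_OPS set and the partition function below are hypothetical names invented for this illustration.

```python
from typing import List, Set, Tuple

# Hypothetical set of operators that have a TensorRT converter (illustration only).
SUPPORTED_OPS: Set[str] = {"aten.convolution", "aten.relu", "aten.add"}


def partition(ops: List[str], supported: Set[str]) -> List[Tuple[str, List[str]]]:
    """Group consecutive ops into ("tensorrt" | "pytorch", [ops]) segments."""
    segments: List[Tuple[str, List[str]]] = []
    for op in ops:
        backend = "tensorrt" if op in supported else "pytorch"
        if segments and segments[-1][0] == backend:
            # Same backend as the previous op: extend the current segment.
            segments[-1][1].append(op)
        else:
            # Backend changed: start a new segment.
            segments.append((backend, [op]))
    return segments


graph = ["aten.convolution", "aten.relu", "aten.nonzero", "aten.add"]
segments = partition(graph, SUPPORTED_OPS)
for backend, seg_ops in segments:
    print(backend, seg_ops)
```

In this toy graph the unsupported aten.nonzero splits the model into two TensorRT segments with a PyTorch segment between them.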

Output Format

Through the Dynamo frontend, different output formats can be selected for AOT workflows via the output_format kwarg. The choices are: torchscript, where the resulting compiled module is traced with torch.jit.trace, suitable for Pythonless deployments; exported_program, a new serializable format for PyTorch models; and graph_module, which returns a torch.fx.GraphModule for running further graph transformations on the resultant model.

Multi-GPU Safety

To address a long-standing source of overhead, single-GPU systems now operate without the device checks that were previously required. These checks can be re-enabled when multiple GPUs are available to the host process using torch_tensorrt.runtime.set_multi_device_safe_mode:

# Enables Multi Device Safe Mode
torch_tensorrt.runtime.set_multi_device_safe_mode(True)

# Disables Multi Device Safe Mode [Default Behavior]
torch_tensorrt.runtime.set_multi_device_safe_mode(False)

# Enables Multi Device Safe Mode, then resets the safe mode to its prior setting
with torch_tensorrt.runtime.set_multi_device_safe_mode(True):
    ...

More information can be found here: https://pytorch.org/TensorRT/user_guide/runtime.html
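The dual usage shown above, as a plain call that keeps the new setting and as a context manager that restores the prior one, can be sketched without the library. This is only an illustration of the pattern under assumed names (set_safe_mode, get_safe_mode, _SAFE_MODE are hypothetical), not the Torch-TensorRT implementation.

```python
# Hypothetical module-level flag standing in for the runtime setting.
_SAFE_MODE = False


def get_safe_mode() -> bool:
    """Return the current value of the (hypothetical) safe-mode flag."""
    return _SAFE_MODE


class set_safe_mode:
    """Set the flag immediately; optionally restore the prior value on exit."""

    def __init__(self, mode: bool) -> None:
        global _SAFE_MODE
        self._previous = _SAFE_MODE  # remembered for context-manager use
        _SAFE_MODE = mode            # takes effect as soon as it is called

    def __enter__(self) -> "set_safe_mode":
        return self

    def __exit__(self, *exc_info) -> bool:
        global _SAFE_MODE
        _SAFE_MODE = self._previous  # reset to the setting before the call
        return False                 # do not swallow exceptions


set_safe_mode(True)           # plain call: the setting persists
persisted = get_safe_mode()

set_safe_mode(False)          # back to the default
with set_safe_mode(True):     # scoped: enabled only inside the block
    inside = get_safe_mode()
after = get_safe_mode()
```

Because the flag is written in the constructor, the plain-call form needs no extra step, while the with form gets its restore behavior from __exit__.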

Capability Validators

In the Dynamo frontend, tests can be written and associated with converters to dynamically enable or disable them based on conditions in the target graph.

For example, the convolution converter in dynamo only supports 1D, 2D, and 3D convolution. We can therefore create a lambda which given a convolution FX node can determine if the convolution is supported:

@dynamo_tensorrt_converter(
    torch.ops.aten.convolution.default,
    capability_validator=lambda conv_node: conv_node.args[7] in ([0], [0, 0], [0, 0, 0]),
)  # type: ignore[misc]
def aten_ops_convolution(
    ctx: ConversionContext,
    target: Target,
    args: Tuple[Argument, ...],
    kwargs: Dict[str, Argument],
    name: str,
) -> Union[TRTTensor, Sequence[TRTTensor]]:
    ...  # converter body omitted

If a node fails its capability validator, it is partitioned out and run in PyTorch.
All capability validators are run after the lowering phase and before partitioning.
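To make the mechanism concrete, here is a minimal, library-free sketch of a converter registry that stores a validator alongside each converter. The names Node, REGISTRY, register_converter, and is_supported are hypothetical and chosen for this illustration; they are not the Torch-TensorRT API. The validator mirrors the lambda above, checking index 7 of the node arguments, which is output_padding in the aten.convolution schema.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for an FX node: an op target plus its args."""
    target: str
    args: Tuple[Any, ...]


@dataclass
class ConverterEntry:
    converter: Callable[[Node], str]
    validator: Callable[[Node], bool]


# Hypothetical registry mapping op targets to converters and their validators.
REGISTRY: Dict[str, ConverterEntry] = {}


def register_converter(
    target: str,
    capability_validator: Callable[[Node], bool] = lambda node: True,
):
    """Decorator that registers a converter together with its validator."""
    def decorator(fn: Callable[[Node], str]) -> Callable[[Node], str]:
        REGISTRY[target] = ConverterEntry(fn, capability_validator)
        return fn
    return decorator


@register_converter(
    "aten.convolution.default",
    capability_validator=lambda n: n.args[7] in ([0], [0, 0], [0, 0, 0]),
)
def convert_convolution(node: Node) -> str:
    return f"trt_layer({node.target})"


def is_supported(node: Node) -> bool:
    """A node is convertible only if a converter exists and its validator passes."""
    entry = REGISTRY.get(node.target)
    return entry is not None and entry.validator(node)


# args follow the aten.convolution order; index 7 is output_padding.
conv_ok = Node("aten.convolution.default", ("x", "w", None, [1, 1], [0, 0], [1, 1], False, [0, 0], 1))
conv_bad = Node("aten.convolution.default", ("x", "w", None, [1, 1], [0, 0], [1, 1], True, [1, 1], 1))
unknown = Node("aten.nonzero.default", ("x",))

for n in (conv_ok, conv_bad, unknown):
    print(n.target, "->", "tensorrt" if is_supported(n) else "pytorch")
```

Keeping the validator next to the converter means a partitioner only has to ask one question per node, and nodes that fail the check simply stay in PyTorch.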

More information on writing converters for the Dynamo frontend can be found here: https://pytorch.org/TensorRT/contributors/dynamo_converters.html

Breaking Changes

Dynamo (torch.export) is now the default frontend for Torch-TensorRT. The TorchScript and FX frontends are now in maintenance mode. Therefore, any torch.nn.Modules or torch.fx.GraphModules provided to torch_tensorrt.compile will by default be exported using torch.export and then compiled. This default can be overridden by setting the ir=[torchscript|fx] kwarg. Reported bugs will be addressed in the Dynamo stack first, before other frontends are considered; however, pull requests from the community for additional functionality in the TorchScript and FX frontends will still be accepted.

What's Changed

chore: Update Torch and Torch-TRT versions and docs on main by @gs-olive in #1784
fix: Repair invalid schema arising from lowering pass by @gs-olive in #1786
fix: Allow full model compilation with collection inputs (input_signature) by @gs-olive in #1656
feat(//core/conversion): Add support for aten::size with dynamic shaped models for Torchscript backend. by @peri044 in #1647
feat: add support for aten::baddbmm by @mfeliz-cruise in #1806
[feat] Add dynamic conversion path to aten::mul evaluator by @mfeliz-cruise in #1710
[fix] aten::stack with dynamic inputs by @mfeliz-cruise in #1804
fix undefined attr issue by @bowang007 in #1783
fix: Out-Of-Bounds bug in Unsqueeze by @gs-olive in #1820
feat: Upgrade Docker build to use custom TRT + CUDNN by @gs-olive in #1805
fix: include str ivalue type conversion by @bowang007 in #1785
fix: dependency order of inserted long input casts by @mfeliz-cruise in #1833
feat: Add ts converter support for aten::all.dim by @mfeliz-cruise in #1840
fix: Error caused by invalid binding name in TRTEngine.to_str() method by @gs-olive in #1846
fix: Implement aten.mean.default and aten.mean.dim converters by @gs-olive in #1810
feat: Add converter for aten::log2 by @mfeliz-cruise in #1866
feat: Add support for aten::where with scalar other by @mfeliz-cruise in #1855
feat: Add converter support for logical_and by @mfeliz-cruise in #1856
feat: Refactor FX APIs under dynamo namespace for parity with TS APIs by @peri044 in #1807
fix: Add version checking for torch._dynamo import in __init__ by @gs-olive in #1881
fix: Improve Docker build robustness, add validation by @gs-olive in #1873
fix: Improve input weight handling to acc_ops convolution layers in FX by @gs-olive in #1886
fix: Upgrade main to TRT 8.6, CUDA 11.8, CuDNN 8.8, Torch Dev by @gs-olive in #1852
feat: Wrap dynamic size handling in a compilation flag by @peri044 in #1851
fix: Add torchvision legacy CI parameter by @gs-olive in #1918
Sync fb internal change to OSS by @wushirong in #1892
fix: Reorganize Dynamo directory + backends by @gs-olive in #1928
fix: Improve partitioning + lowering systems in torch.compile path by @gs-olive in #1879
fix: Upgrade TRT to 8.6.1, parallelize FX tests in CI by @gs-olive in #1930
feat: Add issue template for Story by @gs-olive in #1936
feat: support type promotion in aten::cat converter by @mfeliz-cruise in #1911
Reorg for converters in (FX Converter Refactor [1/N]) by @narendasan in #1867
fix: Add support for default dimension in aten.cat by @gs-olive in #1863
Relaxing glob pattern for CUDA12 by @borisfom in #1950
refactor: Centralizing sigmoid implementation (FX Converter Refactor [2/N]) <Target: converter_reorg_proto> by @narendasan in #1868
fix: Address .numpy() issue on fake tensors by @gs-olive in #1949
feat: Add support for passing through build issues in Dynamo compile by @gs-olive in #1952
fix: int/int=float division by @mfeliz-cruise in #1957
fix: Support dims < -1 in aten::stack converter by @mfeliz-cruise in #1947
fix: Resolve issue in isInputDynamic with mixed static/dynamic shapes by @mfeliz-cruise in #1883
DLFW changes by @apbose in #1878
feat: Add converter for aten::isfinite by @mfeliz-cruise in #1841
Reorg for converters in hardtanh(FX Converter Refactor [5/N]) <Target: converter_reorg_proto> by @apbose in #1901
fix/feat: Add lowering pass to resolve most aten::Int.Tensor uses by @gs-olive in #1937
fix: Add decomposition for aten.addmm by @gs-olive in #1953
Reorg for converters tanh (FX Converter Refactor [4/N]) <Target: converter_reorg_proto> by @apbose in #1900
Reorg for converters leaky_relu (FX Converter Refactor [6/N]) <Target: converter_reorg_proto> by @apbose in #1902
Upstream 3 features to fx_ts_compat: MS, VC, Optimization Level by @wu6u3tw in #1935
fix: Add lowering pass to remove output repacking in convert_method_to_trt_engine calls by @gs-olive in #1945
Fixing aten::slice invalid schema and implementing aten::list evaluator by @apbose in #1695
fix: Rewrite constant_pad_nd to use a single slice layer for performance by @mfeliz-cruise in #1970
Adding converter aten::chunk in torchscript by @apbose in #1802
fix: Repair index used to access tensor bindings by @gs-olive in #1998
Reorg for converters elu and selu (FX Converter Refactor [7/N]) <Target: converter_reorg_proto> by @apbose in #1903
chore(deps): bump transformers from 4.17.0 to 4.30.0 in /tests/modules by @dependabot in #2013
fix: Repair input range on BERT inputs for CI by @gs-olive in #2017
fix: Refactor assertions in E2E tests for Dynamo by @gs-olive in #2001
chore/fix: Update TRTInterpreter impl in Dynamo compile [1 / x] by @gs-olive in #2002
fix: Repair flaky TopK core test by @gs-olive in #2022
feat: Add options kwargs for Torch compile [3 / x] by @gs-olive in #2005
feat: Add support for output data types in TRTInterpreter [2 / x] by @gs-olive in #2004
chore: Upgrade Torch nightly to 2.1.0.dev20230605 [4 / x] by @gs-olive in #1975
fix: Repair output binding indexing scheme in TRT by @gs-olive in #2054
fix: Improve logging and kwarg passing in Dynamo by @gs-olive in #2052
fix: Add support for fake tensors by @gs-olive in #1955
fix: Repair argument passing in both Dynamo paths by @gs-olive in #1997
minor fix: Dynamo CI fix due to merge issue by @gs-olive in #2067
feat: Module-Acceleration in Dynamo [5 / x] by @gs-olive in #1979
fix/feat: Move convolution core to impl + add feature (FX converter refactor) by @gs-olive in #1972
chore: Upgrade to CUDA 12.1 by @gs-olive in #2020
fix: Repair null bindings issue in TRT Engines by @gs-olive in #2080
fix: Add python3 symlink in final container by @gs-olive in #2085
feat: Add support for TorchTensorRTModule in Dynamo [1 / x] by @gs-olive in #2003
fix: Repair import error for legacy TS testing by @gs-olive in #2091
chore: Update Torch to Jul 3 Nightly by @gs-olive in #2099
fix: Repair graph naming for FX legacy suite by @gs-olive in #2111
DLFW changes by @apbose in #2109
fix: Update CI GPU Class by @gs-olive in #2116
fix: Replace EliminateExceptions lowering pass by @gs-olive in #1859
chore: Improve error propagation for torch compile by @gs-olive in #2106
fix: Repair version checking system for Torch by @gs-olive in #2118
feat: Dynamo refactor by @peri044 in #2104
feat: Set default ir to dynamo export by @peri044 in #2029
fix: TRTInterpreter output lacks return value by @gs-olive in #2114
fix/feat: Add Dynamo-only converter registry by @gs-olive in #1944
fix: Add support for truncate_long_and_double in Dynamo [8 / x] by @gs-olive in #1983
docs: Update readme to include TRT as a seperate install dep. by @narendasan in #2137
fix: Move all aten PRs to Dynamo converter registry by @gs-olive in #2070
Change python build system to be PEP517 compatible by @narendasan in #2056
chore: fix the docgen job by @narendasan in #2158
feat: Implement dynamic shape support for floordiv, NumToTensor, layer_norm by @peri044 in #2006
examples: Add example usage scripts for torch_tensorrt.dynamo.compile path [1.1 / x] by @gs-olive in #1966
[feat] TS: Add support for dynamic select and masked_fill by @mfeliz-cruise in #2115
feat: Added support for aten::unflatten converter by @andi4191 in #2097
feat: Added a variant for aten::fake_quant_per_tensor by @andi4191 in #2107
ci: Add automatic GHA job to build + push Docker Container on main by @gs-olive in #2129
chore: Add pyyaml import to GHA Docker job by @gs-olive in #2170
feat(torch_tensorrt.dynamo.tools): Tool to calculate coverage of PyTorch by @narendasan in #2166
chore: Add parallelism to Dynamo tests by @gs-olive in #2165
feat: Add support for dynamic zeros_like and ones_like by @mfeliz-cruise in #1847
feat: Added support for aten::tile converter by @andi4191 in #2105
Improve Python tooling by @narendasan in #2126
Py38 compatibility by @narendasan in #2189
abandoned create_plugin() function by @zewenli98 in #2146
feat: Improve Dynamo partitioning System Performance on Large Models by @gs-olive in #2175
feat: Improve Logging in Dynamo by @peri044 in #2194
feat: Add ExportedProgram as an IR by @peri044 in #2191
feat: Improve layer naming by @peri044 in #2162
fix: Update aten.embedding to reflect schema by @gs-olive in #2182
feat: Add _to_copy, operator.get and clone ATen converters by @gs-olive in #2161
fix: Repair broadcasting utility for aten.where by @gs-olive in #2228
chore: Fix Logging in torch_compile path by @peri044 in #2238
feat: Add Selective ATen decompositions by @gs-olive in #2173
Type mismatch for dynamo aten::where converter by @apbose in #2198
fix: Set dynamic=False in torch.compile call by @gs-olive in #2240
fix: Allow rank differences in aten.expand by @gs-olive in #2234
fix: Address runtimes with 0D inputs by @gs-olive in #2188
feat: support many unary dynamo converters by @zewenli98 in #2246
fix: Decrease Docker container size by 20% by @gs-olive in #2257
feat: Add support for device compilation setting by @gs-olive in #2190
fix: Legacy CI pip installation by @gs-olive in #2239
feat: support amax dynamo converter by @zewenli98 in #2241
feat: Exempt default softmax from decomposition by @gs-olive in #2268
fix: Reorganize Dynamo testing directories by @gs-olive in #2255
feat: Add support for require_full_compilation in Dynamo by @gs-olive in #2138
fix: Unify layers in Docker Container Cleanup by @gs-olive in #2275
infra: testing out GHA CI by @narendasan in #2073
feat: support conv dynamo converter by @zewenli98 in #2252
Enabling var_mean decomposition by @apbose in #2273
fix: add an arg in matmul by @zewenli98 in #2279
feat: support activation dynamo converters by @zewenli98 in #2254
feat: support torch.ops.aten.sum.(default and dim_IntList) dynamo converter by @zewenli98 in #2278
tools(opset_coverage): Map default ops to unoverloaded ops by @narendasan in #2292
add initial support for torch.ops.aten.neg.default converter by @bowang007 in #2147
fix: Torch Upgrade to 2.2.0.dev by @gs-olive in #2298
chore: enabling TS FE testing by @narendasan in #2283
Update _Input.py by @phyboy in #2293
feat: support many elementwise dynamo converters by @zewenli98 in #2263
feat: support linear (fully connected layer) dynamo converter by @zewenli98 in #2253
WAR: Disabling ViT tests until exporting with py311 is fixed by @narendasan in #2305
neg converter correction by @apbose in #2307
feat: Add preliminary support for freezing tensors in Dynamo by @gs-olive in #2128
fix: Wrap import of ConstantFold utilities by @gs-olive in #2312
fix: Move aten.neg test case by @gs-olive in #2310
small fix: Packaging version switch by @gs-olive in #2315
fix: Register tensorrt backend name by @gs-olive in #2311
feat: Transition export workflows to use torch._export APIs by @peri044 in #2195
fix: Add special cases for clone and to_copy where input of graph is output by @gs-olive in #2265
fix: Raise error when registering Packet-keyed converter by @gs-olive in #2285
FX converter documentation by @apbose in #2039
aten::split converter by @apbose in #2232
DLFW changes by @apbose in #2281
feat: Add ATen lowering pass system by @gs-olive in #2280
fix: Support non -1 end idx and <0 start idx in aten::flatten converter by @mfeliz-cruise in #2321
Dynamo converter support for torch.ops.aten.erf.default op by @bowang007 in #2164
fix: Update Torchvision version to address dependency resolution issue by @gs-olive in #2339
fix: Remove input aliasing of builtin ops by @gs-olive in #2276
fix: Allow low rank inputs in Python Runtime by @gs-olive in #2282
fix: Address multi-GPU issue in engine deserialize by @gs-olive in #2325
feat: support deconv (1d, 2d, and Nd) dynamo converter by @zewenli98 in #2337
Update usage of PyTorch's custom op API by @zou3519 in #2193
feat: support bmm converter in dynamo by @bowang007 in #2248
feat: support 1D, 2D, and 3D avg and max pooling dynamo converters by @zewenli98 in #2317
fix: Add support for negative dimensions in reduce by @gs-olive in #2347
feat: Add tensor type enforcement for converters by @gs-olive in #2324
fix: Issue in TS dimension-squeeze utility by @gs-olive in #2336
perf: Add lowering passes to improve TRT runtime on SD by @gs-olive in #2351
feat: Implement Dynamic shapes + fallback support for export path by @peri044 in #2271
feat: Add maxpool lowering passes and experimental folder in Dynamo by @gs-olive in #2358
Aten::Index converter by @apbose in #2277
feat: Implement support for exporting Torch-TensorRT compiled graphs using torch.export serde APIs by @peri044 in #2249
chore: Switch converter tests to generate standalone ops using fx.symbolic_trace by @peri044 in #2361
fix/feat: Add and repair multiple converters for SD + other models by @gs-olive in #2353
feat: support flatten and reshape via shuffle_layer by @zewenli98 in #2354
feat: support prod, max, min, and mean via reduce layer by @zewenli98 in #2355
minor fix: Update get_ir prefixes by @gs-olive in #2369
Dynamo converter cat by @apbose in #2343
fix: Repair issue in Torch Constant Folder by @gs-olive in #2375
fix: Repair aten.where with Numpy + Broadcast by @gs-olive in #2372
Cherry-pick changes from main into release/2.1 by @narendasan in #2302
cherry-pick: Key converters and documentation to release/2.1 by @gs-olive in #2387
cherry-pick: Decompostion fix and documentation updates by @gs-olive in #2391
feat: Wrap ExportedPrograms transformations with an API, allow dynamo.compile to accept graphmodules. by @peri044 in #2388
Cherry-pick : Add documentation for dynamo.compile backend (#2389) by @peri044 in #2416
cherry-pick: Transformer XL fix to release/2.1 by @gs-olive in #2414
cherry-pick/fix: Performance benchmarking fixes and Torch version fix by @gs-olive in #2433
Cherry pick 2420 to release/2.1 by @peri044 in #2425
cherry-pick/minor fix: Parse out slashes in Docker container name (#2437) by @gs-olive in #2438
chore: fix docs for export [release/2.1] by @peri044 in #2448
chore: add additional native BN converter (cherry-pick of #2446) by @peri044 in #2452
cherry-pick/fix: Docs rendering on PyTorch site (#2440) by @gs-olive in #2441
minor fix: Update Benchmark values (#2453) by @gs-olive in #2454
cherry-pick: Wrap perf benchmarks with no_grad (#2466) by @gs-olive in #2470
chore: Upgrade release to Torch 2.1.1 by @gs-olive in #2472
fix: Naming issue in opset coverage tool by @gs-olive in #2477
cherry-pick: View and slice bugfixes by @gs-olive in #2500
cherry-pick: Perf + Bugfix PRs by @gs-olive in #2513
fix: release/2.1 CI Repair by @gs-olive in #2528
cherry-pick: Safe mode and Build Arguments PRs by @gs-olive in #2521
cherry-pick: Port most changes from main by @gs-olive in #2574
chore: clean up AWS credentials PR changes by @peri044 in #2608
chore: Set return type of compilation to ExportedProgram [release/2.2] by @peri044 in #2607
cherry-pick: Docker fixes release/2.2 by @gs-olive in #2628
fix: Upgrade versions for Docker build rel 2.2 by @gs-olive in #2630
cherry-pick: Remove keyserver fetch from Dockerfile (#2639) by @gs-olive in #2640
cherry-pick: Remove extraneous argument in compile (#2635) by @gs-olive in #2638
cherry-pick: Attention converter and linting fixes by @gs-olive in #2641
small fix: Index validator enable int64 (#2642) by @gs-olive in #2643

New Contributors

@wushirong made their first contribution in #1892
@wu6u3tw made their first contribution in #1935
@phyboy made their first contribution in #2293
@zou3519 made their first contribution in #2193

Full Changelog: v1.4.0...v2.2.0
