C++ runtime support in Windows Support, Enhanced Dynamic Shape support in Converters, PyTorch 2.4, CUDA 12.4, TensorRT 10.1, Python 3.12
Torch-TensorRT 2.4.0 targets PyTorch 2.4, CUDA 12.4 and TensorRT 10.1. Builds for CUDA 11.8 and 12.1 are available via the PyTorch package index: https://download.pytorch.org/whl/cu118 and https://download.pytorch.org/whl/cu121.
This version introduces official support for the C++ runtime on the Windows platform. Support is limited to the dynamo frontend, in both AOT and JIT workflows. Users can now use both the Python and C++ runtimes on Windows. Additionally, this release expands converter support to all ATen Core operators except torch.nonzero, and significantly increases dynamic shape support across more converters. Python 3.12 is supported for the first time in this release.
Full Windows Support
This release introduces both C++ and Python runtime support on Windows. Users can now optimize PyTorch models with TensorRT directly on Windows, with no code changes. The C++ runtime is the default; users can enable the Python runtime by specifying use_python_runtime=True.
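import torch
import torch_tensorrt
import torchvision.models as models

# Load a pretrained ResNet-18 in eval mode on the GPU
model = models.resnet18(pretrained=True).eval().to("cuda")
# Example input used to trace the model and build the engine
input = torch.randn((1, 3, 224, 224)).to("cuda")
# Compile through the dynamo frontend; the C++ runtime is the default
trt_mod = torch_tensorrt.compile(model, ir="dynamo", inputs=[input])
# Run inference with the compiled module
trt_mod(input)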
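To switch to the Python runtime mentioned above, the same call can be given use_python_runtime=True. The following is a minimal sketch rather than an official recipe: `runtime_kwargs` and `compile_resnet18` are hypothetical helper names introduced only for illustration, and the heavy imports are deferred so the small helper can be inspected on a machine without a GPU.

```python
def runtime_kwargs(use_python_runtime: bool = False) -> dict:
    """Build the keyword arguments that select the runtime.

    Hypothetical helper for this sketch, not part of Torch-TensorRT.
    `ir` and `use_python_runtime` are the options named in the release notes.
    """
    return {"ir": "dynamo", "use_python_runtime": use_python_runtime}


def compile_resnet18(use_python_runtime: bool = False):
    """Compile ResNet-18 with the chosen runtime (requires a CUDA GPU)."""
    # Imports are local so that runtime_kwargs() stays usable without
    # torch / torch_tensorrt installed.
    import torch
    import torch_tensorrt
    import torchvision.models as models

    model = models.resnet18(pretrained=True).eval().to("cuda")
    example = torch.randn((1, 3, 224, 224)).to("cuda")
    trt_mod = torch_tensorrt.compile(
        model, inputs=[example], **runtime_kwargs(use_python_runtime)
    )
    return trt_mod, example
```

Calling compile_resnet18(use_python_runtime=True) should return a module that runs through the Python runtime, while the default call keeps the C++ runtime.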
Enhanced Op support in Converters
Converter coverage is now close to 100% of core ATen. At this point, fallback to PyTorch execution is due either to specific limitations of converters or to some combination of user compiler settings (e.g. torch_executed_ops, dynamic shape). This release also expands the number of operators that support dynamic shape. dryrun will provide specific information on how well your model and settings are supported.
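As a rough illustration of how these settings combine, the sketch below declares a dynamic batch dimension, forces one operator to run in PyTorch, and enables dryrun so that support is reported without building TensorRT engines. The helper names `dynamic_batch_shapes` and `dryrun_report`, the batch sizes, and the choice of aten.nonzero as the excluded operator are assumptions made for this example only.

```python
def dynamic_batch_shapes(min_batch, opt_batch, max_batch, chw=(3, 224, 224)):
    """Return (min, opt, max) shapes that differ only in the batch dimension.

    Hypothetical helper for this sketch, not part of Torch-TensorRT.
    """
    if not (min_batch <= opt_batch <= max_batch):
        raise ValueError("expected min_batch <= opt_batch <= max_batch")
    return tuple((b, *chw) for b in (min_batch, opt_batch, max_batch))


def dryrun_report(model, min_batch=1, opt_batch=8, max_batch=32):
    """Run a dryrun compile of `model` with a dynamic batch dimension."""
    # Local imports: these need a working torch / torch_tensorrt install.
    import torch
    import torch_tensorrt

    min_shape, opt_shape, max_shape = dynamic_batch_shapes(
        min_batch, opt_batch, max_batch
    )
    # Describe the input as a shape range instead of a fixed example tensor.
    dynamic_input = torch_tensorrt.Input(
        min_shape=min_shape,
        opt_shape=opt_shape,
        max_shape=max_shape,
        dtype=torch.float32,
    )
    return torch_tensorrt.compile(
        model,
        ir="dynamo",
        inputs=[dynamic_input],
        # Operators listed here are left to PyTorch (illustrative choice).
        torch_executed_ops={"torch.ops.aten.nonzero.default"},
        # Report support for this model + settings without building engines.
        dryrun=True,
    )
```

Keeping the shape range in a small helper makes it easy to try several batch ranges and compare the dryrun output for each.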
What's Changed
- fix: FakeTensors appearing in get_attr calls by @gs-olive in #2669
- feat: support adaptive_avg_pool1d dynamo converter by @zewenli98 in #2614
- fix: Add cmake missing source file ref for core_lowering.passes by @Arktische in #2672
- ci: Torch nightly version upgrade to 2.4.0 by @gs-olive in #2704
- Add support for aten.pixel_unshuffle dynamo converter by @HolyWu in #2696
- feat: support aten.atan2 converter by @chohk88 in #2689
- feat: support aten.index_select converter by @chohk88 in #2710
- feat: support aten.isnan converter by @chohk88 in #2711
- feat: support adaptive avg pool 2d and 3d dynamo converters by @zewenli98 in #2632
- feat: support aten.expm1 converter by @chohk88 in #2714
- fix: Add dependencies to Docker container for apt versioning TRT by @gs-olive in #2746
- fix: Missing parameters in compiler settings by @gs-olive in #2749
- fix: param bug in test_binary_ops_aten by @zewenli98 in #2733
- aten::empty_like by @apbose in #2654
- empty_permute decomposition by @apbose in #2698
- Removing grid lowering by @apbose in #2686
- Selectively enable different frontends by @narendasan in #2693
- chore(deps): bump transformers from 4.33.2 to 4.36.0 in /tools/perf by @dependabot in #2555
- Fix upsample converter not properly registered by @HolyWu in #2683
- feat: TS Add converter support for aten::grid_sampler by @mfeliz-cruise in #2717
- fix: Bump torchvision version by @gs-olive in #2770
- fix: convert_module_to_trt_engine by @zewenli98 in #2728
- chore: cherry pick of save API by @peri044 in #2719
- chore: Upgrade TensorRT version to TRT 10 EA (#2699) by @peri044 in #2774
- Fix minor grammatical corrections by @aakashapoorv in #2779
- feat: cherry-pick of Implement symbolic shape propagation, sym_size converter by @peri044 in #2751
- feat: cherry-pick of torch.compile dynamic shapes by @peri044 in #2750
- chore: bump deps for default workspace file by @narendasan in #2786
- fix: Point infra branch to main by @gs-olive in #2785
- "empty_like" decomposition test correction by @apbose in #2784
- chore: Bump versions by @narendasan in #2787
- fix: refactor layer norm converter with INormalization Layer by @zewenli98 in #2755
- TRT-10 GA Support for main branch by @zewenli98 in #2781
- chore(//tests): Update tests to use assertEqual by @narendasan in #2800
- feat: Add support for is_causal argument in attention by @gs-olive in #2780
- feat: Adding support for native int64 by @narendasan in #2789
- chore: small mypy issue by @narendasan in #2803
- Rand converter - evaluator by @apbose in #2580
- cherry-pick: Python Runtime Windows Builds on TRT 10 (#2764) by @gs-olive in #2776
- feat: support 1d ITensor offsets for embedding_bag converter by @zewenli98 in #2677
- chore(deps): bump transformers from 4.36.0 to 4.38.0 in /tools/perf by @dependabot in #2766
- fix: a bug in func run_test_compare_tensor_attributes_only by @zewenli98 in #2809
- Fix ModuleNotFoundError in ptq by @HolyWu in #2814
- docs: Example on how to use custom kernels in Torch-TensorRT by @narendasan in #2812
- typo fix in doc on saving models by @laikhtewari in #2818
- chore: Remove CUDNN dependencies by @zewenli98 in #2804
- fix: bug in elementwise base for static inputs by @zewenli98 in #2819
- Use environment for docgen by @atalman in #2826
- tool: Opset coverage notebook by @narendasan in #2831
- ci: Add release flag for nightly build tag by @gs-olive in #2821
- [doc] Update options documentation for torch.compile by @lanluo-nvidia in #2834
- feat(//py/torch_tensorrt/dynamo): Support for BF16 by @narendasan in #2833
- feat: data parallel inference examples by @bowang007 in #2805
- fix: bugs in TRT 10 upgrade by @zewenli98 in #2832
- feat: support aten._cdist_forward converter by @chohk88 in #2726
- chore: cherry pick of #2805 by @bowang007 in #2851
- feat: Add support for multi-device safe mode in C++ by @gs-olive in #2824
- feat: support aten.log1p converter by @chohk88 in #2823
- feat: support aten.as_strided converter by @chohk88 in #2735
- fix: Fix deconv kernel channel num_output_maps where wts are ITensor by @andi4191 in #2678
- Aten scatter converter by @apbose in #2664
- fix user_guide and tutorial docs by @yoosful in #2854
- chore: Make from and to methods use the same TRT API by @narendasan in #2858
- add aten.topk implementation by @lanluo-nvidia in #2841
- feat: support aten.atan2.out converter by @chohk88 in #2829
- chore: update docker, refactor CI TRT dep to main by @peri044 in #2793
- feat: Cherry pick of Add validators for dynamic shapes in converter registration by @peri044 in #2849
- feat: support aten.diagonal converter by @chohk88 in #2856
- Remove ops from decompositions where converters exist by @HolyWu in #2681
- slice_scatter decomposition by @apbose in #2519
- select_scatter decomp by @apbose in #2515
- manylinux wheel file build update for TensorRT-10.0.1 by @lanluo-nvidia in #2868
- replace itemset due to numpy version 2.0 removed itemset api by @lanluo-nvidia in #2879
- chore: cherry-pick of DS feature by @peri044 in #2857
- feat: TS Add converter support for aten::flip by @mfeliz-cruise in #2722
- ptq test error correction by @apbose in #2860
- feat: Add dynamic shape support for sub by @keehyuna in #2888
- feat: dynamic shapes support for sqrt and copy by @chohk88 in #2889
- add dynamic shape support for aten.ops.gt and aten.ops.ge by @lanluo-nvidia in #2883
- chore: cherry-pick FP8 by @peri044 in #2892
- add dynamic shape support for sin/cos/cat by @lanluo-nvidia in #2887
- Cancel in-progress ci build when a new commit is pushed by @lanluo-nvidia in #2903
- readme by @laikhtewari in #2864
- Only trigger doc gen if it is not a pytorchbot commit by @lanluo-nvidia in #2909
- fix: Handle dynamic shapes in where ops by @keehyuna in #2853
- chore: Dynamic support for split (#2871) into main by @peri044 in #2914
- feat: C++ runtime on Windows by @HolyWu in #2806
- chore: cherry pick of #2709 by @peri044 in #2850
- Add dynamic shape support for layer_norm/native_group_norm/group_norm by @lanluo-nvidia in #2908
- feat: dynamic shapes support for neg ops by @keehyuna in #2878
- empty_stride decomposition by @apbose in #2859
- empty_memory_format evaluator by @apbose in #2745
- gather converter by @apbose in #2905
- feat: Win/Linux Dual Compatible WORKSPACE + Upgrade CUDA + Upgrade PyT by @gs-olive in #2907
- chore: add dynamic shapes section in the resnet tutorial by @peri044 in #2904
- fix: Remove build artifact by @gs-olive in #2924
- feat: Use a global timing cache and add a save option by @peri044 in #2898
- chore: fix ValueRanges computation in symbolic nodes by @peri044 in #2918
- scatter CI failures by @apbose in #2925
- chore: Update layer_norm converter to use INormalizationLayer by @mfeliz-cruise in #2509
- Add dynamic shape support for leaky_relu/elu/hard_sigmoid/softplus by @lanluo-nvidia in #2927
- feat: Improve logging throughout the Dynamo path by @gs-olive in #2405
- fix unsqueeze cannot work on more than 1 dynamic_shape dimensions by @lanluo-nvidia in #2933
- feat: support native_dropout dynamo converter by @zewenli98 in #2931
- feat: support aten index_put converter for accumulate=False by @chohk88 in #2880
- feat: support aten.resize_ converter by @chohk88 in #2874
- fix the docker build failure on main by @lanluo-nvidia in #2942
- feat: Add Branches to Docker Build File by @gs-olive in #2935
- add dynamic shape support for amax/amin/max/min/prod/sum by @lanluo-nvidia in #2943
- fix: bug in vgg16_fp8_ptq example by @zewenli98 in #2950
- Fixed layernorm when weight and bias is None in Stable Diffusion 3 by @cehongwang in #2936
- chore: dynamic shape support for rsqrt/erf ops by @keehyuna in #2929
- feat: dynamic shape support for tan, sinh, cosh, asin and acos by @chohk88 in #2941
- fix: Repair integer inputs in dynamic shape cases by @gs-olive in #2876
- Update PYTORCH to 2.4 by @lanluo-nvidia in #2953
- Automate release artifacts build: usage pytorch cxx11 builder base image by @lanluo-nvidia in #2988
- chore: cherrypick of #2855 by @zewenli98 in #3027
- cherry pick 2740 to release2.4 branch. by @lanluo-nvidia in #3033
- cherry pick from 3008 to release/2.4 by @lanluo-nvidia in #3035
- assertEquals is deprecated in TestCase in Python 3.12 by @lanluo-nvidia in #3038
- fix the artifacts name issue by @lanluo-nvidia in #3041
New Contributors
- @Arktische made their first contribution in #2672
- @aakashapoorv made their first contribution in #2779
- @atalman made their first contribution in #2826
- @yoosful made their first contribution in #2854
Full Changelog: v2.3.0...v2.4.0