MD-TRT Support, Compile/Export, C++ and Python #4183

Merged
narendasan merged 32 commits into main from push-vqqzkszwrvyx
Apr 22, 2026

Conversation

@narendasan
Collaborator

Description

Opening this to test the CI

Fixes # (issue)

Type of change

Please delete options that are not relevant and/or add your own.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that relevant reviewers are notified

apbose and others added 11 commits April 12, 2026 11:41
- C++ runtime: NCCL communicator init via c10d, rank/world_size serialization, DynamicOutputAllocator, ABI version bump to 8
- Python runtime: distributed support in PythonTorchTensorRTModule and TorchTensorRTModule, NCCL library auto-detection
- Conversion: native TRT DistCollective API (AllGather, ReduceScatter, AllReduce) with TRT-LLM plugin fallback
- Graph lowering: fuse c10d_functional collectives + wait_tensor into single ops
- Feature detection: native_trt_collectives flag, platform validation, graceful fallback chain
- Build: conditional NCCL compilation via torch_nccl toolchain
- Examples: tensor_parallel_simple_example.py, tensor_parallel_llama_llm.py
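The graceful fallback chain mentioned above can be sketched in plain Python. All names here are hypothetical stand-ins for illustration, not the actual torch_tensorrt API; the sketch only shows the ordering the commit describes: native TRT collectives first, then the TRT-LLM plugin, then leaving the op to PyTorch.

```python
# Hypothetical sketch of the collective-conversion fallback chain.
# None of these names are real torch_tensorrt functions; they only
# illustrate the native-TRT -> plugin -> PyTorch fallback ordering.

def convert_collective(op, *, native_available, plugin_available):
    """Pick a backend for a collective op, preferring native TRT support."""
    if native_available:
        return f"native_trt:{op}"       # native TRT DistCollective API path
    if plugin_available:
        return f"trt_llm_plugin:{op}"   # TRT-LLM plugin fallback
    return f"torch_fallback:{op}"       # leave the op in PyTorch


print(convert_collective("all_reduce", native_available=True, plugin_available=True))
print(convert_collective("all_gather", native_available=False, plugin_available=True))
print(convert_collective("reduce_scatter", native_available=False, plugin_available=False))
```

Each stage only runs when the preferred ones are unavailable, which matches the "graceful fallback chain" behavior listed under feature detection.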
…hapes

Five interconnected fixes:

1. fold_get_attr_item_calls: fold scalar param .item() calls into Python
   scalars before AOT tracing. Inside FakeTensorMode, even real-tensor
   .item() calls raise DataDependentOutputException.

2. backends.py: three changes:
   - call fold_get_attr_item_calls before entering FakeTensorMode
   - detect vmap/higher-order ops and route them through aot_autograd
     instead of aot_export_joint_simple (which doesn't handle HOPs)
   - on TRT build failure, strip TRT-only kwargs (use_fp32_acc) from
     the fallback graph before returning it to PyTorch

3. _decompositions.py: prevent SDPA from leaking back into the decomp
   table via Core ATen Interchange ops even after being removed from
   TORCH_TRT_DECOMPOSITIONS.

4. partitioning/common.py: lower the default max dynamic shape from
   min*2^16 to min*2^12; 65536 is too large for TRT to find kernel
   implementations for attention ops.

5. _TorchTensorRTModule.py: move CPU scalar inputs to CUDA before
   execution — aot_autograd lifts scalar attributes (e.g. head_dim^-0.5)
   as explicit graph inputs; TRT requires all inputs on CUDA.
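The bound change in fix 4 is simple arithmetic; a quick illustrative sketch (function name is hypothetical) of the old and new defaults:

```python
def default_max_dynamic_shape(min_shape: int, exp: int = 12) -> int:
    # Default upper bound for a dynamic dimension: min_shape * 2**exp.
    return min_shape * (1 << exp)

# With min_shape = 1, the old exponent of 16 gave 65536, which TRT could
# not find attention kernels for; the new exponent of 12 gives 4096.
print(default_max_dynamic_shape(1, exp=16))  # old default bound
print(default_max_dynamic_shape(1))          # new default bound
```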

Also fixes remove_sym_nodes to match tensor sources by equality rather
than local_name so that GetItemSource bases (from torch.compile
dynamic=True) are matched correctly, and updates register_sdpa.py to
handle aten.scaled_dot_product_attention.default (the form produced after
aot_autograd) in addition to the flash/efficient variants.
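The remove_sym_nodes change can be pictured with a minimal, self-contained sketch. The class and field names below are hypothetical stand-ins, not the real torch internals: two source objects that describe the same graph input compare equal even when their display names differ, so matching by equality succeeds where matching by local_name would not.

```python
# Hypothetical sketch (not the real torch source classes): graph-input
# "sources" that compare equal when they refer to the same underlying
# input, even if their display names (local_name) differ between traces.

class GetItemSource:
    def __init__(self, base, index, local_name=""):
        self.base = base              # name of the base input, e.g. "inputs"
        self.index = index            # item index within the base
        self.local_name = local_name  # per-trace display name

    def __eq__(self, other):
        # Match on the underlying source, ignoring the display name.
        return (isinstance(other, GetItemSource)
                and (self.base, self.index) == (other.base, other.index))

    def __hash__(self):
        return hash((self.base, self.index))


def find_source(sources, target):
    """Return the tracked source matching target by equality, or None."""
    return next((s for s in sources if s == target), None)


tracked = GetItemSource("inputs", 0, local_name="L_inputs_0_")
seen = GetItemSource("inputs", 0, local_name="getitem")

assert find_source([tracked], seen) is tracked  # equality-based match succeeds
assert tracked.local_name != seen.local_name    # name-based match would fail
```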
@meta-cla meta-cla Bot added the cla signed label Apr 12, 2026
@github-actions github-actions Bot added documentation Improvements or additions to documentation component: tests Issues re: Tests component: lowering Issues re: The lowering / preprocessing passes component: conversion Issues re: Conversion stage component: core Issues re: The core compiler component: converters Issues re: Specific op converters component: build system Issues re: Build system component: api [Python] Issues re: Python API component: runtime component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths component: torch_compile labels Apr 12, 2026
@github-actions github-actions Bot requested a review from zewenli98 April 12, 2026 19:09

@github-actions github-actions Bot left a comment


There are some changes that do not conform to C++ style guidelines:

diff --git a/home/runner/work/TensorRT/TensorRT/core/runtime/TRTEngine.h b/tmp/changes.txt
index cd8af65..615600d 100644
--- a/home/runner/work/TensorRT/TensorRT/core/runtime/TRTEngine.h
+++ b/tmp/changes.txt
@@ -33,17 +33,17 @@ namespace core {
namespace runtime {

using FlattenedState = std::tuple<
-    std::tuple<std::string, std::string>,  // ABI_VERSION
-    std::tuple<std::string, std::string>,  // name
-    std::tuple<std::string, std::string>,  // device
-    std::tuple<std::string, std::string>,  // engine
-    std::tuple<std::string, std::string>,  // input binding names
-    std::tuple<std::string, std::string>,  // output binding names
-    std::tuple<std::string, std::string>,  // HW compatibility
-    std::tuple<std::string, std::string>,  // requires_output_allocator
-    std::tuple<std::string, std::string>,  // serialized metadata
-    std::tuple<std::string, std::string>,  // Platform
-    std::tuple<std::string, std::string>,  // Resource Allocation Strategy
+    std::tuple<std::string, std::string>, // ABI_VERSION
+    std::tuple<std::string, std::string>, // name
+    std::tuple<std::string, std::string>, // device
+    std::tuple<std::string, std::string>, // engine
+    std::tuple<std::string, std::string>, // input binding names
+    std::tuple<std::string, std::string>, // output binding names
+    std::tuple<std::string, std::string>, // HW compatibility
+    std::tuple<std::string, std::string>, // requires_output_allocator
+    std::tuple<std::string, std::string>, // serialized metadata
+    std::tuple<std::string, std::string>, // Platform
+    std::tuple<std::string, std::string>, // Resource Allocation Strategy
    std::tuple<std::string, std::string>>; // requires_multidevice

struct TorchTRTRuntimeStates {
ERROR: Some files do not conform to style guidelines


@github-actions github-actions Bot left a comment


There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/runtime/_PythonTorchTensorRTModule.py	2026-04-21 00:20:02.596692+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/runtime/_PythonTorchTensorRTModule.py	2026-04-21 00:20:22.271000+00:00
@@ -386,11 +386,10 @@
                logger.debug(
                    "Barrier after execution context creation (distributed NCCL engine)"
                )
                dist.barrier()

-
        if ENABLED_FEATURES.tensorrt_rtx:
            self._setup_runtime_config()

        self.context = self._create_context()
        assert self.context is not None, "Failed to create execution context"


@github-actions github-actions Bot left a comment


There are some changes that do not conform to C++ style guidelines:

diff --git a/home/runner/work/TensorRT/TensorRT/core/runtime/TRTEngine.cpp b/tmp/changes.txt
index 4b91415..ae5232b 100644
--- a/home/runner/work/TensorRT/TensorRT/core/runtime/TRTEngine.cpp
+++ b/tmp/changes.txt
@@ -573,14 +573,16 @@ bool TRTEngine::bind_nccl_comm() {
    } else if (nccl_groups.size() > 1) {
      std::string names;
      for (const auto& n : nccl_groups) {
-        if (!names.empty()) names += ", ";
+        if (!names.empty())
+          names += ", ";
        names += "'" + n + "'";
      }
      LOG_WARNING(
          "This TRT engine requires NCCL but multiple NCCL process groups are registered ("
-          << names << "). Cannot auto-select a group — NCCL bind deferred. "
-          "Use the recommended workflow: "
-          "with torch_tensorrt.distributed.distributed_context(group, model) as m: m(inp)");
+          << names
+          << "). Cannot auto-select a group — NCCL bind deferred. "
+             "Use the recommended workflow: "
+             "with torch_tensorrt.distributed.distributed_context(group, model) as m: m(inp)");
    } else {
      LOG_WARNING(
          "This TRT engine requires NCCL (requires_native_multidevice=true) but no NCCL process group "
ERROR: Some files do not conform to style guidelines

Comment on lines +18 to +27
uses: pytorch/test-infra/.github/workflows/generate_binary_build_matrix.yml@main
with:
repository: pytorch/tensorrt
- name: Generate matrix
id: generate
run: |
set -eou pipefail
MATRIX_BLOB=${{ toJSON(needs.generate-matrix.outputs.matrix) }}
MATRIX_BLOB="$(python3 .github/scripts/filter-matrix.py --matrix "${MATRIX_BLOB}")"
echo "${MATRIX_BLOB}"
echo "matrix=${MATRIX_BLOB}" >> "${GITHUB_OUTPUT}"
package-type: wheel
os: linux
test-infra-repository: pytorch/test-infra
test-infra-ref: main
with-rocm: false
with-cpu: false

build:
needs: filter-matrix
permissions:
id-token: write
contents: read
strategy:
fail-fast: false
matrix:
include:
- repository: pytorch/tensorrt
pre-script: packaging/pre_build_script.sh
env-var-script: packaging/env_vars.txt
post-script: packaging/post_build_script.sh
smoke-test-script: packaging/smoke_test_script.sh
package-name: torch_tensorrt
display-name: Build Linux x86_64 torch-tensorrt whl package
name: ${{ matrix.display-name }}
uses: ./.github/workflows/build_linux.yml
with:
repository: ${{ matrix.repository }}
ref: ""
test-infra-repository: pytorch/test-infra
test-infra-ref: main
build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
pre-script: ${{ matrix.pre-script }}
env-var-script: ${{ matrix.env-var-script }}
post-script: ${{ matrix.post-script }}
package-name: ${{ matrix.package-name }}
smoke-test-script: ${{ matrix.smoke-test-script }}
trigger-event: ${{ github.event_name }}
architecture: "x86_64"
use-rtx: false
pip-install-torch-extra-args: "--extra-index-url https://pypi.org/simple"
filter-matrix:
Comment on lines +28 to +48
needs: [generate-matrix]
outputs:
matrix: ${{ steps.generate.outputs.matrix }}
runs-on: ubuntu-latest
steps:
- uses: actions/setup-python@v6
with:
python-version: "3.11"
- uses: actions/checkout@v6
with:
repository: pytorch/tensorrt
- name: Generate matrix
id: generate
run: |
set -eou pipefail
MATRIX_BLOB=${{ toJSON(needs.generate-matrix.outputs.matrix) }}
MATRIX_BLOB="$(python3 .github/scripts/filter-matrix.py --matrix "${MATRIX_BLOB}")"
echo "${MATRIX_BLOB}"
echo "matrix=${MATRIX_BLOB}" >> "${GITHUB_OUTPUT}"

L0-dynamo-converter-tests:
name: ${{ matrix.display-name }}
needs: [filter-matrix, build]
if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
strategy:
fail-fast: false
matrix:
include:
- repository: pytorch/tensorrt
package-name: torch_tensorrt
pre-script: packaging/pre_build_script.sh
post-script: packaging/post_build_script.sh
smoke-test-script: packaging/smoke_test_script.sh
display-name: L0 dynamo converter tests
uses: ./.github/workflows/linux-test.yml
with:
job-name: L0-dynamo-converter-tests
repository: "pytorch/tensorrt"
ref: ""
test-infra-repository: pytorch/test-infra
test-infra-ref: main
build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
pre-script: ${{ matrix.pre-script }}
script: |
set -euo pipefail
pushd .
cd tests/py/dynamo
python -m pytest -ra -n 8 --junitxml=${RUNNER_TEST_RESULTS_DIR}/l0_dynamo_converter_tests_results.xml --dist=loadscope --maxfail=20 conversion/
popd
build:
Comment on lines +83 to +112
name: ${{ matrix.display-name }}
needs: [filter-matrix, build]
if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
strategy:
fail-fast: false
matrix:
include:
- repository: pytorch/tensorrt
package-name: torch_tensorrt
pre-script: packaging/pre_build_script.sh
post-script: packaging/post_build_script.sh
smoke-test-script: packaging/smoke_test_script.sh
display-name: L0 dynamo converter tests
uses: ./.github/workflows/linux-test.yml
with:
job-name: L0-dynamo-converter-tests
repository: "pytorch/tensorrt"
ref: ""
test-infra-repository: pytorch/test-infra
test-infra-ref: main
build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
pre-script: ${{ matrix.pre-script }}
script: |
set -euo pipefail
pushd .
cd tests/py/dynamo
python -m pytest -ra -n 8 --junitxml=${RUNNER_TEST_RESULTS_DIR}/l0_dynamo_converter_tests_results.xml --dist=loadscope --maxfail=20 conversion/
popd

L0-py-core-tests:
name: ${{ matrix.display-name }}
needs: [filter-matrix, build]
if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
strategy:
fail-fast: false
matrix:
include:
- repository: pytorch/tensorrt
package-name: torch_tensorrt
pre-script: packaging/pre_build_script.sh
post-script: packaging/post_build_script.sh
smoke-test-script: packaging/smoke_test_script.sh
display-name: L0 core python tests
uses: ./.github/workflows/linux-test.yml
with:
job-name: L0-py-core-tests
repository: "pytorch/tensorrt"
ref: ""
test-infra-repository: pytorch/test-infra
test-infra-ref: main
build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
pre-script: ${{ matrix.pre-script }}
script: |
set -euo pipefail
pushd .
cd tests/py/core
python -m pytest -ra -n 8 --junitxml=${RUNNER_TEST_RESULTS_DIR}/l0_py_core_tests_results.xml .
popd
L0-dynamo-core-tests:
Comment on lines +113 to +145
name: ${{ matrix.display-name }}
needs: [filter-matrix, build]
if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
strategy:
fail-fast: false
matrix:
include:
- repository: pytorch/tensorrt
package-name: torch_tensorrt
pre-script: packaging/pre_build_script.sh
post-script: packaging/post_build_script.sh
smoke-test-script: packaging/smoke_test_script.sh
display-name: L0 dynamo core tests
uses: ./.github/workflows/linux-test.yml
with:
job-name: L0-dynamo-core-tests
repository: "pytorch/tensorrt"
ref: ""
test-infra-repository: pytorch/test-infra
test-infra-ref: main
build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
pre-script: ${{ matrix.pre-script }}
script: |
set -euo pipefail
pushd .
cd tests/py
cd dynamo
python -m pytest -ra -n 8 --junitxml=${RUNNER_TEST_RESULTS_DIR}/l0_dynamo_core_runtime_tests_results.xml runtime/test_000_*
python -m pytest -ra -n 8 --junitxml=${RUNNER_TEST_RESULTS_DIR}/l0_dynamo_core_partitioning_tests_results.xml partitioning/test_000_*
python -m pytest -ra -n 8 --junitxml=${RUNNER_TEST_RESULTS_DIR}/l0_dynamo_core_lowering_tests_results.xml lowering/
popd

L0-torchscript-tests:
name: ${{ matrix.display-name }}
needs: [filter-matrix, build]
if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
strategy:
fail-fast: false
matrix:
include:
- repository: pytorch/tensorrt
package-name: torch_tensorrt
pre-script: packaging/pre_build_script.sh
post-script: packaging/post_build_script.sh
smoke-test-script: packaging/smoke_test_script.sh
display-name: L0 torchscript tests
uses: ./.github/workflows/linux-test.yml
with:
job-name: L0-torchscript-tests
repository: "pytorch/tensorrt"
ref: ""
test-infra-repository: pytorch/test-infra
test-infra-ref: main
build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
pre-script: ${{ matrix.pre-script }}
script: |
set -euo pipefail
pushd .
cd tests/modules
python hub.py
popd
pushd .
cd tests/py/ts
python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/l0_ts_api_tests_results.xml api/
popd
L0-py-core-tests:
Comment on lines +146 to +175
name: ${{ matrix.display-name }}
needs: [filter-matrix, build]
if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
strategy:
fail-fast: false
matrix:
include:
- repository: pytorch/tensorrt
package-name: torch_tensorrt
pre-script: packaging/pre_build_script.sh
post-script: packaging/post_build_script.sh
smoke-test-script: packaging/smoke_test_script.sh
display-name: L0 core python tests
uses: ./.github/workflows/linux-test.yml
with:
job-name: L0-py-core-tests
repository: "pytorch/tensorrt"
ref: ""
test-infra-repository: pytorch/test-infra
test-infra-ref: main
build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
pre-script: ${{ matrix.pre-script }}
script: |
set -euo pipefail
pushd .
cd tests/py/core
python -m pytest -ra -n 8 --junitxml=${RUNNER_TEST_RESULTS_DIR}/l0_py_core_tests_results.xml .
popd

L1-dynamo-core-tests:
name: ${{ matrix.display-name }}
needs: [filter-matrix, build, L0-dynamo-converter-tests, L0-dynamo-core-tests, L0-py-core-tests, L0-torchscript-tests]
if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
strategy:
fail-fast: false
matrix:
include:
- repository: pytorch/tensorrt
package-name: torch_tensorrt
pre-script: packaging/pre_build_script.sh
post-script: packaging/post_build_script.sh
smoke-test-script: packaging/smoke_test_script.sh
display-name: L1 dynamo core tests
uses: ./.github/workflows/linux-test.yml
with:
job-name: L1-dynamo-core-tests
repository: "pytorch/tensorrt"
ref: ""
test-infra-repository: pytorch/test-infra
test-infra-ref: main
build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
pre-script: ${{ matrix.pre-script }}
script: |
set -euo pipefail
pushd .
cd tests/py/dynamo
python -m pytest -ra -n 8 --junitxml=${RUNNER_TEST_RESULTS_DIR}/l1_dynamo_core_tests_results.xml runtime/test_001_*
python -m pytest -ra -n 8 --junitxml=${RUNNER_TEST_RESULTS_DIR}/l1_dynamo_core_partitioning_tests_results.xml partitioning/test_001_*
L0-torchscript-tests:
Comment on lines +409 to +447
name: ${{ matrix.display-name }}
needs:
[
filter-matrix,
build,
L1-dynamo-compile-tests,
L1-dynamo-core-tests,
L1-torch-compile-tests,
L1-torchscript-tests,
]
if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
strategy:
fail-fast: false
matrix:
include:
- repository: pytorch/tensorrt
package-name: torch_tensorrt
pre-script: packaging/pre_build_script.sh
post-script: packaging/post_build_script.sh
smoke-test-script: packaging/smoke_test_script.sh
display-name: L2 dynamo compile tests
uses: ./.github/workflows/linux-test.yml
with:
job-name: L2-dynamo-compile-tests
repository: "pytorch/tensorrt"
ref: ""
test-infra-repository: pytorch/test-infra
test-infra-ref: main
build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
pre-script: ${{ matrix.pre-script }}
script: |
set -euo pipefail
pushd .
cd tests/py/dynamo/
python -m pytest -m "not critical" -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/l2_dynamo_compile_tests_results.xml models/
python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/l2_dynamo_compile_llm_tests_results.xml llm/
popd

L2-dynamo-plugin-tests:
name: ${{ matrix.display-name }}
needs: [filter-matrix, build, L1-dynamo-core-tests, L1-dynamo-compile-tests, L1-torch-compile-tests, L1-torchscript-tests]
if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
strategy:
fail-fast: false
matrix:
include:
- repository: pytorch/tensorrt
package-name: torch_tensorrt
pre-script: packaging/pre_build_script.sh
post-script: packaging/post_build_script.sh
smoke-test-script: packaging/smoke_test_script.sh
display-name: L2 dynamo plugin tests
uses: ./.github/workflows/linux-test.yml
with:
job-name: L2-dynamo-plugin-tests
repository: "pytorch/tensorrt"
ref: ""
test-infra-repository: pytorch/test-infra
test-infra-ref: main
build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
pre-script: ${{ matrix.pre-script }}
script: |
set -euo pipefail
pushd .
cd tests/py/dynamo
python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/dynamo_converters_test_results.xml -n 4 conversion/
python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/dynamo_converters_test_results.xml automatic_plugin/test_automatic_plugin.py
python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/dynamo_converters_test_results.xml automatic_plugin/test_automatic_plugin_with_attrs.py
python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/dynamo_converters_test_results.xml automatic_plugin/test_flashinfer_rmsnorm.py
popd
L2-dynamo-core-tests:
Comment on lines +448 to +485
name: ${{ matrix.display-name }}
needs:
[
filter-matrix,
build,
L1-dynamo-core-tests,
L1-dynamo-compile-tests,
L1-torch-compile-tests,
L1-torchscript-tests,
]
if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
strategy:
fail-fast: false
matrix:
include:
- repository: pytorch/tensorrt
package-name: torch_tensorrt
pre-script: packaging/pre_build_script.sh
post-script: packaging/post_build_script.sh
smoke-test-script: packaging/smoke_test_script.sh
display-name: L2 dynamo core tests
uses: ./.github/workflows/linux-test.yml
with:
job-name: L2-dynamo-core-tests
repository: "pytorch/tensorrt"
ref: ""
test-infra-repository: pytorch/test-infra
test-infra-ref: main
build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
pre-script: ${{ matrix.pre-script }}
script: |
set -euo pipefail
pushd .
cd tests/py/dynamo
python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/l2_dynamo_core_tests_results.xml -k "not test_000_ and not test_001_" runtime/*
popd

L2-torchscript-tests:
name: ${{ matrix.display-name }}
needs: [filter-matrix, build, L1-dynamo-core-tests, L1-dynamo-compile-tests, L1-torch-compile-tests, L1-torchscript-tests]
if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
strategy:
fail-fast: false
matrix:
include:
- repository: pytorch/tensorrt
package-name: torch_tensorrt
pre-script: packaging/pre_build_script.sh
post-script: packaging/post_build_script.sh
smoke-test-script: packaging/smoke_test_script.sh
display-name: L2 torch script tests
uses: ./.github/workflows/linux-test.yml
with:
job-name: L2-torchscript-tests
repository: "pytorch/tensorrt"
ref: ""
test-infra-repository: pytorch/test-infra
test-infra-ref: main
build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
pre-script: ${{ matrix.pre-script }}
script: |
set -euo pipefail
pushd .
cd tests/modules
python hub.py
popd
pushd .
cd tests/py/ts
python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/l2_ts_integrations_tests_results.xml integrations/
popd
L2-dynamo-plugin-tests:
Comment on lines +486 to +526
name: ${{ matrix.display-name }}
needs:
[
filter-matrix,
build,
L1-dynamo-core-tests,
L1-dynamo-compile-tests,
L1-torch-compile-tests,
L1-torchscript-tests,
]
if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
strategy:
fail-fast: false
matrix:
include:
- repository: pytorch/tensorrt
package-name: torch_tensorrt
pre-script: packaging/pre_build_script.sh
post-script: packaging/post_build_script.sh
smoke-test-script: packaging/smoke_test_script.sh
display-name: L2 dynamo plugin tests
uses: ./.github/workflows/linux-test.yml
with:
job-name: L2-dynamo-plugin-tests
repository: "pytorch/tensorrt"
ref: ""
test-infra-repository: pytorch/test-infra
test-infra-ref: main
build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
pre-script: ${{ matrix.pre-script }}
script: |
set -euo pipefail
pushd .
cd tests/py/dynamo
python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/dynamo_converters_test_results.xml -n 4 conversion/
python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/dynamo_converters_test_results.xml automatic_plugin/test_automatic_plugin.py
python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/dynamo_converters_test_results.xml automatic_plugin/test_automatic_plugin_with_attrs.py
python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/dynamo_converters_test_results.xml automatic_plugin/test_flashinfer_rmsnorm.py
popd

L2-torchscript-tests:
Comment on lines +527 to +568
name: ${{ matrix.display-name }}
needs:
[
filter-matrix,
build,
L1-dynamo-core-tests,
L1-dynamo-compile-tests,
L1-torch-compile-tests,
L1-torchscript-tests,
]
if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
strategy:
fail-fast: false
matrix:
include:
- repository: pytorch/tensorrt
package-name: torch_tensorrt
pre-script: packaging/pre_build_script.sh
post-script: packaging/post_build_script.sh
smoke-test-script: packaging/smoke_test_script.sh
display-name: L2 torch script tests
uses: ./.github/workflows/linux-test.yml
with:
job-name: L2-torchscript-tests
repository: "pytorch/tensorrt"
ref: ""
test-infra-repository: pytorch/test-infra
test-infra-ref: main
build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
pre-script: ${{ matrix.pre-script }}
script: |
set -euo pipefail
pushd .
cd tests/modules
python hub.py
popd
pushd .
cd tests/py/ts
python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/l2_ts_integrations_tests_results.xml integrations/
popd

```yaml
  L2-dynamo-distributed-tests:
    name: ${{ matrix.display-name }}
    needs: [filter-matrix, build, L1-dynamo-core-tests, L1-dynamo-compile-tests, L1-torch-compile-tests, L1-torchscript-tests]
    if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
    strategy:
      fail-fast: false
      matrix:
        include:
          - repository: pytorch/tensorrt
            package-name: torch_tensorrt
            pre-script: packaging/pre_build_script.sh
            post-script: packaging/post_build_script.sh
            smoke-test-script: packaging/smoke_test_script.sh
            display-name: L2 dynamo distributed tests
    uses: ./.github/workflows/linux-test.yml
    with:
      job-name: L2-dynamo-distributed-tests
      repository: "pytorch/tensorrt"
      ref: ""
      test-infra-repository: pytorch/test-infra
      test-infra-ref: main
      build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
      pre-script: ${{ matrix.pre-script }}
      script: |
        set -euo pipefail
        export USE_HOST_DEPS=1
        export CI_BUILD=1
        export USE_TRTLLM_PLUGINS=1
        dnf install -y mpich mpich-devel openmpi openmpi-devel
        pushd .
        cd tests/py
        cd dynamo
        python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/l2_dynamo_distributed_test_results.xml distributed/test_nccl_ops.py
        popd
```
Comment thread on .github/workflows/build-test-linux-x86_64.yml (marked as fixed)

@github-actions bot left a comment
There are some changes that do not conform to C++ style guidelines:

```diff
diff --git a/home/runner/work/TensorRT/TensorRT/core/runtime/TRTEngine.cpp b/tmp/changes.txt
index 4b91415..ae5232b 100644
--- a/home/runner/work/TensorRT/TensorRT/core/runtime/TRTEngine.cpp
+++ b/tmp/changes.txt
@@ -573,14 +573,16 @@ bool TRTEngine::bind_nccl_comm() {
     } else if (nccl_groups.size() > 1) {
       std::string names;
       for (const auto& n : nccl_groups) {
-        if (!names.empty()) names += ", ";
+        if (!names.empty())
+          names += ", ";
         names += "'" + n + "'";
       }
       LOG_WARNING(
           "This TRT engine requires NCCL but multiple NCCL process groups are registered ("
-          << names << "). Cannot auto-select a group — NCCL bind deferred. "
-          "Use the recommended workflow: "
-          "with torch_tensorrt.distributed.distributed_context(group, model) as m: m(inp)");
+          << names
+          << "). Cannot auto-select a group — NCCL bind deferred. "
+             "Use the recommended workflow: "
+             "with torch_tensorrt.distributed.distributed_context(group, model) as m: m(inp)");
     } else {
       LOG_WARNING(
           "This TRT engine requires NCCL (requires_native_multidevice=true) but no NCCL process group "
```

ERROR: Some files do not conform to style guidelines
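The warning in the diff above points users at the `torch_tensorrt.distributed.distributed_context` workflow. As a rough illustration of that pattern only (this is not the torch_tensorrt implementation; the dict-based "binding" step is a stand-in for attaching an NCCL communicator), a context manager that binds a process group to one or more models for the duration of a `with` block can be sketched as:

```python
# Hypothetical sketch of the distributed_context pattern, NOT the real API:
# bind a group to each model on entry, unbind on exit.
from contextlib import contextmanager

@contextmanager
def distributed_context(group, models):
    # Accept a single model or a list, mirroring the usage seen in the tests.
    single = not isinstance(models, (list, tuple))
    bound = [models] if single else list(models)
    for m in bound:
        m["group"] = group          # stand-in for binding an NCCL communicator
    try:
        yield bound[0] if single else tuple(bound)
    finally:
        for m in bound:
            m.pop("group", None)    # unbind on exit

model_a, model_b = {}, {}
with distributed_context("WORLD", [model_a, model_b]) as (ma, mb):
    assert ma["group"] == "WORLD" and mb["group"] == "WORLD"
assert "group" not in model_a
```

The list form matches the test usage shown later in this thread, where a loaded and a compiled model are bound to `dist.group.WORLD` together so their outputs can be compared under the same communicator.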

@github-actions bot left a comment
There are some changes that do not conform to Python style guidelines:

```diff
--- /home/runner/work/TensorRT/TensorRT/tests/py/dynamo/distributed/test_export_save_load.py	2026-04-21 18:14:22.530281+00:00
+++ /home/runner/work/TensorRT/TensorRT/tests/py/dynamo/distributed/test_export_save_load.py	2026-04-21 18:14:47.015206+00:00
@@ -299,11 +299,14 @@

     path = rank_path(save_dir, rank, world_size)
     loaded = torch_tensorrt.load(path)
     loaded_model = loaded.module()

-    with distributed_context(dist.group.WORLD, [loaded_model, compiled_model]) as (lm, cm):
+    with distributed_context(dist.group.WORLD, [loaded_model, compiled_model]) as (
+        lm,
+        cm,
+    ):
         with torch.no_grad():
             loaded_output = lm(inp)
             compiled_output = cm(inp)

     diff = float((compiled_output - loaded_output).abs().max())
```
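The test in this diff judges the save/load round trip by the maximum absolute element-wise difference between the compiled and reloaded outputs. The same check, sketched with plain Python lists instead of torch tensors (the `max_abs_diff` helper name is invented for this sketch):

```python
# Max absolute element-wise difference, the metric used by the test above,
# written against plain lists so it runs without torch.
def max_abs_diff(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

compiled_output = [0.10, 0.25, 0.50]
loaded_output = [0.10, 0.25, 0.50]

# A round trip that preserves the engine output exactly yields a diff of 0.0;
# in practice the test would compare against a small tolerance instead.
diff = max_abs_diff(compiled_output, loaded_output)
assert diff == 0.0
```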
Comment on lines +569 to +615
```yaml
    name: ${{ matrix.display-name }}
    needs:
      [
        filter-matrix,
        build,
        L1-dynamo-core-tests,
        L1-dynamo-compile-tests,
        L1-torch-compile-tests,
        L1-torchscript-tests,
      ]
    if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
    strategy:
      fail-fast: false
      matrix:
        include:
          - repository: pytorch/tensorrt
            package-name: torch_tensorrt
            pre-script: packaging/pre_build_script.sh
            post-script: packaging/post_build_script.sh
            smoke-test-script: packaging/smoke_test_script.sh
            display-name: L2 dynamo distributed tests
    uses: ./.github/workflows/linux-test.yml
    with:
      job-name: L2-dynamo-distributed-tests
      repository: "pytorch/tensorrt"
      ref: ""
      test-infra-repository: pytorch/test-infra
      test-infra-ref: main
      build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
      pre-script: ${{ matrix.pre-script }}
      runner: linux.g4dn.12xlarge.nvidia.gpu
      script: |
        set -euo pipefail
        export USE_HOST_DEPS=1
        export CI_BUILD=1
        export USE_TRTLLM_PLUGINS=1
        dnf install -y mpich mpich-devel openmpi openmpi-devel
        pushd .
        cd tests/py
        cd dynamo
        python -m pytest -ra -v --junitxml=${RUNNER_TEST_RESULTS_DIR}/l2_dynamo_distributed_test_results.xml \
          distributed/test_nccl_ops.py \
          distributed/test_native_nccl.py \
          distributed/test_export_save_load.py
        python -m torch_tensorrt.distributed.run --nproc_per_node=2 distributed/test_native_nccl.py --multirank
        python -m torch_tensorrt.distributed.run --nproc_per_node=2 distributed/test_export_save_load.py --multirank
        popd
```
@narendasan force-pushed the push-vqqzkszwrvyx branch 3 times, most recently from 28b53a3 to 0525934 on April 22, 2026 at 14:33
@narendasan merged commit 0b274f0 into main on April 22, 2026
81 of 86 checks passed
@narendasan deleted the push-vqqzkszwrvyx branch on April 22, 2026 at 17:43

Labels

- cla signed
- component: api [Python] (Issues re: Python API)
- component: build system (Issues re: Build system)
- component: conversion (Issues re: Conversion stage)
- component: converters (Issues re: Specific op converters)
- component: core (Issues re: The core compiler)
- component: dynamo (Issues relating to the `torch.compile` or `torch._dynamo.export` paths)
- component: lowering (Issues re: The lowering / preprocessing passes)
- component: runtime
- component: tests (Issues re: Tests)
- component: torch_compile
- documentation (Improvements or additions to documentation)

3 participants