MD-TRT Support, Compile/Export, C++ and Python #4183

Merged
narendasan merged 32 commits into main from push-vqqzkszwrvyx
Apr 22, 2026

Conversation

@narendasan
Collaborator

Description

Opening this to test the CI

Fixes # (issue)

Type of change

Please delete options that are not relevant and/or add your own.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that relevant reviewers are notified

apbose and others added 11 commits April 12, 2026 11:41
- C++ runtime: NCCL communicator init via c10d, rank/world_size serialization, DynamicOutputAllocator, ABI version bump to 8
- Python runtime: distributed support in PythonTorchTensorRTModule and TorchTensorRTModule, NCCL library auto-detection
- Conversion: native TRT DistCollective API (AllGather, ReduceScatter, AllReduce) with TRT-LLM plugin fallback
- Graph lowering: fuse c10d_functional collectives + wait_tensor into single ops
- Feature detection: native_trt_collectives flag, platform validation, graceful fallback chain
- Build: conditional NCCL compilation via torch_nccl toolchain
- Examples: tensor_parallel_simple_example.py, tensor_parallel_llama_llm.py
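The graceful fallback chain mentioned above can be sketched in plain Python. All names here are hypothetical stand-ins for illustration, not the actual torch_tensorrt API; the sketch only shows the ordering the commit describes: native TRT collectives first, then the TRT-LLM plugin, then leaving the op to PyTorch.

```python
# Hypothetical sketch of the collective-conversion fallback chain.
# None of these names are real torch_tensorrt functions; they only
# illustrate the native-TRT -> plugin -> PyTorch fallback ordering.

def convert_collective(op, *, native_available, plugin_available):
    """Pick a backend for a collective op, preferring native TRT support."""
    if native_available:
        return f"native_trt:{op}"       # native TRT DistCollective API path
    if plugin_available:
        return f"trt_llm_plugin:{op}"   # TRT-LLM plugin fallback
    return f"torch_fallback:{op}"       # leave the op in PyTorch


print(convert_collective("all_reduce", native_available=True, plugin_available=True))
print(convert_collective("all_gather", native_available=False, plugin_available=True))
print(convert_collective("reduce_scatter", native_available=False, plugin_available=False))
```

Each stage only runs when the preferred ones are unavailable, which matches the "graceful fallback chain" behavior listed under feature detection.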
…hapes

Five interconnected fixes:

1. fold_get_attr_item_calls: fold scalar param .item() calls into Python
   scalars before AOT tracing. Inside FakeTensorMode, even real-tensor
   .item() calls raise DataDependentOutputException.

2. backends.py: three changes:
   - call fold_get_attr_item_calls before entering FakeTensorMode
   - detect vmap/higher-order ops and route them through aot_autograd
     instead of aot_export_joint_simple (which doesn't handle HOPs)
   - on TRT build failure, strip TRT-only kwargs (use_fp32_acc) from
     the fallback graph before returning it to PyTorch

3. _decompositions.py: prevent SDPA from leaking back into the decomp
   table via Core ATen Interchange ops even after being removed from
   TORCH_TRT_DECOMPOSITIONS.

4. partitioning/common.py: lower the default max dynamic shape from
   min*2^16 to min*2^12; 65536 is too large for TRT to find kernel
   implementations for attention ops.

5. _TorchTensorRTModule.py: move CPU scalar inputs to CUDA before
   execution — aot_autograd lifts scalar attributes (e.g. head_dim^-0.5)
   as explicit graph inputs; TRT requires all inputs on CUDA.
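The bound change in fix 4 is simple arithmetic; a quick illustrative sketch (function name is hypothetical) of the old and new defaults:

```python
def default_max_dynamic_shape(min_shape: int, exp: int = 12) -> int:
    # Default upper bound for a dynamic dimension: min_shape * 2**exp.
    return min_shape * (1 << exp)

# With min_shape = 1, the old exponent of 16 gave 65536, which TRT could
# not find attention kernels for; the new exponent of 12 gives 4096.
print(default_max_dynamic_shape(1, exp=16))  # old default bound
print(default_max_dynamic_shape(1))          # new default bound
```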

Also fixes remove_sym_nodes to match tensor sources by equality rather
than local_name so that GetItemSource bases (from torch.compile
dynamic=True) are matched correctly, and updates register_sdpa.py to
handle aten.scaled_dot_product_attention.default (the form produced after
aot_autograd) in addition to the flash/efficient variants.
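The remove_sym_nodes change can be pictured with a minimal, self-contained sketch. The class and field names below are hypothetical stand-ins, not the real torch internals: two source objects that describe the same graph input compare equal even when their display names differ, so matching by equality succeeds where matching by local_name would not.

```python
# Hypothetical sketch (not the real torch source classes): graph-input
# "sources" that compare equal when they refer to the same underlying
# input, even if their display names (local_name) differ between traces.

class GetItemSource:
    def __init__(self, base, index, local_name=""):
        self.base = base              # name of the base input, e.g. "inputs"
        self.index = index            # item index within the base
        self.local_name = local_name  # per-trace display name

    def __eq__(self, other):
        # Match on the underlying source, ignoring the display name.
        return (isinstance(other, GetItemSource)
                and (self.base, self.index) == (other.base, other.index))

    def __hash__(self):
        return hash((self.base, self.index))


def find_source(sources, target):
    """Return the tracked source matching target by equality, or None."""
    return next((s for s in sources if s == target), None)


tracked = GetItemSource("inputs", 0, local_name="L_inputs_0_")
seen = GetItemSource("inputs", 0, local_name="getitem")

assert find_source([tracked], seen) is tracked  # equality-based match succeeds
assert tracked.local_name != seen.local_name    # name-based match would fail
```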
@meta-cla meta-cla Bot added the cla signed label Apr 12, 2026
@github-actions github-actions Bot added documentation Improvements or additions to documentation component: tests Issues re: Tests component: lowering Issues re: The lowering / preprocessing passes component: conversion Issues re: Conversion stage component: core Issues re: The core compiler component: converters Issues re: Specific op converters component: build system Issues re: Build system component: api [Python] Issues re: Python API component: runtime component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths component: torch_compile labels Apr 12, 2026
@github-actions github-actions Bot requested a review from zewenli98 April 12, 2026 19:09

@github-actions github-actions Bot left a comment


There are some changes that do not conform to C++ style guidelines:

diff --git a/home/runner/work/TensorRT/TensorRT/core/runtime/TRTEngine.h b/tmp/changes.txt
index cd8af65..615600d 100644
--- a/home/runner/work/TensorRT/TensorRT/core/runtime/TRTEngine.h
+++ b/tmp/changes.txt
@@ -33,17 +33,17 @@ namespace core {
namespace runtime {

using FlattenedState = std::tuple<
-    std::tuple<std::string, std::string>,  // ABI_VERSION
-    std::tuple<std::string, std::string>,  // name
-    std::tuple<std::string, std::string>,  // device
-    std::tuple<std::string, std::string>,  // engine
-    std::tuple<std::string, std::string>,  // input binding names
-    std::tuple<std::string, std::string>,  // output binding names
-    std::tuple<std::string, std::string>,  // HW compatibility
-    std::tuple<std::string, std::string>,  // requires_output_allocator
-    std::tuple<std::string, std::string>,  // serialized metadata
-    std::tuple<std::string, std::string>,  // Platform
-    std::tuple<std::string, std::string>,  // Resource Allocation Strategy
+    std::tuple<std::string, std::string>, // ABI_VERSION
+    std::tuple<std::string, std::string>, // name
+    std::tuple<std::string, std::string>, // device
+    std::tuple<std::string, std::string>, // engine
+    std::tuple<std::string, std::string>, // input binding names
+    std::tuple<std::string, std::string>, // output binding names
+    std::tuple<std::string, std::string>, // HW compatibility
+    std::tuple<std::string, std::string>, // requires_output_allocator
+    std::tuple<std::string, std::string>, // serialized metadata
+    std::tuple<std::string, std::string>, // Platform
+    std::tuple<std::string, std::string>, // Resource Allocation Strategy
    std::tuple<std::string, std::string>>; // requires_multidevice

struct TorchTRTRuntimeStates {
ERROR: Some files do not conform to style guidelines


@github-actions github-actions Bot left a comment


There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/runtime/_PythonTorchTensorRTModule.py	2026-04-21 00:20:02.596692+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/runtime/_PythonTorchTensorRTModule.py	2026-04-21 00:20:22.271000+00:00
@@ -386,11 +386,10 @@
                logger.debug(
                    "Barrier after execution context creation (distributed NCCL engine)"
                )
                dist.barrier()

-
        if ENABLED_FEATURES.tensorrt_rtx:
            self._setup_runtime_config()

        self.context = self._create_context()
        assert self.context is not None, "Failed to create execution context"


@github-actions github-actions Bot left a comment


There are some changes that do not conform to C++ style guidelines:

diff --git a/home/runner/work/TensorRT/TensorRT/core/runtime/TRTEngine.cpp b/tmp/changes.txt
index 4b91415..ae5232b 100644
--- a/home/runner/work/TensorRT/TensorRT/core/runtime/TRTEngine.cpp
+++ b/tmp/changes.txt
@@ -573,14 +573,16 @@ bool TRTEngine::bind_nccl_comm() {
    } else if (nccl_groups.size() > 1) {
      std::string names;
      for (const auto& n : nccl_groups) {
-        if (!names.empty()) names += ", ";
+        if (!names.empty())
+          names += ", ";
        names += "'" + n + "'";
      }
      LOG_WARNING(
          "This TRT engine requires NCCL but multiple NCCL process groups are registered ("
-          << names << "). Cannot auto-select a group — NCCL bind deferred. "
-          "Use the recommended workflow: "
-          "with torch_tensorrt.distributed.distributed_context(group, model) as m: m(inp)");
+          << names
+          << "). Cannot auto-select a group — NCCL bind deferred. "
+             "Use the recommended workflow: "
+             "with torch_tensorrt.distributed.distributed_context(group, model) as m: m(inp)");
    } else {
      LOG_WARNING(
          "This TRT engine requires NCCL (requires_native_multidevice=true) but no NCCL process group "
ERROR: Some files do not conform to style guidelines

Comment on lines +18 to +27
uses: pytorch/test-infra/.github/workflows/generate_binary_build_matrix.yml@main
with:
repository: pytorch/tensorrt
- name: Generate matrix
id: generate
run: |
set -eou pipefail
MATRIX_BLOB=${{ toJSON(needs.generate-matrix.outputs.matrix) }}
MATRIX_BLOB="$(python3 .github/scripts/filter-matrix.py --matrix "${MATRIX_BLOB}")"
echo "${MATRIX_BLOB}"
echo "matrix=${MATRIX_BLOB}" >> "${GITHUB_OUTPUT}"
package-type: wheel
os: linux
test-infra-repository: pytorch/test-infra
test-infra-ref: main
with-rocm: false
with-cpu: false

build:
needs: filter-matrix
permissions:
id-token: write
contents: read
strategy:
fail-fast: false
matrix:
include:
- repository: pytorch/tensorrt
pre-script: packaging/pre_build_script.sh
env-var-script: packaging/env_vars.txt
post-script: packaging/post_build_script.sh
smoke-test-script: packaging/smoke_test_script.sh
package-name: torch_tensorrt
display-name: Build Linux x86_64 torch-tensorrt whl package
name: ${{ matrix.display-name }}
uses: ./.github/workflows/build_linux.yml
with:
repository: ${{ matrix.repository }}
ref: ""
test-infra-repository: pytorch/test-infra
test-infra-ref: main
build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
pre-script: ${{ matrix.pre-script }}
env-var-script: ${{ matrix.env-var-script }}
post-script: ${{ matrix.post-script }}
package-name: ${{ matrix.package-name }}
smoke-test-script: ${{ matrix.smoke-test-script }}
trigger-event: ${{ github.event_name }}
architecture: "x86_64"
use-rtx: false
pip-install-torch-extra-args: "--extra-index-url https://pypi.org/simple"
filter-matrix:
Comment on lines +28 to +48
needs: [generate-matrix]
outputs:
matrix: ${{ steps.generate.outputs.matrix }}
runs-on: ubuntu-latest
steps:
- uses: actions/setup-python@v6
with:
python-version: "3.11"
- uses: actions/checkout@v6
with:
repository: pytorch/tensorrt
- name: Generate matrix
id: generate
run: |
set -eou pipefail
MATRIX_BLOB=${{ toJSON(needs.generate-matrix.outputs.matrix) }}
MATRIX_BLOB="$(python3 .github/scripts/filter-matrix.py --matrix "${MATRIX_BLOB}")"
echo "${MATRIX_BLOB}"
echo "matrix=${MATRIX_BLOB}" >> "${GITHUB_OUTPUT}"

L0-dynamo-converter-tests:
name: ${{ matrix.display-name }}
needs: [filter-matrix, build]
if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
strategy:
fail-fast: false
matrix:
include:
- repository: pytorch/tensorrt
package-name: torch_tensorrt
pre-script: packaging/pre_build_script.sh
post-script: packaging/post_build_script.sh
smoke-test-script: packaging/smoke_test_script.sh
display-name: L0 dynamo converter tests
uses: ./.github/workflows/linux-test.yml
with:
job-name: L0-dynamo-converter-tests
repository: "pytorch/tensorrt"
ref: ""
test-infra-repository: pytorch/test-infra
test-infra-ref: main
build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
pre-script: ${{ matrix.pre-script }}
script: |
set -euo pipefail
pushd .
cd tests/py/dynamo
python -m pytest -ra -n 8 --junitxml=${RUNNER_TEST_RESULTS_DIR}/l0_dynamo_converter_tests_results.xml --dist=loadscope --maxfail=20 conversion/
popd
build:
Comment on lines +83 to +112
name: ${{ matrix.display-name }}
needs: [filter-matrix, build]
if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
strategy:
fail-fast: false
matrix:
include:
- repository: pytorch/tensorrt
package-name: torch_tensorrt
pre-script: packaging/pre_build_script.sh
post-script: packaging/post_build_script.sh
smoke-test-script: packaging/smoke_test_script.sh
display-name: L0 dynamo converter tests
uses: ./.github/workflows/linux-test.yml
with:
job-name: L0-dynamo-converter-tests
repository: "pytorch/tensorrt"
ref: ""
test-infra-repository: pytorch/test-infra
test-infra-ref: main
build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
pre-script: ${{ matrix.pre-script }}
script: |
set -euo pipefail
pushd .
cd tests/py/dynamo
python -m pytest -ra -n 8 --junitxml=${RUNNER_TEST_RESULTS_DIR}/l0_dynamo_converter_tests_results.xml --dist=loadscope --maxfail=20 conversion/
popd

L0-py-core-tests:
name: ${{ matrix.display-name }}
needs: [filter-matrix, build]
if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
strategy:
fail-fast: false
matrix:
include:
- repository: pytorch/tensorrt
package-name: torch_tensorrt
pre-script: packaging/pre_build_script.sh
post-script: packaging/post_build_script.sh
smoke-test-script: packaging/smoke_test_script.sh
display-name: L0 core python tests
uses: ./.github/workflows/linux-test.yml
with:
job-name: L0-py-core-tests
repository: "pytorch/tensorrt"
ref: ""
test-infra-repository: pytorch/test-infra
test-infra-ref: main
build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
pre-script: ${{ matrix.pre-script }}
script: |
set -euo pipefail
pushd .
cd tests/py/core
python -m pytest -ra -n 8 --junitxml=${RUNNER_TEST_RESULTS_DIR}/l0_py_core_tests_results.xml .
popd
L0-dynamo-core-tests:
Comment on lines +113 to +145
name: ${{ matrix.display-name }}
needs: [filter-matrix, build]
if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
strategy:
fail-fast: false
matrix:
include:
- repository: pytorch/tensorrt
package-name: torch_tensorrt
pre-script: packaging/pre_build_script.sh
post-script: packaging/post_build_script.sh
smoke-test-script: packaging/smoke_test_script.sh
display-name: L0 dynamo core tests
uses: ./.github/workflows/linux-test.yml
with:
job-name: L0-dynamo-core-tests
repository: "pytorch/tensorrt"
ref: ""
test-infra-repository: pytorch/test-infra
test-infra-ref: main
build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
pre-script: ${{ matrix.pre-script }}
script: |
set -euo pipefail
pushd .
cd tests/py
cd dynamo
python -m pytest -ra -n 8 --junitxml=${RUNNER_TEST_RESULTS_DIR}/l0_dynamo_core_runtime_tests_results.xml runtime/test_000_*
python -m pytest -ra -n 8 --junitxml=${RUNNER_TEST_RESULTS_DIR}/l0_dynamo_core_partitioning_tests_results.xml partitioning/test_000_*
python -m pytest -ra -n 8 --junitxml=${RUNNER_TEST_RESULTS_DIR}/l0_dynamo_core_lowering_tests_results.xml lowering/
popd

L0-torchscript-tests:
name: ${{ matrix.display-name }}
needs: [filter-matrix, build]
if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
strategy:
fail-fast: false
matrix:
include:
- repository: pytorch/tensorrt
package-name: torch_tensorrt
pre-script: packaging/pre_build_script.sh
post-script: packaging/post_build_script.sh
smoke-test-script: packaging/smoke_test_script.sh
display-name: L0 torchscript tests
uses: ./.github/workflows/linux-test.yml
with:
job-name: L0-torchscript-tests
repository: "pytorch/tensorrt"
ref: ""
test-infra-repository: pytorch/test-infra
test-infra-ref: main
build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
pre-script: ${{ matrix.pre-script }}
script: |
set -euo pipefail
pushd .
cd tests/modules
python hub.py
popd
pushd .
cd tests/py/ts
python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/l0_ts_api_tests_results.xml api/
popd
L0-py-core-tests:
Comment on lines +146 to +175
name: ${{ matrix.display-name }}
needs: [filter-matrix, build]
if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
strategy:
fail-fast: false
matrix:
include:
- repository: pytorch/tensorrt
package-name: torch_tensorrt
pre-script: packaging/pre_build_script.sh
post-script: packaging/post_build_script.sh
smoke-test-script: packaging/smoke_test_script.sh
display-name: L0 core python tests
uses: ./.github/workflows/linux-test.yml
with:
job-name: L0-py-core-tests
repository: "pytorch/tensorrt"
ref: ""
test-infra-repository: pytorch/test-infra
test-infra-ref: main
build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
pre-script: ${{ matrix.pre-script }}
script: |
set -euo pipefail
pushd .
cd tests/py/core
python -m pytest -ra -n 8 --junitxml=${RUNNER_TEST_RESULTS_DIR}/l0_py_core_tests_results.xml .
popd

L1-dynamo-core-tests:
name: ${{ matrix.display-name }}
needs: [filter-matrix, build, L0-dynamo-converter-tests, L0-dynamo-core-tests, L0-py-core-tests, L0-torchscript-tests]
if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
strategy:
fail-fast: false
matrix:
include:
- repository: pytorch/tensorrt
package-name: torch_tensorrt
pre-script: packaging/pre_build_script.sh
post-script: packaging/post_build_script.sh
smoke-test-script: packaging/smoke_test_script.sh
display-name: L1 dynamo core tests
uses: ./.github/workflows/linux-test.yml
with:
job-name: L1-dynamo-core-tests
repository: "pytorch/tensorrt"
ref: ""
test-infra-repository: pytorch/test-infra
test-infra-ref: main
build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
pre-script: ${{ matrix.pre-script }}
script: |
set -euo pipefail
pushd .
cd tests/py/dynamo
python -m pytest -ra -n 8 --junitxml=${RUNNER_TEST_RESULTS_DIR}/l1_dynamo_core_tests_results.xml runtime/test_001_*
python -m pytest -ra -n 8 --junitxml=${RUNNER_TEST_RESULTS_DIR}/l1_dynamo_core_partitioning_tests_results.xml partitioning/test_001_*
L0-torchscript-tests:
Comment on lines +409 to +447
name: ${{ matrix.display-name }}
needs:
[
filter-matrix,
build,
L1-dynamo-compile-tests,
L1-dynamo-core-tests,
L1-torch-compile-tests,
L1-torchscript-tests,
]
if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
strategy:
fail-fast: false
matrix:
include:
- repository: pytorch/tensorrt
package-name: torch_tensorrt
pre-script: packaging/pre_build_script.sh
post-script: packaging/post_build_script.sh
smoke-test-script: packaging/smoke_test_script.sh
display-name: L2 dynamo compile tests
uses: ./.github/workflows/linux-test.yml
with:
job-name: L2-dynamo-compile-tests
repository: "pytorch/tensorrt"
ref: ""
test-infra-repository: pytorch/test-infra
test-infra-ref: main
build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
pre-script: ${{ matrix.pre-script }}
script: |
set -euo pipefail
pushd .
cd tests/py/dynamo/
python -m pytest -m "not critical" -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/l2_dynamo_compile_tests_results.xml models/
python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/l2_dynamo_compile_llm_tests_results.xml llm/
popd

L2-dynamo-plugin-tests:
name: ${{ matrix.display-name }}
needs: [filter-matrix, build, L1-dynamo-core-tests, L1-dynamo-compile-tests, L1-torch-compile-tests, L1-torchscript-tests]
if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
strategy:
fail-fast: false
matrix:
include:
- repository: pytorch/tensorrt
package-name: torch_tensorrt
pre-script: packaging/pre_build_script.sh
post-script: packaging/post_build_script.sh
smoke-test-script: packaging/smoke_test_script.sh
display-name: L2 dynamo plugin tests
uses: ./.github/workflows/linux-test.yml
with:
job-name: L2-dynamo-plugin-tests
repository: "pytorch/tensorrt"
ref: ""
test-infra-repository: pytorch/test-infra
test-infra-ref: main
build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
pre-script: ${{ matrix.pre-script }}
script: |
set -euo pipefail
pushd .
cd tests/py/dynamo
python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/dynamo_converters_test_results.xml -n 4 conversion/
python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/dynamo_converters_test_results.xml automatic_plugin/test_automatic_plugin.py
python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/dynamo_converters_test_results.xml automatic_plugin/test_automatic_plugin_with_attrs.py
python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/dynamo_converters_test_results.xml automatic_plugin/test_flashinfer_rmsnorm.py
popd
L2-dynamo-core-tests:
Comment on lines +448 to +485
name: ${{ matrix.display-name }}
needs:
[
filter-matrix,
build,
L1-dynamo-core-tests,
L1-dynamo-compile-tests,
L1-torch-compile-tests,
L1-torchscript-tests,
]
if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
strategy:
fail-fast: false
matrix:
include:
- repository: pytorch/tensorrt
package-name: torch_tensorrt
pre-script: packaging/pre_build_script.sh
post-script: packaging/post_build_script.sh
smoke-test-script: packaging/smoke_test_script.sh
display-name: L2 dynamo core tests
uses: ./.github/workflows/linux-test.yml
with:
job-name: L2-dynamo-core-tests
repository: "pytorch/tensorrt"
ref: ""
test-infra-repository: pytorch/test-infra
test-infra-ref: main
build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
pre-script: ${{ matrix.pre-script }}
script: |
set -euo pipefail
pushd .
cd tests/py/dynamo
python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/l2_dynamo_core_tests_results.xml -k "not test_000_ and not test_001_" runtime/*
popd

L2-torchscript-tests:
name: ${{ matrix.display-name }}
needs: [filter-matrix, build, L1-dynamo-core-tests, L1-dynamo-compile-tests, L1-torch-compile-tests, L1-torchscript-tests]
if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
strategy:
fail-fast: false
matrix:
include:
- repository: pytorch/tensorrt
package-name: torch_tensorrt
pre-script: packaging/pre_build_script.sh
post-script: packaging/post_build_script.sh
smoke-test-script: packaging/smoke_test_script.sh
display-name: L2 torch script tests
uses: ./.github/workflows/linux-test.yml
with:
job-name: L2-torchscript-tests
repository: "pytorch/tensorrt"
ref: ""
test-infra-repository: pytorch/test-infra
test-infra-ref: main
build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
pre-script: ${{ matrix.pre-script }}
script: |
set -euo pipefail
pushd .
cd tests/modules
python hub.py
popd
pushd .
cd tests/py/ts
python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/l2_ts_integrations_tests_results.xml integrations/
popd
L2-dynamo-plugin-tests:
Comment on lines +486 to +526
name: ${{ matrix.display-name }}
needs:
[
filter-matrix,
build,
L1-dynamo-core-tests,
L1-dynamo-compile-tests,
L1-torch-compile-tests,
L1-torchscript-tests,
]
if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
strategy:
fail-fast: false
matrix:
include:
- repository: pytorch/tensorrt
package-name: torch_tensorrt
pre-script: packaging/pre_build_script.sh
post-script: packaging/post_build_script.sh
smoke-test-script: packaging/smoke_test_script.sh
display-name: L2 dynamo plugin tests
uses: ./.github/workflows/linux-test.yml
with:
job-name: L2-dynamo-plugin-tests
repository: "pytorch/tensorrt"
ref: ""
test-infra-repository: pytorch/test-infra
test-infra-ref: main
build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
pre-script: ${{ matrix.pre-script }}
script: |
set -euo pipefail
pushd .
cd tests/py/dynamo
python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/dynamo_converters_test_results.xml -n 4 conversion/
python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/dynamo_converters_test_results.xml automatic_plugin/test_automatic_plugin.py
python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/dynamo_converters_test_results.xml automatic_plugin/test_automatic_plugin_with_attrs.py
python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/dynamo_converters_test_results.xml automatic_plugin/test_flashinfer_rmsnorm.py
popd

L2-torchscript-tests:
Comment on lines +527 to +568
name: ${{ matrix.display-name }}
needs:
[
filter-matrix,
build,
L1-dynamo-core-tests,
L1-dynamo-compile-tests,
L1-torch-compile-tests,
L1-torchscript-tests,
]
if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
strategy:
fail-fast: false
matrix:
include:
- repository: pytorch/tensorrt
package-name: torch_tensorrt
pre-script: packaging/pre_build_script.sh
post-script: packaging/post_build_script.sh
smoke-test-script: packaging/smoke_test_script.sh
display-name: L2 torch script tests
uses: ./.github/workflows/linux-test.yml
with:
job-name: L2-torchscript-tests
repository: "pytorch/tensorrt"
ref: ""
test-infra-repository: pytorch/test-infra
test-infra-ref: main
build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
pre-script: ${{ matrix.pre-script }}
script: |
set -euo pipefail
pushd .
cd tests/modules
python hub.py
popd
pushd .
cd tests/py/ts
python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/l2_ts_integrations_tests_results.xml integrations/
popd

```yaml
  L2-dynamo-distributed-tests:
    name: ${{ matrix.display-name }}
    needs: [filter-matrix, build, L1-dynamo-core-tests, L1-dynamo-compile-tests, L1-torch-compile-tests, L1-torchscript-tests]
    if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
    strategy:
      fail-fast: false
      matrix:
        include:
          - repository: pytorch/tensorrt
            package-name: torch_tensorrt
            pre-script: packaging/pre_build_script.sh
            post-script: packaging/post_build_script.sh
            smoke-test-script: packaging/smoke_test_script.sh
            display-name: L2 dynamo distributed tests
    uses: ./.github/workflows/linux-test.yml
    with:
      job-name: L2-dynamo-distributed-tests
      repository: "pytorch/tensorrt"
      ref: ""
      test-infra-repository: pytorch/test-infra
      test-infra-ref: main
      build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
      pre-script: ${{ matrix.pre-script }}
      script: |
        set -euo pipefail
        export USE_HOST_DEPS=1
        export CI_BUILD=1
        export USE_TRTLLM_PLUGINS=1
        dnf install -y mpich mpich-devel openmpi openmpi-devel
        pushd .
        cd tests/py
        cd dynamo
        python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/l2_dynamo_distributed_test_results.xml distributed/test_nccl_ops.py
        popd
```
Comment thread on .github/workflows/build-test-linux-x86_64.yml (marked as fixed)

@github-actions bot left a comment
There are some changes that do not conform to C++ style guidelines:

```diff
diff --git a/home/runner/work/TensorRT/TensorRT/core/runtime/TRTEngine.cpp b/tmp/changes.txt
index 4b91415..ae5232b 100644
--- a/home/runner/work/TensorRT/TensorRT/core/runtime/TRTEngine.cpp
+++ b/tmp/changes.txt
@@ -573,14 +573,16 @@ bool TRTEngine::bind_nccl_comm() {
     } else if (nccl_groups.size() > 1) {
       std::string names;
       for (const auto& n : nccl_groups) {
-        if (!names.empty()) names += ", ";
+        if (!names.empty())
+          names += ", ";
         names += "'" + n + "'";
       }
       LOG_WARNING(
           "This TRT engine requires NCCL but multiple NCCL process groups are registered ("
-          << names << "). Cannot auto-select a group — NCCL bind deferred. "
-          "Use the recommended workflow: "
-          "with torch_tensorrt.distributed.distributed_context(group, model) as m: m(inp)");
+          << names
+          << "). Cannot auto-select a group — NCCL bind deferred. "
+             "Use the recommended workflow: "
+             "with torch_tensorrt.distributed.distributed_context(group, model) as m: m(inp)");
     } else {
       LOG_WARNING(
           "This TRT engine requires NCCL (requires_native_multidevice=true) but no NCCL process group "
```

ERROR: Some files do not conform to style guidelines
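The warning in the diff above points users at the `torch_tensorrt.distributed.distributed_context` workflow. As a rough illustration of that pattern only (this is not the torch_tensorrt implementation; the dict-based "binding" step is a stand-in for attaching an NCCL communicator), a context manager that binds a process group to one or more models for the duration of a `with` block can be sketched as:

```python
# Hypothetical sketch of the distributed_context pattern, NOT the real API:
# bind a group to each model on entry, unbind on exit.
from contextlib import contextmanager

@contextmanager
def distributed_context(group, models):
    # Accept a single model or a list, mirroring the usage seen in the tests.
    single = not isinstance(models, (list, tuple))
    bound = [models] if single else list(models)
    for m in bound:
        m["group"] = group          # stand-in for binding an NCCL communicator
    try:
        yield bound[0] if single else tuple(bound)
    finally:
        for m in bound:
            m.pop("group", None)    # unbind on exit

model_a, model_b = {}, {}
with distributed_context("WORLD", [model_a, model_b]) as (ma, mb):
    assert ma["group"] == "WORLD" and mb["group"] == "WORLD"
assert "group" not in model_a
```

The list form matches the test usage shown later in this thread, where a loaded and a compiled model are bound to `dist.group.WORLD` together so their outputs can be compared under the same communicator.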

@github-actions bot left a comment
There are some changes that do not conform to Python style guidelines:

```diff
--- /home/runner/work/TensorRT/TensorRT/tests/py/dynamo/distributed/test_export_save_load.py	2026-04-21 18:14:22.530281+00:00
+++ /home/runner/work/TensorRT/TensorRT/tests/py/dynamo/distributed/test_export_save_load.py	2026-04-21 18:14:47.015206+00:00
@@ -299,11 +299,14 @@

     path = rank_path(save_dir, rank, world_size)
     loaded = torch_tensorrt.load(path)
     loaded_model = loaded.module()

-    with distributed_context(dist.group.WORLD, [loaded_model, compiled_model]) as (lm, cm):
+    with distributed_context(dist.group.WORLD, [loaded_model, compiled_model]) as (
+        lm,
+        cm,
+    ):
         with torch.no_grad():
             loaded_output = lm(inp)
             compiled_output = cm(inp)

     diff = float((compiled_output - loaded_output).abs().max())
```
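The test in this diff judges the save/load round trip by the maximum absolute element-wise difference between the compiled and reloaded outputs. The same check, sketched with plain Python lists instead of torch tensors (the `max_abs_diff` helper name is invented for this sketch):

```python
# Max absolute element-wise difference, the metric used by the test above,
# written against plain lists so it runs without torch.
def max_abs_diff(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

compiled_output = [0.10, 0.25, 0.50]
loaded_output = [0.10, 0.25, 0.50]

# A round trip that preserves the engine output exactly yields a diff of 0.0;
# in practice the test would compare against a small tolerance instead.
diff = max_abs_diff(compiled_output, loaded_output)
assert diff == 0.0
```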
Comment on lines +569 to +615
```yaml
    name: ${{ matrix.display-name }}
    needs:
      [
        filter-matrix,
        build,
        L1-dynamo-core-tests,
        L1-dynamo-compile-tests,
        L1-torch-compile-tests,
        L1-torchscript-tests,
      ]
    if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
    strategy:
      fail-fast: false
      matrix:
        include:
          - repository: pytorch/tensorrt
            package-name: torch_tensorrt
            pre-script: packaging/pre_build_script.sh
            post-script: packaging/post_build_script.sh
            smoke-test-script: packaging/smoke_test_script.sh
            display-name: L2 dynamo distributed tests
    uses: ./.github/workflows/linux-test.yml
    with:
      job-name: L2-dynamo-distributed-tests
      repository: "pytorch/tensorrt"
      ref: ""
      test-infra-repository: pytorch/test-infra
      test-infra-ref: main
      build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
      pre-script: ${{ matrix.pre-script }}
      runner: linux.g4dn.12xlarge.nvidia.gpu
      script: |
        set -euo pipefail
        export USE_HOST_DEPS=1
        export CI_BUILD=1
        export USE_TRTLLM_PLUGINS=1
        dnf install -y mpich mpich-devel openmpi openmpi-devel
        pushd .
        cd tests/py
        cd dynamo
        python -m pytest -ra -v --junitxml=${RUNNER_TEST_RESULTS_DIR}/l2_dynamo_distributed_test_results.xml \
          distributed/test_nccl_ops.py \
          distributed/test_native_nccl.py \
          distributed/test_export_save_load.py
        python -m torch_tensorrt.distributed.run --nproc_per_node=2 distributed/test_native_nccl.py --multirank
        python -m torch_tensorrt.distributed.run --nproc_per_node=2 distributed/test_export_save_load.py --multirank
        popd
```
@narendasan force-pushed the push-vqqzkszwrvyx branch 3 times, most recently from 28b53a3 to 0525934 on April 22, 2026 at 14:33
@narendasan merged commit 0b274f0 into main on April 22, 2026
81 of 86 checks passed
@narendasan deleted the push-vqqzkszwrvyx branch on April 22, 2026 at 17:43

Labels

- cla signed
- component: api [Python] (Issues re: Python API)
- component: build system (Issues re: Build system)
- component: conversion (Issues re: Conversion stage)
- component: converters (Issues re: Specific op converters)
- component: core (Issues re: The core compiler)
- component: dynamo (Issues relating to the `torch.compile` or `torch._dynamo.export` paths)
- component: lowering (Issues re: The lowering / preprocessing passes)
- component: runtime
- component: tests (Issues re: Tests)
- component: torch_compile
- documentation (Improvements or additions to documentation)

3 participants