MD-TRT Support, Compile/Export, C++ and Python #4183
Conversation
- C++ runtime: NCCL communicator init via c10d, rank/world_size serialization, DynamicOutputAllocator, ABI version bump to 8
- Python runtime: distributed support in PythonTorchTensorRTModule and TorchTensorRTModule, NCCL library auto-detection
- Conversion: native TRT DistCollective API (AllGather, ReduceScatter, AllReduce) with TRT-LLM plugin fallback
- Graph lowering: fuse c10d_functional collectives + wait_tensor into single ops
- Feature detection: native_trt_collectives flag, platform validation, graceful fallback chain
- Build: conditional NCCL compilation via torch_nccl toolchain
- Examples: tensor_parallel_simple_example.py, tensor_parallel_llama_llm.py
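The graph-lowering bullet above can be sketched with a toy node list. This is a schematic of the fusion idea only, not the torch_tensorrt pass: the `Node` class and the assumption that `wait_tensor` immediately follows its collective are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Node:
    op: str          # e.g. "all_gather", "wait_tensor", "matmul"
    args: tuple = ()


def fuse_collectives(nodes):
    """Fuse each async collective with its trailing wait_tensor into one
    fused op, mirroring the idea of collapsing c10d_functional pairs."""
    collectives = {"all_gather", "all_reduce", "reduce_scatter"}
    fused, i = [], 0
    while i < len(nodes):
        cur = nodes[i]
        nxt = nodes[i + 1] if i + 1 < len(nodes) else None
        if cur.op in collectives and nxt is not None and nxt.op == "wait_tensor":
            fused.append(Node("fused_" + cur.op, cur.args))
            i += 2  # consume both the collective and its wait
        else:
            fused.append(cur)
            i += 1
    return fused
```

A real pass works on an FX graph and must follow use-def edges to find the matching `wait_tensor`; the adjacency check here is the simplification.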
…g and enable DTensor decomposition
…hapes
Five interconnected fixes:
1. fold_get_attr_item_calls: fold scalar param .item() calls into Python scalars before AOT tracing. Inside FakeTensorMode, even real-tensor .item() calls raise DataDependentOutputException.
2. backends.py: three changes:
   - call fold_get_attr_item_calls before entering FakeTensorMode
   - detect vmap/higher-order ops and route them through aot_autograd instead of aot_export_joint_simple (which doesn't handle HOPs)
   - on TRT build failure, strip TRT-only kwargs (use_fp32_acc) from the fallback graph before returning it to PyTorch
3. _decompositions.py: prevent SDPA from leaking back into the decomposition table via Core ATen Interchange ops even after being removed from TORCH_TRT_DECOMPOSITIONS.
4. partitioning/common.py: lower the default max dynamic shape from min*2^16 to min*2^12, since 65536 is too large for TRT to find kernel implementations for attention ops.
5. _TorchTensorRTModule.py: move CPU scalar inputs to CUDA before execution. aot_autograd lifts scalar attributes (e.g. head_dim^-0.5) as explicit graph inputs, and TRT requires all inputs on CUDA.
Also fixes remove_sym_nodes to match tensor sources by equality rather than local_name, so that GetItemSource bases (from torch.compile dynamic=True) are matched correctly, and updates register_sdpa.py to handle aten.scaled_dot_product_attention.default (the form produced after aot_autograd) in addition to the flash/efficient variants.
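Fix 5 above is small enough to sketch. This is not the _TorchTensorRTModule.py code; the `FakeTensor` stand-in and device strings are illustrative of the "move CPU stragglers to CUDA before execution" idea only.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Minimal stand-in for a tensor that only tracks its device."""
    device: str

    def to(self, device):
        return FakeTensor(device)


def move_inputs_to_cuda(inputs):
    """Schematic of fix 5: aot_autograd lifts scalar attributes
    (e.g. head_dim**-0.5) into explicit graph inputs that arrive on CPU,
    while TRT execution requires every input on CUDA, so move stragglers."""
    return [t if t.device.startswith("cuda") else t.to("cuda:0") for t in inputs]
```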
There are some changes that do not conform to C++ style guidelines:
diff --git a/home/runner/work/TensorRT/TensorRT/core/runtime/TRTEngine.h b/tmp/changes.txt
index cd8af65..615600d 100644
--- a/home/runner/work/TensorRT/TensorRT/core/runtime/TRTEngine.h
+++ b/tmp/changes.txt
@@ -33,17 +33,17 @@ namespace core {
namespace runtime {
using FlattenedState = std::tuple<
- std::tuple<std::string, std::string>, // ABI_VERSION
- std::tuple<std::string, std::string>, // name
- std::tuple<std::string, std::string>, // device
- std::tuple<std::string, std::string>, // engine
- std::tuple<std::string, std::string>, // input binding names
- std::tuple<std::string, std::string>, // output binding names
- std::tuple<std::string, std::string>, // HW compatibility
- std::tuple<std::string, std::string>, // requires_output_allocator
- std::tuple<std::string, std::string>, // serialized metadata
- std::tuple<std::string, std::string>, // Platform
- std::tuple<std::string, std::string>, // Resource Allocation Strategy
+ std::tuple<std::string, std::string>, // ABI_VERSION
+ std::tuple<std::string, std::string>, // name
+ std::tuple<std::string, std::string>, // device
+ std::tuple<std::string, std::string>, // engine
+ std::tuple<std::string, std::string>, // input binding names
+ std::tuple<std::string, std::string>, // output binding names
+ std::tuple<std::string, std::string>, // HW compatibility
+ std::tuple<std::string, std::string>, // requires_output_allocator
+ std::tuple<std::string, std::string>, // serialized metadata
+ std::tuple<std::string, std::string>, // Platform
+ std::tuple<std::string, std::string>, // Resource Allocation Strategy
std::tuple<std::string, std::string>>; // requires_multidevice
struct TorchTRTRuntimeStates {
ERROR: Some files do not conform to style guidelines
There are some changes that do not conform to Python style guidelines:
--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/runtime/_PythonTorchTensorRTModule.py 2026-04-21 00:20:02.596692+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/runtime/_PythonTorchTensorRTModule.py 2026-04-21 00:20:22.271000+00:00
@@ -386,11 +386,10 @@
logger.debug(
"Barrier after execution context creation (distributed NCCL engine)"
)
dist.barrier()
-
if ENABLED_FEATURES.tensorrt_rtx:
self._setup_runtime_config()
self.context = self._create_context()
assert self.context is not None, "Failed to create execution context"
There are some changes that do not conform to C++ style guidelines:
diff --git a/home/runner/work/TensorRT/TensorRT/core/runtime/TRTEngine.cpp b/tmp/changes.txt
index 4b91415..ae5232b 100644
--- a/home/runner/work/TensorRT/TensorRT/core/runtime/TRTEngine.cpp
+++ b/tmp/changes.txt
@@ -573,14 +573,16 @@ bool TRTEngine::bind_nccl_comm() {
} else if (nccl_groups.size() > 1) {
std::string names;
for (const auto& n : nccl_groups) {
- if (!names.empty()) names += ", ";
+ if (!names.empty())
+ names += ", ";
names += "'" + n + "'";
}
LOG_WARNING(
"This TRT engine requires NCCL but multiple NCCL process groups are registered ("
- << names << "). Cannot auto-select a group — NCCL bind deferred. "
- "Use the recommended workflow: "
- "with torch_tensorrt.distributed.distributed_context(group, model) as m: m(inp)");
+ << names
+ << "). Cannot auto-select a group — NCCL bind deferred. "
+ "Use the recommended workflow: "
+ "with torch_tensorrt.distributed.distributed_context(group, model) as m: m(inp)");
} else {
LOG_WARNING(
"This TRT engine requires NCCL (requires_native_multidevice=true) but no NCCL process group "
ERROR: Some files do not conform to style guidelines
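The warning text in the diff above recommends an explicit context-manager workflow for selecting an NCCL group. A pure-Python sketch of the bind-on-enter / unbind-on-exit semantics that workflow implies; the real torch_tensorrt.distributed.distributed_context binds an NCCL communicator to a TRT engine, and `EngineStub` here is an illustrative stand-in:

```python
from contextlib import contextmanager


class EngineStub:
    """Illustrative stand-in for an engine with requires_native_multidevice=true."""

    def __init__(self):
        self.nccl_group = None

    def __call__(self, inp):
        if self.nccl_group is None:
            raise RuntimeError("NCCL bind deferred: no process group selected")
        return inp  # a real engine would launch the TRT execution context here


@contextmanager
def distributed_context(group, model):
    """Bind the chosen process group for the duration of the block, so the
    runtime never has to auto-select among multiple registered NCCL groups."""
    model.nccl_group = group
    try:
        yield model
    finally:
        model.nccl_group = None
```

Usage mirrors the warning's recommendation: `with distributed_context(group, engine) as m: m(inp)`.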
There are some changes that do not conform to C++ style guidelines:
diff --git a/home/runner/work/TensorRT/TensorRT/core/runtime/TRTEngine.cpp b/tmp/changes.txt
index 4b91415..ae5232b 100644
--- a/home/runner/work/TensorRT/TensorRT/core/runtime/TRTEngine.cpp
+++ b/tmp/changes.txt
@@ -573,14 +573,16 @@ bool TRTEngine::bind_nccl_comm() {
} else if (nccl_groups.size() > 1) {
std::string names;
for (const auto& n : nccl_groups) {
- if (!names.empty()) names += ", ";
+ if (!names.empty())
+ names += ", ";
names += "'" + n + "'";
}
LOG_WARNING(
"This TRT engine requires NCCL but multiple NCCL process groups are registered ("
- << names << "). Cannot auto-select a group — NCCL bind deferred. "
- "Use the recommended workflow: "
- "with torch_tensorrt.distributed.distributed_context(group, model) as m: m(inp)");
+ << names
+ << "). Cannot auto-select a group — NCCL bind deferred. "
+ "Use the recommended workflow: "
+ "with torch_tensorrt.distributed.distributed_context(group, model) as m: m(inp)");
} else {
LOG_WARNING(
"This TRT engine requires NCCL (requires_native_multidevice=true) but no NCCL process group "
ERROR: Some files do not conform to style guidelinesThere was a problem hiding this comment.
There are some changes that do not conform to C++ style guidelines:
diff --git a/home/runner/work/TensorRT/TensorRT/core/runtime/TRTEngine.cpp b/tmp/changes.txt
index 4b91415..ae5232b 100644
--- a/home/runner/work/TensorRT/TensorRT/core/runtime/TRTEngine.cpp
+++ b/tmp/changes.txt
@@ -573,14 +573,16 @@ bool TRTEngine::bind_nccl_comm() {
} else if (nccl_groups.size() > 1) {
std::string names;
for (const auto& n : nccl_groups) {
- if (!names.empty()) names += ", ";
+ if (!names.empty())
+ names += ", ";
names += "'" + n + "'";
}
LOG_WARNING(
"This TRT engine requires NCCL but multiple NCCL process groups are registered ("
- << names << "). Cannot auto-select a group — NCCL bind deferred. "
- "Use the recommended workflow: "
- "with torch_tensorrt.distributed.distributed_context(group, model) as m: m(inp)");
+ << names
+ << "). Cannot auto-select a group — NCCL bind deferred. "
+ "Use the recommended workflow: "
+ "with torch_tensorrt.distributed.distributed_context(group, model) as m: m(inp)");
} else {
LOG_WARNING(
"This TRT engine requires NCCL (requires_native_multidevice=true) but no NCCL process group "
ERROR: Some files do not conform to style guidelinesThere was a problem hiding this comment.
There are some changes that do not conform to C++ style guidelines:
diff --git a/home/runner/work/TensorRT/TensorRT/core/runtime/TRTEngine.cpp b/tmp/changes.txt
index 4b91415..ae5232b 100644
--- a/home/runner/work/TensorRT/TensorRT/core/runtime/TRTEngine.cpp
+++ b/tmp/changes.txt
@@ -573,14 +573,16 @@ bool TRTEngine::bind_nccl_comm() {
} else if (nccl_groups.size() > 1) {
std::string names;
for (const auto& n : nccl_groups) {
- if (!names.empty()) names += ", ";
+ if (!names.empty())
+ names += ", ";
names += "'" + n + "'";
}
LOG_WARNING(
"This TRT engine requires NCCL but multiple NCCL process groups are registered ("
- << names << "). Cannot auto-select a group — NCCL bind deferred. "
- "Use the recommended workflow: "
- "with torch_tensorrt.distributed.distributed_context(group, model) as m: m(inp)");
+ << names
+ << "). Cannot auto-select a group — NCCL bind deferred. "
+ "Use the recommended workflow: "
+ "with torch_tensorrt.distributed.distributed_context(group, model) as m: m(inp)");
} else {
LOG_WARNING(
"This TRT engine requires NCCL (requires_native_multidevice=true) but no NCCL process group "
generate-matrix:
  uses: pytorch/test-infra/.github/workflows/generate_binary_build_matrix.yml@main
  with:
    package-type: wheel
    os: linux
    test-infra-repository: pytorch/test-infra
    test-infra-ref: main
    with-rocm: false
    with-cpu: false
build:
  needs: filter-matrix
  permissions:
    id-token: write
    contents: read
  strategy:
    fail-fast: false
    matrix:
      include:
        - repository: pytorch/tensorrt
          pre-script: packaging/pre_build_script.sh
          env-var-script: packaging/env_vars.txt
          post-script: packaging/post_build_script.sh
          smoke-test-script: packaging/smoke_test_script.sh
          package-name: torch_tensorrt
          display-name: Build Linux x86_64 torch-tensorrt whl package
  name: ${{ matrix.display-name }}
  uses: ./.github/workflows/build_linux.yml
  with:
    repository: ${{ matrix.repository }}
    ref: ""
    test-infra-repository: pytorch/test-infra
    test-infra-ref: main
    build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
    pre-script: ${{ matrix.pre-script }}
    env-var-script: ${{ matrix.env-var-script }}
    post-script: ${{ matrix.post-script }}
    package-name: ${{ matrix.package-name }}
    smoke-test-script: ${{ matrix.smoke-test-script }}
    trigger-event: ${{ github.event_name }}
    architecture: "x86_64"
    use-rtx: false
    pip-install-torch-extra-args: "--extra-index-url https://pypi.org/simple"
filter-matrix:
  needs: [generate-matrix]
  outputs:
    matrix: ${{ steps.generate.outputs.matrix }}
  runs-on: ubuntu-latest
  steps:
    - uses: actions/setup-python@v6
      with:
        python-version: "3.11"
    - uses: actions/checkout@v6
      with:
        repository: pytorch/tensorrt
    - name: Generate matrix
      id: generate
      run: |
        set -eou pipefail
        MATRIX_BLOB=${{ toJSON(needs.generate-matrix.outputs.matrix) }}
        MATRIX_BLOB="$(python3 .github/scripts/filter-matrix.py --matrix "${MATRIX_BLOB}")"
        echo "${MATRIX_BLOB}"
        echo "matrix=${MATRIX_BLOB}" >> "${GITHUB_OUTPUT}"
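The Generate matrix step pipes the upstream build matrix through .github/scripts/filter-matrix.py. That script's actual filtering rules are not shown here; this is only a minimal sketch of the JSON-in/JSON-out contract the step relies on, and the python_version criterion is a made-up example.

```python
import json


def filter_matrix(matrix_blob: str, allowed_python=("3.11", "3.12")) -> str:
    """Consume the generate-matrix JSON blob and emit a filtered blob.
    The real script's criteria are repo-specific; filtering on
    python_version here is purely illustrative."""
    matrix = json.loads(matrix_blob)
    matrix["include"] = [
        entry for entry in matrix.get("include", [])
        if entry.get("python_version") in allowed_python
    ]
    return json.dumps(matrix)
```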
L0-dynamo-converter-tests:
  name: ${{ matrix.display-name }}
  needs: [filter-matrix, build]
  if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
  strategy:
    fail-fast: false
    matrix:
      include:
        - repository: pytorch/tensorrt
          package-name: torch_tensorrt
          pre-script: packaging/pre_build_script.sh
          post-script: packaging/post_build_script.sh
          smoke-test-script: packaging/smoke_test_script.sh
          display-name: L0 dynamo converter tests
  uses: ./.github/workflows/linux-test.yml
  with:
    job-name: L0-dynamo-converter-tests
    repository: "pytorch/tensorrt"
    ref: ""
    test-infra-repository: pytorch/test-infra
    test-infra-ref: main
    build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
    pre-script: ${{ matrix.pre-script }}
    script: |
      set -euo pipefail
      pushd .
      cd tests/py/dynamo
      python -m pytest -ra -n 8 --junitxml=${RUNNER_TEST_RESULTS_DIR}/l0_dynamo_converter_tests_results.xml --dist=loadscope --maxfail=20 conversion/
      popd
L0-py-core-tests:
  name: ${{ matrix.display-name }}
  needs: [filter-matrix, build]
  if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
  strategy:
    fail-fast: false
    matrix:
      include:
        - repository: pytorch/tensorrt
          package-name: torch_tensorrt
          pre-script: packaging/pre_build_script.sh
          post-script: packaging/post_build_script.sh
          smoke-test-script: packaging/smoke_test_script.sh
          display-name: L0 core python tests
  uses: ./.github/workflows/linux-test.yml
  with:
    job-name: L0-py-core-tests
    repository: "pytorch/tensorrt"
    ref: ""
    test-infra-repository: pytorch/test-infra
    test-infra-ref: main
    build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
    pre-script: ${{ matrix.pre-script }}
    script: |
      set -euo pipefail
      pushd .
      cd tests/py/core
      python -m pytest -ra -n 8 --junitxml=${RUNNER_TEST_RESULTS_DIR}/l0_py_core_tests_results.xml .
      popd
L0-dynamo-core-tests:
  name: ${{ matrix.display-name }}
  needs: [filter-matrix, build]
  if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
  strategy:
    fail-fast: false
    matrix:
      include:
        - repository: pytorch/tensorrt
          package-name: torch_tensorrt
          pre-script: packaging/pre_build_script.sh
          post-script: packaging/post_build_script.sh
          smoke-test-script: packaging/smoke_test_script.sh
          display-name: L0 dynamo core tests
  uses: ./.github/workflows/linux-test.yml
  with:
    job-name: L0-dynamo-core-tests
    repository: "pytorch/tensorrt"
    ref: ""
    test-infra-repository: pytorch/test-infra
    test-infra-ref: main
    build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
    pre-script: ${{ matrix.pre-script }}
    script: |
      set -euo pipefail
      pushd .
      cd tests/py
      cd dynamo
      python -m pytest -ra -n 8 --junitxml=${RUNNER_TEST_RESULTS_DIR}/l0_dynamo_core_runtime_tests_results.xml runtime/test_000_*
      python -m pytest -ra -n 8 --junitxml=${RUNNER_TEST_RESULTS_DIR}/l0_dynamo_core_partitioning_tests_results.xml partitioning/test_000_*
      python -m pytest -ra -n 8 --junitxml=${RUNNER_TEST_RESULTS_DIR}/l0_dynamo_core_lowering_tests_results.xml lowering/
      popd
L0-torchscript-tests:
  name: ${{ matrix.display-name }}
  needs: [filter-matrix, build]
  if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
  strategy:
    fail-fast: false
    matrix:
      include:
        - repository: pytorch/tensorrt
          package-name: torch_tensorrt
          pre-script: packaging/pre_build_script.sh
          post-script: packaging/post_build_script.sh
          smoke-test-script: packaging/smoke_test_script.sh
          display-name: L0 torchscript tests
  uses: ./.github/workflows/linux-test.yml
  with:
    job-name: L0-torchscript-tests
    repository: "pytorch/tensorrt"
    ref: ""
    test-infra-repository: pytorch/test-infra
    test-infra-ref: main
    build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
    pre-script: ${{ matrix.pre-script }}
    script: |
      set -euo pipefail
      pushd .
      cd tests/modules
      python hub.py
      popd
      pushd .
      cd tests/py/ts
      python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/l0_ts_api_tests_results.xml api/
      popd
L1-dynamo-core-tests:
  name: ${{ matrix.display-name }}
  needs: [filter-matrix, build, L0-dynamo-converter-tests, L0-dynamo-core-tests, L0-py-core-tests, L0-torchscript-tests]
  if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
  strategy:
    fail-fast: false
    matrix:
      include:
        - repository: pytorch/tensorrt
          package-name: torch_tensorrt
          pre-script: packaging/pre_build_script.sh
          post-script: packaging/post_build_script.sh
          smoke-test-script: packaging/smoke_test_script.sh
          display-name: L1 dynamo core tests
  uses: ./.github/workflows/linux-test.yml
  with:
    job-name: L1-dynamo-core-tests
    repository: "pytorch/tensorrt"
    ref: ""
    test-infra-repository: pytorch/test-infra
    test-infra-ref: main
    build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
    pre-script: ${{ matrix.pre-script }}
    script: |
      set -euo pipefail
      pushd .
      cd tests/py/dynamo
      python -m pytest -ra -n 8 --junitxml=${RUNNER_TEST_RESULTS_DIR}/l1_dynamo_core_tests_results.xml runtime/test_001_*
      python -m pytest -ra -n 8 --junitxml=${RUNNER_TEST_RESULTS_DIR}/l1_dynamo_core_partitioning_tests_results.xml partitioning/test_001_*
L2-dynamo-compile-tests:
  name: ${{ matrix.display-name }}
  needs: [filter-matrix, build, L1-dynamo-compile-tests, L1-dynamo-core-tests, L1-torch-compile-tests, L1-torchscript-tests]
  if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
  strategy:
    fail-fast: false
    matrix:
      include:
        - repository: pytorch/tensorrt
          package-name: torch_tensorrt
          pre-script: packaging/pre_build_script.sh
          post-script: packaging/post_build_script.sh
          smoke-test-script: packaging/smoke_test_script.sh
          display-name: L2 dynamo compile tests
  uses: ./.github/workflows/linux-test.yml
  with:
    job-name: L2-dynamo-compile-tests
    repository: "pytorch/tensorrt"
    ref: ""
    test-infra-repository: pytorch/test-infra
    test-infra-ref: main
    build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
    pre-script: ${{ matrix.pre-script }}
    script: |
      set -euo pipefail
      pushd .
      cd tests/py/dynamo/
      python -m pytest -m "not critical" -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/l2_dynamo_compile_tests_results.xml models/
      python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/l2_dynamo_compile_llm_tests_results.xml llm/
      popd
L2-dynamo-plugin-tests:
  name: ${{ matrix.display-name }}
  needs: [filter-matrix, build, L1-dynamo-core-tests, L1-dynamo-compile-tests, L1-torch-compile-tests, L1-torchscript-tests]
  if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
  strategy:
    fail-fast: false
    matrix:
      include:
        - repository: pytorch/tensorrt
          package-name: torch_tensorrt
          pre-script: packaging/pre_build_script.sh
          post-script: packaging/post_build_script.sh
          smoke-test-script: packaging/smoke_test_script.sh
          display-name: L2 dynamo plugin tests
  uses: ./.github/workflows/linux-test.yml
  with:
    job-name: L2-dynamo-plugin-tests
    repository: "pytorch/tensorrt"
    ref: ""
    test-infra-repository: pytorch/test-infra
    test-infra-ref: main
    build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
    pre-script: ${{ matrix.pre-script }}
    script: |
      set -euo pipefail
      pushd .
      cd tests/py/dynamo
      python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/dynamo_converters_test_results.xml -n 4 conversion/
      python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/dynamo_converters_test_results.xml automatic_plugin/test_automatic_plugin.py
      python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/dynamo_converters_test_results.xml automatic_plugin/test_automatic_plugin_with_attrs.py
      python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/dynamo_converters_test_results.xml automatic_plugin/test_flashinfer_rmsnorm.py
      popd
L2-dynamo-core-tests:
  name: ${{ matrix.display-name }}
  needs: [filter-matrix, build, L1-dynamo-core-tests, L1-dynamo-compile-tests, L1-torch-compile-tests, L1-torchscript-tests]
  if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
  strategy:
    fail-fast: false
    matrix:
      include:
        - repository: pytorch/tensorrt
          package-name: torch_tensorrt
          pre-script: packaging/pre_build_script.sh
          post-script: packaging/post_build_script.sh
          smoke-test-script: packaging/smoke_test_script.sh
          display-name: L2 dynamo core tests
  uses: ./.github/workflows/linux-test.yml
  with:
    job-name: L2-dynamo-core-tests
    repository: "pytorch/tensorrt"
    ref: ""
    test-infra-repository: pytorch/test-infra
    test-infra-ref: main
    build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
    pre-script: ${{ matrix.pre-script }}
    script: |
      set -euo pipefail
      pushd .
      cd tests/py/dynamo
      python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/l2_dynamo_core_tests_results.xml -k "not test_000_ and not test_001_" runtime/*
      popd
L2-torchscript-tests:
  name: ${{ matrix.display-name }}
  needs: [filter-matrix, build, L1-dynamo-core-tests, L1-dynamo-compile-tests, L1-torch-compile-tests, L1-torchscript-tests]
  if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
  strategy:
    fail-fast: false
    matrix:
      include:
        - repository: pytorch/tensorrt
          package-name: torch_tensorrt
          pre-script: packaging/pre_build_script.sh
          post-script: packaging/post_build_script.sh
          smoke-test-script: packaging/smoke_test_script.sh
          display-name: L2 torch script tests
  uses: ./.github/workflows/linux-test.yml
  with:
    job-name: L2-torchscript-tests
    repository: "pytorch/tensorrt"
    ref: ""
    test-infra-repository: pytorch/test-infra
    test-infra-ref: main
    build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
    pre-script: ${{ matrix.pre-script }}
    script: |
      set -euo pipefail
      pushd .
      cd tests/modules
      python hub.py
      popd
      pushd .
      cd tests/py/ts
      python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/l2_ts_integrations_tests_results.xml integrations/
      popd
L2-dynamo-distributed-tests:
  name: ${{ matrix.display-name }}
  needs: [filter-matrix, build, L1-dynamo-core-tests, L1-dynamo-compile-tests, L1-torch-compile-tests, L1-torchscript-tests]
  if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
  strategy:
    fail-fast: false
    matrix:
      include:
        - repository: pytorch/tensorrt
          package-name: torch_tensorrt
          pre-script: packaging/pre_build_script.sh
          post-script: packaging/post_build_script.sh
          smoke-test-script: packaging/smoke_test_script.sh
          display-name: L2 dynamo distributed tests
  uses: ./.github/workflows/linux-test.yml
  with:
    job-name: L2-dynamo-distributed-tests
    repository: "pytorch/tensorrt"
    ref: ""
    test-infra-repository: pytorch/test-infra
    test-infra-ref: main
    build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
    pre-script: ${{ matrix.pre-script }}
    script: |
      set -euo pipefail
      export USE_HOST_DEPS=1
      export CI_BUILD=1
      export USE_TRTLLM_PLUGINS=1
      dnf install -y mpich mpich-devel openmpi openmpi-devel
      pushd .
      cd tests/py
      cd dynamo
      python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/l2_dynamo_distributed_test_results.xml distributed/test_nccl_ops.py
      popd
There was a problem hiding this comment.
There are some changes that do not conform to C++ style guidelines:
diff --git a/home/runner/work/TensorRT/TensorRT/core/runtime/TRTEngine.cpp b/tmp/changes.txt
index 4b91415..ae5232b 100644
--- a/home/runner/work/TensorRT/TensorRT/core/runtime/TRTEngine.cpp
+++ b/tmp/changes.txt
@@ -573,14 +573,16 @@ bool TRTEngine::bind_nccl_comm() {
} else if (nccl_groups.size() > 1) {
std::string names;
for (const auto& n : nccl_groups) {
- if (!names.empty()) names += ", ";
+ if (!names.empty())
+ names += ", ";
names += "'" + n + "'";
}
LOG_WARNING(
"This TRT engine requires NCCL but multiple NCCL process groups are registered ("
- << names << "). Cannot auto-select a group — NCCL bind deferred. "
- "Use the recommended workflow: "
- "with torch_tensorrt.distributed.distributed_context(group, model) as m: m(inp)");
+ << names
+ << "). Cannot auto-select a group — NCCL bind deferred. "
+ "Use the recommended workflow: "
+ "with torch_tensorrt.distributed.distributed_context(group, model) as m: m(inp)");
} else {
LOG_WARNING(
"This TRT engine requires NCCL (requires_native_multidevice=true) but no NCCL process group "
ERROR: Some files do not conform to style guidelines
Force-pushed 3b62346 to c3dcb1c.
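For readers following the style diff, the control flow it touches is small. Below is a hedged Python sketch of the same three-way branch in `TRTEngine::bind_nccl_comm`; `select_nccl_group` is a hypothetical name, and only the branch structure and warning wording come from the C++ above:

```python
import logging

logger = logging.getLogger("torch_tensorrt.runtime")


def select_nccl_group(nccl_groups):
    """Mirror of the branch in TRTEngine::bind_nccl_comm: auto-select a
    process group only when exactly one NCCL group is registered."""
    if len(nccl_groups) == 1:
        # Unambiguous: bind to the only registered group.
        return nccl_groups[0]
    if len(nccl_groups) > 1:
        # Ambiguous: list the candidates and defer the bind.
        names = ", ".join("'" + n + "'" for n in nccl_groups)
        logger.warning(
            "This TRT engine requires NCCL but multiple NCCL process groups "
            "are registered (%s). Cannot auto-select a group; NCCL bind deferred.",
            names,
        )
        return None
    logger.warning(
        "This TRT engine requires NCCL (requires_native_multidevice=true) "
        "but no NCCL process group is registered."
    )
    return None
```

In the ambiguous case the engine defers and expects the caller to pick a group explicitly via the `distributed_context` workflow named in the warning.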
There are some changes that do not conform to Python style guidelines:
--- /home/runner/work/TensorRT/TensorRT/tests/py/dynamo/distributed/test_export_save_load.py 2026-04-21 18:14:22.530281+00:00
+++ /home/runner/work/TensorRT/TensorRT/tests/py/dynamo/distributed/test_export_save_load.py 2026-04-21 18:14:47.015206+00:00
@@ -299,11 +299,14 @@
path = rank_path(save_dir, rank, world_size)
loaded = torch_tensorrt.load(path)
loaded_model = loaded.module()
- with distributed_context(dist.group.WORLD, [loaded_model, compiled_model]) as (lm, cm):
+ with distributed_context(dist.group.WORLD, [loaded_model, compiled_model]) as (
+ lm,
+ cm,
+ ):
with torch.no_grad():
loaded_output = lm(inp)
compiled_output = cm(inp)
diff = float((compiled_output - loaded_output).abs().max())
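The test diff shows `distributed_context` unpacking to multiple models when given a list. Here is a minimal pure-Python sketch of that yield convention; this stand-in is hypothetical and omits the actual NCCL bind/unbind that `torch_tensorrt.distributed.distributed_context` performs:

```python
from contextlib import contextmanager


@contextmanager
def distributed_context_sketch(group, model_or_models):
    # Hypothetical stand-in: the real context manager binds `group`'s NCCL
    # communicator to each TRT engine on entry and releases it on exit.
    is_many = isinstance(model_or_models, (list, tuple))
    models = list(model_or_models) if is_many else [model_or_models]
    try:
        # Yield in the same shape the caller passed, so both
        # `as (lm, cm)` (list input) and `as m` (single input) work.
        yield tuple(models) if is_many else models[0]
    finally:
        pass  # communicator unbind would happen here
```

This matches the two usages seen in this PR: the single-model form from the C++ warning message and the multi-model form from `test_export_save_load.py`.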
Force-pushed c3dcb1c to b565101.
There are some changes that do not conform to C++ style guidelines:
diff --git a/home/runner/work/TensorRT/TensorRT/core/runtime/TRTEngine.cpp b/tmp/changes.txt
index 4b91415..ae5232b 100644
--- a/home/runner/work/TensorRT/TensorRT/core/runtime/TRTEngine.cpp
+++ b/tmp/changes.txt
@@ -573,14 +573,16 @@ bool TRTEngine::bind_nccl_comm() {
} else if (nccl_groups.size() > 1) {
std::string names;
for (const auto& n : nccl_groups) {
- if (!names.empty()) names += ", ";
+ if (!names.empty())
+ names += ", ";
names += "'" + n + "'";
}
LOG_WARNING(
"This TRT engine requires NCCL but multiple NCCL process groups are registered ("
- << names << "). Cannot auto-select a group — NCCL bind deferred. "
- "Use the recommended workflow: "
- "with torch_tensorrt.distributed.distributed_context(group, model) as m: m(inp)");
+ << names
+ << "). Cannot auto-select a group — NCCL bind deferred. "
+ "Use the recommended workflow: "
+ "with torch_tensorrt.distributed.distributed_context(group, model) as m: m(inp)");
} else {
LOG_WARNING(
"This TRT engine requires NCCL (requires_native_multidevice=true) but no NCCL process group "
ERROR: Some files do not conform to style guidelinesThere was a problem hiding this comment.
There are some changes that do not conform to Python style guidelines:
--- /home/runner/work/TensorRT/TensorRT/tests/py/dynamo/distributed/test_export_save_load.py 2026-04-21 19:29:54.973984+00:00
+++ /home/runner/work/TensorRT/TensorRT/tests/py/dynamo/distributed/test_export_save_load.py 2026-04-21 19:30:16.927307+00:00
@@ -299,11 +299,14 @@
path = rank_path(save_dir, rank, world_size)
loaded = torch_tensorrt.load(path)
loaded_model = loaded.module()
- with distributed_context(dist.group.WORLD, [loaded_model, compiled_model]) as (lm, cm):
+ with distributed_context(dist.group.WORLD, [loaded_model, compiled_model]) as (
+ lm,
+ cm,
+ ):
with torch.no_grad():
loaded_output = lm(inp)
compiled_output = cm(inp)
diff = float((compiled_output - loaded_output).abs().max())| name: ${{ matrix.display-name }} | ||
  needs:
    [
      filter-matrix,
      build,
      L1-dynamo-core-tests,
      L1-dynamo-compile-tests,
      L1-torch-compile-tests,
      L1-torchscript-tests,
    ]
  if: ${{ (github.ref_name == 'main' || github.ref_name == 'nightly' || contains(github.event.pull_request.labels.*.name, 'Force All Tests[L0+L1+L2]')) && always() || success() }}
  strategy:
    fail-fast: false
    matrix:
      include:
        - repository: pytorch/tensorrt
          package-name: torch_tensorrt
          pre-script: packaging/pre_build_script.sh
          post-script: packaging/post_build_script.sh
          smoke-test-script: packaging/smoke_test_script.sh
          display-name: L2 dynamo distributed tests
  uses: ./.github/workflows/linux-test.yml
  with:
    job-name: L2-dynamo-distributed-tests
    repository: "pytorch/tensorrt"
    ref: ""
    test-infra-repository: pytorch/test-infra
    test-infra-ref: main
    build-matrix: ${{ needs.filter-matrix.outputs.matrix }}
    pre-script: ${{ matrix.pre-script }}
    runner: linux.g4dn.12xlarge.nvidia.gpu
    script: |
      set -euo pipefail
      export USE_HOST_DEPS=1
      export CI_BUILD=1
      export USE_TRTLLM_PLUGINS=1
      dnf install -y mpich mpich-devel openmpi openmpi-devel
      pushd .
      cd tests/py
      cd dynamo
      python -m pytest -ra -v --junitxml=${RUNNER_TEST_RESULTS_DIR}/l2_dynamo_distributed_test_results.xml \
        distributed/test_nccl_ops.py \
        distributed/test_native_nccl.py \
        distributed/test_export_save_load.py
      python -m torch_tensorrt.distributed.run --nproc_per_node=2 distributed/test_native_nccl.py --multirank
      python -m torch_tensorrt.distributed.run --nproc_per_node=2 distributed/test_export_save_load.py --multirank
      popd
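Both CI scripts wrap the test run in a `pushd .` / `popd` pair so the starting directory is restored afterwards. A minimal standalone illustration of that pattern (the directory used here is arbitrary):

```shell
#!/usr/bin/env bash
set -euo pipefail
start_dir="$(pwd)"
pushd . > /dev/null   # save the current directory on the stack
cd /tmp               # the CI scripts cd into tests/py/dynamo at this point
popd > /dev/null      # pop the stack: return to the saved directory
[ "$(pwd)" = "$start_dir" ] && echo "restored"
```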
…istributed if it's available
Force-pushed 28b53a3 to 0525934.
Force-pushed 0525934 to 2f05500.
Description
Opening this to test the CI.
Fixes # (issue)
Type of change
Please delete options that are not relevant and/or add your own.
Checklist: