feat: add runtime cache API for TensorRT-RTX #4180

tp5uiuc wants to merge 2 commits into pytorch:main
Conversation
Add runtime cache support for TensorRT-RTX JIT compilation results, replacing the timing cache, which is not used by RTX (no autotuning).

Changes:
- Skip timing cache creation/saving for TensorRT-RTX in _TRTInterpreter
- Add RUNTIME_CACHE_PATH default and runtime_cache_path setting
- Wire up IRuntimeCache in PythonTorchTensorRTModule (setup, load, save)
- Persist runtime cache to disk with filelock for concurrent access safety
- Thread runtime_cache_path through all compile functions
- Add unit tests (12 tests) and E2E model tests (6 tests)
- Update docstrings and RST documentation

Fixes pytorch#3817

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The timing cache is **not used with TensorRT-RTX**, which does not perform
autotuning. For TensorRT-RTX, see the *Runtime Cache* section below.

Runtime Cache (TensorRT-RTX)
I have added the runtime cache to both APIs and docs, but these are shared between Enterprise and RTX TensorRT. I don't know if that's OK.
@@ -0,0 +1,287 @@
import gc
Let me know if the filename needs changing
@@ -0,0 +1,329 @@
import gc
Do these tests automatically get picked up, or is there a test list that we should add the new tests to?
    logger.debug(f"No existing runtime cache at {self.runtime_cache_path}")
    return
try:
    from filelock import FileLock
filelock is a torch dependency already, so we are not introducing additional dependencies here just for this feature. The version will be kept generic enough so that torch is the one providing the right version.
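The load/save flow guarded by `filelock` can be sketched in isolation. This is an illustrative pattern, not the actual module internals: the paths and helper names (`save_cache`, `load_cache`) are hypothetical, and `os.replace` is added here to make the on-disk swap atomic on top of the lock.

```python
# Minimal sketch of filelock-guarded cache persistence (hypothetical helpers).
import os
import tempfile

from filelock import FileLock

cache_path = os.path.join(tempfile.gettempdir(), "rtx_runtime.cache")
lock = FileLock(cache_path + ".lock")  # sidecar lock file shared by processes


def save_cache(data: bytes) -> None:
    # Serialize writers: only one process updates the cache file at a time.
    with lock:
        tmp = cache_path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
        os.replace(tmp, cache_path)  # atomic rename, so readers never see a torn file


def load_cache() -> bytes:
    if not os.path.exists(cache_path):
        return b""  # no existing runtime cache yet; start fresh
    with lock:
        with open(cache_path, "rb") as f:
            return f.read()


save_cache(b"serialized-runtime-cache")
print(load_cache())  # b'serialized-runtime-cache'
```

The sidecar `.lock` file means readers and writers in different processes block each other only around the actual file I/O, which is what makes sharing one cache path across workers safe.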
Version provided by upstream torch; no pin needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
    ENABLED_FEATURES.tensorrt_rtx,
    "This test verifies standard TRT behavior (non-RTX)",
)
class TestNonRTXUnchanged(TestCase):
This can be removed, let me know (I asked Claude to be extra defensive).
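The decorator in the hunk above follows the standard `unittest` conditional-skip pattern. A self-contained sketch, with a stand-in flag in place of `ENABLED_FEATURES.tensorrt_rtx`:

```python
# Illustration of the conditional-skip pattern; TENSORRT_RTX stands in
# for ENABLED_FEATURES.tensorrt_rtx.
import unittest

TENSORRT_RTX = False  # pretend we are on standard (non-RTX) TensorRT


@unittest.skipIf(TENSORRT_RTX, "This test verifies standard TRT behavior (non-RTX)")
class TestNonRTXUnchanged(unittest.TestCase):
    def test_placeholder(self):
        self.assertTrue(True)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNonRTXUnchanged)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
print(outcome.wasSuccessful())  # True: flag is False, so the test ran
```

With the flag set to `True`, the whole class is reported as skipped rather than failed, which is why such a guard is harmless even if it later turns out to be unnecessary.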
--extra-index-url https://pypi.ngc.nvidia.com
pyyaml
dllist
filelock
Torch doesn't pin filelock either, so there should be no dependency resolution failures, I think.
base_requirements = [
    "packaging>=23",
    "typing-extensions>=4.7.0",
    "filelock",
uv.lock already has filelock because of torch
if ENABLED_FEATURES.tensorrt_rtx:
    self._setup_runtime_config()

self.context = self._create_context()
This only targets the Python runtime, as do the dynamic shapes and CUDA graphs MRs that are to follow.
The C++ runtime changes potentially need an ABI change, so I will put those in a separate MR after all the Python-only changes are finalized.
@@ -257,7 +264,7 @@ def set_device_memory_budget(self, budget_bytes: int) -> int:
if self.context is not None:
Should there be a call to self._check_initialized()?
dryrun: bool = _defaults.DRYRUN,
hardware_compatible: bool = _defaults.HARDWARE_COMPATIBLE,
timing_cache_path: str = _defaults.TIMING_CACHE_PATH,
runtime_cache_path: str = _defaults.RUNTIME_CACHE_PATH,
The runtime cache is a JIT-time API: it may not make much sense for cross_compile_for_windows and convert_exported_program_to_serialized_trt_engine. I have added it to the interface as a common API for entry points into Torch-TRT, but I can add it to unsupported_settings.
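Rejecting a JIT-only setting in the AOT entry points could look something like the sketch below. This is hypothetical: the set name `JIT_ONLY_SETTINGS` and the `validate_settings` helper are illustrative, not the actual Torch-TensorRT `unsupported_settings` mechanism.

```python
# Hypothetical validator: reject JIT-only settings in AOT entry points.
JIT_ONLY_SETTINGS = {"runtime_cache_path"}

AOT_ENTRY_POINTS = (
    "cross_compile_for_windows",
    "convert_exported_program_to_serialized_trt_engine",
)


def validate_settings(entry_point: str, settings: dict) -> None:
    # The runtime cache only applies at JIT/inference time, so AOT
    # entry points have nothing to do with it.
    if entry_point in AOT_ENTRY_POINTS:
        unsupported = JIT_ONLY_SETTINGS & settings.keys()
        if unsupported:
            raise ValueError(
                f"{sorted(unsupported)} not supported by {entry_point} "
                "(runtime cache only applies at JIT/inference time)"
            )


validate_settings("compile", {"runtime_cache_path": "/tmp/rtx.cache"})  # accepted
try:
    validate_settings(
        "cross_compile_for_windows", {"runtime_cache_path": "/tmp/rtx.cache"}
    )
except ValueError as e:
    print(e)
```

Keeping the parameter in the shared signature while validating it per entry point preserves the common-API shape while still failing loudly where the setting is meaningless.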
Description
Add runtime cache support for TensorRT-RTX JIT compilation results, replacing the timing cache, which is not used by RTX (no autotuning).
TensorRT-RTX uses JIT compilation at inference time. The runtime cache (`IRuntimeCache`) stores these compilation results so that kernels and execution graphs are not recompiled on subsequent runs. This is analogous to the timing cache but operates at inference time rather than build time.

Fixes #3817
Changes
- Skip `_create_timing_cache()` and `_save_timing_cache()` when `ENABLED_FEATURES.tensorrt_rtx` is True (timing cache is a no-op in TRT-RTX)
- `runtime_cache_path` setting: new `RUNTIME_CACHE_PATH` default and `runtime_cache_path` field in `CompilationSettings`, threaded through all compile functions
- Wire up `IRuntimeCache` in `PythonTorchTensorRTModule`: create `RuntimeConfig` with the runtime cache on engine setup, load from disk if available, save on module destruction
- Persist to disk with `filelock` for concurrent access safety when multiple processes share the same cache file

Type of change
Checklist: