feat: add runtime cache API for TensorRT-RTX #4180

tp5uiuc wants to merge 2 commits into pytorch:main
Conversation
Add runtime cache support for TensorRT-RTX JIT compilation results, replacing the timing cache, which is not used by RTX (no autotuning).

Changes:
- Skip timing cache creation/saving for TensorRT-RTX in _TRTInterpreter
- Add RUNTIME_CACHE_PATH default and runtime_cache_path setting
- Wire up IRuntimeCache in PythonTorchTensorRTModule (setup, load, save)
- Persist runtime cache to disk with filelock for concurrent access safety
- Thread runtime_cache_path through all compile functions
- Add unit tests (12 tests) and E2E model tests (6 tests)
- Update docstrings and RST documentation

Fixes pytorch#3817

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The timing cache is **not used with TensorRT-RTX**, which does not perform
autotuning. For TensorRT-RTX, see the *Runtime Cache* section below.

Runtime Cache (TensorRT-RTX)
I have added the runtime cache to both APIs and docs, but these are shared between Enterprise and RTX TensorRT. I don't know if that's OK.
@@ -0,0 +1,287 @@
import gc
Let me know if the filename needs changing
@@ -0,0 +1,329 @@
import gc
Do these tests automatically get picked up, or is there a test list that we should add the new tests to?
    logger.debug(f"No existing runtime cache at {self.runtime_cache_path}")
    return
try:
    from filelock import FileLock
filelock is a torch dependency already, so we are not introducing additional dependencies here just for this feature. The version will be kept generic enough so that torch is the one providing the right version.
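The load/save flow guarded by `filelock` can be sketched in isolation. This is an illustrative pattern, not the actual module internals: the paths and helper names (`save_cache`, `load_cache`) are hypothetical, and `os.replace` is added here to make the on-disk swap atomic on top of the lock.

```python
# Minimal sketch of filelock-guarded cache persistence (hypothetical helpers).
import os
import tempfile

from filelock import FileLock

cache_path = os.path.join(tempfile.gettempdir(), "rtx_runtime.cache")
lock = FileLock(cache_path + ".lock")  # sidecar lock file shared by processes


def save_cache(data: bytes) -> None:
    # Serialize writers: only one process updates the cache file at a time.
    with lock:
        tmp = cache_path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
        os.replace(tmp, cache_path)  # atomic rename, so readers never see a torn file


def load_cache() -> bytes:
    if not os.path.exists(cache_path):
        return b""  # no existing runtime cache yet; start fresh
    with lock:
        with open(cache_path, "rb") as f:
            return f.read()


save_cache(b"serialized-runtime-cache")
print(load_cache())  # b'serialized-runtime-cache'
```

The sidecar `.lock` file means readers and writers in different processes block each other only around the actual file I/O, which is what makes sharing one cache path across workers safe.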
Version provided by upstream torch; no pin needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
    ENABLED_FEATURES.tensorrt_rtx,
    "This test verifies standard TRT behavior (non-RTX)",
)
class TestNonRTXUnchanged(TestCase):
This can be removed, let me know (I asked Claude to be extra defensive).
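The decorator in the hunk above follows the standard `unittest` conditional-skip pattern. A self-contained sketch, with a stand-in flag in place of `ENABLED_FEATURES.tensorrt_rtx`:

```python
# Illustration of the conditional-skip pattern; TENSORRT_RTX stands in
# for ENABLED_FEATURES.tensorrt_rtx.
import unittest

TENSORRT_RTX = False  # pretend we are on standard (non-RTX) TensorRT


@unittest.skipIf(TENSORRT_RTX, "This test verifies standard TRT behavior (non-RTX)")
class TestNonRTXUnchanged(unittest.TestCase):
    def test_placeholder(self):
        self.assertTrue(True)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNonRTXUnchanged)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
print(outcome.wasSuccessful())  # True: flag is False, so the test ran
```

With the flag set to `True`, the whole class is reported as skipped rather than failed, which is why such a guard is harmless even if it later turns out to be unnecessary.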
--extra-index-url https://pypi.ngc.nvidia.com
pyyaml
dllist
filelock
Torch doesn't pin filelock either, so there should be no dependency resolution failures, I think.
base_requirements = [
    "packaging>=23",
    "typing-extensions>=4.7.0",
    "filelock",
uv.lock already has filelock because of torch
if ENABLED_FEATURES.tensorrt_rtx:
    self._setup_runtime_config()

self.context = self._create_context()
This only targets the Python runtime, as do the dynamic shapes and CUDA graphs MRs that are to follow.
The C++ runtime changes potentially need an ABI change, so I will put those in a separate MR after all the Python-only changes are finalized.
@@ -257,7 +264,7 @@ def set_device_memory_budget(self, budget_bytes: int) -> int:
if self.context is not None:
Should there be a call to self._check_initialized()?
dryrun: bool = _defaults.DRYRUN,
hardware_compatible: bool = _defaults.HARDWARE_COMPATIBLE,
timing_cache_path: str = _defaults.TIMING_CACHE_PATH,
runtime_cache_path: str = _defaults.RUNTIME_CACHE_PATH,
The runtime cache is a JIT-time API: it may not make much sense for cross_compile_for_windows and convert_exported_program_to_serialized_trt_engine. I have added it to the interface as a common API for entry points into Torch-TRT, but I can add it to unsupported_settings.
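Rejecting a JIT-only setting in the AOT entry points could look something like the sketch below. This is hypothetical: the set name `JIT_ONLY_SETTINGS` and the `validate_settings` helper are illustrative, not the actual Torch-TensorRT `unsupported_settings` mechanism.

```python
# Hypothetical validator: reject JIT-only settings in AOT entry points.
JIT_ONLY_SETTINGS = {"runtime_cache_path"}

AOT_ENTRY_POINTS = (
    "cross_compile_for_windows",
    "convert_exported_program_to_serialized_trt_engine",
)


def validate_settings(entry_point: str, settings: dict) -> None:
    # The runtime cache only applies at JIT/inference time, so AOT
    # entry points have nothing to do with it.
    if entry_point in AOT_ENTRY_POINTS:
        unsupported = JIT_ONLY_SETTINGS & settings.keys()
        if unsupported:
            raise ValueError(
                f"{sorted(unsupported)} not supported by {entry_point} "
                "(runtime cache only applies at JIT/inference time)"
            )


validate_settings("compile", {"runtime_cache_path": "/tmp/rtx.cache"})  # accepted
try:
    validate_settings(
        "cross_compile_for_windows", {"runtime_cache_path": "/tmp/rtx.cache"}
    )
except ValueError as e:
    print(e)
```

Keeping the parameter in the shared signature while validating it per entry point preserves the common-API shape while still failing loudly where the setting is meaningless.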
Description
Add runtime cache support for TensorRT-RTX JIT compilation results, replacing the timing cache, which is not used by RTX (no autotuning).
TensorRT-RTX uses JIT compilation at inference time. The runtime cache (`IRuntimeCache`) stores these compilation results so that kernels and execution graphs are not recompiled on subsequent runs. This is analogous to the timing cache but operates at inference time rather than build time.

Fixes #3817
Changes
- Skip `_create_timing_cache()` and `_save_timing_cache()` when `ENABLED_FEATURES.tensorrt_rtx` is True (timing cache is a no-op in TRT-RTX)
- `runtime_cache_path` setting: new `RUNTIME_CACHE_PATH` default and `runtime_cache_path` field in `CompilationSettings`, threaded through all compile functions
- Wire up `IRuntimeCache` in `PythonTorchTensorRTModule`: create `RuntimeConfig` with the runtime cache on engine setup, load from disk if available, save on module destruction
- Persist to disk with `filelock` for concurrent access safety when multiple processes share the same cache file

Type of change
Checklist: