Move TRT-RTX runtime mode controls from CompilationSettings to runtime context managers #4323

tp5uiuc · 2026-05-29T20:29:29Z

tp5uiuc
May 29, 2026
Collaborator

Background

Per @narendasan's review comments on #4237, several settings that were added to CompilationSettings for the TensorRT-RTX runtime features look more like runtime mode controls than properties of the compiled engine, and shouldn't be pinned at compile time or serialized into the engine info.

"Runtime mode controls should be controlled via context managers rather than passed in at compile time. Only information that is fixed at runtime needs to be here."

"Also are these properties of the engine or are they runtime mode configurations? The point of this interface is the bare minimum information to reconstruct the program from disk"

The same principle is what torch_tensorrt.runtime.set_cudagraphs_mode(...) already follows for whole-graph cudagraphs.

Affected settings

Introduced across #4180, #4184, #4187 and #4237 (and their Python/C++ counterparts):

| Setting | Source | Runtime-mutable? |
| --- | --- | --- |
| `cuda_graph_strategy` | #4187 / #4237 | Yes: user may want to flip per call |
| `dynamic_shapes_kernel_specialization_strategy` | #4184 / #4237 | Yes: affects per-execution kernel selection |
| `runtime_cache_path` | #4180 / #4237 | Borderline: pinned once, but is a runtime concern, not an engine property |

Today these live on CompilationSettings, are threaded through _compiler.compile(...) and friends, and get packed into the serialized engine tuple alongside true engine properties (binding names, target platform, etc.).

Proposed design

Move these to runtime context managers, mirroring torch_tensorrt.runtime.set_cudagraphs_mode:

with torch_tensorrt.runtime.cuda_graph_strategy("whole_graph_capture"):
    compiled_module(*inputs)

with torch_tensorrt.runtime.dynamic_shapes_kernel_strategy("eager"):
    compiled_module(*inputs)

torch_tensorrt.runtime.set_runtime_cache_path("/tmp/trt_rtx.cache")

The C++ engine reads these from runtime state when constructing IRuntimeConfig, not from the serialized engine info.

Implications:

Remove these fields from CompilationSettings, _compiler.compile(...), _compiler.cross_compile_for_windows(...), and _compiler.convert_exported_program_to_serialized_trt_engine(...).
Drop HAS_RUNTIME_CFG_IDX + RUNTIME_CACHE_PATH_IDX + DYNAMIC_SHAPES_KERNEL_STRATEGY_IDX + CUDA_GRAPH_STRATEGY_IDX from SerializedInfoIndex. SERIALIZATION_LEN drops back to 12.
Engine construction reads runtime context state (similar to CUDAGRAPHS_MODE) to populate TRTRuntimeConfig when the engine sets up IRuntimeConfig.
A new Python torch_tensorrt.runtime.* API surface for each control, plus C++ globals (e.g. extern std::string RUNTIME_CACHE_PATH;) following the MULTI_DEVICE_SAFE_MODE / CUDAGRAPHS_MODE pattern.
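The scoped-override pattern described above can be sketched in plain Python, without any torch_tensorrt dependency. Everything in this block is hypothetical: `_RUNTIME_STATE`, `runtime_mode`, and `build_runtime_config` are illustrative names, and the default values `"disabled"` and `"lazy"` are assumed placeholders. It only shows a setting that is read from process-wide state at setup time and restored on exit, even if the body raises.

```python
import contextlib

# Hypothetical process-wide runtime state, standing in for the C++ globals.
# The default values here are assumptions made for the sake of the sketch.
_RUNTIME_STATE = {
    "cuda_graph_strategy": "disabled",
    "dynamic_shapes_kernel_strategy": "lazy",
}


@contextlib.contextmanager
def runtime_mode(name, value):
    """Temporarily override one runtime control and restore it on exit."""
    if name not in _RUNTIME_STATE:
        raise KeyError(f"unknown runtime control: {name}")
    previous = _RUNTIME_STATE[name]
    _RUNTIME_STATE[name] = value
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions, so the override never leaks.
        _RUNTIME_STATE[name] = previous


def build_runtime_config():
    """Snapshot the current state, as an engine might when it sets up its config."""
    return dict(_RUNTIME_STATE)


with runtime_mode("cuda_graph_strategy", "whole_graph_capture"):
    inside = build_runtime_config()
outside = build_runtime_config()
```

Because the state is read when the config is built rather than when the engine is serialized, nothing about the override has to be stored in the engine info.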

Scope

Coordinated change across:

py/torch_tensorrt/dynamo/_settings.py, _defaults.py, _compiler.py — remove the three fields
py/torch_tensorrt/runtime/ — new context-manager modules
py/torch_tensorrt/dynamo/runtime/_TRTEngine.py (Python runtime) — read context state at _setup_runtime_config
py/torch_tensorrt/dynamo/runtime/_TorchTensorRTModule.py — drop the cache + map state; remove HAS_RUNTIME_CFG_IDX packing
core/runtime/runtime.h — drop the four *_IDX slots; add global state for the three controls
core/runtime/register_jit_hooks.cpp — remove the slot accessors; add get/set ops for the three controls
core/runtime/TRTEngine.cpp — populate runtime_cfg from global state in the constructor
core/runtime/TRTRuntimeConfig.cpp — drop make_runtime_config_from_serialized; the engine handles it inline
Tests — both Python-runtime and C++-runtime test files need to be updated to use context managers

Open questions

Should runtime_cache_path be runtime-mutable or compile-time-pinned? It's set once in practice (path doesn't change per call), so a torch_tensorrt.runtime.set_runtime_cache_path(...) module-level singleton may be enough.
Backward-compat: do we keep the CompilationSettings fields as no-op deprecation warnings for one release, or hard-remove?
Interaction with CudaGraphsTorchTensorRTModule._check_monolithic_capturability: today it inspects engine._rtx_native_cudagraphs; with a runtime context manager the check happens against the current context state instead.
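For the backward-compat question, one option is a one-release shim that accepts the old fields, warns, and ignores them. The sketch below is an assumption about how that could look, not the project's implementation: `strip_moved_settings` is a hypothetical helper and `some_other_setting` is a placeholder key; only the three field names come from this proposal.

```python
import warnings

# Field names taken from the proposal; the helper itself is hypothetical.
_MOVED_TO_RUNTIME = (
    "cuda_graph_strategy",
    "dynamic_shapes_kernel_specialization_strategy",
    "runtime_cache_path",
)


def strip_moved_settings(kwargs):
    """Return kwargs without the moved fields, warning for each one seen."""
    kept = {}
    for key, value in kwargs.items():
        if key in _MOVED_TO_RUNTIME:
            warnings.warn(
                f"{key} is ignored at compile time; set it through the runtime API",
                DeprecationWarning,
                stacklevel=2,
            )
        else:
            kept[key] = value
    return kept


# Example call: the moved field is dropped and reported, the rest passes through.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    kept = strip_moved_settings(
        {"runtime_cache_path": "/tmp/trt_rtx.cache", "some_other_setting": 1}
    )
```

Hard removal would instead let the unknown keyword fail at the call site, which is stricter but breaks existing scripts immediately.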

Related

feat: add runtime cache API for TensorRT-RTX #4180: original Python runtime cache (merged)
feat: add dynamic shapes kernel specialization strategy for TRT-RTX #4184: Python dynamic shapes kernel specialization (merged)
feat: add TRT-RTX native CUDA graph support #4187: Python TRT-RTX native CUDA graph strategy
feat(runtime): file-lock the TRT-RTX runtime cache #4237: earlier C++ runtime + file-locking PR (review thread that prompted this)
feat: reintroduce TRT-RTX runtime cache, dynamic shapes, and native CUDA graph support #4294: Python reintroduction after the runtime rework
The C++ counterpart of feat: reintroduce TRT-RTX runtime cache, dynamic shapes, and native CUDA graph support #4294 (currently on tp5uiuc/feat/trtrtx-cpp-runtime-v2)

narendasan · 2026-06-08T16:29:18Z

narendasan
Jun 8, 2026
Collaborator

Going to move this to the RFC section of discussions for easier commenting

1 reply

tp5uiuc Jun 12, 2026
Collaborator Author

Thanks Naren, the final polished MR is here: #4330

narendasan · 2026-06-08T16:30:19Z

narendasan
Jun 8, 2026
Collaborator

Do we want to fold cuda_graph_strategy into the cudagraphs API?

4 replies

narendasan Jun 8, 2026
Collaborator

Maybe some argument like rtx_cuda_graph_strategy.

tp5uiuc Jun 12, 2026
Collaborator Author

Yeah, that's where I ended up as well. For cudagraphs, there are now two ways to set it:

Option 1 (Programmatic setting)

mod = torchtrt.compile(model, inputs=inputs)
mod.runtime_settings = RuntimeSettings(cuda_graph_strategy="whole_graph_capture")

with enable_cudagraphs(mod) as cg:
    cg(*inputs)

Option 2 (Scoped)

mod = torchtrt.compile(model, inputs=inputs)

with enable_cudagraphs(mod, cuda_graph_strategy="whole_graph_capture") as cg:
    cg(*inputs)

The cuda_graph_strategy kwarg in enable_cudagraphs will throw on TRT Enterprise but is accepted on TRT-RTX.

tp5uiuc Jun 12, 2026
Collaborator Author

For users who want to use the same API on both standard TensorRT and TRT-RTX:

import torch_tensorrt
from torch_tensorrt.runtime import enable_cudagraphs

# Same script works on both runtimes. The kwarg is RTX-only -- pass it only
# when the build is RTX; ``None`` (the default) is silently accepted on both.
strategy = (
    "whole_graph_capture"
    if torch_tensorrt.ENABLED_FEATURES.tensorrt_rtx
    else None
)

with enable_cudagraphs(compiled, cuda_graph_strategy=strategy) as wrapped:
    out = wrapped(*inputs)

# alternatively
kwargs = (
    {"cuda_graph_strategy": "whole_graph_capture"}
    if torch_tensorrt.ENABLED_FEATURES.tensorrt_rtx
    else {}
)

with enable_cudagraphs(compiled, **kwargs) as wrapped:
    out = wrapped(*inputs)
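The kwarg behaviour described in this sub-thread (raise on a non-RTX build, accept `None` everywhere) could be checked with a small guard like the one below. This is a sketch under assumptions: `resolve_cuda_graph_strategy` is a hypothetical helper, and the exception type `ValueError` is a guess, since the thread only says the call "will throw".

```python
def resolve_cuda_graph_strategy(strategy, is_rtx_build):
    """Validate the RTX-only kwarg; None is accepted on every build."""
    if strategy is None:
        # The default is silently accepted, so one script can target both runtimes.
        return None
    if not is_rtx_build:
        # Assumed exception type; the real API may raise something else.
        raise ValueError(
            "cuda_graph_strategy is only supported on TensorRT-RTX builds"
        )
    return strategy


# Accepted on an RTX build, and None is accepted on a standard build.
on_rtx = resolve_cuda_graph_strategy("whole_graph_capture", is_rtx_build=True)
on_standard = resolve_cuda_graph_strategy(None, is_rtx_build=False)
```

Failing loudly on a non-RTX build, rather than ignoring the argument, keeps a script from silently running without the strategy the user asked for.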

tp5uiuc Jun 12, 2026
Collaborator Author

I chose to keep it as "cuda_graph_strategy" for now because the other arguments (dynamic shapes and runtime cache) don't have an rtx_ prefix. I am OK with adding rtx_ to everything to make the intent clearer. We can discuss more on the MR.

narendasan · 2026-06-08T16:33:26Z

narendasan
Jun 8, 2026
Collaborator

For the runtime cache directory, I'm not sure a context manager is the right API, since I think it's more like a property.

Maybe adding some sort of attribute on the engine, similar to how we do profiling or layer info, would make sense here.

4 replies

narendasan Jun 8, 2026
Collaborator

Is it an engine property or a runtime property?

tp5uiuc Jun 12, 2026
Collaborator Author

At first glance I agree, it does look like it might just be a property. If it's a property, it should be a runtime property.

But increasingly we are seeing customers keep the same runtime cache around (in memory, not on disk) and attach it to multiple modules. Having it as a simple string/filename means we
a) force customers to save after new kernels are added (mostly when inference is done)
b) shift the burden of file-locking and cache integrity (i.e. ensuring no kernels are lost) to userspace
This problem is exacerbated by TRT-RTX not having any introspection APIs for the cache today (how many kernels, what their signatures are, etc.).

tp5uiuc Jun 12, 2026
Collaborator Author

The current design lets users drop into more advanced APIs based on their needs.

Simple usage

Direct and straightforward. The cache lifetime is not in the users' control and they don't care; they have a cache file and want to use it.

    from torch_tensorrt.runtime import runtime_config

    with runtime_config(mod, runtime_cache="my_cache_file.bin"):
        out = mod(x)

Lifetime controlled

The user wants to retain control over the cache's lifetime and usage.

    from torch_tensorrt.runtime import runtime_cache

    # Cache implicitly attached to mod for the duration of the with-block.
    # Loaded from disk on enter, saved on exit.
    with runtime_cache(mod, "/var/cache/jit.bin"):
        out = mod(x)

    with runtime_cache(mod, "/var/cache/jit.bin") as rc:
        out = mod(x)
        # mid-block checkpoint
        rc.save("/var/cache/jit-mid.bin")

    # Attach multiple modules too
    with runtime_cache([mod1, mod2], "/var/cache/jit.bin") as rc:
        out = mod1(x)
        out2 = mod2(out)
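To make the "loaded on enter, saved on exit" semantics concrete, here is a self-contained sketch in plain Python. It is not the torch_tensorrt implementation: `InMemoryCache`, `scoped_runtime_cache`, and `FakeModule` are hypothetical stand-ins, pickle is an arbitrary choice of file format, and the `cache` attribute on the module is an assumption. The save writes to a temporary file and renames it, which is one common way to avoid leaving a partially written cache file behind.

```python
import contextlib
import os
import pickle
import tempfile


class InMemoryCache:
    """Hypothetical stand-in for a runtime cache object held in memory."""

    def __init__(self):
        self.entries = {}

    def load(self, path):
        # A missing file just means an empty cache.
        if os.path.exists(path):
            with open(path, "rb") as f:
                self.entries.update(pickle.load(f))

    def save(self, path):
        # Write to a temporary file, then rename over the target.
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "wb") as f:
            pickle.dump(self.entries, f)
        os.replace(tmp_path, path)


@contextlib.contextmanager
def scoped_runtime_cache(modules, path):
    """Attach one cache to one or more modules; load on enter, save on exit."""
    if not isinstance(modules, (list, tuple)):
        modules = [modules]
    cache = InMemoryCache()
    cache.load(path)
    for module in modules:
        module.cache = cache
    try:
        yield cache
    finally:
        for module in modules:
            module.cache = None
        cache.save(path)


class FakeModule:
    """Hypothetical module exposing a single attachable cache slot."""

    def __init__(self):
        self.cache = None


workdir = tempfile.mkdtemp()
cache_file = os.path.join(workdir, "jit.bin")
mod1, mod2 = FakeModule(), FakeModule()

# Both modules share the same cache object for the duration of the block.
with scoped_runtime_cache([mod1, mod2], cache_file) as rc:
    shared = mod1.cache is rc and mod2.cache is rc
    rc.entries["kernel_a"] = b"compiled-bytes"

# A later block reloads what the first block saved on exit.
with scoped_runtime_cache(mod1, cache_file) as rc2:
    reloaded = dict(rc2.entries)
```

Yielding the cache object is what allows a mid-block checkpoint such as `rc.save(...)` in the example above, while users who ignore the `as` target still get the load and save for free.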

tp5uiuc Jun 12, 2026
Collaborator Author

We also have the programmatic/direct setting of the runtime cache across the lifetime of a module, in case users don't want to muck around with context managers.

narendasan · 2026-06-08T16:34:15Z

narendasan
Jun 8, 2026
Collaborator

dynamic_shapes_kernel_specialization_strategy: this is fine, I think we can just execute on this.

1 reply

tp5uiuc Jun 12, 2026
Collaborator Author

Yes, this is retained as is.

tp5uiuc · 2026-06-09T17:44:56Z

tp5uiuc
Jun 9, 2026
Collaborator Author

Iterations are being done here: tp5uiuc#3. I am putting these changes on top of my earlier C++ changes; when I submit for review it will be clean, i.e. incremental on main. You can ignore the current draft MRs:

feat(runtime): file-lock the TRT-RTX runtime cache #4237
feat(runtime): add TensorRT-RTX runtime cache, dynamic shapes strategy, and native CUDA graph support to C++ runtime #4202
All comments there are folded in.

0 replies
