Move TRT-RTX runtime mode controls from CompilationSettings to runtime context managers #4310

@tp5uiuc

Background

Per @narendasan's review comments on #4237, several settings that were added to CompilationSettings for the TensorRT-RTX runtime features look more like runtime mode controls than properties of the compiled engine, and shouldn't be pinned at compile time or serialized into the engine info.

"Runtime mode controls should be controlled via context managers rather than passed in at compile time. Only information that is fixed at runtime needs to be here."

"Also are these properties of the engine or are they runtime mode configurations? The point of this interface is the bare minimum information to reconstruct the program from disk"

torch_tensorrt.runtime.set_cudagraphs_mode(...) already follows the same principle for whole-graph CUDA graphs: the mode is set at runtime, not baked into the compiled engine.

Affected settings

Introduced across #4180, #4184, #4187 and #4237 (and their Python/C++ counterparts):

Setting                                       | Source        | Runtime-mutable?
cuda_graph_strategy                           | #4187 / #4237 | Yes: user may want to flip per call
dynamic_shapes_kernel_specialization_strategy | #4184 / #4237 | Yes: affects per-execution kernel selection
runtime_cache_path                            | #4180 / #4237 | Borderline: pinned once, but is a runtime concern, not an engine property

Today these live on CompilationSettings, are threaded through _compiler.compile(...) and friends, and get packed into the serialized engine tuple alongside true engine properties (binding names, target platform, etc.).

Proposed design

Move these to runtime context managers, mirroring torch_tensorrt.runtime.set_cudagraphs_mode:

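import torch_tensorrt

# Proposed API (does not exist yet): scope the CUDA graph strategy to a block.
with torch_tensorrt.runtime.cuda_graph_strategy("whole_graph_capture"):
    compiled_module(*inputs)

# Proposed API: scope the dynamic-shapes kernel specialization strategy.
with torch_tensorrt.runtime.dynamic_shapes_kernel_strategy("eager"):
    compiled_module(*inputs)

# Proposed API: a plain setter rather than a context manager (see Open questions).
torch_tensorrt.runtime.set_runtime_cache_path("/tmp/trt_rtx.cache")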

The C++ engine reads these from runtime state when constructing IRuntimeConfig, not from the serialized engine info.
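For illustration, here is a minimal pure-Python sketch of what one of these context managers could look like. It assumes a module-level variable stands in for the runtime state; the real implementation would presumably forward to a C++ global the way CUDAGRAPHS_MODE does. The names _CUDA_GRAPH_STRATEGY and get_cuda_graph_strategy, and the "default" value, are hypothetical placeholders and not part of any existing API.

```python
import contextlib
from typing import Iterator

# Hypothetical stand-in for process-wide runtime state. "default" is a
# placeholder; the real default value is not stated in this issue.
_CUDA_GRAPH_STRATEGY = "default"


def get_cuda_graph_strategy() -> str:
    """Return the strategy currently in effect."""
    return _CUDA_GRAPH_STRATEGY


@contextlib.contextmanager
def cuda_graph_strategy(strategy: str) -> Iterator[None]:
    """Set the strategy for the duration of a with-block, then restore it."""
    global _CUDA_GRAPH_STRATEGY
    previous = _CUDA_GRAPH_STRATEGY
    _CUDA_GRAPH_STRATEGY = strategy
    try:
        yield
    finally:
        # Restore the previous value even if the block raised, so nested
        # or failing blocks never leak their setting.
        _CUDA_GRAPH_STRATEGY = previous
```

Restoring in a finally block is what lets these contexts nest safely: an inner block puts back whatever the outer block had set, not the global default.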

Implications:

  • Remove these fields from CompilationSettings, _compiler.compile(...), _compiler.cross_compile_for_windows(...), and _compiler.convert_exported_program_to_serialized_trt_engine(...).
  • Drop HAS_RUNTIME_CFG_IDX, RUNTIME_CACHE_PATH_IDX, DYNAMIC_SHAPES_KERNEL_STRATEGY_IDX, and CUDA_GRAPH_STRATEGY_IDX from SerializedInfoIndex, so that SERIALIZATION_LEN drops back to 12.
  • Engine construction reads runtime context state (similar to CUDAGRAPHS_MODE) to populate TRTRuntimeConfig when the engine sets up IRuntimeConfig.
  • Add a new Python torch_tensorrt.runtime.* API surface for each control, plus C++ globals (e.g. extern std::string RUNTIME_CACHE_PATH;) following the MULTI_DEVICE_SAFE_MODE / CUDAGRAPHS_MODE pattern.
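To make the split between serialized engine info and runtime state concrete, here is a rough Python sketch. Every name in it (_RUNTIME_STATE, RuntimeConfigSnapshot, EngineSketch, the "default" values) is a hypothetical stand-in rather than the actual Torch-TensorRT code; the only point it tries to show is that the engine takes its runtime config from process-wide state at construction, while the serialized tuple carries engine properties only.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical process-wide runtime state, standing in for the proposed
# C++ globals. The "default" values are placeholders.
_RUNTIME_STATE = {
    "cuda_graph_strategy": "default",
    "dynamic_shapes_kernel_strategy": "default",
    "runtime_cache_path": None,
}


def set_runtime_cache_path(path: Optional[str]) -> None:
    """Module-level setter, mirroring the proposed singleton-style API."""
    _RUNTIME_STATE["runtime_cache_path"] = path


@dataclass(frozen=True)
class RuntimeConfigSnapshot:
    cuda_graph_strategy: str
    dynamic_shapes_kernel_strategy: str
    runtime_cache_path: Optional[str]


class EngineSketch:
    """Toy engine: serialized info holds engine properties only."""

    def __init__(self, serialized_info: Tuple[str, ...]) -> None:
        self.serialized_info = serialized_info
        # Runtime config comes from the current runtime state, not from
        # anything stored in serialized_info.
        self.runtime_cfg = RuntimeConfigSnapshot(**_RUNTIME_STATE)
```

One consequence worth noting: with a snapshot taken in the constructor, a context entered later around a call would not be visible to an already-built engine. Whether per-call changes should take effect is something the implementation would need to decide.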

Scope

Coordinated change across:

  • py/torch_tensorrt/dynamo/_settings.py, _defaults.py, _compiler.py — remove the three fields
  • py/torch_tensorrt/runtime/ — new context-manager modules
  • py/torch_tensorrt/dynamo/runtime/_TRTEngine.py (Python runtime) — read context state at _setup_runtime_config
  • py/torch_tensorrt/dynamo/runtime/_TorchTensorRTModule.py — drop the cache + map state; remove HAS_RUNTIME_CFG_IDX packing
  • core/runtime/runtime.h — drop the four *_IDX slots; add global state for the three controls
  • core/runtime/register_jit_hooks.cpp — remove the slot accessors; add get/set ops for the three controls
  • core/runtime/TRTEngine.cpp — populate runtime_cfg from global state in the constructor
  • core/runtime/TRTRuntimeConfig.cpp — drop make_runtime_config_from_serialized; the engine handles it inline
  • Tests — both Python-runtime and C++-runtime test files need to be updated to use context managers

Open questions

  • Should runtime_cache_path be runtime-mutable or compile-time-pinned? It's set once in practice (path doesn't change per call), so a torch_tensorrt.runtime.set_runtime_cache_path(...) module-level singleton may be enough.
  • Backward-compat: do we keep the CompilationSettings fields as no-op deprecation warnings for one release, or hard-remove?
  • Interaction with CudaGraphsTorchTensorRTModule._check_monolithic_capturability: today it inspects engine._rtx_native_cudagraphs; with a runtime context manager, the check would need to run against the current context state instead.
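If the backward-compat question is answered with "keep the fields for one release", the shim could look roughly like the sketch below. This is not the real CompilationSettings: SettingsWithShim and some_engine_property are hypothetical, and the field list is shortened to two of the three affected settings for brevity. Deprecated fields are accepted, trigger a DeprecationWarning, and are then reset so they have no effect.

```python
import warnings
from dataclasses import dataclass
from typing import Optional

# Names of fields that would become no-ops under this sketch.
_DEPRECATED_FIELDS = ("cuda_graph_strategy", "runtime_cache_path")


@dataclass
class SettingsWithShim:
    # Hypothetical stand-in for a genuine compile-time engine property.
    some_engine_property: int = 0
    # Deprecated: accepted for one release, but ignored.
    cuda_graph_strategy: Optional[str] = None
    runtime_cache_path: Optional[str] = None

    def __post_init__(self) -> None:
        for name in _DEPRECATED_FIELDS:
            if getattr(self, name) is not None:
                warnings.warn(
                    f"'{name}' is ignored at compile time; "
                    "set it through the runtime API instead.",
                    DeprecationWarning,
                    # Skip __post_init__ and the generated __init__ so the
                    # warning points at the caller.
                    stacklevel=3,
                )
                # Reset so the value cannot leak into serialization.
                setattr(self, name, None)
```

A hard removal would instead make passing these keyword arguments a TypeError, which is simpler but breaks existing call sites immediately.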
