Background
Per @narendasan's review comments on #4237, several settings that were added to CompilationSettings for the TensorRT-RTX runtime features look more like runtime mode controls than properties of the compiled engine, and shouldn't be pinned at compile time or serialized into the engine info.
"Runtime mode controls should be controlled via context managers rather than passed in at compile time. Only information that is fixed at runtime needs to be here."
"Also are these properties of the engine or are they runtime mode configurations? The point of this interface is the bare minimum information to reconstruct the program from disk"
torch_tensorrt.runtime.set_cudagraphs_mode(...) already follows the same principle for whole-graph cudagraphs.
Affected settings
Introduced across #4180, #4184, #4187 and #4237 (and their Python/C++ counterparts):
| Setting | Source | Runtime-mutable? |
| --- | --- | --- |
| cuda_graph_strategy | #4187 / #4237 | Yes; user may want to flip per call |
| dynamic_shapes_kernel_specialization_strategy | #4184 / #4237 | Yes; affects per-execution kernel selection |
| runtime_cache_path | #4180 / #4237 | Borderline; pinned once, but is a runtime concern, not an engine property |
Today these live on CompilationSettings, are threaded through _compiler.compile(...) and friends, and get packed into the serialized engine tuple alongside true engine properties (binding names, target platform, etc.).
Proposed design
Move these to runtime context managers, mirroring torch_tensorrt.runtime.set_cudagraphs_mode:
    with torch_tensorrt.runtime.cuda_graph_strategy("whole_graph_capture"):
        compiled_module(*inputs)

    with torch_tensorrt.runtime.dynamic_shapes_kernel_strategy("eager"):
        compiled_module(*inputs)

    torch_tensorrt.runtime.set_runtime_cache_path("/tmp/trt_rtx.cache")
The C++ engine reads these from runtime state when constructing IRuntimeConfig, not from the serialized engine info.
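As a rough illustration of the pattern, a context manager over process-wide runtime state could look like the sketch below. It is pure Python and does not touch the real runtime: the module-level variable stands in for a C++ global, the getter name and the "disabled" default are assumptions made for the example, and only the cuda_graph_strategy name and the "whole_graph_capture" value come from the proposal above.

```python
import contextlib
from typing import Iterator

# Hypothetical stand-in for a C++ global such as CUDAGRAPHS_MODE.
# The default value "disabled" is an assumption for this sketch.
_CUDA_GRAPH_STRATEGY = "disabled"


def get_cuda_graph_strategy() -> str:
    """Return the strategy currently in effect (hypothetical helper)."""
    return _CUDA_GRAPH_STRATEGY


@contextlib.contextmanager
def cuda_graph_strategy(strategy: str) -> Iterator[None]:
    """Temporarily override the strategy, restoring the old value on exit."""
    global _CUDA_GRAPH_STRATEGY
    previous = _CUDA_GRAPH_STRATEGY
    _CUDA_GRAPH_STRATEGY = strategy
    try:
        yield
    finally:
        # Restore even if the wrapped call raises, so a failed inference
        # does not leak the override into later calls.
        _CUDA_GRAPH_STRATEGY = previous
```

Restoring the previous value in a finally block is what makes nesting safe: an inner override gives way to the outer one on exit rather than to the default.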
Implications:
- Remove these fields from CompilationSettings, _compiler.compile(...), _compiler.cross_compile_for_windows(...), and _compiler.convert_exported_program_to_serialized_trt_engine(...).
- Drop HAS_RUNTIME_CFG_IDX + RUNTIME_CACHE_PATH_IDX + DYNAMIC_SHAPES_KERNEL_STRATEGY_IDX + CUDA_GRAPH_STRATEGY_IDX from SerializedInfoIndex. SERIALIZATION_LEN drops back to 12.
- Engine construction reads runtime context state (similar to CUDAGRAPHS_MODE) to populate TRTRuntimeConfig when the engine sets up IRuntimeConfig.
- A new Python torch_tensorrt.runtime.* API surface for each control, plus C++ globals (e.g. extern std::string RUNTIME_CACHE_PATH;) following the MULTI_DEVICE_SAFE_MODE / CUDAGRAPHS_MODE pattern.
Scope
Coordinated change across:
- py/torch_tensorrt/dynamo/_settings.py, _defaults.py, _compiler.py: remove the three fields
- py/torch_tensorrt/runtime/: new context-manager modules
- py/torch_tensorrt/dynamo/runtime/_TRTEngine.py (Python runtime): read context state at _setup_runtime_config
- py/torch_tensorrt/dynamo/runtime/_TorchTensorRTModule.py: drop the cache + map state; remove HAS_RUNTIME_CFG_IDX packing
- core/runtime/runtime.h: drop the four *_IDX slots; add global state for the three controls
- core/runtime/register_jit_hooks.cpp: remove the slot accessors; add get/set ops for the three controls
- core/runtime/TRTEngine.cpp: populate runtime_cfg from global state in the constructor
- core/runtime/TRTRuntimeConfig.cpp: drop make_runtime_config_from_serialized; the engine handles it inline
- Tests: both Python-runtime and C++-runtime test files need to be updated to use context managers
Open questions
- Should runtime_cache_path be runtime-mutable or compile-time-pinned? It's set once in practice (path doesn't change per call), so a torch_tensorrt.runtime.set_runtime_cache_path(...) module-level singleton may be enough.
- Backward-compat: do we keep the CompilationSettings fields as no-op deprecation warnings for one release, or hard-remove?
- Interaction with CudaGraphsTorchTensorRTModule._check_monolithic_capturability: today it inspects engine._rtx_native_cudagraphs; with a runtime context manager the check happens against the current context state instead.
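For the backward-compat question, the "no-op with a deprecation warning" option could be sketched roughly as follows. The helper name strip_deprecated_runtime_settings and the warning text are hypothetical; only the three setting names come from the table above. The sketch does not claim to match how _compiler.compile(...) actually processes its keyword arguments.

```python
import warnings
from typing import Any, Dict

# The three settings this proposal moves to runtime controls.
_MOVED_TO_RUNTIME = (
    "cuda_graph_strategy",
    "dynamic_shapes_kernel_specialization_strategy",
    "runtime_cache_path",
)


def strip_deprecated_runtime_settings(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of kwargs without the moved settings, warning on each.

    Hypothetical helper: the settings are accepted but ignored, so existing
    call sites keep working for one release.
    """
    remaining = dict(kwargs)
    for name in _MOVED_TO_RUNTIME:
        if name in remaining:
            remaining.pop(name)
            warnings.warn(
                f"'{name}' is ignored at compile time; "
                "use the torch_tensorrt.runtime controls instead.",
                DeprecationWarning,
                stacklevel=2,
            )
    return remaining
```

A hard removal would instead let the unknown keyword fail at the call site, which is stricter but breaks existing callers immediately.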
Related
tp5uiuc/feat/trtrtx-cpp-runtime-v2)