Expand RuntimeSettings with `profile_execution`, `resource_allocation_strategy`, `weight_streaming_budget` options #4346

tp5uiuc · 2026-06-17T03:18:09Z

tp5uiuc
Jun 17, 2026
Collaborator

This is a follow-up RFC to #4323. The existing RuntimeSettings covers three runtime knobs (dynamic_shapes_kernel_specialization_strategy, cuda_graph_strategy, runtime_cache), each of which requires recreating the execution context when changed. Several other module-level controls share the same lifecycle (flip -> drop exec ctx -> next forward rebuilds) but live outside the dataclass as standalone setters or context managers (CMs). This RFC proposes pulling them in so that users have one coherent surface and the snapshot/restore behaviour of the torch_tensorrt.runtime.runtime_config CM extends to them for free. The existing setters/CMs will be kept (for historical reasons) but will become thin syntactic sugar over the runtime_config() API.

Scope

The initial implementation adds three new fields to torch_tensorrt.runtime.RuntimeSettings, mirrored on the C++ side. All of them share the existing invalidate-on-change lifecycle, so no new lifecycle logic is required (it is already implemented):

| Field | Type | Replaces | Lifecycle |
| --- | --- | --- | --- |
| profile_execution | Optional[str] (None = off; str = path prefix) | mod.enable_profiling(path) / mod.disable_profiling() (no CM today) | invalidate ctx; rebuild attaches IProfiler at create time |
| profile_format | str (default "perfetto") | engine.set_profile_format(fmt) (passed via enable_profiling) | same; orthogonal "how" knob paired with the path |
| resource_allocation_strategy | str ("static" / "dynamic") | mod.set_resource_allocation_strategy(...) + ResourceAllocationStrategy(mod, ...) CM in _ResourceAllocator.py | invalidate ctx; rebuild picks new ExecutionContextAllocationStrategy |

UX is seamless and consistent with other knobs today

Today mod.enable_profiling(path) / mod.disable_profiling() is a bare accessor pair: no CM, no exception safety, no snapshot/restore. Forgetting to call disable_profiling(), or hitting an exception between the two calls, leaves the engine in profiling mode for the rest of the process. ResourceAllocationStrategy(mod, ...) is a CM, but it does not restore the previous allocation strategy (see below). Folding both into RuntimeSettings delivers a correct CM "for free" via the existing runtime_config(...):

with torch_tensorrt.runtime.runtime_config(mod, profile_execution="/tmp/run42"):
    out = mod(*inputs)
# automatically reverts on __exit__ (including on exception)

with torch_tensorrt.runtime.runtime_config(mod, resource_allocation_strategy="dynamic"):
    out = mod(*inputs)
# original strategy properly restored on exit

Existing ResourceAllocationStrategy CM bug

In the standalone CM at py/torch_tensorrt/dynamo/runtime/_ResourceAllocator.py, both enter and exit apply the same flag, so the CM latches the strategy rather than scoping it. This contradicts the docstring's "Upon exiting, it restores them to their original (static) resource allocation mode". Migrating to runtime_config's snapshot/restore fixes this as a side effect.
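The difference between latching and scoping can be illustrated with a minimal, self-contained sketch. `FakeModule`, `latching_strategy`, and `scoped_strategy` are hypothetical stand-ins invented for this illustration, not the actual code in `_ResourceAllocator.py`; the sketch models only the enter/exit behaviour described above.

```python
from contextlib import contextmanager


class FakeModule:
    """Hypothetical stand-in for a compiled module; not a torch_tensorrt class."""

    def __init__(self):
        self.strategy = "static"

    def set_resource_allocation_strategy(self, strategy):
        self.strategy = strategy


@contextmanager
def latching_strategy(mod, strategy):
    # Models the reported behaviour: enter and exit apply the same value.
    mod.set_resource_allocation_strategy(strategy)
    try:
        yield mod
    finally:
        mod.set_resource_allocation_strategy(strategy)


@contextmanager
def scoped_strategy(mod, strategy):
    # Snapshot/restore: remember the previous value and put it back on exit.
    previous = mod.strategy
    mod.set_resource_allocation_strategy(strategy)
    try:
        yield mod
    finally:
        mod.set_resource_allocation_strategy(previous)


latched = FakeModule()
with latching_strategy(latched, "dynamic"):
    pass

scoped = FakeModule()
with scoped_strategy(scoped, "dynamic"):
    pass

print(latched.strategy, scoped.strategy)  # prints "dynamic static"
```

After the latching variant exits, the module stays on "dynamic"; the snapshot/restore variant returns it to whatever was active before the block, which is the behaviour the docstring promises.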

Design summary

Python RuntimeSettings dataclass gains the three fields above (profile_execution + profile_format count as one logical setting, with the path being optional-presence and the format orthogonal).
C++ RuntimeSettings struct mirrors them. Optionally, a new ResourceAllocationStrategy enum-wrapper class can be added, following the existing DynamicShapesKernelSpecializationStrategy / CudaGraphStrategy pattern (enum Value + to_string / from_string / from_underlying, two-overload operator==, implicit operator Value(), boundary cast via .to_underlying() for gcc-13).
RuntimeSettings::operator== and to_str are extended to cover the new fields.
TRTRuntimeConfig::ensure_initialized applies all three at context-create time (profiler attach, profile format, and resource allocation strategy are passed through to createExecutionContext).
TRTEngine::runtime_settings(new_settings) setter unchanged: already invalidates exec ctx on any settings difference; the new operator== extension brings the new fields under the same check.
TorchBind Engine.update_runtime_settings schema extended from 3 args to 6 (flat). profile_execution uses empty string for "off".
_TorchTensorRTModule._send_to_engine passes the new args.
Migration: keep mod.set_resource_allocation_strategy, mod.enable_profiling, mod.disable_profiling, and the ResourceAllocationStrategy CM as thin shims that construct a RuntimeSettings and re-enter the setter.
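The design points above can be sketched in plain Python. This is a toy model under stated assumptions, not the proposed implementation: `SketchSettings`, `SketchEngine`, `sketch_runtime_config`, and the free function `enable_profiling` are hypothetical names, and the "execution context" is just a tuple. It shows equality-driven invalidation picking up new fields automatically, the flat-argument encoding with an empty string for "off", a legacy-style shim that routes through the single setter, and snapshot/restore on exit.

```python
from contextlib import contextmanager
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SketchSettings:
    """Illustrative subset of the proposed fields; not the real RuntimeSettings."""

    profile_execution: Optional[str] = None  # None = off; str = path prefix
    profile_format: str = "perfetto"
    resource_allocation_strategy: str = "static"

    def to_flat_args(self):
        # Assumed encoding for a flat schema: empty string means "off".
        return (
            self.profile_execution or "",
            self.profile_format,
            self.resource_allocation_strategy,
        )


class SketchEngine:
    """Hypothetical engine that rebuilds its context lazily after a change."""

    def __init__(self):
        self.settings = SketchSettings()
        self.context = None
        self.rebuilds = 0

    def update_settings(self, new_settings):
        # Dataclass __eq__ compares every field, so new fields are covered.
        if new_settings != self.settings:
            self.settings = new_settings
            self.context = None  # drop the context; next forward rebuilds

    def forward(self):
        if self.context is None:
            self.context = self.settings.to_flat_args()
            self.rebuilds += 1
        return self.context


@contextmanager
def sketch_runtime_config(engine, **overrides):
    # Snapshot the current settings, apply overrides, restore on exit.
    snapshot = engine.settings
    engine.update_settings(replace(snapshot, **overrides))
    try:
        yield engine
    finally:
        engine.update_settings(snapshot)


def enable_profiling(engine, path):
    # Legacy-style shim: build new settings and go through the one setter.
    engine.update_settings(replace(engine.settings, profile_execution=path))


engine = SketchEngine()
engine.forward()                         # first build
engine.update_settings(engine.settings)  # no difference, so no invalidation
engine.forward()                         # reuses the existing context
with sketch_runtime_config(engine, resource_allocation_strategy="dynamic"):
    inside = engine.forward()            # rebuilt with the override
after = engine.forward()                 # rebuilt with the restored settings
```

In this model the context is rebuilt three times in total: once initially, once inside the block, and once after the block restores the snapshot. Routing the shim through the same setter means the legacy entry points inherit the invalidation check without duplicating it.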

Out of scope

weight_streaming_budget. The existing torch_tensorrt.runtime.weight_streaming CM has surface that does not fit a flat dataclass field: multi-submodule proportional distribution of the budget, introspection attributes (total_device_budget, streamable_budget, current_device_budget, get_automatic_weight_streaming_budget()), CUDA-graphs interaction (_reset_captured_graph on flip), and mid-block re-flips via property assignment. Folding it would either lose features or grow runtime_config into a CM-redesign.
nvtx_verbosity (could live-apply without exec ctx invalidation; filed separately if asked).

tp5uiuc · 2026-06-17T03:19:46Z

tp5uiuc
Jun 17, 2026
Collaborator Author

CC @narendasan for visibility. One (maybe huge) downside is that the current RuntimeSettings are all TRT-RTX-specific flags, while this proposal adds product-agnostic knobs as well. We will continue to throw a warning if the wrong flags are used for the currently installed TRT, but this may be confusing to users, and we should weigh the costs and benefits.

0 replies

narendasan · 2026-06-22T16:18:54Z

narendasan
Jun 22, 2026
Collaborator

I think it makes sense to centralize these sorts of light-touch pieces of functionality that require the context manager to be recreated. I think it's OK to throw not implemented for RTX settings. I think we need to put some thought into which settings should just throw warnings and pass through and which should error out if not supported.

CUDA Graphs and similar, more complicated runtime modes are something we might need to think a bit more about.

1 reply

tp5uiuc Jun 23, 2026
Collaborator Author

Thanks Naren, I will prepare an MR for you to review then.
