Move TRT-RTX runtime mode controls from CompilationSettings to runtime context managers

## Background

Per [@narendasan's review comments](https://github.com/pytorch/TensorRT/pull/4237#discussion_) on #4237, several settings that were added to `CompilationSettings` for the TensorRT-RTX runtime features look more like **runtime mode controls** than properties of the compiled engine, and shouldn't be pinned at compile time or serialized into the engine info.

> "Runtime mode controls should be controlled via context managers rather than passed in at compile time. Only information that is fixed at runtime needs to be here."
>
> "Also are these properties of the engine or are they runtime mode configurations? The point of this interface is the bare minimum information to reconstruct the program from disk"

`torch_tensorrt.runtime.set_cudagraphs_mode(...)` already follows this principle for whole-graph CUDA graphs.

## Affected settings

Introduced across #4180, #4184, #4187 and #4237 (and their Python/C++ counterparts):

| Setting | Source | Runtime-mutable? |
|---|---|---|
| `cuda_graph_strategy` | #4187 / #4237 | **Yes** — user may want to flip per call |
| `dynamic_shapes_kernel_specialization_strategy` | #4184 / #4237 | **Yes** — affects per-execution kernel selection |
| `runtime_cache_path` | #4180 / #4237 | **Borderline** — pinned once, but is a runtime concern, not an engine property |

Today these live on `CompilationSettings`, are threaded through `_compiler.compile(...)` and friends, and get packed into the serialized engine tuple alongside true engine properties (binding names, target platform, etc.).

## Proposed design

Move these to runtime context managers, mirroring `torch_tensorrt.runtime.set_cudagraphs_mode`:

```python
with torch_tensorrt.runtime.cuda_graph_strategy("whole_graph_capture"):
    compiled_module(*inputs)

with torch_tensorrt.runtime.dynamic_shapes_kernel_strategy("eager"):
    compiled_module(*inputs)

torch_tensorrt.runtime.set_runtime_cache_path("/tmp/trt_rtx.cache")
```

The C++ engine reads these from runtime state when constructing `IRuntimeConfig`, not from the serialized engine info.

Implications:
- Remove these fields from `CompilationSettings`, `_compiler.compile(...)`, `_compiler.cross_compile_for_windows(...)`, and `_compiler.convert_exported_program_to_serialized_trt_engine(...)`.
- Drop `HAS_RUNTIME_CFG_IDX`, `RUNTIME_CACHE_PATH_IDX`, `DYNAMIC_SHAPES_KERNEL_STRATEGY_IDX`, and `CUDA_GRAPH_STRATEGY_IDX` from `SerializedInfoIndex`. `SERIALIZATION_LEN` drops back to 12.
- Engine construction reads runtime context state (similar to `CUDAGRAPHS_MODE`) to populate `TRTRuntimeConfig` when the engine sets up `IRuntimeConfig`.
- Add a new Python `torch_tensorrt.runtime.*` API surface for each control, plus C++ globals (e.g. `extern std::string RUNTIME_CACHE_PATH;`) following the `MULTI_DEVICE_SAFE_MODE` / `CUDAGRAPHS_MODE` pattern.
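
To make the shape of the proposal concrete, here is a minimal pure-Python sketch of the pattern: module-level runtime state, a context manager that sets and restores it, and an engine-setup step that reads the state instead of serialized engine info. Everything in it is an assumption for illustration: the `_CUDA_GRAPH_STRATEGY` and `_RUNTIME_CACHE_PATH` globals, the `"disabled"` default, and the `build_runtime_config` stand-in are hypothetical and do not reflect the actual `torch_tensorrt` implementation.

```python
import contextlib
from typing import Any, Dict, Iterator, Optional

# Hypothetical module-level runtime state (illustrative names and defaults).
_CUDA_GRAPH_STRATEGY: str = "disabled"
_RUNTIME_CACHE_PATH: Optional[str] = None


def get_cuda_graph_strategy() -> str:
    """Return the strategy currently in effect."""
    return _CUDA_GRAPH_STRATEGY


@contextlib.contextmanager
def cuda_graph_strategy(strategy: str) -> Iterator[None]:
    """Set the strategy for the duration of the block, then restore the old one."""
    global _CUDA_GRAPH_STRATEGY
    previous = _CUDA_GRAPH_STRATEGY
    _CUDA_GRAPH_STRATEGY = strategy
    try:
        yield
    finally:
        # Restore even if the body raised, so state never leaks across calls.
        _CUDA_GRAPH_STRATEGY = previous


def set_runtime_cache_path(path: Optional[str]) -> None:
    """Module-level setter for a value that is pinned once per process."""
    global _RUNTIME_CACHE_PATH
    _RUNTIME_CACHE_PATH = path


def build_runtime_config() -> Dict[str, Any]:
    """Stand-in for engine setup: snapshot runtime state, not serialized info."""
    return {
        "cuda_graph_strategy": _CUDA_GRAPH_STRATEGY,
        "runtime_cache_path": _RUNTIME_CACHE_PATH,
    }
```

The `try`/`finally` restore is what keeps nested or failing `with` blocks from leaving a stale mode behind. A real implementation would also need to decide whether this state is process-wide or per-thread.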

## Scope

Coordinated change across:

- `py/torch_tensorrt/dynamo/_settings.py`, `_defaults.py`, `_compiler.py` — remove the three fields
- `py/torch_tensorrt/runtime/` — new context-manager modules
- `py/torch_tensorrt/dynamo/runtime/_TRTEngine.py` (Python runtime) — read context state at `_setup_runtime_config`
- `py/torch_tensorrt/dynamo/runtime/_TorchTensorRTModule.py` — drop the cache + map state; remove `HAS_RUNTIME_CFG_IDX` packing
- `core/runtime/runtime.h` — drop the four `*_IDX` slots; add global state for the three controls
- `core/runtime/register_jit_hooks.cpp` — remove the slot accessors; add `get`/`set` ops for the three controls
- `core/runtime/TRTEngine.cpp` — populate `runtime_cfg` from global state in the constructor
- `core/runtime/TRTRuntimeConfig.cpp` — drop `make_runtime_config_from_serialized`; the engine handles it inline
- Tests — both Python-runtime and C++-runtime test files need to be updated to use context managers

## Open questions

- Should `runtime_cache_path` be runtime-mutable or compile-time-pinned? It's set once in practice (path doesn't change per call), so a `torch_tensorrt.runtime.set_runtime_cache_path(...)` module-level singleton may be enough.
- Backward-compat: do we keep the `CompilationSettings` fields as no-op deprecation warnings for one release, or hard-remove?
- Interaction with `CudaGraphsTorchTensorRTModule._check_monolithic_capturability`: today it inspects `engine._rtx_native_cudagraphs`; with a runtime context manager the check happens against the current context state instead.
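
For the backward-compat question, one possible shape of the "no-op deprecation warning" option is sketched below. It is only an illustration: `strip_deprecated_runtime_fields` is a hypothetical helper, not an existing `torch_tensorrt` function, and the warning text is a placeholder.

```python
import warnings
from typing import Any, Dict

# The three fields this issue proposes moving out of CompilationSettings.
_DEPRECATED_RUNTIME_FIELDS = (
    "cuda_graph_strategy",
    "dynamic_shapes_kernel_specialization_strategy",
    "runtime_cache_path",
)


def strip_deprecated_runtime_fields(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of kwargs without the deprecated fields.

    Emits one DeprecationWarning per deprecated field that was passed,
    and otherwise ignores the value (the "no-op for one release" option).
    """
    remaining = dict(kwargs)
    for name in _DEPRECATED_RUNTIME_FIELDS:
        if name in remaining:
            remaining.pop(name)
            warnings.warn(
                f"`{name}` is ignored at compile time; "
                "use the torch_tensorrt.runtime controls instead.",
                DeprecationWarning,
                stacklevel=2,
            )
    return remaining
```

A compile entry point could call this on its keyword arguments before constructing settings. The hard-remove option would instead let the unknown keyword fail.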

## Related

- #4180 — original Python runtime cache (merged)
- #4184 — Python dynamic shapes kernel specialization (merged)
- #4187 — Python TRT-RTX native CUDA graph strategy
- #4237 — earlier C++ runtime + file-locking PR (review thread that prompted this)
- #4294 — Python reintroduction after the runtime rework
- The C++ counterpart of #4294 (currently on `tp5uiuc/feat/trtrtx-cpp-runtime-v2`)

