Replies: 2 comments 1 reply
CC+ @narendasan for visibility. One (maybe huge) downside is that the current RuntimeSettings flags are all TRT-RTX specific, while the proposal is to extend them with product-agnostic knobs as well. We will continue to throw a warning if the wrong flags are used for the currently installed TRT, but this may be confusing to users, and we should weigh the costs and benefits.
I think it makes sense to centralize these sorts of light-touch pieces of functionality that require the context manager to be recreated. I think it's OK to throw CUDA Graphs and similar more complicated runtime modes we might need to think a bit about
This is a follow-up RFC to #4323. The existing RuntimeSettings covers three runtime knobs (dynamic_shapes_kernel_specialization_strategy, cuda_graph_strategy, runtime_cache), each of which requires the execution context to be recreated. Several other module-level controls share the same lifecycle (flip -> drop exec ctx -> next forward rebuilds) but live outside the dataclass, as standalone setters / CMs. The motivation for this RFC is to pull them in so that users have one coherent surface and the snapshot/restore behaviour of the torch_tensorrt.runtime.runtime_config CM extends to them for free. Existing setters/CMs will be kept (for historical reasons) but will become simple syntactic sugar over the runtime_config() API.
Scope
The initial implementation is to add three new fields on torch_tensorrt.runtime.RuntimeSettings, mirrored on the C++ side, all sharing the existing invalidate-on-change lifecycle (no new logic required for this, everything has been implemented already):
UX is seamless and consistent with other knobs today
Today mod.enable_profiling(path) / mod.disable_profiling() is a bare accessor pair: no CM, no exception-safety, no snapshot/restore. Forgetting the disable_profiling() (or an exception between the two) leaves the engine in profiling mode for the rest of the process. ResourceAllocationStrategy(mod, ...) has a CM but it doesn't restore the old allocation strategy, see below. Folding both into RuntimeSettings delivers the CM "for free" via the existing runtime_config(...):
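The snapshot/restore and invalidate-on-change behaviour described above can be illustrated with a minimal, self-contained sketch. Everything here is hypothetical: `SketchSettings`, `SketchModule` and `scoped_config` are stand-ins invented for illustration, not the torch_tensorrt API, and the field names are assumptions.

```python
from contextlib import contextmanager
from dataclasses import dataclass, replace
from typing import Iterator, Optional


@dataclass(frozen=True)
class SketchSettings:
    # Hypothetical fields standing in for the knobs discussed in this RFC.
    profiling_path: Optional[str] = None
    dynamic_resource_allocation: bool = False


class SketchModule:
    """Hypothetical stand-in for a module that owns an execution context."""

    def __init__(self) -> None:
        self.settings = SketchSettings()
        self.exec_ctx_valid = True
        self.rebuilds = 0

    def apply(self, new: SketchSettings) -> None:
        # Invalidate-on-change: drop the execution context only on a real change.
        if new != self.settings:
            self.settings = new
            self.exec_ctx_valid = False

    def forward(self) -> None:
        # The next forward call rebuilds a dropped execution context.
        if not self.exec_ctx_valid:
            self.exec_ctx_valid = True
            self.rebuilds += 1


@contextmanager
def scoped_config(mod: SketchModule, **overrides) -> Iterator[SketchModule]:
    snapshot = mod.settings                    # snapshot on enter
    mod.apply(replace(snapshot, **overrides))  # flip the requested knobs
    try:
        yield mod
    finally:
        mod.apply(snapshot)                    # restore, even if the body raised


mod = SketchModule()
try:
    with scoped_config(mod, profiling_path="/tmp/trace.json"):
        mod.forward()
        raise RuntimeError("simulated failure while profiling")
except RuntimeError:
    pass
```

Because the restore sits in a `finally` clause, the module leaves profiling mode even though the body raised, which is the exception-safety the bare accessor pair lacks.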
Existing ResourceAllocationStrategy CM bug
In the standalone CM at py/torch_tensorrt/dynamo/runtime/_ResourceAllocator.py both enter and exit apply the same flag, so it latches the strategy rather than scoping it. This is in contradiction to the docstring's "Upon exiting, it restores them to their original (static) resource allocation mode". The migration into runtime_config's proper snapshot/restore fixes this incidentally.
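The difference between latching and scoping can be shown with a small sketch. It does not reproduce the code in _ResourceAllocator.py; `SketchEngine`, `latching_strategy` and `scoped_strategy` are hypothetical names, and the single boolean flag is an assumption made to keep the example short.

```python
from contextlib import contextmanager
from typing import Iterator


class SketchEngine:
    """Hypothetical engine with one boolean allocation flag (static by default)."""

    def __init__(self) -> None:
        self.dynamic_allocation = False


@contextmanager
def latching_strategy(engine: SketchEngine, dynamic: bool) -> Iterator[SketchEngine]:
    # Mirrors the described bug: enter and exit both apply the same flag.
    engine.dynamic_allocation = dynamic
    try:
        yield engine
    finally:
        engine.dynamic_allocation = dynamic  # latches instead of restoring


@contextmanager
def scoped_strategy(engine: SketchEngine, dynamic: bool) -> Iterator[SketchEngine]:
    previous = engine.dynamic_allocation     # snapshot the old strategy
    engine.dynamic_allocation = dynamic
    try:
        yield engine
    finally:
        engine.dynamic_allocation = previous  # restore what was there before


buggy, fixed = SketchEngine(), SketchEngine()
with latching_strategy(buggy, True):
    pass
with scoped_strategy(fixed, True):
    pass
```

After both blocks exit, `buggy` is still in dynamic mode while `fixed` is back to its original static mode, which is the behaviour the docstring promises.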
Design summary
Out of scope
current_device_budget, get_automatic_weight_streaming_budget()), CUDA-graphs interaction (_reset_captured_graph on flip), and mid-block re-flips via property assignment. Folding it would either lose features or grow runtime_config into a CM-redesign.