Replies: 5 comments 10 replies
Going to move this to the RFC section of discussions for easier commenting.
Do we want to fold this into the cudagraphs API?
For the runtime cache directory, I'm not sure a context manager is the right API, since I think it's more like a property. Adding some sort of attribute on the engine, similar to how we do profiling or layer info, might make sense here.
Iterations are being done here: tp5uiuc#3. (I am putting these changes on top of my earlier C++ changes; when I submit for review it will be clean, i.e. incremental on main.) You can ignore the current draft MRs.
Background

Per @narendasan's review comments on #4237, several settings that were added to CompilationSettings for the TensorRT-RTX runtime features look more like runtime mode controls than properties of the compiled engine, and shouldn't be pinned at compile time or serialized into the engine info.

The same principle is what torch_tensorrt.runtime.set_cudagraphs_mode(...) already follows for whole-graph cudagraphs.

Affected settings
Introduced across #4180, #4184, #4187 and #4237 (and their Python/C++ counterparts):
- cuda_graph_strategy
- dynamic_shapes_kernel_specialization_strategy
- runtime_cache_path

Today these live on CompilationSettings, are threaded through _compiler.compile(...) and friends, and get packed into the serialized engine tuple alongside true engine properties (binding names, target platform, etc.).

Proposed design
Move these to runtime context managers, mirroring torch_tensorrt.runtime.set_cudagraphs_mode.

The C++ engine reads these from runtime state when constructing IRuntimeConfig, not from the serialized engine info.

Implications:
- Remove the three fields from CompilationSettings, _compiler.compile(...), _compiler.cross_compile_for_windows(...), and _compiler.convert_exported_program_to_serialized_trt_engine(...).
- Drop HAS_RUNTIME_CFG_IDX + RUNTIME_CACHE_PATH_IDX + DYNAMIC_SHAPES_KERNEL_STRATEGY_IDX + CUDA_GRAPH_STRATEGY_IDX from SerializedInfoIndex. SERIALIZATION_LEN drops back to 12.
- Read global runtime state (as with CUDAGRAPHS_MODE) to populate TRTRuntimeConfig when the engine sets up IRuntimeConfig.
- A torch_tensorrt.runtime.* API surface for each control, plus C++ globals (e.g. extern std::string RUNTIME_CACHE_PATH;) following the MULTI_DEVICE_SAFE_MODE / CUDAGRAPHS_MODE pattern.

Scope
Coordinated change across:
- py/torch_tensorrt/dynamo/_settings.py, _defaults.py, _compiler.py: remove the three fields
- py/torch_tensorrt/runtime/: new context-manager modules
- py/torch_tensorrt/dynamo/runtime/_TRTEngine.py (Python runtime): read context state at _setup_runtime_config
- py/torch_tensorrt/dynamo/runtime/_TorchTensorRTModule.py: drop the cache + map state; remove HAS_RUNTIME_CFG_IDX packing
- core/runtime/runtime.h: drop the four *_IDX slots; add global state for the three controls
- core/runtime/register_jit_hooks.cpp: remove the slot accessors; add get/set ops for the three controls
- core/runtime/TRTEngine.cpp: populate runtime_cfg from global state in the constructor
- core/runtime/TRTRuntimeConfig.cpp: drop make_runtime_config_from_serialized; the engine handles it inline

Open questions
- Should runtime_cache_path be runtime-mutable or compile-time-pinned? It's set once in practice (the path doesn't change per call), so a torch_tensorrt.runtime.set_runtime_cache_path(...) module-level singleton may be enough.
- Keep the CompilationSettings fields as no-op deprecation warnings for one release, or hard-remove?
- CudaGraphsTorchTensorRTModule._check_monolithic_capturability: today it inspects engine._rtx_native_cudagraphs; with a runtime context manager the check happens against the current context state instead.

Related

tp5uiuc/feat/trtrtx-cpp-runtime-v2)
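For illustration, here is a rough pure-Python sketch of the pattern proposed above: runtime controls live in module-level state, context managers override them temporarily, and the engine reads that state when it sets up its runtime config instead of unpacking it from serialized engine info. Everything in it is an assumption made for the sketch and not the torch_tensorrt API: the `_RUNTIME_STATE` dict, the `_override` helper, the `SketchEngine` class, the `None` defaults and the string values. Only `set_runtime_cache_path` is a name floated in this RFC; the other two setter names are hypothetical.

```python
import contextlib
from typing import Any, Dict, Iterator

# Hypothetical module-level runtime state, standing in for the C++ globals
# the RFC proposes. Defaults of None are placeholders, not real defaults.
_RUNTIME_STATE: Dict[str, Any] = {
    "cuda_graph_strategy": None,
    "dynamic_shapes_kernel_specialization_strategy": None,
    "runtime_cache_path": None,
}


@contextlib.contextmanager
def _override(key: str, value: Any) -> Iterator[None]:
    """Set one runtime control for the duration of a `with` block."""
    previous = _RUNTIME_STATE[key]
    _RUNTIME_STATE[key] = value
    try:
        yield
    finally:
        # Restore the earlier value even if the body raised.
        _RUNTIME_STATE[key] = previous


# One thin setter per control; names are illustrative only.
def set_cuda_graph_strategy(strategy: Any):
    return _override("cuda_graph_strategy", strategy)


def set_dynamic_shapes_kernel_specialization_strategy(strategy: Any):
    return _override("dynamic_shapes_kernel_specialization_strategy", strategy)


def set_runtime_cache_path(path: Any):
    return _override("runtime_cache_path", path)


class SketchEngine:
    """Stand-in for an engine; it holds no runtime controls of its own."""

    def setup_runtime_config(self) -> Dict[str, Any]:
        # Snapshot the current runtime state at setup time, rather than
        # reading these values from serialized engine info.
        return dict(_RUNTIME_STATE)


if __name__ == "__main__":
    engine = SketchEngine()
    with set_runtime_cache_path("/tmp/example_cache"):
        print(engine.setup_runtime_config())
    print(engine.setup_runtime_config())
```

Because each context manager restores the value it found on entry, nested overrides unwind in order. Whether runtime_cache_path should use this form at all, or an engine attribute as suggested in the comments, is still an open question.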