handle symbolic shape for non tensor inputs in symbolic shape extraction #4124
narendasan
left a comment
@apbose can you add a test case for this?
@lanluo-nvidia can you merge and cherry-pick this for 2.11?
…ng read in C++ runtime, adding test cases
fb72205 to
a0cd0d8
std::vector<at::Tensor> outputs(compiled_engine->num_io.second);
// Shape tensor CPU buffers must outlive inferShapes() and enqueueV3()
Can you explain why? It would help others understand. (Basically, AFAICT, it's just that these inputs are not provided by torch, so we need to produce them ourselves and ensure they stay available through enqueue.)
Yeah, that's the gist. Shape tensor values require CPU memory buffers whose addresses are registered with TRT via setTensorAddress(). Unlike regular tensor inputs (whose GPU memory is reference-counted by PyTorch and stays alive via the caller's tensor references), shape tensor values are copied into std::vector<int64_t> buffers that we allocate ourselves (I should restore the comment; will do that). TRT holds raw pointers to these buffers and reads from them during inferShapes() and enqueueV3(). Previously, these buffers were local to setup_input_tensors() and were freed on return. Moving the declaration here ensures the buffers outlive both calls.
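For readers following along, here is a minimal, self-contained sketch of the lifetime pattern described above. It does not use TensorRT: MockContext, setup_shape_inputs, and run_sketch are hypothetical names invented for illustration, and the mock only imitates the "store a raw pointer now, read it later" behavior. Using std::list as the container is an assumption of this sketch, chosen so that adding more buffers never moves the vectors whose addresses were already registered.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an execution context: it keeps raw pointers only
// and never copies the data they point to.
struct MockContext {
  std::unordered_map<std::string, const int64_t*> addresses;

  void setTensorAddress(const std::string& name, const int64_t* ptr) {
    addresses[name] = ptr;
  }

  // Stand-in for a later read (shape inference or enqueue in the real runtime).
  int64_t readFirst(const std::string& name) const {
    return addresses.at(name)[0];
  }
};

using NamedShape = std::pair<std::string, std::vector<int64_t>>;

// The buffers live in caller-owned storage passed by reference, so the
// registered pointers remain valid after this function returns.
void setup_shape_inputs(
    MockContext& ctx,
    std::list<std::vector<int64_t>>& shape_buffers,
    const std::vector<NamedShape>& values) {
  for (const auto& kv : values) {
    shape_buffers.emplace_back(kv.second); // copy the values into owned storage
    ctx.setTensorAddress(kv.first, shape_buffers.back().data());
  }
}

std::vector<int64_t> run_sketch() {
  MockContext ctx;
  // Declared in the caller: outlives setup and every later read.
  std::list<std::vector<int64_t>> shape_buffers;
  setup_shape_inputs(ctx, shape_buffers, {{"shape_a", {2, 3}}, {"shape_b", {7}}});
  // Reads happen after setup_shape_inputs() has returned.
  return {ctx.readFirst("shape_a"), ctx.readFirst("shape_b")};
}
```

If shape_buffers were instead declared inside setup_shape_inputs(), the pointers stored in the context would dangle as soon as the function returned, and the later reads would be undefined behavior.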
narendasan
left a comment
I think this looks mostly good. We can merge if it's urgent; otherwise, documenting the reason for the C++ change would be good.
This PR handles:
_symbolic_shape_capture.py
inputShapeTensorValues was a local variable in setup_input_tensors(). When the function returned, the CPU buffer pointers registered with TRT via setTensorAddress() became dangling. inferShapes() and enqueueV3() then read garbage from freed memory, producing nonsensical reshape dimensions.