Skip to content

handle symbolic shape for non tensor inputs in symbolic shape extraction#4124

Merged
lanluo-nvidia merged 3 commits into
mainfrom
abose/torchTRT_symbolic_shape_symint_inpit
Mar 10, 2026
Merged

handle symbolic shape for non tensor inputs in symbolic shape extraction#4124
lanluo-nvidia merged 3 commits into
mainfrom
abose/torchTRT_symbolic_shape_symint_inpit

Conversation

@apbose
Copy link
Copy Markdown
Collaborator

@apbose apbose commented Mar 7, 2026

This PR handles-

  1. SymInt/scalar inputs in _symbolic_shape_capture.py
  2. inputShapeTensorValues was a local variable in setup_input_tensors(). When the function returned, the CPU buffer pointers registered with TRT via setTensorAddress() became dangling. inferShapes() and enqueueV3() then read garbage from freed memory, producing nonsensical reshape dimensions.

@meta-cla meta-cla Bot added the cla signed label Mar 7, 2026
@github-actions github-actions Bot added component: conversion Issues re: Conversion stage component: core Issues re: The core compiler component: api [Python] Issues re: Python API component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths labels Mar 7, 2026
@github-actions github-actions Bot requested a review from zewenli98 March 7, 2026 02:20
Copy link
Copy Markdown
Collaborator

@narendasan narendasan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apbose can you add a test case for this?

@narendasan
Copy link
Copy Markdown
Collaborator

@lanluo-nvidia can you merge and cherry-pick this for 2.11?

@apbose apbose force-pushed the abose/torchTRT_symbolic_shape_symint_inpit branch from fb72205 to a0cd0d8 Compare March 10, 2026 01:38

std::vector<at::Tensor> outputs(compiled_engine->num_io.second);

// Shape tensor CPU buffers must outlive inferShapes() and enqueueV3()
Copy link
Copy Markdown
Collaborator

@narendasan narendasan Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why? Would help others understand (basically afaict, its just that these inputs are not provided by torch and so we need to produce them ourselves and ensure they are available through enqueue)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the jist is that. Shape tensor values require CPU memory buffers whose addresses are registered with TRT via setTensorAddress(). Unlike regular tensor inputs (whose GPU memory is reference-counted by PyTorch and stays alive via the caller's tensor references), shape tensor values are copied into std::vector<int64_t> buffers that we allocate ourselves (I should restore back the comment, will do that). TRT holds raw pointers to these buffers and reads from them during inferShapes() and enqueueV3(). Previously, these buffers were local to setup_input_tensors() and were freed on return. Moving the declaration here ensures the buffers outlive both calls.

Copy link
Copy Markdown
Collaborator

@narendasan narendasan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think this looks mostly good, we can merge if its urgent otherwise documenting the reason for the C++ change would be good

@lanluo-nvidia lanluo-nvidia merged commit 07678f9 into main Mar 10, 2026
37 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla signed component: api [Python] Issues re: Python API component: conversion Issues re: Conversion stage component: core Issues re: The core compiler component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths component: runtime component: tests Issues re: Tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants