Dynamic compilation does not handle shapes with multiple varying dimensions effectively #127162

Open
tlogn opened this issue May 25, 2024 · 1 comment
Labels
module: dynamic shapes · oncall: pt2 · triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments


tlogn commented May 25, 2024

🐛 Describe the bug

Hello, I have encountered a performance issue and would greatly appreciate any assistance. Thank you.
After compiling the unet model with unet = torch.compile(unet, dynamic=True), it can generate videos at multiple shapes. However, it runs significantly slower than when testing with only a single shape.
I test with a total of 9 shapes (the input is 5-dimensional, NCDHW, but only the D, H, and W dimensions change). During the warmup phase, the model generates at the largest shape. Once all 9 shapes have been generated, returning to the largest shape is slower than its performance in eager mode.
Could the performance be related to the cache size? My torch._dynamo settings are below, followed by a minimal sketch of the setup:

    torch._dynamo.config.suppress_errors = True
    torch._dynamo.config.cache_size_limit = 64
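
A minimal sketch of the reproduction path (the Conv3d module and the exact shapes here are hypothetical stand-ins for the real unet and video shapes):

    import torch

    torch._dynamo.config.suppress_errors = True
    torch._dynamo.config.cache_size_limit = 64

    # Hypothetical stand-in for the real unet; any module taking 5-D NCDHW input works.
    model = torch.nn.Conv3d(4, 4, kernel_size=3, padding=1).cuda()
    compiled = torch.compile(model, dynamic=True)

    # 9 NCDHW shapes where only D, H, W vary; the largest comes first (warmup shape).
    shapes = [(1, 4, d, h, w) for (d, h, w) in
              [(16, 64, 64), (16, 48, 48), (16, 32, 32),
               (8, 64, 64), (8, 48, 48), (8, 32, 32),
               (4, 64, 64), (4, 48, 48), (4, 32, 32)]]

    for shape in shapes:  # generate at every shape once
        compiled(torch.randn(shape, device="cuda"))

    # Returning to the largest shape is where the slowdown appears.
    compiled(torch.randn(shapes[0], device="cuda"))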

Error logs

No response

Minified repro

No response

Versions

PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Debian GNU/Linux 11 (bullseye) (x86_64)
GCC version: (Debian 10.2.1-6) 10.2.1 20210110
Clang version: Could not collect
CMake version: version 3.18.4
Libc version: glibc-2.31

Python version: 3.9.2 (default, Feb 28 2021, 17:03:44) [GCC 10.2.1 20210110] (64-bit runtime)
Python platform: Linux-5.15.120.bsk.2-amd64-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 12.2.140

cc @ezyang @msaroufim @bdhirsh @anijain2305 @chauhang

ezyang (Contributor) commented May 27, 2024

@anijain2305 do we reorder the order in which compiled entries get hit? If so, we might be hitting the generic non-specialized version instead of the originally compiled specialized code, which likely runs faster.
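
One way to check this (a diagnostic sketch, not a confirmed cause: the Conv3d module is a hypothetical stand-in, and mark_dynamic is a suggestion rather than a verified fix) is to log guard and recompile events to see which compiled entry each shape dispatches to, and optionally mark the varying dims dynamic up front:

    import torch

    # Log guard evaluation and recompile events; the shell equivalent is
    # TORCH_LOGS="guards,recompiles" python repro.py
    torch._logging.set_logs(guards=True, recompiles=True)

    model = torch.nn.Conv3d(4, 4, kernel_size=3, padding=1).cuda()  # stand-in module
    compiled = torch.compile(model, dynamic=True)

    # Marking D, H, W dynamic asks dynamo to compile one dynamic-shape graph
    # up front instead of accumulating per-shape specializations.
    x = torch.randn(1, 4, 16, 64, 64, device="cuda")
    for dim in (2, 3, 4):
        torch._dynamo.mark_dynamic(x, dim)
    compiled(x)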

@zou3519 added the triaged label on May 30, 2024