[inline_inbuilt_nn_modules] Long compilation time for hf_T5_generate inference causes timeouts #121989

### 🐛 Describe the bug

The pass status for hf_T5_generate inference has been flaky in a few settings for the past month [link](https://hud.pytorch.org/benchmark/torchbench/inductor_with_cudagraphs?startTime=Wed%2C%2014%20Feb%202024%2021%3A15%3A20%20GMT&stopTime=Fri%2C%2015%20Mar%202024%2020%3A15%3A20%20GMT&granularity=day&mode=inference&dtype=bfloat16&lBranch=main&lCommit=38d9bb5abcc31ba97927a5399b88afe2cf60bf64&rBranch=main&rCommit=ca55468416ec96d31564304efe7a63bd92892e0b&model=hf_T5_generate)

The root cause is that hf_T5_generate takes too long to compile. Its compilation time in different settings is summarized below (dashboard [link](https://hud.pytorch.org/benchmark/compilers?startTime=Fri%2C%2008%20Mar%202024%2019%3A10%3A58%20GMT&stopTime=Fri%2C%2015%20Mar%202024%2018%3A10%3A58%20GMT&granularity=hour&suite=torchbench&mode=inference&dtype=bfloat16&lBranch=main&lCommit=660ec3d38d9d1c8567471ae7fe5b40ae7c6d7438&rBranch=main&rCommit=96ed37ac13366cc9a7e6645b8955061d0a14f80b)):
- default: 223s
- cudagraphs: 1669s
- inductor_max_autotune: 1845s
- cudagraphs_freezing: 549s
- inductor_with_cudagraphs_freezing_autotune: 770s

I've seen timeouts in both the cudagraphs and inductor_max_autotune settings. Max autotune does increase compilation time, but I don't think it is the main issue here.
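
To make the comparison concrete, here is a small sketch that expresses each setting's compilation time as a multiple of the default setting. It uses only the numbers listed above; the names `compile_time_s` and `slowdown_vs` are made up for this illustration and are not part of the benchmark tooling.

```python
# Compilation times in seconds, copied from the list above.
compile_time_s = {
    "default": 223,
    "cudagraphs": 1669,
    "inductor_max_autotune": 1845,
    "cudagraphs_freezing": 549,
    "inductor_with_cudagraphs_freezing_autotune": 770,
}


def slowdown_vs(baseline, times):
    """Return each setting's compile time as a multiple of the baseline setting."""
    base = times[baseline]
    return {name: round(t / base, 2) for name, t in times.items()}


ratios = slowdown_vs("default", compile_time_s)

# Print the settings from slowest to fastest relative to the default.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:45s} {compile_time_s[name]:5d}s  {ratio:5.2f}x")
```

By this calculation the cudagraphs setting, which does not use max autotune, is already about 7.5x slower to compile than the default, while inductor_max_autotune is about 8.3x slower. The two are close, which is consistent with max autotune not being the main contributor.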



### Error logs

..

### Minified repro

..

### Versions

..

cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @anijain2305 @chauhang @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @oulgen @jamesjwu @aorenste @laithsakka @bdhirsh
