[inline_inbuilt_nn_modules] Long compilation time for hf_T5_generate inference causes timeouts #121989

@shunting314

🐛 Describe the bug

The pass status for hf_T5_generate inference has been flaky in a few settings for the past month (link).

The root cause is that hf_T5_generate takes too long to compile. Its compilation time in different settings is summarized below (dashboard link):

  • default: 223s
  • cudagraphs: 1669s
  • inductor_max_autotune: 1845s
  • cudagraphs_freezing: 549s
  • inductor_with_cudagraphs_freezing_autotune: 770s

I've seen timeouts in both the cudagraphs and inductor_max_autotune settings. Max autotune does increase compilation time, but I don't think it is the main issue here.
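
To make the comparison concrete, here is a small sketch that computes each setting's slowdown relative to the default setting. The numbers are copied from the list above; the variable names are only illustrative, and the script does not run the benchmark itself.

```python
# Compile times in seconds, copied from the list above.
compile_time_s = {
    "default": 223,
    "cudagraphs": 1669,
    "inductor_max_autotune": 1845,
    "cudagraphs_freezing": 549,
    "inductor_with_cudagraphs_freezing_autotune": 770,
}

baseline = compile_time_s["default"]

# Slowdown of each setting relative to the default setting.
slowdown = {name: t / baseline for name, t in compile_time_s.items()}

# Print the settings from slowest to fastest.
for name, ratio in sorted(slowdown.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:45s} {compile_time_s[name]:5d}s  {ratio:4.1f}x")
```

Assuming the cudagraphs setting does not enable max autotune, it is already about 7.5x slower to compile than default, close to the roughly 8.3x of inductor_max_autotune. That would be consistent with max autotune not being the main cause.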

Error logs

..

Minified repro

..

Versions

..

cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @anijain2305 @chauhang @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @oulgen @jamesjwu @aorenste @laithsakka @bdhirsh
