Closed
Labels
- dynamo-triage-june2024
- high priority
- module: compile-time (Compilation mechanism or time spent in (re)compilation, tracing, startup)
- module: dynamo
- module: guards
- oncall: pt2
- triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)
Description
🐛 Describe the bug
The pass status for hf_T5_generate inference has been flaky in a few settings for the past month (link).
The root cause is that hf_T5_generate takes too long to compile. Its compilation time in different settings is summarized below (dashboard link):
- default: 223s
- cudagraphs: 1669s
- inductor_max_autotune: 1845s
- cudagraphs_freezing: 549s
- inductor_with_cudagraphs_freezing_autotune: 770s
I've seen timeouts in both the cudagraphs and inductor_max_autotune settings. Max autotune does increase compilation time, but I don't think max autotune is the main issue here.
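To put the numbers above on a common scale, here is a minimal sketch that expresses each setting's compile time as a multiple of the default setting. The timings are copied from the list in this issue; the variable names are only illustrative. The comparison between cudagraphs and inductor_max_autotune is suggestive rather than conclusive, since the two settings may differ in more than autotuning.

```python
# Compile times in seconds, copied from the list in this issue.
compile_time_s = {
    "default": 223,
    "cudagraphs": 1669,
    "inductor_max_autotune": 1845,
    "cudagraphs_freezing": 549,
    "inductor_with_cudagraphs_freezing_autotune": 770,
}

baseline = compile_time_s["default"]

# Slowdown of each setting relative to the default setting.
slowdown = {name: t / baseline for name, t in compile_time_s.items()}

# Print settings from slowest to fastest.
for name, t in sorted(compile_time_s.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:45s} {t:5d}s  {slowdown[name]:.2f}x default")

# Gap between the two settings where timeouts were observed.
extra_s = compile_time_s["inductor_max_autotune"] - compile_time_s["cudagraphs"]
print(f"inductor_max_autotune minus cudagraphs: {extra_s}s")
```

The cudagraphs setting alone is already several times slower than default, and the additional time in inductor_max_autotune is small by comparison, which is consistent with max autotune not being the main contributor.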
Error logs
..
Minified repro
..
Versions
..
cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @anijain2305 @chauhang @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @oulgen @jamesjwu @aorenste @laithsakka @bdhirsh