[inductor] fix bandwidth estimation for StarDep #120266

shunting314 · 2024-02-21T00:29:46Z

Stack from ghstack (oldest at bottom):

A lot of HF models fail when inductor_config.benchmark_kernel is enabled. The reason is that the bandwidth estimation code assumes every dependency has an index, but StarDep does not, so an exception is raised when StarDep.index is accessed.
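To make the failure mode concrete, here is a minimal, self-contained sketch. It does not use inductor's real classes: `MemoryDepSketch`, `StarDepSketch`, and `estimate_num_bytes` are hypothetical stand-ins. Falling back to the whole buffer size for an index-less dependency is an assumption made for illustration, not necessarily what this PR does.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class MemoryDepSketch:
    # Hypothetical stand-in for a dependency that carries an index.
    # The "index" is simplified to the number of elements it touches.
    name: str
    num_elements_accessed: int

    @property
    def index(self) -> int:
        return self.num_elements_accessed


@dataclass(frozen=True)
class StarDepSketch:
    # Hypothetical stand-in for StarDep: depends on a whole buffer, has no index.
    name: str

    @property
    def index(self) -> int:
        # Accessing .index fails, mirroring the exception described above.
        raise NotImplementedError("StarDep does not have an index")


Dep = Union[MemoryDepSketch, StarDepSketch]


def estimate_num_bytes(deps: List[Dep], buffer_numel: Dict[str, int], itemsize: int = 4) -> int:
    """Sum the bytes moved by a kernel's dependencies.

    Index-less dependencies are charged for their full buffer (an assumption)
    instead of reading .index, which would raise.
    """
    total = 0
    for dep in deps:
        if isinstance(dep, StarDepSketch):
            numel = buffer_numel[dep.name]  # no index: assume the whole buffer is touched
        else:
            numel = dep.index
        total += numel * itemsize
    return total
```

With a 1024-element dependency on `buf0` and a StarDep on a 256-element `buf1` (4-byte elements), the estimate is (1024 + 256) * 4 = 5120 bytes. Reading `.index` on every dependency unconditionally would raise instead.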

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler @amjames

[ghstack-poisoned]

pytorch-bot · 2024-02-21T00:29:49Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/120266

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (3 Unrelated Failures)

As of commit b2fca72 with merge base cccacf6:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 3, 5, linux.g5.4xlarge.nvidia.gpu) (gh)
inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_full_cuda
pull / linux-focal-py3.12-clang10 / test (default, 2, 3, linux.2xlarge) (gh)
RuntimeError: export/test_passes 1/1 failed

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

inductor / rocm6.0-py3.8-inductor / test (inductor, 1, 1, linux.rocm.gpu.2, unstable) (gh)
Action 'https://api.github.com/repos/pytorch/pytorch/tarball/dd6b5e236e3aee28c153455ecbc1fd6b4192d687' download has timed out. Error: The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ghstack-source-id: ccb2f9a
Pull Request resolved: #120266

shunting314 · 2024-02-21T00:40:55Z

@pytorchbot merge

pytorchmergebot · 2024-02-21T00:42:43Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

Log a few more fields:

- num_atomic_add: the perf of kernels using atomic_add is usually data dependent. Our benchmarking code generates all indices as 0, which results in worse perf than in reality.
- kernel_args_num_gb: estimates the amount of reads/writes for kernel args. In-place args are double counted. If the estimate is good, this should be a lower bound on the memory access the GPU performs. Sometimes the GPU does more memory access, since a single buffer may be accessed multiple times (e.g. for softmax when the input tensor is quite large; caching only helps a bit here).

With this logged, and if we augment the metadata with the amount of memory the GPU actually accessed, we can dig into kernels where the GPU accesses more memory than estimated.

Pull Request resolved: #120274
Approved by: https://github.com/jansel
ghstack dependencies: #120266
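The kernel_args_num_gb idea can be sketched as follows, together with the usual conversion from bytes moved and measured latency to achieved bandwidth. `KernelArgSketch`, `kernel_args_num_gb`, and `achieved_bandwidth_gbps` are hypothetical names. Counting an in-place arg as one read plus one write is an assumption based on "double counted"; the logging code in #120274 may compute this differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class KernelArgSketch:
    # Hypothetical description of one kernel argument.
    numel: int
    itemsize: int
    in_place: bool = False  # assumed: both read and written by the kernel


def kernel_args_num_gb(args: List[KernelArgSketch]) -> float:
    """Estimate the GB read/written for a kernel's arguments.

    In-place args are counted twice (one read plus one write). Re-reading a
    buffer (e.g. a large softmax input that does not fit in cache) is not
    modelled, so the result is meant as a lower bound on real traffic.
    """
    total_bytes = 0
    for arg in args:
        nbytes = arg.numel * arg.itemsize
        total_bytes += 2 * nbytes if arg.in_place else nbytes
    return total_bytes / 1e9


def achieved_bandwidth_gbps(num_gb: float, latency_ms: float) -> float:
    # Bandwidth = data moved / time, converting milliseconds to seconds.
    return num_gb / (latency_ms / 1e3)
```

For example, an elementwise fp32 kernel that reads 250M elements and writes 250M elements moves an estimated 2.0 GB. If it runs in 1 ms, that is 2000 GB/s. A kernel whose measured traffic is well above this estimate is the kind of kernel worth digging into.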

[inductor] fix bandwidth estimation for StarDep

b2fca72

[ghstack-poisoned]

shunting314 added a commit that referenced this pull request Feb 21, 2024

[inductor] fix bandwidth estimation for StarDep

bec9e54

ghstack-source-id: ccb2f9a
Pull Request resolved: #120266

github-actions bot added module: inductor ciflow/inductor labels Feb 21, 2024

shunting314 requested review from Chillee, chenyang78, eellison and jansel February 21, 2024 00:33

eellison approved these changes Feb 21, 2024

pytorch-bot bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) Feb 21, 2024

shunting314 added the topic: not user facing label (topic category) Feb 21, 2024

pytorchmergebot added the merging label Feb 21, 2024

shunting314 mentioned this pull request Feb 21, 2024

[inductor] improve kernel metadata logging #120274

Closed

jansel approved these changes Feb 21, 2024

pytorchmergebot added the Merged label Feb 21, 2024

pytorchmergebot closed this in 800e9ac Feb 21, 2024

pytorchmergebot removed the merging label Feb 21, 2024

github-actions bot deleted the gh/shunting314/100/head branch March 23, 2024 01:51

4 participants
