Enable shape padding, cache benchmarking #100982
Conversation
For `python benchmarks/dynamo/huggingface.py --performance --backend inductor --amp --training --only PegasusForCausalLM` I got 319 cache hits and 22 cache misses. Previously, increased compile time was the blocker. It is much reduced now.
This wasn't enabled previously because of concerns around compilation time. From a recent run, this increases compile time by an average of 1-2s and reduces memory compression by ~2%, but for significant performance gains (up to 30% on many HF models). There were flaky failures on my last benchmarking run; waiting for an update now that the multithreading non-deterministic failures are fixed. We can wait for that before landing.
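The shape-padding idea being benchmarked can be sketched in plain Python: round each matmul dimension up to a hardware-friendly multiple before running the kernel, then slice the result back. The helper name `pad_dim` and the multiple-of-8 alignment below are illustrative assumptions, not the PR's actual implementation:

```python
def pad_dim(n: int, multiple: int = 8) -> int:
    """Round n up to the next multiple of `multiple`.

    GPU matmul kernels tend to run fastest when dimensions are
    aligned; the multiple-of-8 choice here is only illustrative.
    """
    remainder = n % multiple
    return n if remainder == 0 else n + multiple - remainder

# e.g. a 1234 x 771 matmul would be padded to 1240 x 776
m, k = 1234, 771
m_pad, k_pad = pad_dim(m), pad_dim(k)
```

The padding costs extra memory (hence the ~2% memory-compression regression mentioned above), which is why the PR benchmarks padded vs. unpadded timings per shape instead of padding unconditionally.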
-with torch.autograd.detect_anomaly(check_nan=False):
+with torch.autograd.set_multithreading_enabled(
+    False
+), torch.autograd.detect_anomaly(check_nan=False):
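For readers unfamiliar with the comma-chained `with` syntax in the new lines: the managers enter left to right and exit right to left, so multithreading is disabled before anomaly detection starts and re-enabled after it ends. A small stand-alone sketch, where both context managers are hypothetical stand-ins for the torch ones:

```python
from contextlib import contextmanager

calls = []  # records the order in which the managers run

@contextmanager
def set_multithreading_enabled(flag):
    # hypothetical stand-in for torch.autograd.set_multithreading_enabled
    calls.append(("mt_enter", flag))
    try:
        yield
    finally:
        calls.append(("mt_exit", flag))

@contextmanager
def detect_anomaly(check_nan=True):
    # hypothetical stand-in for torch.autograd.detect_anomaly
    calls.append(("anomaly_enter", check_nan))
    try:
        yield
    finally:
        calls.append(("anomaly_exit", check_nan))

# Comma-chained managers: enter left-to-right, exit right-to-left.
with set_multithreading_enabled(False), detect_anomaly(check_nan=False):
    calls.append(("body",))
```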
huh, I'm a little surprised we needed this, since you already added this context manager in the outermost part of AOT Autograd (so I'd expect it to already be turned on?) https://github.com/pytorch/pytorch/pull/100982/files#diff-df954bbf954d2dcb81f687876053267ffa4ddb36ed86b7d2bd76319ff2b94416L3136
I'm trying to find the source of a strange bug on CI that I can't repro locally: `FakeTensor` doesn't have an attribute `fake_mode`. I was throwing things at the wall to see if anything fixed it. Still haven't found it yet.
if op is torch.ops.aten.addmm:
    input_pad = None
    if input is not None and input.is_cuda:
        input_pad = torch.randn_like(input)
    pad_time = do_bench(
        lambda: pad_addmm(
            input_pad,
            mat1_pad,
            mat2_pad,
            mat1,
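`do_bench` here is Triton's benchmarking helper (`triton.testing.do_bench`), which warms up the kernel and reports a robust per-call timing. As a rough illustration of what such a timing loop does, here is a minimal pure-Python stand-in; `do_bench_sketch` is hypothetical, and the real helper additionally clears the GPU L2 cache between reps and measures with CUDA events rather than wall clock:

```python
import time

def do_bench_sketch(fn, warmup=3, rep=10):
    """Rough stand-in for triton.testing.do_bench: run fn a few times
    to warm up, then return the median wall-clock time per call in ms."""
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(rep):
        t0 = time.perf_counter()
        fn()
        times.append((time.perf_counter() - t0) * 1e3)
    times.sort()
    return times[len(times) // 2]

elapsed_ms = do_bench_sketch(lambda: sum(range(10_000)))
```

In the PR's usage, the padded and unpadded variants are each timed this way and padding is only applied where it actually wins.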
could it be that you are reusing the original inputs here, instead of making a copy (the source of your weird error)?
I got the same error with no changes in #101619. `mat1` and `mat2` are reassigned here; they're not the originals. `mat1_pad` and `mat2_pad` are redundant. My guess is they were there so the matrices wouldn't be in cache, but Triton's `do_bench` clears the cache anyway.
Rebase of #100982 which was already accepted cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 [ghstack-poisoned]
Rebase of #100982 which was already accepted Pull Request resolved: #102200 Approved by: https://github.com/ngimel, https://github.com/jansel
Stack from ghstack (oldest at bottom):
This wasn't enabled previously because of concerns around compilation time. From a recent run, this increases compile time by an average of 1-2s and reduces memory compression by ~2%, but for significant performance gains (up to 30% on many HF models). There were flaky failures on my last benchmarking run; waiting for an update now that the multithreading non-deterministic failures are fixed. We can wait for that before landing.
cc @soumith @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @desertfire