Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable shape padding, cache benchmarking #100982

Closed
wants to merge 13 commits into from

Conversation

eellison
Copy link
Contributor

@eellison eellison commented May 9, 2023

Stack from ghstack (oldest at bottom):

This wasn't enabled previously because of concerns around compilation time. From a recent run, this increases compile time at an average of 1-2s and reduces memory compression by ~2% but for significant performance gains (up to 30% on many HF models). There were flakey failures on my last benchmarking run, waiting for an update now that multithreading non-deterministic failures are fixed. We can wait for that before landing.

cc @soumith @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @desertfire

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented May 9, 2023

eellison added a commit that referenced this pull request May 9, 2023
ghstack-source-id: 62c42094b7f71d211bda7eaab3fee80ca8e1f436
Pull Request resolved: #100982
cc soumith voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 desertfire

[ghstack-poisoned]
eellison added a commit that referenced this pull request May 9, 2023
ghstack-source-id: 61a1e204dce30414813482266b665ec6ec14b33c
Pull Request resolved: #100982
@eellison eellison added the ciflow/trunk Trigger trunk jobs on your pull request label May 10, 2023
cc soumith voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 desertfire

[ghstack-poisoned]
eellison added a commit that referenced this pull request May 11, 2023
ghstack-source-id: 68a1d6a8408918cd66163be36bf1fdb57d1285c0
Pull Request resolved: #100982
cc soumith voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 desertfire

[ghstack-poisoned]
eellison added a commit that referenced this pull request May 11, 2023
ghstack-source-id: e3241ead62ff88d91788191935222aa452b77186
Pull Request resolved: #100982
@eellison eellison changed the title Enable shape padding Enable shape padding, cache benchmarking May 11, 2023
@eellison eellison added the topic: not user facing topic category label May 11, 2023
For `python benchmarks/dynamo/huggingface.py  --performance  --backend inductor --amp --training --only PegasusForCausalLM` I got 319 cache hits and 22 cache misses. 


cc soumith voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 desertfire

[ghstack-poisoned]
eellison added a commit that referenced this pull request May 11, 2023
ghstack-source-id: 141f6dda4621549b761e97c66ee82a02ee1cb142
Pull Request resolved: #100982
For `python benchmarks/dynamo/huggingface.py  --performance  --backend inductor --amp --training --only PegasusForCausalLM` I got 319 cache hits and 22 cache misses. 

Previously increased compile time was the blocker. I think it should be much reduced now. Waiting on benchmarking results.. 

cc soumith voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 desertfire

[ghstack-poisoned]
eellison added a commit that referenced this pull request May 11, 2023
ghstack-source-id: 53a044fea347ba4e1522ece16ab40e31e77ec265
Pull Request resolved: #100982
For `python benchmarks/dynamo/huggingface.py  --performance  --backend inductor --amp --training --only PegasusForCausalLM` I got 319 cache hits and 22 cache misses. 

Previously increased compile time was the blocker. I think it should be much reduced now. Waiting on benchmarking results.. 

cc soumith voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 desertfire

[ghstack-poisoned]
eellison added a commit that referenced this pull request May 15, 2023
ghstack-source-id: 6e67412ac4b16e0612b9661c509f77938072b32c
Pull Request resolved: #100982
For `python benchmarks/dynamo/huggingface.py  --performance  --backend inductor --amp --training --only PegasusForCausalLM` I got 319 cache hits and 22 cache misses. 

Previously increased compile time was the blocker. I think it should be much reduced now. Waiting on benchmarking results.. 

cc soumith voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 desertfire

[ghstack-poisoned]
eellison added a commit that referenced this pull request May 15, 2023
ghstack-source-id: b02deb949bd341a12826ef05475f33e0ce969c12
Pull Request resolved: #100982
For `python benchmarks/dynamo/huggingface.py  --performance  --backend inductor --amp --training --only PegasusForCausalLM` I got 319 cache hits and 22 cache misses. 

Previously increased compile time was the blocker. I think it should be much reduced now. Waiting on benchmarking results.. 

cc soumith voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 desertfire

[ghstack-poisoned]
eellison added a commit that referenced this pull request May 15, 2023
ghstack-source-id: 2a1f6ae40f0636c1acb9bd624749749ec2fa4527
Pull Request resolved: #100982
For `python benchmarks/dynamo/huggingface.py  --performance  --backend inductor --amp --training --only PegasusForCausalLM` I got 319 cache hits and 22 cache misses. 

Previously increased compile time was the blocker. I think it should be much reduced now. Waiting on benchmarking results.. 

cc soumith voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 desertfire

[ghstack-poisoned]
This was referenced May 16, 2023
For `python benchmarks/dynamo/huggingface.py  --performance  --backend inductor --amp --training --only PegasusForCausalLM` I got 319 cache hits and 22 cache misses. 

Previously increased compile time was the blocker. It is much reduced now. 

cc soumith voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 desertfire

[ghstack-poisoned]
For `python benchmarks/dynamo/huggingface.py  --performance  --backend inductor --amp --training --only PegasusForCausalLM` I got 319 cache hits and 22 cache misses. 

Previously increased compile time was the blocker. It is much reduced now. 

cc soumith voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 desertfire

[ghstack-poisoned]
@eellison eellison requested a review from jansel May 16, 2023 18:39
@eellison eellison requested a review from ngimel May 16, 2023 18:39

This wasn't enabled previously because of concerns around compilation time. From a recent run, this increases compile time at an average of 1-2s and reduces memory compression by ~2% but for significant performance gains (up to 30% on many HF models). There were flakey failures on my last benchmarking run, waiting for an update now that multithreading non-deterministic failures are fixed. We can wait for that before landing.


cc soumith voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 desertfire

[ghstack-poisoned]
@eellison eellison removed request for jansel and ngimel May 16, 2023 21:47
This was referenced May 16, 2023
with torch.autograd.detect_anomaly(check_nan=False):
with torch.autograd.set_multithreading_enabled(
False
), torch.autograd.detect_anomaly(check_nan=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh, I'm a little surprised we needed this since you already added this context manager to run in the outer-most bit of aot autograd (so I'd expect to already be turned on?) https://github.com/pytorch/pytorch/pull/100982/files#diff-df954bbf954d2dcb81f687876053267ffa4ddb36ed86b7d2bd76319ff2b94416L3136

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im trying to find the source of a strange bug on CI that can't locally repro: FakeTensor doesn't have an attribute fake_mode. Was throwing things at the wall to see if anything fixed it. Still haven't found it yet.

if op is torch.ops.aten.addmm:
input_pad = None
if input is not None and input.is_cuda:
input_pad = torch.randn_like(input)
pad_time = do_bench(
lambda: pad_addmm(
input_pad,
mat1_pad,
mat2_pad,
mat1,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could it be that you are reusing original inputs here, instead of making a copy (the source of your weird error)

Copy link
Contributor Author

@eellison eellison May 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got the same error with no changes #101619. mat1 and mat2 are reassigned here, they're not the original. mat1_pad and mat2_pad are redundant.

My guess is they were there so the matrices wouldnt be in cache, but triton do_bench clears cache anyway

eellison added a commit that referenced this pull request May 24, 2023
Rebase of #100982 which was already accepted


cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10

[ghstack-poisoned]
eellison added a commit that referenced this pull request May 24, 2023
Rebase of #100982 which was already accepted


cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10

[ghstack-poisoned]
eellison added a commit that referenced this pull request May 25, 2023
Rebase of #100982 which was already accepted


cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10

[ghstack-poisoned]
eellison added a commit that referenced this pull request May 25, 2023
Rebase of #100982 which was already accepted


cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10

[ghstack-poisoned]
pytorchmergebot pushed a commit that referenced this pull request May 25, 2023
Rebase of #100982 which was already accepted

Pull Request resolved: #102200
Approved by: https://github.com/ngimel, https://github.com/jansel
@eellison eellison closed this May 26, 2023
@facebook-github-bot facebook-github-bot deleted the gh/eellison/445/head branch June 25, 2023 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request module: inductor topic: not user facing topic category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants