[Inductor] GEMM shape padding improvements #118522
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/118522
Note: Links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit baf3b9a with merge base 68c3cb7.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from eac9bc0 to d9c820f.
@kadeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Cool! Would you add a few tests?
```python
    size=[batchsize, m, n],
    stride=[n * m, n, 1],
)
if use_cutlass_template(fake_layout):
```
Do you still want to pad bandwidth-bound mms in Cutlass? I found that for small matmuls it wasn't profitable to pad.
We're not going to use Cutlass for small matmuls; in practice there will be a size threshold (in terms of MNK) below which it won't be used. This is how I did it here (Meta-internal link: https://fb.workplace.com/notes/347669491400529/) and arrived at whole-model speedups of up to 14%.
And bandwidth-bound large matmuls are definitely what I want the padding to be applied to; these can be sped up considerably by Cutlass. This has a benefit on average because the padding can often be fused with other (previous) ops by Triton (such that no memory-IO overhead remains from the padding) or be done via no-op memory reinterpretation (e.g. transpose, etc.). In addition, the Cutlass speedup can be so large that it would speed things up even if the above were not the case.
See also this (really large) PR where the threshold is introduced: #118416
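A minimal sketch of the size-threshold idea (the helper name and threshold value are illustrative assumptions, not the actual code from #118416):

```python
# Illustrative sketch only: gate Cutlass (and thus unconditional padding) on GEMM size,
# so small, bandwidth-bound matmuls keep using the default backends and are not padded here.
def big_enough_for_cutlass(m: int, n: int, k: int, min_mnk: int = 2**21) -> bool:
    # Compare the MNK product against a tunable threshold; below it, skip Cutlass.
    return m * n * k >= min_mnk
```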
> This has a benefit on average because the padding can often be fused with other (previous) ops by Triton (such that no memory-IO overhead remains from the padding) or be done via no-op memory reinterpretation (e.g. transpose, etc.).

For both of these cases we should be able to analyze the graph to know that they are going to fire... A general TODO in this file is to skip the padding on a tensor during benchmarking when it comes from a fusable operator.
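A hedged sketch of that TODO; the op set and helper name are assumptions for illustration, not Inductor's actual analysis:

```python
# Illustrative only: when benchmarking whether padding pays off, treat the pad as
# (nearly) free if the mm input comes from a pointwise op that Triton could fuse it into.
import torch
import torch.fx

POINTWISE_OPS = {
    torch.ops.aten.relu.default,
    torch.ops.aten.add.Tensor,
    torch.ops.aten.mul.Tensor,
}

def padding_likely_fused(input_node: torch.fx.Node) -> bool:
    # A pointwise producer usually means the pad's extra memory traffic can be absorbed.
    return input_node.op == "call_function" and input_node.target in POINTWISE_OPS
```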
```python
elif n_padded_length == 0 and m_padded_length != 0:
    m_padded_length = 0
```
For my own understanding: if A[M, K] @ B[K, N] and only dimension M needs padding, do we consider this matmul as not needing padding? Or do you actually mean that dimension M is not worth padding?
Right, because A[M, K] @ B[K, N] -> [M, N], and this is fine: since N will be the last dimension, M won't have to meet alignment requirements (assuming a row-major output layout, which will then certainly be picked by the matmul backend if M is not aligned).
So, in other words, we only need K and either M or N to be aligned.
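A minimal sketch of that rule; the helper names and alignment value are illustrative, not the actual pad_mm functions:

```python
# Illustrative sketch: for A[M, K] @ B[K, N] with a row-major output, only K and
# one of M or N need to be aligned, so padding the remaining dimension is skipped.
def padded_length(dim: int, align: int = 8) -> int:
    # Elements needed to round `dim` up to a multiple of `align` (0 if already aligned).
    rem = dim % align
    return 0 if rem == 0 else align - rem

def plan_padding(m: int, k: int, n: int, align: int = 8):
    m_pad, k_pad, n_pad = (padded_length(d, align) for d in (m, k, n))
    if m_pad == 0 and n_pad != 0:
        # M already aligned: an explicitly transposed GEMM avoids having to pad N.
        n_pad = 0
    elif n_pad == 0 and m_pad != 0:
        # N already aligned: the row-major [M, N] output imposes no requirement on M.
        m_pad = 0
    return m_pad, k_pad, n_pad
```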
Force-pushed from d9c820f to 64fb65a.
@kadeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed from 64fb65a to 4b5cd88.
@kadeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Just did that. I focused on testing the padding logic itself, not the pattern matching etc., which is unchanged.
Force-pushed from 4b5cd88 to a9ba985.
Improvements to shape padding logic in torch/_inductor/pad_mm.py

Most notably:
* Enable shape padding for Cutlass
* Add a flag to always pad shapes
* Use the aten.const_pad_nd operation to pad tensors in a single op instead of using multiple steps involving intermediate buffers
* Make many paddings unnecessary when either the M or N dimension is properly aligned but the other is not (configurable, on by default)

Updates:
* Addressed reviewer comments
* Removed the config setting to only pad the K dimension
* Added detailed unit tests in test/inductor/test_pad_mm.py
Force-pushed from a9ba985 to baf3b9a.
@kadeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@pytorchbot merge -f 'Landed internally'
(Initiating merge automatically since Phabricator Diff has merged, using force because this PR might not pass merge_rules.json but landed internally)
Improvements to shape padding logic in torch/_inductor/pad_mm.py

These changes could lead to up to 14% perf improvement for certain Meta-internal models in experiments.

Most notably:
1. Use the aten.const_pad_nd operation to pad tensors in a single op instead of using multiple steps involving intermediate buffers. This appears to be more performant than the previous logic, confirmed by profiling and benchmarking results (Meta-internal).
2. Make many paddings unnecessary by using an explicitly transposed GEMM when either the M or N dimension is properly aligned but the other is not, configurable via config.shape_pad_use_transpose (default: True).
3. Enable shape padding for the Inductor CUDA / Cutlass backend for all GEMM ops where Cutlass would be enabled, without benchmarking in that case.
* Add a config flag to always pad shapes (without benchmarking first), configurable via config.force_shape_pad (default: False)
* Added several new unit tests to ensure tensors are padded such that they meet all alignment requirements after padding

Pull Request resolved: #118522
Approved by: https://github.com/jansel, https://github.com/eellison
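A minimal sketch of the single-op padding from point 1 above; the shapes, pad amounts, and helper name are illustrative, and this is not the actual pad_mm implementation:

```python
# Illustrative sketch: pad both dimensions of a GEMM operand with a single
# constant_pad_nd call instead of chaining cats through intermediate buffers.
import torch

def pad_operand(a: torch.Tensor, m_pad: int, k_pad: int) -> torch.Tensor:
    # constant_pad_nd takes pads in reverse dimension order:
    # (last-dim left, last-dim right, second-to-last left, second-to-last right, ...)
    return torch.ops.aten.constant_pad_nd(a, [0, k_pad, 0, m_pad], 0.0)

a = torch.randn(127, 250)
a_padded = pad_operand(a, m_pad=1, k_pad=6)
assert a_padded.shape == (128, 256)
```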
This reverts commit cc46829. Reverted #118522 on behalf of https://github.com/eellison due to regressing the HF benchmarks by ~4-5% ([comment](#118522)).
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark this as
Superseded by #119578
Relanding just the pad-in-a-single-pass portion of [the pr](#118522), not including the transpose logic. This was previously accepted and reviewed.

Pull Request resolved: #125773
Approved by: https://github.com/shunting314
ghstack dependencies: #125772