Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Inductor cutlass backend] Enable StreamK and nonzero workspace sizes #119005

Closed
wants to merge 8 commits into from

Conversation

This change introduces temporary workspace buffers, which can be used to
provide workspace memory to Inductor ops. This is a required feature
in order to enable the Cutlass StreamK tile scheduler, which should
bring performance benefits and more reliable GEMM performance for
many tile shapes.

See StreamK paper: https://arxiv.org/abs/2301.03598

[ghstack-poisoned]
Copy link

pytorch-bot bot commented Feb 2, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/119005

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures

As of commit 8019b3a with merge base 520771d (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

…space sizes"

This change introduces temporary workspace buffers, which can be used to
provide workspace memory to Inductor ops. This is a required feature
in order to enable the Cutlass StreamK tile scheduler, which should
bring performance benefits and more reliable GEMM performance for
many tile shapes.

See StreamK paper: https://arxiv.org/abs/2301.03598

[ghstack-poisoned]
…space sizes"

This change introduces temporary workspace buffers, which can be used to
provide workspace memory to Inductor ops. This is a required feature
in order to enable the Cutlass StreamK tile scheduler, which should
bring performance benefits and more reliable GEMM performance for
many tile shapes.

See StreamK paper: https://arxiv.org/abs/2301.03598

[ghstack-poisoned]
…space sizes"

This change introduces temporary workspace buffers, which can be used to
provide workspace memory to Inductor ops. This is a required feature
in order to enable the Cutlass StreamK tile scheduler, which should
bring performance benefits and more reliable GEMM performance for
many tile shapes.

See StreamK paper: https://arxiv.org/abs/2301.03598

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 muchulee8 aakhundov ColinPeppler

[ghstack-poisoned]
…space sizes"

This change introduces temporary workspace buffers, which can be used to
provide workspace memory to Inductor ops. This is a required feature
in order to enable the Cutlass StreamK tile scheduler, which should
bring performance benefits and more reliable GEMM performance for
many tile shapes.

See StreamK paper: https://arxiv.org/abs/2301.03598

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 muchulee8 aakhundov ColinPeppler

[ghstack-poisoned]
…space sizes"

This change introduces temporary workspace buffers, which can be used to
provide workspace memory to Inductor ops. This is a required feature
in order to enable the Cutlass StreamK tile scheduler, which should
bring performance benefits and more reliable GEMM performance for
many tile shapes.

See StreamK paper: https://arxiv.org/abs/2301.03598

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 muchulee8 aakhundov ColinPeppler

[ghstack-poisoned]
…space sizes"

This change introduces temporary workspace buffers, which can be used to
provide workspace memory to Inductor ops. This is a required feature
in order to enable the Cutlass StreamK tile scheduler, which should
bring performance benefits and more reliable GEMM performance for
many tile shapes.

See StreamK paper: https://arxiv.org/abs/2301.03598

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 muchulee8 aakhundov ColinPeppler

[ghstack-poisoned]
…space sizes"

This change introduces temporary workspace buffers, which can be used to
provide workspace memory to Inductor ops. This is a required feature
in order to enable the Cutlass StreamK tile scheduler, which should
bring performance benefits and more reliable GEMM performance for
many tile shapes.

See StreamK paper: https://arxiv.org/abs/2301.03598

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 muchulee8 aakhundov ColinPeppler

[ghstack-poisoned]
@@ -3028,6 +3028,41 @@ class InputBuffer(Buffer):
pass


@dataclasses.dataclass
class WorkspaceBuffer(Buffer):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #117992 I also add workspace allocations but take a slightly different approach. The workspace size is an implementation detail of the code generation, so I don't represent it in the inductor IR and instead put it in KernelArgs and have the Kernel method create the args.

wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peterbell10 - Just looked at it, looks good to me. If the PR you mention is blocked through this ROCm issue (Is that blocking?),maybe you could split the workspace allocation code off into a separate PR, so we can get it merged quickly and I can adapt this to it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ROCm issues are fixed and I'm just waiting on perf results before merging.

@kadeng
Copy link
Contributor Author

kadeng commented Feb 6, 2024

Had to be squashed into #119007

@kadeng kadeng closed this Feb 6, 2024
@github-actions github-actions bot deleted the gh/kadeng/20/head branch March 9, 2024 01:49
OnlyFor pushed a commit to OnlyFor/pytorch that referenced this pull request May 3, 2024
Enable nonzero workspace and Cutlass StreamK for Inductor Cutlass GEMM ops.

This is a simpler rewrite of my original version of pytorch#119005 using peterbell10 's workspace allocation mechanism from pytorch#117992

Test Plan:
 - Additional unit test in test_cutlass_backend.py which specifically tests StreamK GEMM with workspace requirement
 - CI

ghstack-source-id: 24d06299f90a1e31af6b097316b76689e4944df2
Pull Request resolved: pytorch#125406
OnlyFor pushed a commit to OnlyFor/pytorch that referenced this pull request May 4, 2024
Enable nonzero workspace and Cutlass StreamK for Inductor Cutlass GEMM ops.

This is a simpler rewrite of my original version of pytorch#119005 using peterbell10 's workspace allocation mechanism from pytorch#117992

Test Plan:
 - Additional unit test in test_cutlass_backend.py which specifically tests StreamK GEMM with workspace requirement
 - CI

ghstack-source-id: 6b2a29b3d2754b1981b503939f79f7bc5889216e
Pull Request resolved: pytorch#125406
pytorchmergebot pushed a commit that referenced this pull request May 5, 2024
…amK (#125406)

Enable nonzero workspace and Cutlass StreamK for Inductor Cutlass GEMM ops.

This is a simpler rewrite of my original version of #119005 using @peterbell10 's workspace allocation mechanism from #117992

Test Plan:
 - Additional unit test in test_cutlass_backend.py which specifically tests StreamK GEMM with workspace requirement
 - CI

Pull Request resolved: #125406
Approved by: https://github.com/jansel
kadeng added a commit that referenced this pull request May 5, 2024
Enable nonzero workspace and Cutlass StreamK for Inductor Cutlass GEMM ops.

This is a simpler rewrite of my original version of #119005 using peterbell10 's workspace allocation mechanism from #117992

Test Plan:
 - Additional unit test in test_cutlass_backend.py which specifically tests StreamK GEMM with workspace requirement
 - CI

ghstack-source-id: eccd80aedce633b345a88eb25ca1d1149bb12756
Pull Request resolved: #125406
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants