
[Cutlass inductor backend] Cutlass GEMM size threshold #113569

Closed
wants to merge 24 commits

Conversation

@kadeng (Contributor) commented Nov 13, 2023

Stack from ghstack (oldest at bottom):

Cutlass backend GEMMs are comparatively expensive to compile, so they should only be applied to sufficiently large GEMMs. This small diff introduces a new torch._inductor.config option called "cuda.cutlass_backend_min_gemm_size", which sets a size threshold below which GEMM problems will not be considered for the Cutlass backend.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @muchulee8 @aakhundov @ColinPeppler
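The gating described above can be sketched as a simple predicate. This is a minimal sketch, not the PR's actual code: the helper name is hypothetical, and the real check lives inside Inductor's CUTLASS backend selection.

```python
def should_consider_cutlass(m: int, n: int, k: int, min_gemm_size: int = 1) -> bool:
    """Return True when the GEMM problem volume M*N*K meets the configured
    threshold (cuda.cutlass_backend_min_gemm_size). Hypothetical helper
    mirroring the gating this PR adds; the default of 1 keeps all GEMMs
    eligible, matching the option's default."""
    return m * n * k >= min_gemm_size

# With the default threshold of 1, even a tiny GEMM passes the gate.
print(should_consider_cutlass(8, 8, 8))                           # True
# With a larger threshold, small GEMMs are filtered out before autotuning.
print(should_consider_cutlass(8, 8, 8, min_gemm_size=1_000_000))  # False
```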


pytorch-bot bot commented Nov 13, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/113569

Note: Links to docs will display an error until the docs builds have been completed.

❌ 13 New Failures, 13 Unrelated Failures

As of commit a9edf4f with merge base afe6d27:


This comment was automatically generated by Dr. CI and updates every 15 minutes.

kadeng added a commit that referenced this pull request Nov 13, 2023

ghstack-source-id: d89d08edb3b05dc6a11308c1273453eba52d82ba
Pull Request resolved: #113569
@@ -577,6 +577,9 @@ class cuda:
# are enabled for the CUTLASS backend.
cutlass_only_evt_capable_ops: bool = False

# Minimum of M*N*K to consider the CUTLASS backend for GEMM ops.
cutlass_backend_min_gemm_size: int = 1

Specifying the GEMM size threshold with a single number sounds a bit too simplistic. E.g., there can be tall or thin GEMMs where CUTLASS may outperform Triton, but the M * N * K would be as small as a moderately-sized square GEMM. Can't suggest an immediate alternative, though, as no total ordering in GEMM sizes.
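The reviewer's concern can be made concrete: a square GEMM and a tall/skinny GEMM can have exactly the same M*N*K volume, so a single scalar threshold cannot distinguish them. The shapes below are illustrative, not taken from the PR.

```python
def gemm_volume(shape):
    """M*N*K problem volume for a GEMM shape (m, n, k)."""
    m, n, k = shape
    return m * n * k

# Two very different GEMM shapes with identical M*N*K volume.
square = (512, 512, 512)        # M = N = K
tall_skinny = (8192, 16, 1024)  # tall in M, very narrow in N

print(gemm_volume(square) == gemm_volume(tall_skinny))  # True: both 134_217_728
```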

@@ -577,6 +577,9 @@ class cuda:
# are enabled for the CUTLASS backend.
cutlass_only_evt_capable_ops: bool = False

# Minimum of M*N*N to consider the CUTLASS backend for GEMM ops.
Nit: should be "MNK".
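For reference, a user-side sketch of how the new option would be set alongside backend selection. This is a config fragment under stated assumptions: `max_autotune_gemm_backends` is a pre-existing Inductor option, `cuda.cutlass_backend_min_gemm_size` is the option added by this PR, and the threshold value shown is purely illustrative (the PR's default is 1, i.e. no filtering).

```python
import torch._inductor.config as inductor_config

# Opt in to CUTLASS among the autotuned GEMM backends (existing option).
inductor_config.max_autotune_gemm_backends = "ATEN,TRITON,CUTLASS"

# Only consider the CUTLASS backend for GEMMs whose M*N*K volume meets
# this threshold (illustrative value).
inductor_config.cuda.cutlass_backend_min_gemm_size = 256 * 256 * 256
```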

@kadeng (Contributor, Author) commented Dec 15, 2023

Moved to a (draft) feature branch, see #115919

@kadeng kadeng closed this Dec 15, 2023
@facebook-github-bot facebook-github-bot deleted the gh/kadeng/23/head branch January 14, 2024 15:23