[Cutlass 3.3 submodule upgrade] #112861

Closed
wants to merge 28 commits into from

Contributor

@kadeng kadeng commented Nov 3, 2023

Updates third_party/cutlass to Cutlass v3.3. No further changes appear necessary.

Cutlass release 3.3 has not been tagged yet; the revision hash is 1d7f2a207ec215e037099f4ba5632ccfa0249673 (Cutlass 3.3 plus two minor hotfixes on top).
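
For readers who want to reproduce the pin locally, here is a minimal sketch of the git steps that move a submodule to a specific revision. It is not the ghstack workflow used for this PR; the helper names `pin_commands` and `pin_submodule` are hypothetical, and it assumes the submodule's remote is called `origin` and that the commands run from the repository root.

```python
import subprocess
from typing import List

# Path and revision are taken from the PR description above.
CUTLASS_PATH = "third_party/cutlass"
CUTLASS_REV = "1d7f2a207ec215e037099f4ba5632ccfa0249673"


def pin_commands(path: str = CUTLASS_PATH, rev: str = CUTLASS_REV) -> List[List[str]]:
    """Return the git commands that pin the submodule at `path` to `rev`.

    Assumption: the submodule remote is named "origin".
    """
    return [
        ["git", "-C", path, "fetch", "origin"],   # get the new upstream commits
        ["git", "-C", path, "checkout", rev],     # move the submodule to the revision
        ["git", "add", path],                     # stage the updated gitlink in the parent repo
    ]


def pin_submodule(path: str = CUTLASS_PATH, rev: str = CUTLASS_REV,
                  dry_run: bool = True) -> List[List[str]]:
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    commands = pin_commands(path, rev)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    pin_submodule(dry_run=True)
```

With `dry_run=True` nothing is modified; the commands are only printed so they can be reviewed before running them for real.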

Cutlass 3.3 offers the following improvements:

  • Adds support for mixed precision GEMMs on Hopper and Ampere
  • Adds support for < 16B aligned GEMMs on Hopper
  • Enhancements to EVT
  • Enhancements to Python interface
  • Enhancements to sub-byte type handling in CuTe
  • Several other bug fixes and performance improvements
  • Minor doc update
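
To make the "< 16B aligned" item concrete, the following sketch computes the byte alignment that every row of a row-major operand is guaranteed to have. It is an illustration only, not Cutlass code: the function name `row_alignment_bytes` is hypothetical, and it assumes the base pointer is itself 16-byte aligned, so the alignment is determined by the leading dimension and the element size.

```python
def row_alignment_bytes(leading_dim: int, element_bytes: int, max_bytes: int = 16) -> int:
    """Largest power-of-two alignment (capped at max_bytes) shared by all row starts.

    Assumptions: the base pointer is aligned to max_bytes, max_bytes is a
    power of two, and rows are leading_dim elements apart in memory.
    """
    row_bytes = leading_dim * element_bytes
    align = max_bytes
    # Halve the candidate alignment until the row pitch is a multiple of it.
    while align > 1 and row_bytes % align != 0:
        align //= 2
    return align


if __name__ == "__main__":
    # fp16 (2 bytes per element) with different leading dimensions.
    for ld in (4096, 4100, 4097):
        print(ld, row_alignment_bytes(ld, element_bytes=2))
```

For fp16, a leading dimension of 4096 gives full 16-byte alignment, while 4100 and 4097 give 8 and 2 bytes respectively; the latter two are examples of shapes that fall below the 16-byte threshold.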

Test Plan:

  • CI (ciflow/trunk, ciflow/inductor)
  • pytest test/inductor/test_max_autotune.py

Stack from ghstack (oldest at bottom):

Differential Revision: D50988216

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @muchulee8 @aakhundov @ColinPeppler


pytorch-bot bot commented Nov 3, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/112861

Note: Links to docs will display an error until the docs builds have been completed.

❌ 23 New Failures, 11 Unrelated Failures

As of commit bca0bd0 with merge base afe6d27:

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

kadeng added a commit that referenced this pull request Nov 3, 2023
ghstack-source-id: 6ef08d44aabacacf0827babadf2758e5a99eb780
Pull Request resolved: #112861
@kadeng kadeng added the ciflow/trunk label Nov 3, 2023
kadeng added a commit that referenced this pull request Nov 3, 2023
ghstack-source-id: 914166b584551faa2b4f73aa40d6fdd00701da9c
Pull Request resolved: #112861
kadeng added a commit that referenced this pull request Nov 3, 2023
ghstack-source-id: d8af54a5f1f55128c8c155351d21b207b89ecf51
Pull Request resolved: #112861
kadeng added a commit that referenced this pull request Nov 3, 2023
ghstack-source-id: 588d706b1c056aa2a7143923eaae1cd654b594fe
Pull Request resolved: #112861
kadeng added a commit that referenced this pull request Nov 3, 2023
ghstack-source-id: b49a4c870571fa592fd0ae66b373d632a3c81e00
Pull Request resolved: #112861
@kadeng
Contributor Author

kadeng commented Nov 3, 2023

@kadeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@kadeng kadeng changed the title from "Upgrade Cutlass to 3.3 (experimental)" to "[Cutlass 3.3 submodule upgrade]" Nov 6, 2023
@kadeng kadeng marked this pull request as ready for review November 6, 2023 17:04
@kadeng
Contributor Author

kadeng commented Dec 6, 2023

There is a bunch of CI errors, please take a look.

I was waiting for the official v3.3 tag. It has now been released, and it fixes an important bug that in the meantime caused build failures. Waiting for CI now.

kadeng added a commit that referenced this pull request Dec 7, 2023
ghstack-source-id: 1363752e2699509ab1c5dde200bb2111ec0694d9
Pull Request resolved: #112861
kadeng added a commit that referenced this pull request Dec 7, 2023
ghstack-source-id: 4956e5d00692fcf9ec3048085c798ca334808679
Pull Request resolved: #112861
kadeng added a commit that referenced this pull request Dec 7, 2023
ghstack-source-id: cca382a4785bce0b6b64d443f2cbcc6a522c5116
Pull Request resolved: #112861
kadeng added a commit that referenced this pull request Dec 9, 2023
ghstack-source-id: 980c3b025d6d04ced8415cae15131f443c55f360
Pull Request resolved: #112861
kadeng added a commit that referenced this pull request Dec 10, 2023
ghstack-source-id: b289b21b3e5e937975644cfa92888b55285087c2
Pull Request resolved: #112861
kadeng added a commit that referenced this pull request Dec 10, 2023
ghstack-source-id: 878506f289216d2e15b54876fb5b5a3cf6b780a8
Pull Request resolved: #112861
kadeng added a commit that referenced this pull request Dec 10, 2023
ghstack-source-id: 3880723ea0a3ec4fab13373ef50c1149da0c2888
Pull Request resolved: #112861
@kadeng
Contributor Author

kadeng commented Dec 15, 2023

Moved to a (draft) feature branch, see #115919

@kadeng kadeng closed this Dec 15, 2023
@facebook-github-bot facebook-github-bot deleted the gh/kadeng/8/head branch January 14, 2024 15:23
Labels
ciflow/inductor, ciflow/trunk, module: inductor, topic: not user facing

3 participants