Add Hopper support #5538

Merged (4 commits) on Oct 10, 2022

crtrott (Member) commented Oct 9, 2022

Addresses issue #5524

dalg24 (Member) left a comment

switch (compute_capability) {
case 30:
case 32:
case 35: return 16;
case 37: return 80;
case 50:
case 53:
case 60:
case 62: return 64;
case 52:
case 61: return 96;
case 70:
case 80:
case 86: return 8;
case 75: return 32;

crtrott (Member, Author) commented Oct 10, 2022

Added the shared memory configuration and the printconfig change. Confirmed in the tuning guide that Hopper can also use an 8 kB shared memory configuration.
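A minimal sketch of what adding Hopper to a lookup like the one quoted in the review could look like. It assumes that the returned values are shared memory sizes in kB and that Hopper corresponds to compute capability 90 (major * 10 + minor). The function name `shmem_kb_for_capability` and the exception in the default branch are hypothetical and do not come from the pull request.

```cpp
#include <stdexcept>

// Hypothetical helper, not the project's actual function: maps a CUDA compute
// capability (major * 10 + minor) to a shared memory size in kB, using the
// values quoted in the review snippet.
inline int shmem_kb_for_capability(int compute_capability) {
  switch (compute_capability) {
    case 30:
    case 32:
    case 35: return 16;
    case 37: return 80;
    case 50:
    case 53:
    case 60:
    case 62: return 64;
    case 52:
    case 61: return 96;
    case 70:
    case 80:
    case 86:
    case 90: return 8;  // assumption: Hopper (9.0) joins the 8 kB group
    case 75: return 32;
    default:
      // Assumption: unknown architectures are reported as an error.
      throw std::invalid_argument("unknown compute capability");
  }
}
```

Grouping the new case with the existing 8 kB entries keeps the table compact, and an architecture missing from the table fails loudly instead of silently receiving a default value.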

masterleinad (Contributor) left a comment

What about cmake/compile_tests/cuda_compute_capability.cc?

crtrott (Member, Author) commented Oct 10, 2022

Fixed. I have also now checked every use of ARCH_AMPERE and ARCH_VOLTA in our code base and made the appropriate adjustments.

crtrott (Member, Author) commented Oct 10, 2022

Fixed the logic mistake in the half-precision code.

Labels: Backend - CUDA, CHANGELOG (item to be included in release CHANGELOG), Patch Release