
[PTX] Use the correct ptx version (8.5) for CUDA 12.6 #16504

Closed · wants to merge 1 commit

Conversation

sergey-kozub
Contributor

For CUDA 12.6, the PTX version is 8.5 (the simple versioning heuristic used previously no longer works):
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes

For CUDA 12.6 and above, use the highest known PTX version (8.5).
Without this change, the following error is observed:
'+ptx86' is not a recognized feature for this target (ignoring feature)
See: #16431

This PR also adds a basic test.
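The rule described above (CUDA 12.x maps linearly to PTX 8.x until 12.6, then clamps to the highest known PTX version) can be sketched as follows. This is a hypothetical illustration, not the actual XLA function; the name `PtxVersionFromCudaVersion` and the `major*10 + minor` encoding are assumptions for the sketch.

```cpp
#include <array>

// Hypothetical sketch (not the actual XLA code): versions are encoded
// as major * 10 + minor, so CUDA 12.4 is 124 and PTX 8.4 is 84.
constexpr int kMaxPtxVersion = 85;  // PTX 8.5, the highest known version

int PtxVersionFromCudaVersion(std::array<int, 2> cuda_version) {
  int cuda_id = cuda_version[0] * 10 + cuda_version[1];
  // The linear rule (CUDA X.Y -> PTX (X-4).Y) breaks at CUDA 12.6,
  // which still ships PTX 8.5, so clamp to the highest known version.
  if (cuda_id >= 126) return kMaxPtxVersion;
  return cuda_id - 40;  // e.g. CUDA 12.4 (124) -> PTX 8.4 (84)
}
```

With this shape, CUDA 12.4 yields 84 (PTX 8.4) via the linear rule, while 12.6 and anything newer falls through to the clamp.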

@sergey-kozub sergey-kozub requested a review from beckerhe August 27, 2024 11:51
@sergey-kozub sergey-kozub force-pushed the skozub/ptx_version_cuda_126 branch 2 times, most recently from 5a5a788 to b5b3f3e Compare August 27, 2024 11:56
Comment on lines 316 to 320
-  // CUDA 12.4 -> PTX 8.4 etc.
-  return (cuda_version[0] - 4) * 10 + cuda_version[1];
+  // CUDA 12.4 -> PTX 8.4
+  // This versioning scheme is valid until CUDA 12.6
+  int cuda_id = cuda_version[0] * 10 + cuda_version[1];
+  return cuda_id < 126 ? cuda_id - 40 : kMaxPtxVersion;
Member

Can you do the comparison of CUDA versions in ToolVersion, like cuda_version >= ToolVersion{12, 6}?

Also, can you put that into a separate if-statement that returns early? The current version is a bit hard to read.

Contributor Author

ToolVersion is defined in xla/stream_executor/cuda/cuda_asm_compiler.h (i.e. must live inside #if GOOGLE_CUDA conditional), which makes it hard to use. So I introduced another type alias, nvptx::Version, which is a simple int pair.

Replaced the if statements as you suggest above.
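The early-return shape discussed here might look like the sketch below. All names are hypothetical stand-ins: the thread says `nvptx::Version` is a simple int pair, so `std::pair<int, int>` is assumed here for its built-in lexicographic comparison.

```cpp
#include <utility>

namespace nvptx {
// The thread describes nvptx::Version as a simple int pair {major, minor};
// std::pair supplies lexicographic comparison for free.
using Version = std::pair<int, int>;
}  // namespace nvptx

constexpr int kMaxPtxVersion = 85;  // PTX 8.5, encoded as major*10 + minor

int DetermineHighestSupportedPtxVersion(nvptx::Version cuda_version) {
  // Early return, as requested in the review: CUDA 12.6 and newer
  // clamp to the highest PTX version known at the time (8.5).
  if (cuda_version >= nvptx::Version{12, 6}) return kMaxPtxVersion;
  // Otherwise CUDA X.Y -> PTX (X-4).Y, e.g. CUDA 12.4 -> PTX 8.4.
  return (cuda_version.first - 4) * 10 + cuda_version.second;
}
```

The early return keeps the special case visually separate from the linear rule, which is what the reviewer was asking for.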

Comment on lines 714 to 717
int PtxVersionFromCudaVersionForTest(se::ToolVersion tool_version) {
return DetermineHighestSupportedPtxVersionFromCudaVersion(tool_version);
}
Member

Why not expose the original function?

Contributor Author

Exposed the original function in the nvptx namespace.

@@ -33,6 +38,44 @@ TEST(UtilsTest, TestGetSmName) {
ASSERT_EQ(nvptx::GetSmName(cc_next), "sm_90");
}

struct VersionPair {
int ptx_version;
int cuda_version;
Member

Can you use ToolVersion instead of an int for the CUDA version?

Contributor Author

ditto

Comment on lines 56 to 73
// CUDA 11
{70, 110},
{71, 111},
{72, 112},
{73, 113},
{74, 114},
{75, 115},
{76, 116},
{77, 117},
{78, 118},
// CUDA 12
{80, 120},
{81, 121},
{82, 122},
{83, 123},
{84, 124},
{85, 125},
{85, 126},
Member

Nit: If you reversed the order (CUDA version first, PTX version second), it would feel more natural as a mapping.

Contributor Author

Done.
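After the reversal, the table-driven check might look like the sketch below. This is a hypothetical reconstruction of the test data (the real test calls the function exposed in the nvptx namespace); `ExpectedPtxVersion` stands in for the function under test.

```cpp
// Hypothetical sketch of the reversed, table-driven test data:
// {CUDA version, expected PTX version}, both encoded as major*10 + minor.
struct VersionPair {
  int cuda_version;
  int ptx_version;
};

constexpr VersionPair kVersionPairs[] = {
    // CUDA 11
    {110, 70}, {111, 71}, {112, 72}, {113, 73}, {114, 74},
    {115, 75}, {116, 76}, {117, 77}, {118, 78},
    // CUDA 12
    {120, 80}, {121, 81}, {122, 82}, {123, 83}, {124, 84},
    {125, 85}, {126, 85},
};

// Stand-in for the function under test: linear rule until CUDA 12.6,
// then clamped at PTX 8.5.
int ExpectedPtxVersion(int cuda_id) {
  return cuda_id < 126 ? cuda_id - 40 : 85;
}
```

Reading each entry left to right as "CUDA version maps to PTX version" is what makes the reversed order feel like a mapping.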

@sergey-kozub sergey-kozub force-pushed the skozub/ptx_version_cuda_126 branch 2 times, most recently from 4899d37 to f8ec224 Compare August 27, 2024 20:17
@NaiyerRizz NaiyerRizz self-assigned this Aug 28, 2024
Member

@beckerhe beckerhe left a comment

Thank you!

copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Aug 28, 2024
Imported from GitHub PR openxla/xla#16504

For CUDA 12.6, the PTX version is 8.5 (the simple versioning heuristic used previously no longer works):
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes

For CUDA 12.6 and above, use the highest known PTX version (8.5).
Without this change, the following error is observed:
'+ptx86' is not a recognized feature for this target (ignoring feature)
See: openxla/xla#16431

This PR also adds a basic test.

Copybara import of the project:

--
f8ec224aff879ffa263ade91397d5dc3f03aca45 by Sergey Kozub <skozub@nvidia.com>:

[PTX] Use the correct ptx version (8.5) for CUDA 12.6

Merging this change closes #16504

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#16504 from openxla:skozub/ptx_version_cuda_126 f8ec224aff879ffa263ade91397d5dc3f03aca45
PiperOrigin-RevId: 668329046
@sergey-kozub
Contributor Author

@beckerhe Why does github say "Merging is blocked (Merging can be performed automatically with 1 approving review)" after your approval?

@beckerhe
Member

@beckerhe Why does github say "Merging is blocked (Merging can be performed automatically with 1 approving review)" after your approval?

This is just because, due to Copybara, this is not the PR that gets merged. The change is already being processed internally and should land in a few minutes.

copybara-service bot pushed a commit that referenced this pull request Aug 28, 2024
FUTURE_COPYBARA_INTEGRATE_REVIEW=#16504 from openxla:skozub/ptx_version_cuda_126 f8ec224
PiperOrigin-RevId: 668290110
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Aug 28, 2024
Imported from GitHub PR openxla/xla#16504

Merging this change closes #16504

PiperOrigin-RevId: 668368483
@sergey-kozub sergey-kozub deleted the skozub/ptx_version_cuda_126 branch October 7, 2024 19:16