
[PTX] Use the correct ptx version (8.5) for CUDA 12.6 #16504

Closed · wants to merge 1 commit

Conversation

sergey-kozub
Contributor

For CUDA 12.6, the PTX version is 8.5 (the simple versioning heuristic used previously no longer works):
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes

For CUDA 12.6 and above, use the highest known PTX version (8.5).
Without this change, the following error is observed:
'+ptx86' is not a recognized feature for this target (ignoring feature)
See: #16431

This PR also adds a basic test.
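The rule described above (CUDA 12.x maps linearly to PTX 8.x until 12.6, then clamps to the highest known PTX version) can be sketched as follows. This is a hypothetical illustration, not the actual XLA function; the name `PtxVersionFromCudaVersion` and the `major*10 + minor` encoding are assumptions for the sketch.

```cpp
#include <array>

// Hypothetical sketch (not the actual XLA code): versions are encoded
// as major * 10 + minor, so CUDA 12.4 is 124 and PTX 8.4 is 84.
constexpr int kMaxPtxVersion = 85;  // PTX 8.5, the highest known version

int PtxVersionFromCudaVersion(std::array<int, 2> cuda_version) {
  int cuda_id = cuda_version[0] * 10 + cuda_version[1];
  // The linear rule (CUDA X.Y -> PTX (X-4).Y) breaks at CUDA 12.6,
  // which still ships PTX 8.5, so clamp to the highest known version.
  if (cuda_id >= 126) return kMaxPtxVersion;
  return cuda_id - 40;  // e.g. CUDA 12.4 (124) -> PTX 8.4 (84)
}
```

With this shape, CUDA 12.4 yields 84 (PTX 8.4) via the linear rule, while 12.6 and anything newer falls through to the clamp.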

@sergey-kozub sergey-kozub requested a review from beckerhe August 27, 2024 11:51
@sergey-kozub sergey-kozub force-pushed the skozub/ptx_version_cuda_126 branch 2 times, most recently from 5a5a788 to b5b3f3e Compare August 27, 2024 11:56
Comment on lines 316 to 320
-  // CUDA 12.4 -> PTX 8.4 etc.
-  return (cuda_version[0] - 4) * 10 + cuda_version[1];
+  // CUDA 12.4 -> PTX 8.4
+  // This versioning scheme is valid until CUDA 12.6
+  int cuda_id = cuda_version[0] * 10 + cuda_version[1];
+  return cuda_id < 126 ? cuda_id - 40 : kMaxPtxVersion;
Member

Can you do the comparison of CUDA versions in ToolVersion, like cuda_version >= ToolVersion{12, 6}?

Also, can you put that into a separate if-statement that returns early? The current version is a bit hard to read.

Contributor Author

ToolVersion is defined in xla/stream_executor/cuda/cuda_asm_compiler.h (i.e. must live inside #if GOOGLE_CUDA conditional), which makes it hard to use. So I introduced another type alias, nvptx::Version, which is a simple int pair.

Replaced the if statements as you suggest above.
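The early-return shape discussed here might look like the sketch below. All names are hypothetical stand-ins: the thread says `nvptx::Version` is a simple int pair, so `std::pair<int, int>` is assumed here for its built-in lexicographic comparison.

```cpp
#include <utility>

namespace nvptx {
// The thread describes nvptx::Version as a simple int pair {major, minor};
// std::pair supplies lexicographic comparison for free.
using Version = std::pair<int, int>;
}  // namespace nvptx

constexpr int kMaxPtxVersion = 85;  // PTX 8.5, encoded as major*10 + minor

int DetermineHighestSupportedPtxVersion(nvptx::Version cuda_version) {
  // Early return, as requested in the review: CUDA 12.6 and newer
  // clamp to the highest PTX version known at the time (8.5).
  if (cuda_version >= nvptx::Version{12, 6}) return kMaxPtxVersion;
  // Otherwise CUDA X.Y -> PTX (X-4).Y, e.g. CUDA 12.4 -> PTX 8.4.
  return (cuda_version.first - 4) * 10 + cuda_version.second;
}
```

The early return keeps the special case visually separate from the linear rule, which is what the reviewer was asking for.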

Comment on lines 714 to 717
int PtxVersionFromCudaVersionForTest(se::ToolVersion tool_version) {
return DetermineHighestSupportedPtxVersionFromCudaVersion(tool_version);
}
Member

Why not expose the original function?

Contributor Author

Exposed the original function in the nvptx namespace.

@@ -33,6 +38,44 @@ TEST(UtilsTest, TestGetSmName) {
ASSERT_EQ(nvptx::GetSmName(cc_next), "sm_90");
}

struct VersionPair {
int ptx_version;
int cuda_version;
Member

Can you use ToolVersion instead of an int for the CUDA version?

Contributor Author

ditto

Comment on lines 56 to 73
// CUDA 11
{70, 110},
{71, 111},
{72, 112},
{73, 113},
{74, 114},
{75, 115},
{76, 116},
{77, 117},
{78, 118},
// CUDA 12
{80, 120},
{81, 121},
{82, 122},
{83, 123},
{84, 124},
{85, 125},
{85, 126},
Member

Nit: If you reversed the order (CUDA version first, PTX version second), it would feel more natural as a mapping.

Contributor Author

Done.
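After the reversal, the table-driven check might look like the sketch below. This is a hypothetical reconstruction of the test data (the real test calls the function exposed in the nvptx namespace); `ExpectedPtxVersion` stands in for the function under test.

```cpp
// Hypothetical sketch of the reversed, table-driven test data:
// {CUDA version, expected PTX version}, both encoded as major*10 + minor.
struct VersionPair {
  int cuda_version;
  int ptx_version;
};

constexpr VersionPair kVersionPairs[] = {
    // CUDA 11
    {110, 70}, {111, 71}, {112, 72}, {113, 73}, {114, 74},
    {115, 75}, {116, 76}, {117, 77}, {118, 78},
    // CUDA 12
    {120, 80}, {121, 81}, {122, 82}, {123, 83}, {124, 84},
    {125, 85}, {126, 85},
};

// Stand-in for the function under test: linear rule until CUDA 12.6,
// then clamped at PTX 8.5.
int ExpectedPtxVersion(int cuda_id) {
  return cuda_id < 126 ? cuda_id - 40 : 85;
}
```

Reading each entry left to right as "CUDA version maps to PTX version" is what makes the reversed order feel like a mapping.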

@sergey-kozub sergey-kozub force-pushed the skozub/ptx_version_cuda_126 branch 2 times, most recently from 4899d37 to f8ec224 Compare August 27, 2024 20:17
@NaiyerRizz NaiyerRizz self-assigned this Aug 28, 2024
Member

@beckerhe beckerhe left a comment

Thank you!

copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Aug 28, 2024
Imported from GitHub PR openxla/xla#16504

For CUDA 12.6, the PTX version is 8.5 (the simple versioning heuristic used previously no longer works):
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes

For CUDA 12.6 and above, use the highest known PTX version (8.5).
Without this change, the following error is observed:
'+ptx86' is not a recognized feature for this target (ignoring feature)
See: openxla/xla#16431

This PR also adds a basic test.

Copybara import of the project:

--
f8ec224aff879ffa263ade91397d5dc3f03aca45 by Sergey Kozub <skozub@nvidia.com>:

[PTX] Use the correct ptx version (8.5) for CUDA 12.6

Merging this change closes #16504

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#16504 from openxla:skozub/ptx_version_cuda_126 f8ec224aff879ffa263ade91397d5dc3f03aca45
PiperOrigin-RevId: 668329046
@sergey-kozub
Contributor Author

@beckerhe Why does github say "Merging is blocked (Merging can be performed automatically with 1 approving review)" after your approval?

@beckerhe
Member

@beckerhe Why does github say "Merging is blocked (Merging can be performed automatically with 1 approving review)" after your approval?

This is just because, due to Copybara, this is not the PR that gets merged. The change is already being processed internally and should land in a few minutes.

copybara-service bot pushed a commit that referenced this pull request Aug 28, 2024
FUTURE_COPYBARA_INTEGRATE_REVIEW=#16504 from openxla:skozub/ptx_version_cuda_126 f8ec224
PiperOrigin-RevId: 668290110
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Aug 28, 2024
Imported from GitHub PR openxla/xla#16504

Merging this change closes #16504

PiperOrigin-RevId: 668368483
@sergey-kozub sergey-kozub deleted the skozub/ptx_version_cuda_126 branch October 7, 2024 19:16