-
Notifications
You must be signed in to change notification settings - Fork 499
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PTX] Use the correct ptx version (8.5) for CUDA 12.6 #16504
Conversation
5a5a788
to
b5b3f3e
Compare
// CUDA 12.4 -> PTX 8.4 etc. | ||
return (cuda_version[0] - 4) * 10 + cuda_version[1]; | ||
// CUDA 12.4 -> PTX 8.4 | ||
// This versioning scheme is valid until CUDA 12.6 | ||
int cuda_id = cuda_version[0] * 10 + cuda_version[1]; | ||
return cuda_id < 126 ? cuda_id - 40 : kMaxPtxVersion; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you do the comparison of Cuda versions in ToolVersion
- like cuda_version >= ToolVersion{12, 6}
?
Also can you put that into a separate if
-statement that returns early? The current version is a bit hard to read?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ToolVersion
is defined in xla/stream_executor/cuda/cuda_asm_compiler.h
(i.e. must live inside #if GOOGLE_CUDA
conditional), which makes it hard to use. So I introduced another type alias, nvptx::Version
, which is a simple int pair.
Replaced the if
statements as you suggest above.
int PtxVersionFromCudaVersionForTest(se::ToolVersion tool_version) { | ||
return DetermineHighestSupportedPtxVersionFromCudaVersion(tool_version); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not expose the original function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exposed the original function in the nvptx
namespace.
@@ -33,6 +38,44 @@ TEST(UtilsTest, TestGetSmName) { | |||
ASSERT_EQ(nvptx::GetSmName(cc_next), "sm_90"); | |||
} | |||
|
|||
struct VersionPair { | |||
int ptx_version; | |||
int cuda_version; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use ToolVersion
instead of an int
for the CUDA version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
// CUDA 11 | ||
{70, 110}, | ||
{71, 111}, | ||
{72, 112}, | ||
{73, 113}, | ||
{74, 114}, | ||
{75, 115}, | ||
{76, 116}, | ||
{77, 117}, | ||
{78, 118}, | ||
// CUDA 12 | ||
{80, 120}, | ||
{81, 121}, | ||
{82, 122}, | ||
{83, 123}, | ||
{84, 124}, | ||
{85, 125}, | ||
{85, 126}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: If you were reversing the order (CUDA version first, PTX version second), then it would feel more naturally like a mapping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
4899d37
to
f8ec224
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
Imported from GitHub PR openxla/xla#16504 For CUDA 12.6, PTX version is 8.5 (simple versioning rule heuristic used previously no longer works): https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes For CUDA 12.6 and above, use the highest known PTX version (8.5). Without this change, the following error is observed: '+ptx86' is not a recognized feature for this target (ignoring feature) See: openxla/xla#16431 This PR also adds a basic test. Copybara import of the project: -- f8ec224aff879ffa263ade91397d5dc3f03aca45 by Sergey Kozub <skozub@nvidia.com>: [PTX] Use the correct ptx version (8.5) for CUDA 12.6 Merging this change closes #16504 FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#16504 from openxla:skozub/ptx_version_cuda_126 f8ec224aff879ffa263ade91397d5dc3f03aca45 PiperOrigin-RevId: 668329046
Imported from GitHub PR openxla/xla#16504 For CUDA 12.6, PTX version is 8.5 (simple versioning rule heuristic used previously no longer works): https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes For CUDA 12.6 and above, use the highest known PTX version (8.5). Without this change, the following error is observed: '+ptx86' is not a recognized feature for this target (ignoring feature) See: openxla/xla#16431 This PR also adds a basic test. Copybara import of the project: -- f8ec224aff879ffa263ade91397d5dc3f03aca45 by Sergey Kozub <skozub@nvidia.com>: [PTX] Use the correct ptx version (8.5) for CUDA 12.6 Merging this change closes #16504 FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#16504 from openxla:skozub/ptx_version_cuda_126 f8ec224aff879ffa263ade91397d5dc3f03aca45 PiperOrigin-RevId: 668329046
@beckerhe Why does github say "Merging is blocked (Merging can be performed automatically with 1 approving review)" after your approval? |
This is just because due to Copybara this is not the PR thet gets merged. The change is already being processed internally and should land in a few minutes. |
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#16504 from openxla:skozub/ptx_version_cuda_126 f8ec224aff879ffa263ade91397d5dc3f03aca45 PiperOrigin-RevId: 668290110
Imported from GitHub PR openxla/xla#16504 For CUDA 12.6, PTX version is 8.5 (simple versioning rule heuristic used previously no longer works): https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes For CUDA 12.6 and above, use the highest known PTX version (8.5). Without this change, the following error is observed: '+ptx86' is not a recognized feature for this target (ignoring feature) See: openxla/xla#16431 This PR also adds a basic test. Copybara import of the project: -- f8ec224aff879ffa263ade91397d5dc3f03aca45 by Sergey Kozub <skozub@nvidia.com>: [PTX] Use the correct ptx version (8.5) for CUDA 12.6 Merging this change closes #16504 PiperOrigin-RevId: 668368483
For CUDA 12.6, PTX version is 8.5 (simple versioning rule heuristic used previously no longer works):
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes
For CUDA 12.6 and above, use the highest known PTX version (8.5).
Without this change, the following error is observed:
'+ptx86' is not a recognized feature for this target (ignoring feature)
See: #16431
This PR also adds a basic test.