Skip to content

Conversation

@andportnoy
Copy link
Contributor

@andportnoy andportnoy commented May 7, 2025

Separate commits to make it easier to review.

@andportnoy andportnoy added the CI Optional GPU Presubmit Label to flag PR to run additional GPU testing not in standard presubmits label May 7, 2025
@andportnoy andportnoy changed the title Mosaic gpu ptx isa from ptxas and llvm [Mosaic GPU] Use PTX ISA version = min(ptxas, LLVM) May 7, 2025
@andportnoy andportnoy requested a review from apaszke May 7, 2025 22:44
}
stdout += buf;
}
if (close(stdout_pipe[0]) == -1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use absl::Cleanup to close the pipes. This, or the close above might not get executed if something else fails along the way

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return absl::InternalError(
absl::StrCat("Failed to read from pipe: ", strerror(errno)));
}
stdout += buf;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks sketchy to me? The addition will probably look for the null byte to decide where to stop the concatenation but there might be no null byte in there (read does not insert one). Use stdout.append(buf, status) (and perhaps rename status to bytes_read)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I'm more used to writing C for systems code like that and forming the string explicitly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

return absl::InternalError("Failed to wait for CUDA tool invocation");
}
if (status != 0) return absl::InternalError("CUDA tool failed");
if (status != 0) return absl::InternalError(stdout);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please prepend a failure message to stdout in case the tool didn't write something helpful

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


const std::string sm = sm_arch_specific ? sm_arch_specific : sm_base;

absl::StatusOr<std::string> GetPtxIsaVersion(int major, int minor) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is dead now, isn't it? Perhaps delete it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry this PR is a bit raw, I'll refactor it a bit better + add more inline docs. This function does play a role here, it takes an input PTX ISA version specified as major.minor and returns min(major.minor, <latest PTX ISA supported by LLVM>). This is to cover cases when ptxas is more recent and LLVM hasn't caught up to it yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored the function for clarity.

absl::StatusOr<int> GetLatestPtxasPtxIsaVersion() {
std::vector<const char*> ptxas_args = {"ptxas", "--input-as-string",
".version 99.99", nullptr};
auto result = RunCUDATool("ptxas", ptxas_args);
Copy link
Contributor

@jreiffers jreiffers May 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Avoid using auto when the type isn't obvious. Something like

auto status = RunCUDATool("ptxas", ptxas_args).status();

Might be fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, fixed.

absl::StrFormat("Failed to parse PTX ISA minor version, expected a "
"parsable integer, instead got: %s",
major_minor[1]));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to fail loudly here if minor is ever >= 10.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


absl::StatusOr<std::string> GetPtxIsaVersion() {
int ptxas_latest_version;
TF_ASSIGN_OR_RETURN(ptxas_latest_version, GetLatestPtxasPtxIsaVersion());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: why not just put the declaration in here?

  TF_ASSIGN_OR_RETURN(int ptxas_latest_version, GetLatestPtxasPtxIsaVersion());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thank you.

for (const llvm::SubtargetFeatureKV& feature :
subtarget_info->getEnabledProcessorFeatures()) {
subtarget_info->getAllProcessorFeatures()) {
if (absl::StartsWith(feature.Key, "ptx")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing really wrong here, but it's not super idiomatic:

absl::string_view key = feature.Key;
if (absl::ConsumePrefix(&key, "ptx")) {
  // No need for version_string
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, didn't know about ConsumePrefix.

continue;
}
// Dump SASS.
std::cout << *result << std::endl;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you missed an explicit print for ptxas output above (if dump_ptxas is on)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed that (see last force push) and squashed the commits.

@google-ml-butler google-ml-butler bot added kokoro:force-run pull ready Ready for copybara import and testing labels May 26, 2025
@andportnoy andportnoy force-pushed the mosaic-gpu-ptx-isa-from-ptxas-and-llvm branch from b8d2305 to c1e8f25 Compare May 26, 2025 15:18
@copybara-service copybara-service bot merged commit 2cbec58 into jax-ml:main May 27, 2025
20 of 23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI Optional GPU Presubmit Label to flag PR to run additional GPU testing not in standard presubmits kokoro:force-run pull ready Ready for copybara import and testing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants