Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.
Sign upNVPTX target specification #57937
Conversation
denzp
added some commits
Jan 19, 2019
rust-highfive
assigned
cramertj
Jan 27, 2019
This comment has been minimized.
This comment has been minimized.
|
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @cramertj (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
rust-highfive
added
the
S-waiting-on-review
label
Jan 27, 2019
This comment has been minimized.
This comment has been minimized.
|
r? @nagisa cc @rkruppe cc @alexcrichton |
rust-highfive
assigned
nagisa
and unassigned
cramertj
Jan 27, 2019
This comment has been minimized.
This comment has been minimized.
|
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
peterhj
reviewed
Jan 27, 2019
| cpu: "sm_20".to_string(), | ||
|
|
||
| // TODO(denzp): create tests for the atomics. | ||
| max_atomic_width: Some(64), |
This comment has been minimized.
This comment has been minimized.
peterhj
Jan 27, 2019
Contributor
BTW I did a few basic tests, and it turns out the the atomic_xadd rust-intrinsic seems to already generate correct PTX when doing atomic adds on u32 and u64. Hurray! I didn't test too many other atomic ops or sizes though.
(Sadly, atomic_xadd only supports integers and doesn't compile for f32 and f64 types.)
nagisa
reviewed
Jan 27, 2019
|
For now comments are as is. I find tests somewhat sketchy, but I understand that the hand may be forced due to inadequate flexibility of other test suites. |
| } | ||
|
|
||
| fn optimize(&mut self) { | ||
| self.cmd.arg(match self.sess.opts.optimize { |
This comment has been minimized.
This comment has been minimized.
nagisa
Jan 27, 2019
•
Contributor
This feels suspect. I’m not sure the optimisation level for the leaf crate should influence the optimisation level for the whole crate graph. It feels like this should at least in some way depend on the LTO flag?
This comment has been minimized.
This comment has been minimized.
denzp
Jan 28, 2019
Contributor
I followed args pattern from other linkers here.
At the moment the linker runs both LTO and optimisation passes over the final (complete) module when -O{1,2,3} is specified.
Does Rust runs optimisations before emitting bitcode object files? If yes, it can be avoided on linker's side then.
This comment has been minimized.
This comment has been minimized.
nagisa
Jan 28, 2019
•
Contributor
Does Rust runs optimisations before emitting bitcode object files?
Yes it does.
I followed args pattern from other linkers here.
For gcc and/or clang optimization flags during linkage mean very little if anything at all AFAIK. It definitely does not invoke some sort of global program re-optimisation, which is what happens with ptx-linker, it seems.
Such global-program optimisation is what lto would usually do, which is why I suggested that perhaps that flag is what should be accounted for :)
That could be left as a future endeavour as well, but invoking “LTO” just from -Copt-level feels wrong.
This comment has been minimized.
This comment has been minimized.
denzp
Jan 28, 2019
Contributor
Thanks for the explanation and suggestion, I overlooked Session::lto(&self) before!
I've also changed ptx-linker to not perform final global optimisation - indeed it didn't affect much.
| @@ -149,6 +149,7 @@ pub fn linker_and_flavor(sess: &Session) -> (PathBuf, LinkerFlavor) { | |||
| LinkerFlavor::Ld => "ld", | |||
| LinkerFlavor::Msvc => "link.exe", | |||
| LinkerFlavor::Lld(_) => "lld", | |||
| LinkerFlavor::PtxLinker => "rust-ptx-linker", | |||
This comment has been minimized.
This comment has been minimized.
nagisa
Jan 27, 2019
Contributor
Hmm… so we are depending on external project by default here. Something that users are very unlikely to have installed by default.
I wonder what the error looks like when rust-ptx-linker is not in $PATH. Perhaps it would make sense to ship rust-ptx-linker as part of rustlib like we do with e.g. lld component for some targets...
This comment has been minimized.
This comment has been minimized.
denzp
Jan 28, 2019
Contributor
That's true. On the other hand, to get started with CUDA development in Rust, users will follow either docs or tutorials, where it should be mentioned how to setup the environment.
The error message is, currently:
error: linker `rust-ptx-linker` not found
|
= note: No such file or directory (os error 2)
error: aborting due to previous error
Personally, I would love to have the linker to be shipped via rustup. But perhaps, it will be preferable to implement this as a separate PR?
This comment has been minimized.
This comment has been minimized.
nagisa
Jan 28, 2019
Contributor
But perhaps, it will be preferable to implement this as a separate PR?
Sure.
peterhj
reviewed
Jan 27, 2019
denzp
added some commits
Jan 27, 2019
This comment has been minimized.
This comment has been minimized.
|
@nagisa @peterhj thanks for reviewing! I can agree about tests - normally these 3 How can I use That's why I had an idea about introducing |
This comment has been minimized.
This comment has been minimized.
Currently, AFAIK there is no way. Minor changes to compiletest are necessary to make it possible (i.e. you can tell codegen test to generate assembly, but no way to tell compiletest to look at anything but |
This comment has been minimized.
This comment has been minimized.
|
Thanks for this! The only comment I'd add is that we unfortunately don't have resources for another image to run on Travis right now, but could this be folded into an existing builder that's already performing well under 2h on average? |
This comment has been minimized.
This comment has been minimized.
|
@alexcrichton I can see that Can I add NVPTX tests there? The only thing, it feels like the image should be renamed then. Do you have suggestions about the naming? |
This comment has been minimized.
This comment has been minimized.
|
@denzp sure yeah so long as it still runs in under 2 hrs should be fine to add! It should be fine to rename it to something like |
gnzlbg
reviewed
Jan 29, 2019
|
|
||
| ifeq ($(TARGET),nvptx64-nvidia-cuda) | ||
| all: | ||
| $(RUSTC) main.rs -Clink-arg=--arch=sm_60 --crate-type="bin" -O --target $(TARGET) |
This comment has been minimized.
This comment has been minimized.
gnzlbg
Jan 29, 2019
Contributor
Does RUSTFLAGS=-C target-cpu=sm_60 achieve the same effect here as passing the link flags ?
This comment has been minimized.
This comment has been minimized.
denzp
Jan 29, 2019
Contributor
Thanks, that's a nice catch.
I've addressed this providing target_cpu via --fallback-arch arg to the linker in the latest commit.
denzp
added some commits
Jan 29, 2019
This comment has been minimized.
This comment has been minimized.
|
@alexcrichton thanks, I've merged the images as suggested! |
This comment has been minimized.
This comment has been minimized.
|
|
This comment has been minimized.
This comment has been minimized.
|
The compiler bits look good to me. I’ll assume that Alex’s @bors r+ |
This comment has been minimized.
This comment has been minimized.
|
|
bors
added
S-waiting-on-bors
and removed
S-waiting-on-review
labels
Feb 1, 2019
This comment has been minimized.
This comment has been minimized.
added a commit
that referenced
this pull request
Feb 1, 2019
This comment has been minimized.
This comment has been minimized.
|
|
denzp commentedJan 27, 2019
•
edited
This change adds a built-in
nvptx64-nvidia-cudaGPGPU no-std target specification and a basic PTX assembly smoke tests.The approach is taken here and the target spec is based on
ptx-linker, a project started about 1.5 years ago. Key feature: bitcode object files being linked with LTO into the final module on the linker's side.Prior to this change, the linker used a
ldlinker-flavor, but I think, having the special CLI convention is a more reliable way.Questions about further progress on reliable CUDA workflow with Rust:
codegen-asmto verify end-to-end integration with LLVM backend?compile-failtests: add#![no_std]where possible and mark others asignore-nvptxdirective, or alternatively, introducecompile-fail-no-stdtest suite?ptx-linkereventually be integrated asrlsorclippy? Hopefully, this should allow to statically link against LLVM used in Rust and get rid of the current hacky solution.rustc_codegen_ssa::back::linker::Linkerthat can be useful for bitcode-only linking?Currently, there are no major public CUDA projects written in Rust I'm aware of, but I'm expecting to have a built-in target will create a solid foundation for further experiments and awesome crates.
Related to #38789
Fixes #38787
Fixes #38786