New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NVPTX target specification #57937

Merged
merged 8 commits into from Feb 2, 2019

Conversation

Projects
None yet
9 participants
@denzp
Copy link
Contributor

denzp commented Jan 27, 2019

This change adds a built-in nvptx64-nvidia-cuda GPGPU no-std target specification and a basic PTX assembly smoke tests.

The approach is taken here and the target spec is based on ptx-linker, a project started about 1.5 years ago. Key feature: bitcode object files being linked with LTO into the final module on the linker's side.

Prior to this change, the linker used a ld linker-flavor, but I think, having the special CLI convention is a more reliable way.

Questions about further progress on reliable CUDA workflow with Rust:

  1. Is it possible to create a test suite codegen-asm to verify end-to-end integration with LLVM backend?
  2. How would it be better to organise no-std compile-fail tests: add #![no_std] where possible and mark others as ignore-nvptx directive, or alternatively, introduce compile-fail-no-std test suite?
  3. Can we have the ptx-linker eventually be integrated as rls or clippy? Hopefully, this should allow to statically link against LLVM used in Rust and get rid of the current hacky solution.
  4. Am I missing some methods from rustc_codegen_ssa::back::linker::Linker that can be useful for bitcode-only linking?

Currently, there are no major public CUDA projects written in Rust I'm aware of, but I'm expecting to have a built-in target will create a solid foundation for further experiments and awesome crates.

Related to #38789
Fixes #38787
Fixes #38786

@rust-highfive

This comment has been minimized.

Copy link
Collaborator

rust-highfive commented Jan 27, 2019

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @cramertj (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@Centril

This comment has been minimized.

Copy link
Contributor

Centril commented Jan 27, 2019

@rust-highfive rust-highfive assigned nagisa and unassigned cramertj Jan 27, 2019

@rust-highfive

This comment has been minimized.

Copy link
Collaborator

rust-highfive commented Jan 27, 2019

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
travis_time:end:2f5eff36:start=1548620577801138403,finish=1548620580132025617,duration=2330887214
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
Setting environment variables from .travis.yml
$ export IMAGE=x86_64-gnu-llvm-6.0
---

[00:03:30] travis_fold:start:tidy
travis_time:start:tidy
tidy check
[00:03:30] tidy error: /checkout/src/librustc_target/spec/nvptx64_nvidia_cuda.rs:27: TODO is deprecated; use FIXME
[00:03:30] tidy error: /checkout/src/librustc_target/spec/nvptx64_nvidia_cuda.rs:54: TODO is deprecated; use FIXME
[00:03:32] some tidy checks failed
[00:03:32] 
[00:03:32] 
[00:03:32] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:03:32] 
[00:03:32] 
[00:03:32] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:03:32] Build completed unsuccessfully in 0:00:48
[00:03:32] Build completed unsuccessfully in 0:00:48
[00:03:32] Makefile:68: recipe for target 'tidy' failed
[00:03:32] make: *** [tidy] Error 1
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:0684e740
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Sun Jan 27 20:26:44 UTC 2019
---
travis_time:end:1612cfd0:start=1548620805407301115,finish=1548620805412457762,duration=5156647
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:02355b0b
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:0524b050
travis_time:start:0524b050
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:21722ef8
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

cpu: "sm_20".to_string(),

// TODO(denzp): create tests for the atomics.
max_atomic_width: Some(64),

This comment has been minimized.

@peterhj

peterhj Jan 27, 2019

Contributor

BTW I did a few basic tests, and it turns out the the atomic_xadd rust-intrinsic seems to already generate correct PTX when doing atomic adds on u32 and u64. Hurray! I didn't test too many other atomic ops or sizes though.

(Sadly, atomic_xadd only supports integers and doesn't compile for f32 and f64 types.)

@nagisa
Copy link
Contributor

nagisa left a comment

For now comments are as is. I find tests somewhat sketchy, but I understand that the hand may be forced due to inadequate flexibility of other test suites.

}

fn optimize(&mut self) {
self.cmd.arg(match self.sess.opts.optimize {

This comment has been minimized.

@nagisa

nagisa Jan 27, 2019

Contributor

This feels suspect. I’m not sure the optimisation level for the leaf crate should influence the optimisation level for the whole crate graph. It feels like this should at least in some way depend on the LTO flag?

This comment has been minimized.

@denzp

denzp Jan 28, 2019

Author Contributor

I followed args pattern from other linkers here.
At the moment the linker runs both LTO and optimisation passes over the final (complete) module when -O{1,2,3} is specified.

Does Rust runs optimisations before emitting bitcode object files? If yes, it can be avoided on linker's side then.

This comment has been minimized.

@nagisa

nagisa Jan 28, 2019

Contributor

Does Rust runs optimisations before emitting bitcode object files?

Yes it does.

I followed args pattern from other linkers here.

For gcc and/or clang optimization flags during linkage mean very little if anything at all AFAIK. It definitely does not invoke some sort of global program re-optimisation, which is what happens with ptx-linker, it seems.

Such global-program optimisation is what lto would usually do, which is why I suggested that perhaps that flag is what should be accounted for :)

That could be left as a future endeavour as well, but invoking “LTO” just from -Copt-level feels wrong.

This comment has been minimized.

@denzp

denzp Jan 28, 2019

Author Contributor

Thanks for the explanation and suggestion, I overlooked Session::lto(&self) before!

I've also changed ptx-linker to not perform final global optimisation - indeed it didn't affect much.

Show resolved Hide resolved src/librustc_codegen_utils/symbol_names.rs Outdated
Show resolved Hide resolved src/librustc_codegen_utils/symbol_names.rs Outdated
Show resolved Hide resolved src/test/run-make/nvptx-binary-crate/Makefile Outdated
@@ -149,6 +149,7 @@ pub fn linker_and_flavor(sess: &Session) -> (PathBuf, LinkerFlavor) {
LinkerFlavor::Ld => "ld",
LinkerFlavor::Msvc => "link.exe",
LinkerFlavor::Lld(_) => "lld",
LinkerFlavor::PtxLinker => "rust-ptx-linker",

This comment has been minimized.

@nagisa

nagisa Jan 27, 2019

Contributor

Hmm… so we are depending on external project by default here. Something that users are very unlikely to have installed by default.

I wonder what the error looks like when rust-ptx-linker is not in $PATH. Perhaps it would make sense to ship rust-ptx-linker as part of rustlib like we do with e.g. lld component for some targets...

This comment has been minimized.

@denzp

denzp Jan 28, 2019

Author Contributor

That's true. On the other hand, to get started with CUDA development in Rust, users will follow either docs or tutorials, where it should be mentioned how to setup the environment.

The error message is, currently:

error: linker `rust-ptx-linker` not found
  |
  = note: No such file or directory (os error 2)

error: aborting due to previous error

Personally, I would love to have the linker to be shipped via rustup. But perhaps, it will be preferable to implement this as a separate PR?

This comment has been minimized.

@nagisa

nagisa Jan 28, 2019

Contributor

But perhaps, it will be preferable to implement this as a separate PR?

Sure.

@denzp

This comment has been minimized.

Copy link
Contributor Author

denzp commented Jan 28, 2019

@nagisa @peterhj thanks for reviewing!

I can agree about tests - normally these 3 run-make cases should be divided into many smaller ones. That's the reason why I call them "smoke tests" - they verify only basic functionality. The reason why I decided to condense the test cases is to avoid much similar boilerplate Makefile code that needs to be maintained.

How can I use codegen tests to assert assembly output? Also, how can I deal with the fact that the target is no-std and existing tests there often rely on libstd?

That's why I had an idea about introducing codegen-asm with mandatory target specification for each test case.

@nagisa

This comment has been minimized.

Copy link
Contributor

nagisa commented Jan 28, 2019

How can I use codegen tests to assert assembly output?

Currently, AFAIK there is no way. Minor changes to compiletest are necessary to make it possible (i.e. you can tell codegen test to generate assembly, but no way to tell compiletest to look at anything but .ll output). For now it is fine to keep these tests as make-tests. We will figure out how to migrate them once compiletest is improved to support such use-case.

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Jan 28, 2019

Thanks for this!

The only comment I'd add is that we unfortunately don't have resources for another image to run on Travis right now, but could this be folded into an existing builder that's already performing well under 2h on average?

@denzp

This comment has been minimized.

Copy link
Contributor Author

denzp commented Jan 28, 2019

@alexcrichton I can see that wasm-unknown image is usually runs pretty fast compared to other non-dist images.

Can I add NVPTX tests there? The only thing, it feels like the image should be renamed then. Do you have suggestions about the naming?

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Jan 29, 2019

@denzp sure yeah so long as it still runs in under 2 hrs should be fine to add! It should be fine to rename it to something like test-various perhaps?


ifeq ($(TARGET),nvptx64-nvidia-cuda)
all:
$(RUSTC) main.rs -Clink-arg=--arch=sm_60 --crate-type="bin" -O --target $(TARGET)

This comment has been minimized.

@gnzlbg

gnzlbg Jan 29, 2019

Contributor

Does RUSTFLAGS=-C target-cpu=sm_60 achieve the same effect here as passing the link flags ?

This comment has been minimized.

@denzp

denzp Jan 29, 2019

Author Contributor

Thanks, that's a nice catch.
I've addressed this providing target_cpu via --fallback-arch arg to the linker in the latest commit.

@denzp

This comment has been minimized.

Copy link
Contributor Author

denzp commented Jan 29, 2019

@alexcrichton thanks, I've merged the images as suggested!

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Jan 29, 2019

👍

@nagisa

This comment has been minimized.

Copy link
Contributor

nagisa commented Feb 1, 2019

The compiler bits look good to me. I’ll assume that Alex’s 👍 is an approval for the infra changes.

@bors r+

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Feb 1, 2019

📌 Commit 49931fd has been approved by nagisa

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Feb 1, 2019

⌛️ Testing commit 49931fd with merge 2efa31b...

bors added a commit that referenced this pull request Feb 1, 2019

Auto merge of #57937 - denzp:nvptx, r=nagisa
NVPTX target specification

This change adds a built-in `nvptx64-nvidia-cuda` GPGPU no-std target specification and a basic PTX assembly smoke tests.

The approach is taken here and the target spec is based on `ptx-linker`, a project started about 1.5 years ago. Key feature: bitcode object files being linked with LTO into the final module on the linker's side.

Prior to this change, the linker used a `ld` linker-flavor, but I think, having the special CLI convention is a more reliable way.

Questions about further progress on reliable CUDA workflow with Rust:
1. Is it possible to create a test suite `codegen-asm` to verify end-to-end integration with LLVM backend?
1. How would it be better to organise no-std `compile-fail` tests: add `#![no_std]` where possible and mark others as `ignore-nvptx` directive, or alternatively, introduce `compile-fail-no-std` test suite?
1. Can we have the `ptx-linker` eventually be integrated as `rls` or `clippy`? Hopefully, this should allow to statically link against LLVM used in Rust and get rid of the [current hacky solution](https://github.com/denzp/rustc-llvm-proxy).
1. Am I missing some methods from `rustc_codegen_ssa:🔙:linker::Linker` that can be useful for bitcode-only linking?

Currently, there are no major public CUDA projects written in Rust I'm aware of, but I'm expecting to have a built-in target will create a solid foundation for further experiments and awesome crates.

Related to #38789
Fixes #38787
Fixes #38786
@bors

This comment has been minimized.

Copy link
Contributor

bors commented Feb 2, 2019

☀️ Test successful - checks-travis, status-appveyor
Approved by: nagisa
Pushing 2efa31b to master...

@bors bors merged commit 49931fd into rust-lang:master Feb 2, 2019

1 check passed

homu Test successful
Details

@denzp denzp deleted the denzp:nvptx branch Feb 2, 2019

@gnzlbg

This comment has been minimized.

Copy link
Contributor

gnzlbg commented Feb 4, 2019

Thank you @denzp !

kennytm added a commit to kennytm/rust that referenced this pull request Feb 5, 2019

Rollup merge of rust-lang#58108 - denzp:nvptx-manifest, r=alexcrichton
Add NVPTX target to a build manifest

Include `nvptx64-nvidia-cuda` target to a build manifest. I forgot this step at my first take on adding the target (rust-lang#57937).

Hopefully, this is the only reason why `rustup target add nvptx64-nvidia-cuda` doesn't work 🙁

r? @alexcrichton
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment