NVPTX target specification #57937

denzp · 2019-01-27T20:21:27Z

This change adds a built-in nvptx64-nvidia-cuda GPGPU no-std target specification and a basic PTX assembly smoke tests.

The approach is taken here and the target spec is based on ptx-linker, a project started about 1.5 years ago. Key feature: bitcode object files being linked with LTO into the final module on the linker's side.

Prior to this change, the linker used a ld linker-flavor, but I think, having the special CLI convention is a more reliable way.

Questions about further progress on reliable CUDA workflow with Rust:

Is it possible to create a test suite codegen-asm to verify end-to-end integration with LLVM backend?
How would it be better to organise no-std compile-fail tests: add #![no_std] where possible and mark others as ignore-nvptx directive, or alternatively, introduce compile-fail-no-std test suite?
Can we have the ptx-linker eventually be integrated as rls or clippy? Hopefully, this should allow to statically link against LLVM used in Rust and get rid of the current hacky solution.
Am I missing some methods from rustc_codegen_ssa::back::linker::Linker that can be useful for bitcode-only linking?

Currently, there are no major public CUDA projects written in Rust I'm aware of, but I'm expecting to have a built-in target will create a solid foundation for further experiments and awesome crates.

Related to #38789
Fixes #38787
Fixes #38786

rust-highfive · 2019-01-27T20:21:30Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @cramertj (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

Centril · 2019-01-27T20:38:08Z

r? @nagisa cc @rkruppe cc @alexcrichton

rust-highfive · 2019-01-27T20:58:02Z

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.

travis_time:end:2f5eff36:start=1548620577801138403,finish=1548620580132025617,duration=2330887214
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
Setting environment variables from .travis.yml
$ export IMAGE=x86_64-gnu-llvm-6.0
---

[00:03:30] travis_fold:start:tidy
travis_time:start:tidy
tidy check
[00:03:30] tidy error: /checkout/src/librustc_target/spec/nvptx64_nvidia_cuda.rs:27: TODO is deprecated; use FIXME
[00:03:30] tidy error: /checkout/src/librustc_target/spec/nvptx64_nvidia_cuda.rs:54: TODO is deprecated; use FIXME
[00:03:32] some tidy checks failed
[00:03:32] 
[00:03:32] 
[00:03:32] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:03:32] 
[00:03:32] 
[00:03:32] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:03:32] Build completed unsuccessfully in 0:00:48
[00:03:32] Build completed unsuccessfully in 0:00:48
[00:03:32] Makefile:68: recipe for target 'tidy' failed
[00:03:32] make: *** [tidy] Error 1
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:0684e740
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Sun Jan 27 20:26:44 UTC 2019
---
travis_time:end:1612cfd0:start=1548620805407301115,finish=1548620805412457762,duration=5156647
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:02355b0b
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:0524b050
travis_time:start:0524b050
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:21722ef8
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

peterhj · 2019-01-27T22:06:38Z

src/librustc_target/spec/nvptx64_nvidia_cuda.rs

+            cpu: "sm_20".to_string(),
+
+            // TODO(denzp): create tests for the atomics.
+            max_atomic_width: Some(64),


BTW I did a few basic tests, and it turns out the the atomic_xadd rust-intrinsic seems to already generate correct PTX when doing atomic adds on u32 and u64. Hurray! I didn't test too many other atomic ops or sizes though.

(Sadly, atomic_xadd only supports integers and doesn't compile for f32 and f64 types.)

nagisa

For now comments are as is. I find tests somewhat sketchy, but I understand that the hand may be forced due to inadequate flexibility of other test suites.

nagisa · 2019-01-27T21:49:31Z

src/librustc_codegen_ssa/back/linker.rs

+    }
+
+    fn optimize(&mut self) {
+        self.cmd.arg(match self.sess.opts.optimize {


This feels suspect. I’m not sure the optimisation level for the leaf crate should influence the optimisation level for the whole crate graph. It feels like this should at least in some way depend on the LTO flag?

I followed args pattern from other linkers here.
At the moment the linker runs both LTO and optimisation passes over the final (complete) module when -O{1,2,3} is specified.

Does Rust runs optimisations before emitting bitcode object files? If yes, it can be avoided on linker's side then.

Does Rust runs optimisations before emitting bitcode object files?

Yes it does.

I followed args pattern from other linkers here.

For gcc and/or clang optimization flags during linkage mean very little if anything at all AFAIK. It definitely does not invoke some sort of global program re-optimisation, which is what happens with ptx-linker, it seems.

Such global-program optimisation is what lto would usually do, which is why I suggested that perhaps that flag is what should be accounted for :)

That could be left as a future endeavour as well, but invoking “LTO” just from -Copt-level feels wrong.

Thanks for the explanation and suggestion, I overlooked Session::lto(&self) before!

I've also changed ptx-linker to not perform final global optimisation - indeed it didn't affect much.

src/librustc_codegen_utils/symbol_names.rs

src/test/run-make/nvptx-binary-crate/Makefile

nagisa · 2019-01-27T22:11:57Z

src/librustc_codegen_ssa/back/link.rs

@@ -149,6 +149,7 @@ pub fn linker_and_flavor(sess: &Session) -> (PathBuf, LinkerFlavor) {
                LinkerFlavor::Ld => "ld",
                LinkerFlavor::Msvc => "link.exe",
                LinkerFlavor::Lld(_) => "lld",
+                LinkerFlavor::PtxLinker => "rust-ptx-linker",


Hmm… so we are depending on external project by default here. Something that users are very unlikely to have installed by default.

I wonder what the error looks like when rust-ptx-linker is not in $PATH. Perhaps it would make sense to ship rust-ptx-linker as part of rustlib like we do with e.g. lld component for some targets...

That's true. On the other hand, to get started with CUDA development in Rust, users will follow either docs or tutorials, where it should be mentioned how to setup the environment.

The error message is, currently:

error: linker `rust-ptx-linker` not found | = note: No such file or directory (os error 2) error: aborting due to previous error

Personally, I would love to have the linker to be shipped via rustup. But perhaps, it will be preferable to implement this as a separate PR?

But perhaps, it will be preferable to implement this as a separate PR?

Sure.

src/librustc_target/spec/nvptx64_nvidia_cuda.rs

denzp · 2019-01-28T01:19:59Z

@nagisa @peterhj thanks for reviewing!

I can agree about tests - normally these 3 run-make cases should be divided into many smaller ones. That's the reason why I call them "smoke tests" - they verify only basic functionality. The reason why I decided to condense the test cases is to avoid much similar boilerplate Makefile code that needs to be maintained.

How can I use codegen tests to assert assembly output? Also, how can I deal with the fact that the target is no-std and existing tests there often rely on libstd?

That's why I had an idea about introducing codegen-asm with mandatory target specification for each test case.

nagisa · 2019-01-28T17:03:38Z

How can I use codegen tests to assert assembly output?

Currently, AFAIK there is no way. Minor changes to compiletest are necessary to make it possible (i.e. you can tell codegen test to generate assembly, but no way to tell compiletest to look at anything but .ll output). For now it is fine to keep these tests as make-tests. We will figure out how to migrate them once compiletest is improved to support such use-case.

alexcrichton · 2019-01-28T20:40:47Z

Thanks for this!

The only comment I'd add is that we unfortunately don't have resources for another image to run on Travis right now, but could this be folded into an existing builder that's already performing well under 2h on average?

denzp · 2019-01-28T23:16:00Z

@alexcrichton I can see that wasm-unknown image is usually runs pretty fast compared to other non-dist images.

Can I add NVPTX tests there? The only thing, it feels like the image should be renamed then. Do you have suggestions about the naming?

alexcrichton · 2019-01-29T15:09:37Z

@denzp sure yeah so long as it still runs in under 2 hrs should be fine to add! It should be fine to rename it to something like test-various perhaps?

gnzlbg · 2019-01-29T15:16:56Z

src/test/run-make/nvptx-binary-crate/Makefile

+
+ifeq ($(TARGET),nvptx64-nvidia-cuda)
+all:
+	$(RUSTC) main.rs -Clink-arg=--arch=sm_60 --crate-type="bin" -O --target $(TARGET)


Does RUSTFLAGS=-C target-cpu=sm_60 achieve the same effect here as passing the link flags ?

Thanks, that's a nice catch.
I've addressed this providing target_cpu via --fallback-arch arg to the linker in the latest commit.

denzp · 2019-01-29T20:02:08Z

@alexcrichton thanks, I've merged the images as suggested!

alexcrichton · 2019-01-29T20:19:15Z

👍

nagisa · 2019-02-01T22:07:05Z

The compiler bits look good to me. I’ll assume that Alex’s 👍 is an approval for the infra changes.

@bors r+

bors · 2019-02-01T22:07:06Z

📌 Commit 49931fd has been approved by nagisa

bors · 2019-02-01T23:43:40Z

⌛ Testing commit 49931fd with merge 2efa31b...

NVPTX target specification This change adds a built-in `nvptx64-nvidia-cuda` GPGPU no-std target specification and a basic PTX assembly smoke tests. The approach is taken here and the target spec is based on `ptx-linker`, a project started about 1.5 years ago. Key feature: bitcode object files being linked with LTO into the final module on the linker's side. Prior to this change, the linker used a `ld` linker-flavor, but I think, having the special CLI convention is a more reliable way. Questions about further progress on reliable CUDA workflow with Rust: 1. Is it possible to create a test suite `codegen-asm` to verify end-to-end integration with LLVM backend? 1. How would it be better to organise no-std `compile-fail` tests: add `#![no_std]` where possible and mark others as `ignore-nvptx` directive, or alternatively, introduce `compile-fail-no-std` test suite? 1. Can we have the `ptx-linker` eventually be integrated as `rls` or `clippy`? Hopefully, this should allow to statically link against LLVM used in Rust and get rid of the [current hacky solution](https://github.com/denzp/rustc-llvm-proxy). 1. Am I missing some methods from `rustc_codegen_ssa::back::linker::Linker` that can be useful for bitcode-only linking? Currently, there are no major public CUDA projects written in Rust I'm aware of, but I'm expecting to have a built-in target will create a solid foundation for further experiments and awesome crates. Related to #38789 Fixes #38787 Fixes #38786

bors · 2019-02-02T02:18:22Z

☀️ Test successful - checks-travis, status-appveyor
Approved by: nagisa
Pushing 2efa31b to master...

gnzlbg · 2019-02-04T13:14:05Z

Thank you @denzp !

@alexcrichton

Add NVPTX target to a build manifest Include `nvptx64-nvidia-cuda` target to a build manifest. I forgot this step at my first take on adding the target (rust-lang#57937). Hopefully, this is the only reason why `rustup target add nvptx64-nvidia-cuda` doesn't work 🙁 r? @alexcrichton

The following targets are now built and distributed: - aarch64-unknown-none: rust-lang/rust#68334 - mips64-unknown-linux-muslabi64: rust-lang/rust#65843 - mips64el-unknown-linux-muslabi64: rust-lang/rust#65843 - nvptx64-nvidia-cuda: rust-lang/rust#57937 - riscv32i-unknown-none-elf: rust-lang/rust#62784 - riscv64gc-unknown-linux-gnu: rust-lang/rust#68037 - thumbv8m.base-none-eabi: rust-lang/rust#59182 - thumbv8m.main-none-eabi : rust-lang/rust#56954 - thumbv8m.main-none-eabihf: rust-lang/rust#59182

denzp added 2 commits January 27, 2019 16:04

Create nvptx64-nvidia-cuda target specification

d3903d5

Enable CI for nvptx64-nvidia-cuda

97c8e82

rust-highfive assigned cramertj Jan 27, 2019

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 27, 2019

rust-highfive assigned nagisa and unassigned cramertj Jan 27, 2019

peterhj reviewed Jan 27, 2019

View reviewed changes

nagisa reviewed Jan 27, 2019

View reviewed changes

peterhj reviewed Jan 27, 2019

View reviewed changes

src/librustc_target/spec/nvptx64_nvidia_cuda.rs Outdated Show resolved Hide resolved

denzp added 2 commits January 28, 2019 01:16

NVPTX: by-default use target cpu "sm_30"

ceacde3

SymbolPathBuffer shallow refactoring

8d53c92

Adjust PTXLinker LTO logic and CLI

6f86a70

gnzlbg reviewed Jan 29, 2019

View reviewed changes

denzp added 3 commits January 29, 2019 19:06

Merge NVPTX and WASM test images into test-various

899d936

Provide PTXLinker with fallback to internal target-cpu

3f62445

Use correct CI test image for WASM and NVPTX

49931fd

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 1, 2019

bors merged commit 49931fd into rust-lang:master Feb 2, 2019

bors mentioned this pull request Feb 2, 2019

Stabilize linker-plugin based LTO (aka cross-language LTO) #58057

Merged

denzp deleted the nvptx branch February 2, 2019 16:14

denzp mentioned this pull request Feb 3, 2019

Add NVPTX target to a build manifest #58108

Merged

termoshtt mentioned this pull request Dec 30, 2019

Restart accel termoshtt/accel#64

Open

7 tasks

ehuss mentioned this pull request May 15, 2020

Update platform tiers. rust-lang/rust-forge#356

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NVPTX target specification #57937

NVPTX target specification #57937

denzp commented Jan 27, 2019 •

edited

Loading

rust-highfive commented Jan 27, 2019

Centril commented Jan 27, 2019 •

edited

Loading

rust-highfive commented Jan 27, 2019

peterhj Jan 27, 2019

nagisa left a comment

nagisa Jan 27, 2019 •

edited

Loading

denzp Jan 28, 2019

nagisa Jan 28, 2019 •

edited

Loading

denzp Jan 28, 2019

nagisa Jan 27, 2019

denzp Jan 28, 2019

nagisa Jan 28, 2019

denzp commented Jan 28, 2019

nagisa commented Jan 28, 2019 •

edited

Loading

alexcrichton commented Jan 28, 2019

denzp commented Jan 28, 2019

alexcrichton commented Jan 29, 2019

gnzlbg Jan 29, 2019

denzp Jan 29, 2019

denzp commented Jan 29, 2019

alexcrichton commented Jan 29, 2019

nagisa commented Feb 1, 2019

bors commented Feb 1, 2019

bors commented Feb 1, 2019

bors commented Feb 2, 2019

gnzlbg commented Feb 4, 2019

NVPTX target specification #57937

NVPTX target specification #57937

Conversation

denzp commented Jan 27, 2019 • edited Loading

rust-highfive commented Jan 27, 2019

Centril commented Jan 27, 2019 • edited Loading

rust-highfive commented Jan 27, 2019

Choose a reason for hiding this comment

nagisa left a comment

Choose a reason for hiding this comment

nagisa Jan 27, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nagisa Jan 28, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

denzp commented Jan 28, 2019

nagisa commented Jan 28, 2019 • edited Loading

alexcrichton commented Jan 28, 2019

denzp commented Jan 28, 2019

alexcrichton commented Jan 29, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

denzp commented Jan 29, 2019

alexcrichton commented Jan 29, 2019

nagisa commented Feb 1, 2019

bors commented Feb 1, 2019

bors commented Feb 1, 2019

bors commented Feb 2, 2019

gnzlbg commented Feb 4, 2019

denzp commented Jan 27, 2019 •

edited

Loading

Centril commented Jan 27, 2019 •

edited

Loading

nagisa Jan 27, 2019 •

edited

Loading

nagisa Jan 28, 2019 •

edited

Loading

nagisa commented Jan 28, 2019 •

edited

Loading