
Improve CUDA support #612

Merged: 5 commits, Aug 2, 2021

Conversation

@dot-asm (Contributor) commented Jul 25, 2021

It's rather tricky to use the CUDA compiler with cc-rs: as things stand, you customarily get a bunch of unresolved-symbol linker errors. The answer is to a) perform the CUDA-specific --device-link step, and b) link with the CUDA run-time. The latter is solved as a feature.

For reference, this is similar to #508, but things are done in a more Windows-friendly manner and a CI test is added.
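For context, a minimal build.rs using the additions from this PR might look like the following sketch. The file name kernels.cu and the library name are illustrative, and it assumes nvcc is on the PATH and a cc dependency with this PR's cuda/cudart support:

```rust
// build.rs — hypothetical usage sketch, not part of this PR's diff.
fn main() {
    cc::Build::new()
        .cuda(true)          // compile .cu sources with nvcc
        .cudart("static")    // link the static CUDA run-time
        .file("kernels.cu")  // illustrative file name
        .compile("kernels");
}
```

This is a build-configuration sketch; it only does something useful in a crate that actually contains CUDA sources.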

@dot-asm (Contributor Author) commented Jul 25, 2021

Just in case: the reference to "Windows friendliness" means that I can confirm it works on Windows. Well, cargo test --manifest-path cc-test/Cargo.toml --features cudart doesn't as-is, but it can be rectified by moving the new which(nvcc) check up in cc-test/build.rs, more specifically prior to "Test that the 'windows_registry' module ...", because that test wipes %PATH%. Arranging a CI test on Windows is problematic because of the lack of usable installation instructions.

@alexcrichton (Member) left a comment

Seems reasonable to me! I don't know anything about cuda really but I'm happy to defer!

@dot-asm (Contributor Author) commented Jul 27, 2021

I apologize for the noise from CI. Rust is not my language of first choice. :-)

@alexcrichton (Member) left a comment

Ok sounds good to me about the which business. I'll take on the responsibility of dealing with those bugs if they ever arise in the future :)

src/lib.rs Outdated
env::var_os("PATH").as_ref().and_then(|path_entries| {
env::split_paths(path_entries).find_map(|path_entry| {
let mut exe = path_entry.join(tool);
return if check_exe(&mut exe) { Some(exe) } else { None };
Member:

The return here can be elided (the last expression in a closure is an implicit return)

Contributor Author:

It says "you might have meant to ... return Some(exe);"...

Member:

Oh that probably means you need to remove the semicolon at the end of the line as well, changing it to:

if check_exe(&mut exe) { Some(exe) } else { None }

Contributor Author:

Right, that's what I tried, and it did complain, on Rust 1.53.
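To illustrate the implicit-return point, here is a self-contained sketch of the which helper under discussion. It is simplified: a plain is_file() test stands in for the real check_exe logic (which, among other things, handles executable checks per platform):

```rust
use std::env;
use std::path::PathBuf;

// Simplified sketch of the `which` helper: scan each PATH entry
// for the tool. `is_file()` stands in for the real `check_exe`.
fn which(tool: &str) -> Option<PathBuf> {
    env::var_os("PATH").as_ref().and_then(|path_entries| {
        env::split_paths(path_entries).find_map(|path_entry| {
            let exe = path_entry.join(tool);
            // Last expression of the closure, with no `return` keyword
            // and no trailing semicolon: the implicit return discussed above.
            if exe.is_file() { Some(exe) } else { None }
        })
    })
}

fn main() {
    match which("sh") {
        Some(path) => println!("found: {}", path.display()),
        None => println!("sh not found on PATH"),
    }
}
```

The compiler hint quoted above ("you might have meant to ... return Some(exe);") appears when the `if` expression still ends with a semicolon, turning it into a statement instead of the closure's value.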

src/lib.rs Outdated
// it's on user to specify one by passing it through
// RUSTFLAGS environment variable.
let mut libtst = false;
let mut libdir = nvcc.canonicalize().unwrap();
Member:

I've found historically that canonicalize() can fail for weird reasons, but is it necessary to do here? We already find nvcc in PATH, so I don't think we need to canonicalize here?

Contributor Author:

Note that PATH is evaluated only in the case where nvcc is a "single-word" command. Otherwise one can set the NVCC environment variable to an explicit path to whatever you want to call nvcc. The idea behind canonicalize() is to handle the case where the environment variable is set to a relative path. One can argue that it's an edge case that is not worth covering. If you want to say "scrap it," then just say the word. :-)

Member:

If nvcc is set to a relative path is there an issue with passing a relative path as a -L argument? I would naively expect that to work ok, which I think means that the canonicalize() probably isn't necessary here.

Contributor Author:

It's surely just me, but I tend to think of unlikely cases. Let's say you want to compile with NVCC set to './nvcc'. Of course nobody in their right mind would actually do it, but that's not really an excuse to disregard the possibility. Without canonicalize() the current code will break. An alternative is to not rely on libdir.pop(), libdir.pop(), but to do libdir.pop(), libdir.push("..") instead... This way the most unlikely case is covered without canonicalize(). Let me test how it works...

Contributor Author:

canonicalize() is gone.
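The relative-path handling that replaced canonicalize() can be demonstrated in isolation. The lib64 subdirectory name is illustrative of the common CUDA layout where libraries live one level above nvcc's bin directory:

```rust
use std::path::PathBuf;

fn main() {
    // Suppose NVCC is set to a relative path such as "./nvcc".
    let nvcc = PathBuf::from("./nvcc");

    // Instead of popping twice (which degenerates on a short relative
    // path), pop the file name once and then step up with "..": the
    // result stays a meaningful relative path without canonicalize().
    let mut libdir = nvcc.clone();
    libdir.pop();          // "./nvcc" -> "."
    libdir.push("..");     // "."      -> "./.."
    libdir.push("lib64");  // illustrative library subdirectory
    assert_eq!(libdir, PathBuf::from("./../lib64"));
    println!("-L {}", libdir.display());
}
```

The resulting relative path is then resolved by the linker at link time, which is why no canonicalization is needed.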

src/lib.rs Outdated
match which(&self.get_compiler().path) {
None => (),
Some(nvcc) => {
// Try to figure out the -L search path. If it fails,
Member:

Out of curiosity, do you know why nvcc doesn't do this logic automatically? Is there documentation we can point to somewhere recommending this -L path and such?

Contributor Author:

It does do this automatically, but the problem is that cargo doesn't call nvcc to link the final binary. Instead it calls the system linker, and then you have to craft -L and -l yourself. Of course, the ideal would be if one could query nvcc and ask "what's your library path," but there is no such option. :-(

Hmmm, now that I think about it... If there is a way to dynamically override the target linker and set it to nvcc, that might work too... In that case one wouldn't need this -L thing, but then cross-compilation might get tricky. [Just in case, the reference to "cross-compilation" means that it does work; I can cross-compile a CUDA program for aarch64.]

Member:

The final linker is actually invoked by rustc rather than Cargo, and the target's default can be overridden with -Clinker=/path/to/nvcc (which can also be configured in cargo either via RUSTFLAGS or as a string with CARGO_TARGET_..._LINKER=...)

Is this a case where the Rust target should be defaulting to nvcc rather than cc?

(I should point out again I'm very naive about CUDA/nvcc/etc, I naively assume right now that there's one Rust target that everyone uses for GPU things (and that nvcc is for GPU things) and that you're compiling for that target)

Contributor Author:

Is this a case where the Rust target should be defaulting to nvcc rather than cc?

Yes. Or rather that's what I had in mind, it remains to be seen if it actually works all the way.

Contributor Author:

I naively assume right now that there's one Rust target that everyone uses for GPU things (and that nvcc is for GPU things) and that you're compiling for that target.

nvcc is available for x86_64, arm64, and ppc64le, and it can do cross-compilation. The most common scenario is running Eclipse on x86_64 and targeting a remote arm64 system.

Contributor Author:

Setting the LINKER to nvcc doesn't work, because nvcc doesn't recognize all of the system linker's flags.

Member:

Oh, it might be the case that rustc needs more invasive knowledge about nvcc if it runs nvcc; there are a few flavors of linkers implemented in rustc, and nvcc may need its own flavor (unless it looks exactly like gcc, which it sounds like it doesn't).
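For reference, the linker override described above could be configured with a fragment like this hypothetical one (the target triple and nvcc path are assumptions). As established in this thread, it does not actually work here, because nvcc rejects some of the flags rustc passes to the system linker:

```toml
# .cargo/config.toml — hypothetical sketch; triple and path are assumptions.
[target.x86_64-unknown-linux-gnu]
linker = "/usr/local/cuda/bin/nvcc"
```

The same override can be expressed as RUSTFLAGS="-C linker=..." or via the CARGO_TARGET_..._LINKER environment variable mentioned above.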

@@ -1205,6 +1245,9 @@ impl Build {
if !msvc || !is_asm || !is_arm {
cmd.arg("-c");
}
if self.cuda {
cmd.arg("--device-c");
}
@dot-asm (Contributor Author), Jul 29, 2021:

An additional thought, and I suppose it's rather a question for @YoshikawaMasashi. I've effectively taken this from #508, but the flag is not generally required, only when you compile multiple modules. For this reason it might be more appropriate to shift the burden to the application and suggest adding .flag("--device-c") when constructing the compilation. Alternatively, one can make the flag's addition contingent on the number of files being compiled.

Member:

I'll have to defer to your judgement on this, I don't think I know enough about cuda to know what the best option is here.

Contributor Author:

I went for an additional self.files.len() > 1 check.

This allows the final link to be performed with the system linker.
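The condition settled on can be sketched in isolation; the struct and field names below are illustrative stand-ins for cc::Build internals. The idea is that --device-c requests relocatable device code, which is only needed when more than one CUDA module is compiled and later device-linked:

```rust
// Illustrative stand-in for the relevant cc::Build fields.
struct Build {
    cuda: bool,
    files: Vec<String>,
}

// --device-c is only added when compiling multiple CUDA modules.
fn device_c_flag(b: &Build) -> Option<&'static str> {
    if b.cuda && b.files.len() > 1 {
        Some("--device-c")
    } else {
        None
    }
}

fn main() {
    let single = Build { cuda: true, files: vec!["a.cu".to_string()] };
    let multi = Build {
        cuda: true,
        files: vec!["a.cu".to_string(), "b.cu".to_string()],
    };
    assert_eq!(device_c_flag(&single), None);
    assert_eq!(device_c_flag(&multi), Some("--device-c"));
    println!("ok");
}
```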
src/lib.rs Outdated
Some(opt) => opt.as_str(),
None => "none",
};
if self.cuda && cudart != "none" {
Member:

I think here you can probably just do if let Some(cudart) = &self.cudart { ... } because if someone specifies cudart then self.cuda is automatically set.

Contributor Author:

If I let Some(cudart) = &self.cudart, I'll still have to check for "none", because cudart being None and Some("none") are equivalent cases. But yes, I can omit self.cuda...

Member:

Oh how come Some("none") is equivalent to None? (I thought that was accidental, didn't realize it was intentional)

Contributor Author:

self.cuda is removed, and a comment is added as a reminder that Some("none") is an option. I've also spotted an error: I was recognizing "dynamic" instead of "shared."

@dot-asm (Contributor Author), Jul 29, 2021:

how come Some("none") is equivalent to None?

Since .cudart() aims to mimic --cudart, it recognizes all of --cudart's options, {none|shared|static}, so that nvcc's users feel at home.
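The None / Some("none") equivalence can be sketched as a small mapping from the cudart option to the run-time library that gets linked. The function name and library names ("cudart", "cudart_static") here are illustrative, mirroring nvcc's --cudart {none|shared|static} choices:

```rust
// Illustrative mapping from the .cudart() option to the library to link.
// None and Some("none") are deliberately equivalent: nothing is linked.
fn cudart_lib(cudart: Option<&str>) -> Option<&'static str> {
    match cudart.unwrap_or("none") {
        "shared" => Some("cudart"),
        "static" => Some("cudart_static"),
        _ => None, // "none" or anything unrecognized: link no run-time
    }
}

fn main() {
    assert_eq!(cudart_lib(None), None);
    assert_eq!(cudart_lib(Some("none")), None);
    assert_eq!(cudart_lib(Some("shared")), Some("cudart"));
    assert_eq!(cudart_lib(Some("static")), Some("cudart_static"));
    println!("ok");
}
```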

@dot-asm (Contributor Author) commented Jul 29, 2021

As for mimicking the --cudart compiler option: currently, not calling the .cudart() method is equivalent to --cudart none, which is not nvcc's default. The rationale behind this suggestion is the following. I would assume that anybody using cc-rs with CUDA now already has to do something about linking, and making sudden moves like forcing the equivalent of --cudart static could interfere with whatever current users do. Now, one can make a case for not worrying about said interference, enforcing nvcc's default, and telling users to adjust. I personally would prefer nvcc's default, but I'm merely starting with the Rust-CUDA combo, so my opinion doesn't weigh much. :-)

Commit messages:
- Try to locate the library in a standard location relative to the nvcc command. If that fails, the user is held responsible for specifying one in RUSTFLAGS.
- Execution is bound to fail without a card, but the failure is ignored; it's rather a compile-and-link test. The test is suppressed if 'nvcc' is not found on the $PATH.
- This can interfere with current deployments in the wild, in which case some adjustments might be required. Most notably, one might have to add .cudart("none") to the corresponding Build instantiation to restore the original behaviour.
@dot-asm (Contributor Author) commented Jul 30, 2021

one can make case for not worrying about the said interference, enforce the nvcc's default, and tell users to adjust.

The top-most commit, "Harmonize CUDA support with NVCC default --cudart static," 1bc17a0, shows how enforcing nvcc's default looks in practice. Whether or not it's accepted is up to the maintainer. I for one am obviously in favour. :-) I'm pinging the earlier CUDA support contributors, @peterhj, @trolleyman, @YoshikawaMasashi, for possible feedback.

@alexcrichton (Member):

Ok, that all sounds good to me, and I think this is good to go from at least my end in terms of idioms and such. Are you comfortable with the implemented semantics here? If so, I can hit the button.

@dot-asm (Contributor Author) commented Jul 30, 2021

It might be appropriate to give other contributors some time to weigh in, naturally if they choose to... To recap, the question is whether it would be appropriate to start linking with cudart by default, as opposed to making the user add an additional method call to the Builder.

@alexcrichton (Member):

Ok, well, in any case this doesn't change the defaults as-is, but I think it's fine to consider changing the defaults in the future as well. I'm gonna go ahead and merge this.

@alexcrichton alexcrichton merged commit 4a6e8b7 into rust-lang:master Aug 2, 2021
@dot-asm (Contributor Author) commented Aug 2, 2021

Thank you for being patient! :-) If there are any, even remotely related, problems, don't hesitate to tag my handle. I'll be happy to help. Cheers.
