Allow compiler to inline code #3
I do not think this achieves what you want it to achieve, sadly, unless you're calling the functions from this library directly, as opposed to calling them via intrinsics/operations. I also would like not to … Finally, I doubt the NVPTX backend will use any of these functions at all (if the proper intrinsics are called); it will generate the relevant hardware instructions instead.
I agree that marking large functions as `inline` is not ideal and might be confusing, but I think (or at least hope) the Rust compiler will decide where to inline a function and where to call it. Without … My problem with the NVPTX backend is that it doesn't have `std`, so we don't have any mathematical functions available, which is critical because CUDA is used for all kinds of computations. So this is exactly the problem: without …
which isn't valid code to run on CUDA. And with … It's generally possible to use NVVM intrinsics from code, but that doesn't mean we have to. Intrinsics are a faster alternative to the standard functions in CUDA. As a compromise, we can add …
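One possible shape for such a compromise, offered only as a sketch and not as what was actually proposed in this thread, is to apply the inline hint only when compiling for NVPTX via `cfg_attr` on `target_arch = "nvptx64"`. The function `fabs` below is hypothetical, not this library's API; it uses only `core` operations because the NVPTX target has no `std`.

```rust
// Hypothetical helper, not this library's real API. Only `core` operations
// are used, since the NVPTX target has no `std`.
//
// When compiling for NVPTX (`target_arch = "nvptx64"`) the attribute expands
// to `#[inline]`; on every other target no hint is emitted and rustc keeps
// its default inlining heuristics.
#[cfg_attr(target_arch = "nvptx64", inline)]
pub fn fabs(x: f64) -> f64 {
    // Clear the IEEE 754 sign bit (bit 63) to get |x|.
    f64::from_bits(x.to_bits() & !(1u64 << 63))
}
```

With this approach, host builds are unaffected by inline hints on large functions, while NVPTX builds expose the function bodies to dependent crates.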
Not exactly true. Something like …
will gladly compile to …
without any special support otherwise. I see what the problem is here, though. Like I said, I fear … I feel that, to avoid this problem, you might rather want to use rustc to emit LLVM-IR (…). The only problem I see with this approach is that LLVM is excessively buggy, and even something like this: …
will fail to compile with an error …
So, with all that in mind, I'm fine with accepting this PR, I guess. I'll think about it a bit tomorrow.
Thanks for the explanation and for the approach with … I didn't know PTX had any math function instructions like trigonometry. I tried a similar example with …
That's why I abandoned my attempts to go with intrinsics. It looks like this happened because PTX has limited support for math function instructions compared to the intrinsics. Btw, I found that the inlined and optimised `math.rs` sqrt: …
`sqrt.rn.f64`: …
With the help of `#[inline]`, Rust will be able to inline the code into dependent crates. This might be useful in cases where we need `--emit asm`. For example, Rust code can be built for the `NVPTX` arch, which allows running it on a CUDA GPU, but unfortunately we must inline all code for that.
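A minimal sketch of what this PR describes, assuming a `core`-only math function of the kind this library provides. The name `sqrt_newton` and its Newton-Raphson body are illustrative and are not this library's actual `sqrt`. Marking it `#[inline]` makes the body available to dependent crates, so rustc/LLVM may inline it at their call sites instead of leaving a call into another crate, which is what the description says must be avoided for NVPTX.

```rust
/// Square root by Newton-Raphson iteration, using only `core` arithmetic
/// (no `std`), so it can be compiled for targets such as NVPTX.
/// Hypothetical example function, not this library's actual `sqrt`.
///
/// `#[inline]` makes the body available to dependent crates, letting
/// rustc/LLVM inline it at their call sites.
#[inline]
pub fn sqrt_newton(x: f64) -> f64 {
    // Negative inputs and NaN have no real square root.
    if x.is_nan() || x < 0.0 {
        return f64::NAN;
    }
    // 0, -0 and +inf are their own square roots.
    if x == 0.0 || x == f64::INFINITY {
        return x;
    }
    // Start at or above sqrt(x) so the iterates decrease monotonically.
    let mut guess = if x >= 1.0 { x } else { 1.0 };
    loop {
        let next = 0.5 * (guess + x / guess);
        // Stop once the sequence no longer decreases (converged in f64).
        if next >= guess {
            return guess;
        }
        guess = next;
    }
}
```

A dependent crate simply calls `sqrt_newton(2.0)`. Whether LLVM actually inlines it remains the compiler's decision, since `#[inline]` is a hint rather than a guarantee.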