Same code is slow in Rust, but is fast in C using clang 3.9 #39446
Comments
Note that optimized Rust treats signed overflow as wrapping, whereas in C it's undefined behavior. I found that GCC is about the same with or without |
Here's one of the hotspots with
And the same with
In both cases, it's optimizing the division into multiplication with fixups. But it looks like in the first case, with The same line from rustc is basically identical to the GCC outputs the same for me either way.
|
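To make the wrapping point concrete, here is a minimal Rust sketch. The helper name `square_wrapping` is made up for illustration and is not taken from the benchmark. It shows that, under wrapping semantics, a square such as `x * x` can legally come out negative, which is presumably why the compiler cannot drop the sign fixups around a signed division the way a C compiler may when overflow is undefined.

```rust
/// Hypothetical helper: squares `x` with explicit wrapping semantics,
/// which is how overflow behaves in an optimized Rust build by default.
fn square_wrapping(x: i32) -> i32 {
    x.wrapping_mul(x)
}

fn main() {
    // No overflow: the result is the ordinary square.
    assert_eq!(square_wrapping(1_000), 1_000_000);

    // 50_000 * 50_000 = 2_500_000_000 does not fit in i32, so the value
    // wraps modulo 2^32 and ends up negative.
    assert_eq!(square_wrapping(50_000), -1_794_967_296);

    // checked_mul reports the overflow instead of wrapping.
    assert_eq!(50_000i32.checked_mul(50_000), None);

    println!("wrapped square: {}", square_wrapping(50_000));
}
```

Since a negative "square" is a defined result here, any division that follows has to handle negative dividends correctly.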
OK, if this is confirmed to be a limitation caused by wrapping semantics, should there be an option in |
@pftbest you might try adding |
@ranma42 Your suggestion helped, but it wasn't enough. I've added this intrinsic to both |
@pftbest my suggestion was specific for the |
@ranma42 yes, I know, that is why I was asking if there is a switch in the compiler. |
Another way to "trick" rustc without using
--- main.rs.orig 2017-02-18 14:41:31.993231078 -0800
+++ main.rs 2017-02-18 14:43:25.862194979 -0800
@@ -43,8 +43,8 @@ fn main() {
y_y = 0;
i = 0;
while (i < max_iter && x_x + y_y <= 800) {
- x_x = (x * x) / 200;
- y_y = (y * y) / 200;
+ x_x = ((x * x) as u32 / 200) as i32;
+ y_y = ((y * y) as u32 / 200) as i32;
if (x_x + y_y > 800) {
the_char = 48 + i;
if (i > 9) {
@@ -53,9 +53,9 @@ fn main() {
} else {
temp = x_x - y_y + x0;
if ((x < 0 && y > 0) || (x > 0 && y < 0)) {
- y = (-1 * ((-1 * (x * y)) / 100)) + y0;
+ y = (-1 * ((-1 * (x * y)) as u32 / 100) as i32) + y0;
} else {
- y = x * y / 100 + y0;
+ y = ((x * y) as u32 / 100) as i32 + y0;
}
x = temp;
} |
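The cast trick in the diff above only preserves results when the dividend is non-negative, which the benchmark appears to guarantee as long as the products do not overflow. The following sketch uses two hypothetical helpers, `div200_signed` and `div200_via_u32`, to show where the two forms agree and where they diverge. That unsigned division by a constant needs no sign fixup is the assumed reason for the speedup; it was not measured here.

```rust
/// Hypothetical helper: plain signed division, as in the original code.
fn div200_signed(v: i32) -> i32 {
    v / 200
}

/// Hypothetical helper: the cast trick from the diff. The dividend is
/// reinterpreted as u32, so the compiler can emit an unsigned division.
fn div200_via_u32(v: i32) -> i32 {
    (v as u32 / 200) as i32
}

fn main() {
    // Non-negative dividends: both forms give the same answer.
    for v in [0, 199, 200, 40_000, i32::MAX] {
        assert_eq!(div200_signed(v), div200_via_u32(v));
    }

    // Negative dividend: the bit pattern of -400 read as u32 is
    // 4_294_966_896, so the results no longer match.
    assert_eq!(div200_signed(-400), -2);
    assert_eq!(div200_via_u32(-400), 21_474_834);

    println!("cast trick is only valid for non-negative dividends");
}
```

The rewrite is therefore a manual promise to the compiler that the value is non-negative, not a general replacement for signed division.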
FWIW, I filed a GCC performance bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79665 |
Here is another classic example of overflow optimizations:
pub fn times_ten(num: isize) -> isize {
let a = num * 30;
let b = a / 3;
b
}
gives:
example::times_ten:
imul rax, rdi, 30
movabs rcx, 6148914691236517206
imul rcx
mov rax, rdx
shr rax, 63
lea rax, [rax + rdx]
ret
while the C code knows it's just times 10:
times_ten(int): # @times_ten(int)
add edi, edi
lea eax, [rdi + 4*rdi]
ret
It looks like when RFC 0560 was changed, no one considered the performance impact of these semantics. Or maybe they did consider it, but did not mention it in the RFC. |
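A short sketch of why rustc cannot fold `num * 30 / 3` into `num * 10` under wrapping semantics: the two expressions give different answers once `num * 30` overflows. The function names below are hypothetical, and `i32` is used instead of `isize` only to keep the numbers small.

```rust
/// Hypothetical: the original computation with wrapping made explicit.
fn times_ten_wrapping(num: i32) -> i32 {
    num.wrapping_mul(30) / 3
}

/// Hypothetical: the folded form a C compiler may emit, because signed
/// overflow in `num * 30` would be undefined behavior in C.
fn times_ten_folded(num: i32) -> i32 {
    num.wrapping_mul(10)
}

fn main() {
    // Small inputs: no overflow, both forms agree.
    assert_eq!(times_ten_wrapping(7), 70);
    assert_eq!(times_ten_folded(7), 70);

    // 100_000_000 * 30 overflows i32 and wraps to -1_294_967_296;
    // dividing that by 3 truncates toward zero.
    assert_eq!(times_ten_wrapping(100_000_000), -431_655_765);

    // 100_000_000 * 10 still fits, so the folded form gives a
    // different answer.
    assert_eq!(times_ten_folded(100_000_000), 1_000_000_000);

    println!("folding num * 30 / 3 into num * 10 changes wrapped results");
}
```

Because wrapped results are defined behavior in Rust, the optimizer has to preserve them, so it keeps the multiply followed by a division by 3.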
Just bumping this issue, it looks like GCC has issued a performance fix for this: https://www.phoronix.com/scan.php?page=news_item&px=GCC-Inefficiency-Fix-67 |
I was able to achieve similar performance gains by casting to usize. I'm running on the latest nightly, but the issue is reproducible with stable on the playground: https://play.rust-lang.org/?gist=0d67cb96632865292cdeb338ca463027&version=nightly&backtrace=0 I'm not sure whether I should open this as a separate bug or whether this is part of this one. |
@th0br0 you'll need to use |
The panic you see is intended behavior, so there is no bug here. Also I don't see how changing isize to usize would give any performance in your case, because they compile to the same assembly code: |
@pftbest the change was less a matter of isize/usize but rather explicitly casting each component of the addition to usize. Sorry that I didn't make that clear in my previous post. see the different assembly code: ( |
OK, I see it now: the reason this would be fast in C is that it will implicitly cast each component to |
Triage: it's been a year and a half, and seems like this isn't a bug, just differences in semantics. Closing! |
In case anyone is still interested in this issue: as of 2020-11-12, the benchmark program using
|
Simple code with loops and integer arithmetic is 15-20% slower in Rust than exactly the same (byte-for-byte) C code compiled by clang-3.9. I suspect this may be an issue in LLVM that is triggered by Rust.
Rust code: https://gist.github.com/pftbest/5c18e458cddd6a055878503c08a38848
C code: https://gist.github.com/pftbest/85ac44272eecb365ad62b7fbc4f72115
Rust version:
clang version:
Results from first PC (16% difference):
Results from second PC (26% difference):