-
Notifications
You must be signed in to change notification settings - Fork 15.3k
Description
Consider the following Rust code (no need to know Rust, it's very straightforward).
pub fn f1(q: &mut [u64; 8], implicit: u64) {
let mut carry = false;
for i in 0 .. q.len() {
let (r, o) = u64::carrying_add(q[i], implicit, carry); // Add q[i] + implicit + carry, return tuple with "result modulo 2^64" and "new carry"
q[i] = r;
carry = o;
}
}On x86, this gets compiled to what you would expect.
add qword ptr [rdi], rsi
adc qword ptr [rdi + 8], rsi
adc qword ptr [rdi + 16], rsi
adc qword ptr [rdi + 24], rsi
adc qword ptr [rdi + 32], rsi
adc qword ptr [rdi + 40], rsi
adc qword ptr [rdi + 48], rsi
adc qword ptr [rdi + 56], rsi
retHowever, if you take the carry as a parameter, rather than initialising it to zero
pub fn f2(q: &mut [u64; 8], implicit: u64, mut carry: bool) {
for i in 0 .. q.len() {
let (r, o) = u64::carrying_add(q[i], implicit, carry);
q[i] = r;
carry = o;
}
}I would expect it to be compiled to
add dl, -1 ; Set carry flag to 0/1 if dl is 0/1 resp
adc qword ptr [rdi], rsi
adc qword ptr [rdi + 8], rsi
adc qword ptr [rdi + 16], rsi
adc qword ptr [rdi + 24], rsi
adc qword ptr [rdi + 32], rsi
adc qword ptr [rdi + 40], rsi
adc qword ptr [rdi + 48], rsi
adc qword ptr [rdi + 56], rsi
retHowever llvm (via rustc) writes it instead in a very roundabout way:
mov rax, qword ptr [rdi]
add rax, rsi
setb cl
mov edx, edx
add rdx, rax
setb al
or al, cl
mov qword ptr [rdi], rdx
add al, -1
adc qword ptr [rdi + 8], rsi
adc qword ptr [rdi + 16], rsi
adc qword ptr [rdi + 24], rsi
adc qword ptr [rdi + 32], rsi
adc qword ptr [rdi + 40], rsi
adc qword ptr [rdi + 48], rsi
adc qword ptr [rdi + 56], rsiThis is a godbolt illustrating this. https://godbolt.org/z/9M1Tr4cGj
This is the equivalent godbolt, but in C++: https://godbolt.org/z/sKz5vrEjr . If you look at this one, you can switch to gcc to see how it gets it right.
All my attempts to nudge codegen into the optimsed form (e.g. by explicitly writing something like let (dummy, o) = u64::carrying_add(0, u64::MAX, carry); carry = 0) have failed. This interests me because the optimised codegen cuts an inner loop in a program of mine in half! :)