Sub-optimal codegen when adding with carry

Consider the following Rust code (no need to know Rust, it's very straightforward).

```rust
pub fn f1(q: &mut [u64; 8], implicit: u64) {
    let mut carry = false;
    for i in 0 .. q.len() {
        let (r, o) = u64::carrying_add(q[i], implicit, carry);  // Add q[i] + implicit + carry, return tuple with "result modulo 2^64" and "new carry"
        q[i] = r;
        carry = o;
    }
}
```

On x86, this gets compiled to what you would expect.

```asm
add     qword ptr [rdi], rsi
adc     qword ptr [rdi + 8], rsi
adc     qword ptr [rdi + 16], rsi
adc     qword ptr [rdi + 24], rsi
adc     qword ptr [rdi + 32], rsi
adc     qword ptr [rdi + 40], rsi
adc     qword ptr [rdi + 48], rsi
adc     qword ptr [rdi + 56], rsi
ret
```

However, if you take the `carry` as a parameter, rather than initialising it to zero

```rust
pub fn f2(q: &mut [u64; 8], implicit: u64, mut carry: bool) {
    for i in 0 .. q.len() {
        let (r, o) = u64::carrying_add(q[i], implicit, carry);
        q[i] = r;
        carry = o;
    }
}
```

I would expect it to be compiled to 

```asm
add     dl, -1  ; Set carry flag to 0/1 if dl is 0/1 resp
adc     qword ptr [rdi], rsi
adc     qword ptr [rdi + 8], rsi
adc     qword ptr [rdi + 16], rsi
adc     qword ptr [rdi + 24], rsi
adc     qword ptr [rdi + 32], rsi
adc     qword ptr [rdi + 40], rsi
adc     qword ptr [rdi + 48], rsi
adc     qword ptr [rdi + 56], rsi
ret
```

However llvm (via rustc) writes it instead in a very roundabout way:

```asm
mov     rax, qword ptr [rdi]
add     rax, rsi
setb    cl
mov     edx, edx
add     rdx, rax
setb    al
or      al, cl
mov     qword ptr [rdi], rdx
add     al, -1
adc     qword ptr [rdi + 8], rsi
adc     qword ptr [rdi + 16], rsi
adc     qword ptr [rdi + 24], rsi
adc     qword ptr [rdi + 32], rsi
adc     qword ptr [rdi + 40], rsi
adc     qword ptr [rdi + 48], rsi
adc     qword ptr [rdi + 56], rsi
```

This is a godbolt illustrating this. https://godbolt.org/z/9M1Tr4cGj

This is the equivalent godbolt, but in C++: https://godbolt.org/z/sKz5vrEjr . If you look at this one, you can switch to gcc to see how it gets it right.

All my attempts to nudge codegen into the optimsed form (e.g. by explicitly writing something like `let (dummy, o) = u64::carrying_add(0, u64::MAX, carry); carry = 0`) have failed. This interests me because the optimised codegen cuts an inner loop in a program of mine in half! :)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Sub-optimal codegen when adding with carry #169691

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Sub-optimal codegen when adding with carry #169691

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions