Incorrect store merging causes misaligned stores #289

Shai-Aviv · 2021-08-29T14:11:03Z

Compiling this C code with commit 5964b5c (GCC 11.1.0) and the options -march=rv64gc -O3:

#include <stdint.h>

void zero_two_uint16(uint16_t* ptr) {
    ptr[0] = 0;
    ptr[1] = 0;
}

void zero(uint16_t* ptr) {
    for (int i = 0; i < 16; ++i) {
        zero_two_uint16(ptr);
        ptr += 2;
    }
}

yields the following (incorrect) assembly:

<zero_two_uint16>:
        sh      zero,0(a0)
        sh      zero,2(a0)
        ret

<zero>:
        andi    a5,a0,7
        sd      zero,0(a0)
        sd      zero,8(a0)
        sd      zero,16(a0)
        sd      zero,24(a0)
        sd      zero,32(a0)
        sd      zero,40(a0)
        sd      zero,48(a0)
        sd      zero,56(a0)
        bnez    a5,<.L4>
        ret

<.L4>:
        ret

This causes misaligned stores when the pointer given to zero() is not aligned on a 4-byte boundary. I tried to use the option -fno-store-merging as a workaround to prevent this, but the output was the same (which I suspect is yet another bug).

With GCC 10.2.0, the output seems fine (again with -march=rv64gc -O3):

<zero_two_uint16>:
        sh      zero,0(a0)
        sh      zero,2(a0)
        ret

<zero>:
        andi    a5,a0,7
        bnez    a5,<.L4>
        sd      zero,0(a0)
        sd      zero,8(a0)
        sd      zero,16(a0)
        sd      zero,24(a0)
        sd      zero,32(a0)
        sd      zero,40(a0)
        sd      zero,48(a0)
        sd      zero,56(a0)
        ret

<.L4>:
        sh      zero,0(a0)
        sh      zero,2(a0)
        sh      zero,4(a0)
        sh      zero,6(a0)
        sh      zero,8(a0)
        sh      zero,10(a0)
        sh      zero,12(a0)
        sh      zero,14(a0)
        sh      zero,16(a0)
        sh      zero,18(a0)
        sh      zero,20(a0)
        sh      zero,22(a0)
        sh      zero,24(a0)
        sh      zero,26(a0)
        sh      zero,28(a0)
        sh      zero,30(a0)
        sh      zero,32(a0)
        sh      zero,34(a0)
        sh      zero,36(a0)
        sh      zero,38(a0)
        sh      zero,40(a0)
        sh      zero,42(a0)
        sh      zero,44(a0)
        sh      zero,46(a0)
        sh      zero,48(a0)
        sh      zero,50(a0)
        sh      zero,52(a0)
        sh      zero,54(a0)
        sh      zero,56(a0)
        sh      zero,58(a0)
        sh      zero,60(a0)
        sh      zero,62(a0)
        ret

The text was updated successfully, but these errors were encountered:

kito-cheng · 2021-08-29T14:53:01Z

Thanks for report that, it seems interesting bug...that's not cause by -fstore-merging, it cause by -ftree-vectorize, so the workaround for this case is apply -fno-tree-vectorize to your CFLAGS.

I need more time to figure out what happen.

jim-wilson · 2021-08-29T23:52:03Z

A git bisect on the FSF GCC tree points the blame at

commit f5e18dd
Author: Kewen Lin linkw@gcc.gnu.org
Date: Tue Nov 3 02:51:47 2020 +0000

pass: Run cleanup passes before SLP [PR96789]
...

This suggests the failure occurs in 168t.fre4. Except that the wrong code doesn't appear until 299r.jump2. I get the right result if I use -fno-slp-vectorize, or if I comment out the pass_fre inside pass_pre_slp_scalar_cleanup in passes.def. Very odd. There might be some internal state that doesn't appear in the dump files that is messed up. Or maybe git bisect is accidentally pointing me at the wrong patch because of complex interactions between the various optimization patches. This needs more investigation.

jim-wilson · 2021-08-30T22:50:38Z

I was looking at where the block of half word stores was being optimized away which happens late, but is not where the real problem is. I failed to notice that they weren't half-word stores anymore. The problem occurs in 170t.slp1 which vectorizes the block of half-word stores. This appears to happen because 168t.fre4 optimized away a phi node. So my bisect appears to be correct. I'm not an expert on any of these passes in question, so we should file this upstream. This looks target independent, but is only a problem on targets that don't support unaligned loads/stores, and RISC-V is unfortunately the only major architecture in that category.

jim-wilson · 2021-08-31T00:17:16Z

I filed a bug upstream
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102139
with a bit more info and verifying that it is still broken on mainline sources.

jim-wilson · 2021-09-01T18:24:50Z

There is a fix on FSF GCC mainline for this now. No fix available for gcc-11 yet.

IdanHo · 2022-03-13T12:56:02Z

Looks like it was cherry picked into gcc-11, can we get it into the riscv-gcc repo?

kito-cheng · 2022-04-15T06:16:39Z

GCC 12 will release soon, I plan bump to GCC 12 once it release, so that problem should gone :)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incorrect store merging causes misaligned stores #289

Incorrect store merging causes misaligned stores #289

Shai-Aviv commented Aug 29, 2021 •

edited

Loading

kito-cheng commented Aug 29, 2021

jim-wilson commented Aug 29, 2021

jim-wilson commented Aug 30, 2021

jim-wilson commented Aug 31, 2021

jim-wilson commented Sep 1, 2021

IdanHo commented Mar 13, 2022

kito-cheng commented Apr 15, 2022

Incorrect store merging causes misaligned stores #289

Incorrect store merging causes misaligned stores #289

Comments

Shai-Aviv commented Aug 29, 2021 • edited Loading

kito-cheng commented Aug 29, 2021

jim-wilson commented Aug 29, 2021

jim-wilson commented Aug 30, 2021

jim-wilson commented Aug 31, 2021

jim-wilson commented Sep 1, 2021

IdanHo commented Mar 13, 2022

kito-cheng commented Apr 15, 2022

Shai-Aviv commented Aug 29, 2021 •

edited

Loading