
Unaligned access on aarch64 with "+strict-align" in debug builds using enum #66897

Closed
christophcharles opened this issue Nov 30, 2019 · 11 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. O-AArch64 Armv8-A or later processors in AArch64 mode T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@christophcharles

christophcharles commented Nov 30, 2019

I may have stumbled on a bug regarding aligned access on aarch64: when using an enum as a return value, in some cases the Rust compiler produces code that accesses a 64-bit value with only 32-bit alignment, even when "+strict-align" is selected. The problem is difficult to catch, as it only becomes an issue when the MMU is disabled (i.e. bare-metal code).

This is the minimal code I was able to produce that shows the problem:

enum Result {
    Ok(u32),
    Err(ErrorTest)
}

pub enum ErrorTest {
    InvalidPin,
    Err2
}

fn test(reg: u32, pin: u32) -> Result {
    if pin < 32 {
        return Result::Ok(reg);
    }
    Result::Err(ErrorTest::InvalidPin)
}

fn main() {
    let _ = test(0, 14);
}

It is directly inspired by bare-metal code that I want to run on aarch64 without the MMU enabled. As such, I am compiling in debug mode with "+strict-align". Using the target aarch64-unknown-linux-gnu, this produces the following assembly for the function test:

_ZN12rust_unalign4test17ha1e5255e8972d19dE:
    3bb4:	ff 83 00 d1 	sub	sp, sp, #32
    3bb8:	e0 0f 00 b9 	str	w0, [sp, #12]
    3bbc:	e1 13 00 b9 	str	w1, [sp, #16]
    3bc0:	e8 13 40 b9 	ldr	w8, [sp, #16]
    3bc4:	1f 81 00 71 	cmp	w8, #32
    3bc8:	63 01 00 54 	b.lo	#44 <_ZN12rust_unalign4test17ha1e5255e8972d19dE+0x40>
    3bcc:	08 00 80 52 	mov	w8, #0
    3bd0:	e8 7f 00 39 	strb	w8, [sp, #31]
    3bd4:	e9 53 00 91 	add	x9, sp, #20
    3bd8:	e8 7f 40 39 	ldrb	w8, [sp, #31]
    3bdc:	2a 00 80 52 	mov	w10, #1
    3be0:	08 01 0a 0a 	and	w8, w8, w10
    3be4:	28 05 00 39 	strb	w8, [x9, #1]
    3be8:	28 00 80 52 	mov	w8, #1
    3bec:	e8 53 00 39 	strb	w8, [sp, #20]
    3bf0:	05 00 00 14 	b	#20 <_ZN12rust_unalign4test17ha1e5255e8972d19dE+0x50>
    3bf4:	e8 0f 40 b9 	ldr	w8, [sp, #12]
    3bf8:	e8 1b 00 b9 	str	w8, [sp, #24]
    3bfc:	08 00 80 52 	mov	w8, #0
    3c00:	e8 53 00 39 	strb	w8, [sp, #20]
    3c04:	e0 43 41 f8 	ldur	x0, [sp, #20]
    3c08:	ff 83 00 91 	add	sp, sp, #32
    3c0c:	c0 03 5f d6 	ret

The instruction at 3c04 is the problematic one: it loads a 64-bit value from sp + 20, an address that is only 4-byte aligned (sp itself is 16-byte aligned on AArch64).

The only way I was able to reproduce this (though I am not sure my search was extensive enough) was by using an enum. If the discriminant and the stored value naturally fit in two 32-bit fields, rustc seems to pack them into a single 64-bit register but forgets to align the access correctly. structs do not seem to have this problem.

The problem already appears in the LLVM IR produced by cargo rustc --target aarch64-unknown-linux-gnu -- --emit=llvm-ir:

define internal i64 @_ZN12rust_unalign4test17ha1e5255e8972d19dE(i32, i32) unnamed_addr #0 !dbg !13 {
start:
  %_6 = alloca i8, align 1
  %_0 = alloca %Result, align 4
  %pin = alloca i32, align 4
  %reg = alloca i32, align 4
  store i32 %0, i32* %reg, align 4
  call void @llvm.dbg.declare(metadata i32* %reg, metadata !31, metadata !DIExpression()), !dbg !32
  store i32 %1, i32* %pin, align 4
  call void @llvm.dbg.declare(metadata i32* %pin, metadata !33, metadata !DIExpression()), !dbg !32
  %2 = load i32, i32* %pin, align 4, !dbg !34
  %3 = icmp ult i32 %2, 32, !dbg !34
  br i1 %3, label %bb2, label %bb1, !dbg !35

bb1:                                              ; preds = %start
  store i8 0, i8* %_6, align 1, !dbg !36
  %4 = bitcast %Result* %_0 to %"Result::Err"*, !dbg !37
  %5 = getelementptr inbounds %"Result::Err", %"Result::Err"* %4, i32 0, i32 1, !dbg !37
  %6 = load i8, i8* %_6, align 1, !dbg !37, !range !38
  %7 = trunc i8 %6 to i1, !dbg !37
  %8 = zext i1 %7 to i8, !dbg !37
  store i8 %8, i8* %5, align 1, !dbg !37
  %9 = bitcast %Result* %_0 to i8*, !dbg !37
  store i8 1, i8* %9, align 4, !dbg !37
  br label %bb3, !dbg !39

bb2:                                              ; preds = %start
  %10 = load i32, i32* %reg, align 4, !dbg !40
  %11 = bitcast %Result* %_0 to %"Result::Ok"*, !dbg !41
  %12 = getelementptr inbounds %"Result::Ok", %"Result::Ok"* %11, i32 0, i32 1, !dbg !41
  store i32 %10, i32* %12, align 4, !dbg !41
  %13 = bitcast %Result* %_0 to i8*, !dbg !41
  store i8 0, i8* %13, align 4, !dbg !41
  br label %bb3, !dbg !42

bb3:                                              ; preds = %bb1, %bb2
  %14 = bitcast %Result* %_0 to i64*, !dbg !39
  %15 = load i64, i64* %14, align 4, !dbg !39
  ret i64 %15, !dbg !39
}

Here we can see that the return value is loaded through an i64 with 4-byte alignment, which translates into the wrongly aligned access in the final assembly.

Here is a link to the source to reproduce the bug:
https://github.com/christophcharles/rust_unalign
I have put the assembly outputs of various stages and platforms in the asm folder.

I am not savvy enough to dabble in the Rust source code, but the issue seems to affect both stable and nightly Rust. All the examples here, though, use stable.

Meta

rustc --version --verbose (used for these outputs):

binary: rustc
commit-hash: 4560ea788cb760f0a34127156c78e2552949f734
commit-date: 2019-11-04
host: x86_64-unknown-linux-gnu
release: 1.39.0
LLVM version: 9.0

@hanna-kruppe
Contributor

hanna-kruppe commented Nov 30, 2019

I think this might be an LLVM bug, rustc correctly records in the IR that the access is underaligned. LLVM backends must lower underaligned accesses correctly, even (especially) when compiling for a target where unaligned accesses are not OK. It seems the AArch64 backend is not doing that in this case, assuming that the function @_ZN12rust_unalign4test17ha1e5255e8972d19dE indeed has the strict-align feature enabled. Please verify that, or allow others to verify it easily by including the full LLVM IR output.

However, independently of that, the setup you're using in the reproduction repo (standard target triple and Cargo invocation, only adding -Ctarget-feature=+strict-align in .cargo/config) is not correct, as it will not recompile the standard library with the same target feature, so the precompiled code from there won't be affected and may do unaligned accesses regardless. This is probably not relevant for the LLVM bug I suspect (because it's in code from your crate) but for your actual application you need to compile all crates including standard library with the target feature enabled.
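For reference, on a current nightly toolchain the same effect can be had without xbuild via cargo's unstable build-std feature. A sketch of a .cargo/config.toml (nightly-only flags, names may change):

```toml
# Sketch only: [unstable] build-std requires a nightly toolchain.
[unstable]
build-std = ["core"]

[build]
target = "aarch64-unknown-linux-gnu"
rustflags = ["-C", "target-feature=+strict-align"]
```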

@jonas-schievink jonas-schievink added C-bug Category: This is a bug. O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. labels Nov 30, 2019
@christophcharles
Author

christophcharles commented Nov 30, 2019

Yes, the strict-align feature is correctly enabled for @_ZN12rust_unalign4test17ha1e5255e8972d19dE as can be seen from https://github.com/christophcharles/rust_unalign/blob/master/asm/llvmir.aarch64#L94 .

Yes, you are right: the standard library will not be correctly compiled. I put the minimal code together quite quickly and did not iron it out properly. Still, my own file should be compiled correctly, and my bare-metal code uses a custom JSON target file and xbuild to cross-compile the core library, so I should be covered there.

I am not proficient with LLVM so my first interpretation might really be wrong. I'm happy to report this as a bug on LLVM if someone can confirm that indeed the llvm-ir should not translate to the provided assembly when strict-align is enabled.

@hanna-kruppe
Contributor

Hm, I did a little digging and found tests which test exactly this scenario and indicate it should work. I also observe the same codegen with the target triple and function attributes indicated by the IR you uploaded. So something else is going wrong and someone will have to reduce the Rust code (and the process of compiling it) to figure out where it happens.

@hanna-kruppe
Contributor

Some good first reduction steps to try would probably be to:

  1. Turn this std-using binary into a no_std, staticlib crate (much less IR and asm).
  2. Replace the whole enum dance with a core::ptr::read_unaligned call (that would be an align 1 load, not align 4, but hopefully that doesn't matter).

@christophcharles
Author

My first pass at reducing the code is probably suboptimal: https://github.com/christophcharles/rust_unalign/tree/reduced_code
I went no_std but rlib rather than staticlib to avoid the whole panic_handler shenanigans.

I did not understand how you would reproduce the bug using core::ptr::read_unaligned. I tried several combinations, declaring various u32 variables and reading a u64 (using read_unaligned), but everything was aligned every single time.

Still, I stumbled on something weird. My Debian install only has llvm-8, but I tried feeding the LLVM IR directly into it using llc-8 < asm/llvmir.aarch64 > asm/llc-8-output.aarch64 and got https://github.com/christophcharles/rust_unalign/blob/reduced_code/asm/llc-8-output.aarch64.
Most notably, all accesses are correctly aligned! Using godbolt, accesses are aligned as well.

Is there some attribute I am not sending to llc when using it directly through the command line (or through godbolt) with the ir output from rustc?

@hanna-kruppe
Contributor

I did not understand how you would reproduce the bug using core::ptr::read_unaligned. I tried several combinations, declaring various u32 variables and reading a u64 (using read_unaligned), but everything was aligned every single time.

Are you saying the LLVM IR contained load i64, ..., align 8 instead of load i64, ..., align 1 when you did that? Or that the assembly correctly expanded the unaligned load into a large number of (obviously aligned) byte-sized loads? What I was thinking of, by the way, was basically this:

unsafe fn foo(p: *const i64) -> i64 {
    core::ptr::read_unaligned(p)
}

Still I stumbled on something weird. My debian install only has llvm-8, but I tried feeding the llvm-ir directly into it using llc-8 < asm/llvmir.aarch64 > asm/llc-8-output.aarch64 and got https://github.com/christophcharles/rust_unalign/blob/reduced_code/asm/llc-8-output.aarch64.
Most notably, all accesses are correctly aligned! Using godbolt, accesses are aligned as well.

Is there some attribute I am not sending to llc when using it directly through the command line (or through godbolt) with the ir output from rustc?

That is very weird indeed!

@hanna-kruppe
Contributor

Is there some attribute I am not sending to llc when using it directly through the command line (or through godbolt) with the ir output from rustc?

Ah, this was the key. As far as I can tell (my ARM assembly is not great), llc -O0 consistently gets alignment wrong, while bare llc and any higher -O<n> are correct. This is probably because -O0 runs a very different codegen pipeline than the higher settings. In particular, instruction selection and legalization (which I believe is what should expand unaligned loads) are totally different on some targets, including AArch64.

I let rustc generate an IR file that does nothing more than load i64, i64* %ptr, align 1. With bare llc this expands to an ugly but (presumably) correct sequence of byte loads and bfis (https://godbolt.org/z/gRSFJn), but with llc -O0 you just get a plain ldr. So it appears that AArch64 GlobalISel is buggy.

The above LLVM IR is the output after optimizations when compiling this simple function:

pub unsafe fn load_unaligned(p: *const i64) -> i64 {
    ptr::read_unaligned(p)
}

When building without optimizations, I believe I can see the same bug (an ldr from an unaligned address despite strict-align), but it is a bit hard to tell because it is hidden underneath a lot of clutter generated by debug assertions and overflow checks.

@christophcharles
Author

Ok! Thank you!

So this is definitely an LLVM bug. I will report it to the LLVM project after a bit more testing (with the latest build) and reducing it to a small test case.

For me this is also quite nice, as I can at least work around the problem in the meantime by compiling my bare-metal code with O1.
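In Cargo terms that workaround is just a profile override in Cargo.toml, e.g. (a sketch):

```toml
# Sketch: bump opt-level for debug builds so codegen avoids the -O0
# (GlobalISel / FastISel) path where the bug lives.
[profile.dev]
opt-level = 1
```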

@nikic
Contributor

nikic commented Dec 7, 2019

I went ahead and reported this upstream: https://bugs.llvm.org/show_bug.cgi?id=44246

@workingjubilee workingjubilee added O-AArch64 Armv8-A or later processors in AArch64 mode and removed O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state labels Mar 18, 2022
@RReverser
Contributor

RReverser commented Jan 22, 2023

FWIW, it looks like the upstream issue was fixed, so this can probably be closed.

@nikic
Contributor

nikic commented Jan 22, 2023

I think so. Godbolt for reference: https://rust.godbolt.org/z/8vEPo3Kcj

@nikic nikic closed this as completed Jan 22, 2023