
Unaligned access on aarch64 with "+strict-align" in debug builds using enum #66897

Closed
christophcharles opened this issue Nov 30, 2019 · 11 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. O-AArch64 Armv8-A or later processors in AArch64 mode T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@christophcharles

christophcharles commented Nov 30, 2019

I may have stumbled on a bug regarding aligned access on aarch64: when using an enum as a return value, in some cases the Rust compiler produces code that accesses a 64-bit value with only 32-bit alignment, even when "+strict-align" is selected. The problem is difficult to catch, as it only becomes an issue when the MMU is disabled (i.e. bare-metal code).

This is the minimal code I was able to produce that shows the problem:

enum Result {
    Ok(u32),
    Err(ErrorTest)
}

pub enum ErrorTest {
    InvalidPin,
    Err2
}

fn test(reg: u32, pin: u32) -> Result {
    if pin < 32 {
        return Result::Ok(reg);
    }
    Result::Err(ErrorTest::InvalidPin)
}

fn main() {
    let _ = test(0, 14);
}

It is directly inspired by bare-metal code that I want to run on aarch64 without the MMU enabled. As such, I am compiling in debug mode with "+strict-align". Using the target aarch64-unknown-linux-gnu, this produces the following assembly for the function test:

_ZN12rust_unalign4test17ha1e5255e8972d19dE:
    3bb4:	ff 83 00 d1 	sub	sp, sp, #32
    3bb8:	e0 0f 00 b9 	str	w0, [sp, #12]
    3bbc:	e1 13 00 b9 	str	w1, [sp, #16]
    3bc0:	e8 13 40 b9 	ldr	w8, [sp, #16]
    3bc4:	1f 81 00 71 	cmp	w8, #32
    3bc8:	63 01 00 54 	b.lo	#44 <_ZN12rust_unalign4test17ha1e5255e8972d19dE+0x40>
    3bcc:	08 00 80 52 	mov	w8, #0
    3bd0:	e8 7f 00 39 	strb	w8, [sp, #31]
    3bd4:	e9 53 00 91 	add	x9, sp, #20
    3bd8:	e8 7f 40 39 	ldrb	w8, [sp, #31]
    3bdc:	2a 00 80 52 	mov	w10, #1
    3be0:	08 01 0a 0a 	and	w8, w8, w10
    3be4:	28 05 00 39 	strb	w8, [x9, #1]
    3be8:	28 00 80 52 	mov	w8, #1
    3bec:	e8 53 00 39 	strb	w8, [sp, #20]
    3bf0:	05 00 00 14 	b	#20 <_ZN12rust_unalign4test17ha1e5255e8972d19dE+0x50>
    3bf4:	e8 0f 40 b9 	ldr	w8, [sp, #12]
    3bf8:	e8 1b 00 b9 	str	w8, [sp, #24]
    3bfc:	08 00 80 52 	mov	w8, #0
    3c00:	e8 53 00 39 	strb	w8, [sp, #20]
    3c04:	e0 43 41 f8 	ldur	x0, [sp, #20]
    3c08:	ff 83 00 91 	add	sp, sp, #32
    3c0c:	c0 03 5f d6 	ret

The instruction at 3c04 is the problematic one: it loads a 64-bit value from sp + 20, an address that is only 4-byte aligned (sp itself is 16-byte aligned on AArch64).

The only way I was able to reproduce this (though I am not sure my search was extensive enough) was by using an enum. If the discriminant and the stored value naturally fit in two 32-bit fields, rustc seems to pack them into a single 64-bit register but forgets to align the access correctly. structs do not seem to have this problem.

The problem already appears in the LLVM IR produced by cargo rustc --target aarch64-unknown-linux-gnu -- --emit=llvm-ir:

define internal i64 @_ZN12rust_unalign4test17ha1e5255e8972d19dE(i32, i32) unnamed_addr #0 !dbg !13 {
start:
  %_6 = alloca i8, align 1
  %_0 = alloca %Result, align 4
  %pin = alloca i32, align 4
  %reg = alloca i32, align 4
  store i32 %0, i32* %reg, align 4
  call void @llvm.dbg.declare(metadata i32* %reg, metadata !31, metadata !DIExpression()), !dbg !32
  store i32 %1, i32* %pin, align 4
  call void @llvm.dbg.declare(metadata i32* %pin, metadata !33, metadata !DIExpression()), !dbg !32
  %2 = load i32, i32* %pin, align 4, !dbg !34
  %3 = icmp ult i32 %2, 32, !dbg !34
  br i1 %3, label %bb2, label %bb1, !dbg !35

bb1:                                              ; preds = %start
  store i8 0, i8* %_6, align 1, !dbg !36
  %4 = bitcast %Result* %_0 to %"Result::Err"*, !dbg !37
  %5 = getelementptr inbounds %"Result::Err", %"Result::Err"* %4, i32 0, i32 1, !dbg !37
  %6 = load i8, i8* %_6, align 1, !dbg !37, !range !38
  %7 = trunc i8 %6 to i1, !dbg !37
  %8 = zext i1 %7 to i8, !dbg !37
  store i8 %8, i8* %5, align 1, !dbg !37
  %9 = bitcast %Result* %_0 to i8*, !dbg !37
  store i8 1, i8* %9, align 4, !dbg !37
  br label %bb3, !dbg !39

bb2:                                              ; preds = %start
  %10 = load i32, i32* %reg, align 4, !dbg !40
  %11 = bitcast %Result* %_0 to %"Result::Ok"*, !dbg !41
  %12 = getelementptr inbounds %"Result::Ok", %"Result::Ok"* %11, i32 0, i32 1, !dbg !41
  store i32 %10, i32* %12, align 4, !dbg !41
  %13 = bitcast %Result* %_0 to i8*, !dbg !41
  store i8 0, i8* %13, align 4, !dbg !41
  br label %bb3, !dbg !42

bb3:                                              ; preds = %bb1, %bb2
  %14 = bitcast %Result* %_0 to i64*, !dbg !39
  %15 = load i64, i64* %14, align 4, !dbg !39
  ret i64 %15, !dbg !39
}

Here we can see that the return value is loaded through an i64 with 4-byte alignment, which translates into the wrongly aligned access in the final assembly.

Here is a link to the source to reproduce the bug:
https://github.com/christophcharles/rust_unalign
I have put the assembly outputs of various stages and platforms in the asm folder.

I am not savvy enough to dabble in the Rust source code, but the issue seems to affect both stable and nightly Rust. All the examples here, though, use stable.

Meta

rustc --version --verbose (used for these outputs):

binary: rustc
commit-hash: 4560ea788cb760f0a34127156c78e2552949f734
commit-date: 2019-11-04
host: x86_64-unknown-linux-gnu
release: 1.39.0
LLVM version: 9.0

@hanna-kruppe
Contributor

hanna-kruppe commented Nov 30, 2019

I think this might be an LLVM bug, rustc correctly records in the IR that the access is underaligned. LLVM backends must lower underaligned accesses correctly, even (especially) when compiling for a target where unaligned accesses are not OK. It seems the AArch64 backend is not doing that in this case, assuming that the function @_ZN12rust_unalign4test17ha1e5255e8972d19dE indeed has the strict-align feature enabled. Please verify that, or allow others to verify it easily by including the full LLVM IR output.

However, independently of that, the setup you're using in the reproduction repo (standard target triple and Cargo invocation, only adding -Ctarget-feature=+strict-align in .cargo/config) is not correct, as it will not recompile the standard library with the same target feature, so the precompiled code from there won't be affected and may do unaligned accesses regardless. This is probably not relevant for the LLVM bug I suspect (because it's in code from your crate) but for your actual application you need to compile all crates including standard library with the target feature enabled.
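For reference, on a current nightly toolchain the same effect can be had without xbuild via cargo's unstable build-std feature. A sketch of a .cargo/config.toml (nightly-only flags, names may change):

```toml
# Sketch only: [unstable] build-std requires a nightly toolchain.
[unstable]
build-std = ["core"]

[build]
target = "aarch64-unknown-linux-gnu"
rustflags = ["-C", "target-feature=+strict-align"]
```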

@jonas-schievink jonas-schievink added C-bug Category: This is a bug. O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. labels Nov 30, 2019
@christophcharles
Author

christophcharles commented Nov 30, 2019

Yes, the strict-align feature is correctly enabled for @_ZN12rust_unalign4test17ha1e5255e8972d19dE as can be seen from https://github.com/christophcharles/rust_unalign/blob/master/asm/llvmir.aarch64#L94 .

Yes, you are right: the standard library will not be correctly compiled. I put the minimal code together quite quickly and did not iron it out properly. Still, my own file should be compiled correctly, and my bare-metal code uses a custom JSON target file and xbuild to cross-compile the core library, so I should be covered there.

I am not proficient with LLVM so my first interpretation might really be wrong. I'm happy to report this as a bug on LLVM if someone can confirm that indeed the llvm-ir should not translate to the provided assembly when strict-align is enabled.

@hanna-kruppe
Contributor

Hm, I did a little digging and found tests which test exactly this scenario and indicate it should work. I also observe the same codegen with the target triple and function attributes indicated by the IR you uploaded. So something else is going wrong and someone will have to reduce the Rust code (and the process of compiling it) to figure out where it happens.

@hanna-kruppe
Contributor

Some good first reduction steps to try would probably be to:

  1. Turn this std-using binary into a no_std, staticlib crate (much less IR and asm).
  2. Replace the whole enum dance with a core::ptr::read_unaligned call (that would be an align 1 load, not align 4, but hopefully that doesn't matter).

@christophcharles
Author

My first pass at reducing the code is probably suboptimal: https://github.com/christophcharles/rust_unalign/tree/reduced_code
I went no_std but rlib rather than staticlib to avoid the whole panic_handler shenanigans.

I did not understand how you would reproduce the bug using core::ptr::read_unaligned. I tried several combinations, declaring various u32 variables and reading a u64 (using read_unaligned), but everything was aligned every single time.

Still, I stumbled on something weird. My Debian install only has llvm-8, but I tried feeding the LLVM IR directly into it using llc-8 < asm/llvmir.aarch64 > asm/llc-8-output.aarch64 and got https://github.com/christophcharles/rust_unalign/blob/reduced_code/asm/llc-8-output.aarch64.
Most notably, all accesses are correctly aligned! Using godbolt, accesses are aligned as well.

Is there some attribute I am not sending to llc when using it directly through the command line (or through godbolt) with the ir output from rustc?

@hanna-kruppe
Contributor

I did not understand how you would reproduce the bug using core::ptr::read_unaligned. I tried several combinations, declaring various u32 variables and reading a u64 (using read_unaligned), but everything was aligned every single time.

Are you saying the LLVM IR contained load i64, ..., align 8 instead of load i64, ..., align 1 when you did that? Or that the assembly correctly expanded the unaligned load into a large number of (obviously aligned) byte-sized loads? What I was thinking of, by the way, was basically this:

unsafe fn foo(p: *const i64) -> i64 {
    core::ptr::read_unaligned(p)
}

Still I stumbled on something weird. My debian install only has llvm-8, but I tried feeding the llvm-ir directly into it using llc-8 < asm/llvmir.aarch64 > asm/llc-8-output.aarch64 and got https://github.com/christophcharles/rust_unalign/blob/reduced_code/asm/llc-8-output.aarch64.
Most notably, all accesses are correctly aligned! Using godbolt, accesses are aligned as well.

Is there some attribute I am not sending to llc when using it directly through the command line (or through godbolt) with the ir output from rustc?

That is very weird indeed!

@hanna-kruppe
Contributor

Is there some attribute I am not sending to llc when using it directly through the command line (or through godbolt) with the ir output from rustc?

Ah, this was the key. As far as I can tell (my ARM assembly is not great), llc -O0 consistently gets alignment wrong, while bare llc and any higher -O<n> are correct. This is probably because -O0 runs a very different codegen pipeline than the higher settings. In particular, instruction selection and legalization (which I believe is what should expand unaligned loads) are totally different on some targets, including AArch64.

I let rustc generate an IR file that does nothing more than load i64, i64* %ptr, align 1. With bare llc this expands to an ugly but (presumably) correct sequence of byte loads and bfis (https://godbolt.org/z/gRSFJn), but with llc -O0 you just get a plain ldr. So it appears that AArch64 GlobalISel is buggy.

The above LLVM IR is the output after optimizations when compiling this simple function:

pub unsafe fn load_unaligned(p: *const i64) -> i64 {
    ptr::read_unaligned(p)
}

When building without optimizations, I believe I can see the same bug (an ldr from an unaligned address despite strict-align), but it is a bit hard to tell because it is hidden underneath a lot of clutter generated by debug assertions and overflow checks.

@christophcharles
Author

Ok! Thank you!

So this is definitely an LLVM bug. I will report it to the LLVM project after a bit more testing (with the latest build) and reducing it to a small test case.

For me this is also quite nice, as I can at least work around the problem in the meantime by compiling my bare-metal code with O1.
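In Cargo terms that workaround is just a profile override in Cargo.toml, e.g. (a sketch):

```toml
# Sketch: bump opt-level for debug builds so codegen avoids the -O0
# (GlobalISel / FastISel) path where the bug lives.
[profile.dev]
opt-level = 1
```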

@nikic
Contributor

nikic commented Dec 7, 2019

I went ahead and reported this upstream: https://bugs.llvm.org/show_bug.cgi?id=44246

@workingjubilee workingjubilee added O-AArch64 Armv8-A or later processors in AArch64 mode and removed O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state labels Mar 18, 2022
@RReverser
Contributor

RReverser commented Jan 22, 2023

FWIW, it looks like the upstream issue was fixed, so this can probably be closed.

@nikic
Contributor

nikic commented Jan 22, 2023

I think so. Godbolt for reference: https://rust.godbolt.org/z/8vEPo3Kcj

@nikic nikic closed this as completed Jan 22, 2023