Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault on no-std binary on 1.72.0, since nightly 2023-06-17 #115225

Closed
MarcusGrass opened this issue Aug 25, 2023 · 8 comments
Closed

Segfault on no-std binary on 1.72.0, since nightly 2023-06-17 #115225

MarcusGrass opened this issue Aug 25, 2023 · 8 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@MarcusGrass
Copy link
Contributor

Summary

I've written a tiny-stdlib that works without libc, binaries produced by it seems to be broken per nightly-2023-06-17 and forward.

A small repro can be found here.

Running gdb on the binary shows:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7fed791 in tiny_std::unix::symbols::memset (s=0x7fffffffc2cc, c=0, n=4) at src/unix/symbols.rs:154
154     pub unsafe extern "C" fn memset(s: *mut u8, c: core::ffi::c_int, n: usize) -> *mut u8 {

It's actually in this line that it breaks, but in the copied version here which hasn't changed in compiler builtins since.

It could be that I have UB somewhere, or something about the emission of that memset changed between those times.
I'll try to search further but maybe this rings a bell somewhere.

Code

I tried this code:

#![no_std]
#![no_main]
use argon2::{Algorithm, Params, Version};

#[no_mangle]
extern "C" fn main() -> i32 {
    tiny_std::unix::host_name::host_name().unwrap();
    unix_print::unix_println!("Start");
    let mut key = [0u8; 32];
    let pass_bytes = [2u8; 128];
    let salt = [1u8; 32];
    argon2::Argon2::new(
        Algorithm::Argon2i,
        Version::V0x13,
        Params::new(
            65536,
            10,
            4,
            Some(32),
        )
            .unwrap()
    )
        .hash_password_into(&pass_bytes, &salt, &mut key).unwrap();
    unix_print::unix_println!("Done");
    0
}

I expected to see this happen: No segfault

Instead, this happened: Sefault

Version it worked on

It most recently worked on: 1.71.0 and nightly-2023-06-16

Version with regression

rustc --version --verbose:

rustc 1.72.0 (5680fa18f 2023-08-23)
binary: rustc
commit-hash: 5680fa18feaa87f3ff04063800aec256c3d4b4be
commit-date: 2023-08-23
host: x86_64-unknown-linux-gnu
release: 1.72.0
LLVM version: 16.0.5
@MarcusGrass MarcusGrass added C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. labels Aug 25, 2023
@rustbot rustbot added I-prioritize Issue: Indicates that prioritization has been requested for this issue. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Aug 25, 2023
@asquared31415
Copy link
Contributor

It looks like tiny_std::unix::symbols::memset is itself calling tiny_std::unix::symbols::memset. I saw over 14_000 consecutive frames of that in the call stack at the segfault. I think you've fallen victim to the fact that compilers see code that looks like memset and optimize it into a call memset, since it thinks it knows that function. Unfortunately this can cause problems when you're trying to implement well known symbols like memset, memcpy or similar. Unfortunately I do not know what the workarounds are, hopefully someone will be able to help you with that.

@MarcusGrass
Copy link
Contributor Author

It looks like tiny_std::unix::symbols::memset is itself calling tiny_std::unix::symbols::memset. I saw over 14_000 consecutive frames of that in the call stack at the segfault. I think you've fallen victim to the fact that compilers see code that looks like memset and optimize it into a call memset, since it thinks it knows that function. Unfortunately this can cause problems when you're trying to implement well known symbols like memset, memcpy or similar. Unfortunately I do not know what the workarounds are, hopefully someone will be able to help you with that.

Oh, Interesting, I'll look into that, thanks for the lead!

@MarcusGrass
Copy link
Contributor Author

MarcusGrass commented Aug 28, 2023

I had a few lazy attempts at trying to inline, uninline, asm, global_asm, but haven't been able to work around this, as you said 1.72.0 (and nightly-2023-06-16+) produces infinite recursion on memset here.

@Nilstrieb Nilstrieb added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Sep 4, 2023
@Nilstrieb
Copy link
Member

There should be flags passed to C compilers to disable this. You can maybe pass them as -Cllvm-args? Not sure about the specifics though.

@MarcusGrass
Copy link
Contributor Author

I am a bit out of my depth here, but I previously encountered something similar https://users.rust-lang.org/t/reliably-working-around-rust-emitting-memset-when-putting-a-slice-on-the-stack/97080/6.
Running with -C link-arg=-fno-builtin-memset doesn't do anything though, my half-baked theory is that there's something changed in the rustc codegen rather than LLVM, since LLVM-versions haven't changed at all between working and not working. But I have no idea what I'm talking about so that could be nonsensical.

@bjorn3
Copy link
Member

bjorn3 commented Sep 5, 2023

Compiler-builtins uses #![no_builtins] to prevent LLVM from inserting calls to compiler intrinsics. By the way is there a reason you are not using compiler-builtins itself with the mem feature enabled? That provides all these methods for you.

@MarcusGrass
Copy link
Contributor Author

Compiler-builtins uses #![no_builtins] to prevent LLVM from inserting calls to compiler intrinsics. By the way is there a reason you are not using compiler-builtins itself with the mem feature enabled? That provides all these methods for you.

Ah I see, yeah the reason for not using it is because I won't be able to compile on stable if I do, compiler builtins uses a bunch of features that are never planned for stabilization so then I'd need to be on nightly forever. Previously this was fine on stable, I just needed to provide the symbols

@MarcusGrass
Copy link
Contributor Author

Compiler-builtins uses #![no_builtins] to prevent LLVM from inserting calls to compiler intrinsics. By the way is there a reason you are not using compiler-builtins itself with the mem feature enabled? That provides all these methods for you.

#![no_bulitins]
On the crate-root fixes the issue and it still runs on stable, thanks a lot! Might need to sub-crate the symbols to not spread that too far though.

@apiraino apiraino removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Nov 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

6 participants