Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Miscompilation with pointer address rountrip through address 0 #107326

Open
lukas-code opened this issue Jan 26, 2023 · 10 comments
Open

Miscompilation with pointer address rountrip through address 0 #107326

lukas-code opened this issue Jan 26, 2023 · 10 comments
Labels
A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-strict-provenance Area: Strict provenance for raw pointers C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@lukas-code
Copy link
Contributor

lukas-code commented Jan 26, 2023

playground

// SAFETY: The pointer must be null.
pub unsafe fn assert_null(ptr: *mut u8) -> *mut u8 {
    if !ptr.is_null() {
        // SAFETY: The caller guarantees that the pointer is null.
        unsafe { core::hint::unreachable_unchecked() }
    }
    ptr
}

pub fn bad(r: &mut u8) -> u8 {
    let ptr: *mut u8 = r;
    let addr = ptr as usize;
    let null = ptr.wrapping_sub(addr);
    
    // SAFETY: `ptr - ptr == null`
    let definitely_null = unsafe { assert_null(null) };
    let ptr2 = definitely_null.wrapping_add(addr);

    // SAFETY: `ptr2` has the same address and provenance as `ptr`
    // and `ptr` was derived from a safe reference
    unsafe { *ptr2 }
}

fn main() {
    println!("{}", bad(&mut 42));
}

The program gets optimized into ud2.

original example:

I tried this code:

#![feature(strict_provenance)]

pub fn bad(r: &mut u8) -> bool {
    let ptr: *mut u8 = r;

    let addr = ptr.expose_addr();
    let null_with_provenance = core::ptr::from_exposed_addr_mut::<u8>(0);
    let ptr2 = null_with_provenance.wrapping_add(addr);

    let val_before = unsafe { *ptr };
    unsafe { *ptr2 = 2; }
    let val_after = unsafe { *ptr };
    return val_before == val_after;
}

fn main() {
    println!("{}", bad(&mut 1));
}

I expected to see this happen: The program always prints false.

Instead, this happened: The program prints true if optimizations are enabled.

The documentation of from_exposed_addr_mut says that the function will "guess" the correct provenance if able, so I expected it to guess ptr.with_addr(0) here.

Miri does not detect undefined behavior in the program.

Meta

playground nightly (2023-01-25 c18a5e8)

@lukas-code lukas-code added the C-bug Category: This is a bug. label Jan 26, 2023
@Nilstrieb
Copy link
Member

Under the proposed provenance model for C, PNVI-ae-udi, this code contains undefined behavior, as from_exposed_addr can only pick up provenance that was expose at the address of the address you're using.
I think LLVM mostly handwaves everything with provenance, not having a provenance model at all, but it's probably closest to the semantics of PNVI-ae-udi here, reasoning that no valid provenance can be exposed at the null pointer, which is quite interesting and justifies this optimization.
This explains why this happens but of course doesn't answer whether we want this to work, which will be a bigger thing for T-opsem :D.
More proof that strict provenance is great and that following it will make your life simpler!

Also for Miri, it doesn't really check int2ptr casts much and has false negatives if you use them, allowing obviously invalid code to pass.

cc @RalfJung

@Nilstrieb Nilstrieb added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools A-strict-provenance Area: Strict provenance for raw pointers labels Jan 26, 2023
@lukas-code
Copy link
Contributor Author

I found another case of this without int-to-ptr casts: (playground link)

// SAFETY: The pointer must be null.
pub unsafe fn assert_null(ptr: *mut u8) -> *mut u8 {
    if !ptr.is_null() {
        // SAFETY: The caller guarantees that the pointer is null.
        unsafe { core::hint::unreachable_unchecked() }
    }
    ptr
}

pub fn bad(r: &mut u8) -> u8 {
    let ptr: *mut u8 = r;
    let addr = ptr as usize;
    let null = ptr.wrapping_sub(addr);
    
    // SAFETY: `ptr - ptr == null`
    let definitely_null = unsafe { assert_null(null) };
    let ptr2 = definitely_null.wrapping_add(addr);

    // SAFETY: `ptr2` has the same address and provenance as `ptr`
    // and `ptr` was derived from a safe reference
    unsafe { *ptr2 }
}

fn main() {
    println!("{}", bad(&mut 42));
}

This should probably work, but LLVM will just "optimize" the entire program to ud2.

@JakobDegen
Copy link
Contributor

We've run into this one before, I think @nikic fixed it in LLVM at the time - this seems like another case of LLVM incorrectly assuming that address == 0 implies pointer == null.

@Nilstrieb Nilstrieb changed the title integer-to-pointer cast picks up wrong provenance Miscompilation with pointer address rountrip through address 0 Jan 26, 2023
@Nilstrieb
Copy link
Member

I edited your example from the comment into the main post and changed the title to reflect that int2ptr casts are not really at the core of this but that the issue is much simpler.

@nikic
Copy link
Contributor

nikic commented Jan 26, 2023

The issue from the original description and the one from the comment are different ones. The original one is that inttoptr 0 is not (necessarily) null, the one from the comment is that it's not (generally) possible to replace pointers based on a dominating equality. Both issues are known.

@RalfJung
Copy link
Member

RalfJung commented Jan 29, 2023

Under the proposed provenance model for C, PNVI-ae-udi, this code contains undefined behavior, as from_exposed_addr can only pick up provenance that was expose at the address of the address you're using.

Are you sure that is the case? My recollection was that any previously exposed provenance can be picked up, independent of its address. I think it is not uncommon to cast ptr2int, do arithmetic, and cast back, and such code is intended to be allowed. The intended semantics for Rust also allow picking up provenance exposed at a different address.

@RalfJung
Copy link
Member

@nikic

Both issues are known.

Are there LLVM issues tracking these?

@nikic
Copy link
Contributor

nikic commented Jan 29, 2023

@RalfJung The equality replacement issue is llvm/llvm-project#34577 (it might be familiar to you!) I don't think the inttoptr 0 vs null issue is tracked anywhere, though it has come up in various discussions in the past.

@lukas-code
Copy link
Contributor Author

PNVI-ae-udi

Quoting from N3005:

Integer-to-pointer conversion An integer-to-pointer conversion (cast) or IO (scanf with "%p") is only defined if the corresponding storage instance had been exposed, and if the result is a pointer to a byte (or one-past) of the storage instance.

@RalfJung
Copy link
Member

@lukas-code that doesn't require the pointer to be at the exact address it was exposed at.

However C generally does not allow any pointers to exist that are not in-bounds of their provenance, so I guess the issue of inttoptr 0 needing to pick up some provenance cannot possibly come up since 0 is never in-bounds of anything. Rust does allow such pointers (they can be easily created in safe code), so it wouldn't make sense to have a similar restriction on Rust's inttoptr.

@Nilstrieb Nilstrieb added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-strict-provenance Area: Strict provenance for raw pointers C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants