Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Are statics confined to the size indicated by their type? #259

Open
RalfJung opened this issue Nov 28, 2020 · 10 comments
Open

Are statics confined to the size indicated by their type? #259

RalfJung opened this issue Nov 28, 2020 · 10 comments
Labels
A-memory Topic: Related to memory accesses C-open-question Category: An open question that we should revisit

Comments

@RalfJung
Copy link
Member

Occasionally, people ask about a pattern where a program has extern statics like

extern "C" { 
  static mut BEGIN: u32;
  static mut END: u32;
}

and the intention is not that these are two integers, but that these indicate an array of integers between the two addresses.

Is that legal, or may the compiler assume that accesses outside of the u32 are out-of-bounds and thus UB?
That boils down to two questions:

  • Does LLVM make any assumption based on the type of a static, that that static is as big as the type says so accesses beyond that size are UB?
  • If no, does Rust introduce such an assumption?

The former is something that hopefully the LLVM docs can answer. If the answer is "yes", Rust would have to follow suit or lobby for changing LLVM. If the answer is "no", we can decide either way.

The pattern is also a bit weird in that it promises that there are at least 4 bytes of memory available at begin and end each. If the actual region of memory is smaller than that, I am fairly sure that at least any use of the static is UB -- BEGIN denotes a place of size 4, and it is (currently) UB to create dangling places. If we wanted to change this we again need to start by figuring out what LLVM's rules are.

@RalfJung RalfJung added A-memory Topic: Related to memory accesses C-open-question Category: An open question that we should revisit labels Nov 28, 2020
@bjorn3
Copy link
Member

bjorn3 commented Nov 28, 2020

Does LLVM make any assumption based on the type of a static, that that static is as big as the type says so accesses beyond that size are UB?

It doesn't seem so for as long as you use one of a certain set of linkage types that you can only achieve using #[linkage].

https://llvm.org/docs/LangRef.html#global-variables

For global variables declarations, as well as definitions that may be replaced at link time (linkonce, weak, extern_weak and common linkage types), LLVM makes no assumptions about the allocation size of the variables, except that they may not overlap.


If no, does Rust introduce such an assumption?

Yes

#![feature(linkage)]

extern "C" {
    #[linkage = "weak"]
    static mut BEGIN: *mut ();
    #[linkage = "weak"]
    static mut END: *mut ();
}

pub fn memset() {
    unsafe {
        std::ptr::write_bytes(BEGIN, 42, (END as *mut _ as usize) - (BEGIN as *mut _ as usize))
    }
}
define void @_ZN7example6memset17hc68914140fa4fa7bE() unnamed_addr #0 !dbg !6 {
  ret void, !dbg !10
}
#![feature(linkage)]

extern "C" {
    #[linkage = "weak"]
    static mut BEGIN: *mut u8;
    #[linkage = "weak"]
    static mut END: *mut u8;
}

pub fn memset() {
    unsafe {
        std::ptr::write_bytes(BEGIN, 42, (END as *mut _ as usize) - (BEGIN as *mut _ as usize))
    }
}
@BEGIN = weak global i8
@END = weak global i8

define void @_ZN7example6memset17hc68914140fa4fa7bE() unnamed_addr #0 !dbg !6 {
start:
  tail call void @llvm.memset.p0i8.i64(i8* nonnull align 1 @BEGIN, i8 42, i64 sub (i64 ptrtoint (i8* @END to i64), i64 ptrtoint (i8* @BEGIN to i64)), i1 false) #2, !dbg !10
  ret void, !dbg !16
}

@RalfJung
Copy link
Member Author

@bjorn3 your first Rust version is wrong, you need to use BEGIN as *mut u8 or else it'll write N * 0 bytes so of course nothing happens.

The 2nd version seems fine, right? When you just paste some code without comments I have no clue what it is you want to demonstrate with that code. :)

@bjorn3
Copy link
Member

bjorn3 commented Nov 28, 2020

@bjorn3 your first Rust version is wrong, you need to use BEGIN as *mut u8 or else it'll write N * 0 bytes so of course nothing happens.

oops

The 2nd version seems fine, right? When you just paste some code without comments I have no clue what it is you want to demonstrate with that code. :)

I wanted to demonstrate the difference, but you already found why it behaves different.

@comex
Copy link

comex commented Nov 28, 2020

Note that in this pattern, the address of END is generally a “one past the end” pointer - i.e. there are not expected to be any bytes available there, except by chance.

But it seems like any issues with that could be worked around by declaring END as a ZST.

@RalfJung
Copy link
Member Author

But it seems like any issues with that could be worked around by declaring END as a ZST.

Further, if the range is empty, BEGIN would also be "one past the end". That's why I think both should be a ZST.

@cbiffle
Copy link

cbiffle commented Dec 10, 2020

Edit: Bah, after writing this I have noticed your bug report over in cortex-m-rt where you pointed this out already.

Original:
This is quite relevant to the embedded end of things.

Your discussion suggests to me that the code found in, e.g., cortex-m-rt's reset routine may be unsound. All the s objects are lying about their size (which can be zero!) and all of the e objects don't really exist (they are one-past-the-end symbols generated by the linker script).

Separately, the way they are made to not exist in their current linker script ensures that they will alias whatever static happens to appear next in link order. (e.g. _edata aliases _sbss.) Which seems pretty bad independent of the decision y'all make here.

Making the symbols types ZSTs and eliminating the aliasing seems like a good idea. That would wind up with code like this (slightly simplified from the upstream impl):

    extern "C" {

        // These symbols come from `link.x` and assume that
        // data, bss, and uninit appear in exactly that order. Each
        // section must begin and end at a 4-byte boundary.
        static mut __sdata: ();
        static mut __sbss: ();
        static mut __suninit: ();
    }

    // Initialize RAM
    r0::zero_bss(&mut __sbss as *mut u32, &mut __suninit as *mut u32);
    r0::init_data(&mut __sdata as *mut u32, &mut __sbss as *mut u32, &__sidata as *const u32);

Now, a thing to note there: it is entirely possible for __sdata, __sbss, and __suninit to alias, if neither the data nor BSS section contain any data. Because we have to &mut them to get their addresses, that means the code at the end there is producing &muts that may alias each other. That's generally a bad thing -- does that change for ZSTs?

Assuming it does not, this routine seems to either need to require that the linker script insert some minimal dummy data in each section to stop aliasing, or export pointer-valued statics it can load from instead of ZST symbols at boundaries....

cbiffle added a commit to oxidecomputer/hubris that referenced this issue Dec 11, 2020
r0 is likely unsound. This replaces the data/bss init with an assembly
routine that is guaranteed not to get mangled by the compiler.

Context:
rust-lang/unsafe-code-guidelines#259
cbiffle added a commit to oxidecomputer/hubris that referenced this issue Dec 11, 2020
r0 is likely unsound. This replaces the data/bss init with an assembly
routine that is guaranteed not to get mangled by the compiler.

Context:
rust-lang/unsafe-code-guidelines#259
@RalfJung
Copy link
Member Author

RalfJung commented Dec 12, 2020

Because we have to &mut them to get their addresses, that means the code at the end there is producing &muts that may alias each other. That's generally a bad thing -- does that change for ZSTs?

Aliasing is about references pointing to overlapping memory regions. ZST references cannot overlap with anything as they point to 0 bytes, i.e., to no memory at all.
It is even possible in safe code to obtain multiple &mut () that point to the same address, such as

let mut x = ((), ());
let ref1 = &mut x.1;
let ref1 = &mut x.2;

IOW, aliasing is not a concern with ZST pointers.

@ds84182
Copy link

ds84182 commented Feb 23, 2023

Moving from rust-lang/rust#107975, can the reference be updated regarding the behavior of statics around aliasing and linker scripts? From the above discussion it's clear that extern statics should be declared as ZSTs when used to demarcate sections of memory.

@nikic
Copy link

nikic commented Feb 27, 2023

Just wanted to note that the LangRef wording has since been updated to reflect reality:

For global variable declarations, as well as definitions that may be replaced at link time (linkonce, weak, extern_weak and common linkage types), the allocation size and alignment of the definition it resolves to must be greater than or equal to that of the declaration or replaceable definition, otherwise the behavior is undefined.

That is, accesses past the declared size are not generally UB, but the global must be dereferenceable up to the declared size at least.

@workingjubilee
Copy link

I assume that our semantics follow suit, then. That is, we may, for now, decline to answer the question of "What if you relocate-in a validly initialized instance of something bigger than the static? Can you read past the end of the static then?" but we definitely assert that the originally defined size is dereferenceable (for an immutable static). This much is already implied by our rules regarding extern statics, but we could spell it out with a clue-by-four, for the benefit of the programmers who would prefer to skim rather than closely read the rules regarding sound and unsound behavior.

...wait, does that imply that this also holds for static mut?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-memory Topic: Related to memory accesses C-open-question Category: An open question that we should revisit
Projects
None yet
Development

No branches or pull requests

7 participants