New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Asserting-Conflicting-Access Model #26

Open
nikomatsakis opened this Issue Sep 12, 2016 · 11 comments

Comments

Projects
None yet
4 participants
@nikomatsakis
Owner

nikomatsakis commented Sep 12, 2016

This is my best effort to describe the model put forward by @arielb1 and @ubsan. Naturally they should correct any misconceptions that I have. We should edit this description to be accurate. Also, we need a catchy name. =) I'm calling it the Asserting-Conflicting-Access (ACA) model based on its core rule.

The Aliasing Rule

The heart of the model is "the aliasing rule":

If a reference r (the head reference) with lifetime 'r is created, then afterwards, before 'r ends but at no particular order

  • Some memory location l is accessed through a borrow-chain from r - this is the asserting access.
  • The same memory location is accessed in any other way - this is the conflicting access.
  • One of these accesses is a write.
    Then an aliasing access has occured.

Aliasing accesses are UB, unless there exists a reference s to l with lifetime 's, such that

  • s was created by a borrow-chain from r
  • s has sufficient permissions for the conflicting access.
  • 's is alive at the point of access.
  • s is potentially alive. This means either:
    • s is an immutable reference
    • s is the reference used for the conflicting access
    • s was converted to a raw pointer before the conflicting access

In that case, s is called the guaranteeing reference.

Note the use of lifetimes in the model. From conversation with @arielb1, I gather that the intution is that, in a safe function, the lifetimes inferred by the compiler are to be used. But in unsafe code, the compiler must assume that the user could have chosen whatever (legal) lifetimes they would like. These lifetimes would still be subject to the basic constraints of the type system.

Facets

The model is access-based and defined in terms of undefined behavior. That is, the mere possession of a reference or overlapping pointer does not create undefined behavior. What creates undefined behavior is the conflicting use thereof. Moreover, undefined behavior does not necessarily define "blame" -- that is, the model can tell you what trace is illegal, but it can't always tell you if a function will lead to undefined behavior in isolation.

Intuition

I think the best intuition is to think of having permission to access memory at a particular point. Permissions flow from safe references. Raw pointers created from a reference get the permission of the reference they came from. When we exit the lifetime of a safe reference s that was borrowed from borrowed from r, the permissions for s return to r.

Example 1

Shows a mutable reference a that is aliased by a raw pointer p. a is then reborrowed to b. Using b is ok and it's ok to use p after we're done using b.

unsafe {
    let mut a: &mut i32 = i32;
    let p: *mut i32 = a;

    *a += 1; // asserting access

    {
      // This is ok, because `p` derives from `a`.
      // Therefore although `*p` is a conflicting access,
      // there exists a reference (`a`) with a lifetime that
      // covers this access, sufficient permissions, and which
      // was converted to a raw pointer.
      *p += 1;
    }

    {
      // `b` is derived from a borrow chain
      let mut b: &mut i32 = &mut *a;

      // Conflicting access relative to `*a += 1` above.
      // OK because of the existence of reference `b`.
      //
      // Also serves as "asserting access" to the next line.
      *b += 1;

      // Conflicting access. Is this UB? 
      // There exists no reference created by a borrow chain from `b`.
      // So the question hinges on whether the lifetime of `b` includes
      // this point. Because unsafe code authors can pick the lifetimes
      // they want, the answer would presumably be no.
      //
      // Intuitively, this makes sense, because you can think of the lifetimes
      // as forming a kind of stack, but the unsafe code author gets to pick where the "pop"
      // occurs. So by defining (and then using) `b`, we pushed `b` onto the
      // stack, but here we can say we popped it.
      *p += 1;
    }

}

Example 2

Like example 1, except that we intermingle uses of b and p. I think this is expected to be illegal, but I can't figure out how it plays out in terms of the formal language (@arielb1 or @ubsan, can you interpret this for me?).

unsafe {
    let mut a: &mut i32 = i32;
    let p: *mut i32 = a;

    *a += 1; // asserting access

    {
      let mut b: &mut i32 = &mut *a;

      *b += 1;

      // Can this serve as an asserting access? I can't
      // tell if accesses through a raw pointer count
      // as being accessed through a "borrow chain" from `a`.
      // I think the answer is no.
      *p += 1;

      *b += 1;
    }

}

The reason I expect this to be illegal is that it could correspond to something like:

fn patsy<F>(b: &mut i32, f: F) where F: FnOnce() {
    *b += 1;
    f(); // should not affect `*b`
    *b += 1;
 }

being invoked like:

 fn villain(a: &mut i32) {
     let p: *mut i32 = a;
     patsy(a, || *p += 1)
 }

@nikomatsakis nikomatsakis changed the title from Asserting-Conflicating-Access Model to Asserting-Conflicting-Access Model Sep 12, 2016

@nikomatsakis

This comment has been minimized.

Owner

nikomatsakis commented Sep 12, 2016

I noted some concerns and a question in my recent blog post. Here is the relevant section, excerpted and lightly edited. The question comes at the end.


Lifetimes are significant to the model, but you can't trust the compiler's inference in unsafe code. There, we have to assume that the unsafe code author is free to pick any valid lifetime, so long as it would still type check (not "borrow check" -- i.e., it only has to ensure that no data outlives its owning scope).

Nonetheless, I have concerns about this formulation because it seems to assume that the logic for unsafe code can be expressed in terms of Rust's lifetimes -- but as I wrote above Rust's lifetimes are really a conservative approximation. As we improve our type system, they can change and become more precise -- and users might have in mind more precise and flow-dependent lifetimes still. In particular, it seems like the "ACA" would disallow my alloc_free2 example:

fn alloc_free2() {
    unsafe {
        let p: *mut i32 = allocate_an_integer();
        let q: &i32 = &*p;
        let r = *q; // (1)
        if condition1() {
            free(p); // (2)
        }
        if condition2() {
            use(r); // (3)
            if condition3() {
                use_again(*q); // (4)
            }
        }
    }
}

Intuitively, the problem is that the lifetime of q must enclose the points (1), (2), (3), and (4) that are commented above. But the user knows that condition1() and condition3() are mutually exclusive, so in their mind, the lifetime ends either when we reach point (2), since they know that this means that point (4) is unreachable.

In terms of their model, the conflicting access would be (2) and the asserting access would be (1). But I might be misunderstanding how this whole thing works.

@Ericson2314

This comment has been minimized.

Ericson2314 commented Sep 12, 2016

If we do "lifetimes as sets of paths", I'd hope users could manually write out in all the endpoints. And if users manually specify a lifetime in unsafe code and mark it safe, then we could use it for optimization!

@arielb1

This comment has been minimized.

Collaborator

arielb1 commented Sep 12, 2016

@ubsan is worried that the model does not interact well with LLVM, as the model permits this kind of code:

fn example(foo_addr: *mut usize) -> usize {
    let mut data = 0;
    if pointers_equal(&mut data as *mut _, foo_addr) {
        unsafe { *foo_addr = 42; }
    }
    data
}

He is convinced we should track derived information through everything and not have broadcasts, including both unsafe pointers and integers (am I right?).

However, unless we have an exotic way of defining derivation, even that is incompatible with LLVM's instcombine in cases like this:

pub fn example(foo_addr: usize) -> usize {
    let mut data = 0;
    if pointers_equal(&mut data as *mut _, foo_addr) {
        data = 1;
        unsafe {
            *((foo_addr + (0*(&mut data as *const _ as usize))) as *mut usize) = 42;
        }
    }
    data
}
@nikomatsakis

This comment has been minimized.

Owner

nikomatsakis commented Sep 12, 2016

@arielb1 this is an interesting example:

fn example(foo_addr: *mut usize) -> usize {
    let mut data = 0;
    if pointers_equal(&mut data as *mut _, foo_addr) {
        unsafe { *foo_addr = 42; }
    }
    data
}

Do we have an issue for it? Probably not. I've often talked informally about how, in the Tootsie Pop Model, you can treat your machine (within unsafe code) as "a simple turing machine". This kind of code is precisely what I was thinking of ...

@arielb1

This comment has been minimized.

Collaborator

arielb1 commented Sep 12, 2016

s is potentially alive.

I think "s potentially caused that access" is the better way to term that requirement. I'm not sure why I added the "s is an immutable reference" case - seems like a mistake.

BTW, I would like to mention that this model is intended for the language spec. It is emphatically not supposed to be closed under any class of endooptimizations - for that we should have a weaker MIR memory model and the even-weaker LLVM memory model.

Also, there is no ordering requirement between the asserting and conflicting access - only that both of them must fulfill the conditions at their respective time of access

Example 1

Intuitively, this makes sense, because you can think of the lifetimes as forming a kind of stack, but the unsafe code author gets to pick where the "pop" occurs.

Sure, and that location might be chosen non-deterministically at runtime, although it must be inside the same function.

Example 2

unsafe {
    let mut a: &mut i32 = i32;
    let p: *mut i32 = a;

    *a += 1; // asserting access

    {
      let mut b: &mut i32 = &mut *a;

      *b += 1;

      // Can this serve as an asserting access? I can't
      // tell if accesses through a raw pointer count
      // as being accessed through a "borrow chain" from `a`.
      // I think the answer is no.
      *p += 1;

      *b += 1;
    }
}

Can this serve as an asserting access?

This does not matter because a has been already asserted for its entire lifetime by the first write. What that access does is conflict with both (asserting) accesses to b - and there is no guarantor derived from b to prevent UB.

@nikomatsakis

This comment has been minimized.

Owner

nikomatsakis commented Sep 18, 2016

So I think that one bit of confusion for me might have been what is meant by "lifetime". My assumption was that lifetime here was referring to the lifetimes that the Rust compiler computes. But I realize that perhaps lifetime was meant to be defined more in terms of the dynamic trace -- in other words, the span between the creation of a reference and one of its uses (if we ignore other threads for a moment and so forth).

This would explain for example why @ubsan mentioned (on IRC) that the alloc_free2 example I cited earlier is legal. Certainly if one defines "lifetime" in more dynamic terms, everything should be ok. But in that case @arielb1 our talk of "trusting lifetimes in safe code" doesn't make sense to me.

He is convinced we should track derived information through everything and not have broadcasts, including both unsafe pointers and integers (am I right?).

Here, can you say more about what you mean when you say "broadcasts"? I think it is something like this:

fn main() {
    let a: &mut i32 = &mut 3;
    let b: *mut i32 = a;
    let c: &mut i32 = a;
    let d: *mut i32 = c;
    // can I use `b` here?
}

I think that in this case, the question is whether having coerced c into the raw pointer d gives us the right to use b, a raw pointer which was coerced earlier from a parent of c (a, specifically). Is this correct?

@arielb1

This comment has been minimized.

Collaborator

arielb1 commented Sep 18, 2016

Here, can you say more about what you mean when you say "broadcasts"? I think it is something like this:

fn main() {
    let a: &mut i32 = &mut 3;
    let b: *mut i32 = a;
    let c: &mut i32 = a;
    let d: *mut i32 = c;
    // can I use `b` here?
}

Broadcasts, as in the intptrcast model, mean that when you create a raw pointer, all accesses can happen "as if through that raw pointer". In your case, d is a raw pointer, so all accesses (including accesses from b) can happen "as if through b".

This model is very conservative, and has the advantage of not including any "counterfactual derived magic", but by itself is insufficient to allow LLVM's standard optimizations on allocas (so we need another derivation-based restriction).

@nikomatsakis

This comment has been minimized.

Owner

nikomatsakis commented Sep 20, 2016

@arielb1

Thanks for the clarification!

the intptrcast model

Are you referring to something specific, or is this a new term that you coined?

insufficient to allow LLVM's standard optimizations on allocas

Can you elaborate here too? Do we have a nice issue summarizing those interactions? I'd love to get all the facts in your head documented. =)

@eternaleye

This comment has been minimized.

Collaborator

eternaleye commented Sep 20, 2016

@nikomatsakis

intptrcast almost certainly refers to http://sf.snu.ac.kr/intptrcast/

@arielb1

This comment has been minimized.

Collaborator

arielb1 commented Sep 20, 2016

intptrcast is "A Formal C Memory Model Supporting Integer-Pointer Casts" available at http://sf.snu.ac.kr/intptrcast/ (whatistherightwaytociteit?). There is probably some other less-formaly place describing that, but I could not be bothered to look.

Can you elaborate here too? Do we have a nice issue summarizing those interactions? I'd love to get all the facts in your head documented. =)

This function is well-typed in the ACA model (in particular when data==extra), but is liable to be misoptimized with data being noalias:

fn example(data: &mut u32, extra: *mut u32) -> u32 {
    unsafe { if (data as *mut u32) == extra { *extra = 0; } }
    *data
}
@nikomatsakis

This comment has been minimized.

Owner

nikomatsakis commented Sep 21, 2016

Cool. I hadn't read that paper yet. I just did so and took the opportunity to write up a summary in #30 (tagged as K-Related-Work).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment