Skip to content
This repository has been archived by the owner on Apr 5, 2024. It is now read-only.

assume no "pointer arithmetic" performed on an &T #19

Closed
nikomatsakis opened this issue Sep 2, 2016 · 11 comments
Closed

assume no "pointer arithmetic" performed on an &T #19

nikomatsakis opened this issue Sep 2, 2016 · 11 comments

Comments

@nikomatsakis
Copy link
Contributor

It is a very powerful optimization for us to assume that the function foo here cannot access a or c:

let x = Foo { a: 22, b: 33, c: 44 };
foo(&x.b);

@pcwalton reports that by default in C or C++ -- or at least in LLVM -- it would be legal to have a function like this:

void foo(unsigned *x) {
    x[1] += 1;
}

and then to invoke it on foo(&x.b) in a similar scenario.

Telling LLVM to assume that Rust functions don't play such games appears likely to have a significant impact on our ability to optimize. I'll report some numbers when @pcwalton is more sure of them. :)

@strega-nil
Copy link

That makes me very uncomfortable. You have an object of the correct size, and you're accessing it through a pointer which is safely derived.

@eternaleye
Copy link

eternaleye commented Sep 2, 2016

Hm; that implies either a much stricter handling of pointers in unsafe {} than the C model, or that this assumption must be disabled at function scope for functions containing unsafe {} (perhaps even transitively).

In particular, in C, it's defined so long as the (offset) pointer is still part of the same "object" (read: allocation), while this model basically introduces a subtyping relation on "objects" (in the C sense).

One pattern this would break hard is container_of() - a pattern I suspect is used in implementing intrusive collections in Rust, not just C.

container_of() not only makes use of offset-pointer-inside-same-allocation being valid, it does so wiith a negative offset.

@Amanieu
Copy link
Member

Amanieu commented Sep 3, 2016

I use a variant of container_of in intrusive-collections.

@nikomatsakis
Copy link
Contributor Author

This code which does not cross function boundaries does a very similar thing:

let mut s = Struct { x: 0, y: 1, z: 2 };
let p: *mut i32 = &mut s.x;
*p.offset(1) += 1;

And yet I feel differently at fn boundaries. I think this is because I think about functions in terms of permissions. Having a safe function like this:

pub fn foo(p: &mut i32) {
    let q: *mut i32 = p;
    *q.offset(1) += 1;
}

seems completely and obviously wrong. It exceeds the permissions that was given to it (which applied to *p, not *(p+1), and hence (in my mind) is an illegal function.

I think before we are going to be able to settle any of these kinds of questions, we have to lay out the higher-level views that we are coming from and try to reach some agreement there.

@strega-nil
Copy link

I don't think function boundaries should do anything (beyond create an extra scope), for UB purposes. If people are pulling things out into functions, their code must not become undefined, IMHO.

@burdges
Copy link

burdges commented Sep 3, 2016

I imagine this concerns only repr(Rust) given this repository and that it'd break C stuff.

Could these container_of style applications be dealt with by borrowing &x along with &x.b? I'm thinking along the lines of owning_ref or maybe some phantom &x?

@eternaleye
Copy link

I'll note that I want this to be enabled - I'm just not sure how viable it is given existing code.

@Amanieu: In the specific case of intrusive containers, how viable would it be to, instead of having Adaptor<Link> use an associated type for Container, instead be implemented on Self = Container? In that case, get_container simply becomes self. As far as I can tell, that looks workable (I've only looked at Cursor though).

@Amanieu
Copy link
Member

Amanieu commented Sep 3, 2016

@eternaleye That doesn't work since a Container may contain an arbitrary number of links Links. This is the case when a single object is a member of multiple intrusive collections.

@Amanieu
Copy link
Member

Amanieu commented Sep 3, 2016

After looking through my code a bit, I don't think it will be affected by this since it uses *const Link everywhere instead of &Link. However it is treading a very fine line.

@RalfJung
Copy link
Member

RalfJung commented Sep 13, 2016

I would argue this is yet another instance of how closely we "trust types", just with opposite "polarity". In the blog post, Niko discussed whether inside the body of a function that takes v: &usize as argument, that function can trust the target of v not to be mutated even by unknown code. Here, we discuss whether the caller of such a function that just takes a v: &mut usize as argument can rely on the function not to touch any memory except for globally reachable memory and the memory range indicated by value and type of its argument. In some sense, these are two sides of the same medal. (In fact, if we translate the function type to a separation logic Hoare triple, the answer to both questions would be "Yes, you can trust": A separation logic triple would give the function the permission to access v and rely on no conflicting accesses to happen, while also promising to the context that nothing else will be needed or touched by the function. This is called the "frame rule".)

A function that takes a v: &mut usize and accesses more than just the 4/8 bytes covered by the usize itself, is very much like a function that takes a raw pointer and turns it into a Box<T>: The function actually relies on more than is guaranteed by the type system (i.e., by the general contract that covers all safe interfaces). The function is just inherently unsafe to call, and should be marked as such.

It seems to me that whether the compiler is allowed to rely on this assumption ("functions that need more than is given by their arguments are unsafe") for optimizations again depends on whether this is an "unsafe context" or not -- generally, we want such optimizations as there is some potential for producing faster code, but for people writing unsafe code this may be a very dangerous prospect.

@RalfJung
Copy link
Member

I am pretty sure that this is not allowed in C -- see e.g. this formalization of the C object memory model, which clearly forbids the original code.

In Rust, Stacked Borrows proposes to forbid this. Some related issues:

I will close this issue since the above seem to cover the relevant aspects.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants