
assume no "pointer arithmetic" performed on an `&T` #19

Open
nikomatsakis opened this Issue Sep 2, 2016 · 10 comments


nikomatsakis commented Sep 2, 2016

It is a very powerful optimization for us to assume that the function foo here cannot access a or c:

```rust
let x = Foo { a: 22, b: 33, c: 44 };
foo(&x.b);
```
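Here is a minimal Rust sketch of what the caller gains if the rule holds. It uses `&mut x.b` rather than `&x.b` so that the callee can actually write. The `u32` field types and the bodies of `foo` and `caller` are illustrative assumptions. Because `foo` receives a borrow of one field only, the caller may treat `x.a` and `x.c` as unchanged across the call, for example by keeping them in registers instead of reloading them.

```rust
struct Foo {
    a: u32,
    b: u32,
    c: u32,
}

// Hypothetical callee: its signature grants access to a single `u32`.
#[inline(never)]
fn foo(b: &mut u32) {
    *b += 1;
}

fn caller() -> (u32, u32) {
    let mut x = Foo { a: 22, b: 33, c: 44 };
    foo(&mut x.b);
    // If `foo` may not reach `x.a` or `x.c` through `b`, the compiler can
    // fold `x.a + x.c` to 66 without reloading either field after the call.
    (x.a + x.c, x.b)
}
```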

@pcwalton reports that by default in C or C++ -- or at least in LLVM -- it would be legal to have a function like this:

```c
void foo(unsigned *x) {
    x[1] += 1;
}
```

and then to call it as foo(&x.b) in a similar scenario.

Telling LLVM to assume that Rust functions don't play such games appears likely to have a significant impact on our ability to optimize. I'll report some numbers when @pcwalton is more sure of them. :)

ubsan commented Sep 2, 2016

That makes me very uncomfortable. You have an object of the correct size, and you're accessing it through a pointer which is safely derived.

eternaleye commented Sep 2, 2016

Hm; that implies either a much stricter handling of pointers in unsafe {} than the C model, or that this assumption must be disabled at function scope for functions containing unsafe {} (perhaps even transitively).

In particular, in C, pointer arithmetic is defined so long as the (offset) pointer is still part of the same "object" (read: allocation), while this model basically introduces a subtyping relation on "objects" (in the C sense).

One pattern this would break hard is container_of() - a pattern I suspect is used in implementing intrusive collections in Rust, not just C.

container_of() not only relies on an offset pointer within the same allocation being valid; it does so with a negative offset.
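A minimal Rust sketch of the container_of pattern, assuming Rust 1.77+ for `std::mem::offset_of!`. The `Link`/`Node` types and the `container_of` helper are hypothetical, not taken from any crate. The key detail is that the field pointer is projected through a raw pointer to the whole `Node`, so its provenance covers the parent before the negative offset is applied.

```rust
use std::mem::offset_of;
use std::ptr;

#[allow(dead_code)] // `next` is unused in this sketch
#[repr(C)]
struct Link {
    next: *const Link,
}

#[repr(C)]
struct Node {
    value: u32,
    link: Link,
}

/// Hypothetical `container_of`: steps *backwards* from a field to its parent.
///
/// # Safety
/// `link` must point to the `link` field of a live `Node`, and its provenance
/// must cover the whole `Node` (derived from a `*const Node`, not from a
/// `&Link` such as `&node.link`, which is the case under discussion).
unsafe fn container_of(link: *const Link) -> *const Node {
    // Negative offset: subtract the byte offset of `link` within `Node`.
    unsafe { link.cast::<u8>().sub(offset_of!(Node, link)).cast::<Node>() }
}

fn container_of_demo() -> (bool, u32) {
    let node = Node { value: 7, link: Link { next: ptr::null() } };
    let node_ptr: *const Node = &node;
    // Project to the field through the raw pointer, keeping whole-node provenance.
    let link_ptr: *const Link = unsafe { ptr::addr_of!((*node_ptr).link) };
    let back = unsafe { container_of(link_ptr) };
    (ptr::eq(back, node_ptr), unsafe { (*back).value })
}
```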

Amanieu commented Sep 3, 2016

I use a variant of container_of in intrusive-collections.

nikomatsakis commented Sep 3, 2016

This code, which does not cross a function boundary, does a very similar thing:

```rust
let mut s = Struct { x: 0, y: 1, z: 2 };
let p: *mut i32 = &mut s.x;
unsafe { *p.offset(1) += 1; }
```

And yet I feel differently at fn boundaries. I believe this is because I think about functions in terms of permissions. Having a safe function like this:

```rust
pub fn foo(p: &mut i32) {
    let q: *mut i32 = p;
    unsafe { *q.offset(1) += 1; }
}
```

seems completely and obviously wrong. It exceeds the permissions that were given to it (which applied to *p, not *(p+1)), and hence (in my mind) it is an illegal function.
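For contrast, here is a sketch of the single-function example that avoids the disputed case. The raw pointer is derived from the whole struct rather than from `&mut s.x`, so its provenance covers all three fields. `#[repr(C)]` is an added assumption: under `repr(Rust)` nothing guarantees that `y` sits directly after `x`. `add(1)` is the unsigned counterpart of `offset(1)`.

```rust
use std::ptr;

#[repr(C)] // assumption: fixes field order x, y, z with no padding between the i32s
struct Struct {
    x: i32,
    y: i32,
    z: i32,
}

fn bump_y_via_x() -> Struct {
    let mut s = Struct { x: 0, y: 1, z: 2 };
    // Derived from the whole struct, so it may range over every field.
    let p: *mut i32 = ptr::addr_of_mut!(s).cast::<i32>(); // points at `s.x`
    unsafe {
        *p.add(1) += 1; // writes `s.y`
    }
    s
}
```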

I think before we are going to be able to settle any of these kinds of questions, we have to lay out the higher-level views that we are coming from and try to reach some agreement there.

nikomatsakis referenced this issue Sep 3, 2016 in "Canvas unsafe code in the wild" #18

ubsan commented Sep 3, 2016

I don't think function boundaries should do anything (beyond creating an extra scope) for UB purposes. If people are pulling things out into functions, their code must not become undefined, IMHO.

burdges commented Sep 3, 2016

I imagine this concerns only repr(Rust), given the scope of this repository and given that applying it more broadly would break C code.

Could these container_of style applications be dealt with by borrowing &x along with &x.b? I'm thinking along the lines of owning_ref or maybe some phantom &x?
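A minimal sketch of that idea, assuming shared borrows only. `FieldOf` is a hypothetical type, written in the spirit of owning_ref but not using its API. It carries the parent borrow alongside the field borrow, so a holder of the field can reach the container without any pointer arithmetic.

```rust
/// Hypothetical: a borrow of one field that also carries the borrow of its
/// parent, so the holder has permission to the whole object.
struct FieldOf<'a, P, F> {
    parent: &'a P,
    field: &'a F,
}

impl<'a, P, F> FieldOf<'a, P, F> {
    fn new(parent: &'a P, project: impl FnOnce(&'a P) -> &'a F) -> Self {
        FieldOf { parent, field: project(parent) }
    }

    fn field(&self) -> &'a F {
        self.field
    }

    // A safe stand-in for container_of: just hand back the parent borrow.
    fn container(&self) -> &'a P {
        self.parent
    }
}

struct Entry {
    value: u32,
    tag: u8,
}

fn field_of_demo() -> (u8, u32) {
    let entry = Entry { value: 7, tag: 3 };
    let f = FieldOf::new(&entry, |e| &e.tag);
    (*f.field(), f.container().value)
}
```

This gives up the single-pointer representation that makes container_of attractive for intrusive structures, so it is a trade-off rather than a drop-in replacement.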

eternaleye commented Sep 3, 2016

I'll note that I want this to be enabled - I'm just not sure how viable it is given existing code.

@Amanieu: In the specific case of intrusive containers, how viable would it be to implement Adaptor<Link> on Self = Container, instead of having it use an associated type for Container? In that case, get_container simply becomes self. As far as I can tell, that looks workable (I've only looked at Cursor, though).
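A rough sketch of that shape, with every name hypothetical (not the intrusive-collections API). The adaptor trait is implemented on the container itself, and `next` points at the next container rather than at the next link, so walking the list never has to step back from a field to its parent.

```rust
use std::ptr::NonNull;

// Hypothetical link: `next` points at the next *container*, not the next link.
struct Link<T> {
    next: Option<NonNull<T>>,
}

// Hypothetical adaptor implemented with Self = Container.
trait Adapted: Sized {
    fn link(&self) -> &Link<Self>;
}

struct Item {
    value: u32,
    link: Link<Item>,
}

impl Adapted for Item {
    fn link(&self) -> &Link<Item> {
        &self.link
    }
}

/// Sums `value` over the list starting at `head`; the container is just `item`.
///
/// # Safety
/// Every `next` pointer reachable from `head` must point to a live `T`.
unsafe fn sum<T: Adapted>(head: &T, value: impl Fn(&T) -> u32) -> u32 {
    let mut total = value(head);
    let mut cur = head.link().next;
    while let Some(p) = cur {
        let item = unsafe { p.as_ref() };
        total += value(item);
        cur = item.link().next;
    }
    total
}

fn adapted_demo() -> u32 {
    let c = Item { value: 3, link: Link { next: None } };
    let b = Item { value: 2, link: Link { next: Some(NonNull::from(&c)) } };
    let a = Item { value: 1, link: Link { next: Some(NonNull::from(&b)) } };
    // SAFETY: `b` and `c` are alive for the whole call.
    unsafe { sum(&a, |i| i.value) }
}
```

Because `Item` can implement `Adapted` only once, this shape assumes each container holds a single link.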

Amanieu commented Sep 3, 2016

@eternaleye That doesn't work, since a Container may contain an arbitrary number of Links. This is the case when a single object is a member of multiple intrusive collections.

Amanieu commented Sep 3, 2016

After looking through my code a bit, I don't think it will be affected by this, since it uses *const Link everywhere instead of &Link. However, it is treading a very fine line.

RalfJung commented Sep 13, 2016

I would argue this is yet another instance of how closely we "trust types", just with opposite "polarity". In the blog post, Niko discussed whether, inside the body of a function that takes v: &usize as an argument, the function can trust the target of v not to be mutated, even by unknown code. Here, we discuss whether the caller of a function that just takes v: &mut usize as an argument can rely on the function not to touch any memory except globally reachable memory and the memory range indicated by the value and type of its argument. In some sense, these are two sides of the same coin. (In fact, if we translate the function type to a separation logic Hoare triple, the answer to both questions would be "Yes, you can trust": a separation logic triple gives the function permission to access v and to rely on no conflicting accesses happening, while also promising the context that nothing else will be needed or touched by the function. This is called the "frame rule".)
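For reference, the frame rule in its textbook form is given below, together with an illustrative instantiation for the opening `foo(&x.b)` example. The values 22 and 44 come from that example. The field-level points-to assertions and the values $v$, $v'$ are assumptions of this sketch. The frame $x.a \mapsto 22 \ast x.c \mapsto 44$ passes through the call untouched, which is exactly the guarantee the optimization relies on.

$$
\frac{\{P\}\; C\; \{Q\}}{\{P \ast R\}\; C\; \{Q \ast R\}}
\qquad \text{provided } C \text{ modifies no variable free in } R
$$

$$
\frac{\{x.b \mapsto v\}\; \mathtt{foo}(\&x.b)\; \{x.b \mapsto v'\}}
     {\{x.b \mapsto v \ast x.a \mapsto 22 \ast x.c \mapsto 44\}\; \mathtt{foo}(\&x.b)\; \{x.b \mapsto v' \ast x.a \mapsto 22 \ast x.c \mapsto 44\}}
$$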

A function that takes a v: &mut usize and accesses more than just the 4/8 bytes covered by the usize itself is very much like a function that takes a raw pointer and turns it into a Box<T>: the function actually relies on more than is guaranteed by the type system (i.e., by the general contract that covers all safe interfaces). Such a function is inherently unsafe to call and should be marked as such.
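A minimal sketch of that alternative, in which `bump_next` and its safety contract are hypothetical. The function needs more than one `i32`, so it takes a raw pointer and is declared `unsafe fn`. That moves the extra requirement out of the type and into a documented contract the caller must uphold.

```rust
/// Hypothetical: increments the `i32` *after* the one `p` points to.
///
/// # Safety
/// `p.add(1)` must be in bounds of the same allocation as `p` and point to an
/// initialized `i32`, and `p` must carry permission to write it (e.g. it was
/// derived from a pointer to the whole array, not from `&mut arr[0]`).
unsafe fn bump_next(p: *mut i32) {
    unsafe {
        *p.add(1) += 1;
    }
}

fn bump_next_demo() -> [i32; 3] {
    let mut arr = [0, 1, 2];
    let p = arr.as_mut_ptr(); // provenance covers all three elements
    // SAFETY: `arr` has an element after index 0, and `p` covers it.
    unsafe { bump_next(p) };
    arr
}
```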

It seems to me that whether the compiler is allowed to rely on this assumption ("functions that need more than is given by their arguments are unsafe") for optimizations again depends on whether this is an "unsafe context" or not -- generally, we want such optimizations as there is some potential for producing faster code, but for people writing unsafe code this may be a very dangerous prospect.
