Validity of references: Memory-related properties #77

Open
RalfJung opened this Issue Jan 10, 2019 · 10 comments

4 participants
@RalfJung
Collaborator

RalfJung commented Jan 10, 2019

Discussing the memory-related properties of references: does &T have to point to allocated memory (with at least size_of::<T>() bytes being allocated)? If yes, does the memory have to contain data that satisfies the validity invariant of T?

If the answer to both of these questions is "yes", one consequence is that &! is uninhabited: There is no valid reference of type &!.

Currently, during LLVM lowering, we add a "dereferenceable" attribute to references, indicating that the answer to the first question should be "yes". This case is rather unique: it is the only one where validity depends on the contents of memory. This opens up some new, interesting questions:

  1. I mentioned above that size_of::<T>() many bytes need to be dereferenceable. How do we handle unsized types? We could determine the size according to the metadata and the type of the unsized tail. For slices, that's really easy, but for trait objects this involves the vtable, so it would introduce yet another kind of dependency of validity on memory. However, vtables must not be modified, and they are never deallocated (right?), so this is a fairly weak form of dependency: if a pointer was a valid vtable pointer once, it always will be.

    With more exotic forms of unsized types, this becomes harder. We can mostly ignore extern type: we cannot even know its size dynamically, so we can basically assume it is 0 and check dereferenceability for that. But what about custom DSTs? I don't think we want to make validity depend on executing arbitrary user-defined code. We could check validity only for the sized prefix of such an unsized type, but that would introduce an inconsistency between primitive DSTs and user-defined custom DSTs. Is that a problem?

    For unsized types, even the requirement that the pointer be well-aligned becomes subtle, because determining alignment raises issues similar to those of determining the size.

  2. What about validity of ManuallyDrop<&T>? ManuallyDrop<T> certainly shares all the bit-level properties of T, because we perform layout optimization on it. But does ManuallyDrop<&T> have to be dereferenceable?
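A minimal sketch, assuming current stable Rust, of how today's `std::mem::size_of_val` and `align_of_val` already derive a pointee's extent from pointer metadata: the length for a slice, the vtable for a trait object (point 1). The last helper checks the layout claim from point 2. All three helper names are hypothetical.

```rust
use std::fmt::Debug;
use std::mem::{align_of_val, size_of, size_of_val, ManuallyDrop};

/// Bytes covered by a `&[T]`: computed from the length stored in the
/// pointer metadata, without reading any other memory.
fn slice_extent<T>(s: &[T]) -> usize {
    size_of_val(s)
}

/// Size and alignment of a trait object's pointee: both are read from the
/// vtable that the pointer metadata points to.
fn dyn_extent(obj: &dyn Debug) -> (usize, usize) {
    (size_of_val(obj), align_of_val(obj))
}

/// `Option<ManuallyDrop<&u8>>` is as small as `&u8`, i.e. the wrapper keeps
/// the non-null niche of the reference (a bit-level property).
fn manually_drop_keeps_niche() -> bool {
    size_of::<Option<ManuallyDrop<&u8>>>() == size_of::<&u8>()
}
```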

@RalfJung

Collaborator Author

RalfJung commented Jan 10, 2019

Notice that there is an alternative that would make validity not depend on memory, while maintaining the "dereferenceable" attribute: the Stacked Borrows aliasing model includes an operation on references called "retagging" that, among other things, raises UB if the reference is not dereferenceable. So, if we answer the two questions from the OP with "yes" and "no", respectively, we could equivalently say that validity makes no requirements about references being dereferenceable, but the aliasing model does. That would make validity a property that depends only on the raw bits, not on memory, which would simplify the discussion elsewhere (and resolve #50 (comment)).

With this approach, the properties of ManuallyDrop<&T> would be determined by whether retagging descends into the fields of ManuallyDrop or not.
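Under this approach, validity of a reference reduces to properties of its bits alone. A minimal sketch for sized `T`, assuming (hypothetically) that those properties are exactly "non-null" and "aligned for `T`"; `ref_bits_ok` is an illustrative name, and dereferenceability is intentionally left to retagging:

```rust
use std::mem::align_of;

/// Hypothetical bit-level validity check for a `&T` (sized `T` only).
/// Looks only at the address, never at memory: non-null and aligned.
/// Dereferenceability is deliberately NOT checked here; under the
/// approach above, retagging would enforce it instead.
fn ref_bits_ok<T>(p: *const T) -> bool {
    let addr = p as usize;
    addr != 0 && addr % align_of::<T>() == 0
}
```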

@RalfJung

Collaborator Author

RalfJung commented Jan 10, 2019

Concerning the second question, my personal thinking is that we should not require the pointed-to memory to be valid itself.

One good argument for making things UB with a strict validity invariant is that it helps bug-checking tools. In this case, however, recursively checking references all the time is really costly, and would probably make it nearly impossible to develop useful tools.

On the other hand, when writing unsafe code it is very useful to be able to pass around a &mut T to some uninitialized data and have another function write into that reference to initialize it. If we say that valid references must point to valid data, this pattern becomes UB. As a consequence, we would then have to offer many new APIs in libstd that take raw pointers instead of references, so that they can be used for initialization.
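A minimal sketch of the two styles, assuming `std::mem::MaybeUninit` as the uninitialized storage; `init_ref`, `init_raw`, and `make_value` are hypothetical names. The raw-pointer version is well-defined whichever way the validity question is answered:

```rust
use std::mem::MaybeUninit;

// Reference-based style, kept as a comment: `out` would be a `&mut u64`
// pointing at uninitialized memory, which is exactly the case in question.
// fn init_ref(out: &mut u64) { *out = 42; }

/// Raw-pointer style: the signature claims nothing about the pointee's
/// current contents. Safety: `out` must be non-null, aligned, and valid
/// for writes of a `u64`.
unsafe fn init_raw(out: *mut u64) {
    // `write` stores the value without reading or dropping the old contents.
    unsafe { out.write(42) };
}

fn make_value() -> u64 {
    let mut slot = MaybeUninit::<u64>::uninit();
    unsafe {
        init_raw(slot.as_mut_ptr());
        slot.assume_init()
    }
}
```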

Some examples:

@nikomatsakis

Collaborator

nikomatsakis commented Jan 31, 2019

So @arielb1, for example, has traditionally maintained that requiring the referent of a &T to be valid would invalidate far too much code. I'm inclined to agree. I think our idea for ! patterns largely assuaged my concerns about how to handle &!, so I feel comfortable with making the validity invariant shallow. (The argument about a &mut T to an uninitialized T is also strong.)

I am also intrigued by this comment from @RalfJung:

"Notice that there is an alternative that would make validity not depend on memory"

That seems like a very good property to have. I am inclined to pursue this approach, personally.

@nagisa


nagisa commented Feb 4, 2019

I personally think that not treating & as a "pointer" is the only sensible solution here (similar to C++, where references behave more like plain values than like pointers). My motivating example is that, given a function like this:

fn generic<T>(foo: &T) {
    // body
}

it is way too easy to end up with something that will most likely stay UB indefinitely for T = !, even though the code would be valid for every other T. Making &! uninhabited avoids this problem altogether, and we may be able to relax this later on if we figure out that:

  1. there are convincing use cases for &! not being uninhabited; and
  2. all the safe constructs for &T can be made well-defined for &! as well.

@arielb1


arielb1 commented Feb 4, 2019

"it is way too easy to end up with something that will most likely indefinitely stay UB for T = ! where code would otherwise be valid for all other Ts."

Could you come up with such an example: one that is UB for T = ! but not UB for, say, T = bool with foo pointing to an invalid bool?

@nagisa


nagisa commented Feb 4, 2019

@arielb1 I don't think I can come up with an example (is there one?) that would not be UB if the bool had an invalid bit pattern, but for &bool it is at least possible to produce a reference to something valid at all.

Since I last wrote that comment, I have realized that obtaining a &! requires unsafe code one way or the other (even though @RalfJung says this is a property of the safety system, not of the value validity system, and that the two are independent). With that in mind, I'm fine with whatever ends up being decided here.

@RalfJung

Collaborator Author

RalfJung commented Feb 5, 2019

We talked about this at the all-hands. @cramertj expressed interest in &! being uninhabited, so that functions taking one could be optimized away as dead code. @Centril noted that, particularly for matches, if we simply make validity of &T recursive, there is no question about automatically going below reference types in a match, such as in

fn foo<T>(x: &!) -> T { match x { } }

Even in unsafe code, the match could never cause problems on its own: the reference would already be invalid, so UB would have occurred earlier.
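On stable Rust, an empty enum can stand in for `!`. A sketch of the dead-code reasoning, using the hypothetical names `Void`, `absurd`, and `unwrap_ok`: if `&Void` is uninhabited, the `Err` arm below can never run and could be optimized away. Note that the empty match is written on `*v`, the place behind the reference; whether a match should look beneath the reference automatically is the question raised above.

```rust
/// Stand-in for `!` on stable Rust: an enum with no variants has no values.
enum Void {}

/// If `&Void` is uninhabited, any call to this function is unreachable.
/// The empty match is on the place `*v`, not on the reference `v`.
fn absurd(v: &Void) -> ! {
    match *v {}
}

fn unwrap_ok(r: Result<u8, &Void>) -> u8 {
    match r {
        Ok(x) => x,
        // Dead code if `&Void` is uninhabited.
        Err(e) => absurd(e),
    }
}
```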

I believe we should handle all types consistently. That means that if &! is uninhabited (from a validity perspective, not just from a safety perspective), then we should also say that a &bool that does not point to a valid bool is invalid, and so on.

One issue with this is that it makes validity very hard to check in a UB checker like Miri, or in a valgrind tool: you'd have to do a recursive walk following all the pointers. Also, it is unclear how much optimizations benefit from this (beyond removing dead code for &!), because a value that was valid at some point might become invalid later when the contents of memory change.

Moreover, new hard questions then come up about the interaction with Stacked Borrows: I think it might be hard to make sure that all the aliasing works out the right way transitively through the pointer chain. Retagging is currently a key ingredient for this, but if we did this transitively we'd have to retag references that are stored in memory, which I don't think we want to do; magically modifying memory seems like a bad idea.
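To make the checking cost concrete, here is a toy model (hypothetical types and names; not how Miri represents memory) contrasting shallow validity with the recursive walk that transitive validity would require:

```rust
use std::collections::HashMap;

/// Toy abstract values (hypothetical model).
#[derive(Clone, Copy)]
enum Val {
    Byte(u8),   // e.g. the single byte of a `bool`
    Ptr(usize), // an address
}

/// Toy types: `bool` and references.
enum Ty {
    Bool,
    Ref(Box<Ty>),
}

/// Toy memory: each address holds one abstract value.
type Memory = HashMap<usize, Val>;

/// Shallow validity: looks only at the value itself.
fn shallow_valid(v: Val, ty: &Ty) -> bool {
    match (v, ty) {
        (Val::Byte(b), Ty::Bool) => b <= 1,
        (Val::Ptr(addr), Ty::Ref(_)) => addr != 0,
        _ => false,
    }
}

/// Deep (transitive) validity: additionally follows every reference into
/// memory and checks the pointee recursively. Recursion depth is bounded by
/// the nesting depth of `ty`.
fn deep_valid(mem: &Memory, v: Val, ty: &Ty) -> bool {
    if !shallow_valid(v, ty) {
        return false;
    }
    match (v, ty) {
        (Val::Ptr(addr), Ty::Ref(inner)) => match mem.get(&addr) {
            Some(&pointee) => deep_valid(mem, pointee, inner),
            None => false, // nothing allocated at `addr`
        },
        _ => true,
    }
}
```

In this model, `Val::Ptr(8)` at type `&bool` passes both checks while address 8 holds `Byte(1)`. Once that byte is overwritten with 3, it is still shallowly valid but no longer deeply valid, even though the reference itself never changed.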

@RalfJung

Collaborator Author

RalfJung commented Feb 5, 2019

"it is way too easy to end up with something that will most likely indefinitely stay UB for T = ! where code would otherwise be valid for all other Ts. Making &! uninhabited avoids this problem altogether."

I don't understand what you are saying here. Making &! uninhabited makes strictly more programs UB? How is that supposed to solve problems with programs being UB?

@RalfJung

Collaborator Author

RalfJung commented Feb 5, 2019

Another thing @Centril brought up at the all-hands: we need more data. In particular, we should figure out whether there are interesting patterns of unsafe code that rely on having references to invalid data, and that would be too disruptive to convert to raw pointers or too widely used to break.

@RalfJung

Collaborator Author

RalfJung commented Feb 8, 2019

One issue with requiring references to be transitively valid: we have a whole bunch of existing reference-based APIs, such as those for slices, that we could then no longer use on data that is not (yet) valid. I expect this to cause a lot of trouble with existing code, but I am not sure.
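For slices, a raw-pointer alternative might look like the following sketch; `fill_raw` and `squares` are hypothetical, not existing libstd APIs, and `MaybeUninit` is assumed for the uninitialized buffer. No `&mut [u32]` to uninitialized memory is ever created:

```rust
use std::mem::MaybeUninit;

/// Hypothetical raw-pointer counterpart of a slice-filling API: writes
/// `f(i)` to each of the `len` elements starting at `ptr`.
///
/// Safety: `ptr` must be valid for writes of `len` consecutive `u32`s.
unsafe fn fill_raw(ptr: *mut u32, len: usize, f: impl Fn(usize) -> u32) {
    for i in 0..len {
        unsafe { ptr.add(i).write(f(i)) };
    }
}

fn squares() -> [u32; 4] {
    let mut buf = [MaybeUninit::<u32>::uninit(); 4];
    let len = buf.len();
    unsafe {
        // `MaybeUninit<u32>` has the same layout as `u32`, so the cast is sound.
        fill_raw(buf.as_mut_ptr().cast::<u32>(), len, |i| (i * i) as u32);
    }
    // Every element was written above, so all of them are initialized.
    buf.map(|x| unsafe { x.assume_init() })
}
```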


Another proposal for references that would enable @cramertj's optimizations: if the validity of references depends on memory in complex ways, we will need a notion of "bitstring validity". (Avoiding that is one argument for shallow validity, IMO.) We could define validity of a reference to require that the pointee is bitstring-valid. This makes checking validity feasible and enables some optimizations. However, it would mean that &! is uninhabited while &&! is not: no bit pattern is a valid !, but a &! is bitstring-valid as long as its pointer bits are (e.g., non-null).
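The &! versus &&! asymmetry can be checked in a toy model (hypothetical types and names, alignment ignored) where a reference's pointee must be bitstring-valid but no further pointers are followed:

```rust
use std::collections::HashMap;

/// Toy abstract values (hypothetical model).
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum Val {
    Byte(u8),
    Ptr(usize),
}

/// Toy types, including the never type `!`.
#[allow(dead_code)]
enum Ty {
    Never,
    Bool,
    Ref(Box<Ty>),
}

type Memory = HashMap<usize, Val>;

/// Bitstring validity: depends only on the value's own bits.
fn bitstring_valid(v: Val, ty: &Ty) -> bool {
    match (v, ty) {
        (_, Ty::Never) => false, // no bit pattern is a valid `!`
        (Val::Byte(b), Ty::Bool) => b <= 1,
        (Val::Ptr(addr), Ty::Ref(_)) => addr != 0,
        _ => false,
    }
}

/// Proposed validity: a reference must itself be bitstring-valid, and its
/// pointee must be bitstring-valid too, but nothing beyond that is checked.
fn valid(mem: &Memory, v: Val, ty: &Ty) -> bool {
    match (v, ty) {
        (Val::Ptr(addr), Ty::Ref(inner)) => {
            bitstring_valid(v, ty)
                && mem.get(&addr).map_or(false, |&p| bitstring_valid(p, inner))
        }
        _ => bitstring_valid(v, ty),
    }
}
```

With address 16 holding `Ptr(32)`, `Ptr(16)` is rejected at type `&!` (nothing stored anywhere can be a bitstring-valid `!`) but accepted at type `&&!`, because the `Ptr(32)` stored there only has to be non-null.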
