Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aliasing rules for Vec<T> and other standard containers #262

Closed
chorman0773 opened this issue Dec 5, 2020 · 14 comments
Closed

Aliasing rules for Vec<T> and other standard containers #262

chorman0773 opened this issue Dec 5, 2020 · 14 comments

Comments

@chorman0773
Copy link
Contributor

chorman0773 commented Dec 5, 2020

This was brought up briefly in #258, but I was wondering about the aliasing rules for Standard Containers, partricularily Vec<T,A>.
Right now, a safety invariant for Vec is that the inner pointer is not aliased (this is necessary, because Vec can modify it, and implements DerefMut). However, as was noted, making this an aliasing invariant may be non-trivial. However, notably, Vec<T,A> has an unstable layout, so an implementation could be (Unique<[T]>,usize,A). My question is, would this be a valid implementation? Would an implementation be permitted to treat Vec<T> specially w.r.t. the stacked borrows aliasing model?

(Note: all of the above explicitly references Vec<T,A> but the question also stands without loss of generality, to all standard library containers. IE. can an implementation assume exclusive ownership of the pointers managed by standard containers, however that may extend, even if the container is unused and mem::forgotten.)

@RalfJung
Copy link
Member

RalfJung commented Dec 6, 2020

However, notably, Vec<T,A> has an unstable layout

Well... quoting from the docs:

Most fundamentally, Vec is and always will be a (pointer, capacity, length) triplet. No more, no less. The order of these fields is completely unspecified, and you should use the appropriate methods to modify these. The pointer will never be null, so this type is null-pointer-optimized.

So, only the order is unstable.

so a valid implementation could be (Unique<[T]>,usize,A)

Is the slice length here len or cap? If it is cap, shouldn't this be Unique<[MaybeUninit<T>]>?


Regarding your higher-level point, I don't have an answer, but I can try to tease this apart into two questions:

  • Vec as implemented in rustc uses Unique<T>. What does that mean in terms of aliasing guarantees? In SB it means nothing, but that's only because SB isn't well-suited to make aliasing assumptions without a known size. This is a problem elsewhere as well (Stacked Borrows: raw pointer usable only for T too strict? #134), so if a "sizeless" way of asserting aliasing surfaces, I will certainly attempt to wire that up with Unique.
  • Looking beyond rustc itself, what is the "library UB" around aliasing for container data types? This is very hard to even say precisely, and I do not have a clear idea of the trade-offs here. Is there any legitimate code that would be ruled out by saying "all containers may assume their inner pointers are not aliased by user-held pointers"? This seems extremely vague so I am not even sure if it is useful.

One data point: for Vec, we actually do allow the user to keep pointers to some interior data as long as they ensure that the capacity does not change. You can see this witnessed by this test. This could be incompatible with your proposal of Unique<[T]>, if the aliasing requirements made by Unique are sufficiently strong.

@chorman0773
Copy link
Contributor Author

chorman0773 commented Dec 6, 2020

Most fundamentally, Vec is and always will be a (pointer, capacity, length) triplet. No more, no less. The order of these fields is completely unspecified, and you should use the appropriate methods to modify these. The pointer will never be null, so this type is null-pointer-optimized.

I'd argue that isn't a stable guarantee, especially since an interpretation that it is wouldn't necessarily be compatible with the efforts in https://github.com/rust-lang/wg-allocators. This information has to be stored somewhere, and, at least for A = Global it must be reconstructible from these (in all other cases it must be reconstructible from those + A), but

Is the slice length here len or cap? If it is cap, shouldn't this be Unique<[MaybeUninit<T>]>?

Either one is reasonable. I actually originally had the question be Unique<[MaybeUninit<T>]> indicating it should be capacity, but having it be length may work better for things like derefence/deref_mut in particular, and also into_boxed_slice.

As for how the aliasing rules work, I'm not worried about the concreate rules for the implementation. lccc defines noalias in terms of the object a pointer is pointing to, which notably, for non-dynamic allocations, is (or should be) one-to-one with the Stacked Borrows model (reference-to-slices simply create a fictional object of type [T;n] where n is the length of the slice, but need not be a constant expression, and point into this).
The higher level rules are more where I am concerned, if I am even permitted to make these assumptions. However, seeing as it is (presumably) possible to calculate the length of a slice for aliasing purposes, defining it in terms of Unique<[T]> should be sufficiently permissive to allow such an implementation.
Thus turning to the rules for Unique, I like a version of the rules I use, that is:

  • Unique<T> shall be a non-null, well aligned pointer to T
  • If Unique<T> is not dangling, the region it points to shall not be aliased, as though occupying a Uniq item on the borrow stack.

we actually do allow the user to keep pointers to some interior data as long as they ensure that the capacity does not change

These are notably derivative pointers of the interior, by definition. The definition I use notably permits as many derivative pointers as it wants (provided either only 1 exists, or none have the unique attribute). I believe Stacked Borrows is similarily permissive when reborrowing as/converting to a raw pointer? So this would be well defined if getting the pointer to the array is defined as a reborrow to SharedReadWrite or SharedReadOnly (depending on the type of pointer). Though this may be dependant on #248, I'm not 100% familiar with the model (I will be reading up on it at some point, to attempt to form a relation between Stacked Borrows and the model lccc uses, to hopefully prove the latters correctness).

Is there any legitimate code that would be ruled out by saying "all containers may assume their inner pointers are not aliased by user-held pointers"?

This was a sort-of catch all question, and can certainly be analyzed individually, for each such container. In particular, this is desirable from the implementation-side for String, which is just a special-case of Vec<u8>, and probably for HashSet and HashMap as well. Whether or not any others could benefit from the same, I do not know. As for user code, I don't know of any that would be ruled out by a blanket statement like this, but I'd be open for input if there is. I assume in general, it would be similar things to what is discussed for Box in #258, though whether any actual code for this exists, I do not know.
I can happily settle for a more narrowly tailored statement, for specifically those types I mentioned (and even then, I'm the most attached to allowing this for Vec and String specifically)

@digama0
Copy link

digama0 commented Dec 6, 2020

I'd argue that isn't a stable guarantee, especially since an interpretation that it is wouldn't necessarily be compatible with the efforts in https://github.com/rust-lang/wg-allocators. This information has to be stored somewhere, and, at least for A = Global it must be reconstructible from these (in all other cases it must be reconstructible from those + A)

I wouldn't say it's incompatible; the quoted stability guarantee is only for the Vec<T> that we know and love, which is being ret-conned as Vec<T, Global>. Vec<T, A> for other values of A is a new kind of thing which may not have the same guarantees, but the existing guarantee about Vec<T> behaving as expected implies that Global is necessarily a ZST (and of course it is).

@RalfJung
Copy link
Member

RalfJung commented Dec 6, 2020

I'd argue that isn't a stable guarantee

Things like that, stated in the doc, as as close to a stable guarantee as we have for libraries. If this isn't a stable guarantee, I don't know what is.

@chorman0773
Copy link
Contributor Author

My question is be, what is that stable guarantee. What does the fact that a type consists of only one pointer and two usizes, but is repr(rust), guarantee. Even if we narrow it to only Vec<T,Global>, I can't see what it supposedly says. Does it say that Vec<T,Global> has the same size and alignment as [usize;3]? In my opinion, at best, this is a logical guarantee.

@Lokathor
Copy link
Contributor

Lokathor commented Dec 6, 2020

I've seen old unsafe code that relied on Vec being "3x usize", on the grounds that the docs say it's a stable part of the type.

@bjorn3
Copy link
Member

bjorn3 commented Dec 6, 2020

What does the fact that a type consists of only one pointer and two usizes, but is repr(rust), guarantee.

I did say that the guarantee is that decomposing into the raw parts and assembling it back again will always give exactly the original value back. It doesn't have any private fields that would get lost in the translation. Nor does it have any extra validity invariants, only extra safety invariants.

@chorman0773
Copy link
Contributor Author

I did say that the guarantee is that decomposing into the raw parts and assembling it back again will always give exactly the original value back

That's what I thought, and originally mentioned in my comment on that ("this information has to be stored somewhere, and, at least for A = Global it must be reconstructible from these").

@scottmcm
Copy link
Member

scottmcm commented Dec 7, 2020

Nor does it have any extra validity invariants, only extra safety invariants.

I largely agree with your post, but I'm not certain about this one. It already has a non-null validity invariant in the current implementation, which is tighter validity than the *const that from_raw_parts takes. I think it would be within its rights, for example, to have a validity invariant that the len and capacity cannot be usize::MAX when the type is size 2, for example.

To me, this statement in the documentation is about saying "no, it's not reference counted" and "no, it's not a SmallVec" and "yes, the elements always stay at the same address through a move" and similar things like that. It's a straightforward Box<[MaybeUninit<T>]> with a length to track which are initialized. But I don't think that it means it has to store literally three fields of those precise types with precisely those values.

@steveklabnik
Copy link
Member

Historically, we treat these kinds of comments in documentation as guarantees, and I've taken pains to get the libs team to sign off on them.

@elichai
Copy link

elichai commented Dec 8, 2020

To me, this statement in the documentation is about saying "no, it's not reference counted" and "no, it's not a SmallVec" and "yes, the elements always stay at the same address through a move" and similar things like that. It's a straightforward Box<[MaybeUninit<T>]> with a length to track which are initialized. But I don't think that it means it has to store literally three fields of those precise types with precisely those values.

FWIW

Most fundamentally, Vec is and always will be a (pointer, capacity, length) triplet

Never says that length/capacity have to be usize and have to be valid for all values, so I do think it should be allowed in the future to constrain them for capacity < usize::MAX/size_of::<T>() and length < usize::MAX/size_of::<T>().

@chorman0773
Copy link
Contributor Author

I'd like to bring this up again and ask a further question (one that also applies to Box):

  • If the implementation can treat Vec<T> (or any collection in general) as though it has Unique access to the inner allocation, can an implementation treat a borrow (Uniq, SharedReadWrite, or SharedReadOnly) of the container as the same kind of borrow of the inner? On rustc's impl, I remember seeing something along the lines of "if the Unique is accessed through a shared borrow, only shared access is allowed to the pointee" (or it may have been the contrapositive, that mutable access to the pointee was only allowed through mutable access to the Unique).
  • If not, can safe or unsafe functions called on a mutable reference to a container still invalidate any borrows of the inner allocation even if it does not necessarily access the allocation or the borrowed portion thereof (IE. Vec::set_len, Vec::resize for slices less than the new size, where no new allocation is necessary).

@RalfJung
Copy link
Member

On rustc's impl, I remember seeing something along the lines of "if the Unique is accessed through a shared borrow, only shared access is allowed to the pointee"

The type system has to ensure this restriction for soundness reasons. However, Stacked Borrows doesn't do anything like that. It work strictly per-pointer / per-allocation, and in particular "the pointer used to load X" has no effect on the 'content' of X even if X is itself a pointer.

can safe or unsafe functions called on a mutable reference to a container still invalidate any borrows of the inner allocation even if it does not necessarily access the allocation or the borrowed portion thereof (IE. Vec::set_len, Vec::resize for slices less than the new size, where no new allocation is necessary).

To make sure I understand the question -- that would rule out code like this, right? Currently, it is my interpretation of the documentation of Vec that at least some of that code is intended to be allowed. Vec documents that if the backing buffer is big enough, push will not reallocate -- that statement is only useful to clients if push does not invalidate borrows of the existing elements of the vector. So I spent some time to patch Vec until those existing borrows are maintained.

I think this has to necessarily imply "no" to both of your questions, or am I misunderstanding something?

bors added a commit to rust-lang/miri that referenced this issue Jun 3, 2021
added a strings.rs regression test case for potential future UB

This PR adds a regression test for the aliasing rules of a `Unique<T>` pointer.
    At the time of writing this test case, Miri does not treat `Unique<T>`
    pointers as a special case, these are treated like any other raw pointer.
    However, there are existing Github issues which may lead to `Unique<T>`
    becoming a special case through asserting unique ownership over the pointee:
    - rust-lang/unsafe-code-guidelines#258
    - rust-lang/unsafe-code-guidelines#262
    In the new test case, the calls to `String::remove` and `String::insert[_str]` follow
    code paths that would trigger undefined behavior in case `Unique<T>`
    would ever assert semantic ownership over the pointee. Internally,
    these methods call `self.vec.as_ptr()` and `self.vec.as_mut_ptr()` on
    the vector of bytes that are backing the `String`. That `Vec<u8>` holds a
    `Unique<u8>` internally. The second call to `Vec::as_mut_ptr(&mut self)`
    would then invalidate the pointers derived from `Vec::as_ptr(&self)`.
    Note that as long as `Unique<T>` is treated like any other raw pointer,
    this test case should pass. It is merely here as a canary test for
    potential future undefined behavior.
@RalfJung
Copy link
Member

RalfJung commented Jul 3, 2023

Closing in favor of #326.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants