
Collection docs #27902

Closed
wants to merge 3 commits into from
75 changes: 75 additions & 0 deletions src/libcollections/vec.rs
@@ -148,6 +148,81 @@ use super::range::RangeArgument;
/// if the vector's length is increased to 11, it will have to reallocate, which
/// can be slow. For this reason, it is recommended to use `Vec::with_capacity`
/// whenever possible to specify how big the vector is expected to get.
///
/// # Guarantees
///
/// Due to its incredibly fundamental nature, Vec makes a lot of guarantees
/// about its design. This ensures that it's as low-overhead as possible in
/// the general case, and can be correctly manipulated in primitive ways
/// by unsafe code. Note that these guarantees refer to an unqualified `Vec<T>`.
/// If additional type parameters are added (e.g. to support custom allocators),
/// overriding their defaults may change the behavior.
///
/// Most fundamentally, Vec is and always will be a (pointer, capacity, length)
/// triplet. No more, no less. The order of these fields is completely
Member:
Will this also be true if we ever add an allocator type parameter? Possible designs I've seen for this might add an additional field to the type:

struct Vec<T, A: Allocator=Heap> {
    data: *mut T,
    len: usize,
    capacity: usize,
    allocator: A::HandleData,
}

Contributor Author:
Right above this:

Note that these guarantees refer to an unqualified Vec<T>.
If additional type parameters are added (e.g. to support custom allocators),
overriding their defaults may change the behavior.

Contributor Author:
(allocator would be a ZST by default, so its only impact could be ordering, which is already unspecified)

/// unspecified, and you should use the appropriate methods to modify these.
/// The pointer will never be null, so this type is null-pointer-optimized.
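A minimal sketch of what the non-null guarantee buys in practice; the size equality below is a consequence of the null-pointer optimization, checked here for one element type rather than a separately documented guarantee:

use std::mem;

fn main() {
    // Because the pointer is never null, `None` can be represented by a null
    // pointer, so `Option<Vec<T>>` needs no extra discriminant.
    assert_eq!(mem::size_of::<Option<Vec<u8>>>(), mem::size_of::<Vec<u8>>());
}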
///
/// However, the pointer may not actually point to allocated memory. In particular,
/// if you construct a Vec with capacity 0 via `Vec::new()`, `vec![]`,
/// `Vec::with_capacity(0)`, or by calling `shrink_to_fit()` on an empty Vec, it
/// will not allocate memory. Similarly, if you store zero-sized types inside
/// a Vec, it will not allocate space for them. *Note that in this case the
/// Vec may not report a `capacity()` of 0*. Vec will allocate if and only
/// if `mem::size_of::<T>() * capacity() > 0`. In general, Vec's allocation
/// details are subtle enough that it is strongly recommended that you only
/// free memory allocated by a Vec by creating a new Vec and dropping it.
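A small sketch of the allocation rule above; that a Vec of zero-sized elements reports a huge capacity (`usize::MAX` in the current implementation) is an implementation detail, so the code only checks that it is non-zero:

use std::mem;

fn main() {
    // Capacity 0: no allocation is performed.
    let empty: Vec<i32> = Vec::new();
    assert_eq!(empty.capacity(), 0);

    // Zero-sized element type: still no allocation, yet capacity() is not 0.
    let zsts: Vec<()> = vec![(); 10];
    assert_eq!(mem::size_of::<()>(), 0);
    assert!(zsts.capacity() > 0);
}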
///
/// If a Vec *has* allocated memory, then the memory it points to is on the heap
/// (as defined by the allocator Rust is configured to use by default), and its
/// pointer points to `len()` initialized elements in order (what you would see
/// if you coerced it to a slice), followed by `capacity() - len()` logically
/// uninitialized elements.
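This layout is what lets unsafe code fill the spare capacity directly and then adjust the length; a minimal sketch, assuming the caller initializes every element before calling `set_len`:

use std::ptr;

fn main() {
    let mut v: Vec<u8> = Vec::with_capacity(4);

    unsafe {
        let base = v.as_mut_ptr();
        // Write into the logically uninitialized region beyond len().
        for i in 0..4 {
            ptr::write(base.offset(i as isize), i as u8);
        }
        // The first 4 elements are now initialized, so expose them.
        v.set_len(4);
    }

    assert_eq!(v, [0, 1, 2, 3]);
}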
///
/// Vec will never perform a "small optimization" where elements are actually
/// stored on the stack for two reasons:
Member:
"on the stack" is kind of the wrong terminology. The vec is just a value, and it can live anywhere, so "inline" is more correct.

Contributor Author:
Good point.

///
/// * It would make it more difficult for unsafe code to correctly manipulate
/// a Vec. The contents of a Vec wouldn't have a stable address if it were
/// only moved, and it would be more difficult to determine if a Vec had
/// actually allocated memory.
///
/// * It would penalize the general case, incurring an additional branch
/// on every access.
///
/// Vec will never automatically shrink itself, even if completely empty. This
/// ensures no unnecessary allocations or deallocations occur. Emptying a Vec
/// and then filling it back up to the same `len()` should incur no calls to
/// the allocator. If you wish to free up unused memory, use `shrink_to_fit`.
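A short sketch of the no-automatic-shrinking behavior described above:

fn main() {
    let mut v: Vec<i32> = (0..100).collect();
    let cap = v.capacity();

    // Emptying the Vec does not release its buffer.
    v.clear();
    assert_eq!(v.capacity(), cap);

    // Refilling up to the same len() needs no new allocation.
    v.extend(0..100);
    assert_eq!(v.capacity(), cap);

    // Shrinking is always explicit.
    v.shrink_to_fit();
    assert!(v.capacity() >= v.len());
}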
///
/// `push` and `insert` will never (re)allocate if the reported capacity is
/// sufficient. `push` and `insert` *will* (re)allocate if `len() == capacity()`.
/// That is, the reported capacity is completely accurate, and can be relied on.
/// It can even be used to manually free the memory allocated by a Vec if
/// desired. Bulk insertion methods *may* reallocate, even when not necessary.
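A sketch of the reallocation guarantee: the buffer address stays put until `len()` reaches `capacity()`, and the next push must then reallocate:

fn main() {
    let mut v: Vec<u32> = Vec::with_capacity(8);
    let cap = v.capacity();
    let base = v.as_ptr();

    // No reallocation while the reported capacity is sufficient.
    for i in 0..cap as u32 {
        v.push(i);
    }
    assert_eq!(v.as_ptr(), base);
    assert_eq!(v.len(), cap);

    // len() == capacity(), so this push must reallocate.
    v.push(0);
    assert!(v.capacity() > cap);
}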
///
/// Vec does not guarantee any particular growth strategy when reallocating
/// when full, nor when `reserve` is called. The current strategy is basic
/// and it may prove desirable to use a non-constant growth factor. Whatever
/// strategy is used will of course guarantee `O(1)` amortized `push`.
///
/// `vec![x; n]`, `vec![a, b, c, d]`, and `Vec::with_capacity(n)` will all
/// produce a Vec with exactly the requested capacity. If `len() == capacity()`
/// (as is the case for the `vec!` macro), then a `Vec<T>` can be converted
/// to and from a `Box<[T]>` without reallocating or moving the elements.
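A sketch of the `Box<[T]>` round trip this enables, assuming the exact-capacity behavior of `vec!` stated above:

fn main() {
    let v = vec![1, 2, 3];
    assert_eq!(v.len(), v.capacity());

    // With no excess capacity, the conversion neither reallocates nor moves elements.
    let base = v.as_ptr();
    let boxed: Box<[i32]> = v.into_boxed_slice();
    assert_eq!(boxed.as_ptr(), base);

    // And back again, still without touching the allocation.
    let v: Vec<i32> = boxed.into_vec();
    assert_eq!(v.as_ptr(), base);
}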
Member:
We can document this, but why guarantee it? capacity() doesn't even have to return that exact value

Contributor Author:
If people are trying to reason based on stable addresses, this is important.

Contributor Author:
Capacity definitely needs to be exact though -- otherwise it's impossible to decompose and reconstitute through raw_parts.

Member:
It needs to be equivalent for jemalloc purposes; I guess it needs to end up in the same size bucket or so.

Contributor Author:
It needs to be equivalent for $ARBITRARY_GLOBAL_ALLOCATOR, which in general is just the exact value.

Member:
I don't disagree. You have to keep around whatever capacity() tells you you have.

But can we guarantee that vec![a, b, c] will always be capacity = 3 when we in other places reserve the right to use extra capacity that the allocator gives to us?

Contributor Author:
Those reservations are, to my knowledge, future-proofing for local allocators, which will be overriding a default type parameter, which these docs specifically guard against.

Member:
We could try asking jemalloc for extra space (actual usable space) in the future. I guess it could be used in some call paths and not the ones we lock down here. I'm just 1) confused and 2) always reluctant to guarantee stuff.
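The raw_parts round trip mentioned in this thread, as a minimal sketch; the exact `capacity()` value is what must be carried along for `Vec::from_raw_parts` to be sound:

use std::mem;

fn main() {
    let mut v = vec![1u32, 2, 3];

    // Decompose into the (pointer, length, capacity) triplet.
    let ptr = v.as_mut_ptr();
    let len = v.len();
    let cap = v.capacity();
    mem::forget(v); // the allocation is now owned manually

    // Reconstitute. Handing back the exact same triplet is what makes this sound.
    let v = unsafe { Vec::from_raw_parts(ptr, len, cap) };
    assert_eq!(v, [1, 2, 3]);
}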

///
/// Vec will not specifically overwrite any data that is removed from it,
/// but also won't specifically preserve it. Its uninitialized memory is
/// scratch space that it may use however it wants. It will generally just do
/// whatever is most efficient or otherwise easy to implement. Do not rely on
/// removed data to be erased for security purposes. Even if you drop a Vec, its
/// buffer may simply be reused by another Vec. Even if you zero a Vec's memory
/// first, that may not actually happen because the optimizer does not consider
/// this a side-effect that must be preserved.
///
/// Vec does not currently guarantee the order in which elements are dropped
/// (the order has changed in the past, and may change again).
///
#[unsafe_no_drop_flag]
#[stable(feature = "rust1", since = "1.0.0")]
pub struct Vec<T> {
30 changes: 4 additions & 26 deletions src/libstd/collections/mod.rs
@@ -25,9 +25,9 @@
//!
//! Rust's collections can be grouped into four major categories:
//!
//! * Sequences: `Vec`, `VecDeque`, `LinkedList`, `BitVec`
//! * Maps: `HashMap`, `BTreeMap`, `VecMap`
//! * Sets: `HashSet`, `BTreeSet`, `BitSet`
//! * Sequences: `Vec`, `VecDeque`, `LinkedList`
//! * Maps: `HashMap`, `BTreeMap`
//! * Sets: `HashSet`, `BTreeSet`
//! * Misc: `BinaryHeap`
//!
//! # When Should You Use Which Collection?
@@ -70,22 +70,11 @@
//! * You want to be able to get all of the entries in order on-demand.
//! * You want a sorted map.
//!
//! ### Use a `VecMap` when:
//! * You want a `HashMap` but with known to be small `usize` keys.
//! * You want a `BTreeMap`, but with known to be small `usize` keys.
//!
//! ### Use the `Set` variant of any of these `Map`s when:
//! * You just want to remember which keys you've seen.
//! * There is no meaningful value to associate with your keys.
//! * You just want a set.
//!
//! ### Use a `BitVec` when:
//! * You want to store an unbounded number of booleans in a small space.
//! * You want a bit vector.
//!
//! ### Use a `BitSet` when:
//! * You want a `BitVec`, but want `Set` properties
//!
//! ### Use a `BinaryHeap` when:
//!
//! * You want to store a bunch of elements, but only ever want to process the
@@ -123,11 +112,9 @@
//! | Vec | O(1) | O(n-i)* | O(n-i) | O(m)* | O(n-i) |
//! | VecDeque | O(1) | O(min(i, n-i))* | O(min(i, n-i)) | O(m)* | O(min(i, n-i)) |
//! | LinkedList | O(min(i, n-i)) | O(min(i, n-i)) | O(min(i, n-i)) | O(1) | O(min(i, n-i)) |
//! | BitVec | O(1) | O(n-i)* | O(n-i) | O(m)* | O(n-i) |
//!
//! Note that where ties occur, Vec is generally going to be faster than VecDeque, and VecDeque
//! is generally going to be faster than LinkedList. BitVec is not a general purpose collection, and
//! therefore cannot reasonably be compared.
//! is generally going to be faster than LinkedList.
//!
//! ## Maps
//!
@@ -139,15 +126,6 @@
//! |----------|-----------|----------|----------|-------------|
//! | HashMap | O(1)~ | O(1)~* | O(1)~ | N/A |
//! | BTreeMap | O(log n) | O(log n) | O(log n) | O(log n) |
//! | VecMap | O(1) | O(1)? | O(1) | O(n) |
//!
//! Note that VecMap is *incredibly* inefficient in terms of space. The O(1)
//! insertion time assumes space for the element is already allocated.
//! Otherwise, a large key may require a massive reallocation, with no direct
//! relation to the number of elements in the collection. VecMap should only be
//! seriously considered for small keys.
//!
//! Note also that BTreeMap's precise performance depends on the value of B.
//!
//! # Correct and Efficient Usage of Collections
//!