
Allocator RFC, take II #244

Closed
wants to merge 70 commits

Conversation

pnkfelix
Member

(rendered)

Summary

Add a standard allocator interface and support for user-defined allocators, with the following goals:

  1. Allow libraries to be generic with respect to the allocator, so that users can supply their own memory allocator and still make use of library types like Vec or HashMap (assuming that the library types are updated to be parameterized over their allocator).

    In particular, stateful per-container allocators are supported.

  2. Support the ability of a garbage collector (GC) to identify roots stored in statically-typed user-allocated objects outside the GC heap, without sacrificing efficiency for code that does not use Gc<T>.

  3. Do not require an allocator itself to extract metadata (such as the size of an allocation) from the initial address of the allocated block; instead, force the client to supply size at the deallocation site.

    In other words, do not provide a free(ptr)-based API. (This can improve overall performance on certain hot paths.)

  4. Incorporate data alignment constraints into the API, as many allocators have efficient support for meeting such constraints built-in, rather than forcing the client to build their own re-aligning wrapper around a malloc-style interface.
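Goals 3 and 4 can be illustrated with a minimal sketch of the lowest-level interface they imply (the trait and method names here are illustrative, not the RFC's actual API, and the implementation leans on today's `std::alloc` for brevity): the client supplies both size and alignment at allocation, and again at deallocation.

```rust
use std::alloc::{alloc, dealloc, Layout};

// Illustrative names only; the RFC's actual trait differs.
trait RawAlloc {
    unsafe fn alloc_bytes(&mut self, size: usize, align: usize) -> *mut u8;
    // Goal 3: the client passes size (and align) back at deallocation,
    // so the allocator never has to recover metadata from the pointer alone.
    unsafe fn dealloc_bytes(&mut self, ptr: *mut u8, size: usize, align: usize);
}

struct SystemAlloc;

impl RawAlloc for SystemAlloc {
    unsafe fn alloc_bytes(&mut self, size: usize, align: usize) -> *mut u8 {
        // Goal 4: alignment is part of the request itself, not a client-side wrapper.
        alloc(Layout::from_size_align(size, align).unwrap())
    }
    unsafe fn dealloc_bytes(&mut self, ptr: *mut u8, size: usize, align: usize) {
        dealloc(ptr, Layout::from_size_align(size, align).unwrap())
    }
}
```

Because the deallocation call carries the size, an allocator backed by size-segregated pools can go straight to the right pool without any header lookup.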

fleshed out discussion of call correspondence.

added initial work on standard high-level 1:n Alloc.

Moved note-to-self on Reap support to an unresolved question.
@Gankra
Contributor

Gankra commented Sep 23, 2014

If you could express "allocate me a buffer to fit x T's, y U's, and z V's, and return a ptr to it" and "interpret this ptr as a buffer of x T's, y U's, and z V's, and return me a ptr for each buffer", I think that would address the use case. Then you can wrap that logic to represent the capacities in as efficient a way as possible/desired in the given domain. In the case of BTree, the capacities are (x, x, x+1), so the wrapper just needs to remember x. A generic multi-vector could be built that only supports all-equal capacities, or only all-different capacities, if desired.
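The offset arithmetic such a multi-array request needs can be sketched as follows (`multi_layout` is a hypothetical helper, not anything proposed in the RFC): pack each array at the next properly aligned offset and remember where each one starts.

```rust
fn round_up(off: usize, align: usize) -> usize {
    // assumes align is a power of two
    (off + align - 1) & !(align - 1)
}

/// Hypothetical helper: each entry is (elem_size, elem_align, len).
/// Returns (total_size, max_align, offset_of_each_array).
fn multi_layout(parts: &[(usize, usize, usize)]) -> (usize, usize, Vec<usize>) {
    let mut off = 0;
    let mut max_align = 1;
    let mut offsets = Vec::with_capacity(parts.len());
    for &(size, align, len) in parts {
        off = round_up(off, align);
        offsets.push(off);
        off += size * len;
        max_align = max_align.max(align);
    }
    (off, max_align, offsets)
}
```

A BTree-style wrapper with capacities (x, x, x+1) would call this with its three element types and only the single remembered x.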

@nikomatsakis
Contributor

On Mon, Sep 22, 2014 at 04:38:28AM -0700, Felix S Klock II wrote:

  1. I still think this RFC has value, but it needs to say up front that it does not handle every use case.

Sure.

  2. It may need to be revised so that it can cover arbitrary DSTs as well. I am hoping we can come up with an acceptable API in the short-term that does not need something as complex as the given AllocCore, though I am not sure whether that is feasible.

I personally don't think this needs to be in this RFC and could come
in a follow-up at some later time (or may not even need an RFC at all
-- it's just a new library, basically).

To me, the key questions that decide the fate of this RFC at this
time are (1) whether we think these APIs make sense even without a GC
and (2) what is the impetus to standardize now. I think it's clear
that a typed interface is useful with a GC, and it seems like we've
established that we can accommodate custom layouts in various ways.
But what happens if we decide that support for GC types doesn't carry
its weight?

It seems to me that having typed APIs is still useful even without GC,
just as a convenience. We might simplify the API somewhat. We
certainly wouldn't need a deinit method. We might not want a trait
at all, and we could instead make the typed methods just be top-level
fns or default methods in the RawAlloc trait. That said, I can still
imagine wanting to use the typed API in order to have type-specific
policies (though it requires either reflection or impl specialization
to do so, since you need some way to "test" a type T and decide what
its properties are). So it seems that if we wound up without GC, the APIs
in this RFC might be slightly more complex than necessary, but not too
much -- at worst we have one deprecated method (deinit) and a
wrapper type to go from raw to typed that seems somewhat superfluous
(but hey, composition over inheritance is all the rage these days
anyway).

Still, as @strcat's original message suggested, we can maintain the
status quo and hold off on standardizing any allocator API. The main
drivers for standardizing sooner, I think, are the following:

a. it encourages people to use the typed APIs when suitable, thus
making more code forwards compatible with tracing later. Otherwise
tracing will invalidate quite a bit of unsafe code.
b. it allows us to make a number of types -- not necessarily including
hashtable -- parameterized by an allocator in a stable way.

I'm not sure yet whether it makes more sense to take this RFC, some
variant of this RFC that tries to do a bit less, or no RFC yet. I
personally am not ready yet to give up on support for tracing -- it
seems to be within reach and without great cost (to code that doesn't
use it), though obviously the jury is still out.

@zwarich

zwarich commented Sep 26, 2014

@pnkfelix What are the drawbacks of a GC API based entirely on a GcTraceable trait, where the compiler identifies the local variables that must be traced via this trait, with no allocator support? Obviously you get into situations like trait objects where a type is abstracted over with an existential, and for soundness you need to expose GcTraceable in the type of the trait object, but what else goes wrong?

@pnkfelix
Member Author

@zwarich The main issue that the allocator support here is trying to resolve is the case where you have blocks of allocated data (not managed by the GC heap) that still need to be treated as additional roots by the GC [*]. Note that some such blocks might currently only be reachable via pointers held in native libraries; they may not be visible via any path from a local variable that is statically known to the compiler.

There are a number of ways one can attempt to address the above problem. (For example, one could require separate manual registration of such blocks of memory before allowing the pointers to them to be lost from the locally scanned roots.) The design of this RFC is to choose a high-level approach that can accommodate many different strategies, at a level of abstraction that was thought to be easy to understand.

[*] I posted a more complete explanation of my thinking here: http://discuss.rust-lang.org/t/on-native-allocations-pointing-into-a-gc-heap/564

@mitsuhiko
Contributor

One thing I want to add is that the horrible allocator system in the STL is the primary reason why a large part of the C++ community that really likes performance (gamedev) does not use it. Because Rust's language design generally encourages stuff to allocate memory everywhere, I think an allocator API is generally important.

One part that has not been mentioned here, however, is that it's not just collections that want to allocate memory; there are also algorithms and other things. One common pattern is that you have a certain subsystem and you want to make sure that everything that subsystem does is allocated through one allocator, so that you can track your memory patterns.

Maybe it would make sense to collect some outsider feedback on allocators?

@huonw
Member

huonw commented Oct 1, 2014

Another thing for the typed allocation stuff: the Vec.map_in_place method converts a Vec<T> to a Vec<U> in place (T and U have to be compatible); this presumably wreaks havoc on trying to 'statically' classify allocations.

@nikomatsakis
Contributor

On Wed, Oct 01, 2014 at 04:06:13PM -0700, Huon Wilson wrote:

Another thing for the typed allocation stuff: the Vec.map_in_place
method converts a Vec<T> to a Vec<U> in place (T and U have to be
compatible); this presumably wreaks havoc on trying to 'statically'
classify allocations.

Indeed. That seems like it would require a "reclassify" method to
accommodate (or else a custom tracer for vecs, but if even vecs
require a custom tracer then perhaps more-or-less everything does).
Of course it might be easier -- and satisfactory -- to slap a
NoManaged bound on the map_in_place method.

(As an aside, this seems like a nice use-case for having a marker trait
that checks that types have same size/alignment/whatever. This could
be used for transmute() as well in place of the ad-hoc compiler checks
we do now. I was thinking of trying to implement this in my copious
spare time just to see how hard it is (should be fairly easy, I
think).)

@Gankra
Contributor

Gankra commented Nov 11, 2014

@pnkfelix Are the contents of this RFC still relevant for discussion, or has your design substantially changed?

@pnkfelix
Member Author

closing; I want to do a take III at some point that handles cases like the HashMap implementation more properly.

@pnkfelix pnkfelix closed this Nov 13, 2014
@Gankra
Contributor

Gankra commented Nov 17, 2014

I have been instructed to leave any allocator notes here, for future reference. A lot of this is me just trying to transcribe what was discussed in IRC.

First off, I really like the high-level API design here; it removes a lot of the boiler-plate we're seeing in libcollections now. Vec and RingBuf, as well as HashMap's RawTable and BTree's Node (in rust-lang/rust#18028), are duplicating a lot of logic that I'd like to see moved into a higher-level API. I've been deferring trying to do this because it might trivially fall out of this allocator work. However, it looks like I may create a mock version of the top-level API as a shim to improve code quality while this bakes.

I'd like to note that non-global allocators put libcollections in an awkward spot. Unless the local allocator is managed by some global one, we will almost certainly have to have every Box/Rc/etc contain a pointer back to their local allocator instance. This would add an extra uint that is strictly unnecessary in the context of single-allocator collections. Therefore collections that use Box (currently TrieMap, TreeMap, and DList) may be forced to use only raw ptrs in order to avoid this. This would significantly increase the amount of unsafe code in those collections. The interaction with unwinding is also unfortunate.

One solution to this would be an aborting-on-destruction linear type Box. It would basically be a Box that you have to manually deallocate. Let's call it Mox. Mox could have the following API:

pub struct Mox<T, A> {
  ptr: *mut T,
  allocator: ContravariantType<A>,
}

// add allocation/emplace/new logic here

impl<T, A> Mox<T, A> {
  /// Deallocate and drop contents, unsafe because it assumes the right allocator was given
  unsafe fn deallocate(self, allocator: A);
  /// Deallocate and return contents, unsafe because it assumes the right allocator was given
  unsafe fn into_inner(self, allocator: A) -> T;
}

impl<T,A> Drop for Mox<T, A> {
  fn drop(&mut self) {
    // we'll never be correctly freed if this happens, so just abort rather than leaking memory
    abort!();
  }
}

This gets rid of most of the boiler-plate, and ensures that either we handle the memory correctly or abort. However there doesn't seem to be any way to ensure that the right allocator is used beyond the fact that it was the right kind of allocator. Of course just accepting the uint-per-box overhead will always work exactly right.

I would like to reaffirm my desire for a multialloc! macro that takes an allocator and (Type, len) pairs, allocates all the space needed in a single buffer, and then returns a tuple of all the ptrs to the "individual" arrays. We would also need a multidealloc!. HashMap wants this; BTree will want this soon.

We also need to formally work out the OOM semantics for libstd and libcollections in particular. On IRC @nikomatsakis suggested that panic! would be acceptable for the standard library. This sucks for anyone who wants different behaviour, though (Kernels, Embedded, ..?). To accommodate them we would have to, I guess, return Results for every allocating operation. This would be very noisy, and likely less efficient. It has been suggested that in the future we can offer an alternative collections library with these semantics. Although the maintenance cost would likely be nasty. Maybe codegen can handle it.

Still, if we're all happy with panic for now, I would like to have a "just panic on OOM" (or I guess call alloc::oom()?) wrapper for allocators so that users can focus on the logic that matters, and not checking for nulls. Having a wrapper that returns a Result would also be interesting.
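Such a wrapper might look like the following sketch (written against today's `std::alloc`, which did not exist when this was discussed; `handle_alloc_error` stands in for the `alloc::oom()` call mentioned above): callers either get a non-null pointer or the process reports the allocation failure, so collection code never checks for null itself.

```rust
use std::alloc::{alloc, handle_alloc_error, Layout};

/// Sketch of a "just panic/abort on OOM" wrapper around a fallible,
/// null-returning allocation call.
unsafe fn alloc_or_oom(size: usize, align: usize) -> *mut u8 {
    let layout = Layout::from_size_align(size, align).unwrap();
    let p = alloc(layout);
    if p.is_null() {
        handle_alloc_error(layout); // diverges; stands in for alloc::oom()
    }
    p // always non-null from here on
}
```

The Result-returning variant would be the same shape with `if p.is_null() { return Err(...) }`, which is exactly the noise the wrapper exists to hide.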

@nikomatsakis also suggested the possibility of adding more built-in pointer types, like one that is explicitly not null, but is otherwise unsafe to use (might be uninitialized, for one). The suggested wrappers could use such a ptr type.

Finally, I'm not sure if this is strictly in the scope of the allocators API, but it feels strongly related, so I want to note it in case it has any influence on design. There is strong desire to eventually have support for a HeapVec (what we have now), StackVec (uses a [T, ..n] as backing storage -- potentially avoids storing any capacity field in favour of having n be part of its type), and HybridVec (a StackVec that falls back to HeapVec when too full). Of course, there's nothing fundamental to Vec here; all of our array-based collections could leverage this functionality. I believe this can be achieved by making these collections generic over a Buffer provider. This is very similar to what the allocator API provides, but a bit higher level. The capacity and ptr management would be completely abstracted away. In this way the StackBuffer can omit storing a capacity, and the HeapBuffer can reallocate.

Here's a sketch of what this would look like (haven't dug too deeply into it because I don't know what shape the allocator API will take):

pub trait Buffer<T> {
    /// Gets the capacity this buffer has. 
    ///
    /// May be zero for HeapBuffer, in which case it hasn't actually allocated 
    /// yet. Need to call `grow` on it first.
    ///
    /// For a StackBuffer this would be a fixed constant.
    fn capacity(&self) -> uint;

    // maybe this?
    /// This buffer cannot reserve more space than this. 
    ///
    /// Useful for dealing with next_power_of_two amortization strategies 
    /// with something inflexible like StackBuffer.
    // fn max_capacity(&self) -> uint; 

    /// Gets a raw ptr to the head of the buffer's storage. This should not be
    /// cached between operations, as `grow` and `shrink` may change where the
    /// buffer is.
    fn ptr(&mut self) -> *mut T;

    /// Asks the buffer to grow. Returns the new size. 
    ///
    /// StackBuffer naturally ignores this. Unsure if this should be handled by:
    /// * panic
    /// * return a Result
    /// * simply report that the capacity is the same, expect them to check
    fn grow(&mut self, amount: uint) -> uint;

    /// Asks the buffer to shrink. Returns the new size. StackBuffer naturally 
    /// ignores this.
    ///
    /// see above
    fn shrink(&mut self, amount: uint) -> uint;

    // Should Buffers free themselves when Dropped, or have a manual deallocate
    // call? Do we want to support taking an &[T] as a Buffer so someone can
    // populate it with a Vec that gets mem::forgotten?

    // May want try_grow_inplace for RingBuf, as it wants to manually handle
    // copying in the non-inplace case.
}

struct HeapBuffer<T, A: Allocator> {
    ptr: *mut T,
    capacity: uint,
    allocator: A,
}

// hypothetical world where n can be a type-level value, and you can
// manually_drop the contents of a fixed-size array.
struct StackBuffer<T, n> {
    buf: [T, ..n],
}

struct HybridBuffer<T, n, A: Allocator> {
    on_stack: bool,
    stack_buf: StackBuffer<T, n>,
    heap_buf: HeapBuffer<T, A>,
}

// impl buffer for all these...

struct MetaVec<T, B: Buffer<T>> {
    len: uint,
    buf: B, // note, for B=HeapBuffer, this is identical to current Vec
}

impl<T, B: Buffer<T> + Default> MetaVec<T, B> {
    fn new() -> MetaVec<T, B> {
        MetaVec {
            len: 0,
            // Makes a [T, ..n] of uninitialized() for StackVec, just makes a null ptr for HeapVec
            buf: Default::default(),
        }
    }
}

// support with_buffer as well?

impl<T, B: Buffer<T>> MetaVec<T, B> {
    fn push(&mut self, elem: T) {
        self.reserve(1);
        let len = self.len as int;
        unsafe {
            //using my proposed ptr methods here
            self.buf.ptr().offset(len).write(elem);
        }
        self.len += 1;
    }

    fn reserve(&mut self, additional: uint) {
        let len = self.len;
        let desired_cap = len.checked_add(additional).expect("capacity overflow");
        let old_cap = self.buf.capacity();
        if desired_cap > old_cap {
            let new_cap = checked_next_power_of_two(desired_cap).expect("capacity overflow");
            // This will immediately panic for StackBuffer (assuming we choose those semantics)
            // May also panic for OOM, capacity overflow, or "max size of contiguous buffer issues"
            self.buf.grow(new_cap - old_cap);
        }
    }
}

// drop/clear/etc 

@Gankra
Contributor

Gankra commented Nov 17, 2014

Modified the MetaVec example after I realized max_capacity is unnecessary.

@Ericson2314
Contributor

@Gankra, can't the Allocator instance for a global allocator be a zero-sized type so there is no overhead in the global allocator case?

@Gankra
Contributor

Gankra commented Nov 18, 2014

Yes, global allocators can have no overhead, as is the case today.
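The zero-overhead claim is easy to check: if the allocator handle is a zero-sized type, a box-like pointer carrying it by value is no bigger than the bare pointer. (The names below are hypothetical; this is just a size check, not a proposed API.)

```rust
use std::mem::size_of;

// Hypothetical zero-sized handle standing in for "the global allocator".
struct GlobalHandle;

// Sketch of a box that carries its allocator handle by value.
struct MyBox<T, A> {
    ptr: *mut T,
    alloc: A, // occupies no space when A is zero-sized
}
```

A stateful local allocator, by contrast, would make `A` a reference or pointer and pay the extra word per box that the thread discusses.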

@Ericson2314
Contributor

I'd like to note that non-global allocators put libcollections in an awkward spot. Unless the local allocator is managed by some global one, we will almost certainly have to have every Box/Rc/etc contain a pointer back to their local allocator instance. This would add an extra uint that is strictly unnecessary in the context of single-allocator collections.

Wait, then what is the problem you mean by this?

@Gankra
Contributor

Gankra commented Nov 18, 2014

If you have a non-global allocator, that is, one you make several concrete instances of and dole out to specific collections at runtime, the Boxes that use these allocators need a pointer to their local allocator (or some other identifier that prevents the allocator from being dropped, and lets the Box's destructor call into it).

Of course there are some possible work-arounds, like if the allocators are managed by a global allocator that can figure out which sub-allocators control which pointers. But in general some local allocators will need the boxes to be bloated with runtime metadata, especially since the value of local custom allocators (I'm told) is in their simplicity. Presumably this is unacceptable overhead for our collections to inherit when all the boxes in a single collection "use" the same allocator.


Another thing I just remembered. We need a mechanism to identify if two Allocators are the same at runtime. This is necessary to support functionality like moving the Node of one DList to another without allocating.

@pnkfelix
Member Author

@Gankra I'm not sure if you addressed this or not, but another work-around for the desired mapping (from a value, i.e. the address of some heap block, plus its zero-sized per-value source-allocator type, to the allocator instance) would be to play tricks within the allocator itself, where one can derive a pointer to the allocator instance from the heap block address.

Concrete example: the allocator could ensure that all the blocks it hands out come from a pool where applying bitmask on the addresses it hands out yields the address of a pointer to the source allocator. This provides a foundation for a cheap mapping from an object's address to its source allocator.

Note: This is basically trading fragmentation in one spot (the per-allocation back-pointer) for fragmentation elsewhere (namely, the requirement that the local allocator pre-reserve chunks of the address space -- chunks that may be larger than you expected).

But my point is, the same tricks that are used in modern global allocators can also be employed by the local ones, in order to avoid per-allocation overheads.

Note 2: I explicitly noted that the mapping gets the zero-sized source allocator type as input, to make it clear that the protocol described that maps the allocated address to the pointer to the allocator is not a global scheme that all allocators would have to subscribe to -- it is a local scheme that we "know" is sound to use because we are given the zero-sized type of the source allocator as a marker telling us that this is the right scheme to use.
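The concrete bitmask example above can be sketched as plain address arithmetic (the pool size and header convention here are assumptions for illustration, not part of the RFC): if every pool is aligned to its own size and stores a pointer to its owning allocator in a header word, masking any block address recovers the pool base.

```rust
// Assumed granularity: each pool is POOL_SIZE bytes and POOL_SIZE-aligned.
const POOL_SIZE: usize = 1 << 16;

/// Given any address handed out from such a pool, recover the pool's base
/// address, where (by this scheme's convention) a pointer to the owning
/// allocator is stored.
fn pool_base(block_addr: usize) -> usize {
    block_addr & !(POOL_SIZE - 1)
}
```

A real allocator would then read the allocator pointer out of that header word; the cost is the reserved header and the size-aligned pool reservations, which is exactly the fragmentation trade-off described in the note above.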

@Gankra
Contributor

Gankra commented Nov 19, 2014

Yep, that's basically the idea I was getting at in the above comment. Totally viable, but I'm not sure if we can abandon allocators that don't go in for such a scheme.

@pnkfelix
Member Author

@Gankra I don't know what you mean by "abandon allocators that don't go in for such a scheme" -- the design outlined by this RFC (and probably others) can support both strategies, and in fact such allocators can be mixed together in the same program, as I tried to outline in "Note 2".

@Gankra
Contributor

Gankra commented Nov 19, 2014

Yeah, sorry I was a bit overly dramatic there. I just mean treating them as second-class citizens, efficiency-wise, because not doing that would necessitate More Unsafe (and some dubious tricks to deal with unwinding).

@Ericson2314
Contributor

@pnkfelix That is exactly what I was thinking with my "Deallocator" trait proposal. Since the type one parametrizes a Box with is really a (usually zero-sized) proxy, and it is mainly needed for freeing, I figured Deallocator would be a good name for the trait. Allocator would be implemented by the local instance itself.

A variation, to implement @Gankra's Mox trick: make the deallocator trait's methods take the associated allocator. That way metadata can be stored both once per collection and once per pointer (here the allocator trait would probably be implemented with a pointer type), rather than obligating the implementers to find the instance solely from the pointer dealloc is called with.

A very rough sketch (ignoring the raw vs nice traits and other proposed stuff just for the simplicity of the example):

trait Deallocator {
  type A: Allocator;
  // I explain why self is consumed below
  fn dealloc(self, &A, *mut u8);
  // realloc, etc...
}

struct Box<D, T> {
  // This is not &D so that there is no overhead in the global allocator common case.
  // Since D is stored one copy per pointer, it might as well be consumed by `dealloc`.
  per_pointer_metadata: D, 
  raw: *T, // or whatever normally hides in a Box
}

struct Collection<D, T> {
  // stuff in array dealloc'd with this as second arg to `dealloc`
  per_collection_metadata: Arc<D::A>,
  ptrs: [Box<D, T>, ..16],
}

@thestinger

I don't think there's a use case for more stateless allocators. The general purpose allocator is already great, and there's little that can be done to improve over it. Rust's general purpose allocator already supports making isolated arenas rather than using the default ones. The whole point of custom allocator support is enabling allocators with very low constant costs by having a pointer to the arena and making good use of sized deallocation.

@Ericson2314
Contributor

@thestinger I'm confused, is that at me or earlier comments? I made those contortions so the general purpose allocator would fit the allocator API without any overhead whatsoever. Even if the global allocator is the only stateless one in practice, I'd think supporting it without overhead would be reason enough to contort the API.

@pnkfelix pnkfelix mentioned this pull request Dec 22, 2014