make boxes self-describing #1493

Closed
nikomatsakis opened this Issue · 6 comments

4 participants

@nikomatsakis

Today we have a map that maps from each box allocation to a type descriptor describing its payload. This requires hashtable lookups on malloc/free and seems inefficient. However, to do cycle collection or any form of GC, boxes DO need to be self-describing. Therefore, I propose we lay out a box like this:

struct box {
    tydesc *desc;
    int refcnt;
    // <payload> follows inline, at an alignment-dependent offset
};

This has a few advantages:

  • no more map lookups on alloc/free; memory consumption should also be lower
  • the desc and refcnt are always at known offsets, regardless of the alignment of payload
  • we can unify the paths for opaque types like closures, objects, and ifaces. They should also begin with a type descriptor.

I think it would be best if the type descriptor describes the entire box (including the desc and refcnt fields). The reason is that this avoids the need to consider the alignment of payload when finding the offset of the data described by the type descriptor.
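
To make the alignment point concrete, here is a minimal C sketch (the tydesc fields and the box_payload helper are hypothetical) of the arithmetic a heap walker would need if the descriptor covered only the payload; with the descriptor covering the whole box, the described data simply starts at offset 0.

#include <stddef.h>
#include <stdint.h>

typedef struct tydesc { size_t size; size_t align; } tydesc;  /* assumed fields */

/* Hypothetical helper: find the payload when the descriptor covers only it.
 * Assumes align is a power of two. */
static void *box_payload(void *box, const tydesc *payload_desc) {
    uintptr_t p = (uintptr_t)box + sizeof(tydesc *) + sizeof(int); /* skip desc + refcnt */
    p = (p + payload_desc->align - 1) & ~(uintptr_t)(payload_desc->align - 1); /* round up */
    return (void *)p;
}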

@jdm

If this ends up going through, please file a bug to update the debuginfo representation of boxes.

@brson

This is a major perf issue and is why our binarytrees shootout benchmark takes 20x as long as the gcc version. I'm going to be updating that benchmark to use unique pointers instead of boxes.

@brson added a commit that referenced this issue:
bench: Update shootout-binarytrees to use unique boxes
Shared boxes have a huge performance overhead due to #1493
c6f62b6
@nikomatsakis

I started looking into fixing this. What makes it annoying is that we currently rely on the map to enumerate the allocations we have done. Probably the best way to fix this is to rewrite the allocators to use per-task arenas; since each allocation would be self-describing, we could then walk the arena fairly easily by examining the type descriptors. The downside is that we'll have to handle the free logic ourselves (which is, admittedly, not terribly complex to do).
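
A rough sketch of such a walk, assuming each allocation begins with the struct box proposed above and that its type descriptor records the total rounded-up size of the allocation (the size field here is an assumption):

/* Sketch only: hop from box to box through a packed per-task arena.
 * Relies on struct box and tydesc as sketched above, with desc->size
 * covering header, payload, and padding. */
static void walk_arena(char *p, char *end, void (*visit)(struct box *)) {
    while (p < end) {
        struct box *b = (struct box *)p;
        visit(b);
        p += b->desc->size;
    }
}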

@nikomatsakis

I've thought about this some more. I think the best thing for now is just to use an optimized allocator (jemalloc, say) and add a small header so that we can keep a doubly linked list of all outstanding allocations. The header for every box would be:

struct box_header {
    type_desc *td;
    int ref_count;
    box_header *prev;
    box_header *next;
};

It's a pity to have so much per-object overhead, but I do not believe the other schemes I was thinking about will be efficient unless we have a compacting collector, and I have no interest in implementing THAT right now. Basically, although we could keep a packed heap and walk from allocation to allocation, we'd have no efficient way to skip the allocations which have been freed, so we'd end up walking every allocation the task has ever performed. That is fine if you're doing a compacting collection, since fragmentation stays low, but unless we try to get really smart it seems to me that fragmentation could become quite high.
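
For concreteness, a hedged sketch of what the alloc/free paths might look like under this scheme, using the box_header above; the per-task list head task_boxes and the function names are made up for illustration:

#include <stdlib.h>

static box_header *task_boxes;  /* hypothetical per-task list head */

box_header *box_alloc(type_desc *td, size_t payload_size) {
    box_header *b = malloc(sizeof(box_header) + payload_size);
    b->td = td;                       /* error handling elided */
    b->ref_count = 1;
    b->prev = NULL;
    b->next = task_boxes;
    if (task_boxes) task_boxes->prev = b;
    task_boxes = b;                   /* push onto the task's list */
    return b;
}

void box_free(box_header *b) {
    if (b->prev) b->prev->next = b->next;
    else task_boxes = b->next;        /* b was the list head */
    if (b->next) b->next->prev = b->prev;
    free(b);
}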

If anyone thinks they have a better idea, let me know. =) Otherwise I will implement it this way (it might be worth doing as an intermediate step anyhow).

@ssylvan

A random idea to reduce the per-object overhead: observe that you only really need the "next object" pointer for objects located right before free space (for runs of non-free objects you can just use the object sizes to get to the next object). Since those objects are adjacent to free space, you can store the pointer in that free space, so the header structure would be just td + ref_count. Basically, the first header in a run of free objects would signify that it's been freed somehow (e.g. ref_count == 0) and also store a "skip distance" to the next non-free object. The skip distance could be stored in the space the "td" pointer would normally take, as it's not needed for a freed object.

This skip distance can be updated opportunistically when freeing an object: if the next object down is also free, set your skip distance to its skip distance plus your own size. This obviously won't get you all the way there (although stack-owned allocations tend to cause deallocations in reverse allocation order, so it may work out well!). The first time you traverse the arena you'd do a real patch-up, merging runs of free space so that the first object in each run skips past the whole thing. For a pathological pattern of deallocations you can end up having to traverse every single object (including freed ones) on that first pass.
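
A minimal sketch of that opportunistic coalescing; the union trick, the ref_count == 0 free marker, and all names here are assumptions drawn from the description above:

#include <stddef.h>

typedef struct type_desc type_desc;   /* opaque, as above */

/* A freed header reuses the td slot as a byte distance ("skip")
 * to the next potentially-live object; ref_count == 0 marks it free. */
struct hdr {
    union { type_desc *td; size_t skip; } u;
    int ref_count;
};

void mark_free(struct hdr *h, size_t my_size, char *arena_end) {
    struct hdr *next = (struct hdr *)((char *)h + my_size);
    h->ref_count = 0;
    h->u.skip = my_size;
    /* Opportunistic merge: absorb an already-free neighbor's skip. */
    if ((char *)next < arena_end && next->ref_count == 0)
        h->u.skip += next->u.skip;
}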

@nikomatsakis was assigned
@nikomatsakis

That's an interesting idea. I think what I'll do is implement this in stages: first, the "large header" variant. Then, open a new bug to optimize local allocations better if it seems necessary. The work towards the "large header" variant basically has to be done anyways. (I'm part of the way through now, still in testing phase)

@nikomatsakis closed this issue
(A commit that referenced this issue has since been removed from the repository and is no longer available.)
@jdm reopened this
@nikomatsakis reopened this
@mitsugu pushed a commit that referenced this issue:
@nikomatsakis: make boxes self-describing (fixes #1493) (take 2)
this will be used to generate a new snapshot.
196d69b