New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: replace span freelist with free bitmap #8894

Open
dvyukov opened this Issue Oct 7, 2014 · 2 comments

Comments

Projects
None yet
3 participants
@dvyukov
Member

dvyukov commented Oct 7, 2014

The slowest parts of mallocgc and sweeping are freelist manipulation.
The idea is to replace the freelist with bitmap of free blocks. For spans with objects
>= 128 bytes, we allocate an additional word in MSpan for this; for spans < 128
bytes, we reserve the necessary amount of words at the end of the span.

Here is a prototype CL:
https://golang.org/cl/113640043/

This has several advantages:
1. We always allocate in memory order.
2. We don't touch MSpan nor memory blocks themselves in mallocgc (freemap is cached in
MCache).
3. We don't touch blocks at all in MSpan_Sweep (only mark bits and free bitmap).
4. We don't work with linked lists in MSpan_Sweep.
5. We don't walk freelist in MSpan_Sweep to mark already free objects (bitmap set is
idempotent).
@randall77

This comment has been minimized.

Contributor

randall77 commented Oct 7, 2014

Comment 1:

Now that the garbage collector is fully precise, we can drop free lists before GC
instead of walking and marking them during sweep.  That will eliminate #5 and get us #1
also.
I'm not sure #2 is actually an advantage.  We want to touch the objects we're
allocating, the caller is going to want them in the cache.  #3 could actually be
helpful, though.
I'm not convinced linked list operations are really that slow.  Popping something off
the head of the list is just two loads and a store.
@dvyukov

This comment has been minimized.

Member

dvyukov commented Oct 8, 2014

Comment 2:

To answer this question we simply need to benchmark. But I don't have time for it right
now.
But I can say that the freelist operation is the slowest part of mallocgc.
Regarding caching, yes, caller wants the block in cache. But there is significant
difference between async store that goes into write buffer causing prefetch into cache
and a load of uncached data with a subsequent control dependency on the value. See eg:
https://golang.org/cl/100230043/diff/80001/src/pkg/runtime/mgc0.c

@rsc rsc added this to the Unplanned milestone Apr 10, 2015

@rsc rsc removed release-none labels Apr 10, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment