runtime: improve mcentral scalability #37487
Change https://golang.org/cl/221179 mentions this issue.
Change https://golang.org/cl/221178 mentions this issue.
Change https://golang.org/cl/221182 mentions this issue.
Change https://golang.org/cl/221181 mentions this issue.
Change https://golang.org/cl/221180 mentions this issue.
Change https://golang.org/cl/221183 mentions this issue.
Change https://golang.org/cl/221184 mentions this issue.
Currently markrootSpans, the scanning routine which scans span specials (particularly finalizers) as roots, uses sweepSpans to shard work and find spans to mark. However, as part of a future CL to change span ownership and how mcentral works, we want to avoid having markrootSpans use the sweep bufs to find specials, so in this change we introduce a new mechanism.

Much like for the page reclaimer, we set up a per-page bitmap where the first page for a span is marked if the span contains any specials, and unmarked if it has no specials. This bitmap is updated by addspecial, removespecial, and during sweeping. markrootSpans then shards this bitmap into mark work and markers iterate over the bitmap looking for spans with specials to mark.

Unlike the page reclaimer, we don't need to use the pageInUse bits because having a special implies that a span is in-use. While in terms of computational complexity this design is technically worse, because it needs to iterate over the mapped heap, in practice this iteration is very fast (we can skip over large swathes of the heap very quickly) and we only look at spans that have any specials at all, rather than having to touch each span.

This new implementation of markrootSpans is behind a feature flag called go115NewMarkrootSpans.

Updates #37487.

Change-Id: I8ea07b6c11059f6d412fe419e0ab512d989377b8
Reviewed-on: https://go-review.googlesource.com/c/go/+/221178
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
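A minimal sketch of the bitmap mechanism this commit describes, assuming a single flat bitmap with one bit per 8 KiB heap page (the real patch shards the bitmap across heap arenas; the names `specialsBits`, `markHasSpecials`, and `forEachSpecialSpan` are invented for illustration):

```go
package sketch

import "math/bits"

const pageShift = 13 // 8 KiB pages, as in the Go runtime

// specialsBits holds one bit per heap page. Bit i is set iff the span
// whose first page is page i carries at least one special (e.g. a
// finalizer). addspecial/removespecial-style code sets and clears these
// bits; the runtime does such updates atomically, plain ops are shown
// here for brevity.
var specialsBits []uint64

func markHasSpecials(spanBase uintptr) {
	i := spanBase >> pageShift
	specialsBits[i/64] |= 1 << (i % 64)
}

func clearHasSpecials(spanBase uintptr) {
	i := spanBase >> pageShift
	specialsBits[i/64] &^= 1 << (i % 64)
}

// forEachSpecialSpan visits the first page of every span that has
// specials. Whole zero words let it skip 64 pages (512 KiB) at a time,
// which is why a linear scan over the mapped heap is fast in practice.
func forEachSpecialSpan(fn func(pageIdx uintptr)) {
	for w, word := range specialsBits {
		for word != 0 {
			bit := uintptr(bits.TrailingZeros64(word))
			fn(uintptr(w)*64 + bit)
			word &= word - 1 // clear the lowest set bit
		}
	}
}
```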
This change copies the gcSweepBuf data structure into a new file and renames it spanSet. It will serve as the basis for a heavily modified version of the gcSweepBuf data structure for the new mcentral implementation. We move it into a separate file now for two reasons:

1. We will need both implementations as they will coexist simultaneously for a time.
2. By creating it now in a new change it'll make future changes which modify it easier to review (rather than introducing the new file then).

Updates #37487.

Change-Id: If80603cab6e813a1ee2e5ecd49dcde5d8045a6c7
Reviewed-on: https://go-review.googlesource.com/c/go/+/221179
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This change adds a global pool of spanSetBlocks to the spanSet data structure and adds support for eagerly freeing these blocks back to the pool if a block goes empty. This change prepares us to use this data structure in more places in the runtime by allowing reuse of spanSetBlock.

Updates #37487.

Change-Id: I0752226e3667a9e3e1d87c9b66edaedeae1ac23f
Reviewed-on: https://go-review.googlesource.com/c/go/+/221180
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
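The shape of such a block pool is roughly the following. This is an illustrative sketch only: the runtime cannot use `sync.Pool` internally and keeps these blocks on its own lock-free list instead, and the block size and field layout here are simplified assumptions.

```go
package sketch

import "sync"

const spanSetBlockEntries = 512 // illustrative block size

// spanSetBlock is one fixed-size chunk of a spanSet's backing store.
// In the runtime the slots hold *mspan; uintptr stands in here.
type spanSetBlock struct {
	spans [spanSetBlockEntries]uintptr
}

// spanSetBlockPool is the global pool shared by all spanSets, standing
// in for the runtime's internal lock-free block allocator.
var spanSetBlockPool = sync.Pool{
	New: func() interface{} { return new(spanSetBlock) },
}

func allocSpanSetBlock() *spanSetBlock {
	return spanSetBlockPool.Get().(*spanSetBlock)
}

// freeSpanSetBlock eagerly returns a block that has gone empty, so any
// spanSet can reuse it instead of growing its own storage.
func freeSpanSetBlock(b *spanSetBlock) {
	*b = spanSetBlock{} // clear stale pointers before reuse
	spanSetBlockPool.Put(b)
}
```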
This change implements the spanSet data structure, which is based off of the gcSweepBuf data structure. While the general idea is the same (one has two of these, which one switches between every GC cycle; one to push to and one to pop from), there are some key differences.

Firstly, we never have a need to iterate over this data structure, so delete numBlocks and block. Secondly, we want to be able to pop from the front of the structure concurrently with pushes to the back. As a result we need to maintain both a head and a tail, and this change introduces an atomic headTail structure similar to the one used by sync.Pool. It also implements popfirst in a similar way.

As a result of this headTail, we need to be able to explicitly reset the length, head, and tail when it goes empty at the end of sweep termination, so add a reset method.

Updates #37487.

Change-Id: I5b8ad290ec32d591e3c8c05e496c5627018074f6
Reviewed-on: https://go-review.googlesource.com/c/go/+/221181
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
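The packed head/tail index can be sketched like this, in the style of sync.Pool's dequeue: both 32-bit cursors share one uint64, so a single atomic add can claim a slot from either end. The patch uses a very similar headTailIndex type, but the free functions below are simplified stand-ins for its methods:

```go
package sketch

import "sync/atomic"

// headTailIndex packs a 32-bit head (next slot to pop) in the high half
// and a 32-bit tail (next slot to push) in the low half.
type headTailIndex uint64

func makeHeadTailIndex(head, tail uint32) headTailIndex {
	return headTailIndex(uint64(head)<<32 | uint64(tail))
}

func (h headTailIndex) head() uint32 { return uint32(h >> 32) }
func (h headTailIndex) tail() uint32 { return uint32(h) }

// incHead atomically claims a slot for a popper at the front.
func incHead(p *uint64) headTailIndex {
	return headTailIndex(atomic.AddUint64(p, 1<<32))
}

// incTail atomically claims a slot for a pusher at the back.
func incTail(p *uint64) headTailIndex {
	return headTailIndex(atomic.AddUint64(p, 1))
}

// reset zeroes both cursors once the set is known to be empty, e.g. at
// the end of sweep termination, so the indices don't grow without bound.
func reset(p *uint64) {
	atomic.StoreUint64(p, 0)
}
```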
Change https://golang.org/cl/230497 mentions this issue.
The span set data structure may leak blocks due to a race in the logic to check whether it's safe to free a block. The simplest example of this race is between two poppers:

1. Popper A claims slot spanSetEntries-2.
2. Popper B claims slot spanSetEntries-1.
3. Popper A gets descheduled before it subtracts from block.used.
4. Popper B subtracts from block.used, sees that it claimed spanSetEntries-1, but also that block.used != 0, so it returns.
5. Popper A comes back and subtracts from block.used, but it didn't claim spanSetEntries-1, so it also returns.

The spine is left with a stale block pointer and the block later gets overwritten by pushes, never to be re-used again. The problem here is that we designate the claimer of slot spanSetEntries-1 to be the one who frees the block, but that may not be the thread that actually does the last subtraction from block.used.

Fixing this problem is tricky, and the fundamental problem there is that block.used is not stable: it may be observed to be zero, but that doesn't necessarily mean you're the last popper!

Do something simpler: keep a counter of how many pops have happened to a given block instead of block.used. This counter monotonically increases when a pop is _completely done_. Because this counter is monotonically increasing, and only increases when a popper is done, we know for sure that whichever popper is the last to increase it (i.e. its value is spanSetBlockEntries) is also the last popper in the block. Because the race described above still exists, the last popper may not be the one which claimed the last slot in the block, but we know for certain nobody else is popping from that block anymore, so we can safely free it. Finally, because pops serialize with pushes to the same slot, we need not worry about concurrent pushers at all.

Updates #37487.

Change-Id: I6697219372774c8ca7d8ee6895eaa230a64ce9e1
Reviewed-on: https://go-review.googlesource.com/c/go/+/230497
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
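A sketch of the fix, under the simplifying assumption that slot claiming and the spine are handled elsewhere. The `popped` counter and `spanSetBlockEntries` follow the commit message; `pop`'s signature and the `free` callback are invented for illustration:

```go
package sketch

import "sync/atomic"

const spanSetBlockEntries = 512 // illustrative block size

type spanSetBlock struct {
	// popped counts pops that have *completely finished* on this block.
	// Unlike the old block.used, it only ever moves in one direction.
	popped uint32
	spans  [spanSetBlockEntries]atomic.Pointer[byte] // *mspan in the runtime
}

// pop empties slot i of b, then retires its claim. Whichever popper
// drives popped to spanSetBlockEntries is provably the last one touching
// the block (even if it didn't claim the last slot), so only it frees
// the block. Pops serialize with pushes on the same slot via the atomic
// swap, so concurrent pushers need no extra handling.
func pop(b *spanSetBlock, i uint32, free func(*spanSetBlock)) *byte {
	s := b.spans[i].Swap(nil)
	if atomic.AddUint32(&b.popped, 1) == spanSetBlockEntries {
		free(b) // safe: no other popper can still reference b
	}
	return s
}
```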
Currently mcentral is implemented as a couple of linked lists of spans protected by a lock. Unfortunately this design leads to significant lock contention.

The span ownership model is also confusing and complicated. In-use spans jump between being owned by multiple sources, generally some combination of a gcSweepBuf, a concurrent sweeper, an mcentral or an mcache.

So first, to address contention, this change replaces those linked lists with gcSweepBufs, which have an atomic fast path. Then, we change up the ownership model: a span may be simultaneously owned only by an mcentral and the page reclaimer. Otherwise, an mcentral (which now consists of sweep bufs), a sweeper, or an mcache are the sole owners of a span at any given time. This dramatically simplifies reasoning about span ownership in the runtime.

As a result of this new ownership model, sweeping is now driven by walking over the mcentrals rather than having its own global list of spans. Because we no longer have a global list and we traditionally haven't used the mcentrals for large object spans, we no longer have anywhere to put large objects. So, this change also makes it so that we keep large object spans in the appropriate mcentral lists.

In terms of the static lock ranking, we add the spanSet spine locks in pretty much the same place as the mcentral locks, since they have the potential to be manipulated both on the allocation and sweep paths, like the mcentral locks.

This new implementation is turned on by default via a feature flag called go115NewMCentralImpl.

Benchmark results for 1 KiB allocation throughput (5 runs each):

```
name \ MiB/s   go113       go114       gotip       gotip+this-patch
AllocKiB-1     1.71k ± 1%  1.68k ± 1%  1.59k ± 2%   1.71k ± 1%
AllocKiB-2     2.46k ± 1%  2.51k ± 1%  2.54k ± 1%   2.93k ± 1%
AllocKiB-4     4.27k ± 1%  4.41k ± 2%  4.33k ± 1%   5.01k ± 2%
AllocKiB-8     4.38k ± 3%  5.24k ± 1%  5.46k ± 1%   8.23k ± 1%
AllocKiB-12    4.38k ± 3%  4.49k ± 1%  5.10k ± 1%  10.04k ± 0%
AllocKiB-16    4.31k ± 1%  4.14k ± 3%  4.22k ± 0%  10.42k ± 0%
AllocKiB-20    4.26k ± 1%  3.98k ± 1%  4.09k ± 1%  10.46k ± 3%
AllocKiB-24    4.20k ± 1%  3.97k ± 1%  4.06k ± 1%  10.74k ± 1%
AllocKiB-28    4.15k ± 0%  4.00k ± 0%  4.20k ± 0%  10.76k ± 1%
```

Fixes #37487.

Change-Id: I92d47355acacf9af2c41bf080c08a8c1638ba210
Reviewed-on: https://go-review.googlesource.com/c/go/+/221182
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
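The resulting mcentral shape can be sketched as follows. Each size class keeps two pairs of spanSets (spans with free slots vs. full spans); index parity against the global sweep generation decides which set is "swept" and which is "unswept" this cycle, so the roles flip automatically as sweepgen advances. The method names follow the patch, but everything else is omitted:

```go
package sketch

type spanSet struct{ /* lock-free push/pop, as sketched above */ }

type mcentral struct {
	// partial holds spans with at least one free object slot; full
	// holds spans with none. One set of each pair holds swept spans,
	// the other unswept spans, alternating every GC cycle.
	partial [2]spanSet
	full    [2]spanSet
}

// sweepgen increases by 2 per GC cycle, so sweepgen/2%2 alternates
// between 0 and 1: last cycle's swept set becomes this cycle's unswept
// set without any explicit swap.
func (c *mcentral) partialUnswept(sweepgen uint32) *spanSet {
	return &c.partial[1-sweepgen/2%2]
}

func (c *mcentral) partialSwept(sweepgen uint32) *spanSet {
	return &c.partial[sweepgen/2%2]
}

func (c *mcentral) fullUnswept(sweepgen uint32) *spanSet {
	return &c.full[1-sweepgen/2%2]
}

func (c *mcentral) fullSwept(sweepgen uint32) *spanSet {
	return &c.full[sweepgen/2%2]
}
```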
This change removes the old markrootSpans implementation and deletes the feature flag.

Updates #37487.

Change-Id: Idb5a2559abcc3be5a7da6f2ccce1a86e1d7634e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/221183
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Austin Clements <austin@google.com>
This change deletes the old mcentral implementation from the code base and the newMCentralImpl feature flag along with it.

Updates #37487.

Change-Id: Ibca8f722665f0865051f649ffe699cbdbfdcfcf2
Reviewed-on: https://go-review.googlesource.com/c/go/+/221184
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
After #35112 landed we saw some notable improvements in allocator scalability, but performance still plateaus at around 20-24 cores. The cause is `mcentral`, which has become the new scalability bottleneck. It's a bottleneck because each per-size-class `mcentral` is a pair of linked lists protected by a lock, and the lock covers all iteration and operations on the `mcentral`. While a single addition or removal from a linked list isn't a source of great contention, caching a span from an `mcentral` involves iteration, which is a significant source of contention on this lock.

Furthermore, the code around `mcentral` is fairly confusing. The main source of this confusion is span ownership: when should a span be in an `mcentral`? Currently a span may be owned simultaneously by:

- an `mcentral`,
- `mheap_.sweepSpans`, and
- an `mcache`.

This makes reasoning about the span lifecycle and ownership tricky.

With some refactoring, I think we can achieve scalability and also change the span ownership model to limit the number of simultaneous owners. @aclements suggested before that we could repurpose the `gcSweepBuf`, a data structure built for fast concurrent access, to replace the linked lists in the `mcentral`. We can take this idea further and also use these data structures for sweeping, instead of having a separate `mheap_.sweepSpans`. This means that `markrootSpans` will need a different mechanism for finding spans with specials, but we can use a bitmap for that, similar to the page reclaimer.

By unifying the sweep queue with `mcentral` we can also make it so that concurrent sweepers take complete ownership of a span, which makes reasoning about `(*mspan).sweep` much easier as well. Finally, since we don't need to acquire a lock, there's nothing wrong with an `mcache` taking complete ownership of a span.

There remains one place where multiple span ownership would still exist, and that's the page reclaimer, which will probably never be able to take ownership of a span. But that's OK, since it only ever sweeps spans which will be freed to the heap, so all the other mechanisms can just ignore spans which are picked up by the page reclaimer (identified by their `sweepgen` value).
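For reference, the `sweepgen`-based handoff that makes exclusive sweeper ownership possible already exists in the runtime: a span whose `sweepgen` is two behind the heap's global sweep generation needs sweeping, one behind means a sweeper currently owns it, and equal means it is swept. A minimal sketch of how a sweeper takes ownership with a single CAS and no lock (helper names invented):

```go
package sketch

import "sync/atomic"

type mspan struct {
	sweepgen uint32 // compared against the heap's global sweep generation
}

// tryAcquireForSweep attempts to claim s for sweeping in generation sg.
// At most one concurrent caller can win the CAS, so the winner owns the
// span exclusively until it publishes the span as swept.
func tryAcquireForSweep(s *mspan, sg uint32) bool {
	return atomic.CompareAndSwapUint32(&s.sweepgen, sg-2, sg-1)
}

// releaseAfterSweep publishes s as swept for generation sg. Other
// mechanisms can recognize spans held by a sweeper (or by the page
// reclaimer) by inspecting this same field and simply skip them.
func releaseAfterSweep(s *mspan, sg uint32) {
	atomic.StoreUint32(&s.sweepgen, sg)
}
```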