runtime: Large maps cause significant GC pauses #9477

Closed
cespare opened this Issue Dec 30, 2014 · 30 comments

Projects

None yet
@cespare
Contributor
cespare commented Dec 30, 2014

I'm on Go 1.4 and linux/amd64. I've noticed this since at least Go 1.2.

Large maps incur large GC pause times. This happens even when the map key and value types do not contain pointers, and when the map isn't changing between GC runs. I assume it comes from the collector traversing internal pointers in the map implementation.

Roughly speaking, with a single map containing millions of entries, one will typically see GC pause times of hundreds of ms.

Here's an example program that shows a few different approaches: http://play.golang.org/p/AeHSLyLz_c. In particular, it compares

  1. Large map with a pointer-typed value
  2. Large map with a pointer-free value
  3. Sharded map
  4. Big slice

In the real program where this issue caused me problems, I have a server that used a very large map (>10M entries) as an index into off-heap data. Even after getting rid of pointers in the map value and sharding the map, the pause times were >100ms which was too large for my use case. So I ended up writing a very simple hashmap implementation using only slices/indexes and that brought the pause times to under 1ms.

I wonder if the map implementation and GC can be made to work together better to mitigate this.

(It's possible that this is expected/unfortunate, or that it will get better with improved GC along with everything else, but I wanted to have a record of it to center the discussion.)

@bradfitz bradfitz added this to the Go1.5Maybe milestone Dec 30, 2014
@joliver
joliver commented Dec 31, 2014

The reference conversation for this support issue can be found here:
https://groups.google.com/forum/#!topic/golang-nuts/baU4PZFyBQQ

My findings at the end of the thread are here:
https://groups.google.com/forum/#!msg/golang-nuts/baU4PZFyBQQ/fCzQelfmbNYJ

@randall77
Contributor

This may be fixed in 1.5 with the concurrent gc. However, the work of scanning the hash tables will not go away, it will just be paid for in smaller chunks. Hash tables will still have overflow pointers so they will still need to be scanned and there are no plans to fix that. I'm open to ideas if anyone has them.

@mish15
mish15 commented Jan 1, 2015

👍. I added some example cases with preallocated maps for comparison.
http://play.golang.org/p/E7z9npFXm-

@joliver agree with the slice issue too. We use arrays for very large blocks where possible, but it's annoying.

@josharian
Contributor

@randall77 on pain of going into a rat hole, I ran @cespare's tests with a few different values of loadFactor.

6.5 6.0 5.5
map[int32]*int32 3070ms 2872ms 2790ms
map[int32]int32 659ms 414ms 403ms
map shards 385ms 386ms 380ms

It's possible that in loadFactor will be worth revisiting, once the concurrent gc has stabilized.

@RLH
Contributor
RLH commented Jan 5, 2015

Is the problem about how long a call to runtime.GC() takes or is it really
about GC latency, how much CPU time the application code (mutator) is
allotted over the course of some time delta, and how much HW needs to be
provisioned to achieved the goal? The 1.5 GC addresses the later. There are
no plans to address how long a call to runtime.GC() takes.

On Fri, Jan 2, 2015 at 2:46 PM, Josh Bleecher Snyder <
notifications@github.com> wrote:

@randall77 https://github.com/randall77 on pain of going into a rat
hole, I ran @cespare https://github.com/cespare's tests with a few
different values of loadFactor.
6.5 6.0 5.5 map[int32]*int32 3070ms 2872ms 2790ms map[int32]int32
659ms 414ms 403ms map shards 385ms 386ms 380ms

It's possible that in loadFactor will be worth revisiting, once the
concurrent gc has stabilized.


Reply to this email directly or view it on GitHub
#9477 (comment).

@joliver
joliver commented Jan 5, 2015

It actually is a problem with the garbage collection itself. The examples thus far call runtime.GC() to more easily demonstrate and expose the issue. In production I have a number of large maps and using the default implementation we have observed garbage collection pauses measuring hundreds of milliseconds. It was bad enough that we dug in to pinpoint exactly what was causing it.

The biggest question being raised in this issue is whether optimizations could be made on certain kinds of structures such as maps where the contained type is known to not contain pointers, such as a map[int]bool, etc.

@RLH
Contributor
RLH commented Jan 5, 2015

If go reduced the observed garbage collection pauses from hundreds of
milliseconds to 10 miliseconds out every 50 milliseconds would this solve
your problem?

On Mon, Jan 5, 2015 at 11:22 AM, Jonathan Oliver notifications@github.com
wrote:

It actually is a problem with the garbage collection itself. The examples
thus far call runtime.GC() to more easily demonstrate and expose the
issue. In production I have a number of large maps and using the default
implementation we have observed garbage collection pauses measuring
hundreds of milliseconds. It was bad enough that we dug in to pinpoint
exactly what was causing it.

The biggest question being raised in this issue is whether optimizations
could be made on certain kinds of structures such as maps where the
contained type is known to not contain pointers, such as a map[int]bool,
etc.


Reply to this email directly or view it on GitHub
#9477 (comment).

@rhysh
Contributor
rhysh commented Jan 5, 2015

I run an application that uses a lot of map and sees similar issues (~600ms pause time with a 1.6GB heap). Decreasing the pause to 10ms at a time would be a big help to this app. However, I wonder if the overall cost of GC could be decreased separately.

@randall77, I read through hashmap.go recently and it looks like the use of overflow buckets may be restricted enough that they could be allocated from a map-local arena instead of on the global heap. It may not have to lean on the general-purpose GC just to keep track of its overflow buckets.

It looks like overflow buckets aren't ever freed except during an evacuation, and that pointers to them are only accessible within small parts of the runtime. The memory for the overflow buckets could be backed by an array similar to hmap.buckets (with a bump-the-pointer allocator), and could be referenced by their offset into the array instead of a real pointer (which would be chased by the GC).

Is this approach possible?

@siritinga

I suppose that the total GC time would be the same or maybe slower if the start/stop takes some time, but if GC runs 20% of the time, instead of 600 ms pause, it would be 3 seconds at 80% your code, 20% the GC. Maybe it is a solution to avoid long pauses in interactive programs but the loss of performance is there in any case.

I wonder if it would be possible to produce garbage faster than it is collected...

@randall77
Contributor

@rhysh, it is an interesting idea to allocate overflow buckets as a single array (perhaps contiguous with the main buckets). Then we could use bucket ids instead of pointers to reference them.

The main problem is that there's no a priori maximum number of overflow buckets needed. It depends on how the hashing works out. It could be as bad as half the size of the main bucket array. You wouldn't want to allocate this worst case ahead of time. Maybe there would be a way to trigger growth on # of used overflow buckets instead of on just the number of entries as we do now. That way, we could have a fixed maximum, say 10%, of overflow buckets allocated at the start and we grow if we run out of them.

@RLH
Contributor
RLH commented Jan 5, 2015

A goroutine that is allocating or creating other GC work such as writing
pointers at a rate faster than the GC can deal with will need to be
throttled. This will allow the GC to complete in a timely fashion. What you
don't want is a goroutine that is not creating additional work for the GC
to also be throttled.

On Mon, Jan 5, 2015 at 4:41 PM, siritinga notifications@github.com wrote:

I suppose that the total GC time would be the same or maybe slower if the
start/stop takes some time, but if GC runs 20% of the time, instead of 600
ms pause, it would be 3 seconds at 80% your code, 20% the GC. Maybe it is a
solution to avoid long pauses in interactive programs but the loss of
performance is there in any case.

I wonder if it would be possible to produce garbage faster than it is
collected...


Reply to this email directly or view it on GitHub
#9477 (comment).

@rmdamiao
rmdamiao commented Jan 5, 2015

We would not have this kind of problem if go provided an "unmanaged" library which could implement manual memory management for regions of heap to be ignored by the GC. Neither a pointer to any address to the "unmanaged" heap nor any pointer type stored inside the "unmanaged" heap would be considered for GC. This could be a very simple solution which would solve once and for all the problems that go has with long-lived pointer values and which will probably never be solved by GC.

@rhysh
Contributor
rhysh commented Jan 5, 2015

Right @randall77, there's no limit on the number of overflow buckets. If a map stays about the same size but has a lot of churn, it seems like each of the primary buckets could grow to have a considerable number of overflow buckets chained to them - the random distribution of elements could bump each bucket over 8 or 16 elements, without ever increasing the average beyond the load factor. Since the overflow buckets aren't freed when they're emptied via runtime·mapdelete, the expansion would be permanent.

There'd probably need to be an "overflow bucket evacuation" process that would operate like the current evacuation to allow resizing of the new array. Added complexity for sure, but it may be worth it.

Do you happen to have statistics on how many overflow buckets end up used for long-lived maps with churn? This could maybe be collected at runtime by chasing the overflow map pointers, or offline by inspecting debug.WriteHeapDump output?

@randall77
Contributor

@rhysh, no I don't have any data on long-lived maps.

I have been meaning to implement compacting and freeing of the overflow buckets when the map has never been iterated over. For maps which have had an iterator started on them, however, I don't see any easy way to compact. Your idea of repurposing evacuation to do compacting may work. We'd then have two types of evacuation - to a map of the same size and a map of double the size. Both would be triggered by running out of the overflow bucket pool. We'd have to figure out which to do at that point (how?).

@rhysh
Contributor
rhysh commented Jan 6, 2015

@randall77 I haven't thought through how iteration interacts with this. Maybe golang-dev has some good ideas?

My rough impression is that the evacuation work has to be complete by the time we'd need a subsequent resize. It looks like you guarantee this by doing evacuate work on writes and doubling the size. If the size doesn't double, then we might run into problems (needing another evacuate before the current one has finished). I wonder if the overflow compaction could happen only when the table was already twice the size it needed to be.

If we've got a table with 6000 elements, it will fit in a hmap with 1024 buckets (if loadFactor is 6.5). When it runs out of overflow buckets due to churn, maybe we resize it to 2048 buckets. When it runs out of overflow buckets again (but still has ~6000 elements), we can notice that it's already twice as big as required and evacuate it to a new table of the same size. This same-size evacuation might have to run twice as fast to finish in time, but it's still a constant factor.

@dvyukov
Member
dvyukov commented Jan 11, 2015

If we have a map[k]v where both k and v does not contain pointers and we want to improve scan performance, then we can do the following.
Add 'allOverflow []unsafe.Pointer' to hmap and store all overflow buckets in it. Then mark bmap as noScan. This will make scanning very fast, as we won't scan any user data.
In reality it will be somewhat more complex, because we will need to remove old overflow buckets from allOverflow. And also it will increase size of hmap, so it may require some reshuffling of data as well.

@dvyukov
Member
dvyukov commented Jan 11, 2015

I am not sure how important are hashmaps w/o pointers, somebody needs to do a study. I feel that they are not very frequently used, but on the other hand it is a nice optimization option if you have a large map and can refactor the code so that k/v do not contain pointers (e.g. change string to [8]byte).

@vessenes

I would dearly love this optimization as an option; I posted to https://groups.google.com/forum/#!topic/golang-nuts/pHYverdFcLc with a description of what I'm facing. But think 10+second stop the world GC halts frequently. @cespare indicated that he had good success rolling his own implementation, If for some reason his code isn't shareable, I'll see what I can cook up.

@ianlancetaylor
Contributor

This is all completely different on tip and I think the large pauses are already gone there.

@siritinga

@ianlancetaylor you mean that tip should have lower GC times or simply lower pauses because it is concurrent? In a quick check, I see that GC total time is almost double than 1.4.1 for a big map[string]string but the time is in the first field of the gctrace debug instead of the third (I think the first is the mark phase that now is concurrent?).

@ianlancetaylor
Contributor

Lower pause times. Which is what this big is about.

@RLH
Contributor
RLH commented Jan 26, 2015

Tip contains a lot of code to check the validity of the GC phases and
help us find bugs. I would not draw any conclusions from these numbers.

On Sun, Jan 25, 2015 at 3:51 AM, siritinga notifications@github.com wrote:

@ianlancetaylor https://github.com/ianlancetaylor you mean that tip
should have lower GC times or simply lower pauses because it is concurrent?
In a quick check, I see that GC total time is almost double than 1.4.1 for
a big map[string]string but the time is in the first field of the gctrace
debug instead of the third (I think the first is the mark phase that now is
concurrent?).


Reply to this email directly or view it on GitHub
#9477 (comment).

@dvyukov
Member
dvyukov commented Jan 26, 2015

Mailed a change that makes GC ignore maps when both key and value do not contain pointers:
https://go-review.googlesource.com/#/c/3288/
I've tested it on a map[int]int with 2e8 elements. The change reduces scan time from 8 seconds to zero.

@dvyukov dvyukov added a commit that referenced this issue Jan 27, 2015
@dvyukov dvyukov runtime: do not scan maps when k/v do not contain pointers
Currently we scan maps even if k/v does not contain pointers.
This is required because overflow buckets are hanging off the main table.
This change introduces a separate array that contains pointers to all
overflow buckets and keeps them alive. Buckets themselves are marked
as containing no pointers and are not scanned by GC (if k/v does not
contain pointers).

This brings maps in line with slices and chans -- GC does not scan
their contents if elements do not contain pointers.

Currently scanning of a map[int]int with 2e8 entries (~8GB heap)
takes ~8 seconds. With this change scanning takes negligible time.

Update #9477.

Change-Id: Id8a04066a53d2f743474cad406afb9f30f00eaae
Reviewed-on: https://go-review.googlesource.com/3288
Reviewed-by: Keith Randall <khr@golang.org>
85e7bee
@cespare
Contributor
cespare commented Jan 27, 2015

@dvyukov Thanks so much!

@dvyukov
Member
dvyukov commented Jan 27, 2015

The change is merged.
Is here anything else actionable?

@vessenes

I'm sorry to say there is; I have been having weird corruption issues with maps post-GC while running on tip, and git bisect fingered this commit.

@dvyukov, where do you want me to include details? Here, or on golang-devs?

@dvyukov
Member
dvyukov commented Feb 12, 2015

@vessenes please file a separate issue with repro instructions

@vessenes

The bug I mentioned got discussed on -dev and fixed by @dvyukov.

@cespare
Contributor
cespare commented Apr 4, 2015

Closing this out as the low-hanging fruit was fixed by @dvyukov and there's no clear next actions wrt pointer-containing map types. Of course I'm still looking forward to future map improvements :)

With @dvyukov's change, GC pause behavior for maps should more closely resemble that for the other aggregate data types and should be easier to reason about in general (say, if you're trying to reduce pause latency and you know the "duration is proportional to # of pointers on the heap" rule of thumb).

@cespare cespare closed this Apr 4, 2015
@mikioh mikioh modified the milestone: Go1.5Maybe, Go1.5 Apr 10, 2015
@gopherbot gopherbot locked and limited conversation to collaborators May 26, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.