
runtime: GC causes latency spikes with single processor #18534

Closed · tarm opened this issue Jan 6, 2017 · 14 comments · 6 participants

@tarm commented Jan 6, 2017
(This may be the same as #14812, but the discussion for that issue has veered towards sync.Pool and there are many processors available in the traces people have posted to that issue.)

What version of Go are you using (go version)?

go1.7.4. Testing with the latest master branch does not make a difference for my real application.

What operating system and processor architecture are you using (go env)?

Linux on an x64 single-core 1.4GHz Atom processor

What did you do?

It does not run on the playground (it takes too long and cannot write a trace file), but here is a minimal test case that reproduces the issue:
https://play.golang.org/p/aJmmumzKpD
(original example that was slightly more complicated: https://play.golang.org/p/_tQTfBR33g)

Specifically, that reproduces the approximate amount of memory, GC frequency, and GC irregularity that I see in our application. Concretely, this reproduction test case has GCs that collect 50-100MB but also sometimes cause application delays with long tails on the order of 100ms.

To mimic the single core processor from my laptop, I was running the test with CPU affinity like this: "GODEBUG=gcpacertrace=1,gctrace=1 taskset -c 0 ./main"

What did you expect to see?

The main goroutine is driven off of a 100ms tick. Ideally that tick would arrive every 100ms, and the main goroutine would run immediately after the tick and finish.

What did you see instead?

Sometimes the tick gets delayed and sometimes the main goroutine gets delayed even after the tick fires. Measuring the deviation away from the expected 100ms interval, I see a latency distribution like this:

  0.0% -64.359857ms
  1.0% -41.943823ms
  5.0% -19.403262ms
  50.0% 121.887µs
  95.0% 20.226485ms
  99.0% 40.27582ms
  100.0% 75.602625ms

The mean is as expected, but the 75ms deviation is much higher than we want to see. (The negative values are the ticker catching up after previous delays).

The gctrace output before the 75ms delay looks like this:

gc 79 @38.948s 10%: 0.099+133+0.59 ms clock, 0.099+78/33/0+0.59 ms cpu, 72->78->70 MB, 78 MB goal, 1 P
pacer: sweep done at heap size 76MB; allocated 6MB of spans; swept 10013 pages at +2.094311e-004 pages/byte
pacer: assist ratio=+4.354140e+000 (scan 103 MB in 116->140 MB) workers=0+1
Higher latency than expected at iteration 395.  Tick was delayed 26.735579ms
pacer: H_m_prev=73777328 h_t=+6.622503e-001 H_T=122636387 h_a=+7.576055e-001 H_a=129671440 h_g=+1.000000e+000 H_g=147554656 u_a=+4.951149e-001 u_g=+2.500000e-001 W_a=51445176 goalΔ=+3.377497e-001 actualΔ=+9.535521e-002 u_a/u_g=+1.980459e+000
gc 80 @39.688s 10%: 0.11+118+0.26 ms clock, 0.11+29/30/28+0.26 ms cpu, 116->123->55 MB, 140 MB goal, 1 P
pacer: sweep done at heap size 55MB; allocated 0MB of spans; swept 15855 pages at +3.785020e-004 pages/byte
pacer: assist ratio=+5.837807e+000 (scan 85 MB in 96->111 MB) workers=0+1
pacer: H_m_prev=58283288 h_t=+7.367016e-001 H_T=101220679 h_a=+7.406318e-001 H_a=101449744 h_g=+1.000000e+000 H_g=116566576 u_a=+2.500000e-001 u_g=+2.500000e-001 W_a=34801104 goalΔ=+2.632984e-001 actualΔ=+3.930191e-003 u_a/u_g=+1.000000e+000
gc 81 @40.666s 9%: 0.11+56+0.20 ms clock, 0.11+0/16/39+0.20 ms cpu, 96->96->37 MB, 111 MB goal, 1 P
pacer: sweep done at heap size 37MB; allocated 0MB of spans; swept 12413 pages at +3.744516e-004 pages/byte
pacer: assist ratio=+1.236822e+001 (scan 62 MB in 70->75 MB) workers=0+1
Higher latency than expected at iteration 409.  Tick was delayed 75.602625ms

My lay-person reading of the trace is that the "child" garbage-generating goroutines are asked to assist with the GC, but then they end up blocking execution of both the runtime.timerproc goroutine and the main goroutine.

This seems like a scheduler issue: goroutines that generate minimal garbage should be prioritized over goroutines assisting with the GC.

(Attached: full gctrace=1,gcpacertrace=1 output and a runtime trace.) The 74ms delay occurs around 41 seconds into the trace. The tick should have fired at 41003ms but did not fire until 41079ms, and the G1 goroutine did not run until 41140ms. The complete trace is split into multiple chunks; this GC can be found 12.6s into its chunk.

@aclements aclements self-assigned this Jan 6, 2017

@aclements aclements added this to the Go1.9 milestone Jan 6, 2017

@tarm (author) commented Jan 31, 2017

@RLH @aclements @dvyukov Is this on your radar for go1.9? Have you had a chance to try the example code yet?

I tried the patches that Austin posted for #14812, but those did not resolve the issue in the example that I posted. I also tried playing around with more scheduler options in the runtime, but I'm not familiar with the GC and scheduler and none of the basic "reschedule now" changes that I made seemed to help.

@quentinmit (contributor) commented Jan 31, 2017

Your taskset invocation does not accurately model the behavior of a single CPU system. In particular, Go is still going to use a larger GOMAXPROCS, which is actively going to make this worse by trying to schedule multiple threads on a single physical core.

I think you should be able to get closer by setting GOMAXPROCS=1 when running the test, in addition to taskset. What does that do to the latency profile?

@minux (member) commented Jan 31, 2017

@tarm (author) commented Mar 18, 2017

I am also able to reproduce this issue on darwin amd64 with go1.8 and GOMAXPROCS=1.

$ GOMAXPROCS=1 ./bug18534
  0.0% -68.680343ms
  1.0% -36.249343ms
  5.0% -21.846205ms
  50.0% 33.946µs
  95.0% 21.279699ms
  99.0% 35.011957ms
  100.0% 58.775745ms
$ ./bug18534             # 4 cores, 8 procs
  0.0% -6.406381ms
  1.0% -4.595611ms
  5.0% -3.099409ms
  50.0% -22.001µs
  95.0% 3.246163ms
  99.0% 4.806314ms
  100.0% 7.189159ms

The scheduler/GC seems to need those extra cores to keep latency down.

@tarm (author) commented May 19, 2017

Some more data points:

$ taskset -c 0 ./issue18534 
  0.0% -70.533562ms
  1.0% -52.267319ms
  5.0% -35.006899ms
  50.0% -2.676µs
  95.0% 36.313112ms
  99.0% 57.591539ms
  100.0% 86.679896ms

$ GOMAXPROCS=2 taskset -c 0 ./issue18534 
  0.0% -61.117609ms
  1.0% -31.714017ms
  5.0% -21.586722ms
  50.0% 949ns
  95.0% 20.895062ms
  99.0% 33.908325ms
  100.0% 62.686614ms

$ GOMAXPROCS=4 taskset -c 0 ./issue18534 
  0.0% -33.988405ms
  1.0% -26.603924ms
  5.0% -11.102534ms
  50.0% 1.049µs
  95.0% 10.910585ms
  99.0% 23.701329ms
  100.0% 34.930726ms

$ GOMAXPROCS=5 taskset -c 0 ./issue18534
  0.0% -41.173521ms
  1.0% -15.659458ms
  5.0% -7.67135ms
  50.0% 70ns
  95.0% 7.62678ms
  99.0% 15.607855ms
  100.0% 41.169155ms

$ GOMAXPROCS=8 taskset -c 0 ./issue18534 
  0.0% -17.46882ms
  1.0% -10.194657ms
  5.0% -5.408951ms
  50.0% -315ns
  95.0% 5.410479ms
  99.0% 10.139459ms
  100.0% 16.585539ms

$ GOMAXPROCS=16 taskset -c 0 ./issue18534 
  0.0% -21.709796ms
  1.0% -8.175178ms
  5.0% -4.094512ms
  50.0% -173ns
  95.0% 4.09989ms
  99.0% 8.111605ms
  100.0% 21.653956ms

(top indicates that taskset is constraining the process to 1 core as expected.)

My understanding is that setting GOMAXPROCS will force the runtime to create more threads. Even when those threads are running on the same core and have to be scheduled by the OS (linux in this case), that seems to lead to lower latency than just letting the Go runtime schedule a single thread.

Here are some measurements without taskset, but with different values for GOMAXPROCS:

$  ./issue18534 # 40 logical cores, GOMAXPROCS set by default
  0.0% -8.913432ms
  1.0% -103.472µs
  5.0% -51.566µs
  50.0% -56ns
  95.0% 50.329µs
  99.0% 107.945µs
  100.0% 8.917999ms

$ GOMAXPROCS=4 ./issue18534 
  0.0% -16.824243ms
  1.0% -2.714634ms
  5.0% -80.119µs
  50.0% -358ns
  95.0% 82.118µs
  99.0% 2.521557ms
  100.0% 16.79671ms

$ GOMAXPROCS=5 ./issue18534 
  0.0% -11.703387ms
  1.0% -205.105µs
  5.0% -72.006µs
  50.0% -1.152µs
  95.0% 72.07µs
  99.0% 181.71µs
  100.0% 11.744029ms

$ GOMAXPROCS=4 ./issue18534 
  0.0% -13.447356ms
  1.0% -5.392172ms
  5.0% -89.006µs
  50.0% -401ns
  95.0% 90.569µs
  99.0% 5.325731ms
  100.0% 13.443492ms

$ GOMAXPROCS=5 ./issue18534 
  0.0% -10.005695ms
  1.0% -205.612µs
  5.0% -68.098µs
  50.0% 241ns
  95.0% 64.149µs
  99.0% 186.699µs
  100.0% 10.021666ms

$ GOMAXPROCS=8 ./issue18534 
  0.0% -180.467µs
  1.0% -106.693µs
  5.0% -56.36µs
  50.0% -228ns
  95.0% 56.368µs
  99.0% 91.327µs
  100.0% 213.66µs

It's interesting that there is a consistent change in the 99% latency between GOMAXPROCS=4 and =5. I suspect that is an artifact of some decision within the runtime about the number of idle GC workers, but I have not looked at a trace. A similar-looking change can also be seen in the taskset samples when going from GOMAXPROCS=4 to 5.

To me, these samples point to a deficiency in the Go runtime's handling of scheduling/preemption during GC. That issue is probably hidden on systems with more than 4 cores, which many server systems have these days.
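One plausible reason the 4-to-5 boundary matters: the runtime splits its 25% background mark goal across P processors as whole "dedicated" workers plus at most one "fractional" worker for the remainder. A simplified sketch of that split (this is a model for intuition, not the actual pacer code; `workerSplit` is hypothetical):

```go
package main

import "fmt"

// workerSplit models how a 25% background utilization goal could be
// divided across procs processors: whole dedicated workers, plus a
// fractional worker covering the remainder.
func workerSplit(procs int) (dedicated int, fractional float64) {
	const goal = 0.25
	total := goal * float64(procs)
	dedicated = int(total)
	fractional = total - float64(dedicated)
	return
}

func main() {
	for _, p := range []int{1, 2, 4, 5, 8} {
		d, f := workerSplit(p)
		fmt.Printf("GOMAXPROCS=%d: dedicated=%d fractional=%.2f\n", p, d, f)
	}
}
```

In this model, GOMAXPROCS=4 is the first point at which a whole dedicated worker can cover the goal with no fractional remainder, while GOMAXPROCS=5 reintroduces a fractional worker, which could plausibly shift the tail latency.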

@RLH (contributor) commented May 19, 2017

@aclements aclements modified the milestones: Go1.9Maybe, Go1.9 Jul 18, 2017

@aclements (member) commented Jul 18, 2017

I re-ran your reproducer with current master, which annotates mark assists in the execution trace, and added a go badness() just before the "high latency" print so I'd have something to search for in the execution trace.

This does appear to be caused by GC and scheduling, though several things have to go wrong simultaneously. Here's the new trace I'm analyzing with tip.

The worst delay is a 57ms delay that gets reported at 1,571.288 ms into the trace. First, a fractional mark worker starts at 1,473.412 ms. Since there's just one P, this necessarily delays all other work. This is supposed to run for at most 10ms, but for reasons I don't fully understand runs for 20ms. This delays the timer goroutine, so six wakeups all happen at 1,493.612 ms when the fractional worker finishes. It so happens these are all childLoops that happened to randomly pick short sleeps. These childLoops each run for ~20ms, partly because they're doing a lot of assist work (#14951) and partly because they take a long time to run even when not assisting. Again, I'm not sure why this is 20ms instead of 10ms. Together, these consume the next 60ms of CPU time. The mainLoop wake up is supposed to happen fairly early in this 60ms window. Because these are saturating the CPU, again, the timer goroutine doesn't get to run, so another three timer wakeups pile up until 1,557.327 ms. One of these (it happens to be the last one serviced), is mainLoop.

I need to dig into why all of these preemptions are happening at 20ms. My guess is that sysmon has fallen back to a 10ms tick and has to observe the same goroutine running for two ticks in a row before preempting, which seems a bit bogus. I suspect fixing that and fixing the over-assist problem (#14951) would fix this issue.

@aclements aclements modified the milestones: Go1.10, Go1.9Maybe Jul 18, 2017

@gopherbot commented Aug 29, 2017

Change https://golang.org/cl/59970 mentions this issue: runtime: separate soft and hard heap limits

@gopherbot commented Aug 29, 2017

Change https://golang.org/cl/59971 mentions this issue: runtime: reduce background mark work from 25% to 20%

@tarm (author) commented Aug 30, 2017

@aclements Thanks for your work on this!

I cherry-picked your 2 patches onto tip and ran our actual application (not just the test reproduction case in this ticket):

Go v1.9.0:

I0830 00:43:14.905531  196377 main.go:804] logic loop time distribution (10000 samples): min = 4.808µs, 50% <= 11.472µs, 90% <= 18.879µs, 99% <= 5.018271ms, 99.9% <= 8.561956ms, max = 99.805794ms
I0830 00:59:59.306973  196377 main.go:804] logic loop time distribution (10000 samples): min = 5.443µs, 50% <= 10.039µs, 90% <= 16.947µs, 99% <= 53.701µs, 99.9% <= 11.057286ms, max = 136.385102ms
I0830 01:16:46.312292  196377 main.go:804] logic loop time distribution (10000 samples): min = 5.516µs, 50% <= 15.208µs, 90% <= 20.33µs, 99% <= 5.425571ms, 99.9% <= 12.001578ms, max = 122.486144ms

Go tip (d77d4f5) + patches applied

I0830 00:19:54.003992  185790 main.go:804] logic loop time distribution (10000 samples): min = 4.541µs, 50% <= 14.558µs, 90% <= 18.88µs, 99% <= 30.912µs, 99.9% <= 6.240578ms, max = 25.673079ms
I0830 00:36:35.013456  185790 main.go:804] logic loop time distribution (10000 samples): min = 4.521µs, 50% <= 17.13µs, 90% <= 20.187µs, 99% <= 29.954µs, 99.9% <= 6.296957ms, max = 12.587176ms
I0830 00:53:18.005010  185790 main.go:804] logic loop time distribution (10000 samples): min = 4.688µs, 50% <= 10.847µs, 90% <= 17.855µs, 99% <= 26.167µs, 99.9% <= 5.468955ms, max = 7.078815ms
I0830 01:10:01.610305  185790 main.go:804] logic loop time distribution (10000 samples): min = 5.305µs, 50% <= 13.549µs, 90% <= 19.775µs, 99% <= 31.4µs, 99.9% <= 6.350999ms, max = 16.407586ms

The 99.9% latencies moved from about 10ms -> 6ms, but the worst-case latencies (out of 10k samples) went from ~120ms -> ~15ms. That is a huge improvement in the worst-case mutator latencies and would be enough to resolve this issue for us.

The PRs are still marked as DO NOT SUBMIT until benchmarking is complete. I am eager to see them merged, so let me know if there is anything more I can do to help move them along. (My real application does not create that much garbage, and so is not useful for benchmarking odd GC assist interactions.)

@aclements (member) commented Aug 30, 2017

@tarm, that's great! Could you also check what effect the patches have on heap size and throughput?

You're running with GOMAXPROCS=1, right? There's still an issue with fractional workers running for a long time without preemption, which I suspect is responsible for the 15ms worst-case you saw. I just filed an issue focused on that problem (#21698).

@tarm (author) commented Aug 31, 2017

For the recent benchmarking, I have been running with taskset -c 0, which should constrain the process to one core at least as strictly as GOMAXPROCS=1 would.

This application is tick-based rather than run-to-completion, so throughput is more difficult to measure. Over about 12 hours of running, the accumulated CPU time (as reported by top) was about 1.4% higher in the version with the patches, but that also includes changes from v1.9.0 to tip, so some of that may not be from the patches.

The application avoids allocation and is not a heavy user of RAM, so is also not a great stressor of the heap size and we have not specifically instrumented for heap usage. That said, the memory usage reported through top was about 1% higher for the new version.

@gopherbot commented Aug 31, 2017

Change https://golang.org/cl/60790 mentions this issue: cmd/trace: add minimum mutator utilization (MMU) plot

@gopherbot commented Oct 5, 2017

Change https://golang.org/cl/68573 mentions this issue: runtime: preempt fractional worker after reaching utilization goal

gopherbot pushed a commit that referenced this issue Oct 13, 2017

runtime: preempt fractional worker after reaching utilization goal
Currently fractional workers run until preempted by the scheduler,
which means they typically run for 20ms. During this time, all other
goroutines on that P are blocked, which can introduce significant
latency variance.

This modifies fractional workers to self-preempt shortly after
achieving the fractional utilization goal. In practice this means they
preempt much sooner, and the scale of their preemption is on the order
of how often the user goroutines block (so, if the application is
compute-bound, the fractional workers will also run for long times,
but if the application blocks frequently, the fractional workers will
also preempt quickly).

Fixes #21698.
Updates #18534.

Change-Id: I03a5ab195dae93154a46c32083c4bb52415d2017
Reviewed-on: https://go-review.googlesource.com/68573
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
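The self-preemption described in this commit message can be modeled outside the runtime roughly as follows. This is a simplified sketch of the idea, not the actual scheduler code; `fractionalWorker` and `busyWork` are hypothetical names:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// busyWork simulates one small slice of mark work.
func busyWork() {
	s := 0
	for i := 0; i < 100000; i++ {
		s += i
	}
	_ = s
}

// fractionalWorker models a worker entitled to `fraction` of one CPU.
// It does work in small slices and self-preempts (yields) as soon as
// its cumulative work time exceeds fraction * elapsed wall-clock time,
// instead of running until the scheduler forcibly preempts it.
// It returns how many times it yielded.
func fractionalWorker(fraction float64, slices int, doSlice func()) int {
	yields := 0
	phaseStart := time.Now()
	var used time.Duration
	for i := 0; i < slices; i++ {
		sliceStart := time.Now()
		doSlice()
		used += time.Since(sliceStart)
		// Utilization goal reached: give other goroutines on this P a chance.
		if float64(used) > fraction*float64(time.Since(phaseStart)) {
			yields++
			runtime.Gosched()
		}
	}
	return yields
}

func main() {
	y := fractionalWorker(0.25, 100, busyWork)
	fmt.Printf("self-preempted on %d of 100 slices\n", y)
}
```

When the worker is the only runnable goroutine it still yields almost every slice, but each yield is cheap; when other goroutines are runnable, those yields are what bounds the latency they see.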

@gopherbot gopherbot closed this in 03eb948 Oct 31, 2017

gopherbot pushed a commit that referenced this issue Oct 31, 2017

runtime: allow 5% mutator assist over 25% background mark
Currently, the background mark worker and the goal GC CPU are both
fixed at 25%. The trigger controller's goal is to achieve the
goal CPU usage, and with the previous commit it can actually achieve
this. But this means there are *no* assists, which sounds ideal but
actually causes problems for the trigger controller. Since the
controller can't lower CPU usage below the background mark worker CPU,
it saturates at the CPU goal and no longer gets feedback, which
translates into higher variability in heap growth.

This commit fixes this by allowing assists 5% CPU beyond the 25% fixed
background mark. This avoids saturating the trigger controller, since
it can now get feedback from both sides of the CPU goal. This leads to
low variability in both CPU usage and heap growth, at the cost of
reintroducing a low rate of mark assists.

We also experimented with 20% background plus 5% assist, but 25%+5%
clearly performed better in benchmarks.

Updates #14951.
Updates #14812.
Updates #18534.

Combined with the previous CL, this significantly improves tail
mutator utilization in the x/benchmarks garbage benchmark. On a sample
trace, it increased the 99.9%ile mutator utilization at 10ms from 26%
to 59%, and at 5ms from 17% to 52%. It reduced the 99.9%ile zero
utilization window from 2ms to 700µs. It also helps the mean mutator
utilization: it increased the 10s mutator utilization from 83% to 94%.
The minimum mutator utilization is also somewhat improved, though
there is still some unknown artifact that causes a minuscule fraction
of mutator assists to take 5--10ms (in fact, there was exactly one
10ms mutator assist in my sample trace).

This has no significant effect on the throughput of the
github.com/dr2chase/bent benchmarks-50.

This has little effect on the go1 benchmarks (and the slight overall
improvement makes up for the slight overall slowdown from the previous
commit):

name                      old time/op    new time/op    delta
BinaryTree17-12              2.40s ± 0%     2.41s ± 1%  +0.26%  (p=0.010 n=18+19)
Fannkuch11-12                2.95s ± 0%     2.93s ± 0%  -0.62%  (p=0.000 n=18+15)
FmtFprintfEmpty-12          42.2ns ± 0%    42.3ns ± 1%  +0.37%  (p=0.001 n=15+14)
FmtFprintfString-12         67.9ns ± 2%    67.2ns ± 3%  -1.03%  (p=0.002 n=20+18)
FmtFprintfInt-12            75.6ns ± 3%    76.8ns ± 2%  +1.59%  (p=0.000 n=19+17)
FmtFprintfIntInt-12          123ns ± 1%     124ns ± 1%  +0.77%  (p=0.000 n=17+14)
FmtFprintfPrefixedInt-12     148ns ± 1%     150ns ± 1%  +1.28%  (p=0.000 n=20+20)
FmtFprintfFloat-12           212ns ± 0%     211ns ± 1%  -0.67%  (p=0.000 n=16+17)
FmtManyArgs-12               499ns ± 1%     500ns ± 0%  +0.23%  (p=0.004 n=19+16)
GobDecode-12                6.49ms ± 1%    6.51ms ± 1%  +0.32%  (p=0.008 n=19+19)
GobEncode-12                5.47ms ± 0%    5.43ms ± 1%  -0.68%  (p=0.000 n=19+20)
Gzip-12                      220ms ± 1%     216ms ± 1%  -1.66%  (p=0.000 n=20+19)
Gunzip-12                   38.8ms ± 0%    38.5ms ± 0%  -0.80%  (p=0.000 n=19+20)
HTTPClientServer-12         78.5µs ± 1%    78.1µs ± 1%  -0.53%  (p=0.008 n=20+19)
JSONEncode-12               12.2ms ± 0%    11.9ms ± 0%  -2.38%  (p=0.000 n=17+19)
JSONDecode-12               52.3ms ± 0%    53.3ms ± 0%  +1.84%  (p=0.000 n=19+20)
Mandelbrot200-12            3.69ms ± 0%    3.69ms ± 0%  -0.19%  (p=0.000 n=19+19)
GoParse-12                  3.17ms ± 1%    3.19ms ± 1%  +0.61%  (p=0.000 n=20+20)
RegexpMatchEasy0_32-12      73.7ns ± 0%    73.2ns ± 1%  -0.66%  (p=0.000 n=17+20)
RegexpMatchEasy0_1K-12       238ns ± 0%     239ns ± 0%  +0.32%  (p=0.000 n=17+16)
RegexpMatchEasy1_32-12      69.1ns ± 1%    69.2ns ± 1%    ~     (p=0.669 n=19+13)
RegexpMatchEasy1_1K-12       365ns ± 1%     367ns ± 1%  +0.49%  (p=0.000 n=19+19)
RegexpMatchMedium_32-12      104ns ± 1%     105ns ± 1%  +1.33%  (p=0.000 n=16+20)
RegexpMatchMedium_1K-12     33.6µs ± 3%    34.1µs ± 4%  +1.67%  (p=0.001 n=20+20)
RegexpMatchHard_32-12       1.67µs ± 1%    1.62µs ± 1%  -2.78%  (p=0.000 n=18+17)
RegexpMatchHard_1K-12       50.3µs ± 2%    48.7µs ± 1%  -3.09%  (p=0.000 n=19+18)
Revcomp-12                   384ms ± 0%     386ms ± 0%  +0.59%  (p=0.000 n=19+19)
Template-12                 61.1ms ± 1%    60.5ms ± 1%  -1.02%  (p=0.000 n=19+20)
TimeParse-12                 307ns ± 0%     303ns ± 1%  -1.23%  (p=0.000 n=19+15)
TimeFormat-12                323ns ± 0%     323ns ± 0%  -0.12%  (p=0.011 n=15+20)
[Geo mean]                  47.1µs         47.0µs       -0.20%

https://perf.golang.org/search?q=upload:20171030.4

It slightly improves the performance of the x/benchmarks:

name                         old time/op  new time/op  delta
Garbage/benchmem-MB=1024-12  2.29ms ± 3%  2.22ms ± 2%  -2.97%  (p=0.000 n=18+18)
Garbage/benchmem-MB=64-12    2.24ms ± 2%  2.21ms ± 2%  -1.64%  (p=0.000 n=18+18)
HTTP-12                      12.6µs ± 1%  12.6µs ± 1%    ~     (p=0.690 n=19+17)
JSON-12                      11.3ms ± 2%  11.3ms ± 1%    ~     (p=0.163 n=17+18)

and fixes some of the heap size bloat caused by the previous commit:

name                         old peak-RSS-bytes  new peak-RSS-bytes  delta
Garbage/benchmem-MB=1024-12          1.88G ± 2%          1.77G ± 2%  -5.52%  (p=0.000 n=20+18)
Garbage/benchmem-MB=64-12             248M ± 8%           226M ± 5%  -8.93%  (p=0.000 n=20+20)
HTTP-12                              47.0M ±27%          47.2M ±12%    ~     (p=0.512 n=20+20)
JSON-12                               206M ±11%           206M ±10%    ~     (p=0.841 n=20+20)

https://perf.golang.org/search?q=upload:20171030.5

Combined with the change to add a soft goal in the previous commit,
this achieves a decent performance improvement on the garbage
benchmark:

name                         old time/op  new time/op  delta
Garbage/benchmem-MB=1024-12  2.40ms ± 4%  2.22ms ± 2%  -7.40%  (p=0.000 n=19+18)
Garbage/benchmem-MB=64-12    2.23ms ± 1%  2.21ms ± 2%  -1.06%  (p=0.000 n=19+18)
HTTP-12                      12.5µs ± 1%  12.6µs ± 1%    ~     (p=0.330 n=20+17)
JSON-12                      11.1ms ± 1%  11.3ms ± 1%  +1.87%  (p=0.000 n=16+18)

https://perf.golang.org/search?q=upload:20171030.6

Change-Id: If04ddb57e1e58ef2fb9eec54c290eb4ae4bea121
Reviewed-on: https://go-review.googlesource.com/59971
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>

@golang golang locked and limited conversation to collaborators Oct 31, 2018
