This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

MT cpu spike due to GC -> requests can easily take >5s to get answered #172

Dieterbe opened this issue Mar 9, 2016 · 7 comments


Dieterbe commented Mar 9, 2016

I'm going to look into techniques to lower GC CPU overhead.
We currently reference a lot of data through pointers; I suspect we can lower GC cost quite a bit by being smarter about this.
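To make that concrete, here's a minimal sketch (hypothetical types, not metrictank code) contrasting a pointer-heavy layout with a flat one. The GC has to trace every *point individually, while a pointer-free []point is a single allocation whose contents it can skip:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// point holds no pointers, so a []point is one allocation whose contents
// the GC never has to scan.
type point struct {
	ts  uint32
	val float64
}

// gcCost returns the wall time of a forced GC cycle, a rough proxy for mark cost.
func gcCost() time.Duration {
	start := time.Now()
	runtime.GC()
	return time.Since(start)
}

func main() {
	const n = 5000000

	// Pointer-heavy layout: n separate heap objects, all traced on every GC cycle.
	ptrs := make([]*point, n)
	for i := range ptrs {
		ptrs[i] = &point{ts: uint32(i), val: float64(i)}
	}
	fmt.Println("GC with []*point:", gcCost())
	runtime.KeepAlive(ptrs)
	ptrs = nil

	// Flat layout: a single pointer-free allocation the GC can skip entirely.
	flat := make([]point, n)
	fmt.Println("GC with []point: ", gcCost())
	runtime.KeepAlive(flat)
}
```

Timing runtime.GC() like this is only a rough proxy for mark cost, but it should be enough to show whether flattening the hot data structures is worth it.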


Dieterbe commented Apr 4, 2016

Test with https://gist.github.com/Dieterbe/bda3f2af50c56146e98580a03c2b6eaa applied to raintank-docker to automatically apply a realistic workload.
Results:
sys: https://snapshot.raintank.io/dashboard/snapshot/1zc8flsQTV4pyjOv6fIm5BXH3eird4kF
MT: https://snapshot.raintank.io/dashboard/snapshot/hvtuSiLV0CDtJy31zWdDKQ2ZQWQOW1VI (correlation between GC runs and latency spikes visible on the duration chart)

vegeta:

cat attack.out | vegeta report 2>&1 | egrep -v 'connection reset|timed out|timeout'
Requests      [total, rate]            60000, 200.00
Duration      [total, attack, wait]    5m26.855960507s, 4m59.994999853s, 26.860960654s
Latencies     [mean, 50, 95, 99, max]  17.006800796s, 13.661618151s, 43.008771756s, 53.01141618s, 2m7.401387905s
Bytes In      [total, mean]            1756638977, 29277.32
Bytes Out     [total, mean]            0, 0.00
Success       [ratio]                  68.44%
Status Codes  [code:count]             200:41066  0:18934  
Error Set:
root@benchmark:/opt/raintank/raintank-tsdb-benchmark# cat attack.out | vegeta report 2>&1 | egrep -c 'connection reset|timed out|timeout'
2163

new sys https://snapshot.raintank.io/dashboard/snapshot/i9TIko5tB522Wh8RQVgR7BG6BZmjmFna
new MT https://snapshot.raintank.io/dashboard/snapshot/wFketZbpnZUZjxg1QbIZmNNJgSoJdnUn

vegeta:

cat vegeta-after
root@benchmark:/opt/raintank/raintank-tsdb-benchmark# cat attack.out | vegeta report 2>&1 | egrep -v 'connection reset|timed out|timeout'
Requests      [total, rate]            60000, 200.00
Duration      [total, attack, wait]    5m42.394862196s, 4m59.994999882s, 42.399862314s
Latencies     [mean, 50, 95, 99, max]  17.811976008s, 14.105108138s, 43.010377182s, 53.013462631s, 1m14.607911294s
Bytes In      [total, mean]            1677464219, 27957.74
Bytes Out     [total, mean]            0, 0.00
Success       [ratio]                  66.55%
Status Codes  [code:count]             200:39932  0:20068  
Error Set:
root@benchmark:/opt/raintank/raintank-tsdb-benchmark# cat attack.out | vegeta report 2>&1 | egrep -c 'connection reset|timed out|timeout'
2253

=> My test was probably using too many req/s or something; graphite-api itself seemed to have trouble keeping up. However, the comparison still tells us what we need to know:
=> no discernible change; similar latency spikes at GC runs.
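(For reference, an attack.out like the above would come from roughly the following; the targets file name is a placeholder, and the rate and duration follow from the report: 60000 requests at 200/s is 5 minutes.)

```
vegeta attack -targets=targets.txt -rate=200 -duration=5m > attack.out
cat attack.out | vegeta report
```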


Dieterbe commented Apr 6, 2016

Confirmed again using the latest golang master, which includes Austin's fix.


Dieterbe commented Aug 1, 2016

Latest master has GC changes that should help.

Dieterbe self-assigned this Aug 1, 2016
@Dieterbe

A fix was merged in Go for golang/go#16293: golang/go@cf4f1d0. It has shown good results for large maps (see also spion/hashtable-latencies#13) and will likely fix our issue as well; we just need to test it.
The only problem is that it's only in git master, and there most likely won't be a 1.7.x release including it, so we have to build Go from git master and/or wait for 1.8.
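When we do test it, something in the spirit of spion/hashtable-latencies might be the quickest smoke test: build a large map and record the worst stall observed between iterations, once on 1.7 and once on a master build. A rough sketch (map size and value type are arbitrary; not metrictank code):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	m := make(map[int64][]byte)
	var worst time.Duration
	prev := time.Now()

	// Insert millions of entries and track the worst gap between iterations,
	// which captures GC-induced stalls hitting this goroutine.
	for i := int64(0); i < 10*1000*1000; i++ {
		m[i] = make([]byte, 16)
		now := time.Now()
		if d := now.Sub(prev); d > worst {
			worst = d
		}
		prev = now
	}
	fmt.Println("entries:", len(m), "worst observed stall:", worst)
}
```

The vegeta runs above stay the real acceptance test; this just gives a fast signal on whether the large-map pauses moved.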


dgryski commented Sep 25, 2016

Is it reasonable to cherry-pick that fix onto 1.7.1?

@Dieterbe

I'll just run a bench in raintank-docker.
Now is an especially good time, also because of https://groups.google.com/forum/m/#!topic/golang-dev/Ab1sFeoZg_8.

@Dieterbe

golang/go#14812
