revert to go1.4.2 (and wait for go1.6+) #5217
FWIW, go 1.6-beta showed increased CPU time when run for a period of time, so we reverted our cluster back to 1.4.2, which has been stable (and consistent) for many days and has proven solid under load. I'm sure somebody with the right experience could use this as a useful test case for Go 1.6 beta testing, to try to get InfluxDB performance back to what it achieved on 1.4.2 - see https://groups.google.com/forum/#!topic/golang-nuts/24zV9JeBoEE - but sadly I don't have that expertise. It seems ironic that we'd see less jitter in response times (in and out) with the pre-new-GC Go, but we do! The effect of the higher CPU usage (which is an order of magnitude for us) far outstrips the benefit of reduced GC sweep times. For now our suggestion is to stick with 1.4.2 until somebody has time to dig into the regression revealed by 1.6 (skipping 1.5 is a no-brainer). If anybody has any suggestions for environment vars, or output that we could provide, we can double-write to an identical machine running 1.6 fairly easily.
This is going to be resolved by #5331, so I'm closing this out. Thanks for all the benchmarks!
The 1.6 vs 1.4 perf issue is being looked at here: golang/go#14189
@sebito91 @daviesalex We've been doing a lot of work to reduce allocations, and we're now seeing better performance on Go 1.6.2 than Go 1.4.3 for some synthetic tests. Would you guys be able to test Go 1.6.2 with master and let us know what performance looks like for you? Thanks!
Sure thing, we'll take a look now with current HEAD.
FYI, current HEAD may not be as GC-heavy as before, since #5522 reduces many pointers in the tsm1 buffer. Each entry in the tsm1 buffer had:
I removed 1, but 2 remains. So the number of pointers in the tsm1 buffer is half of what it was before.
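For readers unfamiliar with why this helps: every pointer field in a long-lived, hot buffer is something the GC has to trace on each mark phase, so cutting pointers per entry directly cuts scan time. Below is a minimal sketch of the general idea, with hypothetical types rather than the actual tsm1 code:

```go
package main

import "fmt"

// entryWithPointers mimics a cache entry that carries extra pointers:
// each pointer field is work for the GC to trace on every mark phase.
type entryWithPointers struct {
	key    *string
	values *[]float64
}

// entryByValue holds the same data by value; the slice header still
// references its backing array, but the extra indirections are gone,
// so a buffer of millions of entries gives the GC far less to scan.
type entryByValue struct {
	key    string
	values []float64
}

func main() {
	e := entryByValue{key: "cpu,host=a", values: []float64{1, 2, 3}}
	fmt.Println(e.key, e.values)

	// entryWithPointers is shown only for contrast; constructing one
	// requires taking addresses of the key and slice.
	k, v := "mem,host=a", []float64{4, 5}
	p := entryWithPointers{key: &k, values: &v}
	fmt.Println(*p.key, *p.values)
}
```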
Confirmed that go1.6.2 is looking very, very good! As @methane pointed out, the GC numbers have virtually dropped to 0, which is legendary...no data loss, full capabilities to date. Great work! Attached is a screenshot of our setup; disregard that small window (admin work on the machine, unrelated to influxdb).
@sebito91 Is that with any changes to `GOGC`?
@sebito91 Are Go 1.4.3 and Go 1.6.2 using the same version of influxdb?
@methane We removed many other uses of
@jwilder no changes to `GOGC` whatsoever, purely stock implementation. We built from HEAD, commit 1d9919a, using go1.6.2 instead of the standard go1.4.3 and dropped the binaries into place. Pretty sweet changes to be honest! @methane, yes they are the same stock version, but the more recent data is built from HEAD vs the stock rpm.
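For anyone reproducing this comparison, the stock setting referred to here is `GOGC=100`. A hedged sketch of the programmatic equivalent, `runtime/debug.SetGCPercent`, is below in case adjusting the GC target without restarting the process under a different environment is ever useful:

```go
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// SetGCPercent returns the previous value, so calling it with the
	// stock setting (100) also reveals what the process started with.
	previous := debug.SetGCPercent(100)
	fmt.Printf("GOGC was %d, now 100\n", previous)
}
```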
@toddboom, it would be awesome to hear your plans for upgrading to go1.6.2 now that things seem to have improved. Any thoughts on rough timing?
@sebito91 We just released v0.13.0 today, and it's all on Go 1.6.2 now. Thanks for all your help!
The recent shift to go1.5.2 has resulted in some drastic changes to overall load, throughput and utilization in our cluster. I know that we recently went back and forth between versions, finally settling on go1.5.2, but it looks like there may be some regressions in terms of performance that are difficult to overlook.
As an exercise, we compiled the same version of the influxdb client using each of go1.4.2, go1.5.2 and go1.6beta1 to varying degrees of 'success'. In terms of overall performance, the ranking shakes out as follows:
Image at bottom of Issue Tracker...
Overall, we've noticed a dramatic increase in CPU utilization and system load when using go1.5.2. In our case we're very lucky to have very powerful hardware, but handling a sustained load of ~40-50 is a lot...especially given that the same binary built with go1.4.2 hovers around ~5.
In addition, pointsRx variance drops by an order of magnitude going from go1.5.2 to go1.4.2/go1.6beta1 with a much smaller standard deviation over time. It seems as if the binary's ability to ingest metrics from the UDP sockets is drastically improved in both versions over go1.5.2.
RSS does stay relatively consistent over time across all versions, but we have not yet run with `GODEBUG=gctrace=1,schedtrace=10000` across each iteration to truly see how things are working under the covers (that's next).
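As a complement to `GODEBUG=gctrace=1` (which logs to stderr), GC behaviour can also be sampled from inside a Go process via `runtime.ReadMemStats`. A small sketch, not taken from influxdb itself, of what that polling loop might look like:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	var m runtime.MemStats
	for i := 0; i < 3; i++ {
		runtime.ReadMemStats(&m)
		// PauseNs is a circular buffer; (NumGC+255)%256 is the most recent pause.
		last := time.Duration(m.PauseNs[(m.NumGC+255)%256])
		fmt.Printf("heap=%dMB numGC=%d gcCPU=%.4f lastPause=%v\n",
			m.HeapAlloc>>20, m.NumGC, m.GCCPUFraction, last)
		time.Sleep(time.Second)
	}
}
```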
The good news on the query front is that all queries stay relatively consistent in terms of return times, averaging about 700ms for the image below (all versions of go yield a similar result). This means the changes largely impact the metrics-ingestion portion of influxdb, and my strong suspicion, without any data (yet), is that it comes down to GC.
If you'd like any logs, let me know.