
investigate switch to bigcache #3

Merged
merged 1 commit into from
Dec 11, 2018
Conversation


@mroth (Owner) commented Dec 11, 2018

Investigation of bigcache performance, which should in theory speed up concurrent reads. I also like that it has a configurable hard maximum cache size, which could prevent the OOM killer from triggering in certain edge-case scenarios.
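For reference, the hard max size mentioned above is set via bigcache's `Config` struct. A configuration sketch, assuming the field names from bigcache's README (the values here are illustrative, not necessarily what this PR uses):

```go
package geominder

import (
	"time"

	"github.com/allegro/bigcache"
)

// newCache is a hypothetical constructor illustrating the config knobs
// discussed above; it is not part of this PR's diff.
func newCache() (*bigcache.BigCache, error) {
	// DefaultConfig takes the LifeWindow: how long entries live before eviction.
	cfg := bigcache.DefaultConfig(10 * time.Minute)
	// HardMaxCacheSize is a hard cap in MB. Once reached, bigcache overwrites
	// the oldest entries rather than allocating more memory, which is the
	// property that guards against the OOM-killer scenario.
	cfg.HardMaxCacheSize = 64
	return bigcache.NewBigCache(cfg)
}
```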

Initial results look fairly good:

benchmark                               old ns/op     new ns/op     delta
BenchmarkHTTPRequest-16                 3460          3424          -1.04%
BenchmarkHTTPRequestWithCache-16        611           597           -2.29%
BenchmarkHTTPRequestPar-16              459           442           -3.70%
BenchmarkHTTPRequestParWithCache-16     176           151           -14.20%

benchmark                               old allocs     new allocs     delta
BenchmarkHTTPRequest-16                 9              9              +0.00%
BenchmarkHTTPRequestWithCache-16        5              7              +40.00%
BenchmarkHTTPRequestPar-16              9              9              +0.00%
BenchmarkHTTPRequestParWithCache-16     5              7              +40.00%

benchmark                               old bytes     new bytes     delta
BenchmarkHTTPRequest-16                 840           839           -0.12%
BenchmarkHTTPRequestWithCache-16        661           789           +19.36%
BenchmarkHTTPRequestPar-16              911           1017          +11.64%
BenchmarkHTTPRequestParWithCache-16     790           982           +24.30%

This does add two memory allocations to the cache-hit case (consistent with the benchmarks in bigcache's README versus a map-based solution), but we see an overall 14% improvement in throughput in the parallel test, which seems good!

However, things get more interesting if I increase the benchtime significantly; testing here with it bumped up to 1m:

benchmark                               old ns/op     new ns/op     delta
BenchmarkHTTPRequest-16                 3457          3323          -3.88%
BenchmarkHTTPRequestWithCache-16        722           802           +11.08%
BenchmarkHTTPRequestPar-16              627           684           +9.09%
BenchmarkHTTPRequestParWithCache-16     896           973           +8.59%

benchmark                               old allocs     new allocs     delta
BenchmarkHTTPRequest-16                 9              9              +0.00%
BenchmarkHTTPRequestWithCache-16        5              7              +40.00%
BenchmarkHTTPRequestPar-16              9              9              +0.00%
BenchmarkHTTPRequestParWithCache-16     5              7              +40.00%

benchmark                               old bytes     new bytes     delta
BenchmarkHTTPRequest-16                 854           854           +0.00%
BenchmarkHTTPRequestWithCache-16        721           849           +17.75%
BenchmarkHTTPRequestPar-16              899           903           +0.44%
BenchmarkHTTPRequestParWithCache-16     721           853           +18.31%

It's interesting to note that in both cases (go-cache and bigcache), the overall parallel benchmark performance on this machine (8-core Xeon) becomes significantly worse at the longer benchtime. I suspected the cause might be garbage collection, but the cached version being disproportionately affected is confusing.


@mroth (Owner, Author) commented Dec 11, 2018

Trying again with GOMAXPROCS reduced so we still see parallelism but not to the same degree. (Hyperthreading issues?)

$ go test -bench=HTTPRequestPar -cpu 1,2,4,8,16,32 -benchtime 1s
goos: darwin
goarch: amd64
pkg: github.com/mroth/geominder
BenchmarkHTTPRequestPar                	  500000	      3309 ns/op
BenchmarkHTTPRequestPar-2              	 1000000	      1733 ns/op
BenchmarkHTTPRequestPar-4              	 2000000	       947 ns/op
BenchmarkHTTPRequestPar-8              	 2000000	       538 ns/op
BenchmarkHTTPRequestPar-16             	 3000000	       437 ns/op
BenchmarkHTTPRequestPar-32             	 3000000	       436 ns/op
BenchmarkHTTPRequestParWithCache       	 2000000	       602 ns/op
BenchmarkHTTPRequestParWithCache-2     	 3000000	       408 ns/op
BenchmarkHTTPRequestParWithCache-4     	 5000000	       269 ns/op
BenchmarkHTTPRequestParWithCache-8     	10000000	       164 ns/op
BenchmarkHTTPRequestParWithCache-16    	10000000	       131 ns/op
BenchmarkHTTPRequestParWithCache-32    	10000000	       117 ns/op
PASS
ok  	github.com/mroth/geominder	26.977s

$ go test -bench=HTTPRequestPar -cpu 1,2,4,8,16,32 -benchtime 10s
goos: darwin
goarch: amd64
pkg: github.com/mroth/geominder
BenchmarkHTTPRequestPar                	 5000000	      3340 ns/op
BenchmarkHTTPRequestPar-2              	10000000	      1779 ns/op
BenchmarkHTTPRequestPar-4              	20000000	       972 ns/op
BenchmarkHTTPRequestPar-8              	30000000	       528 ns/op
BenchmarkHTTPRequestPar-16             	30000000	       426 ns/op
BenchmarkHTTPRequestPar-32             	30000000	       418 ns/op
BenchmarkHTTPRequestParWithCache       	20000000	       624 ns/op
BenchmarkHTTPRequestParWithCache-2     	30000000	       360 ns/op
BenchmarkHTTPRequestParWithCache-4     	50000000	       248 ns/op
BenchmarkHTTPRequestParWithCache-8     	100000000	       292 ns/op
BenchmarkHTTPRequestParWithCache-16    	50000000	       332 ns/op
BenchmarkHTTPRequestParWithCache-32    	100000000	       365 ns/op
PASS
ok  	github.com/mroth/geominder	239.573s


@mroth (Owner, Author) commented Dec 11, 2018

Since this is at least an improvement, I'm going to go ahead and merge this for now, and make a mental TODO to investigate whether it's worth creating my own cache based around sync.Map that may be even more performant.

(Notes for that: I think the way to do it would essentially be to use an fnv hash of the key to avoid pointers in the GC sweep [see httpcache], and construct a simple FIFO queue to keep track of entries for sweeping out periodically. However, needing an RWMutex lock around the queue would create a potential bottleneck that undermines the favorable performance characteristics of sync.Map versus a plain mutex-coordinated map.)
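A minimal sketch of what that sync.Map idea might look like (hypothetical names; the FIFO eviction here is deliberately naive to make the queue-lock bottleneck visible):

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// fnvCache reduces string keys to uint64 FNV-1a hashes so the map holds
// no string pointers for the GC to chase. A mutex-guarded FIFO slice
// records insertion order so old entries can be swept out.
type fnvCache struct {
	m     sync.Map // uint64 -> []byte
	mu    sync.Mutex
	fifo  []uint64 // insertion order, guarded by mu
	limit int      // max entries before evicting the oldest
}

func newFNVCache(limit int) *fnvCache {
	return &fnvCache{limit: limit}
}

func hashKey(key string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	return h.Sum64()
}

func (c *fnvCache) Get(key string) ([]byte, bool) {
	v, ok := c.m.Load(hashKey(key))
	if !ok {
		return nil, false
	}
	return v.([]byte), true
}

func (c *fnvCache) Set(key string, val []byte) {
	k := hashKey(key)
	c.m.Store(k, val)

	// This is the potential bottleneck noted above: every Set serializes
	// on the queue lock, even though sync.Map itself would not.
	c.mu.Lock()
	c.fifo = append(c.fifo, k)
	for len(c.fifo) > c.limit {
		oldest := c.fifo[0]
		c.fifo = c.fifo[1:]
		c.m.Delete(oldest)
	}
	c.mu.Unlock()
}

func main() {
	c := newFNVCache(2)
	c.Set("a", []byte("1"))
	c.Set("b", []byte("2"))
	c.Set("c", []byte("3")) // exceeds limit: "a" is evicted FIFO-style
	_, ok := c.Get("a")
	fmt.Println(ok) // prints "false"
}
```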

@mroth mroth merged commit dc1b1d9 into master Dec 11, 2018
@mroth mroth deleted the bigcache branch December 17, 2018 21:46