-
Notifications
You must be signed in to change notification settings - Fork 17.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: working with small maps is 4x-10x slower than in nodejs #19495
Comments
In doing this kind of benchmarking, it's very important to remember that on amd64 the Go type |
I'm surprised the pprof output shows mapaccess, not mapaccessfast*. |
Good point, thanks for noticing. So with the following program: package main
//import "fmt"
func main() {
//a := make(map[int]int)
a := make(map[int32]int32)
for i := int32(0); i < 100000000; i++ {
a[i & 0xffff] = i
//a[i & 0x7f] = i
//a[i] = i
}
//fmt.Println(a)
} --- a/map.go.kirr
+++ b/map.go
@@ -2,9 +2,10 @@ package main
//import "fmt"
func main() {
- a := make(map[int]int)
+ //a := make(map[int]int)
+ a := make(map[int32]int32)
- for i := 0; i < 100000000; i++ {
+ for i := int32(0); i < 100000000; i++ {
a[i & 0xffff] = i
//a[i & 0x7f] = i
//a[i] = i map.go timings are:
in other words it stays the same as with |
@josharian it is |
Huh. We have I'd suggest that as a first step for this issue. I have a few other things on my plate I'd like to do first, but if no one speaks up for it, I'll plan to eventually get to this. |
How do you suggest to automate generating the fast versions? I can see a Go file, run by go generate, doing the right sequence of printf or template substitutions, is that what you're thinking? |
@rasky yeah, something like that. I took a quick look at this earlier and there was more divergence between the regular and the _fast methods than I anticipated. It's probably worth doing one manually and benchmarking the improvements before investing too much time in automating it. Do you plan to work on this? |
@josharian I wish, but I don't have enough time in the next two weeks. |
CL https://golang.org/cl/38091 mentions this issue. |
@josharian I have a patchset up that passes tests and shows some improvements on benchmarks. Are you the right person to review this?
Not quite sure why benchstat doesn't show a significant delta... |
@huguesb you need to run each benchmark multiple times for |
@mvdan Yeah, I screwed up the result concatenation the first time. Here goes
|
Please add the numbers to the CL too. Thanks for working on this! |
@mvdan Done |
Add benchmarks for map assignment with int32/int64/string key Benchmark results on darwin/amd64 name old time/op new time/op delta MapAssignInt32_255-8 24.7ns ± 3% 17.4ns ± 2% -29.75% (p=0.000 n=10+10) MapAssignInt32_64k-8 45.5ns ± 4% 37.6ns ± 4% -17.18% (p=0.000 n=10+10) MapAssignInt64_255-8 26.0ns ± 3% 17.9ns ± 4% -31.03% (p=0.000 n=10+10) MapAssignInt64_64k-8 46.9ns ± 5% 38.7ns ± 2% -17.53% (p=0.000 n=9+10) MapAssignStr_255-8 47.8ns ± 3% 24.8ns ± 4% -48.01% (p=0.000 n=10+10) MapAssignStr_64k-8 83.0ns ± 3% 51.9ns ± 3% -37.45% (p=0.000 n=10+9) name old time/op new time/op delta BinaryTree17-8 3.11s ±19% 2.78s ± 3% ~ (p=0.095 n=5+5) Fannkuch11-8 3.26s ± 1% 3.21s ± 2% ~ (p=0.056 n=5+5) FmtFprintfEmpty-8 50.3ns ± 1% 50.8ns ± 2% ~ (p=0.246 n=5+5) FmtFprintfString-8 82.7ns ± 4% 80.1ns ± 5% ~ (p=0.238 n=5+5) FmtFprintfInt-8 82.6ns ± 2% 81.9ns ± 3% ~ (p=0.508 n=5+5) FmtFprintfIntInt-8 124ns ± 4% 121ns ± 3% ~ (p=0.111 n=5+5) FmtFprintfPrefixedInt-8 158ns ± 6% 160ns ± 2% ~ (p=0.341 n=5+5) FmtFprintfFloat-8 249ns ± 2% 245ns ± 2% ~ (p=0.095 n=5+5) FmtManyArgs-8 513ns ± 2% 519ns ± 3% ~ (p=0.151 n=5+5) GobDecode-8 7.48ms ±12% 7.11ms ± 2% ~ (p=0.222 n=5+5) GobEncode-8 6.25ms ± 1% 6.03ms ± 2% -3.56% (p=0.008 n=5+5) Gzip-8 252ms ± 4% 252ms ± 4% ~ (p=1.000 n=5+5) Gunzip-8 38.4ms ± 3% 38.6ms ± 2% ~ (p=0.690 n=5+5) HTTPClientServer-8 76.9µs ±41% 66.4µs ± 6% ~ (p=0.310 n=5+5) JSONEncode-8 16.5ms ± 3% 16.7ms ± 3% ~ (p=0.421 n=5+5) JSONDecode-8 54.6ms ± 1% 54.3ms ± 2% ~ (p=0.548 n=5+5) Mandelbrot200-8 4.45ms ± 3% 4.47ms ± 1% ~ (p=0.841 n=5+5) GoParse-8 3.43ms ± 1% 3.32ms ± 2% -3.28% (p=0.008 n=5+5) RegexpMatchEasy0_32-8 88.2ns ± 3% 89.4ns ± 2% ~ (p=0.333 n=5+5) RegexpMatchEasy0_1K-8 205ns ± 1% 206ns ± 1% ~ (p=0.905 n=5+5) RegexpMatchEasy1_32-8 85.1ns ± 1% 85.5ns ± 5% ~ (p=0.690 n=5+5) RegexpMatchEasy1_1K-8 365ns ± 1% 371ns ± 9% ~ (p=1.000 n=5+5) RegexpMatchMedium_32-8 129ns ± 2% 128ns ± 3% ~ (p=0.730 n=5+5) RegexpMatchMedium_1K-8 39.8µs ± 0% 39.7µs ± 4% ~ (p=0.730 n=4+5) RegexpMatchHard_32-8 1.99µs ± 3% 2.05µs ±16% ~ (p=0.794 n=5+5) RegexpMatchHard_1K-8 59.3µs ± 1% 60.3µs ± 7% ~ (p=1.000 n=5+5) Revcomp-8 1.36s ±63% 0.52s ± 5% ~ (p=0.095 n=5+5) Template-8 62.6ms ±14% 60.5ms ± 5% ~ (p=0.690 n=5+5) TimeParse-8 330ns ± 2% 324ns ± 2% ~ (p=0.087 n=5+5) TimeFormat-8 350ns ± 3% 340ns ± 1% -2.86% (p=0.008 n=5+5) name old speed new speed delta GobDecode-8 103MB/s ±11% 108MB/s ± 2% ~ (p=0.222 n=5+5) GobEncode-8 123MB/s ± 1% 127MB/s ± 2% +3.71% (p=0.008 n=5+5) Gzip-8 77.1MB/s ± 4% 76.9MB/s ± 3% ~ (p=1.000 n=5+5) Gunzip-8 505MB/s ± 3% 503MB/s ± 2% ~ (p=0.690 n=5+5) JSONEncode-8 118MB/s ± 3% 116MB/s ± 3% ~ (p=0.421 n=5+5) JSONDecode-8 35.5MB/s ± 1% 35.8MB/s ± 2% ~ (p=0.397 n=5+5) GoParse-8 16.9MB/s ± 1% 17.4MB/s ± 2% +3.45% (p=0.008 n=5+5) RegexpMatchEasy0_32-8 363MB/s ± 3% 358MB/s ± 2% ~ (p=0.421 n=5+5) RegexpMatchEasy0_1K-8 4.98GB/s ± 1% 4.97GB/s ± 1% ~ (p=0.548 n=5+5) RegexpMatchEasy1_32-8 376MB/s ± 1% 375MB/s ± 5% ~ (p=0.690 n=5+5) RegexpMatchEasy1_1K-8 2.80GB/s ± 1% 2.76GB/s ± 9% ~ (p=0.841 n=5+5) RegexpMatchMedium_32-8 7.73MB/s ± 1% 7.76MB/s ± 3% ~ (p=0.730 n=5+5) RegexpMatchMedium_1K-8 25.8MB/s ± 0% 25.8MB/s ± 4% ~ (p=0.651 n=4+5) RegexpMatchHard_32-8 16.1MB/s ± 3% 15.7MB/s ±14% ~ (p=0.794 n=5+5) RegexpMatchHard_1K-8 17.3MB/s ± 1% 17.0MB/s ± 7% ~ (p=0.984 n=5+5) Revcomp-8 273MB/s ±83% 488MB/s ± 5% ~ (p=0.095 n=5+5) Template-8 31.1MB/s ±13% 32.1MB/s ± 5% ~ (p=0.690 n=5+5) Updates #19495 Change-Id: I116e9a2a4594769318b22d736464de8a98499909 Reviewed-on: https://go-review.googlesource.com/38091 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
@huguesb, @josharian thanks for working on this. After recent patch (ec091b6) timings, compared to #19495 (comment), improve a bit:
but go is still 3x - 7.5x slower than nodejs. for
|
@navytux are you sure the js benchmark is correct? The code does not reference any values added to the map during the loop, so a smart compiler will optimise away the whole loop. |
@davecheney thanks for asking. Yes, I checked it this way:
--- a/map.js.kirr
+++ b/map.js
@@ -4,12 +4,12 @@ function main() {
var a = {};
for (var i = 0; i < 100000000; i++) {
- a[i & 0xffff] = i;
- //a[i & 0x7f] = i;
+ //a[i & 0xffff] = i;
+ a[i & 0x7f] = i;
//a[i] = i;
}
- //console.log(a)
+ console.log(a)
}
main() it is:
it is only a bit more than 0.377 we had for this case without Said that I also have to say I'm complete newbie to javascript and can be missing something. Thanks again, |
Can you try how things change if you manually unroll the loop 8 times on both languages? |
I looked around a bit to see how nodejs compiles this (I used
it could be seen the loop is there (100000000 = 0x5f5e100), |
@rasky, sure. For the following programs: package main
//import "fmt"
func main() {
//a := make(map[int]int)
a := make(map[int32]int32)
for i := int32(0); i < 100000000; {
//a[i & 0xffff] = i
//a[i & 0x7f] = i
//a[i] = i
a[i & 0xffff] = i // 0
i++
a[i & 0xffff] = i // 1
i++
a[i & 0xffff] = i // 2
i++
a[i & 0xffff] = i // 3
i++
a[i & 0xffff] = i // 4
i++
a[i & 0xffff] = i // 5
i++
a[i & 0xffff] = i // 6
i++
a[i & 0xffff] = i // 7
i++
}
//fmt.Println(a)
} #!/usr/bin/env nodejs
function main() {
var a = {};
for (var i = 0; i < 100000000;) {
//a[i & 0xffff] = i;
//a[i & 0x7f] = i;
//a[i] = i;
a[i & 0xffff] = i; // 0
i++;
a[i & 0xffff] = i; // 1
i++;
a[i & 0xffff] = i; // 2
i++;
a[i & 0xffff] = i; // 3
i++;
a[i & 0xffff] = i; // 4
i++;
a[i & 0xffff] = i; // 5
i++;
a[i & 0xffff] = i; // 6
i++;
a[i & 0xffff] = i; // 7
i++;
}
//console.log(a)
}
main() timings are:
and just in case if I uncomment --- a/map-unroll.js.kirr
+++ b/map-unroll.js
@@ -26,7 +26,7 @@ function main() {
i++;
}
- //console.log(a)
+ console.log(a)
}
main()
so it is 0.501s vs 0.307s but there are 64K entries to print and imho it is reasonable to expect it to add time. Still even 0.5s vs 2.9s (and go case does not print anything) is a big difference. Thanks, |
CL https://golang.org/cl/38172 mentions this issue. |
I've been doing some profiling and it looks like a good chunk of time is spent iterating inside buckets. My feeling is that the memory layout is distinctly suboptimal because every insert needs to check all cells of the matching bucket in case the matching entry is at the end of the bucket. This implies a constant factor increase up to 8x. Assuming keys are evenly distributed inside buckets, which seems reasonable in this test case, I would expect an average of 4 probes per insert, which seems roughly in line with the observed performance difference with other hash tables (which I suspect use some variant of open addressing and do much less probing in practice). I'm going to experiment with a few micro-optimizations but I think eventually we'll have to use a different memory layout to get significant improvements. |
In the common case you only need to probe the 8 tophash bytes; you don't need to look at the key and value slots themselves.
takes a long time? |
|
Maybe that branch is systematically mispredicted, leading to lots of stalls. |
Add benchmarks for map delete with int32/int64/string key Benchmark results on darwin/amd64 name old time/op new time/op delta MapDelete/Int32/1-8 151ns ± 8% 99ns ± 3% -34.39% (p=0.008 n=5+5) MapDelete/Int32/2-8 128ns ± 2% 111ns ±15% -13.40% (p=0.040 n=5+5) MapDelete/Int32/4-8 128ns ± 5% 114ns ± 2% -10.82% (p=0.008 n=5+5) MapDelete/Int64/1-8 144ns ± 0% 104ns ± 3% -27.53% (p=0.016 n=4+5) MapDelete/Int64/2-8 153ns ± 1% 126ns ± 3% -17.17% (p=0.008 n=5+5) MapDelete/Int64/4-8 178ns ± 3% 136ns ± 2% -23.60% (p=0.008 n=5+5) MapDelete/Str/1-8 187ns ± 3% 171ns ± 3% -8.54% (p=0.008 n=5+5) MapDelete/Str/2-8 221ns ± 3% 206ns ± 4% -7.18% (p=0.016 n=5+4) MapDelete/Str/4-8 256ns ± 5% 232ns ± 2% -9.36% (p=0.016 n=4+5) name old time/op new time/op delta BinaryTree17-8 2.78s ± 7% 2.70s ± 1% ~ (p=0.151 n=5+5) Fannkuch11-8 3.21s ± 2% 3.19s ± 1% ~ (p=0.310 n=5+5) FmtFprintfEmpty-8 49.1ns ± 3% 50.2ns ± 2% ~ (p=0.095 n=5+5) FmtFprintfString-8 78.6ns ± 4% 80.2ns ± 5% ~ (p=0.460 n=5+5) FmtFprintfInt-8 79.7ns ± 1% 81.0ns ± 3% ~ (p=0.103 n=5+5) FmtFprintfIntInt-8 117ns ± 2% 119ns ± 0% ~ (p=0.079 n=5+4) FmtFprintfPrefixedInt-8 153ns ± 1% 146ns ± 3% -4.19% (p=0.024 n=5+5) FmtFprintfFloat-8 239ns ± 1% 237ns ± 1% ~ (p=0.246 n=5+5) FmtManyArgs-8 506ns ± 2% 509ns ± 2% ~ (p=0.238 n=5+5) GobDecode-8 7.06ms ± 4% 6.86ms ± 1% ~ (p=0.222 n=5+5) GobEncode-8 6.01ms ± 5% 5.87ms ± 2% ~ (p=0.222 n=5+5) Gzip-8 246ms ± 4% 236ms ± 1% -4.12% (p=0.008 n=5+5) Gunzip-8 37.7ms ± 4% 37.3ms ± 1% ~ (p=0.841 n=5+5) HTTPClientServer-8 64.9µs ± 1% 64.4µs ± 0% -0.80% (p=0.032 n=5+4) JSONEncode-8 16.0ms ± 2% 16.2ms ±11% ~ (p=0.548 n=5+5) JSONDecode-8 53.2ms ± 2% 53.1ms ± 4% ~ (p=1.000 n=5+5) Mandelbrot200-8 4.33ms ± 2% 4.32ms ± 2% ~ (p=0.841 n=5+5) GoParse-8 3.24ms ± 2% 3.27ms ± 4% ~ (p=0.690 n=5+5) RegexpMatchEasy0_32-8 86.2ns ± 1% 85.2ns ± 3% ~ (p=0.286 n=5+5) RegexpMatchEasy0_1K-8 198ns ± 2% 199ns ± 1% ~ (p=0.310 n=5+5) RegexpMatchEasy1_32-8 82.6ns ± 2% 81.8ns ± 1% ~ (p=0.294 n=5+5) RegexpMatchEasy1_1K-8 359ns ± 2% 354ns ± 1% -1.39% (p=0.048 n=5+5) RegexpMatchMedium_32-8 123ns ± 2% 123ns ± 1% ~ (p=0.905 n=5+5) RegexpMatchMedium_1K-8 38.2µs ± 2% 38.6µs ± 8% ~ (p=0.690 n=5+5) RegexpMatchHard_32-8 1.92µs ± 2% 1.91µs ± 5% ~ (p=0.460 n=5+5) RegexpMatchHard_1K-8 57.6µs ± 1% 57.0µs ± 2% ~ (p=0.310 n=5+5) Revcomp-8 483ms ± 7% 441ms ± 1% -8.79% (p=0.016 n=5+4) Template-8 58.0ms ± 1% 58.2ms ± 7% ~ (p=0.310 n=5+5) TimeParse-8 324ns ± 6% 312ns ± 2% ~ (p=0.087 n=5+5) TimeFormat-8 330ns ± 1% 329ns ± 1% ~ (p=0.968 n=5+5) name old speed new speed delta GobDecode-8 109MB/s ± 4% 112MB/s ± 1% ~ (p=0.222 n=5+5) GobEncode-8 128MB/s ± 5% 131MB/s ± 2% ~ (p=0.222 n=5+5) Gzip-8 78.9MB/s ± 4% 82.3MB/s ± 1% +4.25% (p=0.008 n=5+5) Gunzip-8 514MB/s ± 4% 521MB/s ± 1% ~ (p=0.841 n=5+5) JSONEncode-8 121MB/s ± 2% 120MB/s ±10% ~ (p=0.548 n=5+5) JSONDecode-8 36.5MB/s ± 2% 36.6MB/s ± 4% ~ (p=1.000 n=5+5) GoParse-8 17.9MB/s ± 2% 17.7MB/s ± 4% ~ (p=0.730 n=5+5) RegexpMatchEasy0_32-8 371MB/s ± 1% 375MB/s ± 3% ~ (p=0.310 n=5+5) RegexpMatchEasy0_1K-8 5.15GB/s ± 1% 5.13GB/s ± 1% ~ (p=0.548 n=5+5) RegexpMatchEasy1_32-8 387MB/s ± 2% 391MB/s ± 1% ~ (p=0.310 n=5+5) RegexpMatchEasy1_1K-8 2.85GB/s ± 2% 2.89GB/s ± 1% ~ (p=0.056 n=5+5) RegexpMatchMedium_32-8 8.07MB/s ± 2% 8.06MB/s ± 1% ~ (p=0.730 n=5+5) RegexpMatchMedium_1K-8 26.8MB/s ± 2% 26.6MB/s ± 7% ~ (p=0.690 n=5+5) RegexpMatchHard_32-8 16.7MB/s ± 2% 16.7MB/s ± 5% ~ (p=0.421 n=5+5) RegexpMatchHard_1K-8 17.8MB/s ± 1% 18.0MB/s ± 2% ~ (p=0.310 n=5+5) Revcomp-8 527MB/s ± 6% 577MB/s ± 1% +9.44% (p=0.016 n=5+4) Template-8 33.5MB/s ± 1% 33.4MB/s ± 7% ~ (p=0.310 n=5+5) Updates #19495 Change-Id: Ib9ece1690813d9b4788455db43d30891e2138df5 Reviewed-on: https://go-review.googlesource.com/38172 Reviewed-by: Hugues Bruant <hugues.bruant@gmail.com> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
Any solution or update to this issue? Funny I met this problem while I was doing a simple leetcode program and the thought came to me that it might be interesting to compare the results of python and go, and I was scared by the result. |
I don't think there have been any significant changes since the |
Change https://golang.org/cl/115616 mentions this issue: |
Change https://golang.org/cl/118358 mentions this issue: |
Change https://golang.org/cl/142437 mentions this issue: |
Divide the "empty" slot state into two, "emptyOne" and "emptyRest". emptyOne means just that slot is empty. emptyRest means all subsequent slots in that bucket are empty and the overflow pointer is nil. When scanning a bucket, we can often stop at emptyRest, reducing the total work we have to do. (This is similar to how tombstones work in open addressing.) Ideally on delete we have to figure out whether to zero the slot with an emptyOne or emptyRest marker. For now, we choose the safe but non-optimal choice. (Fix in subsequent CL?) This is a simpler CL than some others we've tried, including my CL sequence 11835[5-8] and Ilya's CL 115616. Update #19495 name old time/op new time/op delta MegMap 8.96ns ± 2% 8.74ns ± 6% -2.44% (p=0.020 n=10+10) MegOneMap 8.91ns ± 2% 5.53ns ± 2% -37.99% (p=0.000 n=10+10) MegEqMap 46.0µs ± 1% 45.8µs ± 3% ~ (p=0.315 n=9+10) MegEmptyMap 2.50ns ± 0% 2.50ns ± 2% ~ (p=0.957 n=8+10) SmallStrMap 8.54ns ± 1% 8.71ns ± 2% +2.01% (p=0.000 n=10+10) MapStringKeysEight_16 8.61ns ± 3% 8.71ns ± 3% +1.20% (p=0.026 n=9+9) MapStringKeysEight_32 8.54ns ± 2% 8.97ns ± 1% +5.05% (p=0.000 n=10+9) MapStringKeysEight_64 8.66ns ± 2% 8.99ns ± 2% +3.87% (p=0.000 n=10+10) MapStringKeysEight_1M 8.57ns ± 2% 8.95ns ± 2% +4.51% (p=0.000 n=10+9) IntMap 6.69ns ± 1% 7.46ns ± 1% +11.60% (p=0.000 n=9+9) MapFirst/1 3.69ns ± 1% 3.63ns ± 3% -1.52% (p=0.040 n=10+10) MapFirst/2 3.70ns ± 2% 3.63ns ± 2% -1.95% (p=0.001 n=9+9) MapFirst/3 3.74ns ± 2% 3.66ns ± 2% -2.12% (p=0.000 n=8+10) MapFirst/4 3.71ns ± 2% 3.66ns ± 4% ~ (p=0.073 n=9+10) MapFirst/5 3.69ns ± 1% 3.62ns ± 2% -1.88% (p=0.000 n=9+10) MapFirst/6 3.68ns ± 2% 3.62ns ± 1% -1.83% (p=0.001 n=10+9) MapFirst/7 3.67ns ± 1% 3.60ns ± 1% -1.98% (p=0.000 n=10+8) MapFirst/8 3.68ns ± 2% 3.61ns ± 2% -1.87% (p=0.000 n=10+10) MapFirst/9 8.03ns ± 4% 7.89ns ± 2% -1.76% (p=0.007 n=10+10) MapFirst/10 7.99ns ± 2% 7.86ns ± 3% -1.64% (p=0.009 n=9+10) MapFirst/11 7.96ns ± 1% 7.80ns ± 2% -2.01% (p=0.000 n=10+10) MapFirst/12 7.96ns ± 1% 7.82ns ± 1% -1.67% (p=0.000 n=10+10) MapFirst/13 8.06ns ± 3% 7.92ns ± 3% ~ (p=0.055 n=10+10) MapFirst/14 7.95ns ± 1% 7.80ns ± 1% -1.88% (p=0.000 n=10+9) MapFirst/15 8.01ns ± 2% 7.80ns ± 2% -2.57% (p=0.000 n=10+10) MapFirst/16 8.05ns ± 2% 7.90ns ± 2% -1.84% (p=0.005 n=9+10) MapMid/1 4.00ns ± 1% 3.94ns ± 2% -1.30% (p=0.021 n=8+9) MapMid/2 4.39ns ± 2% 4.32ns ± 4% ~ (p=0.128 n=10+10) MapMid/3 4.40ns ± 2% 4.27ns ± 2% -2.93% (p=0.000 n=10+9) MapMid/4 4.76ns ± 2% 4.65ns ± 1% -2.26% (p=0.000 n=10+9) MapMid/5 4.76ns ± 1% 4.65ns ± 1% -2.27% (p=0.000 n=10+10) MapMid/6 5.11ns ± 2% 4.98ns ± 2% -2.55% (p=0.000 n=10+10) MapMid/7 5.12ns ± 1% 5.01ns ± 3% -2.02% (p=0.003 n=9+9) MapMid/8 5.71ns ± 3% 5.97ns ± 1% +4.51% (p=0.000 n=10+9) MapMid/9 8.72ns ±10% 8.89ns ±10% ~ (p=0.458 n=9+10) MapMid/10 10.1ns ±15% 9.6ns ± 7% ~ (p=0.080 n=9+10) MapMid/11 9.88ns ±10% 9.44ns ±11% ~ (p=0.065 n=10+10) MapMid/12 9.90ns ±13% 10.04ns ± 9% ~ (p=1.000 n=10+8) MapMid/13 9.67ns ±14% 10.23ns ±10% ~ (p=0.209 n=10+9) MapMid/14 9.12ns ±14% 9.14ns ±13% ~ (p=0.927 n=10+10) MapMid/15 9.16ns ±12% 9.15ns ±16% ~ (p=0.955 n=10+10) MapMid/16 9.37ns ±11% 9.60ns ±23% ~ (p=0.825 n=9+10) MapLast/1 4.08ns ± 1% 3.92ns ± 0% -3.91% (p=0.000 n=10+9) MapLast/2 4.37ns ± 1% 4.28ns ± 1% -1.95% (p=0.000 n=10+10) MapLast/3 4.94ns ± 2% 4.65ns ± 1% -5.79% (p=0.000 n=9+8) MapLast/4 5.40ns ± 3% 5.02ns ± 2% -7.13% (p=0.000 n=9+9) MapLast/5 5.88ns ± 2% 5.67ns ± 2% -3.57% (p=0.000 n=10+10) MapLast/6 6.48ns ± 3% 5.90ns ± 2% -8.89% (p=0.000 n=10+10) MapLast/7 7.01ns ± 2% 6.27ns ± 5% -10.56% (p=0.000 n=10+10) MapLast/8 7.60ns ± 2% 6.62ns ± 2% -12.93% (p=0.000 n=9+10) MapLast/9 10.6ns ± 9% 10.9ns ±15% ~ (p=0.344 n=9+10) MapLast/10 11.0ns ±12% 10.9ns ±14% ~ (p=0.985 n=10+10) MapLast/11 11.4ns ±12% 11.8ns ±22% ~ (p=0.671 n=10+10) MapLast/12 11.6ns ±10% 12.1ns ±19% ~ (p=0.617 n=10+10) MapLast/13 12.5ns ±23% 11.8ns ±13% ~ (p=0.827 n=10+9) MapLast/14 10.5ns ±22% 10.4ns ± 5% ~ (p=0.797 n=10+9) MapLast/15 10.0ns ±15% 10.3ns ±16% ~ (p=0.565 n=10+10) MapLast/16 10.4ns ±12% 10.5ns ±13% ~ (p=0.889 n=10+9) MapCycle 22.3ns ± 1% 22.0ns ± 2% -1.43% (p=0.002 n=9+10) RepeatedLookupStrMapKey32 16.4ns ± 1% 16.6ns ± 1% +1.24% (p=0.000 n=10+9) RepeatedLookupStrMapKey1M 35.6µs ± 0% 35.4µs ± 1% -0.62% (p=0.002 n=10+10) NewEmptyMap 5.36ns ± 1% 9.05ns ± 1% +69.02% (p=0.000 n=10+8) NewSmallMap 51.2ns ± 2% 33.7ns ± 1% -34.22% (p=0.000 n=10+9) MapIter 83.8ns ± 1% 88.4ns ± 1% +5.55% (p=0.000 n=10+10) MapIterEmpty 4.32ns ± 3% 5.54ns ± 3% +28.12% (p=0.000 n=10+10) SameLengthMap 4.31ns ± 1% 4.59ns ± 2% +6.41% (p=0.000 n=9+10) BigKeyMap 24.2ns ± 2% 24.3ns ± 1% ~ (p=0.432 n=10+10) BigValMap 24.3ns ± 1% 24.4ns ± 2% ~ (p=0.200 n=10+9) SmallKeyMap 17.5ns ± 1% 18.5ns ± 2% +5.81% (p=0.000 n=9+10) MapPopulate/1 29.0ns ± 4% 18.8ns ± 1% -35.27% (p=0.000 n=10+9) MapPopulate/10 736ns ± 5% 693ns ± 4% -5.92% (p=0.000 n=10+10) MapPopulate/100 11.3µs ± 2% 10.8µs ± 3% -4.38% (p=0.000 n=10+10) MapPopulate/1000 139µs ± 8% 132µs ± 4% -5.10% (p=0.002 n=10+10) MapPopulate/10000 1.21ms ± 5% 1.16ms ± 5% -4.56% (p=0.002 n=10+10) MapPopulate/100000 12.2ms ± 3% 11.8ms ± 5% ~ (p=0.052 n=10+10) ComplexAlgMap 73.9ns ± 1% 74.4ns ± 2% ~ (p=0.161 n=9+10) GoMapClear/Reflexive/1 36.0ns ± 1% 26.9ns ± 2% -25.31% (p=0.000 n=10+10) GoMapClear/Reflexive/10 35.2ns ± 1% 24.4ns ± 1% -30.62% (p=0.000 n=10+10) GoMapClear/Reflexive/100 69.6ns ± 2% 59.2ns ± 1% -14.92% (p=0.000 n=10+10) GoMapClear/Reflexive/1000 1.06µs ± 2% 1.05µs ± 1% -1.16% (p=0.013 n=10+9) GoMapClear/Reflexive/10000 11.7µs ± 1% 11.7µs ± 1% ~ (p=0.542 n=10+10) GoMapClear/NonReflexive/1 96.3ns ± 1% 90.0ns ± 1% -6.52% (p=0.000 n=10+10) GoMapClear/NonReflexive/10 110ns ± 2% 101ns ± 0% -8.10% (p=0.000 n=10+7) GoMapClear/NonReflexive/100 270ns ± 2% 235ns ± 2% -12.94% (p=0.000 n=10+10) GoMapClear/NonReflexive/1000 3.02µs ± 2% 2.48µs ± 1% -17.92% (p=0.000 n=10+10) GoMapClear/NonReflexive/10000 23.7µs ± 1% 19.6µs ± 1% -17.30% (p=0.000 n=10+9) MapPop100 9.65µs ± 6% 9.18µs ± 8% -4.82% (p=0.008 n=9+10) MapPop1000 162µs ± 6% 148µs ± 4% -8.67% (p=0.000 n=9+9) MapPop10000 3.05ms ± 8% 2.82ms ±15% -7.66% (p=0.023 n=10+10) MapAssign/Int32/256 15.7ns ± 4% 14.6ns ± 2% -7.08% (p=0.000 n=10+10) MapAssign/Int32/65536 29.8ns ± 1% 30.4ns ± 0% +2.04% (p=0.000 n=10+8) MapAssign/Int64/256 14.9ns ± 5% 14.8ns ± 4% ~ (p=0.611 n=10+10) MapAssign/Int64/65536 30.3ns ± 2% 30.4ns ± 1% +0.54% (p=0.046 n=10+9) MapAssign/Str/256 17.8ns ± 3% 19.8ns ± 4% +11.08% (p=0.000 n=10+10) MapAssign/Str/65536 35.7ns ± 1% 36.4ns ± 1% +1.82% (p=0.000 n=10+10) MapOperatorAssign/Int32/256 18.8ns ± 5% 14.6ns ± 3% -22.57% (p=0.000 n=10+10) MapOperatorAssign/Int32/65536 29.8ns ± 1% 30.5ns ± 1% +2.39% (p=0.000 n=10+10) MapOperatorAssign/Int64/256 16.6ns ± 4% 15.0ns ± 6% -9.34% (p=0.000 n=10+10) MapOperatorAssign/Int64/65536 30.1ns ± 1% 31.7ns ± 2% +5.21% (p=0.000 n=10+10) MapOperatorAssign/Str/256 1.70µs ± 1% 1.61µs ± 2% -5.55% (p=0.000 n=10+8) MapOperatorAssign/Str/65536 289ns ± 7% 294ns ± 4% ~ (p=0.425 n=10+10) MapAppendAssign/Int32/256 34.3ns ± 2% 31.0ns ± 3% -9.59% (p=0.000 n=9+9) MapAppendAssign/Int32/65536 51.8ns ± 3% 47.1ns ±13% -9.17% (p=0.002 n=9+10) MapAppendAssign/Int64/256 32.5ns ± 8% 31.2ns ± 6% ~ (p=0.065 n=10+10) MapAppendAssign/Int64/65536 51.4ns ± 4% 47.2ns ±10% -8.07% (p=0.005 n=9+10) MapAppendAssign/Str/256 105ns ±12% 109ns ± 4% ~ (p=0.138 n=10+8) MapAppendAssign/Str/65536 101ns ±14% 81ns ± 8% -19.82% (p=0.000 n=10+9) MapDelete/Int32/100 32.0ns ± 1% 35.0ns ± 2% +9.59% (p=0.000 n=9+10) MapDelete/Int32/1000 27.0ns ± 3% 30.3ns ± 1% +12.10% (p=0.000 n=10+9) MapDelete/Int32/10000 29.2ns ± 1% 32.9ns ± 2% +12.80% (p=0.000 n=10+10) MapDelete/Int64/100 31.5ns ± 1% 35.7ns ± 2% +13.16% (p=0.000 n=10+10) MapDelete/Int64/1000 27.0ns ± 2% 30.6ns ± 1% +13.21% (p=0.000 n=10+10) MapDelete/Int64/10000 30.3ns ± 1% 34.4ns ± 3% +13.47% (p=0.000 n=10+10) MapDelete/Str/100 23.4ns ± 8% 26.7ns ± 6% +14.10% (p=0.000 n=10+9) MapDelete/Str/1000 31.0ns ± 2% 35.1ns ± 3% +13.19% (p=0.000 n=10+9) MapDelete/Str/10000 38.8ns ± 1% 43.4ns ± 2% +12.02% (p=0.000 n=9+10) Change-Id: I564ce0f40936589f0f9b837f7f2bbcca4c4a1070 Reviewed-on: https://go-review.googlesource.com/c/142437 Reviewed-by: Giovanni Bajo <rasky@develer.com> Reviewed-by: Martin Möhrmann <martisch@uos.de> Run-TryBot: Martin Möhrmann <martisch@uos.de> TryBot-Result: Gobot Gobot <gobot@golang.org>
any update? I met this problem while I was rewriting a NodeJS project with golang, and found out that the performance of map in golang is much slower than NodeJS one |
@NeoyeElf focusing on microbenchmarks can be misleading. Have you found the performance of the application as a whole, specifically request latency and throughput to be worse than your existing NodeJS codebase? |
@davecheney My project will get&put data like spreadsheet which has row, columns. In NodeJS, the data structure is like: btw: I change the data structure to slice, and the speed is expected and acceptable. Maybe when the key is consecutive numbers, the object in NodeJS is implemented by array as well...just guess :) |
Can you provide a benchmark to support this? |
@davecheney
it cost about 60ms. The NodeJS program:
it costs about 2.6ms. These programs above was running on my mac with 2.2 GHz 4 Intel Core i7, 16GB 1600 MHz DDR3 |
I'm not terribly surprised that a consecutive number optimization would help in cases like this. Anyone have the pointer to the nodeJS map implementation source? |
In this case, the Golang is about 2 times faster than NodeJS (63ms vs 128ms).
I found the post fast properties in V8. It solved my question |
@randall77 I believe it is in the v8 project in C++ and in these files:
@indutny please help correct me if am wrong about, or missing any references. Thank you. |
Ok, then I'm not really sure that there's anything to fix here. Go's maps are general purpose. For the case of known dense integer indexes, we could have a special case map implementation. But that's a lot of complexity to add for a speedup in a fairly special case (and one for which a slice would quite possibly work just as well). |
V8 (in Node.js) has two separate dictionary modes for numeric and non-numeric keys in the map, i.e. if you store values by consecutive numerical keys in the |
Based on the recent responses I am going to close this issue as the current question has been answered and is unrelated to the original inquiry several years ago. For future commenters—please open a new issue; fill out the template, explain your problem, give a procedure for someone else to reproduce what you are seeing and if it turns out this is an issue, it will be addressed. Thank you all. |
@davecheney, why do you close the issue? Even if nodejs is doing runtime map magic which Go is not willing to accept (but see P.S.), it was already explicitly posted here 3 years ago that:
Why that information is being ignored? Please reopen the issue, P.S. |
@navytux since the original issue was a comparison with V8 (which seems to be fixed, aside from the sequential-keys case), and your comment is a comparison with Pypy, maybe it makes sense to open a new issue for that comparison? EDIT: however, that doesn't mean this shouldn't be reopened, if the discussion above suggested further optimizations which haven't been tried. |
( I'm taking time break due to overload; I hope to reply to @networkimprov in 1 or 2 months. I appologize for the inconvenience ) |
Please answer these questions before submitting your issue. Thanks!
What did you do?
Hello up there. Map performance was already discussed some time ago in #3885 and improved a bit. It was also said there that the map algorithm is choosen to work very well with very very large maps. However maps are not always very very large and imho in many practical cases they are small and medium.
So please consider the following 3 programs:
(https://play.golang.org/p/rPH1pSM1Xk)
The time it takes to run them on i7-6600U is as follows:
The go version is 9.5x slower than javascript one, and ~ 1.8x slower than pypy one.
If we reduce the actual map size from 64K elements to 128 elements, activating the
a[i & 0x7f] = i
case via e.g. the following patch:timings become:
javascript becomes only a bit faster here while go & pypy improved ~ 2.3x / 2.2x respectively. Still go is 4x slower than javascript and 1.7x slower than pypy.
We can also test how it works if we do not limit the map size and let it grow on every operation. Yes, javascript and pypy are more memory hungry and for original niter=1E8 I'm getting out-of-memory in their cases on my small laptop, but let's test with e.g. niter=1E7 (diff to original program):
timings become:
So it is go/js ~6.5x slower and go/pypy is ~2.2x slower.
The profile for original program (
a[i & 0xffff] = i
) is:What did you expect to see?
Map operations for small / medium maps are as fast or better than in nodejs.
What did you see instead?
Map operations are 4x-10x slower than in javascript for maps sizes that are commonly present in many programs.
Does this issue reproduce with the latest release (go1.8)?
Yes.
System details
Thanks beforehand,
Kirill
/cc @rsc, @randall77
The text was updated successfully, but these errors were encountered: