proposal: a faster, better fnv32 implementation #19780
The core implementation in std looks essentially the same as yours:
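Since both are FNV-1 (multiply by the prime, then XOR each byte, starting from the same offset basis), the two implementations should produce identical digests. A quick sanity check, with the standard-library path wrapped in a helper named stdFNV32 just for this sketch:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// fnv32 is the custom FNV-1 implementation from concurrent-map.
func fnv32(key string) uint32 {
	hash := uint32(2166136261)
	const prime32 = uint32(16777619)
	for i := 0; i < len(key); i++ {
		hash *= prime32
		hash ^= uint32(key[i])
	}
	return hash
}

// stdFNV32 computes the same digest via the standard hash/fnv package.
// fnv.New32 returns an FNV-1 hasher (New32a would be FNV-1a).
func stdFNV32(key string) uint32 {
	h := fnv.New32()
	h.Write([]byte(key))
	return h.Sum32()
}

func main() {
	for _, s := range []string{"", "a", "hello, world"} {
		fmt.Println(fnv32(s) == stdFNV32(s))
	}
}
```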
(I believe the reason your function is faster is that your implementation does not go through the hash.Hash32 interface. EDIT: this is actually already noted in the issue you linked.) The point is that I don't think we can change the hash package now, moving to pure functions and whatnot. That would break the Go 1 compatibility promise.
Thanks for your response. Indeed it seems that the faster implementation in our project is due to the fact that the runtime has less work to do because we are not going through the hash.Hash32 interface. The thing is: our benchmarks show a very big difference, as you can see in the issue. Our version was in the range of 90.8 ns/op - 95.6 ns/op vs 165 ns/op - 170 ns/op if we re-use the hasher (or 214 ns/op vs 249 ns/op if we don't!). I think that at this level it's worth giving people the option to use the faster version with native Go.
I understand the motivation but I don't see a reasonable way to do this. As I wrote, we can't change the current API. So we would have to either add a second package with the same functionality, implemented using plain functions, or dump a bunch of new functions in the current package, duplicating everything that's there now. Both solutions are IMHO undesirable.
Well, I'm religious about beautiful and idiomatic code as much as the next gopher, but if it were my application, I would rather suffer something like an extra package for such a significant performance improvement. But I see your point, and most applications are probably not performance-critical like the ones that use concurrent-map.
crc32 has separate functions for doing the operation directly and not via some interface: https://golang.org/pkg/hash/crc32/#Checksum Same with sha1 and md5: https://golang.org/pkg/crypto/sha1/#Sum and https://golang.org/pkg/crypto/md5/#Sum I don't think it'd be weird to have a dedicated function in the fnv package for this.
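A dedicated function could mirror the shape of crc32.Checksum. A hypothetical sketch of what that might look like; note the standard hash/fnv package does not actually export such a function, and the name Sum32 here is invented:

```go
package main

import "fmt"

// FNV-1 parameters for 32-bit hashes.
const (
	offset32 = 2166136261
	prime32  = 16777619
)

// Sum32 returns the FNV-1 checksum of data, in the style of
// crc32.Checksum. Hypothetical: hash/fnv does not export this today.
func Sum32(data []byte) uint32 {
	hash := uint32(offset32)
	for _, c := range data {
		hash *= prime32
		hash ^= uint32(c)
	}
	return hash
}

func main() {
	fmt.Println(Sum32([]byte("hello")))
}
```

No interface value, no allocation, no Write call: just a pure function over a byte slice.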
The sha1 and md5 packages are very small, though. They each have a single Sum function, while fnv would need four. Still, that's a good point: since other packages do this, there's a precedent.
I'm not worried about 4 sum functions. It's a small and focused package. If somebody comes looking for fnv, they won't be overwhelmed by 4 fnv functions. |
Thank you both for the feedback. Would it be reasonable at this stage for me to implement the new fnv32 function as a separate function, in the spirit of the other hashes mentioned here, and submit the contribution for code review as per the guidelines?
Is it possible for the compiler to optimize out the hasher allocation?
Actually, https://go-review.googlesource.com/c/37751/ landed this cycle. @orcaman, could you profile Go 1.8 vs Go tip for the standard fnv package?
@orcaman Also, it would be great if you could benchmark each function multiple times using go test -count, so the results can be compared with benchstat.
@bradfitz @ALTree hey guys, I just ran a benchmark of Go 1.8 vs Go tip (BTW, for Go tip I just built Go from the latest master, is this OK?). Here are the results.

For Go 1.8:

$ go test -bench=. -count 10 > go1_8.txt
$ cat go1_8.txt
BenchmarkFNV32Custom-8 20000000 72.1 ns/op
BenchmarkFNV32Custom-8 20000000 71.8 ns/op
BenchmarkFNV32Custom-8 20000000 75.2 ns/op
BenchmarkFNV32Custom-8 20000000 73.2 ns/op
BenchmarkFNV32Custom-8 20000000 69.6 ns/op
BenchmarkFNV32Custom-8 20000000 75.6 ns/op
BenchmarkFNV32Custom-8 20000000 73.2 ns/op
BenchmarkFNV32Custom-8 20000000 70.2 ns/op
BenchmarkFNV32Custom-8 20000000 72.2 ns/op
BenchmarkFNV32Custom-8 20000000 75.2 ns/op
BenchmarkFNV32NativeAlloc-8 10000000 179 ns/op
BenchmarkFNV32NativeAlloc-8 10000000 145 ns/op
BenchmarkFNV32NativeAlloc-8 10000000 146 ns/op
BenchmarkFNV32NativeAlloc-8 10000000 145 ns/op
BenchmarkFNV32NativeAlloc-8 10000000 146 ns/op
BenchmarkFNV32NativeAlloc-8 10000000 145 ns/op
BenchmarkFNV32NativeAlloc-8 10000000 147 ns/op
BenchmarkFNV32NativeAlloc-8 10000000 153 ns/op
BenchmarkFNV32NativeAlloc-8 10000000 150 ns/op
BenchmarkFNV32NativeAlloc-8 10000000 149 ns/op
BenchmarkFNV32NativeReuse-8 10000000 137 ns/op
BenchmarkFNV32NativeReuse-8 10000000 136 ns/op
BenchmarkFNV32NativeReuse-8 10000000 135 ns/op
BenchmarkFNV32NativeReuse-8 10000000 133 ns/op
BenchmarkFNV32NativeReuse-8 10000000 135 ns/op
BenchmarkFNV32NativeReuse-8 10000000 137 ns/op
BenchmarkFNV32NativeReuse-8 10000000 138 ns/op
BenchmarkFNV32NativeReuse-8 10000000 132 ns/op
BenchmarkFNV32NativeReuse-8 10000000 133 ns/op
BenchmarkFNV32NativeReuse-8 10000000 135 ns/op
PASS
ok github.com/orcaman/fnv-bench 73.664s

For Go built from the latest master branch:

$ /usr/local/go_tip/go/bin/go test -bench=. -count 10 > go_tip.txt
$ cat go_tip.txt
goos: darwin
goarch: amd64
pkg: github.com/orcaman/fnv-bench
BenchmarkFNV32Custom-8 20000000 72.0 ns/op
BenchmarkFNV32Custom-8 20000000 70.2 ns/op
BenchmarkFNV32Custom-8 20000000 73.8 ns/op
BenchmarkFNV32Custom-8 20000000 72.1 ns/op
BenchmarkFNV32Custom-8 20000000 76.0 ns/op
BenchmarkFNV32Custom-8 20000000 77.7 ns/op
BenchmarkFNV32Custom-8 20000000 76.5 ns/op
BenchmarkFNV32Custom-8 20000000 74.3 ns/op
BenchmarkFNV32Custom-8 20000000 76.6 ns/op
BenchmarkFNV32Custom-8 20000000 75.3 ns/op
BenchmarkFNV32NativeAlloc-8 10000000 185 ns/op
BenchmarkFNV32NativeAlloc-8 10000000 154 ns/op
BenchmarkFNV32NativeAlloc-8 10000000 151 ns/op
BenchmarkFNV32NativeAlloc-8 10000000 154 ns/op
BenchmarkFNV32NativeAlloc-8 10000000 153 ns/op
BenchmarkFNV32NativeAlloc-8 10000000 150 ns/op
BenchmarkFNV32NativeAlloc-8 10000000 152 ns/op
BenchmarkFNV32NativeAlloc-8 10000000 149 ns/op
BenchmarkFNV32NativeAlloc-8 10000000 152 ns/op
BenchmarkFNV32NativeAlloc-8 10000000 152 ns/op
BenchmarkFNV32NativeReuse-8 10000000 140 ns/op
BenchmarkFNV32NativeReuse-8 10000000 138 ns/op
BenchmarkFNV32NativeReuse-8 10000000 137 ns/op
BenchmarkFNV32NativeReuse-8 10000000 136 ns/op
BenchmarkFNV32NativeReuse-8 10000000 139 ns/op
BenchmarkFNV32NativeReuse-8 10000000 139 ns/op
BenchmarkFNV32NativeReuse-8 10000000 141 ns/op
BenchmarkFNV32NativeReuse-8 10000000 139 ns/op
BenchmarkFNV32NativeReuse-8 10000000 138 ns/op
BenchmarkFNV32NativeReuse-8 10000000 138 ns/op
PASS
ok github.com/orcaman/fnv-bench 80.214s

For reference, here is my benchmarking code:

package fnvbench
import (
"hash/fnv"
"math/rand"
"testing"
)
const (
maxElems = 10000000
maxStrLen = 128
letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
)
// generate random string of given length
func randString(n int) string {
b := make([]byte, n)
for i := range b {
b[i] = letters[rand.Int63()%int64(len(letters))]
}
return string(b)
}
// generate slice with maxElems strings of variable length: 0 - maxStrLen
func prepareData() *[]string {
var strLen int
stringArray := make([]string, maxElems)
for i := 0; i < maxElems; i++ {
strLen = rand.Intn(maxStrLen)
stringArray[i] = randString(strLen)
}
return &stringArray
}
// custom FNV32 from concurrent-map package
func fnv32(key string) uint32 {
hash := uint32(2166136261)
const prime32 = uint32(16777619)
for i := 0; i < len(key); i++ {
hash *= prime32
hash ^= uint32(key[i])
}
return hash
}
var dataSet = prepareData()
func BenchmarkFNV32Custom(b *testing.B) {
for i := 0; i < b.N; i++ {
fnv32((*dataSet)[i%maxElems])
}
}
func BenchmarkFNV32NativeAlloc(b *testing.B) {
for i := 0; i < b.N; i++ {
hasher := fnv.New32()
hasher.Write([]byte((*dataSet)[i%maxElems]))
hasher.Sum32()
}
}
func BenchmarkFNV32NativeReuse(b *testing.B) {
hasher := fnv.New32()
b.ResetTimer()
for i := 0; i < b.N; i++ {
hasher.Reset()
hasher.Write([]byte((*dataSet)[i%maxElems]))
hasher.Sum32()
}
}
And here are those same results presented using benchstat. Please use it in the future.
Are you sure you're not comparing apples and oranges? Those string-to-byte-slice conversions look like allocations in your inner loop to me.
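Every []byte(s) conversion in the hot loop allocates and copies the string's bytes, which is charged to the native benchmarks but not the custom one. Preparing the data set as [][]byte up front takes the conversion out of the measured loop. A minimal sketch (numElems is shrunk here just to keep the example quick):

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

const (
	numElems  = 1000
	maxStrLen = 128
)

// prepareBytes builds the data set as byte slices up front, so a
// benchmark loop can hash without a per-iteration string-to-[]byte copy.
func prepareBytes() [][]byte {
	data := make([][]byte, numElems)
	for i := range data {
		b := make([]byte, rand.Intn(maxStrLen))
		for j := range b {
			b[j] = byte('a' + rand.Intn(26))
		}
		data[i] = b
	}
	return data
}

func main() {
	data := prepareBytes()
	h := fnv.New32()
	for _, d := range data {
		h.Reset()
		h.Write(d) // []byte goes straight in: no conversion per iteration
	}
	fmt.Println("hashed", len(data), "byte slices")
}
```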
@bradfitz seems you are right! I changed the benchmarking code to prepare the data as slices of bytes instead of strings (see here) and the results are now quite comparable. This is after running the new benchmark 10 times again: $ benchstat go1_8.txt go_tip.txt
name old time/op new time/op delta
FNV32Custom-8 89.5ns ±10% 83.0ns ± 5% -7.25% (p=0.010 n=10+10)
FNV32NativeAlloc-8 105ns ± 8% 110ns ± 4% ~ (p=0.108 n=10+10)
FNV32NativeReuse-8 91.0ns ±10% 89.0ns ± 6% ~ (p=0.382 n=10+10)
Looks like there's nothing to do here, then. You should run more iterations of the benchmark next time. I'll close this, but holler if you see room for improvement.
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (go version)?

go version go1.7.3 darwin/amd64
What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/orcaman/github.com/orcaman/go"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/lc/lr_hrpk55218jm8dn3s78ww00000gn/T/go-build334927550=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
What did you do?

At the concurrent-map project we have written a faster, better implementation of the fnv32 function. Please have a look at this issue to see the implementation and some benchmarks showing the performance improvement: orcaman/concurrent-map#48
This is the commit where we added our custom implementation:
orcaman/concurrent-map@e3b11eb
We clearly see on multiple machines of different kinds that our implementation is significantly faster. I would like to contribute this change to the go source tree.
Cheers,
Or