
proposal: a faster, better fnv32 implementation #19780

Closed
orcaman opened this issue Mar 30, 2017 · 16 comments

@orcaman

orcaman commented Mar 30, 2017

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.7.3 darwin/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/orcaman/github.com/orcaman/go"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/lc/lr_hrpk55218jm8dn3s78ww00000gn/T/go-build334927550=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"

What did you do?

At the concurrent-map project we have written a faster, better implementation of the fnv32 function.

Please have a look at this issue to see the implementation and some benchmarks showing the performance improvement: orcaman/concurrent-map#48

This is the commit where we added our custom implementation:
orcaman/concurrent-map@e3b11eb

We consistently see, on several different kinds of machines, that our implementation is significantly faster. I would like to contribute this change to the Go source tree.

Cheers,
Or

@ALTree
Member

ALTree commented Mar 30, 2017

The core implementation in std looks essentially the same as yours:

func (s *sum32) Write(data []byte) (int, error) {
	hash := *s
	for _, c := range data {
		hash *= prime32
		hash ^= sum32(c)
	}
	*s = hash
	return len(data), nil
}

(sum32 is a named uint32 type, so sum32(c) is just a type conversion.)
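For context, the surrounding pieces of hash/fnv look roughly like this. This is a condensed paraphrase rather than a verbatim copy: sum32 is a named uint32 type, and the methods below are what make a *sum32 satisfy hash.Hash32.

```go
package main

import (
	"fmt"
	"hash"
)

// sum32 holds the running FNV-1 state; sum32(c) in Write is a type
// conversion, not a function call.
type sum32 uint32

const (
	offset32 = 2166136261 // FNV-1 32-bit offset basis
	prime32  = 16777619   // FNV-1 32-bit prime
)

// New32 returns the hasher behind the hash.Hash32 interface, which is
// the indirection the discussion above is about.
func New32() hash.Hash32 {
	var s sum32 = offset32
	return &s
}

func (s *sum32) Write(data []byte) (int, error) {
	hash := *s
	for _, c := range data {
		hash *= prime32
		hash ^= sum32(c)
	}
	*s = hash
	return len(data), nil
}

func (s *sum32) Sum32() uint32  { return uint32(*s) }
func (s *sum32) Reset()         { *s = offset32 }
func (s *sum32) Size() int      { return 4 }
func (s *sum32) BlockSize() int { return 1 }

// Sum appends the digest big-endian, as hash.Hash requires.
func (s *sum32) Sum(in []byte) []byte {
	v := uint32(*s)
	return append(in, byte(v>>24), byte(v>>16), byte(v>>8), byte(v))
}

func main() {
	h := New32()
	h.Write([]byte("hello"))
	fmt.Println(h.Sum32())
}
```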

I believe your function is faster because it does not go through the Hash interface, so the runtime has less work to do when you use it.

EDIT: this is actually already noted in the issue you linked. The point is that I don't think we can change the hash package now, moving to pure functions and whatnot. That would break the Go 1 compatibility promise.

@orcaman
Author

orcaman commented Mar 30, 2017

Thanks for your response.

Indeed, it seems our implementation is faster because we are not using the Hash interface, so the runtime has less work to do.

The thing is: our benchmarks show a very big difference, as you can see in the issue. Our version was in the range of 90.8-95.6 ns/op vs 165-170 ns/op if we reuse the hasher (or 214 ns/op vs 249 ns/op if we don't!).

I think that at this level of difference it's worth giving people the option to use the faster version in native Go.

@gopherbot gopherbot added this to the Proposal milestone Mar 30, 2017
@ALTree
Member

ALTree commented Mar 30, 2017

I understand the motivation but I don't see a reasonable way to do this.

As I wrote, we can't change the current API. So we would have to add a second package with the same functionality, but implemented using plain functions, or just dump a bunch of new functions into the current package, duplicating everything that's there now.

Both solutions are IMHO undesirable.

@orcaman
Author

orcaman commented Mar 30, 2017

Well, I'm as religious about beautiful and idiomatic code as the next gopher, but if it were my application, I would rather suffer something like an extra package for such a significant performance improvement. But I see your point, and most applications are probably not performance-critical like the ones that use concurrent-map.

@bradfitz
Contributor

crc32 has separate functions for doing the operation directly and not via some interface: https://golang.org/pkg/hash/crc32/#Checksum

Same with sha1 and md5: https://golang.org/pkg/crypto/sha1/#Sum ... https://golang.org/pkg/crypto/md5/#Sum

I don't think it'd be weird to have a dedicated function in the fnv package for this.

@ALTree
Member

ALTree commented Mar 30, 2017

The sha1 and md5 packages are very small, though. They have New for the hasher and then Sum, and that's it. fnv would need either 4 new exported Sum functions (for 32, 32a, 64, and 64a) or a more complicated single function that takes another parameter (after the data) and switches on implementations. It'll certainly be more confusing to document than md5.

Still, that's a good point: since other packages do this, there's a precedent.
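To make the discussion concrete, here is a sketch of what two of those four functions might look like. The names Sum32 and Sum32a are hypothetical (hash/fnv exports no such functions); the constants are from the published FNV spec, and the variants differ only in operation order.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const (
	offset32 uint32 = 2166136261 // FNV 32-bit offset basis
	prime32  uint32 = 16777619   // FNV 32-bit prime
)

// Sum32 is a hypothetical one-shot FNV-1 function of the kind discussed
// above: multiply first, then XOR.
func Sum32(data []byte) uint32 {
	hash := offset32
	for _, c := range data {
		hash *= prime32
		hash ^= uint32(c)
	}
	return hash
}

// Sum32a is the FNV-1a variant: XOR first, then multiply.
func Sum32a(data []byte) uint32 {
	hash := offset32
	for _, c := range data {
		hash ^= uint32(c)
		hash *= prime32
	}
	return hash
}

func main() {
	data := []byte("example")

	h := fnv.New32()
	h.Write(data)
	fmt.Println(Sum32(data) == h.Sum32()) // prints true: matches fnv.New32

	ha := fnv.New32a()
	ha.Write(data)
	fmt.Println(Sum32a(data) == ha.Sum32()) // prints true: matches fnv.New32a
}
```

Sum64 and Sum64a would be the same shape with the 64-bit constants, which is why four small functions stay easy to document.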

@bradfitz
Contributor

I'm not worried about 4 sum functions. It's a small and focused package. If somebody comes looking for fnv, they won't be overwhelmed by 4 fnv functions.

@orcaman
Author

orcaman commented Mar 30, 2017

Thank you both for the feedback. Would it be reasonable at this stage for me to implement the new fnv32 function as a separate function, in the spirit of the other hashes mentioned here, and submit the contribution for code review as per the guidelines?

@bronze1man
Contributor

bronze1man commented Mar 31, 2017

Is it possible for the compiler to optimize out the Hash interface when the user only uses one implementation?

@bradfitz
Contributor

Actually, https://go-review.googlesource.com/c/37751/ landed this cycle

@orcaman, could you profile Go 1.8 vs Go tip for the standard fnv package?

@ALTree
Member

ALTree commented Mar 31, 2017

@orcaman Also, it would be great if you could benchmark each function multiple times using -count, as in go test -bench=. -count 10 > old.txt, and then generate a report with benchstat (https://godoc.org/golang.org/x/perf/cmd/benchstat).

@orcaman
Author

orcaman commented Apr 1, 2017

@bradfitz @ALTree hey guys, I just ran a benchmark of Go 1.8 vs Go tip (BTW, for Go tip I just built Go from the latest master; is this OK?).

Here are the results:

for go 1.8:

$ go test -bench=. -count 10 > go1_8.txt
$ cat go1_8.txt 
BenchmarkFNV32Custom-8        	20000000	        72.1 ns/op
BenchmarkFNV32Custom-8        	20000000	        71.8 ns/op
BenchmarkFNV32Custom-8        	20000000	        75.2 ns/op
BenchmarkFNV32Custom-8        	20000000	        73.2 ns/op
BenchmarkFNV32Custom-8        	20000000	        69.6 ns/op
BenchmarkFNV32Custom-8        	20000000	        75.6 ns/op
BenchmarkFNV32Custom-8        	20000000	        73.2 ns/op
BenchmarkFNV32Custom-8        	20000000	        70.2 ns/op
BenchmarkFNV32Custom-8        	20000000	        72.2 ns/op
BenchmarkFNV32Custom-8        	20000000	        75.2 ns/op
BenchmarkFNV32NativeAlloc-8   	10000000	       179 ns/op
BenchmarkFNV32NativeAlloc-8   	10000000	       145 ns/op
BenchmarkFNV32NativeAlloc-8   	10000000	       146 ns/op
BenchmarkFNV32NativeAlloc-8   	10000000	       145 ns/op
BenchmarkFNV32NativeAlloc-8   	10000000	       146 ns/op
BenchmarkFNV32NativeAlloc-8   	10000000	       145 ns/op
BenchmarkFNV32NativeAlloc-8   	10000000	       147 ns/op
BenchmarkFNV32NativeAlloc-8   	10000000	       153 ns/op
BenchmarkFNV32NativeAlloc-8   	10000000	       150 ns/op
BenchmarkFNV32NativeAlloc-8   	10000000	       149 ns/op
BenchmarkFNV32NativeReuse-8   	10000000	       137 ns/op
BenchmarkFNV32NativeReuse-8   	10000000	       136 ns/op
BenchmarkFNV32NativeReuse-8   	10000000	       135 ns/op
BenchmarkFNV32NativeReuse-8   	10000000	       133 ns/op
BenchmarkFNV32NativeReuse-8   	10000000	       135 ns/op
BenchmarkFNV32NativeReuse-8   	10000000	       137 ns/op
BenchmarkFNV32NativeReuse-8   	10000000	       138 ns/op
BenchmarkFNV32NativeReuse-8   	10000000	       132 ns/op
BenchmarkFNV32NativeReuse-8   	10000000	       133 ns/op
BenchmarkFNV32NativeReuse-8   	10000000	       135 ns/op
PASS
ok  	github.com/orcaman/fnv-bench	73.664s

for go built from the latest master branch:

$ /usr/local/go_tip/go/bin/go test -bench=. -count 10 > go_tip.txt
$ cat go_tip.txt
goos: darwin
goarch: amd64
pkg: github.com/orcaman/fnv-bench
BenchmarkFNV32Custom-8        	20000000	        72.0 ns/op
BenchmarkFNV32Custom-8        	20000000	        70.2 ns/op
BenchmarkFNV32Custom-8        	20000000	        73.8 ns/op
BenchmarkFNV32Custom-8        	20000000	        72.1 ns/op
BenchmarkFNV32Custom-8        	20000000	        76.0 ns/op
BenchmarkFNV32Custom-8        	20000000	        77.7 ns/op
BenchmarkFNV32Custom-8        	20000000	        76.5 ns/op
BenchmarkFNV32Custom-8        	20000000	        74.3 ns/op
BenchmarkFNV32Custom-8        	20000000	        76.6 ns/op
BenchmarkFNV32Custom-8        	20000000	        75.3 ns/op
BenchmarkFNV32NativeAlloc-8   	10000000	       185 ns/op
BenchmarkFNV32NativeAlloc-8   	10000000	       154 ns/op
BenchmarkFNV32NativeAlloc-8   	10000000	       151 ns/op
BenchmarkFNV32NativeAlloc-8   	10000000	       154 ns/op
BenchmarkFNV32NativeAlloc-8   	10000000	       153 ns/op
BenchmarkFNV32NativeAlloc-8   	10000000	       150 ns/op
BenchmarkFNV32NativeAlloc-8   	10000000	       152 ns/op
BenchmarkFNV32NativeAlloc-8   	10000000	       149 ns/op
BenchmarkFNV32NativeAlloc-8   	10000000	       152 ns/op
BenchmarkFNV32NativeAlloc-8   	10000000	       152 ns/op
BenchmarkFNV32NativeReuse-8   	10000000	       140 ns/op
BenchmarkFNV32NativeReuse-8   	10000000	       138 ns/op
BenchmarkFNV32NativeReuse-8   	10000000	       137 ns/op
BenchmarkFNV32NativeReuse-8   	10000000	       136 ns/op
BenchmarkFNV32NativeReuse-8   	10000000	       139 ns/op
BenchmarkFNV32NativeReuse-8   	10000000	       139 ns/op
BenchmarkFNV32NativeReuse-8   	10000000	       141 ns/op
BenchmarkFNV32NativeReuse-8   	10000000	       139 ns/op
BenchmarkFNV32NativeReuse-8   	10000000	       138 ns/op
BenchmarkFNV32NativeReuse-8   	10000000	       138 ns/op
PASS
ok  	github.com/orcaman/fnv-bench	80.214s

For reference, here is my benchmarking code:

package fnvbench

import (
	"hash/fnv"
	"math/rand"
	"testing"
)

const (
	maxElems  = 10000000
	maxStrLen = 128
	letters   = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
)

// generate random string of given length
func randString(n int) string {
	b := make([]byte, n)
	for i := range b {
		b[i] = letters[rand.Int63()%int64(len(letters))]
	}
	return string(b)
}

// generate slice with maxElems strings of variable length: 0 - maxStrLen
func prepareData() *[]string {
	var strLen int
	stringArray := make([]string, maxElems)
	for i := 0; i < maxElems; i++ {
		strLen = rand.Intn(maxStrLen)
		stringArray[i] = randString(strLen)
	}
	return &stringArray
}

// custom FNV32 from concurrent-map package
func fnv32(key string) uint32 {
	hash := uint32(2166136261)
	const prime32 = uint32(16777619)
	for i := 0; i < len(key); i++ {
		hash *= prime32
		hash ^= uint32(key[i])
	}
	return hash
}

var dataSet = prepareData()

func BenchmarkFNV32Custom(b *testing.B) {
	for i := 0; i < b.N; i++ {
		fnv32((*dataSet)[i%maxElems])
	}
}

func BenchmarkFNV32NativeAlloc(b *testing.B) {
	for i := 0; i < b.N; i++ {
		hasher := fnv.New32()
		hasher.Write([]byte((*dataSet)[i%maxElems]))
		hasher.Sum32()
	}
}

func BenchmarkFNV32NativeReuse(b *testing.B) {
	hasher := fnv.New32()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		hasher.Reset()
		hasher.Write([]byte((*dataSet)[i%maxElems]))
		hasher.Sum32()
	}
}

@josharian
Contributor

And here are those same results presented using benchstat. Please use it in the future.

name                old time/op  new time/op  delta
FNV32Custom-8       72.8ns ± 4%  74.5ns ± 6%    ~     (p=0.118 n=10+10)
FNV32NativeAlloc-8   147ns ± 4%   152ns ± 2%  +3.09%  (p=0.003 n=9+9)
FNV32NativeReuse-8   135ns ± 2%   138ns ± 1%  +2.52%  (p=0.001 n=10+8)

@bradfitz
Contributor

bradfitz commented Apr 1, 2017

Are you sure you're not comparing apples and oranges? Those string to byte slice conversions look like allocations in your inner loop to me.

@orcaman
Author

orcaman commented Apr 1, 2017

@bradfitz seems you are right!

I changed the benchmarking code to prepare the data as slices of bytes instead of strings (see here) and the results are now quite comparable.

This is after running the new benchmark 10 times again:

$ benchstat go1_8.txt go_tip.txt 
name                old time/op  new time/op  delta
FNV32Custom-8       89.5ns ±10%  83.0ns ± 5%  -7.25%  (p=0.010 n=10+10)
FNV32NativeAlloc-8   105ns ± 8%   110ns ± 4%    ~     (p=0.108 n=10+10)
FNV32NativeReuse-8  91.0ns ±10%  89.0ns ± 6%    ~     (p=0.382 n=10+10)
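The revised benchmark itself sits behind the "see here" link and is not quoted in the thread. The sketch below shows the likely shape of the fix, under the assumption that the data set is pre-converted to [][]byte so the string-to-[]byte allocation leaves the timed loop; the dataset size is reduced to keep the sketch quick, and the function and benchmark names are illustrative. It uses testing.Benchmark so it can run outside a _test.go file.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
	"testing"
)

const (
	maxElems  = 1 << 16 // reduced from 10000000 to keep the sketch fast
	maxStrLen = 128
)

// prepareBytes builds the data set as [][]byte up front, so the timed
// loops hash pre-converted byte slices instead of converting a string
// (and allocating) on every iteration.
func prepareBytes() [][]byte {
	data := make([][]byte, maxElems)
	for i := range data {
		b := make([]byte, rand.Intn(maxStrLen))
		for j := range b {
			b[j] = byte('a' + rand.Intn(26))
		}
		data[i] = b
	}
	return data
}

var byteSet = prepareBytes()

// fnv32Bytes is the custom FNV-1 loop, now taking []byte directly.
func fnv32Bytes(key []byte) uint32 {
	hash := uint32(2166136261)
	const prime32 = 16777619
	for _, c := range key {
		hash *= prime32
		hash ^= uint32(c)
	}
	return hash
}

func BenchmarkFNV32CustomBytes(b *testing.B) {
	for i := 0; i < b.N; i++ {
		fnv32Bytes(byteSet[i%maxElems])
	}
}

func BenchmarkFNV32NativeReuseBytes(b *testing.B) {
	hasher := fnv.New32()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		hasher.Reset()
		hasher.Write(byteSet[i%maxElems])
		hasher.Sum32()
	}
}

func main() {
	fmt.Println("custom:", testing.Benchmark(BenchmarkFNV32CustomBytes))
	fmt.Println("native:", testing.Benchmark(BenchmarkFNV32NativeReuseBytes))
}
```

With the conversion out of the loop, both sides measure only the hashing itself, which matches the comparable numbers in the benchstat table above.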

@bradfitz
Contributor

bradfitz commented Apr 1, 2017

Looks like there's nothing to do here, then.

You should run more iterations of the benchmark (-count=...) so the p value above is lower and says something. Right now they look comparable enough, though.

I'll close this but holler if you see room for improvement.

@bradfitz bradfitz closed this as completed Apr 1, 2017
@golang golang locked and limited conversation to collaborators Apr 1, 2018