Closed
Update, Mar 16 2022 - Current proposal is #42710 (comment).
The overhead of constructing a Hash object in order to call Hash.Write greatly diminishes its utility for small buffers.
Consider the following micro-benchmark:
var source = []byte("hello, world") // 12 bytes long
var sink uint64

func BenchmarkMapHash(b *testing.B) {
	var seed = maphash.MakeSeed()
	for i := 0; i < b.N; i++ {
		var h maphash.Hash
		h.SetSeed(seed)
		h.Write(source)
		sink = h.Sum64()
	}
}

//go:linkname runtime_memhash runtime.memhash
//go:noescape
func runtime_memhash(p unsafe.Pointer, seed, s uintptr) uintptr

func BenchmarkUnsafeHash(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = uint64(runtime_memhash(*(*unsafe.Pointer)(unsafe.Pointer(&source)), 0, uintptr(len(source))))
	}
}
}

On my machine this produces:
BenchmarkMapHash 56459700 21.6 ns/op
BenchmarkUnsafeHash 246443527 4.90 ns/op
Calling runtime_memhash directly is roughly 4x faster, since it avoids the state carried by Hash, which is of little use when hashing a single small string.
I propose adding the following API:
func Sum(b []byte, seed Seed) uint64

where the function is a thin wrapper over runtime_memhash. It takes Seed as an input to force the user to think about the application of seeds.
I chose the name Sum to be consistent with md5.Sum or sha1.Sum.