
hash/maphash: add Bytes and String #42710

@dsnet


Update, Mar 16 2022: the current proposal is described in #42710 (comment).


For small buffers, the overhead of constructing a Hash object just to call Hash.Write greatly diminishes its usefulness.

Consider the following micro-benchmark:

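package bench // in a _test.go file; the package name is arbitrary

import (
	"hash/maphash"
	"testing"
	"unsafe" // also required for //go:linkname
)

var source = []byte("hello, world") // 12 bytes long
var sink uint64

func BenchmarkMapHash(b *testing.B) {
	var seed = maphash.MakeSeed()
	for i := 0; i < b.N; i++ {
		// A fresh Hash is set up for every 12-byte input.
		var h maphash.Hash
		h.SetSeed(seed)
		h.Write(source)
		sink = h.Sum64()
	}
}

// The body-less declaration below only compiles if the package also
// contains a (possibly empty) .s file.
//go:linkname runtime_memhash runtime.memhash
//go:noescape
func runtime_memhash(p unsafe.Pointer, seed, s uintptr) uintptr

func BenchmarkUnsafeHash(b *testing.B) {
	for i := 0; i < b.N; i++ {
		// Hash the slice's backing array directly, with a zero seed.
		sink = uint64(runtime_memhash(*(*unsafe.Pointer)(unsafe.Pointer(&source)), 0, uintptr(len(source))))
	}
}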

On my machine this produces:

BenchmarkMapHash      	56459700	        21.6 ns/op
BenchmarkUnsafeHash   	246443527	        4.90 ns/op

Calling runtime_memhash directly is about 4.4x faster (4.90 ns/op vs 21.6 ns/op), since it avoids all the state in Hash that is pointless when hashing a single small buffer.

I propose adding the following API:

func Sum(b []byte, seed Seed) uint64

Sum would be a thin wrapper over runtime_memhash. It takes a Seed argument to force the user to think about how seeds are applied.

I chose the name Sum to be consistent with md5.Sum and sha1.Sum.
