Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rollinghash-related improvements #1

Open
chmduquesne opened this issue Mar 13, 2019 · 0 comments
Open

rollinghash-related improvements #1

chmduquesne opened this issue Mar 13, 2019 · 0 comments

Comments

@chmduquesne
Copy link

chmduquesne commented Mar 13, 2019

Hi @kshedden ,

I was curious about how you were using rollinghash, so I started reading your code.

For increased speed, I would highly recommend that you cast your rolling hashes to the actual underlying hash used. So at cmd/muscato_screen/main.go:146, I would replace

	hashes := *hashPool.Get().(*[]rollinghash.Hash32)

with

	hashes := *hashPool.Get().(*[]buzhash32.Buzhash32)

Indeed, when you cast to an interface, the compiler does not know which actual implementation you are referring to (because it could be changed at run time). Specifying the actual implementation allows the compiler to inline the calls to Roll(), which should lead to noticeable speed improvements. I mention this in https://github.com/chmduquesne/rollinghash#gotchas, but you are a long time user so you wrote this code before I realized this and documented it.

Another thing you could do is to attempt using buzhash32.GenerateHashes(seed). It is similar to your genTables(), but it is directly provided by buzhash32. So in cmd/muscato_screen/main.go:106, you could replace

	New: func() interface{} {
		hashes := make([]rollinghash.Hash32, config.NumHash)
		for j := range hashes {
			hashes[j] = buzhash32.NewFromUint32Array(tables[j])
		}
		return &hashes
	},

with

	New: func() interface{} {
		hashes := make([]buzhash32.Buzhash32, config.NumHash)
		for j := range hashes {
			hashes[j] = buzhash32.NewFromUint32Array(buzhash32.GenerateHashes(rand.Int63()))
		}
		return &hashes
	},

One direct benefit is that it would be less code to maintain for you: you could get rid of your tables array and of your genTables function.

A second, indirect benefit, is that buzhash32.GenerateHashes is designed to predictably generate from a seed, so you could take advantage of this in order to always get the same collection of rollinghashes. You would do that by replacing the call to rand.Int63() with a call to myrand.Int63(), where myrand is a rand.Rand object initiated with a predictable source. I believe this could serve you when writing tests.

All the best,
@chmduquesne

Keep in mind that I did not attempt to compile any of this, so there might be mistakes :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant