Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: allow map hashes with different tradeoffs #36915

Open
eric-s-raymond opened this issue Jan 31, 2020 · 10 comments
Open

runtime: allow map hashes with different tradeoffs #36915

eric-s-raymond opened this issue Jan 31, 2020 · 10 comments

Comments

@eric-s-raymond
Copy link

@eric-s-raymond eric-s-raymond commented Jan 31, 2020

What version of Go are you using (go version)?

$ go version
go version go1.13.4 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
go version go1.13.4 linux/amd64
esr@snark:~/WWW/reposurgeon$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/esr/.cache/go-build"
GOENV="/home/esr/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/esr/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go-1.13"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go-1.13/pkg/tool/linux_amd64"
GCCGO="/usr/bin/gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/esr/WWW/reposurgeon/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build890210982=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Wrote a complicated data transformation using maps heavily. Observed that after I did enough optimization, map hashing costs actually competed with GC as a performance bottleneck.

This is where the standard bug template stops being useful. The problem I see is that the choice of aeshash512 is more expensive than is right for my application, and I can't fix that.

I'm not arguing with the choice to optimize for cryptographic hardness over performance by default; given Golang's intended role as a language for network servers this was sensible. But my application - reposurgeon - doesn't need that hardening, and is paying a significant performance cost for it.

Therefore, this RFE: Offer a runtime switch that allows plugging in a weaker but faster hash.

@randall77

This comment has been minimized.

Copy link
Contributor

@randall77 randall77 commented Jan 31, 2020

aeshash512

Where did 512 come from? That doesn't exist.

What do your keys look like (type, size, etc.)? Do you have a suggestion for an alternative, and how fast do you think it would be?

@eric-s-raymond

This comment has been minimized.

Copy link
Author

@eric-s-raymond eric-s-raymond commented Jan 31, 2020

randall77: I thought I saw that suffix in the profiles.

My keys are typically strings or 32-bit ints (in the latter case these numbers are revision IDs in a repository event sequence).

I'm not an expert on hash functions, so I hesitate to suggest a concrete alternative. What I want is lookup speed.

@mvdan

This comment has been minimized.

Copy link
Member

@mvdan mvdan commented Jan 31, 2020

Have you explored using a custom map implementation? For example, if you want to optimize for performance, and you know that the keys are 32-bit integers, I'm sure it's possible to beat map[int32]Value with a custom type. For example, see https://github.com/brentp/intintmap - though I'm not an expert either, and I haven't tried that particular package.

You'd have to replace the nice syntax with methods, but I think that's an OK tradeoff for the niche use cases where map isn't enough.

@odeke-em odeke-em changed the title RFE: Allow map hashes with different tradeoffs runtime: allow map hashes with different tradeoffs Jan 31, 2020
@seebs

This comment has been minimized.

Copy link
Contributor

@seebs seebs commented Jan 31, 2020

https://godoc.org/modernc.org/hash <-- this also exists

having seen the nightmares the Java people get from allowing users to specify hashes, I am moderately inclined to think that Go's choice of "don't let user specify a hash" is a good one for the innate map implementation.

@networkimprov

This comment has been minimized.

Copy link

@networkimprov networkimprov commented Jan 31, 2020

Re "nightmares the Java people get," could you elaborate?

@seebs

This comment has been minimized.

Copy link
Contributor

@seebs seebs commented Jan 31, 2020

A while back Java started switching buckets to tree structures under load because it's fairly common for user-provided hash choices to be nigh-pessimal, producing a handful of buckets which end up with most of the items in them. I was told once that this may be in part because some IDE's prefilled template for a hash implementation provided a spectacularly poorly-considered suggested default hash, but that could be apocryphal. But generally, it turns out that if you allow people to pick hash functions, they do, and most of us are genuinely not qualified to pick hash functions, but also most of us don't realize this. Allowing a bit more specialization without actually letting users pick the function might work, but I'd be wary.

I also note that, at least in my experience, if I am seeing the map lookup get expensive, it's possible that I need to coalesce operations -- extract a value, do operations on it, put it back in, rather than writing a long sequence of operations using the map key. That's bitten me more than once.

@randall77

This comment has been minimized.

Copy link
Contributor

@randall77 randall77 commented Jan 31, 2020

@networkimprov There is also the danger of writing a hash function and an equals function that aren't compatible (two equal values hash to different things). When that happens, it leads to very confusing behavior.

@eric-s-raymond

This comment has been minimized.

Copy link
Author

@eric-s-raymond eric-s-raymond commented Jan 31, 2020

@beoran

This comment has been minimized.

Copy link

@beoran beoran commented Jan 31, 2020

Maybe this could be done by adding a third argument to make() for maps, where the third argument is a string with the description of the hashing function/strategy. The compiler errors if the third argument is unknown to it.

Then again, if we get Go generics, then a generic hash map library should solve this issue just a easily and perhaps with even better performance.

@CAFxX

This comment has been minimized.

Copy link
Contributor

@CAFxX CAFxX commented Feb 1, 2020

Maybe this could be done by adding a third argument to make() for maps, where the third argument is a string with the description of the hashing function/strategy. The compiler errors if the third argument is unknown to it.

I would rather avoid extra parameters, and have maps start with a faster/weaker hash: if then the map detects extremely skewed bucket overflows (that IIRC we already use to trigger map growth) it could switch to a slower hash with better confusion properties (like the current aeshash) during map growth.

@cagedmantis cagedmantis added this to the Backlog milestone Feb 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
8 participants
You can’t perform that action at this time.