-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
proposal: container/unordered: a generic hash table with custom hash function and equivalence relation #69559
Comments
Related Issues and Documentation
(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.) |
Your proposal has several semantic differences from the builtin map type. Are these intentional or oversights? Ones I noticed:
On the implementation detail side, if we keep semantics identical to builtin maps, I suspect we could share an implementation (except for iteration). If semantics differ slightly, I think sharing would be difficult, as the builtin maps are very performance sensitive, so adding special cases may hurt overall performance. I worry a bit about maintenance burden about having many map implementations (builtin, sync.Map, this one, #60630), especially if we want them all to be high performance. |
All good questions. My personal feelings are:
More important to me than any of these items is that the designs for ordered map, unordered map and set are considered together. |
I would much prefer the name Get instead of At. Seems to parallel Set, and matches existing similar functions I've seen. I recommend Get returning the zero value if the key is not present (simpler to use most of the time) but then having a Lookup method which returns an ok bool as well (a pain if it's not available when you do need it). This is like reflect.StructTag.Get and .Lookup, and os.Getenv vs LookupEnv. Stable String order is very useful IMO. I think "stable" is better than "sorted", eg "10" before "2" is okay as long as it's well defined. The main thing you want is stability for comparisons (normally in tests, but not always). Size hint - I think a sensible default is normally fine. We can always add NewMapSize later on if really needed. I think needing the previous value in Set and Delete would be extremely rare. I'd slightly prefer not having it for a simpler API. |
Out of interest, would this work for starlark-go's maps, or is that too specific a use case? |
It's quite common to want the previous value back from set, either for its value, or to distinguish insert from update at no cost. This can be done by calling Len() before and after Set, but I think all our collections types should provide at least a boolean result from Set. Delete too for similar reasons, though deletions are rare in practice. I agree with your other thoughts.
Starlark maps are insertion-ordered, and we definitely aren't promising that here. (I think the other semantics about mutation, freeze, and iteration could be implemented as a wrapper.) |
Meta: So there are proposals for
From that, it seems like there are missing entries
|
Not sure that the ordered map/set are a good idea in the stdlib. Just so that there is some amount of visibility. (just a note in passing) |
as far as the shared interface I think they should all have: // v = m[k]
At(K) V
// v, ok = m[k]
Get(K) (V, bool)
// _, ok = m[k]
Contains(K) bool (Maybe |
To get back to this specific proposal: The All and Keys methods iterate over Type instead of K. There is no Values iterator. Should the hash function return |
Thanks, both fixed.
Probably, yes. Done. |
We have implemented something similar at our organization: gomap.Map. It has the goal of providing the exact same semantics of Go's builtin map, but with user-provided hash and equals functions. One difference in API from the current proposal is that the hash function takes in a |
I believe that by default we should use We can provide a func NewMap[K, V any](hash func(maphash.Seed, K) uint64, eq func(K, K) bool) *Map[K, V] |
A stable string representation is useful for tests. There isn't actually a very good reason to call .String() on a Map besides for a test or debugging anyway, so the cost of sorting it is not a big deal. You're not going to display an unformatted Map to an end user and expect them to understand it. On the method names bikeshed, I like |
Providing a |
@aaronbee That is true, i would also prefer Also that would make even more sense, with the newly introduced global functions: #54670. NewMap[int, int](maphash.Comparable[int], /*...*/) |
To work with A more general solution would be to take a seed type as a generic param but then you need that and a third func to handle resetting the seed on That is getting unwieldy and overfits to To avoid this, it could take an interface type HashProvider[T any] interface {
Hash(T) uint64
Equal(T, T) bool
Reset()
} |
If this becomes a problem we can always add a |
The interface version doesn't limit the state to a uint64 or tie usage to maphash when it isn't needed while still allowing one to use maphash quite easily. |
I agree with that, but that will complicate the API easily. I guess most users will never use anything else than maphash. |
it would only matter when creating a new map and you could still have type NewMap[K, V any](hp HashProvider[K]) *Map[K, V] {/*...*/}
type NewMapFuncs[K, V any](hash func(K) uint64, eq func(K, K) bool) *Map[K, V] {
return NewMap[K, V](&basicHasher[K]{hash, eq})
} |
But then the type NewMapFuncs[K, V any](hash func(K) uint64, eq func(K, K) bool) *Map[K, V] {
return NewMap[K, V](&basicHasher[K]{hash, eq})
} |
Here's an example of using type ByteHasher struct {
hash *maphash.Hash
}
func (b *ByteHasher) Hash(v []byte) uint64 {
defer b.hash.Reset()
b.hash.Write(v)
return b.hash.Sum64()
}
func (b *ByteHasher) Equal(v0, v1 []byte) bool {
return bytes.Equal(v0, v1)
}
// called on first use and Clear
func (b *ByteHasher) Reset() {
if b.hash == nil {
b.hash = &maphash.Hash{}
}
b.hash.SetSeed(maphash.MakeSeed())
}
var m = unordered.NewMap[[]byte, someV](&ByteHasher{}) and without using var m = unordered.NewMapFuncs[type.Type, someV](types.Hash, types.Identical) |
In my opinion, the fact that
are really just saying "seed should be a I agree with @jimmyfrasche that making the seed type transparent is an inherent advantage of the interface approach. I would much prefer it over "passing multiple functions". Also, because that actually makes it clearer that the different operations need to co-operate. This is something that is special about this proposal: As far as I can tell, this is the first time we would have a standard library package that needs more than one user-defined "operator" that are connected in how they behave. That being said, I am a bit concerned because making the seed transparent means that to support it, you intrinsically have to allocate. No matter whether we use functions or an interface. One way to solve that would be to make it a type-parameter: type HashProvider[T, Seed any] struct {
Hash(T, Seed) uint64
Equal(T, T) bool
Reset(Seed)
} That way, you could have a I have also argued in various places, that the "take functions to customize operators" approach has an intrinsic downside for containers specifically: It means the container can no longer have a useful zero value. Arguably, we are used to that with In my opinion, for containers specifically, using methods on the element type is better. It does require additional boilerplate in some cases, yes. But it enables useful zero values and it makes it easier for the compiler to generate efficient code. Function-value fields can theoretically change, but a method can't. In my opinion, if every container-operation has to incur an indirect call because the compiler can't properly devirtualize it, it almost defeats the purpose of user-defined containers in the first place. Which is that I care deeply about specific performance characteristics. In any case, I think my largest point with this comment is: The design space for library container types is huge and IMO still mostly unexplored - at least when it comes to exploration by the Go team itself (who I trust most when it comes to Go API design). I'm not aware of any generic containers in the standard library or I'd point to |
This proposal has been added to the active column of the proposals project |
Background: In #69420 I proposed to promote the golang.org/x/tools/go/types/typeutil.Map data type to the go/types package, modernizing it with generics. @jimmyfrasche pointed out that really the only part of that proposal that needs to be in go/types is the
types.Hash
function, which should have semantics consistent withtypes.Identical
: that is, identical types must have equal hashes. So, I reduced that proposal to just the hash function, and this proposal governs the table.Proposal: the standard library should provide a generic hash table that allows the user to specify the hash function and equivalence relation. Here is a starting point for the API:
Open questions:
Related:
The text was updated successfully, but these errors were encountered: