-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
proposal: container: add tree-based ordered.Map/Set #60630
Comments
|
There's already a set proposal that is snagged on iteration. This seems related to that. This could be containers/ordered.Set and containers/ordered.Map and have overlapping methods with containers/set. |
@ianlancetaylor I'm agree that
For those internal maybe relay on The rest are At least for CPP, operations on containers can be split into several parts, If I understand correctly, what you care about above is the |
I am curious why an ordered map cannot be introduced or considered. As stated, there are many different implementations of it, which are increasingly being adopted. Still, they are painful to implement when using third-party libraries (especially when those involve To provide a more concrete example, I am using libopenapi, which uses a custom orderedmap, rendering specifications requires the ordering to be in place, especially when visualizing the results. With the current lack of an ordered map solution, a developer must introduce another solution compatible with the native supporting packages (e.g., templates and range functionality). |
Coming back, with the use of |
The Keys and Values methods (and a new All method) should definitely use go1.23 iterators rather than materializing a slice. func (*Map[K, V]) Keys() iter.Seq[K]
func (*Map[K, V]) Values() iter.Seq[V]
func (*Map[K, V]) All() iter.Seq2[K, V] Reverse too could return an iterator rather than a clone: func (*Map[K, V]) Reverse() iter.Seq2[K, V] Similarly, SubMap (renamed Range) could be an iterator for a portion of the key space. If you need to construct a new map from a Range (or All), combine the iterator with a InsertAll method that consumes the iterator. func (*Map[K, V]) Range(from, to K) iter.Seq2[K, V]
func (*Map[K, V]) InsertAll(iter.Seq2[K, V]) bool What is Entry? Just a pair? Or a reference to a part of the representation? (Java's Map.Entry took the latter choice and IIRC it was a major barrier to optimized special-purpose maps, because they still had to allocate one object per entry.) Is it necessary? I like that all the operations return the maximum amount of information in a single access (e.g. Insert reports whether there was a previous entry), as these can really improve efficiency. Retain should be DeleteFunc, with sense inverted. We should try to harmonize the APIs of related collections types such as sets, lists, maps, queues, and so on. Java 1.5's collections did a good job of establishing conventions across a range of data types so that, for example, they all have an Naming suggestion: |
I don't like
👍 to Also, |
I feel like |
@DeedleFake I would expect inserting into an |
Agreed, when this proposal was made, iterators had not yet been implemented. Now that they have been implemented, it makes sense to use iterator syntax. The existence of Entry is a personal preference. Since seq2 has never been seen in other languages, tuple is used instead, but since go uses seq2, there is no better reason not to use it here. Short names are indeed better than long names, changed in original proposal.
Although I still think seq2 is a less successful abstraction. ^_^ |
Since the native map supports direct insertion of a key-value pair at the syntactic level, but this does not work here. If you insert a sequence using insert, then I cannot think of any other suitable naming for inserting the key-value. How about use |
|
I've been looking into this topic a bit and I'd like to share some of that work. This comment is about the leading ordered map implementation, github.com/google/btree. I think it's relevant to talk about, because if it's great, or maybe even good enough, then it can satisfy the ecosystem's need for an ordered map and there's no compelling reason to add one to the standard library. Python seems to have taken a similar route: there is no ordered map in the Python standard library, but there is at least one high-quality third-party implementation. The strengths of google/btree (as I'll call it throughout) are:
Its weaknesses are:
Last but not least, google/btree uses copy-on-write. One way to think about that is that cloning, instead of copying all at once, happens incrementally. Each clone copies a few of the shared tree nodes during each modification, until both are fully copied and no nodes are shared. Of course, if those modifications never happen, no copying happens either. Copy-on-write provides a kind of iteration guarantee. Clone a btree, then iterate over the clone while confining all modifications to the original. The iteration will observe a snapshot of the original's items. Furthermore, the iteration is concurrency-safe (though not the individual keys and values), because the tree nodes are never written—they are always copied first. Is copy-on-write a strength or a weakness? I would say both. It makes some operations, like iteration and keeping a snapshot of a tree, quite inexpensive. But its performance can be surprising if you are not aware of what's happening inside the implementation. For example, even after the iteration over a clone is finished and the clone becomes garbage, the original will still be copied each time a "shared" node is modified; the implementation has no good way to tell that the nodes are no longer shared. google/btree is arguably good enough to serve as the ecosystem's ordered map of choice. On the other hand, its quirks and lack of ranged delete leave room to argue that there is a place for a more idiomatic implementation. |
In this second comment (the first is #60630 (comment)), I'd like to suggest a modern, idiomatic API for ordered maps. The github.com/jba/omap package is based on Russ Cox's rsc.io/omap but adds a few common operations and an improved range API. In comparison with github.com/google/btree, this implemention:
If we were to adopt an ordered map implementation, I think it would be a good starting point. |
It does support range-over-func, actually, just not explicitly: https://go.dev/play/p/pc574L-uHyf |
@jba Most agree, But why is there a MapFunc type? Can it write to func NewMap[K, V any](cmp func(K, K) int) *Map[K, V]
func NewOrderedMap[K Ordered, V any]() *Map[K, V] {
return NewMap[K, V](cmp.Compare[K])
} |
|
@jba, could If any method wants to take another map as an argument, the current two-struct approach is going to make it awkward to implement.
Maybe an argument can be make that |
@jba I wonder if you can confirm that github.com/jba/omap is strictly superior to github.com/google/btree? P.S. I am a Go user needing an OrderedMap for backing sparse matrices. |
On the contrary, I think it's important to compare these packages. I don't think there is a clear winner. In #60630 (comment) I talk in detail about the strengths and weaknesses of google/btree. The main weaknesses are the clumsy API, the lack of range delete, and the lack of iteration guarantees without doing a possibly expensive clone. Google/btree is probably faster because jba/omap is unoptimized. For example, jba/omap allocates all new nodes on the heap instead of using a sync.Pool. There are other implementation tricks too that I haven't bothered with because the main purpose of jba/omap is to serve as a reference implementation for this proposal. So I think it depends on your particular use case. From my point of view, I'd prefer you use jba/omap to help shake out bugs and serious performance issues, and perhaps to contribute improvements. I'd say if you're working on a personal or student project, it would be a more forward-looking choice. If you need something for serious production work, then certainly google/btree has a lot more mileage on it.
The Go team talks about this from time to time, but the problems are that it's a lot of work, and also we want to remain impartial. |
@jba Thanks for your helpful comparison!
Understood, and thanks to the entire Go team for this great work all these years! |
There are at least several thousand related implementations OrderedMap/OrderedSet, although we have generics, there are no related containers in the standard library, so people have to implement their own sorting containers.
So I propose add
ordered.Map/Set
intostd
, maybecontainer
package is a suitable place.The main API are:
Appendix A
rb-tree
Appendix B
If we land sum type in go, those return a bool flag would be instead by
Option
orMaybe
, it's intuitive and more suitable.Sum type is more suitable than Product Type here.
Appendix C
Since
Iter
should be a separate issue, so this proposal would not includeIter
api. So moveIter
API to end of section.The text was updated successfully, but these errors were encountered: