proposal: container/set: new package to provide a generic set type (discussion) #47331
Jul 21, 2021 · 37 comments · 275 replies

This is a discussion that is intended to turn into a proposal.

This proposal is for use with #43651. We propose defining a new package, container/set, that will introduce a new set type. This package will not be in the Go 1.18 release (which we currently expect to be the first release that implements #43651), but is for consideration for a later release.

The description below focuses on the API, not the implementation. In general the implementation will be straightforward.

See also the slices proposal at #45955 (discussion at #47203) and the maps proposal discussion at #47330.

// Package set defines a Set type that holds a set of elements.
package set

// A Set is a set of elements of some comparable type.
// Sets are implemented using maps, and have similar performance characteristics.
// Like maps, Sets are reference types.
// That is, for Sets s1 = s2 will leave s1 and s2 pointing to the same set of elements:
// changes to s1 will be reflected in s2 and vice-versa.
// Unlike maps, the zero value of a Set is usable; there is no equivalent to make.
// As with maps, concurrent calls to functions and methods that read values are fine;
// concurrent calls to functions and methods that write values are racy.
type Set[Elem comparable] struct {
	// contains filtered or unexported fields
}

// Of returns a new set containing the listed elements.
func Of[Elem comparable](v ...Elem) Set[Elem]

// Add adds elements to a set.
func (s *Set[Elem]) Add(v ...Elem)

// AddSet adds the elements of set s2 to s.
func (s *Set[Elem]) AddSet(s2 Set[Elem])

// Remove removes elements from a set.
// Elements that are not present are ignored.
func (s *Set[Elem]) Remove(v ...Elem)

// RemoveSet removes the elements of set s2 from s.
// Elements present in s2 but not s are ignored.
func (s *Set[Elem]) RemoveSet(s2 Set[Elem])

// Contains reports whether v is in the set.
func (s *Set[Elem]) Contains(v Elem) bool

// ContainsAny reports whether any of the elements in s2 are in s.
func (s *Set[Elem]) ContainsAny(s2 Set[Elem]) bool

// ContainsAll reports whether all of the elements in s2 are in s.
func (s *Set[Elem]) ContainsAll(s2 Set[Elem]) bool

// Values returns the elements in the set s as a slice.
// The values will be in an indeterminate order.
func (s *Set[Elem]) Values() []Elem

// Equal reports whether s and s2 contain the same elements.
func (s *Set[Elem]) Equal(s2 Set[Elem]) bool

// Clear removes all elements from s, leaving it empty.
func (s *Set[Elem]) Clear()

// Clone returns a copy of s.
// The elements are copied using assignment,
// so this is a shallow clone.
func (s *Set[Elem]) Clone() Set[Elem]

// Filter deletes any elements from s for which keep returns false.
func (s *Set[Elem]) Filter(keep func(Elem) bool)

// Len returns the number of elements in s.
func (s *Set[Elem]) Len() int

// Do calls f on every element in the set s,
// stopping if f returns false.
// f should not change s.
// f will be called on values in an indeterminate order.
func (s *Set[Elem]) Do(f func(Elem) bool)

// Union constructs a new set containing the union of s1 and s2.
func Union[Elem comparable](s1, s2 Set[Elem]) Set[Elem]

// Intersection constructs a new set containing the intersection of s1 and s2.
func Intersection[Elem comparable](s1, s2 Set[Elem]) Set[Elem]

// Difference constructs a new set containing the elements of s1 that
// are not present in s2.
func Difference[Elem comparable](s1, s2 Set[Elem]) Set[Elem]

Replies


Needs some sort of performance promise. I think it would be ok to promise that Add and Has are amortized O(log(n)).
Not sure we need to enumerate that everywhere, but maybe a note at the bottom of package docs would list promises for all the functions.

6 replies
@Merovius

Why O(log(n))? My assumption would be that we'd use a hash-set (especially given the comparable constraint) which should provide O(1)?

@randall77

There's no obvious way to hash comparables. Although we might get a backdoor to the runtime to do that.

Another natural implementation would be a red-black tree, which is O(log(n)). Although that requires orderables.

It's all a question of how much we want to commit to the underlying implementation. O(log(n)) gives us some wiggle room. O(1) basically means it must be a hash table.
The "amortized" word is also doing some heavy lifting, as far as giving us some flexibility in the implementation.

@Merovius

There's no obvious way to hash comparables. Although we might get a backdoor to the runtime to do that.

It seems there is an obvious way to build a hashset though - using map[T]struct{}. I agree that it's a bit weird to use comparable as a stand-in for "hashable", but that's kind of the world we're in with Go, right? To ask another way: Is there a reason not to use a hashset, given that we already have a really good hashmap implementation? ISTM it gives the tightest complexity bounds with the widest requirements on the type (at least as far as Go is concerned, where everything that's comparable can be used as a map key).

FWIW, if I read in the docs that Add/Remove are O(log(n)) I would likely take that as an argument not to use it.

@Merovius

While we're at it: Personally, I would like us to add func Value[T comparable](h *Hash, v comparable)¹ to hash/maphash, to enable users to write generic hashmaps with the same power and good hash function as map - which could also be used to write a hashset, of course.

[1] Naming is hard. Ideally, this would be a method on *Hash. Or maybe a type Hasher[T comparable]? In any case - I'd like us to expose the runtime hash functions for comparable values in hash/maphash.

@ianlancetaylor

Yes, the intent of this package is to implement Set[K] using map[K]struct{}.
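As a rough sketch (not the actual implementation; the unexported field name m is invented here), the map-backed type could look something like:

type Set[Elem comparable] struct {
	m map[Elem]struct{} // nil until the first Add, so the zero value is usable
}

func (s *Set[Elem]) Add(v ...Elem) {
	if s.m == nil {
		s.m = make(map[Elem]struct{})
	}
	for _, e := range v {
		s.m[e] = struct{}{}
	}
}

func (s *Set[Elem]) Contains(v Elem) bool {
	_, ok := s.m[v] // a lookup in a nil map simply reports false
	return ok
}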

Regarding the naming, it seems inconsistent to me to have standalone functions named Union, Intersection, and Difference, when the set type has methods named AddSet and RemoveSet. It almost suggests that AddSet and RemoveSet perform a different operation (I understand they work in place).

4 replies
@nickkeets

FWIW Swift calls the in-place versions formUnion and formIntersection.

@ianlancetaylor

I don't think it's inconsistent. The Union function forms a union of two sets. The AddSet function adds one set to another. While clearly the concepts are very similar I think it would be more confusing to have a Union function that returns a new set and a Union method that changes an existing set.

@ajlive

I agree the function and the method should have different names, but I think Swift is at least on the right track here.

Scanning the API proposal above, I did a double-take at AddSet and had to check the comment to make sure I knew what it did.

FormUnion or, say, UnionWith, I think are more clear about both what the method does and what makes it different from the Union function.

@deanveloper

I agree with Ian here - AddSet is a good name for a mutable function. Union (or anything containing the word) makes me think of immutable mathematical operations. FormUnion is okay I guess, but IMO it still could be mistaken for s1.FormUnion(s2) creating a new map. s1.AddSet(s2) is very clear that it adds s2 to s1.

// (Note: New and Of are separate because New will always require a type parameter
// whereas Of will often be able to infer the type parameter from the arguments.
// For example, set.New[int] or set.Of(1, 2), but not set.Of[int]().)

I don't understand this argument (also, set.New[int]()). As Of is the more general function, this comment would suggest that New saves us having to write type-parameters - but it, of course, doesn't. "Always having to write a type-parameter" is clearly not a beneficial property of a function.

IMO there should just be one function New[Elem comparable](v ...Elem) Set[Elem]. It still allows you to write the same New call as before and you can get type-inference if you list the elements. Seems strictly better to me.
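In other words, with a single variadic constructor (using the name New here only for illustration), both call styles remain available:

set.New[int]()   // explicit type argument, empty set
set.New(1, 2, 3) // element type inferred from the arguments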

4 replies
@Merovius

In other words: AIUI Of[T]() will be equivalent to New[T]() so New is redundant and should be removed. Freeing up the canonical New name for Of.

@ianlancetaylor

I wrote both because they read differently. But, yes, we could remove New.

@DeedleFake

While New() seems more idiomatic, Of() actually reads quite well for both:

set.Of(1, 2, 3)
set.Of[int]()
@ianlancetaylor

I've removed New to provide just Of.

I've also changed to pointer receivers, so that the zero value is useful.

I find it awkward to say "Like maps, Sets are reference types" and having to explain that, instead of just making it a pointer. It might feel weird to always have to pass around a Set via a pointer, but for declared types, that's not super uncommon. And if we made every method use a pointer, we can make the zero value of Set equivalent to an empty set - maps are one of the main reasons why I often have to write constructors instead of having my zero values be useful, so I would prefer to have useful zero values.

4 replies
@bcmills

bcmills Jul 22, 2021
Maintainer

Agreed — having to make all of the map-typed fields within a struct gets pretty tedious. The meaningful zero-value is one of the few things about sync.Map that is more ergonomic than a built-in map.

@ianlancetaylor

I've made the zero value useful by changing the methods to take pointer receivers.
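For example, with pointer receivers the zero value can be used directly, much like a zero bytes.Buffer (a sketch assuming the API above):

var tags set.Set[string] // no constructor needed
tags.Add("go", "generics")
fmt.Println(tags.Len()) // 2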

@cespare

I've made the zero value useful by changing the methods to take pointer receivers.

That seems like an improvement, but I think we should further encourage sets to be passed by pointer by having all functions that accept or return sets use *Set[Elem] rather than Set[Elem]. (So change Of, Clone, HasAny, Union, etc.) As it is right now, this type uses a mix of pointer and non-pointer types that looks pretty unusual for a data structure type.

@jhenstridge

In particular, it means that none of the defined methods are in the method set of the value returned by Of(). While in many cases this doesn't matter, I suspect it will lead to confusion when people try to stuff sets into interface variables. Take the following contrived example, which would fail to compile:

type HasLen interface {
    Len() int
}
var foo HasLen = set.Of(1, 2, 3, 4, 5)

Any reason why Union, Intersection and Difference are functions instead of methods? image.Rectangle has Union, Intersect and other binary operations as methods, image.Point and the types in math/big as well.

Methods would feel more like infix notation. My suggestion would be Set[Elem].Union, Set[Elem].Intersect, Set[Elem].Diff

1 reply
@Merovius

Any reason why Union, Intersection and Difference are functions instead of methods?

Union is AddSet and Difference is RemoveSet, except that they operate in-place. There's definitely value in having both an in-place version and a not-in-place version, so adding Union as a method would mean it's the not-in-place version. Personally, I find it unsavory to return a new Set from a method (Clone is an obvious exception). image.Rectangle is different, to me, because it has a constant (small) size.

Why the requirement not to modify the set during iteration? Maps don't have that requirement and that's a very helpful property. Also, iterating by using a function is awkward (you can't just return from the outer function, or break more than one level of loop), so I wonder if it might be nicer to support some kind of iterator so iteration can be done with a straightforward for loop.

11 replies
@Merovius

Another huge difficulty, of course, is that when iterating []T, you might want to get the index you're at, for something like map[K]V you might want to get the key you're at and for something like chan T there is neither. So, a "general purpose iteration API" also needs to concern itself with how to resolve that.

FWIW we can always add methods to work with iterators, once we have an iterator interface.

@fzipp

@Merovius I see the challenges, but in my opinion it is worth directing more energy towards figuring out an iteration concept before adding packages for collection types, i.e. lay the foundation before building castles. I see an iteration interface on the same level of fundamentality as io.Reader/Writer that turned out to be incredibly beneficial to the standard library.

@Merovius

Note that io.Reader/Writer came after their concrete implementations, though.

@fzipp

Note that io.Reader/Writer came after their concrete implementations, though.

@Merovius In 2008, before the public release (called io.Read and io.Write back then) 😁, long before a compatibility promise was in sight, and it actually seems like they were defined at the same time as the first concrete Read(er) implementations.

I don't find "we can always add it later" compelling in this case. It concerns 9 out of the 18 proposed functions/methods (50%). Adding the more general functions later will increase the API surface unnecessarily, and the good function names will be taken by the obsolete functions.

@ianlancetaylor

It is not clear to me that we will want methods on Set that use iterators. Maybe we will, maybe we won't. In Go it seems to me more likely that iterator operations will take the form of functions. Containers will most likely need methods that provide an iteration mechanism, and then we might have a function like func AddTo(container.Addable, container.Iterator). (This may be a good argument that Set.Add should take a single element rather than a variadic, I'm not sure.)

Separately I'll note that Go already has a builtin iteration mechanism in the for/range statement. A general iterator mechanism should work cleanly with that, which will require language changes one way or another.

"Len" is an unusual name for the cardinality of a set, the colloquial term is usually "Size". Of course, "Len" would fit the Len method in sort.Interface (but the proposed Set type can't be sorted), and other types are measured via the "len" builtin function, so that's probably where it comes from.

2 replies
@DeedleFake

Both bytes.Buffer and strings.Builder have a Len() method and aren't sortable, though they are ordered. While I agree that Len() isn't the usual name for the cardinality, I think that I prefer the consistency over whatever's going on with Java's standard library.

@ianlancetaylor

I think Len is the standard Go name here. Even len(m) works for any map m.

Can we get explicit promises for the iteration order? And maybe another hashset without such promises?

10 replies
@Cyberax

Yeah, I don't necessarily want to impose order on all sets, just make it very explicit what kind of collection you're dealing with.

@leighmcculloch

A set that preserves not just order from one iteration to the next but insert order would be very useful.

@ianlancetaylor

This type does not provide any ordering guarantee. This is intentional. I expect that over time there will be other set types that provide different performance/ordering guarantees.

@fzipp

I expect that over time there will be other set types that provide different performance/ordering guarantees.

Interoperability between different set types will be difficult, because these methods and functions take concrete types: AddSet, RemoveSet, HasAny, HasAll, Equal, Union, Intersection, Difference

@ianlancetaylor

I don't yet see a reason for an AddSet method that takes a generalized set. That's a role for a function that takes a container.Addable and a container.Iterator.

Making Add and Remove variadic doesn't feel quite right to me. These are two of the fundamental primitives for sets, and having those methods involve slices seems like extra complexity that doesn't sit flush with the rest of the API. I suspect that the vast majority of Add and Remove calls will have exactly one argument anyway, and there is some small performance penalty to the variadic version.

6 replies
@cespare

Personally, from an API perspective, I very much like that Add and Remove are variadic. It easily allows adding both a single element and a slice - and I don't even think adding a slice is that uncommon.

IMO the main reason to use a variadic function is if it will mostly be called with a variable number of arguments in the call. Sometimes in code review I see that someone has written a variadic function but then all the call sites look like f(s...). In that case, it would be better for the function to take a normal, non-variadic slice.

Here, my intuition is that the usage will look like this:

s.Add(v)            // very common
s.Add(vs...)        // an order of magnitude less common
s.Add(v0, v1, v2)   // another order of magnitude less

If that is roughly correct, then the purpose of this being variadic is a kind of backdoor function overloading to allow accepting both a single value and a slice, and mostly not to support s.Add(v0, v1, v2). That's what doesn't feel quite right to me. When I look through all the variadic functions in the standard library (there aren't many) I can't spot any that seem like they are used this way. Most of them exist to support printf-like functionality.

If we want to support passing in a single value and also passing in a slice, I think that two separate functions (Add(Elem) and AddAll([]Elem)) are better.

I would personally expect them to be inlineable and thus have no penalty.

That does not follow. There is more work to be done to handle a slice than there is to handle a single value: preparing the slice in the caller; looping over the slice in the callee. I'm sure the compiler can do a good job of reducing the overhead, but will it get clever enough to reduce it to zero?

Just as an illustration, this benchmark compares Add(string) vs. Add(...string) using a trivial implementation of a string set using a map[string]struct{}. Both functions are inlined. At tip, this gives me

name    old time/op  new time/op  delta
Add-12  10.9ns ± 5%  11.7ns ± 2%  +6.95%  (p=0.000 n=10+10)
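(The benchmark itself is only linked above; a rough reconstruction of the kind of comparison being made could look like the following, where the type and function names are illustrative rather than the actual code.)

package set_test

import "testing"

type stringSet map[string]struct{}

func (s stringSet) Add(v string) {
	s[v] = struct{}{}
}

func (s stringSet) AddVariadic(v ...string) {
	for _, e := range v {
		s[e] = struct{}{}
	}
}

func BenchmarkAddSingle(b *testing.B) {
	s := stringSet{}
	for i := 0; i < b.N; i++ {
		s.Add("k")
	}
}

func BenchmarkAddVariadic(b *testing.B) {
	s := stringSet{}
	for i := 0; i < b.N; i++ {
		s.AddVariadic("k")
	}
}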
@Merovius

Fair enough. I find it surprising that there is a performance difference (albeit a small one, TBH), but I can't argue with the numbers. I was obviously making wrong assumptions.

@ianlancetaylor

I think we should fix the performance difference. We shouldn't decide between single argument and variadic argument on the basis of performance, only on the basis of readability.

@cespare

My main argument is that using variadic methods here is a worse API than using methods which take single values (so I wrote that down first and mentioned performance afterward).

However, I'm curious what it could mean to "fix" the performance difference. ISTM that the variadic methods imply passing values inside slices which inherently brings more complexity/instructions than passing a single value. Or are you positing some compiler optimization where it notices a variadic call with a single value and then "unrolls" the loop from the caller?

@ianlancetaylor

My main argument is that using variadic methods here is a worse API than using methods which take single values (so I wrote that down first and mentioned performance afterward).

Right, sorry for misrepresenting.

But from a readability perspective I don't see much difference. s.Add(1) and s.Add(1, 2) both seem clear to me. Even s.Add(vals...) is reasonably clear. So to me the variadic API doesn't seem like it involves any extra complexity to the reader.

As far as performance goes, we need to teach the compiler to unroll a range loop over a known slice value with less than N elements where N is some value larger than 1. Then I assume the performance will be the same after inlining.

I expect it will look something like the following sequence of rewrites:

    s.Add(1, 2)
    s.Add([]int{1, 2}...)
    for _, v := range []int{1, 2} { s.m[v] = struct{}{} }
    s.m[1] = struct{}{}; s.m[2] = struct{}{}

I would expect EqualFunc, as in maps.

1 reply
@ianlancetaylor

The proposed maps EqualFunc function uses the function to compare map values, not map keys. Sets have no values, only keys, so there is no need for EqualFunc.

This proposal uses Has but Contains is used everywhere else (strings, bytes, the proposed slices). To be clear, I prefer Has on its own, but the inconsistency makes it harder to learn and I think that's more important.

1 reply
@ianlancetaylor

Switched to Contains.

@bcmills

// Filter deletes any elements from s for which keep returns false.
func (s Set[Elem]) Filter(keep func(Elem) bool)

In functional programming, Filter has the connotation of returning a new data structure. Can we find a different name for this?

Personally, I like Prune — which, to me, carries the connotation of lopping off parts (of a plant) in-place.

6 replies
@fzipp

@bcmills With Prune I would expect the predicate function to be negated.

@Sajmani

I was thinking Keep, which is the name of the function arg, too.

s.Keep(func(i int) bool { return i > 0 })
@leighmcculloch

The term keep doesn't tell me for sure that it isn't in place.

Could we have .Filter and .FilterInPlace? The first a copy, the second in place. It's clear.

Providing both copy and in place methods are common in some stdlibs. E.g. in Ruby there are both but functions that mutate typically have a ! as the last character in their name: filter, filter!. Having the two functions with similar names helps me understand the API.

@ianlancetaylor

I'm leaning toward RemoveIf, which would be an in-place method.

I'm not sure we need a method that combines copying and removal. At this point that seems like a frill that we can consider adding if a lot of code winds up needing it.

@sbstp

Rust names this functionality retain. The method name makes it obvious what it does and if the condition is negated or not.

Parallel to #47330 (comment) for maps, would it make sense to reserve the Values() name for an iterator, and name the method that returns a slice something like ValueSlice() instead?

3 replies
@cespare

Or ToSlice or AsSlice or just Slice?

@leighmcculloch

Instead of placing the slice function on the type could it live on the iterator? Values returns an iterator and then you can construct a slice from the iterator.

@bcmills

@leighmcculloch, it could, but then we would have to define the iterator API before we could extract the values from a Set.

I think extracting the values as a slice is a common enough operation to merit its own method, and I don't think that this proposal necessarily needs to be gated on figuring out an entire ecosystem of “iterator” APIs.

In Java, it is often annoying that the various container types' add() methods are void. Although some of that annoyance is solved here by it being variadic, would it be possible for Add(), Remove(), and the like to return their own receiver? They'd still be in place, but that way something could be added and the set could be passed to something else in a single expression.

1 reply
@ianlancetaylor

That is not standard Go style, though. And speaking purely personally, I would find it misleading that code like F(s.Add(1)) would modify s. It's much clearer that s is modified when the entire statement is s.Add(1).

@rsc
rsc Aug 11, 2021
Maintainer

Retracted


Having a struct that is a reference type is surprising. If we're going to make Set[Elem] a reference type perhaps we should make the connection clearer, with

type Set[Elem comparable] map[Elem]struct{}

That would also allow people with an "old-style" set s to call methods using set.Set(s).Foo().

The obvious objection is that maybe Set and map should have different representations, but (1) they won't, and (2) if there were significant optimizations to make for sets, we should make them for map[T]struct{} too, to help all the old code that won't ever be converted.
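A sketch of what exposing the representation would permit (the method body and the helper below are illustrative only):

type Set[Elem comparable] map[Elem]struct{}

func (s Set[Elem]) Contains(v Elem) bool {
	_, ok := s[v]
	return ok
}

func example(old map[string]struct{}) bool {
	// an existing "old-style" set converts directly and gains the methods
	return Set[string](old).Contains("a")
}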

28 replies
@Merovius

@jimmyfrasche

Acknowledging that while it's modeling a set it's also just another map seems perfectly fine to me.

We are going to have to agree to disagree on this. Even if the method is just a wrapper around an equivalent maps function, the code using container/set will read very differently, depending on whether it calls into maps or into container/set, and I very much don't like the way it would look in the latter case.

@ianlancetaylor

Sets are like maps but they aren't maps. I think it's clearer to separate the API, rather than let people trivially use map operations on sets. So I think it's preferable to have a struct with a single field of map type.

@ajlive

I particularly don't like that a Set struct either forces a Set.Do or forces you to crack it open to get at the gooey map goodness inside for iteration (if it's even public, which it probably wouldn't be).

Then there's #47331 (reply in thread):

Also, using a loop might help escape analysis (AFAIR there were problems with closures and inlining).

Is this true?

@rsc

Regarding "(1) they won't", the pieces to write a set independent of map do not exist.

A comment mentioned hash-based, tree-based, and list-based sets as implementation strategies.
A list-based set has O(n) operations, so that's off the table to start with.
A tree-based set depends on ordering (< and >), but these sets only require == of their elements, not general ordering.
That leaves hash-based sets as the only possible implementation.

If we were to want to write a hash-based set implementation separate from maps,
there is nothing in the language or libraries that exposes a general hash of comparable elements.
hash/maphash exists but it only accepts []byte and string arguments.

We could introduce even more API, of course, or we could acknowledge that any implementation of set is necessarily going to be based on map, which is what I meant by (1).

@rsc

rsc Aug 13, 2021
Maintainer

I retract the suggestion to expose the map.
[I wrote a comment here about docs but I will make that a separate thread.]

@rsc
rsc Aug 11, 2021
Maintainer

Retracted


I wonder whether Union, Intersection, and Difference would be clearer as Or, And, and AndNot.

set.Or(s1, s2)
set.And(s1, s2)
set.AndNot(s1, s2)

Those names might also work as methods:

s1.Or(s2)
s1.And(s2)
s1.AndNot(s2)

11 replies
@ajlive

I am pretty sure there is an unambiguous bidirectional mapping with logical operators

It is indeed a bijection, and in formal logic the set symbols are used with the logic names, e.g., "U" is pronounced "or", and it's not even all that uncommon when mathematicians talk about sets.

the main thing I see us saving is bytes in the symbol names?

The only other argument I can see is that Xor is clearer for programmers who haven't studied set theory than DifferenceSymmetric. But we're probably talking about very minor differences in terms of which is clearer to who. Should we flip a coin? Go Time Twitter poll? ;)

@colin-sitehost

Ooh, it is a bijection; yeah, my vote is for logical operators. (With the secret hope that we can do operator overloading at some point.) That said, I am a little hesitant to buck the trend, but go programmers probably get logical operators better than set theory?

@Merovius

I'm strongly against names like Or, And, etc. IMO this is a package about sets, so we should use standard set parlance.
Or, And, etc. are not at all less jargony than Union, Intersection. If anything, it requires an extra step of knowledge, to be aware that logical and set operators map to each other this way.

But really, the point of jargon is that everyone using it can easily recognize and agree on what the specific operation is that is happening. Jargon isn't bad. It's a tool, which provides a shortcut in communication. So I think it would be less clear to deviate from the established parlance around sets.

Note that we can still have the doc-comments use logical operators to explain the operations, if we're really that worried that people might not know what a "set union" is. i.e. usage of the standard names doesn't prevent us from providing multiple explanations, addressing different target audiences.

@deanveloper

With the secret hope that we can do operator overloading at some point

If we get operator overloading, I do not want it added to the standard set package.

edit - To provide some reasoning: operator overloading on stdlib collections is a gimmick. Sure it's cool, but it gets vastly overused and causes confusion (something like set += elem would be illegal, since in Go operands are always required to have the same type, and we should keep this property with operator overloading). Then you also have confusions like "does set += set2 modify set in place, or does it return a new set?". Operator overloading on collections is hard to document properly, and it's less clear than simply using a method name.

@rsc

I retract this suggestion.

Retracted

Receptive to the argument in #47331 (comment) that the top-level functions not be in the package, at least in the name of a minimal initial API


@colin-sitehost mentioned symmetric difference in #47331 (reply in thread), and I think a DifferenceSymmetric should be considered

5 replies
@colin-sitehost
// DifferenceSymmetric constructs a new set containing the elements
// which are in either set, but not both.
func DifferenceSymmetric[Elem comparable](s1, s2 Set[Elem]) Set[Elem]
@Merovius

I thought about this, but I really can't think of ever needing it in practice. Not in programming and not in my 7 years of studying mathematics at university. I don't think I ever saw it in a context that wasn't just "there's a hole in the matrix of Venn diagrams, which we can fill with the symmetric difference". So, it seems a bit superfluous to add, to me.

That being said, I also wouldn't really mind if we have it. But we should call it SymmetricDifference, because that's what the operation is called and it's standard English word order to have the adjective first.

@ajlive

Symmetric difference did come up in math for me, though maybe only in Boolean rings/algebras...

OTOH, I've used the symmetric difference regularly, if not frequently, in programming for tasks like processing and validating sparse data. One of the things I appreciate about Python is that the symmetric difference is there for me when I need it.

SymmetricDifference, because that's what the operation is called and it's standard English word order to have the adjective first.

Good point. I proposed DifferenceSymmetric to make it discoverable with Difference, and I weighed that factor a little more heavily. (If we went with the logical names proposal it would be Xor, of course.)

@ajlive

From #47331 (comment), it's concise to write:

set.Union(set.Difference(x, y), set.Difference(y, x)) // symmetric difference of x and y.
set.Difference(set.Union(x, y), set.Intersection(x, y)) // also symmetric difference of x and y

...at least if those top-level functions are retained. Writing it using only the methods is a little more painful.

I think we can probably get most of this right on the first landing, but would it be worth stuffing this behind golang.org/x/ like we did for xerrors, just to see in situ usage before committing to the interface? (Same goes for slices and maps, but at least there we know how people are currently using those types and we are just adding extant helper methods that have currently been copied around.)

4 replies
@robaho

I think this conversation provides ample evidence why the "collections interfaces" need to be defined before you discuss any implementations. I implore you to study the Java collections package and its interfaces. It will save a lot of work and provide some consistency for those that might migrate from Java to Go.

@ajlive

The slices package discussion ended up with a pretty minimal package, which is a good alternative to /x/. If #47331 (comment) goes forward and Set is just a map[T]struct{} alias (which I am strongly in favor of), I think this package becomes almost trivial, with a lot of things just falling out of maps and several functions just replicating or even calling functions in the maps package.

@ajlive

this package becomes almost trivial, with a lot of things just falling out of maps and several functions just replicating or even calling functions in the maps package.

See #47331 (reply in thread) and replies

@ajlive

Retracted since I found #47331 (comment) convincing.

Retracted

As @rsc has retracted #47331 (comment), I've retracted this. I still don't like Do, and would rather go with @leighmcculloch 's suggestion to use Values and range (#47331 (comment))


If #47331 (comment) is agreed on and a Set is just a map[T]struct{}, I propose we drop Set.Do and just use range.

for e := range mySet { ... }

I like this much better than Do. Do not want.

6 replies
@ajlive

Also agree with his general philosophy around using range wherever possible: #47331 (reply in thread)

Range is generic and works on a variety of types, even if it's only built-in types. I don't think it's so bad if we use it for sets. I think we should be leaning towards how do we use range on more collection types rather than less. Do would be harder to use.

@cristaloleg

Also, using a loop might help escape analysis (AFAIR there were problems with closures and inlining).

@deanveloper

Honestly I've started using channels every time I want an iterator now. I tried changing that for a bit with talk of #43557, but eventually went back to using channels again. People sometimes complain about efficiency with this, but I'm using it in a pathfinding application (making heavy use of channels) and I've never had any issues.

I honestly wouldn't mind seeing something like func (s Set[T]) Iter() <-chan T. This brings the benefits of using struct as well as the benefits of being range-able.

@Merovius

The problem of using channels isn't mainly efficiency, but the fact that you are leaking goroutines if you don't exhaust them.
This can be worked around with using context.Context (or another cancellation mechanism) but it's hardly ergonomic.
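A minimal sketch of the channel-based Iter being discussed, and where the leak comes from (it goes through the proposed Values method for simplicity):

func (s *Set[T]) Iter() <-chan T {
	ch := make(chan T)
	go func() {
		defer close(ch)
		for _, v := range s.Values() {
			// If the consumer breaks out of its range loop early, nothing
			// ever receives this send, and the goroutine blocks forever.
			ch <- v
		}
	}()
	return ch
}

Ranging over s.Iter() to completion is fine; breaking out early strands the sending goroutine unless some cancellation mechanism is added.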

@ajlive

Retracted since #47331 (comment) is retracted

@rogpeppe asked "Why the requirement not to modify the set during iteration?"
but that thread diverged into general iterators.

I think we should drop the requirement that the set is not modified.
The reason we allow modification of maps during range applies equally well here:
disallowing modification essentially makes modification "undefined behavior"
which means implementation-specific behaviors, portability problems, security holes, and abuse.

If modification during Do must be disallowed, then we should require an implementation to panic when it is attempted.
(Do can set a 'no modifications' bit that the other methods can check.)
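A sketch of that guard (field names invented for illustration; Add stands in for all mutating methods):

type Set[Elem comparable] struct {
	m         map[Elem]struct{}
	iterating bool
}

func (s *Set[Elem]) Do(f func(Elem) bool) {
	s.iterating = true
	defer func() { s.iterating = false }()
	for v := range s.m {
		if !f(v) {
			return
		}
	}
}

func (s *Set[Elem]) Add(v ...Elem) {
	if s.iterating {
		panic("set: Add called during Do")
	}
	if s.m == nil {
		s.m = make(map[Elem]struct{})
	}
	for _, e := range v {
		s.m[e] = struct{}{}
	}
}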

0 replies
3 replies
@rsc

It's fine to write your own list-based set for a specific use case.
That's not a plausible general implementation in this package.
Remember that type Set here is a specific implementation, not an interface.

Also note that replying via email apparently starts a new thread instead of actually replying.
Please try to reply on the web instead.

@robaho

So sorry again. Clearly the world is telling me not to respond to these.

But that raises the question: why so much focus on what the “standard” implementation is? Why is that a topic? The details should be hidden away and can be changed easily if needed. The discussion should focus on the operations.

@rsc

Agree about focusing on operations. Marking this resolved.

I thought for a while about what if people want more complex set operations like symmetric difference, union of more than two sets, and so on. Clearly that can be built out of the pieces here:

set.Union(x, set.Union(y, z)) // three-set union
set.Union(set.Difference(x, y), set.Difference(y, x)) // symmetric difference of x and y.
set.Difference(set.Union(x, y), set.Intersection(x, y)) // also symmetric difference of x and y

and so on. The problem is that all of these generate significant garbage, like Map, Filter, and Reduce in the slices API.

I wonder whether Union, Intersection, and Difference should be removed and instead we should encourage programmers to write them using temporaries they can control, or reusing existing sets. This is apparently what Java does: see "Set Interface Bulk Operations" in https://docs.oracle.com/javase/tutorial/collections/interfaces/set.html.

In Java, it was impossible to have the top-level functions because Set is an interface not an implementation. The mutating receiver version specifies the implementation. Even so, it is still also the case that they avoided a garbage-heavy API.

In Rust these operations produce distinct types that appear to serve as iterators alone.

In Swift and Python they do generate new sets.
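For comparison, with only the mutating methods a caller controls all the temporaries itself (a sketch; x and y are assumed to be existing Set variables):

u := x.Clone()
u.AddSet(y) // union of x and y

i := x.Clone()
i.Filter(y.Contains) // intersection of x and y

d := x.Clone()
d.RemoveSet(y) // x minus y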

5 replies
@ajlive

I wonder whether Union, Intersection, and Difference should be removed

I'm surprised to find myself receptive to this idea, at least in the name of a minimal initial API

@ajlive

If we remove Union, Intersection, and Difference, we should add Java's Set.RetainAll.

@ajlive

However, following the naming convention in AddSet and RemoveSet, I don't think it's anywhere near as clear what RetainSet does compared to the other two methods. Maybe we should reconsider copying Java's API there: AddAll, RemoveAll, RetainAll.

@Merovius

I think they are vital to a set library, but I think it would be good to have them take iterators. So removing them until we figure out iterators seems okay.

@ajlive

If we remove Union, Intersection, and Difference, we should add Java's Set.RetainAll.

In case this wasn't clear: If we remove the set.Union function and don't add a set.Set.RetainAll method, there will be nothing in the API to calculate an intersection, which would be bizarre.

I think the doc comment can be reduced to:

// A Set is a set of elements of some comparable type.
// The zero value of a Set is an empty set ready to use.
type Set[Elem comparable] struct {
    ... unexported fields ...
}

Calling sets "reference types" is confusing because they're not, at least as currently documented. They're just structs that happen to hold pointers, which is true of most Go types. We don't refer to them all as reference types.

Calling out that Sets cannot be modified concurrently from multiple goroutines is also confusing, because that's the default expectation for every Go data structure. We only document the deviations, when concurrent modifications are allowed.

(Compare list.List: it is just as much a "reference type" and is similarly safe to read but unsafe to modify from multiple goroutines, but we don't call attention to any of that. Or bytes.Buffer, which is also just as much a "reference type".)

Mentioning maps also encourages the reader to start thinking about questions like why isn't this a map or exactly how maps work. It is probably better to just let Sets be Sets.

0 replies

The Set as proposed, relevant components below, does not have a convenient method for iteration.

type Set[Elem comparable] struct {
	// contains filtered or unexported fields
}

// Values returns the elements in the set s as a slice.
// The values will be in an indeterminate order.
func (s *Set[Elem]) Values() []Elem

// Do calls f on every element in the set s,
// stopping if f returns false.
// f should not change s.
// f will be called on values in an indeterminate order.
func (s *Set[Elem]) Do(f func(Elem) bool)

I see two approaches to iteration given the current API:

  1. Using the Do method:

    Do allows you to provide a function that will be called for every element in the Set. This introduces a new pattern for iteration that isn't found in other types, since iteration over all other types in Go is supported with a for range loop. This different pattern has some limitations.

    • It is not possible to return immediately from within an iteration. Instead, a value must be copied to a temporary variable and returned outside the iteration function, resulting in code for iterating sets that looks significantly different than for other types. For example:

      func _() (Elem, bool) {
          set := set.Set[Elem]{}
          // ...
          found := false
          var foundE Elem
          set.Do(func(e Elem) bool {
              if condition {
                  found = true
                  foundE = e
                  return false
              }
              return true
          })
          return foundE, found
      }

      I think it would be easier to work with map[key]struct{}. For example:

      func _() (Elem, bool) {
          set := map[Elem]struct{}{}
          // ...
          for v := range set {
              if condition {
                  return v, true
              }
          }
          return Elem{}, false
      }
    • Similar to the return example, it is not possible to break out of enclosing nested for loops without a similar pattern.

  2. Using the Values method and range:

    This approach doesn't share the disadvantages of the Do approach, but it comes at a cost since the Set is copied into a slice.

Could we include in this proposal a method of iterating Set using range? I understand this would be a language change and is therefore worthy of a separate proposal. I'm happy to open a new issue for it but I was holding off because it sounded like we'd get range for free with #47331 (comment) but that proposal has been retracted.

24 replies
@robaho

Sorry - Java doesn’t allow it by design to avoid confusing race conditions.

I also think, though, that you will need a different Do signature because you need to handle the case where the index/key is made available, not just the value.

@Merovius

@robaho Yes. Presumably there will be some flexibility there (i.e. we'd allow both Do(T) and Do(K,V)). But again, the details aren't important right now.

@deanveloper

One issue with using Do as the “standard iterator” is that a poorly written Do implementation which ignores a false return will keep calling its argument

@Merovius

@deanveloper Any user-implemented iteration API will have that problem though.

In any case, given that the intention is explicitly not to discuss iteration now, but do it later, it seems we shouldn't go too deep into the ins and outs of how Do would work for that case. I think it's clear it could be made to work with range though, alleviating at least that concern (if it's a good idea to do so is a question for another day).

@leighmcculloch

Given that I've seen a couple comments above that iteration isn't something to discuss within the container/set proposal, I've opened a new issue #47707 containing the proposal to support for range with user-defined types using the ideas described above.

@bcmills
bcmills Aug 31, 2021
Maintainer

[Edit: this turns out to be confusion over naming rather than a problem with the API signature itself.]

The lack of a short-circuit mechanism for Do seems awkward to me.

The corresponding sync.Map method (Range) returns a boolean from the callback to allow the caller to stop early. That makes the calls much more efficient if the caller is looking for a property that is reasonably likely to be satisfied by an arbitrary small sample of elements from a much larger set.

It is clearly possible to write every Do call site nearly as efficiently (but a bit more verbosely) using Range, but I do not see a way to write a Range-like call (that breaks early) using Do. (The closest we have in the current API is ContainsAny, but that method requires that the “small sample” be a single element.)

5 replies
@Merovius

The signature of Do in the top-post is

// Do calls f on every element in the set s,
// stopping if f returns false.
// f should not change s.
// f will be called on values in an indeterminate order.
func (s *Set[Elem]) Do(f func(Elem) bool)

So, maybe I'm missing something, but ISTM that the short-circuit mechanism you want is already in place?

@deanveloper

I think I’ve written this a few times (sorry if I am a bit repetitive) - I really think that a standard iteration mechanism should be implemented before we add more containers. This avoids the problem of having both “the .Do way” and “the .Iter way” in the future, and it would also solve usability concerns such as this one.

@bcmills

Oh, so it is! Apparently I fail at reading comprehension today. 😩

@bcmills

Ah! I was confused by the other Do thread, in which the API is compared to other Do methods in the standard library that lack the boolean. Perhaps that suggests a change in naming for consistency: the Range name should imply the boolean, while the Do name should imply its absence.

@ajlive

I was confused by the other Do thread, in which the API is compared to other Do methods in the standard library that lack the boolean

Me too

Retracted


I think something that may be concerning is that because this is implemented using a map, this would mean that sets are not comparable. So, a Set of Sets is illegal. However, I'm not exactly sure how we could implement sets such that a set of sets would be useful, other than abandoning the map idea, and using custom hash/equality methods. I don't really like that idea though.

9 replies
@Merovius

It's theoretically possible to build variable sized data structures which are comparable. Theoretically, it might be possible to extend this technique to a reasonably efficient comparable set data-structure (probably not with O(1) access time, but maybe O(log(n))).

However, this doesn't seem very go like. I think it's better to just accept that sets are not comparable.

@ajlive

I'm not sure that it would be, unless Go someday allows for immutable maps which would be comparable.

I was imagining that a FrozenSet need not necessarily be implemented using maps. That's an implementation detail that could be hidden?

In Java, this is done by allowing classes to override the int hashCode() and boolean equals(Object other) methods.
In Rust, this is done by implementing the Eq and Hash traits.

There has been discussion elsewhere here about extracting some kind of interface for sets in the future, with more or less controversy.

@ajlive

I hit "reply" before @Merovius 's comment showedd up on my screen.

I think it's better to just accept that sets are not comparable.

Agree. Haven't ever used frozenset in Python, and someone could build their own if they really needed one.

@deanveloper

It's theoretically possible to build variable sized data structures which are comparable. Theoretically, it might be possible to extend this technique to a reasonably efficient comparable set data-structure (probably not with O(1) access time, but maybe O(log(n))).

That's a really neat example. I wouldn't really say that the example "isn't Go-like" (I wouldn't mind seeing this as a third-party library), but I agree that something like this shouldn't be the standard implementation or even exist in the standard library. However I would love to use something like this to convert a set into an immutable set, then use that to make a set of sets. Going to retract this.

Edit - Definitely can't have O(log(n)) time without the elements being orderable, unfortunately; the "generic case" is definitely O(n)

@Merovius

@deanveloper I tend to be careful with using "definitely" when applied to negatives :) Just because the obvious idea doesn't work, doesn't mean there aren't any non-obvious ideas. For example, you could definitely use reflect and hash/maphash to build a custom hash-function for arbitrary comparable types and then build a binary search tree using those hashes as keys, to get O(log(n)) lookups. You can't easily use it to build a hash-map, as that needs a variable number of randomly accessible buckets, but if you get creative with skip-lists you might well get something O(log(log(n))). I can even imagine someone being even more creative and find an O(1) implementation.

It's always harder to prove that something is not possible, than it is to prove that it is. So I prefer to be conservative about such claims :)

3 replies
@deanveloper

I don't think efficiency is the main reason why we wouldn't want to use custom hash functions, mainly just that it doesn't feel very Go-like

@Merovius

@robaho Please use the web interface when participating in discussions, for proper threading. You've been reminded a bunch of times about this.

As to your point: This might be something a custom implementation of a hash-set might do/allow via an API, but it doesn't address the actual point of having a set representation which can be used as a key for the builtin map. The builtin map does not allow overriding the equality operator or the hash, so you actually need to have a comparable type. Obviously, putting a cached hash into a struct doesn't help, because a) the rest of that struct would still have to support the equality operator just the same and b) the hash would then be part of the identity of the value as well, meaning it would have to be re-computed on every operation, which clearly loses all efficiency benefit.

@robaho

Sorry again. Why not fix this - it’s stupid. If I reply to a particular message it should know where the reply belongs.

Maybe this is a nitpick, but Intersection and Difference seem to be much longer than I'd expect.

Is there any appetite for the verb forms of Union, Intersect, and Diff?