Skip to content

proposal: container/set: new package to provide a generic set type (discussion) #47331

proposal: container/set: new package to provide a generic set type (discussion) #47331
Jul 21, 2021 · 41 comments · 288 replies

This is a discussion that is intended to turn into a proposal.

This proposal is for use with #43651. We propose defining a new package, container/set, that will introduce a new set type. It is possible that this proposal, if accepted, will be included with the first release of Go that implements #43651 (we currently expect that that will be Go 1.18). Or it may be appropriate to delay this package until a later release. This package will not be in the 1.18 release, but is for consideration for a later release.

This description below is focused on the API, not the implementation. In general the implementation will be straightforward.

See also the slices proposal at #45955 (discussion at #47203) and the maps proposal discussion at #47330.

// Package set defines a Set type that holds a set of elements.
package set

// A Set is a set of elements of some comparable type.
// Sets are implemented using maps, and have similar performance characteristics.
// Like maps, Sets are reference types.
// That is, for Sets s1 = s2 will leave s1 and s2 pointing to the same set of elements:
// changes to s1 will be reflected in s2 and vice-versa.
// Unlike maps, the zero value of a Set is usable; there is no equivalent to make.
// As with maps, concurrent calls to functions and methods that read values are fine;
// concurrent calls to functions and methods that write values are racy.
type Set[Elem comparable] struct {
	// contains filtered or unexported fields
}

// Of returns a new set containing the listed elements.
func Of[Elem comparable](v ...Elem) Set[Elem]

// Add adds elements to a set.
func (s *Set[Elem]) Add(v ...Elem)

// AddSet adds the elements of set s2 to s.
func (s *Set[Elem]) AddSet(s2 Set[Elem])

// Remove removes elements from a set.
// Elements that are not present are ignored.
func (s *Set[Elem]) Remove(v ...Elem)

// RemoveSet removes the elements of set s2 from s.
// Elements present in s2 but not s are ignored.
func (s *Set[Elem]) RemoveSet(s2 Set[Elem])

// Contains reports whether v is in the set.
func (s *Set[Elem]) Contains(v Elem) bool

// ContainsAny reports whether any of the elements in s2 are in s.
func (s *Set[Elem]) ContainsAny(s2 Set[Elem]) bool

// ContainsAll reports whether all of the elements in s2 are in s.
func (s *Set[Elem]) ContainsAll(s2 Set[Elem]) bool

// Values returns the elements in the set s as a slice.
// The values will be in an indeterminate order.
func (s *Set[Elem]) Values() []Elem

// Equal reports whether s and s2 contain the same elements.
func (s *Set[Elem]) Equal(s2 Set[Elem]) bool

// Clear removes all elements from s, leaving it empty.
func (s *Set[Elem]) Clear()

// Clone returns a copy of s.
// The elements are copied using assignment,
// so this is a shallow clone.
func (s *Set[Elem]) Clone() Set[Elem]

// Filter deletes any elements from s for which keep returns false.
func (s *Set[Elem]) Filter(keep func(Elem) bool)

// Len returns the number of elements in s.
func (s *Set[Elem]) Len() int

// Do calls f on every element in the set s,
// stopping if f returns false.
// f should not change s.
// f will be called on values in an indeterminate order.
func (s *Set[Elem]) Do(f func(Elem) bool)

// Union constructs a new set containing the union of s1 and s2.
func Union[Elem comparable](s1, s2 Set[Elem]) Set[Elem]

// Intersection constructs a new set containing the intersection of s1 and s2.
func Intersection[Elem comparable](s1, s2 Set[Elem]) Set[Elem]

// Difference constructs a new set containing the elements of s1 that
// are not present in s2.
func Difference[Elem comparable](s1, s2 Set[Elem]) Set[Elem]

Replies

41 comments
·
288 replies

Needs some sort of performance promise. I think it would be ok to promise that Add and Has are amortized O(log(n)).
Not sure we need to enumerate that everywhere, but maybe a note at the bottom of package docs would list promises for all the functions.

6 replies
@robaho

I don’t think container interfaces ever give performance promises - that is an implementation detail But I see this is not an interface though? Why? You might have hash sets and linked hash sets etc. all with different performance profiles for different operations.

What is the value of a single concrete implementation of Set?

@Merovius

Why O(log(n))? My assumption would be that we'd use a hash-set (especially given the comparable constraint) which should provide O(1)?

@randall77

There's no obvious way to hash comparables. Although we might get a backdoor to the runtime to do that.

Another natural implementation would be a red-black tree, which is O(log(n)). Although that requires orderables.

It's all a question of how much we want to commit to the underlying implementation. O(log(n)) gives us some wiggle room. O(1) basically means it must be a hash table.
The "amortized" word is also doing some heavy lifting, as far as giving us some flexibility in the implementation.

Regarding the naming, it seems inconsistent to me to have standalone functions named Union, Intersection, and Difference, when the set type has methods named AddSet and RemoveSet. It almost suggests that AddSet and RemoveSet perform a different operation (I understand they work in place).

4 replies
@nickkeets

FWIW Swift calls the in-place versions formUnion and formIntersection.

@ianlancetaylor

I don't think it's inconsistent. The Union function forms a union of two sets. The AddSet function adds one set to another. While clearly the concepts are very similar I think it would be more confusing to have a Union function that returns a new set and a Union method that changes an existing set.

@ajlive

I agree the function and the method should have different names, but I think Swift is at least on the right track here.

Scanning the API proposal above, I did a double-take at AddSet and had to check the comment to make sure I knew what it did.

FormUnion or, say, UnionWith, I think are more clear about both what the method does and what makes it different from the Union function.

// (Note: New and Of are separate because New will always require a type parameter
// whereas Of will often be able to infer the type parameter from the arguments.
// For example, set.New[int] or set.Of(1, 2), but not set.Of[int]().)

I don't understand this argument (also, set.New[int]()). As Of is the more general function, this comment would suggest that New saves us having to write type-parameters - but it, of course, doesn't. "Always having to write a type-parameter" is clearly not a beneficial property of a function.

IMO there should just be one function New[Elem comparable](v ...Elem) Set[Elem]. It still allows you to write the same New call as before and you can get type-inference if you list the elements. Seems strictly better to me.

4 replies
@Merovius

In other words: AIUI Of[T]() will be equivalent to New[T]() so New is redundant and should be removed. Freeing up the canonical New name for Of.

@ianlancetaylor

I wrote both because they read differently. But, yes, we could remove New.

@DeedleFake

While New() seems more idiomatic, Of() actually reads quite well for both:

set.Of(1, 2, 3)
set.Of[int]()

I find it awkward to say "Like maps, Sets are reference types" and having to explain that, instead of just making it a pointer. It might feel weird to always have to pass around a Set via a pointer, but for declared types, that's not super uncommon. And if we made every method use a pointer, we can make the zero value of Set equivalent to an empty set - maps are one of the main reasons why I often have to write constructors instead of have my zero values be useful, so I would prefer to be have useful zero values.

4 replies
@bcmills

Agreed — having to make all of the fields of map types within a struct gets pretty tedious. The meaningful zero-value is one of the few things about sync.Map that is more ergonomic than a built-in map.

@ianlancetaylor

I've made the zero value useful by changing the methods to take pointer receivers.

@cespare

I've made the zero value useful by changing the methods to take pointer receivers.

That seems like an improvement, but I think we should further encourage sets to be passed by pointer by having all functions that accept or return sets use *Set[Elem] rather than Set[Elem]. (So change Of, Clone, HasAny, Union, etc.) As it is right now, this type uses a mix of pointer and non-pointer types that looks pretty unusual for a data structure type.

Any reason why Union, Intersection and Difference are functions instead of methods? image.Rectangle has Union, Intersect and other binary operations as methods, image.Point and the types in math/big as well.

Methods would feel more like infix notation. My suggestion would be Set[Elem].Union, Set[Elem].Intersect, Set[Elem].Diff

1 reply
@Merovius

Any reason why Union, Intersection and Difference are functions instead of methods?

Union is AddSet and Difference is RemoveSet, except that they operate in-place. There's definitely value in having both an in-place version and a not-in-place version, so adding Union as a method would mean it's the not-in-place version. Personally, I find it unsavory to return a new Set from a method (Clone is an obvious exception). image.Rectangle is different, to me, because it has a constant (small) size.

Why the requirement not to modify the set during iteration? Maps don't have that requirement and that's a very helpful property. Also, iterating by using a function is awkward (you can't just return from the outer function, or break more than one level of loop), so I wonder if it might be nicer to support some kind of iterator so iteration can be done with a straightforward for loop.

11 replies
@Merovius

so I wonder if it might be nicer to support some kind of iterator so iteration can be done with a straightforward for loop.

It's surprisingly hard to write an iterator over a map. reflect does it, so container/set could do as well, but it seems a bit strange for a package like this to require cooperation from the runtime to write.

That being said, I also would prefer an Iterator-API.

@cespare

On golang-dev @ianlancetaylor mentioned that another idea he considered (but liked less) was to name this package sets rather than container/sets and to have it operate on maps rather than creating an opaque set type.

I'm not sure whether that means a type like this

type Set[T comparable] map[T]struct{}
func (s Set[T]) Has(v T) bool

or functions like this

func Has[T comparable](set map[T]struct{}, v T) bool

but either way, a benefit is that we can use normal range iteration:

for elem := range mySet { ... }

There are definitely downsides to that approach, though, and on net I think I also prefer the container/set approach described here.

@ianlancetaylor

I don't think we know what general purpose iteration over generic containers should look like, so I'm a bit reluctant to propose something here. But clearly some form of iteration is required to make that useful, so I borrowed the one currently provided by container/ring.

"Len" is an unusual name for the cardinality of a set, the colloquial term is usually "Size". Of course, "Len" would fit the Len method in sort.Interface (but the proposed Set type can't be sorted), and other types are measured via the "len" builtin function, so that's probably where it comes from.

2 replies
@DeedleFake

Both bytes.Buffer and strings.Builder have a Len() method and aren't sortable, though they are ordered. While I agree that Len() isn't the usual name for the cardinality, I think that I prefer the consistency over whatever's going on with Java's standard library.

@ianlancetaylor

I think Len is the standard Go name here. Even len(m) works for any map m.

Can we get explicit promises for the iteration order? And maybe another hashset without such promises?

10 replies
@Merovius

Or we could get a separate set.Ordered type. Generally, it seems very appropriate to me for a set to not be ordered by default.

@fzipp

@Merovius A second set type would raise more questions about the standalone functions (Union, Intersection, Difference), unless it gets its own package, like container/set/ordered.Set.

@Merovius

Maybe an argument to make those methods.

I have no strong opinions on whether they should be types or packages, but I can't think of separate package names which are both meaningful (ordered is not, it doesn't say anything about sets) and reasonably short (orderedset is too long).

Making Add and Remove variadic doesn't feel quite right to me. These are two of the fundamental primitives for sets and having those methods involve slices seems like extra amount of complexity that doesn't sit flush with the rest of the API. I suspect that the vast majority of Add and Remove calls will have exactly one argument anyway, and there is some small performance penalty to the variadic version.

6 replies
@Merovius

there is some small performance penalty to the variadic version.

I would personally expect them to be inlineable and thus have no penalty. Though apparently i'm alone in assuming that the implementation will just be map[T]struct{}.

Personally, from an API perspective, I very much like that Add and Remove are variadic. It easily allows both to add a single element and a slice - and I don't even think adding a slice is that uncommon.

@cespare

Personally, from an API perspective, I very much like that Add and Remove are variadic. It easily allows both to add a single element and a slice - and I don't even think adding a slice is that uncommon.

IMO the main reason to use a variadic function is if it will mostly be called with a variable number of arguments in the call. Sometimes in code review I see that someone has written a variadic function but then all the call sites look like f(s...). In that case, it would be better for the function to take a normal, non-variadic slice.

Here, my intuition is that the usage will look like this:

s.Add(v)            // very common
s.Add(vs...)        // an order of magnitude less common
s.Add(v0, v1, v2)   // another order of magnitude less

If that is roughly correct, then the purpose of this being variadic is a kind of backdoor function overloading to allow accepting both a single value and a slice, and mostly not to support s.Add(v0, v1, v2). That's what doesn't feel quite right to me. When I look through all the variadic functions in the standard library (there aren't many) I can't spot any that that seem like they are used this way. Most of them exist to support printf-like functionality.

If we want to support passing in a single value and also passing in a slice, I think that two separate functions (Add(Elem) and AddAll([]Elem)) are better.

I would personally expect them to be inlineable and thus have no penalty.

That does not follow. There is more work to be done to handle a slice than there is to handle a single value: preparing the slice in the caller; looping over the slice in the callee. I'm sure the compiler can do a good job of reducing the overhead, will it get clever enough to reduce it to zero?

Just as an illustration, this benchmark compares Add(string) vs. Add(...string) using a trivial implementation of a string set using a map[string]struct{}. Both functions are inlined. At tip, this gives me

name    old time/op  new time/op  delta
Add-12  10.9ns ± 5%  11.7ns ± 2%  +6.95%  (p=0.000 n=10+10)
@Merovius

Fair enough. I find it surprising that there is a performance difference (albeit a small one, TBH), but I can't argue with the numbers. I was obviously making wrong assumptions.

I would expect EqualFunc, as in maps.

1 reply
@ianlancetaylor

The proposed maps EqualFunc function uses the function to compare map values, not map keys. Sets have no values, only keys, so there is no need for EqualFunc.

This proposal uses Has but Contains is used everywhere else (strings, bytes, the proposed slices). To be clear, I prefer Has on its own, but the inconsistency makes it harder to learn and I think that's more important.

1 reply
@ianlancetaylor

Switched to Contains.

This comment has been minimized.

@bcmills

This comment has been minimized.

// Filter deletes any elements from s for which keep returns false.
func (s Set[Elem]) Filter(keep func(Elem) bool)

In functional programming, Filter has the connotation of returning a new data structure. Can we find a different name for this?

Personally, I like Prune — which, to me, carries the connotation of lopping off parts (of a plant) in-place.

6 replies
@bcmills

(Compare #47330 (comment) for the maps discussion.)

@fzipp

@bcmills With Prune I would expect the predicate function to be negated.

@Sajmani

I was thinking Keep, which is the name of the function arg, too.

s.Keep(func(i int) {return i > 0})

Parallel to #47330 (comment) for maps, would it make sense to reserve the Values() name for an iterator, and name the method that returns a slice something like ValueSlice() instead?

4 replies
@cespare

Or ToSlice or AsSlice or just Slice?

@leighmcculloch

Instead of placing the slice function on the type could it live on the iterator? Values returns an iterator and then you can construct a slice from the iterator.

@bcmills

@leighmcculloch, it could, but then we would have to define the iterator API before we could extract the values from a Set.

I think extracting the values as a slice is a common enough operation to merit its own method, and I don't think that this proposal necessarily needs to be gated on figuring out an entire ecosystem of “iterator” APIs.

In Java, it is often annoying that the various container types' add() methods are void. Although some of that annoyance is solved here by it being variadic, would it be possible for Add(), Remove(), and the like to return their own receiver? They'd still be in place, but that way something could be added and the set could be passed to something else in a single expression.

1 reply
@ianlancetaylor

That is not standard Go style, though. And speaking purely personally, I would it misleading that code like F(s.Add(1)) would modify s. It's much clearer that s is modified when the entire statement is s.Add(1).

Retracted

As @rsc has retracted #47331 (comment), I've retracted this. I still don't like Do, and would rather go with @leighmcculloch 's suggestion to use Values and range (#47331 (comment))


If #47331 (comment) is agreed on and a Set is just a map[T]struct{}, I propose we drop Set.Do and just use range.

for e := range mySet { ... }

I like this much better than Do. Do not want.

6 replies
@ajlive

(Thanks to @leighmcculloch for pointing out that the range notation is really beautiful since you can drop the second param.)

@ajlive

Also agree with his general philosophy around using range wherever possible: #47331 (reply in thread)

Range is generic and works on a variety of types, even if it's only built-in types. I don't think it's so bad if we use it for sets. I think we should be leaning towards how do we use range on more collection types rather than less. Do would be harder to use.

@cristaloleg

Also using loop might help escape analysis (afair there were problems with closures and inlining).

@rogpeppe asked "Why the requirement not to modify the set during iteration?"
but that thread diverged into general iterators.

I think we should drop the requirement that the set is not modified.
The reason we allow modification of maps during range applies equally well here:
disallowing modification essentially makes modification "undefined behavior"
which means implementation-specific behaviors, portability problems, security holes, and abuse.

If modification during Do must be disallowed, then we should require an implementation to panic when it is attempted.
(Do can set a 'no modifications' bit that the other methods can check.)

0 replies
3 replies
@rsc

It's fine to write your own list-based set for a specific use case.
That's not a plausible general implementation in this package.
Remember that type Set here is a specific implementation, not an interface.

Also note that replying via email apparently starts a new thread instead of actually replying.
Please try to reply on the web instead.

@robaho

So sorry again. Clearly the world is telling me not to respond to these.

But that begs the point - why so much focus on what the “standard” implementation is. Why is that a topic? The details should be hidden away and can be changed easily if needed. The discussion should focus on the operations.

@rsc

Agree about focusing on operations. Marking this resolved.

I thought for a while about what if people want more complex set operations like symmetric difference, union of more than two sets, and so on. Clearly that can be built out of the pieces here:

set.Union(x, set.Union(y, z)) // three-set union
set.Union(set.Difference(x, y), set.Difference(y, x)) // symmetric difference of x and y.
set.Difference(set.Union(x, y), set.Intersection(x, y)) // also symmetric difference of x and y

and so on. The problem is that all of these generate significant garbage, like Map, Filter, and Reduce in the slices API.

I wonder Union, Intersection, and Difference should be removed and instead we should encourage programmers to write them using temporaries they can control, or reusing existing sets. This is apparently what Java does: see "Set Interface Bulk Operations" in https://docs.oracle.com/javase/tutorial/collections/interfaces/set.html.

In Java, it was impossible to have the top-level functions because Set is an interface not an implementation. The mutating receiver version specifies the implementation. Even so, it is still also the case that they avoided a garbage-heavy API.

In Rust these operation produce distinct types that appear to serve as iterators alone.

In Swift and Python they do generate new sets.

5 replies
@ajlive

I wonder Union, Intersection, and Difference should be removed

I'm surprised to find myself receptive to this idea, at least in the name of a minimal initial API

@ajlive

If we remove Union, Intersection, and Difference, we should add Java's Set.RetainAll.

@ajlive

However, following the naming convention in AddSet and RemoveSet, I don't think it's anywhere near as clear what RetainSet does compared to the other two methods. Maybe we should reconsider copying Java's API there: AddAll, RemoveAll, RetainAll.

I think the doc comment can be reduced to:

// A Set is a set of elements of some comparable type.
// The zero value of a Set is an empty set ready to use.
type Set[Elem comparable] struct {
    ... unexported fields ...
}

Calling sets "reference types" is confusing because they're not, at least as currently documented. They're just structs that happen to hold pointers, which is true of most Go types. We don't refer to them all as reference types.

Calling out that Sets cannot be modified concurrently from multiple goroutines is also confusing, because that's the default expectation for every Go data structure. We only document the deviations, when concurrent modifications are allowed.

(Compare list.List: it is just as much a "reference type" and is similarly safe to read but unsafe to modify from multiple goroutines, but we don't call attention to any of that. Or bytes.Buffer, which is also just as much a "reference type".)

Mentioning maps also encourages the reader to start thinking about questions like why isn't this a map or exactly how maps work. It is probably better to just let Sets be Sets.

0 replies

The Set as proposed, relevant components below, does not have a convenient method for iteration.

type Set[Elem comparable] struct {
	// contains filtered or unexported fields
}

// Values returns the elements in the set s as a slice.
// The values will be in an indeterminate order.
func (s *Set[Elem]) Values() []Elem

// Do calls f on every element in the set s,
// stopping if f returns false.
// f should not change s.
// f will be called on values in an indeterminate order.
func (s *Set[Elem]) Do(f func(Elem) bool)

I see two approaches to iteration given the current API:

  1. Using the Do method:

    Do allows you to provide a function that will be called for every element in the Set. This introduces a new pattern to iteration that isn't found in other types since iteration in Go in all other types are supported with a for range. This different pattern has some limitations.

    • It is not possible to return immediately from within an iteration. Instead, a value must be copied to a temporary variable and returned outside the iteration function, resulting in code that looks significantly different for iterating sets that other types. For example:

      func _() (Elem, bool) {
          set := set.Set{}
          // ...
          found := false
          var foundE Elem
          set.Do(func(e Elem) bool {
              if condition {
                  found = true
                  foundE = e
                  return false
              }
              return true
          })
          return foundE, found
      }

      I think it would be easier to work with map[key]struct{}. For example:

      func _() (Elem, bool) {
          set := map[Elem]struct{}{}
          // ...
          for v := range set {
              if condition {
                  return v, true
              }
          }
          return Elem{}, false
      }
    • Similar to the return example it is not possible to break out of higher up nested for's without a similar pattern.

  2. Using the Values method and range:

    This approach doesn't share the disadvantages of the Do approach, but it comes at a cost since the Set is copied into a slice.

Could we include in this proposal a method of iterating Set using range? I understand this would be a language change and is therefore worthy of a separate proposal. I'm happy to open a new issue for it but I was holding off because it sounded like we'd get range for free with #47331 (comment) but that proposal has been retracted.

24 replies
@ajlive

Agree that Do is bad and would rather make...do...with Values and range until a better solution is found.

@rsc

Do is much more efficient than Values. Also, lots of existing container types have Do methods for iteration.
(There are also Do methods for other things, which is unfortunate, but Do for iteration works nicely.)
It's not perfect, but it should be included.

@Merovius

I think it is beyond argument, currently, to develop a better iteration API before we release container/set. I tried suggesting a compromise but the feedback is a pretty unambiguous "we won't do iterators now".

Personally, I'd argue that we should remove Do then (I strongly agree that it's a bad iteration API), but given that a) we apparently won't release container/set with a different iteration API and b) it doesn't make sense to have a set API that doesn't allow to iterate over elements, I guess that would imply not releasing container/set, which also doesn't seem acceptable.

[Edit: this turns about to be confusion over naming rather than a problem with the API signature itself.]

The lack of a short-circuit mechanism for Do seems awkward to me.

The corresponding sync.Map method (Range) returns a boolean from the callback to allow the caller to stop early. That makes the calls much more efficient if the caller is looking for a property that is reasonably likely to be satisfied by an arbitrary small sample of elements from a much larger set.

It is clearly possible to write every Do call site nearly as efficiently (but a bit more verbosely) using Range, but I do not see a way to write a Range-like call (that breaks early) using Do. (The closest we have in the current API is ContainsAny, but that method requires that the “small sample” be a single element.)

5 replies
@Merovius

The signature of Do in the top-post is

// Do calls f on every element in the set s,
// stopping if f returns false.
// f should not change s.
// f will be called on values in an indeterminate order.
func (s *Set[Elem]) Do(f func(Elem) bool)

So, maybe I'm missing something, but ISTM that the short-circuit mechanism you want is already in place?

@deanveloper

I think I’ve written this a few times (sorry if I am a bit repetitive) - I really think that a standard iteration mechanism should be implemented before we add more containers. This avoids the problem of having both “the .Do way” and “the .Iter way” in the future, and it would also solve usability concerns such as this one.

@bcmills

Oh, so it is! Apparently I fail at reading comprehension today. 😩

Retracted


I think something that may be concerning is that because this is implemented using a map, this would mean that sets are not comparable. So, a Set of Sets is illegal. However, I'm not exactly sure how we could implement sets such that a set of sets would be useful, other than abandoning the map idea, and using custom hash/equality methods. I don't really like that idea though.

9 replies
@ajlive

This isn't a particular concern of mine, but my main experience with sets is in Python, where the same restriction applies:

>>> {{1,2}, {1,2}, {2,3}}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'set'

Are there languages that have a (relatively fundamental) set type where a set of sets is allowed?

@ajlive

The way they got around this in Python is that they created an immutable frozenset type that is hashable:

>>> set([frozenset([1,2]), frozenset([1,2]), frozenset([2,3])])
{frozenset({2, 3}), frozenset({1, 2})}

I imagine something like this would be possible in a future iteration of this package?

@deanveloper

Are there languages that have a (relatively fundamental) set type where a set of sets is allowed?

Particularly Java and Rust, which allow you to define a hash/equality function on a type:

  • In Java, this is done by allowing classes to override the int hashCode() and boolean equals(Object other) methods.
  • In Rust, this is done by implementing the Eq and Hash traits.

The Set implementation will then use those hash/equality functions instead of the default ones if they exist.

3 replies
@deanveloper

I don't think efficiency is the main reason why we wouldn't want to use custom hash functions, mainly just that it doesn't feel very Go-like

@Merovius

@robaho Please use the web interface when participating in discussions, for proper threading. You've been reminded a bunch of times about this.

As to your point: This might be something a custom implementation of a hash-set might do/allow via an API, but it doesn't address the actual point of having a set representation which can be used as a key for the builtin map. The builtin map does not allow overriding the equality operator or the hash, so you actually need to have a comparable type. Obviously, putting a cached hash into a struct doesn't help, because a) the rest of that struct would still have to support the equality operator just the same and b) the hash would then be part of the identity of the value as well, meaning it would have to be re-computed on every operation, which clearly loses all efficiency benefit.

@robaho

Sorry again. Why not fix this - it’s stupid. If I reply to a particular message it should know where the reply belongs.

Maybe this is a nitpick, but Intersection and Difference seem to be much longer than I'd expect.

Is there any appetite for the verb forms of Union, Intersect, and Diff?

12 replies
@deanveloper

IMO, exported names should (usually) avoid abbreviation. Also, Intersect is a verb, which may lead someone to think that Intersect is a mutable operation (where using the noun Intersection is much more clear that it returns a new set).

@sfllaw

If we look at the standard library, there are plenty of exported symbols that are abbreviated. I'd also like to point out that image/Rectangle.Intersect is an equivalent name for a pure method.

I am also considering #47331 (comment) and how clumsy the API will get with long names.

@as

To further expand on the last point, many other names in the standard library are abbreviations and result in a cleaner API.

func Jacobi(x, y *Int) int
type Float struct{ ... }
type Int struct{ ... }
type Rat struct{ ... }

Int is not Integer, Rat is not Rational, Float is not FloatingPoint and os is not operatingsystem. I also find it hard to believe people will think Diff means anything other than difference.

// Like maps, Sets are reference types.
// That is, for Sets s1 = s2 will leave s1 and s2 pointing to the same set of elements:
// changes to s1 will be reflected in s2 and vice-versa.
// Unlike maps, the zero value of a Set is usable; there is no equivalent to make.

Is it actually possible to implement this behaviour with usable zero values? Consider the following two code fragments:

var a, b set.Set[int]
a.Add(1)
b = a
b.Add(2)

and:

var a, b set.Set[int]
b = a
a.Add(1)
b.Add(2)

If sets really are reference types, then it shouldn't matter when I perform the assignment. But if the set is in its uninitialised state when I perform the assignment, there's no internal pointer to copy over linking the two variables.

If the implementation of Add() is "make() the internal map if it is currently nil", then the two code fragments will definitely not be equivalent.

6 replies
@Merovius

I think the doc-comment needs to update, to reflect the change of using Set as a pointer.

@jhenstridge

While the receivers for the methods were switched to pointers, the Of, Union, Intersection, and Difference functions all use non-pointers as arguments or return types. The same goes for the methods that take a set as an argument. That should probably change if we want to encourage people to pass sets as pointers.

@deanveloper

I don’t think using pointer receivers is as much about passing sets as pointers, as it is about allowing sets to satisfy interfaces easily. In the most common case, people shouldn’t need to explicitly make pointers to sets.

I don't see that it has been suggested yet, so I will:

  • Add method MarshalJSON that writes data as an alphanumerically sorted array (or [] when empty).
  • Add method UnmarshalJSON that decodes an array of elements into a set while removing any duplicates.

This makes sense if we agree that an array is the only/most reasonable way to encode a set into JSON, and if we want the set types, like the time.Time type and other commonly used types, to work well in JSON-based applications with minimal boiler-plate.

PS! This thread should be about JSON encoding support only. I think XML, sql etc. should be discussed (and probably rejected) in separate threads.

4 replies
@Merovius

This would require the set package to depend on json, which seems like a bad idea. Also

Add method MarshalJSON that writes data as an alphanumerically sorted array (or [] when empty)

This would require the element type to be constraints.Ordered, instead of comparable, so it isn't really possible.

And then there are also Func variants of sorting and there is a question of whether sort should be stable or not. And a lot of usecases wouldn't need sorting at all.

Note that this doesn't require a lot of code to do manually, e.g.

type JSONSet[T constraints.Ordered] []T

func (s *JSONSet[T]) MarshalJSON() ([]byte, error) {
    vals := s.Values()
    slices.Sort(vals)
    return json.Marshal(vals)
}

func (s *JSONSet[T]) UnmarshalJSON(p []byte) error {
    var vals []T
    if err := json.Unmarshal(p, &vals); err != nil {
        return err
    }
    *s = set.Of(vals...)
}

So this seems relatively straight forward to do, if you need it - and at that point, it's not a problem to require both json and set (and slices). So easy, that it seems easier to leave this up to people who can then also decide precisely what they want from it.

@smyrman

I think those are really good, compelling arguments. I will try to argue them anyway, or at least ask some follow up questions.

This would require the element type to be constraints.Ordered, instead of comparable, so it isn't really possible.

Maybe not. The goal here is really for the encoding to be consistent, so maybe we could just sort the JSON encoded bytes?

For reference, the map[K]V type works by ordering the text encoding of K using utf-8 sorting rules.

This would require the set package to depend on json

I think this is a good point, and I think we should avoid that dependency.

It would definitively be possible to make an implementation of Set[K].MarshalJSON that works for all the types that would work for map[K]V without such a dependency, but I can also see why this is a bad idea. The expectation probably should be that json.Marshal works for all valid types Set[T] where json.Marshal of []T works today, right?

So I just want to ask the obvious question: is it OK to have a dependency the other way (json depending on containers/set)?

Note that this doesn't require a lot of code to do manually.

It doesn't but I tend to need sets everywhere (or at least string sets), and I do end up duplicating that code in a lot of places. It's of course better with this suggestion so that I don't also have to duplicate the set semantics, but it would be really useful if at least for a limited subset of type T, then json.Marshal/json.Unmarshal of type Set[T], just worked.

@Merovius

I think it would be entirely reasonable for encoding/json to special-case the default-encoding of Set[T] - at least once it actually enters the stdlib. I think that would satisfy you, without needing to actually have a MarshalJSON method at all.

Is this still planned for Go 1.19 release?

2 replies
@fzipp

@ismail The Go 1.19 freeze started this week. As far as I am aware there are no generics-related additions to the standard library scheduled for Go 1.19 (except sync/atomic.Pointer[T]). The set type design hasn't even gone through the proposal process yet.

@ismail

@ismail The Go 1.19 freeze started this week. As far as I am aware there are no generics-related additions to the standard library scheduled for Go 1.19 (except sync/atomic.Pointer[T]). The set type design hasn't even gone through the proposal process yet.

Thanks a lot for the update!

Perhaps there should be a variant of the constructor to make a map with a given capacity. Might be useful in situations like:

itemIDs := set.WithCapacity[uint64](len(items))
for _, item := range items {
	itemIDs.Add(item.ID)
}
2 replies
@carlmjohnson

You can do set.Of(items) and it will probably do the efficient thing. I do think adding Set.Grow(int) and Set.Shrink() would probably be a good idea though.

@vfaronov

I can’t do set.Of(items) because it will be a Set[Item] whereas I need a Set[uint64].

Set.Grow() and Set.Shrink() are not great interface methods. It leaks implementation details and offers semantics that may not be applicable.

Better to add these to a ResizableSet interface or similar.

0 replies
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
generics Discussion-Closed