Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: container/set: new package to provide a generic set type #69230

Open
lucas-deangelis opened this issue Sep 3, 2024 · 23 comments
Open
Labels
Milestone

Comments

@lucas-deangelis
Copy link

lucas-deangelis commented Sep 3, 2024

This proposal is entirely based on the initial proposal and following discussion at #47331 and https://go.dev/blog/range-functions. There was a proposal to reopen this discussion 2 weeks ago (#69033) but from what I understand proposals must include what's actually proposed. I apologize if this is not the proper way to do things.

Proposal details

// Package set defines a Set type that holds a set of elements.
package set

// A Set is a set of elements of some comparable type.
// The zero value of a Set is an empty set ready to use.
type Set[Elem comparable] struct {
	// contains filtered or unexported fields
}

// Add adds elements to a set.
func (s *Set[Elem]) Add(v ...Elem)

// Remove removes elements from a set.
// Elements that are not present are ignored.
func (s *Set[Elem]) Remove(v ...Elem)

// Contains reports whether v is in the set.
func (s *Set[Elem]) Contains(v Elem) bool

// Len returns the number of elements in s.
func (s *Set[Elem]) Len() int

// All is an iterator over the elements of s.
func (s *Set[Elem]) All() iter.Seq[Elem]

// Union constructs a new set containing the union of s1 and s2.
func Union[Elem comparable](s1, s2 Set[Elem]) Set[Elem]

// Intersection constructs a new set containing the intersection of s1 and s2.
func Intersection[Elem comparable](s1, s2 Set[Elem]) Set[Elem]

// Difference constructs a new set containing the elements of s1 that
// are not present in s2.
func Difference[Elem comparable](s1, s2 Set[Elem]) Set[Elem]

This is a partial copy of the API proposed at #47331, with the doc comment modified following #47331 (comment), and a new All function that comes from https://go.dev/blog/range-functions.

@gopherbot gopherbot added this to the Proposal milestone Sep 3, 2024
@gabyhelp
Copy link

gabyhelp commented Sep 3, 2024

Related Issues and Documentation

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

@lucas-deangelis
Copy link
Author

Thank you gaby! I'll try compiling a small history of set-related issues:

January 2014: #7088

Proposed adding sets to the language itself (instead of adding them to the standard library). The closing comment, by https://github.com/ianlancetaylor:

Either Go 2 will have some support for generics, in which case you can define your own Set type (perhaps using an underlying map), or it won't, in which case we still won't want to add another special kind of type that is so similar to maps. So, we aren't going to do this. Closing.

There was also some discussion about the performance trade-offs between sets based on maps and sets based on slices.

November 2017: #22812

Proposed an unordered set type with "a simplified range operator for sets (with only one assigned variable)". Today this is covered by range over func/iterators.

It was closed as a duplicate of #7088 but also of #15292, a general proposal for adding generics to Go.

July 2021: #47331

Almost all of proposal is based on that discussion. This was done after generics were introduced in Go but before range over func/iterators (#61405). My understanding is that it was closed because it depended on the outcome of the iteration protocol, as said by https://github.com/fzipp in #61884 (comment):

Please see this previous discussion: #47331
A proposal for a set data type depends on the outcome of the iteration proposal #61405.

August 2021: #47963

Not directly related to sets but a more general question on what to do with the existing container packages following the adoption of generics.

August 2023: #61884

Closed for being a duplicate of #47331.

@earthboundkid
Copy link
Contributor

Yes, it's finally time to add this! 😄 🎉

I suggest calling the iterator Values() rather than All() because slices.All is an iter.Seq2 that includes the element indices and maps.All is an iter.Seq2 of keys and values. Since this iterator would only be an iter.Seq, the name Values might be more appropriate.

@earthboundkid
Copy link
Contributor

I suggest also adding set.From which takes an iter.Seq. This would let you easily consume a map or a slice, etc.

@jimmyfrasche
Copy link
Member

maps/slices calls that Collect

@ianlancetaylor
Copy link
Member

Thanks for the proposal. I was actually getting very close to making my own proposal. Here is mine, for comparison.

The main thing I am still pondering is whether functions that return a set should return a Set[E] or a *Set[E]. There are clear arguments for either choice.

// Package set defines a Set type.
package set

import(
	"iter"
	"maps"
)

// A Set is a set of elements of some comparable type.
// The zero value of a Set is ready to use.
// A Set contains a map, and has similar performance characteristics,
// copying behavior, and similar handling of floating-point NaN values.
type Set[E comparable] struct {
	m map[E]struct{}
}

// Of returns a new [Set] containing the listed elements.
func Of[E comparable](v ...E) Set[E] {
	s := Set[E]{m: make(map[E]struct{})}
	for _, w := range v {
		s.m[w] = struct{}{}
	}
	return s
}

// Add adds an element to s.
// It reports whether the element was not present before.
func (s *Set[E]) Add(v E) bool {
	if s.m == nil {
		s.m = make(map[E]struct{})
	}
	ln := len(s.m)
	s.m[v] = struct{}{}
	return len(s.m) > ln
}

// AddSeq adds the values from seq to s.
func (s *Set[E]) AddSeq(seq iter.Seq[E]) {
	for v := range seq {
		s.Add(v)
	}
}

// Delete removes an element from s.
// It reports whether the element was in the set.
func (s *Set[E]) Delete(v E) bool {
	ln := len(s.m)
	delete(s.m, v)
	return len(s.m) < ln
}

// DeleteSeq deletes the elements in seq from s.
// Elements that are not present are ignored.
func (s *Set[E]) DeleteSeq(seq iter.Seq[E]) {
	for v := range seq {
		delete(s.m, v)
	}
}

// DeleteFunc deletes the elements in s for which del returns true.
func (s *Set[E]) DeleteFunc(del func(E) bool) {
	for v := range s.m {
		if del(v) {
			delete(s.m, v)
		}
	}
}

// Contains reports whether s contains an element.
func (s *Set[E]) Contains(v E) bool {
	_, ok := s.m[v]
	return ok
}

// ContainsAny reports whether any of the elements in seq are in s.
// It stops reading from seq after finding an element that is in s.
func (s *Set[E]) ContainsAny(seq iter.Seq[E]) bool {
	for v := range seq {
		if _, ok := s.m[v]; ok {
			return true
		}
	}
	return false
}

// ContainsAll reports whether all of the elements in seq are in s.
// It stops reading from seq after finding an element that is not in s.
func (s *Set[E]) ContainsAll(seq iter.Seq[E]) bool {
	for v := range seq {
		if _, ok := s.m[v]; !ok {
			return false
		}
	}
	return true
}

// All returns an iterator over all the elements of s.
// The elements are returned in an unpredictable order.
func (s *Set[E]) All() iter.Seq[E] {
	return func(yield func(E) bool) {
		for v := range s.m {
			if !yield(v) {
				return
			}
		}
	}
}

// Equal reports whether s and s2 contain the same elements.
func (s *Set[E]) Equal(s2 *Set[E]) bool {
	if len(s.m) != len(s2.m) {
		return false
	}
	for v := range s2.m {
		if _, ok := s.m[v]; !ok {
			return false
		}
	}
	return true
}

// Clear removes all elements from s, leaving it empty.
func (s *Set[E]) Clear() {
	clear(s.m)
}

// Clone returns a copy of s. This is a shallow clone:
// the new elements are set using ordinary assignment.
func (s *Set[E]) Clone() Set[E] {
	return Set[E]{m: maps.Clone(s.m)}
}

// Len returns the number of elements in s.
func (s *Set[E]) Len() int {
	return len(s.m)
}

// Collect collects elements from seq into a new [Set].
func Collect[E comparable](seq iter.Seq[E]) Set[E] {
	var r Set[E]
	for v := range seq {
		r.Add(v)
	}
	return r
}

// Union returns a new [Set] containing the union of two sets.
func Union[E comparable](s1, s2 *Set[E]) Set[E] {
	var r Set[E]
	for v := range s1.m {
		r.Add(v)
	}
	for v := range s2.m {
		r.Add(v)
	}
	return r
}

// Intersection returns a new [Set] containing the intersection of two sets.
func Intersection[E comparable](s1, s2 *Set[E]) Set[E] {
	var r Set[E]
	ma, mb := s1.m, s2.m
	if len(ma) > len(mb) {
		// Loop through the shorter set.
		mb, ma = ma, mb
	}
	for v := range ma {
		if _, ok := mb[v]; ok {
			r.Add(v)
		}
	}
	return r
}

// Difference returns a new [Set] containing the elements of s1
// that are not present in s2.
func Difference[E comparable](s1, s2 *Set[E]) Set[E] {
	var r Set[E]
	for v := range s1.m {
		if _, ok := s2.m[v]; !ok {
			r.Add(v)
		}
	}
	return r
}

// SymmetricDifference returns a new [Set] containing the elements
// that are in either s1 or s2, but not both.
func SymmetricDifference[E comparable](s1, s2 *Set[E]) Set[E] {
	var r Set[E]
	ma, mb := s1.m, s2.m
	if len(ma) > len(mb) {
		// Loop through the shorter set.
		mb, ma = ma, mb
	}
	for v := range ma {
		if _, ok := mb[v]; !ok {
			r.Add(v)
		}
	}
	return r
}

@ianlancetaylor ianlancetaylor moved this to Incoming in Proposals Sep 3, 2024
@ianlancetaylor
Copy link
Member

@earthboundkid Our current convention is that the All method should return an iterator over all elements of the container. For a Set that should clearly be a iter.Seq[E]. It's true that slices and maps use a iter.Seq2, but that is because those containers have both a key and a value. A set only has a key.

@mateusz834
Copy link
Member

mateusz834 commented Sep 3, 2024

@ianlancetaylor map in internally a pointer, so i believe it should return a Set[E], the method receivers should be a Set[E] and functions line Difference should accept a Set[E]. Do you see any issues with that?

EDIT: I missed that methods like Add, initialize the map, so not every method.

@ianlancetaylor
Copy link
Member

@mateusz834 Right, with the approach I''m outlining some methods have to use a pointer receiver. And it's generally easier to understand a type if all receivers are either pointers or values.

@jimmyfrasche
Copy link
Member

Missing predicates: Subset, ProperSubset, Disjoint.

@lucas-deangelis
Copy link
Author

@ianlancetaylor For what it's worth, I do like your API way more.

@mvdan
Copy link
Member

mvdan commented Sep 3, 2024

Can we avoid the stutter in set.Set? Even maps.Set or container.Set would seem better to me.

@jimmyfrasche
Copy link
Member

maps is for maps not sets and if it's in container why not put EVERYTHING there?

@earthboundkid
Copy link
Contributor

You will rarely write set.Set as an end user because it mostly starts with set.Of or set.Collect. It seems like you’d only need to write it in struct definitions.

@septemhill
Copy link

Missing predicates: Subset, ProperSubset, Disjoint.

How about Disjoint Union ?

@lucas-deangelis
Copy link
Author

A list of predicates found in other Go libraries and languages:

Predicates contains containsAll empty equal subset properSubset superset properSuperset disjoint
Go proposal Contains ContainsAll Equal
golang-set Contains IsEmpty Equal IsSubset IsProperSubset IsSuperset IsProperSuperset
go-set Contains Empty EqualSet Subset ProperSubset
C++ contains
Dart contains containsAll isEmpty ==
Haskell member null isSubsetOf isProperSubsetOf disjoint
Java contains containsAll
JavaScript has isSubsetOf isSupersetOf isDisjointFrom
Kotlin contains containsAll isEmpty
PHP contains isEmpty
Python in issubset/<= < issuperset/>= > isdisjoint
Ruby include? empty? == subset?/<= proper_subset?/< superset?/>= proper_superset?/> disjoint?
Rust contains is_empty is_subset is_superset is_disjoint
Scala contains subsetOf
Swift contains isEmpty == isSubset isStrictSubset isSuperset isStrictSuperset disjoint

I didn't include idioms which makes it looks like the proposal is missing "empty", but for containers in Go it's myContainer.Len() == 0.

@apparentlymart
Copy link

apparentlymart commented Sep 7, 2024

One other set-related function I've used in some of my many hand-written set implementations over the years is Cartesian product, which I expect would have a signature something like this:

func CartesianProduct[ElemA, ElemB comparable](a Set[ElemA], b Set[ElemB]) Set[struct { a ElemA, b ElemB }]

However, I do already have mixed feelings about whether it's common enough to deserve to be in stdlib. Compared to the other set operations described I've needed it far less frequently, but I have wanted it more than once.

If this does seem like something worth implementing then I suppose it's another potential use-case for tuple structs (#63221) and/or for variadic generics (#56462, #66651). I suppose that's probably a good argument against offering this initially so that those broader related proposals have some chance to settle first.

@guneyizol
Copy link

Thanks for the proposal, I strongly believe that a language such as Go which takes pride in its standart library should have more well-designed generic data structures.

To add to the ideas presented here, a constructor function such as func New(size int) Set[E] {...} would be beneficial for performance reasons, similar in philosophy to make(map[KeyType]ValueType, size).

@Albertp4
Copy link

Albertp4 commented Nov 11, 2024

Is there a reason why some common Set functions are missing in this proposal? I.e empty, subset, properSubset, superset, properSuperset, and disjoint.

@thinkgos
Copy link

Miss some function Diff, DiffVary, may be it also usefull.

// Diff returns s diff of s2, return added, removed, remained sets
// with the given s2 set.
// For example:
// s1 = {a1, a3, a5, a7}
// s2 = {a3, a4, a5, a6}
// added = {a4, a6}
// removed = {a1, a7}
// remained = {a3, a6}
func (s Set[T]) Diff(s2 Set[T]) (added, removed, remained Set[T]) {
	removed = newSet[T](len(s))
	added = newSet[T](len(s2))
	remained = newSet[T](len(s))
	for key := range s {
		if s2.Contains(key) {
			remained[key] = struct{}{}
		} else {
			removed[key] = struct{}{}
		}
	}
	for key := range s2 {
		if !s.Contains(key) {
			added[key] = struct{}{}
		}
	}
	return added, removed, remained
}

// DiffVary returns s diff of s2, return added, removed sets
// with the given s2 set.
// For example:
// s1 = {a1, a3, a5, a7}
// s2 = {a3, a4, a5, a6}
// added = {a4, a6}
// removed = {a1, a7}
func (s Set[T]) DiffVary(s2 Set[T]) (added, removed Set[T]) {
	removed = newSet[T](len(s))
	added = newSet[T](len(s2))
	for key := range s {
		if !s2.Contains(key) {
			removed[key] = struct{}{}
		}
	}
	for key := range s2 {
		if !s.Contains(key) {
			added[key] = struct{}{}
		}
	}
	return added, removed
}

@apparentlymart
Copy link

apparentlymart commented Dec 13, 2024

@thinkgos It seems like those functions are just combinations of functions already included in the proposal. 🤔

func (s Set[T]) Diff(s2 Set[T]) (added, removed, remained Set[T]) {
	return s2.Difference(s), s.Difference(s2), Intersection(s, s2)
}

func (s Set[T]) DiffVary(s2 Set[T]) (added, removed Set[T]) {
	return s2.Difference(s), s.Difference(s2)
}

Admittedly your Diff can calculate both remained and removed in a single loop whereas my version would loop twice, but it doesn't seem like that would make any significant difference. Otherwise, these implementations seem equivalent to me. Would you agree?

It doesn't seem to me that these new additions are really adding enough to justify their presence, particularly given that it seems confusing to have both Set.Difference and Set.Diff with different behavior (the word "diff" is just an abbreviation of "difference".)

@thinkgos
Copy link

@apparentlymart Yes! I agree! Difference and Intersection can meet the requirements. because my code need get added and removed at the same time frequently. so I add this functuon. maybe the function name should rename another name.

@paskozdilar
Copy link

paskozdilar commented Feb 8, 2025

I've implemented a Set library before using type definition type Set[E comparable] map[E]struct{} and most of the methods mentioned here.

Some nice side effects of this choice:

  1. Set can be created using make, simplifying construction (we could use New(size int) as another comment mentioned, but then we cannot have New() without size param as Go doesn't support ad-hoc polymorphism)
  2. Set can be for-range iterated, just like maps, so an iterator method is not required

Some not-so-nice side effects:

  1. Set is not a struct, so implementation cannot be changed/optimized in the future
  2. Set can be for-range iterated with two values, second one always being struct{}{}

Additionally, one of the methods I've implemented in my library is SymmetricDifference.
One can emulate it using Union of Differences, but if the use case is common enough, maybe it would be more efficient to have a dedicated method for it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Incoming
Development

No branches or pull requests