Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: package collection and iterator design for go #50112

Open
kulics opened this issue Dec 12, 2021 · 14 comments
Open

proposal: package collection and iterator design for go #50112

kulics opened this issue Dec 12, 2021 · 14 comments

Comments

@kulics
Copy link

kulics commented Dec 12, 2021

I propose to introduce package collection in the generic version.

Problem

Obviously, access to the container object necessarily involves an iterative algorithm. You can either write the traversal method to the container object (internal iterator) or not provide any traversal algorithm at all and let the people who use the container implement it themselves (external iterator). Both of these cases seem to solve the problem.

However, in the former case, the container is overloaded with functions, it is not only responsible for the maintenance of elements in its own "container" (add, delete, etc.), but also has to provide the interface to iterate over itself; and because of the problem of saving the iterated state, you cannot do multiple iterations over the same container object at the same time. The second way is less hassle, but the internal details of the container is exposed.

And the emergence of the iterator pattern, a good solution to the above two cases of disadvantages.

Before the go language had generics, we had a more limited means of uniformly abstracting all containers because the types of elements were diverse, either at the cost of performance or at the cost of size, but with the advent of generics, we now have the conditions that not only give application developers a uniform way of traversing them, but also give library developers the possibility of extending them.

By introducing the collection package and providing an iterator interface in the standard library, it can be used like database/sql for the benefit of countless go developers at a fraction of the cost.

Solution

I tried to design the iterator pattern in this repository using the generic feature of go:

https://github.com/kulics-works/gollection

I think it would be more useful to design iterators and related operations based on the following rules.

  • Key Point 1

The next function of the iterator returns values and bool types.

The reason is that this pattern is quite common in go, and is more in line with go's convention of not introducing optional types in addition.

type Iterator[T any] interface {
	Next() (T, bool)
}
  • Key Point 2

Iterator should also implement Iterable.

The iterator itself is also iterable, which allows iterators to be used as iterables and allows a wider range of arguments to the operator.

The cost of iterator is just one more function that returns itself, which is very cost effective.

type Iterator[T any] interface {
	Iter() Iterator[T]
	Next() (T, bool)
}

type Iterable[T any] interface {
	Iter() iterator[T]
}

The iterator function on intermediate operations should always accept an Iterable argument that returns an iterator, allowing for greater type compatibility.

func Map[T any, R any](transform func(T) R, it Iterable[T]) Iterator[R]
  • Key Point 3

Iterator operator functions such as map and filter should always be designed to operate as the preceding operation, placing the Iterable as the following argument.

This way the operation code is closer to the function name and is more readable when used in a nested fashion.

func Filter[T any](predecate func(T) bool, it Iterable[T]) Iterator[T]
func Map[T any, R any](transform func(T) R, it Iterable[T]) Iterator[R]
func Sum[T number](it Iterable[T]) T

func main() {
	Sum(Map(Square, Filter(TakeEvenNumbers, ListOf(1, 2, 3))))
}
  • Key Point 4

Always use a for loop as the implementation solution for iterator termination operations.

Even though it would be clearer to use recursive code, for loops are clearly more usable.

func Reduce[T any, R any](initial R, operation func(R, T) R, it Iterable[T]) R {
	var iter = it.Iter()
	var result = initial
	for v, ok := iter.Next(); ok; v, ok = iter.Next() {
		result = operation(result, v)
	}
	return result
}

Here are the api's that I think are essential.

package collection

type Iterator[T any] interface {
	Iter() Iterator[T]
	Next() (T, bool)
}

type Iterable[T any] interface {
	Iter() iterator[T]
}

// Convert a slice of any type to an iterator
func SliceIter[T any](source []T) Iterator[T]

// Convert a string to an iterator
func StringIter(source string) Iterator[byte]

type Pair[T any, R any] struct {
	First T
	Second R
}

// Returns an indexed iterator
func WithIndex[T any](it Iterable[T]) Iterator[Pair[int, T]]

func Map[T any, R any](transform func(T) R, it Iterable[T]) Iterator[R]

func Filter[T any](predecate func(T) bool, it Iterable[T]) Iterator[T]

// Returns an iterator that only iterates over the first specified number of elements
func Limit[T any](count int, it Iterable[T]) Iterator[T]

// Returns an iterator that skips the specified number of elements, the skipping is one-time.
func Skip[T any](count int, it Iterable[T]) Iterator[T]

// Returns an iterator that repeatedly skips the specified number of elements; the skipping is repetitive.
func Step[T any](count int, it Iterable[T]) Iterator[T]

// Returns a new iterator spliced by two Iterables
func Concat[T any](left Iterable[T], right Iterable[T]) Iterator[T]

type Number interface {
	type int, int64, int32, int16, int8, uint64, uint32, uint16, uint8, float64, float32
}

func Sum[T Number](it Iterable[T]) T

func Product[T Number](it Iterable[T]) T

func Max[T Number](it Iterable[T]) T

func Min[T Number](it Iterable[T]) T

func Average[T Number](it Iterable[T]) T

func Count[T any](it Iterable[T]) int

func ForEach[T any](action func(T), it Iterable[T])

func AllMatch[T any](predicate func(T) bool, it Iterable[T]) bool

func NoneMatch[T any](predicate func(T) bool, it Iterable[T]) bool

func AnyMatch[T any](predicate func(T) bool, it Iterable[T]) bool

func First[T any](it Iterable[T]) (value T, ok bool)

func Last[T any](it Iterable[T]) (value T, ok bool)

func At[T any](index int, it Iterable[T]) (value T, ok bool)

// Aggregate from front to back into one result
func Reduce[T any, R any](initial R, operation func(R, T) R, it Iterable[T]) R

// Aggregate from back to front into one result
func Fold[T any, R any](initial R, operation func(T, R) R, it Iterable[T]) R

With the above api, it will be very convenient to operate on any container type.

These are all the ideas.

@gopherbot gopherbot added this to the Proposal milestone Dec 12, 2021
@ianlancetaylor ianlancetaylor changed the title proposal: Iterator design for go proposal: iterator design for go Dec 12, 2021
@ianlancetaylor
Copy link
Contributor

ianlancetaylor commented Dec 12, 2021

I'm not clear on what the actual proposal is here.

If you want to discuss iterators, please don't do so in an issue. Please use the golang-nuts Google Group. Thanks.

If this is a proposal, can you clarify what package you are proposing? And please use exported identifiers where appropriate to make clear what is exported and what is not. Thanks.

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Dec 12, 2021
@kulics
Copy link
Author

kulics commented Dec 12, 2021

@ianlancetaylor
OK, I'll adjust the description

@kulics kulics changed the title proposal: iterator design for go proposal: package collection and iterator design for go Dec 12, 2021
@aarzilli
Copy link
Contributor

aarzilli commented Dec 12, 2021

How is this going to work with built in maps, slices, arrays and strings? What changes are needed to support containers that are already in the standard library (container/list.List, sync.Map...)? Why are Sum, Min and Max essential but Product and Average aren't? Why is NoneMatch essential when there's already AnyMatch and AllMatch? What do Skip, Step and Fold do?

@kulics
Copy link
Author

kulics commented Dec 12, 2021

@aarzilli

How is this going to work with built in maps, slices, arrays and strings?

slices and strings is relatively easy, SliceIter is used to generate slices iterators, string can introduce another StringIter function.

Maps, since the internal structure is invisible, can only be supported by go itself for conversion operations, otherwise it may pay a lot of overhead. But it is not impossible to implement.

arrays can be converted to slices, and SliceIter can be reused simply by using the slice operation. until there is a constant generic, there may not be a clean solution. Or have the compiler generate different iterators based on size.

What changes are needed to support containers that are already in the standard library (container/list.List, sync.Map...)?

The standard library container only needs to add code that implements iterators to support collections, and the workload is manageable.

Why are Sum, Min and Max essential but Product and Average aren't?

The necessity is subjective, Product and Average are indeed useful and I also think they should be added to the package. Thanks for the suggestion.

Why is NoneMatch essential when there's already AnyMatch and AllMatch?

NoneMatch is better in name readability, and I think the cost of adding this function for readability is acceptable.

What do Skip, Step and Fold do?

Maybe I should add some notes.

Skip: Skip the specified number of iterations at once.

Step: Repeatedly skip the specified number of iterations.

Fold: the direction of fold is the opposite of reduce, reduce is usually front-to-back, while fold is in reverse order.

@ianlancetaylor
Copy link
Contributor

ianlancetaylor commented Dec 12, 2021

I think that we need to do in this area is experiment by writing code and seeing how it works. I think it's way to early to add a package to the standard library.

@robpike
Copy link
Contributor

robpike commented Dec 12, 2021

The first thing to have is a clear definition of the problem to be solved, and why it is important. Start with the problem, not the solution.

@kulics
Copy link
Author

kulics commented Dec 13, 2021

@ianlancetaylor
Yes, we don't have to rush to add a package to the standard library, but we can have a forward-looking discussion before we start working on it.

The points I have made are based on a lot of practical thinking, and perhaps we can discuss scenarios I have tried.

Regarding the iterator pattern, as long as go introduces generics, then it is natural to use the iterator pattern to provide a uniform interface design for all container types, which is the case in most industrial languages, and I don't need to mention the benefits of this interface.

Then we only need to consider the design of the iterator, usually iterators either use HasNext and Next functions with the design, this design is very classic, but the two operations are not bound, compared to the common design of multiple return values in go language, it is less consistent with the go language habits.

type Iterator[T any] interface {
	HasNext() bool
	Next() T
}

for it := someIterator(); it.HasNext(); {
	item := it.Next()
	...
}

Using multiple return values without introducing optional types makes both operations safer, since we are used to using multiple return values to determine existence in go.

type Iterator[T any] interface {
	Next() (T, bool)
}

for it := someIterator(); true; {
	if v, ok := it.Next(); ok {
		...
	} else {
		break
	}
}

And about the iterator operator function, I tried the following patterns.

The first is a chain call that best fits the pipeline design, but since the go language interface has no default implementation and does not support generic recipient functions, there is no way to achieve the following effect.

ListOf(1, 2, 3).Filter(TakeEvenNumbers).Map(Square).Sum()

The second one is to introduce a separate type to design related operations after the java stream api, but the go language does not support generic receiver functions, so it is still not possible to define functions like Map that require more type parameters.

Stream(ListOf(1, 2, 3)).Filter(TakeEvenNumbers).Map(Square).Sum()

That leaves only global functions as an approach, and having operations close to function names is an obvious and more readable approach than having data close to function names, and python has adopted this design.

Sum(Map(Square, Filter(TakeEvenNumbers, ListOf(1, 2, 3))))
Sum(Map(Filter(ListOf(1, 2, 3), TakeEvenNumbers), Square))

I hope my experience will be helpful to you.

@kulics
Copy link
Author

kulics commented Dec 13, 2021

@robpike
Makes sense, I will update my description.

@robpike
Copy link
Contributor

robpike commented Dec 14, 2021

While this is all reasonable on its face, and consistent with the way other languages have approached this topic, your proposal is a very long list of "essential" functions. The list seems too long for a starting point, and to this admittedly biased reviewer not very Go-like.

I thank you for adding a problem statement up front, but it makes a number of assumptions that may not be valid. My own experience with other languages that try to generalize iteration is that it all becomes very messy. I'm not convinced we need or want a general solution here, and certainly not one this involved.

There must be a simpler solution, if indeed it's a problem that needs one.

My biases are showing, I admit.

@kulics
Copy link
Author

kulics commented Dec 14, 2021

@robpike

Admittedly, "essential" is my subjective choice, and the languages that do support iterators all support more or less similar functions. But that's certainly not the biggest problem, we can always remove functions that we don't think we need, or are not go-like.

The biggest problem is what things can be messing with a mature iterator pattern?

We know that a certain type of functionality can benefit from a unified interface, like database/sql, so why not containers?

With the introduction of generics, it was only logical to introduce a uniform interface for container types.

The only difference is the look of this interface.

I previously tried another version of this design, which was simple enough, but lost the benefit of lazy execution.

This is what it looks like:

type IteratorFunc[T any] func(func(T))

func (f IteratorFunc[T]) ForEach(action func(T)) {
	f(action)
}

type Iterator[T any] interface {
	ForEach(func(T))
}

func SliceIter[T any](source []T) Iterator[T] {
	return IteratorFunc[T](func(action func(T)) {
		for _, v := range source {
			action(v)
		}
	})
}

func Map[T any, R any](transform func(T) R, it Iterator[T]) []R {
	result := make([]R, 0)
	it.ForEach(func(v T) {
		result = append(result, transform(v))
	})
	return result
}

It is very simple and only requires providing the ForEach implementation to provide the same functionality.

The entire list can also be reduced to:

package collection

type Iterator[T any] interface {
	ForEach(func(T))
}

type IteratorFunc[T any] func(func(T))

func SliceIter[T any](source []T) Iterator[T]
func MapIter[K any, V any](source map[K]V) Iterator[Pair[K, V]]

type Pair[T any, R any] struct {
	First T
	Second R
}

func Map[T any, R any](transform func(T) R, it Iterator[T]) []R

func Filter[T any](predecate func(T) bool, it Iterator[T]) []T

func Reduce[T any, R any](initial R, operation func(R, T) R, it Iterator[T]) R

It is simple enough, if we accept its cost.

@ianlancetaylor
Copy link
Contributor

ianlancetaylor commented Dec 14, 2021

Yes, we don't have to rush to add a package to the standard library, but we can have a forward-looking discussion before we start working on it.

Absolutely.

But that is not what the proposal process is for. The proposal process is to discuss a specific change that is clear and well understood, not for a general discussion of a change that is not clear and not well understood.

Also, our experience shows that the GitHub issue tracker is not a good mechanism for discussion. It is not threaded, and it is unwieldy as the number of comments increases.

So while I agree that this is a topic that deserves discussion, I don't think that this is the place for that discussion. I think it would be easier to handle on golang-nuts. Or perhaps we could create a GitHub discussion (not a GitHub issue) for this.

@kulics
Copy link
Author

kulics commented Dec 15, 2021

@ianlancetaylor
Yes, I agree with you. It seems like a GitHub discussion would be a good option, so how would I create one?

@carlmjohnson
Copy link
Contributor

carlmjohnson commented Dec 15, 2021

I disagree that this would be a good GitHub discussion. GitHub discussions only allow one level of threading. It's good for something like adding a new package netip or slices, where there are a number of functions/methods that have tradeoffs that need to be hashed out. For an open ended conversation, like #48287, it tends to just fall apart.

I think this needs to bake for longer. Make some generic packages. Use them in your real projects. Come back in 6 months or a year, and write some experience reports about what was convenient/inconvenient needs fixing/good enough and then make a iterators proposal.

@ghost
Copy link

ghost commented Feb 15, 2022

IMO the intention is good, but I don't think the gist of Golang is to add n^n functions that do the same thing, in the near future with generics we can implement any foreach (loop etc) as curry functions and expose like third party library, a good comparison might be to take a look at the Java world, it has many ways to do loops, but in the real world, in our daily work, we will block all usages except one (in our experience, the guideline of company or community). I come from Java, Typescript and I thank Golang because it is easy and simple, maybe some functions are missing to bring the functional paradigm closer but they can be solved with custom implementations.

Also you can check the new package that adds basic operations with slices https://pkg.go.dev/golang.org/x/exp/slices to see what is coming out.

Regards!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Incoming
Development

No branches or pull requests

6 participants