
iter: new package for iterators #61897

rsc opened this issue Aug 9, 2023 · 177 comments

@rsc
Contributor

rsc commented Aug 9, 2023

We propose to add a new package iter that defines helpful types for iterating over sequences. We expect these types will be used by other APIs to signal that they return iterable functions.

This is one of a collection of proposals updating the standard library for the new 'range over function' feature (#61405). It would only be accepted if that proposal is accepted.

See also:

Note regarding push vs pull iterator types: The vast majority of the time, push iterators are more convenient to implement and to use, because setup and teardown can be done around the yield calls rather than having to implement those as separate operations and then expose them to the caller. Direct use (including with a range loop) of the push iterator requires giving up storing any data in control flow, so individual clients may occasionally want a pull iterator instead. Any such code can trivially call Pull and defer stop.
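For example, a call site that wants to walk two sequences in lock step can convert both to pull form on the spot. The sketch below is illustrative only (EqualSeq is not part of this proposal) and assumes the Pull signature proposed further down:

// EqualSeq reports whether a and b yield the same elements in the same order.
// Walking two sequences in lock step is awkward in a range loop, so both are
// converted to pull form at the call site.
func EqualSeq[V comparable](a, b iter.Seq[V]) bool {
	nextA, stopA := iter.Pull(a)
	defer stopA()
	nextB, stopB := iter.Pull(b)
	defer stopB()
	for {
		va, okA := nextA()
		vb, okB := nextB()
		if okA != okB || va != vb {
			return false
		}
		if !okA {
			return true // both sequences ended together
		}
	}
}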

I’m unaware of any significant evidence in favor of a parallel set of pull-based APIs: instead, iterators can be defined in push form and preserved in that form by any general combination functions and then only converted to pull form as needed at call sites, once all adapters and other transformations have been applied. This avoids the need for any APIs that make cleanup (stop functions) explicit, other than Pull. Adapters that need to convert to pull form inside an iterator function can defer stop and hide that conversion from callers. See the implementation of the adapters in #TODO for examples.

Note regarding standard library: There are a few important reasons to convert the standard library:

  • Ship a complete solution. We should not release a package we don’t use in obvious places where it should be used. This applies especially to new interfaces.
  • Put functionality in the right places. Ian’s earlier draft included FromSlice and FromMap, but these are more appropriately slices.All and maps.All.
  • Find problems or rough edges in the iter package itself that we want to find before its release. I have already changed a few definitions from what I started with as a result of working through the standard library changes.
  • Set an example (hopefully a good one) for others to follow.

Note that os.ReadDir and filepath.Glob do not get iterator treatment, since the sorted results imply they must collect the full slice before returning any elements of the sequence. filepath.SplitList could add an iterator form, but PATH lists are short enough that it doesn’t seem worth adding new API. A few other packages, like bufio, archive/tar, and database/sql, might benefit from iterators as well, but they are not used as much, so they seem okay to leave out of the first round of changes.


The iter package would be:

/*
Package iter provides basic definitions and operations related to
iterators over sequences.

Iterators

An iterator is a function that passes successive elements of a
sequence to a callback function, conventionally named yield, stopping
either when the sequence is finished or when yield breaks the sequence
by returning false. This package defines [Seq] and [Seq2]
(pronounced like seek - the first syllable of sequence)
as shorthands for iterators that pass 1 or 2 values per sequence element
to yield:

type (
	Seq[V any]     func(yield func(V) bool) bool
	Seq2[K, V any] func(yield func(K, V) bool) bool
)

Seq2 represents a sequence of paired values, conventionally key-value,
index-value, or value-error pairs.

Yield returns true when the iterator should continue with the next
element in the sequence, false if it should stop. The iterator returns
true if it finished the sequence, false if it stopped early at yield's
request. The iterator function's result is used when composing
iterators, such as in [Concat]:

func Concat[V any](seqs ...Seq[V]) Seq[V] {
	return func(yield func(V) bool) bool {
		for _, seq := range seqs {
			if !seq(yield) {
				return false
			}
		}
		return true
	}
}

Iterator functions are most often called by a range loop, as in:

func PrintAll[V any](seq iter.Seq[V]) {
	for _, v := range seq {
		fmt.Println(v)
	}
}

Naming

Iterator functions and methods are named for the sequence being walked:

// All returns an iterator over elements in s.
func (s *Set[V]) All() iter.Seq[V]

The iterator method on a collection type is conventionally named All,
as in the second example, because it iterates a sequence of all the
values in the collection.
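As an illustration only (the map-backed Set below is hypothetical, which is why V is constrained to comparable; it is not part of the proposal), All might look like:

type Set[V comparable] struct {
	m map[V]struct{}
}

// All returns an iterator over elements in s.
func (s *Set[V]) All() iter.Seq[V] {
	return func(yield func(V) bool) bool {
		for v := range s.m {
			if !yield(v) {
				return false
			}
		}
		return true
	}
}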

When there are multiple possible iteration orders, the method name may
indicate that order:

// All iterates through the list from head to tail.
func (l *List[V]) All() iter.Seq[V]

// Backward iterates backward through the list from tail to head.
func (l *List[V]) Backward() iter.Seq[V]

If an iterator requires configuration, the constructor function
can take additional arguments:

// Bytes iterates through the indexes and bytes in the string s.
func Bytes(s string) iter.Seq2[int, byte]

// Split iterates through the (possibly-empty) substrings of s
// separated by sep.
func Split(s, sep string) iter.Seq[string]
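For illustration (a minimal sketch, not additional proposed API), Bytes can be written directly against the Seq2 definition above:

func Bytes(s string) iter.Seq2[int, byte] {
	return func(yield func(int, byte) bool) bool {
		for i := 0; i < len(s); i++ {
			if !yield(i, s[i]) {
				return false
			}
		}
		return true
	}
}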

Single-Use Iterators

Most iterators provide the ability to walk an entire sequence:
when called, the iterator does any setup necessary to start the
sequence, then calls yield on successive elements of the sequence,
and then cleans up before returning. Calling the iterator again
walks the sequence again.

Some iterators break that convention, providing the ability to walk a
sequence only once. These “single-use iterators” typically report values
from a data stream that cannot be rewound to start over.
Calling the iterator again after stopping early may continue the
stream, but calling it again after the sequence is finished will yield
no values at all, immediately returning true. Doc comments for
functions or methods that return single-use iterators should document
this fact:

// Lines iterates through lines read from r.
// It returns a single-use iterator.
func (r *Reader) Lines() iter.Seq[string]
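A minimal sketch of such a single-use iterator, assuming a hypothetical Reader backed by a bufio.Scanner (the Reader type and its field are illustrative, not proposed API):

type Reader struct {
	s *bufio.Scanner // consumes the underlying stream; cannot be rewound
}

// Lines iterates through lines read from r.
// It returns a single-use iterator.
func (r *Reader) Lines() iter.Seq[string] {
	return func(yield func(string) bool) bool {
		for r.s.Scan() {
			if !yield(r.s.Text()) {
				return false
			}
		}
		return true
	}
}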

Errors

If iteration can fail, it is conventional to iterate value, error pairs:

// Lines iterates through the lines of the named file.
// Each line in the sequence is paired with a nil error.
// If an error is encountered, the final element of the
// sequence is an empty string paired with the error.
func Lines(file string) iter.Seq2[string, error]
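A minimal sketch of that convention, using os and bufio (illustrative only):

func Lines(file string) iter.Seq2[string, error] {
	return func(yield func(string, error) bool) bool {
		f, err := os.Open(file)
		if err != nil {
			return yield("", err)
		}
		defer f.Close()
		s := bufio.NewScanner(f)
		for s.Scan() {
			if !yield(s.Text(), nil) {
				return false
			}
		}
		if err := s.Err(); err != nil {
			return yield("", err)
		}
		return true
	}
}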

Pulling Values

Functions and methods that are iterators or accept or return iterators
should use the standard yield-based function signature, to ensure
compatibility with range loops and with other iterator adapters.
The standard iterators can be thought of as “push iterators”, which
push values to the yield function.

Sometimes a range loop is not the most natural way to consume values
of the sequence. In this case, [Pull] converts a standard push iterator
to a “pull iterator”, which can be called to pull one value at a time
from the sequence. [Pull] starts an iterator and returns a pair
of functions next and stop, which return the next value from the iterator
and stop it, respectively.

For example:

// Pairs returns an iterator over successive pairs of values from seq.
func Pairs[V any](seq iter.Seq[V]) iter.Seq2[V, V] {
	return func(yield func(V, V) bool) bool {
		next, stop := iter.Pull(it)
		defer stop()
		for {
			v1, ok1 := next()
			v2, ok2 := next()
			if !ok1 && !ok2 {
				return true
			}
			if !yield(v1, v2) {
				return false
			}
		}
	}
}

Clients must call stop if they do not read the sequence to completion,
so that the iterator function can be allowed to finish. As shown in
the example, the conventional way to ensure this is to use defer.
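Another common direct use of Pull is consuming only part of a sequence, in which case the deferred stop releases the iterator (First below is illustrative, not proposed API):

// First returns the first value of seq, if any.
func First[V any](seq iter.Seq[V]) (V, bool) {
	next, stop := iter.Pull(seq)
	defer stop() // let the iterator function return even though we stop early
	return next()
}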

Other Packages

Many packages in the standard library provide iterator-based APIs. Here are some notable examples.

TODO FILL THIS IN AS OTHER PACKAGES ARE UPDATED

Mutation

Iterators only provide the values of the sequence, not any direct way
to modify it. If an iterator wishes to provide a mechanism for modifying
a sequence during iteration, the usual method is to define a position type
with the extra operations and then provide an iterator over positions.

For example, a tree implementation might provide:

// Positions iterates through positions in the sequence.
func (t *Tree[V]) Positions() iter.Seq[*Pos[V]]

// A Pos represents a position in the sequence.
// It is only valid during the yield call it is passed to.
type Pos[V any] struct { ... }

// Value returns the value at the cursor.
func (p *Pos[V]) Value() V

// Delete deletes the value at this point in the iteration.
func (p *Pos[V]) Delete()

// Set sets the value at the cursor to v.
func (p *Pos[V]) Set(v V)

And then a client could delete boring values from the tree using:

for p := range t.Positions() {
	if boring(p.Value()) {
		p.Delete()
	}
}
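A minimal sketch of the same pattern for a simple slice-backed list (the List and ListPos types are hypothetical and much simpler than a tree, but show how a position type can carry the extra operations):

type List[V any] struct{ items []V }

// A ListPos is only valid during the yield call it is passed to.
type ListPos[V any] struct {
	l       *List[V]
	i       int
	deleted bool
}

func (p *ListPos[V]) Value() V { return p.l.items[p.i] }
func (p *ListPos[V]) Set(v V)  { p.l.items[p.i] = v }

func (p *ListPos[V]) Delete() {
	p.l.items = append(p.l.items[:p.i], p.l.items[p.i+1:]...)
	p.deleted = true
}

// Positions iterates through positions in the list.
func (l *List[V]) Positions() iter.Seq[*ListPos[V]] {
	return func(yield func(*ListPos[V]) bool) bool {
		for i := 0; i < len(l.items); {
			p := &ListPos[V]{l: l, i: i}
			if !yield(p) {
				return false
			}
			// After a delete, the next element has shifted into index i,
			// so only advance when nothing was deleted.
			if !p.deleted {
				i++
			}
		}
		return true
	}
}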

*/
package iter

// Seq is an iterator over sequences of individual values.
// See the [iter] package documentation for details.
type Seq[V any] func(yield func(V) bool) bool

// Seq2 is an iterator over pairs of values, conventionally
// key-value or value-error pairs.
// See the [iter] package documentation for details.
type Seq2[K, V any] func(yield func(K, V) bool) bool

Note: If and when generic type aliases are implemented (#46477), we might also want to add type Yield[V any] = func(V bool) and type Yield2[K, V any] = func(K, V) bool. That way, code writing a function signature to implement Seq or Seq2 can write the argument as yield iter.Yield[V].

// Pull starts the iterator in its own coroutine, returning accessor functions.
// Calling next returns the next value from the sequence. When there is
// such a value v, next returns v, true. When the sequence is over, next
// returns zero, false. Stop ends the iteration, allowing the iterator function
// to return. Callers that do not iterate to the end must call stop to let the
// function return. It is safe to call stop after next has returned false,
// and it is safe to call stop multiple times. Typically callers should defer stop().
func Pull[V any](seq Seq[V]) (next func() (V, bool), stop func())

// Pull2 starts the iterator in its own coroutine, returning accessor functions.
// Calling next returns the next key-value pair from the sequence. When there is
// such a pair k, v, next returns k, v, true. When the sequence is over, next
// returns zero, zero, false. Stop ends the iteration, allowing the iterator function
// to return. Callers that do not iterate to the end must call stop to let the
// function return. It is safe to call stop after next has returned false,
// and it is safe to call stop multiple times. Typically callers should defer stop().
func Pull2[K, V any](seq Seq2[K, V]) (next func() (K, V, bool), stop func())
@szabba

szabba commented Aug 9, 2023

Note: If and when generic type aliases are implemented (#46477), we might also want to add type Yield[V any] = func(V bool) and type Yield2[K, V any] = func(K, V) bool. That way, code writing a function signature to implement Seq or Seq2 can write the argument as yield iter.Yield[V].

I think that's supposed to say type Yield[V any] = func(V) bool?

@icholy

icholy commented Aug 9, 2023

In the "Pulling Values" example, I think next, stop := iter.Pull(it) should be next, stop := iter.Pull(seq)

@mateusz834
Member

mateusz834 commented Aug 9, 2023

Isn't this kind of API preferable? Consider a case where you need to pass the next and stop functions separately to a function or struct; it might be annoying.

type PullIter[T any] struct{}

func (*PullIter[T]) Stop()
func (*PullIter[T]) Next() (T, bool)

func Pull[V any](seq Seq[V]) PullIter[V]

@gazerro
Contributor

gazerro commented Aug 9, 2023

Based on:

Range expression                                   1st value          2nd value
function, 1 value   f  func(func(V)bool) bool      value    v  V

the range in the Concat and PrintAll functions should have only one value?

@DmitriyMV
Contributor

I believe

// Pairs returns an iterator over successive pairs of values from seq.
func Pairs[V any](seq iter.Seq[V]) iter.Seq2[V, V] {
		...
		next, stop := iter.Pull(it)
		...
}

should be

// Pairs returns an iterator over successive pairs of values from seq.
func Pairs[V any](seq iter.Seq[V]) iter.Seq2[V, V] {
		...
		next, stop := iter.Pull(seq)
		...
}

@jimmyfrasche
Member

Meta: Most of the first post is about iterators in general. It's not obviously clear on a first reading what is actually going into the iter package. Also the formatting is a little rough.

@jimmyfrasche
Member

Given that

  • every Seq2 can be converted to a Seq by a left or right projection
  • every Seq can be extended to a Seq2 by padding it like Seq[K] → Seq2[K, struct{}] or an operation like Python's enumerate

one way to avoid having F{,2} for each F would be to provide the extension/projection helpers in this package and let all the general operations be on Seq2 (possibly even naming that Seq and the other Seq1).
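A rough sketch of what those helpers could look like with the bool-returning Seq/Seq2 signatures from the proposal (Left, Right, and Enumerate are hypothetical names, not proposed API):

// Left projects a Seq2 onto its first values.
func Left[K, V any](seq iter.Seq2[K, V]) iter.Seq[K] {
	return func(yield func(K) bool) bool {
		return seq(func(k K, _ V) bool { return yield(k) })
	}
}

// Right projects a Seq2 onto its second values.
func Right[K, V any](seq iter.Seq2[K, V]) iter.Seq[V] {
	return func(yield func(V) bool) bool {
		return seq(func(_ K, v V) bool { return yield(v) })
	}
}

// Enumerate extends a Seq to a Seq2 of index-value pairs.
func Enumerate[V any](seq iter.Seq[V]) iter.Seq2[int, V] {
	return func(yield func(int, V) bool) bool {
		i := 0
		return seq(func(v V) bool {
			if !yield(i, v) {
				return false
			}
			i++
			return true
		})
	}
}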

@jimmyfrasche
Member

I know having a separate function/method to call to get the error can lead to people forgetting to do so but I really don't see how Seq2[K, error] makes sense except in the case where each K could have an associated error. I get why it seems like it would be appealing but I don't think it's going to be nice in practice:

It's easy to ignore with for k := range f.

It's not easy to not ignore without being awkward. Neither

var k Key
var err error
for k, err = range f {
  if err != nil {
    break
  }
  proc(k)
}
if err != nil {
  handle(err)
}

nor

for k, err := range f {
  if err != nil {
    handle(err)
    break
  }
  proc(k)
}

look especially readable to me.

@DmitriyMV
Contributor

@jimmyfrasche Map fits the for k, err := ... pattern quite nicely. So does working with buffers, and parallel execution.

@earthboundkid
Contributor

earthboundkid commented Aug 9, 2023

one way to avoid having F{,2} for each F would be to provide the extension/projection helpers in this package and let all the general operations be on Seq2 (possibly even naming that Seq and the other Seq1).

I think the other way around, you should have a Seq2 → Seq[Pair[T1, T2]] mapping and make Seq the default most places.

Edit: I guess you should have both mappings 1 → 2 and 2 → 1, but I also think 1 will end up being the default for things like Filter and TakeWhile etc.
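A rough sketch of that direction (Pair, Pack, and Unpack are hypothetical, again using the bool-returning signatures from the proposal):

type Pair[A, B any] struct {
	First  A
	Second B
}

// Pack converts a Seq2 into a Seq of pairs.
func Pack[K, V any](seq iter.Seq2[K, V]) iter.Seq[Pair[K, V]] {
	return func(yield func(Pair[K, V]) bool) bool {
		return seq(func(k K, v V) bool { return yield(Pair[K, V]{k, v}) })
	}
}

// Unpack converts a Seq of pairs back into a Seq2.
func Unpack[K, V any](seq iter.Seq[Pair[K, V]]) iter.Seq2[K, V] {
	return func(yield func(K, V) bool) bool {
		return seq(func(p Pair[K, V]) bool { return yield(p.First, p.Second) })
	}
}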

@earthboundkid
Contributor

Isn't this kind of API more preferable? …

I think in most cases pull iters will be used in place, not passed around, so a pair of funcs is better than an object with methods.

@rsc
Contributor Author

rsc commented Aug 9, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@AndrewHarrisSPU

xiter isn't addressing error-flavored 2-ary forms. There has also been a significant amount of prior discussion about e.g. Next() (T, error) (Maybe this is the TLDR? #61405 (comment))

If I thought an iter.Stream type or an iter/stream package would be worth exploring, I'm wondering if we'd have a preference or any guidance about where to explore that - in the current set of proposals, in a separate proposal, or leave that unresolved for now (maybe forever), etc.?

@djordje200179

I think the method name for iterating through a sequence shouldn't be All, because my first guess on seeing it is that it's a Python-like or C# LINQ-like method for checking whether all elements meet some criteria.
Also, it's inconsistent to have the methods All and Backwards; in that case the name Forwards would be more appropriate.

@Merovius
Contributor

@djordje200179 IMO for _, v := range it.All() makes it clear that this is not a predicate but an iteration. And Forwards does not work as a conventional name, as not all data structures have a dedicated direction; see e.g. map.

@leaxoy

leaxoy commented Aug 10, 2023

Although the proposal is attractive, to be honest Seq2[K, V] looks ugly and very similar to Seq[V]. Every time we add an iterator function we must consider that there are two versions of Seq, which is a bad thing. In the long run, two versions of iterators can quickly bloat the code.

Introducing a tuple type for Seq2 may be another choice, but the final decision rests with the official Go team.

What do you think about this potential problem, @rsc?

@kardianos

This comment was marked as resolved.

@earthboundkid
Contributor

Re: errors, the proposal says:

If iteration can fail, it is conventional to iterate value, error pairs:

In practice, the .Err() pattern has led to a lot of bugs where .Err() is omitted. The most glaring example is at https://github.com/features/copilot/ (click on write_sql.go and note the lack of .Err()). I think the existing .Err() iterators in the std library should probably just stick around because they already exist, but we want a new pattern moving forward.

Re: SQL, see #61637.

@bobg

bobg commented Aug 10, 2023

Cosmetic suggestion: I like Of better than Seq as the type name. IMO Of is a good name for generic Go container types in general. Qualified with the package name and the element type it reads naturally: iter.Of[string] is an iterator of strings.

@gopherbot

Change https://go.dev/cl/565935 mentions this issue: runtime: make checking if tracing is enabled non-atomic

@gophun

gophun commented Feb 22, 2024

How can iter.Seqs be used via reflection? For instance, if I'm interested in detecting and using them in a func Marshal(v any) ([]byte, error) function, as proposed in #65873, or in a fmt-style function that takes values of type any?

I can detect range functions via reflection (below is an example for a 1-value range function). But how would I range over them?

func isRangeFunc(v any) (elemType reflect.Type, ok bool) {
	typ := reflect.TypeOf(v)
	if typ.Kind() == reflect.Func && typ.NumIn() == 1 && typ.NumOut() == 0 {
		yieldType := typ.In(0)
		if yieldType.Kind() == reflect.Func && yieldType.NumIn() == 1 && yieldType.NumOut() == 1 && yieldType.Out(0).Kind() == reflect.Bool {
			return yieldType.In(0), true
		}
	}
	return nil, false
}

@Merovius
Contributor

Merovius commented Feb 22, 2024

But how would I range over them?

You can always use reflect.Value.Call. Though it probably makes sense to add first-class reflect support for them.

This works for me:

package main

import (
	"fmt"
	"iter"
	"reflect"
)

func main() {
	Range(Iterate([]int{1, 2, 3}))
}

func Range(v any) {
	rv := reflect.ValueOf(v)
	yt := rv.Type().In(0)
	rf := reflect.MakeFunc(yt, func(in []reflect.Value) []reflect.Value {
		fmt.Println(in[0].Interface())
		return []reflect.Value{reflect.ValueOf(true)}
	})
	rv.Call([]reflect.Value{rf})
}

func Iterate[T any](s []T) iter.Seq[T] {
	return func(yield func(T) bool) {
		for _, e := range s {
			if !yield(e) {
				return
			}
		}
	}
}

@gophun

gophun commented Feb 22, 2024

Thank you, @Merovius !

As a reusable function:

func ToSeqAny(v any) (seq iter.Seq[any], ok bool) {
	if !IsRangeFunc(v) {
		return nil, false
	}
	rv := reflect.ValueOf(v)
	yt := rv.Type().In(0)
	return func(yield func(any) bool) {
		rf := reflect.MakeFunc(yt, func(in []reflect.Value) []reflect.Value {
			return []reflect.Value{reflect.ValueOf(yield(in[0].Interface()))}
		})
		rv.Call([]reflect.Value{rf})
	}, true
}

func IsRangeFunc(v any) bool {
	rt := reflect.TypeOf(v)
	if rt.Kind() == reflect.Func && rt.NumIn() == 1 && rt.NumOut() == 0 {
		yt := rt.In(0)
		if yt.Kind() == reflect.Func && yt.NumIn() == 1 && yt.NumOut() == 1 && yt.Out(0).Kind() == reflect.Bool {
			return true
		}
	}
	return false
}
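For example (hypothetical usage of the helper above, assuming Go 1.23 range-over-func):

func PrintAnySeq(v any) {
	if seq, ok := ToSeqAny(v); ok {
		for e := range seq {
			fmt.Println(e)
		}
	}
}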

@SOF3

SOF3 commented Mar 1, 2024

What is the significance of adding a _, v instead of v when consuming an iter.Seq[V]? If we wanted to be consistent with the slice iteration syntax, shouldn't we just make it iter.Seq[int, V]?

gopherbot pushed a commit that referenced this issue Mar 8, 2024
Tracing is currently broken when using iter.Pull from the rangefunc
experiment partly because the "tracing is off" fast path in traceAcquire
was deemed too expensive to check (an atomic load) during the coroutine
switch.

This change adds trace.enabled, a non-atomic indicator of whether
tracing is enabled. It doubles trace.gen, which is the source of truth
on whether tracing is enabled. The semantics around trace.enabled are
subtle.

When tracing is enabled, we need to be careful to make sure that if gen
!= 0, goroutines enter the tracer on traceAcquire. This is enforced by
making sure trace.enabled is published atomically with trace.gen. The
STW takes care of synchronization with most Ms, but there's still sysmon
and goroutines exiting syscalls. We need to synchronize with those
explicitly anyway, which luckily takes care of trace.enabled as well.

When tracing is disabled, it's always OK for trace.enabled to be stale,
since traceAcquire will always double-check gen before proceeding.

For #61897.

Change-Id: I47c2a530fb5339c15e419312fbb1e22d782cd453
Reviewed-on: https://go-review.googlesource.com/c/go/+/565935
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
gopherbot pushed a commit that referenced this issue Mar 22, 2024
This change resolves a TODO in the coroutine switch implementation (used
exclusively by iter.Pull at the moment) to enable tracing. This was
blocked on eliminating the atomic load in the tracer's "off" path
(completed in the previous CL in this series) and the addition of new
tracer events to minimize the overhead of tracing in this circumstance.

This change introduces 3 new event types to support coroutine switches:
GoCreateBlocked, GoSwitch, and GoSwitchDestroy.

GoCreateBlocked needs to be introduced because the goroutine created for
the coroutine starts out in a blocked state. There's no way to represent
this in the tracer right now, so we need a new event for it.

GoSwitch represents the actual coroutine switch, which conceptually
consists of a GoUnblock, a GoBlock, and a GoStart event in series
(unblocking the next goroutine to run, blocking the current goroutine,
and then starting the next goroutine to run).

GoSwitchDestroy is closely related to GoSwitch, implementing the same
semantics except that GoBlock is replaced with GoDestroy. This is used
when exiting the coroutine.

The implementation of all this is fairly straightforward, and the trace
parser simply translates GoSwitch* into the three constituent events.

Because GoSwitch and GoSwitchDestroy imply a GoUnblock and a GoStart,
they need to synchronize with other past and future GoStart events to
create a correct partial ordering in the trace. Therefore, these events
need a sequence number for the goroutine that will be unblocked and
started.

Also, while implementing this, I noticed that the coroutine
implementation is actually buggy with respect to LockOSThread. In fact,
it blatantly disregards its invariants without an explicit panic. While
such a case is likely to be rare (and inefficient!) we should decide how
iter.Pull behaves with respect to runtime.LockOSThread.

Lastly, this change also bumps the trace version from Go 1.22 to Go
1.23. We're adding events that are incompatible with a Go 1.22 parser,
but Go 1.22 traces are all valid Go 1.23 traces, so the newer parser
supports both (and the CL otherwise updates the Go 1.22 definitions of
events and such). We may want to reconsider the structure and naming of
some of these packages though; it could quickly get confusing.

For #61897.

Change-Id: I96897a46d5852c02691cde9f957dc6c13ef4d8e7
Reviewed-on: https://go-review.googlesource.com/c/go/+/565937
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
@adonovan
Member

adonovan commented Mar 25, 2024

I wonder whether All is the right name for a push iterator method, as opposed to, say, Elements. I would expect a method named All to compute a conjunction across the elements (f(a) && ... && f(z)), which has a boolean result. The iterator (Elements) can be trivially expressed in terms of All by passing f=yield and discarding the result:

// Elements is a go1.23 iterator over the collection's elements.
func (c Collection[T]) Elements(yield func(T) bool) {
    _ = c.All(yield) // discard unwanted boolean
}

// All reports whether f is true for all elements of the collection.
func (Collection[T]) All(f func(T) bool) bool { ... }

it seems a shame to force the client to use more logic (and the machinery of a stackless coroutine) to re-materialize the discarded boolean:

all := true
for elem := range collection.Elements {
    if !f(elem) {
        all = false
        break
    }
}

@adamroyjones

I wonder whether All is the right name for a push iterator method, as opposed to, say Elements. I would expect a method named All to compute a conjunction across the elements (f(a) && ... && f(z)), which has a boolean result.

As ballast for this: this is what all? means in Ruby (Enumerable#all?) and Elixir (Enum.all?).

@Merovius
Contributor

Merovius commented Mar 25, 2024

@adonovan To me, the fact that this argument doesn't work for the dual Some weakens it. It makes it seem like a coincidence.

func (c Collection[T]) Elements(yield func(T) bool) {
    // well, can't be implemented in terms of Some…
}
// Alternatively: What would Some(yield) be?

// Some reports whether f is true for at least one element in the collection.
func (Collection[T]) Some(f func(T) bool) bool { ... }

Meanwhile, when considering these as functions on iterators, they have perfectly dual implementations:

func All[E any](s iter.Seq[E], f func(E) bool) bool {
	for v := range s {
		if !f(v) {
			return false
		}
	}
	return true
}

func Some[E any](s iter.Seq[E], f func(E) bool) bool {
	for v := range s {
		if f(v) {
			return true
		}
	}
	return false
}

@adonovan
Member

adonovan commented Mar 25, 2024

Some (usually called Any) isn't essential: using De Morgan's laws, you can express it as !All(negate(f)). (And the converse is true too, so All (and thus Elements) can in fact be expressed in terms of Some: "it is not the case that some yield returned false".)

My point is that iterators can be trivially expressed as a wrapper around the slightly more general All, so we shouldn't take that name away from types that want to expose it, possibly along with an iterator.

@fzipp
Contributor

fzipp commented Mar 25, 2024

so we shouldn't take that name away from types that want to expose it

It's not like there are no alternatives. In JavaScript, this function is named "every". We also have an "any" function in the slices package, where it's called "ContainsFunc" (named "some" in JavaScript, by the way). It's evident that there's no consistency across languages anyway. Personally, I advocate for using the good and concise name "All" for the most common operation, which is to get a sequence of all elements.

@Merovius
Contributor

Merovius commented Mar 26, 2024

@adonovan Yes, but my argument wasn't that you can't express All using Some, but that Elements = All(yield) seems less profound, once you realize that Some(yield) is really nothing. I just don't think anyone would go "oh, right, iterating over elements is the same as if I just check if a predicate is true for everything in the collection", so expecting people to implement Elements in terms of All just seems confusing, more than anything.

(Also, I'll note the grain of salt that yield has side-effects, so while it works in this case, I don't find arguments using logical equivalencies necessarily convincing)

@atdiar

atdiar commented Apr 1, 2024

@adonovan opened issue #66637 and it reminded me that I'm not sure whether the question of giving a specific signature to yield functions by returning a defined boolean type had been addressed.

I know the idea came up a few times during the discussion but I can't recall having seen a rationale indicating that using plain bool was better.

Seems to me that returning a defined type (e.g. type IterDone bool) could help with discoverability?

It would be less likely for a type to implement the iterable interface by mistake for instance.

Is this sensible?

@Merovius
Contributor

Merovius commented Apr 1, 2024

Returning a defined boolean type means that type needs to be defined somewhere. Given that the spec has to define the behavior of range and that boolean appears in the signature of those functions means that it would have to appear in the spec. We generally are stingy with identifiers being defined in the spec, if we can avoid it.

Add to this that func(E) bool is not assignable to func(E) IterDone or vice versa.

I don't think a defined boolean type (different from bool) is a good idea.
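A small sketch of the assignability point (IterDone and the identifiers below are hypothetical):

type IterDone bool

// Seq is what an iterator type would look like with a defined yield result type.
type Seq[V any] func(yield func(V) IterDone) IterDone

func consume[V any](seq Seq[V], f func(V) bool) {
	// seq(f) does not compile: func(V) bool is not assignable to
	// func(V) IterDone, so an explicit wrapper is needed.
	seq(func(v V) IterDone { return IterDone(f(v)) })
}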

@atdiar

atdiar commented Apr 1, 2024

Yes it would have to be defined somewhere.

On the other hand I see the unassignability as the intended purpose.

It makes the semantics explicit. Even for documentation purposes.
Not sure that every func signature that has a single parameter and returns a bool is supposed to convey "iterator".

It's also easy to convert since we have higher order functions/closures.

@Merovius
Contributor

Merovius commented Apr 1, 2024

The point of mentioning the mutual non-assignability is that it makes clear that "somewhere" has to mean "the spec". Because if we put bool into the spec and IterDone into package iter, then iter.Seq is no longer rangeable. And the cost of putting a new identifier into the spec is high, and I don't see it justified by the relatively minor advantage of disambiguation.

Under the premise that we would be willing to put something like that into the spec, yes, the unassignability would become a bug, not a feature. But you should consider my argument as a whole.

Not sure that every func signature that has a single parameter and return a bool is supposed to convey "iterator".

That's an incorrect categorization. "Iterator" is characterized by "a function that takes [what you said]". Which is exotic enough that confusion should be rare, and "don't do that then" is an adequate response.

@atdiar

atdiar commented Apr 1, 2024

Yes, I did intend for the spec to include an IterDone type (or whatever it should be called; probably a lowercase "i" would be more frictionless), as was done for error, which would define an iterator as a function that takes a much more specific kind of function for yield.
An iterator has to be defined formally anyway, so I didn't think that would be the point to discuss.
Unassignability would be a feature then, not a bug. Just as we use types to ensure/enforce semantics.

There is also a point about legibility/readability, especially for newcomers, which is also reflected in tooling.
So I still think that this is a question that should be at least open for consideration to some extent.

For instance, the current yield signature looks like every other predicate and the bool return does not necessarily convey that it is for iteration termination.
This is not self-documenting.

@antichris

the current yield signature looks like every other predicate and the bool return does not necessarily convey that it is for iteration termination

That's a valid point; this squishiness in the API makes me uneasy too. I can't think of a case where the ability to plop in a typical predicate unaltered as the yield is a useful feature.

OTOH, I also can't picture how a thing as deliberate as that could happen by accident, or how a bug caused by misuse of a predicate as yield could hide past the very first test run. ISTM that's not a rake one has to step on in order to grasp how the application of yield differs from that of other predicates. But maybe I just lack imagination.

@rogpeppe
Contributor

rogpeppe commented Apr 3, 2024

For instance, the current yield signature looks like every other predicate and the bool return does not necessarily convey that it is for iteration termination.

I think this situation will be improved with the advent of generic type aliases, due to land in 1.23.
To quote from #61897:

Note: If and when generic type aliases are implemented (#46477), we might also want to add type Yield[V any] = func(V bool) and type Yield2[K, V any] = func(K, V) bool. That way, code writing a function signature to implement Seq or Seq2 can write the argument as yield iter.Yield[V].

So, when #46477 lands and iter.Yield is added, it will become conventional to write the yield signature as iter.Yield[T], which will be more nicely self-documenting.

For the record, I don't support adding a special named bool type for this.

@atdiar

atdiar commented Apr 3, 2024

@rogpeppe 👍🏿 that should work. (Should be good enough)

@andig
Contributor

andig commented Apr 24, 2024

Is there a full list of functions available? E.g., where would I find Filter and friends?

@gophun

gophun commented Apr 24, 2024

@andig

Is there a full list of functions available?

The full list of functions of package iter is in the issue description above.

I.e. where would I find Filter and friends?

They are not part of this proposal; they are part of a different proposal for an experimental package, #61898.

Projects
Status: Accepted