Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: iter/sorted: functions on sorted iterators #70140

Open
jba opened this issue Oct 31, 2024 · 19 comments
Open

proposal: iter/sorted: functions on sorted iterators #70140

jba opened this issue Oct 31, 2024 · 19 comments
Labels
Milestone

Comments

@jba
Copy link
Contributor

jba commented Oct 31, 2024

I propose adding a package of functions that operate on iterators whose values are sorted.

Sorted sequences can arise in a number of ways:

Common operations on pairs of sorted sequences include merging them into a single sorted sequence, as well as the set operations union, intersection and set difference. I propose a package with these operations, with the API given below.

As is common in packages like slices, there are two functions for each operation, one using the natural ordering of a type and one that accepts a comparison function. We could also consider adding functions that operate on the keys of iter.Seq2s, bringing along the corresponding values. I don't know if those sequences would arise enough to make that worthwhile. We could reconsider adding the Seq2 functions if and when we add ordered maps.

API

package sorted

Package sorted provides operations over iterators whose values are sorted.

FUNCTIONS

func Intersect[T cmp.Ordered](s1, s2 iter.Seq[T]) iter.Seq[T]
    Intersect returns an iterator over all elements of s1 that are also in s2,
    in the same order. Each element of s1 appears only once in the result.
    Both s1 and s2 must be sorted.

func IntersectFunc[T any](s1, s2 iter.Seq[T], cmp func(T, T) int) iter.Seq[T]
    IntersectFunc returns an iterator over all elements of s1 that are also in
    s2, in the same order. Each element of s1 appears only once in the result.
    Both s1 and s2 must be sorted according to cmp.

func Merge[T cmp.Ordered](s1, s2 iter.Seq[T]) iter.Seq[T]
    Merge returns an iterator over all elements of s1 and s2, in the same order.
    Both s1 and s2 must be sorted.

func MergeFunc[T any](s1, s2 iter.Seq[T], cmp func(T, T) int) iter.Seq[T]
    MergeFunc returns an iterator over all elements of s1 and s2, in the same
    order. Both s1 and s2 must be sorted according to cmp.

func Subtract[T cmp.Ordered](s1, s2 iter.Seq[T]) iter.Seq[T]
    Subtract returns an iterator over all elements of s1 that are not in s2,
    in the same order. Each element of s1 appears only once in the result.
    Both s1 and s2 must be sorted.

func SubtractFunc[T any](s1, s2 iter.Seq[T], cmp func(T, T) int) iter.Seq[T]
    SubtractFunc returns an iterator over all elements of s1 that are not in s2,
    in the same order. Each element of s1 appears only once in the result.
    Both s1 and s2 must be sorted according to cmp.

func Union[T cmp.Ordered](s1, s2 iter.Seq[T]) iter.Seq[T]
    Union returns an iterator over all elements of s1 and s2, in the same order.
    Each element appears only once in the result. Both s1 and s2 must be sorted.

func UnionFunc[T any](s1, s2 iter.Seq[T], cmp func(T, T) int) iter.Seq[T]
    UnionFunc returns an iterator over all elements of s1 and s2, in the same
    order. Each element appears only once in the result. Both s1 and s2 must be
    sorted according to cmp.

There is a working implementation at github.com/jba/sorted.

@jba jba added the Proposal label Oct 31, 2024
@gopherbot gopherbot added this to the Proposal milestone Oct 31, 2024
@gabyhelp
Copy link

@jimmyfrasche
Copy link
Member

Is there any difference between this Merge and the one in #61898?

If this package gets added, it seems like a good place for the sequence equivalent for Compact: #67441

@jba
Copy link
Contributor Author

jba commented Nov 4, 2024

Is there any difference between this Merge and the one in #61898?

No, I just forgot about the other one.

If this package gets added, it seems like a good place for the sequence equivalent for Compact: #67441

Agreed.

@isgj
Copy link

isgj commented Nov 4, 2024

I agree with the API but shouldn't this package come after #61898 (considering #61898 will go to exp first) ?

IMHO

  1. the operators proposed in the other proposal are more common, used more frequently
  2. the ones here will work only with a subset of sequences

@jub0bs
Copy link

jub0bs commented Nov 7, 2024

I'm guessing that iter/sorted functions don't check that their iterator arguments are sorted, and I'm worried that some people will pass them unsorted iterators; the type system can't help us, in this case, since both sorted and unsorted iterators have type iter.Seq[T].

If this proposal gets accepted, its documentation should encourage other packages that provide sorted iterators to document them as such.

@jba
Copy link
Contributor Author

jba commented Nov 7, 2024

I omitted that from the spec so that the implementation could choose to do it if we wanted it to. We don't want to document that it does, because that would force the functions to run slower.

In general, Go doesn't perform expensive checks like this. For example, sort functions that take a less or cmp function as argument don't check that those functions behave properly.

My hope is that by putting these in a separate package from other iterator functions, users would be reminded of the requirements at the call site: sorted.Union(a, b) and so on.

@Merovius
Copy link
Contributor

Merovius commented Nov 7, 2024

You could also introduce package sorted; type Seq[E any] iter.Seq[E], requiring at least an explicit conversion.

@mateusz834
Copy link
Member

@Merovius sadly you can still pass an func(yield(T) bool) to it without explicit conversion.

@Merovius
Copy link
Contributor

Merovius commented Nov 8, 2024

Yes, I believe most iterators will be iter.Seq[E], though. Note that the general recommendation is to have methods All() iter.Seq[E], not All(yield func(E) bool), for example.

In any case, I’m not saying it’s perfect, or even that we necessarily should do it. But it’s IMO the strongest reasonable thing, has no runtime cost and will catch the overwhelming majority of mistakes.

(also, sorted.Seq[E] IMO reads nicely and it’s more appropriate to have the invariant of being sorted be part of the type than part of the function or package name)

@jub0bs
Copy link

jub0bs commented Nov 8, 2024

Even if we had a sorted.Seq type, sortedness wouldn't be checked at compile time. People could produce sorted.Seq[T] values that are in fact not sorted at all. The value of adding such a type is therefore moot, IMO.

@Merovius
Copy link
Contributor

Merovius commented Nov 8, 2024

You said

I'm guessing that iter/sorted functions don't check that their iterator arguments are sorted, and I'm worried that some people will pass them unsorted iterators; the type system can't help us, in this case, since both sorted and unsorted iterators have type iter.Seq[T].

If this proposal gets accepted, its documentation should encourage other packages that provide sorted iterators to document them as such.

If there is value in such documentation, than the value of sorted.Seq would be stricter higher. In that it serves itself as documentation - and a form of documentation that can't be overlooked. And it is, in fact, a way in which the type system can help us.

I agree that it doesn't provide an absolute guarantee. And I can understand the position, that the value it would provide is not enough to justify its cost. But it seems absurd to me to claim it has none, after saying that a doc-comment has enough.

I was just trying to provide a way in which the type system might actually address your concern.

@jub0bs
Copy link

jub0bs commented Nov 8, 2024

@Merovius

it seems absurd to me to claim it has none, after saying that a doc-comment has enough.

Note that moot != none. I'm not dismissing a hypothetical sorted.Seq type altogether.

If there is value in such documentation, than the value of sorted.Seq would be stricter higher. [...] it is, in fact, a way in which the type system can help us.

Not sure I agree. If anything, a sorted.Seq would represent one more way to potentially "lie" about sortedness (in addition to such statements in doc comments), since a non-sorted iterator could masquerade as a sorted.Seq.

@apparentlymart
Copy link

apparentlymart commented Nov 9, 2024

It is true that sorted.Seq would not do anything to enforce "sortedness" -- "sortedness" isn't really even a single property in general, since of course a "sorted" sequence for one cmp func(T, T) int is not necessarily sorted for another.

If there were some way to capture the specific sort order itself as part of the type or value then that would be interesting, but I can't really think of any good way to do that with Go's current type system:

  • encoding it as part of the value would require some way to check whether a s1 and s2 both have the same cmp, which isn't possible because we can't compare two function pointers for meaningful identity.
  • encoding it as part of the type would require each comparison function to be a separate type -- presumably a single-method interface instead of just a plain function pointer -- and that seems like a considerable ergonomic cost.

While I do sympathize with the idea that types often represent a claim about what something is or how it behaves rather than an enforcement of that (e.g. sort.Interface describes a set of requirements for Less but cannot enforce that they are followed), in this particular case I don't think "this sequence is sorted in some unspecified way" alone is a specific enough claim for the type to be useful as a way to discourage incorrect usage. 😖

@Merovius
Copy link
Contributor

Merovius commented Nov 9, 2024

I find myself defending an idea I don't even believe that strongly in, but I just find the counter-arguments kind of confusing, TBH. This is a proposal for a package sorted, specifically described as "functions on sorted iterators". How is everything you are saying not an argument against this proposal, then? If "being sorted" is not something you can reasonable say about an iterator?

@jub0bs
Copy link

jub0bs commented Nov 9, 2024

@Merovius I'm not sure who you're replying to, but I never claimed that

"being sorted" is not something you can reasonable say about an iterator

I only claim that you cannot verify that at compile time. Therefore, there is room for misuse (passing a non-sorted iterator where a sorted one is expected). And I doubt that a sorted.Seq type would be effective at preventing misuse. That is all.

@apparentlymart
Copy link

apparentlymart commented Nov 9, 2024

I apologize if I misunderstood the intent. My assumption was that the goal of introducing a new type to represent "sortedness" was to make it a compile-time error to use a sequence that isn't correctly sorted, and so I was trying to find a way to solve for some way that an API author can encode which specific order they used as part of the type, to reduce the likelihood of using a sequence that is sorted by the wrong comparison function.

At the risk of taking this to an unreasonable extreme, it seems to me that any sequence can be said to be "sorted in some way". What these functions seem to require for correct usage is that they be sorted in some specific way -- the way described by the cmp callback, and/or the same way as the other given sequence -- and it was that which I was trying (and failing) to find a way to check at compile time while still keeping it ergonomic to use.

@Merovius
Copy link
Contributor

Merovius commented Nov 9, 2024

@apparentlymart I understood what you where saying. But not, why the same can't be said about a sorted package in general. If we are nitpicking what a "sorted iterator" means, why are we even talking about a package for functions that deal with sorted iterators?

solve for some way that an API author can encode which specific order they used as part of the type, to reduce the likelihood of using a sequence that is sorted by the wrong comparison function.

That is easy: Use methods, instead of passing functions. And yes, I understand the reasons against that as well. Just pointing out again, that there are ways that the type system can address at least some of the concerns y'all bring up, if there is an actual interest in solving them. Here is an API that both prevents accidentally passing in a sequence that the programmer does not know is sorted and makes sure that the sorting is consistent:

package sorted

type Comparer[T any] interface { Compare(T) int }

type Seq[E Comparer[E]] iter.Seq[E]

func Union[E Comparer[E]](s1, s2 Seq[E]) Seq[E]

Or we can go even one step further than that, by actually preventing you from using a func(func(E) bool):

type Seq[E Comparer[E]] struct{ s iter.Seq[E] }

// Make makes a Seq that checks at runtime, that it is actually sorted. If it is not, iterating over it panics.
func Make[E Comparer[E]](s iter.Seq[E]) Seq[E] {
    return Seq[E]{func(yield func(E) bool) {
        var (
            last E
            haveLast bool
        )
        for e := range s {
            if haveLast {
                if last.Compare(e) > 0 {
                    panic("sequence not sorted")
                }
            }
            if !yield(e) { return }
            last, haveLast = e, true
        }
    }}
}

// UnsafeMake asserts that s is sorted, without actually providing any guarantee.
func UnsafeMake[E Comparer[E]](s iter.Seq[E]) Seq[E] {
    return Seq[E]{s}
}

If you are concerned about misuse, we can talk about ways to prevent that misuse. Or we can say "we trust the programmer to be sensible". That's what we do for slices.BinarySearchFunc. Or we can say "if there is no guarantee that misuse is prevented, let's not add nice things at all".

To me, type Seq[E any] iter.Seq[E] would be a reasonable compromise: It prevents probably most of the accidents, while not introducing terrible burdens on the programmer. But I'm not sure even that is necessary, given that we are fine with slices.BinarySearchFunc .

@Merovius
Copy link
Contributor

Merovius commented Nov 9, 2024

By analogy with slices.BinarySerach{,Func}, an argument could also be made to just put these functions directly into iter. Or that we should have added a package slices/sorted as well.

@apparentlymart
Copy link

To be clear, I would prefer to get some more compile-time safety here, and only discarded the idea of using a single-method interface because I recall from earlier discussions about package cmp that the consensus was to prioritize the ergonomics of using a single inline closure.

I suppose the way I'd sum up what I was trying to get at is: I would be in favor of a design that makes the comparer part of the type, such as what you've just proposed, but with a sorted.Seq that has no additional type parameters over iter.Seq all we'd be able to represent is "this sequence is in some unspecified order" and that doesn't seem like enough of a safety improvement over iter.Seq to be worth the additional API complexity.

I assumed, perhaps incorrectly, that consensus would not favor using a single-method interface for the comparers, even though it would in principle allow that additional safety, based on earlier discussions in this area.

(FWIW I also think it would be just fine to put these in the main package iter, rather than in a separate package. I can't think of many other examples of using a package to represent that its symbols some special usage requirement, rather than just using documentation.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Incoming
Development

No branches or pull requests

9 participants