Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Add is_sorted to the standard library #2351

Merged
merged 10 commits into from
Aug 19, 2018
272 changes: 272 additions & 0 deletions text/0000-is-sorted.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,272 @@
- Feature Name: is_sorted
- Start Date: 2018-02-24
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary
[summary]: #summary

Add the methods `is_sorted`, `is_sorted_by` and `is_sorted_by_key` to `[T]` and
`Iterator`.

# Motivation
[motivation]: #motivation

In quite a few situations, one needs to check whether a sequence of elements
is sorted. The most important use cases are probably **unit tests** and
**pre-/post-condition checks**.

The lack of an `is_sorted()` function in Rust's standard library has led to
[countless programmers implementing their own](https://github.com/search?l=Rust&q=%22fn+is_sorted%22&type=Code&utf8=%E2%9C%93).
While it is possible to write a one-liner using iterators (e.g.
`(0..arr.len() - 1).all(|i| arr[i] < arr[i + 1])`), it is still unnecessary
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a <= (thanks @orlp)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my secret plan: hide a subtle bug in the "motivation" section that no one notices for months. Now that it has been discovered, it's the perfect motivation that is_sorted is certainly needed! Muhaha!

In all seriousness, it happens all the time.

Will fix this bug in the RFC later. It's blocked right now anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping, given FCP :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, I'd almost love to see a section in the motivation showcasing just how easy it is to make this mistake!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ExpHP Great idea, done!

overhead while writing *and* reading the code.

In [the corresponding issue on the main repository](https://github.com/rust-lang/rust/issues/44370)
(from which I will reference a few comments) everyone seems to agree on the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/from which I will reference a few comments/from which a few comments are referenced/

basic premise: we want such a function.

Having `is_sorted()` and friends in the standard library would:
- prevent people from spending time on writing their own,
- improve readbility of the code by clearly showing the author's intent,
- and encourage to write more unit tests and/or pre-/post-condition checks.

Another proof of this functions' usefulness is the inclusion in the
standard library of many other languages:
C++'s [`std::is_sorted`](http://en.cppreference.com/w/cpp/algorithm/is_sorted),
Go's [`sort.IsSorted`](https://golang.org/pkg/sort/#IsSorted),
D's [`std.algorithm.sorting.is_sorted`](https://dlang.org/library/std/algorithm/sorting/is_sorted.html)
and others. (Curiously, many (mostly) more high-level programming language –
like Ruby, Javascript, Java, Haskell and Python – seem to lack such a function.)


# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

Possible documentation of the three new methods of `Iterator`:

> ```rust
> fn is_sorted(self) -> bool
> where
> Self::Item: Ord,
Copy link

@ExpHP ExpHP Feb 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why Ord instead of PartialOrd? Testing if a <= b <= c <= ... <= z is a perfectly well-defined operation on a partially-ordered set, and to give floating point numbers the shaft yet again even for something as innocent as this just seems spiteful.

(note: I'm anthropomorphizing the std lib here; not calling you spiteful)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, sorry for not properly thinking about that beforehand. So the comparator for the _by versions may also return an Option<Ordering>, right? And how do we want to handle a None then? I might be too sleepy think... but it's not immediately obvious whether [1.0, NaN, 2.0] should be considered ordered or not, right?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. This is a predicament.

If this existed in a vacuum, then I would say the comparator should produce bool, because an <= operation really is most natural definition for a partial ordering. Option<Ordering> seems to me to be a sort of an allowance to make deriving Ord from PartialOrd possible.

But obviously, this does not exist in a vacuum, and so a bool comparitor could create unnecessary confusion for users. So yes, I suppose we should try Option<Ordering>.


Re: [1.0, NaN, 2.0], I am about to post a much larger comment on this, but in summary, I am of the stance that there is only one sensible implementation of is_sorted on a partially ordered set, and this implementation returns false for a sequence like [1.0, NaN, 2.0].

> ```
> Checks if the elements of this iterator are sorted.
>
> That is, for each element `a` and its following element `b`, `a <= b`
> must hold. If the iterator yields exactly zero or one element, `true`
> is returned.
>
> ## Example
>
> ```rust
> assert!([1, 2, 2, 9].iter().is_sorted());
> assert!(![1, 3, 2, 4).iter().is_sorted());
> assert!([0].iter().is_sorted());
> assert!(std::iter::empty::<i32>().is_sorted());
> ```
> ---
>
> ```rust
> fn is_sorted_by<F>(self, compare: F) -> bool
> where
> F: FnMut(&Self::Item, &Self::Item) -> Ordering,
> ```
> Checks if the elements of this iterator are sorted using the given
> comparator function.
>
> Instead of using `Ord::cmp`, this function uses the given `compare`
> function to determine the ordering of two elements. Apart from that,
> it's equivalent to `is_sorted`; see its documentation for more
> information.
>
> ---
>
> ```rust
> fn is_sorted_by_key<F, K>(self, f: F) -> bool
> where
> F: FnMut(&Self::Item) -> K,
> K: Ord,
> ```
> Checks if the elements of this iterator are sorted using the given
> key extraction function.
>
> Instead of comparing the iterator's elements directly, this function
> compares the keys of the elements, as determined by `f`. Apart from
> that, it's equivalent to `is_sorted`; see its documentation for more
> information.
>
> ## Example
>
> ```rust
> assert!(["c", "bb", "aaa"].iter().is_sorted_by_key(|s| s.len()));
> assert!(![-2i32, -1, 0, 3].iter().is_sorted_by_key(|n| n.abs()));
> ```

The methods for `[T]` will have analogous documentations.

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

This RFC proposes to add the following three methods to `[T]` (slices) and `Iterator`:

```rust
impl<T> [T] {
fn is_sorted(&self) -> bool
where
T: Ord,
{ ... }

fn is_sorted_by<F>(&self, compare: F) -> bool
where
F: FnMut(&T, &T) -> Ordering,
{ ... }

fn is_sorted_by_key<F, K>(&self, f: F) -> bool
where
F: FnMut(&T) -> K,
K: Ord,
{ ... }
}

trait Iterator {
fn is_sorted(self) -> bool
where
Self::Item: Ord,
{ ... }

fn is_sorted_by<F>(mut self, compare: F) -> bool
where
F: FnMut(&Self::Item, &Self::Item) -> Ordering,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the use case for FnMut here as opposed to Fn?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FnMut is less restrictive than Fn in where-clauses

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I am aware. But just because you can do something doesn't mean you ought to do something. So I'm wondering if there are any good reasons why you should be able to pass in a stateful closure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, okay. But this would apply to almost all libstd functions that take closures, then. I don't think any method uses an Fn bound when FnMut suffices. This also applies to all sort methods we already have, so consistency is another argument for FnMut.

Copy link
Member Author

@LukasKalbertodt LukasKalbertodt Feb 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's exactly the reason I chose FnMut: it's everywhere already. In particular: [T]::sort_by_key.

Do you think I should mention this in the RFC? Maybe that would be a good idea to prevent future readers from asking the same question about FnMut...

Oh, and just for protocol: I wondered about the usage of FnMuts, too. Exactly for the "should be stateless" reason you, @Centril, mentioned.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I knew about the consistency reason beforehand, but I think that we should decide on this with a rationale more than "this is what we've done in other places", unless there's some other place where rationale has been given(?).

Copy link

@ExpHP ExpHP Feb 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Centril why should a function deliberately choose a more restrictive Fn than it needs, unless it wants to open itself to implementation changes that may require those restrictions?

The rules for picking Fn traits are simple:

  • If you only need to call it once, ask for FnOnce.
  • If you call it multiple times but do not require reentrancy, ask for FnMut.
  • If you need to call it concurrently, ask for Fn.

...and I rather doubt the stdlib needs to keep itself open to the possibility of changing this function to run in multiple threads.

By requiring Fn you make it difficult for the user to do things like caching and memoization, forcing people to use RefCell and thereby deferring compile-time borrow checks to runtime. There is no upside.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ExpHP great arguments, so FnMut it is.

{ ... }

fn is_sorted_by_key<F, K>(self, mut f: F) -> bool
where
F: FnMut(&Self::Item) -> K,
K: Ord,
{ ... }
}
```

In addition to the changes shown above, the three methods should also be added
to `core::slice::SliceExt` as they don't require heap allocations.

To repeat the exact semantics from the prior section: the methods return `true`
if and only if for each element `a` and its following element `b`, the
condition `a <= b` holds. For slices/iterators with zero or one element, `true`
is returned.

A note about implementation: it's sufficient to only do real work in
`Iterator::is_sorted_by`. All other methods can simply be implemented by
(directly or indirectly) using `Iterator::is_sorted_by`. A sample
implementation can be found [here](https://play.rust-lang.org/?gist=8ad7ece1c99e68ee2f5c3ec2de7767b2&version=stable).


# Drawbacks
[drawbacks]: #drawbacks

It increases the size of the standard library by a tiny bit.

# Rationale and alternatives
[alternatives]: #alternatives

### Only add the three methods to `Iterator`, but not to `[T]`
Without `is_sorted()` defined for slices directly, one can still fairly easily
test if a slice is sorted by obtaining an iterator via `iter()`. So instead of
`v.is_sorted()`, one would need to write `v.iter().is_sorted()`.

This always works for `is_sorted()` because of the `Ord` blanket impl which
implements `Ord` for all references to an `Ord` type. For `is_sorted_by` and
`is_sorted_by_key` it would introduce an additional reference to the closures'
arguments (i.e. `v.iter().is_sorted_by_key(|x| ...))` where `x` is `&&T`).

While these two inconveniences are not deal-breakers, being able to call those
three methods on slices (and all `Deref<Target=[T]>` types) directly, could be
favourable for many programmers (especially given the popularity of slice-like
data structures, like `Vec<T>`). Additionally, the `sort` method and friends
are defined for slices, thus one might expect the `is_sorted()` method there,
too.


### Add the three methods to additional data structures (like `LinkedList`) as well
Adding these methods to every data structure in the standard libary is a lot of
duplicate code. Optimally, we would have a trait that represents sequential
data structures and would only add `is_sorted` and friends to said trait. We
don't have such a trait as of now; so `Iterator` is the next best thing. Slices
deserve special treatment due to the reasons mentioned above (popularity and
`sort()`).


### `Iterator::while_sorted`, `is_sorted_until`, `sorted_prefix`, `num_sorted`, ...
[In the issue on the main repository](https://github.com/rust-lang/rust/issues/44370),
concerns about completely consuming the iterator were raised. Some alternatives,
such as [`while_sorted`](https://github.com/rust-lang/rust/issues/44370#issuecomment-327873139),
were suggested. However, consuming the iterator is neither uncommon nor a
problem. Methods like `count()`, `max()` and many more consume the iterator,
too. [One comment](https://github.com/rust-lang/rust/issues/44370#issuecomment-344516366) mentions:

> I am a bit skeptical of the equivalent on Iterator just because the return
> value does not seem actionable -- you aren't going to "sort" the iterator
> after you find out it is not already sorted. What are some use cases for this
> in real code that does not involve iterating over a slice?

As mentioned above, `Iterator` is the next best thing to a trait representing
sequential data structures. So to check if a `LinkedList`, `VecDeque` or
another sequential data structure is sorted, one would simply call
`collection.iter().is_sorted()`. It's likely that this is the main usage for
`Iterator`'s `is_sorted` methods. Additionally, code like
`if v.is_sorted() { v.sort(); }` is not very useful: `sort()` already runs in
O(n) for already sorted arrays.

Suggestions like `is_sorted_until` are not really useful either: one can easily
get a subslice or a part of an iterator (via `.take()`) and call `is_sorted()`
on that part.


# Unresolved questions
[unresolved]: #unresolved-questions


### Is `Iterator::is_sorted_by_key` useless?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so, precisely for the reason you mention.

We already have quite many methods in Iterator and I think that additions should only be made if they increase readability and helps you express things which are quite tedious and unidiomatic to express otherwise. In this case I think iter.map(extract).is_sorted() is better than iter.is_sorted_by_key(extract).

If there is popular demand for this later we can always add it then, but for now I think it is premature.


[One comment in the corresponding issue](https://github.com/rust-lang/rust/issues/44370#issuecomment-327740685)
mentions that `Iterator::is_sorted_by_key` is not really necessary, given that
you can simply call `map()` beforehand. This is true, but it might still be
favourable to include said function for consistency and ease of use. The
standard library already hosts a number of sorting-related functions all of
which come in three flavours: *raw*, `_by` and `_by_key`. By now, programmers would
probably expect there to be an `is_sorted_by_key` as well.


### Add `std::cmp::is_sorted` instead

As suggested [here](https://github.com/rust-lang/rust/issues/44370#issuecomment-345495831),
one could also add this free function (plus the `_by` and `_by_key` versions)
to `std::cmp`:

```rust
fn is_sorted<C>(collection: C) -> bool
where
C: IntoIterator,
C::Item: Ord,
```

This can be seen as a better design as it avoids the question about which data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps if Rust has a built in operator for function composition and made point-free notation easy, but that is not the case, so let's keep the RFC as is.

structure should get `is_sorted` methods. However, it might have the
disadvantage of being less discoverable and also less convenient (long path or
import).


### About the closure of `Iterator::is_sorted_by_key`

The method `Iterator::is_sorted_by_key` as proposed above takes a closure
`F: FnMut(&Self::Item) -> K`. Since the iterator is consumed and – in theory –
one only needs to extract the key once per element, the closure could take
`Self::Item` by value instead of by reference. It is not immediately clear,
whether this would have any real advantages.

It has the disadvantage of being a bit special: `is_sorted_by`'s closure *has
to* receive its arguments by reference, as do the closures of `[T]::is_sorted_by`
and `[T]::is_sorted_by_key`. Additionally, when taking `Self::Item` by value,
one can no longer implement `Iterator::is_sorted_by_key` with
`Iterator::is_sorted_by` but would have to write a new implementation, taking
care to call the key extraction method only once for each element.