-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Add is_sorted
to the standard library
#2351
Changes from 3 commits
6ed2eb4
04e6c33
65199fa
77310a3
1a4bfdc
22d12d9
85c3b5f
970ef5c
e35ec18
1aa75ab
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,272 @@ | ||
- Feature Name: is_sorted | ||
- Start Date: 2018-02-24 | ||
- RFC PR: (leave this empty) | ||
- Rust Issue: (leave this empty) | ||
|
||
# Summary | ||
[summary]: #summary | ||
|
||
Add the methods `is_sorted`, `is_sorted_by` and `is_sorted_by_key` to `[T]` and | ||
`Iterator`. | ||
|
||
# Motivation | ||
[motivation]: #motivation | ||
|
||
In quite a few situations, one needs to check whether a sequence of elements | ||
is sorted. The most important use cases are probably **unit tests** and | ||
**pre-/post-condition checks**. | ||
|
||
The lack of an `is_sorted()` function in Rust's standard library has led to | ||
[countless programmers implementing their own](https://github.com/search?l=Rust&q=%22fn+is_sorted%22&type=Code&utf8=%E2%9C%93). | ||
While it is possible to write a one-liner using iterators (e.g. | ||
`(0..arr.len() - 1).all(|i| arr[i] < arr[i + 1])`), it is still unnecessary | ||
overhead while writing *and* reading the code. | ||
|
||
In [the corresponding issue on the main repository](https://github.com/rust-lang/rust/issues/44370) | ||
(from which I will reference a few comments) everyone seems to agree on the | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. s/from which I will reference a few comments/from which a few comments are referenced/ |
||
basic premise: we want such a function. | ||
|
||
Having `is_sorted()` and friends in the standard library would: | ||
- prevent people from spending time on writing their own, | ||
- improve readbility of the code by clearly showing the author's intent, | ||
- and encourage to write more unit tests and/or pre-/post-condition checks. | ||
|
||
Another proof of this functions' usefulness is the inclusion in the | ||
standard library of many other languages: | ||
C++'s [`std::is_sorted`](http://en.cppreference.com/w/cpp/algorithm/is_sorted), | ||
Go's [`sort.IsSorted`](https://golang.org/pkg/sort/#IsSorted), | ||
D's [`std.algorithm.sorting.is_sorted`](https://dlang.org/library/std/algorithm/sorting/is_sorted.html) | ||
and others. (Curiously, many (mostly) more high-level programming language – | ||
like Ruby, Javascript, Java, Haskell and Python – seem to lack such a function.) | ||
|
||
|
||
# Guide-level explanation | ||
[guide-level-explanation]: #guide-level-explanation | ||
|
||
Possible documentation of the three new methods of `Iterator`: | ||
|
||
> ```rust | ||
> fn is_sorted(self) -> bool | ||
> where | ||
> Self::Item: Ord, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why (note: I'm anthropomorphizing the std lib here; not calling you spiteful) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good point, sorry for not properly thinking about that beforehand. So the comparator for the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hm. This is a predicament. If this existed in a vacuum, then I would say the comparator should produce But obviously, this does not exist in a vacuum, and so a bool comparitor could create unnecessary confusion for users. So yes, I suppose we should try Re: |
||
> ``` | ||
> Checks if the elements of this iterator are sorted. | ||
> | ||
> That is, for each element `a` and its following element `b`, `a <= b` | ||
> must hold. If the iterator yields exactly zero or one element, `true` | ||
> is returned. | ||
> | ||
> ## Example | ||
> | ||
> ```rust | ||
> assert!([1, 2, 2, 9].iter().is_sorted()); | ||
> assert!(![1, 3, 2, 4).iter().is_sorted()); | ||
> assert!([0].iter().is_sorted()); | ||
> assert!(std::iter::empty::<i32>().is_sorted()); | ||
> ``` | ||
> --- | ||
> | ||
> ```rust | ||
> fn is_sorted_by<F>(self, compare: F) -> bool | ||
> where | ||
> F: FnMut(&Self::Item, &Self::Item) -> Ordering, | ||
> ``` | ||
> Checks if the elements of this iterator are sorted using the given | ||
> comparator function. | ||
> | ||
> Instead of using `Ord::cmp`, this function uses the given `compare` | ||
> function to determine the ordering of two elements. Apart from that, | ||
> it's equivalent to `is_sorted`; see its documentation for more | ||
> information. | ||
> | ||
> --- | ||
> | ||
> ```rust | ||
> fn is_sorted_by_key<F, K>(self, f: F) -> bool | ||
> where | ||
> F: FnMut(&Self::Item) -> K, | ||
> K: Ord, | ||
> ``` | ||
> Checks if the elements of this iterator are sorted using the given | ||
> key extraction function. | ||
> | ||
> Instead of comparing the iterator's elements directly, this function | ||
> compares the keys of the elements, as determined by `f`. Apart from | ||
> that, it's equivalent to `is_sorted`; see its documentation for more | ||
> information. | ||
> | ||
> ## Example | ||
> | ||
> ```rust | ||
> assert!(["c", "bb", "aaa"].iter().is_sorted_by_key(|s| s.len())); | ||
> assert!(![-2i32, -1, 0, 3].iter().is_sorted_by_key(|n| n.abs())); | ||
> ``` | ||
|
||
The methods for `[T]` will have analogous documentations. | ||
|
||
# Reference-level explanation | ||
[reference-level-explanation]: #reference-level-explanation | ||
|
||
This RFC proposes to add the following three methods to `[T]` (slices) and `Iterator`: | ||
|
||
```rust | ||
impl<T> [T] { | ||
fn is_sorted(&self) -> bool | ||
where | ||
T: Ord, | ||
{ ... } | ||
|
||
fn is_sorted_by<F>(&self, compare: F) -> bool | ||
where | ||
F: FnMut(&T, &T) -> Ordering, | ||
{ ... } | ||
|
||
fn is_sorted_by_key<F, K>(&self, f: F) -> bool | ||
where | ||
F: FnMut(&T) -> K, | ||
K: Ord, | ||
{ ... } | ||
} | ||
|
||
trait Iterator { | ||
fn is_sorted(self) -> bool | ||
where | ||
Self::Item: Ord, | ||
{ ... } | ||
|
||
fn is_sorted_by<F>(mut self, compare: F) -> bool | ||
where | ||
F: FnMut(&Self::Item, &Self::Item) -> Ordering, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is the use case for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I am aware. But just because you can do something doesn't mean you ought to do something. So I'm wondering if there are any good reasons why you should be able to pass in a stateful closure. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, okay. But this would apply to almost all libstd functions that take closures, then. I don't think any method uses an There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, that's exactly the reason I chose Do you think I should mention this in the RFC? Maybe that would be a good idea to prevent future readers from asking the same question about Oh, and just for protocol: I wondered about the usage of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So I knew about the consistency reason beforehand, but I think that we should decide on this with a rationale more than "this is what we've done in other places", unless there's some other place where rationale has been given(?). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @Centril why should a function deliberately choose a more restrictive The rules for picking Fn traits are simple:
...and I rather doubt the stdlib needs to keep itself open to the possibility of changing this function to run in multiple threads. By requiring There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @ExpHP great arguments, so FnMut it is. |
||
{ ... } | ||
|
||
fn is_sorted_by_key<F, K>(self, mut f: F) -> bool | ||
where | ||
F: FnMut(&Self::Item) -> K, | ||
K: Ord, | ||
{ ... } | ||
} | ||
``` | ||
|
||
In addition to the changes shown above, the three methods should also be added | ||
to `core::slice::SliceExt` as they don't require heap allocations. | ||
|
||
To repeat the exact semantics from the prior section: the methods return `true` | ||
if and only if for each element `a` and its following element `b`, the | ||
condition `a <= b` holds. For slices/iterators with zero or one element, `true` | ||
is returned. | ||
|
||
A note about implementation: it's sufficient to only do real work in | ||
`Iterator::is_sorted_by`. All other methods can simply be implemented by | ||
(directly or indirectly) using `Iterator::is_sorted_by`. A sample | ||
implementation can be found [here](https://play.rust-lang.org/?gist=8ad7ece1c99e68ee2f5c3ec2de7767b2&version=stable). | ||
|
||
|
||
# Drawbacks | ||
[drawbacks]: #drawbacks | ||
|
||
It increases the size of the standard library by a tiny bit. | ||
|
||
# Rationale and alternatives | ||
[alternatives]: #alternatives | ||
|
||
### Only add the three methods to `Iterator`, but not to `[T]` | ||
Without `is_sorted()` defined for slices directly, one can still fairly easily | ||
test if a slice is sorted by obtaining an iterator via `iter()`. So instead of | ||
`v.is_sorted()`, one would need to write `v.iter().is_sorted()`. | ||
|
||
This always works for `is_sorted()` because of the `Ord` blanket impl which | ||
implements `Ord` for all references to an `Ord` type. For `is_sorted_by` and | ||
`is_sorted_by_key` it would introduce an additional reference to the closures' | ||
arguments (i.e. `v.iter().is_sorted_by_key(|x| ...))` where `x` is `&&T`). | ||
|
||
While these two inconveniences are not deal-breakers, being able to call those | ||
three methods on slices (and all `Deref<Target=[T]>` types) directly, could be | ||
favourable for many programmers (especially given the popularity of slice-like | ||
data structures, like `Vec<T>`). Additionally, the `sort` method and friends | ||
are defined for slices, thus one might expect the `is_sorted()` method there, | ||
too. | ||
|
||
|
||
### Add the three methods to additional data structures (like `LinkedList`) as well | ||
Adding these methods to every data structure in the standard libary is a lot of | ||
duplicate code. Optimally, we would have a trait that represents sequential | ||
data structures and would only add `is_sorted` and friends to said trait. We | ||
don't have such a trait as of now; so `Iterator` is the next best thing. Slices | ||
deserve special treatment due to the reasons mentioned above (popularity and | ||
`sort()`). | ||
|
||
|
||
### `Iterator::while_sorted`, `is_sorted_until`, `sorted_prefix`, `num_sorted`, ... | ||
[In the issue on the main repository](https://github.com/rust-lang/rust/issues/44370), | ||
concerns about completely consuming the iterator were raised. Some alternatives, | ||
such as [`while_sorted`](https://github.com/rust-lang/rust/issues/44370#issuecomment-327873139), | ||
were suggested. However, consuming the iterator is neither uncommon nor a | ||
problem. Methods like `count()`, `max()` and many more consume the iterator, | ||
too. [One comment](https://github.com/rust-lang/rust/issues/44370#issuecomment-344516366) mentions: | ||
|
||
> I am a bit skeptical of the equivalent on Iterator just because the return | ||
> value does not seem actionable -- you aren't going to "sort" the iterator | ||
> after you find out it is not already sorted. What are some use cases for this | ||
> in real code that does not involve iterating over a slice? | ||
|
||
As mentioned above, `Iterator` is the next best thing to a trait representing | ||
sequential data structures. So to check if a `LinkedList`, `VecDeque` or | ||
another sequential data structure is sorted, one would simply call | ||
`collection.iter().is_sorted()`. It's likely that this is the main usage for | ||
`Iterator`'s `is_sorted` methods. Additionally, code like | ||
`if v.is_sorted() { v.sort(); }` is not very useful: `sort()` already runs in | ||
O(n) for already sorted arrays. | ||
|
||
Suggestions like `is_sorted_until` are not really useful either: one can easily | ||
get a subslice or a part of an iterator (via `.take()`) and call `is_sorted()` | ||
on that part. | ||
|
||
|
||
# Unresolved questions | ||
[unresolved]: #unresolved-questions | ||
|
||
|
||
### Is `Iterator::is_sorted_by_key` useless? | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I think so, precisely for the reason you mention. We already have quite many methods in If there is popular demand for this later we can always add it then, but for now I think it is premature. |
||
|
||
[One comment in the corresponding issue](https://github.com/rust-lang/rust/issues/44370#issuecomment-327740685) | ||
mentions that `Iterator::is_sorted_by_key` is not really necessary, given that | ||
you can simply call `map()` beforehand. This is true, but it might still be | ||
favourable to include said function for consistency and ease of use. The | ||
standard library already hosts a number of sorting-related functions all of | ||
which come in three flavours: *raw*, `_by` and `_by_key`. By now, programmers would | ||
probably expect there to be an `is_sorted_by_key` as well. | ||
|
||
|
||
### Add `std::cmp::is_sorted` instead | ||
|
||
As suggested [here](https://github.com/rust-lang/rust/issues/44370#issuecomment-345495831), | ||
one could also add this free function (plus the `_by` and `_by_key` versions) | ||
to `std::cmp`: | ||
|
||
```rust | ||
fn is_sorted<C>(collection: C) -> bool | ||
where | ||
C: IntoIterator, | ||
C::Item: Ord, | ||
``` | ||
|
||
This can be seen as a better design as it avoids the question about which data | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Perhaps if Rust has a built in operator for function composition and made point-free notation easy, but that is not the case, so let's keep the RFC as is. |
||
structure should get `is_sorted` methods. However, it might have the | ||
disadvantage of being less discoverable and also less convenient (long path or | ||
import). | ||
|
||
|
||
### About the closure of `Iterator::is_sorted_by_key` | ||
|
||
The method `Iterator::is_sorted_by_key` as proposed above takes a closure | ||
`F: FnMut(&Self::Item) -> K`. Since the iterator is consumed and – in theory – | ||
one only needs to extract the key once per element, the closure could take | ||
`Self::Item` by value instead of by reference. It is not immediately clear, | ||
whether this would have any real advantages. | ||
|
||
It has the disadvantage of being a bit special: `is_sorted_by`'s closure *has | ||
to* receive its arguments by reference, as do the closures of `[T]::is_sorted_by` | ||
and `[T]::is_sorted_by_key`. Additionally, when taking `Self::Item` by value, | ||
one can no longer implement `Iterator::is_sorted_by_key` with | ||
`Iterator::is_sorted_by` but would have to write a new implementation, taking | ||
care to call the key extraction method only once for each element. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a
<=
(thanks @orlp)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my secret plan: hide a subtle bug in the "motivation" section that no one notices for months. Now that it has been discovered, it's the perfect motivation that
is_sorted
is certainly needed! Muhaha!In all seriousness, it happens all the time.
Will fix this bug in the RFC later. It's blocked right now anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping, given FCP :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, I'd almost love to see a section in the motivation showcasing just how easy it is to make this mistake!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ExpHP Great idea, done!