Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

paper cut: is a slice / Iterator sorted? #44370

Closed
gnzlbg opened this issue Sep 6, 2017 · 21 comments

Comments

Projects
None yet
@gnzlbg
Copy link
Contributor

commented Sep 6, 2017

I just used binary_search inside a loop and realized that it would be nice to check before the loop if my immutable slice was sorted. So I spent 10 minutes going through the standard library just to find out that there aren't any is_sorted, is_sorted_by, is_sorted_by_key, etc. algorithms implemented for neither slices nor iterators.

We should add:

  • Iterator::is_sorted
  • Iterator::is_sorted_by
  • Iterator::is_sorted_by_key

and since Iterator is implemented for slices, we can add those methods to slices as well by just forwarding to the Iterator implementation.

@gnzlbg gnzlbg changed the title paper cut: check if a slice is sorted before using binary search paper cut: is a slice / Iterator sorted? Sep 6, 2017

@arielb1

This comment has been minimized.

Copy link
Contributor

commented Sep 6, 2017

We also shouldn't change it to be O(n) on debug-assertions builds - debug-assertions should be for the last 2x of performance, not for potential order-of-magnitude slowdowns.

@gnzlbg

This comment has been minimized.

Copy link
Contributor Author

commented Sep 6, 2017

@arielb1 sorry this issue turned out to be two issues, I've split that part to #44371.

@leonardo-m

This comment has been minimized.

Copy link

commented Sep 6, 2017

In the D language standard library there is also a wrapper for sorted iterables (that performs statistical tests inside its constructor), with it you can create functions that document in the code that return/accept sorted iterables.

@bluss

This comment has been minimized.

Copy link
Contributor

commented Sep 6, 2017

and since Iterator is implemented for slices those methods should work on slices as well.

Just to help clarify here: slices are iterable (IntoIterator for &[T], &mut [T]), but not themselves iterators.

@gnzlbg

This comment has been minimized.

Copy link
Contributor Author

commented Sep 7, 2017

@bluss what I meant is that we should probably implement those methods on slices as well, but that implementation should just be forwarded to the Iterator version. I've reworded the issue.

@Thiez

This comment has been minimized.

Copy link
Contributor

commented Sep 7, 2017

How would it make sense for this operation to be defined on an iterator? It would have to consume the entire iterator to get the result, leaving you only with the information that the iterator that you now can't use anymore would indeed have been sorted. .is_sorted_by_key(select_the_key) seems particularly redundant, given that it would just be .map(select_the_key).is_sorted().

@gnzlbg

This comment has been minimized.

Copy link
Contributor Author

commented Sep 7, 2017

It would have to consume the entire iterator to get the result,

Of course.

leaving you only with the information that the iterator that you now can't use anymore would indeed have been sorted.

There are many useful algorithms that consume an iterator. Iterators don't necessarily own the items; some iterators do, some don't. If you don't want the items to get lost then don't consume an iterator that owns them. You can either copy/clone the iterator before you consume it, or you can create it again. For example, given:

let v: Vec<u32> = vec![0, 1, 2, 3, 4, 5];

then this works

if v.iter().is_sorted() {
  for i in v.iter().map(|i| i* 2) {
    println!("{}", i);
  }
} 

but this won't compile

if v.into_iter().is_sorted() { // vec is consumed here
  for i in v.iter().map(|i| i * 2) {
    println!("{}", i);
  }
} 

Given that iterating over an Iterator consumes the iterator:

let v: Vec<u32> = vec![0, 1, 2, 3, 4, 5];
let it = v.iter().map(|i| i* 2);
for i in it { println!("{}", i); }  // it is moved here
for i in it { println!("{}", i); }  // ERROR: it was already moved

and that is_sorted must, by necessity, look at each element on the sequence, maybe you could elaborate on why you think that consuming an iterator in this algorithm is bad, and suggest what could be done about it instead?

.is_sorted_by_key(select_the_key) seems particularly redundant, given that it would just be .map(select_the_key).is_sorted().

I've added them for both convenience and consistency. The stable and unstable sort, sort_by, and sort_by_key functions are redundant as well, yet they are very convenient. To be consistent with the sort algorithms, I think is_sorted should provide these alternatives as well. However, I also think that having them is convenient.

@the8472

This comment has been minimized.

Copy link

commented Sep 7, 2017

How about a non-terminal .while_sorted() operation? All it would have to do is compare adjacent elements and stop iterating when they are not monotonic. .require_sorted() could panic instead. Maybe this could be generalized into .while_pair(|a,b| ...) and put into itertools?

@gnzlbg

This comment has been minimized.

Copy link
Contributor Author

commented Sep 7, 2017

@the8472 itertools already has .tuple_window which combined with take_while is more generic than .while_pair.

Anyhow, I am neither in favour nor opposed to doing that, but I think that discussion should belong in a different issue.

This issue is about std neither checking the precondition of binary_search nor providing users the algorithm to check it themselves.

There is some prior art for is_sorted in other standard libraries, e.g., in C++, although there it is implemented on top of is_sorted_until.

@the8472

This comment has been minimized.

Copy link

commented Sep 7, 2017

is_sorted_until seems like a good model. In the rust case that could be sorted_prefix() -> (impl Iterator, impl Iterator). Then a consumer could do a binary search with the prefix and either panic if the tail is non-empty or do something else with it.

@Thiez

This comment has been minimized.

Copy link
Contributor

commented Sep 7, 2017

That seems like a lot of complexity to avoid a single call to sort() (or collect::<Vec<_>>() followed by sort()). Especially since your proposed sorted_prefix would have to perform hidden allocations to do its work anyway.

@the8472

This comment has been minimized.

Copy link

commented Sep 7, 2017

Ok, how about num_sorted() -> usize for iterators? That could get you the sorted prefix of a slice for example. It's more powerful than is_sorted() without additional cost.

@Ixrec

This comment has been minimized.

Copy link
Contributor

commented Sep 17, 2017

Are there any concrete use cases for a "check sorted-ness while iterating" method like is_sorted_until() or while_sorted()? I can imagine those theoretically being used for optimizations like allowing if v is sorted { for x in v ... } to do only one iteration over v, or abort the iteration very early if v is likely to be randomly shuffled, but I have no idea what use case would require those particular optimizations. In particular, I don't know of a use case where "iterate only over the sorted prefix" is part of the desired semantics. So methods like that seem niche enough that they should probably start out in itertools or some other crate.

In contrast, it's quite obvious what the is_sorted{,_by{,_key}} trio would be useful for. So much so that it's almost surprising they aren't in std already.

@dtolnay

This comment has been minimized.

Copy link
Member

commented Nov 15, 2017

I agree that it would be good to have an easy way to check whether a slice is sorted. This exists in Go as sort.IsSorted, C++ as std::is_sorted, D std.algorithm.sorting.is_sorted. Plenty of people are already defining their own version of this in Rust (135 results at present).

I am a bit skeptical of the equivalent on Iterator just because the return value does not seem actionable -- you aren't going to "sort" the iterator after you find out it is not already sorted. What are some use cases for this in real code that does not involve iterating over a slice?

@bluss

This comment has been minimized.

Copy link
Contributor

commented Nov 15, 2017

I don't think it is a problem o consume the iterator, we make and consume iterators for many different tasks. For example any use of Iterator::position consumes an iterator just to answer the position question.

@gnzlbg

This comment has been minimized.

Copy link
Contributor Author

commented Nov 15, 2017

@dtolnay

Consuming an Iterator is orthogonal to consuming an underlying slice:

let mut v: Vec<_> = vec![1,3,2,4];
let not_sorted: bool = !v.iter().is_sorted();
if  not_sorted {
  v.sort();  // works
}
let not_sorted: bool = !v.iter_mut().is_sorted();
if not_sorted {
  v.sort(); // works
}

vs:

let mut v: Vec<_> = vec![1,3,2,4];
let not_sorted: bool = !v.into_iter().is_sorted();
if  not_sorted {
  v.sort();  // ERROR: v has been moved
}

The point of implementing this on Iterator is that it allows the algorithm to work on everything that can be iterated upon, not only slices.

You mentioned:

This exists in Go as sort.IsSorted, C++ as std::is_sorted, D std.algorithm.sorting.is_sorted

C++ std::is_sorted and D's std.algorithm.sorting.is_sorted require only a ForwardRange, that is, anything that can be Iterated except for streaming iterators.

If they were to work only on slices they would require a ContiguousRange or a RandomAccessRange in C++ parlance.

@dtolnay

This comment has been minimized.

Copy link
Member

commented Nov 19, 2017

I would be prepared to consider a PR or RFC for this (whichever is easier). Either way please include a list of design alternatives that you evaluated -- method on slice, trait method on Iterator, free function like std::cmp::is_sorted that takes an IntoIterator, etc.

@gnzlbg

This comment has been minimized.

Copy link
Contributor Author

commented Feb 5, 2018

I've implemented this in this PR for the itertools crate in case someone wants to play with them: bluss/rust-itertools#261

@bluss

This comment has been minimized.

Copy link
Contributor

commented Feb 6, 2018

Since Iterator/Itertools conflicts are pretty thorny, I'd prefer if we can merge them to Iterator in libcore directly if that's an option. But they do make good companions with wherever the .sorted() method is on an Iterator too (the method that sorts the elements and returns an iterator).

@LukasKalbertodt

This comment has been minimized.

Copy link
Contributor

commented Feb 24, 2018

Just FYI: I'll try to write an RFC about this in the next couple of days. I hope to summarize everything that has been said in this thread. I will link it here, once I'm finished (or update this issue if I give up :<).

@gnzlbg

This comment has been minimized.

Copy link
Contributor Author

commented Aug 30, 2018

The RFC has been merged: #53485

@gnzlbg gnzlbg closed this Aug 30, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.