Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add {min,max}_set(_by{_key)?)? functions #323

Closed
wants to merge 1 commit into from

Conversation

zayenz
Copy link
Contributor

@zayenz zayenz commented Dec 11, 2018

The function min_set returns a Vec of all the minimum values. All variants for max and with key extraction and comparison functions are added also.

Since the functions need to return an unknown number of values and the values are not known until the iterator is finished, the function needs to allocate memory. Therefore Vec is used for returning the values. From the guidelines for itertools, this also makes it unlikely to be included in the standard library.

The reason for adding these functions is that I often find situations where I want to know all the minimum values.

The naming is not optimal; min_set indicates that a set is returned while a Vec is returned. Another name is all_min, but I think that is less explanatory and not as discoverable. YMMV.

While an empty Vec could be used to handle the case of an empty iterator, I think that it is more safe to push that check into the type-system.

@zayenz zayenz force-pushed the feature/extrema-set branch 2 times, most recently from 0eb8bf3 to 65444a9 Compare Dec 11, 2018
@zayenz
Copy link
Contributor Author

zayenz commented Jan 11, 2019

Just wondering if there is any interest in this feature?

@fenhl
Copy link
Contributor

fenhl commented Jul 14, 2019

I would find this feature useful, although wrapping in Option seems like a strange decision, as Vec methods that require elements to be present already return Option. I foresee many more and_then chains in my code if this is merged as is.

@jswrenn jswrenn self-assigned this Jul 18, 2019
Copy link
Member

@phimuemue phimuemue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The functionality sounds great! I appreciate that you already considered min/max and by/by_key.

Overall, I think we could try to be a bit simpler and generic in some places.


for element in it {
let key = key_for(&element);
if lt(&element, &result[0], &key, &current_key) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to replace lt (returning bool) by cmp (returning Ordering) so that we could avoid the second call to lt in line 23?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should work nicely.

As it is, I essentially copied the structure from the minimax set of functions, where I see now that multiple comparisons are also made.

I'm happy to make the change, but will probably not have the time to do so in the next couple of weeks (currently on vacation).

Copy link
Member

@phimuemue phimuemue Jul 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. That didn't occur to me when I looked at minmax last time. I could, however, imagine that things are a bit more complicated there because it needs to look for minimum and maximum simultaneously.

Maybe I should go back and have a look at minmax.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a look at minmax and think that it needs to do more stuff since it searches for both the minimum and the maximum. Here, I think we can use less calls to compare elements.

/// Implementation guts for `min_set`, `min_set_by`, and `min_set_by_key`.
pub fn min_set_impl<I, K, F, L>(mut it: I,
mut key_for: F,
mut lt: L) -> Option<Vec<I::Item>>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about returning an empty Vec instead of None? I think it would make things a bit more clear, especially since the types do not indicate that a Some always contains a non-empty vector.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we could/should be a bit more generic than Vec here? I.e. should we try to be like collect (which returns something FromIterator<Self::Item> and lets the caller choose)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally think that returning an Option is a good choice. In application code that I write, when using max or min functionality of sequences, an empty sequence is often an indication that something has gone awry and should be forced to be handled in another way. I imagine that in most cases, the Option-ness can be handled by the ?-operator. Also, it matches the signature for max/min from the standard library, which returns an Option of an Item. I can naturally change it, but I do think that Option makes sense here.

As for why Vec and not some kind of iterator. The number of results is unknown before computing the result, and the result is unknown before going through the whole input iterator, which implies that some kind of temporary storage area is needed. Since the data is already in a Vec, I thought that the least convoluted thing was to just give the user back that. We can of course wrap the already constructed Vec in something that exposes it like a FromIterator, I have no real opinion here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the Vec: I naively thought about substituting the whole Vec even within the implementation (basically allowing to customize the kind of temporary storage), so that no conversion would be required upon returning it.

However, I see that FromIterator alone is possibly a bit too narrow to acchieve that. Maybe any container satisfying Default + Extend or FromIterator + Extend could be used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the code by neccesity (unless double iteration is used) provisionally accumulates an unbounded number of result items, and discards these when a new provisionally best results is found, something more than FromIterator + Extend would be needed AFAIK.

My guess would be that anything that also has the possibility to clear the results is no better than the interal Vec, and would also be less clear. I'd be glad to learn I'm wrong though!

src/extrema_set.rs Show resolved Hide resolved
/// assert_eq!(a.iter().min_set(), Some(vec![&1, &1, &1, &1]));
/// ```
///
/// The elements can be floats but no particular result is guaranteed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I often run into situations where I like to simply compare floats. But most of the time, I am happy that Rust tells me that f64 is not Ord.

Iterator::min also requires Ord. Do you think we should stick to that requirement and require the caller to resolve the float issues?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, floats not being Ord is annoying, but also great.

In this case, I again copied the style and wording from minmax.

@zayenz
Copy link
Contributor Author

zayenz commented Jul 21, 2019

I would find this feature useful, although wrapping in Option seems like a strange decision, as Vec methods that require elements to be present already return Option. I foresee many more and_then chains in my code if this is merged as is.

See my reply here: #323 (comment)

src/extrema_set.rs Show resolved Hide resolved
The function min_set returns a Vec of all the minimum values. All
variants for max and with key extraction and comparison functions are
added also.

Since the functions need to return an unknown number of values and the
values are not known until the iterator is finished, the function
needs to allocate memory. Therefore Vec is used for returning the
values.
@phimuemue
Copy link
Member

@zayenz Do you want to drive this to completion? (No worries if your answer is "no".)

If so, could you:

  • Make the returned type simply a Vec? (No Option to denote emptiness.)
  • Have the relevant functions require Ord? (Instead of PartialOrd; we can relax the constraints later if actually needed.)
  • Avoid duplicate calls to lt by accepting a FnMut(...)->Ordering in min_set_impl?
  • Resort to internal iteration (using fold/for_each instead of a manual for-loop)?

@zayenz
Copy link
Contributor Author

zayenz commented Aug 20, 2021

@phimuemue I'd forgotten about this one. Yes, I would like to fix this as I think the methods are quite useful.

I will take a look at updating the code, and then seeing about the changes mid next week.

@phimuemue
Copy link
Member

phimuemue commented May 4, 2022

Hi @zayenz, thanks for this! I did some changes on top of your proposal, and moved the PR to #613.

@phimuemue phimuemue closed this May 4, 2022
bors bot added a commit that referenced this pull request May 23, 2022
613: Minmax set (prev "Add {min,max}_set(_by{_key)?)? functions") r=jswrenn a=phimuemue

This PR supersedes #323 (I did not know how I could amend to the original PR, so sorry for the "duplicate" PR here.)

The original PR by `@zayenz` looked rather good. I adjusted the stuff that came up during discussion back then:

* Teturn type `Vec` instead of `Option` - emptiness is sufficiently well represented by `Vec`.
* Functions require `Ord` instead of `PartialOrd` - just as `Iterator::min`.
* Avoid duplicate calls to `lt` by accepting a `FnMut(...)->Ordering` - seems canonical compared to the `bool`-solution.
* Use internal iteration instead of a manual `for`-loop.

Moreover, I simplified some bits.

Co-authored-by: Mikael Zayenz Lagerkvist <zayenz@gmail.com>
Co-authored-by: philipp <descpl@yahoo.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants