Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Add delete and delete_by methods to Iterator #2475

Closed
wants to merge 1 commit into from
Closed

RFC: Add delete and delete_by methods to Iterator #2475

wants to merge 1 commit into from

Conversation

ElectricCoffee
Copy link

@ElectricCoffee ElectricCoffee commented Jun 14, 2018

This RFC requests the addition of two new methods to the Iterator trait, called delete and delete_by.

You can find the render [here].

Note, I didn't know what to put into the Reference-level explanation, as I believe the rest of the document explains the intent well enough.

@Centril Centril added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Jun 14, 2018
@Centril
Copy link
Contributor

Centril commented Jun 14, 2018

Does this occur frequently enough?

Maybe delete sounds too destructive, could there be a better name?

I would try to fit in the word first somehow.

Note, I didn't know what to put into the Reference-level explanation, as I believe the rest of the document explains the intent well enough.

You can include a possible implementation. :)

@CryZe
Copy link

CryZe commented Jun 14, 2018

I think the name should be filter_first. It took me a long time to figure out that this doesn't actually delete the element out of the original container. I also think this is too niche to put in std. itertools or the odds crate seem like a better fit.

@ElectricCoffee
Copy link
Author

You can include a possible implementation. :)

I'm nowhere near versed enough in Rust to be able to do that without a great deal of effort

@Diggsey
Copy link
Contributor

Diggsey commented Jun 14, 2018

Yeah this seems like a better fit for itertools. Also, why does delete_by(...) take a value and a predicate? Why does it not just take a predicate? It could just as easily be written x.delete_by(|x| x <= 4) (capturing the value)

@Centril
Copy link
Contributor

Centril commented Jun 14, 2018

Here is a proof of concept implementation:
https://play.rust-lang.org/?gist=24d9dde030d72aac1119d047df897a5a&version=stable&mode=debug

The generalization and optimization (size_hint, etc.) is left as an exercise for the reader.

I also think this is too niche to put in std. itertools or the odds crate seem like a better fit.

Yeah this seems like a better fit for itertools.

Perhaps you are both right; However, inclusion in std comes with the benefit of specialization.
If @ElectricCoffee or someone else can provide some examples of this occurring frequently enough in the wild then perhaps it should be included in libcore after all. What we shouldn't do is add this to itertools first and then move it up to std; that only causes unnecessary pain.

@ElectricCoffee
Copy link
Author

ElectricCoffee commented Jun 14, 2018

@Diggsey it was mostly to mirror the Haskell implementation, which both include a predicate and a value.
I think it's to do with the fact that delete = deleteBy (==) for those who understand Haskell syntax, in Rust terms it would probably be something like

fn delete<T: std::cmp::Eq>(&self, item: T) -> Self {
    self.delete_by(item, |x, y| x == y)
}

@clarfonthey
Copy link
Contributor

omit_first might be a reasonable name.

@Centril
Copy link
Contributor

Centril commented Jun 14, 2018

@clarcharr That's a Great name ❤️

`delete_by` would work like `delete`, but also take a binary predicate:
```rust
// `result` contains [1, 2, 3, 5, 6, 7, 8, 9], skipping 4
let result = (1 .. 10).delete_by(4, |x, y| x <= y);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I'd rather have delete_by(|x| x == 4) than delete_by(4, |x, y| x <= y) where 4 is always passed to y. The current way looks very confusing to me.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was using the example in the Haskell source code verbatim.
The example within uses <= to show that you can use an operator different from ==

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, I think the signature should be:

fn delete_by<F>(self, f: F) where F: Fn(&Self::Item) -> bool { ... }

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

The `delete` method removes the first occurrence of its input from the iterator.
Copy link
Contributor

@clarfonthey clarfonthey Jun 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be helpful to add a method signature for delete. Perhaps:

fn delete<T>(self, x: &T) where T: PartialEq<Self::Item> { ... }

# Unresolved questions
[unresolved]: #unresolved-questions

- Maybe `delete` sounds too destructive, could there be a better name?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned in the comments, perhaps omit_first and omit_first_by might be good names? So, you'd have iter.omit_first(&4) and iter.omit_first_by(|x| x < 4).

@ElectricCoffee
Copy link
Author

ElectricCoffee commented Jun 14, 2018

In response to everyone who says iter.delete_by(|x| x < 4) being a better model than iter.delete_by(4, |x, y| x < y); I agree.

I took the inspiration for the structure directly from the Haskell original.
I'm not actually sure what the purpose of the original design was, but I'm sure they had a good reason to do it.

That being said, since Rust doesn't support currying the way Haskell does, so the purpose of splitting the predicate and the queried value has likely been lost.

@Centril
Copy link
Contributor

Centril commented Jun 15, 2018

I've refined my implementation further using omit_first and omit_first_by: https://play.rust-lang.org/?gist=34f7b804496d6440d05c57a89f7689ce&version=nightly&mode=debug

Some possible optimizations are included as well as documentation and some notes on implementation details.


As it stands, there is no _nice_ way of non-destructively removing a single element from an iterator.
There might be a more general way of implementing it that could be more useful in general; however,
as it stands simply using `filter`, `map`, or `fold` won't do the trick when wanting to afect a single item in an unkown spot in the iterator.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filter takes FnMut, so one can absolutely do this with filter. For example:

pub fn delete<T: PartialEq<U>, U>(
    it: impl Iterator<Item = T>,
    item: U,
) -> impl Iterator<Item = T> {
    let mut found = false;
    it.filter(move |x| {
        if !found && x == &item {
            found = true;
            false
        } else {
            true
        }
    })
}

Or abstract out the inners into (using @Centril's Option trick to sometimes use less space)

pub fn not_first<T: Eq>(x: T) -> impl FnMut(&T) -> bool {
    let mut needle = Some(x);
    move |x| {
        if Some(x) == needle.as_ref() {
            needle = None;
            false
        } else {
            true
        }
    }
}

And just use let result = outcomes.iter().filter(not_first(smallest));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice :) I wonder how this will perform compared to my more elaborate implementation, particularly wrt. size_hint and the folds since we know more about the characteristics of a non-filter based implementation. Someone should bench this ^,-

@nielsle
Copy link

nielsle commented Jun 15, 2018

If you are working with vectors, then you can use Vec::retain

let mut vec = vec![1, 2, 3, 4];
vec.retain(|&x| x%2 == 0);
assert_eq!(vec, [2, 4]);

@clarfonthey
Copy link
Contributor

@nielsle That's not what this method does; it only deletes the first occurrence, and would be closer to remove_item.

Anyway, as stated, this is just for iterators and not actually modifying underlying containers.

@joshtriplett
Copy link
Member

At the risk of further bikeshedding on names, what about filter_one? That makes it clear that it has the same signature and behavior as filter, except that it only filters one item.

Or, as another possibility, what about filter_n, which could take a number of items to filter and stop after that many?

@Centril
Copy link
Contributor

Centril commented Jun 16, 2018

@joshtriplett I think the problem with filter_oneis that it has the expectation of only keeping equal elements, which is the same as find. Meanwhile, omit_firsttells the reader that everything except the matching element is kept.

The idea to extend to n elements is interesting! The perf might be slightly worse, but it could be worth it.

@joshtriplett
Copy link
Member

@Centril Good point, filter looks for things that match and this looks for things that don't match. That does indeed make filter the wrong name.

Given that, omit_one or omit_n seem plausible.

@Centril
Copy link
Contributor

Centril commented Jun 16, 2018

@joshtriplett one vs. first; now that is a hard bikeshed ;)

@joshtriplett
Copy link
Member

To me, first has too much of a connotation created by methods like slice_first or split_first. omit_first sounds like a method that should take zero arguments, return &x[1..], and be called tail. ;) omit_one raises a question of "which one", and the argument answers that question.

@ElectricCoffee
Copy link
Author

So what about the case where the user can specify their own predicate? Would that be omit_one_by and omit_n_by?

@Centril
Copy link
Contributor

Centril commented Jun 16, 2018

@joshtriplett Good rationale. 👍 tail is indeed a reasonable interpretation of omit_first.

@ElectricCoffee Depends on whether you want to omit one or n; If we want the general mechanism you'd write .omit_n_by(1, |elt| ...) instead of .omit_one_by(|elt| ...).

@ElectricCoffee
Copy link
Author

@Centril I meant in terms of naming, not in terms of features :)

I wasn't sure if the predicate versions were out of the picture or not

@vincentdephily
Copy link

filter looks for things that match and this looks for things that don't match. That does indeed make filter the wrong name.

Is looking for things that don't match a required behavior, given that we can easily flip the meaning using the predicate ? There'd be little value in adding omit() given that we already have filter(), and I think that the same logic applies here : make this predicate positive like the others in the API, so that the API is simpler to keep in your head.

I'd say only implement a filter_n() method that takes a predicate and a usize. I can use that for all the usecases, and because I know filter() and recognize foo_n() I immediately know what filter_n() does.

@alygin
Copy link

alygin commented Jun 20, 2018

Iterators already have skip(n) and skip_while(predicate), so there could also be a more general skip_n_while(n, predicate) or skip_n_by(n, predicate) in order not to introduce new verbs.

@joshtriplett
Copy link
Member

joshtriplett commented Jun 20, 2018 via email

@gbutler69
Copy link

My preferred names for this:

  • skip_n_while( n, predicate )
  • skip_n_when( n, predicate )

That being said, what would this do for an array like: [ 0, 0, 1, 2, 1, 2, 1, 3, 3, 4, 1, 1, 5, 5, 6, 5, 6, 6 ]?

  • if skip_n_while( 2, |x|x=2 )

Would it be:

[ 0, 0, 1, 1, 1, 3, 3, 4, 1, 1, 5, 5, 6, 5, 6, 6 ]

or would it be:

[ 0, 0, 1, 3, 3, 4, 1, 1, 5, 5, 6, 5, 6, 6 ]

What about, if skip_n_while( 1, |x|x=5 )? Would it be:

[ 0, 0, 1, 2, 1, 2, 1, 3, 3, 4, 1, 1, 5, 6, 6, 6 ]

or would it be:

[ 0, 0, 1, 2, 1, 2, 1, 3, 3, 4, 1, 1, 5, 6, 5, 6, 6 ]

What about skip_n_( 2, |x|x=6 )? Would it be:

[ 0, 0, 1, 2, 1, 2, 1, 3, 3, 4, 1, 1, 5, 5, 5 ]

or would it be:

[ 0, 0, 1, 2, 1, 2, 1, 3, 3, 4, 1, 1, 5, 5, 5, 6 ]

or:

[ 0, 0, 1, 2, 1, 2, 1, 3, 3, 4, 1, 1, 5, 5, 5, 6, 6 ]

In other words, this whole operation seems very ad-hoc. If it is going to only delete the first n occurrences encountered that matches the predicate, regardless of whether or not they are consecutive, then, a name like:

  • skip_n_by( n, predicate )
  • omit_n_by( n, predicate )
  • skip_first_n( n, predicate ) <== this would be preferable in this case

would be better; however, if it will only skip the first n of the first group of consecutive entries that match the predicate, then a name like:

  • skip_first_n_while_consecutive( n, predicate )

would be better. But, if it will delete the first n from each group of consecutive entries that match the predicate then a name like:

  • skip_first_n_of_consecutive( n, predicate )

would be better.

TL;DR

* delete/delete_by is not a good name (despite alignment with what Haskell does)
* alternate names need to reflect what is going on
* the purpose of this seems unclear and rather niche
* there are several similar operations that differ slightly in interpretation and I can't think of a reason that one interpretation is better than another - they all seem like niche cases

Still TL;DR

* probably belongs in iterator tools crate, not in the standard library (too niche!)

@clarfonthey
Copy link
Contributor

Using precedent: Vec::remove_item may give insight that instead of a delete and delete_by pair, we may want a delete_item and delete pair. So, keeping that in mind, perhaps omit is a good choice of wording.

So, omit(n, pred) and omit_item(n, item) might be reasonable. Then omit_one and omit_one_item could specialise to pass 1 as the first argument.

@nielsle
Copy link

nielsle commented Jun 21, 2018

Thank you for the correction @clarcharr

I think that you we can use enumerate to fix example 1 in the RFC.

fn main() {
    let outcomes = vec![5, 2, 2, 6];

    let smallest = outcomes.iter().enumerate()
        .min_by(|(_,x1), (_, x2)| x1.cmp(x2))
        .map(|(i, _)| i)
        .unwrap();

    let result: Vec<_> = outcomes.iter().enumerate()
        .filter(|(i, _)| *i != smallest)
        .map(|(_, x)| x)
        .collect();
        
    println!("{:?} {:?}", smallest, result);
    // prints:     1 [5, 2, 6]
}

@nielsle
Copy link

nielsle commented Jun 25, 2018

Here is another version, that only performs one iteration

    let values = vec![1, 1, 2, 3, 4, 4, 5, 6];
    
    let result: Vec<_> = values.iter()
        .scan(false, |done, &x|
            if *done == false && x == 4 {
                *done = true;
                Some(None)  //
            } else {
                Some(Some(x))
            }
        )
        .filter_map(|x| x)
        .collect();
        
    println!("{:?}",  result);
    // prints:     [1, 1, 2, 3, 4, 5, 6]

@Centril Centril added A-iterators Iterator related proposals & ideas A-types-libstd Proposals & ideas introducing new types to the standard library. labels Nov 22, 2018
@matthiasbeyer
Copy link

Maybe a bit late to the party, but anyways: The naming of these functions (as in the RFC) is completely non-explanatory. I would expect to delete everything that matches the predicate from the iterator - not good!

Also, delete_by-predicate closure taking two arguments? That is also misleading - one would expect to have a iter.delete_by(|element| /* impl */) like predicate, but not a predicate that takes the former and the current (or the current and the next - I don't know) element.

@ElectricCoffee
Copy link
Author

Feel free to come up with a better name, I took these from an existing language

@matthiasbeyer
Copy link

I'd suggest reject_first or delete_first as names, but then again why would we offer such a functionality in std if an equivalent iterator can be build easily? Like others said, I'd rather see this in itertools than in std.

@KodrAus KodrAus added the Libs-Tracked Libs issues that are tracked on the team's project board. label Jul 29, 2020
@m-ou-se
Copy link
Member

m-ou-se commented Jun 16, 2021

We discussed this proposal in the library api team meeting last week, and we felt that the use case for this addition is too niche to warrant another method on the already very large interface of Iterator.

@rfcbot fcp close

@rfcbot
Copy link
Collaborator

rfcbot commented Jun 16, 2021

Team member @m-ou-se has proposed to close this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. disposition-close This RFC is in PFCP or FCP with a disposition to close it. final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. labels Jun 16, 2021
@rfcbot
Copy link
Collaborator

rfcbot commented Jun 21, 2021

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot removed the proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. label Jun 21, 2021
@rfcbot rfcbot added the finished-final-comment-period The final comment period is finished for this RFC. label Jul 1, 2021
@rfcbot
Copy link
Collaborator

rfcbot commented Jul 1, 2021

The final comment period, with a disposition to close, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC is now closed.

@rfcbot rfcbot added to-announce closed This FCP has been closed (as opposed to postponed) and removed final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. disposition-close This RFC is in PFCP or FCP with a disposition to close it. labels Jul 1, 2021
@rfcbot rfcbot closed this Jul 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-iterators Iterator related proposals & ideas A-types-libstd Proposals & ideas introducing new types to the standard library. closed This FCP has been closed (as opposed to postponed) finished-final-comment-period The final comment period is finished for this RFC. Libs-Tracked Libs issues that are tracked on the team's project board. T-libs-api Relevant to the library API team, which will review and decide on the RFC. to-announce
Projects
None yet
Development

Successfully merging this pull request may close these issues.