Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dedup, dedup_by and dedup_by_key to the Iterator trait #83748

Closed
wants to merge 24 commits into from

Conversation

slerpyyy
Copy link

@slerpyyy slerpyyy commented Apr 1, 2021

Tracking issue: #83747

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @joshtriplett (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 1, 2021
@rust-log-analyzer

This comment has been minimized.

@slerpyyy
Copy link
Author

slerpyyy commented Apr 1, 2021

Finally got the CI to stop complaining 🎉

@rust-log-analyzer

This comment has been minimized.

type Item = T;

fn next(&mut self) -> Option<Self::Item> {
if self.last.is_none() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to check this on every iteration?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you're right, this is only strictly necessary for the first iteration. I can move this into the constructor.

}

let last_item = self.last.as_ref()?;
let mut next = loop {
Copy link
Contributor

@pickfire pickfire Apr 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the iterator is a iter::repeat(1) then this will loop forever?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this does not terminate for infinite repeating items

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is a way for me to fix that without solving the halting problem, so I'll add a warning to the documentation instead

@pickfire
Copy link
Contributor

pickfire commented Apr 1, 2021

Hi, thanks for sending in your first pull request. \o/

Just wondering, if an Iterator does not have duplicates in the first place, can there some sort of method that makes dedup a no-op?

@voidc
Copy link
Contributor

voidc commented Apr 1, 2021

Why do we need all three of the structs Dedup, DedupBy, and DedupByKey? The methods dedup and dedup_by_key could be implemented in terms of DedupBy, I think.

/// [`Iterator`]: trait.Iterator.html
#[unstable(feature = "iter_dedup", reason = "recently added", issue = "83748")]
#[derive(Debug, Clone, Copy)]
pub struct Dedup<I, T> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also implement SourceIter and InPlaceIterable for these?

Comment on lines 6 to 7
/// This `struct` is created by the [`dedup`] method on [`Iterator`]. See its
/// documentation for more.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// This `struct` is created by the [`dedup`] method on [`Iterator`]. See its
/// documentation for more.
/// This `struct` is created by [`Iterator::dedup`]. See its documentation
/// for more.

@slerpyyy
Copy link
Author

slerpyyy commented Apr 1, 2021

@voidc The problem is that I need a way to express the unique ZST of the closure I'm passing into the struct within the signature of the function which I wasn't able to do.

If the signature is

fn dedup<F>(self) -> DedupBy<Self, F, Self::Item>

then I can't create a closure from within the function that maches the user defined type F.

Comment on lines 61 to 82
/// This `struct` is created by the [`dedup_by`] method on [`Iterator`].
/// See its documentation for more.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// This `struct` is created by the [`dedup_by`] method on [`Iterator`].
/// See its documentation for more.
/// This `struct` is created by [`Iterator::dedup_by`] or [`Iterator::dedup_by_key`].
/// See its documentation for more.

Like @voidc mentioned, the fields are even the same so I think they can be merged together.

F: FnMut(&Self::Item) -> K,
K: PartialEq,
{
DedupByKey::new(self, key)
Copy link
Contributor

@pickfire pickfire Apr 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DedupByKey::new(self, key)
self.dedup_by(|a, b| key(a) == key(b))

https://doc.rust-lang.org/stable/src/alloc/vec/mod.rs.html#1441

        self.dedup_by(|a, b| key(a) == key(b))

Not sure if this will work though.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After that change, what is the signature of the function dedup_by_key?

Copy link
Contributor

@cjgillot cjgillot Apr 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the C++ way to give your closure type a name:

struct EqByKey<F> {
  f: F
}
impl<I, K: PartialEq, F: FnMut(&I) -> K> FnOnce<(&I, &I)> for EqByKey<F> {
  type Output = bool;
  extern "rust-call" fn call_once(mut self, (a, b): (&I, &I)) -> bool {
    (self.f)(a) == (self.f)(b)
  }
}
impl<I, K: PartialEq, F: FnMut(&I) -> K> FnMut<(&I, &I)> for EqByKey<F> {
  extern "rust-call" fn call_mut(&mut self, (a, b): (&I, &I)) -> bool {
    (self.f)(a) == (self.f)(b)
  }
}

Edit: just saw #83748 (comment)

@slerpyyy
Copy link
Author

slerpyyy commented Apr 1, 2021

Hi, thanks for sending in your first pull request. \o/

Just wondering, if an Iterator does not have duplicates in the first place, can there some sort of method that makes dedup a no-op?

I'm not sure I understand what you mean. There isn't really a good way to know whether or not an Iterator is going to contain duplicates or not in advance, so we always have to check for duplicates in every iteration.

@pickfire
Copy link
Contributor

pickfire commented Apr 1, 2021

Just wondering, how will the performance of this be compared to Vec::dedup_by? I am guessing that will be a lot faster because it eagerly checks all the data, maybe we should benchmark it?

@pickfire
Copy link
Contributor

pickfire commented Apr 1, 2021

I'm not sure I understand what you mean. There isn't really a good way to know whether or not an Iterator is going to contain duplicates or not in advance, so we always have to check for duplicates in every iteration.

If the iterator came from a HashMap or BTreeMap then it won't have any duplicates.

By the way, dedup_by is also in itertools.

@voidc
Copy link
Contributor

voidc commented Apr 1, 2021

@voidc The problem is that I need a way to express the unique ZST of the closure I'm passing into the struct within the signature of the function which I wasn't able to do.

If the signature is

fn dedup<F>(self) -> DedupBy<Self, F, Self::Item>

then I can't create a closure from within the function that maches the user defined type F.

Could something like that work?

fn dedup(self) -> DedupBy<Self, impl FnMut(&T, &T) -> bool, Self::Item>

or probably

fn dedup(self) -> DedupBy<Self, impl for<'a> FnMut(&'a T, &'a T) -> bool, Self::Item>

@slerpyyy
Copy link
Author

slerpyyy commented Apr 1, 2021

I tried

fn dedup(self) -> DedupBy<Self, impl FnMut(&Self::Item, &Self::Item) -> bool, Self::Item>
where Self::Item: PartialEq {
    self.dedup_by(|a, b| a == b)
}

and I got

error[E0562]: `impl Trait` not allowed outside of function and inherent method return types
   --> src\dedup.rs:178:37
    |
178 |     fn dedup(self) -> DedupBy<Self, impl FnMut(&Self::Item, &Self::Item) -> bool, Self::Item>
    |                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error: aborting due to previous error

As I understand there is no way to express this type in Rust today. There might be ways to simplify these structs using macros, but the current version is the best I could come up with.

}

fn size_hint(&self) -> (usize, Option<usize>) {
(0, self.inner.size_hint().1)
Copy link
Contributor

@AngelicosPhosphoros AngelicosPhosphoros Apr 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, lower bound should be

self.last().as_ref().map(|_|1).unwrap_or(0)

@voidc
Copy link
Contributor

voidc commented Apr 2, 2021

This is how you could solve it without having three distinct types: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=7a207cd41e8c9dd691a15a6c721c1678

@@ -50,7 +50,9 @@ where
}

fn size_hint(&self) -> (usize, Option<usize>) {
(0, self.inner.size_hint().1)
let min = self.last.as_ref().map(|_| 1).unwrap_or(0);
let max = self.inner.size_hint().1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add lower bound to size_hint, probably.
You now can end with situation (1, Some(0)).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, you're right

let min = self.last.as_ref().map(|_| 1).unwrap_or(0);
let max = self.inner.size_hint().1;
(min, max)
if self.last.is_some() { (1, self.inner.size_hint().1) } else { (0, Some(0)) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it doesn't fix my last comment.
You can have self.last.is_some() true and self.inner.size_hint().1 returning Some(0) which result in (1, Some(0)).
I think, you should use something like your previous code:

let from_stored = self.last.as_ref().map(|_| 1).unwrap_or(0);
let inner_upper = self.inner.size_hint().1;
(from_stored, inner_upper.map(move|x|x+from_stored))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the itertools crate, the following is used:

let (low, hi) = size_hint::add_scalar(self.iter.size_hint(), self.last.is_some() as usize);
((low > 0) as usize, hi)

where add_scalar is defined like this.

@slerpyyy
Copy link
Author

slerpyyy commented Apr 3, 2021

This is how you could solve it without having three distinct types: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=7a207cd41e8c9dd691a15a6c721c1678

This does deduplicate a lot of the code, but I'm not sure we want to add a new trait to the std library for this feature

@voidc
Copy link
Contributor

voidc commented Apr 3, 2021

I could imagine the trait to be useful outside this specific feature. It could live in std::cmp. Alternatively, one could implement FnMut(&T, &T) -> bool directly for ByKey<F> with an unstable feature, I think.

Here is an implementation with out an additional trait: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=79f50cced999414ea1dae9c5f4e0364d

@JohnCSimon
Copy link
Member

triage:
@slerpyyy looks like this is still in draft so I'm assigning it back to the author, if it's ready for review please change the tag to S-waiting-on-review
thank you!

@rustbot label: -S-waiting-on-review +S-waiting-on-author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 19, 2021
@rustbot
Copy link
Collaborator

rustbot commented Aug 21, 2022

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

@slerpyyy
Copy link
Author

Didn't have time to test locally, I'll fix things tomorrow if the CI fails

@rust-log-analyzer

This comment has been minimized.

@voidc
Copy link
Contributor

voidc commented Aug 22, 2022

Exposing the internal-only hack of using named structs for the type parameter feels pretty iffy to me (since it's not something we're doing elsewhere to my knowledge and isn't a language feature that's close to stabilization, and so would be a long-term unstable component of the API we couldn't easily remove), and I'm not sure the benefit is all that large.

I wouldn't call the "named closures" approach a hack. On the contrary, this pattern is also used in the implementation of Dedup in itertools. In my opinion, this is an elegant way to avoid duplication in this case. If implementing the Fn trait for the ByKey and ByPartialEq structs directly is of concern, we could use a separate trait (see also the DedupPredicate trait in itertools).

@Mark-Simulacrum
Copy link
Member

A separate trait (sealed and/or private) seems like a reasonable alternative; the specific problem is the use of direct impls of the Fn traits.

@slerpyyy
Copy link
Author

Wait how the hell did this ICE?

@rust-log-analyzer
Copy link
Collaborator

The job mingw-check failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
    Checking parking_lot v0.11.2
error: an associated function with this name may be added to the standard library in the future
   --> crates/text-edit/src/lib.rs:125:34
    |
125 |         self.indels = iter_merge.dedup_by(|a, b| a == b && !a.delete.is_empty()).cloned().collect();
    |
    |
    = note: `-D unstable-name-collisions` implied by `-D warnings`
    = note: for more information, see issue #48919 <https://github.com/rust-lang/rust/issues/48919>
    = note: for more information, see issue #48919 <https://github.com/rust-lang/rust/issues/48919>
    = help: call with fully qualified syntax `itertools::Itertools::dedup_by(...)` to keep using the current method
    = help: add `#![feature(iter_dedup)]` to the crate attributes to enable `std::iter::Iterator::dedup_by`
    Checking crossbeam v0.8.1
    Checking idna v0.2.3
error: could not compile `text-edit` due to previous error
warning: build failed, waiting for other jobs to finish...

@Dylan-DPC Dylan-DPC added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 19, 2022
@Dylan-DPC
Copy link
Member

@slerpyyy any updates on resolving the ICE/CI failure?

@slerpyyy
Copy link
Author

@Dylan-DPC The ICE seemed to be caused by interaction with the inplace_iteration feature which has been removed, so this is no longer an issue

The CI is currently failing due to an unstable-name-collisions warning, because the itertools crate provides a method with the same name and I don't really know what to do about that

@JohnCSimon JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 1, 2023
@safinaskar
Copy link
Contributor

@slerpyyy , does this patch solve this problem with partition_dedup_by_key: #54279 (comment) ?

@JohnCSimon
Copy link
Member

@slerpyyy

Ping from triage: I'm closing this due to inactivity, - last touched in October 2022 Please reopen when you are ready to continue with this.
Note: if you are going to continue please open the PR BEFORE you push to it, else you won't be able to reopen - this is a quirk of github.
Thanks for your contribution.

@rustbot label: +S-inactive

@rustbot rustbot added the S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. label May 28, 2023
@Dylan-DPC Dylan-DPC closed this Jul 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet