Add `dedup`, `dedup_by` and `dedup_by_key` to the `Iterator` trait #83748

slerpyyy · 2021-04-01T13:09:07Z

Tracking issue: #83747

rust-highfive · 2021-04-01T13:09:10Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @joshtriplett (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

slerpyyy · 2021-04-01T14:43:42Z

Finally got the CI to stop complaining 🎉

pickfire · 2021-04-01T15:48:59Z

library/core/src/iter/adapters/dedup.rs

+    type Item = T;
+
+    fn next(&mut self) -> Option<Self::Item> {
+        if self.last.is_none() {


Do we have to check this on every iteration?

No, you're right, this is only strictly necessary for the first iteration. I can move this into the constructor.

pickfire · 2021-04-01T15:54:11Z

library/core/src/iter/adapters/dedup.rs

+        }
+
+        let last_item = self.last.as_ref()?;
+        let mut next = loop {


If the iterator is an iter::repeat(1), then this will loop forever?

Yes, this does not terminate for an infinite run of repeated items.

I don't think there is a way for me to fix that without solving the halting problem, so I'll add a warning to the documentation instead.
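To make the two review points above concrete, here is a minimal sketch of a lazy dedup adapter. It is not the code from this PR: the struct layout and the choice to prime `last` in the constructor are assumptions made for illustration. It shows why the `is_none` check is only needed once, and why an endless run of equal items keeps `next` spinning.

```rust
/// Hypothetical stand-in for the adapter under review (not the PR's code).
pub struct Dedup<I: Iterator> {
    inner: I,
    last: Option<I::Item>,
}

impl<I: Iterator> Dedup<I> {
    pub fn new(mut inner: I) -> Self {
        // Prime `last` once here, so `next` does not need to re-check
        // for the "first call" case on every iteration.
        let last = inner.next();
        Dedup { inner, last }
    }
}

impl<I> Iterator for Dedup<I>
where
    I: Iterator,
    I::Item: PartialEq,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        // `take` leaves `None` behind, which is the right state once
        // the inner iterator is exhausted.
        let last = self.last.take()?;
        // Skip items equal to `last`. If the inner iterator yields equal
        // items forever (e.g. `iter::repeat(1)`), this loop never exits.
        while let Some(item) = self.inner.next() {
            if item != last {
                self.last = Some(item);
                break;
            }
        }
        Some(last)
    }
}
```

Priming in the constructor makes construction call the inner iterator eagerly, which is a trade-off a real implementation may not want.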

pickfire · 2021-04-01T15:56:51Z

Hi, thanks for sending in your first pull request. \o/

Just wondering, if an Iterator does not have duplicates in the first place, can there be some sort of method that makes dedup a no-op?

voidc · 2021-04-01T16:04:03Z

Why do we need all three of the structs Dedup, DedupBy, and DedupByKey? The methods dedup and dedup_by_key could be implemented in terms of DedupBy, I think.

pickfire · 2021-04-01T16:08:14Z

library/core/src/iter/adapters/dedup.rs

+/// [`Iterator`]: trait.Iterator.html
+#[unstable(feature = "iter_dedup", reason = "recently added", issue = "83748")]
+#[derive(Debug, Clone, Copy)]
+pub struct Dedup<I, T> {


Should we also implement SourceIter and InPlaceIterable for these?

pickfire · 2021-04-01T16:11:25Z

library/core/src/iter/adapters/dedup.rs

+/// This `struct` is created by the [`dedup`] method on [`Iterator`]. See its
+/// documentation for more.


Suggested change

-/// This `struct` is created by the [`dedup`] method on [`Iterator`]. See its
-/// documentation for more.
+/// This `struct` is created by [`Iterator::dedup`]. See its documentation
+/// for more.

slerpyyy · 2021-04-01T16:11:39Z

@voidc The problem is that I need a way to express the unique ZST of the closure I'm passing into the struct within the signature of the function which I wasn't able to do.

If the signature is

fn dedup<F>(self) -> DedupBy<Self, F, Self::Item>

then I can't create a closure from within the function that matches the user-defined type F.

pickfire · 2021-04-01T16:12:47Z

library/core/src/iter/adapters/dedup.rs

+/// This `struct` is created by the [`dedup_by`] method on [`Iterator`].
+/// See its documentation for more.


Suggested change

-/// This `struct` is created by the [`dedup_by`] method on [`Iterator`].
-/// See its documentation for more.
+/// This `struct` is created by [`Iterator::dedup_by`] or [`Iterator::dedup_by_key`].
+/// See its documentation for more.

Like @voidc mentioned, the fields are even the same so I think they can be merged together.

pickfire · 2021-04-01T16:17:26Z

library/core/src/iter/traits/iterator.rs

+        F: FnMut(&Self::Item) -> K,
+        K: PartialEq,
+    {
+        DedupByKey::new(self, key)


Suggested change

-DedupByKey::new(self, key)
+self.dedup_by(|a, b| key(a) == key(b))

https://doc.rust-lang.org/stable/src/alloc/vec/mod.rs.html#1441

self.dedup_by(|a, b| key(a) == key(b))

Not sure if this will work though.

After that change, what is the signature of the function dedup_by_key?

You can use the C++ way to give your closure type a name:

struct EqByKey<F> {
    f: F,
}

impl<I, K: PartialEq, F: FnMut(&I) -> K> FnOnce<(&I, &I)> for EqByKey<F> {
    type Output = bool;

    extern "rust-call" fn call_once(mut self, (a, b): (&I, &I)) -> bool {
        (self.f)(a) == (self.f)(b)
    }
}

impl<I, K: PartialEq, F: FnMut(&I) -> K> FnMut<(&I, &I)> for EqByKey<F> {
    extern "rust-call" fn call_mut(&mut self, (a, b): (&I, &I)) -> bool {
        (self.f)(a) == (self.f)(b)
    }
}

Edit: just saw #83748 (comment)
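The suggestion in this thread mirrors how `Vec` relates its two methods: comparing keys inside a `dedup_by` predicate gives the same result as `dedup_by_key`. A small sketch on stable `Vec` (the helper name `dedup_both_ways` and the `/ 10` key are made up for illustration):

```rust
/// Deduplicate the same input via `dedup_by_key` and via `dedup_by`
/// with a key comparison, returning both results for comparison.
pub fn dedup_both_ways(input: &[i32]) -> (Vec<i32>, Vec<i32>) {
    let mut by_key = input.to_vec();
    let mut by_pred = input.to_vec();

    // Key: the tens digit group of each element.
    by_key.dedup_by_key(|x| *x / 10);

    // Same key, expressed as a binary predicate over neighbours.
    by_pred.dedup_by(|a, b| *a / 10 == *b / 10);

    (by_key, by_pred)
}
```

This only demonstrates the behavioural equivalence; it says nothing about how the return type of an iterator-based `dedup_by_key` would be spelled, which is the actual sticking point here.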

slerpyyy · 2021-04-01T16:19:04Z

Hi, thanks for sending in your first pull request. \o/

Just wondering, if an Iterator does not have duplicates in the first place, can there be some sort of method that makes dedup a no-op?

I'm not sure I understand what you mean. There isn't really a good way to know whether or not an Iterator is going to contain duplicates or not in advance, so we always have to check for duplicates in every iteration.

pickfire · 2021-04-01T16:19:16Z

Just wondering, how will the performance of this be compared to Vec::dedup_by? I am guessing that will be a lot faster because it eagerly checks all the data, maybe we should benchmark it?

pickfire · 2021-04-01T16:21:46Z

I'm not sure I understand what you mean. There isn't really a good way to know whether or not an Iterator is going to contain duplicates or not in advance, so we always have to check for duplicates in every iteration.

If the iterator came from a HashMap or BTreeMap then it won't have any duplicates.

By the way, dedup_by is also in itertools.

voidc · 2021-04-01T16:43:18Z

@voidc The problem is that I need a way to express the unique ZST of the closure I'm passing into the struct within the signature of the function which I wasn't able to do.

If the signature is
fn dedup<F>(self) -> DedupBy<Self, F, Self::Item>
then I can't create a closure from within the function that matches the user-defined type F.

Could something like that work?

fn dedup(self) -> DedupBy<Self, impl FnMut(&T, &T) -> bool, Self::Item>

or probably

fn dedup(self) -> DedupBy<Self, impl for<'a> FnMut(&'a T, &'a T) -> bool, Self::Item>

slerpyyy · 2021-04-01T17:00:36Z

I tried

fn dedup(self) -> DedupBy<Self, impl FnMut(&Self::Item, &Self::Item) -> bool, Self::Item>
where Self::Item: PartialEq {
    self.dedup_by(|a, b| a == b)
}

and I got

error[E0562]: `impl Trait` not allowed outside of function and inherent method return types
   --> src\dedup.rs:178:37
    |
178 |     fn dedup(self) -> DedupBy<Self, impl FnMut(&Self::Item, &Self::Item) -> bool, Self::Item>
    |                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error: aborting due to previous error

As I understand it, there is no way to express this type in Rust today. There might be ways to simplify these structs using macros, but the current version is the best I could come up with.
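One workaround that does compile on stable, for the non-capturing `dedup` case only, is to name the predicate type as a function pointer instead of a closure. The sketch below is an illustration under assumptions, not a proposal from this thread: `DedupExt`, `eq_items`, and the struct layout are hypothetical names, and the function pointer adds an indirect call that a dedicated type would avoid. It also does not help `dedup_by_key`, whose predicate must capture the key function.

```rust
/// Hypothetical generic adapter, parameterised over a predicate type `F`.
pub struct DedupBy<I: Iterator, F> {
    inner: I,
    same_bucket: F,
    last: Option<I::Item>,
}

impl<I, F> Iterator for DedupBy<I, F>
where
    I: Iterator,
    F: FnMut(&I::Item, &I::Item) -> bool,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        let last = self.last.take()?;
        // Drop every following item the predicate considers a duplicate.
        while let Some(item) = self.inner.next() {
            if !(self.same_bucket)(&item, &last) {
                self.last = Some(item);
                break;
            }
        }
        Some(last)
    }
}

/// A named, non-capturing comparison, so its type can be written down.
fn eq_items<T: PartialEq>(a: &T, b: &T) -> bool {
    a == b
}

/// Hypothetical extension trait standing in for the `Iterator` method.
pub trait DedupExt: Iterator + Sized {
    fn dedup(mut self) -> DedupBy<Self, fn(&Self::Item, &Self::Item) -> bool>
    where
        Self::Item: PartialEq,
    {
        // Coerce the function item to a nameable function pointer type.
        let same_bucket: fn(&Self::Item, &Self::Item) -> bool = eq_items::<Self::Item>;
        let last = self.next();
        DedupBy { inner: self, same_bucket, last }
    }
}

impl<I: Iterator> DedupExt for I {}
```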

AngelicosPhosphoros · 2021-04-01T20:46:36Z

library/core/src/iter/adapters/dedup.rs

+    }
+
+    fn size_hint(&self) -> (usize, Option<usize>) {
+        (0, self.inner.size_hint().1)


I think the lower bound should be

self.last.as_ref().map(|_| 1).unwrap_or(0)

voidc · 2021-04-02T11:57:18Z

This is how you could solve it without having three distinct types: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=7a207cd41e8c9dd691a15a6c721c1678

AngelicosPhosphoros · 2021-04-02T12:21:31Z

library/core/src/iter/adapters/dedup.rs

@@ -50,7 +50,9 @@ where
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
-        (0, self.inner.size_hint().1)
+        let min = self.last.as_ref().map(|_| 1).unwrap_or(0);
+        let max = self.inner.size_hint().1;


You should probably add the lower bound to the upper bound in size_hint.
As it stands, you can end up with (1, Some(0)).

Oh yes, you're right

AngelicosPhosphoros · 2021-04-03T07:46:41Z

library/core/src/iter/adapters/dedup.rs

-        let min = self.last.as_ref().map(|_| 1).unwrap_or(0);
-        let max = self.inner.size_hint().1;
-        (min, max)
+        if self.last.is_some() { (1, self.inner.size_hint().1) } else { (0, Some(0)) }


Well, that doesn't fix my last comment.
You can have self.last.is_some() be true while self.inner.size_hint().1 returns Some(0), which results in (1, Some(0)).
I think you should use something like your previous code:

let from_stored = self.last.as_ref().map(|_| 1).unwrap_or(0);
let inner_upper = self.inner.size_hint().1;
(from_stored, inner_upper.map(move |x| x + from_stored))

In the itertools crate, the following is used:

let (low, hi) = size_hint::add_scalar(self.iter.size_hint(), self.last.is_some() as usize);
((low > 0) as usize, hi)

where add_scalar is defined like this.
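The bound arithmetic discussed in this thread can be written as a standalone function, which makes the `(1, Some(0))` problem easy to check. This is a sketch in the spirit of the itertools snippet quoted above; `dedup_size_hint` is a hypothetical helper, and the saturating/checked additions are an assumption about how overflow should be handled.

```rust
/// Compute a size hint for a dedup adapter from the inner iterator's hint
/// and whether one item is currently buffered in `last`.
pub fn dedup_size_hint(
    inner: (usize, Option<usize>),
    has_last: bool,
) -> (usize, Option<usize>) {
    let stored = has_last as usize;
    let (low, high) = inner;

    // Count the buffered item towards both bounds.
    let low = low.saturating_add(stored);
    let high = high.and_then(|h| h.checked_add(stored));

    // All pending items may be equal, so at most one is guaranteed;
    // the upper bound is reached when no two neighbours are equal.
    ((low > 0) as usize, high)
}
```

With a buffered item and an exhausted inner iterator this yields `(1, Some(1))`, so the lower bound never exceeds the upper bound.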

slerpyyy · 2021-04-03T10:40:36Z

This is how you could solve it without having three distinct types: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=7a207cd41e8c9dd691a15a6c721c1678

This does deduplicate a lot of the code, but I'm not sure we want to add a new trait to the std library for this feature

voidc · 2021-04-03T11:04:33Z

I could imagine the trait to be useful outside this specific feature. It could live in std::cmp. Alternatively, one could implement FnMut(&T, &T) -> bool directly for ByKey<F> with an unstable feature, I think.

Here is an implementation without an additional trait: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=79f50cced999414ea1dae9c5f4e0364d

JohnCSimon · 2021-04-19T00:18:17Z

triage:
@slerpyyy looks like this is still in draft so I'm assigning it back to the author, if it's ready for review please change the tag to S-waiting-on-review
thank you!

@rustbot label: -S-waiting-on-review +S-waiting-on-author

rustbot · 2022-08-21T23:18:23Z

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

Stabilizing library features
Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
Changing public documentation in ways that create new stability guarantees
Changing observable runtime behavior of library APIs

slerpyyy · 2022-08-21T23:20:06Z

Didn't have time to test locally, I'll fix things tomorrow if the CI fails

voidc · 2022-08-22T02:45:17Z

Exposing the internal-only hack of using named structs for the type parameter feels pretty iffy to me (since it's not something we're doing elsewhere to my knowledge and isn't a language feature that's close to stabilization, and so would be a long-term unstable component of the API we couldn't easily remove), and I'm not sure the benefit is all that large.

I wouldn't call the "named closures" approach a hack. On the contrary, this pattern is also used in the implementation of Dedup in itertools. In my opinion, this is an elegant way to avoid duplication in this case. If implementing the Fn trait for the ByKey and ByPartialEq structs directly is of concern, we could use a separate trait (see also the DedupPredicate trait in itertools).

Mark-Simulacrum · 2022-08-22T02:58:50Z

A separate trait (sealed and/or private) seems like a reasonable alternative; the specific problem is the use of direct impls of the Fn traits.
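A rough sketch of what the sealed-trait alternative could look like, with a single adapter type shared by `dedup` and `dedup_by_key` and no direct impls of the `Fn` traits. All names here (`DedupPredicate`, `ByPartialEq`, `ByKey`, the free functions) are illustrative assumptions, loosely following the names used in this discussion; this is not the code from the PR or from itertools.

```rust
mod sealed {
    // Public trait in a private module: outside code cannot implement it.
    pub trait Sealed {}
}

/// Hypothetical predicate trait replacing direct `FnMut` impls.
pub trait DedupPredicate<T>: sealed::Sealed {
    fn same_bucket(&mut self, a: &T, b: &T) -> bool;
}

pub struct ByPartialEq;
pub struct ByKey<F>(F);

impl sealed::Sealed for ByPartialEq {}
impl<F> sealed::Sealed for ByKey<F> {}

impl<T: PartialEq> DedupPredicate<T> for ByPartialEq {
    fn same_bucket(&mut self, a: &T, b: &T) -> bool {
        a == b
    }
}

impl<T, K: PartialEq, F: FnMut(&T) -> K> DedupPredicate<T> for ByKey<F> {
    fn same_bucket(&mut self, a: &T, b: &T) -> bool {
        (self.0)(a) == (self.0)(b)
    }
}

/// One adapter type, parameterised over the predicate.
pub struct DedupBy<I: Iterator, P> {
    inner: I,
    pred: P,
    last: Option<I::Item>,
}

impl<I, P> Iterator for DedupBy<I, P>
where
    I: Iterator,
    P: DedupPredicate<I::Item>,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        let last = self.last.take()?;
        while let Some(item) = self.inner.next() {
            if !self.pred.same_bucket(&item, &last) {
                self.last = Some(item);
                break;
            }
        }
        Some(last)
    }
}

/// Free functions standing in for the `Iterator` methods.
pub fn dedup<I>(mut iter: I) -> DedupBy<I, ByPartialEq>
where
    I: Iterator,
    I::Item: PartialEq,
{
    let last = iter.next();
    DedupBy { inner: iter, pred: ByPartialEq, last }
}

pub fn dedup_by_key<I, K, F>(mut iter: I, key: F) -> DedupBy<I, ByKey<F>>
where
    I: Iterator,
    F: FnMut(&I::Item) -> K,
    K: PartialEq,
{
    let last = iter.next();
    DedupBy { inner: iter, pred: ByKey(key), last }
}
```

Both return types are nameable on stable, and the marker structs appear only as type parameters.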

slerpyyy · 2022-08-22T07:37:42Z

Wait how the hell did this ICE?

rust-log-analyzer · 2022-08-22T08:48:26Z

The job mingw-check failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

    Checking parking_lot v0.11.2
error: an associated function with this name may be added to the standard library in the future
   --> crates/text-edit/src/lib.rs:125:34
    |
125 |         self.indels = iter_merge.dedup_by(|a, b| a == b && !a.delete.is_empty()).cloned().collect();
    |
    |
    = note: `-D unstable-name-collisions` implied by `-D warnings`
    = note: for more information, see issue #48919 <https://github.com/rust-lang/rust/issues/48919>
    = note: for more information, see issue #48919 <https://github.com/rust-lang/rust/issues/48919>
    = help: call with fully qualified syntax `itertools::Itertools::dedup_by(...)` to keep using the current method
    = help: add `#![feature(iter_dedup)]` to the crate attributes to enable `std::iter::Iterator::dedup_by`
    Checking crossbeam v0.8.1
    Checking idna v0.2.3
error: could not compile `text-edit` due to previous error
warning: build failed, waiting for other jobs to finish...

Dylan-DPC · 2022-10-26T13:07:18Z

@slerpyyy any updates on resolving the ICE/CI failure?

slerpyyy · 2022-10-31T00:07:43Z

@Dylan-DPC The ICE seemed to be caused by interaction with the inplace_iteration feature which has been removed, so this is no longer an issue

The CI is currently failing due to an unstable-name-collisions warning, because the itertools crate provides a method with the same name and I don't really know what to do about that
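The compiler's help text points at fully qualified syntax as the way for downstream code to stay unambiguous. The sketch below does not reproduce the `unstable-name-collisions` lint itself; it uses two made-up traits (`FromStd`, `FromCrate`) with a same-named method to show what the disambiguated call looks like.

```rust
/// Hypothetical trait playing the role of the standard library method.
pub trait FromStd {
    fn dedup_demo(&self) -> &'static str {
        "std"
    }
}

/// Hypothetical trait playing the role of the third-party extension trait.
pub trait FromCrate {
    fn dedup_demo(&self) -> &'static str {
        "crate"
    }
}

impl<I: Iterator> FromStd for I {}
impl<I: Iterator> FromCrate for I {}

pub fn pick_crate_version() -> &'static str {
    let it = [1, 2, 3].iter();
    // `it.dedup_demo()` would be rejected as ambiguous (E0034) because
    // both traits are in scope; naming the trait resolves it.
    FromCrate::dedup_demo(&it)
}

pub fn pick_std_version() -> &'static str {
    let it = [1, 2, 3].iter();
    FromStd::dedup_demo(&it)
}
```

In the failing crate from the log, the analogous change would be the `itertools::Itertools::dedup_by(...)` form that the help message suggests.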

safinaskar · 2023-04-06T13:27:12Z

@slerpyyy, does this patch solve this problem with partition_dedup_by_key: #54279 (comment)?

JohnCSimon · 2023-05-28T13:59:49Z

@slerpyyy

Ping from triage: I'm closing this due to inactivity (last touched in October 2022). Please reopen when you are ready to continue with this.
Note: if you are going to continue, please reopen the PR BEFORE you push to it, else you won't be able to reopen it. This is a quirk of GitHub.
Thanks for your contribution.

@rustbot label: +S-inactive

rust-highfive assigned joshtriplett Apr 1, 2021

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 1, 2021

slerpyyy mentioned this pull request Apr 1, 2021

Tracking Issue for Iterator::{dedup, dedup_by, dedup_by_key} #83747

Open

3 tasks

pickfire reviewed Apr 1, 2021

AngelicosPhosphoros reviewed Apr 1, 2021

AngelicosPhosphoros reviewed Apr 2, 2021

AngelicosPhosphoros reviewed Apr 3, 2021

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 19, 2021

slerpyyy and others added 11 commits August 22, 2022 00:57

Run formatter

1a04987

Added more documentation

69b18f7

Simplified type signature

d8aebea

Defer initial call to next + lots of inline tags

5dec9ce

Make use of Option::unwrap_unchecked and prevent user from contruct…

6cd757a

…ing `ByPartialEq`

Update library/core/src/iter/adapters/dedup.rs

fcbff04

Co-authored-by: Cameron Steffen <cam.steffen94@gmail.com>

Refactor Dedup::next as suggested by @camsteffen

9a906dd

whoops

3fbbfdb

Update library/core/src/iter/adapters/dedup.rs

9648fb5

Co-authored-by: Anders Kaseorg <andersk@mit.edu>

Fix inplace_iteration related impls

67d01d5

Go back to separate iterators

a822aad

slerpyyy force-pushed the feature_iter_dedup branch from e6edf5c to a822aad Compare August 21, 2022 23:18

Remove inplace_iteration traits because of ICE

f57678c

Dylan-DPC added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 19, 2022

JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 1, 2023

rustbot added the S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. label May 28, 2023

Dylan-DPC closed this Jul 17, 2023
