Powerset iterator adaptor #335

willcrozi · 2019-03-01T11:05:21Z

Implements a powerset iterator adaptor that iterates over all subsets of the input iterator's elements. Returns vectors representing these subsets. Internally uses Combinations of increasing length.
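To make the "Combinations of increasing length" idea concrete, here is a minimal, eager sketch using only the standard library. It is not the code from this PR: the function names and the recursive `combinations` helper are assumptions made for illustration, and the real adaptor is lazy rather than collecting into a `Vec`.

```rust
/// Illustrative helper (not the crate's `Combinations`): all k-element
/// combinations of `pool`, in lexicographic order of positions.
fn combinations<T: Clone>(pool: &[T], k: usize) -> Vec<Vec<T>> {
    if k == 0 {
        // Exactly one combination of length 0: the empty one.
        return vec![Vec::new()];
    }
    let mut out = Vec::new();
    for (i, first) in pool.iter().enumerate() {
        // Pick `first`, then choose the remaining k - 1 items to its right.
        for mut rest in combinations(&pool[i + 1..], k - 1) {
            rest.insert(0, first.clone());
            out.push(rest);
        }
    }
    out
}

/// The powerset as a chain of combinations of length 0, 1, ..., n.
fn powerset<T: Clone>(pool: &[T]) -> Vec<Vec<T>> {
    (0..=pool.len())
        .flat_map(|k| combinations(pool, k))
        .collect()
}

fn main() {
    let sets = powerset(&[1, 2, 3]);
    assert_eq!(sets.len(), 8); // 2^3 subsets
    println!("{:?}", sets);
}
```

The subsets come out ordered by length first, which is the ordering a chain of combinations naturally produces.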

I've taken the strategy of using a 'position' field that both detects the special case of the first element and allows an optimal size_hint() implementation.

Additionally, there is a commit that alters the Combinations implementation slightly to improve performance. I've added a Combinations benchmark as a stand-alone commit to allow checking for performance regressions. After this commit, Powerset performance improves by 10-30% in some cases (with small sizes of n).

This is my first attempt at a Rust contribution, happy to put in whatever discussion/work is needed to get this merged. Cheers!

jswrenn

I want to get this into the next Itertools release! Sorry for the long delay in review!

willcrozi · 2020-09-23T10:13:43Z

> I want to get this into the next Itertools release! Sorry for the long delay in review!

Great to hear, thanks!

I've rewritten this feature since the original pull request. I think the code is ready, and I was intending to write some explanatory notes, which I can post here later today.

It seems the CI build for the Powerset benchmark times out (it completes successfully on my fork). I'll take a look into this.

willcrozi · 2020-09-23T11:17:20Z

Travis CI jobs 2 and 3 are both getting stuck and terminated after a 10 minute limit. I've reduced the length of the powerset benchmarks (needed doing anyway TBH) but this seems related to caching.

```
Setting up build cache

adding /home/travis/.cargo to cache
creating directory /home/travis/.cargo
adding /home/travis/build/rust-itertools/itertools/target to cache
creating directory /home/travis/build/rust-itertools/itertools/target
adding /home/travis/.rustup to cache
creating directory /home/travis/.rustup
adding /home/travis/.cache/sccache to cache
creating directory /home/travis/.cache/sccache

No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.

Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#build-times-out-because-no-output-was-received

The build has been terminated
```

willcrozi · 2020-09-23T14:08:59Z

My own use case for this adaptor relates to generating passwords to feed into tools such as hashcat. Since a powerset is essentially a chain of combinations of increasing length over the input set, I've made use of the Combinations iterator.

The additions to Combinations and LazyBuffer allow a Powerset to efficiently expand and reuse a single Combinations rather than repeatedly creating new ones (with the associated cost of cloning the source buffer). This makes a difference in performance for my use case (lots of small powersets within large iterator chains), but for most uses the cost is likely amortized for larger powersets.

Powerset's need to detect the special case of its first element, combined with the usefulness of a functional size hint for large powersets, seems to me to warrant Powerset's pos field.

size_hint::two_exp() was added for Powerset's size hint but is quite specific and could be folded into Powerset's size_hint method instead.
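As a rough illustration of what such a helper might look like, the sketch below maps a size hint for `n` to a size hint for `2^n` and then subtracts the number of subsets already yielded. The body of `two_exp`, the `sub_yielded` helper, and the overflow policy (saturating lower bound, unknown upper bound) are all assumptions for illustration, not the code from this PR.

```rust
use std::convert::TryFrom;

type SizeHint = (usize, Option<usize>);

/// Hypothetical helper: turns a size hint for `n` into one for `2^n`.
fn two_exp(sh: SizeHint) -> SizeHint {
    // 2^n as a usize, or None when it does not fit.
    let pow = |n: usize| u32::try_from(n).ok().and_then(|e| 1usize.checked_shl(e));
    // Lower bound saturates; upper bound becomes unknown on overflow.
    (pow(sh.0).unwrap_or(usize::MAX), sh.1.and_then(pow))
}

/// Hypothetical helper: subtracts the count of items already yielded (`pos`).
fn sub_yielded(sh: SizeHint, pos: usize) -> SizeHint {
    (sh.0.saturating_sub(pos), sh.1.map(|hi| hi.saturating_sub(pos)))
}

fn main() {
    // A source of exactly 4 elements has 16 subsets in total.
    let total = two_exp((4, Some(4)));
    assert_eq!(total, (16, Some(16)));
    // After yielding 5 subsets, 11 remain.
    assert_eq!(sub_yielded(total, 5), (11, Some(11)));
    // Sources too large for usize: saturated lower bound, unknown upper bound.
    assert_eq!(two_exp((usize::MAX, None)), (usize::MAX, None));
}
```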

The benchmarks for Combinations were added to check for performance regressions due to the current modifications. The Powerset benchmark was added for completeness.

Let me know what you think.

phimuemue

Thanks for this! A powerset adaptor probably comes in very handy.

I sympathize with the idea of reusing Combinations, but I think it would be easier to avoid all the inner special cases.

phimuemue · 2020-09-25T08:30:27Z

src/combinations.rs

+    #[inline]
+    pub fn k(&self) -> usize { self.indices.len() }
+
+    /// Returns the (current) length of the pool from which combination elements are
+    /// selected. This value can change between invocations of `next()` and `init()`.
+    #[inline]
+    pub(crate) fn n(&self) -> usize { self.pool.len() }
+
+    /// Returns a reference to the source iterator.
+    #[inline]
+    pub(crate) fn src(&self) -> &I { &self.pool.it }


Do we really need (all of) these? E.g. at some point, we stored k explicitly, and we threw it out because we then always had multiple ways (namely k vs. indices.len) to compute the very same thing, which just bloated the iterator.

These are mainly a consequence of handing most of our information over to Combinations. We could remove src() at the expense of losing a decent size_hint() but I think k() and n() have more justification:

- k() and n() are used within Powerset to detect iterator completion. The alternatives I could see were a Powerset::done field (bloat) or just allowing a completed Powerset to keep increasing the size of k for its Combinations on every call to next() (wasteful, could allocate).

- A combination's k and n are also part of the formal notation for combinations in set theory, so I thought it might make sense to have them visible.

- k() and n() are also necessary for the size_hint() impl.

- They could potentially make code more readable, e.g. self.indices.len() vs. self.k().

I understand that we shouldn't jump through too many hoops for size_hint (and there are quite a few hoops here!), but since powersets can become large very quickly as input size increases, having one might be important for users.
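The completion check described above can be sketched with a simplified stand-in. This is an assumption-laden illustration, not the PR's code: the pool here is an eagerly filled `Vec` (the real one is lazy, so `n` can grow), and the `first` flag and `reset` signature are invented for the example.

```rust
/// Simplified stand-in for `Combinations` over an already-collected pool.
struct Combinations<T> {
    pool: Vec<T>,
    indices: Vec<usize>, // current combination; its length is k
    first: bool,
}

impl<T: Clone> Combinations<T> {
    fn new(pool: Vec<T>, k: usize) -> Self {
        Combinations { pool, indices: (0..k).collect(), first: true }
    }
    fn k(&self) -> usize { self.indices.len() }
    fn n(&self) -> usize { self.pool.len() }
    /// Restart with combinations of length `k`, reusing the same pool.
    fn reset(&mut self, k: usize) {
        self.indices = (0..k).collect();
        self.first = true;
    }
}

impl<T: Clone> Iterator for Combinations<T> {
    type Item = Vec<T>;
    fn next(&mut self) -> Option<Vec<T>> {
        let (k, n) = (self.k(), self.n());
        if self.first {
            if k > n { return None; }
            self.first = false;
        } else {
            // Find the rightmost index that can still be advanced.
            let mut i = k;
            while i > 0 && self.indices[i - 1] == i - 1 + n - k { i -= 1; }
            if i == 0 { return None; } // all indices at their maximum
            self.indices[i - 1] += 1;
            for j in i..k { self.indices[j] = self.indices[j - 1] + 1; }
        }
        Some(self.indices.iter().map(|&i| self.pool[i].clone()).collect())
    }
}

struct Powerset<T> { combs: Combinations<T> }

impl<T: Clone> Powerset<T> {
    fn new(pool: Vec<T>) -> Self { Powerset { combs: Combinations::new(pool, 0) } }
}

impl<T: Clone> Iterator for Powerset<T> {
    type Item = Vec<T>;
    fn next(&mut self) -> Option<Vec<T>> {
        if let Some(subset) = self.combs.next() {
            return Some(subset);
        }
        // Only grow k while k < n, so a finished Powerset stays finished
        // without a `done` field and without enlarging `indices` forever.
        if self.combs.k() < self.combs.n() {
            let k = self.combs.k() + 1;
            self.combs.reset(k);
            self.combs.next()
        } else {
            None
        }
    }
}

fn main() {
    let sets: Vec<Vec<i32>> = Powerset::new(vec![1, 2, 3]).collect();
    assert_eq!(sets.len(), 8);
}
```

Calling `next()` repeatedly after exhaustion keeps returning `None` and leaves `k` equal to `n`, which is the behaviour the `k() < n()` check is meant to guarantee.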

willcrozi · 2020-09-26T12:00:12Z

Hi, thanks for the detailed review!

> I sympathize with the idea of reusing Combinations, but I think it would be easier to avoid all the inner special cases.

Agreed, please see my replies to your specific comments.

It seems there's a few issues that weave together here.

What I'm thinking:

Scrap the Inner enum and move to using increasing length Combinations from the outset, simplifying Powerset's next() impl as you suggest.
Decide between the following:
- Keep size_hint impl (and therefore k(), n() and src() methods on Combinations)
- Scrap size_hint (removing Powerset::pos) and decide between:
  - Keeping k() and n() on Combinations
  - Add done field to Powerset
  - Some other way for Combinations to indicate a completed state?

Any idea on how to fix the CI failures on jobs 2 and 3? (It gets stuck setting up sccache by the looks of it.)

willcrozi · 2020-09-26T15:06:02Z

I've simplified the Powerset implementation as per suggestions (with k() and n() still included for now).

The first additional commit excludes impl of size_hint and the second includes it for comparison.

I'll take some benchmarks when I get the chance to compare to the original approach.

phimuemue

Hi there! Thanks for addressing the points - imho this improved the PR quite a bit!

> Decide between the following:
>
> * Keep `size_hint` impl (and therefore `k()`, `n()` and `src()` methods on `Combinations`)
> * Scrap `size_hint` (removing `Powerset::pos`) and decide between:
>   * Keeping `k()` and `n()` on `Combinations`
>   * Add `done` field to `Powerset`
>   * Some other way for `Combinations` to indicate a completed state?

I think we could start out without size_hint - and if we really need it, possibly implement it for Combinations and compute Powerset's size_hint in terms of Combinations' size_hint. However, let's wait for @jswrenn's opinion.

As a side note: Could you rebase your commits, so that they are easier to review? It would make subsequent reviews much easier. Maybe separate your work along the lines of:

* Pure formatting changes (if necessary)
* Typos
* Actual implementation
* Benchmarks

@jswrenn We are increasingly confronted with huge PRs that could be easily split into smaller ones. Should we encourage this somewhere in our guidelines? (Do we have such guidelines?)

phimuemue · 2020-09-26T18:58:57Z

src/lazy_buffer.rs

@@ -44,6 +43,25 @@ where
            }
        }
    }
+
+    pub fn prefill(&mut self, len: usize) -> bool {


I think the return value is not used anywhere, so we should possibly omit it for simplicity.

Possibly for another PR: Could we unify prefill and get_next?

> I think the return value is not used anywhere, so we should possibly omit it for simplicity.

Yes, I've removed this.

> Possibly for another PR: Could we unify prefill and get_next?

Yes, that would be good. Maybe implementing one in terms of the other is the way to go?
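One way "implementing one in terms of the other" could look is sketched below. The struct is a simplified stand-in written for this example (field names and method bodies are assumptions, not the crate's actual `LazyBuffer`), with `prefill` expressed purely through `get_next` and no return value.

```rust
/// Simplified stand-in for a lazily filled buffer (not the crate's type).
struct LazyBuffer<I: Iterator> {
    it: I,
    done: bool,
    buffer: Vec<I::Item>,
}

impl<I: Iterator> LazyBuffer<I> {
    fn new(it: I) -> Self {
        LazyBuffer { it, done: false, buffer: Vec::new() }
    }

    fn len(&self) -> usize {
        self.buffer.len()
    }

    /// Pulls one item from the source; returns `false` once it is exhausted.
    fn get_next(&mut self) -> bool {
        if self.done {
            return false;
        }
        match self.it.next() {
            Some(x) => {
                self.buffer.push(x);
                true
            }
            None => {
                self.done = true;
                false
            }
        }
    }

    /// Fills the buffer up to `len` items, or until the source runs dry.
    fn prefill(&mut self, len: usize) {
        while self.buffer.len() < len && self.get_next() {}
    }
}

fn main() {
    let mut buf = LazyBuffer::new(1..=3);
    buf.prefill(2);
    assert_eq!(buf.len(), 2);
    buf.prefill(10); // the source only has 3 items
    assert_eq!(buf.len(), 3);
}
```

The trade-off is that a loop over `get_next` pushes one item at a time, whereas a dedicated `prefill` could extend the buffer in bulk; whether that matters would need benchmarking.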

willcrozi · 2020-09-27T01:51:40Z

> I think we could start out without size_hint - and if we really need it, possibly implement it for Combinations and compute Powerset's size_hint in terms of Combinations' size_hint. However, let's wait for @jswrenn's opinion.

Fair enough, though I'm not sure it's practical to compute a size_hint for Combinations since the formula involves three factorials: n! / (k! (n - k)!). A powerset's length is much easier to compute directly: 2^n.
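For reference, the binomial coefficient can be evaluated without computing any factorial by using the multiplicative formula, as in the sketch below. The `binomial` function is a hypothetical helper, not part of this PR, and it does not settle the practicality question: it still overflows for large inputs (and may report overflow conservatively when an intermediate product does not fit), and a size hint for a partially consumed Combinations would also need to account for the current position.

```rust
/// Hypothetical helper: C(n, k) via the multiplicative formula.
/// Returns `None` if an intermediate product overflows `usize`.
fn binomial(n: usize, k: usize) -> Option<usize> {
    if k > n {
        return Some(0);
    }
    let k = k.min(n - k); // C(n, k) == C(n, n - k)
    let mut result: usize = 1;
    for i in 1..=k {
        // After this step, `result` equals C(n - k + i, i), so the
        // division is always exact.
        result = result.checked_mul(n - k + i)? / i;
    }
    Some(result)
}

fn main() {
    assert_eq!(binomial(5, 2), Some(10));

    // Summing C(n, k) over all k gives the powerset length 2^n.
    let n = 10;
    let total: usize = (0..=n).map(|k| binomial(n, k).unwrap()).sum();
    assert_eq!(total, 1 << n);
}
```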

> As a side note: Could you rebase your commits, so that they are easier to review? It would make subsequent reviews much easier. Maybe separate your work along the lines of:
> * Pure formatting changes (if necessary)
> * Typos
> * Actual implementation
> * Benchmarks

No problem! I've split into two commits (impl and benchmarks). I've also made the doc comments more in line with the rest of the library.

phimuemue

Spotted some minor things to bring powerset in line with other methods, etc.

phimuemue · 2020-09-28T20:02:27Z

src/powerset.rs

+
+/// Create a new `Powerset` from a clonable iterator.
+pub fn powerset<I>(src: I) -> Powerset<I>
+    where I: Iterator,


Should this be IntoIterator instead of Iterator? (Afaik the other methods use IntoIterator.)

combinations and all the others I've looked at so far have I: Iterator in the where clause.
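The practical difference between the two bounds is only what the caller is allowed to pass. The sketch below uses two hypothetical counting functions (not anything from the crate) to show it.

```rust
/// Hypothetical function bounded on `Iterator`: callers must pass an iterator.
fn count_iter<I: Iterator>(src: I) -> usize {
    src.count()
}

/// Hypothetical function bounded on `IntoIterator`: callers may pass
/// anything convertible into an iterator, including iterators themselves.
fn count_into<I: IntoIterator>(src: I) -> usize {
    src.into_iter().count()
}

fn main() {
    let v = vec![1, 2, 3];
    assert_eq!(count_iter(v.iter()), 3); // explicit `.iter()` required
    assert_eq!(count_into(v.iter()), 3); // every Iterator is IntoIterator
    assert_eq!(count_into(&v), 3);       // `&Vec<T>` is IntoIterator
    assert_eq!(count_into(v), 3);        // so is `Vec<T>` itself
}
```

Either bound works for a free function like `powerset`; which one to use is a consistency question for the library.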

willcrozi · 2020-09-28T21:53:40Z

Just to check: what's the preferred approach when I've got changes based on PR review feedback?

Add a new commit specifically addressing the feedback?
or
Rebase then force-push, splitting review changes across commits according to category (typos, implementation, benchmarks etc)?

willcrozi · 2020-09-29T22:56:23Z

I've pushed changes addressing the resolved feedback so far.

I've split the benchmarks into two commits and ordered them so that it's easier to check any effect from the changes introduced to Combinations. I've also tweaked the doc comment for Combinations::reset() to state "If k is larger than the current length of the data pool an attempt is made to prefill...", which is more correct.

As far as I can see the outstanding issues are:

- Whether we perform the extra check for the special case of k == 0 (see discussion here)
- Whether we want a size_hint() implementation (see discussion here and here)

Let me know what you think and if there are any more!

jswrenn · 2020-10-01T17:07:34Z

> Fair enough, though I'm not sure it's practical to compute a size_hint for Combinations since the formula involves three factorials: n! / (k! (n - k)!). A powerset's length is much easier to compute directly: 2^n.

I'm basically convinced by this that we should have a size_hint method. I don't see any harm in it, either.

jswrenn · 2020-12-07T18:51:28Z

It's about time this gets merged. Worst case scenario, we can always fix issues in subsequent releases! Thanks for contributing this.

bors r+

335: Powerset iterator adaptor r=jswrenn a=willcrozi

Implements a [powerset](https://en.wikipedia.org/wiki/Power_set) iterator adaptor that iterates over all subsets of the input iterator's elements. Returns vectors representing these subsets. Internally uses `Combinations` of increasing length.

I've taken the strategy of using a 'position' field that acts both as a means to detect the special case of the first element and also allows optimal `size_hint()` implementation.

Additionally there is a commit to improve performance that alters `Combinations` implementation slightly. I've added Combinations benchmark as a stand-alone commit to allow checking for performance regressions. `Powerset` performance after this commit improves some cases (with small sizes of `n`) by 10-30%.

This is my first attempt at a Rust contribution, happy to put in whatever discussion/work to get this merged. Cheers!

Co-authored-by: Will Crozier <willcrozi@gmail.com>

bors · 2020-12-07T20:02:33Z

Timed out.

jswrenn · 2020-12-08T17:23:39Z

bors retry

jswrenn · 2020-12-08T17:56:37Z

My attempt at making bors happy with the new CI isn't going well, so bear with me here...

jswrenn · 2020-12-08T18:03:34Z

bors r-

bors · 2020-12-08T18:03:35Z

Canceled.

jswrenn · 2020-12-08T18:03:47Z

bors r+

bors · 2020-12-08T19:06:34Z

Timed out.

jswrenn · 2020-12-10T19:47:54Z

bors r+

bors · 2020-12-10T19:50:42Z

Build succeeded:

bors build finished

willcrozi · 2020-12-12T19:57:12Z

> It's about time this gets merged. Worst case scenario, we can always fix issues in subsequent releases! Thanks for contributing this.

No problem, great to see it finally merged!

Thanks @jswrenn and @phimuemue for all the feedback and for bearing with me. 👍

willcrozi force-pushed the powerset branch from 9faaec7 to cec4456 Compare March 1, 2019 15:55

jswrenn self-assigned this Jul 18, 2019

jswrenn added the waiting-on-review label Jul 18, 2019

willcrozi force-pushed the powerset branch from cec4456 to 577ca75 Compare September 13, 2020 15:30

willcrozi force-pushed the powerset branch 2 times, most recently from 9d539aa to c769df8 Compare September 23, 2020 00:06

jswrenn added this to the next milestone Sep 23, 2020

jswrenn reviewed Sep 23, 2020

willcrozi force-pushed the powerset branch 3 times, most recently from fbd8b74 to d10c03e Compare September 23, 2020 09:32

willcrozi force-pushed the powerset branch from d10c03e to d9db41e Compare September 23, 2020 10:43

phimuemue reviewed Sep 25, 2020

willcrozi force-pushed the powerset branch from 2f41e1d to aa6252d Compare September 26, 2020 16:01

phimuemue reviewed Sep 26, 2020

willcrozi force-pushed the powerset branch from aa6252d to 5a676ab Compare September 27, 2020 01:53

willcrozi commented Sep 27, 2020

jswrenn reviewed Sep 28, 2020

phimuemue reviewed Sep 28, 2020

jswrenn reviewed Sep 28, 2020

willcrozi added 3 commits September 29, 2020 22:34

Add benchmarks for Combinations

2b005c2

Renames tuple_combinations benchmark functions to tuple_comb_* for clarity in test results.

FEAT: Powerset iterator adaptor

8cdf928

An iterator to iterate through the powerset of the elements from an iterator.

Add benchmarks for Powerset

8c5d32c

willcrozi force-pushed the powerset branch from 5a676ab to 8c5d32c Compare September 29, 2020 22:30

Add Iterator::size_hint() method impl. for Powerset

83c0f04

bors bot merged commit 130ffd3 into rust-itertools:master Dec 10, 2020

dependabot bot mentioned this pull request Mar 9, 2021

Update itertools requirement from 0.9.0 to 0.10.0 hacspec/hacspec#81

Merged
