Powerset iterator adaptor #335
I want to get this into the next Itertools release! Sorry for the long delay in review!
Great to hear, thanks! I've rewritten this feature since the original pull request. I think the code is ready, and I was intending to write some explanatory notes, which I can post here later today. It seems the CI build for the Powerset benchmark times out (it completes successfully on my fork). I'll take a look into this.
Travis CI jobs 2 and 3 are both getting stuck and terminated after the 10-minute limit. I've reduced the length of the powerset benchmarks (needed doing anyway, TBH), but this seems related to caching.
My own use case for this adaptor relates to generating passwords to feed into tools such as hashcat. Since a powerset is essentially a chain of combinations of increasing length over the input set, I've made use of the `Combinations` adaptor. The additions to …
The benchmarks for … Let me know what you think.
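To illustrate the idea of a powerset as a chain of combinations of increasing length, here is a minimal, eager sketch in plain Rust. It is not the code from this PR (which builds on the lazy `Combinations` adaptor and yields subsets one at a time); `combinations` and `powerset` below are hypothetical helpers that collect everything into `Vec`s, advancing `k` indices in lexicographic order.

```rust
/// Hypothetical helper (not the PR's code): all `k`-combinations of `items`,
/// generated by advancing a vector of indices in lexicographic order.
fn combinations<T: Clone>(items: &[T], k: usize) -> Vec<Vec<T>> {
    let n = items.len();
    let mut out = Vec::new();
    if k > n {
        return out;
    }
    // Start from the first combination: indices [0, 1, ..., k - 1].
    let mut indices: Vec<usize> = (0..k).collect();
    loop {
        out.push(indices.iter().map(|&i| items[i].clone()).collect());
        // Find the rightmost index that has not reached its maximum value.
        let mut i = k;
        loop {
            if i == 0 {
                return out; // every index is at its maximum: done
            }
            i -= 1;
            if indices[i] != i + n - k {
                break;
            }
        }
        // Advance it and reset every index to its right.
        indices[i] += 1;
        for j in i + 1..k {
            indices[j] = indices[j - 1] + 1;
        }
    }
}

/// Hypothetical eager powerset: combinations of length 0, 1, ..., n, chained.
fn powerset<T: Clone>(items: &[T]) -> Vec<Vec<T>> {
    (0..=items.len()).flat_map(|k| combinations(items, k)).collect()
}

fn main() {
    let sets = powerset(&[1, 2, 3]);
    // [[], [1], [2], [3], [1, 2], [1, 3], [2, 3], [1, 2, 3]]
    println!("{:?}", sets);
}
```

Chaining the lengths 0 through n this way yields every subset exactly once, 2^n in total, which is why a lazy version only needs to bump `k` each time the current combinations run out.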
Thanks for this! A powerset adaptor probably comes in very handy.
I sympathize with the idea of reusing `Combinations`, but I think it would be easier to avoid all the inner special cases.
src/combinations.rs

```rust
    #[inline]
    pub fn k(&self) -> usize { self.indices.len() }

    /// Returns the (current) length of the pool from which combination elements are
    /// selected. This value can change between invocations of `next()` and `init()`.
    #[inline]
    pub(crate) fn n(&self) -> usize { self.pool.len() }

    /// Returns a reference to the source iterator.
    #[inline]
    pub(crate) fn src(&self) -> &I { &self.pool.it }
```
Do we really need (all of) these? E.g. at some point we stored `k` explicitly, and we threw it out because we then always had multiple ways to compute the very same thing (namely `k` vs. `indices.len()`), which just bloated the iterator.
These are mainly a consequence of handing most of our information over to `Combinations`. We could remove `src()` at the expense of losing a decent `size_hint()`, but I think `k()` and `n()` have more justification:

- `k()` and `n()` are used within `Powerset` to detect iterator completion. The alternatives I could see were a `Powerset::done` field (bloat) or just allowing a completed `Powerset` to keep increasing the size of `k` for its `Combinations` on every call to `next()` (wasteful, could allocate).
- A combination's `k` and `n` are also part of the standard notation for combinations in combinatorics, so I thought it might make sense to have them visible.
- `k()` and `n()` are also necessary for the `size_hint()` impl.
- They could potentially make code more readable, e.g. `self.indices.len()` vs `self.k()`.

I understand that we shouldn't jump through too many hoops for `size_hint` (and there are quite a few hoops here!), but since powersets become large very quickly as the input size increases (2^n subsets for n elements), having one might be important for users.
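For reference, a sketch of how a position counter can give an exact `size_hint`, assuming the number of input elements `n` is already known (for example because the source has been fully buffered). `powerset_len` and `powerset_size_hint` are hypothetical names, not part of the PR; the only real API shape involved is the `(lower, Option<upper>)` pair that `Iterator::size_hint` returns.

```rust
/// Hypothetical helper: 2^n, the number of subsets of an n-element set,
/// or `None` if it does not fit in a `usize`.
fn powerset_len(n: usize) -> Option<usize> {
    if n < usize::BITS as usize {
        Some(1usize << n)
    } else {
        None
    }
}

/// Hypothetical `size_hint` for a powerset over `n` input elements after
/// `pos` subsets have already been yielded.
fn powerset_size_hint(n: usize, pos: usize) -> (usize, Option<usize>) {
    match powerset_len(n) {
        Some(total) => {
            let remaining = total.saturating_sub(pos);
            (remaining, Some(remaining))
        }
        // 2^n > usize::MAX, so more than usize::MAX - pos items remain.
        None => (usize::MAX - pos, None),
    }
}

fn main() {
    // 2^4 = 16 subsets in total, 3 already yielded.
    println!("{:?}", powerset_size_hint(4, 3)); // (13, Some(13))
}
```

Returning `None` as the upper bound once 2^n overflows `usize` keeps the hint valid without any big-integer arithmetic.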
Hi, thanks for the detailed review!
Agreed, please see my replies to your specific comments. It seems there are a few issues that weave together here. What I'm thinking:
Any idea on how to fix the CI failures on jobs 2 and 3? (It gets stuck setting up …)
I've simplified the … The first additional commit excludes the impl of … I'll take some benchmarks when I get the chance, to compare against the original approach.
Hi there! Thanks for addressing the points - imho this improved the PR quite a bit!
Decide between the following:
* Keep `size_hint` impl (and therefore `k()`, `n()` and `src()` methods on `Combinations`)
* Scrap `size_hint` (removing `Powerset::pos`) and decide between:
  * Keeping `k()` and `n()` on `Combinations`
  * Add `done` field to `Powerset`
  * Some other way for `Combinations` to indicate a completed state?
I think we could start out without `size_hint` and, if we really need it, possibly implement it for `Combinations` and compute `Powerset`'s `size_hint` in terms of `Combinations`' `size_hint`. However, let's wait for @jswrenn's opinion.
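A sketch of the alternative suggested above: derive the powerset's hint from per-length counts, using the identity that the binomial coefficients C(n, 0) + ... + C(n, n) sum to 2^n. It assumes `Combinations` could report exactly how many combinations of the current length remain (`remaining_in_k` below); `binomial` and `powerset_remaining` are hypothetical helpers, not existing itertools functions.

```rust
/// Hypothetical helper: binomial coefficient C(n, k), or `None` on overflow.
fn binomial(n: usize, k: usize) -> Option<usize> {
    if k > n {
        return Some(0);
    }
    let k = k.min(n - k); // C(n, k) == C(n, n - k)
    let mut acc: usize = 1;
    for i in 0..k {
        // Invariant: acc == C(n, i); the division below is always exact.
        acc = acc.checked_mul(n - i)? / (i + 1);
    }
    Some(acc)
}

/// Hypothetical: subsets still to come when the current length is `k` and
/// the current combinations iterator has `remaining_in_k` items left.
fn powerset_remaining(n: usize, k: usize, remaining_in_k: usize) -> Option<usize> {
    let mut total = remaining_in_k;
    for j in (k + 1)..=n {
        total = total.checked_add(binomial(n, j)?)?;
    }
    Some(total)
}

fn main() {
    // n = 4, currently on 2-element subsets with 6 left:
    // 6 + C(4, 3) + C(4, 4) = 6 + 4 + 1 = 11
    println!("{:?}", powerset_remaining(4, 2, 6)); // Some(11)
}
```

Because `binomial` multiplies before dividing, it may report overflow slightly earlier than strictly necessary; falling back to no upper bound in that case still gives a valid hint.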
As a side note: could you rebase your commits? Separating them would make subsequent reviews much easier, maybe along the lines of:
- Pure formatting changes (if necessary)
- Typos
- Actual implementation
- Benchmarks
@jswrenn We are increasingly confronted with huge PRs that could be easily split into smaller ones. Should we encourage this somewhere in our guidelines? (Do we have such guidelines?)
src/lazy_buffer.rs
`@@ -44,6 +43,25 @@ where`

```rust
            }
        }
    }

    pub fn prefill(&mut self, len: usize) -> bool {
```
I think the return value is not used anywhere, so we should possibly omit it for simplicity.
Possibly for another PR: could we unify `prefill` and `get_next`?
I think the return value is not used anywhere, so we should possibly omit it for simplicity.
Yes, I've removed this.
Possibly for another PR: could we unify `prefill` and `get_next`?
Yes, that would be good. Maybe implementing one in terms of the other is the way to go?
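One way to implement one in terms of the other, sketched on a simplified, hypothetical `LazyBuffer` (the field and method names below are assumptions for illustration, not necessarily those in `src/lazy_buffer.rs`): `get_next` pulls a single item from the source, and `prefill` just calls it until enough items are buffered or the source is exhausted. Following the earlier review comment, this `prefill` returns nothing.

```rust
/// Hypothetical, simplified lazy buffer: pulls items from `it` on demand
/// and keeps them in `buffer`.
struct LazyBuffer<I: Iterator> {
    it: I,
    done: bool,
    buffer: Vec<I::Item>,
}

impl<I: Iterator> LazyBuffer<I> {
    fn new(it: I) -> Self {
        LazyBuffer { it, done: false, buffer: Vec::new() }
    }

    fn len(&self) -> usize {
        self.buffer.len()
    }

    /// Pulls one more item into the buffer.
    /// Returns `false` once the source is exhausted.
    fn get_next(&mut self) -> bool {
        if self.done {
            return false;
        }
        match self.it.next() {
            Some(x) => {
                self.buffer.push(x);
                true
            }
            None => {
                self.done = true;
                false
            }
        }
    }

    /// Buffers items until at least `len` are held or the source runs dry,
    /// implemented purely in terms of `get_next`.
    fn prefill(&mut self, len: usize) {
        while self.len() < len && self.get_next() {}
    }
}

fn main() {
    let mut buf = LazyBuffer::new(1..=5);
    buf.prefill(3);
    println!("{:?}", buf.buffer); // [1, 2, 3]
}
```

A loop over `get_next` keeps the logic in one place; an alternative would be `self.buffer.extend(self.it.by_ref().take(delta))`, which lets `Vec::extend` reserve using the source's size hint, at the cost of tracking exhaustion separately.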
Fair enough, though I'm not sure it's practical to compute a …
No problem! I've split the work into two commits (implementation and benchmarks). I've also made the doc comments more in line with the rest of the library.
Spotted some minor things to bring `powerset` in line with other methods, etc.

```rust
/// Create a new `Powerset` from a clonable iterator.
pub fn powerset<I>(src: I) -> Powerset<I>
    where I: Iterator,
```
Should this be `IntoIterator` instead of `Iterator`? (AFAIK the other methods use `IntoIterator`.)
`combinations` and all the others I've looked at so far have `I: Iterator` in the `where` clause.
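A small, hypothetical illustration of the difference between the two bounds (`first_two_iter` and `first_two_into` are made-up functions): with `I: Iterator` the caller must already hold an iterator, while `I: IntoIterator` also accepts collections and ranges directly. Every `Iterator` implements `IntoIterator` through a blanket impl in the standard library, so the looser bound costs nothing for existing callers.

```rust
/// Hypothetical function with an `Iterator` bound: callers must pass an
/// iterator, e.g. `vec.into_iter()` or `slice.iter()`.
fn first_two_iter<I>(it: I) -> Vec<I::Item>
where
    I: Iterator,
{
    it.take(2).collect()
}

/// Hypothetical function with an `IntoIterator` bound: accepts collections,
/// ranges and iterators alike, converting with `into_iter()`.
fn first_two_into<I>(iterable: I) -> Vec<I::Item>
where
    I: IntoIterator,
{
    iterable.into_iter().take(2).collect()
}

fn main() {
    println!("{:?}", first_two_iter(vec!['a', 'b', 'c'].into_iter())); // ['a', 'b']
    println!("{:?}", first_two_into(vec!['a', 'b', 'c'])); // ['a', 'b']
}
```

If a constructor is only ever reached through a method on an iterator that already exists, the `Iterator` bound is sufficient; `IntoIterator` mainly pays off for free functions that users call directly.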
Just to check what's the preferred approach when I've got changes based on PR review feedback:
Renames tuple_combinations benchmark functions to tuple_comb_* for clarity in test results.
An iterator to iterate through the powerset of the elements from an iterator.
I've pushed changes addressing the resolved feedback so far. I've split the benchmarks into two commits and ordered them so that it's easier to check the effect of the changes introduced to … As far as I can see, the outstanding issues are:
Let me know what you think and if there are any more!
I'm basically convinced by this that we should have a …
It's about time this gets merged. Worst-case scenario, we can always fix issues in subsequent releases! Thanks for contributing this.
bors r+
335: Powerset iterator adaptor r=jswrenn a=willcrozi Implements a [powerset](https://en.wikipedia.org/wiki/Power_set) iterator adaptor that iterates over all subsets of the input iterator's elements. Returns vectors representing these subsets. Internally uses `Combinations` of increasing length. I've taken the strategy of using a 'position' field that acts both as a means to detect the special case of the first element and also allows an optimal `size_hint()` implementation. Additionally, there is a commit to improve performance that alters the `Combinations` implementation slightly. I've added a `Combinations` benchmark as a stand-alone commit to allow checking for performance regressions. `Powerset` performance after this commit improves some cases (with small sizes of `n`) by 10-30%. This is my first attempt at a Rust contribution, happy to put in whatever discussion/work to get this merged. Cheers! Co-authored-by: Will Crozier <willcrozi@gmail.com>
Timed out.
bors retry
My attempt at making bors happy with the new CI isn't going well, so bear with me here...
bors r-
Canceled.
bors r+
Timed out.
bors r+
Build succeeded.
No problem, great to see it finally merged! Thanks @jswrenn and @phimuemue for all the feedback and for bearing with me. 👍