Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Itertools::multi_cartesian_product #235

Merged
merged 1 commit into from
Jan 8, 2018

Conversation

tobz1000
Copy link
Contributor

@tobz1000 tobz1000 commented Nov 12, 2017

Hello!

I have created a new Itertools adaptor: MultiProduct. It is to be used on a meta-iterator (iterator which yield iterators) to produce the cartesian product of the iterators it yields. It is created by calling the .multi_cartesian_product() method.

This can be used, instead of the iproduct macro, when the number of iterators is not known at compile time.

It currently uses a recursive function, which makes it susceptible to stack overflows for products with a huge number of elements. I could probably rework this to use a heap stack if desired, although I haven't thought about it too much so far.

It adds 7 new top-level items to adaptors/mod.rs, so maybe it'd be worth splitting into its own file?

Any comments/questions, please let me know.

@tobz1000 tobz1000 changed the title Itertools::multi_cartesian_product Itertools::multi_cartesian_product Nov 12, 2017
@bluss
Copy link
Member

bluss commented Nov 12, 2017

Yes, please use its own file. Will review later, not sure when I have time.

@tobz1000
Copy link
Contributor Author

The 1.12 build is failing because of (at least) a syntax feature added in 1.17. Is there a reason that 1.12 is targeted?

@bluss
Copy link
Member

bluss commented Nov 12, 2017

It's the current minimum version for itertools. Being a widely used library, we don't want to change that if we can avoid it.

@bluss
Copy link
Member

bluss commented Nov 25, 2017

Thanks for working on this! This is a worthwhile feature that fits well inside the scope of itertools.

There are several items that I'd like to be addressed with this code before it can be merged. Perhaps we can find a person from the community that can review it closer?

Specifically look at

  • Avoid .collect() and making new vectors unless absolutely necessary, for example in .last().
  • Avoid .unwrap() unless absolutely necessary. Prefer pattern matching and using the Some/None-ness for control flow
  • We need to have an eye towards fuse-correctness
  • Please use quickcheck tests; we cover many more cases that way, without having to write a lot of cases by hand

@tobz1000
Copy link
Contributor Author

tobz1000 commented Nov 27, 2017

Thanks for taking the time to look at this.

  • Avoid .collect() and making new vectors unless absolutely necessary, for example in .last().

I've reworked .last to only collect once. I think, even in the case of returning None, this vec is unavoidable since it is unknown that we will need to throw away the result beforehand.

  • Avoid .unwrap() unless absolutely necessary. Prefer pattern matching and using the Some/None-ness for control flow

After modifying .last, the only remaining unwrap is in .curr_iterator(). Unfortunately, this method does rely on all MultiProductIter.curs being left in the Some state after recursively calling MultiProduct::iterate_last. I haven't been able to find an elegant way to get rid of this potential invalid state.

  • We need to have an eye towards fuse-correctness

Could you elaborate on how it should behave please? Currently, at the end of iteration, a MultiProduct is back in its starting state. It doesn't rely on the (un-)fusedness of its child iterators, since it uses a fresh clone of each (similarly to Product). Would you like it to become automatically fused if the child iterators are fused?

  • Please use quickcheck tests; we cover many more cases that way, without having to write a lot of cases by hand

I will look into this - I'm not quite sure what quickcheck achieves yet, but I'll do some research :)

I've done some thinking about this adaptor API-wise, and could use your input: would we perhaps want to yield a wrapper struct which implements Iterator, rather than a naked vec? The implementation would still use vecs underneath, but this could potentially save a breaking change in future. To complement this, a custom FromIterator for vec would save the user having to copy the vec if that's what they really want.

@bluss
Copy link
Member

bluss commented Nov 28, 2017

Great to see the progress

Would you like it to become automatically fused if the child iterators are fused?

This is not even necessary. Fuse-preserving is good but we can't always have it. The most important part is to not call .next() on any iterator we have already gotten a None from. For .fuse()d iterators and known-good ones we don't need this rule.

See qc tests in tests/quick.rs

@tobz1000
Copy link
Contributor Author

tobz1000 commented Dec 6, 2017

I've added some quickcheck tests equivalent to those already present for cartesian_product/iproduct, plus a test for .last(). They take a new Iterator, ShiftRange, which yields a number of Iters, each of which has different start/end values.

These seem to cover everything I was previously testing for in test_std.rs, so those tests have been removed.

Additionally, I've added some benchmarks to compare to iproduct. The results are roughly a 30x slowdown. :)

(Re: my suggestion to yield an iterable rather than a naked vec: I've realised that there is no type parameter to specify the type of Iterator you're collecting from :) so for the purpose of saving unnecessary memory copies, yielding vecs is probably best.)

impl<I> ExactSizeIterator for MultiProduct<I>
where I: ExactSizeIterator + Clone,
I::Item: Clone
{ }
Copy link
Member

@bluss bluss Jan 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not allowed to implement ESI since part of its contract is that the size of the iterator fits inside usize::max_value() (and we can't ensure that). That kind of small change can be adjusted after merge.

@bluss
Copy link
Member

bluss commented Jan 6, 2018

Thanks for your solid work on this!

@@ -332,7 +335,8 @@ impl<I, J> Iterator for Product<I, J>
// Compute a * b_orig + b for both lower and upper bound
size_hint::add(
size_hint::mul(self.a.size_hint(), self.b_orig.size_hint()),
(b_min * has_cur, b_max.map(move |x| x * has_cur)))
(b_min * has_cur, b_max.map(move |x| x * has_cur))
)
Copy link
Member

@bluss bluss Jan 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes unrelated to the PR in this file unfortunately (such should never be in the PR)

@bluss
Copy link
Member

bluss commented Jan 6, 2018

Could you do me a favour and clean up the commit history in the PR? how many commits it ends up with is not so imporant. If you don't do that, I can squash merge it in the end instead.

@tobz1000
Copy link
Contributor Author

tobz1000 commented Jan 6, 2018

Sure, I'll try and address your issues in the next day or so.

@tobz1000
Copy link
Contributor Author

tobz1000 commented Jan 7, 2018

Addressed your comments, removed some unnecessary casts, and squashed and rebased. You can see the latest actual changes here.

@bluss
Copy link
Member

bluss commented Jan 8, 2018

Thank you!

@bluss bluss merged commit 5ef9af2 into rust-itertools:master Jan 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants