Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Interleave iterator #15886

Closed
wants to merge 1 commit into from

Conversation

Projects
None yet
6 participants
@Gankro
Copy link
Contributor

Gankro commented Jul 21, 2014

@sfackler was interested in having this adapter, so I hacked it together.

let (b_lower, b_upper) = self.b.size_hint();

// for capturing offset for whether the "head" points at a or b
let off_by_one = if self.using_first { 1u } else { 0u };

This comment has been minimized.

@lilyball

lilyball Jul 22, 2014

Contributor

Fun fact: you can just say let off_by_one = self.using_first as uint;. Booleans can coerce to all primitive integral types.

This comment has been minimized.

@Gankro

Gankro Jul 22, 2014

Author Contributor

Neat! Would you recommend this from a clarity point of view?

This comment has been minimized.

@lilyball

lilyball Jul 22, 2014

Contributor

It's a bit simpler, and avoids a conditional (which presumably would get optimized away anyway but w/e).

It may actually be beneficial to not even do the conversion here but instead just keep the boolean a boolean and convert it where you actually need the integral value (e.g. that way 1 - off_by_one just becomes !using_first as uint).

This comment has been minimized.

@Gankro

Gankro Jul 22, 2014

Author Contributor

All else equal, I think off_by_one captures intent a bit better. I was a bit cagey about having this trick in the code at all, because it makes things a bit harder to read, in my mind. It's a big code reduction win, though. I'll yield to your expertise on the matter, though.

This comment has been minimized.

@Gankro

Gankro Jul 22, 2014

Author Contributor

Hmm... having as uint everywhere seems to be seriously dropping readability for me, are you sure?

This comment has been minimized.

@lilyball

lilyball Jul 22, 2014

Contributor

Everywhere? How often do you need it?

let upper = match (a_upper, b_upper) {
(Some(x), Some(y)) => Some(calc_size(x, y, off_by_one)),
(Some(x), None) => Some(calc_size(x, x, off_by_one).saturating_add(1)),
(None, Some(y)) => Some(calc_size(y, y, off_by_one).saturating_add(1)),

This comment has been minimized.

@lilyball

lilyball Jul 22, 2014

Contributor

You can't actually re-use calc_size() here. The problem is the lower bound wants saturating_add() as you have, but the upper bound actually wants checked_add(). If the upper bound overflows you need to return None.

I suppose you could rewrite calc_size() to use checked_add() instead and have the lower bound use .unwrap_or(uint::MAX).

This comment has been minimized.

@Gankro

Gankro Jul 22, 2014

Author Contributor

Interesting, I didn't realize "too big" should become None. Makes sense, though. Is this issue documented anywhere?

This comment has been minimized.

@lilyball

lilyball Jul 22, 2014

Contributor

Not sure if it's actually documented somewhere (I hope it is!) but it's a logical consequence of the fact that the upper bound is an upper bound.

This comment has been minimized.

@Gankro

Gankro Jul 22, 2014

Author Contributor

As an aside, I just noticed checked_add is by-reference and saturating_add is by-value. Any reason?

This comment has been minimized.

@lilyball

lilyball Jul 22, 2014

Contributor

IIRC SaturatingAdd is written in terms of CheckedAdd. The by-reference approach allows for implementing it on more types (e.g. non-Copy types) but by-value tends to be more intuitive. Overall I think the real reason is they were written by different people ;)

@Gankro

This comment has been minimized.

Copy link
Contributor Author

Gankro commented Jul 22, 2014

@kballard I've incorporated your recommendations, and fixed the errors. I didn't fully inline all the as uints because I found it increased line length and paranthesization in a way that reduced overall readability. I'll fully inline them if still desired, though.

@liigo

This comment has been minimized.

Copy link
Contributor

liigo commented Jul 22, 2014

Is this useful enough to be in lib std?

fn next(&mut self) -> Option<A> {
self.using_first = !self.using_first;
// this is backward on purpose; variable name is always accurate wrt to *next* read
if !self.using_first {

This comment has been minimized.

@lilyball

lilyball Jul 22, 2014

Contributor

You can ditch the comment by using a local variable.

let use_a = self.using_first;
self.using_first = !use_a;
if self.use_a {
    ...

This comment has been minimized.

@Gankro

Gankro Jul 22, 2014

Author Contributor

Good call.

let upper = match (a_upper, b_upper) {
(Some(x), Some(y)) => calc_size(x, y, off_by_one),
(Some(x), None) => x.checked_mul(&2).and_then(|x| x.checked_add(&1)),
(None, Some(y)) => y.checked_mul(&2).and_then(|y| y.checked_add(&1)),

This comment has been minimized.

@lilyball

lilyball Jul 22, 2014

Contributor

The upper bounds on these two cases may be unnecessarily high by 1. you're unconditionally adding 1, but you really only should add it depending on the value of off_by_one.

    (Some(x), None) => x.checked_mul(&2).and_then(|x| x.checked_add(&(1 - off_by_one))),
    (None, Some(y)) => y.checked_mul(&2).and_then(|y| y.checked_add(&off_by_one))),

You could even simplify this to calc_size() replacing None with uint::MAX. It will come up with the same result.

    (Some(x), None) => calc_size(x, uint::MAX, off_by_one),
    (None, Some(y)) => calc_size(uint::MAX, y, off_by_one),

(note: I'm not saying you should do the latter; in fact, the former might be slightly more obviously correct, as the latter is replacing an unbounded value with uint::MAX, which does happen to work in all cases but isn't necessarily as easy to follow)

This comment has been minimized.

@Gankro

Gankro Jul 22, 2014

Author Contributor

@kballard it's certainly doomed to be at least off-by-one without the other structure's information. You're right that I unnecessarily make it off-by-two in a specific case. I think I had gotten myself caught in a bad logic loop and convinced myself that it's "best to be safe", especially since the quality of the upper bound is already going to be imperfect just by the nature of getting one None. But assuming the None is representing "huge" and not just "I dunno", we can get perfectly accurate results.

I actually do like your suggestion of deferring to calc_size with uint::MAX, as I think it effectively captures the reasoning behind the explicit code I've written: assume that the other iterator is bigger (because that's all you can do).

This comment has been minimized.

@lilyball

lilyball Jul 22, 2014

Contributor

Yeah, uint::MAX works well enough, because if you have a Some bound then you know it's <= uint::MAX, and your calculation is based upon the smaller of the two. So uint::MAX is guaranteed to work.

This comment has been minimized.

@Gankro

Gankro Jul 22, 2014

Author Contributor

Alright, integrated both of your cleanups. Tossed in a comment for this one, since I felt it was best to make it explicit what we're doing.

let (a_sz, a_upper) = self.a.size_hint();
let (b_sz, b_upper) = self.b.size_hint();
assert!(a_upper == Some(a_sz));
assert!(b_upper == Some(b_sz));

This comment has been minimized.

@lilyball

lilyball Jul 22, 2014

Contributor

This probably isn't the PR for it, but I'm surprised ExactSize doesn't just vend a method .exact_size() that includes the assert, so that way iterator adaptors don't have to keep asserting.

This comment has been minimized.

@Gankro

Gankro Jul 22, 2014

Author Contributor

Honestly it struck me as weird to even bother with these asserts. Not trusting something that claims to implement an API to implement it strikes me as way over-defensive. But I figured if Zip does it, I probably should.

I could write up a PR to add it if you're honestly interested though, since I'm hacking away in here already.

This comment has been minimized.

@lilyball

lilyball Jul 22, 2014

Contributor

Yeah, I'm not really sure what the correct answer is. I think the asserts are here because the ExactSize trait demands invariants that can be broken without unsafe code. So the idea is to fail early in any clients of ExactSize if the invariant is broken, so that way the client code doesn't break its own invariants. But on the other hand, it really does seem silly to have to constantly assert that a given trait obeys its own invariants.

This comment has been minimized.

@bluss

bluss Jul 28, 2014

Contributor

kballard it effectively does now -- the .len() method; the Zip impl just hasn't been updated to use it.

I think being overly defensive is fine; I mean that was a way to introduce the ExactSize trait without too many detractors -- I just wanted a way to sensibly define a double ended enumerate() on vec iterator; I'm not sure if every type should bother doing the same

}
let (a_sz, _) = self.a.size_hint();
let (b_sz, _) = self.b.size_hint();
assert!(a_sz == b_sz || a_sz + (1 - off_by_one) == b_sz + off_by_one);

This comment has been minimized.

@lilyball

lilyball Jul 22, 2014

Contributor

I realize this is based on the Zip iterator code, but I'm not sure what the point is of re-fetching the size.

This comment has been minimized.

@Gankro

Gankro Jul 22, 2014

Author Contributor

@kballard: to determine how much you've actually cut out, I guess? I mean, you can figure it out based on your range, in theory. This code seems very defensively designed. At least for my use, it just means less code spent tracking which branch I went into and edge-case interactions with almost-empty ranges.

This comment has been minimized.

@lilyball

lilyball Jul 22, 2014

Contributor

You could just make a_sz and b_sz mut and modify them as appropriate in the if-condition. It will be potentially much more efficient, as size_hint() may have to run an arbitrarily-complex calculation.

This comment has been minimized.

@Gankro

Gankro Jul 22, 2014

Author Contributor

I'm tempted to leave this as part of the general "assert that everything still works" pattern that seems to be around the file. I agree that it's wasteful, but it's wasteful to have the asserts at all. I'd rather be consistent and either have both or neither. Disagree?

We should also update Zip to match, if we do anything.

This comment has been minimized.

@lilyball

lilyball Jul 22, 2014

Contributor

I guess it doesn't really matter. My preference would be to skip the asserts here, and let Zip be updated in a different PR (if anyone really cares), but if you want to leave the asserts in for consistency, I suppose you can do that.

This comment has been minimized.

@sfackler

sfackler Jul 22, 2014

Member

It should probably be a debug_assert!(...).

This comment has been minimized.

@lilyball

lilyball Jul 23, 2014

Contributor

@sfackler I wish. I think it's just assert!() in Zip (and therefore here) to ensure that user-defined iterators meet the invariants. The problem is, ideally that testing would occur if the user-defined iterator is compiled with debug asserts enabled, but using debug_assert!() here only runs the assert if the standard library is compiled with debug asserts enabled. Therefore, it just uses assert!().

#[inline]
fn indexable(&self) -> uint {
let (lower, _) = self.size_hint();
lower

This comment has been minimized.

@lilyball

lilyball Jul 22, 2014

Contributor

This isn't right. RandomAccessIterator does not imply ExactSize. This means that it's technically ok for the size_hint() to return a lower bound that's lower than the indexable range of the iterator. I can't imagine that it's particularly useful to do that, but it's legal.

To that end you need to replicate the calculation here, using self.a.indexable() and self.b.indexable() instead of the lower bound.

This comment has been minimized.

@Gankro

Gankro Jul 22, 2014

Author Contributor

Do you think I should just copy-paste the logic here, or split it out into a private method or function of the module? Not sure on namespace pollution policy in this case.

This comment has been minimized.

@lilyball

lilyball Jul 22, 2014

Contributor

I'm not sure. A private function would probably be fine. Or, since this is in a large file with other types, perhaps a private self-less method instead (I am recommending against a regular method because a regular method shouldn't take arguments that could contradict with the fields of the type, but you don't want to be re-calling size_hint() on the nested iterators repeatedly if you can help it).

This comment has been minimized.

@Gankro

Gankro Jul 22, 2014

Author Contributor

Went with a private function for the reasons you stated. I'm working on compiling again but I made the mistake of switching branches and rebuilding everything takes like... 3 hours on my laptop.

Otherwise I think I've addressed your latest batch of issues?

@lilyball

This comment has been minimized.

Copy link
Contributor

lilyball commented Jul 23, 2014

At this point I think it LGTM, although the commits need to be squashed together. Also, I'm not going to unilaterally decide that this is indeed appropriate to put in the standard library. If someone else agrees that this is good, then r=me with the commits squashed.

@Gankro

This comment has been minimized.

Copy link
Contributor Author

Gankro commented Jul 23, 2014

@kballard I've made a retroactive RFC: rust-lang/rfcs#180

@pnkfelix pnkfelix referenced this pull request Jul 23, 2014

Closed

0000-interleave-iter #180

@Gankro

This comment has been minimized.

Copy link
Contributor Author

Gankro commented Jul 23, 2014

RFC killed, downgraded to pesudo-rfc on discuss: http://discuss.rust-lang.org/t/pseudo-rfc-interleave-iterator/249

@Gankro

This comment has been minimized.

Copy link
Contributor Author

Gankro commented Jul 23, 2014

@cmr seems to be giving this the thumbs up, concept-wise.

@Gankro

This comment has been minimized.

Copy link
Contributor Author

Gankro commented Jul 23, 2014

Rebased. @kballard: r?

@brson

This comment has been minimized.

Copy link
Contributor

brson commented Jul 24, 2014

I'm skeptical of the value of having this in std. We have many iterators already, some of questionable utility. Does anybody have a sense of how common this is?

@Gankro

This comment has been minimized.

Copy link
Contributor Author

Gankro commented Jul 24, 2014

@brson: as far as I can tell from a cursory search of functional languages that I know the name of, it seems nonexistant as a std adapter. But I'm not an expert on functional languages, and I find their docs to be fairly tedious to search. Although maybe this sort of thing is easier to implement in more functional languages? Or maybe they just don't care about things like random access, size, and bi-directionality, which is what makes it non-trivial.

I don't have a lot to say about its usefulness that I haven't already. A couple people expressed a desire for it, and it's non-trivial for someone to implement a full version (not just a plain Iterator) correctly. I personally don't need it for anything, I wrote it because it seemed like an interesting learning project.

Perhaps @sfackler or someone in the thread can provide a more rigorous argument for its inclusion.

Honestly, at this point my primary motivation to get it in is just to not have wasted the ever-patient and helpful @kballard's time. :D

@lilyball

This comment has been minimized.

Copy link
Contributor

lilyball commented Jul 24, 2014

I don't have much of an opinion as to whether it's worth having in the standard library. If not, perhaps @Gankro could start a libiterators library that people can include via cargo to hold the more esoteric things like interleave. Although offhand I don't know what else to put in there.

@liigo

This comment has been minimized.

Copy link
Contributor

liigo commented Jul 24, 2014

I'm skeptical of the value of having this in std.

+1, it's not very useful

@brson

This comment has been minimized.

Copy link
Contributor

brson commented Jul 25, 2014

@Gankro, I'm going to close this on the assumption that it isn't generally useful enough to provide to everybody by default (we have to worry about long-term bloat caused by maintaining features forever). Thank you for the contribution though.

@brson brson closed this Jul 25, 2014

@Gankro

This comment has been minimized.

Copy link
Contributor Author

Gankro commented Jul 25, 2014

@brson no worries, I was going to close it in an hour or so myself, for the same reasons.

bors added a commit that referenced this pull request Aug 24, 2014

auto merge of #16728 : bluss/rust/zip-next-back, r=alexcrichton
Use ExactSize::len() and defer to its decisions about overly defensive
assertions. Remove the length double-check and simply put a failure
case if the Zip finds an uneven end in .next_back().

Fixing this up since I think I wrote this, and it's been known to
confuse rusties (PR #15886).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.