Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign upAdd Interleave iterator #15886
Conversation
lilyball
reviewed
Jul 22, 2014
| let (b_lower, b_upper) = self.b.size_hint(); | ||
|
|
||
| // for capturing offset for whether the "head" points at a or b | ||
| let off_by_one = if self.using_first { 1u } else { 0u }; |
This comment has been minimized.
This comment has been minimized.
lilyball
Jul 22, 2014
Contributor
Fun fact: you can just say let off_by_one = self.using_first as uint;. Booleans can coerce to all primitive integral types.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
lilyball
Jul 22, 2014
Contributor
It's a bit simpler, and avoids a conditional (which presumably would get optimized away anyway but w/e).
It may actually be beneficial to not even do the conversion here but instead just keep the boolean a boolean and convert it where you actually need the integral value (e.g. that way 1 - off_by_one just becomes !using_first as uint).
This comment has been minimized.
This comment has been minimized.
Gankro
Jul 22, 2014
Author
Contributor
All else equal, I think off_by_one captures intent a bit better. I was a bit cagey about having this trick in the code at all, because it makes things a bit harder to read, in my mind. It's a big code reduction win, though. I'll yield to your expertise on the matter, though.
This comment has been minimized.
This comment has been minimized.
Gankro
Jul 22, 2014
Author
Contributor
Hmm... having as uint everywhere seems to be seriously dropping readability for me, are you sure?
This comment has been minimized.
This comment has been minimized.
lilyball
reviewed
Jul 22, 2014
| let upper = match (a_upper, b_upper) { | ||
| (Some(x), Some(y)) => Some(calc_size(x, y, off_by_one)), | ||
| (Some(x), None) => Some(calc_size(x, x, off_by_one).saturating_add(1)), | ||
| (None, Some(y)) => Some(calc_size(y, y, off_by_one).saturating_add(1)), |
This comment has been minimized.
This comment has been minimized.
lilyball
Jul 22, 2014
Contributor
You can't actually re-use calc_size() here. The problem is the lower bound wants saturating_add() as you have, but the upper bound actually wants checked_add(). If the upper bound overflows you need to return None.
I suppose you could rewrite calc_size() to use checked_add() instead and have the lower bound use .unwrap_or(uint::MAX).
This comment has been minimized.
This comment has been minimized.
Gankro
Jul 22, 2014
Author
Contributor
Interesting, I didn't realize "too big" should become None. Makes sense, though. Is this issue documented anywhere?
This comment has been minimized.
This comment has been minimized.
lilyball
Jul 22, 2014
Contributor
Not sure if it's actually documented somewhere (I hope it is!) but it's a logical consequence of the fact that the upper bound is an upper bound.
This comment has been minimized.
This comment has been minimized.
Gankro
Jul 22, 2014
Author
Contributor
As an aside, I just noticed checked_add is by-reference and saturating_add is by-value. Any reason?
This comment has been minimized.
This comment has been minimized.
lilyball
Jul 22, 2014
Contributor
IIRC SaturatingAdd is written in terms of CheckedAdd. The by-reference approach allows for implementing it on more types (e.g. non-Copy types) but by-value tends to be more intuitive. Overall I think the real reason is they were written by different people ;)
This comment has been minimized.
This comment has been minimized.
|
@kballard I've incorporated your recommendations, and fixed the errors. I didn't fully inline all the |
This comment has been minimized.
This comment has been minimized.
|
Is this useful enough to be in lib std? |
lilyball
reviewed
Jul 22, 2014
| fn next(&mut self) -> Option<A> { | ||
| self.using_first = !self.using_first; | ||
| // this is backward on purpose; variable name is always accurate wrt to *next* read | ||
| if !self.using_first { |
This comment has been minimized.
This comment has been minimized.
lilyball
Jul 22, 2014
Contributor
You can ditch the comment by using a local variable.
let use_a = self.using_first;
self.using_first = !use_a;
if self.use_a {
...
This comment has been minimized.
This comment has been minimized.
lilyball
reviewed
Jul 22, 2014
| let upper = match (a_upper, b_upper) { | ||
| (Some(x), Some(y)) => calc_size(x, y, off_by_one), | ||
| (Some(x), None) => x.checked_mul(&2).and_then(|x| x.checked_add(&1)), | ||
| (None, Some(y)) => y.checked_mul(&2).and_then(|y| y.checked_add(&1)), |
This comment has been minimized.
This comment has been minimized.
lilyball
Jul 22, 2014
Contributor
The upper bounds on these two cases may be unnecessarily high by 1. you're unconditionally adding 1, but you really only should add it depending on the value of off_by_one.
(Some(x), None) => x.checked_mul(&2).and_then(|x| x.checked_add(&(1 - off_by_one))),
(None, Some(y)) => y.checked_mul(&2).and_then(|y| y.checked_add(&off_by_one))),You could even simplify this to calc_size() replacing None with uint::MAX. It will come up with the same result.
(Some(x), None) => calc_size(x, uint::MAX, off_by_one),
(None, Some(y)) => calc_size(uint::MAX, y, off_by_one),(note: I'm not saying you should do the latter; in fact, the former might be slightly more obviously correct, as the latter is replacing an unbounded value with uint::MAX, which does happen to work in all cases but isn't necessarily as easy to follow)
This comment has been minimized.
This comment has been minimized.
Gankro
Jul 22, 2014
Author
Contributor
@kballard it's certainly doomed to be at least off-by-one without the other structure's information. You're right that I unnecessarily make it off-by-two in a specific case. I think I had gotten myself caught in a bad logic loop and convinced myself that it's "best to be safe", especially since the quality of the upper bound is already going to be imperfect just by the nature of getting one None. But assuming the None is representing "huge" and not just "I dunno", we can get perfectly accurate results.
I actually do like your suggestion of deferring to calc_size with uint::MAX, as I think it effectively captures the reasoning behind the explicit code I've written: assume that the other iterator is bigger (because that's all you can do).
This comment has been minimized.
This comment has been minimized.
lilyball
Jul 22, 2014
Contributor
Yeah, uint::MAX works well enough, because if you have a Some bound then you know it's <= uint::MAX, and your calculation is based upon the smaller of the two. So uint::MAX is guaranteed to work.
This comment has been minimized.
This comment has been minimized.
Gankro
Jul 22, 2014
Author
Contributor
Alright, integrated both of your cleanups. Tossed in a comment for this one, since I felt it was best to make it explicit what we're doing.
lilyball
reviewed
Jul 22, 2014
| let (a_sz, a_upper) = self.a.size_hint(); | ||
| let (b_sz, b_upper) = self.b.size_hint(); | ||
| assert!(a_upper == Some(a_sz)); | ||
| assert!(b_upper == Some(b_sz)); |
This comment has been minimized.
This comment has been minimized.
lilyball
Jul 22, 2014
Contributor
This probably isn't the PR for it, but I'm surprised ExactSize doesn't just vend a method .exact_size() that includes the assert, so that way iterator adaptors don't have to keep asserting.
This comment has been minimized.
This comment has been minimized.
Gankro
Jul 22, 2014
Author
Contributor
Honestly it struck me as weird to even bother with these asserts. Not trusting something that claims to implement an API to implement it strikes me as way over-defensive. But I figured if Zip does it, I probably should.
I could write up a PR to add it if you're honestly interested though, since I'm hacking away in here already.
This comment has been minimized.
This comment has been minimized.
lilyball
Jul 22, 2014
Contributor
Yeah, I'm not really sure what the correct answer is. I think the asserts are here because the ExactSize trait demands invariants that can be broken without unsafe code. So the idea is to fail early in any clients of ExactSize if the invariant is broken, so that way the client code doesn't break its own invariants. But on the other hand, it really does seem silly to have to constantly assert that a given trait obeys its own invariants.
This comment has been minimized.
This comment has been minimized.
bluss
Jul 28, 2014
Contributor
kballard it effectively does now -- the .len() method; the Zip impl just hasn't been updated to use it.
I think being overly defensive is fine; I mean that was a way to introduce the ExactSize trait without too many detractors -- I just wanted a way to sensibly define a double ended enumerate() on vec iterator; I'm not sure if every type should bother doing the same
lilyball
reviewed
Jul 22, 2014
| } | ||
| let (a_sz, _) = self.a.size_hint(); | ||
| let (b_sz, _) = self.b.size_hint(); | ||
| assert!(a_sz == b_sz || a_sz + (1 - off_by_one) == b_sz + off_by_one); |
This comment has been minimized.
This comment has been minimized.
lilyball
Jul 22, 2014
Contributor
I realize this is based on the Zip iterator code, but I'm not sure what the point is of re-fetching the size.
This comment has been minimized.
This comment has been minimized.
Gankro
Jul 22, 2014
Author
Contributor
@kballard: to determine how much you've actually cut out, I guess? I mean, you can figure it out based on your range, in theory. This code seems very defensively designed. At least for my use, it just means less code spent tracking which branch I went into and edge-case interactions with almost-empty ranges.
This comment has been minimized.
This comment has been minimized.
lilyball
Jul 22, 2014
Contributor
You could just make a_sz and b_sz mut and modify them as appropriate in the if-condition. It will be potentially much more efficient, as size_hint() may have to run an arbitrarily-complex calculation.
This comment has been minimized.
This comment has been minimized.
Gankro
Jul 22, 2014
Author
Contributor
I'm tempted to leave this as part of the general "assert that everything still works" pattern that seems to be around the file. I agree that it's wasteful, but it's wasteful to have the asserts at all. I'd rather be consistent and either have both or neither. Disagree?
We should also update Zip to match, if we do anything.
This comment has been minimized.
This comment has been minimized.
lilyball
Jul 22, 2014
Contributor
I guess it doesn't really matter. My preference would be to skip the asserts here, and let Zip be updated in a different PR (if anyone really cares), but if you want to leave the asserts in for consistency, I suppose you can do that.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
lilyball
Jul 23, 2014
Contributor
@sfackler I wish. I think it's just assert!() in Zip (and therefore here) to ensure that user-defined iterators meet the invariants. The problem is, ideally that testing would occur if the user-defined iterator is compiled with debug asserts enabled, but using debug_assert!() here only runs the assert if the standard library is compiled with debug asserts enabled. Therefore, it just uses assert!().
lilyball
reviewed
Jul 22, 2014
| #[inline] | ||
| fn indexable(&self) -> uint { | ||
| let (lower, _) = self.size_hint(); | ||
| lower |
This comment has been minimized.
This comment has been minimized.
lilyball
Jul 22, 2014
Contributor
This isn't right. RandomAccessIterator does not imply ExactSize. This means that it's technically ok for the size_hint() to return a lower bound that's lower than the indexable range of the iterator. I can't imagine that it's particularly useful to do that, but it's legal.
To that end you need to replicate the calculation here, using self.a.indexable() and self.b.indexable() instead of the lower bound.
This comment has been minimized.
This comment has been minimized.
Gankro
Jul 22, 2014
Author
Contributor
Do you think I should just copy-paste the logic here, or split it out into a private method or function of the module? Not sure on namespace pollution policy in this case.
This comment has been minimized.
This comment has been minimized.
lilyball
Jul 22, 2014
Contributor
I'm not sure. A private function would probably be fine. Or, since this is in a large file with other types, perhaps a private self-less method instead (I am recommending against a regular method because a regular method shouldn't take arguments that could contradict with the fields of the type, but you don't want to be re-calling size_hint() on the nested iterators repeatedly if you can help it).
This comment has been minimized.
This comment has been minimized.
Gankro
Jul 22, 2014
Author
Contributor
Went with a private function for the reasons you stated. I'm working on compiling again but I made the mistake of switching branches and rebuilding everything takes like... 3 hours on my laptop.
Otherwise I think I've addressed your latest batch of issues?
This comment has been minimized.
This comment has been minimized.
|
At this point I think it LGTM, although the commits need to be squashed together. Also, I'm not going to unilaterally decide that this is indeed appropriate to put in the standard library. If someone else agrees that this is good, then r=me with the commits squashed. |
This comment has been minimized.
This comment has been minimized.
|
@kballard I've made a retroactive RFC: rust-lang/rfcs#180 |
This comment has been minimized.
This comment has been minimized.
|
RFC killed, downgraded to pesudo-rfc on discuss: http://discuss.rust-lang.org/t/pseudo-rfc-interleave-iterator/249 |
This comment has been minimized.
This comment has been minimized.
|
@cmr seems to be giving this the thumbs up, concept-wise. |
This comment has been minimized.
This comment has been minimized.
|
Rebased. @kballard: r? |
This comment has been minimized.
This comment has been minimized.
|
I'm skeptical of the value of having this in std. We have many iterators already, some of questionable utility. Does anybody have a sense of how common this is? |
This comment has been minimized.
This comment has been minimized.
|
@brson: as far as I can tell from a cursory search of functional languages that I know the name of, it seems nonexistant as a std adapter. But I'm not an expert on functional languages, and I find their docs to be fairly tedious to search. Although maybe this sort of thing is easier to implement in more functional languages? Or maybe they just don't care about things like random access, size, and bi-directionality, which is what makes it non-trivial. I don't have a lot to say about its usefulness that I haven't already. A couple people expressed a desire for it, and it's non-trivial for someone to implement a full version (not just a plain Iterator) correctly. I personally don't need it for anything, I wrote it because it seemed like an interesting learning project. Perhaps @sfackler or someone in the thread can provide a more rigorous argument for its inclusion. Honestly, at this point my primary motivation to get it in is just to not have wasted the ever-patient and helpful @kballard's time. :D |
This comment has been minimized.
This comment has been minimized.
|
I don't have much of an opinion as to whether it's worth having in the standard library. If not, perhaps @Gankro could start a |
This comment has been minimized.
This comment has been minimized.
+1, it's not very useful |
This comment has been minimized.
This comment has been minimized.
|
@Gankro, I'm going to close this on the assumption that it isn't generally useful enough to provide to everybody by default (we have to worry about long-term bloat caused by maintaining features forever). Thank you for the contribution though. |
brson
closed this
Jul 25, 2014
This comment has been minimized.
This comment has been minimized.
|
@brson no worries, I was going to close it in an hour or so myself, for the same reasons. |
Gankro commentedJul 21, 2014
@sfackler was interested in having this adapter, so I hacked it together.