Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Calculate capacity when collecting into Option and Result #52910

Open
wants to merge 1 commit into
base: master
from

Conversation

@ljedrz
Copy link
Contributor

ljedrz commented Jul 31, 2018

I was browsing the perf page to see the impact of my recent changes (e.g. #52697) and I was surprised that some of the results were not as awesome as I expected. I dug some more and found an issue that is the probable culprit: Collecting into a Result<Vec<_>> doesn't reserve the capacity in advance.

Collecting into Option or Result might result in an empty collection, but there is no reason why we shouldn't provide a non-zero lower bound when we know the Iterator we are collecting from doesn't contain any None or Err.

We know this, because the Adapter iterator used in the FromIterator implementations for Option and Result registers if any None or Err are present in the Iterator in question; we can use this information and return a more accurate lower bound in case we know it won't be equal to zero.

I have benchmarked collecting into Option and Result using the current implementation and one with the proposed changes; I have also benchmarked a push loop with a known capacity as a reference that should be slower than using FromIterator (i.e. collect()). The results are quite promising:

test bench_collect_to_option_new ... bench:         246 ns/iter (+/- 23)
test bench_collect_to_option_old ... bench:         954 ns/iter (+/- 54)
test bench_collect_to_result_new ... bench:         250 ns/iter (+/- 25)
test bench_collect_to_result_old ... bench:         939 ns/iter (+/- 104)
test bench_push_loop_to_option   ... bench:         294 ns/iter (+/- 21)
test bench_push_loop_to_result   ... bench:         303 ns/iter (+/- 29)

Fixes #48994.

@rust-highfive

This comment has been minimized.

Copy link
Collaborator

rust-highfive commented Jul 31, 2018

r? @sfackler

(rust_highfive has picked a reviewer for you, use r? to override)

@sfackler

This comment has been minimized.

Copy link
Member

sfackler commented Jul 31, 2018

but there is no reason why we shouldn't provide a non-zero lower bound when we know the Iterator we are collecting from doesn't contain any None or Err.

We don't know that the iterator doesn't contain a None or Err, just that we haven't seen one yet. That being said, it does seem pretty reasonable to optimize for the case where there is no None or Err since that's probably the common case.

@rfcbot fcp merge

@rfcbot

This comment has been minimized.

Copy link

rfcbot commented Jul 31, 2018

Team member @sfackler has proposed to merge this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@nagisa

This comment has been minimized.

Copy link
Contributor

nagisa commented Jul 31, 2018

Even if there are any Nones and Errs within the iterator, method will still end up collecting all the elements into some collection before the first None or Err and the collection gets discarded, right? This change is a no-brainer.

@rfcbot

This comment has been minimized.

Copy link

rfcbot commented Aug 1, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔

@sfackler

This comment has been minimized.

Copy link
Member

sfackler commented Aug 1, 2018

@bors r+

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Aug 1, 2018

📌 Commit 77aa031 has been approved by sfackler

@ollie27

This comment has been minimized.

Copy link
Contributor

ollie27 commented Aug 1, 2018

Isn't this creating an iterator that might return less elements than the lower bound of its size_hint? That's a violation of the trait's protocol: https://doc.rust-lang.org/nightly/std/iter/trait.Iterator.html#implementation-notes

@sfackler

This comment has been minimized.

Copy link
Member

sfackler commented Aug 1, 2018

Yep

@SimonSapin

This comment has been minimized.

Copy link
Contributor

SimonSapin commented Aug 1, 2018

Yes it is a protocol violation, but since the iterator is private there is no "victim".

@ollie27

This comment has been minimized.

Copy link
Contributor

ollie27 commented Aug 1, 2018

The iterator is private but it can be passed to anything implementing FromIterator including things outside of std which we can't guarantee will behave sensibly with an invalid iterator.

@sfackler

This comment has been minimized.

Copy link
Member

sfackler commented Aug 2, 2018

That's true but it seems like the benefits outweigh the costs here, right? I don't really see there being FromIterator implementations that would explode if the lower bound estimate is wrong.

@ollie27

This comment has been minimized.

Copy link
Contributor

ollie27 commented Aug 2, 2018

How about SmallVec: https://github.com/servo/rust-smallvec/blob/fcb1b7e99a2a98d928983af78a494ef566cbde8b/lib.rs#L1141-L1174. If the lower bound is wrong then it will try to iterate over an already consumed iterator (playground).

@ljedrz

This comment has been minimized.

Copy link
Contributor Author

ljedrz commented Aug 2, 2018

@ollie27 I'm not sure about your example, but SmallVec doesn't seem to regress with this change.

@ollie27

This comment has been minimized.

Copy link
Contributor

ollie27 commented Aug 2, 2018

SmallVec was just an example of real world code that relies on a correct implementation of size_hint to not explode. It does regress though if you consider that "no further elements are taken" once the iterator has returned None (playground).

Even Vec's FromIterator implementation can't handle nonsense lower bounds in size_hint (playground).

@ljedrz

This comment has been minimized.

Copy link
Contributor Author

ljedrz commented Aug 2, 2018

Is there any way around this? We wouldn't want to be forced to using push loops, which are both less idiomatic and less performant (though a lot less than using the existing implementation of Option/Result::FromIterator).

@ljedrz

This comment has been minimized.

Copy link
Contributor Author

ljedrz commented Aug 2, 2018

@ollie27 What about the following alternative implementation?

fn option_from_iter_new<A, V: FromIterator<A>, I: IntoIterator<Item=Option<A>>>(iter: I) -> Option<V> {
    let iter = iter.into_iter();
    let mut v = Vec::with_capacity(
        if iter.size_hint().1.is_some() { iter.size_hint().0 } else { 0 }
    );

    for elem in iter.into_iter() {
        match elem {
            Some(e) => v.push(e),
            None => return None
        }
    }

    Some(v.into_iter().collect())
}

It makes both your playground examples compile and is about as fast as a manual push loop:

test bench_collect_to_option_new ... bench:         220 ns/iter (+/- 12)
test bench_collect_to_option_old ... bench:         600 ns/iter (+/- 32)
test bench_collect_to_result_new ... bench:         222 ns/iter (+/- 16)
test bench_collect_to_result_old ... bench:         596 ns/iter (+/- 40)
test bench_push_loop_to_option   ... bench:         211 ns/iter (+/- 13)
test bench_push_loop_to_result   ... bench:         221 ns/iter (+/- 8)

I'm not too happy about that intermediate Vec (any other ideas?), but it still looks like a performance win.

cramertj added a commit to cramertj/rust that referenced this pull request Aug 2, 2018

Rollup merge of rust-lang#52910 - ljedrz:fix_48994, r=sfackler
Calculate capacity when collecting into Option and Result

I was browsing the [perf page](http://perf.rust-lang.org) to see the impact of my recent changes (e.g. rust-lang#52697) and I was surprised that some of the results were not as awesome as I expected. I dug some more and found an issue that is the probable culprit: [Collecting into a Result<Vec<_>> doesn't reserve the capacity in advance](rust-lang#48994).

Collecting into `Option` or `Result` might result in an empty collection, but there is no reason why we shouldn't provide a non-zero lower bound when we know the `Iterator` we are collecting from doesn't contain any `None` or `Err`.

We know this, because the `Adapter` iterator used in the `FromIterator` implementations for `Option` and `Result` registers if any `None` or `Err` are present in the `Iterator` in question; we can use this information and return a more accurate lower bound in case we know it won't be equal to zero.

I [have benchmarked](https://gist.github.com/ljedrz/c2fcc19f6260976ae7a46ae47aa71fb5) collecting into `Option` and `Result` using the current implementation and one with the proposed changes; I have also benchmarked a push loop with a known capacity as a reference that should be slower than using `FromIterator` (i.e. `collect()`). The results are quite promising:
```
test bench_collect_to_option_new ... bench:         246 ns/iter (+/- 23)
test bench_collect_to_option_old ... bench:         954 ns/iter (+/- 54)
test bench_collect_to_result_new ... bench:         250 ns/iter (+/- 25)
test bench_collect_to_result_old ... bench:         939 ns/iter (+/- 104)
test bench_push_loop_to_option   ... bench:         294 ns/iter (+/- 21)
test bench_push_loop_to_result   ... bench:         303 ns/iter (+/- 29)
```
Fixes rust-lang#48994.
@ollie27

This comment has been minimized.

Copy link
Contributor

ollie27 commented Aug 2, 2018

Using an intermediate Vec like that is may work fine when collecting into a Vec because of this specialisation but likely won't be good for other collections. It will also still hit OOM if the size_hint is something like (usize::MAX, Some(usize::MAX)).

I'm not sure what can really be done here. Unfortunately, this may be a case where you just have to manually use with_capacity to get the best performance. That's already what you have to do when collecting iterators that don't have precise size_hints.

I'll take this off the queue, hopefully I've made my point.

@bors r-

@bors bors removed the S-waiting-on-bors label Aug 2, 2018

@eddyb

This comment has been minimized.

Copy link
Member

eddyb commented Dec 3, 2018

@ljedrz Is it not kicking in, or is it not making any noticeable difference?

@ljedrz

This comment has been minimized.

Copy link
Contributor Author

ljedrz commented Dec 3, 2018

@eddyb I assumed it's equivalent; I didn't see a performance difference in a collect() benchmark.

@Dylan-DPC

This comment has been minimized.

Copy link
Member

Dylan-DPC commented Jan 7, 2019

ping from triage @ljedrz @eddyb any updates? is this blocked on anything?

@ljedrz

This comment has been minimized.

Copy link
Contributor Author

ljedrz commented Jan 7, 2019

@Dylan-DPC no, I just don't know why the specialization doesn't doesn't improve anything. @eddyb do you see any further options?

@stokhos

This comment has been minimized.

Copy link

stokhos commented Jan 14, 2019

Ping from triage @eddyb this PR needs your help

@scottmcm

This comment has been minimized.

Copy link
Member

scottmcm commented Jan 14, 2019

@TimNN

This comment has been minimized.

Copy link
Contributor

TimNN commented Jan 29, 2019

Ping from triage! Could someone summarize the status of this PR?

Show resolved Hide resolved src/libcore/iter/traits.rs Outdated
@scottmcm

This comment has been minimized.

Copy link
Member

scottmcm commented Jan 29, 2019

The PR looks plausible to me, FWIW.

I think it needs @ljedrz to rebase it to resolve the conflicts, then probably a try+perf to confirm it's effective?

@ljedrz ljedrz force-pushed the ljedrz:fix_48994 branch from c018002 to e856435 Jan 30, 2019

@ljedrz

This comment has been minimized.

Copy link
Contributor Author

ljedrz commented Jan 30, 2019

@scottmcm Rebased; please feel free to do a perf run if you feel this can work.

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Jan 30, 2019

☔️ The latest upstream changes (presumably #57974) made this pull request unmergeable. Please resolve the merge conflicts.

@scottmcm

This comment has been minimized.

Copy link
Member

scottmcm commented Jan 30, 2019

@ljedrz Sorry, can you rebase again? Seems like something else stepped on you again 🙁

@ljedrz ljedrz force-pushed the ljedrz:fix_48994 branch from e856435 to 5762e67 Feb 1, 2019

@ljedrz

This comment has been minimized.

Copy link
Contributor Author

ljedrz commented Feb 1, 2019

@scottmcm no prob; rebased.

@scottmcm

This comment has been minimized.

Copy link
Member

scottmcm commented Feb 1, 2019

Let's see if I have permissions to

@bors try

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Feb 1, 2019

⌛️ Trying commit 5762e67 with merge 8b32f79...

bors added a commit that referenced this pull request Feb 1, 2019

Auto merge of #52910 - ljedrz:fix_48994, r=<try>
[WIP] Calculate capacity when collecting into Option and Result

I was browsing the [perf page](http://perf.rust-lang.org) to see the impact of my recent changes (e.g. #52697) and I was surprised that some of the results were not as awesome as I expected. I dug some more and found an issue that is the probable culprit: [Collecting into a Result<Vec<_>> doesn't reserve the capacity in advance](#48994).

Collecting into `Option` or `Result` might result in an empty collection, but there is no reason why we shouldn't provide a non-zero lower bound when we know the `Iterator` we are collecting from doesn't contain any `None` or `Err`.

We know this, because the `Adapter` iterator used in the `FromIterator` implementations for `Option` and `Result` registers if any `None` or `Err` are present in the `Iterator` in question; we can use this information and return a more accurate lower bound in case we know it won't be equal to zero.

I [have benchmarked](https://gist.github.com/ljedrz/c2fcc19f6260976ae7a46ae47aa71fb5) collecting into `Option` and `Result` using the current implementation and one with the proposed changes; I have also benchmarked a push loop with a known capacity as a reference that should be slower than using `FromIterator` (i.e. `collect()`). The results are quite promising:
```
test bench_collect_to_option_new ... bench:         246 ns/iter (+/- 23)
test bench_collect_to_option_old ... bench:         954 ns/iter (+/- 54)
test bench_collect_to_result_new ... bench:         250 ns/iter (+/- 25)
test bench_collect_to_result_old ... bench:         939 ns/iter (+/- 104)
test bench_push_loop_to_option   ... bench:         294 ns/iter (+/- 21)
test bench_push_loop_to_result   ... bench:         303 ns/iter (+/- 29)
```
Fixes #48994.
@bors

This comment has been minimized.

Copy link
Contributor

bors commented Feb 2, 2019

☀️ Test successful - checks-travis
State: approved= try=True

@Mark-Simulacrum

This comment has been minimized.

Copy link
Member

Mark-Simulacrum commented Feb 2, 2019

@rust-timer

This comment has been minimized.

Copy link

rust-timer commented Feb 2, 2019

Success: Queued 8b32f79 with parent 23d8d0c, comparison URL.

@rust-timer

This comment has been minimized.

Copy link

rust-timer commented Feb 2, 2019

Finished benchmarking try commit 8b32f79

@ljedrz

This comment has been minimized.

Copy link
Contributor Author

ljedrz commented Feb 2, 2019

Hmm, perf doesn't seem to have detected any difference.

@Mark-Simulacrum

This comment has been minimized.

Copy link
Member

Mark-Simulacrum commented Feb 27, 2019

@sfackler Are you still the correct reviewer for this PR? Perhaps we should re-assign to @scottmcm? I'm not sure what the current status here is.

@Dylan-DPC

This comment has been minimized.

Copy link
Member

Dylan-DPC commented Mar 18, 2019

ping from triage @ljedrz you have conflicts to resolve.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.