Optimize implementations of FromIterator and Extend for Vec #22681

Merged
merged 4 commits into rust-lang:master from mzabaluev:extend-faster on Jun 17, 2015

Conversation

10 participants
mzabaluev (Contributor) commented Feb 22, 2015

Instead of a fast branch with a sized iterator falling back to a potentially poorly optimized iterate-and-push loop, a single efficient loop can serve all cases.

In my benchmark runs, I see some good gains, but also some regressions, possibly due to different inlining choices by the compiler. YMMV.
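
In sketch form, the single loop amounts to the following (an illustrative paraphrase using a free function and a safe push; the PR's actual extend_desugared is a private Vec method and differs in low-level details):

```rust
// Sketch of the single-loop strategy: push elements one by one, and only
// consult size_hint when the buffer is full, i.e. on the rare reallocation
// path, so a pessimistic or lying size hint costs almost nothing.
fn extend_single_loop<T, I: Iterator<Item = T>>(vec: &mut Vec<T>, mut iterator: I) {
    while let Some(element) = iterator.next() {
        if vec.len() == vec.capacity() {
            let (lower, _) = iterator.size_hint();
            vec.reserve(lower.saturating_add(1));
        }
        vec.push(element);
    }
}
```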

mzabaluev added some commits Feb 22, 2015

Optimize Vec::from_iter and extend
  Use one loop, efficient for both sized and size-ignorant iterators
  (including iterators lying about their size).

In Vec::from_iter, unroll the first iteration
  For the first ever element to put into a vector, the branching
  conditions are more predictable.

Desugar the implementation of extend to work with Iterator
  Implement both Vec::from_iter and extend in terms of an internal
  method working with Iterator. Otherwise, the code below ends up
  using two monomorphizations of extend, differing only in the
  implementation of IntoIterator:

      let mut v = Vec::from_iter(iterable1);
      v.extend(iterable2);
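
The monomorphization point from the third commit can be illustrated with a toy wrapper (MyVec and its methods are illustrative only, not the libcollections code): the public entry points stay generic over IntoIterator but are thin shims over one inner method generic over Iterator, so the two calls in the commit's example end up sharing a single instantiation of the loop whenever their arguments yield the same iterator type.

```rust
// Illustrative wrapper only; not the actual libcollections implementation.
struct MyVec<T> {
    inner: Vec<T>,
}

impl<T> MyVec<T> {
    // Public entry points: generic over IntoIterator, but only shims.
    fn from_iter<I: IntoIterator<Item = T>>(iterable: I) -> MyVec<T> {
        let mut v = MyVec { inner: Vec::new() };
        v.extend_desugared(iterable.into_iter());
        v
    }

    fn extend<I: IntoIterator<Item = T>>(&mut self, iterable: I) {
        self.extend_desugared(iterable.into_iter())
    }

    // The real loop is generic over Iterator, so distinct IntoIterator
    // impls that produce the same iterator type reuse one instantiation.
    fn extend_desugared<I: Iterator<Item = T>>(&mut self, iterator: I) {
        for element in iterator {
            self.inner.push(element);
        }
    }
}

fn main() {
    let iterable1: Vec<u32> = vec![1, 2, 3];          // IntoIterator = Vec<u32>
    let iterable2 = vec![4, 5, 6].into_iter();        // IntoIterator = vec::IntoIter<u32>

    let mut v = MyVec::from_iter(iterable1);
    v.extend(iterable2);
    // Both calls bottom out in extend_desugared::<vec::IntoIter<u32>>.
}
```
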
rust-highfive (Collaborator) commented Feb 22, 2015

r? @Gankro

(rust_highfive has picked a reviewer for you, use r? to override)

Gankro (Contributor) commented Feb 22, 2015

Can you post your experimental results?

mahkoh (Contributor) commented Feb 22, 2015

Does it optimize to a memcpy if the src is contiguous? If not then it will have to be changed anyway.

    //         self.push(item);
    //     }
    loop {
        match iterator.next() {

pczarn (Contributor) commented Feb 22, 2015

Can you use while let here?

mzabaluev (Contributor, Author) commented Feb 22, 2015

I can, good point.
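
Concretely, the rewrite being agreed on here (push_all_from is a made-up name used only to show the change in isolation; in the PR the loop lives inside Vec's extend path):

```rust
// Hypothetical free function, just to show the rewrite pczarn suggests.
fn push_all_from<T, I: Iterator<Item = T>>(vec: &mut Vec<T>, mut iterator: I) {
    // The form in the diff:
    //
    //     loop {
    //         match iterator.next() {
    //             Some(item) => vec.push(item),
    //             None => break,
    //         }
    //     }
    //
    // is equivalent to the tidier:
    while let Some(item) = iterator.next() {
        vec.push(item);
    }
}
```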

mzabaluev (Contributor, Author) commented Feb 22, 2015

Results of benchmarking on i686: https://gist.github.com/mzabaluev/df41c2f50464416b0a26

tl;dr: Notable losers are (observed consistently over multiple bench runs):

| Test | Percentage |
| --- | --- |
| bit::bit_vec_bench::bench_bit_vec_big_iter | 189% |
| slice::bench::sort_big_random_small | 143% |
| slice::bench::zero_1kb_from_elem | 184% |
| vec::tests::bench_extend_0000_1000 | 119% |
| vec::tests::bench_from_elem_0100 | 129% |
| vec::tests::bench_from_iter_0010 | 121% |
| vec::tests::bench_from_iter_0100 | 125% |

Winners:

| Test | Percentage |
| --- | --- |
| string::tests::bench_push_char_two_bytes | 95% |
| string::tests::from_utf8_lossy_100_invalid | 38% |
| vec::tests::bench_extend_0000_0010 | 87% |
| vec::tests::bench_extend_0000_0100 | 89% |
| vec::tests::bench_from_elem_1000 | 48% |
| vec::tests::bench_from_iter_1000 | 81% |

Overall, I don't think too much trust should be put in microbenchmarks that repeatedly test a single function optimized for one call site.

huonw (Member) commented Feb 22, 2015

How much is "winning" and "losing"?

mzabaluev (Contributor, Author) commented Feb 22, 2015

@mahkoh These changes should not drastically change the performance with slice iterators, where the loop is as close to reserve + a type-aligned memcpy as the optimizer can make it. The main purpose is to provide comparable efficiency for size-unaware or pessimistic iterators.
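
To make "size-unaware or pessimistic" concrete, an illustrative example (not from the PR): a slice iterator reports its exact length up front, while an adapter like filter can only promise a lower bound of zero, which is what defeated the old sized fast path.

```rust
fn main() {
    let data = [1u32, 2, 3, 4];

    // A slice iterator knows exactly how many elements it will yield.
    let exact = data.iter();
    assert_eq!(exact.size_hint(), (4, Some(4)));

    // filter cannot know how many elements survive, so its lower bound is 0.
    let pessimistic = data.iter().filter(|&&x| x % 2 == 0);
    assert_eq!(pessimistic.size_hint(), (0, Some(4)));

    // Collecting such an iterator is exactly the case the single loop
    // is meant to keep fast.
    let evens: Vec<u32> = pessimistic.cloned().collect();
    assert_eq!(evens, [2, 4]);
}
```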

mzabaluev (Contributor, Author) commented Feb 22, 2015

@huonw I've added percentages to the comment above.

pczarn (Contributor) commented Feb 23, 2015

Two of these methods have #[inline], but extend_desugared does not; are benchmarks affected? Is there some benefit to having extend_desugared vs calling vector.extend(iterator)? Does extend_desugared change inlining choices, too?

mzabaluev (Contributor, Author) commented Feb 23, 2015

I actually ran the benchmarks once with #[inline] on extend_desugared, but the results did not differ by more than what looks like jitter. I don't think #[inline] has any effect on generic methods, as they are always available in crate metadata, so the compiler can always choose to inline them across crates. Even less so for unit tests.

mzabaluev (Contributor, Author) commented Feb 23, 2015

There should be more benchmarks with extending from sizeless/pessimistic iterators. Any good candidates in libcollections?

Gankro (Contributor) commented Feb 23, 2015

BTreeMap's range iter, VecMap's iter, and bitvset's iter have no idea how many elements they contain, but they're all also doing a non-trivial amount of work to actually yield those elements.

mzabaluev (Contributor, Author) commented Feb 23, 2015

@pczarn Without looking too closely into the results (I get easily frustrated with the long build times of the library crates and their tests), I assume the optimizer has more "reason" to share extend_desugared if it is used in several places, unless the tradeoff between cache/branch-predictor efficiency and removing the call overhead leans towards the latter.

steveklabnik (Member) commented Apr 21, 2015

How do we all feel about this PR today?

Gankro (Contributor) commented Apr 22, 2015

Oh geez, this slipped right through the cracks! I don't have a great gut on this since it seems to just be shuffling perf around. r? @huonw

@rust-highfive rust-highfive assigned huonw and unassigned Gankro Apr 22, 2015

    if vector.len() == vector.capacity() {
        for element in iterator {
            vector.push(element);
    let mut vector = match iterator.next() {

huonw (Member) commented Apr 22, 2015

Hm, does this actually improve performance, over the simpler:

    let mut vector = Vec::with_capacity(iterator.size_hint().0);
    vector.extend_desugared(iterator);
    vector

mzabaluev (Contributor, Author) commented Apr 22, 2015

This avoids a branch that is present in extend_desugared. That branch tends not to be taken, but in the first iteration the vector is always expanded.

huonw (Member) commented Apr 22, 2015

Hm, why does it avoid that branch? It seems that if the iterator has a non-zero size hint, the with_capacity call will ensure the branch isn't taken, and if the iterator has a zero size hint, both versions seem equally badly off?

Basically, I'm asking if this was noticeable in practice.

mzabaluev (Contributor, Author) commented Apr 22, 2015

You are right, and it was somehow lost on me that Vec::with_capacity(1) allocates exactly one element. I should simplify the code to your suggestion; the performance difference will likely be negligible.

mzabaluev (Contributor, Author) commented Apr 23, 2015

With the suggested change, vec::bench_from_iter_0000 regresses from 29 ns/iter (+/- 2) to 87 ns/iter (+/- 48) in my testing (rebased against commit 3dbfa74); the other benchmarks are seemingly unaffected. I wonder if it can be considered an edge case.
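
For readers following along, the shape being defended in this review thread is roughly the following (a reconstruction from the diff excerpts quoted above, not the exact merged code, which differs in low-level details): the first element is handled before entering the shared loop, so the empty case allocates nothing and the "is the buffer full?" branch inside the loop starts out predictable.

```rust
// Rough reconstruction of the unrolled from_iter under discussion.
fn from_iter_unrolled<T, I: Iterator<Item = T>>(mut iterator: I) -> Vec<T> {
    let mut vector = match iterator.next() {
        // Empty iterator: return without allocating at all.
        None => return Vec::new(),
        Some(element) => {
            // Allocate once, sized by the hint plus the element in hand.
            let (lower, _) = iterator.size_hint();
            let mut vector = Vec::with_capacity(lower.saturating_add(1));
            vector.push(element);
            vector
        }
    };
    // The remaining elements go through the single shared loop
    // (extend_desugared in the PR); plain extend stands in for it here.
    vector.extend(iterator);
    vector
}
```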

Gankro (Contributor) commented May 19, 2015

@huonw @mzabaluev What's up with this?

@alexcrichton alexcrichton added the T-libs label May 26, 2015

Gankro (Contributor) commented May 28, 2015

Closing due to inactivity.

@Gankro Gankro closed this May 28, 2015

mzabaluev (Contributor, Author) commented May 29, 2015

@Gankro Ah sorry, I let your request slip by. I believe the branch is good as it is. @huonw's suggestion, while good from a code-clarity point of view, resulted in some performance degradation, as mentioned in the discussion above.

Gankro (Contributor) commented May 30, 2015

@huonw Any reason not to merge this?

Gankro (Contributor) commented Jun 4, 2015

huon's busy

@Gankro Gankro reopened this Jun 4, 2015

    while let Some(element) = iterator.next() {
        let len = self.len();
        if len == self.capacity() {
            let (lower, _) = iterator.size_hint();

Gankro (Contributor) commented Jun 9, 2015

Calling size_hint in a loop seems really bad. It is not necessarily a straightforward or cheap method. Is hoisting it out of the loop not worth it in your testing?

huonw (Member) commented Jun 16, 2015

It is called exponentially rarely, so it seems fine?

Gankro (Contributor) commented Jun 16, 2015

Oh derp, sure.
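
A quick back-of-the-envelope check of "exponentially rarely" (assuming, purely for illustration, a buffer that doubles whenever it fills; Vec's actual growth policy is an implementation detail): the len == capacity branch, and therefore the size_hint call, fires only O(log n) times over n pushes.

```rust
fn main() {
    // Count how often the "buffer is full" branch would fire for a vector
    // that starts empty and doubles its capacity when full.
    let mut capacity = 0usize;
    let mut size_hint_calls = 0usize;
    for len in 0..1_000_000usize {
        if len == capacity {
            size_hint_calls += 1;
            capacity = if capacity == 0 { 4 } else { capacity * 2 };
        }
    }
    // Prints 19 with this growth model: a handful of calls for a million pushes.
    println!("size_hint consulted {} times for 1,000,000 pushes", size_hint_calls);
}
```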

Gankro (Contributor) commented Jun 16, 2015

At worst I like this better than the old code. r+

huonw (Member) commented Jun 16, 2015

@bors r+

bors (Contributor) commented Jun 16, 2015

📌 Commit 7b464d3 has been approved by huonw

bors (Contributor) commented Jun 17, 2015

⌛️ Testing commit 7b464d3 with merge 0250ff9...

bors added a commit that referenced this pull request Jun 17, 2015

Auto merge of #22681 - mzabaluev:extend-faster, r=huonw

@bors bors merged commit 7b464d3 into rust-lang:master Jun 17, 2015

2 checks passed

continuous-integration/travis-ci/pr: The Travis CI build passed
homu: Test successful

@brson brson added the relnotes label Jun 23, 2015
