New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collecting into a Result<Vec<_>> doesn't reserve the capacity in advance #48994

Open
scurest opened this Issue Mar 13, 2018 · 1 comment

Comments

Projects
None yet
4 participants
@scurest
Copy link

scurest commented Mar 13, 2018

Couldn't find an issue for this and don't know if it counts but filing anyway.


If you have

fn foo(s: &[i8]) -> Vec<u8> {
    s.iter()
    .map(|&x| x as u8)
    .collect()
}

the SpecExtend machinery ensures that the vector has s.len() space reserved in advance. However if you change it to return a result

fn foo(s: &[i8]) -> Result<Vec<u8>> {
    s.iter()
    .map(|&x| if x < 0 { Err(...) } else { Ok(x as u8) } )
    .collect()
}

then (based on examining the LLVM IR and heaptracker's "Temporary" measurements) that optimization has quietly been lost.

This is technically correct in the sense that the first element yielded could be an Err of course (the size hint for the Adapter in Result's FromIterator impl has a lower bound of 0). But this pessimizes the good path to take more memory and be slower in favor of possibly saving memory on the bad one, which seems backwards.

Is there a specialization that could be added to fix this?

@scottmcm

This comment has been minimized.

Copy link
Member

scottmcm commented Mar 14, 2018

I wonder if there's a more general tweak. It looks like collect only looks at the low end of the hint (https://doc.rust-lang.org/src/alloc/vec.rs.html#1804), so it feels like there ought to be a way to get some use out of the high end as well.

Strawman proposal:

  • Do the same as today if high is None
    • Mostly to avoid trying to do anything with the default (0, None) "hint"
  • Otherwise reserve a rough approximation of √(low × high)
    • So (0, 1GiB) pre-reserves 32KiB, for example
    • Very cheap with leading_zeros and shifting
    • Hopefully never so big as to be a problem (and could try_reserve to never be, I suppose)
  • At the end, shrink_to_fit if len() <= cap()/2
    • So if there really is only a handful of elements, it doesn't waste too much space

Probably needs the benchmark set to decide whether it's constructive, though...

scurest added a commit to scurest/apicula that referenced this issue Mar 15, 2018

Avoid collect::<Result<_>> for rotation matrices
This shaves off about 20% of our maximum heap usage in my tests
when loading many animations. There are other places where we
collect into a Result, but since those vectors are smaller they
make less of a difference.

See rust-lang/rust#48994.

Also did some readability edits nearby.

cramertj added a commit to cramertj/rust that referenced this issue Aug 2, 2018

Rollup merge of rust-lang#52910 - ljedrz:fix_48994, r=sfackler
Calculate capacity when collecting into Option and Result

I was browsing the [perf page](http://perf.rust-lang.org) to see the impact of my recent changes (e.g. rust-lang#52697) and I was surprised that some of the results were not as awesome as I expected. I dug some more and found an issue that is the probable culprit: [Collecting into a Result<Vec<_>> doesn't reserve the capacity in advance](rust-lang#48994).

Collecting into `Option` or `Result` might result in an empty collection, but there is no reason why we shouldn't provide a non-zero lower bound when we know the `Iterator` we are collecting from doesn't contain any `None` or `Err`.

We know this, because the `Adapter` iterator used in the `FromIterator` implementations for `Option` and `Result` registers if any `None` or `Err` are present in the `Iterator` in question; we can use this information and return a more accurate lower bound in case we know it won't be equal to zero.

I [have benchmarked](https://gist.github.com/ljedrz/c2fcc19f6260976ae7a46ae47aa71fb5) collecting into `Option` and `Result` using the current implementation and one with the proposed changes; I have also benchmarked a push loop with a known capacity as a reference that should be slower than using `FromIterator` (i.e. `collect()`). The results are quite promising:
```
test bench_collect_to_option_new ... bench:         246 ns/iter (+/- 23)
test bench_collect_to_option_old ... bench:         954 ns/iter (+/- 54)
test bench_collect_to_result_new ... bench:         250 ns/iter (+/- 25)
test bench_collect_to_result_old ... bench:         939 ns/iter (+/- 104)
test bench_push_loop_to_option   ... bench:         294 ns/iter (+/- 21)
test bench_push_loop_to_result   ... bench:         303 ns/iter (+/- 29)
```
Fixes rust-lang#48994.

bors added a commit that referenced this issue Aug 5, 2018

Auto merge of #53086 - scottmcm:use-upper-in-collect, r=<try>
Try to guess a smarter initial capacity in Vec::from_iter

> Another possibility is collect could look at the upper bound and be smarter about what capacity to use?  ~ #45840 (comment)

This is obviously good for hints like `(60, Some(61))` where today we allocate space for 60, then double it should the additional element show up, and it'd be much better to just always allocate 61.

More nuanced are hints like `(0, Some(150))`, where today the code uses just the `0`, and thus starts at a capacity of `1`, but with this change will start at `10` instead.

This can undeniably increase memory pressure over where it is today, so I expect at least some controversy 🙂  It does use `try_reserve` for the allocation that's more than the lower bound, and thus shouldn't introduce new aborts, at least.  And the starting point grows by the root, keeping things fairly contained: even with an hint of `(0, Some(1_000_000_000))` it'll only start at `30_517`.

cc @ljedrz
cc #48994

bors added a commit that referenced this issue Aug 8, 2018

Auto merge of #53086 - scottmcm:use-upper-in-collect, r=<try>
Try to guess a smarter initial capacity in Vec::from_iter

> Another possibility is collect could look at the upper bound and be smarter about what capacity to use?  ~ #45840 (comment)

This is obviously good for hints like `(60, Some(61))` where today we allocate space for 60, then double it should the additional element show up, and it'd be much better to just always allocate 61.

More nuanced are hints like `(0, Some(150))`, where today the code uses just the `0`, and thus starts at a capacity of `1`, but with this change will start at `10` instead.

This can undeniably increase memory pressure over where it is today, so I expect at least some controversy 🙂  It does use `try_reserve` for the allocation that's more than the lower bound, and thus shouldn't introduce new aborts, at least.  And the starting point grows by the root, keeping things fairly contained: even with an hint of `(0, Some(1_000_000_000))` it'll only start at `30_517`.

cc @ljedrz
cc #48994

bors added a commit that referenced this issue Feb 1, 2019

Auto merge of #52910 - ljedrz:fix_48994, r=<try>
[WIP] Calculate capacity when collecting into Option and Result

I was browsing the [perf page](http://perf.rust-lang.org) to see the impact of my recent changes (e.g. #52697) and I was surprised that some of the results were not as awesome as I expected. I dug some more and found an issue that is the probable culprit: [Collecting into a Result<Vec<_>> doesn't reserve the capacity in advance](#48994).

Collecting into `Option` or `Result` might result in an empty collection, but there is no reason why we shouldn't provide a non-zero lower bound when we know the `Iterator` we are collecting from doesn't contain any `None` or `Err`.

We know this, because the `Adapter` iterator used in the `FromIterator` implementations for `Option` and `Result` registers if any `None` or `Err` are present in the `Iterator` in question; we can use this information and return a more accurate lower bound in case we know it won't be equal to zero.

I [have benchmarked](https://gist.github.com/ljedrz/c2fcc19f6260976ae7a46ae47aa71fb5) collecting into `Option` and `Result` using the current implementation and one with the proposed changes; I have also benchmarked a push loop with a known capacity as a reference that should be slower than using `FromIterator` (i.e. `collect()`). The results are quite promising:
```
test bench_collect_to_option_new ... bench:         246 ns/iter (+/- 23)
test bench_collect_to_option_old ... bench:         954 ns/iter (+/- 54)
test bench_collect_to_result_new ... bench:         250 ns/iter (+/- 25)
test bench_collect_to_result_old ... bench:         939 ns/iter (+/- 104)
test bench_push_loop_to_option   ... bench:         294 ns/iter (+/- 21)
test bench_push_loop_to_result   ... bench:         303 ns/iter (+/- 29)
```
Fixes #48994.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment