Auto merge of #70793 - the8472:in-place-iter-collect, r=Amanieu
specialize some collection and iterator operations to run in-place

This is a rebase and update of #66383, which was closed due to inactivity.

Recent rustc changes made the compile time regressions disappear, at least for webrender-wrench. Running a stage2 compile and the rustc-perf suite takes hours on the hardware I have at the moment, so I can't do much more than that.

![Screenshot_2020-04-05 rustc performance data](https://user-images.githubusercontent.com/1065730/78462657-5d60f100-76d4-11ea-8a0b-4f3962707c38.png)

In the best case, the `vec::bench_in_place_recycle` synthetic microbenchmark, these optimizations provide a 15x speedup over the regular implementation, which allocates a new vec for every benchmark iteration. [Benchmark results](https://gist.github.com/the8472/6d999b2d08a2bedf3b93f12112f96e2f). In real code the speedups are tiny, but they also depend on the allocator used: a system allocator that uses a process-wide mutex will benefit more than one with thread-local pools.

## What was changed

* `SpecExtend`, which covered the `from_iter` and `extend` specializations, was split into separate traits
* `extend` and `from_iter` now reuse `append_elements` when the passed iterators are backed by slices.
* A preexisting `vec.into_iter().collect::<Vec<_>>()` optimization that passed through the original vec has been generalized further to also cover cases where the original has been partially drained.
* A chain of *Vec<T> / BinaryHeap<T> / Box<[T]>* `IntoIter`s through various iterator adapters, collected into *Vec<U>* or *BinaryHeap<U>*, will be performed in place as long as `T` and `U` have the same size and alignment and are not ZSTs.
* To enable the above specialization, the unsafe, unstable `SourceIter` and `InPlaceIterable` traits have been added. The former allows reaching through the iterator pipeline to grab a pointer to the source memory. The latter is a marker promising that the read pointer will advance as fast as or faster than the write pointer, which is what makes in-place operation possible in the first place (a rough sketch of both traits follows this list).
* `vec::IntoIter` implements `TrustedRandomAccess` for `T: Copy` to allow in-place collection when there is a `Zip` adapter in the iterator. TRA had to be made an unstable public trait to support this.
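
For orientation, here is a rough sketch of the two new unstable traits as they are used in this patch; the authoritative definitions live in `core::iter`, and the exact bounds and attributes are elided here.

```rust
/// Sketch only: lets `collect` reach through an adapter stack to the
/// pipeline's source (e.g. a `vec::IntoIter`) and grab a pointer to the
/// source memory.
pub unsafe trait SourceIter {
    type Source;
    /// Unsafe because callers must not leave the source and the wrapping
    /// adapters in an observably inconsistent state.
    unsafe fn as_inner(&mut self) -> &mut Self::Source;
}

/// Sketch only: marker promising that the read pointer advances at least as
/// fast as the write pointer, so writing results back into the source
/// allocation never overwrites elements that have not been read yet.
pub unsafe trait InPlaceIterable: Iterator {}
```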

## In-place collectible adapters

* `Map`
* `MapWhile`
* `Filter`
* `FilterMap`
* `Fuse`
* `Skip`
* `SkipWhile`
* `Take`
* `TakeWhile`
* `Enumerate`
* `Zip` (left-hand side only, `Copy` types only)
* `Peekable`
* `Scan`
* `Inspect`
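
As an illustration (not part of the patch), here is a pipeline built only from adapters in the list above. With this change it collects back into the source allocation, since `u32 -> u32` keeps size and alignment and none of the types are ZSTs:

```rust
fn main() {
    let src: Vec<u32> = (0..1024).collect();
    let addr = src.as_ptr();
    let dst: Vec<u32> = src
        .into_iter()
        .enumerate()
        .map(|(i, x)| x.wrapping_add(i as u32))
        .filter(|x| x % 3 != 0)
        .collect();
    // Observable effect of the specialization (not an API guarantee): the
    // buffer address is unchanged because the original allocation was reused.
    assert_eq!(dst.as_ptr(), addr);
}
```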

## Concerns

`vec.into_iter().filter(|_| false).collect()` will no longer return a vec with 0 capacity; instead it will return its original allocation. This avoids the cost of any allocation or deallocation, but it could lead to large allocations living longer than expected.
If that's not acceptable, some resizing policy at the end of the attempted in-place collect would be necessary, which in the worst case could result in one more memcpy than the non-specialized case.
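
A minimal sketch of the concern, plus a caller-side mitigation using the existing stable `shrink_to_fit`:

```rust
fn main() {
    let big: Vec<u64> = vec![0; 1_000_000];
    let mut kept: Vec<u64> = big.into_iter().filter(|_| false).collect();
    assert!(kept.is_empty());
    // With the in-place specialization, `kept` may still own the original
    // one-million-element buffer instead of having capacity 0.
    println!("capacity after collect: {}", kept.capacity());
    // Callers that don't want to hold on to the memory can release it:
    kept.shrink_to_fit();
    println!("capacity after shrink_to_fit: {}", kept.capacity());
}
```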

## Possible followup work

* split liballoc/vec.rs to remove `ignore-tidy-filelength`
* try to get trivial chains such as `vec.into_iter().skip(1).collect::<Vec<_>>()` to compile to a `memmove` (currently compiles to a pile of SIMD, see #69187); a hand-rolled equivalent is sketched after this list
* improve the traits so they can be reused by other crates, e.g. itertools; I think they're currently only good enough for internal use
* allow iterators sourced from a `HashSet` to be in-place collected into a `Vec`
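
For reference, a hand-rolled version of the `skip(1)` case from the second item above. The helper name is made up; this only illustrates the single-`memmove` behavior that follow-up item is after:

```rust
/// Hypothetical helper, roughly equivalent to
/// `v.into_iter().skip(1).collect::<Vec<_>>()`: drops the first element and
/// shifts the tail down in place, reusing the allocation.
fn skip_one_in_place<T>(mut v: Vec<T>) -> Vec<T> {
    if !v.is_empty() {
        // Vec::remove shifts the remaining elements down with a single ptr::copy.
        v.remove(0);
    }
    v
}
```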
bors committed Sep 3, 2020
2 parents 62dad45 + 2f23a0f commit 0d0f6b1
Showing 15 changed files with 1,078 additions and 49 deletions.
245 changes: 243 additions & 2 deletions library/alloc/benches/vec.rs
@@ -1,12 +1,14 @@
use rand::prelude::*;
use std::iter::{repeat, FromIterator};
use test::Bencher;
use test::{black_box, Bencher};

#[bench]
fn bench_new(b: &mut Bencher) {
b.iter(|| {
let v: Vec<u32> = Vec::new();
assert_eq!(v.len(), 0);
assert_eq!(v.capacity(), 0);
v
})
}

@@ -17,6 +19,7 @@ fn do_bench_with_capacity(b: &mut Bencher, src_len: usize) {
let v: Vec<u32> = Vec::with_capacity(src_len);
assert_eq!(v.len(), 0);
assert_eq!(v.capacity(), src_len);
v
})
}

@@ -47,6 +50,7 @@ fn do_bench_from_fn(b: &mut Bencher, src_len: usize) {
let dst = (0..src_len).collect::<Vec<_>>();
assert_eq!(dst.len(), src_len);
assert!(dst.iter().enumerate().all(|(i, x)| i == *x));
dst
})
}

@@ -77,6 +81,7 @@ fn do_bench_from_elem(b: &mut Bencher, src_len: usize) {
let dst: Vec<usize> = repeat(5).take(src_len).collect();
assert_eq!(dst.len(), src_len);
assert!(dst.iter().all(|x| *x == 5));
dst
})
}

@@ -109,6 +114,7 @@ fn do_bench_from_slice(b: &mut Bencher, src_len: usize) {
let dst = src.clone()[..].to_vec();
assert_eq!(dst.len(), src_len);
assert!(dst.iter().enumerate().all(|(i, x)| i == *x));
dst
});
}

@@ -141,6 +147,7 @@ fn do_bench_from_iter(b: &mut Bencher, src_len: usize) {
let dst: Vec<_> = FromIterator::from_iter(src.clone());
assert_eq!(dst.len(), src_len);
assert!(dst.iter().enumerate().all(|(i, x)| i == *x));
dst
});
}

@@ -175,6 +182,7 @@ fn do_bench_extend(b: &mut Bencher, dst_len: usize, src_len: usize) {
dst.extend(src.clone());
assert_eq!(dst.len(), dst_len + src_len);
assert!(dst.iter().enumerate().all(|(i, x)| i == *x));
dst
});
}

@@ -224,9 +232,24 @@ fn do_bench_extend_from_slice(b: &mut Bencher, dst_len: usize, src_len: usize) {
dst.extend_from_slice(&src);
assert_eq!(dst.len(), dst_len + src_len);
assert!(dst.iter().enumerate().all(|(i, x)| i == *x));
dst
});
}

#[bench]
fn bench_extend_recycle(b: &mut Bencher) {
let mut data = vec![0; 1000];

b.iter(|| {
let tmp = std::mem::replace(&mut data, Vec::new());
let mut to_extend = black_box(Vec::new());
to_extend.extend(tmp.into_iter());
data = black_box(to_extend);
});

black_box(data);
}

#[bench]
fn bench_extend_from_slice_0000_0000(b: &mut Bencher) {
do_bench_extend_from_slice(b, 0, 0)
@@ -271,6 +294,7 @@ fn do_bench_clone(b: &mut Bencher, src_len: usize) {
let dst = src.clone();
assert_eq!(dst.len(), src_len);
assert!(dst.iter().enumerate().all(|(i, x)| i == *x));
dst
});
}

@@ -305,10 +329,10 @@ fn do_bench_clone_from(b: &mut Bencher, times: usize, dst_len: usize, src_len: u

for _ in 0..times {
dst.clone_from(&src);

assert_eq!(dst.len(), src_len);
assert!(dst.iter().enumerate().all(|(i, x)| dst_len + i == *x));
}
dst
});
}

@@ -431,3 +455,220 @@ fn bench_clone_from_10_0100_0010(b: &mut Bencher) {
fn bench_clone_from_10_1000_0100(b: &mut Bencher) {
do_bench_clone_from(b, 10, 1000, 100)
}

macro_rules! bench_in_place {
(
$($fname:ident, $type:ty , $count:expr, $init: expr);*
) => {
$(
#[bench]
fn $fname(b: &mut Bencher) {
b.iter(|| {
let src: Vec<$type> = black_box(vec![$init; $count]);
let mut sink = src.into_iter()
.enumerate()
.map(|(idx, e)| { (idx as $type) ^ e }).collect::<Vec<$type>>();
black_box(sink.as_mut_ptr())
});
}
)+
};
}

bench_in_place![
bench_in_place_xxu8_i0_0010, u8, 10, 0;
bench_in_place_xxu8_i0_0100, u8, 100, 0;
bench_in_place_xxu8_i0_1000, u8, 1000, 0;
bench_in_place_xxu8_i1_0010, u8, 10, 1;
bench_in_place_xxu8_i1_0100, u8, 100, 1;
bench_in_place_xxu8_i1_1000, u8, 1000, 1;
bench_in_place_xu32_i0_0010, u32, 10, 0;
bench_in_place_xu32_i0_0100, u32, 100, 0;
bench_in_place_xu32_i0_1000, u32, 1000, 0;
bench_in_place_xu32_i1_0010, u32, 10, 1;
bench_in_place_xu32_i1_0100, u32, 100, 1;
bench_in_place_xu32_i1_1000, u32, 1000, 1;
bench_in_place_u128_i0_0010, u128, 10, 0;
bench_in_place_u128_i0_0100, u128, 100, 0;
bench_in_place_u128_i0_1000, u128, 1000, 0;
bench_in_place_u128_i1_0010, u128, 10, 1;
bench_in_place_u128_i1_0100, u128, 100, 1;
bench_in_place_u128_i1_1000, u128, 1000, 1
];

#[bench]
fn bench_in_place_recycle(b: &mut test::Bencher) {
let mut data = vec![0; 1000];

b.iter(|| {
let tmp = std::mem::replace(&mut data, Vec::new());
data = black_box(
tmp.into_iter()
.enumerate()
.map(|(idx, e)| idx.wrapping_add(e))
.fuse()
.peekable()
.collect::<Vec<usize>>(),
);
});
}

#[bench]
fn bench_in_place_zip_recycle(b: &mut test::Bencher) {
let mut data = vec![0u8; 1000];
let mut rng = rand::thread_rng();
let mut subst = vec![0u8; 1000];
rng.fill_bytes(&mut subst[..]);

b.iter(|| {
let tmp = std::mem::replace(&mut data, Vec::new());
let mangled = tmp
.into_iter()
.zip(subst.iter().copied())
.enumerate()
.map(|(i, (d, s))| d.wrapping_add(i as u8) ^ s)
.collect::<Vec<_>>();
assert_eq!(mangled.len(), 1000);
data = black_box(mangled);
});
}

#[bench]
fn bench_in_place_zip_iter_mut(b: &mut test::Bencher) {
let mut data = vec![0u8; 256];
let mut rng = rand::thread_rng();
let mut subst = vec![0u8; 1000];
rng.fill_bytes(&mut subst[..]);

b.iter(|| {
data.iter_mut().enumerate().for_each(|(i, d)| {
*d = d.wrapping_add(i as u8) ^ subst[i];
});
});

black_box(data);
}

#[derive(Clone)]
struct Droppable(usize);

impl Drop for Droppable {
fn drop(&mut self) {
black_box(self);
}
}

#[bench]
fn bench_in_place_collect_droppable(b: &mut test::Bencher) {
let v: Vec<Droppable> = std::iter::repeat_with(|| Droppable(0)).take(1000).collect();
b.iter(|| {
v.clone()
.into_iter()
.skip(100)
.enumerate()
.map(|(i, e)| Droppable(i ^ e.0))
.collect::<Vec<_>>()
})
}

#[bench]
fn bench_chain_collect(b: &mut test::Bencher) {
let data = black_box([0; LEN]);
b.iter(|| data.iter().cloned().chain([1].iter().cloned()).collect::<Vec<_>>());
}

#[bench]
fn bench_chain_chain_collect(b: &mut test::Bencher) {
let data = black_box([0; LEN]);
b.iter(|| {
data.iter()
.cloned()
.chain([1].iter().cloned())
.chain([2].iter().cloned())
.collect::<Vec<_>>()
});
}

#[bench]
fn bench_nest_chain_chain_collect(b: &mut test::Bencher) {
let data = black_box([0; LEN]);
b.iter(|| {
data.iter().cloned().chain([1].iter().chain([2].iter()).cloned()).collect::<Vec<_>>()
});
}

pub fn example_plain_slow(l: &[u32]) -> Vec<u32> {
let mut result = Vec::with_capacity(l.len());
result.extend(l.iter().rev());
result
}

pub fn map_fast(l: &[(u32, u32)]) -> Vec<u32> {
let mut result = Vec::with_capacity(l.len());
for i in 0..l.len() {
unsafe {
*result.get_unchecked_mut(i) = l[i].0;
result.set_len(i);
}
}
result
}

const LEN: usize = 16384;

#[bench]
fn bench_range_map_collect(b: &mut test::Bencher) {
b.iter(|| (0..LEN).map(|_| u32::default()).collect::<Vec<_>>());
}

#[bench]
fn bench_chain_extend_ref(b: &mut test::Bencher) {
let data = black_box([0; LEN]);
b.iter(|| {
let mut v = Vec::<u32>::with_capacity(data.len() + 1);
v.extend(data.iter().chain([1].iter()));
v
});
}

#[bench]
fn bench_chain_extend_value(b: &mut test::Bencher) {
let data = black_box([0; LEN]);
b.iter(|| {
let mut v = Vec::<u32>::with_capacity(data.len() + 1);
v.extend(data.iter().cloned().chain(Some(1)));
v
});
}

#[bench]
fn bench_rev_1(b: &mut test::Bencher) {
let data = black_box([0; LEN]);
b.iter(|| {
let mut v = Vec::<u32>::new();
v.extend(data.iter().rev());
v
});
}

#[bench]
fn bench_rev_2(b: &mut test::Bencher) {
let data = black_box([0; LEN]);
b.iter(|| example_plain_slow(&data));
}

#[bench]
fn bench_map_regular(b: &mut test::Bencher) {
let data = black_box([(0, 0); LEN]);
b.iter(|| {
let mut v = Vec::<u32>::new();
v.extend(data.iter().map(|t| t.1));
v
});
}

#[bench]
fn bench_map_fast(b: &mut test::Bencher) {
let data = black_box([(0, 0); LEN]);
b.iter(|| map_fast(&data));
}
25 changes: 23 additions & 2 deletions library/alloc/src/collections/binary_heap.rs
@@ -145,13 +145,13 @@
#![stable(feature = "rust1", since = "1.0.0")]

use core::fmt;
use core::iter::{FromIterator, FusedIterator, TrustedLen};
use core::iter::{FromIterator, FusedIterator, InPlaceIterable, SourceIter, TrustedLen};
use core::mem::{self, size_of, swap, ManuallyDrop};
use core::ops::{Deref, DerefMut};
use core::ptr;

use crate::slice;
use crate::vec::{self, Vec};
use crate::vec::{self, AsIntoIter, Vec};

use super::SpecExtend;

@@ -1173,6 +1173,27 @@ impl<T> ExactSizeIterator for IntoIter<T> {
#[stable(feature = "fused", since = "1.26.0")]
impl<T> FusedIterator for IntoIter<T> {}

#[unstable(issue = "none", feature = "inplace_iteration")]
unsafe impl<T> SourceIter for IntoIter<T> {
type Source = IntoIter<T>;

#[inline]
unsafe fn as_inner(&mut self) -> &mut Self::Source {
self
}
}

#[unstable(issue = "none", feature = "inplace_iteration")]
unsafe impl<I> InPlaceIterable for IntoIter<I> {}

impl<I> AsIntoIter for IntoIter<I> {
type Item = I;

fn as_into_iter(&mut self) -> &mut vec::IntoIter<Self::Item> {
&mut self.iter
}
}

#[unstable(feature = "binary_heap_into_iter_sorted", issue = "59278")]
#[derive(Clone, Debug)]
pub struct IntoIterSorted<T> {
4 changes: 4 additions & 0 deletions library/alloc/src/lib.rs
@@ -99,13 +99,15 @@
#![feature(fmt_internals)]
#![feature(fn_traits)]
#![feature(fundamental)]
#![feature(inplace_iteration)]
#![feature(internal_uninit_const)]
#![feature(lang_items)]
#![feature(layout_for_ptr)]
#![feature(libc)]
#![feature(map_first_last)]
#![feature(map_into_keys_values)]
#![feature(negative_impls)]
#![feature(never_type)]
#![feature(new_uninit)]
#![feature(nll)]
#![feature(nonnull_slice_from_raw_parts)]
@@ -133,7 +135,9 @@
#![feature(slice_partition_dedup)]
#![feature(maybe_uninit_extra, maybe_uninit_slice)]
#![feature(alloc_layout_extra)]
#![feature(trusted_random_access)]
#![feature(try_trait)]
#![feature(type_alias_impl_trait)]
#![feature(associated_type_bounds)]
// Allow testing this library

(Diffs for the remaining 12 changed files not shown.)
