reuse RHS allocation for vec.extend(vec.into_iter()) when they do not fit into the LHS #77496

the8472 · 2020-10-03T16:23:16Z

I tried a broader version of this optimization in #70793 but it regressed compile times because it emitted more IR for every single use of extend() and thus had to be removed from that PR. This attempt is narrower since it only applies to cases where we're extending/appending directly from another vec.

 name                             extend-rec-baseline.b ns/iter  extend-append.b ns/iter  diff ns/iter   diff %  speedup 
 vec::bench_extend_recycle        243                            26                               -217  -89.30%   x 9.35

The aim is to improve runtime behavior, not necessarily compile time, but I still want a perf run to make sure it doesn't regress significantly the way the original attempt did.
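
For readers who want the idea without the std internals, here is a minimal user-level sketch in safe Rust. It is not the code in this PR, which works on the `IntoIter` fields directly. The helper name `extend_recycling` is made up for illustration, and the sketch relies on `Vec::splice` not reallocating when the capacity is already sufficient.

```rust
use std::mem;

/// Hypothetical helper (not part of std, not the PR's implementation):
/// appends `rhs` to `lhs`, reusing the allocation of `rhs` when `lhs`
/// would otherwise have to grow.
fn extend_recycling<T>(lhs: &mut Vec<T>, mut rhs: Vec<T>) {
    let lhs_spare = lhs.capacity() - lhs.len();
    let rhs_spare = rhs.capacity() - rhs.len();

    if lhs_spare < rhs.len() && rhs_spare >= lhs.len() {
        // `lhs` cannot hold the combined data but `rhs` can: move the
        // elements of `lhs` in front of those already stored in `rhs`.
        rhs.splice(0..0, lhs.drain(..));
        // Let `lhs` take over that buffer. The old, now empty `lhs`
        // buffer is freed when `rhs` goes out of scope.
        mem::swap(lhs, &mut rhs);
    } else {
        // General path: copy into `lhs`, growing it if necessary.
        lhs.extend(rhs);
    }
}
```

Calling it with a full `lhs` and an `rhs` that has enough spare capacity leaves `lhs` pointing at the buffer that used to belong to `rhs`; in every other case it behaves like a plain `extend`.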

rust-highfive · 2020-10-03T16:23:19Z

r? @KodrAus

(rust_highfive has picked a reviewer for you, use r? to override)

jyn514 · 2020-10-03T16:47:44Z

@bors try @rust-timer queue

rust-timer · 2020-10-03T16:47:45Z

Awaiting bors try build completion

bors · 2020-10-03T16:48:18Z

⌛ Trying commit b0a249415002259095836413843f80f033c3d782 with merge 097a1aa29aaea51d74064470a76793b00bb9ec11...

the8472 · 2020-10-03T16:51:39Z

CC @pickfire since you wanted to review vec changes

bors · 2020-10-03T17:33:28Z

☀️ Try build successful - checks-actions, checks-azure
Build commit: 097a1aa29aaea51d74064470a76793b00bb9ec11 (097a1aa29aaea51d74064470a76793b00bb9ec11)

rust-timer · 2020-10-03T17:33:30Z

Queued 097a1aa29aaea51d74064470a76793b00bb9ec11 with parent 738d4a7, future comparison URL.

rust-timer · 2020-10-03T19:53:15Z

Finished benchmarking try commit (097a1aa29aaea51d74064470a76793b00bb9ec11): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never

jyn514 · 2020-10-03T20:18:19Z

Nice, moderate improvements across the board :)

jyn514 · 2020-10-03T20:19:11Z

Oh wait I was looking at bootstrap - the instruction counts seem to be about the same.

library/alloc/src/vec.rs

Co-authored-by: Ivan Tham <pickfire@riseup.net>

the8472 · 2020-10-04T15:45:40Z

Removed the part that needs further discussion
generalized the optimization to also apply when the LHS is not empty

pickfire · 2020-10-05T01:27:30Z

library/alloc/src/vec.rs

+    /// * `offset == 0` is always valid
+    /// * `offset` must be positive
+    unsafe fn into_vec(self, offset: isize) -> Vec<T> {
+        let dst = unsafe { self.buf.as_ptr().offset(offset) };


We could probably use add instead of offset for usize, but I don't know if the compiler will optimize out the offset if it is 0? Does it or should we have two functions, one with offset and one without?

0 is a constant in the other callsite, so it should be easy to optimize for llvm.

cc @lzutao to see if he's interested to check this out
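
To make the `add` versus `offset` question concrete, here is a small sketch. Both helper names are hypothetical; `pointer::add` takes a `usize` and `pointer::offset` takes an `isize`, and for a non-negative count they yield the same address.

```rust
/// Hypothetical helper: steps `n` elements forward using `add` (unsigned count).
///
/// # Safety
/// `base` and the result must lie within, or one past the end of,
/// the same allocation.
unsafe fn skip_with_add<T>(base: *const T, n: usize) -> *const T {
    unsafe { base.add(n) }
}

/// Hypothetical helper: the same step using `offset` (signed count).
///
/// # Safety
/// Same requirements as `skip_with_add`.
unsafe fn skip_with_offset<T>(base: *const T, n: isize) -> *const T {
    unsafe { base.offset(n) }
}
```

With a constant `0` at the call site, either form is a no-op on the address, so a separate zero-offset function should not be needed for correctness.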

pickfire · 2020-10-05T01:46:51Z

library/alloc/src/vec.rs

+            && self.capacity() - self.len() < iterator.len()
+            && iterator.cap - iterator.len() >= self.len()


Can this be simplified as?

Suggested change

-            && self.capacity() - self.len() < iterator.len()
-            && iterator.cap - iterator.len() >= self.len()
+            && iterator.len() - self.len() < iterator.cap - self.capacity()

I wonder if this will always grow the vec if the iterator is larger.

The proposed change would underflow if self is larger than iterator

This is technically still an overflow, but iterator.len() - self.len() would panic or wrap if, say, self.capacity() == 20_000, self.len() == 19_950, and iterator.len() == 100.

I meant the case where self.capacity() > iterator.cap. The subtraction would underflow the usize result and thus lead to the inequality unexpectedly evaluating to true, which would then violate the safety constraints of into_vec_with_uninit_prefix.

Then how about?

Suggested change

-            && self.capacity() - self.len() < iterator.len()
-            && iterator.cap - iterator.len() >= self.len()
+            && iterator.len().saturating_sub(self.len()) < iterator.cap.saturating_sub(self.capacity())

For a self with len == 2 && cap == 2 and an iterator len == 2 && cap == 3 that would evaluate to true and attempt to store 4 elements into an allocation of 3. 💣💥
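
The counterexamples in this thread can be checked with plain integers. The sketch below models the three candidate conditions as free functions; the function names and the flattened `(len, cap)` parameters are illustrative, not the field accesses used in the diff. `checked_sub` stands in for the plain `-`, which would panic in debug builds and wrap in release builds.

```rust
/// Condition from the diff: the LHS lacks room for the iterator's elements,
/// and the iterator's buffer has room for the LHS elements.
/// Assumes the usual invariant `len <= cap` on both sides.
fn fits_original(self_len: usize, self_cap: usize, iter_len: usize, iter_cap: usize) -> bool {
    self_cap - self_len < iter_len && iter_cap - iter_len >= self_len
}

/// First suggestion, with `checked_sub` so an underflow shows up as `None`.
fn fits_plain(self_len: usize, self_cap: usize, iter_len: usize, iter_cap: usize) -> Option<bool> {
    Some(iter_len.checked_sub(self_len)? < iter_cap.checked_sub(self_cap)?)
}

/// Second suggestion, using `saturating_sub`.
fn fits_saturating(self_len: usize, self_cap: usize, iter_len: usize, iter_cap: usize) -> bool {
    iter_len.saturating_sub(self_len) < iter_cap.saturating_sub(self_cap)
}
```

For `self` with len 2 and cap 2 and an iterator with len 2 and cap 3, `fits_original` rejects the reuse while `fits_saturating` accepts it, even though the combined 4 elements cannot fit into a capacity of 3.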

the8472 · 2020-10-06T20:41:17Z

Should we try rerunning the benchmarks? I think a notable part may be gone.

Significant parts were changed since then, yes.

pickfire · 2020-10-08T01:41:49Z

library/alloc/src/vec.rs

+        //  ² memcpy
+        //  ³ memmove
+        //  ⁴ free
+        //


Suggested change

-        //
+

That line was intentionally left blank.

I think that, following the convention in the rest of the codebase, this blank line would not carry the comment marker // and would just be empty.

pickfire · 2020-10-08T01:59:52Z

library/alloc/src/vec.rs

+        // ³  into_vec       ____------BBBB____--------------  Vec(0x00, 0, 4)    Vec(0x0a, 4, 8)
+        // ⁴  *self = v      ----------BBBB____--------------  Vec(0x0a, 4, 8)
+        //
+        // ## empty self, pristine iterator


Do we even check for this case below?

By the way, nice diagram.

into_vec_with_uninit_prefix is pretty much just a struct conversion when there's nothing to move.
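
As a rough user-level analogue of the "empty self" rows in the diagram: when there is nothing to move, taking over the other buffer amounts to overwriting the (ptr, len, cap) triple. The helper name `extend_empty` is hypothetical, and this sketch uses a whole `Vec` on the right-hand side instead of an `IntoIter`.

```rust
/// Hypothetical helper: if `lhs` is empty and too small, adopt `rhs`
/// wholesale instead of copying its elements.
fn extend_empty<T>(lhs: &mut Vec<T>, rhs: Vec<T>) {
    if lhs.is_empty() && lhs.capacity() < rhs.len() {
        // Nothing to move: this only replaces the (ptr, len, cap) triple.
        // Any buffer `lhs` previously owned is freed by the assignment.
        *lhs = rhs;
    } else {
        // General path: copy the elements into `lhs`.
        lhs.extend(rhs);
    }
}
```

After the assignment, `lhs` reports the pointer and capacity that `rhs` had before the call, matching the `*self = v` step in the illustration.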

pickfire · 2020-10-08T02:17:48Z

library/alloc/src/vec.rs

+        // ## insufficient capacity
+        //
+        //    [initial]      AAAAA-----BBBBBB__--------------  Vec(0x00, 5,  5)   IntoIter(0x0a, 0x0a, 0x0f, 8)
+        // ¹² reserve(6)     ----------BBBBBB__--AAAAA______-  Vec(0x14, 5,  11)  IntoIter(0x0a, 0x0a, 0x0f, 8)


Wait, is this correct, or am I thinking about it wrongly? Shouldn't reserve need only one malloc (realloc) to reallocate the memory when the alignment is the same, which it should be in this case? Why does it need a memmove?

rust/library/std/src/alloc.rs

Lines 169 to 183 in e055f87

    // SAFETY: `new_size` is non-zero as `old_size` is greater than or equal to `new_size`
    // as required by safety conditions. Other conditions must be upheld by the caller
    old_size if old_layout.align() == new_layout.align() => unsafe {
        let new_size = new_layout.size();

        // `realloc` probably checks for `new_size >= old_layout.size()` or something similar.
        intrinsics::assume(new_size >= old_layout.size());

        let raw_ptr = GlobalAlloc::realloc(self, ptr.as_ptr(), old_layout, new_size);
        let ptr = NonNull::new(raw_ptr).ok_or(AllocError)?;
        if zeroed {
            raw_ptr.add(old_size).write_bytes(0, new_size - old_size);
        }
        Ok(NonNull::slice_from_raw_parts(ptr, new_size))
    },

In many cases realloc is just malloc+memcpy+free. Only in some cases it can extend the allocation in place.
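
The malloc+memcpy+free sequence mentioned above can be spelled out with the `std::alloc` free functions. This is only a sketch of what an allocator may do when it cannot extend a block in place; the helper name `grow_by_copy` is hypothetical and real allocators are free to behave differently.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::ptr;

/// Hypothetical helper: grows a byte buffer the slow way.
///
/// # Safety
/// `old` must have been allocated by the global allocator with `old_layout`,
/// and `new_size` must be non-zero and at least `old_layout.size()`.
unsafe fn grow_by_copy(old: *mut u8, old_layout: Layout, new_size: usize) -> *mut u8 {
    let new_layout = Layout::from_size_align(new_size, old_layout.align())
        .expect("invalid layout");
    unsafe {
        // malloc: obtain a fresh, larger block.
        let new = alloc(new_layout);
        if new.is_null() {
            handle_alloc_error(new_layout);
        }
        // memcpy: the two blocks are distinct, so they cannot overlap.
        ptr::copy_nonoverlapping(old, new, old_layout.size());
        // free: release the old block.
        dealloc(old, old_layout);
        new
    }
}
```

The returned pointer must later be freed with a layout of `new_size` bytes and the same alignment as `old_layout`.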

pickfire · 2020-10-08T02:19:26Z

library/alloc/src/vec.rs

        }
-        iterator.ptr = iterator.end;
+        iterator.move_into(self);


It would be cooler if the code were linked to the cool diagram.

Suggested change

-        iterator.move_into(self);
+        // Insufficient capacity
+        iterator.move_into(self);

Well, yes and no.

With the optimization present that is indeed all it covers. But it is the general codepath that also works without the optimization. So I don't want to give the impression that it can only handle that case.

pickfire

Looks good to me but I think we should run the timer again.

the8472 · 2020-10-15T19:41:21Z

@jyn514 poke for another perf run.

jyn514 · 2020-10-15T20:16:41Z

@bors try @rust-timer queue

Thanks for the ping, I'd forgotten about this.

rust-timer · 2020-10-15T20:16:42Z

Awaiting bors try build completion

bors · 2020-10-15T20:16:54Z

⌛ Trying commit 1885f38 with merge d99d46fbeabf815d807dea479ae158f7ae9041c2...

bors · 2020-10-15T21:00:26Z

☀️ Try build successful - checks-actions, checks-azure
Build commit: d99d46fbeabf815d807dea479ae158f7ae9041c2 (d99d46fbeabf815d807dea479ae158f7ae9041c2)

rust-timer · 2020-10-15T21:00:28Z

Queued d99d46fbeabf815d807dea479ae158f7ae9041c2 with parent b5c9e24, future comparison URL.

rust-timer · 2020-10-16T05:36:21Z

Finished benchmarking try commit (d99d46fbeabf815d807dea479ae158f7ae9041c2): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never

the8472 · 2020-10-16T09:29:31Z

Slight improvements in the bootstrap timings, otherwise not much change. Not sure if it's worth it.

pickfire · 2020-10-16T10:14:29Z

Slight change, but memory usage did improve: less memory used.

Mark-Simulacrum · 2020-10-16T12:33:16Z

I am pretty sure that the memory usage here is just noise. Noise in the 20% range is not unusual for max-rss.

I think the improvements in the bootstrap timing are also likely noise, though that's less clear.

KodrAus · 2020-10-23T04:30:50Z

@the8472 Hmm, was there a usecase you had in mind originally when optimizing this case?

the8472 · 2020-10-23T21:20:47Z

@KodrAus just trying to reduce allocations and memcpys. But looks like it didn't work out.

reuse RHS allocation in vec.extend() when the LHS is empty

c5af975

rust-highfive assigned KodrAus Oct 3, 2020

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 3, 2020

jyn514 added I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Oct 3, 2020

pickfire reviewed Oct 4, 2020

pickfire reviewed Oct 4, 2020

the8472 force-pushed the extend-recycle branch from b0a2494 to c5af975 Compare October 4, 2020 14:02

the8472 and others added 2 commits October 4, 2020 17:40

vec.extend() reuse RHS allocation if combined data does not fit into LHS

2409653

add comment for IntoIter::move_to

8a0a13a

Co-authored-by: Ivan Tham <pickfire@riseup.net>

the8472 force-pushed the extend-recycle branch from b8af850 to 8a0a13a Compare October 4, 2020 15:44

the8472 changed the title ~~reuse allocations for vec.append(vec) and vec.extend(vec.into_iter()) when the LHS is empty~~ reuse RHS allocation for vec.extend(vec.into_iter()) when they do not fit into the LHS Oct 4, 2020

the8472 mentioned this pull request Oct 4, 2020

Swap self and other in vec::append when it avoids a reallocation #77538

Closed

pickfire reviewed Oct 5, 2020

pickfire reviewed Oct 5, 2020

pickfire reviewed Oct 5, 2020

the8472 added 3 commits October 6, 2020 22:34

rename function

cb477d4

add ascii illustration

5e6abf1

move self-replacement to separate line

567cd52

memory-based ascii illustration

1dbab48

pickfire reviewed Oct 7, 2020

annotate used primitives in ascii illustration

aa4a4ac

pickfire reviewed Oct 8, 2020

reserve() was not annotated with free

1885f38

pickfire approved these changes Oct 10, 2020

the8472 commented Oct 15, 2020

fix debug_assert and improve associated comment

d460c85

the8472 closed this Oct 23, 2020
