New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std: Avoid `ptr::copy` if unnecessary in `vec::Drain` #50575

Merged
merged 1 commit into from May 11, 2018

Conversation

Projects
None yet
8 participants
@alexcrichton
Member

alexcrichton commented May 9, 2018

This commit is spawned out of a performance regression investigation in #50496.
In tracking down this regression it turned out that the expand_statements
function in the compiler was taking quite a long time. Further investigation
showed two key properties:

  • The function was "fast" on glibc 2.24 and slow on glibc 2.23
  • The hottest function was memmove from glibc

Combined together it looked like glibc gained an optimization to the memmove
function in 2.24. Ideally we don't want to rely on this optimization, so I
wanted to dig further to see what was happening.

The hottest part of expand_statements was Drop for Drain in the call to
splice where we insert new statements into the original vector. This should
be a cheap operation because we're draining and replacing iterators of the exact
same length, but under the hood memmove was being called a lot, causing a
slowdown on glibc 2.23.

It turns out that at least one of the optimizations in glibc 2.24 was that
memmove where the src/dst are equal becomes much faster. This program
executes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, exhibiting
how glibc 2.24 is optimizing memmove if the src/dst are equal.

And all that brings us to what this commit itself is doing. The change here is
purely to Drop for Drain to avoid the call to ptr::copy if the region being
copied doesn't actually need to be copied. For normal usage of just Drain
itself this check isn't really necessary, but because Splice internally
contains Drain this provides a nice speed boost on glibc 2.23. Overall this
should fix the regression seen in #50496 on glibc 2.23 and also fix the
regression on Windows where memmove looks to not have this optimization.

Note that the way splice was called in expand_statements would cause a
quadratic number of elements to be copied via memmove which is likely why the
tuple-stress benchmark showed such a severe regression.

Closes #50496

std: Avoid `ptr::copy` if unnecessary in `vec::Drain`
This commit is spawned out of a performance regression investigation in #50496.
In tracking down this regression it turned out that the `expand_statements`
function in the compiler was taking quite a long time. Further investigation
showed two key properties:

* The function was "fast" on glibc 2.24 and slow on glibc 2.23
* The hottest function was memmove from glibc

Combined together it looked like glibc gained an optimization to the memmove
function in 2.24. Ideally we don't want to rely on this optimization, so I
wanted to dig further to see what was happening.

The hottest part of `expand_statements` was `Drop for Drain` in the call to
`splice` where we insert new statements into the original vector. This *should*
be a cheap operation because we're draining and replacing iterators of the exact
same length, but under the hood memmove was being called a lot, causing a
slowdown on glibc 2.23.

It turns out that at least one of the optimizations in glibc 2.24 was that
`memmove` where the src/dst are equal becomes much faster. [This program][prog]
executes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, exhibiting
how glibc 2.24 is optimizing `memmove` if the src/dst are equal.

And all that brings us to what this commit itself is doing. The change here is
purely to `Drop for Drain` to avoid the call to `ptr::copy` if the region being
copied doesn't actually need to be copied. For normal usage of just `Drain`
itself this check isn't really necessary, but because `Splice` internally
contains `Drain` this provides a nice speed boost on glibc 2.23. Overall this
should fix the regression seen in #50496 on glibc 2.23 and also fix the
regression on Windows where `memmove` looks to not have this optimization.

Note that the way `splice` was called in `expand_statements` would cause a
quadratic number of elements to be copied via `memmove` which is likely why the
tuple-stress benchmark showed such a severe regression.

Closes #50496

[prog]: https://gist.github.com/alexcrichton/c05bc51c6771bba5ae5b57561a6c1cd3
@rust-highfive

This comment has been minimized.

Collaborator

rust-highfive commented May 9, 2018

r? @aidanhs

(rust_highfive has picked a reviewer for you, use r? to override)

@kennytm

This comment has been minimized.

Member

kennytm commented May 9, 2018

@bors try

@Mark-Simulacrum could we do a perf check?

@bors

This comment has been minimized.

Contributor

bors commented May 9, 2018

⌛️ Trying commit 254b601 with merge 1aac55d...

bors added a commit that referenced this pull request May 9, 2018

Auto merge of #50575 - alexcrichton:faster-drain-drop, r=<try>
std: Avoid `ptr::copy` if unnecessary in `vec::Drain`

This commit is spawned out of a performance regression investigation in #50496.
In tracking down this regression it turned out that the `expand_statements`
function in the compiler was taking quite a long time. Further investigation
showed two key properties:

* The function was "fast" on glibc 2.24 and slow on glibc 2.23
* The hottest function was memmove from glibc

Combined together it looked like glibc gained an optimization to the memmove
function in 2.24. Ideally we don't want to rely on this optimization, so I
wanted to dig further to see what was happening.

The hottest part of `expand_statements` was `Drop for Drain` in the call to
`splice` where we insert new statements into the original vector. This *should*
be a cheap operation because we're draining and replacing iterators of the exact
same length, but under the hood memmove was being called a lot, causing a
slowdown on glibc 2.23.

It turns out that at least one of the optimizations in glibc 2.24 was that
`memmove` where the src/dst are equal becomes much faster. [This program][prog]
executes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, exhibiting
how glibc 2.24 is optimizing `memmove` if the src/dst are equal.

And all that brings us to what this commit itself is doing. The change here is
purely to `Drop for Drain` to avoid the call to `ptr::copy` if the region being
copied doesn't actually need to be copied. For normal usage of just `Drain`
itself this check isn't really necessary, but because `Splice` internally
contains `Drain` this provides a nice speed boost on glibc 2.23. Overall this
should fix the regression seen in #50496 on glibc 2.23 and also fix the
regression on Windows where `memmove` looks to not have this optimization.

Note that the way `splice` was called in `expand_statements` would cause a
quadratic number of elements to be copied via `memmove` which is likely why the
tuple-stress benchmark showed such a severe regression.

Closes #50496

[prog]: https://gist.github.com/alexcrichton/c05bc51c6771bba5ae5b57561a6c1cd3
@kennytm

This comment has been minimized.

Member

kennytm commented May 9, 2018

Also beta-nominating this for 1.27.

@Mark-Simulacrum

This comment has been minimized.

Member

Mark-Simulacrum commented May 9, 2018

Perf has been queued.

@sfackler

This comment has been minimized.

Member

sfackler commented May 9, 2018

This seems like a reasonable thing to do even if it doesn't fix the perf regression. r=me

@bors

This comment has been minimized.

Contributor

bors commented May 9, 2018

☀️ Test successful - status-travis
State: approved= try=True

@alexcrichton

This comment has been minimized.

Member

alexcrichton commented May 9, 2018

FWIW locally on my computer (glibc 2.23)

$ time rustc +stable main.rs
rustc +stable main.rs  4.94s user 0.18s system 100% cpu 5.080 total
$ time rustc +beta main.rs
rustc +beta main.rs  5.00s user 0.09s system 98% cpu 5.185 total
$ time rustc +nightly main.rs
^C
rustc +nightly main.rs  206.66s user 0.14s system 99% cpu 3:27.45 total
$ time rustc +1aac55d18084910bbfa1d25733a5393860616b8b main.rs
rustc +1aac55d18084910bbfa1d25733a5393860616b8b main.rs  4.06s user 0.05s system 100% cpu 4.092 total
@alexcrichton

This comment has been minimized.

Member

alexcrichton commented May 9, 2018

@alexcrichton

This comment has been minimized.

Member

alexcrichton commented May 9, 2018

While the link I pasted above doesn't work yet this is the results comparing to the previous commit on master so either this PR or the rollup before it caused the speed boost, but I'm gonna optimistically say it was this PR :)

@bors: r=sfackler

@bors

This comment has been minimized.

Contributor

bors commented May 9, 2018

📌 Commit 254b601 has been approved by sfackler

@alexcrichton

This comment has been minimized.

Member

alexcrichton commented May 10, 2018

@bors: rollup

alexcrichton added a commit to alexcrichton/rust that referenced this pull request May 10, 2018

Rollup merge of rust-lang#50575 - alexcrichton:faster-drain-drop, r=s…
…fackler

std: Avoid `ptr::copy` if unnecessary in `vec::Drain`

This commit is spawned out of a performance regression investigation in rust-lang#50496.
In tracking down this regression it turned out that the `expand_statements`
function in the compiler was taking quite a long time. Further investigation
showed two key properties:

* The function was "fast" on glibc 2.24 and slow on glibc 2.23
* The hottest function was memmove from glibc

Combined together it looked like glibc gained an optimization to the memmove
function in 2.24. Ideally we don't want to rely on this optimization, so I
wanted to dig further to see what was happening.

The hottest part of `expand_statements` was `Drop for Drain` in the call to
`splice` where we insert new statements into the original vector. This *should*
be a cheap operation because we're draining and replacing iterators of the exact
same length, but under the hood memmove was being called a lot, causing a
slowdown on glibc 2.23.

It turns out that at least one of the optimizations in glibc 2.24 was that
`memmove` where the src/dst are equal becomes much faster. [This program][prog]
executes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, exhibiting
how glibc 2.24 is optimizing `memmove` if the src/dst are equal.

And all that brings us to what this commit itself is doing. The change here is
purely to `Drop for Drain` to avoid the call to `ptr::copy` if the region being
copied doesn't actually need to be copied. For normal usage of just `Drain`
itself this check isn't really necessary, but because `Splice` internally
contains `Drain` this provides a nice speed boost on glibc 2.23. Overall this
should fix the regression seen in rust-lang#50496 on glibc 2.23 and also fix the
regression on Windows where `memmove` looks to not have this optimization.

Note that the way `splice` was called in `expand_statements` would cause a
quadratic number of elements to be copied via `memmove` which is likely why the
tuple-stress benchmark showed such a severe regression.

Closes rust-lang#50496

[prog]: https://gist.github.com/alexcrichton/c05bc51c6771bba5ae5b57561a6c1cd3

alexcrichton added a commit to alexcrichton/rust that referenced this pull request May 10, 2018

Rollup merge of rust-lang#50575 - alexcrichton:faster-drain-drop, r=s…
…fackler

std: Avoid `ptr::copy` if unnecessary in `vec::Drain`

This commit is spawned out of a performance regression investigation in rust-lang#50496.
In tracking down this regression it turned out that the `expand_statements`
function in the compiler was taking quite a long time. Further investigation
showed two key properties:

* The function was "fast" on glibc 2.24 and slow on glibc 2.23
* The hottest function was memmove from glibc

Combined together it looked like glibc gained an optimization to the memmove
function in 2.24. Ideally we don't want to rely on this optimization, so I
wanted to dig further to see what was happening.

The hottest part of `expand_statements` was `Drop for Drain` in the call to
`splice` where we insert new statements into the original vector. This *should*
be a cheap operation because we're draining and replacing iterators of the exact
same length, but under the hood memmove was being called a lot, causing a
slowdown on glibc 2.23.

It turns out that at least one of the optimizations in glibc 2.24 was that
`memmove` where the src/dst are equal becomes much faster. [This program][prog]
executes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, exhibiting
how glibc 2.24 is optimizing `memmove` if the src/dst are equal.

And all that brings us to what this commit itself is doing. The change here is
purely to `Drop for Drain` to avoid the call to `ptr::copy` if the region being
copied doesn't actually need to be copied. For normal usage of just `Drain`
itself this check isn't really necessary, but because `Splice` internally
contains `Drain` this provides a nice speed boost on glibc 2.23. Overall this
should fix the regression seen in rust-lang#50496 on glibc 2.23 and also fix the
regression on Windows where `memmove` looks to not have this optimization.

Note that the way `splice` was called in `expand_statements` would cause a
quadratic number of elements to be copied via `memmove` which is likely why the
tuple-stress benchmark showed such a severe regression.

Closes rust-lang#50496

[prog]: https://gist.github.com/alexcrichton/c05bc51c6771bba5ae5b57561a6c1cd3

bors added a commit that referenced this pull request May 10, 2018

Auto merge of #50611 - alexcrichton:rollup, r=alexcrichton
Rollup of 18 pull requests

Successful merges:

 - #49423 (Extend tests for RFC1598 (GAT))
 - #50010 (Give SliceIndex impls a test suite of girth befitting the implementation (and fix a UTF8 boundary check))
 - #50447 (Fix update-references for tests within subdirectories.)
 - #50514 (Pull in a wasm fix from LLVM upstream)
 - #50524 (Make DepGraph::previous_work_products immutable)
 - #50532 (Don't use Lock for heavily accessed CrateMetadata::cnum_map.)
 - #50538 ( Make CrateNum allocation more thread-safe. )
 - #50564 (Inline `Span` methods.)
 - #50565 (Use SmallVec for DepNodeIndex within dep_graph.)
 - #50569 (Allow for specifying a linker plugin for cross-language LTO)
 - #50572 (Clarify in the docs that `mul_add` is not always faster.)
 - #50574 (add fn `into_inner(self) -> (Idx, Idx)` to RangeInclusive (#49022))
 - #50575 (std: Avoid `ptr::copy` if unnecessary in `vec::Drain`)
 - #50588 (Move "See also" disambiguation links for primitive types to top)
 - #50590 (Fix tuple struct field spans)
 - #50591 (Restore RawVec::reserve* documentation)
 - #50598 (Remove unnecessary mutable borrow and resizing in DepGraph::serialize)
 - #50606 (Retry when downloading the Docker cache.)

Failed merges:

 - #50161 (added missing implementation hint)
 - #50558 (Remove all reference to DepGraph::work_products)

bors added a commit that referenced this pull request May 10, 2018

Auto merge of #50611 - alexcrichton:rollup, r=alexcrichton
Rollup of 18 pull requests

Successful merges:

 - #49423 (Extend tests for RFC1598 (GAT))
 - #50010 (Give SliceIndex impls a test suite of girth befitting the implementation (and fix a UTF8 boundary check))
 - #50447 (Fix update-references for tests within subdirectories.)
 - #50514 (Pull in a wasm fix from LLVM upstream)
 - #50524 (Make DepGraph::previous_work_products immutable)
 - #50532 (Don't use Lock for heavily accessed CrateMetadata::cnum_map.)
 - #50538 ( Make CrateNum allocation more thread-safe. )
 - #50564 (Inline `Span` methods.)
 - #50565 (Use SmallVec for DepNodeIndex within dep_graph.)
 - #50569 (Allow for specifying a linker plugin for cross-language LTO)
 - #50572 (Clarify in the docs that `mul_add` is not always faster.)
 - #50574 (add fn `into_inner(self) -> (Idx, Idx)` to RangeInclusive (#49022))
 - #50575 (std: Avoid `ptr::copy` if unnecessary in `vec::Drain`)
 - #50588 (Move "See also" disambiguation links for primitive types to top)
 - #50590 (Fix tuple struct field spans)
 - #50591 (Restore RawVec::reserve* documentation)
 - #50598 (Remove unnecessary mutable borrow and resizing in DepGraph::serialize)
 - #50606 (Retry when downloading the Docker cache.)

Failed merges:

 - #50161 (added missing implementation hint)
 - #50558 (Remove all reference to DepGraph::work_products)

bors added a commit that referenced this pull request May 10, 2018

Auto merge of #50611 - alexcrichton:rollup, r=alexcrichton
Rollup of 18 pull requests

Successful merges:

 - #49423 (Extend tests for RFC1598 (GAT))
 - #50010 (Give SliceIndex impls a test suite of girth befitting the implementation (and fix a UTF8 boundary check))
 - #50447 (Fix update-references for tests within subdirectories.)
 - #50514 (Pull in a wasm fix from LLVM upstream)
 - #50524 (Make DepGraph::previous_work_products immutable)
 - #50532 (Don't use Lock for heavily accessed CrateMetadata::cnum_map.)
 - #50538 ( Make CrateNum allocation more thread-safe. )
 - #50564 (Inline `Span` methods.)
 - #50565 (Use SmallVec for DepNodeIndex within dep_graph.)
 - #50569 (Allow for specifying a linker plugin for cross-language LTO)
 - #50572 (Clarify in the docs that `mul_add` is not always faster.)
 - #50574 (add fn `into_inner(self) -> (Idx, Idx)` to RangeInclusive (#49022))
 - #50575 (std: Avoid `ptr::copy` if unnecessary in `vec::Drain`)
 - #50588 (Move "See also" disambiguation links for primitive types to top)
 - #50590 (Fix tuple struct field spans)
 - #50591 (Restore RawVec::reserve* documentation)
 - #50598 (Remove unnecessary mutable borrow and resizing in DepGraph::serialize)
 - #50606 (Retry when downloading the Docker cache.)

Failed merges:

 - #50161 (added missing implementation hint)
 - #50558 (Remove all reference to DepGraph::work_products)

@bors bors merged commit 254b601 into rust-lang:master May 11, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
homu Test successful
Details

bors added a commit that referenced this pull request May 11, 2018

Auto merge of #50631 - pietroalbini:beta-backports, r=alexcrichton
[beta] Process backports

* #50575: std: Avoid `ptr::copy` if unnecessary in `vec::Drain`

r? @alexcrichton

@alexcrichton alexcrichton deleted the alexcrichton:faster-drain-drop branch Jun 29, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment