Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More `ObligationForest` improvements #64545

Merged
merged 5 commits into from Sep 19, 2019

Conversation

@nnethercote
Copy link
Contributor

commented Sep 17, 2019

Following on from #64500, these commits alsomake the code both nicer and faster.

r? @nikomatsakis

nnethercote added 3 commits Sep 17, 2019
`Node` has an optional parent and a list of other descendents. Most of
the time the parent is treated the same as the other descendents --
error-handling is the exception -- and chaining them together for
iteration has a non-trivial cost.

This commit changes the representation. There is now a single list of
descendants, and a boolean flag that indicates if there is a parent (in
which case it is first descendent). This representation encodes the same
information, in a way that is less idiomatic but cheaper to iterate over
for the common case where the parent doesn't need special treatment.

As a result, some benchmark workloads are up to 2% faster.
The size of the indices doesn't matter much here, and having a
`newtype_index!` index type without also using `IndexVec` requires lots
of conversions. So this commit removes `NodeIndex` in favour of uniform
use of `usize` as the index type. As well as making the code slightly
more concise, it also slightly speeds things up.
Now that all indices have type `usize`, it makes sense to be more
consistent about their naming. This commit removes all uses of `i` in
favour of `index`.
@nnethercote

This comment has been minimized.

Copy link
Contributor Author

commented Sep 17, 2019

Some local measurements:

inflate-check
        avg: -10.3%     min: -14.1%     max: -0.0%
keccak-check
        avg: -7.6%      min: -11.7%     max: -0.0%
cranelift-codegen-check
        avg: -1.3%      min: -2.2%      max: -0.0%
clap-rs-check
        avg: -0.7%      min: -1.7%      max: 0.0%
serde-check
        avg: -0.7%      min: -1.2%      max: -0.0%
deeply-nested-check
        avg: -0.5%      min: -0.8%      max: 0.0%
unify-linearly-check
        avg: -0.3%      min: -0.6%      max: 0.0%
style-servo-check
        avg: -0.4%      min: -0.6%      max: 0.0%
wg-grammar-check
        avg: -0.3%      min: -0.5%      max: -0.0%

Let's see how a CI perf run compares.

@bors try

@bors

This comment has been minimized.

Copy link
Contributor

commented Sep 17, 2019

⌛️ Trying commit b261edf with merge 2ae267a...

bors added a commit that referenced this pull request Sep 17, 2019
More `ObligationForest` improvements

Following on from #64500, these commits alsomake the code both nicer and faster.

r? @nikomatsakis
@bors

This comment has been minimized.

Copy link
Contributor

commented Sep 17, 2019

☀️ Try build successful - checks-azure
Build commit: 2ae267a

@Mark-Simulacrum

This comment has been minimized.

Copy link
Member

commented Sep 17, 2019

@rust-timer

This comment has been minimized.

Copy link

commented Sep 17, 2019

Queued 2ae267a with parent 5670d04, future comparison URL.

@rust-timer

This comment has been minimized.

Copy link

commented Sep 17, 2019

Finished benchmarking try commit 2ae267a, comparison URL.

@nnethercote

This comment has been minimized.

Copy link
Contributor Author

commented Sep 17, 2019

Perf results on CI are similar to what I saw locally: big wins on keccak and inflate, smaller wins elsewhere.

I think the lesson from this PR (especially commits 1, 4 and 5) is "don't rely on the compiler to fully optimize higher-level constructs in the hottest code".

nnethercote added a commit to nnethercote/rust that referenced this pull request Sep 18, 2019
PR rust-lang#64545 got a big speed-up by replacing a hot call to `all()` with
explicit iteration. This is because the implementation of `all()` is
excessively complex: it wraps the given predicate in a closure that
returns a `LoopState`, passes that closure to `try_for_each()`, which
wraps the first closure in a second closure, passes that second closure
to `try_fold()`, which does the actual iteration using the second
closure.

A sufficient smart compiler could optimize all this away; rustc is
currently not sufficiently smart.

This commit does the following.

- Changes the implementations of `all()`, `any()`, `find()` and
  `find_map()` to use the simplest possible code, rather than using
  `try_for_each()`. (I am reminded of "The Evolution of a Haskell
  Programmer".) These are both shorter and faster than the current
  implementations, and will permit the undoing of the `all()` removal in
  rust-lang#64545.

- Changes `ResultShunt::next()` so it doesn't call `self.find()`,
  because that was causing infinite recursion with the new
  implementation of `find()`, which itself calls `self.next()`. (I
  honestly don't know how the old implementation of
  `ResultShunt::next()` didn't cause an infinite loop, given that it
  also called `self.next()`, albeit via `try_for_each()` and
  `try_fold()`.)

- Changes `nth()` to use `self.next()` in a while loop rather than `for
  x in self`, because using self-iteration within an iterator method
  seems dubious, and `self.next()` is used in all the other iterator
  methods.
bors added a commit that referenced this pull request Sep 18, 2019
Simplify some `Iterator` methods.

PR #64545 got a big speed-up by replacing a hot call to `all()` with
explicit iteration. This is because the implementation of `all()` is
excessively complex: it wraps the given predicate in a closure that
returns a `LoopState`, passes that closure to `try_for_each()`, which
wraps the first closure in a second closure, passes that second closure
to `try_fold()`, which does the actual iteration using the second
closure.

A sufficient smart compiler could optimize all this away; rustc is
currently not sufficiently smart.

This commit does the following.

- Changes the implementations of `all()`, `any()`, `find()` and
  `find_map()` to use the simplest possible code, rather than using
  `try_for_each()`. (I am reminded of "The Evolution of a Haskell
  Programmer".) These are both shorter and faster than the current
  implementations, and will permit the undoing of the `all()` removal in
  #64545.

- Changes `ResultShunt::next()` so it doesn't call `self.find()`,
  because that was causing infinite recursion with the new
  implementation of `find()`, which itself calls `self.next()`. (I
  honestly don't know how the old implementation of
  `ResultShunt::next()` didn't cause an infinite loop, given that it
  also called `self.next()`, albeit via `try_for_each()` and
  `try_fold()`.)

- Changes `nth()` to use `self.next()` in a while loop rather than `for
  x in self`, because using self-iteration within an iterator method
  seems dubious, and `self.next()` is used in all the other iterator
  methods.
@nikomatsakis

This comment has been minimized.

Copy link
Contributor

commented Sep 18, 2019

r=me with comment, thanks @nnethercote!

nnethercote added 2 commits Sep 17, 2019
Amazingly enough, this is a 3.5% instruction count win on `keccak`.
The super-hot call site of `inlined_shallow_resolve()` basically does
`r.inlined_shallow_resolve(ty) != ty`. This commit introduces a
version of that function specialized for that particular call pattern,
`shallow_resolve_changed()`. Incredibly, this reduces the instruction
count for `keccak` by 5%.

The commit also renames `inlined_shallow_resolve()` as
`shallow_resolve()` and removes the `inline(always)` annotation, as it's
no longer nearly so hot.
@nnethercote nnethercote force-pushed the nnethercote:ObligForest-more branch from b261edf to 3b85597 Sep 18, 2019
@nnethercote

This comment has been minimized.

Copy link
Contributor Author

commented Sep 18, 2019

@bors r=nmatsakis

@bors

This comment has been minimized.

Copy link
Contributor

commented Sep 18, 2019

📌 Commit 3b85597 has been approved by nmatsakis

@nnethercote

This comment has been minimized.

Copy link
Contributor Author

commented Sep 18, 2019

Because this has a perf impact, let's do:
@bors rollup-never

@Mark-Simulacrum

This comment has been minimized.

Copy link
Member

commented Sep 18, 2019

Since we've already run perf there's probably no reason for that? command is rollup=never btw

@nnethercote

This comment has been minimized.

Copy link
Contributor Author

commented Sep 18, 2019

My preference is for perf-affecting PRs to land on their own. If you look at perf.rust-lang.org and see an improvement or regression, and find it's a roll-up, it can be a real pain to work out which PR caused the change.

@bors rollup=never

@nnethercote nnethercote reopened this Sep 18, 2019
@bors

This comment has been minimized.

Copy link
Contributor

commented Sep 19, 2019

⌛️ Testing commit 3b85597 with merge 9b9d2af...

bors added a commit that referenced this pull request Sep 19, 2019
More `ObligationForest` improvements

Following on from #64500, these commits alsomake the code both nicer and faster.

r? @nikomatsakis
@bors

This comment has been minimized.

Copy link
Contributor

commented Sep 19, 2019

☀️ Test successful - checks-azure
Approved by: nmatsakis
Pushing 9b9d2af to master...

@bors bors added the merged-by-bors label Sep 19, 2019
@bors bors merged commit 3b85597 into rust-lang:master Sep 19, 2019
5 checks passed
5 checks passed
homu Test successful
Details
pr Build #20190918.54 succeeded
Details
pr (Linux mingw-check) Linux mingw-check succeeded
Details
pr (Linux x86_64-gnu-llvm-6.0) Linux x86_64-gnu-llvm-6.0 succeeded
Details
pr (LinuxTools) LinuxTools succeeded
Details
@nnethercote nnethercote deleted the nnethercote:ObligForest-more branch Sep 20, 2019
bors added a commit that referenced this pull request Sep 20, 2019
Even more `ObligationForest` improvements

Following on from #64545, more speed and readability improvements.

r? @nikomatsakis
bors added a commit that referenced this pull request Sep 20, 2019
Even more `ObligationForest` improvements

Following on from #64545, more speed and readability improvements.

r? @nikomatsakis
scottmcm added a commit to scottmcm/rust that referenced this pull request Sep 22, 2019
While this definitely helps sometimes (particularly for trivial closures), it's also a pessimization sometimes, so it's better to leave this to (hypothetical) future LLVM improvements instead of forcing this on everyone.

I think it's better for the advice to be that sometimes you need to unroll manually than you sometimes need to not-unroll manually (like rust-lang#64545).
bors added a commit that referenced this pull request Sep 24, 2019
Remove manual unrolling from slice::Iter(Mut)::try_fold

While this definitely helps sometimes (particularly for trivial closures), it's also a pessimization sometimes, so it's better to leave this to (hypothetical) future LLVM improvements instead of forcing this on everyone.

I think it's better for the advice to be that sometimes you need to unroll manually than you sometimes need to not-unroll manually (like #64545).

---

For context see #64572 (comment)
bors added a commit that referenced this pull request Sep 25, 2019
Even more `ObligationForest` improvements

Following on from #64545, more speed and readability improvements.

r? @nikomatsakis
bors added a commit that referenced this pull request Sep 30, 2019
Remove manual unrolling from slice::Iter(Mut)::try_fold

While this definitely helps sometimes (particularly for trivial closures), it's also a pessimization sometimes, so it's better to leave this to (hypothetical) future LLVM improvements instead of forcing this on everyone.

I think it's better for the advice to be that sometimes you need to unroll manually than you sometimes need to not-unroll manually (like #64545).

---

For context see #64572 (comment)
choller added a commit to choller/rust that referenced this pull request Oct 17, 2019
While this definitely helps sometimes (particularly for trivial closures), it's also a pessimization sometimes, so it's better to leave this to (hypothetical) future LLVM improvements instead of forcing this on everyone.

I think it's better for the advice to be that sometimes you need to unroll manually than you sometimes need to not-unroll manually (like rust-lang#64545).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants
You can’t perform that action at this time.