Optimize ObligationForest's NodeState handling. #36993

Merged
merged 1 commit into from Nov 3, 2016

Conversation

Projects
None yet
8 participants
@nnethercote
Contributor

nnethercote commented Oct 6, 2016

This commit does the following.

  • Changes NodeState from an enum to a bitflags. This makes it
    possible to check against multiple possible values in a single bitwise
    operation.
  • Replaces all the hot matches involving NodeState with if/else
    chains that ensure that cases are handled in the order of frequency.
  • Partially inlines two functions, find_cycles_from_node and
    mark_as_waiting_from, at two call sites in order to avoid function
    unnecessary function calls on hot paths.
  • Fully inlines and removes is_popped.

These changes speeds up rustc-benchmarks/inflate-0.1.0 by about 7% when
doing debug builds with a stage1 compiler.

r? @arielb1

@arielb1

This comment has been minimized.

Show comment
Hide comment
@arielb1

arielb1 Oct 7, 2016

Contributor

That sounds like Totally The Wrong Way to handle an O(n^2) algorithm.

Contributor

arielb1 commented Oct 7, 2016

That sounds like Totally The Wrong Way to handle an O(n^2) algorithm.

@arielb1

This comment has been minimized.

Show comment
Hide comment
@arielb1

arielb1 Oct 7, 2016

Contributor

For example, you should be able to get a better perf improvement by skipping over the entire middle "GC" steps if all obligations hit the Ok(None) case.

Contributor

arielb1 commented Oct 7, 2016

For example, you should be able to get a better perf improvement by skipping over the entire middle "GC" steps if all obligations hit the Ok(None) case.

@nnethercote

This comment has been minimized.

Show comment
Hide comment
@nnethercote

nnethercote Oct 7, 2016

Contributor

That sounds like Totally The Wrong Way to handle an O(n^2) algorithm.

I found this comment to be curt and unhelpful. It made me feel angry and bad. It probably wasn't your intent, but that is the effect it had. Please think about how you express yourself when criticizing other people's code.

Contributor

nnethercote commented Oct 7, 2016

That sounds like Totally The Wrong Way to handle an O(n^2) algorithm.

I found this comment to be curt and unhelpful. It made me feel angry and bad. It probably wasn't your intent, but that is the effect it had. Please think about how you express yourself when criticizing other people's code.

@nnethercote

This comment has been minimized.

Show comment
Hide comment
@nnethercote

nnethercote Oct 7, 2016

Contributor

For example, you should be able to get a better perf improvement by skipping over the entire middle "GC" steps if all obligations hit the Ok(None) case.

Can you elaborate more? Are you talking about process_obligations?

Contributor

nnethercote commented Oct 7, 2016

For example, you should be able to get a better perf improvement by skipping over the entire middle "GC" steps if all obligations hit the Ok(None) case.

Can you elaborate more? Are you talking about process_obligations?

@arielb1

This comment has been minimized.

Show comment
Hide comment
@arielb1

arielb1 Oct 8, 2016

Contributor

I found this comment to be curt and unhelpful. It made me feel angry and bad. It probably wasn't your intent, but that is the effect it had. Please think about how you express yourself when criticizing other people's code.

I'm sorry.

I just think that the best way to deal with an O(n^2) case is to well, make it stop being O(n^2), rather than adding low-level optimizations to the specific O(n^2) part.

The main entry point of the obligation forest is process_obligations, and it is basically updates all obligations and then performs a "mark-and-sweep GC", where the running of the "finalizers" advances type inference. The mark-and-sweep GC takes O(n) time.

If nothing changes in the process_obligation loop, we can skip GCing altogether. Also, because cycles are rare (and shouldn't be present in the inflate example), we may want to use some hybrid of garbage collection and reference counting to avoid the sweeps.

Contributor

arielb1 commented Oct 8, 2016

I found this comment to be curt and unhelpful. It made me feel angry and bad. It probably wasn't your intent, but that is the effect it had. Please think about how you express yourself when criticizing other people's code.

I'm sorry.

I just think that the best way to deal with an O(n^2) case is to well, make it stop being O(n^2), rather than adding low-level optimizations to the specific O(n^2) part.

The main entry point of the obligation forest is process_obligations, and it is basically updates all obligations and then performs a "mark-and-sweep GC", where the running of the "finalizers" advances type inference. The mark-and-sweep GC takes O(n) time.

If nothing changes in the process_obligation loop, we can skip GCing altogether. Also, because cycles are rare (and shouldn't be present in the inflate example), we may want to use some hybrid of garbage collection and reference counting to avoid the sweeps.

@nnethercote

This comment has been minimized.

Show comment
Hide comment
@nnethercote

nnethercote Oct 10, 2016

Contributor

I just think that the best way to deal with an O(n^2) case is to well, make it stop being O(n^2), rather than adding low-level optimizations to the specific O(n^2) part.

To be clear: this is a reasonable opinion. The problem was not that you criticized my code, the problem was the way you criticized it. I suggest you avoid using the phrase "Totally The Wrong Way" in future reviews. Also, when suggesting alternative approaches, please describe them in reasonable detail. (Your follow-up comment about GCing was much better in this regard.)

Contributor

nnethercote commented Oct 10, 2016

I just think that the best way to deal with an O(n^2) case is to well, make it stop being O(n^2), rather than adding low-level optimizations to the specific O(n^2) part.

To be clear: this is a reasonable opinion. The problem was not that you criticized my code, the problem was the way you criticized it. I suggest you avoid using the phrase "Totally The Wrong Way" in future reviews. Also, when suggesting alternative approaches, please describe them in reasonable detail. (Your follow-up comment about GCing was much better in this regard.)

@Manishearth

This comment has been minimized.

Show comment
Hide comment
@Manishearth

Manishearth Oct 16, 2016

Member

Mod note: As Nick said, please be constructive and nice in your criticism. It seems like you've already acknowledged that, but just in case, leaving this comment 😄

Member

Manishearth commented Oct 16, 2016

Mod note: As Nick said, please be constructive and nice in your criticism. It seems like you've already acknowledged that, but just in case, leaving this comment 😄

@zatherz

This comment has been minimized.

Show comment
Hide comment
@zatherz

zatherz Oct 16, 2016

Why have my constructive and nice comments been removed? I'm very well acquainted with Rust's code of conduct and I think @nnethercote's reply could very well be considered harassment. He tried to make @arielb1 feel bad for his reply. We shouldn't reply to harassment with harassment. I am so angry that I'm considering writing a Medium blog post about it and posting it to HackerNews.

zatherz commented Oct 16, 2016

Why have my constructive and nice comments been removed? I'm very well acquainted with Rust's code of conduct and I think @nnethercote's reply could very well be considered harassment. He tried to make @arielb1 feel bad for his reply. We shouldn't reply to harassment with harassment. I am so angry that I'm considering writing a Medium blog post about it and posting it to HackerNews.

@Manishearth

This comment has been minimized.

Show comment
Hide comment
@Manishearth

Manishearth Oct 17, 2016

Member

Mod note: @zatherz Your comment was deleted for containing profanity and blatantly violating the code of conduct. If you have further questions about moderation or the code of conduct, please ask rust-mods@rust-lang.org or on one of the forums. This is not the place for such discussion [further offtopic comments will be deleted]

Member

Manishearth commented Oct 17, 2016

Mod note: @zatherz Your comment was deleted for containing profanity and blatantly violating the code of conduct. If you have further questions about moderation or the code of conduct, please ask rust-mods@rust-lang.org or on one of the forums. This is not the place for such discussion [further offtopic comments will be deleted]

@aturon

This comment has been minimized.

Show comment
Hide comment
Member

aturon commented Oct 17, 2016

@BurntSushi

This comment has been minimized.

Show comment
Hide comment
@BurntSushi

BurntSushi Oct 19, 2016

Member

Moderator note: As @Manishearth said, if you have a specific question or objection with our moderation, then please ask rust-mods@rust-lang.org. This isn't the place for it.

Member

BurntSushi commented Oct 19, 2016

Moderator note: As @Manishearth said, if you have a specific question or objection with our moderation, then please ask rust-mods@rust-lang.org. This isn't the place for it.

@nikomatsakis nikomatsakis self-assigned this Oct 21, 2016

@nikomatsakis

This comment has been minimized.

Show comment
Hide comment
@nikomatsakis

nikomatsakis Oct 21, 2016

Contributor

Adding myself as another assigneee. Sorry I haven't taken a look yet. My initial reaction was somewhat similar to @arielb1 (i.e., I'd rather solve this from the top), but on the other hand the speed improvements that @nnethercote showed are promising! I'd like to give this a closer read. Will try to do ASAP! Apologies for the delay @nnethercote!

Contributor

nikomatsakis commented Oct 21, 2016

Adding myself as another assigneee. Sorry I haven't taken a look yet. My initial reaction was somewhat similar to @arielb1 (i.e., I'd rather solve this from the top), but on the other hand the speed improvements that @nnethercote showed are promising! I'd like to give this a closer read. Will try to do ASAP! Apologies for the delay @nnethercote!

@nnethercote

This comment has been minimized.

Show comment
Hide comment
@nnethercote

nnethercote Oct 21, 2016

Contributor

Adding myself as another assigneee

Oh, please wait a bit! #37231 landed recently and that changed the profiles significantly, so I've started reworking this PR but it's not ready yet.

Contributor

nnethercote commented Oct 21, 2016

Adding myself as another assigneee

Oh, please wait a bit! #37231 landed recently and that changed the profiles significantly, so I've started reworking this PR but it's not ready yet.

@nikomatsakis

This comment has been minimized.

Show comment
Hide comment
@nikomatsakis

nikomatsakis Oct 24, 2016

Contributor

@nnethercote ok ping me

Contributor

nikomatsakis commented Oct 24, 2016

@nnethercote ok ping me

@nnethercote

This comment has been minimized.

Show comment
Hide comment
@nnethercote

nnethercote Oct 24, 2016

Contributor

I've updated. The commit no longer converts NodeState to bitflags, which was probably the most controversial part of the change. Due to #37231 this is now only a 2% win.

r? @nikomatsakis

It feels like there is still plenty of room for improvement in this benchmark, though I'm struggling to find those improvements myself. Here's Cachegrind's summary of the hottest functions (measurements are instruction counts).

1,743,729,697  src/librustc/traits/fulfill.rs:<rustc::traits::fulfill::FulfillProcessor<'a, 'b, 'gcx, 'tcx> as rustc_data_structures::obligation_forest::ObligationProcessor>::process_obligation
1,470,943,055  src/librustc_data_structures/unify/mod.rs:<rustc_data_structures::unify::UnificationTable<K>>::get
1,291,117,019  src/librustc_data_structures/obligation_forest/mod.rs:<rustc_data_structures::obligation_forest::ObligationForest<O>>::process_obligations
1,098,926,283  src/librustc/infer/mod.rs:rustc::infer::InferCtxt::shallow_resolve
  982,723,014  src/libcore/result.rs:<rustc::traits::fulfill::FulfillProcessor<'a, 'b, 'gcx, 'tcx> as rustc_data_structures::obligation_forest::ObligationProcessor>::process_obligation
  819,129,999  src/libcore/iter/iterator.rs:<rustc::traits::fulfill::FulfillProcessor<'a, 'b, 'gcx, 'tcx> as rustc_data_structures::obligation_forest::ObligationProcessor>::process_obligation
  608,066,958  src/librustc_data_structures/obligation_forest/mod.rs:<rustc_data_structures::obligation_forest::ObligationForest<O>>::compress
  465,370,000  /build/glibc-GKVZIf/glibc-2.23/string/::/sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S:__memcpy_avx_unaligned
  446,992,304  /build/glibc-GKVZIf/glibc-2.23/malloc/malloc.c:_int_malloc
  433,218,885  src/librustc_data_structures/unify/mod.rs:<rustc_data_structures::unify::UnificationTable<K>>::get
  397,811,572  src/libcore/iter/range.rs:<rustc_data_structures::obligation_forest::ObligationForest<O>>::process_obligations
  397,030,502  ???:???
  368,654,892  src/libcore/option.rs:rustc::infer::InferCtxt::shallow_resolve
  321,693,196  src/libcore/iter/range.rs:<rustc_data_structures::obligation_forest::ObligationForest<O>>::compress

You have to squint a bit when interpreting them due to inlining but they give you a good idea. In particular, this call chain is very hot:

process_obligations -> process_obligation -> process_predicate -> shallow_resolve -> UnificationTable::probe -> UnificationTable::get

Any changes that can speed up any of those functions or reduce the number of calls in this chain are likely to help.

Contributor

nnethercote commented Oct 24, 2016

I've updated. The commit no longer converts NodeState to bitflags, which was probably the most controversial part of the change. Due to #37231 this is now only a 2% win.

r? @nikomatsakis

It feels like there is still plenty of room for improvement in this benchmark, though I'm struggling to find those improvements myself. Here's Cachegrind's summary of the hottest functions (measurements are instruction counts).

1,743,729,697  src/librustc/traits/fulfill.rs:<rustc::traits::fulfill::FulfillProcessor<'a, 'b, 'gcx, 'tcx> as rustc_data_structures::obligation_forest::ObligationProcessor>::process_obligation
1,470,943,055  src/librustc_data_structures/unify/mod.rs:<rustc_data_structures::unify::UnificationTable<K>>::get
1,291,117,019  src/librustc_data_structures/obligation_forest/mod.rs:<rustc_data_structures::obligation_forest::ObligationForest<O>>::process_obligations
1,098,926,283  src/librustc/infer/mod.rs:rustc::infer::InferCtxt::shallow_resolve
  982,723,014  src/libcore/result.rs:<rustc::traits::fulfill::FulfillProcessor<'a, 'b, 'gcx, 'tcx> as rustc_data_structures::obligation_forest::ObligationProcessor>::process_obligation
  819,129,999  src/libcore/iter/iterator.rs:<rustc::traits::fulfill::FulfillProcessor<'a, 'b, 'gcx, 'tcx> as rustc_data_structures::obligation_forest::ObligationProcessor>::process_obligation
  608,066,958  src/librustc_data_structures/obligation_forest/mod.rs:<rustc_data_structures::obligation_forest::ObligationForest<O>>::compress
  465,370,000  /build/glibc-GKVZIf/glibc-2.23/string/::/sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S:__memcpy_avx_unaligned
  446,992,304  /build/glibc-GKVZIf/glibc-2.23/malloc/malloc.c:_int_malloc
  433,218,885  src/librustc_data_structures/unify/mod.rs:<rustc_data_structures::unify::UnificationTable<K>>::get
  397,811,572  src/libcore/iter/range.rs:<rustc_data_structures::obligation_forest::ObligationForest<O>>::process_obligations
  397,030,502  ???:???
  368,654,892  src/libcore/option.rs:rustc::infer::InferCtxt::shallow_resolve
  321,693,196  src/libcore/iter/range.rs:<rustc_data_structures::obligation_forest::ObligationForest<O>>::compress

You have to squint a bit when interpreting them due to inlining but they give you a good idea. In particular, this call chain is very hot:

process_obligations -> process_obligation -> process_predicate -> shallow_resolve -> UnificationTable::probe -> UnificationTable::get

Any changes that can speed up any of those functions or reduce the number of calls in this chain are likely to help.

@nikomatsakis

r=me with the helper extracted (maybe tagged #[inline]?)

+
+ for dependent in &node.dependents {
+ self.mark_as_waiting_from(&self.nodes[dependent.get()]);
+ }

This comment has been minimized.

@nikomatsakis

nikomatsakis Oct 31, 2016

Contributor

rather than making two copies of this code, maybe pull out a helper like mark_neighbors_as_waiting_from(node)?

@nikomatsakis

nikomatsakis Oct 31, 2016

Contributor

rather than making two copies of this code, maybe pull out a helper like mark_neighbors_as_waiting_from(node)?

This comment has been minimized.

@nnethercote

nnethercote Nov 2, 2016

Contributor

Done.

@nnethercote

nnethercote Nov 2, 2016

Contributor

Done.

Optimize ObligationForest's NodeState handling.
This commit partially inlines two functions, `find_cycles_from_node` and
`mark_as_waiting_from`, at two call sites in order to avoid function
unnecessary function calls on hot paths.

It also fully inlines and removes `is_popped`.

These changes speeds up rustc-benchmarks/inflate-0.1.0 by about 2% when
doing debug builds with a stage1 compiler.
@nnethercote

This comment has been minimized.

Show comment
Hide comment
@nnethercote

nnethercote Nov 2, 2016

Contributor

Comments addressed.

Contributor

nnethercote commented Nov 2, 2016

Comments addressed.

@aturon

This comment has been minimized.

Show comment
Hide comment
@aturon

aturon Nov 2, 2016

Member

@bors r=nikomatsakis

Member

aturon commented Nov 2, 2016

@bors r=nikomatsakis

@nikomatsakis

This comment has been minimized.

Show comment
Hide comment
@nikomatsakis

nikomatsakis Nov 2, 2016

Contributor

@bors r+

Contributor

nikomatsakis commented Nov 2, 2016

@bors r+

@bors

This comment has been minimized.

Show comment
Hide comment
@bors

bors Nov 2, 2016

Contributor

📌 Commit 7b33f7e has been approved by nikomatsakis

Contributor

bors commented Nov 2, 2016

📌 Commit 7b33f7e has been approved by nikomatsakis

@nikomatsakis

This comment has been minimized.

Show comment
Hide comment
@nikomatsakis

nikomatsakis Nov 2, 2016

Contributor

Not sure why @bors wasn't listening to @aturon. =)

Contributor

nikomatsakis commented Nov 2, 2016

Not sure why @bors wasn't listening to @aturon. =)

@bors

This comment has been minimized.

Show comment
Hide comment
@bors

bors Nov 3, 2016

Contributor

⌛️ Testing commit 7b33f7e with merge f9f45c6...

Contributor

bors commented Nov 3, 2016

⌛️ Testing commit 7b33f7e with merge f9f45c6...

bors added a commit that referenced this pull request Nov 3, 2016

Auto merge of #36993 - nnethercote:obligation, r=nikomatsakis
Optimize ObligationForest's NodeState handling.

This commit does the following.
- Changes `NodeState` from an enum to a `bitflags`. This makes it
  possible to check against multiple possible values in a single bitwise
  operation.
- Replaces all the hot `match`es involving `NodeState` with `if`/`else`
  chains that ensure that cases are handled in the order of frequency.
- Partially inlines two functions, `find_cycles_from_node` and
  `mark_as_waiting_from`, at two call sites in order to avoid function
  unnecessary function calls on hot paths.
- Fully inlines and removes `is_popped`.

These changes speeds up rustc-benchmarks/inflate-0.1.0 by about 7% when
doing debug builds with a stage1 compiler.

r? @arielb1

@bors bors merged commit 7b33f7e into rust-lang:master Nov 3, 2016

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
homu Test successful
Details

@nnethercote nnethercote deleted the nnethercote:obligation branch Nov 3, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment