Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize ObligationForest's NodeState handling. #36993

Merged
merged 1 commit into from Nov 3, 2016

Conversation

Projects
None yet
8 participants
@nnethercote
Copy link
Contributor

nnethercote commented Oct 6, 2016

This commit does the following.

  • Changes NodeState from an enum to a bitflags. This makes it
    possible to check against multiple possible values in a single bitwise
    operation.
  • Replaces all the hot matches involving NodeState with if/else
    chains that ensure that cases are handled in the order of frequency.
  • Partially inlines two functions, find_cycles_from_node and
    mark_as_waiting_from, at two call sites in order to avoid function
    unnecessary function calls on hot paths.
  • Fully inlines and removes is_popped.

These changes speeds up rustc-benchmarks/inflate-0.1.0 by about 7% when
doing debug builds with a stage1 compiler.

r? @arielb1

@arielb1

This comment has been minimized.

Copy link
Contributor

arielb1 commented Oct 7, 2016

That sounds like Totally The Wrong Way to handle an O(n^2) algorithm.

@arielb1

This comment has been minimized.

Copy link
Contributor

arielb1 commented Oct 7, 2016

For example, you should be able to get a better perf improvement by skipping over the entire middle "GC" steps if all obligations hit the Ok(None) case.

@nnethercote

This comment has been minimized.

Copy link
Contributor Author

nnethercote commented Oct 7, 2016

That sounds like Totally The Wrong Way to handle an O(n^2) algorithm.

I found this comment to be curt and unhelpful. It made me feel angry and bad. It probably wasn't your intent, but that is the effect it had. Please think about how you express yourself when criticizing other people's code.

@nnethercote

This comment has been minimized.

Copy link
Contributor Author

nnethercote commented Oct 7, 2016

For example, you should be able to get a better perf improvement by skipping over the entire middle "GC" steps if all obligations hit the Ok(None) case.

Can you elaborate more? Are you talking about process_obligations?

@arielb1

This comment has been minimized.

Copy link
Contributor

arielb1 commented Oct 8, 2016

I found this comment to be curt and unhelpful. It made me feel angry and bad. It probably wasn't your intent, but that is the effect it had. Please think about how you express yourself when criticizing other people's code.

I'm sorry.

I just think that the best way to deal with an O(n^2) case is to well, make it stop being O(n^2), rather than adding low-level optimizations to the specific O(n^2) part.

The main entry point of the obligation forest is process_obligations, and it is basically updates all obligations and then performs a "mark-and-sweep GC", where the running of the "finalizers" advances type inference. The mark-and-sweep GC takes O(n) time.

If nothing changes in the process_obligation loop, we can skip GCing altogether. Also, because cycles are rare (and shouldn't be present in the inflate example), we may want to use some hybrid of garbage collection and reference counting to avoid the sweeps.

@nnethercote

This comment has been minimized.

Copy link
Contributor Author

nnethercote commented Oct 10, 2016

I just think that the best way to deal with an O(n^2) case is to well, make it stop being O(n^2), rather than adding low-level optimizations to the specific O(n^2) part.

To be clear: this is a reasonable opinion. The problem was not that you criticized my code, the problem was the way you criticized it. I suggest you avoid using the phrase "Totally The Wrong Way" in future reviews. Also, when suggesting alternative approaches, please describe them in reasonable detail. (Your follow-up comment about GCing was much better in this regard.)

@Manishearth

This comment has been minimized.

Copy link
Member

Manishearth commented Oct 16, 2016

Mod note: As Nick said, please be constructive and nice in your criticism. It seems like you've already acknowledged that, but just in case, leaving this comment 😄

@zatherz

This comment has been minimized.

Copy link

zatherz commented Oct 16, 2016

Why have my constructive and nice comments been removed? I'm very well acquainted with Rust's code of conduct and I think @nnethercote's reply could very well be considered harassment. He tried to make @arielb1 feel bad for his reply. We shouldn't reply to harassment with harassment. I am so angry that I'm considering writing a Medium blog post about it and posting it to HackerNews.

@Manishearth

This comment has been minimized.

Copy link
Member

Manishearth commented Oct 17, 2016

Mod note: @zatherz Your comment was deleted for containing profanity and blatantly violating the code of conduct. If you have further questions about moderation or the code of conduct, please ask rust-mods@rust-lang.org or on one of the forums. This is not the place for such discussion [further offtopic comments will be deleted]

@aturon

This comment has been minimized.

Copy link
Member

aturon commented Oct 17, 2016

@BurntSushi

This comment has been minimized.

Copy link
Member

BurntSushi commented Oct 19, 2016

Moderator note: As @Manishearth said, if you have a specific question or objection with our moderation, then please ask rust-mods@rust-lang.org. This isn't the place for it.

@nikomatsakis nikomatsakis self-assigned this Oct 21, 2016

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Oct 21, 2016

Adding myself as another assigneee. Sorry I haven't taken a look yet. My initial reaction was somewhat similar to @arielb1 (i.e., I'd rather solve this from the top), but on the other hand the speed improvements that @nnethercote showed are promising! I'd like to give this a closer read. Will try to do ASAP! Apologies for the delay @nnethercote!

@nnethercote

This comment has been minimized.

Copy link
Contributor Author

nnethercote commented Oct 21, 2016

Adding myself as another assigneee

Oh, please wait a bit! #37231 landed recently and that changed the profiles significantly, so I've started reworking this PR but it's not ready yet.

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Oct 24, 2016

@nnethercote ok ping me

@nnethercote nnethercote force-pushed the nnethercote:obligation branch from 7c64806 to 40f5122 Oct 24, 2016

@nnethercote

This comment has been minimized.

Copy link
Contributor Author

nnethercote commented Oct 24, 2016

I've updated. The commit no longer converts NodeState to bitflags, which was probably the most controversial part of the change. Due to #37231 this is now only a 2% win.

r? @nikomatsakis

It feels like there is still plenty of room for improvement in this benchmark, though I'm struggling to find those improvements myself. Here's Cachegrind's summary of the hottest functions (measurements are instruction counts).

1,743,729,697  src/librustc/traits/fulfill.rs:<rustc::traits::fulfill::FulfillProcessor<'a, 'b, 'gcx, 'tcx> as rustc_data_structures::obligation_forest::ObligationProcessor>::process_obligation
1,470,943,055  src/librustc_data_structures/unify/mod.rs:<rustc_data_structures::unify::UnificationTable<K>>::get
1,291,117,019  src/librustc_data_structures/obligation_forest/mod.rs:<rustc_data_structures::obligation_forest::ObligationForest<O>>::process_obligations
1,098,926,283  src/librustc/infer/mod.rs:rustc::infer::InferCtxt::shallow_resolve
  982,723,014  src/libcore/result.rs:<rustc::traits::fulfill::FulfillProcessor<'a, 'b, 'gcx, 'tcx> as rustc_data_structures::obligation_forest::ObligationProcessor>::process_obligation
  819,129,999  src/libcore/iter/iterator.rs:<rustc::traits::fulfill::FulfillProcessor<'a, 'b, 'gcx, 'tcx> as rustc_data_structures::obligation_forest::ObligationProcessor>::process_obligation
  608,066,958  src/librustc_data_structures/obligation_forest/mod.rs:<rustc_data_structures::obligation_forest::ObligationForest<O>>::compress
  465,370,000  /build/glibc-GKVZIf/glibc-2.23/string/::/sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S:__memcpy_avx_unaligned
  446,992,304  /build/glibc-GKVZIf/glibc-2.23/malloc/malloc.c:_int_malloc
  433,218,885  src/librustc_data_structures/unify/mod.rs:<rustc_data_structures::unify::UnificationTable<K>>::get
  397,811,572  src/libcore/iter/range.rs:<rustc_data_structures::obligation_forest::ObligationForest<O>>::process_obligations
  397,030,502  ???:???
  368,654,892  src/libcore/option.rs:rustc::infer::InferCtxt::shallow_resolve
  321,693,196  src/libcore/iter/range.rs:<rustc_data_structures::obligation_forest::ObligationForest<O>>::compress

You have to squint a bit when interpreting them due to inlining but they give you a good idea. In particular, this call chain is very hot:

process_obligations -> process_obligation -> process_predicate -> shallow_resolve -> UnificationTable::probe -> UnificationTable::get

Any changes that can speed up any of those functions or reduce the number of calls in this chain are likely to help.

@nikomatsakis
Copy link
Contributor

nikomatsakis left a comment

r=me with the helper extracted (maybe tagged #[inline]?)


for dependent in &node.dependents {
self.mark_as_waiting_from(&self.nodes[dependent.get()]);
}

This comment has been minimized.

@nikomatsakis

nikomatsakis Oct 31, 2016

Contributor

rather than making two copies of this code, maybe pull out a helper like mark_neighbors_as_waiting_from(node)?

This comment has been minimized.

@nnethercote

nnethercote Nov 2, 2016

Author Contributor

Done.

@nnethercote nnethercote force-pushed the nnethercote:obligation branch from 40f5122 to 1011042 Nov 2, 2016

Optimize ObligationForest's NodeState handling.
This commit partially inlines two functions, `find_cycles_from_node` and
`mark_as_waiting_from`, at two call sites in order to avoid function
unnecessary function calls on hot paths.

It also fully inlines and removes `is_popped`.

These changes speeds up rustc-benchmarks/inflate-0.1.0 by about 2% when
doing debug builds with a stage1 compiler.

@nnethercote nnethercote force-pushed the nnethercote:obligation branch from 1011042 to 7b33f7e Nov 2, 2016

@nnethercote

This comment has been minimized.

Copy link
Contributor Author

nnethercote commented Nov 2, 2016

Comments addressed.

@aturon

This comment has been minimized.

Copy link
Member

aturon commented Nov 2, 2016

@bors r=nikomatsakis

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Nov 2, 2016

@bors r+

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Nov 2, 2016

📌 Commit 7b33f7e has been approved by nikomatsakis

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Nov 2, 2016

Not sure why @bors wasn't listening to @aturon. =)

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Nov 3, 2016

⌛️ Testing commit 7b33f7e with merge f9f45c6...

bors added a commit that referenced this pull request Nov 3, 2016

Auto merge of #36993 - nnethercote:obligation, r=nikomatsakis
Optimize ObligationForest's NodeState handling.

This commit does the following.
- Changes `NodeState` from an enum to a `bitflags`. This makes it
  possible to check against multiple possible values in a single bitwise
  operation.
- Replaces all the hot `match`es involving `NodeState` with `if`/`else`
  chains that ensure that cases are handled in the order of frequency.
- Partially inlines two functions, `find_cycles_from_node` and
  `mark_as_waiting_from`, at two call sites in order to avoid function
  unnecessary function calls on hot paths.
- Fully inlines and removes `is_popped`.

These changes speeds up rustc-benchmarks/inflate-0.1.0 by about 7% when
doing debug builds with a stage1 compiler.

r? @arielb1

@bors bors merged commit 7b33f7e into rust-lang:master Nov 3, 2016

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
homu Test successful
Details

@nnethercote nnethercote deleted the nnethercote:obligation branch Nov 3, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.