Switch to a per-entry state machine in Graph #6013

stuhood · 2018-06-24T21:51:03Z

Problem

#4558 introduces a new state for a Node: it may be dirty, which requires that we remember the previous value of a Node while simultaneously computing its next value. I began implementing this state machine with our existing future::Shared usage, but ran into the need to either hide values in the closure of the running future, or store values atomically outside of the future's value. This complexity represents a classic case for a state machine.

Additionally, we have seen that future::Shared has a relatively high cost in terms of complexity and memory (see #5990).

Solution

In order to make #4558 easier to implement correctly (and to reduce memory usage), implement the storage of a Node's value in the Graph via a state machine.

One of the largest complexities of Shared is that it needs to elect one of the waiters of the Shared future as the one who will actually do the work of polling the wrapped future. To avoid that complexity, we introduce usage of the tokio Runtime for all Node executions. This allows waiters to stay very simple, and receive the result value via a oneshot::channel.

Result

#4558 should be much easier to implement. I'll likely wait to land this until #4558 is further along.

This change reduces memory usage about 200MB further than the fix from #5990 (down to about 2.5GB for ./pants list :: in Twitter's repo). Unfortunately, in its current form it also represents a performance regression of about 10% for that same usecase. Although I believe I have a handle on why, I'd like to defer fixing that regression until after #4558 is implemented.

This also fixes #3695, as since we now fail slowly, we are able to render multiple distinct failures at once.

stuhood · 2018-06-24T21:56:53Z

This depends on #6010.

From profiling, I believe I have a handle on why we see a performance regression: because Shared has its own internal mutex (with the memory costs that that entails), it was able to be operated on outside of the Graph's mutex. But as implemented, this patch removes all mutexes other than the Graph mutex, and so we end up spending more time under that lock.

There are two concrete improvements possible:

Part of what happens under the Graph mutex is the cloning of Values to send to the waiters of a Running->Completed transitioning entry. By switching the NodeResult::Value type to wrap an Arc we could avoid calling into python while under the lock (or outside of it, for that matter!)
Rather than executing fn get under the lock, we could wrap the state field an Arc<Atomic<..>> (from the crossbeam crate). This would allow us to rapidly clone the state field while under the lock, and then operate on it atomically outside of any mutex. It's also possible that just using a Arc<Mutex<..>> would be fine.

illicitonion · 2018-06-25T14:35:27Z

src/rust/engine/graph/src/lib.rs

-struct EntryState<N: Node> {
-  field: EntryStateField<N::Item, N::Error>,
-  start_time: Instant,
+type RunToken = usize;


Add a doc comment explaining what a RunToken is? :)

illicitonion · 2018-06-25T14:42:42Z

src/rust/engine/graph/src/lib.rs

+        waiters.push(send);
+        return recv
+          .map_err(|_| N::Error::invalidated())
+          .and_then(|res| res)


.flatten()?

illicitonion · 2018-06-25T14:56:55Z

src/rust/engine/graph/src/lib.rs

-          let context = context.clone_for(entry_id);
-          let node = n.clone();
-          future::lazy(move || node.run(context)).to_boxed()
+    let next_state = match &mut self.state {


This match has two super different cases:

NotStarted -> {Running, Completed[Cyclic]}, then recurse

Running -> Future[unresolved], Completed -> Future[Resolved] which is returned

I found it pretty hard to follow because of the mix of returning values out of the match vs early returning out of the function, and it left me asking questions like "Why are we recursing when we've made a future to wait on" and "won't we end up creating multiple waiters?"

What I really want is to make the recursing happen inside the NotStarted match, rather than at the function scope, but I guess the borrow-checker doesn't like that...

Absent that, maybe rather than early returning, the outer match could return Either<EntryState, BoxFuture> and the recursing or returning logic could happen there, to make it more explicit?

Hm... how strongly do you feel about this? I feel like I've seen this pattern in other rust state machine code. And Either isn't actually in the stdlib.

Strongly enough to justify a multi-sentence comment if we're not going to change the code :)

(A lightweight alternative to Either is enum WhatToDo { NextState(StateEntry), Return(BoxFuture) }, but happy with just adding a comment explaining what's going on, too)

The word "state machine" makes me feel pretty strongly that using an enum would clarify this, anything that has state should make it blindingly clear where the state transitions occur and what transition is occurring, and enums are super cheap to make and can be better documenting than a much longer comment at times. If the early return is required for whatever logic we don't need to be all haskell about it but if it's easily avoidable I usually try to do that (which led to the unnecessarily-public SingleExpansionResult in #5769).

The word "state machine" makes me feel pretty strongly that using an enum would clarify this

Well, we're already using an enum, so the question here was whether it was worth having two enums, one that was used only in this method.

But I believe that that is resolved by the changes in #6059.

That's what I was referring to as well, but will review #6059!

illicitonion · 2018-06-25T15:05:43Z

src/rust/engine/graph/src/lib.rs

+        run_token,
+      } => {
+        if result_run_token == run_token {
+          // Notify all waiters (ignoring any that have gone away), and then store the value.


When might they go away?

illicitonion · 2018-06-25T15:17:41Z

src/rust/engine/graph/src/lib.rs

-  start_time: Instant,
+type RunToken = usize;
+
+enum EntryState<N: Node> {


Given that all three states have a RunToken, and we operate on them independent of the rest of state, I'd be tempted to to give each node both an EntryState field and a RunToken field, rather than merging them into one.

I am not sure yet whether we're going to end up with a Mutex or Atomic around this field... would prefer to keep it in for now.

illicitonion · 2018-06-25T17:13:15Z

src/rust/engine/graph/src/lib.rs

+    let state = mem::replace(&mut self.state, EntryState::NotStarted(0));
+
+    // We care about exactly one case: a Running state with the same run_token. All other states
+    // represent various (legal) race conditions.


Can you expand on those race conditions? (i.e. mention that it means a thing was invalidated and we don't care about its result any more)

illicitonion

Add that one comment and I'm happy :) Thanks!

stuhood · 2018-06-26T23:38:17Z

The final error in Travis is related to #3695, because this change has the effect of making pantsd fail slowly: in essence, all nodes that have already been requested will complete (although a user request will still fail fast). This might cause traces to render more than they should.

…tor with a callback to persist their results.

… on the python side.

cosmicexplorer · 2018-07-03T01:11:05Z

src/rust/engine/graph/src/lib.rs

-          let context = context.clone_for(entry_id);
-          let node = n.clone();
-          future::lazy(move || node.run(context)).to_boxed()
+    let next_state = match &mut self.state {


The word "state machine" makes me feel pretty strongly that using an enum would clarify this, anything that has state should make it blindingly clear where the state transitions occur and what transition is occurring, and enums are super cheap to make and can be better documenting than a much longer comment at times. If the early return is required for whatever logic we don't need to be all haskell about it but if it's easily avoidable I usually try to do that (which led to the unnecessarily-public SingleExpansionResult in #5769).

cosmicexplorer · 2018-07-03T01:22:38Z

src/rust/engine/graph/src/lib.rs

+    match &self.state {
+      &EntryState::Running { start_time, .. } => Some(now.duration_since(start_time)),
+      _ => None,
+    }


This method is neat and is making me think about more instrumentation methods for the graph -- haven't thought of anything cool yet.

cosmicexplorer · 2018-07-03T01:24:56Z

src/rust/engine/graph/src/lib.rs

+    where
+      F: Future<Item = (), Error = ()> + Send + 'static,
+    {
+      // Avoids introducing a dependency on a threadpool.


Why is that something we want to avoid doing?

No great reason other than complexity... there is an argument to be made in general for doing the simplest thing that works, and this does. But I've since introduced a [dev-dependency] for the tests in #6059 and that was relatively painless, so who knows.

Got it, was just wondering if there was something deeper. This makes sense.

cosmicexplorer · 2018-07-03T01:27:44Z

src/rust/engine/graph/src/node.rs

+  /// Spawns a Future on an Executor provided by the context.
+  ///
+  /// NB: Unlike the futures `Executor` trait itself, this implementation _must_ spawn the work
+  /// on another thread, as it is called from within the Graph lock.


Is there a way we can encode this into the type so we can't make this mistake? And/or would it make sense to name this spawn_in_separate_thread()?

The bounds (Send) sortof already hint that the value will be sent to another thread, but unfortunately that is lost in the noise of all Futures currently requiring a Send bound. Renaming it would make sense, yea. I'll do that in #6059.

cosmicexplorer · 2018-07-03T01:43:13Z

src/rust/engine/graph/src/lib.rs

+    let dst_id = {
+      // TODO: doing cycle detection under the lock... unfortunate, but probably unavoidable
+      // without a much more complicated algorithm.
+      let potential_dst_id = inner.ensure_entry(EntryKey::Valid(dst_node.clone()));


Would it be conceivable to only obtain a copy of the relevant subgraph under the lock (could use an r/w lock here too) with something like filter_map, do the cycle detection, then re-acquire the lock for mutation?

So, the most basic form of that is hard, because acquiring a read lock and then "upgrading" it to a write lock is a rare, and fairly restrictive operation. The RwLock in the stdlib does not allow for it, and although the RwLock in the parking_lot crate does, you need to actually acquire explicitly as upgradeable_read, which is exclusive (just like acquiring for write).

I've also implemented a tactic involving adding a "mutation generation" to the graph that allowed us to cycle detect for a particular generation (not the same "generation" concept as on #6059) under the read lock, and then skip cycle detection if we acquired the write lock and the generation hadn't changed. And unfortunately that was also not faster.

But this comment is sortof fud, because the most time I've ever seen cycle detection take in a profile is about 3%.

I was assuming we would obtain the read lock to get the subgraph we want, then release the read lock to perform cycle detection, then acquire a write lock to perform mutation? Keeping a read lock while performing the cycle detection would seem to have the same issue as using a simple mutex as we do here? I haven't written enough multithreaded code outside of a lab to know if I'm missing something.

And yeah, this is not a blocking concern, but I potentially see potential cycle detection lag becoming an issue to address in the future when the graph gets large enough.

I was assuming we would obtain the read lock to get the subgraph we want, then release the read lock to perform cycle detection, then acquire a write lock to perform mutation?

While you have the lock released, the graph might change. And so you would need something like the "mutation generation" strategy I mentioned.

Ah, of course. That makes sense. Maybe leave a note that this isn't a perf concern right now in the comment?

cosmicexplorer · 2018-07-03T01:51:51Z

src/rust/engine/graph/src/lib.rs

+  /// When the Executor finishes executing a Node it calls back to store the result value. We use
+  /// the run_token value to determine whether the Node changed while we were busy executing it,
+  /// so that we can discard the work.
+  ///


Really not trying to nitpick here: when would this happen? Would this occur basically only if this is a long-running Node with frequently-changing input? If I'm some object outside of the graph waiting on the result of some Node downstream of this one, would this mean that the value I would have received for the Node could be different depending on whether some other object happens to make a request for something else that transitively uses this node in the meantime?

Would this occur basically only if this is a long-running Node with frequently-changing input?

It would occur when while a Node is running, one of its transitive dependencies is invalidated, which will clear or dirty this Node, making the work that is executing irrelevant.

would this mean that the value I would have received for the Node could be different depending on whether some other object happens to make a request for something else that transitively uses this node in the meantime?

No: requests to read things are "read only"/"append only" in terms of the graph: they only cause the graph to grow. Graph::invalidate_from_roots is what causes inputs to "change".

Oh ok, awesome! I'm not clear on when one of a Node's transitive deps would be invalidated during the course of its execution unless there's more than one path from roots to the Node? Because I'm assuming a Node is only going to execute in the first place if one of its transitive dependencies is invalidated?

Nodes start executing top-down. But they finish executing bottom up... a node Z that (transitively) depends on reading a file from disk won't complete until the node that reads the file from disk has completed. And if in between when the file is read from disk and when Z finishes running the file is invalidated again, then Z needs to re-run.

And the file would be invalidated again if e.g. it is a source file and the user edits it after making a request for some product, but before the request completes, because watchman would note the file had changed and upon receiving that notif we would mark the corresponding snapshot node for the changed file as invalid?

Ok great, I had a misconception about the interface from the engine to the graph, I think this method is probably fine as is.

Also, note that this method is not public... it's only called from this module.

stuhood · 2018-07-03T02:21:56Z

Add that one comment and I'm happy :) Thanks!

@illicitonion : So, #6059 overhauls this anyway for performance reasons... can you take a look over there and see whether it looks clearer to you?

illicitonion · 2018-07-03T11:05:10Z

The new version in that PR looks great, thanks!

…y root per distinct failure. Fixes pantsbuild#3695.

…nged (#6059) ### Problem As described in #4558, we currently completely delete `Node`s from the `Graph` when their inputs have changed. One concrete case where this is problematic is that all `Snapshots` in the graph end up with a dependency on the `scandir` outputs of all of their parent directories, because we need to expand symlinks recursively from the root when consuming a `Path` (in order to see whether any path component on the way down is a symlink). This means that changes anywhere above a `Snapshot` invalidate that `Snapshot`, and changes at the root of the repo invalidate _all_ `Snapshots` (although 99% of the syscalls they depend on are not invalidated, having no dependencies of their own). But this case is just one of many cases affected by the current implementation: there are many other times where we re-compute more than we should due to the current `Node` invalidation strategy. ### Solution Implement node "dirtying", as described on #4558. There are a few components to this work: * In addition to being `Entry::clear`ed (which will force a `Node` to re-run), a `Node` may be `Entry::dirty`ed. A "dirty" `Node` is eligible to be "cleaned" if its dependencies have not changed since it was dirtied. * Each `Node` records a `Generation` value that acts as proxy for "my output has changed". The `Node` locally maintains this value, and when a Node re-runs for any reason (either due to being `dirtied` or `cleared`), it compares its new output value to its old output value to determine whether to increment the `Generation`. * Each `Node` records the `Generation` values of the dependencies that it used to run, at the point when it runs. When a dirtied `Node` is deciding whether to re-run, it compares the previous generation values of its dependencies to their current dependency values: if they are equal, then the `Node` can be "cleaned": ie, its previous value can be used without re-running it. This patch also expands the testing of `Graph` to differentiate dirtying a `Node` from clearing it, and confirms that the correct `Nodes` re-run in each of those cases. ### Result Cleaning all `Nodes` involved in `./pants list ::` after touching `pants.ini` completes 6 times faster than recomputing them from scratch (56 seconds vs 336 seconds in our repository). More gains are likely possible by implementing the performance improvement(s) described on #6013. Fixes #4558 and fixes #4394.

stuhood requested review from baroquebobcat, illicitonion and cosmicexplorer June 24, 2018 21:51

stuhood force-pushed the stuhood/graph-state-machine branch from b87cbb6 to a51b172 Compare June 25, 2018 00:01

illicitonion reviewed Jun 25, 2018

View reviewed changes

stuhood force-pushed the stuhood/graph-state-machine branch from a51b172 to 9f7907a Compare June 26, 2018 01:33

illicitonion approved these changes Jun 26, 2018

View reviewed changes

baroquebobcat approved these changes Jun 26, 2018

View reviewed changes

stuhood force-pushed the stuhood/graph-state-machine branch 2 times, most recently from 7e0ab6c to 22b5bcd Compare July 2, 2018 03:13

stuhood mentioned this pull request Jul 3, 2018

Add support for reusing Graph node values if their inputs haven't changed #6059

Merged

stuhood added 4 commits July 2, 2018 18:46

Remove future::Shared, and instead spawn all Nodes on the tokio execu…

a755424

…tor with a callback to persist their results.

Switch from StableDiGraph to DiGraph now that we no longer remove nodes.

0e5a85e

Review feedback

43f2aae

Add an invalidation unit test to replace some integration-level tests…

27455ef

… on the python side.

stuhood force-pushed the stuhood/graph-state-machine branch from 22b5bcd to 27455ef Compare July 3, 2018 01:47

cosmicexplorer reviewed Jul 3, 2018

View reviewed changes

Switch the implementation of tracing to tracing a shortest path to an…

a368f7a

…y root per distinct failure. Fixes pantsbuild#3695.

stuhood merged commit ff4162d into pantsbuild:master Jul 3, 2018

stuhood deleted the stuhood/graph-state-machine branch July 3, 2018 21:16

This was referenced Jul 3, 2018

Shared futures regularly allocate 3kb #5990

Closed

Move each EntryState in the graph under its own Mutex #6074

Closed

stuhood mentioned this pull request Aug 30, 2018

Pantsd deadlock due to fail-fast #6426

Closed

Switch to a per-entry state machine in Graph #6013

Switch to a per-entry state machine in Graph #6013

Conversation

stuhood commented Jun 24, 2018 • edited Loading

Problem

Solution

Result

stuhood commented Jun 24, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

illicitonion left a comment

Choose a reason for hiding this comment

stuhood commented Jun 26, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cosmicexplorer Jul 3, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cosmicexplorer Jul 3, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cosmicexplorer Jul 3, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

stuhood Jul 3, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

stuhood commented Jul 3, 2018

illicitonion commented Jul 3, 2018

stuhood commented Jun 24, 2018 •

edited

Loading

stuhood commented Jun 24, 2018 •

edited

Loading

stuhood commented Jun 26, 2018 •

edited

Loading

cosmicexplorer Jul 3, 2018 •

edited

Loading

cosmicexplorer Jul 3, 2018 •

edited

Loading

cosmicexplorer Jul 3, 2018 •

edited

Loading

stuhood Jul 3, 2018 •

edited

Loading