Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minimum viable speculating command runner #7992

Merged
merged 10 commits into from Jul 12, 2019

Conversation

@hrfuller
Copy link
Contributor

commented Jul 1, 2019

Problem

progress on #7949

Implement first pieces of a speculating command runner.

Solution

A command runner implementation that takes two bounded command runners and only runs the preferred one.

Result

Another layer of abstraction around running commands.

@Eric-Arellano Eric-Arellano requested review from illicitonion and stuhood Jul 1, 2019

@Eric-Arellano Eric-Arellano assigned gshuflin and unassigned gshuflin Jul 1, 2019

@Eric-Arellano Eric-Arellano requested a review from gshuflin Jul 1, 2019

@Eric-Arellano

This comment has been minimized.

Copy link
Contributor

commented Jul 1, 2019

Will conflict with #7991.

@hrfuller

This comment has been minimized.

Copy link
Contributor Author

commented Jul 1, 2019

Yes. One of us will have to rebase and resolve. I don't mind doing it.

Show resolved Hide resolved src/rust/engine/src/context.rs Outdated
@stuhood

This comment has been minimized.

Copy link
Member

commented Jul 2, 2019

Thanks, this looks sane.

In terms of how to break up #7949, I think that because this ended up being fairly small, you might consider doing a little bit of design for steps 3 and 4 (perhaps as a comment on #7949), and then including the absolute most basic subset of the options in this PR in order to enable actually implementing basic speculation as one or two commits atop the existing one in this PR.

@illicitonion

This comment has been minimized.

Copy link
Contributor

commented Jul 2, 2019

Thanks for putting this together, it looks good :)

If we can make it work, I think I'd rather be using Box<dyn CommandRunner> everywhere, and having Context set up what it needs to, rather than hard-coding types everywhere.

So ideally the Speculating CommandRunner will just have two Box<dyn CommandRunner>s, rather than a BoundedCommandRunner and an Option<BoundedCommandRunner>, and in Context we'd either make one BoundedCommandRunner (stored as a Box<dyn CommandRunner>) or a SpeculatingCommandRunner, rather than hard-coding that we always have a SpeculatingCommandRunner which may or may not actually speculate...

@hrfuller

This comment has been minimized.

Copy link
Contributor Author

commented Jul 2, 2019

Thanks for the suggestions @illicitonion, that makes sense and is more explicit about whats going on.

@hrfuller hrfuller force-pushed the twitter:hfuller/speculation-7949 branch from fa59450 to 24a9b9f Jul 2, 2019

@hrfuller

This comment has been minimized.

Copy link
Contributor Author

commented Jul 8, 2019

@illicitonion After discussion with @stuhood we decided that for now, and potentially for the long run, we won't extend the speculation runner into a high level retrier. That retry should be handled by lower level components as it is currently done, and we will be opaque about it for now at the speculation level. That is for a given speculation call. The actual RPC may be retried at the command level in a way that the speculating command runner won't know about. Curious to get your thoughts on this potential design, and if you are happy with the current retry capabilities.

@illicitonion

This comment has been minimized.

Copy link
Contributor

commented Jul 9, 2019

@illicitonion After discussion with @stuhood we decided that for now, and potentially for the long run, we won't extend the speculation runner into a high level retrier. That retry should be handled by lower level components as it is currently done, and we will be opaque about it for now at the speculation level. That is for a given speculation call. The actual RPC may be retried at the command level in a way that the speculating command runner won't know about. Curious to get your thoughts on this potential design, and if you are happy with the current retry capabilities.

Sounds great!

On retrying, there are a few use cases for retrying:

  1. The server told us we needed to correct a condition before it can execute (e.g. missing files) - we handle this well today.
  2. Some generic transient error on an individual RPC(e.g. connection dropped) - I don't think we retry these for any RPC in the CommandRunner (the Store does, via serverset). We probably should.
  3. Weird random errors from the server - e.g. a worker crashed and the scheduler told us rather than internally retrying. We should probably retry to whole execution (not just the GetOperation RPC) in this case, and we currently don't.
  4. The action we're running spuriously failed (e.g. a test was flaky) - we should handle this at a much higher level (and potentially pass a "do not cache" flag to the process execution, and force re-execution of these in the graph). I don't think we need to handle this now.
@stuhood

This comment has been minimized.

Copy link
Member

commented Jul 10, 2019

On retrying, there are a few use cases for retrying:

Great points, thanks. Will create some tickets for these.

EDIT: On second thought: I don't really have any idea about the priority of these, so I'm going to suggest holding off.

@hrfuller hrfuller force-pushed the twitter:hfuller/speculation-7949 branch from 6d84809 to d404b37 Jul 10, 2019

@hrfuller hrfuller changed the title Boiler plate for speculating command runner Minimum viable speculating command runner Jul 10, 2019

@illicitonion
Copy link
Contributor

left a comment

Looks good, thanks!

I'd probably revert the context.rs changes and land them separately (with whatever flags we need).

I don't know much about cancellation - would it be possible to add a test which shows that cancellation works properly? I imagine something like the DelayedCommandRunner incrementing an atomic counter at the end of its running, and us verifying that the counter only ever hits one even if we want for the duration of both delays. Or is this something that doesn't work at the moment, and we need to work out how to do?

Show resolved Hide resolved src/rust/engine/process_execution/Cargo.toml Outdated
Show resolved Hide resolved src/rust/engine/process_execution/src/speculate.rs Outdated
Show resolved Hide resolved src/rust/engine/process_execution/src/speculate.rs Outdated
Show resolved Hide resolved src/rust/engine/process_execution/src/speculate.rs Outdated
Show resolved Hide resolved src/rust/engine/process_execution/src/speculate.rs Outdated
Show resolved Hide resolved src/rust/engine/process_execution/src/speculate.rs Outdated
@stuhood
Copy link
Member

left a comment

Thanks!

Show resolved Hide resolved src/rust/engine/process_execution/src/speculate.rs Outdated
Show resolved Hide resolved src/rust/engine/process_execution/src/speculate.rs Outdated
Show resolved Hide resolved src/rust/engine/process_execution/src/speculate.rs Outdated
Show resolved Hide resolved src/rust/engine/src/context.rs Outdated
@illicitonion
Copy link
Contributor

left a comment

Looks great, thanks!

.select(delay.then(move |_| command_runner.secondary.run(req_2, workunit_store_clone)))
.then(|raced_result| match raced_result {
Ok((successful_res, _outstanding_req)) => {
ok::<FallibleExecuteProcessResult, String>(successful_res).to_boxed()

This comment has been minimized.

Copy link
@illicitonion

illicitonion Jul 11, 2019

Contributor

Because Result implements IntoFuture, you should actually be able to just:
Ok(successful_res)
and
Err(failed_res)
in this closure :) Not particularly important, but good to know about :)

#[test]
fn test_no_speculation() {
let (result, call_counter) = run_speculation_test(0, 0, 100, false, false);
assert_eq![1, *call_counter.lock().unwrap()];

This comment has been minimized.

Copy link
@illicitonion

illicitonion Jul 11, 2019

Contributor

For the no speculation case, it may actually be nice to have two counters; a "started" counter and a "finished" counter, to show that we didn't even start the second attempt. But also understand if you don't think it's worthwhile :)

@stuhood
Copy link
Member

left a comment

Thanks!

@stuhood stuhood merged commit 987121c into pantsbuild:master Jul 12, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@stuhood stuhood deleted the twitter:hfuller/speculation-7949 branch Jul 12, 2019

stuhood added a commit that referenced this pull request Jul 16, 2019

Add support for speculation as a viable command runner. (#8050)
### Problem

This is a follow up to #7992 to add CLI flags for controlling usage of speculation.

### Solution

Add two new advanced cli flags under the process_execution scope: 
* `process_execution_speculation_timeout_ms` takes a timeout after which we would speculate (if enabled).
* `process_execution_speculation` is a tri-state flag, which selects the type of speculation, or disables it. This maintains the possibility of remote only execution without any speculation.

Both flags have defaults but will only have any effect if the `remote_execution` switch is flipped on.

### Result

Users have control of the speculation mode and timeout parameter.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.