New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cancel running work when entering the fork context #6464

Merged
merged 3 commits into from Sep 7, 2018

Conversation

Projects
None yet
3 participants
@stuhood
Copy link
Member

stuhood commented Sep 6, 2018

Problem

As described in #6426: when an error occurs prefork with pantsd and a request fails fast, work that is still running in the engine can race the shutdown of all resources that happens in order to safely fork, and as a result, can trigger a deadlock if new requests are made to the Runtime.

Solution

Add the ability to drain the engine's Graph, causing all work to fail fast with Invalidated the next time it attempts to interact with the Graph. Additionally, ensure that the Invalidated state is not actually stored, since it always represents an ephemeral failure.

(NB: We continue to push toward reducing usage of fork-without-exec by investing in the --v2 UX, but removing it entirely might be a ways off.)

Result

Entering the fork_context will cause all in-flight work to fail fast in a recoverable way, allowing for safe forking even in the presence of concurrent in-flight pantsd runs. According to ~30 minutes of runs that used to repro, this fixes #6426.

@stuhood stuhood force-pushed the twitter:stuhood/cancel-running-work-in-fork-context branch from 67f0576 to 1a035e6 Sep 7, 2018

@stuhood stuhood changed the title WIP: Cancel running work when entering the fork context Cancel running work when entering the fork context Sep 7, 2018

@stuhood stuhood requested review from illicitonion , ity and blorente Sep 7, 2018

@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented Sep 7, 2018

This is reviewable: individual commits should be useful to review independently.

@stuhood stuhood added this to the 1.9.x milestone Sep 7, 2018

@stuhood stuhood force-pushed the twitter:stuhood/cancel-running-work-in-fork-context branch from 1a035e6 to cf25183 Sep 7, 2018

@blorente
Copy link
Contributor

blorente left a comment

Besides a minor naming thing, it looks great! Thanks!

A maybe possibly future thing would be to change the with_{reset, exclusive, shutdown} names to run_with_<X>. This is a personal thing that probably stems from coming across the with_ idiom for the first time, though.

debug!("Waiting to enter fork_context...");
thread::sleep(Duration::from_millis(10));
}
let t = self.runtime.with_reset(|| {

This comment has been minimized.

@blorente

blorente Sep 7, 2018

Contributor

If I understand correctly, this is whatever the result of f() is, gotten by unwrapping res. Maybe a name such as fork_result, or just result would be more descriptive.

@illicitonion
Copy link
Contributor

illicitonion left a comment

Looks great :) Thanks!

debug!("Waiting to enter fork_context...");
thread::sleep(Duration::from_millis(10));
}
let t = self.runtime.with_reset(|| {

This comment has been minimized.

@illicitonion

illicitonion Sep 7, 2018

Contributor

We should really switch Resettable.get to Resettable.with at some point... May make a good cross-pollination ticket, actually, for getting someone into Rust :)

@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented Sep 7, 2018

Thanks gang!

@stuhood stuhood merged commit ccfb31d into pantsbuild:master Sep 7, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@stuhood stuhood deleted the twitter:stuhood/cancel-running-work-in-fork-context branch Sep 7, 2018

@stuhood stuhood referenced this pull request Sep 7, 2018

Merged

Prepare 1.10.0rc0 release #6467

stuhood added a commit that referenced this pull request Sep 7, 2018

Cancel running work when entering the fork context (#6464)
As described in #6426: when an error occurs prefork with `pantsd` and a request fails fast, work that is still running in the engine can race the shutdown of all resources that happens in order to safely fork, and as a result, can trigger a deadlock if new requests are made to the Runtime.

Add the ability to `drain` the engine's `Graph`, causing all work to fail fast with `Invalidated` the next time it attempts to interact with the `Graph`. Additionally, ensure that the `Invalidated` state is not actually stored, since it always represents an ephemeral failure.

(NB: We continue to push toward reducing usage of fork-without-exec by investing in the `--v2` UX, but removing it entirely might be a ways off.)

Entering the `fork_context` will cause all in-flight work to fail fast in a recoverable way, allowing for safe forking even in the presence of concurrent in-flight `pantsd` runs. According to ~30 minutes of runs that used to repro, this fixes #6426.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment