Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pantsd] Implement auto-shutdown after runs #7341

Merged
merged 4 commits into from Mar 14, 2019

Conversation

Projects
None yet
5 participants
@blorente
Copy link
Contributor

commented Mar 12, 2019

In the context of enabling pantsd in Travis in some capacity, we need to implement a mechanism to auto-shutdown Pantsd after runs, so that we can still have isolated integration tests.

@blorente

This comment has been minimized.

Copy link
Contributor Author

commented Mar 11, 2019

The kill-pantsd goal currently calls PantsDaemon:terminate(). We can take advantage of that function to perform shutdown. As a draft of how to implement this:

  • RemotePantsRunner:run() is a good place to modify because (1) it provides a handle to the pantsd process, and (2) it can run code after the thin client has finished running. So we could add code at the end of that function that calls terminate().
  • Implement a CLI global option (e.g. --pantsd-shutdown-after-run), which toggles this behaviour.

Note that we can insert that call to terminate() anywhere between this line and the end of RemotePantsRunner:run().

@stuhood I'd be interested on hearing your thoughts.

@blorente blorente changed the title [Pantsd] Implement auto-shutdown after runs [pantsd] Implement auto-shutdown after runs Mar 11, 2019

@illicitonion

This comment has been minimized.

Copy link
Contributor

commented Mar 11, 2019

Weird edge-casey question: Should --pantsd-shutdown-after-run be allowed for all runs, or only for the run that also started pantsd?

i.e. which of the following flows do we want to support?

./pants --enable-daemon=True --pantsd-shutdown-after-run
./pants --enable-daemon=True ; ./pants --enable-daemon=True --pantsd-shutdown-after-run
./pants --enable-daemon=True ; ./pants --enable-daemon=False --pantsd-shutdown-after-run

@blorente

This comment has been minimized.

Copy link
Contributor Author

commented Mar 11, 2019

So the proposal above makes it work for all three cases (or could be made work with minimal effort), making it virtually the same as running

./pants --enable-daemon=True --pantsd-shutdown-after-run // Start pantsd, run whatever, kill pantsd
./pants --enable-daemon=True ; ./pants --enable-daemon=True <goal> kill-pantsd ... // Run whatever on pantsd, kill pantsd at the end.
./pants --enable-daemon=True ; ./pants --enable-daemon=False <goal> kill-pantsd ... // Run local pants as if pantsd didn't exist, kill daemon at the end.

So I think theses semantics look good.

However, I'd prefer to only support the first case for now, which is the one we want. This would require a check to see if pantsd is running when we parse the flag, and probably raise an error if the flag is enabled but pantsd is already running.

@blorente

This comment has been minimized.

Copy link
Contributor Author

commented Mar 11, 2019

So we don't have an instance of PantsDaemon, we only have a Handle, which is unfortunate because IPC gets a bit tricky. We have two options here:

  • Use the current pattern with RemotePantsRunner._maybe_launch_pantsd() (which creates an instance of PantsDaemon and calls PantsDaemon.Factory.maybe_launch().
  • In the same line 155 of _connect_and_execute, also run a call to “kill-pantsd” from the thin client, which can be a bit finnicky around handling results, but should work.

I think I lean towards option 1, but welcome any thoughts.

@blorente

This comment has been minimized.

Copy link
Contributor Author

commented Mar 11, 2019

On a second thought, option 1 seems to work only for things that want to spawn daemons. I actually lean towards option 2.

The simplest way to implement this is to insert a call to client.execute('pants', 'kill-pantsd' ...) in _connect_and_execute. However, the semantics of this get a bit tricky. When running ./pants --enable-pantsd --pantsd-shutdown-after-run <some more options> ...:

  • Pantsd is started
  • First call (in the thin client) to client.execute('pants', ['enable-pantsd', <the rest of the options>])
  • Call to client.execute('pants', 'kill-pantsd' ...):
    • Restart daemon, because now we have changed the options.
    • Actually kill the new daemon

This will be the case until #7205 gets resolved, and modifying the arg list passed by the client to try to avoiding a daemon restart seems tricky.

@blorente

This comment has been minimized.

Copy link
Contributor Author

commented Mar 11, 2019

However, option 1 seems to not include IPC at all (it's just used to create daemons), so we probably do need some pailgun-based solution.

So a solution here would be to try to make a second call to pailgun (client.execute()), substituding whatever goal it had with kill-pantsd, something like "find the first word that doesn't begin with an -, and replace it with "kill-pantsd".

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor

commented Mar 11, 2019

So we don't have an instance of PantsDaemon, we only have a Handle, which is unfortunate because IPC gets a bit tricky. We have two options here:

Could another alternative be to send a SIGTERM/etc to the pid post #6574?

On a second thought, option 1 seems to work only for things that want to spawn daemons.

Potentially related is #7205, which describes some blockers to recognizing whether we might want to avoid spawning a daemon for the given goal. EDIT: Didn't see that was linked already!

@blorente

This comment has been minimized.

Copy link
Contributor Author

commented Mar 11, 2019

Could another alternative be to send a SIGTERM/etc to the pid post #6574?

Potentially, yes. I liked the other options because they exercise codepaths that are designed for that specifically, though.
I have tried this bit of code:

    with self._trapped_signals(client), STTYSettings.preserved():
      # Execute the command on the pailgun.
      result = client.execute(self.PANTS_COMMAND, *self._args, **modified_env)

      if should_terminate_pantsd:
        kill_pantsd_args = ['kill-pantsd']
        termination_result = client.execute(self.PANTS_COMMAND, self._args[0], *kill_pantsd_args, **modified_env)

, and it seems to work fine, without spawning extra daemons.

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor

commented Mar 11, 2019

Sure, but it does create a new pantsd-runner process which isn't 100% necessary. I was thinking it would be nice to lean on our signal handling being correct and shutting down pantsd correctly for this use case, and then fix any errors induced by #6574, than to create a whole new pantsd-runner process just to interpret the kill-pantsd goal, which specifically exists to expose killing pantsd to the pants user (which ideally wouldn't need to exist if pantsd was perfectly reliable).

@blorente

This comment has been minimized.

Copy link
Contributor Author

commented Mar 11, 2019

Yeah, in this case kill-pantsd would also need to be implemented in terms of SIGTERM/etc signals, right?

Not saying I'm opposed to it, it actually looks like the best solution when #6574 is done.

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor

commented Mar 11, 2019

If I'm reading it right, PantsDaemonKill calling .terminate() does exactly that right now. It sounds like from #6574 (comment) that #6574 is close to being mergeable, which I will try to ensure.

@blorente

This comment has been minimized.

Copy link
Contributor Author

commented Mar 11, 2019

Yeah, you're right about the call to terminate(), the thing I'm suspicious about is the creation of a PantsDaemon` instance to be able to call it, but that can be solved.

Cool, I like the idea of using PantsDaemon.terminate() (which transitively calls ProcessManager.terminate()). Thanks!

@stuhood

This comment has been minimized.

Copy link
Member

commented Mar 11, 2019

One other potential implementation would be to have the server shut itself down after finishing handling one connection. The server can consume the global option the same way the client would, so this would avoid any additional (potentially racey) communication between the client and server.

I would lean toward the server being aware of the "exit after use" flag, because it's also likely that we will want to do things like "not start watchman if we're going to exit after one run", and etc.

@blorente

This comment has been minimized.

Copy link
Contributor Author

commented Mar 11, 2019

I like the idea of having the server be the one that shuts down, although that leaves some semantics to figure out (I'm just realising this as I'm thinking through the code). Particularly, I'm thinking about the behaviour of the server in the event of "I'm running a pants command and will definitely shut down after this ends, but another request comes in". The options here are:

  • Either the server auto-rejects requests if the flag is enabled and it's already running a request, or
  • The server somehow cuts off other requests when the first one finishes.

The first option would be good, but we may leave some use cases out of testing, and would probably require more code. The second one is more accurate (because it would force us to think about those cases), but it's also asking for a world of pain.

Also, afaik in both cases we need a way to communicate from the child of DaemonPantsRunner back to the pailgun server, which is non-trivial.

@blorente

This comment has been minimized.

Copy link
Contributor Author

commented Mar 12, 2019

(Sorry for converting an issue into a PR, I was testing out hub. I don't know how to unconvert it, so I figured we could carry on here.)

Problem

Described in this issue. We need a way to tell pantsd to shut down after running once.

Solution

  • Add a bootstrap flag --shutdown-pantsd-after-run, which tells pants to shut down after the first use.
  • Read that flag in PantsServices, and pass it down to PailgunService.
  • Implement a callback in PailgunService that terminates the service if the flag is enabled. Pipe that callback to PailgunServer. Any additional logic (e.g. "stop pantsd after x invocations instead of just one") should go here.
  • Make PailgunServer call that callback unconditionally after every finished pants run.

Work to do:

  • Add integration tests to test the functionality below.
  • Add unit tests to PailgunServer and possibly to PailgunService.
  • Make pantsd close more gracefully when a service is shut down, instead of with an exception.

Result

Users can now use the flag, which has the following semantics:

./pants --enable-pantsd --shutdown-pantsd-after-run list # Close pantsd after running list
./pants --enable-pantsd list && ./pants --enable-panstd --shutdown-pantsd-after-run list
     # The new flag is fingerprinted, so changing it will trigger a daemon restart 
     # at the beginning of the second invocation, 
     # so the second invocation will have the same semantics as the first case.
./pants --enable-pantsd list && ./pants --shutdown-panstd-after-run list 
    # Disabling pantsd at the beginning of a run exercises totally different codepaths 
    # (goes straight to a LocalPantsRunner),
    # which means that the second invocation will ignore the flag.

@stuhood stuhood referenced this pull request Mar 12, 2019

Closed

[Pantsd] Enable pantsd in Travis #7320

1 of 11 tasks complete
@stuhood
Copy link
Member

left a comment

Thanks, looks good!

# This needs to be accessed at the same time as enable_pantsd,
# so we register it at bootstrap time.
register('--shutdown-pantsd-after-run', advanced=True, type=bool, default=False,
help='Shutdown pantsd after the current run.'

This comment has been minimized.

Copy link
@stuhood

stuhood Mar 12, 2019

Member

Need a space at the end of this line to avoid concatenating as run.Will.

The two sentences also don't seem super consistent... saying "will shutdown after this run" sortof implies that it is already running, but the next sentence says that that isn't legal. Would say something like: "starts, uses, and then stops pantsd for each run".

Also, rather than needing to clarify that this "won't work if pantsd was already running", it might be easier to just implement something whereby the client does something like:

if shutdown_pantsd_after_run and pantsd.running():
  pantsd.terminate()

I don't think that it would add much latency, assuming that the "running" check is just a PID lookup and check.

@@ -99,7 +99,7 @@ class PailgunServer(ThreadingMixIn, TCPServer):
# Override the ThreadingMixIn default, to minimize the chances of zombie pailgun processes.
daemon_threads = True

def __init__(self, server_address, runner_factory, lifecycle_lock,
def __init__(self, server_address, runner_factory, lifecycle_lock, request_complete_callback,

This comment has been minimized.

Copy link
@stuhood

stuhood Mar 12, 2019

Member

Worth expanding the docstring for this one I think.

@ity

ity approved these changes Mar 12, 2019

Copy link
Contributor

left a comment

reading through the commentary and the review - looks good! sans Stu's comments.

@blorente

This comment has been minimized.

Copy link
Contributor Author

commented Mar 13, 2019

I have addressed the comments, and ended up overriding ProcessManager::needs_restart in PantsDaemon.
The only thing remaining is to make pantsd exit with not an exception when one of the services dies, but because (1) it's behind an experimental flag, (2) I would need a bit of time to think about this and (3) this could be de-prioritized according to #7320 (comment), I'm inclined to leave that as a TODO.

@stuhood

This comment has been minimized.

Copy link
Member

commented Mar 14, 2019

Thanks!

@stuhood stuhood merged commit 5a95dbf into pantsbuild:master Mar 14, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.