windows health checks run from a single powershell instance #6947
Conversation
Hello mwrock! Thanks for the pull request! Here is what will happen next:
Thank you for contributing!
I haven't made it all the way through yet, but I thought some initial feedback might be useful. This is a good opportunity for me to learn more about the Windows-specific code.
// Always check to see if the powershell named pipe server is running.
// This should only be false the first time this function is called
// or if someone explicitly terminates the powershell process
if self.pipe_exists().is_err() {
It would be nice to distinguish between when we expect the pipe to exist versus when we're expecting that we'll have to create it. I have concerns around getting in loops where we create the pipe over and over again.
Well, we expect it to exist on the 2nd call to `exec_hook` and onwards. However, it's very possible someone can inadvertently terminate the hosting `pwsh` process and obliterate the pipe. If there is something fundamentally wrong and we can't create the pipe, the call to `init_server` should err, resulting in killing the supervisor. If for some reason the client was not properly disconnected from the pipe on the last call to `exec_hook`, the call to `init_server` would fail trying to create a new pipe that is busy. However, we have 2 protections against this scenario (see the sketch after this list):
- the client's pipe handle is dropped and thus closed at the end of `exec_hook`
- the powershell server disconnects the client after returning its exit code.
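To make those two protections concrete, here is a simplified, hypothetical sketch of the flow being described; the type and method names (`PipeHookClient`, `start_server`, `connect`, `run_hook`) are illustrative stand-ins, not the actual implementation:

```rust
// Hypothetical, simplified sketch of the recovery flow described above.
struct PipeHookClient;

struct Connection;

impl Connection {
    fn run_hook(&self) -> std::io::Result<u32> {
        // Placeholder: the real code sends the hook request over the pipe
        // and waits for the powershell server to return an exit code.
        Ok(0)
    }
}

impl PipeHookClient {
    fn exec_hook(&self) -> std::io::Result<u32> {
        // Recreate the named pipe server only when it is missing, e.g. on the
        // first call or after someone terminated the hosting pwsh process.
        if self.pipe_exists().is_err() {
            self.start_server()?;
        }
        // The client's pipe handle lives only inside this scope...
        let exit_code = {
            let connection = self.connect()?;
            connection.run_hook()?
        };
        // ...and is dropped (closed) here, while the powershell server
        // disconnects its own end after returning the exit code. Together
        // these keep the next start_server call from hitting a busy pipe.
        Ok(exit_code)
    }

    fn pipe_exists(&self) -> std::io::Result<()> {
        // Placeholder: the real check would try to open the named pipe.
        Ok(())
    }

    fn start_server(&self) -> std::io::Result<()> {
        // Placeholder: the real code starts the hosting powershell process.
        Ok(())
    }

    fn connect(&self) -> std::io::Result<Connection> {
        Ok(Connection)
    }
}
```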
    Ok((pipe, poll))
}

fn init_server<T>(&self,
Is this function intended to be called many times? The logic of `exec_hook` seems to indicate that it could be. If needing to reestablish the named pipe more than once during the lifetime of a `HealthCheckHook` is expected, I think the name `init` is unexpected. Could that instead be rolled into `connect`? If the connection should be durable, we could make it stateful in `PipeHookClient`, either through an `Option` or the typestate pattern.
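For illustration only, a minimal sketch of what the typestate variant mentioned here could look like; the states and methods below are entirely hypothetical and the real `PipeHookClient` has a different shape:

```rust
// Encode "pipe server running" in the type, so the hook can only be run
// once the server has been started.
struct Disconnected;
struct Connected;

struct PipeHookClient<State> {
    state: State,
}

impl PipeHookClient<Disconnected> {
    fn start_server(self) -> std::io::Result<PipeHookClient<Connected>> {
        // Creating the named pipe server would happen here.
        Ok(PipeHookClient { state: Connected })
    }
}

impl PipeHookClient<Connected> {
    fn exec_hook(&self) -> std::io::Result<u32> {
        // Running the hook over the already-established pipe would happen here.
        Ok(0)
    }
}
```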
Along the lines of the above comment, we don't "expect" to call it multiple times, but we need to be prepared to call it multiple times in the event the hosting `pwsh` process is terminated. I like keeping `connect` focused on connecting to an existing pipe. It also avoids needing to pass in the args that need to be forwarded to `init_service`. For instance, the `quit` function connects if there is an existing pipe, but there is no need to create a new pipe service just to quit.
I had originally implemented this using a durable connection where the struct held an `Option<Pipe>` in a mutable property. The problem that introduced is that the hook's `run` function is not set up to be mutable, which it needs to be since its pipe client can mutate. However, changing the signature of `run` to use `&mut self` proved to be pretty invasive.
I can definitely change the name of the `init_server` function though. Perhaps to `start_server`.
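A minimal sketch, with hypothetical types, of the mutability problem described above: once the client holds an `Option<Pipe>` that it can replace, `run` has to take `&mut self`, and that requirement ripples out to every caller:

```rust
struct Pipe;

struct PipeHookClient {
    // A durable connection means the client must be able to store a new
    // pipe whenever the old one disappears.
    pipe: Option<Pipe>,
}

struct HealthCheckHook {
    client: PipeHookClient,
}

impl HealthCheckHook {
    // Because the client can mutate, this cannot be `&self`; every caller
    // up the chain now needs mutable access as well.
    fn run(&mut self) -> std::io::Result<u32> {
        if self.client.pipe.is_none() {
            self.client.pipe = Some(Pipe);
        }
        Ok(0)
    }
}
```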
fn pipe_ready(&self, poll: &Poll, readiness: Ready) -> Result<()> {
    let mut events = Events::with_capacity(1024);
    loop {
        poll.poll(&mut events, None)?;
I think we should add `liveliness_checker::mark_thread_alive()` here so we can track potential deadlocks. Here's one way to do it (note the change of return type):

fn pipe_ready(&self, poll: &Poll, readiness: Ready) -> std::io::Result<bool> {
    let mut events = Events::with_capacity(1024);
    let loop_value = loop {
        let checked_thread = habitat_common::liveliness_checker::mark_thread_alive();
        let result = poll.poll(&mut events, None)
                         .map(|_| events.iter().any(|e| e.readiness().contains(readiness)));
        if let Ok(false) = result {
            continue;
        } else {
            break checked_thread.unregister(result);
        }
    };
    loop_value.into_result()
}
Passing `None` to `poll` will block forever. It's hard to say before running this in the field whether there are instances where we'd rather time out, but adding an `env_config_duration` would allow tuning that without an additional release. I wonder if it ever makes sense to block longer than our intended delay between health check runs.
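As a rough illustration of the tuning idea (not habitat's actual env-config machinery), a timeout read from a hypothetical environment variable could be fed to `poll` in place of `None`:

```rust
use std::time::Duration;

// HAB_HEALTH_PIPE_POLL_TIMEOUT_SECS is a made-up variable name used purely
// for illustration; the real setting would presumably go through habitat's
// env-config helpers rather than raw std::env.
fn poll_timeout() -> Option<Duration> {
    std::env::var("HAB_HEALTH_PIPE_POLL_TIMEOUT_SECS")
        .ok()
        .and_then(|v| v.parse::<u64>().ok())
        .map(Duration::from_secs)
}

// With a timeout configured, poll returns even if the pipe never becomes
// ready; with the variable unset, the behavior stays as it is today:
// poll.poll(&mut events, poll_timeout())?;
```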
Yeah, my concern with adding a timeout is that it effectively becomes a timeout on how long we allow the hook to run, which we do not currently enforce in any of our hooks, and I did not want to introduce a new concept of a hook timeout, especially not only for Windows. I think the worst that would happen here is that the health check would stall, but since health checks run in their own thread, that would not interfere with the supervisor or other services in general.
Signed-off-by: mwrock <matt@mattwrock.com>
Looks good!
This resolves #5584 and implements #6873
Signed-off-by: mwrock <matt@mattwrock.com>