
Windows health checks run from a single PowerShell instance #6947

Merged

merged 1 commit into master from named_pipe on Sep 30, 2019

Conversation

@mwrock mwrock (Contributor) commented Sep 11, 2019

This resolves #5584 and implements #6873

Signed-off-by: mwrock <matt@mattwrock.com>

@chef-expeditor (Contributor) commented

Hello mwrock! Thanks for the pull request!

Here is what will happen next:

  1. Your PR will be reviewed by the maintainers.
  2. If everything looks good, one of them will approve it, and your PR will be merged.

Thank you for contributing!

@mwrock mwrock self-assigned this Sep 11, 2019
@baumanj baumanj (Contributor) left a comment

I haven't made it all the way through yet, but I thought some initial feedback might be useful. This is a good opportunity for me to learn more about the Windows-specific code.

components/sup/src/manager/service/hooks.rs (resolved)
components/sup/src/manager/service/pipe_hook_client.rs (outdated, resolved)
    // Always check to see if the powershell named pipe server is running.
    // This should only be false the first time this function is called
    // or if someone explicitly terminates the powershell process
    if self.pipe_exists().is_err() {
@baumanj (Contributor):
It would be nice to distinguish between when we expect the pipe to exist versus when we expect that we'll have to create it. I have concerns about getting into loops where we create the pipe over and over again.

@mwrock (Contributor, Author):

Well, we expect it to exist on the 2nd call to exec_hook and onwards. However, it's very possible someone could inadvertently terminate the hosting pwsh process and obliterate the pipe. If there is something fundamentally wrong and we can't create the pipe, the call to init_server should err, resulting in killing the supervisor. If for some reason the client was not properly disconnected from the pipe on the last call to exec_hook, the call to init_server would fail trying to create a new pipe that is busy. However, we have two protections against this scenario:

  1. The client's pipe handle is dropped, and thus closed, at the end of exec_hook.
  2. The powershell server disconnects the client after returning its exit code (see the sketch below).
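
For readers following along, here is a minimal, self-contained sketch of the recovery flow described above. Pipe, pipe_exists, init_server, and connect are simplified stand-ins invented for illustration, not the PR's actual types:

    use std::io;

    // Placeholder for the client's pipe handle; the real type wraps a
    // Windows named pipe.
    struct Pipe;

    struct PipeHookClient {
        pipe_name: String,
    }

    impl PipeHookClient {
        fn pipe_exists(&self) -> io::Result<()> {
            // The real code checks whether the named pipe is present; here
            // we pretend it never is, to exercise the recovery path.
            Err(io::Error::new(io::ErrorKind::NotFound, "pipe not found"))
        }

        fn init_server(&self) -> io::Result<()> {
            // Spawn the hosting pwsh process that creates the named pipe
            // server. If this fails, the error propagates and the
            // supervisor shuts down.
            println!("starting pipe server for {}", self.pipe_name);
            Ok(())
        }

        fn connect(&self) -> io::Result<Pipe> {
            // Connect a client to the existing pipe.
            Ok(Pipe)
        }

        fn exec_hook(&self) -> io::Result<u32> {
            // Recreate the server only when the pipe is gone: the first
            // call, or after someone terminated the hosting pwsh process.
            if self.pipe_exists().is_err() {
                self.init_server()?;
            }
            let pipe = self.connect()?;
            let exit_code = 0; // run the hook over `pipe`, read its exit code
            // Protection 1: `pipe` is dropped here, closing the client handle.
            // Protection 2: the server disconnects the client after replying.
            drop(pipe);
            Ok(exit_code)
        }
    }

    fn main() -> io::Result<()> {
        let client = PipeHookClient { pipe_name: r"\\.\pipe\health-check".to_string() };
        println!("hook exited with {}", client.exec_hook()?);
        Ok(())
    }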

components/sup/src/manager/service/pipe_hook_client.rs (outdated, resolved)
        Ok((pipe, poll))
    }

    fn init_server<T>(&self,
@baumanj (Contributor):

Is this function intended to be called many times? The logic of exec_hook seems to indicate that it could be. If needing to reestablish the named pipe more than once during the lifetime of a HealthCheckHook is expected, I think the name init is unexpected. Could that instead be rolled into connect? If the connection should be durable, we could make it stateful in PipeHookClient either through an Option or the typestate pattern.
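
For reference, a minimal sketch of the typestate idea mentioned above, with invented names (PipeClient, Disconnected, Connected); the point is that connect consumes the disconnected client, so the types themselves record whether a connection exists:

    use std::io;

    struct Disconnected;
    struct Connected; // the pipe handle would live in this state

    struct PipeClient<State> {
        pipe_name: String,
        state: State,
    }

    impl PipeClient<Disconnected> {
        fn new(pipe_name: String) -> Self {
            PipeClient { pipe_name, state: Disconnected }
        }

        // Consumes the disconnected client and yields a connected one, so a
        // caller can never run a hook on a client without a connection.
        fn connect(self) -> io::Result<PipeClient<Connected>> {
            Ok(PipeClient { pipe_name: self.pipe_name, state: Connected })
        }
    }

    impl PipeClient<Connected> {
        fn exec_hook(&self) -> io::Result<u32> {
            Ok(0) // write the hook request to the pipe, read the exit code
        }

        // Returning to Disconnected makes reconnection explicit in the types.
        fn disconnect(self) -> PipeClient<Disconnected> {
            PipeClient { pipe_name: self.pipe_name, state: Disconnected }
        }
    }

    fn main() -> io::Result<()> {
        let client = PipeClient::new(r"\\.\pipe\health-check".to_string());
        let connected = client.connect()?;
        println!("exit: {}", connected.exec_hook()?);
        let _idle = connected.disconnect();
        Ok(())
    }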

@mwrock (Contributor, Author):

Along the lines of the above comment, we don't "expect" to call it multiple times, but we need to be prepared to call it multiple times in the event the hosting pwsh process is terminated. I like keeping connect focused on connecting to an existing pipe. It also avoids needing to pass in the args that need to be forwarded to init_server. For instance, the quit function connects if there is an existing pipe, but there is no need to create a new pipe server just to quit.

I had originally implemented this using a durable connection, and the struct held an Option<Pipe> in a mutable property. The problem that introduced is that the hook's run function is not set up to be mutable, which it needs to be since its pipe client can mutate. However, changing the signature of run to use &mut self proved to be pretty invasive.

I can definitely change the name of the init_server function though. Perhaps to start_server.
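
As an aside, the usual workaround for that &mut self constraint is interior mutability: hold the durable connection behind a Mutex so it can be lazily created and replaced through &self. A sketch with invented placeholder types, not what this PR ended up doing:

    use std::io;
    use std::sync::Mutex;

    struct Pipe; // placeholder for a connected pipe handle

    struct PipeHookClient {
        connection: Mutex<Option<Pipe>>,
    }

    impl PipeHookClient {
        fn new() -> Self {
            PipeHookClient { connection: Mutex::new(None) }
        }

        // Note &self: the Mutex provides the mutability, so run() and the
        // hook traits would not need to change to &mut self.
        fn exec_hook(&self) -> io::Result<u32> {
            let mut guard = self.connection.lock().expect("connection lock poisoned");
            if guard.is_none() {
                // (Re)establish the connection lazily.
                *guard = Some(Pipe);
            }
            // Use the pipe; on an I/O failure, setting `*guard = None` would
            // force a reconnect on the next call.
            Ok(0)
        }
    }

    fn main() -> io::Result<()> {
        let client = PipeHookClient::new();
        println!("exit: {}", client.exec_hook()?);
        Ok(())
    }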

    fn pipe_ready(&self, poll: &Poll, readiness: Ready) -> Result<()> {
        let mut events = Events::with_capacity(1024);
        loop {
            poll.poll(&mut events, None)?;
@baumanj (Contributor):

I think we should add liveliness_checker::mark_thread_alive() here so we can track potential deadlocks. Here's one way to do it (note the change of return type):

    fn pipe_ready(&self, poll: &Poll, readiness: Ready) -> std::io::Result<bool> {
        let mut events = Events::with_capacity(1024);
        let loop_value = loop {
            let checked_thread = habitat_common::liveliness_checker::mark_thread_alive();
            let result = poll.poll(&mut events, None).map(|_|
                events.iter().any(|e| e.readiness().contains(readiness))
            );
            if let Ok(false) = result {
                continue;
            } else {
                break checked_thread.unregister(result);
            }
        };
        loop_value.into_result()
    }

@baumanj (Contributor):

Passing None to poll will block forever. It's hard to say before running this in the field whether there are instances where we'd rather time out, but adding an env_config_duration would allow tuning that without an additional release. I wonder if it ever makes sense to block longer than our intended delay between health check runs.
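
For illustration, a tunable timeout could be read from the environment, here with std::env directly rather than Habitat's env_config helpers, and an invented variable name:

    use std::env;
    use std::time::Duration;

    // e.g. HAB_PIPE_POLL_TIMEOUT_SECS=30; unset or unparsable means block
    // forever, matching the current behavior.
    fn poll_timeout() -> Option<Duration> {
        env::var("HAB_PIPE_POLL_TIMEOUT_SECS")
            .ok()
            .and_then(|s| s.parse::<u64>().ok())
            .map(Duration::from_secs)
    }

    fn main() {
        match poll_timeout() {
            Some(t) => println!("would pass Some({:?}) to poll.poll", t),
            None => println!("would pass None to poll.poll (block forever)"),
        }
    }

Passing the result as poll.poll's second argument would let operators bound the wait without shipping a new release.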

@mwrock (Contributor, Author):

Yeah, my concern with adding a timeout is that it effectively becomes a timeout on how long we allow the hook to run, which we do not currently enforce for any of our hooks, and I did not want to introduce a new concept of a hook timeout, especially not only for Windows. I think the worst that would happen here is that the health check would stall, but since health checks run in their own thread, that would not interfere with the supervisor or other services in general.

components/sup/src/manager/service/pipe_hook_client.rs (4 outdated threads, resolved)
Signed-off-by: mwrock <matt@mattwrock.com>
@baumanj baumanj (Contributor) left a comment

Looks good!

@mwrock mwrock merged commit 596de47 into master Sep 30, 2019
@chef-expeditor chef-expeditor bot deleted the named_pipe branch September 30, 2019 20:12
Successfully merging this pull request may close these issues:

Powershell startup spikes CPU

2 participants