
Cleanup lifecycle restarts and remove reload hook #6668

Merged
@davidMcneil merged 10 commits into master from dmcneil/lifecycle-hooks on Jul 17, 2019

Conversation

@davidMcneil (Contributor) commented Jun 18, 2019

Resolves #1838 #5305 #5306 #5307

The following issues must be addressed before merging:

  • The two TODOs in the code
  • Document the change
  • Update core plans to reflect the change to the reload hook

Signed-off-by: David McNeil mcneil.david2@gmail.com

@davidMcneil self-assigned this Jun 18, 2019
@baumanj (Contributor) left a comment

Just some maintainability suggestions you can take or leave. I don't have enough context here to fully approve, so I think that'll have to wait for @christophermaier.

@@ -1099,7 +1099,9 @@ impl Manager {
.expect("Services lock is poisoned!");
let idents_to_restart: Vec<_> = state_services.iter()
.filter_map(|(current_ident, service)| {
if let Some(new_ident) =
if service.needs_restart {

@baumanj (Contributor) commented Jun 18, 2019

The formatting here is confusing. It might be worth seeing if there's a rustfmt issue for this and, if not, filing one. In the meantime, I'd suggest fixing this manually and adding a #[rustfmt::skip] for this method.
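
For illustration, a minimal sketch of that approach (struct and method names hypothetical, not the PR's actual code):

    // Opt a single method out of rustfmt so a hand-formatted chain
    // survives future `cargo fmt` runs.
    struct Service {
        ident: String,
        needs_restart: bool,
    }

    struct Manager {
        services: Vec<Service>,
    }

    impl Manager {
        #[rustfmt::skip]
        fn idents_to_restart(&self) -> Vec<&str> {
            self.services.iter()
                         .filter(|s| s.needs_restart)
                         .map(|s| s.ident.as_str())
                         .collect()
        }
    }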

run: false,
post_run: false,
post_stop: false, }
}

@baumanj (Contributor) commented Jun 18, 2019

Why not Self::default()?
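
For reference, a hypothetical sketch of the suggestion, assuming the struct derives Default so every bool field starts out false (the type name here is invented):

    #[derive(Default)]
    struct HookChanges {
        run: bool,
        post_run: bool,
        post_stop: bool,
    }

    impl HookChanges {
        fn none() -> Self {
            // Equivalent to spelling out `run: false, post_run: false, ...`
            Self::default()
        }
    }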

|| self.run
|| self.post_run
|| self.post_stop
}

@baumanj (Contributor) commented Jun 18, 2019

In the event we add additional hooks, we could take a slightly different approach here to help ensure the appropriate modifications are made:

        match self {
            Self { health_check,
                   init,
                   file_updated,
                   reconfigure,
                   suitability,
                   run,
                   post_run,
                   post_stop, } => {
                health_check
                || init
                || file_updated
                || reconfigure
                || suitability
                || run
                || post_run
                || post_stop
            }
        }

Then if we left something out, we'd get a helpful message from the compiler. For example:

error[E0027]: pattern does not mention field `a_new_hook`
   --> components/sup/src/manager/service/hooks.rs:396:13
    |
396 | /             Self { health_check,
397 | |                    init,
398 | |                    file_updated,
399 | |                    reconfigure,
...   |
402 | |                    post_run,
403 | |                    post_stop, } => {
    | |_______________________________^ missing field `a_new_hook`

@baumanj (Contributor) commented Jun 18, 2019

Another alternative would be using a HashMap wrapper with an array of constant associated keys instead of a struct with separate fields. That would allow us to do something like self.0.values().any(…).
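
A rough sketch of that alternative (type and constructor names hypothetical; the key list mirrors the fields in the match suggestion above):

    use std::collections::HashMap;

    // Hook names as constant keys instead of separate struct fields.
    struct HookChanges(HashMap<&'static str, bool>);

    impl HookChanges {
        const HOOKS: [&'static str; 8] = ["health_check", "init", "file_updated",
                                          "reconfigure", "suitability", "run",
                                          "post_run", "post_stop"];

        fn new() -> Self {
            Self(Self::HOOKS.iter().map(|&hook| (hook, false)).collect())
        }

        // Adding a hook only means extending HOOKS; this method needs no edit.
        fn changed(&self) -> bool {
            self.0.values().any(|&changed| changed)
        }
    }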

@@ -673,7 +660,7 @@ mod tests {
////////////////////////////////////////////////////////////////////////

let hook_table = HookTable::load(&service_group, &template_path, &hooks_path);
assert_eq!(hook_table.compile(&service_group, &ctx), true);
assert_eq!(hook_table.compile(&service_group, &ctx).changed(), true);

@baumanj (Contributor) commented Jun 18, 2019

I know this was the existing code, but as long as we're modifying, I think

Suggested change
assert_eq!(hook_table.compile(&service_group, &ctx).changed(), true);
assert!(hook_table.compile(&service_group, &ctx).changed());

reads a bit clearer. Similarly elsewhere.

// Restart the service if:
// 1. the run or post-run hooks have changed
// 2. the process is down
// 3. there is no reconfigure hook and there was a config change

@baumanj (Contributor) commented Jun 18, 2019

This code is so legible, I feel like the comment is redundant. Since the risk of the two getting out of sync is relatively high, I'd suggest removing the comment.

@davidMcneil force-pushed the dmcneil/lifecycle-hooks branch 2 times, most recently from c27f0ef to 88676b0 Jun 20, 2019
@davidMcneil marked this pull request as ready for review Jun 21, 2019
@@ -455,14 +424,14 @@ impl Service {
self.validate_binds(census_ring);
}

let svc_updated = self.update_templates(census_ring);
let (svc_updated, hook_update_table, config_change) = self.update_templates(census_ring);

@christophermaier (Contributor) commented Jun 24, 2019

Wonder if we can consolidate hook_update_table and config_change into a single new type (not suggesting we just shove the config_change into the hook_update_table)... maybe this is what has the additional methods that drive the restart / reconfigure logic.
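
Sketching the idea (all names hypothetical; the restart rules mirror the code comment quoted earlier, minus the process-down check, which depends on runtime state rather than template output):

    struct HookUpdateTable {
        run: bool,
        post_run: bool,
        reconfigure: bool,
    }

    // One type carrying both pieces of information...
    struct TemplateUpdate {
        hooks: HookUpdateTable,
        config_changed: bool,
        have_reconfigure_hook: bool,
    }

    // ...with the restart/reconfigure decisions as methods on it.
    impl TemplateUpdate {
        fn needs_restart(&self) -> bool {
            self.hooks.run
                || self.hooks.post_run
                || (!self.have_reconfigure_hook && self.config_changed)
        }

        fn needs_reconfigure(&self) -> bool {
            !self.needs_restart()
                && self.have_reconfigure_hook
                && (self.config_changed || self.hooks.reconfigure)
        }
    }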

{
// TODO (DM): This flag is a hack. We have the `TaskExecutor` here. We could just
// schedule the `stop` future, but the `Manager` wraps the `stop` future with
// additional functionality. Can we refactor to make this flag unnecessary?

@christophermaier (Contributor) commented Jun 24, 2019

Wonder if we could pass a return value from here back up to the manager, so it could handle the queuing of the future up there? That would at least reduce extra state lying around like this.
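
A minimal sketch of that shape (hypothetical and heavily simplified; the real Manager would schedule the wrapped stop future at the marked spot):

    struct Service;

    impl Service {
        // Instead of setting `self.needs_restart` for the Manager to
        // discover later, report the outcome directly.
        fn execute_hooks(&mut self) -> bool {
            // ... run hooks; return true if a restart is required ...
            false
        }
    }

    struct Manager {
        services: Vec<Service>,
    }

    impl Manager {
        fn tick(&mut self) {
            for service in &mut self.services {
                if service.execute_hooks() {
                    // The Manager owns the `stop` future wrapping, so the
                    // queuing happens here, not inside `Service`.
                }
            }
        }
    }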

@davidMcneil (Author, Contributor) commented Jun 25, 2019

It is not ideal, but I am going to propose we leave this TODO for the time being. I am hopeful that as we continue refactoring the lifecycle hooks a better pattern will emerge. For example, if we end up with a Service future we could just spawn the future directly even without the TaskExecutor.

@christophermaier (Contributor) commented Jun 25, 2019

I'm cool with letting this sit for a while and seeing how things evolve. Using a return value was an alternative that sprang to my mind, but given the state of other code surrounding this, that might make things more complicated.

It would be good to add some comments on this field (in the struct declaration) stating what it's currently intended to do, and that we're ultimately looking to remove it, just so Future Us doesn't lose sight of this.
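
Something along these lines, perhaps (wording hypothetical, based on the TODO quoted above):

    struct Service {
        /// Set when a change requires a full restart. This flag is a
        /// stopgap: the `Manager` wraps the `stop` future with extra
        /// behavior, so `Service` cannot schedule the stop itself and
        /// instead leaves this marker for the `Manager` to act on. We
        /// ultimately want to remove it (see the TODO in the hook
        /// execution code).
        needs_restart: bool,
    }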

@@ -736,29 +704,14 @@ impl Service {
self.user_config_updated = false;
}

self.defaults_updated = false;

// TODO (DM): Do we need to return cfg_changed? What does it actually indicate?

@christophermaier (Contributor) commented Jun 24, 2019

cfg_changed indicates whether any of the data that would feed into the templating process itself has changed. It's probably a bad name 😦

At the end of the day, though, we ultimately care about whether those changes lead to anything changing in the configuration files or hook files that we render out to disk.

For instance, some new Supervisor may have appeared in the network (meaning that cfg_changed would be true), but that fact alone is probably totally irrelevant to the Redis service that you're running. On the other hand, if that new Supervisor happens to be running Redis too, then chances are good that your on-disk configuration and/or hooks will have changed in response. That is the thing that's important here.

@@ -736,29 +704,14 @@ impl Service {
self.user_config_updated = false;
}

self.defaults_updated = false;

@christophermaier (Contributor) commented Jun 24, 2019

Heh, I think I forgot to take this out... one of my previous refactorings should have rendered this obsolete 😅

Thanks for catching it!

// schedule the `stop` future, but the `Manager` wraps the `stop` future with
// additional functionality. Can we refactor to make this flag unnecessary?
self.needs_restart = true;
} else if config_change || hook_update_table.reconfigure {

@christophermaier (Contributor) commented Jun 24, 2019

I think this is correct, but the way the overall if/else conditional is structured makes it a little tricky to confirm without whipping up a truth table. Restructuring it so it's more obviously correct and complete would be good (that may involve some helper methods on the hook_update_table struct [mentioned earlier], some additional structure to let the compiler lend a hand here, etc.)
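
One hypothetical way to make the branching obviously complete is to match on a tuple of the conditions, so the truth table is written out explicitly and the compiler checks exhaustiveness:

    enum ServiceOperation {
        Restart,
        Reconfigure,
        NoOp,
    }

    fn next_operation(needs_restart: bool, needs_reconfigure: bool) -> ServiceOperation {
        match (needs_restart, needs_reconfigure) {
            (true, _) => ServiceOperation::Restart, // a restart always wins
            (false, true) => ServiceOperation::Reconfigure,
            (false, false) => ServiceOperation::NoOp,
        }
    }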

@davidMcneil force-pushed the dmcneil/lifecycle-hooks branch 2 times, most recently from 5d067d0 to 35b4c44 Jun 25, 2019
have_reconfigure_hook }
}

fn needs_restart(&self) -> bool {

@christophermaier (Contributor) commented Jun 26, 2019

Some documentation comments about why the logic is this way (on this as well as needs_reconfigure below) would be useful.

Having the reason why run and post_run are the "special" hooks clearly spelled out will also be a useful guide to Future Us as we add additional hooks.
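
For instance, something in this direction (the explanatory wording is a hypothetical reading of the restart rules discussed above, not the PR's actual text):

    struct HookUpdateTable {
        run: bool,
        post_run: bool,
    }

    impl HookUpdateTable {
        /// `run` and `post_run` are the hooks tied to the running
        /// process itself: new content for them can only take effect
        /// across a stop/start, which is why a change to either one
        /// forces a restart. Every other hook is simply re-read the
        /// next time it fires.
        fn needs_restart(&self) -> bool {
            self.run || self.post_run
        }
    }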

}
}
}
}
if svc_updated {
if template_update.changed() {

@christophermaier (Contributor) commented Jun 26, 2019

🤔 This may be too coarse, now... as coded here, this would retrigger health checks if, say, just the init hook were to change. Since that no longer will trigger a restart, there's no way it would affect the running service. You might be able to make an argument that health checks should be restarted if just the health check hook were to change, though.

Digging a bit further, it seems that svc_updated was actually only indicating if any of the source data for templating had changed, and not if the service was actually updated (reconfigured, restarted, actually changed in any way), as advertised. This ultimately bubbles up to cause an updated service rumor to be sent out, which is a bit surprising; why should we send out an updated service rumor if nothing has affected the running of the service at all?

(So, template_update.changed() is, strictly speaking, changing this behavior, but now I'm not convinced the previous behavior was correct, either 😱 )

@davidMcneil (Author, Contributor) commented Jun 26, 2019

Interesting. I had similar thoughts about the health check, but was confused by the comments in that if block. They seem almost contradictory: on one hand, we want to run the health check right away after any change to immediately detect if there was a problem; on the other hand, we want to wait long enough to let the service get up to speed. One option would be to remove the restart completely? That way we are not trying to guess how long is long enough.

As for the return value, it sounds like we may want to change the behaviour even further and only return true if the service was restarted or reconfigured (as opposed to any change to hooks or config). Is there a good way to test whether this breaks things?

@christophermaier (Contributor) commented Jun 26, 2019

I think health checks may just naturally restart once a service restarts anyway (if they don't, then that seems like a more natural place for this behavior, rather than dealing with extra special cases... though we would still want to take reconfigures into account).

As for changing the return value... yeah, I agree that we should change it. Testing will likely involve resurrecting our Kubernetes testing cluster, which is slated to be added into our testing pipelines at some point in the near future. (pinging @scotthain)

@scotthain (Contributor) commented Jul 2, 2019

@christophermaier We should be able to set up an e2e test with the kube cluster pretty easily once we get that pipeline hooked up. I've never used/seen the old kube cluster tho so I have very little context on its actual operation.

A reviewer (Contributor) left a comment

I like the explicit returns.

@davidMcneil force-pushed the dmcneil/lifecycle-hooks branch from 845e668 to bdde83b Jul 2, 2019
A reviewer (Contributor) left a comment

This is a nice improvement!

@@ -921,12 +902,17 @@ impl Service {
win_perm::harden_path(path.as_ref())
}

fn execute_hooks(&mut self, launcher: &LauncherCli, executor: &TaskExecutor) {
/// Returns `true` if the service was restarted or reconfigured.

@christophermaier (Contributor) commented Jul 5, 2019

This is actually true if the service is marked for restarting, or if the reconfigure hook ran (currently synchronously), correct?

@davidMcneil (Author, Contributor) commented Jul 5, 2019

Yes, that is correct. I will update the comment to be clearer.

@davidMcneil force-pushed the dmcneil/lifecycle-hooks branch 2 times, most recently from 5bc1ec8 to a856cbe Jul 7, 2019
davidMcneil added 10 commits Jun 10, 2019
Resolves #5305 #5306 #5307

Signed-off-by: David McNeil <mcneil.david2@gmail.com>

PR feedback

Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
The reload hook has been deprecated, but add it back
in for now. Run it immediately following the reconfigure
hook. In a future release, we will completely remove
this hook.

Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
@davidMcneil force-pushed the dmcneil/lifecycle-hooks branch from a856cbe to ff47367 Jul 17, 2019
@davidMcneil merged commit 553502a into master Jul 17, 2019
5 checks passed

  • DCO: This commit has a DCO Signed-off-by
  • buildkite/habitat-sh-habitat-master-verify: Build #2693 passed (50 minutes, 14 seconds)
  • buildkite/habitat-sh-habitat-master-website: Build #3354 passed (43 seconds)
  • continuous-integration/travis-ci/pr: The Travis CI build passed
  • expeditor/config-validation: Validated your Expeditor config file
@davidMcneil deleted the dmcneil/lifecycle-hooks branch Jul 17, 2019
chef-ci added a commit that referenced this pull request Jul 17, 2019
Obvious fix; these changes are the result of automation not creative thinking.
@davidMcneil referenced this pull request Aug 29, 2019
christophermaier added a commit that referenced this pull request Sep 25, 2019
In #6668 some documentation changes were added to the hook reference
partial.

It appears #6925 left the partial behind, while moving its content
into the main reference page. The content was missing the changes from
PR #6668, however; I suspect rebase issues.

This commit removes the now-redundant partial, and restores the
missing documentation.

Signed-off-by: Christopher Maier <cmaier@chef.io>