API change: add maxInitialFailureCount to health probes #71449

Open

matthyx wants to merge 1 commit into master from matthyx:maxInitialFailureCount

Conversation

@matthyx
Contributor

matthyx commented Nov 27, 2018

What type of PR is this?
/kind api-change

What this PR does / why we need it
Adds a numerical option maxInitialFailureCount to probes to allow a greater number of failures during the initial start of the pod before taking action.

Which issue(s) this PR fixes
Fixes #27114

Special notes for your reviewer
It's been a long time since I last contributed to k/k; I don't know where to document this change, so please point me to the correct page.

Does this PR introduce a user-facing change?
Yes.

Adds a numerical option `maxInitialFailureCount` to probes to allow a greater number of failures during the initial start of the pod before taking action.
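
For illustration, a minimal sketch of what setting the proposed field could look like on the internal `core.Probe` type; `MaxInitialFailureCount` exists only on this PR's branch, and the period and counts are made up:

```go
package example

import (
	"k8s.io/apimachinery/pkg/util/intstr"
	core "k8s.io/kubernetes/pkg/apis/core"
)

// livenessProbe shows the proposed field next to the existing ones.
// MaxInitialFailureCount is only defined on this PR's branch; every other
// field is part of the current API, and the numbers are illustrative.
var livenessProbe = core.Probe{
	Handler: core.Handler{
		HTTPGet: &core.HTTPGetAction{Path: "/healthz", Port: intstr.FromInt(8080)},
	},
	PeriodSeconds:          10,
	FailureThreshold:       3,  // steady state: act after 3 consecutive failures
	MaxInitialFailureCount: 60, // proposed: tolerate up to 60 failures (10 minutes here) before the first success
}
```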
@matthyx

Contributor Author

matthyx commented Nov 27, 2018

/assign @lavalamp

@matthyx

Contributor Author

matthyx commented Nov 28, 2018

@lavalamp the e2e-containerized test failure was a flake... you can review the PR.

@@ -218,6 +221,11 @@ func (w *worker) doProbe() (keepGoing bool) {
return true
}

// Record the end of the initialization phase.
if result == results.Success && !w.hasInitialized {


@lavalamp

lavalamp Nov 28, 2018

Member

We need to get someone from SIG Node to weigh in on this.

  • Right now, this code might randomly allow MaxInitialFailureCount failures after the pod has become alive, if kubelet happens to restart at the same time the pod goes unhealthy. Is that OK from a UX perspective? From a Kubelet perspective? (i.e., do other things already behave like that?)


@yujuhong

yujuhong Feb 13, 2019

Member

Normally, kubelet handles container/pod lifecycle-related functions by relying on the underlying container runtime to persist the state. For probes and/or lifecycle hooks, kubelet relies on in-memory state only.

This is consistent with the existing probe behavior, but the existing behavior does not care about pod lifecycle much. If the MaxInitialFailureCount is set very high, it can potentially make the container downtime much longer and will need to be used with caution (documentation?).

By the way, just to make it clear, even though we use the term "pod" in the discussion threads, the changes pertain to "containers" and "container starts".


@matthyx

matthyx Feb 13, 2019

Author Contributor

@yujuhong maybe we could persist the hasInitialized state somewhere inside the container itself? Why not a file that we create somewhere?
But it would do an exec every time we need to check that state...


@matthyx

matthyx Feb 13, 2019

Author Contributor

If the MaxInitialFailureCount is set very high, it can potentially make the container downtime much longer and will need to be used with caution (documentation?).

Indeed, but people in need of that feature would normally use a very high FailureThreshold, so it wouldn't be worse in that case...
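
For readers following along, a minimal sketch of the selection logic this thread is discussing; the names follow the hunk above and this PR's field, while the wiring into the real worker is simplified:

```go
package example

// effectiveFailureThreshold sketches which failure budget applies at a given
// moment. hasInitialized flips to true on the first successful probe (as in
// the hunk above); field names follow this PR.
func effectiveFailureThreshold(hasInitialized bool, failureThreshold, maxInitialFailureCount int32) int32 {
	if !hasInitialized && maxInitialFailureCount > 0 {
		// The container has never passed a probe yet, so the (typically
		// larger) initial budget applies before kubelet takes action.
		return maxInitialFailureCount
	}
	// After the first success, the normal steady-state threshold applies.
	return failureThreshold
}
```

Once `hasInitialized` is set, steady-state behaviour is unchanged from today's FailureThreshold handling.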

@@ -1977,6 +1977,10 @@ type Probe struct {
// Defaults to 3. Minimum value is 1.
// +optional
FailureThreshold int32 `json:"failureThreshold,omitempty" protobuf:"varint,6,opt,name=failureThreshold"`
// Maximum number of failures allowed for the probe to be considered failed before having succeeded once.


@lavalamp

lavalamp Nov 28, 2018

Member

We need to make it clearer that this is only while waiting for the pod to start up. We need to document the behavior when kubelet restarts (at a minimum); I was thinking it was OK to restart the count in that case but with fresh eyes it's not sounding like a good idea to me any more. :/

I'll suggest some wording in a bit.

We also need to document and/or enforce compatibility/version skew guarantees.

@thockin
Member

thockin left a comment

The main complexity of this overall idea is the state-saving, which I don't think is handled correctly here. Node folks will know best.

@@ -1874,6 +1874,10 @@ type Probe struct {
// Minimum consecutive failures for the probe to be considered failed after having succeeded.
// +optional
FailureThreshold int32
// Maximum number of failures allowed for the probe to be considered failed before having succeeded once.


@thockin

thockin Nov 28, 2018

Member

This sentence doesn't quite parse for me. Maybe "Maximum number of initial failures (before having succeeded once) allowed before being considered failed." or something?

We need to add some better documentation for this struct, I think, including a timeline example or 2 or 3 showing how the fields interact.

This is a smaller change than adding a whole new "initialized probe" or "startup probe", but I am a little on the fence about the conceptual complexity of it. Adding a new instance of Probe seems like it might be overall simpler to comprehend and document. @lavalamp or @liggitt thoughts?

If we're going to impose the requirement that kubelet carry this state at all (@dchen1107 @yujuhong @derekwaynecarr), it's the same burden in either mode.

Another alternative would be to add a timeout to PostStart handlers, and use that instead? E.g. Do whatever you need to do internal to your pod as PostStart. When you return from that, probes will begin. If the timeout is hit, it's the same as PostStart having failed. Presumably any state that Kubelet needs to track that is already implemented?
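
A hedged sketch of that alternative, using only lifecycle fields that exist today; the timeout knob is hypothetical and therefore only shown as a comment:

```go
package example

import core "k8s.io/kubernetes/pkg/apis/core"

// containerLifecycle gates probing on PostStart completion instead of a new
// counter. Only fields that exist in the current API are set.
var containerLifecycle = &core.Lifecycle{
	PostStart: &core.Handler{
		Exec: &core.ExecAction{
			// Block until the application reports that it finished initializing.
			Command: []string{"/bin/sh", "-c", "until wget -qO- http://localhost:8080/healthz; do sleep 1; done"},
		},
		// TimeoutSeconds: 300, // hypothetical field: hitting it would count as a PostStart failure
	},
}
```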

Regardless of which path we take, we need to document VERY clearly what is guaranteed and what is best-effort. E.g. If kubelet crashes, can it ever re-trigger the thing (whatever we decide is the right approach) or is it guaranteed never to re-trigger? I.e. could a user do something internally stateful there and rely on it not getting borked?


@liggitt

liggitt Nov 28, 2018

Member

We need to add some better documentation for this struct, I think, including a timeline example or 2 or 3 showing how the fields interact.

Agree

If we're going to impose the requirement that kubelet carry this state at all (@dchen1107 @yujuhong @derekwaynecarr), it's the same burden in either mode.

FailureThreshold is already stateful and resets if the kubelet restarts... would MaxInitialFailureCount behave the same way (e.g. "initial" is from the probe's perspective, not the container's)?


@yujuhong

yujuhong Feb 13, 2019

Member

FailureThreshold is already stateful and resets if the kubelet restarts... would MaxInitialFailureCount behave the same way (e.g. "initial" is from the probe's perspective, not the container's)?

+1. What's the expected behavior? If the state needs to be persisted over kubelet restarts, it'll need to be handled very differently from the existing probes.

Another alternative would be to add a timeout to PostStart handlers, and use that instead? E.g. Do whatever you need to do internal to your pod as PostStart. When you return from that, probes will begin. If the timeout is hit, it's the same as PostStart having failed. Presumably any state that Kubelet needs to track that is already implemented?

IIRC, kubelet doesn't save any state for running/completing poststart hooks. If we go this route, we'd still have to persist some state, and we'll have to teach the probers to wait until the hook is completed. Right now, the probers may start in parallel (i.e., triggered by running containers).

@@ -218,6 +221,11 @@ func (w *worker) doProbe() (keepGoing bool) {
return true
}

// Record the end of the initialization phase.
if result == results.Success && !w.hasInitialized {
w.hasInitialized = true


@thockin

thockin Nov 28, 2018

Member

if kubelet crashes and restarts, what happens? Do we suddenly allow some (potentially large) number of failures again?


@yujuhong

yujuhong Feb 13, 2019

Member

Yes, based on the current implementation.
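
To make the failure mode concrete, a minimal sketch of the in-memory bookkeeping behind that answer; the names follow this PR, and the real worker struct has many more fields:

```go
package example

// worker sketches the prober worker's in-memory state. Because nothing is
// persisted, a kubelet restart rebuilds every worker with the zero values
// below, so an already-healthy container gets the initial failure budget again.
type worker struct {
	resultRun      int  // length of the current run of identical results
	hasInitialized bool // true once the probe has succeeded at least once
}

func newWorker() *worker {
	// Fresh worker after a kubelet restart: resultRun == 0 and
	// hasInitialized == false, regardless of the container's history.
	return &worker{}
}
```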

@thockin

Member

thockin commented Nov 28, 2018

@matthyx

Contributor Author

matthyx commented Nov 28, 2018

Then should we persist all these states in etcd?
I mean basically all of these:

  • lastResult
  • resultRun
  • onHold
  • hasInitialized

The idea is that all these states matter for the probing logic, and a kubelet restart makes it amnesic about them...
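
Purely to illustrate what persisting these fields might mean, a hypothetical checkpoint sketch; nothing like this exists in kubelet today, and the path handling is left out:

```go
package example

import (
	"encoding/json"
	"os"
)

// probeCheckpoint is a hypothetical on-disk record of the worker fields
// listed above; it only illustrates what "persisting these states" could mean.
type probeCheckpoint struct {
	LastResult     string `json:"lastResult"`
	ResultRun      int    `json:"resultRun"`
	OnHold         bool   `json:"onHold"`
	HasInitialized bool   `json:"hasInitialized"`
}

// save writes the checkpoint to a file; the path would live somewhere under
// the kubelet's data directory.
func (c *probeCheckpoint) save(path string) error {
	data, err := json.Marshal(c)
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o600)
}
```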

@thockin

Member

thockin commented Nov 29, 2018

@matthyx

Contributor Author

matthyx commented Nov 30, 2018

Any thoughts about storing these states (@dchen1107 @yujuhong @derekwaynecarr)?

@thockin

Member

thockin commented Nov 30, 2018

@matthyx

Contributor Author

matthyx commented Nov 30, 2018

Oops, sorry, I didn't want to sound impatient or rude.
Happy Kubecon Seattle!
See you hopefully in Barcelona next year (if I manage to convince my boss).

@thockin

Member

thockin commented Nov 30, 2018

@matthyx

Contributor Author

matthyx commented Dec 18, 2018

@thockin @dchen1107 @yujuhong @derekwaynecarr, Kubecon is finished.
Can I get your thoughts about storing kubelet states in etcd (or somewhere else) to survive kubelet restarts?

@matthyx

Contributor Author

matthyx commented Jan 3, 2019

Maybe a stateful kubelet could be the subject of a KEP, since this problem isn't unique to this PR.
Can we move forward on this PR and discuss state backup elsewhere?

@matthyx matthyx force-pushed the matthyx:maxInitialFailureCount branch from 1bc0300 to a706e25 Jan 23, 2019

@k8s-ci-robot

Contributor

k8s-ci-robot commented Jan 23, 2019

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: matthyx
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: lavalamp

If they are not already assigned, you can assign the PR to them by writing /assign @lavalamp in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@matthyx matthyx force-pushed the matthyx:maxInitialFailureCount branch from a706e25 to 8b09ddb Feb 11, 2019

@k8s-ci-robot

Contributor

k8s-ci-robot commented Feb 11, 2019

@matthyx: The following tests failed, say /retest to rerun them all:

Test name | Commit | Details | Rerun command
pull-kubernetes-local-e2e-containerized | 8b09ddb | link | /test pull-kubernetes-local-e2e-containerized
pull-kubernetes-bazel-test | 8b09ddb | link | /test pull-kubernetes-bazel-test

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.


@@ -1982,6 +1982,10 @@ type Probe struct {
// Defaults to 3. Minimum value is 1.
// +optional
FailureThreshold int32 `json:"failureThreshold,omitempty" protobuf:"varint,6,opt,name=failureThreshold"`
// Maximum number of failures allowed for the probe to be considered failed before having succeeded once.
// No default value. Minimum value is 1.


@yujuhong

yujuhong Feb 13, 2019

Member

Isn't the default value 0 in this case? That's already below the stated minimum value.

Also, there is no validation for the field.
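
A sketch of the missing validation, in the style of pkg/apis/core/validation; the field name comes from this PR, and treating 0 as "unset" is an assumption that would resolve the default-versus-minimum question:

```go
package example

import (
	"k8s.io/apimachinery/pkg/util/validation/field"
	core "k8s.io/kubernetes/pkg/apis/core"
)

// validateMaxInitialFailureCount sketches validation for the proposed field.
// Negative values are rejected; 0 is treated as "feature not set".
func validateMaxInitialFailureCount(probe *core.Probe, fldPath *field.Path) field.ErrorList {
	allErrs := field.ErrorList{}
	if probe.MaxInitialFailureCount < 0 {
		allErrs = append(allErrs, field.Invalid(fldPath.Child("maxInitialFailureCount"),
			probe.MaxInitialFailureCount, "must be non-negative"))
	}
	return allErrs
}
```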


@yujuhong

Member

yujuhong commented Feb 13, 2019

Storing it in etcd (e.g. in the pod status) is possible, but a little awkward. I want to hear from the node team on this.

I don't think we should rely on etcd to persist the state. Kubelet reports pod/container status, but does not rely on the apiserver to perform health checks today. Changes like this may increase container downtime when the apiserver is temporarily unreachable.

@yujuhong

Member

yujuhong commented Feb 13, 2019

Also, probably an unpopular opinion, but the 1.14 release process requires a KEP for features like this. I think the KEP deadline has passed, so this'd most likely miss the 1.14 release.
You're also welcome to add this in the sig-node weekly meeting agenda to get more attention: https://github.com/kubernetes/community/tree/master/sig-node

@matthyx

Contributor Author

matthyx commented Feb 13, 2019

this'd most likely miss the 1.14 release

It's OK, we've also missed the 1.13 deadline... at least it's moving a bit now, thanks for taking the time to give your point of view.

So you suggest I add this point to the sig-node meeting, would I have to participate to defend this idea?

@yujuhong

Member

yujuhong commented Feb 14, 2019

So you suggest I add this point to the sig-node meeting, would I have to participate to defend this idea?

I wouldn't call it "defend", but you'll need to attend the meeting to explain what the proposal is about, and participate in the discussion.

@k8s-ci-robot

Contributor

k8s-ci-robot commented Feb 17, 2019

@matthyx: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@matthyx

Contributor Author

matthyx commented Feb 18, 2019

I wouldn't call it "defend", but you'll need to attend the meeting to explain what the proposal is about, and participate in the discussion.

Thanks for your help, I will shoot for the SIG meeting next week. In the meantime, I will try to sketch out the provisional KEP.
