fix stateful set pod recreation and event spam #123809
Conversation
// Note that pods with phase Succeeded will also trigger this event. This is
// because final pod phase of evicted or otherwise forcibly stopped pods
// (e.g. terminated on node reboot) is determined by the exit code of the
// container, not by the reason for pod termination. We should restart the pod
// regardless of the exit code.
if isFailed(replicas[i]) || isSucceeded(replicas[i]) {
	if isFailed(replicas[i]) {
		ssc.recorder.Eventf(set, v1.EventTypeWarning, "RecreatingFailedPod",
I do not see a use case behind this event. It is here only for tracking the terminal phases.
Without this event we will still get the following events:
default 5s Normal Killing pod/nginx-roll-2 Stopping container nginx
default 4s Normal SuccessfulCreate statefulset/nginx-roll create Pod nginx-roll-2 in StatefulSet nginx-roll successful
I suppose we could move the event generation into the updatePod handler if we want to be backwards compatible. Thoughts?
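A rough sketch of what that could look like, purely as illustration (the event reason and message below are made up, and this assumes updatePod has already resolved the owning set, as in the diff further down):

// hypothetical sketch only: keep emitting an event for consumers that relied on it,
// but do it from the updatePod handler when the pod transitions into a terminal phase
if oldPod.Status.Phase != curPod.Status.Phase && podutil.IsPodPhaseTerminal(curPod.Status.Phase) {
	ssc.recorder.Eventf(set, v1.EventTypeWarning, "RecreatingTerminatedPod",
		"StatefulSet %s/%s will recreate Pod %s which reached phase %s",
		set.Namespace, set.Name, curPod.Name, curPod.Status.Phase)
}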
Moving it, or changing it to logging here, seems acceptable.
@@ -235,6 +235,9 @@ func (ssc *StatefulSetController) updatePod(logger klog.Logger, old, cur interfa
 		return
 	}
 	logger.V(4).Info("Pod objectMeta updated", "pod", klog.KObj(curPod), "oldObjectMeta", oldPod.ObjectMeta, "newObjectMeta", curPod.ObjectMeta)
+	if oldPod.Status.Phase != curPod.Status.Phase && podutil.IsPodPhaseTerminal(curPod.Status.Phase) {
+		logger.V(4).Info("StatefulSet Pod reached terminal phase", "pod", klog.KObj(curPod), "statefulSet", klog.KObj(set), "podPhase", curPod.Status.Phase)
I am adding this logging in case somebody would like to debug these transitions.
Nit, why not just add that pod phase to the above line?
yes, that might be even better for debugging
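For illustration, folding the phase into the existing line could look roughly like this (sketch only; the extra key names are assumptions):

logger.V(4).Info("Pod objectMeta updated",
	"pod", klog.KObj(curPod),
	"oldObjectMeta", oldPod.ObjectMeta, "newObjectMeta", curPod.ObjectMeta,
	"oldPodPhase", oldPod.Status.Phase, "newPodPhase", curPod.Status.Phase)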
updateRevision.Name,
replicaOrd)
// New pod should be generated on the next sync after the current pod is removed from etcd.
return true, nil
We should not try to create the new pod until the old one has been removed from etcd. Otherwise the kcm logs get spammed with:
stateful_set.go:430] error syncing StatefulSet default/nginx-roll, requeuing: object is being deleted: pods "nginx-roll-2" already exists, the server was not able to generate a unique name for the object
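Paraphrased, the intended control flow in the sync path is roughly the following (a sketch, not the verbatim diff; DeleteStatefulPod is the existing stateful pod control helper):

// sketch: when a replica is in a terminal phase, only delete it in this sync;
// the replacement is created on a later sync, once the pod is gone from etcd,
// so the create does not race with the pending deletion
if isFailed(replicas[i]) || isSucceeded(replicas[i]) {
	if replicas[i].DeletionTimestamp == nil {
		if err := ssc.podControl.DeleteStatefulPod(set, replicas[i]); err != nil {
			return true, err
		}
	}
	// New pod should be generated on the next sync after the current pod is removed from etcd.
	return true, nil
}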
/triage accepted
// - try at least 3 times if there are 0 replicas
// - assume that each replica can potentially generate 3 conflicts when statefulset status is updated
//   e.g. sudden pod changes between (Running, Ready, Available, Terminating, Deleted)
E2Es were failing without this fix: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/123809/pull-kubernetes-conformance-kind-ga-only-parallel/1766049762357809152
Kubernetes e2e suite: [It] [sig-apps] StatefulSet Basic StatefulSet functionality [StatefulSetBasic] Scaling should happen in predictable order and halt if any stateful pod is unhealthy [Slow] [Conformance] 1m56s
{ failed [FAILED] too many retries draining statefulset "ss"
In [It] at: k8s.io/kubernetes/test/e2e/framework/statefulset/rest.go:265 @ 03/08/24 10:57:02.069
}
I am not sure if this is supposed to test conflicts. If not, this could be changed to a timeout-based update.
To me this looks like a helper written before the backoff was introduced. How about rewriting it with wait.ExponentialBackoff, with an initial duration of 2s, factor 1.5, and 2 steps? wdyt?
+1, RetryOnConflict might be even better for this. Also, I based the backoff on DefaultRetry and increased the number of steps a bit.
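For illustration, a minimal sketch of that approach for an e2e-style scale helper (the package, function name, and exact step count are assumptions, not the code in this PR):

package e2estatefulset // hypothetical package name

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clientset "k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

// updateReplicas updates .spec.replicas of a StatefulSet, retrying on update
// conflicts with a backoff derived from retry.DefaultRetry.
func updateReplicas(ctx context.Context, c clientset.Interface, ns, name string, replicas int32) error {
	backoff := retry.DefaultRetry
	backoff.Steps = 10 // a few more steps than the default of 5; the exact value is illustrative

	return retry.RetryOnConflict(backoff, func() error {
		// re-read the object on every attempt so the update uses the latest resourceVersion
		ss, err := c.AppsV1().StatefulSets(ns).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		ss.Spec.Replicas = &replicas
		_, err = c.AppsV1().StatefulSets(ns).Update(ctx, ss, metav1.UpdateOptions{})
		return err
	})
}

RetryOnConflict only retries when the error is a conflict, so unrelated failures still surface immediately instead of being retried.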
Some minor nits, but overall this looks reasonable.
- do not emit events when pod reaches terminal phase
- do not try to recreate pod until the old pod has been removed from etcd storage
statefulset controller makes fewer requests per sync now and thus can reconcile status faster, resulting in a higher chance of conflicts
/lgtm
/approve
/label tide/merge-method-squash
LGTM label has been added. Git tree hash: 7aa2668daacc5ea3b06841104fe670296e1a2ed9
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: atiratree, soltysh
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
unrelated
What type of PR is this?
/kind bug
What this PR does / why we need it:
- do not emit events when pod reaches terminal phase
- do not try to recreate pod until the old pod has been removed from etcd storage
Which issue(s) this PR fixes:
Fixes #122709
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: