New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
set lastterminationstate for container status even when CRI fails to return termination (or any) data #95364
Conversation
/triage accepted |
/kind bug |
/kind bug |
/priority important-soon |
/assign |
pkg/kubelet/kubelet_pods.go
Outdated
status.LastTerminationState.Terminated = &v1.ContainerStateTerminated{ | ||
Reason: "ContainerStatusUnknown", | ||
Message: "The container could not be located when the pod was deleted. The container used to be Running", | ||
ExitCode: 138, // one more than 137 for the other case of missing containers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this exit code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exit codes have meaning, i dont understand why 137 isnt used like https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/status/status_manager.go#L334
pkg/kubelet/kubelet_pods.go
Outdated
status.LastTerminationState.Terminated = &v1.ContainerStateTerminated{ | ||
Reason: "ContainerStatusUnknown", | ||
Message: "The container could not be located when the pod was deleted. The container used to be Running", | ||
ExitCode: 138, // one more than 137 for the other case of missing containers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exit codes have meaning, i dont understand why 137 isnt used like https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/status/status_manager.go#L334
/assign |
…return termination (or any) data
This took a little bit to review, but thank you for the clear comment thread. /approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: deads2k, derekwaynecarr The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
If no container is found when determining container status, then assuming it should be waiting seems plausible, but the pod phase status code requires that a previous termination be present. If we're offline long enough (or something removed the container?), then the previous termination may not be present. This next code block ensures that if the container was previously running then when that container status disappears, we can infer that it terminated even if we don't know the status code. By setting the lasttermination state we are able to leave the container status waiting and present more accurate data via the API.
This only impacts the path when the pod itself is being deleted to narrow the scope of change.