-
Notifications
You must be signed in to change notification settings - Fork 39.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Container exit message lost due to FallbackToLogsOnError is not compatible with ContainerCannotRun #81280
Conversation
Hi @yqwang-ms. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
…ompatible with ContainerCannotRun
d450e35
to
f82be3d
Compare
/assign @dchen1107 |
/assign @sjenning Could you please take a look at this small fix when you free? :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your detailed pr - especially the detailed reproduction use case :)
This change makes sense to me :)
/lgtm
Could I trouble you to update your release note? I don't find ContainerCannotRun should not FallbackToLogsOnError
to be particularly descriptive of the change you're making. What about something like If container fails because ContainerCannotRun, do not utilize the FallbackToLogsOnError TerminationMessagePolicy, as it masks more useful logs.
if len(tMessage) != 0 { | ||
cStatus.Message = tMessage | ||
if len(cStatus.Message) != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain a little more about this change? I'm not totally following how it relates to the overall change you are describing in your pr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your review.
You are right, strictly speaking, it should not belong to this PR, however, in more general sense, it can prevent future useful info being lost.
For example, kubelet failed to read the termination file which may due to a reason described in dockerd's container error message.
So the principle here is that, discard exit info ONLY if we can ensure it is useless or duplicated.
For this case, we cannot ensure that dockerd's container error message is useless given "failed to read the termination file" or even we successfully read the file (because we also donot know if the dockerd's container error message is the root cause of the just read content).
So, for simplicity, I just append the error message, instead of one replace another.
It is just my preference, and TBD, we can disscus on it. :)
(Release note also updated)
/ok-to-test |
/retest |
@sjenning @derekwaynecarr Could you please take a look at this small fix? :) |
Pinging @sjenning @derekwaynecarr , PTAL |
/unassign @dchen1107 /assign @Random-Liu Because no responses from bot suggested approvers, could you please kindly take a look at this small fix when you free? :) |
Pinging @Random-Liu and @tallclair , PTAL |
if len(cStatus.Message) != 0 { | ||
cStatus.Message += ": " | ||
} | ||
cStatus.Message += tMessage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for cStatus.Message
to grow infinitely, or do we reset it each time this function is called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review.
I can confirm that cStatus.Message will NOT grow infinitely, given that getPodContainerStatuses
is not stateful for the cStatus.Message (and it is by design should not be stateful), so that cStatus.Message will NOT accumulated.
We can prove this in details, please check:
-
The background is that cStatus.Message will only be updated in 2 places:
Note the cStatus.Message is just
string
instead of*string
kubernetes/pkg/kubelet/container/runtime.go
Line 314 in f82be3d
Message string -
cStatus.Message in line 421 can only come from above
toKubeContainerStatus
in the samegetPodContainerStatuses
func:
cStatus := toKubeContainerStatus(status, m.runtimeName) -
In the
getPodContainerStatuses
, it will only init the cStatus.Message fromruntime container message
:
cStatus.Message = status.Message -
The
runtime container message
can only come directly from the underlay runtime:
message = r.State.Error -
The
runtime container message
will NOT from the cStatus.Message. So, the cStatus.Message will only be appended at most once by:
kubernetes/pkg/kubelet/kuberuntime/kuberuntime_container.go
Lines 421 to 426 in f82be3d
if len(tMessage) != 0 { if len(cStatus.Message) != 0 { cStatus.Message += ": " } cStatus.Message += tMessage } -
So the cStatus.Message will NOT grow infinitely.
Overall, the cStatus.Message is composed from 2 parts, first part is the runtime container message
, see steps 2-4, and second part is TerminationMessage from:
tMessage, checkLogs := getTerminationMessage(status, annotatedInfo.TerminationMessagePath, fallbackToLogs) |
So, no more chars in the cStatus.Message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kindly ping @tallclair , PTAL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any comments for this? :)
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: tallclair, yqwang-ms The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Fix Container exit message lost due to FallbackToLogsOnError is not compatible with ContainerCannotRun
What type of PR is this?
/kind bug
What this PR does / why we need it:
Bug: FallbackToLogsOnError is not compatiable with ContainerCannotRun.
If a container cannot even be started, such as due to docker run parameters incorrect (like below example, unknown linux capability), it certainly does not have logs, so in this case, we should not override the exit message got from dockerd (like below example, "unknown linux capability"), to be "Error on reading termination message from logs". Otherwise, the truely useful exit message will be lost.
Reproduce Steps:
Submit below Pod:
Pod failed without useful info:
We only have info "Error on reading termination message from logs"
If the FallbackToLogsOnError is removed, and submit the Pod again, then we can get useful info:
"Linux spec capabilities: Unknown capability to add"
Which issue(s) this PR fixes:
Fixes # NO See issue above.
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: