Second attempt: Plumb context to Kubelet CRI calls #113591
/sig node
/approve Let's try again! :)
cc @sallyom
Let's have a chat about this. #107829 dealt with the correct behavior for getting context into the Kubelet so that we could reduce the propagation latency of "pod termination requests" (which leads to measurable end-user outcomes). There are probably fixes in #107829 that address the issues involved, but I would like to review this before we put it in again.
It looks like this attempt doesn't try to fix all of the context propagation in the kubelet, is that correct? I took a maximalist approach in my change, but it didn't have to be that way.
My primary goal was to propagate context from SyncPod -> CRI calls for tracing purposes. See #113414. There are definitely plenty of other places in the kubelet where context would be helpful (Device plugins, CNI, probably others). During the process of plumbing SyncPod -> CRI, I had to add context to functions that were used elsewhere, and used existing contexts where possible.
After our discussion we agreed that for /hold cancel
Cluster didn't come up
/retest
@dashpole where are you seeing this?
/label tide/merge-method-squash
looks like most recent run of
/triage accepted
/assign @mrunalp
/approve
/approve
/approve let's try this again.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: dashpole, dims, klueska, mrunalp
The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
late to the party but yay! 🥳
* plumb context from CRI calls through kubelet
* clean up extra timeouts
* try fixing incorrectly cancelled context
Related to #113566.
Second attempt at #113408, which was reverted in #113548
What type of PR is this?
/kind feature
What this PR does / why we need it:
Plumb context to kubelet CRI calls.
This includes two additional commits on top of the original PR.
The "clean up extra timeouts" commit fixes a confusing place where we set the same timeout on a context twice, identified here: #113408 (comment).
The "try fixing incorrectly canceled context" commit ignores the incoming context from the pod worker. I believe this will fix the test failure.
Which issue(s) this PR fixes:
Part of #113414
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
cc @liggitt @dims @bobbypage @dchen1107 @aojea