DO NOT MERGE: Testing context cancellation and status prioritization together #1163
Conversation
If CRI returns a container that has been created but is not running, it is not safe to assume the container is terminal, because our connection to CRI may have failed mid-start. Instead, created is treated as waiting, as in "waiting for this container to start". Either syncPod or syncTerminatingPod is responsible for handling this state.
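The state mapping described above can be sketched in Go. The type and function names here are illustrative, not the actual kubelet or CRI API identifiers; the key point is that "created" maps to waiting rather than terminated:

```go
package main

import "fmt"

// ContainerState mirrors the coarse states a CRI runtime can report.
// These names are illustrative, not the real CRI API types.
type ContainerState int

const (
	StateCreated ContainerState = iota
	StateRunning
	StateExited
)

// toKubeletView maps a runtime state to the kubelet's view of the
// container. The key point from the commit: "created" is not terminal,
// since the connection to CRI may have failed mid-start, so it is
// reported as waiting and left for syncPod / syncTerminatingPod.
func toKubeletView(s ContainerState) string {
	switch s {
	case StateCreated:
		return "waiting" // waiting for this container to start
	case StateRunning:
		return "running"
	case StateExited:
		return "terminated"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(toKubeletView(StateCreated)) // waiting, not terminated
}
```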
In preparation for allowing `sync*Pod` methods to be cancelled when a pod transitions to terminating, these commits pass context to the Kubelet methods that may need to be cancelled within a deadline or due to user input. None of them changes the behavior of those functions:

- Change interface methods and stored structs for easier review.
- Propagate context from the core long-running methods (CRI, GC, streaming) up toward the top level. A method carrying context implies a remote CRI invocation, so context is propagated upward until it reaches a method that already carries one (such as HTTP servers, or `sync*Pod`, which will perform cancellation), a top-level wait loop, or a boundary with a subsystem that does not clearly warrant propagation. Top-level loops get `context.Background()`; the rest get `context.TODO()`. This commit contains all such transitions; subsequent PRs only propagate context.
- For the remote CRI service, remove the wrappers that injected a new context and call the direct context-accepting equivalents with a timeout.
- Propagate context upward wherever the parent method now passes context, or context was already present.
- Update test methods to pass contexts where signatures changed.
@smarterclayton: the contents of this pull request could not be automatically validated. The following commits could not be validated and must be approved by a top-level approver:
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: smarterclayton. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files. Approvers can indicate their approval by writing /approve in a comment.
None of the refactors touch it as it is deleted upstream
Force-pushed from a8fc39b to c5618c9.
Force-pushed from c5618c9 to 031d339.
Force-pushed from 031d339 to 6c7d7b3.
Track how long it takes for pod updates to propagate from detection to a successful change on the API server. This will guide future improvements in pod start and shutdown latency.
Streamline the pod status manager to track the set of updated pods instead of using a buffered channel. Reduce the time the pod status lock is held by moving other expensive checks out of the loop, which also opens the door to parallelizing the status queue later. Avoid making some checks twice now that syncPod is only called from syncBatch. Protect apiStatusVersions under the pod status lock as well to prevent accidents.
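The set-versus-channel change can be sketched as below. Unlike a buffered channel, enqueueing the same pod twice coalesces into one entry, and the set can be snapshotted under one short lock while the expensive API writes happen outside it. This is a sketch of the idea with invented names, not the kubelet's actual status manager:

```go
package main

import (
	"fmt"
	"sync"
)

// statusQueue tracks the set of pods with pending status updates.
// Duplicate enqueues of the same pod coalesce into a single entry,
// and drain clears the whole set under one short lock.
type statusQueue struct {
	mu      sync.Mutex
	pending map[string]struct{}
}

func newStatusQueue() *statusQueue {
	return &statusQueue{pending: map[string]struct{}{}}
}

func (q *statusQueue) enqueue(pod string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.pending[pod] = struct{}{}
}

// drain snapshots and clears the pending set. The expensive work (the
// actual API server writes) happens on the returned slice outside the
// lock, keeping lock hold time minimal.
func (q *statusQueue) drain() []string {
	q.mu.Lock()
	defer q.mu.Unlock()
	pods := make([]string, 0, len(q.pending))
	for p := range q.pending {
		pods = append(pods, p)
	}
	q.pending = map[string]struct{}{}
	return pods
}

func main() {
	q := newStatusQueue()
	q.enqueue("a")
	q.enqueue("a") // coalesced with the first enqueue
	q.enqueue("b")
	fmt.Println(len(q.drain())) // 2
}
```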
Some pod status transitions directly impact end-to-end user latency in the Kubelet, such as pods going ready, going unready, or becoming Succeeded or Failed. Prioritize the order in which pods are updated to minimize that latency.
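The prioritization idea can be sketched with a stable sort that moves user-visible transitions to the front of the update queue. The `podUpdate` type and `Urgent` flag are illustrative, not the kubelet's real data structures:

```go
package main

import (
	"fmt"
	"sort"
)

// podUpdate is a pending status change. Urgent marks transitions users
// observe directly (ready/unready, Succeeded, Failed); names are
// illustrative, not the kubelet's actual types.
type podUpdate struct {
	Pod    string
	Urgent bool
}

// prioritize orders urgent transitions first to minimize user-visible
// latency; sort.SliceStable preserves arrival order within each class.
func prioritize(updates []podUpdate) {
	sort.SliceStable(updates, func(i, j int) bool {
		return updates[i].Urgent && !updates[j].Urgent
	})
}

func main() {
	u := []podUpdate{{"a", false}, {"b", true}, {"c", false}, {"d", true}}
	prioritize(u)
	fmt.Println(u[0].Pod, u[1].Pod) // b d
}
```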
Force-pushed from 6c7d7b3 to 6bb6cb3.
@smarterclayton: The following tests failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
@smarterclayton: PR needs rebase.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale. If this issue is safe to close now please do so with /close. /lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle rotten. If this issue is safe to close now please do so with /close. /lifecycle rotten
Rotten issues close after 30d of inactivity. Reopen the issue by commenting /reopen. /close
@openshift-bot: Closed this PR. In response to this:
#1161 and #1162 should significantly reduce the perceived end-to-end latency of pod startup and shutdown.