
Pod status updates take longer to propagate to the API than necessary #116617

Open
smarterclayton opened this issue Mar 14, 2023 · 9 comments
Labels: kind/bug, needs-triage, sig/node, sig/scalability

Comments

@smarterclayton (Contributor) commented Mar 14, 2023

What happened?

#107897 originally identified a number of inefficiencies in how pod status is reported back to the API:

  1. Pods are queued in a channel and then processed one by one. This causes head-of-line blocking and writes every observed status change, even though pod status is level driven (split into #116615, "kubelet: Remove status manager channel", for inclusion into #115331, "Give terminal phase correctly to all pods that will not be restarted").
  2. Status is a GET followed by a PATCH, but we now have server-side apply (SSA); using it would reduce two round trips to one and reduce QPS (see the sketch just after this list).
  3. We treat all status updates the same, when in reality certain transitions should be prioritized (unready -> ready, Running -> Succeeded|Failed).
  4. Status writes are single threaded; we could easily run multiple workers.
  5. It's hard to know when our status cache has been brought up to date (so that we can bypass an SSA attempt); we could use ResourceVersion tracking for that (apimachinery has proposed allowing it to be parsed).
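
As a rough illustration of item 2, here is a minimal sketch of what a single-round-trip status write could look like using client-go's typed apply configurations and `ApplyStatus`. This is a hedged sketch under stated assumptions, not the kubelet's actual status manager code; the package name, function name, and the particular fields copied are illustrative.

```go
package statussketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	corev1apply "k8s.io/client-go/applyconfigurations/core/v1"
	"k8s.io/client-go/kubernetes"
)

// applyPodStatus writes pod status in a single request via server-side
// apply, replacing the GET-then-PATCH pair described above. Illustrative
// sketch only, not the kubelet's real status manager.
func applyPodStatus(ctx context.Context, client kubernetes.Interface, pod *corev1.Pod) error {
	status := corev1apply.PodStatus().WithPhase(pod.Status.Phase)
	for _, c := range pod.Status.Conditions {
		status = status.WithConditions(corev1apply.PodCondition().
			WithType(c.Type).
			WithStatus(c.Status).
			WithLastTransitionTime(c.LastTransitionTime))
	}
	apply := corev1apply.Pod(pod.Name, pod.Namespace).WithStatus(status)

	// One round trip: the server merges the fields owned by this field
	// manager; Force resolves conflicts in the kubelet's favor.
	_, err := client.CoreV1().Pods(pod.Namespace).ApplyStatus(ctx, apply,
		metav1.ApplyOptions{FieldManager: "kubelet", Force: true})
	return err
}
```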

In general these contribute to #113606 and should be improved.

/sig node
/sig scalability

What did you expect to happen?

Faster propagation of pod status updates to the API server.

How can we reproduce it (as minimally and precisely as possible)?

See kubelet logs. Several e2e tests stress this flow, but we need a test that shows the improvements.

Anything else we need to know?

No response

Kubernetes version

No response

Cloud provider

No response

OS version

No response

Install tools

No response

Container runtime (CRI) and version (if applicable)

No response

Related plugins (CNI, CSI, ...) and versions (if applicable)

No response

@smarterclayton added the kind/bug label Mar 14, 2023
@k8s-ci-robot added the needs-sig and needs-triage labels Mar 14, 2023
@smarterclayton changed the title from "Pod status updates take longer to propagate back to the API than necessary" to "Pod status updates take longer to propagate to the API than necessary" Mar 14, 2023
@sftim (Contributor) commented Mar 14, 2023

> We treat all status updates the same, when in reality certain transitions should be prioritized (unready -> ready, Running -> Succeeded|Failed)

One key actual-state report for me is seeing that the kubelet has at least observed the pod and, if you like, acknowledged its responsibility for further status updates. I think that's the update that sets status.startTime.
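
As a hedged sketch of that acknowledgement write, reusing the apply-configuration imports and the `client`/`pod` variables from the sketch earlier in this thread (all illustrative): a first SSA that claims only status.startTime might look like this.

```go
// Minimal "the kubelet has seen this pod" acknowledgement: apply only
// status.startTime and nothing else. Illustrative, not actual kubelet code.
ack := corev1apply.Pod(pod.Name, pod.Namespace).
	WithStatus(corev1apply.PodStatus().WithStartTime(metav1.Now()))
_, err := client.CoreV1().Pods(pod.Namespace).ApplyStatus(ctx, ack,
	metav1.ApplyOptions{FieldManager: "kubelet"})
```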

@smarterclayton (Contributor, Author) commented Mar 14, 2023

Right, that's another good one. Please add others. Note, however, that a pod status round trip may take 50ms, and some containers might start and become ready within that time. It would be good to think through how we want to bias toward these transitions, or whether workload characteristics matter.

@pacoxu (Member) commented Mar 15, 2023

/sig node
/sig scalability

@k8s-ci-robot added the sig/node and sig/scalability labels and removed the needs-sig label Mar 15, 2023
@SergeyKanzhelev added this to Triage in SIG Node Bugs Mar 15, 2023
@sftim (Contributor) commented Mar 16, 2023

With SSA, the kubelet can fire off multiple PATCH requests that (the kubelet believes) can arrive at the API server in an arbitrary order and still produce the right outcome. I mean, that's the dream, right?

@smarterclayton (Contributor, Author) commented

Right. The kubelet would simply need to guarantee that no two SSAs for the same pod are in flight at once.

@sftim (Contributor) commented Mar 16, 2023

> The kubelet would simply need to guarantee that no two SSAs for the same pod are in flight at once.

Not even that: no two conflicting SSAs. For example, I think we can send a request to apply status.startTime and, regardless of its outcome, follow that up with a request that applies status.startTime (to the same value!) and also updates status.conditions. The order in which these apply shouldn't matter.
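
A hedged sketch of that pair of requests, in the same illustrative apply-configuration style as the earlier sketches (reusing their imports and the `pod` variable); the point is only that the two writes commute.

```go
// Request 1: apply only status.startTime.
started := metav1.Now()
first := corev1apply.Pod(pod.Name, pod.Namespace).
	WithStatus(corev1apply.PodStatus().WithStartTime(started))

// Request 2: apply the same startTime value again, plus a condition.
// Both requests set startTime to the identical value, and only request 2
// touches conditions, so applying them in either order converges on the
// same object.
second := corev1apply.Pod(pod.Name, pod.Namespace).
	WithStatus(corev1apply.PodStatus().
		WithStartTime(started).
		WithConditions(corev1apply.PodCondition().
			WithType(corev1.PodReady).
			WithStatus(corev1.ConditionTrue)))

// first and second can then each be sent with ApplyStatus as in the
// earlier sketch, in either order.
_, _ = first, second
```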

@SergeyKanzhelev (Member) commented

/triage accepted

@k8s-ci-robot added the triage/accepted label and removed the needs-triage label Mar 22, 2023
@SergeyKanzhelev moved this from Triage to Triaged in SIG Node Bugs Mar 22, 2023
@smarterclayton (Contributor, Author) commented

Re: two in-flight SSAs for non-conflicting operations to the same pod

For our own sanity, we probably don't want to have to reason about that while debugging :)

@k8s-triage-robot commented

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot added the needs-triage label and removed the triage/accepted label Apr 16, 2024