Kubelet doesn't update static pod status #61717

Closed
JulienBalestra opened this issue Mar 26, 2018 · 14 comments · Fixed by #77661
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@JulienBalestra
Contributor

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

When using --pod-manifest-path, the kubelet creates static pods but doesn't update their status accordingly in the PodList it serves at /pods.
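
For context, a static pod is defined by a manifest file dropped into the --pod-manifest-path directory; a minimal example (name and image are illustrative) could be:

```json
{
  "apiVersion": "v1",
  "kind": "Pod",
  "metadata": {
    "name": "static-web"
  },
  "spec": {
    "containers": [
      {
        "name": "web",
        "image": "nginx"
      }
    ]
  }
}
```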

You can identify that kind of Pod by the following annotation:

"kubernetes.io/config.source": "file"

The status stays stuck like this:

{
  "phase": "Pending",
  "conditions": [
    {
      "type": "PodScheduled",
      "status": "True",
      "lastProbeTime": null,
      "lastTransitionTime": "2018-03-26T14:38:07Z"
    }
  ]
}

What you expected to happen:

The status is updated.
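
For illustration only (all values below are invented), a status reflecting the actual state of the pod would look something like:

```json
{
  "phase": "Running",
  "podIP": "10.0.0.5",
  "conditions": [
    {
      "type": "Ready",
      "status": "True",
      "lastProbeTime": null,
      "lastTransitionTime": "2018-03-26T14:38:42Z"
    }
  ],
  "containerStatuses": [
    {
      "name": "web",
      "ready": true,
      "restartCount": 0,
      "state": {
        "running": {
          "startedAt": "2018-03-26T14:38:40Z"
        }
      }
    }
  ]
}
```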

How to reproduce it (as minimally and precisely as possible):

Use a static pod.

Anything else we need to know?:

I tried to fix it in #57106 but the test-grid continuously failed at the reboot phase (see #59889).

I had to revert it (#59948, #59892).

I'm creating this issue to track this bug.

The problems in the e2e tests probably come from this function.

This is an old output of the failure: https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce-reboot/20693?log#log

@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. kind/bug Categorizes issue or PR as related to a bug. labels Mar 26, 2018
@yujuhong
Contributor

I don't think this is a bug. /pods only exposes what the kubelet sees from its sources, and doesn't include statuses known only to the kubelet. Since static pods' statuses are not written back to the source, they do not show up in /pods.

The fact that it has the PodScheduled condition is unintentional and a bug. It has just been fixed recently.

@smoya

smoya commented Mar 26, 2018

I can confirm this behaviour is still happening: pods have status Pending when they are actually Running. The Kubernetes API shows them as Running while the kubelet /pods endpoint shows them as Pending.

@dims
Member

dims commented Apr 2, 2018

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 2, 2018
@dashpole
Contributor

dashpole commented Apr 3, 2018

I agree with @yujuhong that this is working as intended.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 2, 2018
@JulienBalestra
Contributor Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 2, 2018
@JulienBalestra
Contributor Author

@dashpole @yujuhong I understand that the source of truth is currently the API server and that details can be retrieved there. We have, however, found this to be a scaling challenge.

In our use case, we are a monitoring application that runs as a DaemonSet. Monitoring systems such as ours need to continuously refresh the PodList because Kubernetes operators want real-time feedback on application health via metrics, logs, and traces.

Having each monitoring node query the Kubernetes apiserver for every Pod status puts high load on the API server, and this becomes increasingly difficult to scale as clusters grow to a large number of nodes.

We have, however, found that having each node query its own kubelet's /pods API locally is quite scalable and distributes this work across the cluster (see the sketch at the end of this comment). Ideally we would be able to use this API for static pods as well, since the state of static Pods is stored inside the kubelet.

Other benefits of using the kubelet API are that:
- It provides the link to the container runtime in the status section, which is helpful for metadata collection and association.
- It is easily managed by the RBAC system through webhooks.

If there is another approach or API that you’d suggest, we’re happy to rethink how we collect this data.
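
For concreteness, here is a minimal sketch of that node-local pattern. It assumes the kubelet's read-only port (10255) is enabled on the node (the authenticated port 10250 would need TLS and credentials), and the struct only decodes the fields the example needs:

```go
// A node-local agent polls its own kubelet's /pods endpoint
// instead of the apiserver.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// Only the fields this sketch needs from the kubelet's PodList response.
type podList struct {
	Items []struct {
		Metadata struct {
			Name        string            `json:"name"`
			Namespace   string            `json:"namespace"`
			Annotations map[string]string `json:"annotations"`
		} `json:"metadata"`
		Status struct {
			Phase string `json:"phase"`
			PodIP string `json:"podIP"`
		} `json:"status"`
	} `json:"items"`
}

func main() {
	resp, err := http.Get("http://127.0.0.1:10255/pods")
	if err != nil {
		log.Fatalf("kubelet query failed: %v", err)
	}
	defer resp.Body.Close()

	var pods podList
	if err := json.NewDecoder(resp.Body).Decode(&pods); err != nil {
		log.Fatalf("decoding PodList failed: %v", err)
	}
	for _, p := range pods.Items {
		src := p.Metadata.Annotations["kubernetes.io/config.source"]
		// For static pods (src == "file"), phase stays "Pending" and
		// podIP is empty -- the stale status this issue is about.
		fmt.Printf("%s/%s source=%s phase=%s podIP=%s\n",
			p.Metadata.Namespace, p.Metadata.Name, src, p.Status.Phase, p.Status.PodIP)
	}
}
```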

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 14, 2018
@irabinovitch

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 14, 2018
@vvchik

vvchik commented Nov 17, 2018

I am also hitting this issue.

@relistan

relistan commented Jan 5, 2019

Reporting the status as "Pending" might be consistent, but it's incorrect. If the pod was created statically and its health checks pass, it's not "Pending". The kubelet API would otherwise be very useful, but the stale status field makes it much less so, since it's not possible to tell from the /pods endpoint whether a static Pod actually came up successfully or not.

@hkaj

hkaj commented Feb 12, 2019

Hi team,

I'm reaching out to see if we can revisit this issue. We need this to be able to monitor pods locally without having agents reach out to the apiserver. Currently the pod is in the wrong state (Pending when it's actually Running), and we're missing all status information (pod IP, container statuses, etc.).

We would be happy and willing to update #57106 to solve the issue if we get the green light that this will be accepted.

cc @yujuhong @dashpole

@dashpole
Contributor

I think as long as we fix the test failures the change caused, we can re-introduce it. Feel free to assign me on a PR which accomplishes this, and make sure the reboot test passes.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 14, 2019