Kubelet status manager sync the status of local Pods #77661

Merged

mfpierre
Contributor

@mfpierre mfpierre commented May 9, 2019

What type of PR is this?
/kind bug

What this PR does / why we need it:
Re-introducing the change made in #57106

This enables the kubelet pod list to have correctly updated statuses for static pods.

Example of the static kube-proxy pod without the fix:

    "status": {
        "phase": "Pending"
    }

And with the fix:

    "status": {
        "phase": "Running",
        "conditions": [
            ...
        ],
        "hostIP": "10.132.0.5",
        "podIP": "10.132.0.5",
        "startTime": "2019-05-07T13:41:40Z",
        "containerStatuses": [
            {
                "name": "kube-proxy",
                "state": {
                    "running": {
                        "startedAt": "2019-05-09T12:18:49Z"
                    }
                },
                "lastState": {
                    "terminated": {
                        "exitCode": 255,
                        "reason": "Error",
                        "startedAt": "2019-05-09T12:16:42Z",
                        "finishedAt": "2019-05-09T12:18:12Z",
                        "containerID": "docker://505208a6ed12d2ba0ef80533554d156616805946ceeb52c929c06d48a245410a"
                    }
                },
                "ready": true,
                "restartCount": 3,
                "image": "k8s.gcr.io/kube-proxy:v1.15.0-alpha.0.669_fa86a27d02b784-dirty",
                "imageID": "docker://sha256:a38ce2de08f1da8ab989f29209e3f44f542e8b0811e31c72dc983d0827ef6d8b",
                "containerID": "docker://4f0806183f7492e3095512461c4179603988429daa88008d57c56c104a6c8d0e"
            }
        ],
        "qosClass": "Burstable"
    }
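
The difference between the two outputs above can be sketched in Go as follows. This is only a minimal illustration, not the kubelet's actual code: `Pod`, `PodStatus`, `statusCache`, and `syncStatuses` are hypothetical, simplified stand-ins for the real API types and the status manager. The idea is that the pod list is overlaid with the most recent status the kubelet has cached locally, so a pod that would otherwise be listed as "Pending" is listed with its actual phase.

```go
package main

import "fmt"

// PodStatus and Pod are simplified, hypothetical stand-ins for the
// Kubernetes API types; only the fields needed for the sketch are kept.
type PodStatus struct {
	Phase string
}

type Pod struct {
	UID    string
	Name   string
	Status PodStatus
}

// statusCache plays the role of the kubelet status manager here:
// it maps a pod UID to the latest status known locally.
type statusCache map[string]PodStatus

// GetPodStatus returns the cached status for uid and whether one exists.
func (c statusCache) GetPodStatus(uid string) (PodStatus, bool) {
	s, ok := c[uid]
	return s, ok
}

// syncStatuses overwrites the status of every pod that has a cached
// status. Pods without a cached status are left unchanged.
func syncStatuses(pods []*Pod, cache statusCache) []*Pod {
	for _, p := range pods {
		if status, ok := cache.GetPodStatus(p.UID); ok {
			p.Status = status
		}
	}
	return pods
}

func main() {
	pods := []*Pod{
		{UID: "a", Name: "kube-proxy", Status: PodStatus{Phase: "Pending"}},
	}
	cache := statusCache{"a": {Phase: "Running"}}

	for _, p := range syncStatuses(pods, cache) {
		fmt.Printf("%s: %s\n", p.Name, p.Status.Phase)
	}
}
```

Note that the sketch mutates the pods in place, which keeps it short; whether that is appropriate for a shared pod list is a separate question.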

PR #57106 was reverted because the e2e reboot tests were failing (see #59889).

But I tested the change locally and ran the e2e Feature:Reboot tests (using kubetest with the gce provider), and they seem to run fine:

Ran 6 of 3261 Specs in 1033.409 seconds
SUCCESS! -- 6 Passed | 0 Failed | 0 Pending | 3255 Skipped PASS

Ginkgo ran 1 suite in 17m14.755647878s
Test Suite Passed
2019/05/09 14:21:55 process.go:155: Step './hack/ginkgo-e2e.sh --ginkgo.focus=\[Feature:Reboot\] --minStartupPods=8' finished in 17m15.873100718s

Which issue(s) this PR fixes:
Fixes #61717

Special notes for your reviewer:
As per #61717 (comment), the reboot tests seem to be OK now.

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels May 9, 2019
@k8s-ci-robot
Contributor

Hi @mfpierre. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 9, 2019
@k8s-ci-robot k8s-ci-robot requested review from mtaufen and vishh May 9, 2019 13:38
@mfpierre
Contributor Author

mfpierre commented May 9, 2019

cc @dashpole (following up on #61717 (comment)): the e2e reboot tests seem to pass now.

@mattjmcnaughton
Contributor

@mfpierre thanks for your work here :) I just want to confirm that it's OK to merge the changes to pkg/kubelet/kubelet_getters.go without the changes to pkg/kubelet/kubelet_pods.go that were in the original PR. If so, can you add a brief explanation of why (just for my own curiosity :) )?

@mfpierre
Contributor Author

mfpierre commented May 9, 2019

@mattjmcnaughton if you look back at the changes made to kubelet_pods.go in the original PR, they were a refactoring that added an early return, but the logic was the same.

I didn't pick up those changes, to keep the code change in this PR as minimal as possible.

@dashpole
Contributor

dashpole commented May 9, 2019

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 9, 2019
@mattjmcnaughton
Contributor

mattjmcnaughton commented May 10, 2019 via email

Member

@yastij yastij left a comment

/lgtm
/priority important-soon

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels May 14, 2019
@dashpole
Contributor

/lgtm

@mtaufen
Contributor

mtaufen commented May 14, 2019

/assign @Random-Liu

@Random-Liu
Member

I think the test failure was caused by the other part of the change in #57106

I think this part should be safe. Let's see.

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mfpierre, Random-Liu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tpepper
Member

tpepper commented May 30, 2019

@kubernetes/sig-node-pr-reviews can I get a status update on this PR's three cherry picks? None of them have been approved and one is held...

@hoegaarden
Contributor

@kubernetes/sig-node-pr-reviews can I get a status update on this PR's three cherry picks? None of them have been approved and one is held...

ping @kubernetes/sig-node-pr-reviews -- let us know in case you have concerns with approving those cherry picks. Or please approve them, if not ;) Thanks a lot!

@yujuhong
Contributor

yujuhong commented Jul 2, 2019

I'm not sure this is worth cherry-picking back to previous branches. It's not fixing a regression, and it is closer to supporting a new feature.

rphillips added a commit to rphillips/origin that referenced this pull request Sep 13, 2019
This enables the kubelet pod list to have correctly updated statuses for
static pods.

ref: kubernetes/kubernetes#77661
rphillips added a commit to rphillips/origin that referenced this pull request Sep 17, 2019
This enables the kubelet pod list to have correctly updated statuses for
static pods.

ref: kubernetes/kubernetes#77661
rphillips added a commit to rphillips/origin that referenced this pull request Sep 17, 2019
This enables the kubelet pod list to have correctly updated statuses for
static pods.

ref: kubernetes/kubernetes#77661
openshift-publish-robot pushed a commit to openshift/kubernetes that referenced this pull request Sep 17, 2019
This enables the kubelet pod list to have correctly updated statuses for
static pods.

ref: kubernetes#77661

Origin-commit: 4d137036db7b8881076ed2425887e6bac6d52739
openshift-publish-robot pushed a commit to openshift/kubernetes that referenced this pull request Sep 20, 2019
This enables the kubelet pod list to have correctly updated statuses for
static pods.

ref: kubernetes#77661

Origin-commit: 952e25e2211caa32620f42a176f021a5fc730836
@mfpierre
Contributor Author

mfpierre commented Nov 8, 2019

@tedyu this PR effectively fixes the status of static pods in the local kubelet pod list, and we (Datadog) and many of our customers rely on this fix, starting with Kubernetes 1.15, for their monitoring.
I think we should focus on fixing the bug we think it introduced instead of reverting it, as it would affect all the people relying on what this PR fixes.

@tedyu
Contributor

tedyu commented Nov 8, 2019

@mfpierre
I agree that this shouldn't be reverted.
How about the following change?

@@ -167,8 +168,10 @@ func (kl *Kubelet) GetPods() []*v1.Pod {
        // a kubelet running without apiserver requires an additional
        // update of the static pod status. See #57106
        for _, p := range pods {
-               if status, ok := kl.statusManager.GetPodStatus(p.UID); ok {
-                       p.Status = status
+               if kubelettypes.IsStaticPod(p) {
+                       if status, ok := kl.statusManager.GetPodStatus(p.UID); ok {
+                               p.Status = status
+                       }
                }
        }
        return pods
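
The effect of the proposed diff can be sketched in a self-contained way as below. This is a rough illustration only, under a few assumptions: `Pod`, `PodStatus`, `isStaticPod`, and `syncStaticPodStatuses` are hypothetical simplified names, and the sketch assumes a static pod is recognized by a `kubernetes.io/config.source` annotation whose value is something other than `api`. The real `kubelettypes.IsStaticPod` helper may differ in detail.

```go
package main

import "fmt"

// Assumed annotation key and value used to tell static pods apart from
// pods that come from the API server. Treat these as assumptions of
// this sketch rather than a statement about the kubelet's internals.
const (
	configSourceAnnotation = "kubernetes.io/config.source"
	apiserverSource        = "api"
)

// PodStatus and Pod are simplified, hypothetical stand-ins for the
// Kubernetes API types.
type PodStatus struct {
	Phase string
}

type Pod struct {
	UID         string
	Annotations map[string]string
	Status      PodStatus
}

// isStaticPod reports whether the pod carries a config source
// annotation that does not point at the API server.
func isStaticPod(p *Pod) bool {
	source, ok := p.Annotations[configSourceAnnotation]
	return ok && source != apiserverSource
}

// syncStaticPodStatuses overlays the cached status only onto static
// pods; pods from the API server keep the status they already have.
func syncStaticPodStatuses(pods []*Pod, cached map[string]PodStatus) {
	for _, p := range pods {
		if !isStaticPod(p) {
			continue
		}
		if status, ok := cached[p.UID]; ok {
			p.Status = status
		}
	}
}

func main() {
	static := &Pod{
		UID:         "s1",
		Annotations: map[string]string{configSourceAnnotation: "file"},
		Status:      PodStatus{Phase: "Pending"},
	}
	fromAPI := &Pod{
		UID:         "a1",
		Annotations: map[string]string{configSourceAnnotation: apiserverSource},
		Status:      PodStatus{Phase: "Pending"},
	}
	cached := map[string]PodStatus{
		"s1": {Phase: "Running"},
		"a1": {Phase: "Running"},
	}

	syncStaticPodStatuses([]*Pod{static, fromAPI}, cached)
	fmt.Println("static pod phase:", static.Status.Phase)
	fmt.Println("API pod phase:", fromAPI.Status.Phase)
}
```

Restricting the overlay this way keeps the fix for static pods while leaving pods that come from the API server untouched.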

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Kubelet doesn't update static pod status