Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes No ref for container in probes after kubelet restart #84792

Conversation

EricMountain
Copy link
Contributor

@EricMountain EricMountain commented Nov 5, 2019

What type of PR is this?

/kind bug

What this PR does / why we need it:

After restarting, the kubelet synchs previously running containers. However it does not populate the container RefManager map (pkg/kubelet/container/container_reference_manager.go).

Any probes that fail against containers that were running before the kubelet restarted do not generate events, and instead log:

No ref for container "containerd://..." (...)

So for instance kubectl describe pod ... with a container (started before kubelet restart) failing its readiness probe will show:

...
Containers:
  xxxxxx:
...
    State:          Running
      Started:      Mon, 16 Sep 2019 11:28:31 +0200
    Ready:          False
    Restart Count:  0
...
    Liveness:   tcp-socket :6379 delay=30s timeout=8s period=10s #success=1 #failure=60
    Readiness:  http-get http://:ready/ready delay=30s timeout=8s period=10s #success=1 #failure=3
...
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
...
Events:          <none>

I.e. a non-ready container, however no matching events. If a container is not-ready, then it is because it is failing its Readiness probe and we expect corresponding events.

Liveness probe events are similarly not created in such a situation, though the container is restarted. So we see short container existence time and restarts, but not the matching liveness probe failure events.

Which issue(s) this PR fixes

Fixes the No ref log mentionned in #83336, though probably not the underlying issue.

Special notes for your reviewer:

2 commits:

  • One to fix the No ref issue and ensure events are created by using almost the same recordContainerEvent() function as implemented in pkg/kubelet/kuberuntime/kuberuntime_container.go. The difference is that we don't do container ID/name substitution as the container ID is not present in the messages passed in.
  • One to remove RefManager (pkg/kubelet/container/container_reference_manager.go): it was only used by the prober to retrieve container ID - since we generate the ID on the fly now, it is unnecessary.

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

N/A

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 5, 2019
@k8s-ci-robot
Copy link
Contributor

Welcome @EricMountain!

It looks like this is your first PR to kubernetes/kubernetes 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kubernetes has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Nov 5, 2019
@k8s-ci-robot
Copy link
Contributor

Hi @EricMountain. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Nov 5, 2019
@EricMountain
Copy link
Contributor Author

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. area/kubelet and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 5, 2019
@EricMountain
Copy link
Contributor Author

/assign @vishh

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 5, 2019
Copy link
Contributor

@mattjmcnaughton mattjmcnaughton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/ok-to-test

Thanks for your work here!

If you know it, can you share more around the original motivation for the container reference manager? In other words, why did we add this manager, instead of generating references on the fly every time?
Are there benefits offered by the container reference manager that we may loose if we get rid of it entirely?

If there may be benefits, an alternative option would be "falling back" to generating the reference on the fly only if it does not exist in the container manage after the kubelet restart? If I'm understanding correctly, such a change would require less code change... we would just need to update RefManager.GetRef.

pkg/kubelet/prober/prober.go Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 5, 2019
@EricMountain EricMountain force-pushed the eric.mountain/simple_probe_no_ref_fix_master branch from f6b4cb3 to ddbe5ff Compare November 5, 2019 17:16
@EricMountain
Copy link
Contributor Author

/retest

@EricMountain
Copy link
Contributor Author

/test pull-kubernetes-e2e-kind

@EricMountain EricMountain force-pushed the eric.mountain/simple_probe_no_ref_fix_master branch from ddbe5ff to a859f2c Compare November 5, 2019 22:08
@EricMountain
Copy link
Contributor Author

Working on understanding the test failure.

@EricMountain EricMountain force-pushed the eric.mountain/simple_probe_no_ref_fix_master branch from a859f2c to 71a28d7 Compare November 6, 2019 10:22
@EricMountain
Copy link
Contributor Author

Retest due to failure creating kind cluster.

/test pull-kubernetes-e2e-kind

@EricMountain EricMountain force-pushed the eric.mountain/simple_probe_no_ref_fix_master branch from 71a28d7 to 7842644 Compare November 13, 2019 16:09
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 13, 2019
@EricMountain
Copy link
Contributor Author

/test pull-kubernetes-kubemark-e2e-gce-big

@EricMountain
Copy link
Contributor Author

/test pull-kubernetes-node-e2e-containerd

Copy link
Contributor

@mattjmcnaughton mattjmcnaughton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Thanks for your work here :) I agree with merging this change for the short-term, and will take a look at your other diff.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 14, 2019
@mattjmcnaughton
Copy link
Contributor

/assign @tallclair

@EricMountain
Copy link
Contributor Author

Hi @tallclair! Would you take a look at this PR? Thanks!

@dims
Copy link
Member

dims commented Jan 23, 2020

/priority important-soon
/milestone v1.18

@k8s-ci-robot k8s-ci-robot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Jan 23, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Jan 23, 2020
@k8s-ci-robot k8s-ci-robot removed the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Jan 23, 2020
@gianarb
Copy link

gianarb commented Feb 12, 2020

Hello @matt-tyler @tallclair !
Bug Triage team here for the 1.18 release. This is a friendly reminder that code freeze is scheduled for 5 March. Is this issue still intended for milestone 1.18, or should it extend to the next one? Thanks in advance!

Copy link
Member

@tallclair tallclair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
/hold

pkg/kubelet/prober/prober.go Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 19, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: EricMountain, tallclair

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 19, 2020
@EricMountain EricMountain force-pushed the eric.mountain/simple_probe_no_ref_fix_master branch from 65fe21a to 4cb28f6 Compare February 21, 2020 12:33
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 21, 2020
@gianarb
Copy link

gianarb commented Feb 24, 2020

Hello @EricMountain, @tallclair !
Bug Triage team here for the 1.18 release. This is a friendly reminder that code freeze is scheduled for 5 March. I saw you approved this PR already but it got some new code, can you review again in order to try to get it merged before code freeze?
Thanks

@tallclair
Copy link
Member

/lgtm
/hold cancel

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Feb 25, 2020
@k8s-ci-robot k8s-ci-robot merged commit 46fcbcf into kubernetes:master Feb 25, 2020
@EricMountain EricMountain deleted the eric.mountain/simple_probe_no_ref_fix_master branch February 26, 2020 10:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants