kubelet bootstrap: start hostNetwork pods before we have PodCIDR #35526
Conversation
This should fix the problem that needed a reversion of #33347 (static pods that fail admission because network-not-ready are not rescheduled). Should I apply this on top of #33347 and squash for review? Also not clear whether I should move the disk space check out of the admission check. I'd lean towards doing it separately.
All regular pods getting rejected whenever the kubelet restarts was an equally serious problem :-)
I think we actually want to reject pods if disk is full, so not sure if we want to change that, and definitely not in this PR.
Sounds good. Thanks!
Oops... sorry! I rebased on top of the prior (now reverted) PR. Looks much simpler now!
if errOuter := canRunPod(pod); errOuter != nil || pod.DeletionTimestamp != nil || apiPodStatus.Phase == api.PodFailed {
errOuter := canRunPod(pod)
if errOuter == nil {
    rs := kl.runtimeState.networkErrors()
We don't really distinguish network errors here. Would there be a case where even pods using the host network cannot run?
/cc @bprashanth I don't think this works with the CRI integration, in which case, we'll just sync all the pods and fail?
Not entirely sure what you're saying :-). We now start synchronizing before the network is ready, so that we can see the hostNetwork pods. But we can't actually start pods that don't use hostNetwork, because the network plugin is not ready.
Oh I think I understand... Perhaps if we renamed to networkPluginErrors, or podNetworkErrors? The idea is that if the network plugin has a problem, we put it into this separate list. An error that prevents hostNetwork pods would presumably not be a network plugin error, and would go into the general errors list (which prevents the pod sync loop from even starting).
if errOuter == nil {
    rs := kl.runtimeState.networkErrors()
    if len(rs) != 0 && !podUsesHostNetwork(pod) {
        errOuter = fmt.Errorf("Network is not ready: %v", rs)
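For context, the podUsesHostNetwork helper the diff calls can be sketched as below, with simplified stand-in types (the real 1.4-era API read HostNetwork from the pod's security context; the types here are illustrative, not the actual api package):

```go
package main

import "fmt"

// Simplified stand-ins for the pod API types involved.
type PodSecurityContext struct{ HostNetwork bool }
type PodSpec struct{ SecurityContext *PodSecurityContext }
type Pod struct{ Spec PodSpec }

// podUsesHostNetwork is true only when the pod explicitly opts into the
// host's network namespace, so it needs no network plugin to run.
func podUsesHostNetwork(pod *Pod) bool {
	return pod.Spec.SecurityContext != nil && pod.Spec.SecurityContext.HostNetwork
}

func main() {
	hostPod := &Pod{Spec: PodSpec{SecurityContext: &PodSecurityContext{HostNetwork: true}}}
	fmt.Println(podUsesHostNetwork(hostPod), podUsesHostNetwork(&Pod{})) // true false
}
```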
nit (error strings should start with a lowercase letter): s/Network/network
I don't think you want to set errOuter and have the running containers killed in line 1321. Just returning an error here should be sufficient. In that case, there's no need to reuse the errOuter variable.
Fixed the error nit - thanks.
Can you explain when we do & don't want to kill the pod? Is it the case that we kill if the failure is permanent, but not if the failure is transient?
AFAIK, we only kill pods if the pod is getting deleted (DeletionTimestamp is set), evicted, or if the kubelet is not capable (in terms of security context) of running the pod at all. Although I am not sure how the kubelet could've started the pod for the latter case to begin with...
Most of the pods that the kubelet determines it cannot run would be rejected at admission, and would not enter syncPod at all.
Rebased. @yujuhong are you happy with this, if we renamed to
Jenkins GKE smoke e2e failed for commit 4b37edcb0ed5ef4e6e82ffd735dfdd4343a94d5f. Full PR test history. The magic incantation to run this job again is
I'd like to get this PR in soon, since it's blocking the
We do now admit pods (unlike the first attempt), but now we will stop non-hostnetwork pods from starting if the network is not ready. Issue kubernetes#35409
We no longer pass in a "dummy" pod-cidr (10.123.45.0/29), and rely on reconcile-cidr=true instead (which is the default).
Jenkins GCE e2e failed for commit 68c0b42. Full PR test history. The magic incantation to run this job again is
@dchen1107 now with added unit-test :-)
@justinsb if this passes all tests, when do you expect this fix to be available in kubelet?
@zkmoney I think this is dependent on a Kubernetes member applying the LGTM label. FWIW, LGTM to me. :)
Assuming it gets LGTM-ed before freeze, it should be in 1.5. I don't know if we would backport it... we should probably add a flag if we did.
Ah got it, thank you. I am eagerly awaiting the ability to remove the dummy. If there is any good solution to this that you know of in the meantime, I'd appreciate hearing it.
Yes, I'm in the same boat with the dummy pod-cidr. And this is a problem for daemon-set pods because they are started on the master nodes. I can try to cherry-pick this for 1.4 if someone sets the appropriate label.
@edevil FWIW for daemonsets where they don't actually need to be running on the masters I've used the
@zkmoney This is for the logging component (Fluentd) so I do want it to run on every host. :)
LGTM
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]
Automatic merge from submit-queue
Can this be backported to 1.4?
👍 For that @edevil
Network readiness was checked in the pod admission phase, but pods that
fail admission are not retried. Move the check to the pod start phase.
Issue #35409
Issue #35521
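The fix summarized above can be modeled in a few lines (a toy simulation under assumed semantics, not kubelet code): an admission rejection is terminal and the pod is never reconsidered, while a start-phase failure leaves the pod in the sync loop to be retried once the network becomes ready.

```go
package main

import "fmt"

// pod is a toy record of a pod's fate in the simulation.
type pod struct {
	name        string
	hostNetwork bool
	running     bool
	rejected    bool // a terminal admission rejection
}

// startPhase retries every pod that was not terminally rejected: hostNetwork
// pods start immediately, others wait for the network to be ready.
func startPhase(pods []*pod, networkReady bool) {
	for _, p := range pods {
		if p.rejected {
			continue // admission failures are never retried
		}
		if networkReady || p.hostNetwork {
			p.running = true
		}
	}
}

func main() {
	static := &pod{name: "kube-proxy", hostNetwork: true}
	app := &pod{name: "app"}
	pods := []*pod{static, app}

	startPhase(pods, false) // before PodCIDR: only the hostNetwork pod starts
	fmt.Println(static.running, app.running) // true false

	startPhase(pods, true) // network ready: the app pod is retried and starts
	fmt.Println(static.running, app.running) // true true
}
```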