
kubelet bootstrap: start hostNetwork pods before we have PodCIDR #35526

Merged · 3 commits · Nov 6, 2016

Conversation

@justinsb (Member) commented Oct 25, 2016

Network readiness was checked in the pod admission phase, but pods that
fail admission are not retried. Move the check to the pod start phase.

Issue #35409
Issue #35521



@k8s-github-robot k8s-github-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. release-note-label-needed labels Oct 25, 2016
@justinsb (Member Author) commented Oct 25, 2016

This should fix the problem that needed a reversion of #33347 (static pods that fail admission because network-not-ready are not rescheduled). Should I apply this on top of #33347 and squash for review?

cc @yujuhong @dchen1107

Also not clear whether I should move the disk space check out of the admission check. I'd lean towards doing it separately.

@yujuhong (Contributor):

static pods that fail admission because network-not-ready are not rescheduled

All regular pods getting rejected whenever the kubelet restarts was an equally serious problem :-)

Also not clear whether I should move the disk space check out of the admission check. I'd lean towards doing it separately.

I think we actually want to reject pods if the disk is full, so I'm not sure we want to change that, and definitely not in this PR.
As for not retrying static pods, we will fix that separately (reusing #35521 for this purpose).

Should I apply this on top of #33347 and squash for review?

Sounds good. Thanks!

@justinsb (Member Author):

All regular pods getting rejected whenever the kubelet restarts was an equally serious problem :-)

Oops... sorry!

I rebased on top of the prior (now reverted) PR. Looks much simpler now!

@justinsb justinsb changed the title WIP: kubelet: move network readiness check to pod-start phase kubelet bootstrap: start hostNetwork pods before we have PodCIDR Oct 25, 2016
@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Oct 25, 2016
@yujuhong yujuhong assigned dchen1107 and yujuhong and unassigned vishh Oct 25, 2016
if errOuter := canRunPod(pod); errOuter != nil || pod.DeletionTimestamp != nil || apiPodStatus.Phase == api.PodFailed {
errOuter := canRunPod(pod)
if errOuter == nil {
    rs := kl.runtimeState.networkErrors()
Contributor:

We don't really distinguish network errors here. Would there be a case where even pods using the host network cannot run?

Contributor:

/cc @bprashanth I don't think this works with the CRI integration, in which case, we'll just sync all the pods and fail?

Member Author:

Not entirely sure what you're saying :-) We now start synchronizing before the network is ready, so that we can see the hostNetwork pods. But we can't actually start pods that don't use hostNetwork, because the network plugin is not ready.

Member Author:

Oh, I think I understand...

Perhaps we could rename it to networkPluginErrors, or podNetworkErrors? The idea is that if the network plugin has a problem, we put it into this separate list. An error that prevents hostNetwork pods would presumably not be a network plugin error, and would go into the general errors list (which prevents the pod sync loop from even starting).

if errOuter == nil {
    rs := kl.runtimeState.networkErrors()
    if len(rs) != 0 && !podUsesHostNetwork(pod) {
        errOuter = fmt.Errorf("Network is not ready: %v", rs)
Contributor:

nit (error strings should start with a lower-case letter): s/Network/network

Contributor:

I don't think you want to set errOuter and have the running containers killed at line 1321. Just returning an error here should be sufficient; in that case, there's no need to reuse the errOuter variable.

Member Author:

Fixed the error nit - thanks.

Can you explain when we do & don't want to kill the pod? Is it the case that we kill if the failure is permanent, but not if the failure is transient?

Contributor:

AFAIK, we only kill pods if the pod is being deleted (DeletionTimestamp is set), evicted, or if the kubelet is not capable (in terms of security context) of running the pod at all. Although I am not sure how the kubelet could have started the pod in the latter case to begin with...

Contributor:

Most pods that the kubelet determines it cannot run would be rejected at admission and never enter syncPod at all.

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 27, 2016
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 31, 2016
@justinsb justinsb added release-note Denotes a PR that will be considered when it comes time to generate release notes. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed release-note-label-needed labels Oct 31, 2016
@justinsb (Member Author):

Rebased.

@yujuhong are you happy with this if we rename it to networkPluginErrors or podNetworkErrors? Any preference?

@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 31, 2016
@k8s-ci-robot (Contributor):

Jenkins GKE smoke e2e failed for commit 4b37edcb0ed5ef4e6e82ffd735dfdd4343a94d5f. Full PR test history.

The magic incantation to run this job again is @k8s-bot cvm gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@zmerlynn (Member):

I'd like to get this PR in soon, since it's blocking the kops-updown test from passing (and has for a while).

@zmerlynn zmerlynn added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Oct 31, 2016
@dchen1107 dchen1107 added this to the v1.5 milestone Nov 2, 2016
@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 3, 2016
We do now admit pods (unlike the first attempt), but we stop non-hostNetwork pods from starting if the network is not ready.

Issue kubernetes#35409
We no longer pass in a "dummy" pod-cidr (10.123.45.0/29), and rely on
reconcile-cidr=true instead (which is the default).
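The commit message above describes dropping the dummy-CIDR workaround. A hedged sketch of the kubelet invocations involved (flag names as they existed around the 1.4/1.5 kubelet; exact bootstrap configs will vary by setup):

```shell
# Before this change: bootstrap configs fed the kubelet a dummy CIDR so
# that hostNetwork manifests on the master could come up at all.
kubelet --pod-cidr=10.123.45.0/29

# After this change: no dummy CIDR is needed. hostNetwork pods start
# before a PodCIDR exists, and the kubelet picks up the real CIDR via
# reconciliation (--reconcile-cidr defaulted to true at the time).
kubelet --reconcile-cidr=true
```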
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 4, 2016
@k8s-ci-robot (Contributor):

Jenkins GCE e2e failed for commit 68c0b42. Full PR test history.

The magic incantation to run this job again is @k8s-bot cvm gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@justinsb (Member Author) commented Nov 4, 2016

@dchen1107 now with added unit-test :-)

@zkmoney commented Nov 4, 2016

@justinsb if this passes all tests, when do you expect this fix to be available in kubelet?

@edevil (Contributor) commented Nov 4, 2016

@zkmoney I think this is dependent on a Kubernetes member applying the LGTM label.

FWIW, LGTM to me. :)

@justinsb (Member Author) commented Nov 4, 2016

Assuming it gets LGTM-ed before the freeze, it should be in 1.5. I don't know if we would backport it... we should probably add a flag if we did.

@yujuhong yujuhong added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Nov 4, 2016
@zkmoney commented Nov 4, 2016

Ah, got it, thank you. I am eagerly awaiting the ability to remove a dummy, broken --pod-cidr value from my bootstrap kubelet sysconfig and still have the master be able to start the API server pods (I run all of the components via manifests).

If there is any good solution to this that you know of in the meantime, I would be grateful. Thanks.


@edevil
Copy link
Contributor

edevil commented Nov 4, 2016

Yes, I'm in the same boat with the dummy pod-cidr. And this is a problem for DaemonSet pods because they are started on the master nodes. I can try to cherry-pick this for 1.4 if someone sets the appropriate label.

@zkmoney commented Nov 4, 2016

@edevil FWIW, for DaemonSets that don't actually need to run on the masters, I've used spec.template.spec.nodeSelector to target nodes with the appropriate labels and avoid the masters, but I know this isn't a viable solution for everyone or for the long term.

@edevil (Contributor) commented Nov 4, 2016

@zkmoney This is for the logging component (Fluentd) so I do want it to run on every host. :)

@dchen1107 (Member):

LGTM

@justinsb justinsb added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Nov 6, 2016
@k8s-github-robot:

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot:

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 182a09c into kubernetes:master Nov 6, 2016
@edevil (Contributor) commented Nov 7, 2016

Can this be backported to 1.4?

@zkmoney commented Nov 8, 2016

👍 For that @edevil

Labels
lgtm: "Looks good to me", indicates that a PR is ready to be merged.
priority/backlog: Higher priority than priority/awaiting-more-evidence.
release-note: Denotes a PR that will be considered when it comes time to generate release notes.
size/M: Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants