
bootstrap: Start hostNetwork pods even if network plugin not ready #33347

Merged. 2 commits merged into kubernetes:master on Oct 24, 2016.

Conversation

@justinsb (Member) commented Sep 23, 2016

bootstrap: Start hostNetwork pods even if network plugin not ready


@k8s-github-robot added the size/M (Denotes a PR that changes 30-99 lines, ignoring generated files) and release-note-label-needed labels Sep 23, 2016
@k8s-github-robot commented

@k8s-ci-robot
You must link to the test flake issue which caused you to request this manual re-test.
Re-test requests should be in the form of: k8s-bot test this issue: #<number>
Here is the list of open test flakes.

@justinsb self-assigned this and unassigned yujuhong Sep 23, 2016
@justinsb (Member, Author) commented

WIP so self-assigning for now

@justinsb (Member, Author) commented

This addresses (some of?) #32900. It is also fairly simple.

I realize this is a hard sell for 1.4.0. But if we want to put it into 1.4 at all, maybe we can incorporate it with a flag? Then installation tools could opt in instead of setting pod-cidr. This is of course predicated on this not being obviously wrong!

cc @bprashanth @thockin

@@ -155,7 +155,7 @@ assemble_kubelet_flags() {
   if [ ! -z "${KUBELET_APISERVER:-}" ] && \
      [ ! -z "${KUBELET_CERT:-}" ] && \
      [ ! -z "${KUBELET_KEY:-}" ]; then
-    KUBELET_CMD_FLAGS="${KUBELET_CMD_FLAGS} --api-servers=https://${KUBELET_APISERVER} --register-schedulable=false --reconcile-cidr=false --pod-cidr=10.123.45.0/29"
+    KUBELET_CMD_FLAGS="${KUBELET_CMD_FLAGS} --api-servers=https://${KUBELET_APISERVER} --register-schedulable=false --reconcile-cidr=false"
A Contributor commented on this diff:

If you drop pod-cidr, that means you have to set reconcile-cidr.

@freehan (Contributor) commented Sep 26, 2016

@roberthbailey Where does GKE take bootstrap parameter from?

Basically, we want to maintain the current setup for GKE, where pod-cidr was set on the master.
For other setups like AWS/GCE, master bootstraps without pod-cidr and uses node.PodCIDR once assigned.

@justinsb changed the title from "WIP: Start hostNetwork pods even if network plugin not ready" to "Start hostNetwork pods even if network plugin not ready" Sep 27, 2016
@k8s-github-robot added the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD) Sep 27, 2016
@justinsb removed their assignment Sep 29, 2016
@justinsb (Member, Author) commented Oct 4, 2016

Can I get some feedback on this as a fix? If so, I propose that we're early enough in the cycle to just merge it (once it is rebased, cleaned up, and passes e2e), leaving plenty of time to revert or fix any problems it uncovers.

@freehan (Contributor) commented Oct 4, 2016

What about the comment regarding reconcile-cidr?

This is a good one. Let us get it in.

@justinsb (Member, Author) commented Oct 4, 2016

Thanks @freehan. Yes, I was probably over-eager when expunging pod-cidr - I will fix it up properly :-) Just wanted to check that we liked the general approach first!

@caseydavenport (Member) commented

Haven't read the code yet, but @justinsb I thought this was already the case? It certainly is for CNI plugins. Is this just a kubenet fix?

@justinsb (Member, Author) commented Oct 4, 2016

@caseydavenport are you sure? We hit this pretty hard in 1.4, with the /30 -> /29 issue... Maybe other plugins aren't correctly reporting status? Or maybe the plugin is just ready immediately, and doesn't need to wait for a podcidr? Check out the kubelet changes - they are pretty small...

@caseydavenport (Member) commented

Yeah, my experience was with the Calico plugin, which is ready immediately and doesn't need to wait for a pod CIDR. I think I understand this PR now - it's essentially handling the case where --configure-cbr0 is set, which will fail until a pod CIDR is configured even for host networked pods. For cases like the Calico case where --configure-cbr0 isn't used, we weren't hitting this before.

Sorry for the confusion!

@freehan (Contributor) commented Oct 4, 2016

In the Calico case, it is using the CNI network plugin, right? I believe that one already separates host-network and pod-network.

@@ -2197,6 +2197,15 @@ func (kl *Kubelet) rejectPod(pod *api.Pod, reason, message string) {
 // can be admitted, a brief single-word reason and a message explaining why
 // the pod cannot be admitted.
 func (kl *Kubelet) canAdmitPod(pods []*api.Pod, pod *api.Pod) (bool, string, string) {
+	if rs := kl.runtimeState.networkErrors(); len(rs) != 0 {
+		hostNetwork := false
+		if pod.Spec.SecurityContext != nil && pod.Spec.SecurityContext.HostNetwork {
A Member commented on this diff:

how about instead:

if !podUsesHostNetwork(pod) {
    return false, "NetworkNotReady", fmt.Sprintf(...)
}

We added podUsesHostNetwork() specifically to cut down on this repeated pattern :)

Another Member commented:

+1

@justinsb (Member, Author) replied:

Thanks - will do
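
A minimal sketch of the helper pattern suggested above, assuming the api.Pod type shown in the hunk (the actual helper in the tree may differ in name or placement):

// podUsesHostNetwork reports whether the pod runs in the host's network
// namespace and therefore does not depend on the cluster network plugin.
func podUsesHostNetwork(pod *api.Pod) bool {
    return pod.Spec.SecurityContext != nil && pod.Spec.SecurityContext.HostNetwork
}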

@dcbw (Member) commented Oct 5, 2016

> Yeah, my experience was with the Calico plugin, which is ready immediately and doesn't need to wait for a pod CIDR. I think I understand this PR now - it's essentially handling the case where --configure-cbr0 is set, which will fail until a pod CIDR is configured even for host networked pods. For cases like the Calico case where --configure-cbr0 isn't used, we weren't hitting this before.

@caseydavenport almost right. Currently, if the network plugin reports an error from Status(), kubelet will block running HostNetwork=true pods that shouldn't depend on the network plugin at all. This PR removes that erroneous inter-dependency and allows HostNetwork pods to run even when the network plugin isn't ready, since the two shouldn't have anything to do with each other.

So it's not really --configure-cbr0 specific, it's a problem for any network plugin that may report an error from Status(). But right now that's basically just kubenet, since CNI has no way to filter status back up to Kubernetes.
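
To make that concrete, here is a minimal, self-contained sketch of the admission rule described above, using stand-in types rather than the real kubelet ones (the type names, signatures, and messages below are illustrative, not the merged code):

package main

import "fmt"

// Pod is a stand-in for the relevant slice of the real pod spec.
type Pod struct {
    Name        string
    HostNetwork bool
}

// podUsesHostNetwork mirrors the idea of the PR: host-network pods do not
// depend on the cluster network plugin being ready.
func podUsesHostNetwork(p *Pod) bool {
    return p.HostNetwork
}

// canAdmitPod sketches the admission rule: when the network plugin reports
// errors, reject only pods that need the pod network; host-network pods are
// admitted regardless.
func canAdmitPod(networkErrors []error, p *Pod) (bool, string, string) {
    if len(networkErrors) != 0 && !podUsesHostNetwork(p) {
        return false, "NetworkNotReady", fmt.Sprintf("network is not ready: %v", networkErrors)
    }
    return true, "", ""
}

func main() {
    errs := []error{fmt.Errorf("Kubenet does not have netConfig. This is most likely due to lack of PodCIDR")}
    for _, p := range []*Pod{
        {Name: "pod-network-pod", HostNetwork: false},
        {Name: "host-network-pod", HostNetwork: true},
    } {
        ok, reason, msg := canAdmitPod(errs, p)
        fmt.Printf("%s: admit=%v reason=%q msg=%q\n", p.Name, ok, reason, msg)
    }
}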

@yujuhong (Contributor) commented Oct 6, 2016

The downside is that there is no feedback loop to prevent schedulers from assigning non-host-network pods to the node. There may be a period of time after kubelet startup when it reports ready but rejects all the (non-host-network) pods. /cc @kubernetes/sig-node

@edevil (Contributor) commented Oct 12, 2016

Is there any hope of seeing this in the near future in a point release?

@mikedanese (Member) left a review comment:

This change looks ok to me.

@@ -2197,6 +2197,15 @@ func (kl *Kubelet) rejectPod(pod *api.Pod, reason, message string) {
 // can be admitted, a brief single-word reason and a message explaining why
 // the pod cannot be admitted.
 func (kl *Kubelet) canAdmitPod(pods []*api.Pod, pod *api.Pod) (bool, string, string) {
+	if rs := kl.runtimeState.networkErrors(); len(rs) != 0 {
+		hostNetwork := false
+		if pod.Spec.SecurityContext != nil && pod.Spec.SecurityContext.HostNetwork {
A Member commented on this diff:

+1

@dcbw (Member) commented Oct 24, 2016

The pkg/kubelet bits LGTM

@thockin added the lgtm ("Looks good to me", indicates that a PR is ready to be merged) label Oct 24, 2016
@k8s-github-robot commented

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot commented

Automatic merge from submit-queue

@k8s-github-robot merged commit 3c84164 into kubernetes:master Oct 24, 2016
@edevil (Contributor) commented Oct 24, 2016

Is this something that will end up on the next 1.4.x or will we have to wait for 1.5?

@roberthbailey (Contributor) commented

@edevil - if you want it in the 1.4 branch feel free to send a cherry pick pull request.

@k8s-cherrypick-bot commented

Removing label cherrypick-candidate because no release milestone was set. This is an invalid state and thus this PR is not being considered for cherry-pick to any release branch. Please add an appropriate release milestone and then re-add the label.

@roberthbailey added this to the v1.4 milestone Oct 24, 2016
@jessfraz added the cherry-pick-approved label (Indicates a cherry-pick PR into a release branch has been approved by the release branch manager) Oct 24, 2016
@crassirostris commented Oct 25, 2016

Guys, I'm sorry for not being vocal, but this PR broke the gce-es-logging and gci-gce-es-logging suites and the GCL logging test.

Logging doesn't work at all after this PR!

To give you more details:

The Fluentd pod is currently in the node manifest. In the kubelet logs you can see the following event:

I1025 08:46:20.354155    1247 server.go:613] Event(api.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"fluentd-elasticsearch-jenkins-e2e-minion-group-mndb", UID:"befba121e25c310938143db137604e03", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'NetworkNotReady' Network is not ready: [Kubenet does not have netConfig. This is most likely due to lack of PodCIDR]

And no mentions of fluentd after that.

@justinsb (Member, Author) commented

@crassirostris I'll take a look. Is there an open issue?

@piosz (Member) commented Oct 25, 2016

#35409

@justinsb (Member, Author) commented

Thanks - looking now.

@yujuhong (Contributor) commented

FYI, this PR may have caused a lot of tests in the serial suite to flake as well: #35519

@foxish (Contributor) commented Oct 25, 2016

@yujuhong @justinsb do we want to revert this PR?

@yujuhong (Contributor) commented

I'd vote for reverting this first.

@mikedanese removed the cherry-pick-approved label Oct 25, 2016
@jessfraz removed this from the v1.4 milestone Oct 25, 2016
@thockin (Member) commented Oct 25, 2016

I would be open to a cherrypick, but it needs to be careful

> Is this something that will end up on the next 1.4.x or will we have to wait for 1.5?

Labels
lgtm ("Looks good to me", indicates that a PR is ready to be merged), release-note (Denotes a PR that will be considered when it comes time to generate release notes), size/M (Denotes a PR that changes 30-99 lines, ignoring generated files)