This repository has been archived by the owner on Sep 30, 2020. It is now read-only.

Add networking-daemonsets feature #1195

Merged

Conversation

davidmccormick
Contributor

Deploy Calico and Flannel networking as a daemonset with the kubernetes API as the backing store. Removes the need for nodes to connect to etcd and frees up node podCIDR leases faster, addressing the cluster role issue: flannel-io/flannel#954.

Experimental feature, disabled by default.
Kubernetes Controllers become responsible for allocating node cidrs.
Switch between calico+flannel (canal) or flannel.
Fast roll out into existing clusters with minimal disruption.
Optional calico typha service for easing load on apiservers in large clusters.

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 21, 2018
@davidmccormick
Contributor Author

I need to update the tests with the expectation of the NetworkingDaemonSets type which gets set up with default values.

@codecov-io

codecov-io commented Mar 22, 2018

Codecov Report

Merging #1195 into master will increase coverage by 0.22%.
The diff coverage is 78.94%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1195      +/-   ##
==========================================
+ Coverage    36.2%   36.42%   +0.22%     
==========================================
  Files          63       63              
  Lines        3823     3846      +23     
==========================================
+ Hits         1384     1401      +17     
- Misses       2224     2229       +5     
- Partials      215      216       +1
Impacted Files                       Coverage Δ
model/userdata.go                    60% <0%> (-1.02%) ⬇️
core/controlplane/config/config.go   62.38% <83.33%> (+0.39%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 548bb38...2c4232a. Read the comment docs.

@mumoshu mumoshu left a comment
Contributor

Some comments on naming, but code LGTM overall!
Thank you very much for your efforts 🎉

type NetworkingDaemonSets struct {
	Enabled         bool        `yaml:"enabled"`
	Typha           bool        `yaml:"typha"`
	CalicoNodeImage model.Image `yaml:"calico-node-image"`
Contributor

Super nit but would you mind changing yaml keys to lowerCamelCase for consistency with other existing keys?

Contributor Author

Sure - no problem

- name: kubelet.service
  command: start
  runtime: true
  content: |
    [Unit]
    {{ if not .Experimental.NetworkingDaemonSets.Enabled -}}
Contributor

  • How about renaming NetworkingDaemonSets to e.g. SelfHostedCanal? After reading the canal doc, I now believe we can safely call it that.
  • Can we move the new configuration (NetworkingDaemonSets, before my suggested rename) out of .Experimental? Sorry for the confusion, but we're basically avoiding the .Experimental key from now on. We're in pre-1.0 and there is no strict rule about what's considered experimental, when it graduates from experimental, and so on.
    • However, a note about the experimental feature in the cluster.yaml comments would be welcome!

Contributor Author

I didn't want to name it Canal as you can have self-hosted flannel if you don't supply UseCalico: true. Perhaps another name better captures the hosted-in-kubernetes, removed-from-coreos-supplied-flannel-ness of the change? Shall we call it KubeHostedNetworking, or maybe SelfHostedNetworkFabric?

I'll remove it from Experimental and make it top level?

Contributor

Thanks for pointing it out! That makes a lot of sense.

How about Kubernetes.Networking.SelfHosting then?

In the future, we can probably also consider migrating other network-related keys like useCalico under it, for more cohesion.

Contributor Author

Ok, I'll make those changes and re-push

@davidmccormick davidmccormick Mar 27, 2018
Contributor Author

Err, did you want me to create a Kubernetes stanza? If so, should I move kubeDns, kubeProxy, kubernetesDashboard, plus lots of others into it? (Feels like quite a big change.)

Or create it just for my networking settings for now?

#kubernetes:
#  networking:
#    selfHosting:
#      enabled: false
#      typha: false
#      calicoNodeImage:
#        repo: quay.io/calico/node
#        tag: v3.0.3
#      calicoCniImage:
#        repo: quay.io/calico/cni
#        tag: v2.0.1
#      flannelImage:
#        repo: quay.io/coreos/flannel
#        tag: v0.9.1
#      flannelCniImage:
#        repo: quay.io/coreos/flannel-cni
#        tag: v0.3.0
#      typhaImage:
#        repo: quay.io/calico/typha
#        tag: v0.6.2

Or could I get away with KubernetesNetworkingSelfHosting? e.g.

#kubernetesNetworkingSelfHosting:
#  enabled: false
#  typha: false
#  calicoNodeImage:
#    repo: quay.io/calico/node
#    tag: v3.0.3
#  calicoCniImage:
#    repo: quay.io/calico/cni
#    tag: v2.0.1
#  flannelImage:
#    repo: quay.io/coreos/flannel
#    tag: v0.9.1
#  flannelCniImage:
#    repo: quay.io/coreos/flannel-cni
#    tag: v0.3.0
#  typhaImage:
#    repo: quay.io/calico/typha
#    tag: v0.6.2

@mumoshu
Contributor

mumoshu commented Mar 27, 2018

> Or create it just for my networking settings for now?

This is what I meant!
I'll create separate issues for migrating others.

…osting and squash a few bugs hampering migrations

rebuild templates after rebase and merge from master
@davidmccormick
Contributor Author

Hi, I've updated the settings to kubernetes.networking.selfhosting, squashed a couple of bugs, and merged the latest changes. I have new versions of the template files, but these seem to be conflicting and I'm not sure how to clear the conflicts.

Is there a way to select my versions? Can you do that?

@@ -270,6 +270,7 @@ coreos:
{{end -}}

[Service]
Type=simple
Contributor Author

etcd rollout stalls at the first etcd server in a new cluster when the etcdadm-check service (etcd disaster recovery enabled) is enabled. The etcd-member service is type 'notify' by default, but etcd won't send the notify until the cluster is healthy, so systemd reports etcd as starting rather than started even though etcd is running fine. etcdadm-check won't start until etcd-member has started, so no cfn-signal can be sent to start the subsequent etcd servers, and the rollout gets stuck at the first etcd server. By changing the type of etcd-member to 'simple', etcdadm-check can start and the CloudFormation signal can go out. There is no side effect from changing the type: as soon as etcd has started, everything can continue.

Contributor

@davidmccormick Good point!

But I take this change to actually have a side-effect: we would no longer postpone the rolling update of etcd nodes when etcd-member fails to join the existing cluster. I think this is better explained in #1206 (comment).

Not saying this change is no good. However, at the least, this change should be discussed and made separately!
That being said, would you mind reverting this?

Contributor Author

Removed it.

"ToPort": 5473
},
"Type": "AWS::EC2::SecurityGroupIngress"
},
Contributor

Note to self: This rule should be enabled only when typha is enabled.
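That gating could look like the following sketch in the stack template (the resource name and security-group references here are hypothetical, for illustration only):

```
{{/* Hypothetical sketch: open typha's port 5473 only when typha is enabled. */}}
{{if .Kubernetes.Networking.SelfHosting.Typha}}
"SecurityGroupIngressTyphaFromWorker": {
  "Type": "AWS::EC2::SecurityGroupIngress",
  "Properties": {
    "GroupId": {"Ref": "SecurityGroupController"},
    "SourceSecurityGroupId": {"Ref": "SecurityGroupWorker"},
    "IpProtocol": "tcp",
    "FromPort": 5473,
    "ToPort": 5473
  }
},
{{end}}
```

This way the ingress rule exists only in clusters that actually deploy the typha service.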

# ISSUES1 - calico iptables rules remain on the nodes; recycle nodes to remove them.
# ISSUES2 - Existing pod ip addresses are not tracked and can cause clashing ips - please cordon all nodes whilst
# performing the upgrade until all workers/nodes have been recycled - then it will be ok.
# ISSUES3 - Rolling back to old networking will require all nodes to be cycled/rolled, and the network is disrupted throughout.
Contributor

I really appreciate the comprehensive and detailed documentation here!

enable: false
content: |
[Unit]
Description=Perform actions which help when migrating from legacy to selfhosted networking.
Contributor

Shouldn't this be run when .Kubernetes.Networking.SelfHosting.Enabled is true?
The proposed implementation seems to be doing the opposite of that, or perhaps I'm missing something?

Contributor Author

Hi, no - the migration helper is intended as a safeguard for rolling the new networking into an existing cluster. It should only be deployed to legacy networking hosts, not to new nodes where SelfHostedNetworking has been enabled (i.e. no local flannel). What it does is stop the local flannel service and remove its devices so that a more rapid migration can be achieved. This minimises cluster disruption in the window between the first new controller coming up and adding the new daemonsets, and all of the nodes having been recycled. I did find an issue in my testing where the kubelet was still taking ip addresses from the old flannel rather than the new one, so I thought it would make sense to stop the old flannel so this can't happen.

Contributor

@davidmccormick I think I'm starting to understand the benefit of handling the migration from old to new networking for minimal downtime.

But on the other hand, I'm still unable to understand how this implementation helps with that. To me, this systemd unit seems to run "whenever a node w/ legacy networking starts", but not "before a node w/ new networking comes up".

Also, how does a node w/ legacy networking start successfully if you disable non-self-hosted flanneld this way?

Probably I'm still missing something...

@davidmccormick davidmccormick Apr 3, 2018
Contributor Author

Hi, sorry - I haven't explained it all that well and it is a little complicated. The net-migration-helper is deployed to all nodes when the self-hosting option is off/disabled - i.e. all legacy style nodes. On its own it won't do anything: it will only trigger if someone writes the file /etc/kubernetes/cni/net.d/net-migration, which I'm using as a kind of semaphore/trigger file. This file gets written when either the self-hosting canal or flannel daemonset is deployed (effectively signalling all legacy nodes that a migration is happening so they can shut down their local flannel). Migrated nodes don't need to worry about this file: they don't have a local flannel, so this service isn't deployed to them.

The whole goal of this is to minimise the disruption when rolling a legacy cluster to the new networking (if every node has to be rolled before the network is good this would cause lengthy disruption on large clusters). This way the cluster can reconfigure its networking with only a minute or less of disruption as it rolls out (in fact in some rolls there is no disruption at all).

As soon as the user selects self-hosting in their cluster.yaml and rolls their cluster, once the first new controller comes up and runs install-kube-system, it will deploy the canal or flannel daemonsets to ALL nodes - so for a while we have legacy nodes which have their previously running flannel on them AND the new flannel deployed by daemonset. By writing the semaphore file /etc/kubernetes/cni/net.d/net-migration as part of the daemonset, we have a way of getting the legacy nodes to stop their local flannel and prevent any clashes that can occur from having flannel running twice. Once all the nodes have rolled and there are no legacy nodes left in the cluster, the file has no purpose any longer and just gets ignored.

I saw some strange behaviour in testing that led me to believe that the safest way to deploy into an existing cluster is to roll the cluster so the net-migration helper goes out first on legacy nodes. I'm not convinced that it is needed in all cases, but it does give me a bit more security for performing a roll out into a large cluster.
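The trigger mechanism described above can be sketched as a systemd path/service pair (unit contents abbreviated and hypothetical; the real units are generated from the cloud-config templates):

```ini
# net-migration-helper.path -- watches for the semaphore file written
# out by the self-hosted canal/flannel daemonset.
[Path]
PathExists=/etc/kubernetes/cni/net.d/net-migration

[Install]
WantedBy=multi-user.target
```

```ini
# net-migration-helper.service -- started by systemd when the path unit
# fires; stops the node-local flannel so two flannels never manage the
# node at once during the migration.
[Service]
Type=oneshot
ExecStart=/usr/bin/systemctl stop flanneld.service
```

Because the two units share the net-migration-helper name, systemd starts the .service automatically as soon as the .path condition is met.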

Contributor Author

Do you need any more docs/testing?

@mumoshu mumoshu Apr 6, 2018
Contributor

Totally understood! Thank you so much for the clarification.

For anyone interested, a few points that help in understanding how this is implemented:

  • net-migration-helper.path is implicitly paired with net-migration-helper.service, so that once the file is created at the path, the .service counterpart is started by systemd.
  • A canal daemonset deployed via the first controller node rolled via kube-aws update to enable canal writes the file to all the controller and worker nodes, so that net-migration-helper.path is triggered on every legacy node for faster migration. Implementation: https://github.com/kubernetes-incubator/kube-aws/pull/1195/files#diff-ef25536c536667a40b993d4d24ab7567R1241

@cknowles
Contributor

cknowles commented Apr 5, 2018

Awesome! I would heart this PR more if I could. I believe this will resolve #704 as well, which I would need to work on soon. Might be worth commenting it as experimental in cluster.yaml?

@@ -284,8 +317,7 @@ coreos:
--container-runtime={{.ContainerRuntime}} \
--rkt-path=/usr/bin/rkt \
--rkt-stage1-image=coreos.com/rkt/stage1-coreos \
--node-labels node-role.kubernetes.io/master{{if .NodeLabels.Enabled}},{{.NodeLabels.String}} \
{{end}} \
--node-labels=node-role.kubernetes.io/master="",kubernetes.io/role=master,{{if .NodeLabels.Enabled}},{{.NodeLabels.String}}{{end}} \
Contributor

Not sure if it actually affects anything, but you might unexpectedly get a double comma when NodeLabels is enabled: kubernetes.io/role=master,,{{.NodeLabels.String}}

Contributor Author

Ah yes let me send a small correction

@mumoshu
Contributor

mumoshu commented Apr 6, 2018

@davidmccormick Thank you so much for all your awesome work. I've reviewed this again, and am happy to get this merged 👍

@mumoshu mumoshu added this to the v0.9.10.rc-1 milestone Apr 6, 2018
@mumoshu mumoshu merged commit 07eb6b0 into kubernetes-retired:master Apr 6, 2018
@jorge07
Contributor

jorge07 commented Apr 10, 2018

Did someone test this?

@davidmccormick
Contributor Author

I did lots of testing of it - are you having a particular issue?

@davidmccormick davidmccormick deleted the feature/networking-daemonsets branch April 10, 2018 14:37
@jorge07
Contributor

jorge07 commented Apr 10, 2018

We're having issues creating a cluster from master and it looks related to canal. I'll try to collect as many logs as possible and post them here asap.

@davidmccormick
Contributor Author

davidmccormick commented Apr 10, 2018

Ok - let me know what you have and what settings you are using in cluster.yaml.
Also please let me know if it is a new cluster or an existing one - and if existing, what was the configured networking: flannel, or flannel + calico?

@rastut

rastut commented Apr 11, 2018

Hey, I am a colleague of @jorge07. Yesterday we were doing some testing with the master branch and we found that the cluster was not able to run containers correctly. We are running the cluster without self-hosting the network. We have been looking through the logs, and the problem was that kubelet was not able to locate the cni binaries.

Apr 11 10:42:10 ip-10-0-2-59.us-east-2.compute.internal kubelet-wrapper[1410]: E0411 10:42:10.546050    1410 cni.go:259] Error adding network: failed to find plugin "loopback" in path [/opt/loopback/bin /opt/cni/bin]
Apr 11 10:42:10 ip-10-0-2-59.us-east-2.compute.internal kubelet-wrapper[1410]: E0411 10:42:10.546706    1410 cni.go:220] Error while adding to cni lo network: failed to find plugin "loopback" in path [/opt/loopback/bin /opt/cni/bin]
Apr 11 10:42:10 ip-10-0-2-59.us-east-2.compute.internal kubelet-wrapper[1410]: E0411 10:42:10.547302    1410 cni.go:259] Error adding network: failed to find plugin "loopback" in path [/opt/loopback/bin /opt/cni/bin]
Apr 11 10:42:10 ip-10-0-2-59.us-east-2.compute.internal kubelet-wrapper[1410]: E0411 10:42:10.551897    1410 cni.go:220] Error while adding to cni lo network: failed to find plugin "loopback" in path [/opt/loopback/bin /opt/cni/bin]

After taking a look, I found that the kubelet systemd unit mounts the host path /opt/cni/bin inside the kubelet container; this overrides the container's path with the host path, so the kubelet is not able to find the cni binaries.

https://github.com/kubernetes-incubator/kube-aws/blob/3f116956373f6fa817618b257b4479a8cb464b45/core/controlplane/config/templates/cloud-config-worker#L329-L330

This is the output of one of the nodes that currently are not working correctly:

core@ip-10-0-2-59 /opt/cni/bin $ rkt list                                                                                                  
UUID		APP		IMAGE NAME				STATE		CREATED		STARTED		NETWORKS
9d5945e6	awscli		quay.io/coreos/awscli:master		exited garbage	1 hour ago	1 hour ago	
9f0dfd23	flannel		quay.io/coreos/flannel:v0.10.0		exited		1 hour ago	1 hour ago	
eadd1353	flannel		quay.io/coreos/flannel:v0.10.0		running		1 hour ago	1 hour ago	
f1dcc1fc	hyperkube-amd64	k8s.gcr.io/hyperkube-amd64:v1.9.1	running		59 minutes ago	59 minutes ago	
core@ip-10-0-2-59 /opt/cni/bin $ sudo rkt enter f1dcc1fc                                                                                   
enter: no command specified, assuming "/bin/bash"
root@ip-10-0-2-59:/# ls /opt/cni/bin
root@ip-10-0-2-59:/# 

If we delete the argument from the unit then the containers begin to run correctly, and inside the kubelet container we can see the binaries:

core@ip-10-0-1-182 /usr/lib64/systemd $ sudo rkt enter 4cd0af93                                                                  
enter: no command specified, assuming "/bin/bash"
root@ip-10-0-1-182:/# ls /opt/cni/bin
bridge	dhcp  flannel  host-local  ipvlan  loopback  macvlan  portmap  ptp  sample  tuning  vlan
root@ip-10-0-1-182:/#

@davidmccormick
Contributor Author

davidmccormick commented Apr 11, 2018

Hi, thanks for the info - I will have a look at making the mount dependent on the networking selected. I'm guessing that you are not running with calico, as that would have installed the cni binaries as part of its bootstrap.

The quickest fix, whilst I work on testing a proper one, would be to enable the kubernetes networking self-hosting feature with type 'flannel' (which would also allow more testing of this feature). Apologies for the bug; I'll open a pull request with a fix as soon as I am happy with it.

@ktateish
Contributor

Hi all,
I've opened an issue #1232 without noticing this discussion. (Thanks @kevtaylor for notifying me)

I'll keep #1232 open for users who search issues with ContainerCreating, so please include Close #1232 in your commit message when you create a fix commit for this problem in the future.

I'll also keep investigating it.

@rastut

rastut commented Apr 11, 2018

Yep, not using calico. I have just tried some changes, modifying the template, but then I get a strange failure. If I go through the templates and modify the mount, I get an error relating to signature verification of the image.

Apr 11 13:05:52 ip-10-0-2-229.us-east-2.compute.internal kubelet-wrapper[2258]: run: signature verification for docker images is not supported (try --insecure-options=image)

@davidmccormick
Contributor Author

Thanks, it looks like I introduced this bug with cni after my flannel testing. I'm testing an update to the cloud-config that copies the binaries into the host /opt/cni/bin directory. I need to keep this mounted so that users can switch from legacy flannel/calico mode to the self-hosting daemonsets with minimal cluster disruption.
