Add networking-daemonsets feature #1195
Conversation
I need to update the tests with the expectation of the NetworkingDaemonSets type which gets set up with default values.
Force-pushed from de5fc2e to d0b6895
Codecov Report
@@ Coverage Diff @@
## master #1195 +/- ##
==========================================
+ Coverage 36.2% 36.42% +0.22%
==========================================
Files 63 63
Lines 3823 3846 +23
==========================================
+ Hits 1384 1401 +17
- Misses 2224 2229 +5
- Partials 215 216 +1
Continue to review full report at Codecov.
Some comments on naming, but code LGTM overall!
Thank you very much for your efforts 🎉
core/controlplane/config/config.go
Outdated
type NetworkingDaemonSets struct {
	Enabled bool `yaml:"enabled"`
	Typha bool `yaml:"typha"`
	CalicoNodeImage model.Image `yaml:"calico-node-image"`
Super nit but would you mind changing yaml keys to lowerCamelCase for consistency with other existing keys?
Sure - no problem
- name: kubelet.service
  command: start
  runtime: true
  content: |
    [Unit]
    {{ if not .Experimental.NetworkingDaemonSets.Enabled -}}
- How about renaming NetworkingDaemonSets to e.g. SelfHostedCanal? After reading the canal doc, I now believe we can safely call it so.
- Can we move the new configuration (NetworkingDaemonSets, before my suggested rename) out of .Experimental? Sorry for the confusion, but we're basically avoiding using the .Experimental key anymore. We're pre-1.0 and there is no strict rule about what's considered experimental, when it graduates from experimental, and so on.
- However, a note about the experimental feature in the cluster.yaml comments would be welcome!
I didn't want to name it Canal, as you can have self-hosted flannel if you don't supply UseCalico: true. Perhaps another name better captures the hosted-in-kubernetes, removed-from-coreos-supplied-flannel-ness of the change? Shall we call it KubeHostedNetworking, or SelfHostedNetworkFabric maybe?
I'll remove it from Experimental and make it top level?
Thanks for pointing it out! That makes a lot of sense.
How about Kubernetes.Networking.SelfHosting then?
Probably we can also consider migrating other network-related keys like useCalico under that for more cohesion, in the future.
Ok, I'll make those changes and re-push
Err, did you want me to create a Kubernetes stanza? If so, should I move kubeDns, kubeProxy, kubernetesDashboard, plus lots of others into it? (Feels like quite a big change.)
Or create it just for my networking settings for now?
#kubernetes:
# networking:
# selfHosting:
# enabled: false
# typha: false
# calicoNodeImage:
# repo: quay.io/calico/node
# tag: v3.0.3
# calicoCniImage:
# repo: quay.io/calico/cni
# tag: v2.0.1
# flannelImage:
# repo: quay.io/coreos/flannel
# tag: v0.9.1
# flannelCniImage:
# repo: quay.io/coreos/flannel-cni
# tag: v0.3.0
# typhaImage:
# repo: quay.io/calico/typha
# tag: v0.6.2
Or could I get away with KubernetesNetworkingSelfHosting? e.g.
#kubernetesNetworkingSelfHosting:
# enabled: false
# typha: false
# calicoNodeImage:
# repo: quay.io/calico/node
# tag: v3.0.3
# calicoCniImage:
# repo: quay.io/calico/cni
# tag: v2.0.1
# flannelImage:
# repo: quay.io/coreos/flannel
# tag: v0.9.1
# flannelCniImage:
# repo: quay.io/coreos/flannel-cni
# tag: v0.3.0
# typhaImage:
# repo: quay.io/calico/typha
# tag: v0.6.2
This is what I meant!
Force-pushed from f86e848 to 8e2eb2a
…osting and squash a few bugs hampering migrations; rebuild templates after rebase and merge from master
Force-pushed from 8e2eb2a to e1d19bb
Hi, I've updated the settings to kubernetes.networking.selfhosting, squashed a couple of bugs, and merged the latest changes. I have new versions of the template files and these seem to be conflicting; I'm not sure how to clear that. Is there a way to select my versions? Can you do that?
@@ -270,6 +270,7 @@ coreos:
{{end -}}

      [Service]
      Type=simple
etcd rollout stalls at the first etcd server in a new cluster when the etcdadm-check service (etcd disaster recovery enabled) is enabled. The etcd-member service is Type=notify by default, but etcd won't send the notify until the cluster is healthy, so systemd reports etcd as starting rather than started even though etcd is running ok. etcdadm-check won't start until etcd-member has started, so no cfn-signal can be sent to start the subsequent etcd servers. This means that your roll out gets stuck at the first etcd server. By changing the type of etcd-member to 'simple' we allow etcdadm-check to start, and the CloudFormation signal can go out. There is no side effect from changing the type, so as soon as etcd has been started everything can continue.
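As a sketch, the override being described could be expressed as a systemd drop-in (the file path is hypothetical, and note from the rest of this thread that the change was ultimately reverted pending separate discussion):

```
# Hypothetical drop-in: /etc/systemd/system/etcd-member.service.d/10-simple.conf
# Overrides etcd-member from its default Type=notify so that dependent units
# such as etcdadm-check can start as soon as the etcd process is launched,
# letting the cfn-signal go out and the rollout proceed past the first node.
[Service]
Type=simple
```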
@davidmccormick Good point!
But I take this change to actually have a side effect: we no longer postpone the rolling update of etcd nodes in case etcd-member fails to join the existing cluster. I think this is better explained in #1206 (comment).
Not saying this change is no good. However, at the least, this change should be discussed and made separately!
That being said, would you mind reverting this?
removed it
    "ToPort": 5473
  },
  "Type": "AWS::EC2::SecurityGroupIngress"
},
Note to self: This rule should be enabled only when typha is enabled.
# ISSUES1 - calico iptables rules remain on the nodes; recycle nodes to remove.
# ISSUES2 - Existing pod ip addresses are not tracked and can cause clashing ips - please cordon all nodes whilst
#           performing the upgrade until all workers/nodes have been recycled - then it will be ok.
# ISSUES3 - Rolling back to old networking requires all nodes to be cycled/rolled, and the network is disrupted throughout.
I really appreciate the comprehensive and detailed documentation here!
  enable: false
  content: |
    [Unit]
    Description=Perform actions which help when migrating from legacy to selfhosted networking.
Shouldn't this be run when .Kubernetes.Networking.SelfHosting.Enabled is true?
The proposed implementation seems to be doing the opposite of that, or perhaps I'm missing something?
Hi, no - the migration helper is intended as a safeguard for rolling the new networking into an existing cluster. It should only be deployed to legacy networking hosts, not to new nodes where SelfHostedNetworking has been enabled (i.e. no local flannel). What it does is stop the local flannel service and remove its devices so that a more rapid migration can be achieved. This is in order to minimise cluster disruption in the window between the first new controller coming up and adding the new daemonsets, and all of the nodes having been recycled. I did find an issue in my testing where the kubelet was still taking ip addresses from the old flannel rather than the new, so I thought it would make sense to stop it so this can't happen.
@davidmccormick I think I've started to understand the value of taking care of the migration from old to new networking for minimal downtime.
But on the other hand, I'm still unable to understand why this implementation helps with that. To me, this systemd unit seems to run "whenever a node w/ legacy networking starts", but not "before a node w/ new networking comes up".
Also, how does a node w/ legacy networking start successfully if you disable non-self-hosted flanneld this way?
Probably I'm still missing something...
Hi, sorry - I haven't explained it all that well, and it is a little complicated. The net-migration-helper is deployed to all nodes when the self-hosting option has been set to off/disabled - i.e. all legacy style nodes. On its own it won't do anything - you see, it will only trigger if someone writes a file /etc/kubernetes/cni/net.d/net-migration, which I'm using as a kind of semaphore/trigger file. This file gets written when either of the self-hosting canal or flannel daemonsets is deployed (effectively signalling all legacy nodes that a migration is happening so they can shut down their local flannel). Migrated nodes don't need to worry about this file - they don't have a local flannel, so this service isn't deployed to them.
The whole goal of this is to minimise the disruption when rolling a legacy cluster to the new networking (if every node had to be rolled before the network was good, this would cause lengthy disruption on large clusters). This way the cluster can reconfigure its networking with only a minute or less of disruption as it rolls out (in fact in some rolls there is no disruption at all).
As soon as the user selects self-hosting in their cluster.yaml and rolls their cluster, and once the first new controller comes up and runs install-kube-system, it will deploy the canal or flannel daemonsets to ALL nodes - so for a while we have legacy nodes which have their previously running flannel on them AND the new flannel deployed by daemonset. By writing this semaphore file /etc/kubernetes/cni/net.d/net-migration as part of the daemonset, we have a way of getting the legacy nodes to stop their local flannel and prevent any clashes that can occur from having flannel running twice. Once all the nodes have rolled and there are no legacy nodes in the cluster, the file has no purpose any longer and just gets ignored.
I saw some strange behaviour in testing that led me to believe that the safest way to deploy into an existing cluster is to roll the cluster so the net-migration helper goes out first on legacy nodes. I'm not convinced that it is needed in all cases, but it does give me a bit more security when performing a roll out into a large cluster.
Do you need any more docs/testing?
Totally understood! Thank you so much for the clarification.
For anyone interested, a few points you should know to understand how this is implemented:
- net-migration-helper.path is implicitly paired with net-migration-helper.service, so that once the file is created on the path, the .service counterpart is started by systemd.
- A canal daemonset deployed via the first controller node rolled via kube-aws update to enable canal writes the file to all the controller and worker nodes, so that net-migration-helper.path is triggered on every legacy node for faster migration. Implementation: https://github.com/kubernetes-incubator/kube-aws/pull/1195/files#diff-ef25536c536667a40b993d4d24ab7567R1241
Awesome! I would heart this PR more if I could. I believe this will resolve #704 as well, which I would need to work on soon. Might be worth commenting it as experimental in cluster.yaml?
@@ -284,8 +317,7 @@ coreos:
      --container-runtime={{.ContainerRuntime}} \
      --rkt-path=/usr/bin/rkt \
      --rkt-stage1-image=coreos.com/rkt/stage1-coreos \
      --node-labels node-role.kubernetes.io/master{{if .NodeLabels.Enabled}},{{.NodeLabels.String}} \
      {{end}} \
      --node-labels=node-role.kubernetes.io/master="",kubernetes.io/role=master,{{if .NodeLabels.Enabled}},{{.NodeLabels.String}}{{end}} \
Not sure if it actually affects anything, but you might unexpectedly get a double comma when NodeLabels is enabled: kubernetes.io/role=master,,{{.NodeLabels.String}}
Ah yes let me send a small correction
@davidmccormick Thank you so much for all your awesome work. I've reviewed this again, and am happy to get this merged 👍
Did someone test this?
I did lots of testing of it - are you having a particular issue?
We're having issues creating a cluster from master, and it looks related to canal. I'll try to collect as many logs as possible and post them here asap.
Ok - let me know what you have and what settings you are using in cluster.yaml.
Hey, I am a colleague of @jorge07. Yesterday we were doing some testing with the master branch, and we found that the cluster was not able to run containers correctly. We are running the cluster without self-hosting the network. We have been looking through logs, and the problem was that kubelet was not able to locate the cni binaries.
After taking a look, basically I found that the kubelet systemd unit is mounting the host path. This is the output of one of the nodes that currently are not working correctly:
If we delete the argument from the unit, then the containers begin to run correctly, and in the kubelet container we can see the binaries:
Hi, thanks for the info - I will have a look at making the mount dependent on the networking selected. I'm guessing that you are not running with calico, as this would have installed the cni binaries as part of its bootstrap. The quickest fix, whilst I work on testing the proper one, would be to enable the kubernetes networking self-hosting feature with type 'flannel' (which would also allow more testing of this feature). Apologies for the bug; I'll be opening a pull request with a fix as soon as I am happy with it.
Hi all, I'll keep #1232 open for users searching for this issue. I'll also keep investigating it.
Yep, not using calico. I have just tried some changes, modifying the template, but then I am getting a strange failure. If I go through the templates and modify the mount, then I get an error relating to signature verification of the image.
Thanks, it looks like I introduced this bug with cni after my flannel testing. I'm testing an update to the cloud-config that copies the binaries into the host /opt/cni/bin directory. I need to keep this mounted so that users can switch from legacy flannel/calico mode to the self-hosted daemonsets with minimal cluster disruption.
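A cloud-config sketch of the kind of fix being described (the unit name and the staging path /tmp/cni-staging are assumptions for illustration, not the actual patch):

```yaml
# Hypothetical oneshot unit: copy bundled CNI plugin binaries onto the host
# before kubelet starts, so legacy (non-self-hosted) networking still finds
# them while /opt/cni/bin stays mounted for a later migration to self-hosting.
- name: install-cni-bin.service
  command: start
  content: |
    [Unit]
    Description=Install CNI plugin binaries into /opt/cni/bin
    Before=kubelet.service
    [Service]
    Type=oneshot
    ExecStartPre=/usr/bin/mkdir -p /opt/cni/bin
    ExecStart=/usr/bin/cp -n -a /tmp/cni-staging/. /opt/cni/bin/
```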
Deploy Calico and Flannel networking as a daemonset with the kubernetes API as the backing store. Removes the need for nodes connecting to etcd and frees up node podCIDR leases faster, addressing cluster role issue flannel-io/flannel#954.
Experimental feature, disabled by default.
Kubernetes Controllers become responsible for allocating node cidrs.
Switch between calico+flannel (canal) or flannel.
Fast roll out into existing clusters with minimal disruption.
Optional calico typha service for easing load on apiservers in large clusters.