refactor(vpcCni): set node anti-affinity to not deploy to fargate #291
Conversation
2e45a6b to aeb38ea (force-pushed)
nodejs/eks/cni/aws-k8s-cni.yaml (outdated)

@@ -76,6 +76,8 @@ spec:
             operator: In
             values:
               - amd64
+          - key: "eks.amazonaws.com/compute-type"
+            operator: DoesNotExist
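For context, this is roughly what the full nodeAffinity stanza of the `aws-node` DaemonSet looks like with the change applied. The os/arch expressions and the indentation are assumed from the upstream manifest around this hunk; only the last expression is added by this PR. Because every expression in a single `matchExpressions` list must be satisfied, any node that carries the `eks.amazonaws.com/compute-type` label is excluded from scheduling:

```yaml
# Sketch of the DaemonSet's node affinity after this change (the os/arch
# expressions are assumed from the upstream aws-k8s-cni.yaml).
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: "beta.kubernetes.io/os"
              operator: In
              values:
                - linux
            - key: "beta.kubernetes.io/arch"
              operator: In
              values:
                - amd64
            # Added by this PR: skip any node carrying the Fargate compute-type label.
            - key: "eks.amazonaws.com/compute-type"
              operator: DoesNotExist
```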
This is diverging from the upstream CNI spec, right?
How does EKS natively handle this? They install the CNI by default - https://docs.aws.amazon.com/eks/latest/userguide/cni-upgrades.html - so what happens if you just manually create a cluster and add Fargate profiles that cause all pods to go to Fargate? Do you still end up with DaemonSets trying to launch pods on the fake nodes?
What’s not clear to me is how this is supposed to work - I see no mention of this issue in EKS docs or in eksctl issues.
> This is diverging from the upstream CNI spec, right?

Yes.

> How does EKS natively handle this?

Docs are lacking, so it's hard to tell.

> what happens if you just manually create a cluster and add Fargate profiles that cause all pods to go to Fargate? Do you still end up with DaemonSets trying to launch pods on the fake nodes?

For an all-Fargate cluster, the `aws-node` DaemonSet still gets created, but with this change you don't see any Pods attempting to run on the Fargate nodes, as the node labels no longer match for them. Prior to this change, we'd see `aws-node` pods stuck in the `Pending` state, trying to run on the Fargate nodes and not succeeding. That's no longer the case in this scenario, since the DS will only start Pods on EC2 or managed nodes, if they exist.
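To make "the node labels no longer match" concrete, a Fargate-backed node advertises the compute-type label, which trips the `DoesNotExist` expression. The node name and extra labels below are purely illustrative; only the compute-type label (stated in this PR) matters here:

```yaml
# Hypothetical (abridged) metadata for a Fargate-backed node.
apiVersion: v1
kind: Node
metadata:
  name: fargate-ip-192-168-1-23.us-west-2.compute.internal  # illustrative name
  labels:
    eks.amazonaws.com/compute-type: fargate  # this label blocks aws-node scheduling
    kubernetes.io/os: linux                  # illustrative
    kubernetes.io/arch: amd64                # illustrative
```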
> What’s not clear to me is how this is supposed to work - I see no mention of this issue in EKS docs or in eksctl issues.

Agreed. The docs are slim for this scenario, both for the hybrid cluster case and for an all-Fargate cluster.
FWIW, the initial attempt at this PR was to move the creation of the `VpcCni` resource from `createCore()` into the create function for self-managed or managed node groups, to exclude it for Fargate nodes. That change would either alter the established APIs and break existing clusters that expect to pass `vpcCniOptions` into the Cluster definition, or result in attempting to deploy the `aws-node` DS per node group used - causing resource dupes on Pulumi's end and attempts at creating multiple DS objects in the k8s control plane (e.g. if you had many node groups).

Setting the node anti-affinity in the CNI spec seemed like the less intrusive change overall, given that our end goal is to not deploy it on Fargate nodes while still ensuring the DS exists if the cluster mixes and matches node types along with Fargate nodes.
cc @lblackstone Thoughts?
This change seems reasonable to me. Affinity is the right way to avoid scheduling on nodes that don't support particular workloads.
We should just make sure the divergence is documented.
It appears there is a divergence between what AWS installs in practice, and what they have in the VPC CNI repo on GitHub.
Opened an issue to understand whether that's intentional - but we should align with what they actually install (which this PR almost does - we should match them exactly):
Done
52e6dd1 to 715f1e3 (force-pushed)
Feedback has been addressed. PTAL @lukehoban @lblackstone
1b64081 to 5a67a4b (force-pushed)
5a67a4b to 02d3bc2 (force-pushed)
Proposed changes
This change sets a node anti-affinity for the AWS CNI DaemonSet so that it only deploys to nodes that are not labeled for Fargate with `eks.amazonaws.com/compute-type=fargate`.

By default, the DaemonSet uses a `RollingUpdate` [strategy](https://kubernetes.io/docs/tasks/manage-daemon/update-daemon-set/#performing-a-rolling-update) with a `maxUnavailable` of 1 pod. This means that on a Pulumi update to this change for existing clusters, the `aws-node` pods will update one at a time until all are updated with the new node anti-affinity. This will not cause downtime for the cluster or running pods.

In summary, this change allows clusters to:

- Mix and match the type of nodes used (EC2 / self-managed, AWS managed, and/or Fargate),
- Deploy the aws-cni to the cluster for any configuration of nodes, and
- Although the DS will always be created, it will only deploy to nodes that are not Fargate-backed.
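For reference, the rollout behavior described above corresponds to the DaemonSet defaults. A minimal sketch of the relevant stanza, written out explicitly (these are the Kubernetes default values, not something added by this PR):

```yaml
# DaemonSet rollout settings assumed here: RollingUpdate with at most one
# aws-node pod unavailable at a time, so the anti-affinity change rolls out
# node by node without cluster downtime.
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
```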
Related issues (optional)
Fixes #286