Fix and improve migrate-nodegroup test #214
ac38a22 to b549c9b
@@ -81,15 +81,20 @@ spec:
       tolerations:
       - operator: Exists
       containers:
       - image: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:v1.5.0
Does this now match any released version of this YAML from the source? I would love to keep this identical to a released version if possible.
The upstream YAML manifest and ours match, apart from the diff between upstream and ours shown below.
The main differences are:
- Removing the image and `AWS_VPC_K8S_CNI_LOGLEVEL`, as we set those above in `cni.ts`
- Enabling the readiness & liveness probes. These are commented out upstream because the healthz endpoint is only available on CNI images >= `v1.5.2`, which we use now.
$ diff -u upstream-aws-k8s-cni.yaml aws-k8s-cni.yaml
--- upstream-aws-k8s-cni.yaml 2019-08-08 11:28:15.838079451 -0700
+++ aws-k8s-cni.yaml 2019-08-07 19:12:21.888614038 -0700
@@ -81,23 +81,20 @@
       tolerations:
       - operator: Exists
       containers:
-      - image: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:v1.5.2
-        imagePullPolicy: Always
+      - imagePullPolicy: Always
         ports:
         - containerPort: 61678
           name: metrics
         name: aws-node
-        #readinessProbe:
-        #  exec:
-        #    command: ["/app/grpc_health_probe", "-addr=:50051"]
-        #  initialDelaySeconds: 5
-        #livenessProbe:
-        #  exec:
-        #    command: ["/app/grpc_health_probe", "-addr=:50051"]
-        #  initialDelaySeconds: 5
+        readinessProbe:
+          exec:
+            command: ["/app/grpc_health_probe", "-addr=:50051"]
+          initialDelaySeconds: 5
+        livenessProbe:
+          exec:
+            command: ["/app/grpc_health_probe", "-addr=:50051"]
+          initialDelaySeconds: 5
         env:
-          - name: AWS_VPC_K8S_CNI_LOGLEVEL
-            value: DEBUG
           - name: MY_NODE_NAME
             valueFrom:
               fieldRef:
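The version condition behind the probes (the healthz endpoint only exists on CNI images >= v1.5.2) could be checked in code before enabling them. The following is a minimal sketch, not something taken from this PR: `parseCniVersion` and `supportsHealthProbes` are hypothetical helper names, and the sketch assumes the image reference ends in a plain `:vX.Y.Z` tag.

```typescript
// Hypothetical helpers for illustration; neither function exists in the PR.

/** Parses ".../amazon-k8s-cni:v1.5.2" into [1, 5, 2]; undefined without a plain vX.Y.Z tag. */
function parseCniVersion(image: string): [number, number, number] | undefined {
    const match = /:v(\d+)\.(\d+)\.(\d+)$/.exec(image);
    if (!match) {
        return undefined;
    }
    return [Number(match[1]), Number(match[2]), Number(match[3])];
}

/** Returns true when the image tag is at least v1.5.2, the first release with the healthz endpoint. */
function supportsHealthProbes(image: string): boolean {
    const version = parseCniVersion(image);
    if (!version) {
        // Unknown or non-semver tag: assume the probes are not safe to enable.
        return false;
    }
    const minimum = [1, 5, 2];
    // Compare major, minor, patch in order; the first differing part decides.
    for (let i = 0; i < 3; i++) {
        if (version[i] !== minimum[i]) {
            return version[i] > minimum[i];
        }
    }
    return true; // Exactly v1.5.2.
}
```

With this check, the v1.5.0 image from the earlier hunk would keep the probes disabled, while the v1.5.2 image would enable them.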
64c053e to 922a509
I'll switch to the override approach instead of the current one, and stop removing bits from the YAML manifest directly, so that it stays as close to upstream as we can keep it. Though note that
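One way to picture the override approach: the manifest stays identical to upstream, and the image and log level are replaced on a copy in code. This is only a sketch under assumptions, not the code in `cni.ts`; `applyCniOverrides`, `CniOverrides`, and the structural types are hypothetical names introduced for illustration.

```typescript
// Minimal structural types covering only the fields this sketch touches.
interface EnvVar { name: string; value?: string; valueFrom?: unknown; }
interface Container { name: string; image?: string; env?: EnvVar[]; }
interface DaemonSet {
    spec: { template: { spec: { containers: Container[] } } };
}

// Hypothetical override shape; the field names are illustrative.
interface CniOverrides { image?: string; logLevel?: string; }

const LOG_LEVEL_VAR = "AWS_VPC_K8S_CNI_LOGLEVEL";

/** Returns a copy of the upstream DaemonSet with overrides applied to the aws-node container. */
function applyCniOverrides(upstream: DaemonSet, overrides: CniOverrides): DaemonSet {
    // Deep-copy so the parsed upstream manifest is never mutated.
    const result: DaemonSet = JSON.parse(JSON.stringify(upstream));
    for (const container of result.spec.template.spec.containers) {
        if (container.name !== "aws-node") {
            continue;
        }
        if (overrides.image !== undefined) {
            container.image = overrides.image;
        }
        if (overrides.logLevel !== undefined) {
            const env = container.env || [];
            const existing = env.find(e => e.name === LOG_LEVEL_VAR);
            if (existing) {
                existing.value = overrides.logLevel; // Replace the upstream default.
            } else {
                env.push({ name: LOG_LEVEL_VAR, value: overrides.logLevel });
            }
            container.env = env;
        }
    }
    return result;
}
```

Because the upstream object is left untouched, a plain `diff` against the released manifest stays empty, and every deviation lives in one place in code.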
922a509 to 44151b3
Feedback has been addressed. PTAL @lukehoban. Updated diff in the YAML manifest. The diffs are:
44151b3 to 47c5033
LGTM
Great that this is ultimately addressed by fixes in 1.5.2 CNI!
Agreed! Also, the explicit removal of the NGINX workload and
47c5033 to b145e46
Fixes #194.
The changes in this fix resolve a leaked ENI that led to a dependency violation on teardown. It was resolved by bumping `aws-cni` to v1.5.2, which was recently released [1]. We now also:
`aws-cni` DaemonSet to ensure these get explicitly torn down.

This fix was tested across 70+ runs of the `migrate-nodegroup` test, and no run hit issue #194.

Additionally, `logLevel`, `logFile`, and `image` have been added as options to `VpcCniOptions`, and the logging defaults have been adjusted to be verbose and to be produced for the `aws-cni` Pods.

[1] - The specific fixes in `aws-cni` v1.5.2 were:
- `healthz` endpoint: Adding healthz endpoint to IPamD aws/amazon-vpc-cni-k8s#548 and Update start script to wait for ipamd health check aws/amazon-vpc-cni-k8s#553. The `healthz` endpoint is used in the readiness & liveness probes of the DaemonSet.
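As a rough illustration of how the new logging options might resolve to container environment variables, here is a small sketch. It is not the PR's implementation: the interface below lists only the three options named in the description, their types are assumed, and the default values (`DEBUG`, `stdout`) as well as the `AWS_VPC_K8S_CNI_LOG_FILE` variable name and the `resolveLoggingEnv` helper are assumptions made for the example.

```typescript
// Only the three options named in the description; types are assumed.
interface VpcCniOptions {
    logLevel?: string;
    logFile?: string;
    image?: string;
}

// Assumed defaults: verbose logging, written where Pod logs can pick it up.
const DEFAULT_LOG_LEVEL = "DEBUG";
const DEFAULT_LOG_FILE = "stdout";

/** Hypothetical helper: turns the logging options into env entries for the aws-node container. */
function resolveLoggingEnv(options: VpcCniOptions = {}): { name: string; value: string }[] {
    return [
        { name: "AWS_VPC_K8S_CNI_LOGLEVEL", value: options.logLevel || DEFAULT_LOG_LEVEL },
        // Variable name assumed for this sketch.
        { name: "AWS_VPC_K8S_CNI_LOG_FILE", value: options.logFile || DEFAULT_LOG_FILE },
    ];
}
```

Calling `resolveLoggingEnv()` with no arguments yields the assumed verbose defaults, while `resolveLoggingEnv({ logLevel: "INFO" })` changes only the level and leaves the log destination at its default.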