Add support for Amazon VPC CNI plugin #3997

Merged Dec 18, 2017 (1 commit)

Conversation

@aledbf (Member) commented Dec 3, 2017

TODO:

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 3, 2017
@chrislovecnm (Contributor)

We need to ensure this is not installed on k8s versions lower than 1.7. Can we add something to validation.go?
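
A minimal sketch of such a check, standalone for clarity (the function name, its placement, and the use of blang/semver are assumptions, not the actual diff):

package validation // hypothetical placement, for illustration only

import (
    "fmt"

    "github.com/blang/semver"
)

// validateAmazonVPCNetworking rejects the amazon-vpc-routed-eni provider on
// Kubernetes versions below 1.7. Sketch only; the real validation.go differs.
func validateAmazonVPCNetworking(kubernetesVersion string) error {
    sv, err := semver.ParseTolerant(kubernetesVersion)
    if err != nil {
        return fmt.Errorf("unable to parse kubernetesVersion %q: %v", kubernetesVersion, err)
    }
    if sv.LT(semver.MustParse("1.7.0")) {
        return fmt.Errorf("amazon-vpc-routed-eni networking requires Kubernetes 1.7 or later (got %s)", kubernetesVersion)
    }
    return nil
}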

@liwenwu-amazon

I am not seeing any IAM role change in this diff. The worker nodes need to have the following IAM policy:

 {
     "Effect": "Allow",
     "Action": [
         "ec2:CreateNetworkInterface",
         "ec2:AttachNetworkInterface",
         "ec2:DeleteNetworkInterface",
         "ec2:DetachNetworkInterface",
         "ec2:DescribeNetworkInterfaces",
         "ec2:DescribeInstances",
         "ec2:ModifyNetworkInterfaceAttribute",
         "ec2:AssignPrivateIpAddresses"
     ],  
     "Resource": [
         "*" 
     ]   
 },
 {
     "Effect": "Allow",
     "Action": "tag:TagResources",
     "Resource": "*" 
 },

@chrislovecnm (Contributor)

Thanks, @liwenwu-amazon - can we set the IAM perms so that the CNI provider only has perms for that node? If the ds is running on ip-172-31-23-208, then the CNI provider would only have perms for DetachNetworkInterface on the EC2 instance ip-172-31-23-208. If my node is compromised, I am giving DetachNetworkInterface for my entire account, and I would prefer not to.

Also, when we delete a cluster or delete a node, does kops need to clean up all of the ENIs? I just thought about cleanup upon deletion, since we are creating new networking components.
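
One way to audit for leftover ENIs after a delete is the standard AWS CLI, a sketch (the vpc-id value is a placeholder):

# list ENIs remaining in the cluster's VPC; anything still present after
# kops delete cluster finishes is a leak candidate
aws ec2 describe-network-interfaces \
    --filters Name=vpc-id,Values=<vpc-id> \
    --query 'NetworkInterfaces[].[NetworkInterfaceId,Status,Description]' \
    --output table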

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 5, 2017
@aledbf force-pushed the amazon-vpc-cni branch 4 times, most recently from 3b318b5 to 9fc5a9d on December 7, 2017 01:16
@aledbf (Member Author) commented Dec 7, 2017

> Also, when we delete a cluster or delete a node, does kops need to clean up all of the ENIs? I just thought about cleanup upon deletion, since we are creating new networking components.

I don't see additional ENIs in the console, just on the nodes:

[screenshot: EC2 console, 2017-12-06]

enifc1e629495 Link encap:Ethernet  HWaddr 6a:74:5d:79:2f:30  
          inet6 addr: fe80::6874:5dff:fe79:2f30/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:84 errors:0 dropped:2 overruns:0 frame:0
          TX packets:49 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:5356 (5.2 KiB)  TX bytes:3426 (3.3 KiB)

@aledbf (Member Author) commented Dec 7, 2017

The only change still missing from this PR to make this work out of the box is the cleanup of security group inbound rules. It should look like this:
[screenshots: security group inbound rules, 2017-12-07]

@aledbf (Member Author) commented Dec 7, 2017

@chrislovecnm @justinsb how can we measure (or compare) the performance of this CNI provider?
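
One rough approach, a sketch that is not part of this PR (the networkstatic/iperf3 image is an assumption), is pod-to-pod throughput with iperf3, repeated under each CNI provider on identical instance types:

# start an iperf3 server pod and find its pod IP
kubectl run iperf3-server --image=networkstatic/iperf3 --restart=Never -- -s
kubectl get pod iperf3-server -o wide

# run a client pod against the server's pod IP (replace <server-pod-ip>)
kubectl run iperf3-client --image=networkstatic/iperf3 --restart=Never -it --rm -- -c <server-pod-ip>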

@aledbf force-pushed the amazon-vpc-cni branch 4 times, most recently from c5e635d to 56d1e23 on December 7, 2017 19:45
@aledbf (Member Author) commented Dec 7, 2017

Right now it is not possible to limit the scope of the permissions.
[screenshot, 2017-12-07]

@aledbf aledbf changed the title WIP: Add support for Amazon VPC CNI plugin Add support for Amazon VPC CNI plugin Dec 7, 2017
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 7, 2017
@aledbf (Member Author) commented Dec 15, 2017

@liwenwu-amazon this is the procedure:

  1. build and upload kops
export S3_BUCKET_NAME=<some bucket you own>

export KOPS_STATE_STORE=s3://${S3_BUCKET_NAME}
export KOPS_BASE_URL=https://${S3_BUCKET_NAME}.s3.amazonaws.com/kops/dev/

make kops-install upload S3_BUCKET=s3://${S3_BUCKET_NAME} VERSION=dev
  2. create a cluster
kops create cluster \
    --zones us-east-1a,us-east-1b,us-east-1c \
    --dns private \
    --vpc vpc-0066bd79 \
    --node-count 5 \
    --master-size m3.xlarge \
    --networking amazon-vpc-routed-eni \
    --kubernetes-version 1.8.0 $NAME -v 10 
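
Once the cluster comes up, a quick sanity check (kops validate cluster is a standard kops command; the k8s-app=aws-node label matches the addon manifest quoted below):

kops validate cluster
kubectl get pods -n kube-system -l k8s-app=aws-node -o wide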

@liwenwu-amazon

@aledbf Thank you for the instructions. I am still getting the same error.

Just curious: does the following file name need to be changed to networking.amazon-vpc-routed-eni?

diff --git a/upup/models/cloudup/resources/addons/networking.amazon-vpc/0.1.1-kops.1.yaml.template b/upup/models/cloudup/resources/addons/networking.amazon-vpc/0.1.1-kops.1.yaml.template
new file mode 100644
index 0000000..f8f266c
--- /dev/null
+++ b/upup/models/cloudup/resources/addons/networking.amazon-vpc/0.1.1-kops.1.yaml.template
@@ -0,0 +1,52 @@
+kind: DaemonSet
+apiVersion: extensions/v1beta1
+metadata:
+  name: aws-node
+  namespace: kube-system
+  labels:
+    k8s-app: aws-node
+spec:
+  selector:
+    matchLabels:
+      k8s-app: aws-node
...

@aledbf (Member Author) commented Dec 15, 2017

@chrislovecnm (Contributor)

@liwenwu-amazon

Also run make clean && make, please.

Is go-bindata running for you? That file is generated by go-bindata.

@chrislovecnm (Contributor)

To be clear:

$ make clean
$ make

Also, is that file on disk?

@liwenwu-amazon

@aledbf @chrislovecnm Thanks, I am able to bring up a kops cluster. But I am running into a new problem:
the current Amazon VPC CNI plugin requires that IP forwarding be enabled on the node. The current node AMI has IP forwarding disabled, so pod-to-pod traffic gets dropped.

Here is the iptables-save output:

iptables-save
# Generated by iptables-save v1.4.21 on Sat Dec 16 01:18:20 2017
*nat
:PREROUTING ACCEPT [19:1596]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [16:960]
:POSTROUTING ACCEPT [0:0]
:DOCKER - [0:0]
:KUBE-MARK-DROP - [0:0]
:KUBE-MARK-MASQ - [0:0]
:KUBE-NODEPORTS - [0:0]
:KUBE-POSTROUTING - [0:0]
:KUBE-SEP-GU4TQJBUWQA2RWOB - [0:0]
:KUBE-SERVICES - [0:0]
:KUBE-SVC-ERIFXISQEP7F7OF4 - [0:0]
:KUBE-SVC-NPX46M4PTMTKRN6Y - [0:0]
:KUBE-SVC-TCOU7JCQXEZGVUNU - [0:0]
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A POSTROUTING ! -d 10.0.0.0/16 -m comment --comment "AWS, SNAT" -m addrtype ! --dst-type LOCAL -j SNAT --to-source 10.0.113.78
-A DOCKER -i docker0 -j RETURN
-A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE
-A KUBE-SEP-GU4TQJBUWQA2RWOB -s 10.0.42.249/32 -m comment --comment "default/kubernetes:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-GU4TQJBUWQA2RWOB -p tcp -m comment --comment "default/kubernetes:https" -m recent --set --name KUBE-SEP-GU4TQJBUWQA2RWOB --mask 255.255.255.255 --rsource -m tcp -j DNAT --to-destination 10.0.42.249:443
-A KUBE-SERVICES ! -s 10.0.128.0/17 -d 10.0.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.0.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SERVICES ! -s 10.0.128.0/17 -d 10.0.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.0.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
-A KUBE-SERVICES ! -s 10.0.128.0/17 -d 10.0.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.0.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4
-A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -m recent --rcheck --seconds 10800 --reap --name KUBE-SEP-GU4TQJBUWQA2RWOB --mask 255.255.255.255 --rsource -j KUBE-SEP-GU4TQJBUWQA2RWOB
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -j KUBE-SEP-GU4TQJBUWQA2RWOB
COMMIT
# Completed on Sat Dec 16 01:18:20 2017
# Generated by iptables-save v1.4.21 on Sat Dec 16 01:18:20 2017
*filter
:INPUT ACCEPT [204:43544]
:FORWARD DROP [19:1596]
:OUTPUT ACCEPT [186:23000]
:DOCKER - [0:0]
:DOCKER-ISOLATION - [0:0]
:KUBE-FIREWALL - [0:0]
:KUBE-SERVICES - [0:0]
-A INPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A INPUT -j KUBE-FIREWALL
-A FORWARD -j DOCKER-ISOLATION
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -j KUBE-FIREWALL
-A DOCKER-ISOLATION -j RETURN
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
-A KUBE-SERVICES -d 10.0.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns has no endpoints" -m udp --dport 53 -j REJECT --reject-with icmp-port-unreachable
-A KUBE-SERVICES -d 10.0.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp has no endpoints" -m tcp --dport 53 -j REJECT --reject-with icmp-port-unreachable
COMMIT
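
For reference, the symptom can be confirmed on a node with a quick check, a sketch only (writing the sysctl this way does not persist across reboots and is not the fix the PR ships):

# 0 means the kernel will not forward packets, so pod-to-pod traffic hits
# the FORWARD chain's DROP policy above
sysctl net.ipv4.ip_forward

# temporary toggle, for testing only
sudo sysctl -w net.ipv4.ip_forward=1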

@chrislovecnm (Contributor)

We have a container that will do that, which is included with Calico. We can add that to the manifest.

When your team refactors the provider to run on the master, you may want to consider enabling that from there.
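
A sketch of what such a container could look like in the DaemonSet pod spec (the busybox image and the init-container placement are assumptions, not the PR's actual manifest):

initContainers:
- name: enable-ip-forward
  image: busybox
  command: ["sh", "-c", "sysctl -w net.ipv4.ip_forward=1"]
  securityContext:
    privileged: true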

@liwenwu-amazon

@chrislovecnm @aledbf Let me know when you have the updated manifest that enables IP forwarding. Thanks.

@aledbf (Member Author) commented Dec 17, 2017

@liwenwu-amazon please update the code; the latest rebased code contains the image @chrislovecnm mentioned: https://github.com/kubernetes/kops/pull/3997/files#diff-cf17abfa9600a8947998f51a549c0b46R102

@liwenwu-amazon

I am getting the following error with Kubernetes 1.8.0:

Dec 18 00:52:47 ip-10-0-37-225 docker[7432]: /usr/bin/docker: Error response from daemon: repository protokube not found: does not exist or no pull access.
Dec 18 00:52:47 ip-10-0-37-225 docker[7432]: See '/usr/bin/docker run --help'.
Dec 18 00:52:47 ip-10-0-37-225 systemd[1]: protokube.service: main process exited, code=exited, status=125/n/a
Dec 18 00:52:47 ip-10-0-37-225 systemd[1]: Unit protokube.service entered failed state.
Dec 18 00:52:49 ip-10-0-37-225 systemd[1]: protokube.service holdoff time over, scheduling restart.
Dec 18 00:52:49 ip-10-0-37-225 systemd[1]: Stopping Kubernetes Protokube Service...
Dec 18 00:52:49 ip-10-0-37-225 systemd[1]: Starting Kubernetes Protokube Service...
Dec 18 00:52:49 ip-10-0-37-225 systemd[1]: Starting Run docker-healthcheck once...
Dec 18 00:52:49 ip-10-0-37-225 systemd[1]: Started Kubernetes Protokube Service.
Dec 18 00:52:49 ip-10-0-37-225 docker-healthcheck[7449]: docker healthy
Dec 18 00:52:49 ip-10-0-37-225 dockerd[1407]: time="2017-12-18T00:52:49.470243188Z" level=error msg="Handler for POST /v1.26/containers/create returned error: No such image: protokube:1.8.0"
Dec 18 00:52:49 ip-10-0-37-225 systemd[1]: Started Run docker-healthcheck once.
Dec 18 00:52:49 ip-10-0-37-225 docker[7453]: Unable to find image 'protokube:1.8.0' locally
Dec 18 00:52:49 ip-10-0-37-225 dockerd[1407]: time="2017-12-18T00:52:49.524735742Z" level=error msg="Not continuing with pull after error: errors:\ndenied: requested access to the resource is denied\nunauthorized: authentication required\n"
Dec 18 00:52:49 ip-10-0-37-225 dockerd[1407]: time="2017-12-18T00:52:49.524833031Z" level=error msg="Handler for POST /v1.26/images/create returned error: repository protokube not found: does not exist or no pull access"
Dec 18 00:52:49 ip-10-0-37-225 docker[7453]: /usr/bin/docker: Error response from daemon: repository protokube not found: does not exist or no pull access.
Dec 18 00:52:49 ip-10-0-37-225 docker[7453]: See '/usr/bin/docker run --help'.
Dec 18 00:52:49 ip-10-0-37-225 systemd[1]: protokube.service: main process exited, code=exited, status=125/n/a
Dec 18 00:52:49 ip-10-0-37-225 systemd[1]: Unit protokube.service entered failed state.
Dec 18 00:52:51 ip-10-0-37-225 systemd[1]: protokube.service holdoff time over, scheduling restart.
Dec 18 00:52:51 ip-10-0-37-225 systemd[1]: Stopping Kubernetes Protokube Service...
Dec 18 00:52:51 ip-10-0-37-225 systemd[1]: Starting Kubernetes Protokube Service...
Dec 18 00:52:51 ip-10-0-37-225 systemd[1]: Started Kubernetes Protokube Service.
Dec 18 00:52:51 ip-10-0-37-225 dockerd[1407]: time="2017-12-18T00:52:51.712773087Z" level=error msg="Handler for POST /v1.26/containers/create returned error: No such image: protokube:1.8.0"
Dec 18 00:52:51 ip-10-0-37-225 docker[7477]: Unable to find image 'protokube:1.8.0' locally
Dec 18 00:52:51 ip-10-0-37-225 dockerd[1407]: time="2017-12-18T00:52:51.770322650Z" level=error msg="Not continuing with pull after error: errors:\ndenied: requested access to the resource is denied\nunauthorized: authentication required\n"
Dec 18 00:52:51 ip-10-0-37-225 dockerd[1407]: time="2017-12-18T00:52:51.770426820Z" level=error msg="Handler for POST /v1.26/images/create returned error: repository protokube not found: does not exist or no pull access"
Dec 18 00:52:51 ip-10-0-37-225 docker[7477]: /usr/bin/docker: Error response from daemon: repository protokube not found: does not exist or no pull access.
Dec 18 00:52:51 ip-10-0-37-225 docker[7477]: See '/usr/bin/docker run --help'.
Dec 18 00:52:51 ip-10-0-37-225 systemd[1]: protokube.service: main process exited, code=exited, status=125/n/a
Dec 18 00:52:51 ip-10-0-37-225 systemd[1]: Unit protokube.service entered failed state.

And I am getting a different error when trying Kubernetes 1.7.10:

Dec 18 00:00:02 ip-10-0-52-81 systemd[1]: Started Kubernetes Protokube Service.
Dec 18 00:00:02 ip-10-0-52-81 dockerd[1438]: time="2017-12-18T00:00:02.081143299Z" level=error msg="Handler for POST /v1.24/containers/create returned error: No such image: protokube:1.8.0"
Dec 18 00:00:02 ip-10-0-52-81 docker[3744]: Unable to find image 'protokube:1.8.0' locally
Dec 18 00:00:02 ip-10-0-52-81 dockerd[1438]: time="2017-12-18T00:00:02.156731844Z" level=error msg="Attempting next endpoint for pull after error: unauthorized: authentication required"
Dec 18 00:00:02 ip-10-0-52-81 docker[3744]: Pulling repository docker.io/library/protokube
Dec 18 00:00:02 ip-10-0-52-81 dockerd[1438]: time="2017-12-18T00:00:02.178509385Z" level=error msg="Not continuing with pull after error: Error: image library/protokube:1.8.0 not found"
Dec 18 00:00:02 ip-10-0-52-81 docker[3744]: /usr/bin/docker: Error: image library/protokube:1.8.0 not found.
Dec 18 00:00:02 ip-10-0-52-81 docker[3744]: See '/usr/bin/docker run --help'.
Dec 18 00:00:02 ip-10-0-52-81 systemd[1]: protokube.service: main process exited, code=exited, status=125/n/a
Dec 18 00:00:02 ip-10-0-52-81 systemd[1]: Unit protokube.service entered failed state.
Dec 18 00:00:04 ip-10-0-52-81 systemd[1]: protokube.service holdoff time over, scheduling restart.
Dec 18 00:00:04 ip-10-0-52-81 systemd[1]: Stopping Kubernetes Protokube Service...
Dec 18 00:00:04 ip-10-0-52-81 systemd[1]: Starting Kubernetes Protokube Service...
Dec 18 00:00:04 ip-10-0-52-81 systemd[1]: Started Kubernetes Protokube Service.
Dec 18 00:00:04 ip-10-0-52-81 dockerd[1438]: time="2017-12-18T00:00:04.327914141Z" level=error msg="Handler for POST /v1.24/containers/create returned error: No such image: protokube:1.8.0"
Dec 18 00:00:04 ip-10-0-52-81 docker[3753]: Unable to find image 'protokube:1.8.0' locally
Dec 18 00:00:04 ip-10-0-52-81 dockerd[1438]: time="2017-12-18T00:00:04.384409482Z" level=error msg="Attempting next endpoint for pull after error: unauthorized: authentication required"
Dec 18 00:00:04 ip-10-0-52-81 docker[3753]: Pulling repository docker.io/library/protokube
Dec 18 00:00:04 ip-10-0-52-81 dockerd[1438]: time="2017-12-18T00:00:04.413352594Z" level=error msg="Not continuing with pull after error: Error: image library/protokube:1.8.0 not found"
Dec 18 00:00:04 ip-10-0-52-81 docker[3753]: /usr/bin/docker: Error: image library/protokube:1.8.0 not found.
Dec 18 00:00:04 ip-10-0-52-81 docker[3753]: See '/usr/bin/docker run --help'.
Dec 18 00:00:04 ip-10-0-52-81 systemd[1]: protokube.service: main process exited, code=exited, status=125/n/a
Dec 18 00:00:04 ip-10-0-52-81 systemd[1]: Unit protokube.service entered failed state.

@aledbf (Member Author) commented Dec 18, 2017

@liwenwu-amazon did you follow the procedure in #3997 (comment) to build kops and the S3 assets?

@liwenwu-amazon

@aledbf yes. I have done it twice. Let me try one more time.

@liwenwu-amazon

@aledbf It works now! It was my fault; I lost my VPN during my build.
I have done some basic tests (such as pod-to-pod ping) and it works.
Thanks again.

@chrislovecnm (Contributor)

@liwenwu-amazon so are you good to merge as is?

@liwenwu-amazon

@chrislovecnm looks good to me! Thanks.

@chrislovecnm (Contributor)

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 18, 2017
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chrislovecnm

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 18, 2017
@aledbf (Member Author) commented Dec 18, 2017

@chrislovecnm is this OK to merge? (I still see the hold label)

@chrislovecnm (Contributor)

/h

@chrislovecnm (Contributor)

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 18, 2017
@k8s-github-robot

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot

Automatic merge from submit-queue.
