Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Cilium as CNI plugin #4224

Merged
merged 5 commits into from
Mar 26, 2018
Merged

Conversation

nebril
Copy link
Contributor

@nebril nebril commented Jan 8, 2018

This change adds Cilium as an option to provide networking for kops-deployed cluster.

For Cilium to run you need an image with kernel >=4.8, I am testing on CoreOS image: 595879546273/CoreOS-stable-1576.4.0-hvm

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 8, 2018
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 8, 2018
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 9, 2018
@nebril nebril changed the title Add Cilium as CNI plugin [WIP] Add Cilium as CNI plugin Jan 18, 2018
@nebril
Copy link
Contributor Author

nebril commented Jan 18, 2018

You can spin up cluster with cilium networking as follows:

kops create cluster \
--zones <zones, e.g. "eu-central-1a"> \
--image 595879546273/CoreOS-stable-1576.4.0-hvm \
--networking cilium \
--override "cluster.spec.etcdClusters[*].version=3.1.11" \
--kubernetes-version 1.8.7

image uses CoreOS to make sure kernel is new enough for Cilium to work properly
Cilium requires etcd>= 3.1.0 to work properly.
I tested it only with k8s 1.8 for now, seems to be working properly.

@nebril
Copy link
Contributor Author

nebril commented Jan 18, 2018

/retest

1 similar comment
@chrislovecnm
Copy link
Contributor

/retest

@chrislovecnm
Copy link
Contributor

So how can we ensure that the user is running the correct kernel, which btw is not tested with e2e testing.

@nebril
Copy link
Contributor Author

nebril commented Jan 22, 2018

@chrislovecnm you mean at the point of creating the cluster? I am trying to figure out if filtering output of aws ec2 describe-images with kernel-id filter can help here.

Seems like images with aki-184c7a05 KernelID (which is what CoreOS image uses) should work. I don't know how to make a connection between kernel version and KernelID which would work for future KernelIDs, but there are only 5 unique KernelIDs in output of aws ec2 describe-images (which I guess consists all public images available for use).

I thought that aws ec2 describe-image-attribute might help, but it seems like most public images deny the right to describe image attributes publicly.

Would stating in the docs explicitly that you have to run proper kernel in order for Cilium to work correctly be enough?

@nebril
Copy link
Contributor Author

nebril commented Jan 26, 2018

@chrislovecnm ping. Do you have any other feedback about this PR?

@chrislovecnm
Copy link
Contributor

chrislovecnm commented Jan 26, 2018

@nebril overall the PR looks good, I just completed a review, not sure how to handle the kernel stuff. I apologize how long the review process has been.

@justinsb thoughts about how to handle the kernel dependency? I am wondering if we should output a message or just update the docs? For Cilium to run you need an image with kernel >=4.8, and I do not think we have any way to check for that. Also, I know that kernel version is significantly ahead of the e2e testing. What advice do you have?

Copy link
Contributor

@chrislovecnm chrislovecnm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bunch of the dates in files will drop out if you rebase, I think. Otherwise, if there is no code changes, please back out the date changes.

Thanks for the new plugin, I think we have 9 now! As I mention on all new CNI plugins, please strive to keep the plugin updated, as we rely on the community for those updates.

Thanks again!

@@ -1,7 +1,7 @@
// +build !ignore_autogenerated

/*
Copyright 2017 The Kubernetes Authors.
Copyright 2018 The Kubernetes Authors.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rebase, and then probably want to re-run make apimachinery. I think these changes will be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

name: cilium-config
namespace: kube-system
data:
# This etcd-config contains the etcd endpoints of your cluster. If you use
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be using http etcd. There is an update for Calico to use TLS that is going in. You can either change the PR to output a message that you are using unencrypted etcd communication, and the etcd port is open, or figure out TLS.

@@ -0,0 +1,285 @@
kind: ConfigMap
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we warn the user when they try to use cilium with k8s 1.7 for instance? It will not work.

prometheus.io/port: "9090"
spec:
serviceAccountName: cilium
containers:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would encourage adding the resource limits that say the weave manifest has. Or other applicable resource limits.

spec:
serviceAccountName: cilium
containers:
- image: cilium/cilium:stable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a version we can use? Like a numbered version? We have had problems with the latest or other non-number versioned tags, where the manifest needs to be updated to the latest version of the CNI provider.

@chrislovecnm
Copy link
Contributor

Cilium requires etcd>= 3.1.0 to work properly.

We need to validate that as well ...

@nebril
Copy link
Contributor Author

nebril commented Jan 29, 2018

@chrislovecnm Thanks for great review, I will try to address your comments and will let you know when it's done.

@k8s-github-robot
Copy link

@nebril PR needs rebase

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 4, 2018
@justinsb justinsb added this to the 1.9 milestone Feb 21, 2018
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 15, 2018
@nebril
Copy link
Contributor Author

nebril commented Mar 20, 2018

@chrislovecnm can you re-review this?

I would encourage adding the resource limits that say the weave manifest has. Or other applicable resource limits.

The issue I have with this comment is that Cilium is a CNI plugin which is basically the Kubernetes component, so limiting its resources seems like limiting for example kubelet or other k8s components. Are you sure this is required?

nebril added 5 commits March 20, 2018 13:07
Signed-off-by: Maciej Kwiek <maciej@covalent.io>
Signed-off-by: Maciej Kwiek <maciej@covalent.io>
Signed-off-by: Maciej Kwiek <maciej@covalent.io>
Signed-off-by: Maciej Kwiek <maciej@covalent.io>
Signed-off-by: Maciej Kwiek <maciej@covalent.io>
@justinsb
Copy link
Member

/lgtm

Thanks @nebril

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 26, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: justinsb, nebril

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 26, 2018
@justinsb
Copy link
Member

I think this is fine to merge as an option.

Just FYI @nebril I think using etcd directly is basically frowned upon - it's better to talk to the kubernetes API. Flannel and calico both started by talking directly to etcd and have moved / are moving to using the API instead.

@k8s-ci-robot k8s-ci-robot merged commit fc1bed4 into kubernetes:master Mar 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants