Cilium eni #8316
Hi @olemarkus. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
upup/models/cloudup/resources/addons/networking.cilium.io/k8s-1.12.yaml.template
Need to |
The duplicate `masquerade` keys in the configmap need to be addressed.
I'm still not comfortable with always requiring the operator to run on the master. This doesn't exist in the upstream charts at all. As I see it, the only reason to do this is to give the IAM roles needed for ENI mode. Perhaps bring this up at the Kops Office Hours meeting?
nodeup/pkg/model/context.go
 }

-return false
+return (c.Cluster.Spec.Networking.CNI != nil && c.Cluster.Spec.Networking.CNI.UsesSecondaryIP) || c.Cluster.Spec.Networking.AmazonVPC != nil || c.Cluster.Spec.Networking.LyftVPC != nil || (c.Cluster.Spec.Networking.Cilium != nil && c.Cluster.Spec.Networking.Cilium.Ipam == "eni")
Perhaps define a constant for this?
Agreed, I think another constant alongside this one would be good.
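A minimal sketch of what such a constant could look like, using simplified stand-in types rather than the real kops API structs. The name `CiliumIpamEni` and the trimmed-down `NetworkingSpec` are assumptions for illustration; only the `"eni"` value and the shape of the condition come from the diff under review.

```go
package main

import "fmt"

// CiliumIpamEni is a hypothetical constant name for the "eni" IPAM mode string.
const CiliumIpamEni = "eni"

// Simplified stand-ins for the kops networking spec types (not the real definitions).
type CNISpec struct{ UsesSecondaryIP bool }
type CiliumSpec struct{ Ipam string }
type NetworkingSpec struct {
	CNI       *CNISpec
	AmazonVPC *struct{}
	LyftVPC   *struct{}
	Cilium    *CiliumSpec
}

// usesSecondaryIP mirrors the condition from the diff, comparing against the
// constant instead of a string literal.
func usesSecondaryIP(n NetworkingSpec) bool {
	return (n.CNI != nil && n.CNI.UsesSecondaryIP) ||
		n.AmazonVPC != nil ||
		n.LyftVPC != nil ||
		(n.Cilium != nil && n.Cilium.Ipam == CiliumIpamEni)
}

func main() {
	fmt.Println(usesSecondaryIP(NetworkingSpec{Cilium: &CiliumSpec{Ipam: CiliumIpamEni}}))
	fmt.Println(usesSecondaryIP(NetworkingSpec{Cilium: &CiliumSpec{}}))
}
```

With a shared constant, the nodeup check and the validation code would compare against the same value instead of repeating the literal.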
@@ -127,6 +127,7 @@ data:
   enable-endpoint-routes: "true"
   auto-create-cilium-node-resource: "true"
   blacklist-conflicting-routes: "false"
+  masquerade: "false"
Doesn't this conflict with the `masquerade` on line 120?
This seems a reasonable thing to do, but I don't see it in the upstream charts.
It isn't directly in the charts, but it is in the documentation: https://docs.cilium.io/en/v1.6/gettingstarted/aws-eni/#prepare-deploy-cilium
I will change this to forcing the config on line 120 instead.
On the Cilium operator running on master nodes: there are two alternatives. The Cilium operator is something I would call a system component, as without it running, normal pods won't spawn. The question then becomes what kind of security concerns we have that outweigh pods having the escalated privileges on normal nodes.
What I can do is only force the operator onto masters if ipam is set to ENI, or gate it with a dedicated config option.
@k8s-ci-robot you should label this needs-rebase.
Force-pushed from 618cfcd to 4cfa762.
pkg/apis/kops/validation/legacy.go
ciliumSpec := c.Spec.Networking.Cilium
if ciliumSpec.Ipam == "eni" {
    if c.Spec.CloudProvider != "aws" {
        allErrs = append(allErrs, field.Invalid(fieldSpec.Child("Cilium").Child("Ipam"), "eni", "Cilum ENI IPAM is supported only in AWS"))
Use field.Forbidden()
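For context, `field.Forbidden` in `k8s.io/apimachinery/pkg/util/validation/field` takes a path and a detail string with no value argument, which suits a setting that is disallowed outright rather than malformed. The sketch below uses a local stand-in for that helper so it runs without the Kubernetes modules; `validateCiliumIpam`, its parameters, and the string path are hypothetical and not the actual kops validation code.

```go
package main

import "fmt"

// fieldError is a local stand-in for field.Error from k8s.io/apimachinery.
type fieldError struct {
	Type, Path, Detail string
}

// forbidden imitates the shape of field.Forbidden(path, detail): no value argument.
func forbidden(path, detail string) fieldError {
	return fieldError{Type: "Forbidden", Path: path, Detail: detail}
}

// validateCiliumIpam is a hypothetical helper: ENI IPAM is rejected outside AWS.
func validateCiliumIpam(ipam, cloudProvider string) []fieldError {
	var errs []fieldError
	if ipam == "eni" && cloudProvider != "aws" {
		errs = append(errs, forbidden("spec.networking.cilium.ipam",
			"Cilium ENI IPAM is supported only in AWS"))
	}
	return errs
}

func main() {
	for _, e := range validateCiliumIpam("eni", "gce") {
		fmt.Printf("%s: %s: %s\n", e.Path, e.Type, e.Detail)
	}
}
```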
pkg/apis/kops/validation/legacy.go
    allErrs = append(allErrs, field.Invalid(fieldSpec.Child("Cilium").Child("Ipam"), "eni", "Cilum ENI IPAM is supported only in AWS"))
}
if !ciliumSpec.DisableMasquerade {
    allErrs = append(allErrs, field.Invalid(fieldSpec.Child("Cilium").Child("DisableMasquerade"), "false", "Masquerade must be disabled when ENI IPAM is used"))
Also field.Forbidden()
Is masquerading forbidden with ENI? The Note block at https://docs.cilium.io/en/v1.6/gettingstarted/aws-eni/#prepare-deploy-cilium seems to suggest either way is possible.
If I masq, cluster networking doesn't work. I am guessing Cilium is masquerading behind the wrong interface or something. I couldn't quite figure out why. It may be that one has to explicitly set which interface to masquerade behind. But then I am unsure if we can really know that across all distros.
Masqing only makes sense if the SG on the ENI that Cilium creates has egress rules. But there isn't really any benefit to doing that since, because of the masq, traffic through the ENI reaches the internet anyway.
So I think the easiest is to force masqing off for now, and if anyone has a good reason for having masq on, we loosen that restriction.
In ENI mode, there will be multiple possible egress paths for traffic leaving a node. Based on the destination, the right egress interface will be chosen. Masquerading is typically only ever needed for traffic heading out to the interface. This is typically routed via the default route pointing out `eth0`, although there can be node configurations where this is different.
For this reason, the Cilium ENI guide sets the `egressMasqueradeInterfaces` helm option to limit masquerading to traffic leaving on `eth0`:
--set global.egressMasqueradeInterfaces=eth0 \
Masquerading is only required if no NAT/internet gateway is set up in AWS which will perform the masquerading.
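The effect of limiting masquerading to one interface can be sketched as a simple predicate. This illustrates the idea only and is not how Cilium implements it; the function name, the treatment of an empty list, and the `eth1` interface name are assumptions.

```go
package main

import "fmt"

// shouldMasquerade reports whether traffic leaving on egressIface would be
// source-NATed, given the interfaces masquerading is limited to.
// An empty list is treated here as "no masquerading" (an assumption of this sketch).
func shouldMasquerade(egressIface string, masqIfaces []string) bool {
	for _, iface := range masqIfaces {
		if iface == egressIface {
			return true
		}
	}
	return false
}

func main() {
	masq := []string{"eth0"}
	// Traffic following the default route out eth0 is masqueraded;
	// traffic leaving via another ENI (called eth1 here for illustration) is not.
	fmt.Println(shouldMasquerade("eth0", masq))
	fmt.Println(shouldMasquerade("eth1", masq))
}
```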
pkg/apis/kops/validation/legacy.go
    }
}

if c.Spec.Networking != nil && c.Spec.Networking.Cilium != nil && c.Spec.Networking.Cilium.Ipam == "eni" && c.Spec.CloudProvider != "aws" {
Appears to be a duplicate of the check/error at line 608.
upup/models/cloudup/resources/addons/networking.cilium.io/k8s-1.12.yaml.template
Force-pushed from 4cfa762 to 871c04b.
I ran into an issue where etcd-manager is listening on one IP, but both primary IPs are added to the etcd-manager hosts file:
This will randomly break the etcd-cluster. Need to look into how to change this behaviour.
If it helps your troubleshooting, I just confirmed that amazonvpc CNI does not have that problem with kopeio/etcd-manager:3.0.20190930
Note that it took quite a while for this to happen. But thanks, that helps. Some clues on how etcd-manager gets the info it uses for hosts would be useful. That has been quite hard to track down.
I believe this is where etcd-manager writes the map of hosts to /etc/hosts and I believe it comes from the member list within the etcd cluster itself. Can you use etcdctl to see the list of members?
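A quick way to confirm the symptom described above is to check whether a hosts file maps a single name to more than one address. The sketch below does that for hosts-file text passed in as a string; the function name and the sample entries are made up for illustration and are not taken from the affected cluster.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// multiAddressHosts returns, for each hostname that appears with more than one
// distinct address, the list of addresses in order of first appearance.
func multiAddressHosts(hosts string) map[string][]string {
	seen := map[string][]string{}
	sc := bufio.NewScanner(strings.NewReader(hosts))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.Index(line, "#"); i >= 0 {
			line = line[:i] // drop comments
		}
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		addr := fields[0]
		for _, name := range fields[1:] {
			if !contains(seen[name], addr) {
				seen[name] = append(seen[name], addr)
			}
		}
	}
	dups := map[string][]string{}
	for name, addrs := range seen {
		if len(addrs) > 1 {
			dups[name] = addrs
		}
	}
	return dups
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

func main() {
	// Made-up sample content for illustration only.
	sample := "10.0.0.10 etcd-a.internal.example\n" +
		"10.0.0.11 etcd-a.internal.example\n" +
		"10.0.0.20 etcd-b.internal.example\n"
	fmt.Println(multiAddressHosts(sample))
}
```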
From what I can tell, the issue only happens when using CoreOS. When using the default kops AMIs, I cannot reproduce this behaviour.
/hold cancel
pkg/apis/kops/validation/legacy.go
if c.Spec.Networking != nil && c.Spec.Networking.Cilium != nil {
    ciliumSpec := c.Spec.Networking.Cilium
    if ciliumSpec.Ipam == "eni" {
        if c.Spec.CloudProvider != "aws" {
nit: could use kops.CloudProviderAWS here.
Force-pushed from e7f4886 to e84a4f2.
pkg/model/iam/iam_builder.go
@@ -861,6 +865,34 @@ func addLyftVPCPermissions(p *Policy, resource stringorslice.StringOrSlice, lega
 	)
 }

+func addCiliumEniPermissions(p *Policy, resource stringorslice.StringOrSlice, legacyIAM bool, clusterName string) {
This clusterName parameter is unused and can probably be removed.
* Force cilium-operator run on master nodes
* Add option for setting cilium ipam mode
* If cilium ipam mode is eni, add additional permissions to master nodes
* Allow NonMasqueradeCIDR overlap with NetworkCIDR when Cilium ENI is enabled
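The last item relaxes a check that would otherwise reject a NonMasqueradeCIDR overlapping the NetworkCIDR. A minimal sketch of such an overlap check with an ENI exception, using only the standard library; the function names and the boolean flag are hypothetical and do not reflect the actual kops validation code.

```go
package main

import (
	"fmt"
	"net"
)

// cidrsOverlap reports whether two CIDR blocks share any address.
// Two CIDR blocks overlap exactly when one contains the other's network address.
func cidrsOverlap(a, b string) (bool, error) {
	_, na, err := net.ParseCIDR(a)
	if err != nil {
		return false, err
	}
	_, nb, err := net.ParseCIDR(b)
	if err != nil {
		return false, err
	}
	return na.Contains(nb.IP) || nb.Contains(na.IP), nil
}

// validateNonMasqueradeCIDR is a hypothetical helper: overlap with the network
// CIDR is tolerated only when Cilium ENI IPAM is enabled.
func validateNonMasqueradeCIDR(nonMasq, network string, ciliumENI bool) error {
	overlap, err := cidrsOverlap(nonMasq, network)
	if err != nil {
		return err
	}
	if overlap && !ciliumENI {
		return fmt.Errorf("nonMasqueradeCIDR %s overlaps networkCIDR %s", nonMasq, network)
	}
	return nil
}

func main() {
	fmt.Println(validateNonMasqueradeCIDR("172.20.0.0/16", "172.20.0.0/16", false))
	fmt.Println(validateNonMasqueradeCIDR("172.20.0.0/16", "172.20.0.0/16", true))
}
```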
Force-pushed from bf35c38 to ced8f00.
Looks great, thanks for sticking with it!
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: olemarkus, rifelpet
The full list of commands accepted by this bot can be found here.
The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
With this feature enabled, Cilium will create and allocate ENIs for pods similar to LyftVPC and AmazonVPC.
Marked this as work in progress since there are a couple of other Cilium PRs that should go in first.