Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cilium eni #8316

Merged
merged 1 commit into from
Feb 16, 2020
Merged

Cilium eni #8316

merged 1 commit into from
Feb 16, 2020

Conversation

olemarkus
Copy link
Member

With this feature enabled, Cilium will create and allocate ENIs for pods similar to LyftVPC and AmazonVPC.

Marked this as work in progress since there are a couple of other Cilium PRs that should go in first.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 11, 2020
@k8s-ci-robot
Copy link
Contributor

Hi @olemarkus. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jan 11, 2020
@hakman
Copy link
Member

hakman commented Jan 12, 2020

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 12, 2020
@johngmyers
Copy link
Member

Need to make crds

@olemarkus olemarkus changed the title WIP: Cilium eni Cilium eni Jan 27, 2020
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 27, 2020
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 8, 2020
Copy link
Member

@johngmyers johngmyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The duplicate masquerade keys in the configmap need to be addressed.

I'm still not comfortable with always requiring the operator to run on the master. This doesn't exist in the upstream charts at all. As I see it, the only reason to do this is to give the IAM roles needed for ENI mode. Perhaps bring this up at the Kops Office Hours meeting?

}

return false
return (c.Cluster.Spec.Networking.CNI != nil && c.Cluster.Spec.Networking.CNI.UsesSecondaryIP) || c.Cluster.Spec.Networking.AmazonVPC != nil || c.Cluster.Spec.Networking.LyftVPC != nil || (c.Cluster.Spec.Networking.Cilium != nil && c.Cluster.Spec.Networking.Cilium.Ipam == "eni")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps define a constant for this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, I think another constant along side this one would be good.

@@ -127,6 +127,7 @@ data:
enable-endpoint-routes: "true"
auto-create-cilium-node-resource: "true"
blacklist-conflicting-routes: "false"
masquerade: "false"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this conflict with the masquerade on line 120?

This seems a reasonable thing to do, but I don't see it in the upstream charts.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't directly in the charts, but it is in the documentation: https://docs.cilium.io/en/v1.6/gettingstarted/aws-eni/#prepare-deploy-cilium

I will change this to forcing the config on line 120 instead.

@olemarkus
Copy link
Member Author

On Cilium operator running on master nodes: there are two alternatives
a) the privileged IAM permissions is given to all nodes (and thus all pods by default)
b) cilium operator runs on master and only master nodes have privileged IAM permissions

Cilium operator is something I would call a system component as without it running, normal pods won't spawn.

The question then becomes what kind of security concerns do we have that outweights pods having the escalated privileges on normal nodes.

@olemarkus
Copy link
Member Author

What I can do is only force the operator on masters if ipam is set to ENI or gate it with a dedicated config option

@johngmyers
Copy link
Member

@k8s-ci-robot you should label this needs-rebase.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 11, 2020
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 11, 2020
ciliumSpec := c.Spec.Networking.Cilium
if ciliumSpec.Ipam == "eni" {
if c.Spec.CloudProvider != "aws" {
allErrs = append(allErrs, field.Invalid(fieldSpec.Child("Cilium").Child("Ipam"), "eni", "Cilum ENI IPAM is supported only in AWS"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use field.Forbidden()

allErrs = append(allErrs, field.Invalid(fieldSpec.Child("Cilium").Child("Ipam"), "eni", "Cilum ENI IPAM is supported only in AWS"))
}
if !ciliumSpec.DisableMasquerade {
allErrs = append(allErrs, field.Invalid(fieldSpec.Child("Cilium").Child("DisableMasquerade"), "false", "Masquerade must be disabled when ENI IPAM is used"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also field.Forbidden()

Is masquerading forbidden with ENI? the Note block at https://docs.cilium.io/en/v1.6/gettingstarted/aws-eni/#prepare-deploy-cilium seems to suggest either way is possible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I masq, cluster networking doesn't work. I am guessing Cilium is masking behind the wrong interface or something. I couldn't quite figure out why. It may be that one have to explicitly set which interface to mask behind. But then I am unsure if we can really know that across all distros.

Masqing only makes sense if the SG on the ENI cilium creates have egress rules. But there isn't really any benefit of doing that since because of the masq, traffic through the ENI reaches internet anyway.

So I think the easiest is to force masqing off for now and if anyone has any good reason for having masq on, we loosen that restriction.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In ENI mode, there will be multiple possible egress paths for traffic leaving a node. Based on the destination, the right egress interface will be chosen. Masquerading is typically only ever needed for traffic heading out to the interface. This is typically routed via the default route pointing out eth0 although there can be node configurations where this is different.

For this reason, the Cilium ENI guide sets the egressMasqueradingInterfaces helm option to limit masquerading to traffic leaving on eth0:

  --set global.egressMasqueradeInterfaces=eth0 \

Masquerading is only required if no NAT/internet gateway is set up in AWS which will perform the masquerading.

}
}

if c.Spec.Networking != nil && c.Spec.Networking.Cilium != nil && c.Spec.Networking.Cilium.Ipam == "eni" && c.Spec.CloudProvider != "aws" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appears to be a duplicate of the check/error at line 608.

@olemarkus
Copy link
Member Author

I ran into an issue where etcd-manager is listening on one IP, but both primary IPs are added to the etcd-manager hosts file:

xxx.141   etcd-events-b.internal.xxx
xxx.229   etcd-events-b.internal.xxx
xxx.24    etcd-events-c.internal.xxx
xxx.157   etcd-events-c.internal.xxx
xxx.236    etcd-events-a.internal.xxx
xxx.181    etcd-events-a.internal.xxx

This will randomly break the etcd-cluster. Need to look into how to change this behaviour.
I would think this also breaks the other ENI CNIs
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 12, 2020
@rifelpet
Copy link
Member

rifelpet commented Feb 12, 2020

If it helps your troubleshooting, I just confirmed that amazonvpc CNI does not have that problem with kopeio/etcd-manager:3.0.20190930

# Begin host entries managed by etcd-manager[etcd] - do not edit
xxx.255	etcd-a.internal.xxx
xxx.142	etcd-b.internal.xxx
xxx.110	etcd-c.internal.xxx
# End host entries managed by etcd-manager[etcd]

# Begin host entries managed by etcd-manager[etcd-events] - do not edit
xxx.255	etcd-events-a.internal.xxx
xxx.142	etcd-events-b.internal.xxx
xxx.110	etcd-events-c.internal.xxx
# End host entries managed by etcd-manager[etcd-events]

@olemarkus
Copy link
Member Author

Note that it took quite a while for this to happen. But thanks, that helps.

Would be useful with some clues on how etcd manager gets the info it uses for hosts. That has been quite hard to track

@rifelpet
Copy link
Member

I believe this is where etcd-manager writes the map of hosts to /etc/hosts and I believe it comes from the member list within the etcd cluster itself. Can you use etcdctl to see the list of members?

@olemarkus
Copy link
Member Author

From what I can tell, it looks like the issue only happens when using coreos. When using the default kops AMIs, I cannot reproduce this behaviour

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 12, 2020
@olemarkus olemarkus mentioned this pull request Feb 13, 2020
if c.Spec.Networking != nil && c.Spec.Networking.Cilium != nil {
ciliumSpec := c.Spec.Networking.Cilium
if ciliumSpec.Ipam == "eni" {
if c.Spec.CloudProvider != "aws" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could use kops.CloudProviderAWS here.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 13, 2020
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 13, 2020
@@ -861,6 +865,34 @@ func addLyftVPCPermissions(p *Policy, resource stringorslice.StringOrSlice, lega
)
}

func addCiliumEniPermissions(p *Policy, resource stringorslice.StringOrSlice, legacyIAM bool, clusterName string) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this clusterName parameter is unused and can probably be removed.

pkg/apis/kops/validation/legacy.go Outdated Show resolved Hide resolved
pkg/apis/kops/validation/legacy.go Outdated Show resolved Hide resolved
pkg/apis/kops/validation/legacy.go Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 16, 2020
* Force cilium-operator run on master nodes
* Add option for setting cilium ipam mode
* If cilium ipam mode is eni, add additional permissions to master nodes
* Allow NonMasqueradeCIDR overlap with NetworkCIDR when Cilium ENI is enabled
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 16, 2020
Copy link
Member

@rifelpet rifelpet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks for sticking with it!
/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 16, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: olemarkus, rifelpet

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 16, 2020
@k8s-ci-robot k8s-ci-robot merged commit 5a5eb67 into kubernetes:master Feb 16, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Feb 16, 2020
@olemarkus olemarkus deleted the cilium-eni branch March 25, 2020 20:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants