
GCE Ingress controller (GLBC) should install a firewall rule per Ingress object per cluster #37306

Closed
madhusudancs opened this issue Nov 22, 2016 · 7 comments

@madhusudancs
Contributor

Although this issue is about GLBC, which lives in its own repo - https://github.com/kubernetes/ingress/commits/master - it affects federation, so I am filing it here.

GLBC uses the UID specified in a ConfigMap in its cluster to construct the names of the GCE L7 load balancer resources it installs. In the multi-cluster Ingress case or the Federated Ingress case, things work as expected for all the resources except the firewall rules. GLBC is conservative about firewall rules: it exposes only the nodes it is responsible for as targets, clobbering the existing target list. In the case of Federated Ingress, each GLBC instance running in an underlying cluster of a federation overwrites the targets specified by the other instances, which causes the GCE L7 load balancer's health checks and backends to flap, as described in #36327.

A workaround is to manually install a firewall rule for each Federated Ingress object that exposes the targets of all the underlying clusters in the federation, so that the health checks can pass and the GCE L7 load balancer is stable. We will document this workaround in v1.5 and fix the issue in v1.6.
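As a concrete sketch of the workaround (names, network, and node tags here are purely illustrative, not values from any real cluster), one such rule per federation could open the NodePort range of all underlying clusters' nodes to the GCE L7 health-check and proxy source ranges, which Google documents as 130.211.0.0/22 and 35.191.0.0/16:

```shell
# Hypothetical manual firewall rule for a Federated Ingress; the rule name,
# network, target tags, and port range below are placeholders to adapt.
gcloud compute firewall-rules create my-federated-ingress-fw \
  --network my-network \
  --source-ranges 130.211.0.0/22,35.191.0.0/16 \
  --target-tags cluster-a-nodes,cluster-b-nodes \
  --allow tcp:30000-32767
```

Listing the node tags of every cluster in the federation in `--target-tags` is what keeps the health checks passing from all backends at once.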

The proposed solution is to have each individual GLBC instance install a separate firewall rule scoped to the cluster it is running in. This requires a change in the firewall rule naming scheme. A disadvantage of this approach is that it requires a migration path for the existing firewall rules in running clusters.
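A minimal sketch of why the naming change fixes the clobbering (the name format, UID, and cluster identifier below are illustrative, not GLBC's actual scheme): the federation syncs one UID to every cluster, so today each GLBC derives the same firewall rule name; appending a per-cluster identifier makes each rule unique.

```shell
# Illustrative only: not GLBC's real naming code.
shared_uid="fed-uid-123"          # UID synced to every cluster by federation
cluster_name="cluster-us-east1"   # identifier unique to this cluster

old_rule="k8s-fw-l7--${shared_uid}"                   # identical in every cluster,
                                                      # so instances clobber each other
new_rule="k8s-fw-l7--${shared_uid}--${cluster_name}"  # unique per cluster

echo "${new_rule}"
```

With per-cluster names, each GLBC owns exactly one rule and never touches the rules created by GLBCs in sibling clusters, which is why the migration question (what deletes the old shared rule?) then becomes the main remaining concern.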

An alternative solution is to have the Federated Ingress Controller program the required firewall rule to allow traffic to all the underlying clusters, and to add a flag to GLBC to disable firewall rule creation. Disadvantages of this approach are:

  1. It bakes GCP-specific knowledge into the Federated Ingress Controller, which is against the controller's original design goal.
  2. Multi-cluster Ingress users outside federation will still need an automated way to install these rules.

cc @kubernetes/sig-cluster-federation @nikhiljindal @matchstick @bprashanth

@madhusudancs madhusudancs added area/cluster-federation sig/network Categorizes an issue or PR as relevant to SIG Network. labels Nov 22, 2016
@madhusudancs madhusudancs added this to the v1.6 milestone Nov 22, 2016
@madhusudancs madhusudancs self-assigned this Nov 22, 2016
@bprashanth
Contributor

Re moving firewall logic into the federated controller: you should probably model it after the service controller, with pluggable cloud-specific backends. Today the federated ingress controller's primary function is:

  1. UID management
  2. setting the static-ip annotation

Both of which are GCP-specific, since the on-prem nginx controller and by extension the AWS ingress (which is basically a service.Type=LoadBalancer in front of an nginx ingress at this point) don't care about them. I suspect cross-cluster load balancing is going to bring some site-specific twists anyway, as we start thinking about more platforms.

@ghost

ghost commented Feb 13, 2017

I think that we need to implement the relatively minor fix to the bug in the GCE Ingress Controller (GLBC), so that it creates cluster-specific rather than global firewall rule names. That way one GLBC does not overwrite the firewall rule created by another GLBC in another cluster (which is what it erroneously does today).

As for the backwards compatibility concern (essentially, how do we cause the old globally named firewall rule to be deleted, eventually, once all clusters' GLBC controllers have been upgraded to the new scheme), I think that we can deal with that through a release note ("please delete this firewall rule when it's no longer required"). Having an additional unneeded firewall rule lying around should be benign, and its deletion can be regarded as a cosmetic cleanup rather than a requirement. @bprashanth, correct me if I'm missing anything here.

I think we can and should avoid moving the cloud-specific firewall management logic from the intentionally cloud-specific GLBC to the ideally cloud-agnostic federated ingress controller, for several reasons:

  1. The federation control plane currently does not interact directly with the cloud provider at all, and we want to keep it that way. It only interacts with the Kubernetes API, and with DNS providers via a pluggable interface.
  2. Even though the current federated ingress controller contains some logic which only applies to GLBC (and hence implicitly GCE), all it's really doing is transferring annotations and a ConfigMap from one cluster to another, if they exist. While this is GLBC-specific, it only requires knowledge of the Kubernetes API and of where the ConfigMap and annotations for GLBC are located. And yes, we can and should make this piece of logic more pluggable, but I don't think that's critical right now.
  3. The fix required for what is essentially a bug in GLBC (using a globally shared name for the firewall rules it creates and manages, causing multiple instances of itself to overwrite each other's firewall rules) is conceptually and practically much simpler than the logic that would be required in the federated ingress controller. I'm not even convinced that one can write it correctly in the federated ingress controller: in some cases (e.g. non-federated clusters) GLBC would manage the firewall rules, and in other cases (e.g. when adding a cluster to a federation) control would have to transfer to the federated ingress controller in some orderly fashion - this just sounds hard/impossible to do correctly. There is a proposal to implement this in [Federation] Install the required firewall rule in the federated Ingress test. #37571, but there are some fairly serious problems with it.
  4. As mentioned above, I think that the concerns regarding an upgrade path are fairly easily addressed.

Q

@madhusudancs
Contributor Author

madhusudancs commented Feb 13, 2017

As for the backwards compatibility concern (essentially, how do we cause the old globally named firewall rule to be deleted, eventually, once all clusters' GLBC controllers have been upgraded to the new scheme), I think that we can deal with that through a release note ("please delete this firewall rule when it's no longer required") as having an additional unneeded firewall rule lying around should be benign, and its deletion can be regarded as a cosmetic cleanup, rather than required

Let's please not underestimate this effort. There are production clusters where people are running real stuff. At the very least, it has to go through rigorous upgrade testing. Everything looks conceptually and practically simpler on paper :)

There is a proposal to implement this in #37571 but there are some fairly serious problems with it.

I am not entirely sure what those serious problems are. Could you please elaborate? It is a test fix, a workaround until we get the code working, to give confidence that everything other than the known issues work.

@ghost

ghost commented Feb 13, 2017

@madhusudancs You're right, the small GLBC change will need testing. But so, equally, will the more complex alternative (if the federation control plane messes up the firewall rules, those clusters could also potentially be bricked). Admittedly, the more complex alternative's logic will only be invoked on clusters which join a federation, but those may themselves be production clusters. So I don't think that the risk is significantly less there. Both need good testing.

I've added some comments to #37571

@madhusudancs
Contributor Author

The other approaches don't need less testing, nor are they easier. Among the alternatives, the per-cluster firewall rule is the least difficult, but that doesn't mean it is a cakewalk.

@ethernetdan
Contributor

Moving to 1.7, as this is too late to happen in 1.6. Feel free to switch back if this is incorrect.

@ethernetdan ethernetdan modified the milestones: v1.7, v1.6 Mar 13, 2017
@madhusudancs
Contributor Author

This was fixed in PRs #41942 and kubernetes/ingress-nginx#278.
