
GCE Ingress controller (GLBC) should install a firewall rule per Ingress object per cluster #37306

Closed
madhusudancs opened this issue Nov 22, 2016 · 7 comments

@madhusudancs
Contributor

Although this issue is about GLBC, which lives in its own repo - https://github.com/kubernetes/ingress/commits/master - it affects federation, so I am filing it here.

GLBC uses the UID specified in a ConfigMap in its cluster to construct the names of the GCE L7 load balancer resources it installs. In the multi-cluster Ingress case or the Federated Ingress case, things work as expected for all the resources except the firewall rules. GLBC is conservative about firewall rules: it exposes only the nodes it is responsible for as targets, clobbering the existing target list. In the case of Federated Ingress, each GLBC instance running in an underlying cluster of a federation overwrites the targets specified by the other instances, which causes the GCE L7 load balancer's health checks and backends to flap, as described in #36327.

A workaround is to manually install a firewall rule for each Federated Ingress object that exposes the targets of all the underlying clusters in the federation, so that the health checks can pass and the GCE L7 load balancer is stable. We will document this workaround in v1.5 and fix the issue in v1.6.
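As a concrete sketch of the workaround (names, network, and node tags here are purely illustrative, not values from any real cluster), one such rule per federation could open the NodePort range of all underlying clusters' nodes to the GCE L7 health-check and proxy source ranges, which Google documents as 130.211.0.0/22 and 35.191.0.0/16:

```shell
# Hypothetical manual firewall rule for a Federated Ingress; the rule name,
# network, target tags, and port range below are placeholders to adapt.
gcloud compute firewall-rules create my-federated-ingress-fw \
  --network my-network \
  --source-ranges 130.211.0.0/22,35.191.0.0/16 \
  --target-tags cluster-a-nodes,cluster-b-nodes \
  --allow tcp:30000-32767
```

Listing the node tags of every cluster in the federation in `--target-tags` is what keeps the health checks passing from all backends at once.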

The proposed solution is to have each individual GLBC instance install a separate firewall rule scoped to the cluster it is running in. This requires a change in the firewall rule naming scheme. A disadvantage of this approach is that it requires a migration path for the existing firewall rules in running clusters.
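A minimal sketch of why the naming change fixes the clobbering (the name format, UID, and cluster identifier below are illustrative, not GLBC's actual scheme): the federation syncs one UID to every cluster, so today each GLBC derives the same firewall rule name; appending a per-cluster identifier makes each rule unique.

```shell
# Illustrative only: not GLBC's real naming code.
shared_uid="fed-uid-123"          # UID synced to every cluster by federation
cluster_name="cluster-us-east1"   # identifier unique to this cluster

old_rule="k8s-fw-l7--${shared_uid}"                   # identical in every cluster,
                                                      # so instances clobber each other
new_rule="k8s-fw-l7--${shared_uid}--${cluster_name}"  # unique per cluster

echo "${new_rule}"
```

With per-cluster names, each GLBC owns exactly one rule and never touches the rules created by GLBCs in sibling clusters, which is why the migration question (what deletes the old shared rule?) then becomes the main remaining concern.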

An alternative solution is to have the Federated Ingress Controller program the required firewall rule to allow traffic to all the underlying clusters, and to add a flag to GLBC to disable firewall rule creation. Disadvantages of this approach are:

  1. It bakes GCP-specific knowledge into the Federated Ingress Controller, which is against the controller's original design goal.
  2. Multi-cluster Ingress users outside federation will still need an automated way to install these rules.

cc @kubernetes/sig-cluster-federation @nikhiljindal @matchstick @bprashanth

@madhusudancs madhusudancs added area/cluster-federation sig/network Categorizes an issue or PR as relevant to SIG Network. labels Nov 22, 2016
@madhusudancs madhusudancs added this to the v1.6 milestone Nov 22, 2016
@madhusudancs madhusudancs self-assigned this Nov 22, 2016
@bprashanth
Contributor

Re moving firewall logic into the federated controller: you should probably model it after the service controller, with pluggable cloud-specific backends. Today the federated ingress controller's primary function is:

  1. UID management
  2. setting the static-ip annotation

Both of which are GCP-specific, since the on-prem nginx controller and by extension the AWS ingress (which is basically a service.Type=LoadBalancer in front of an nginx ingress at this point) don't care about them. I suspect cross-cluster load balancing is going to bring some site-specific twists anyway, as we start thinking about more platforms.

@ghost

ghost commented Feb 13, 2017

I think that we need to implement the relatively minor fix to the bug in the GCE Ingress Controller (GLBC), so that it creates cluster-specific rather than global firewall rule names. That way one GLBC does not overwrite the firewall rule created by another GLBC in another cluster (which is what it erroneously does today).

As for the backwards compatibility concern (essentially, how do we cause the old globally named firewall rule to be deleted, eventually, once all clusters' GLBC controllers have been upgraded to the new scheme), I think that we can deal with that through a release note ("please delete this firewall rule when it's no longer required"). Having an additional unneeded firewall rule lying around should be benign, and its deletion can be regarded as a cosmetic cleanup rather than a requirement. @bprashanth, correct me if I'm missing anything here.

I think we can and should avoid moving the cloud-specific firewall management logic from the intentionally cloud-specific GLBC to the ideally cloud-agnostic federated ingress controller, for several reasons:

  1. The federation control plane currently does not interact directly with the cloud provider at all, and we want to keep it that way. It only interacts with the Kubernetes API, and with DNS providers via a pluggable interface.
  2. Even though the current federated ingress controller contains some logic which only applies to GLBC (and hence implicitly GCE), all it's really doing is transferring annotations and a ConfigMap from one cluster to another, if they exist. While this is GLBC-specific, it only requires knowledge of the Kubernetes API and of where the ConfigMap and annotations for GLBC are located. And yes, we can and should make this piece of logic more pluggable, but I don't think that's critical right now.
  3. The fix required for what is essentially a bug in GLBC (using a globally shared name for the firewall rules it creates and manages, causing multiple instances of itself to overwrite each other's firewall rules) is conceptually and practically much simpler than the logic that would be required in the federated ingress controller. I'm not even convinced that one can write it correctly in the federated ingress controller: in some cases (e.g. non-federated clusters) GLBC would manage the firewall rules, and in other cases (e.g. when adding a cluster to a federation) control would have to transfer to the federated ingress controller in some orderly fashion - this just sounds hard/impossible to do correctly. There is a proposal to implement this in [Federation] Install the required firewall rule in the federated Ingress test. #37571, but there are some fairly serious problems with it.
  4. As mentioned above, I think that the concerns regarding an upgrade path are fairly easily addressed.

Q

@madhusudancs
Contributor Author

madhusudancs commented Feb 13, 2017

As for the backwards compatibility concern (essentially, how do we cause the old globally named firewall rule to be deleted, eventually, once all clusters' GLBC controllers have been upgraded to the new scheme), I think that we can deal with that through a release note ("please delete this firewall rule when it's no longer required") as having an additional unneeded firewall rule lying around should be benign, and its deletion can be regarded as a cosmetic cleanup, rather than required

Let's please not underestimate this effort. There are production clusters where people are running real stuff. At the very least, it has to go through rigorous upgrade testing. Everything looks conceptually and practically simpler on paper :)

There is a proposal to implement this in #37571 but there are some fairly serious problems with it.

I am not entirely sure what those serious problems are. Could you please elaborate? It is a test fix, a workaround until we get the code working, to give confidence that everything other than the known issues work.

@ghost

ghost commented Feb 13, 2017

@madhusudancs You're right, the small GLBC change will need testing. But so, equally, will the more complex alternative (if the federation control plane messes up the firewall rules, those clusters could also potentially be bricked). Admittedly, the more complex alternative's logic will only be invoked on clusters which join a federation, but those may themselves be production clusters. So I don't think that the risk is significantly less there. Both need good testing.

I've added some comments to #37571

@madhusudancs
Contributor Author

The other approaches don't need less testing, nor are they easier. Among the alternatives, the per-cluster firewall rule is the least difficult, but that doesn't mean it is a cakewalk.

@ethernetdan
Contributor

Moving to 1.7, as this is too late to happen in 1.6. Feel free to switch back if this is incorrect.

@ethernetdan ethernetdan modified the milestones: v1.7, v1.6 Mar 13, 2017
@madhusudancs
Contributor Author

This was fixed in PRs #41942 and kubernetes/ingress-nginx#278.
