azure: add OutboundType for controlling egress #3324
Conversation
@smarterclayton @ironcladlou if anyone can help check that this doesn't break any IPv6 assumptions.. that seems pretty fragile and easy to regress. Thanks!
/test e2e-azure
/retest
```
default = false

description = <<EOF
This determined whether User defined routing will be used for egress to Internet.
```
I would change this (and all the other blocks that use it) to "determines".
data/data/azure/vnet/public-lb.tf (outdated)
```
// true,true,false,true = true
// true,true,true,false = true
// true,true,true,true = true
```
Maybe make this into a truth table?
/approve
Force-pushed from 7af719a to ff5c71e (Compare)
/test e2e-azure
1 similar comment
/test e2e-azure
/lgtm
```
count = var.use_ipv4 || true ? 1 : 0

need_public_ipv4 = ! var.private || ! var.outbound_udr

need_public_ipv6 = var.use_ipv6 && (! var.private || ! var.outbound_udr)
```
Maybe

```
need_public_ipv6 = var.use_ipv6 && local.need_public_ipv4
```
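For context, a minimal sketch of how this suggestion could fold into the locals quoted above (variable names are taken from the diff; the surrounding `locals` block layout is assumed, not copied from the PR):

```hcl
# Sketch only: the locals from the diff plus the reviewer's suggestion,
# so the IPv6 condition reuses the IPv4 one instead of repeating it.
locals {
  # A public IPv4 frontend is needed unless the cluster is private
  # AND egress is handled by user-defined routing.
  need_public_ipv4 = ! var.private || ! var.outbound_udr

  # Suggested form: IPv6 needs a public frontend under the same
  # condition, but only when IPv6 is enabled at all.
  need_public_ipv6 = var.use_ipv6 && local.need_public_ipv4
}
```

The benefit is that the private/UDR condition lives in one place, so the two locals cannot drift apart.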
```
@@ -938,10 +947,6 @@ spec:
      used when installing on bare metal for machine pools which do
      not define their own platform configuration.
    type: object
    dnsVIP:
```
Was this intentionally removed?
/approve
looks good, just one question.
```diff
-	if platform == azuretypes.Name && installConfig.Config.Publish == types.InternalPublishingStrategy {
+	if platform == azuretypes.Name &&
+		installConfig.Config.Publish == types.InternalPublishingStrategy &&
+		installConfig.Config.Azure.OutboundType == azuretypes.LoadbalancerOutboundType {
```
In what configuration is this outbound service still needed? Can it be removed?
@mjudeikis It is required for internal clusters with the outbound loadbalancer. We cannot depend on MachineSets to manage the membership of the compute nodes in the backend of the public LB. We need to keep the headless SLB to stay compatible when nodes are added without MachineSets, as in UPI/bring-your-own-RHEL.
/retest
/test e2e-aws
OutboundType is a strategy for how egress from the cluster is achieved. Currently there are 2 strategies:

- loadbalancer (default): this uses the standard load balancer pointing to the control plane and compute nodes to provide egress, based on [1]. For an internal cluster, this means a public LB will be used to provide egress, i.e. even though all the endpoints of the cluster are internal and only accessible to the virtual network, there exists a public LB with a public IP that provides the egress for the cluster.
- user defined routing: this allows the user to configure the routing of the virtual network to their choosing. The installer is not expected to set up any egress; rather, the installer should turn off egress from the standard LB. Users have the option to set up egress using [2]. An example is how AKS documents using UDR for egress [3]. This can also be used when the user routes all egress to the internet through a proxy, or expects no egress to the internet at all. Since the routing for the virtual network needs to be set up by the user up front, before installing a cluster, using this strategy requires a pre-existing network.

[1]: https://docs.microsoft.com/en-us/azure/load-balancer/load-balancer-outbound-connections#lb
[2]: https://docs.microsoft.com/en-us/azure/virtual-network/virtual-networks-udr-overview
[3]: https://docs.microsoft.com/en-us/azure/aks/egress-outboundtype
For internal clusters, a dummy k8s Service of type LoadBalancer is created for the cluster. This Service creates a public standard LB, which is then used by compute nodes as an egress pathway to the internet. Without this, compute nodes in an internal cluster would have no way to reach the internet. But when the user is not using the LB outbound type, i.e. UDR, this Service is no longer required because the user has already set up the network.
* When is a public IPv4 LB required?
  - it's required for external clusters
  - it's required for internal clusters when outbound is provided using the LB
  - it's required for IPv6 clusters because Azure RM LBs can't be just IPv6.. :/
* When is a public IPv6 LB required?
  - it's required for external IPv6 clusters
  - it's required for internal clusters when outbound is provided using the LB
* azure_lb_rules
  - the k8s API rules are required for external clusters
  - the k8s API rules provide SNAT only when outbound is done using the LB
  - the k8s API rules are not required for internal clusters
  - the outbound rules (dummy rules) are required for internal clusters when outbound is done using the LB
* azure_lb_backends
  - the backends are only created when the frontend configurations are created, because of this failure from Azure:

```
Load Balancer /subscriptions/xx/resourceGroups/xx/providers/Microsoft.Network/loadBalancers/xx-public-lb does not have Frontend IP Configuration, but it has other child resources. This setup is not supported.
```

  - since the backends are not created in certain cases, the master and bootstrap modules need to skip adding the virtual machines to the Azure LB backends. Although it should be simple to switch on whether the backend was created, i.e. null vs. not null, the Terraform issue [1] doesn't allow such a conditional in `count`:

```
ERROR Error: Invalid count argument
ERROR
ERROR on ../../../../../../../tmp/openshift-install-941276375/bootstrap/main.tf line 142, in resource "azurerm_netwo
ERROR 142: count = var.elb_backend_pool_v4_id == null ? 0 : 1
ERROR
ERROR The "count" value depends on resource attributes that cannot be determined
ERROR until apply, so Terraform cannot predict how many instances will be created.
ERROR To work around this, use the -target argument to first apply only the
ERROR resources that the count depends on.
```

So the master and bootstrap modules need to recreate the conditions of `need_public_ipv{4,6}` using the inputs `use_ipv{4,6}`, `private`, `outbound_udr`.

[1]: hashicorp/terraform#12570
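The workaround described in the commit message might look roughly like this in the bootstrap module (a sketch: only `private`, `outbound_udr`, `use_ipv4`, and `elb_backend_pool_v4_id` come from the PR text; the resource name and wiring are illustrative):

```hcl
# Instead of counting on the possibly-null backend pool id (which
# Terraform cannot evaluate at plan time), re-derive need_public_ipv4
# from plain input variables that are known before apply.
resource "azurerm_network_interface_backend_address_pool_association" "public_lb_bootstrap_v4" {
  # Same condition as need_public_ipv4 in the vnet module, recreated
  # here from inputs so count is known at plan time.
  count = var.use_ipv4 && (! var.private || ! var.outbound_udr) ? 1 : 0

  # Illustrative wiring; these references are not taken from the PR.
  network_interface_id    = azurerm_network_interface.bootstrap.id
  ip_configuration_name   = "bootstrap-ipv4"
  backend_address_pool_id = var.elb_backend_pool_v4_id
}
```

The trade-off is duplication: the condition must be kept in sync manually between the vnet module and the modules that consume the backend pool id.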
…as soon as they are created

On Azure the route to the internet depends on various factors, but for OpenShift, where all machines have internal IPs, whether a machine belongs to a public LB backend decides how the traffic is routed to the internet. When the machine boots up, it is not part of any LB backend, so all the traffic flows based on scenario 3 in [1], while as soon as the k8s cloud provider adds the node to the public LB backend the traffic flows based on scenario 2 in [1]. To make sure that traffic to the internet takes one path, it's useful to add the machine to the LB backend as soon as it is created. Since there is only one public LB since 7c1a274, we can use the machine-api to add the machines to the correct backend as soon as they are created.

For outbound type UDR, the machines do not have to be added to the LB: even when a machine is added to the LB by the k8s cloud provider, it does not affect the egress, which is controlled by however the user has set up the outbound routing.

[1]: https://docs.microsoft.com/en-us/azure/load-balancer/load-balancer-outbound-connections
Force-pushed from c7bd93d to 19ceb44 (Compare)
Rebase around #3634 ping @jhixson74 for lgtm
/approve
/test e2e-azure
/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, jhixson74, mjudeikis. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
/retest
Please review the full test history for this PR and help us cut down flakes.
@abhinavdahiya: The following tests failed, say /retest to rerun all failed tests:

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/retest
Please review the full test history for this PR and help us cut down flakes.
enhancement: openshift/enhancements#338
xref: https://issues.redhat.com/browse/CORS-1365
Similar to AKS outbound type
pkg/types/azure: add OutboundType for controlling egress
OutboundType is a strategy for how egress from the cluster is achieved. Currently there are 2 strategies:

loadbalancer (default)

This uses the standard load balancer pointing to the control plane and compute nodes to provide egress, based on Azure Outbound for private machines behind standard LB. For an internal cluster, this means a public LB will be used to provide egress, i.e. even though all the endpoints of the cluster are internal and only accessible to the virtual network, there exists a public LB with a public IP that provides the egress for the cluster.

user defined routing

This allows the user to configure the routing of the virtual network to their choosing. The installer is not expected to set up any egress; rather, the installer should turn off egress from the standard LB. Users have the option to set up egress using Azure network user defined routing. An example is how AKS documents using UDR for egress (AKS outbound type UDR). This can also be used when the user routes all egress to the internet through a proxy, or expects no egress to the internet at all. Since the routing for the virtual network needs to be set up by the user up front, before installing a cluster, using this strategy requires a pre-existing network.
manifests/azure: no outbound k8s service type LB for UDR
For internal clusters, a dummy k8s Service of type LoadBalancer is created for the cluster. This Service creates a public standard LB, which is then used by compute nodes as an egress pathway to the internet. Without this, compute nodes in an internal cluster would have no way to reach the internet.

But when the user is not using the LB outbound type, i.e. UDR, this Service is no longer required because the user has already set up the network.
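The dummy Service described above is, in essence, a LoadBalancer Service that exists only to make the cloud provider create a public standard LB with an outbound path; a hypothetical sketch (the name, namespace, and port are illustrative, not the installer's actual manifest):

```yaml
# Sketch: a selector-less LoadBalancer Service whose only purpose is to
# force the Azure cloud provider to create a public standard LB, giving
# internal-cluster nodes an egress path. No traffic is ever served on it.
apiVersion: v1
kind: Service
metadata:
  name: outbound-provider        # illustrative name
  namespace: openshift-ingress   # illustrative namespace
spec:
  type: LoadBalancer
  ports:
  - port: 27627                  # arbitrary unused port
```

With OutboundType set to UDR, this manifest is simply not rendered, since egress is the user's responsibility.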
data/azure: implement outbound type LB and UDR
So the master and bootstrap modules need to recreate the conditions of `need_public_ipv{4,6}` using the inputs `use_ipv{4,6}`, `private`, `outbound_udr`.
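Putting the commits together, an install-config.yaml using the new field might look like the fragment below (a hedged sketch: the `outboundType` value string and the network/subnet field names follow my reading of the linked enhancement, not this PR, and all concrete values are placeholders):

```yaml
apiVersion: v1
baseDomain: example.com          # placeholder values throughout
metadata:
  name: udr-cluster
publish: Internal                # UDR pairs with internal publishing
platform:
  azure:
    region: centralus
    # UDR requires a pre-existing network whose routing the user controls
    networkResourceGroupName: my-network-rg
    virtualNetwork: my-vnet
    controlPlaneSubnet: my-master-subnet
    computeSubnet: my-worker-subnet
    outboundType: UserDefinedRouting
```

With `outboundType: UserDefinedRouting`, the installer skips the public LB egress plumbing entirely; with the default (`Loadbalancer`), it keeps the standard-LB outbound rules described above.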
/assign @fabianofranz @jhixson74