Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Azure internal load balancer #43510

Merged
merged 5 commits into from
Apr 19, 2017

Conversation

karataliu
Copy link
Contributor

Which issue this PR fixes
Fixes #38901

What this PR does / why we need it:
This PR is to add support for Azure internal load balancer

Currently when exposing a serivce with LoadBalancer type, Azure provider would assume that it requires a public load balancer.
Thus it will request a public IP address resource, and expose the service via that public IP.
In this case we're not able to apply private IP addresses (within the cluster virtual network) for the service.

Special notes for your reviewer:

  1. Clarification:
    a. 'LoadBalancer' refers to an option for 'type' field under ServiceSpec. See https://kubernetes.io/docs/resources-reference/v1.5/#servicespec-v1
    b. 'Azure LoadBalancer' refers a type of Azure resource. See https://docs.microsoft.com/en-us/azure/load-balancer/load-balancer-overview

  2. For a single Azure LoadBalancer, all frontend ip should reference either a subnet or publicIpAddress, which means that it could be either an Internet facing load balancer or an internal one.
    For current provider, it would create an Azure LoadBalancer with generated '${loadBalancerName}' for all services with 'LoadBalancer' type.
    This PR introduces name '${loadBalancerName}-internal' for a separate Azure Load Balancer resource, used by all the service that requires internal load balancers.

  3. This PR introduces a new annotation for the internal load balancer type behaviour:
    a. When the annotaion value is set to 'false' or not set, it falls back to the original behaviour, assuming that user is requesting a public load balancer;
    b. When the annotaion value is set to 'true', the following rule applies depending on 'loadBalancerIP' field on ServiceSpec:

    • If 'loadBalancerIP' is not set, it will create a load balancer rule with dynamic assigned frontend IP under the cluster subnet;
    • If 'loadBalancerIP' is set, it will create a load balancer rule with the frontend IP set to the given value. If the given value is not valid, that is, it does not falls into the cluster subnet range, then the creation will fail.
  4. Users may change the load balancer type by applying the annotation to the service at runtime.
    In this case, the load balancer rule would need to be 'switched' between the internal one and external one.
    For example, it we have a service with internal load balancer, and then user removes the annotation, making it to a public one. Before we creating rules in the public Azure LoadBalancer, we'll need to clean up rules in the internal Azure LoadBalancer.

Release note:

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 22, 2017
@k8s-ci-robot
Copy link
Contributor

Hi @karataliu. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with @k8s-bot ok to test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-reviewable
Copy link

This change is Reviewable

@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note-label-needed labels Mar 22, 2017
@khenidak
Copy link
Contributor

CC @colemickens @brendandburns

@khenidak
Copy link
Contributor

@karataliu Hi! just a QQ ensureLoadBalancerDeleted() does not do anything when delete is happening on ilb? is this expected? wouldn't that leave dangling load balancing rules on the lib?

@colemickens
Copy link
Contributor

Few questions, haven't looked at code yet:

  1. Is the value of the annotation a bool, or is the desired private subnet IP? Seems like you describe it as both?

  2. What happens if the user pre-supposes the private ip and it is already taken? Is the failure surfaced well to the user?

@colemickens
Copy link
Contributor

There's a number of edge-cases here to consider too. For example, what happens if you create two services (one ILB, one ELB) w/ frontend port = 80 and backend port = 80?

When they're on the same LB object, Azure will allow this if DSR is enabled. If they're on separate LB objects, even with DSR enabled on both, it's very possible that we will reject it. Can you confirm?

@karataliu
Copy link
Contributor Author

@khenidak

Thanks for the review.

For 'ensureLoadBalancerDeleted', please check the new file in the diff panel.
https://github.com/kubernetes/kubernetes/pull/43510/files#diff-c901394068476b4ccb003a6c6efad57cR281

At Line272 it figures out the loadbalancer name (based on internal or not), and then at Line281~Line302 it performs the same update logic taken from original code.

Basically the call to 'reconcileLoadBalancer' will calculate the rule to be updated, and later code will apply the update.

@karataliu
Copy link
Contributor Author

karataliu commented Mar 23, 2017

@colemickens

Thanks for the review.

Q: Is the value of the annotation a bool, or is the desired private subnet IP? Seems like you describe it as both?

A: The annotation value is a boolean only. We'll then look up 'loadBalancerIP' for the desired private subnet IP. Please check Note#3.

Q: What happens if the user pre-supposes the private ip and it is already taken? Is the failure surfaced well to the user?

A: This is a similar case to that of proposing a private IP which does not fall into the subnet IP range.
It will trigger following entry in the service's events.

Type:Warning
Reason:CreatingLoadBalancerFailed
Message: ... Private IP address ... from the subnet can be assigned to only one IP configuration. ...

Which comes from

s.eventRecorder.Event(service, v1.EventTypeWarning, "CreatingLoadBalancerFailed", message)

The controller will keep retrying. And if later at some time the assigned IP is freed, the service creation will eventually complete successfully.

Q: What happens if you create two services (one ILB, one ELB) w/ frontend port = 80 and backend port = 80?
When they're on the same LB object, Azure will allow this if DSR is enabled. If they're on separate LB objects, even with DSR enabled on both, it's very possible that we will reject it. Can you confirm?

A: Azure will allow ILB and ELB based on identical backend IPConfiguration to reuse the same backend port, as long as 'EnableFloatingIP' (DSR) option is set to true.
In my test I created 2 services with same port configuration, only difference is that one of them has set the internal load balancer annotaion.
Since we've enabled EnableFloatingIP option in code, both services are created successfully. The loadbalancer rules and connectivity for services also appear ok.

service.Spec.Ports = []v1.ServicePort{}
az.ensureLoadBalancerDeleted(clusterName, service, isInternal)

sg, existsSg, err := az.getSecurityGroup()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the following be moved to the overloaded func?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've got two Azure LoadBalancer Resource (internal, external), but only one Network Security Group (bound to vnet).

'ensureLoadBalancerDeleted' is for reconciliation of rules under certain Azure LoadBalancer resource.

If the service requests an external LoadBalancer:
Upon creation, controller calls it (isInternal=true) on internal load balancer resource to ensure the related internal rules deleted (in case the service was switched from internal load balancer type).
Then controller calls 'reconcileSecurityGroup' (wantLb=true) to ensure new rules under Network Security Group.

Upon deletion, controller calls it (isInternal=false) on external load balancer resource to ensure the related external rules deleted.
Then controller calls 'reconcileSecurityGroup' (wantLb=false) to cleanup related rules under Network Security Group.

For internal LoadBalancer, it'll be. creation: 'isInternal=false', 'wantLb=true'. Deletion: 'isInternal=true', 'wantLb=false'.

That is, the parameter value of 'isInternal' on 'ensureLoadBalancerDeleted', is orthogonal to 'wantLb' on 'reconcileSecurityGroup'. In this case, they may be not well suited to combine.

I think we may do some refactor later to wrap the similar code logic around 'reconcileSecurityGroup' calls, however.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about passing in another flag, say, "cleanupSecurityGroup" which defaults to true? Another option is to pull the IP removal part out the original func also, and rename the overloaded one to something like "deleteLoadBalancerOnly" to avoid confusion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, the name 'ensureLoadBalancerDeleted' is somewhat confusing.
Will rename 'ensureLoadBalancerDeleted' to 'deleteLoadBalancerResource', and create a separate func for 'deletePublicIpResource'.

Current PR branch is conflicting with a newly merged change. Will update after rebasing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added 'cleanupLoadBalancer' and 'cleanupPublicIP' in new commit.

@@ -105,7 +127,50 @@ func (az *Cloud) EnsureLoadBalancer(clusterName string, service *v1.Service, nod
}
}

lb, lbNeedsUpdate, err := az.reconcileLoadBalancer(lb, pip, clusterName, service, nodes)
var lbIP *string

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to move the added portion to a new func? This one is getting so much longer with this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the function gets a bit long. But we'll also need to pass down the external parameters to the spited func.

I've got a try for the change: karataliu@3848fb4 , and could be evaluated later.


sourceRanges, err := serviceapi.GetLoadBalancerSourceRanges(service)
if err != nil {
return sg, false, err
}
var sourceAddressPrefixes []string
if sourceRanges == nil || serviceapi.IsAllowAll(sourceRanges) {
sourceAddressPrefixes = []string{"Internet"}
if !requiresInternalLoadBalancer(service) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So sourceAddressPrefixes is nil for the internal case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and it would function like an empty array.
Traffics from vnet is allowed by default rule, thus there is no need to add an explicit one.

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 27, 2017
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 27, 2017
@gmarek
Copy link
Contributor

gmarek commented Apr 3, 2017

/unassign @gmarek
/assign @mikedanese

@k8s-ci-robot k8s-ci-robot assigned mikedanese and unassigned gmarek Apr 3, 2017
@karataliu
Copy link
Contributor Author

@colemickens, @khenidak Can you help take a look? Thanks.

@brendandburns
Copy link
Contributor

A couple of nits, but this LGTM.

Can you add an e2e test for this?

Thanks!
--brendan

@karataliu
Copy link
Contributor Author

@brendandburns
Thanks for the comments. I've pushed the change for log level update.

As for e2e test, I did a search around and find the following two related test cases:
https://github.com/kubernetes/kubernetes/blob/dea98cb/test/e2e/service.go#L469
https://github.com/kubernetes/kubernetes/blob/dea98cb/test/e2e/service.go#L1168

They are both testing cloud Load Balancer, and against certain cloud providers (gce, gke, aws).

The change in this PR is limited to Azure provider only. But seems for now 'azure' as an E2E cloud provider is not ready. Do we need to add that support first? If so, shall we include it in the same PR? Could you please kindly suggest?

Related: #22715

@karataliu
Copy link
Contributor Author

Regarding the failed test (Jenkins GCE e2e), the test history indicates it passed at first round. The newly added commit only changes log level, which doesn't seem to cause the break. Any idea how to get it rerun, or to deal with such a situation? Thanks.

@karataliu
Copy link
Contributor Author

Update the links on previous comment:
https://github.com/kubernetes/kubernetes/blob/877dc56/test/e2e/service.go#L469
https://github.com/kubernetes/kubernetes/blob/877dc56/test/e2e/service.go#L1168

Adding E2E tests might require some extra work in test-infra project.

Any suggestions on how to proceed with this PR? Thank you.

@colemickens
Copy link
Contributor

@brendandburns Do you want to gate this on adding e2e tests, given there's no groundwork for e2e for Azure right now? (Not sure how much that entails)

@karataliu We have the e2e tests running in Azure right now, without doing work in test-infra. Do you have to do work in test-infra just to add a new golang-based e2e test that we can then execute in our Jenkins system on Azure?

@karataliu
Copy link
Contributor Author

@colemickens I just did some search around and found there's cloud testing related code lies in test-infra. It is fine to use other approaches. Thanks.

@colemickens
Copy link
Contributor

@karataliu From my time in that repo, I believe that's more about describing the jobs that will run in various Jenkins setups to run the actual e2e tests. But the e2e tests all just live in the main Kubernetes code base. I don't think there's any changes needed, but it might be hard for you to run the tests without having access to a Jenkins box that is already running the tests in/against Azure. If that becomes a road-block, we can share details of the Jenkins jobs that we use so that others can create similar jobs to run the e2e ginkgo tests.

@brendandburns
Copy link
Contributor

fwiw, the tests are here in the main repo:

https://github.com/kubernetes/kubernetes/tree/master/test/e2e

I'm ok with merging this as is, and then adding a test as a follow on, but I'd really like to make sure we get some coverage on this code.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 19, 2017
@brendandburns
Copy link
Contributor

@k8s-bot cvm gce e2e test this

@brendandburns
Copy link
Contributor

/approve

@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: brendandburns, karataliu

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 19, 2017
@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 44645, 44639, 43510)

@k8s-github-robot k8s-github-robot merged commit d2060ad into kubernetes:master Apr 19, 2017
@karataliu
Copy link
Contributor Author

Thanks for the update. I'll keep a note on adding test back for the change. Thank you.

var fipConfigurationProperties *network.FrontendIPConfigurationPropertiesFormat

if isInternal {
subnet, existsSubnet, err := az.getSubnet(az.VnetName, az.SubnetName)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey guys, just a quick question regarding this.

Does the subnet need to be in the same resource group (in Azure) as the cluster?
This should not be the case given we can create the kubernetes cluster with an existing VNet in different resource groups.

Getting the following error (the subnet (internal-network/kube-machines) is in a different resource group, it is the same subnet that the agent machines are on):
Error creating load balancer (will retry): Failed to create load balancer for service default/rabbitmq: ensure(default/rabbitmq): lb(XXXXXX-internal) - failed to get subnet: internal-network/kube-machines

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cloudprovider really isn't setup well to handle multi-resource-group things right now unfortunately. There's a larger task to revamp and use resource IDs everywhere instead of resource "names" that are assumed to be under a single RG.

@dcieslak19973
Copy link

Will this be added in 1.7?

@calebamiles
Copy link
Contributor

@dcieslak19973 this should land in 1.7 and is in the process of being backported to 1.6

k8s-github-robot pushed a commit that referenced this pull request May 17, 2017
…dbalancer

Automatic merge from submit-queue

Cherry pick azure internal loadbalancer

Backports fixes for Azure load balancer: 
- [Add support for Azure internal load balancer](#43510)
- [Add support for bring-your-own ip address for Services on Azure](#42034)

**Release note**:
```release-note
Add support for Azure internal load balancer.
```
k8s-github-robot pushed a commit that referenced this pull request Jun 21, 2017
Automatic merge from submit-queue

Add E2E tests for Azure internal loadbalancer support, fix an issue for public IP resource deletion.

**What this PR does / why we need it**:

- Add E2E tests for Azure internal loadbalancer support: #43510
- Fix an issue that public IP resource not get deleted when switching from external loadbalancer to internal static loadbalancer.

**Special notes for your reviewer**:

1.  Add new Azure resource tag to Public IP resources to indicate kubernetes managed resources.
   Currently we determine whether the public IP resource should be deleted by looking at LoadBalancerIp property on spec. In the scenario 'Switching from external loadbalancer to internal loadbalancer with static IP', that value might have been updated for internal loadbalancer. So here we're to add an explicit tag for kubernetes managed resources.

2. Merge cleanupPublicIP logic into cleanupLoadBalancer

**Release note**:
NONE

CC @brendandburns @colemickens
@djsly
Copy link
Contributor

djsly commented Nov 20, 2017

It was discovered that the Azure Internal Load Balancer doesn't open up the NSG associated with the Subnet where the Internal LB is created, unlike the Public LB.

#56086

calebamiles pushed a commit to kubernetes-sigs/cloud-provider-azure that referenced this pull request Mar 21, 2018
Automatic merge from submit-queue

Add E2E tests for Azure internal loadbalancer support, fix an issue for public IP resource deletion.

**What this PR does / why we need it**:

- Add E2E tests for Azure internal loadbalancer support: kubernetes/kubernetes#43510
- Fix an issue that public IP resource not get deleted when switching from external loadbalancer to internal static loadbalancer.

**Special notes for your reviewer**:

1.  Add new Azure resource tag to Public IP resources to indicate kubernetes managed resources.
   Currently we determine whether the public IP resource should be deleted by looking at LoadBalancerIp property on spec. In the scenario 'Switching from external loadbalancer to internal loadbalancer with static IP', that value might have been updated for internal loadbalancer. So here we're to add an explicit tag for kubernetes managed resources.

2. Merge cleanupPublicIP logic into cleanupLoadBalancer

**Release note**:
NONE

CC @brendandburns @colemickens
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

azure Feature Request: Services with LoadBalancer should allow us to configure the azure lb with a private IP