Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DigitalOcean] Add load balancer support for master HA #8237

Merged
merged 6 commits into from
Feb 3, 2020

Conversation

srikiz
Copy link
Contributor

@srikiz srikiz commented Dec 31, 2019

This PR adds load balancer support for digital ocean cloud provider.

  • It would add a new load balancer based on a cluster master tag.

  • Incorporated the new load balancer address as part of the kuber config file.

  • Tested with 3 masters and a load balancer and was able to access the kubernetes cluster via the load balancer address from outside.

  • Deleting the cluster also deletes the load balancer.

  • Created a new cluster with the below command

./kops create cluster --cloud=digitalocean --name=dev5.k8s.local --networking=cilium --api-loadbalancer-type=public --master-count=3 --networking=cilium --zones=tor1 --ssh-public-key=~/.ssh/id_rsa.pub -v 10 --yes

Will be creating a new PR to add documentation for DO's support to multi-master with gossip based clustering.

@k8s-ci-robot
Copy link
Contributor

Hi @srikiz. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 31, 2019
@rifelpet
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 31, 2019
@srikiz
Copy link
Contributor Author

srikiz commented Jan 7, 2020

@timoreimann - Please review when you get a chance, thanks !

Copy link
Contributor

@timoreimann timoreimann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good -- left a bunch of improvement suggestions.


lbSpec := b.Cluster.Spec.API.LoadBalancer
if lbSpec == nil {
// Skipping API ELB creation; not requested in Spec
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ELB is probably from AWS? If so, we should just call it LB in my opinion.

Similar for several other occurrences in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, will do.

return fmt.Errorf("unhandled LoadBalancer type %q", lbSpec.Type)
}

loadbalancerName := "api-" + strings.Replace(b.ClusterName(), ".", "-", -1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we could extract the result of the Replace call into a variable for reusage on the next line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done.


// Create LoadBalancer for API ELB
var loadbalancer *dotasks.LoadBalancer
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to have the extra scope?

To me it looks like we can get rid of it and use := to define loadbalancer directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, updated.

// Note that this must match Digital Ocean's lb name
klog.V(2).Infof("Querying DO to find Loadbalancers for API (%q)", cluster.Name)

loadBalaners, _, err := c.LoadBalancers().List(context.TODO(), nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in the first variable name (missing c).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is List() from godo and thus needs paging?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that was the godo api to retrieve all the load balancers, so needed paging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed typo as well.

_, err := c.Client.LoadBalancers.Delete(context.TODO(), lb.ID)

if err != nil {
return fmt.Errorf("failed to delete load balancer with name %s", lb.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please include err in the output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

}

for _, loadbalancer := range loadbalancers {
if loadbalancer.Name == fi.StringValue(lb.Name) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if it's also possible to look up an LB by ID? That would allow us to do a single lookup using a different godo method, with no paging necessary whatsoever.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really helps, thanks for letting me know. Changed the logic to use LB ID instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should handle the case where the names don't match, e.g., logging.

I'm not sure I understand why we need the extra check though -- can we think of a case where the ID alone is insufficient?

}

klog.Errorf("Error fetching IP Address for lb Name=%s", fi.StringValue(lb.Name))
time.Sleep(10 * 3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

30?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed that to 10 seconds with time.Sleep(10 * time.Second)

return nil, errors.New("IP Address is still empty.")
}

func is_ipv4(host string) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICS, another way to implement this is to invoke net.ParseIP followed by To4 while checking that we always get a non-nil result on each step. That is,

ip := net.ParseIP(host)
if ip == nil {
  return false
}

return ip.To4() != nil

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, updated.

return nil, errors.New("IP Address is still empty.")
}

func is_ipv4(host string) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function name style should follow Go, i.e., be called isIPv4.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


address := loadBalancer.IP

if len(address) > 0 && is_ipv4(address) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can drop the len(address) > 0 check part, it's implicitly checked by is_ipv4.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 11, 2020
@srikiz
Copy link
Contributor Author

srikiz commented Jan 11, 2020

@timoreimann - I have incorporated all the comments. Please have a look when you get a chance. Thanks !

@srikiz
Copy link
Contributor Author

srikiz commented Jan 11, 2020

/test pull-kops-e2e-kubernetes-aws

Copy link
Contributor

@timoreimann timoreimann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good -- just two smallish things left.

// Note that this must match Digital Ocean's lb name
klog.V(2).Infof("Querying DO to find Loadbalancers for API (%q)", cluster.Name)

loadBalancers, _, err := c.LoadBalancers().List(context.TODO(), nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need paging?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I missed that. Updated it.

}

for _, loadbalancer := range loadbalancers {
if loadbalancer.Name == fi.StringValue(lb.Name) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should handle the case where the names don't match, e.g., logging.

I'm not sure I understand why we need the extra check though -- can we think of a case where the ID alone is insufficient?

Add load balancer fi tasks

Add load balancer builder for DO

Fix go imports

Implement FindIPAddress functionality

Add load balancer api ingress status calls

Add error checks for FindIPAddress

Add delete LB option

Update load balancer delete logic

Revert make file changes

revert utils code changes

Revert NewDOCloud changes

Remove minor code comments

Update with gomod for dependencies
@srikiz
Copy link
Contributor Author

srikiz commented Jan 28, 2020

@timoreimann - I have incorporated the changes.
Please have a look when you get a chance.

Copy link
Contributor

@timoreimann timoreimann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really good, thanks for cleaning it up further.

I caught just a few more smallish thing; after those, I think we should be ready.

return nil, fmt.Errorf("load balancer service get request returned error %v", err)
}

// do another check to double-sure we are talking to the right load balancer.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment looks like it's obsolete by now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

// Loadbalancer = nil if not found
return nil, nil
Copy link
Contributor

@timoreimann timoreimann Jan 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: if we do

if fi.StringValue(lb.ID) == "" {
  // Loadbalancer = nil if not found
  return nil, nil
}

cloud := c.Cloud.(*digitalocean.Cloud)
lbService := cloud.LoadBalancers()
loadbalancer, _, err := lbService.Get(context.TODO(), fi.StringValue(lb.ID))
[...]

then we do not need to intend the if-branch for the case where the LB ID exists. The code also seems easier to read because the check for the empty LB ID and the action we take are not far apart anymore.

(I also moved the cloud and lbService variables next to their usage -- they are not needed unless we have an ID.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, that's a good suggestion. Updated accordingly.

loadBalancerService := cloud.LoadBalancers()

klog.V(10).Infof("Find IP address for load balancer ID=%s", fi.StringValue(lb.ID))
loadBalancer, _, err := loadBalancerService.Get(context.TODO(), fi.StringValue(lb.ID))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we certain at this point that we have an LB ID? Genuine question as I don't know the kops work flow in this context that well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes @timoreimann - we are certain we have an LB ID, but not certain about the LB IP Address. FindAddress is only called after RenderDO is invoked - we create a new LB Request in RenderDO that returns a LB ID. We use this LB ID to find if the IP Address is available.

return &address, nil
}

klog.Errorf("Error fetching IP Address for lb Name=%s", fi.StringValue(lb.Name))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we match the description with the error below and mention that we are sleeping so that users aren't surprised by the delay, e.g.:

const lbWaitTime = 10 * time.Second
klog.Errorf("IP address for LB %s not yet available -- sleeping %s", fi.StringValue(lb.Name), lbWaitTime)

I also extracted the wait time into a const so that we can reuse it in the actual time.Sleep call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, updated.

return &address, nil
}

klog.Errorf("Error fetching IP Address for lb Name=%s", fi.StringValue(lb.Name))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do a klog.Warnf here? The delay should be expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

klog.Errorf("Error fetching IP Address for lb Name=%s", fi.StringValue(lb.Name))
time.Sleep(10 * time.Second)

klog.V(10).Infof("Sleeping for 10 seconds")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd drop this one given my improvement suggestion for the message above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, done.

return ip.To4() != nil
}

// terraformVolume represents the digitalocean_volume resource in terraform
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one looks like a trailing comment that shouldn't be here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, updated.

Copy link
Contributor

@timoreimann timoreimann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me, thanks a ton Sri.

LGTM (though it still needs approval from a kops maintainer / Kubernetes org member).

@rifelpet
Copy link
Member

rifelpet commented Feb 2, 2020

The E2E failures are unrelated, so we can move forward with this regardless. Thanks!

/test pull-kops-e2e-kubernetes-aws

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 2, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rifelpet, srikiz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 2, 2020
@rifelpet
Copy link
Member

rifelpet commented Feb 3, 2020

/test pull-kops-e2e-kubernetes-aws

@k8s-ci-robot k8s-ci-robot merged commit 4c6b874 into kubernetes:master Feb 3, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Feb 3, 2020
k8s-ci-robot added a commit that referenced this pull request May 25, 2020
…pstream-release-1.17

Automated cherry pick of #8237: Initial changes for load balancer task
@srikiz srikiz deleted the DO-AddLoadBalancer branch October 25, 2021 17:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants