
Bug 1829824: Remove dead member from LB pool. #612

Merged 1 commit into openshift:master on May 7, 2020

Conversation

@gryf (Member) commented on May 4, 2020:

During installation, the bootstrap node is used as an API node at first, but then gets removed. Kuryr's bootstrap never removes it from the API load balancer pool, which leads to Octavia reporting the LB as degraded.

These changes prevent that by making sure that on reconciliation we remove load balancer members that are no longer needed there. This should help with scale-down of masters too.
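
For context, here is a minimal sketch of what such a purge helper can look like. Only the name purgeOpenStackLbPoolMember and its arguments appear in the diff fragments quoted later in this conversation; the package name, body, and error messages are an approximation reconstructed from the review comments and the gophercloud Octavia v2 API, not the merged code:

```go
package network

import (
	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/loadbalancer/v2/pools"
	"github.com/pkg/errors"
)

// purgeOpenStackLbPoolMember removes every member of the given LB pool whose
// address is not on the provided list of expected node addresses.
func purgeOpenStackLbPoolMember(client *gophercloud.ServiceClient, poolId string, addresses []string) error {
	page, err := pools.ListMembers(client, poolId, pools.ListMembersOpts{}).AllPages()
	if err != nil {
		return errors.Wrap(err, "failed to get LB members list")
	}
	members, err := pools.ExtractMembers(page)
	if err != nil {
		return errors.Wrap(err, "failed to extract LB members list")
	}
	for _, member := range members {
		found := false
		for _, address := range addresses {
			if address == member.Address {
				found = true
				break
			}
		}
		if !found {
			// e.g. the bootstrap node after it stopped serving the API
			if err := pools.DeleteMember(client, poolId, member.ID).ExtractErr(); err != nil {
				return errors.Wrap(err, "failed to delete LB member")
			}
		}
	}
	return nil
}
```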

@openshift-ci-robot added the bugzilla/severity-low (referenced Bugzilla bug's severity is low for the branch this PR is targeting) and bugzilla/valid-bug (indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting) labels on May 4, 2020.
@openshift-ci-robot (Contributor) commented:

@gryf: This pull request references Bugzilla bug 1829824, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.5.0) matches configured target release for branch (4.5.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST)

In response to this:

Bug 1829824: Remove dead member from LB pool.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@MaysaMacedo (Contributor) left a comment:


Looks good, just some minor issues, plus the job failures.

@@ -481,6 +483,11 @@ func BootstrapKuryr(conf *operv1.NetworkSpec, kubeClient client.Client) (*bootst
}
}

err := purgeOpenStackLbPoolMember(client, poolId, addresses)
@MaysaMacedo:

the variable err already exists, no need to recreate it, just to update.

@gryf (Member, Author):

ack.
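
For anyone unfamiliar with the distinction being made: in Go, := declares a new variable (and inside a nested block would shadow an outer one), while plain = assigns to an existing one. A self-contained illustration:

```go
package main

import "fmt"

func work() error { return nil }

func main() {
	err := work() // first use: declare and assign with :=
	if err != nil {
		fmt.Println(err)
	}
	err = work() // later uses: plain assignment, no redeclaration
	if err != nil {
		fmt.Println(err)
	}
}
```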

@@ -411,3 +411,34 @@ func listOpenStackOctaviaProviders(client *gophercloud.ServiceClient) ([]provide
return providersList, nil
}
}

// Iterate on pool members and check their address against provided list.
// Remove all surplus members, which address doesn't existst on that list.
@MaysaMacedo:

typo: exist. Nit: better to specify what the provided list is.

@gryf (Member, Author):

ack.
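
A reworded doc comment addressing both points (the typo and naming the list) could read, for example:

```go
// purgeOpenStackLbPoolMember iterates over the LB pool members and checks
// each member's address against the provided list of node port addresses.
// All surplus members, i.e. those whose address does not exist on that
// list, are removed from the pool.
```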

@@ -481,6 +483,11 @@ func BootstrapKuryr(conf *operv1.NetworkSpec, kubeClient client.Client) (*bootst
}
}

err := purgeOpenStackLbPoolMember(client, poolId, addresses)
if err != nil {
return nil, errors.Wrap(err, "Failed on purging LB pool.")
@MaysaMacedo:

Nit: LB pool member.

@gryf (Member, Author) commented on May 5, 2020:

Actually, it's an error raised while purging the pool of invalid members ;)

return errors.Wrap(err, "failed to extract LB members list")
}

for _, member := range members {
@MaysaMacedo:

As we have already added the "new members", does it make sense to compare the members and addresses lengths first, and if they are the same, assume the members are already up to date?

@gryf (Member, Author):

Seems tempting, although it won't let us make fewer Octavia calls, right? And the full check ensures that the member addresses are the same as the node addresses (as they should be) :)

@dulek (Contributor):

I support @gryf here, we could miss something like a failover of a master VM. Regardless of whether that's possible or not, a full comparison costs almost nothing. It's just that Go makes it look long and clumsy. ;)
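
To make the trade-off concrete: a pure length check cannot catch a swap where one address replaces another while the pool size stays the same, which is exactly the failover case mentioned above. A sketch of the full comparison as a helper (the name staleMembers is hypothetical; pools.Member is the gophercloud type):

```go
package network

import "github.com/gophercloud/gophercloud/openstack/loadbalancer/v2/pools"

// staleMembers returns the pool members whose address is not on the desired
// node-address list. Note that len(members) == len(addresses) alone would
// wrongly report "up to date" when a failed-over master swapped one address
// for another.
func staleMembers(members []pools.Member, addresses []string) []pools.Member {
	desired := make(map[string]bool, len(addresses))
	for _, a := range addresses {
		desired[a] = true
	}
	var stale []pools.Member
	for _, m := range members {
		if !desired[m.Address] {
			stale = append(stale, m)
		}
	}
	return stale
}
```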

@gryf (Member, Author) commented on May 6, 2020:

/retest

@luis5tb (Contributor) commented on May 7, 2020:

/test e2e-metal-ipi

@luis5tb (Contributor) commented on May 7, 2020:

/lgtm

@openshift-ci-robot added the lgtm label (indicates that a PR is ready to be merged) on May 7, 2020.
@openshift-ci-robot removed the lgtm label on May 7, 2020.
@gryf requested review from @MaysaMacedo and @luis5tb on May 7, 2020, 10:04.
@MaysaMacedo (Contributor) commented:

/lgtm

@openshift-ci-robot added the lgtm label on May 7, 2020.
@dulek (Contributor) left a comment:

Looks good to me, minor feedback in the comments.

@@ -457,9 +457,11 @@ func BootstrapKuryr(conf *operv1.NetworkSpec, kubeClient client.Client) (*bootst
log.Print("Creating OpenShift API loadbalancer pool members")
r, _ := regexp.Compile(fmt.Sprintf("^%s-(master-port-[0-9]+|bootstrap-port)$", clusterID))
portList, err := listOpenStackPortsMatchingPattern(client, tag, r)
addresses := make([]string, 0)
@dulek:

No need for make() in that case; just var addresses []string to create a nil slice should be fine, and append will work with it.
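
A self-contained illustration of the nil-slice idiom being suggested (the addresses are made-up examples):

```go
package main

import "fmt"

func main() {
	var addresses []string // nil slice: no allocation until the first append
	addresses = append(addresses, "10.0.0.5", "10.0.0.7")
	fmt.Println(len(addresses), addresses) // prints: 2 [10.0.0.5 10.0.0.7]
}
```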


for _, address := range addresses {
if address == member.Address {
found = true
break
@dulek:

Instead of using a found flag, you could just use loop labels: mark the members loop (line 429 in the diff) as members_loop and make this a continue members_loop.
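
For reference, the labeled-continue pattern being suggested, in a runnable sketch with made-up addresses (the label name members_loop comes from the comment; in the real code, a match means the member is kept rather than deleted):

```go
package main

import "fmt"

func main() {
	members := []string{"10.0.0.5", "10.0.0.9"}   // current LB pool members
	addresses := []string{"10.0.0.5", "10.0.0.7"} // desired node addresses

members_loop:
	for _, m := range members {
		for _, a := range addresses {
			if a == m {
				// still desired: jump straight to the next member,
				// no found flag needed
				continue members_loop
			}
		}
		// only reached when no address matched this member
		fmt.Println("stale member:", m)
	}
}
```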

@openshift-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dulek, gryf, luis5tb, MaysaMacedo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on May 7, 2020.
@openshift-ci-robot (Contributor) commented on May 7, 2020:

@gryf: The following test failed, say /retest to rerun all failed tests:

Test name: ci/prow/e2e-gcp-ovn
Commit: 0ccf62b
Rerun command: /test e2e-gcp-ovn

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot (Contributor) commented:

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot merged commit 4725755 into openshift:master on May 7, 2020.
@openshift-ci-robot (Contributor) commented:

@gryf: All pull requests linked via external trackers have merged: openshift/cluster-network-operator#612. Bugzilla bug 1829824 has been moved to the MODIFIED state.

In response to this:

Bug 1829824: Remove dead member from LB pool.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • bugzilla/severity-low: Referenced Bugzilla bug's severity is low for the branch this PR is targeting.
  • bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
  • lgtm: Indicates that a PR is ready to be merged.