
Bug 1829824: Remove dead member from LB pool. #612

Merged 1 commit into openshift:master on May 7, 2020

Conversation

@gryf (Member) commented on May 4, 2020:

During installation, the bootstrap node is used as an API node at first, but then gets removed. Kuryr's bootstrap never removes it from the API load balancer pool, which leads to Octavia reporting the LB as degraded.

These changes prevent that by making sure that on reconciliation we remove load balancer members that are no longer needed there. This should help with scale-down of masters too.
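
For context, here is a minimal sketch of what such a purge helper can look like. Only the name purgeOpenStackLbPoolMember and its arguments appear in the diff fragments quoted later in this conversation; the package name, body, and error messages are an approximation reconstructed from the review comments and the gophercloud Octavia v2 API, not the merged code:

```go
package network

import (
	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/loadbalancer/v2/pools"
	"github.com/pkg/errors"
)

// purgeOpenStackLbPoolMember removes every member of the given LB pool whose
// address is not on the provided list of expected node addresses.
func purgeOpenStackLbPoolMember(client *gophercloud.ServiceClient, poolId string, addresses []string) error {
	page, err := pools.ListMembers(client, poolId, pools.ListMembersOpts{}).AllPages()
	if err != nil {
		return errors.Wrap(err, "failed to get LB members list")
	}
	members, err := pools.ExtractMembers(page)
	if err != nil {
		return errors.Wrap(err, "failed to extract LB members list")
	}
	for _, member := range members {
		found := false
		for _, address := range addresses {
			if address == member.Address {
				found = true
				break
			}
		}
		if !found {
			// e.g. the bootstrap node after it stopped serving the API
			if err := pools.DeleteMember(client, poolId, member.ID).ExtractErr(); err != nil {
				return errors.Wrap(err, "failed to delete LB member")
			}
		}
	}
	return nil
}
```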

@openshift-ci-robot added the bugzilla/severity-low (referenced Bugzilla bug's severity is low for the branch this PR is targeting) and bugzilla/valid-bug (indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting) labels on May 4, 2020.
@openshift-ci-robot (Contributor) commented:

@gryf: This pull request references Bugzilla bug 1829824, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.5.0) matches configured target release for branch (4.5.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST)

In response to this:

Bug 1829824: Remove dead member from LB pool.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@MaysaMacedo (Contributor) left a comment:


Looks good, just some minor issues, plus the job failures.

@@ -481,6 +483,11 @@ func BootstrapKuryr(conf *operv1.NetworkSpec, kubeClient client.Client) (*bootst
}
}

err := purgeOpenStackLbPoolMember(client, poolId, addresses)
@MaysaMacedo:

the variable err already exists, no need to recreate it, just to update.

@gryf (Member, Author):

ack.
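
For anyone unfamiliar with the distinction being made: in Go, := declares a new variable (and inside a nested block would shadow an outer one), while plain = assigns to an existing one. A self-contained illustration:

```go
package main

import "fmt"

func work() error { return nil }

func main() {
	err := work() // first use: declare and assign with :=
	if err != nil {
		fmt.Println(err)
	}
	err = work() // later uses: plain assignment, no redeclaration
	if err != nil {
		fmt.Println(err)
	}
}
```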

@@ -411,3 +411,34 @@ func listOpenStackOctaviaProviders(client *gophercloud.ServiceClient) ([]provide
return providersList, nil
}
}

// Iterate on pool members and check their address against provided list.
// Remove all surplus members, which address doesn't existst on that list.
@MaysaMacedo:

typo: exist. Nit: better to specify what the provided list is.

@gryf (Member, Author):

ack.
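
A reworded doc comment addressing both points (the typo and naming the list) could read, for example:

```go
// purgeOpenStackLbPoolMember iterates over the LB pool members and checks
// each member's address against the provided list of node port addresses.
// All surplus members, i.e. those whose address does not exist on that
// list, are removed from the pool.
```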

@@ -481,6 +483,11 @@ func BootstrapKuryr(conf *operv1.NetworkSpec, kubeClient client.Client) (*bootst
}
}

err := purgeOpenStackLbPoolMember(client, poolId, addresses)
if err != nil {
return nil, errors.Wrap(err, "Failed on purging LB pool.")
@MaysaMacedo:

Nit: LB pool member.

@gryf (Member, Author) commented on May 5, 2020:

Actually, it's an error raised while purging the pool of invalid members ;)

return errors.Wrap(err, "failed to extract LB members list")
}

for _, member := range members {
@MaysaMacedo:

As we have already added the "new members", does it make sense to compare the members and addresses lengths first, and if they are the same, assume the members are already up to date?

@gryf (Member, Author):

Seems tempting, although it won't let us make fewer Octavia calls, right? And the full check ensures that the member addresses are the same as the node addresses (as they should be) :)

@dulek (Contributor):

I support @gryf here, we could miss something like a failover of a master VM. Regardless of whether that's possible or not, a full comparison costs almost nothing. It's just that Go makes it look long and clumsy. ;)
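
To make the trade-off concrete: a pure length check cannot catch a swap where one address replaces another while the pool size stays the same, which is exactly the failover case mentioned above. A sketch of the full comparison as a helper (the name staleMembers is hypothetical; pools.Member is the gophercloud type):

```go
package network

import "github.com/gophercloud/gophercloud/openstack/loadbalancer/v2/pools"

// staleMembers returns the pool members whose address is not on the desired
// node-address list. Note that len(members) == len(addresses) alone would
// wrongly report "up to date" when a failed-over master swapped one address
// for another.
func staleMembers(members []pools.Member, addresses []string) []pools.Member {
	desired := make(map[string]bool, len(addresses))
	for _, a := range addresses {
		desired[a] = true
	}
	var stale []pools.Member
	for _, m := range members {
		if !desired[m.Address] {
			stale = append(stale, m)
		}
	}
	return stale
}
```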

@gryf (Member, Author) commented on May 6, 2020:

/retest

@luis5tb (Contributor) commented on May 7, 2020:

/test e2e-metal-ipi

@luis5tb (Contributor) commented on May 7, 2020:

/lgtm

@openshift-ci-robot added the lgtm label (indicates that a PR is ready to be merged) on May 7, 2020.
@openshift-ci-robot removed the lgtm label on May 7, 2020.
@gryf requested review from @MaysaMacedo and @luis5tb on May 7, 2020, 10:04.
@MaysaMacedo (Contributor) commented:

/lgtm

@openshift-ci-robot added the lgtm label on May 7, 2020.
@dulek (Contributor) left a comment:

Looks good to me, minor feedback in the comments.

@@ -457,9 +457,11 @@ func BootstrapKuryr(conf *operv1.NetworkSpec, kubeClient client.Client) (*bootst
log.Print("Creating OpenShift API loadbalancer pool members")
r, _ := regexp.Compile(fmt.Sprintf("^%s-(master-port-[0-9]+|bootstrap-port)$", clusterID))
portList, err := listOpenStackPortsMatchingPattern(client, tag, r)
addresses := make([]string, 0)
@dulek:

No need for make() in that case; just var addresses []string to create a nil slice should be fine, and append will work with it.
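
A self-contained illustration of the nil-slice idiom being suggested (the addresses are made-up examples):

```go
package main

import "fmt"

func main() {
	var addresses []string // nil slice: no allocation until the first append
	addresses = append(addresses, "10.0.0.5", "10.0.0.7")
	fmt.Println(len(addresses), addresses) // prints: 2 [10.0.0.5 10.0.0.7]
}
```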


for _, address := range addresses {
if address == member.Address {
found = true
break
@dulek:

Instead of using a found flag, you could just use loop labels: mark the members loop (line 429 in the diff) as members_loop and make this a continue members_loop.
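
For reference, the labeled-continue pattern being suggested, in a runnable sketch with made-up addresses (the label name members_loop comes from the comment; in the real code, a match means the member is kept rather than deleted):

```go
package main

import "fmt"

func main() {
	members := []string{"10.0.0.5", "10.0.0.9"}   // current LB pool members
	addresses := []string{"10.0.0.5", "10.0.0.7"} // desired node addresses

members_loop:
	for _, m := range members {
		for _, a := range addresses {
			if a == m {
				// still desired: jump straight to the next member,
				// no found flag needed
				continue members_loop
			}
		}
		// only reached when no address matched this member
		fmt.Println("stale member:", m)
	}
}
```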

@openshift-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dulek, gryf, luis5tb, MaysaMacedo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on May 7, 2020.
@openshift-ci-robot (Contributor) commented on May 7, 2020:

@gryf: The following test failed, say /retest to rerun all failed tests:

Test name: ci/prow/e2e-gcp-ovn
Commit: 0ccf62b
Rerun command: /test e2e-gcp-ovn

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot (Contributor) commented:

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot merged commit 4725755 into openshift:master on May 7, 2020.
@openshift-ci-robot (Contributor) commented:

@gryf: All pull requests linked via external trackers have merged: openshift/cluster-network-operator#612. Bugzilla bug 1829824 has been moved to the MODIFIED state.

In response to this:

Bug 1829824: Remove dead member from LB pool.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • bugzilla/severity-low: Referenced Bugzilla bug's severity is low for the branch this PR is targeting.
  • bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
  • lgtm: Indicates that a PR is ready to be merged.