openstack: Don't Delete LB in Case of Security Group Reconciliation Errors #82264

Merged

Conversation

multi-io
Contributor

@multi-io multi-io commented Sep 3, 2019

What type of PR is this?

/kind bug

What this PR does / why we need it:

This fixes the legacy OpenStack cloud provider's LBaaS control loop so that the EnsureLoadBalancer() function no longer deletes the LB if something goes wrong while reconciling the LB's security groups. With the current master, suppose you have an LB service and its associated LB already up and running fine. If, during a reconcile loop (which shouldn't change anything), the OpenStack API goes down temporarily at the wrong moment (i.e. it's still up during the LB and listener reconciliation, but down during the SG reconciliation), then the whole LB is deleted. We saw this exact thing happen to a real-world customer application, which went offline because of it (the LB is recreated shortly after, but likely with a different floating IP).

Deleting the LB in case of errors in a "reconcile" (rather than "create") function seems just wrong, and all the other parts of EnsureLoadBalancer() don't do it either: E.g. if a transient error occurs when creating a listener, we just return it and leave the LB in a half-created state (

if err != nil {
	// Unknown error, retry later
	return nil, fmt.Errorf("error creating LB listener: %v", err)
}
), and the service controller will catch that error and re-queue the work item (
runtime.HandleError(fmt.Errorf("error processing service %v (will retry): %v", key, err))
s.queue.AddRateLimited(key)
) so the LB creation will go through eventually.

This PR just fixes the SG reconciliation to follow the same pattern. It seems to me that the current "delete LB in case of an error" approach was originally part of a "create" function rather than a "reconcile" function, where it would've made more sense.

The same bug is present in the new out-of-tree openstack cloud provider; I've submitted a corresponding PR there (kubernetes/cloud-provider-openstack#743). We'd still like to fix this error in-tree as well and also have the fix backported to 1.15 and 1.14 (please?) because our migration to cloud controller manager is still in the early planning stages and will take more time.

Which issue(s) this PR fixes:

Fixes #35056

Release note:

Openstack: Do not delete managed LB in case of security group reconciliation errors

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Sep 3, 2019
@k8s-ci-robot
Contributor

Hi @multi-io. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Sep 3, 2019
@k8s-ci-robot k8s-ci-robot added area/cloudprovider sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 3, 2019
@multi-io multi-io force-pushed the openstack_dont_delete_lb_on_errors branch from eeee973 to 7a3f15a Compare September 3, 2019 12:16
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Sep 4, 2019
@alvaroaleman
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 4, 2019
@alvaroaleman
Member

/assign @dims

@bashofmann

@dims Is there anything that blocks merging this? In the out-of-tree openstack cloud provider the fix has already been merged: kubernetes/cloud-provider-openstack#743

@dims
Member

dims commented Oct 10, 2019

@multi-io Ack!

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 10, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dims, multi-io

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 10, 2019
@k8s-ci-robot k8s-ci-robot merged commit 8ad7e78 into kubernetes:master Oct 10, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.17 milestone Oct 10, 2019
ohsewon pushed a commit to ohsewon/kubernetes that referenced this pull request Oct 16, 2019
…ete_lb_on_errors

openstack: Don't Delete LB in Case of Security Group Reconciliation Errors
k8s-ci-robot added a commit that referenced this pull request Feb 8, 2020
…2264-upstream-release-1.16

Automated cherry pick of #82264: openstack: do not delete LB in case of security group
@justaugustus
Copy link
Member

/kind bug
/priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 8, 2020
k8s-ci-robot added a commit that referenced this pull request Feb 8, 2020
…2264-upstream-release-1.15

Automated cherry pick of #82264: openstack: do not delete LB in case of security group
Development

Successfully merging this pull request may close these issues.

OpenStack LBaaS V2 Deletes Load Balancers if "ManageSecurityGroups" is true.
6 participants