OpenStack LBaaS fix: must use ID, not name, of the node security group #65373

Merged

Conversation

multi-io
Contributor

@multi-io multi-io commented Jun 22, 2018

This is a bugfix for the OpenStack LBaaS cloud provider security group management.

A bit of context: When creating a load balancer for a given type: LoadBalancer service, the provider will try to:

(see pkg/cloudprovider/providers/openstack/openstack_loadbalancer.go/EnsureLoadBalancer)

  1. create a load balancer (LB) in OpenStack with listeners corresponding to the service's ports
  2. attach a floating IP to the LB's network port

If manage-security-groups is enabled in controller-manager's cloud.conf:

  3. create a security group with ingress rules corresponding to the LB's listeners, and attach it to the LB's network port
  4. for all nodes of the cluster, pick an existing security group for the nodes ("node security group") and add ingress rules to it exposing the service's NodePorts to the security group created in step 3.

In the current upstream master, steps 1 through 3 work fine, but step 4 fails, leaving the service inaccessible via the LB without further manual intervention.
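To make steps 3 and 4 concrete, here is a minimal sketch of the two rule sets involved. The types `servicePort` and `ingressRule` and both functions are hypothetical and written for illustration only; they are not the provider's or gophercloud's types, and source-range handling is left out.

```go
package main

import "fmt"

// servicePort is a hypothetical, trimmed-down view of a Service port.
type servicePort struct {
	Protocol string // e.g. "tcp"
	Port     int    // port exposed by the LB listener
	NodePort int    // port opened on every node
}

// ingressRule is a hypothetical security group rule used only in this sketch.
type ingressRule struct {
	SecGroupID    string // ID of the group the rule is added to
	RemoteGroupID string // ID of the group allowed as source ("" = not restricted by group)
	Protocol      string
	Port          int
}

// lbGroupRules sketches step 3: one ingress rule per listener port
// on the LB's own security group.
func lbGroupRules(lbGroupID string, ports []servicePort) []ingressRule {
	rules := make([]ingressRule, 0, len(ports))
	for _, p := range ports {
		rules = append(rules, ingressRule{SecGroupID: lbGroupID, Protocol: p.Protocol, Port: p.Port})
	}
	return rules
}

// nodeGroupRules sketches step 4: one ingress rule per NodePort on the
// node security group, allowing traffic from the LB's security group.
// Both arguments must be IDs, which is where the bug described here bites.
func nodeGroupRules(nodeGroupID, lbGroupID string, ports []servicePort) []ingressRule {
	rules := make([]ingressRule, 0, len(ports))
	for _, p := range ports {
		rules = append(rules, ingressRule{
			SecGroupID:    nodeGroupID,
			RemoteGroupID: lbGroupID,
			Protocol:      p.Protocol,
			Port:          p.NodePort,
		})
	}
	return rules
}

func main() {
	ports := []servicePort{{Protocol: "tcp", Port: 80, NodePort: 30080}}
	for _, r := range lbGroupRules("lb-sg-id", ports) {
		fmt.Printf("step 3: %+v\n", r)
	}
	for _, r := range nodeGroupRules("node-sg-id", "lb-sg-id", ports) {
		fmt.Printf("step 4: %+v\n", r)
	}
}
```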

The bug is in the "pick an existing security group" operation (func getNodeSecurityGroupIDForLB), which, contrary to its name, returns the security group's name rather than its ID. (Strictly, it returns a list of names rather than IDs, apparently to cover corner cases where there is more than one node security group.) These names are then used when adding the ingress rules to the group, and the OpenStack API rejects the request with a 404 (at least on our fairly standard OpenStack Ocata installation) because it is given a name where it expects an ID.

The PR adds a "get ID given a name" lookup to the getNodeSecurityGroupIDForLB function, so it actually returns IDs. That's it. I'm not sure whether the upstream code was never really tested, or whether other people use other OpenStack deployments with more lenient APIs. Both the bug and the fix are reliably reproducible on our installation.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

Fixes #58145

Special notes for your reviewer:

Should we turn getNodeSecurityGroupIDForLB into a method with the lbaas as its receiver because it now requires two of the lbaas's attributes? I'm not sure what the conventions are here, if any.

Release note:

Properly manage security groups for loadbalancer services on OpenStack.

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 22, 2018
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 22, 2018
@k8s-ci-robot
Contributor

Hi @multi-io. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@@ -570,7 +570,12 @@ func getNodeSecurityGroupIDForLB(compute *gophercloud.ServiceClient, nodes []*v1
 // case 2: node1:SG1,SG2 node2:SG3,SG4 return SG1,SG3
 // case 3: node1:SG1,SG2 node2:SG2,SG3 return SG1,SG2
 securityGroupName := srv.SecurityGroups[0]["name"]
-nodeSecurityGroupIDs.Insert(securityGroupName.(string))
+secGroupID, err := groups.IDFromName(network, securityGroupName.(string))
Member


groups.IDFromName does an API lookup. Given that almost all uses of this loop are going to be performing many lookups for a very small (almost always 1) number of names, it would be nice to do this lookup after de-duplication...

iow: something like:

Assuming s/nodeSecurityGroupIDs/sgNames/g

for _, node := range nodes {
   ...
   sgNames.Insert(securityGroupName.(string))
}

ids := make([]string, sgNames.Len())
for i, name := range sgNames.List() {
    var err error
    ids[i], err = groups.IDFromName(network, name)
    if err != nil {
        return []string{}, err
    }
}

return ids, nil

Contributor Author


Yep, thanks. Fix applied. (force push -- is that OK?)

What do you think about turning the function into a method?

@multi-io multi-io force-pushed the openstack_lbaas_node_secgroup_fix branch from e9b08b6 to 8ed735d Compare June 25, 2018 21:55
Member

@anguslees anguslees left a comment


/lgtm

(and yes, force pushing is fine; it's the GitHub way. For long/controversial PRs it's sometimes nice to build up incremental commits to make the review easier, and then squash the whole lot after you get a "verbal" lgtm.)

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 26, 2018
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: anguslees, multi-io

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 26, 2018
@anguslees
Member

/cc @hogepodge - this looks like an important gap in our testing (Service type=LoadBalancer + lbaas without explicit node security group config), if you're looking for a motivational example...

@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 65449, 65373, 49410). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 3d69499 into kubernetes:master Jun 26, 2018
@anguslees
Member

If the previous code was broken, then it was broken in 1.9-1.11, and we should backport this to (at least) 1.11 imo. I hesitate to suggest this code has been non-functional for that long however, so more testing reports from people with/without this patch would be useful to help inform that backport discussion.

(tagging @dims to track backporting - if required)

@bashofmann

We can verify this issue with 1.9.7 as well as 1.10.3.

@dims
Member

dims commented Jun 28, 2018

Thanks @bashofmann @anguslees. I've filed backports.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Jul 3, 2018
k8s-github-robot pushed a commit that referenced this pull request Jul 4, 2018
…pstream-release-1.10

Automatic merge from submit-queue.

Automated cherry pick of #65373: BUGFIX: must use ID, not name, of the node security group

Cherry pick of #65373 on release-1.10.

#65373: BUGFIX: must use ID, not name, of the node security group
k8s-github-robot pushed a commit that referenced this pull request Jul 10, 2018
…pstream-release-1.11

Automatic merge from submit-queue.

Automated cherry pick of #65373: BUGFIX: must use ID, not name, of the node security group

Cherry pick of #65373 on release-1.11.

#65373: BUGFIX: must use ID, not name, of the node security group
k8s-github-robot pushed a commit that referenced this pull request Jul 24, 2018
…pstream-release-1.9

Automatic merge from submit-queue.

Automated cherry pick of #65373: BUGFIX: must use ID, not name, of the node security group

Cherry pick of #65373 on release-1.9.

#65373: BUGFIX: must use ID, not name, of the node security group
bashofmann added a commit to kubermatic/kubermatic that referenced this pull request Aug 8, 2018
…onfig for Kubernetes versions that support it

The issue that this setting is not working was fixed with kubernetes/kubernetes#65373 and
is available in versions >=1.9.10, 1.10.6, 1.11.1

The setting should not be applied to older Kubernetes versions since it breaks creating LoadBalancers completely.
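A version gate along these lines could look like the sketch below. The helper `supportsManageSecurityGroups` and the table `firstFixedPatch` are hypothetical and not taken from the kubermatic code. The patch thresholds come from the versions listed above; treating 1.12 and later as fixed (the fix was merged to master) and anything older than 1.9 as unsupported are assumptions. Pre-release suffixes are not handled and produce an error.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// firstFixedPatch maps a 1.x minor version to the first patch release
// that contains the fix, per the versions listed in the commit message.
var firstFixedPatch = map[int]int{9: 10, 10: 6, 11: 1}

// supportsManageSecurityGroups is a hypothetical helper that reports
// whether manage-security-groups can safely be enabled for a version
// given as "major.minor.patch" (an optional leading "v" is accepted).
func supportsManageSecurityGroups(version string) (bool, error) {
	parts := strings.Split(strings.TrimPrefix(version, "v"), ".")
	if len(parts) != 3 {
		return false, fmt.Errorf("expected major.minor.patch, got %q", version)
	}
	nums := make([]int, 3)
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return false, fmt.Errorf("invalid version %q: %v", version, err)
		}
		nums[i] = n
	}
	major, minor, patch := nums[0], nums[1], nums[2]
	if major != 1 {
		return false, fmt.Errorf("unsupported major version in %q", version)
	}
	if minPatch, ok := firstFixedPatch[minor]; ok {
		return patch >= minPatch, nil
	}
	// Assumption: minors after 1.11 contain the fix because it was
	// merged to master; minors before 1.9 are treated as unsupported.
	return minor > 11, nil
}

func main() {
	for _, v := range []string{"1.9.9", "1.9.10", "1.10.6", "1.11.0", "1.11.1"} {
		ok, err := supportsManageSecurityGroups(v)
		fmt.Println(v, ok, err)
	}
}
```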

Fixes #1717
bashofmann added a commit to kubermatic/kubermatic that referenced this pull request Aug 9, 2018
* Sets OpenStack LoadBalancer manage-security-groups setting in cloud-config for Kubernetes versions that support it

The issue that this setting is not working was fixed with kubernetes/kubernetes#65373 and
is available in versions >=1.9.10, 1.10.6, 1.11.1

The setting should not be applied to older Kubernetes versions since it breaks creating LoadBalancers completely.

Fixes #1717

* Update test fixtures that were broken because of rebase
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

kubelet fails when cloudprovider openstack is used and should manage loadbalancer security group
6 participants