
✨ IPAM provider for floating ips #1763

Merged

Conversation

bilbobrovall
Contributor

@bilbobrovall bilbobrovall commented Nov 29, 2023

What this PR does / why we need it:
Creates an IPAM provider for OpenStack floating IPs that allocates floating IPs on demand or uses pre-allocated IPs. In combination with #1762, this lets us get floating IPs for worker nodes.

Which issue(s) this PR fixes:
Second part of #1750

Special notes for your reviewer:

  1. Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • if necessary:
    • includes documentation
    • adds unit tests

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 29, 2023

linux-foundation-easycla bot commented Nov 29, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: bilbobrovall / name: Bilbo Brovall (368ff89)


netlify bot commented Nov 29, 2023

Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!

🔨 Latest commit: 368ff89
🔍 Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-cluster-api-openstack/deploys/65a92b737b27230008316d1e
😎 Deploy Preview: https://deploy-preview-1763--kubernetes-sigs-cluster-api-openstack.netlify.app

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Nov 29, 2023
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Nov 29, 2023
@k8s-ci-robot
Contributor

Hi @bilbobrovall. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Nov 29, 2023
Contributor

@mdbooth mdbooth left a comment

This is fantastic work, thanks!

I've just taken a first pass at the pool controller, so I haven't quite gotten all the way through yet. There are a few things to think about here. I'll continue to review here, but if it would be helpful I can also make time for a face-to-face chat.

controllers/openstackfloatingippool_controller.go: 8 resolved review comments (outdated)
@jichenjc
Contributor

jichenjc commented Dec 1, 2023

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 1, 2023
@bilbobrovall
Contributor Author

@mdbooth I've just pushed fixes for most of your comments.
The one thing I'm not very happy with is the delay between an IP getting created in OpenStack and the IPAddress CRD being created, as it opens up some race conditions and is the reason I'm running the controller with MaxConcurrentReconciles: 1.

What I would like to do is something like trying to create the IPAddress before actually getting one from OpenStack, to make sure that only one reconcile tries to get an IP from OpenStack for a claim, and then updating the IPAddress after getting an IP from OpenStack. But I can't do that because IPAddress.Spec is immutable; any ideas? Then again, it might be fine without concurrency.

@huxcrux
Contributor

huxcrux commented Dec 7, 2023

/retest

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 7, 2023
@bilbobrovall
Contributor Author

/retest

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 11, 2023
@mdbooth
Contributor

mdbooth commented Dec 11, 2023

@mdbooth I've just pushed fixes for most of your comments. The one thing I'm not very happy with is the time between an ip getting created in openstack and the IPAddress crd being created as it opens up for some race conditions and is the reason I'm running the controller with MaxConcurrentReconciles: 1.

What I would like to do is something like trying to create the IPAddress before actually getting one from openstack, to make sure that only one reconcile tries to get an IP from openstack for a claim and then updating the IPAddress after getting an ip from openstack but I can't do that because the IPAddress.Spec is immutable, any ideas? But then again it might be fine without concurrency

I don't think we have to worry about this. Even with MaxConcurrentReconciles > 1 controller-runtime will only ever schedule one reconciliation for a particular resource at a time, but it can schedule reconciliations for different resources concurrently. See kubernetes-sigs/controller-runtime#616.
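The per-resource serialization mdbooth describes comes from the workqueue's coalescing behaviour. A minimal, single-threaded sketch of that semantic (simplified: the real client-go workqueue also tracks in-flight keys with a separate "processing" set, which this omits) might look like:

```go
package main

import "fmt"

// queue coalesces repeated Adds of the same key, so a resource that is
// updated several times while queued is still reconciled once. This is
// the property that makes MaxConcurrentReconciles > 1 safe per resource.
type queue struct {
	order []string
	dirty map[string]bool
}

func (q *queue) Add(key string) {
	if q.dirty[key] {
		return // coalesce: this key is already pending
	}
	q.dirty[key] = true
	q.order = append(q.order, key)
}

func (q *queue) Get() (string, bool) {
	if len(q.order) == 0 {
		return "", false
	}
	k := q.order[0]
	q.order = q.order[1:]
	delete(q.dirty, k)
	return k, true
}

func main() {
	q := &queue{dirty: map[string]bool{}}
	q.Add("pool-a")
	q.Add("pool-a") // coalesced with the pending item
	q.Add("pool-b")
	for {
		k, ok := q.Get()
		if !ok {
			break
		}
		fmt.Println(k)
	}
}
```

With this model, "pool-a" is handed out once even though it was added twice, while "pool-a" and "pool-b" can be reconciled by different workers concurrently.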

@bilbobrovall
Contributor Author

@mdbooth I've just pushed fixes for most of your comments. The one thing I'm not very happy with is the time between an ip getting created in openstack and the IPAddress crd being created as it opens up for some race conditions and is the reason I'm running the controller with MaxConcurrentReconciles: 1.
What I would like to do is something like trying to create the IPAddress before actually getting one from openstack, to make sure that only one reconcile tries to get an IP from openstack for a claim and then updating the IPAddress after getting an ip from openstack but I can't do that because the IPAddress.Spec is immutable, any ideas? But then again it might be fine without concurrency

I don't think we have to worry about this. Even with MaxConcurrentReconciles > 1 controller-runtime will only ever schedule one reconciliation for a particular resource at a time, but it can schedule reconciliations for different resources concurrently. See kubernetes-sigs/controller-runtime#616.

👍 I think I got that from the sample cluster-api-ipam-provider-in-cluster, but looking over it again I realized it's because they use a separate controller For IPAddressClaims that creates the IPs. I felt like that would only add extra complexity in our implementation though. And since it looks like we don't need MaxConcurrentReconciles: 1, we can at least process claims from different pools in parallel, while the For IPAddressClaim approach could only process one claim at a time without implementing a per-pool lock.

And about these e2e tests: what rate of false positives should I expect? My current plan was to run /retest a couple of times and, if I don't get a pass, assume I've broken something and investigate, but that feels suboptimal.

@bilbobrovall
Contributor Author

/retest

api/v1alpha7/openstackfloatingippool_types.go: 3 resolved review comments (outdated)
return ctrl.Result{}, client.IgnoreNotFound(err)
}

scope, err := r.ScopeFactory.NewClientScopeFromFloatingIPPool(ctx, r.Client, pool, r.CaCertificates, log)
Contributor

@stephenfin What happened to your client scope patch, btw?

if err := r.Client.List(ctx, ipAddresses, client.InNamespace(pool.Namespace), client.MatchingFields{infrav1.OpenStackFloatingIPPoolNameIndex: pool.Name}); err != nil {
return err
}
pool.Status.ClaimedIPs = []string{}
Contributor

Please can you make() this slice as we know its size in advance?
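The reviewer's suggestion boils down to sizing the slice at allocation time, since the List call already tells us how many IPAddress objects exist. A small sketch (the claimedIPsFrom helper name is illustrative, not from the PR):

```go
package main

import "fmt"

// claimedIPsFrom builds the status slice with its capacity known up
// front: one allocation, no regrowth during the appends. This is what
// make([]string, 0, n) buys over starting from []string{}.
func claimedIPsFrom(ipAddresses []string) []string {
	claimed := make([]string, 0, len(ipAddresses))
	for _, ip := range ipAddresses {
		claimed = append(claimed, ip)
	}
	return claimed
}

func main() {
	got := claimedIPsFrom([]string{"10.0.0.5", "10.0.0.6"})
	fmt.Println(len(got), cap(got)) // 2 2
}
```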

Comment on lines 179 to 181
if err = r.setIPStatuses(ctx, pool); err != nil {
return ctrl.Result{}, err
}
Contributor

This is redundant: any change which could have resulted in a change to IP statuses will result in another reconcile, in which case the first thing we will do is call setIPStatuses(). We don't need to do this twice in the same reconcile.

Comment on lines 163 to 165
if pool.Spec.ReclaimPolicy == infrav1.ReclaimDelete && !contains(pool.Spec.PreAllocatedFloatingIPs, ip) {
controllerutil.AddFinalizer(ipAddress, infrav1.DeleteFloatingIPFinalizer)
}
Contributor

I'd prefer we always added a finalizer. Let's put the cleanup logic explicitly in the delete flow of the controller rather than implicitly in the k8s GC.
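A sketch of that shape, with the reclaim-policy decision living in the delete flow rather than in whether the finalizer was ever added. All names here (deleteFIPFinalizer, ipAddress, reconcileFinalizer) are illustrative stand-ins, not the controller's actual types:

```go
package main

import "fmt"

// Illustrative finalizer name; the real one lives in the infrav1 package.
const deleteFIPFinalizer = "infrastructure.cluster.x-k8s.io/delete-floating-ip"

type ipAddress struct {
	finalizers   []string
	deleting     bool
	preAllocated bool
}

func hasFinalizer(ip *ipAddress) bool {
	for _, f := range ip.finalizers {
		if f == deleteFIPFinalizer {
			return true
		}
	}
	return false
}

// reconcileFinalizer always ensures the finalizer on live objects, and
// reports whether the floating IP should be released in OpenStack when
// the object is being deleted.
func reconcileFinalizer(ip *ipAddress, reclaimDelete bool) bool {
	if !ip.deleting {
		if !hasFinalizer(ip) {
			ip.finalizers = append(ip.finalizers, deleteFIPFinalizer)
		}
		return false
	}
	// Delete flow: release only when the policy says so and the IP was
	// not pre-allocated; either way, drop our finalizer afterwards.
	release := reclaimDelete && !ip.preAllocated
	kept := ip.finalizers[:0]
	for _, f := range ip.finalizers {
		if f != deleteFIPFinalizer {
			kept = append(kept, f)
		}
	}
	ip.finalizers = kept
	return release
}

func main() {
	ip := &ipAddress{}
	reconcileFinalizer(ip, true) // always adds the finalizer
	ip.deleting = true
	fmt.Println(reconcileFinalizer(ip, true), len(ip.finalizers)) // true 0
}
```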

controllers/openstackfloatingippool_controller.go: 1 resolved review comment (outdated)
if err := r.setIPStatuses(ctx, pool); err != nil {
return ctrl.Result{}, err
}
Contributor

Observation: setIPStatuses() is already fetching and iterating over a full list of IPAddress objects in this pool. We could probably move the IPAddress controller into this controller.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 15, 2023
Comment on lines 70 to 80
// +kubebuilder:default={}
// +optional
ClaimedIPs []string `json:"claimedIPs"`

// +kubebuilder:default={}
// +optional
AvailableIPs []string `json:"availableIPs"`

// +kubebuilder:default={}
// +optional
IPs []string `json:"ips"`
Contributor

I'm not a fan of keeping track of IPs in 3 separate lists.

pkg/cloud/services/networking/floatingip.go: 1 resolved review comment (outdated)

if !ipAddress.ObjectMeta.DeletionTimestamp.IsZero() {
// If the IPAddress has more than one finalizer, it has not been released by machine yet and we should not delete it.
if controllerutil.ContainsFinalizer(ipAddress, infrav1.DeleteFloatingIPFinalizer) && len(ipAddress.GetFinalizers()) == 1 {
Contributor

That's a bit of a bold assumption to me; we could expect users to add finalizers too. Maybe it'd be best to look up the specific finalizer that we expect to be gone before proceeding instead?

Contributor Author

I'm not 100% sure how that use case would look, and it might only be the comment that needs to be changed. My reasoning was that if there's another finalizer on the address, something is probably using it or wants to keep it and does not want it deleted from OpenStack, so waiting for the DeleteFloatingIPFinalizer to be the last finalizer left feels reasonable.
As we are trying to keep the infra provider and IPAM provider as separate as possible and follow the API spec, I feel like expecting a finalizer that's not managed by the IPAM provider would counteract those goals.

Contributor

Okay, I see now, cool!

if !ipAddress.ObjectMeta.DeletionTimestamp.IsZero() {
// If the IPAddress has more than one finalizer, it has not been released by machine yet and we should not delete it.
if controllerutil.ContainsFinalizer(ipAddress, infrav1.DeleteFloatingIPFinalizer) && len(ipAddress.GetFinalizers()) == 1 {
if err = networkingService.DeleteFloatingIP(pool, ipAddress.Spec.Address); err != nil {
Contributor

Shouldn't we care for retention policy here?

Contributor Author

Relates to https://github.com/kubernetes-sigs/cluster-api-provider-openstack/pull/1763/files#r1424084068 I'll move the conditional logic from the pool controller to here 👍

// ip may be empty, meaning we want to allocate a new one.
// If ip is not empty, it came from availableIPs but does not exist in OpenStack;
// we try to allocate it, and if we fail we mark it as failed and skip it next time.
fp, err := networkingService.CreateFloatingIPForPool(pool, ip)
Contributor

This will only work with OpenStack admin credentials. I don't see CreateFloatingIPForPool called with "" as the second parameter. This feels wrong, I think we want this to be able to reserve a new IP automatically, even if it's Neutron choosing the address.

Comment on lines +217 to +261
func union(a []string, b []string) []string {
m := make(map[string]struct{})
for _, item := range a {
m[item] = struct{}{}
}
for _, item := range b {
m[item] = struct{}{}
}
result := make([]string, 0, len(m))
for item := range m {
result = append(result, item)
}
return result
}

func diff(a []string, b []string) []string {
m := make(map[string]struct{})
for _, item := range a {
m[item] = struct{}{}
}
for _, item := range b {
delete(m, item)
}
result := make([]string, 0, len(m))
for item := range m {
result = append(result, item)
}
return result
}
Contributor

I wonder if there's a potential to use this: https://github.com/kubernetes/utils/blob/master/set/set.go

Contributor Author

We could, but we would need to bump k8s.io/utils, and k8s.io/utils/pointer is deprecated in favor of k8s.io/utils/ptr, so that would involve some refactoring and should probably be done in a separate PR.

Contributor

Agreed!
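For reference, the hand-rolled union()/diff() helpers above reduce to plain set operations. This stdlib-only sketch mirrors the semantics under discussion (the real k8s.io/utils/set package provides a generic Set with Union, Difference, and list accessors; the types and method names below are a local imitation, not that package's API):

```go
package main

import (
	"fmt"
	"sort"
)

// Set is a minimal string set, standing in for a generic set type.
type Set map[string]struct{}

func New(items ...string) Set {
	s := Set{}
	for _, i := range items {
		s[i] = struct{}{}
	}
	return s
}

// Union returns every element present in either set.
func (s Set) Union(other Set) Set {
	out := New()
	for k := range s {
		out[k] = struct{}{}
	}
	for k := range other {
		out[k] = struct{}{}
	}
	return out
}

// Difference returns elements of s that are not in other.
func (s Set) Difference(other Set) Set {
	out := New()
	for k := range s {
		if _, ok := other[k]; !ok {
			out[k] = struct{}{}
		}
	}
	return out
}

// SortedList returns the elements in a deterministic order.
func (s Set) SortedList() []string {
	out := make([]string, 0, len(s))
	for k := range s {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

func main() {
	a := New("10.0.0.1", "10.0.0.2")
	b := New("10.0.0.2", "10.0.0.3")
	fmt.Println(a.Union(b).SortedList())      // [10.0.0.1 10.0.0.2 10.0.0.3]
	fmt.Println(a.Difference(b).SortedList()) // [10.0.0.1]
}
```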

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 4, 2024
@bilbobrovall
Contributor Author

I've rebuilt quite a bit now, @mdbooth and @dulek, and I think everything should be fixed.

The big things that need review are probably that I removed the IPAddress controller and combined its logic with the setIPStatuses method into a new reconcileIPAddresses in the pool controller, and the logic for preventing orphaned IPs. I tested it a couple of times, making both the tagging and the creation of the IPAddress object fail 80% of the time, without getting any orphaned addresses. It's of course still possible if the controller were to crash right after the FIP was created, or if both the OpenStack and Kubernetes APIs went down in the same window. It looks like the only way around that would be using a temporary port, and that feels a bit over-engineered IMO.

Contributor

@mdbooth mdbooth left a comment

I know you need to move forward with this, and there are also a number of other folks (myself included) really looking forward to it. I suspect it may still need a few rounds to stabilise, but I don't want to hold it up.

My suggestion is that we move the pool API to v1alpha1 so we don't have to try to hold it to the same stability guarantees we're urgently trying to get the rest of the API to. If it helps, I already did this in this branch: https://github.com/shiftstack/cluster-api-provider-openstack/tree/openstack_floating_ip_pool. Feel free to cherry pick the top commit. It's purely mechanical, so there's no need to co-author me when you squash commits before merging.

There's one more commit in there which I've submitted separately as #1825. You'll need that so that make generate doesn't panic, but I'd just rebase on top of it.

Once we've done that I'm inclined to merge this fairly quickly. We've declared the API as subject to change, and the codebase is fairly independent. My preference is to merge it and iterate on it.

@lentzi90 @jichenjc @dulek thoughts?

}

claims := &ipamv1.IPAddressClaimList{}
if err := r.Client.List(context.Background(), claims, client.InNamespace(req.Namespace), client.MatchingFields{infrav1.OpenStackFloatingIPPoolNameIndex: pool.Name}); err != nil {
Contributor

Suggested change
if err := r.Client.List(context.Background(), claims, client.InNamespace(req.Namespace), client.MatchingFields{infrav1.OpenStackFloatingIPPoolNameIndex: pool.Name}); err != nil {
if err := r.Client.List(ctx, claims, client.InNamespace(req.Namespace), client.MatchingFields{infrav1.OpenStackFloatingIPPoolNameIndex: pool.Name}); err != nil {

Comment on lines 313 to 330
// Get tagged floating IPs and add them to the available IPs if they are not present in either the available IPs or the claimed IPs.
// This is done to prevent leaking floating IPs if the floating IP was created but the IPAddress object was not.
taggedIPs, err := networkingService.GetFloatingIPsByTag(pool.GetFloatingIPTag())
if err != nil {
scope.Logger().Error(err, "Failed to get floating IPs by tag", "pool", pool.Name)
return "", err
}
for _, taggedIP := range taggedIPs {
if contains(pool.Status.AvailableIPs, taggedIP.FloatingIP) || contains(pool.Status.ClaimedIPs, taggedIP.FloatingIP) {
continue
}
scope.Logger().Info("Tagged floating IP found that was not known to the pool, adding it to the pool", "ip", taggedIP.FloatingIP)
pool.Status.AvailableIPs = append(pool.Status.AvailableIPs, taggedIP.FloatingIP)
}
Contributor

We probably don't want to do this unconditionally in the IPAddressClaim loop for every unallocated IPAddressClaim. Suggest we could do it only if len(AvailableIPs) == 0?
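That suggestion, only paying for the OpenStack round-trip when the pool has actually run dry, could look roughly like this. The helper name topUpFromTaggedIPs and the flat string-slice signature are hypothetical simplifications of the controller's real types:

```go
package main

import "fmt"

// topUpFromTaggedIPs folds tagged-but-unknown floating IPs back into the
// available list, but only when available is empty, so the scan does not
// run on every unallocated IPAddressClaim.
func topUpFromTaggedIPs(available, claimed, tagged []string) []string {
	if len(available) > 0 {
		return available // pool not exhausted: skip the OpenStack lookup
	}
	known := map[string]bool{}
	for _, ip := range claimed {
		known[ip] = true
	}
	for _, ip := range tagged {
		if !known[ip] {
			available = append(available, ip)
		}
	}
	return available
}

func main() {
	// One tagged IP is already claimed; only the unknown one is recovered.
	fmt.Println(topUpFromTaggedIPs(nil, []string{"10.0.0.1"}, []string{"10.0.0.1", "10.0.0.2"}))
}
```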

}

// Retry creating the IPAddress object
err = wait.ExponentialBackoff(backoff, func() (bool, error) {
Contributor

This should be cancellable:

Suggested change
err = wait.ExponentialBackoff(backoff, func() (bool, error) {
err = wait.ExponentialBackoffWithContext(ctx, backoff, func() (bool, error) {

defer func() {
tag := pool.GetFloatingIPTag()

err := wait.ExponentialBackoff(backoff, func() (bool, error) {
Contributor

As above, please use ExponentialBackoffWithContext so this is cancellable.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 15, 2024
@dulek
Contributor

dulek commented Jan 15, 2024

I can agree with Matt here; this doesn't need to be 100% perfect. I don't understand what v1alpha1 changes, though.

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 15, 2024
Contributor

@lentzi90 lentzi90 left a comment

Nice!
This looks quite good already and I agree with @mdbooth that we can probably take it in already under the v1alpha1 API version. Found a few nits though

PROJECT (outdated)
@@ -50,3 +53,7 @@ resources:
- group: infrastructure
kind: OpenStackClusterTemplate
version: v1alpha8
- group: infrastructure
kind: OpenStackFloatingIPPool
version: v1alpha8
Contributor

This is not updated


// OpenStackFloatingIPPoolSpec defines the desired state of OpenStackFloatingIPPool.
type OpenStackFloatingIPPoolSpec struct {
PreAllocatedFloatingIPs []string `json:"preAllocatedFloatingIPs,omitempty"`
Contributor

Could you add a doc string for this as well? I know it is fairly self-explanatory, but it is still nice to have descriptions for all fields.

Contributor

@dulek dulek Jan 17, 2024

Little formatting fixes, plus I think this should have +optional:

	// PreAllocatedFloatingIPs is a list of floating IPs precreated in OpenStack that should be used by this pool.
	// These are used before allocating new ones and are not deleted from OpenStack when the pool is deleted.

Contributor

@dulek FYI I've been recommending against +optional because it's already the default. I don't have a strong position on it, though. Is there specific guidance somewhere that I'm ignoring?

Contributor

Okay, so just the formatting fixes! :)

controllers/openstackfloatingippool_controller.go: 1 resolved review comment (outdated)
@dulek
Contributor

dulek commented Jan 17, 2024

This looks pretty good to me!

@mdbooth
Contributor

mdbooth commented Jan 18, 2024

Once the last remaining review feedback is addressed (is it just the docstring?) please can you squash these commits and I'll approve it.

@bilbobrovall
Contributor Author

Once the last remaining review feedback is addressed (is it just the docstring?) please can you squash these commits and I'll approve it.

Great, fixed and squashed 👍. I'll be looking into rebasing #1762 when this is merged. How should I think about the API version on that one? I guess I should just move it to v1alpha8?

@bilbobrovall
Contributor Author

/retest

@mdbooth
Contributor

mdbooth commented Jan 18, 2024

/retest-required

@mdbooth
Contributor

mdbooth commented Jan 18, 2024

/approve

I'll leave lgtm to somebody else.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bilbobrovall, mdbooth

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 18, 2024
Contributor

@lentzi90 lentzi90 left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 19, 2024
@lentzi90
Contributor

/test pull-cluster-api-provider-openstack-e2e-full-test

@bilbobrovall
Contributor Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 19, 2024
@k8s-ci-robot k8s-ci-robot merged commit 05571b9 into kubernetes-sigs:main Jan 19, 2024
10 checks passed
@bilbobrovall bilbobrovall deleted the openstack_floating_ip_pool branch January 22, 2024 01:08