✨ IPAM provider for floating ips #1763
Conversation
Hi @bilbobrovall. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This is fantastic work, thanks!
I've just taken a first pass at the pool controller, so I haven't quite gotten all the way through yet. There are a few things to think about here. I'll continue to review here, but if it would be helpful I can also make time for a face-to-face chat.
/ok-to-test
@mdbooth I've just pushed fixes for most of your comments. What I would like to do is something like trying to create the IPAddress before actually getting one from OpenStack, to make sure that only one reconcile tries to get an IP from OpenStack for a claim, and then updating the IPAddress after getting an IP from OpenStack. But I can't do that because the …
/retest
/retest
I don't think we have to worry about this. Even with …
👍 I think I got that from the sample cluster-api-ipam-provider-in-cluster, but looking over it again I realized it's because they use a separate controller.
And about these e2e tests, what rate of false positives should I expect? My current plan was running /retest a couple of times and, if I don't get a pass, assuming I've broken something and investigating, but that feels suboptimal.
/retest
```go
	return ctrl.Result{}, client.IgnoreNotFound(err)
}

scope, err := r.ScopeFactory.NewClientScopeFromFloatingIPPool(ctx, r.Client, pool, r.CaCertificates, log)
```
@stephenfin What happened to your client scope patch, btw?
```go
if err := r.Client.List(ctx, ipAddresses, client.InNamespace(pool.Namespace), client.MatchingFields{infrav1.OpenStackFloatingIPPoolNameIndex: pool.Name}); err != nil {
	return err
}
pool.Status.ClaimedIPs = []string{}
```
Please can you `make()` this slice, as we know its size in advance?
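A minimal sketch of the suggested change, reusing names from the diff above (using `ipAddresses.Items` as the capacity source is an assumption):

```go
// Allocate the slice with its final capacity up front instead of growing
// from empty; ipAddresses was listed just above, so its length is known.
pool.Status.ClaimedIPs = make([]string, 0, len(ipAddresses.Items))
```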
```go
if err = r.setIPStatuses(ctx, pool); err != nil {
	return ctrl.Result{}, err
}
```
This is redundant: any change which could have resulted in a change to IP statuses will result in another reconcile, in which case the first thing we will do is call setIPStatuses(). We don't need to do this twice in the same reconcile.
```go
if pool.Spec.ReclaimPolicy == infrav1.ReclaimDelete && !contains(pool.Spec.PreAllocatedFloatingIPs, ip) {
	controllerutil.AddFinalizer(ipAddress, infrav1.DeleteFloatingIPFinalizer)
}
```
I'd prefer we always added a finalizer. Let's put the cleanup logic explicitly in the delete flow of the controller rather than implicitly in the k8s GC.
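A rough sketch of that shape, using names from the diff (the exact placement in the delete flow is an assumption, not code from this PR):

```go
// Always add the finalizer when the IPAddress is created, regardless of policy.
controllerutil.AddFinalizer(ipAddress, infrav1.DeleteFloatingIPFinalizer)

// Later, in the controller's delete flow, decide explicitly whether the
// OpenStack resource should be released before removing the finalizer.
if !ipAddress.ObjectMeta.DeletionTimestamp.IsZero() {
	if pool.Spec.ReclaimPolicy == infrav1.ReclaimDelete && !contains(pool.Spec.PreAllocatedFloatingIPs, ipAddress.Spec.Address) {
		if err := networkingService.DeleteFloatingIP(pool, ipAddress.Spec.Address); err != nil {
			return ctrl.Result{}, err
		}
	}
	controllerutil.RemoveFinalizer(ipAddress, infrav1.DeleteFloatingIPFinalizer)
}
```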
```go
if err := r.setIPStatuses(ctx, pool); err != nil {
	return ctrl.Result{}, err
}
```
Observation: setIPStatuses() is already fetching and iterating over a full list of IPAddress objects in this pool. We could probably move the IPAddress controller into this controller.
```go
// +kubebuilder:default={}
// +optional
ClaimedIPs []string `json:"claimedIPs"`

// +kubebuilder:default={}
// +optional
AvailableIPs []string `json:"availableIPs"`

// +kubebuilder:default={}
// +optional
IPs []string `json:"ips"`
```
I'm not a fan of keeping track of IPs in 3 separate lists.
controllers/ipaddress_controller.go
```go
if !ipAddress.ObjectMeta.DeletionTimestamp.IsZero() {
	// If the IPAddress has more than one finalizer, it has not been released by the machine yet and we should not delete it.
	if controllerutil.ContainsFinalizer(ipAddress, infrav1.DeleteFloatingIPFinalizer) && len(ipAddress.GetFinalizers()) == 1 {
```
That's a bit of a bold assumption to me; we could expect users to add finalizers too. Maybe it'd be best to look up the finalizer that we expect to be gone before proceeding instead?
I'm not 100% sure how that use case would look, and it might only be the comment that needs to be changed. My reasoning was that if there's another finalizer on the address, it's probably being used, or something wants to keep it and does not want it deleted from OpenStack, so waiting for the DeleteFloatingIPFinalizer to be the last finalizer left feels reasonable.
As we are trying to keep the infra provider and IPAM provider as separate as possible and follow the API spec, I feel like expecting a finalizer that's not managed by the IPAM provider would counteract those things.
Okay, I see now, cool!
controllers/ipaddress_controller.go
```go
if !ipAddress.ObjectMeta.DeletionTimestamp.IsZero() {
	// If the IPAddress has more than one finalizer, it has not been released by the machine yet and we should not delete it.
	if controllerutil.ContainsFinalizer(ipAddress, infrav1.DeleteFloatingIPFinalizer) && len(ipAddress.GetFinalizers()) == 1 {
		if err = networkingService.DeleteFloatingIP(pool, ipAddress.Spec.Address); err != nil {
```
Shouldn't we care about the retention policy here?
Relates to https://github.com/kubernetes-sigs/cluster-api-provider-openstack/pull/1763/files#r1424084068. I'll move the conditional logic from the pool controller to here 👍
```go
// ip may be empty, meaning we want to get a new floating IP.
// If ip is not empty, it is an IP from availableIPs that does not exist in OpenStack;
// we try to allocate it, and if we fail we mark it as failed and skip it next time.
fp, err := networkingService.CreateFloatingIPForPool(pool, ip)
```
This will only work with OpenStack admin credentials. I don't see `CreateFloatingIPForPool` called with `""` as the second parameter. This feels wrong: I think we want this to be able to reserve a new IP automatically, even if it's Neutron choosing the address.
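A sketch of what that might look like, assuming the helper treats an empty string as "let Neutron choose" (that sentinel is not confirmed by this diff):

```go
// With no concrete address requested, Neutron allocates a free IP itself,
// which avoids the admin-only path of specifying an exact floating IP.
fp, err := networkingService.CreateFloatingIPForPool(pool, "")
if err != nil {
	return ctrl.Result{}, err
}
```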
```go
func union(a []string, b []string) []string {
	m := make(map[string]struct{})
	for _, item := range a {
		m[item] = struct{}{}
	}
	for _, item := range b {
		m[item] = struct{}{}
	}
	result := make([]string, 0, len(m))
	for item := range m {
		result = append(result, item)
	}
	return result
}

func diff(a []string, b []string) []string {
	m := make(map[string]struct{})
	for _, item := range a {
		m[item] = struct{}{}
	}
	for _, item := range b {
		delete(m, item)
	}
	result := make([]string, 0, len(m))
	for item := range m {
		result = append(result, item)
	}
	return result
}
```
I wonder if there's a potential to use this: https://github.com/kubernetes/utils/blob/master/set/set.go
We could, but we would need to bump k8s.io/utils, and since k8s.io/utils/pointer is deprecated in favor of k8s.io/utils/ptr, that would involve some refactoring and should probably be done in a separate PR.
Agreed!
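For context, a self-contained sketch of how the two helpers could collapse onto that package once k8s.io/utils is bumped (the reviewers' suggestion, not code from this PR):

```go
package main

import (
	"fmt"

	"k8s.io/utils/set"
)

func main() {
	a := []string{"10.0.0.1", "10.0.0.2"}
	b := []string{"10.0.0.2", "10.0.0.3"}

	// union(a, b): membership in either slice, duplicates removed.
	fmt.Println(set.New(a...).Union(set.New(b...)).UnsortedList())

	// diff(a, b): items of a that are not in b.
	fmt.Println(set.New(a...).Difference(set.New(b...)).UnsortedList())
}
```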
I've rebuilt quite a bit now, @mdbooth and @dulek, and I think everything should be fixed. The big things that need review are probably that I removed the ipaddress controller and combined its logic with the setIPStatuses method into a new reconcileIPAddresses in the pool controller, and the logic for preventing orphaned IPs. I tested it a couple of times, making both the tagging and the creation of the IPAddress object fail 80% of the time, without getting any orphaned addresses. It's of course still possible if the controller crashed right after the FIP was created, or if both the OpenStack and Kubernetes APIs went down in the same window. It looks like the only way around that would be using a temporary port, and that feels a bit over-engineered imo.
I know you need to move forward with this and there are also a number of other folks (myself included) really looking forward to it. I suspect it may still need a few rounds to stabilise, but I don't want to hold it up.
My suggestion is that we move the pool API to v1alpha1 so we don't have to try to hold it to the same stability guarantees we're urgently trying to get the rest of the API to. If it helps, I already did this in this branch: https://github.com/shiftstack/cluster-api-provider-openstack/tree/openstack_floating_ip_pool. Feel free to cherry-pick the top commit. It's purely mechanical, so there's no need to co-author me when you squash commits before merging.
There's one more commit in there which I've submitted separately as #1825. You'll need that so that `make generate` doesn't panic, but I'd just rebase on top of it.
Once we've done that I'm inclined to merge this fairly quickly. We've declared the API as subject to change, and the codebase is fairly independent. My preference is to merge it and iterate on it.
```go
}

claims := &ipamv1.IPAddressClaimList{}
if err := r.Client.List(context.Background(), claims, client.InNamespace(req.Namespace), client.MatchingFields{infrav1.OpenStackFloatingIPPoolNameIndex: pool.Name}); err != nil {
```
Suggested change:
```diff
-if err := r.Client.List(context.Background(), claims, client.InNamespace(req.Namespace), client.MatchingFields{infrav1.OpenStackFloatingIPPoolNameIndex: pool.Name}); err != nil {
+if err := r.Client.List(ctx, claims, client.InNamespace(req.Namespace), client.MatchingFields{infrav1.OpenStackFloatingIPPoolNameIndex: pool.Name}); err != nil {
```
```go
// Get tagged floating IPs and add them to the available IPs if they are not present in either the available IPs or the claimed IPs.
// This is done to prevent leaking floating IPs if the floating IP was created but the IPAddress object was not.
taggedIPs, err := networkingService.GetFloatingIPsByTag(pool.GetFloatingIPTag())
if err != nil {
	scope.Logger().Error(err, "Failed to get floating IPs by tag", "pool", pool.Name)
	return "", err
}
for _, taggedIP := range taggedIPs {
	if contains(pool.Status.AvailableIPs, taggedIP.FloatingIP) || contains(pool.Status.ClaimedIPs, taggedIP.FloatingIP) {
		continue
	}
	scope.Logger().Info("Tagged floating IP found that was not known to the pool, adding it to the pool", "ip", taggedIP.FloatingIP)
	pool.Status.AvailableIPs = append(pool.Status.AvailableIPs, taggedIP.FloatingIP)
}
```
We probably don't want to do this unconditionally in the IPAddressClaim loop for every unallocated IPAddressClaim. Suggest we could do it only if len(AvailableIPs) == 0?
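A sketch of the suggested guard around the scan above (wrapping it verbatim; whether this is the right trigger condition is the open question):

```go
// Only fall back to the tagged-IP scan when the pool has nothing left to
// hand out, so the common path avoids an extra Neutron call per claim.
if len(pool.Status.AvailableIPs) == 0 {
	taggedIPs, err := networkingService.GetFloatingIPsByTag(pool.GetFloatingIPTag())
	if err != nil {
		scope.Logger().Error(err, "Failed to get floating IPs by tag", "pool", pool.Name)
		return "", err
	}
	// ... reconcile taggedIPs into pool.Status.AvailableIPs as above ...
}
```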
```go
}

// Retry creating the IPAddress object
err = wait.ExponentialBackoff(backoff, func() (bool, error) {
```
This should be cancellable:
```diff
-err = wait.ExponentialBackoff(backoff, func() (bool, error) {
+err = wait.ExponentialBackoffWithContext(ctx, backoff, func() (bool, error) {
```
```go
defer func() {
	tag := pool.GetFloatingIPTag()

	err := wait.ExponentialBackoff(backoff, func() (bool, error) {
```
As above, please use ExponentialBackoffWithContext so this is cancellable.
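For reference, a sketch of the cancellable form (in recent k8s.io/apimachinery releases the condition also receives the context; the retried body shown here is an assumption about what the loop does):

```go
err := wait.ExponentialBackoffWithContext(ctx, backoff, func(ctx context.Context) (bool, error) {
	// Return true to stop retrying; the loop also exits early when ctx is
	// cancelled, e.g. on controller shutdown.
	if err := r.Client.Create(ctx, ipAddress); err != nil {
		return false, nil // transient failure, retry with backoff
	}
	return true, nil
})
```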
I can agree with Matt here; this doesn't need to be 100% perfect. I don't understand what v1alpha1 changes, though.
Nice!
This looks quite good already, and I agree with @mdbooth that we can probably take it in under the v1alpha1 API version. Found a few nits, though.
PROJECT
```diff
@@ -50,3 +53,7 @@ resources:
 - group: infrastructure
   kind: OpenStackClusterTemplate
   version: v1alpha8
+- group: infrastructure
+  kind: OpenStackFloatingIPPool
+  version: v1alpha8
```
This is not updated
```go
// OpenStackFloatingIPPoolSpec defines the desired state of OpenStackFloatingIPPool.
type OpenStackFloatingIPPoolSpec struct {
	PreAllocatedFloatingIPs []string `json:"preAllocatedFloatingIPs,omitempty"`
```
Could you add a doc string for this as well? I know it is fairly self-explanatory, but it is still nice to have descriptions for all fields.
Little formatting fixes, plus I think this should have `+optional`:
```go
// PreAllocatedFloatingIPs is a list of floating IPs precreated in OpenStack that should be used by this pool.
// These are used before allocating new ones and are not deleted from OpenStack when the pool is deleted.
```
@dulek FYI I've been recommending against `+optional`s because they're already the default. I don't have a strong position on it, though. Is there specific guidance somewhere I'm ignoring?
Okay, so just the formatting fixes! :)
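As resolved in this thread, a sketch of the field with just the doc string (dropping `+optional`, per the exchange above):

```go
// PreAllocatedFloatingIPs is a list of floating IPs precreated in OpenStack that should be used by this pool.
// These are used before allocating new ones and are not deleted from OpenStack when the pool is deleted.
PreAllocatedFloatingIPs []string `json:"preAllocatedFloatingIPs,omitempty"`
```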
This looks pretty good to me!
Once the last remaining review feedback is addressed (is it just the docstring?), please can you squash these commits and I'll approve it.
Great, fixed and squashed 👍. I'll be looking into rebasing #1762 when this is merged. How should I think about the API version on that one? I guess I should just move it to alpha8?
/retest
/retest-required
/approve
I'll leave lgtm to somebody else.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: bilbobrovall, mdbooth
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/lgtm
/test pull-cluster-api-provider-openstack-e2e-full-test
/hold cancel
What this PR does / why we need it:
Creates an IPAM provider for OpenStack floating IPs that allocates floating IPs on demand or uses pre-allocated IPs; in combination with #1762, it lets us get floating IPs for worker nodes.
Which issue(s) this PR fixes:
Second part of #1750
Special notes for your reviewer:
TODOs:
/hold