Prevent stranding of partial load balancer resources #7852
Conversation
+cc @smarterclayton (openshift)
openstack != openshift, Max :) cc @anguslees for openstack change
Passes LB e2es consistently
How embarrassing... that's what I get for attempting to triage during a meeting :-)
:) I'd like to pretend like some marketing guy probably didn't anticipate that when the name was chosen, but I know better ;)
@@ -66,7 +66,8 @@ type TCPLoadBalancer interface {
 	CreateTCPLoadBalancer(name, region string, externalIP net.IP, ports []int, hosts []string, affinityType api.AffinityType) (string, error)
 	// UpdateTCPLoadBalancer updates hosts under the specified load balancer.
 	UpdateTCPLoadBalancer(name, region string, hosts []string) error
-	// DeleteTCPLoadBalancer deletes a specified load balancer.
+	// DeleteTCPLoadBalancer deletes a specified load balancer. It should return
+	// nil if the load balancer specified already didn't exist.
 	DeleteTCPLoadBalancer(name, region string) error
I'd suggest renaming to something like EnsureTCPLoadBalancerDeleted() to better advertise its idempotence.
The main feedback is that you need to retry pool deletion. The other comments are cosmetic, and won't block merging.
Force-pushed from 5dc516a to 93a0574.
Rebased and fixed all your comments except the pool error one.
return err
// It's ok if the pool doesn't exist, as we may still need to delete the vip
// (although I don't believe the system should ever be in that state).
pool, poolErr := pools.Get(lb.network, vip.PoolID).Extract()
If the above returned ErrNotFound, then vip.PoolID will be undefined here.
@a-robinson let me know when you're ready for a re-review. Assigning to you until then. It looks like you still need to address some of @anguslees' openstack feedback. Also, it would be good to get someone to test this on openstack before merging. Perhaps @anguslees can do that for us?
Sorry about the delay, I had some other things float to the top of my priority queue this week. I think we should be fine with getting pools by name rather than ID, but have separated that out into issue #8352. In the meantime, I've modified this PR to effectively maintain current behavior, just returning nil rather than an error if we attempt to delete a VIP that doesn't exist, paralleling what would happen if we first attempted to get the load balancer before deleting it.
@quinton-hoole, are you alright with merging this?
Apologies for the delay. I'll review it now. Has this been tested on openstack yet, as per the above comment?
No, I don't believe it has been tested on openstack, but @anguslees seems to be happy with it?
} else if vipErr != nil {
	return vipErr
}
vipExists := (vipErr == nil)
I don't think that it's possible to get here without vipErr being nil, in which case vipExists is superfluous, right?
Just two minor things left to check/fix, as far as I can see.
to load balancers having already been deleted.
controller to rely on that, so that we won't strand partial resources from them anymore (target pools in GCE, pools in OpenStack, etc.).
Re-pushed the code with the vipExists logic torn out and TODOs clarified with the openstack issue number.
Have you performed a few runs of the services e2e tests against this code to confirm that:
Can anyone explain why the v1.0 candidate milestone tag was removed from this one? I don't think that we can launch without this, as all clusters that employ external load balancers on GCE will stop working properly within a few days, like our current soak test clusters.
Tentatively re-adding v1.0-candidate milestone until I understand why it's not v1.0 worthy. |
Prevent stranding of partial load balancer resources
Will likely fix #7753, although the proof is in the pudding. An explanation of what this is doing is in the comment there.
@quinton-hoole