-
Notifications
You must be signed in to change notification settings - Fork 38.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix resource cleanup in ingress_utils.go within e2e/framework #42555
Fix resource cleanup in ingress_utils.go within e2e/framework #42555
Conversation
Hey @krousey, Matt is actually out today. Wondering if you'd like to take a look or can point me to a more appropriate reviewer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one small question.. and one double take.
@spxtr, doesn't the test-infra team take steps to try to clean this up?
// Static-IP allocated on behalf of the test, never deleted by the | ||
// controller. Delete this IP only after the controller has had a chance | ||
// to cleanup or it might interfere with the controller, causing it to | ||
// throw out confusing events. | ||
if ipErr := wait.Poll(5*time.Second, LoadBalancerCleanupTimeout, func() (bool, error) { | ||
if ipErr := wait.Poll(5*time.Second, 1*time.Minute, func() (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you still mean to change this? Or was this to aid in testing and debugging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On purpose. The LoadBalancerCleanupTimeout
is 15 minutes and is already used above at 336.
This wait.Poll(...)
is for manually deleting the static IP created during Setup(...)
. We want to try just enough times to make sure any propagation of deletions from above will make the ip viewed as un-used. Even polling for a minute may be overkill.
@@ -648,8 +674,7 @@ func (cont *GCEIngressController) isHTTPErrorCode(err error, code int) bool { | |||
|
|||
// Cleanup cleans up cloud resources. | |||
// If del is false, it simply reports existing resources without deleting them. | |||
// It always deletes resources created through it's methods, like staticIP, even | |||
// if del is false. | |||
// If dle is true, it deletes resources it finds acceptable (see canDelete func). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"If del is true".
Also, this is really gross. It's 2 functions hiding behind a boolean switch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very gross, indeed. It's also looking at the return strings of deleteXYZ(del bool) string
to determine if resources exists. Frankly, all of it would benefit from being rebuilt. I'm more concerned with stopping the leak of resources at the moment.
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED The following people have approved this PR: krousey, nicksardo Needs approval from an approver in each of these OWNERS Files:
We suggest the following people: |
@k8s-bot gce etcd3 e2e test this |
Automatic merge from submit-queue (batch tested with PRs 42080, 41653, 42598, 42555) |
What this PR does / why we need it:
The GLBC is failing to delete resources during the etcd rollback tests and the e2e cleanup is leaking them. After a short while, tests are failing to create new resources.
This PR addresses the e2e/framework's ability to delete GLBC-created resources and adds more logging.
Which issue this PR fixes:
Helps #38569 but does not completely close this flake
Special notes for your reviewer:
Resources were not being deleted because resource names were being truncated and then their ability to be deleted was determined by the entire cluster id existing in the name. Truncated names also have an extra '0' append to the end of their name (unknown origin). This PR tries to match on a common prefix.
Minor changes were made to improve log readability.
Testing this PR:
This was tested by running a master upgrade test and by adding a second forwarding-rule mid-run. This forwarding rule referenced the same url-map used by the first forwarding-rule created by the GLBC. Therefore, the GLBC will be able to delete the forwarding-rule but not anymore L7 resources. This second forwarding rule's name was nearly identical to the first forwarding rule so that the cleanup code will find it.
As you can see from the test run below, the cleanup code deleted all the resources that the GLBC could not.