Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Increase CRD cleanup requeue delay; remove e2e test #2541

Merged
merged 1 commit into from
Jan 4, 2023

Conversation

kylape
Copy link
Contributor

@kylape kylape commented Jan 4, 2023

Requeue is increased from 10 seconds to 30 minutes.

Summary

When KCP is under heavy load during e2e test runs, it can take a long time for a new APIBinding to complete reconciliation. In some cases, it takes longer than the requeue delay configured in the CRD cleanup controller, which means the newly-created CRD will be deleted before initialization of its corresponding APIBinding is complete.

The obvious fix is to increase the requeue delay, but there is an e2e test that waits on this delay to ensure CRDs are cleaned up properly. Unfortunately waiting becomes infeasible if the delay is increased to the order of minutes or hours instead of seconds.

Therefore the e2e test is removed in favor of an expanded unit test. The queue is mocked out in the unit test, and the test function emulates time going forward by decreasing the CRD's creationTimestamp before calling process again.

Related issue(s)

Fixes #2507

Requeue is increased from 10 seconds to 30 minutes.

When KCP is under heavy load during e2e test runs, it can take a long
time for a new `APIBinding` to complete reconciliation.  In some cases, it
takes longer than the requeue delay configured in the CRD cleanup
controller, which means the newly-created CRD will be deleted before
initialization of its corresponding `APIBinding` is complete.

The obvious fix is to increase the requeue delay, but there is an e2e
test that waits on this delay to ensure CRDs are cleaned up properly.
Unfortunately waiting becomes infeasible if the delay is increased to the
order of minutes or hours instead of seconds.

Therefore the e2e test is removed in favor of an expanded unit test.
The queue is mocked out in the unit test, and the test function emulates
time going forward by decreasing the CRD's `creationTimestamp` before
calling `process` again.

Fixes kcp-dev#2507
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 4, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 4, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ncdc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 4, 2023
@openshift-merge-robot openshift-merge-robot merged commit 417e8e3 into kcp-dev:main Jan 4, 2023
@kylape kylape deleted the fix-2507 branch January 4, 2023 17:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

flake: bound CRD deleted by CRD cleanup controller too aggressively
3 participants