[gardenlet] Switch BackupEntry controller to controller-runtime #6926

Merged
merged 27 commits into from
Nov 23, 2022

Conversation

shafeeqes
Contributor

@shafeeqes shafeeqes commented Oct 30, 2022

How to categorize this PR?

/area dev-productivity scalability
/kind enhancement

What this PR does / why we need it:
Refactor the BackupEntry controller to controller-runtime.

Which issue(s) this PR fixes:
Part of #4251

Special notes for your reviewer:
Generally, we want to follow this cookbook while refactoring existing controllers:

Add documentation
Add integration test based on envtest (if not already present)
Switch controller to controller-runtime

Depends on #6837, hence in draft state.
Also, I will eliminate the wait for the extension in favour of adding watches once the approach in the BackupBucket controller is reviewed and merged.

Release note:

NONE

@gardener-prow
Contributor

gardener-prow bot commented Oct 30, 2022

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@gardener-prow gardener-prow bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. area/dev-productivity Developer productivity related (how to improve development) area/scalability Scalability related kind/enhancement Enhancement, improvement, extension cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. labels Oct 30, 2022
@gardener-prow gardener-prow bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Oct 30, 2022
@shafeeqes shafeeqes force-pushed the refactor/be-controller branch 3 times, most recently from 76bac08 to 3f26375 Compare October 31, 2022 04:51
@shafeeqes shafeeqes force-pushed the refactor/be-controller branch 3 times, most recently from 7fa9788 to 392ec6b Compare November 9, 2022 10:43
@gardener gardener deleted a comment from gardener-prow bot Nov 9, 2022
@gardener-prow gardener-prow bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 9, 2022
@gardener-prow gardener-prow bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 9, 2022
@shafeeqes shafeeqes force-pushed the refactor/be-controller branch 6 times, most recently from 9b8b84c to 8a50abb Compare November 14, 2022 05:17
@shafeeqes shafeeqes marked this pull request as ready for review November 14, 2022 05:17
@gardener-prow gardener-prow bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 14, 2022
@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label Nov 22, 2022
Member

@rfranzke rfranzke left a comment


Only nits

docs/concepts/gardenlet.md (outdated, resolved)
docs/concepts/gardenlet.md (outdated, resolved)
pkg/gardenlet/controller/backupentry/migration/add.go (outdated, resolved)
pkg/gardenlet/controller/backupentry/migration/add.go (outdated, resolved)
@gardener-prow gardener-prow bot removed the lgtm Indicates that a PR is ready to be merged. label Nov 23, 2022
Member

@rfranzke rfranzke left a comment


/lgtm
/approve

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label Nov 23, 2022
@gardener-prow
Contributor

gardener-prow bot commented Nov 23, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rfranzke

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gardener-prow gardener-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 23, 2022
@gardener-prow gardener-prow bot removed the lgtm Indicates that a PR is ready to be merged. label Nov 23, 2022
@rfranzke
Member

/lgtm

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label Nov 23, 2022
@shafeeqes
Contributor Author

shafeeqes commented Nov 23, 2022

@shafeeqes: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-gardener-e2e-kind-migration b49c65e link true /test pull-gardener-e2e-kind-migration
Full PR test history. Your PR dashboard. Command help for this repository. Please help us cut down on flakes by linking this test failure to an open flake report or filing a new flake report if you can't find an existing one. Also see our testing guideline for how to avoid and hunt flakes.

It looks like the Cluster resource is gone before the BackupEntry is deleted:

2022-11-23T07:30:07.590367904Z stderr F {"level":"info","ts":"2022-11-23T07:30:07.590Z","msg":"Extension BackupEntry not yet deleted","controller":"backupentry","object":{"name":"shoot--local--e2e-migrate--7afb4ec6-3318-49a4-9562-ee070f464933","namespace":"garden-local"},"namespace":"garden-local","name":"shoot--local--e2e-migrate--7afb4ec6-3318-49a4-9562-ee070f464933","reconcileID":"6e61b81b-661a-43f1-8384-46ca59244fd2","extensionBackupEntry":{"name":"shoot--local--e2e-migrate--7afb4ec6-3318-49a4-9562-ee070f464933"}}
2022-11-23T07:30:07.603772924Z stderr F {"level":"error","ts":"2022-11-23T07:30:07.603Z","msg":"Failed to get shoot from cluster","controller":"backupentry","shootTechnicalID":"shoot--local--e2e-migrate","error":"Cluster.extensions.gardener.cloud \"shoot--local--e2e-migrate\" not found","stacktrace":"github.com/gardener/gardener/pkg/gardenlet/controller/backupentry/backupentry.(*Reconciler).MapExtensionBackupEntryToCoreBackupEntry

Fixing it.
Update: This is just an expected log; the real issue was #6926 (comment).

@gardener-prow gardener-prow bot removed the lgtm Indicates that a PR is ready to be merged. label Nov 23, 2022
@shafeeqes
Contributor Author

Had a look at this together with @plkokanov.
We found that it is caused by a race condition.

The BackupEntry is deleted when the Shoot is deleted, and this deletion flow checks for a successfully reconciled BackupBucket. If the BackupBucket is deleted in the meantime, its status is updated to Processing, and only after that does the controller check whether there are any associated BackupEntries.
This causes a deadlock, because the BackupEntry needs a Succeeded BackupBucket to proceed with its deletion, and the BackupBucket requires all BackupEntries to be gone.
This happened because, right after the Shoot deletion test succeeds, the script deletes the Seed, and the Seed deletion flow deletes the BackupBucket.
6367d91 moves the check for associated BackupEntries before accepting the deletion, so that we don't set the BackupBucket status to Processing.
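
To illustrate the reordering described above, here is a minimal, dependency-free sketch. It is not the code from 6367d91: the types `BackupBucket` and `BackupEntry` and the functions `reconcileBucketDeletion` and `canDeleteEntry` are simplified, hypothetical stand-ins for the real Gardener API objects and reconcilers. It only shows the ordering idea, namely checking for referencing entries before touching the bucket status.

```go
package main

// Hypothetical, simplified stand-ins for the Gardener API types.
type BackupBucket struct {
	Name          string
	Deleting      bool   // stands in for a set deletionTimestamp
	LastOperation string // e.g. "Succeeded" or "Processing"
}

type BackupEntry struct {
	Name       string
	BucketName string
}

// hasAssociatedEntries reports whether any entry still references the bucket.
func hasAssociatedEntries(bucket *BackupBucket, entries []BackupEntry) bool {
	for _, e := range entries {
		if e.BucketName == bucket.Name {
			return true
		}
	}
	return false
}

// reconcileBucketDeletion checks for associated entries first and only marks
// the bucket as Processing once none are left. It returns true if the
// deletion was accepted.
func reconcileBucketDeletion(bucket *BackupBucket, entries []BackupEntry) bool {
	if !bucket.Deleting {
		return false
	}
	if hasAssociatedEntries(bucket, entries) {
		// Leave the status untouched so that entries still observe a
		// Succeeded bucket and can finish their own deletion.
		return false
	}
	bucket.LastOperation = "Processing"
	return true
}

// canDeleteEntry mirrors the entry side of the deadlock: an entry only
// proceeds with deletion while its bucket is in the Succeeded state.
func canDeleteEntry(bucket *BackupBucket) bool {
	return bucket.LastOperation == "Succeeded"
}
```

With the original ordering (status set to Processing before the check), `canDeleteEntry` would return false while an entry still exists, and neither side could make progress.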

Also, if there is a deletion grace period, the BackupEntry will not be deleted even if the Shoot is gone. In this case, the Cluster resource won't be present anymore, so we shouldn't try to fetch it. In any case, for the deletion of the extension BackupEntry we have a RequeueAfter, so we don't need to map the extension to the core BackupEntry if there is a deletionTimestamp.

@rfranzke
Member

Awesome @shafeeqes and @plkokanov, this sounds very reasonable. Nice finding and thanks for fixing it! 👏🏻
/lgtm

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label Nov 23, 2022
@gardener-prow gardener-prow bot merged commit 36c43c1 into gardener:master Nov 23, 2022
@shafeeqes shafeeqes deleted the refactor/be-controller branch November 23, 2022 13:27
@rfranzke rfranzke changed the title [gardenlet] Switch BackupEntry controller to controller-runtime [gardenlet] Switch BackupEntry controller to controller-runtime Dec 14, 2022