
Delete related VSRs after restore completes. #269

Merged
8 commits merged into openshift:oadp-1.2 on May 19, 2023

Conversation

@mrnold commented May 16, 2023

OADP-1872: delete VSRs after a restore completes, and clean up VSRs from completed restores and restores that no longer exist.
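For context, a minimal sketch of what such a cleanup helper can look like, assuming a controller-runtime client. The velero.io/restore-name label key, the datamover.oadp.openshift.io/v1alpha1 group/version, and the "Completed" phase string are all assumptions; the PR's actual datamover.DeleteVSRsIfComplete may differ.

package datamoversketch

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// deleteVSRsIfComplete removes the VolumeSnapshotRestores that belong to a
// single restore, but only once every one of them has finished.
func deleteVSRsIfComplete(ctx context.Context, c client.Client, restoreName string) error {
	vsrList := &unstructured.UnstructuredList{}
	vsrList.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "datamover.oadp.openshift.io", // assumed API group for the VSR CRD
		Version: "v1alpha1",
		Kind:    "VolumeSnapshotRestoreList",
	})
	// Scope the list to this restore only; the label key is an assumption.
	if err := c.List(ctx, vsrList, client.MatchingLabels{"velero.io/restore-name": restoreName}); err != nil {
		return err
	}
	for _, vsr := range vsrList.Items {
		phase, _, _ := unstructured.NestedString(vsr.Object, "status", "phase")
		if phase != "Completed" { // assumed phase string
			return nil // leave unfinished or failed VSRs in place for debugging
		}
	}
	for i := range vsrList.Items {
		if err := c.Delete(ctx, &vsrList.Items[i]); err != nil {
			return err
		}
	}
	return nil
}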

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 16, 2023
openshift-ci bot commented May 16, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Signed-off-by: Matthew Arnold <marnold@redhat.com>
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 18, 2023
@mrnold mrnold changed the base branch from konveyor-dev to oadp-1.2 May 18, 2023 17:50
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 18, 2023
openshift-ci bot commented May 18, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mrnold

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Matthew Arnold <marnold@redhat.com>
Signed-off-by: Matthew Arnold <marnold@redhat.com>
@mrnold (author) commented May 18, 2023

Okay, I think this satisfies all the requirements, and it's working. It's very slow, though, so I'll keep working on making it not hammer the API so hard.
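As an aside on the API-hammering concern: a common pattern for waiting on CR status without tight-looping against the API server is bounded-interval polling with an overall timeout. A sketch using apimachinery's wait helpers, where checkVSRsComplete is a hypothetical stand-in for whatever status check the controller performs:

package datamoversketch

import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForVSRs polls at a fixed interval with an overall timeout instead of
// re-querying the API server as fast as possible.
func waitForVSRs(ctx context.Context, checkVSRsComplete func(context.Context) (bool, error)) error {
	return wait.PollUntilContextTimeout(ctx, 5*time.Second, 10*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			return checkVSRsComplete(ctx) // hypothetical status check
		})
}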

@mrnold mrnold marked this pull request as ready for review May 18, 2023 21:52
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 18, 2023
return err
}

for _, restore := range restoreList.Items {


We are calling the CleanupRestoreVSRs function for restores with the completed phase; IMO we should only clean up VSRs for this current restore and not touch any other VSRs.


@shubham-pampattiwar Hmm. I guess that depends on whether we want to clean up old non-cleaned restores or not. If we're only ever concerned with removing VSRs for the current restore, then this should be simpler.


Clean up only the VSRs associated with the restore in process. Matching the VSB and VSR behavior would be the right thing to do. Customers can clean up any other VSRs themselves.

@mrnold (author):

I put this back to only removing VSRs for the current restore.

This reverts commit bfbb435.

Signed-off-by: Matthew Arnold <marnold@redhat.com>
This reverts commit 8582710.

Signed-off-by: Matthew Arnold <marnold@redhat.com>
@@ -575,6 +576,9 @@ func (r *restoreReconciler) runValidatedRestore(restore *api.Restore, info backu
	r.logger.Debug("Restore completed")
	restore.Status.Phase = api.RestorePhaseCompleted
	r.metrics.RegisterRestoreSuccess(restore.Spec.ScheduleName)
	if err := datamover.DeleteVSRsIfComplete(restore.Name, r.logger); err != nil {
		r.logger.WithError(err).Error("Error removing VSRs after completed restore")
	}

We need the same above after line 568/569, otherwise a single restore error will prevent VSR cleanup.


Also, what if there are in-progress operations? Does the cleanup just never happen, or is there another place that will call the same code?

@mrnold (author):

I think I have addressed these with cc1c0 and 7e77a now, let me know whether or not that's what you had in mind.

// check for a failed VSR
for _, cond := range currentVSR.Status.Conditions {
	if cond.Status == metav1.ConditionFalse && cond.Reason == ReconciledReasonError &&
		cond.Type == ConditionReconciled &&
		(currentVSR.Status.Phase == SnapMoverBackupPhaseFailed || currentVSR.Status.Phase == SnapMoverBackupPhasePartiallyFailed) {
		return false, errors.Errorf("volumesnapshotrestore %s has failed status", currentVSR.Name)
	}
}

Why do we return an error if a VSR failed? This causes the check to fail, which means we only clean up VSRs if they all succeeded. Is this the desired behavior, or do we always want to clean up post-restore?


@sseago I think we decided to only clean up completed CRs, for debugging purposes.


Disregard this comment, as we want the current behavior here.
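Restated outside the diff, the failure test above amounts to the following sketch. The condition and phase string values are assumptions modeled on the names in the excerpt; the real constants live in the volume-snapshot-mover API package, and the real check operates on the VSR's full status struct.

package datamoversketch

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// Assumed string values for the constants named in the excerpt.
const (
	ConditionReconciled                 = "Reconciled"
	ReconciledReasonError               = "Error"
	SnapMoverBackupPhaseFailed          = "Failed"
	SnapMoverBackupPhasePartiallyFailed = "PartiallyFailed"
)

// vsrFailed mirrors the loop in the excerpt: a VSR counts as failed when its
// Reconciled condition is False with an error reason while the phase is
// Failed or PartiallyFailed.
func vsrFailed(conditions []metav1.Condition, phase string) bool {
	for _, cond := range conditions {
		if cond.Type == ConditionReconciled &&
			cond.Status == metav1.ConditionFalse &&
			cond.Reason == ReconciledReasonError &&
			(phase == SnapMoverBackupPhaseFailed || phase == SnapMoverBackupPhasePartiallyFailed) {
			return true
		}
	}
	return false
}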

Signed-off-by: Matthew Arnold <marnold@redhat.com>
Signed-off-by: Matthew Arnold <marnold@redhat.com>
@shubham-pampattiwar (member) commented:

@mrnold Unit tests are failing; please make sure you run `make update` and `make test` locally and see if any changes are needed.

	phase = " partially failed "
}
if err = datamover.DeleteVSRsIfComplete(restore.Name, log); err != nil {
	return ctrl.Result{}, errors.Wrapf(err, "error cleaning up after%srestore", phase)
}

I don't think we really need the extra phase text in the message. That makes the flow slightly more confusing, and it doesn't really add much.
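If the phase text were dropped, the wrap could be as simple as the following (a sketch of this suggestion, not the code that was merged):

if err = datamover.DeleteVSRsIfComplete(restore.Name, log); err != nil {
	return ctrl.Result{}, errors.Wrap(err, "error cleaning up restore VSRs")
}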

Signed-off-by: Matthew Arnold <marnold@redhat.com>
openshift-ci bot commented May 19, 2023

@mrnold: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/unit-test
Commit: 734897b
Required: false
Rerun command: /test unit-test


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@weshayutin commented:
@sseago @kaovilai @mrnold @shubham-pampattiwar: waiving this on a call.

@weshayutin weshayutin merged commit a147573 into openshift:oadp-1.2 May 19, 2023
2 of 4 checks passed