Delete related VSRs after restore completes. #269
Skipping CI for Draft Pull Request.
Signed-off-by: Matthew Arnold <marnold@redhat.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: mrnold
Signed-off-by: Matthew Arnold <marnold@redhat.com>
Okay, I think this satisfies all the requirements, and it's working. It is very slow though, so I will keep working on it so that it doesn't hammer the API so hard.
pkg/datamover/datamover.go
Outdated
		return err
	}

	for _, restore := range restoreList.Items {
We are calling the CleanupRestoreVSRs function for every restore in the completed phase. IMO we should only clean up VSRs for the current restore and not touch any other VSRs.
@shubham-pampattiwar Hmm. I guess that depends on whether we want to clean up old non-cleaned restores or not. If we're only ever concerned with removing VSRs for the current restore, then this should be simpler.
Clean up only the VSRs associated with the restore in process. Matching the VSB and VSR behavior would be the right thing to do. Customers can clean up any other VSRs themselves.
I put this back to only removing VSRs for the current restore.
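To illustrate the scoping agreed on in this thread, here is a minimal sketch of selecting only the VSRs that belong to the current restore. It is not the code from this PR: the `VSR` struct is a stripped-down stand-in, and the assumption that VSRs carry a `velero.io/restore-name` label is mine, as is the helper name `vsrsForRestore`.

```go
package main

import "fmt"

// VSR is a hypothetical, stripped-down stand-in for a VolumeSnapshotRestore.
type VSR struct {
	Name   string
	Labels map[string]string
}

// restoreNameLabel is an ASSUMED label key tying a VSR to its restore.
const restoreNameLabel = "velero.io/restore-name"

// vsrsForRestore returns only the VSRs labeled with the given restore name.
// VSRs that belong to other restores, or carry no label, are left alone.
func vsrsForRestore(all []VSR, restoreName string) []VSR {
	if restoreName == "" {
		// Never match unlabeled VSRs by accident.
		return nil
	}
	var matched []VSR
	for _, vsr := range all {
		// Indexing a nil map is safe in Go and yields the empty string.
		if vsr.Labels[restoreNameLabel] == restoreName {
			matched = append(matched, vsr)
		}
	}
	return matched
}

func main() {
	all := []VSR{
		{Name: "vsr-a", Labels: map[string]string{restoreNameLabel: "restore-1"}},
		{Name: "vsr-b", Labels: map[string]string{restoreNameLabel: "restore-2"}},
		{Name: "vsr-c"},
	}
	for _, vsr := range vsrsForRestore(all, "restore-1") {
		fmt.Println("would clean up", vsr.Name)
	}
}
```

In a real controller the same filtering would more likely be done server-side with a label selector on the list call, which would also reduce the API load mentioned earlier in the conversation.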
@@ -575,6 +576,9 @@ func (r *restoreReconciler) runValidatedRestore(restore *api.Restore, info backu
	r.logger.Debug("Restore completed")
	restore.Status.Phase = api.RestorePhaseCompleted
	r.metrics.RegisterRestoreSuccess(restore.Spec.ScheduleName)
	if err := datamover.DeleteVSRsIfComplete(restore.Name, r.logger); err != nil {
		r.logger.WithError(err).Error("Error removing VSRs after completed restore")
	}
We need the same call as above after line 568/569; otherwise a single restore error will prevent VSR cleanup.
Also, what if there are inProgress operations, does the cleanup just never happen, or is there another place which will call the same code?
		// check for a failed VSR
		for _, cond := range currentVSR.Status.Conditions {
			if cond.Status == metav1.ConditionFalse && cond.Reason == ReconciledReasonError && cond.Type == ConditionReconciled && (currentVSR.Status.Phase == SnapMoverBackupPhaseFailed || currentVSR.Status.Phase == SnapMoverBackupPhasePartiallyFailed) {
				return false, errors.Errorf("volumesnapshotrestore %s has failed status", currentVSR.Name)
Why do we return error if VSR failed? This causes the check command to fail out, which means we only clean up VSRs if they're all successful. Is this the desired behavior, or do we always want to clean up post-restore?
@sseago I think we decided to only clean up completed CRs for debugging purposes
Disregard this comment, as we want the current behavior here.
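The behavior settled on here (a failed VSR aborts the check, so VSRs are only deleted when all of them completed and failed ones stay around for debugging) can be sketched roughly as follows. This is a simplified illustration, not the PR's implementation: it looks only at a phase field rather than at status conditions, and the `Phase` constants, the `VSR` struct, `allComplete`, and `deleteIfComplete` are all hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// Phase is a hypothetical, simplified VSR phase.
type Phase string

const (
	PhaseInProgress      Phase = "InProgress"
	PhaseCompleted       Phase = "Completed"
	PhaseFailed          Phase = "Failed"
	PhasePartiallyFailed Phase = "PartiallyFailed"
)

// VSR is a hypothetical stand-in for a VolumeSnapshotRestore.
type VSR struct {
	Name  string
	Phase Phase
}

var errVSRFailed = errors.New("volumesnapshotrestore has failed status")

// allComplete reports whether every VSR completed. Any failed VSR yields an
// error, which callers treat as "do not clean up".
func allComplete(vsrs []VSR) (bool, error) {
	complete := true
	for _, vsr := range vsrs {
		switch vsr.Phase {
		case PhaseFailed, PhasePartiallyFailed:
			return false, fmt.Errorf("%w: %s", errVSRFailed, vsr.Name)
		case PhaseCompleted:
			// Nothing to do for a completed VSR.
		default:
			complete = false // still pending or in progress
		}
	}
	return complete, nil
}

// deleteIfComplete deletes the VSRs only when all of them completed.
// It returns the number of VSRs deleted.
func deleteIfComplete(vsrs []VSR, del func(name string) error) (int, error) {
	done, err := allComplete(vsrs)
	if err != nil || !done {
		return 0, err
	}
	deleted := 0
	for _, vsr := range vsrs {
		if err := del(vsr.Name); err != nil {
			return deleted, err
		}
		deleted++
	}
	return deleted, nil
}

func main() {
	del := func(name string) error {
		fmt.Println("deleting", name)
		return nil
	}
	ok := []VSR{{Name: "vsr-a", Phase: PhaseCompleted}, {Name: "vsr-b", Phase: PhaseCompleted}}
	n, err := deleteIfComplete(ok, del)
	fmt.Println(n, err)

	bad := []VSR{{Name: "vsr-a", Phase: PhaseCompleted}, {Name: "vsr-c", Phase: PhaseFailed}}
	n, err = deleteIfComplete(bad, del)
	fmt.Println(n, err)
}
```

Checking all VSRs before deleting any means a failure found late in the list cannot leave a half-cleaned restore behind.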
Signed-off-by: Matthew Arnold <marnold@redhat.com>
@mrnold Unit tests are failing, please make sure you run
		phase = " partially failed "
	}
	if err = datamover.DeleteVSRsIfComplete(restore.Name, log); err != nil {
		return ctrl.Result{}, errors.Wrapf(err, "error cleaning up after%srestore", phase)
I don't think we really need the extra phase text in the message. That makes the flow slightly more confusing, and it doesn't really add much.
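One way the simplification suggested above might look is a single wrapped error with a fixed message and no phase interpolation. The sketch below is an assumption-laden illustration: it uses the standard library's `fmt.Errorf` with `%w` in place of the `errors.Wrapf` call in the diff, and `cleanupAfterRestore`, its callback parameter, and `errExample` are hypothetical names. Including the restore name in the message is my choice, not something the review asked for.

```go
package main

import (
	"errors"
	"fmt"
)

// errExample is a hypothetical underlying failure used for the demo.
var errExample = errors.New("boom")

// cleanupAfterRestore calls the supplied delete function and wraps any error
// with one fixed message, regardless of the restore's final phase.
func cleanupAfterRestore(restoreName string, deleteVSRs func(string) error) error {
	if err := deleteVSRs(restoreName); err != nil {
		return fmt.Errorf("error cleaning up after restore %s: %w", restoreName, err)
	}
	return nil
}

func main() {
	failing := func(string) error { return errExample }
	err := cleanupAfterRestore("restore-1", failing)
	fmt.Println(err)
	// The original error stays reachable through the wrap.
	fmt.Println(errors.Is(err, errExample))
}
```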
Signed-off-by: Matthew Arnold <marnold@redhat.com>
@mrnold: The following test failed.
The unit test https://github.com/openshift/velero/blob/konveyor-dev/pkg/controller/restore_operations_controller_test.go#L70 is failing via CLI invocation and GoLand invocation, but passing in VS Code.
@sseago @kaovilai @mrnold @shubham-pampattiwar on a call waiving.
OADP-1872: delete VSRs after a restore completes, and clean up VSRs from completed restores and restores that no longer exist.