Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: Back up and restore a VM. #1333

Merged
merged 8 commits into from
Mar 5, 2024

Conversation

mrnold
Copy link
Contributor

@mrnold mrnold commented Feb 15, 2024

This pull request adds an actual backup and restore of a running VM. This includes some rework of the existing backup and restore test code, so that different parts of the sequence could be specialized for VM purposes. The result is a little bit awkward, but it makes almost no functional changes to existing tests.

@kaovilai
Copy link
Member

/hold

re:Do not merge this yet

@openshift-ci openshift-ci bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Feb 16, 2024
@mrnold
Copy link
Contributor Author

mrnold commented Feb 17, 2024

/retest

1 similar comment
@mrnold
Copy link
Contributor Author

mrnold commented Feb 18, 2024

/retest

@mrnold mrnold changed the title Do not merge: CI: Back up and restore a VM. CI: Back up and restore a VM. Feb 18, 2024
@mrnold
Copy link
Contributor Author

mrnold commented Feb 18, 2024

Okay, the VM backup and restore test has worked successfully at least once. This is at least worth having @mateusoliveira43 and @mpryc take a look at it.

@mpryc
Copy link
Contributor

mpryc commented Feb 18, 2024

/test 4.13-e2e-test-aws

@mrnold
Copy link
Contributor Author

mrnold commented Feb 18, 2024

Looks like failures in "Mongo application DATAMOVER" instead of "MySQL application two Vol CSI" this time, maybe one more retest for entertainment.

@mrnold
Copy link
Contributor Author

mrnold commented Feb 18, 2024

/retest

@weshayutin weshayutin removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 19, 2024
@mateusoliveira43
Copy link
Contributor

@weshayutin
Copy link
Contributor

@mateusoliveira43 it only partially resolves OADP-2982. I'll review how to associate a pr w/ a jira again w/ @mrnold . It's a just a jira task and for CI.

@@ -24,6 +25,9 @@ func CreateBackupForNamespaces(ocClient client.Client, veleroNamespace, backupNa
SnapshotMoveData: &snapshotMoveData,
},
}
if snapshotVolumes {
backup.Spec.SnapshotVolumes = pointer.Bool(true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maintainers, I never used this flag. This is needed if the PV is not in the same namespace as one of in IncludedNamespaces?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PVs are cluster-scoped and are not in any namespace. PVCs that are not in an included namespace will be excluded from backup.
This flag tells Velero to snapshot PVCs that are 1) included in the backup, and 2) not backed up using fs-backup. If this is false, then non-fs-backup PVCs will not be snapshotted.

@@ -22,6 +23,9 @@ func CreateRestoreFromBackup(ocClient client.Client, veleroNamespace, backupName
BackupName: backupName,
},
}
if restorePVs {
restore.Spec.RestorePVs = pointer.Bool(true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maintainers, what this option does with the restore?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm remembering correctly, this the restore equivalent to snapshotVolumes -- @shubham-pampattiwar is this correct?
Assuming I'm not misremembering, this means, similar to above, if it's true then we restore the PVC content from snapshot if it is a volume that was backed up via snapshot.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RestorePVs specifies whether to restore all included PVs from snapshot

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 @sseago

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The snapshot can be cloud provider based or csi snapshot. Both are applicable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shubham-pampattiwar It's confusing that the restore and backup flags aren't named similarly, but they are essentially equilalent flags, but one for backup, the other for restore.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The snapshot can be cloud provider based or csi snapshot. Both are applicable.

Same with SnapshotVolumes. There was a fix a while back to make this flag do the same thing for CSI and native snapshots.

Copy link
Member

@shubham-pampattiwar shubham-pampattiwar Feb 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes since velero 1.11 I think, initially these 2 flags were only applicable for cloud provider based snapshots.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest breaking this PR in 2

  • add backup/restore virt
  • remove duplications from backup/restore and must-gather tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I have started this process with #1345.

@mrnold
Copy link
Contributor Author

mrnold commented Feb 26, 2024

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 26, 2024
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 1, 2024
Add new VM tests, and work through some extra VM details like removing
the DataVolume (but not the PVC) before backup. This still fails because
the restore is PartiallyFailed, need to add test-specific validation.

Signed-off-by: Matthew Arnold <marnold@redhat.com>
Signed-off-by: Matthew Arnold <marnold@redhat.com>
Signed-off-by: Matthew Arnold <marnold@redhat.com>
Add some retries in places that might hit "the object was modified"
errors, and other small tweaks.

Signed-off-by: Matthew Arnold <marnold@redhat.com>
Trying to addres "DataVolume not found" errors from backup.

Signed-off-by: Matthew Arnold <marnold@redhat.com>
Signed-off-by: Matthew Arnold <marnold@redhat.com>
Signed-off-by: Matthew Arnold <marnold@redhat.com>
Signed-off-by: Matthew Arnold <marnold@redhat.com>
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 4, 2024
@mrnold
Copy link
Contributor Author

mrnold commented Mar 4, 2024

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 4, 2024
@shubham-pampattiwar
Copy link
Member

/retest

1 similar comment
@shubham-pampattiwar
Copy link
Member

/retest

@mrnold
Copy link
Contributor Author

mrnold commented Mar 5, 2024

/test 4.12-e2e-test-aws

Copy link
Contributor

@weshayutin weshayutin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/LGTM

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 5, 2024
Copy link

openshift-ci bot commented Mar 5, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mrnold, weshayutin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kaovilai
Copy link
Member

kaovilai commented Mar 5, 2024

/override ci/prow/4.12-e2e-test-aws

ya.. the change is for a completely diff test

:)
ps. we've seen this flake b4

Copy link

openshift-ci bot commented Mar 5, 2024

@kaovilai: Overrode contexts on behalf of kaovilai: ci/prow/4.12-e2e-test-aws

In response to this:

/override ci/prow/4.12-e2e-test-aws

ya.. the change is for a completely diff test

:)
ps. we've seen this flake b4

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link

openshift-ci bot commented Mar 5, 2024

@mrnold: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit bf64895 into openshift:master Mar 5, 2024
16 checks passed
mrnold added a commit to mrnold/oadp-operator that referenced this pull request Jun 3, 2024
* Back up and restore a VM.

Add new VM tests, and work through some extra VM details like removing
the DataVolume (but not the PVC) before backup. This still fails because
the restore is PartiallyFailed, need to add test-specific validation.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Make log checking optional.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Include PV in VM backup and restore.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Attempt to improve VM test reliability.

Add some retries in places that might hit "the object was modified"
errors, and other small tweaks.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Wait for PVC owner reference to go away.

Trying to addres "DataVolume not found" errors from backup.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Try a readiness delay on VM backup.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Try booting VM from PVC instead of DataVolume.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Make verification hooks really optional.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

---------

Signed-off-by: Matthew Arnold <marnold@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. CI lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants