Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

USHIFT-1312 Test: Rebooting healthy system should result in backing up the data #1866

Merged
merged 2 commits into from
Jun 19, 2023

Conversation

pmtk
Copy link
Member

@pmtk pmtk commented Jun 1, 2023

Tests functionality of #1874

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 1, 2023
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 1, 2023
@pmtk
Copy link
Member Author

pmtk commented Jun 1, 2023

This test will probably require setting up current microshift-e2e to ignore ostree tags or provide mechanism to not run tests on non-ostree-based systems

@pmtk
Copy link
Member Author

pmtk commented Jun 1, 2023

/cc @dhellmann @jogeo

@openshift-ci openshift-ci bot requested review from dhellmann and jogeo June 1, 2023 06:53
test/suites/upgrade_ostree.robot Outdated Show resolved Hide resolved
test/suites/upgrade_ostree.robot Outdated Show resolved Hide resolved
test/suites/upgrade_ostree.robot Outdated Show resolved Hide resolved
@pmtk pmtk changed the title WIP USHIFT-1083 Test backup after reboot from healthy system USHIFT-1083 Test backup after reboot from healthy system Jun 2, 2023
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 2, 2023
@pmtk pmtk changed the title USHIFT-1083 Test backup after reboot from healthy system WIP USHIFT-1083 Test backup after reboot from healthy system Jun 2, 2023
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 2, 2023
@pmtk pmtk changed the title WIP USHIFT-1083 Test backup after reboot from healthy system USHIFT-1083 Test backup after reboot from healthy system Jun 2, 2023
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 2, 2023
@pmtk pmtk changed the title USHIFT-1083 Test backup after reboot from healthy system WIP USHIFT-1083 Test backup after reboot from healthy system Jun 2, 2023
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 2, 2023
@pmtk pmtk changed the title WIP USHIFT-1083 Test backup after reboot from healthy system USHIFT-1083 Test backup after reboot from healthy system Jun 5, 2023
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 5, 2023
@pmtk
Copy link
Member Author

pmtk commented Jun 5, 2023

/cc @dhellmann @pacevedom @jogeo @dhensel-rh

This PR needs some sort mechanism to not run tests intended for R4E/ostree-based systems on RPM.
Approaches that come to mind:

  • doing a runtime check and deciding if test should run or be skipped
  • using tags to filter out irrelevant test suites
    any other ideas?

@openshift-ci openshift-ci bot requested a review from dhensel-rh June 5, 2023 14:43
Copy link
Contributor

@dhellmann dhellmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks OK, but I see the e2e test is failing with an error in this new test suite

Keyword 'Greenboot Health Check Exited' failed after retrying for 5 minutes. The last error was: dead != exited

@@ -25,3 +25,15 @@ Login MicroShift Host
Logout MicroShift Host
[Documentation] Close the open ssh connection to the MicroShift host
SSHLibrary.Close Connection

Reboot MicroShift Host
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be more robust if we looked for a different boot id after the reboot, to avoid any race condition with issuing the reboot command and sshd stopping. It doesn't seem like the most likely race, so let's make that change in another PR.

${deployID}= Get Booted Deployment ID

${rm_output} ${rm_stderr} ${rc}= Execute Command
... find ${BACKUP_STORAGE} -name '${deployID}*' -exec rm -rf {} +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We know the exact path, right? Can we just remove it, instead of using find?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With current flow (make new backup, delete old backup) there's always a chance that more than one backup will be present, this just makes sure that we're starting clean.
And we don't necessarily know the path, with randomized order of tests, in future, we could have a backup from two boots away, or more.
Also, health.json might already change (post-healthchecks) and point to current boot instead of previous

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, we probably want to do something to limit the depth of the find call, then. Is it enough to say rm -rf ${deployID}* instead of using find at all?

And we probably want to check for deployID being empty, to ensure we don't delete everything.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

find was only looking into /var/lib/microshift-backups/ but I changed it to rm -rf because it's easier to read

@pmtk pmtk changed the title USHIFT-1083 Test backup after reboot from healthy system USHIFT-1312 Test backup after reboot from healthy system Jun 6, 2023
@pmtk pmtk changed the title USHIFT-1312 Test backup after reboot from healthy system USHIFT-1312 Test: Rebooting healthy system should result in backing up the data Jun 6, 2023
@@ -0,0 +1,82 @@
*** Settings ***
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is in separate dir so e2e CI job doesn't try to run it against non-ostree system.
When we have e2e for ostree systems we can think if we want to move it back to suites/ and use tags (or other method to filter tests)

@pmtk
Copy link
Member Author

pmtk commented Jun 6, 2023

/label tide/merge-method-squash

@openshift-ci openshift-ci bot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jun 6, 2023
@pmtk
Copy link
Member Author

pmtk commented Jun 7, 2023

/remove-label tide/merge-method-squash

Remembered that tide's squash results in ugly commit messages. I'll squash the commits myself

@openshift-ci openshift-ci bot removed the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jun 7, 2023
Copy link
Contributor

@dhellmann dhellmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small comment, but this generally looks good.

test/resources/microshift-process.resource Show resolved Hide resolved
Copy link
Contributor

@dhellmann dhellmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I'll leave it for @jogeo to approve since he wanted a chance to review, too.

Copy link
Contributor

@jogeo jogeo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding this test, it looks good over all. I've left a few questions and comments.

[Documentation] Get ID of currently booted deployment

${status}= Get Booted Deployment Status
RETURN ${status["deployments"][0]["id"]}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect there to always be only one deployment, so the index of 0 will always be the deployment we should get the "id" from?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This uses kw Get Booted Deployment Status which runs rpm-ostree status --booted --json
Thanks to --booted it will only output one deployment, but it's still using the same data structure, i.e. deployments in an array. With --booted it will only have 1 element.

Without that switch, 0 could be either booted or staged deployment

[Documentation] Get system health information from health.json file

${health} ${stderr} ${rc}= Execute Command
... jq -r '.health' ${BACKUP_STORAGE}/health.json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'Evaluate json.loads' was used above in keyword "Get Booted Deployment Status". Should we be consistent in the pattern we use to ingest and parse json?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a side note, 'jq' is used by scripts such as microshift_set_healthy.sh, so it's not a bad thing that we are also using it in our test scripts. My initial comment was just a reaction to seeing json parsed in the same PR using two different approaches.

Copy link
Member Author

@pmtk pmtk Jun 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. jq here felt simpler, otherwise I'd have either cat and use json.loads or... I don't know what, I don't think SSHLibrary.Get File supports privilege escalation and health.json is only readable by root.
Also Get File scp's file, I'd rather use some kind of slurp which I haven't seen in that library.

Maybe we could create some helper keywords to slurping and modifying files as root and have flow consistent

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of keywords for grabbing files only readable by root. Save Default MicroShift Config uses cat to grab the contents of the config file, and it could use the same reusable keyword for that step. Then something like Upload String To File could be modified to take arguments for the owner and mode of the output file.

[Documentation] Remove backups for current deployment

${path}= Get Booted Deployment Backup Prefix Path

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we test path here with 'Should Not Be Empty'? I see that 'Get Booted Deployment Backup Prefix Path' has a check for empty values. However, if the code is changed to use a different key word at some point to get path we would not want to accidentally call 'rm -rf *'

... rpm-ostree status --booted --json
... return_rc=True
Should Be Equal As Integers 0 ${rc}
${json}= Evaluate json.loads('''${ostree_status}''') json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For YAML, I found it handy to parse the text and then return a RF DottedDict type to make accessing the data from RF keywords easier. See the YAML library in the resources directory.

@ggiguash
Copy link
Contributor

Let's merge this in the current form. The tests are not going to run because they are in a separate folder.
We now have a metal environment to run this tests, so let's try those out.
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 19, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 19, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ggiguash, pmtk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pmtk
Copy link
Member Author

pmtk commented Jun 19, 2023

/retest

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 19, 2023

@pmtk: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit df6387e into openshift:main Jun 19, 2023
6 checks passed
@pmtk pmtk deleted the test-backup-on-boot branch June 19, 2023 15:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants