Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests/inst: Add destructive test framework #2127

Merged
merged 1 commit into from Aug 17, 2020

Conversation

cgwalters
Copy link
Member

This adds infrastructure to the Rust test suite for destructive
tests, and adds a new transactionality test which runs
rpm-ostree in a loop (along with ostree-finalize-staged) and
repeatedly kills them.

The main goal here is to flush out any "logic errors". I plan
to further extend this to reboots and then force poweroffs.

cgwalters added a commit to cgwalters/coreos-assembler that referenced this pull request Jun 11, 2020
As I was working on extending some of ostree's destructive
test suite to do reboots:
ostreedev/ostree#2127

I realized that the Debian autopkgtest API for rebooting is
better, because it allows *saving state external to the host*.

Rather than having the test count boots as ostree is doing
today, the "mark" allows us to more reliably dispatch.
And further, becase we don't rely on writing anything to disk
on the target, we can add clean support for "forced reboots"
that might kill the OS before we write to persistent storage there.

The "between reboot" state lives in the test runner's memory instead.

We retain support for the previous (two!) reboot APIs here for now.

I tested this with basically the example script
from the Debian autopkgtest specification:

```
set -xeuo pipefail
case "${AUTOPKGTEST_REBOOT_MARK:-}" in
  "") echo "test beginning"; /tmp/autopkgtest-reboot mark1 ;;
  mark1) echo "test in mark1"; /tmp/autopkgtest-reboot mark2 ;;
  mark2) echo "test in mark2" ;;
  *) echo "unexpected mark: ${AUTOPKGTEST_REBOOT_MARK}"; exit 1;;
esac
echo "ok autopkgtest rebooting"
```

I think it will make sense actually to implement more of the autopkgtest
API - Debian has a nontrivial number of tests using this, and I
think there's even work upstream in e.g. systemd to bridge its
tests to autopkgtest.  Which would mean we gain "run systemd's tests in kola"
for free.
cgwalters added a commit to cgwalters/coreos-assembler that referenced this pull request Jun 11, 2020
As I was working on extending some of ostree's destructive
test suite to do reboots:
ostreedev/ostree#2127

I realized that the Debian autopkgtest API for rebooting is
better, because it allows *saving state external to the host*.

Rather than having the test count boots as ostree is doing
today, the "mark" allows us to more reliably dispatch.
And further, becase we don't rely on writing anything to disk
on the target, we can add clean support for "forced reboots"
that might kill the OS before we write to persistent storage there.

The "between reboot" state lives in the test runner's memory instead.

We retain support for the previous (two!) reboot APIs here for now.

I tested this with basically the example script
from the Debian autopkgtest specification:

```
set -xeuo pipefail
case "${AUTOPKGTEST_REBOOT_MARK:-}" in
  "") echo "test beginning"; /tmp/autopkgtest-reboot mark1 ;;
  mark1) echo "test in mark1"; /tmp/autopkgtest-reboot mark2 ;;
  mark2) echo "test in mark2" ;;
  *) echo "unexpected mark: ${AUTOPKGTEST_REBOOT_MARK}"; exit 1;;
esac
echo "ok autopkgtest rebooting"
```

I think it will make sense actually to implement more of the autopkgtest
API - Debian has a nontrivial number of tests using this, and I
think there's even work upstream in e.g. systemd to bridge its
tests to autopkgtest.  Which would mean we gain "run systemd's tests in kola"
for free.
cgwalters added a commit to cgwalters/coreos-assembler that referenced this pull request Jun 12, 2020
As I was working on extending some of ostree's destructive
test suite to do reboots:
ostreedev/ostree#2127

I realized that the Debian autopkgtest API for rebooting is
better, because it allows *saving state external to the host*.

Rather than having the test count boots as ostree is doing
today, the "mark" allows us to more reliably dispatch.
And further, becase we don't rely on writing anything to disk
on the target, we can add clean support for "forced reboots"
that might kill the OS before we write to persistent storage there.

The "between reboot" state lives in the test runner's memory instead.

We retain support for the previous (two!) reboot APIs here for now.

I tested this with basically the example script
from the Debian autopkgtest specification:

```
set -xeuo pipefail
case "${AUTOPKGTEST_REBOOT_MARK:-}" in
  "") echo "test beginning"; /tmp/autopkgtest-reboot mark1 ;;
  mark1) echo "test in mark1"; /tmp/autopkgtest-reboot mark2 ;;
  mark2) echo "test in mark2" ;;
  *) echo "unexpected mark: ${AUTOPKGTEST_REBOOT_MARK}"; exit 1;;
esac
echo "ok autopkgtest rebooting"
```

I think it will make sense actually to implement more of the autopkgtest
API - Debian has a nontrivial number of tests using this, and I
think there's even work upstream in e.g. systemd to bridge its
tests to autopkgtest.  Which would mean we gain "run systemd's tests in kola"
for free.
cgwalters added a commit to cgwalters/coreos-assembler that referenced this pull request Jun 17, 2020
As I was working on extending some of ostree's destructive
test suite to do reboots:
ostreedev/ostree#2127

I realized that the Debian autopkgtest API for rebooting is
better, because it allows *saving state external to the host*.

Rather than having the test count boots as ostree is doing
today, the "mark" allows us to more reliably dispatch.
And further, becase we don't rely on writing anything to disk
on the target, we can add clean support for "forced reboots"
that might kill the OS before we write to persistent storage there.

The "between reboot" state lives in the test runner's memory instead.

We retain support for the previous (two!) reboot APIs here for now.

I tested this with basically the example script
from the Debian autopkgtest specification:

```
set -xeuo pipefail
case "${AUTOPKGTEST_REBOOT_MARK:-}" in
  "") echo "test beginning"; /tmp/autopkgtest-reboot mark1 ;;
  mark1) echo "test in mark1"; /tmp/autopkgtest-reboot mark2 ;;
  mark2) echo "test in mark2" ;;
  *) echo "unexpected mark: ${AUTOPKGTEST_REBOOT_MARK}"; exit 1;;
esac
echo "ok autopkgtest rebooting"
```

I think it will make sense actually to implement more of the autopkgtest
API - Debian has a nontrivial number of tests using this, and I
think there's even work upstream in e.g. systemd to bridge its
tests to autopkgtest.  Which would mean we gain "run systemd's tests in kola"
for free.
openshift-merge-robot pushed a commit to coreos/coreos-assembler that referenced this pull request Jun 17, 2020
As I was working on extending some of ostree's destructive
test suite to do reboots:
ostreedev/ostree#2127

I realized that the Debian autopkgtest API for rebooting is
better, because it allows *saving state external to the host*.

Rather than having the test count boots as ostree is doing
today, the "mark" allows us to more reliably dispatch.
And further, becase we don't rely on writing anything to disk
on the target, we can add clean support for "forced reboots"
that might kill the OS before we write to persistent storage there.

The "between reboot" state lives in the test runner's memory instead.

We retain support for the previous (two!) reboot APIs here for now.

I tested this with basically the example script
from the Debian autopkgtest specification:

```
set -xeuo pipefail
case "${AUTOPKGTEST_REBOOT_MARK:-}" in
  "") echo "test beginning"; /tmp/autopkgtest-reboot mark1 ;;
  mark1) echo "test in mark1"; /tmp/autopkgtest-reboot mark2 ;;
  mark2) echo "test in mark2" ;;
  *) echo "unexpected mark: ${AUTOPKGTEST_REBOOT_MARK}"; exit 1;;
esac
echo "ok autopkgtest rebooting"
```

I think it will make sense actually to implement more of the autopkgtest
API - Debian has a nontrivial number of tests using this, and I
think there's even work upstream in e.g. systemd to bridge its
tests to autopkgtest.  Which would mean we gain "run systemd's tests in kola"
for free.
@cgwalters cgwalters force-pushed the destructive-rs branch 3 times, most recently from ceb0ff0 to d818375 Compare July 2, 2020 13:57
@cgwalters
Copy link
Member Author

cgwalters commented Jul 27, 2020

OK a lot more work here; we're testing that we reliably survive forced poweroffs. Still TODO:

  • Re-merge the "kill -9" code and alternate between multiple "interrupt strategies" of "none, kill -9, reboot -ff" etc.
  • Test that we successfully fail if we e.g. omit fsync()

@cgwalters cgwalters force-pushed the destructive-rs branch 3 times, most recently from d66194f to dcf3914 Compare July 28, 2020 12:40
@cgwalters cgwalters force-pushed the destructive-rs branch 7 times, most recently from 190aca6 to 3addc62 Compare August 6, 2020 00:17
@cgwalters cgwalters changed the title WIP: tests/inst: Add destructive test framework tests/inst: Add destructive test framework Aug 6, 2020
@cgwalters
Copy link
Member Author

OK lifting WIP on this - fault injection would be another level, and I need to solve some other problems before doing that like being able to pull containers/binaries/packages from the "host" cosa container at least in qemu so we don't rely on internet access (as I really want to be able to run this test in a loop and not have it randomly flake for internet reasons).

jlebon
jlebon approved these changes Aug 7, 2020
Copy link
Member

@jlebon jlebon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments, but LGTM overall. Cool stuff! It took a while honestly to grok how everything fits together. I think the cognitive load of thinking across reboots made it harder.

tests/inst/itest-macro/src/itest-macro.rs Outdated Show resolved Hide resolved
tests/inst/itest-macro/src/itest-macro.rs Outdated Show resolved Hide resolved
tests/inst/src/destructive.rs Show resolved Hide resolved
tests/inst/src/destructive.rs Outdated Show resolved Hide resolved
tests/inst/src/destructive.rs Outdated Show resolved Hide resolved
tests/inst/src/destructive.rs Show resolved Hide resolved
tests/inst/src/destructive.rs Outdated Show resolved Hide resolved
tests/inst/src/destructive.rs Show resolved Hide resolved
tests/inst/src/destructive.rs Outdated Show resolved Hide resolved
let res = res.context("Failed during upgrade")?;
if res {
println!(
"Failed to interrupt upgrade, attempt {}/{}",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could lower the timeout by e.g. 10% in this case to increase the odds for the next time (maybe after we get to 5 retries or something).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about doing things like this - an issue I've seen is that when I start up 4 instances of this test in parallel, there's heavy CPU usage in the VMs where they're all doing the initial setup, and that inflates the timing for the test upgrade.

The better fix I think would be something like an env OSTREE_PAUSE_POINT=pre-deploy,post-deploy,pre-cleanup where the harness can control when the process continues. (Or maybe implement this with scripting gdb or so)

@cgwalters cgwalters force-pushed the destructive-rs branch 2 times, most recently from 43a46b8 to 7ccd3e6 Compare August 16, 2020 22:09
@cgwalters
Copy link
Member Author

OK ended up doing more fixes and tweaks here; I noticed that the results weren't including the "no-interrupt" case, and fixing/handling that required another tricky special case.

I also noticed there were fewer "completed" results for live interrupts than I expected, and that turned out to be me forgetting it needs to be systemctl stop ostree-finalize-staged and not start.

tests/inst/src/destructive.rs Outdated Show resolved Hide resolved
This adds infrastructure to the Rust test suite for destructive
tests, and adds a new `transactionality` test which runs
rpm-ostree in a loop (along with `ostree-finalize-staged`) and
repeatedly uses either `kill -9`, `reboot` and  `reboot -ff`.

The main goal here is to flush out any "logic errors".

So far I've validated that this passes a lot of cycles
using
```
$ kola run --qemu-image=fastbuild-fedora-coreos-ostree-qemu.qcow2 ext.ostree.destructive-rs.transactionality --debug --multiply 8 --parallel 4
```
a number of times.
@jlebon
Copy link
Member

jlebon commented Aug 17, 2020

/lgtm

@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, jlebon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 543610b into ostreedev:master Aug 17, 2020
3 of 4 checks passed
cgwalters added a commit to cgwalters/rpm-ostree that referenced this pull request Aug 17, 2020
We want to test upgrades that actually change files as a general
rule; in some cases we want to test "large" upgrades to validate
performance.

This code generates a "synthetic" upgrade that adds an ELF note
to a percentage of ELF files (randomly selected).  By doing
it this way we are only actually testing one version of the code.

Migrated from coreos/coreos-assembler#1635
using the Rust code from ostreedev/ostree#2127
cgwalters added a commit to cgwalters/rpm-ostree that referenced this pull request Aug 17, 2020
We want to test upgrades that actually change files as a general
rule; in some cases we want to test "large" upgrades to validate
performance.

This code generates a "synthetic" upgrade that adds an ELF note
to a percentage of ELF files (randomly selected).  By doing
it this way we are only actually testing one version of the code.

Migrated from coreos/coreos-assembler#1635
using the Rust code from ostreedev/ostree#2127
cgwalters added a commit to cgwalters/rpm-ostree that referenced this pull request Aug 18, 2020
We want to test upgrades that actually change files as a general
rule; in some cases we want to test "large" upgrades to validate
performance.

This code generates a "synthetic" upgrade that adds an ELF note
to a percentage of ELF files (randomly selected).  By doing
it this way we are only actually testing one version of the code.

Migrated from coreos/coreos-assembler#1635
using the Rust code from ostreedev/ostree#2127
openshift-merge-robot pushed a commit to coreos/rpm-ostree that referenced this pull request Aug 18, 2020
We want to test upgrades that actually change files as a general
rule; in some cases we want to test "large" upgrades to validate
performance.

This code generates a "synthetic" upgrade that adds an ELF note
to a percentage of ELF files (randomly selected).  By doing
it this way we are only actually testing one version of the code.

Migrated from coreos/coreos-assembler#1635
using the Rust code from ostreedev/ostree#2127
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants