outagefs

outagefs emulates power outages to test application and filesystem behavior.

It works by recording filesystem changes at the block device level, then replaying them with unsynchronized writes randomly dropped. Recording is done by using FUSE to expose a monitored file that represents the block device.
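
To make the recording idea concrete, here is a minimal toy model in Python. It is only a sketch: the names `Write`, `Sync`, and `RecordingDevice` are made up for illustration and do not come from outagefs, which records through a FUSE-exposed file rather than an in-memory buffer.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Write:
    """A recorded write: raw bytes placed at a byte offset."""
    offset: int
    data: bytes


@dataclass
class Sync:
    """A recorded flush request (hypothetical stand-in for a sync)."""


class RecordingDevice:
    """Toy block device that logs every write and sync it receives."""

    def __init__(self, size: int) -> None:
        self.image = bytearray(size)
        self.log: List[Union[Write, Sync]] = []

    def write(self, offset: int, data: bytes) -> None:
        # Apply the write to the in-memory image and remember it.
        self.image[offset:offset + len(data)] = data
        self.log.append(Write(offset, bytes(data)))

    def sync(self) -> None:
        # Nothing to flush in memory; only the ordering matters here.
        self.log.append(Sync())


dev = RecordingDevice(16)
dev.write(0, b"old!")
dev.sync()
dev.write(4, b"new!")
for op in dev.log:
    print(op)
```

The log preserves the order of writes relative to syncs, which is the information a replay step needs in order to decide which writes may be lost.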

Currently, outagefs is mainly developed and tested on Linux.

Installation

outagefs can be installed via cargo:

cargo install outagefs

Example: "atomic rename" on ext4

It's common to create a file and rename it over an existing file, expecting the result to have either the new content or the old content. How does that work in practice on ext4? Let's find out.

Setup

First, prepare a base image of ext4 with a file b in it having some content:

# or, try `-s 1m` and see if it makes a difference
truncate -s 3m base
# or, try 'ext2'
mkfs.ext4 base
mkdir ext4root
sudo mount -o loop -t ext4 base ext4root
sudo sh -c 'seq 4000 > ext4root/b'
sudo umount ext4root

Record

Then, use outagefs to record the write + rename operation:

# try adding 'sync' before 'mv' if 'ext2' is used
outagefs mount --record --sudo --exec 'mount -o loop -t ext4 $1 ext4root; seq 2 6000 > ext4root/a; mv ext4root/{a,b}; umount ext4root'

(If the command fails with "fusermount: option allow_other only allowed ...", edit /etc/fuse.conf and uncomment user_allow_other, or run the outagefs command as root.)

The above command uses base as the base image, mounts it as a single file with recording turned on, and passes that single file as $1 to the shell script. The shell script mounts the file as ext4 and makes changes to the ext4 filesystem. Writes to the mounted ext4 filesystem get translated to low-level write and sync operations on the $1 file. The --record flag tells outagefs to save those recorded operations to disk as changes.

Let's check that outagefs does record some changes:

outagefs show

Verify

The property we want to verify is "b should have either new or old content". Let's express that in a script and name it verify.py:

import pathlib
path = pathlib.Path("./ext4root/b")

def seq(start, end):
    return b"".join([b"%d\n" % i for i in range(start, end + 1)])

try:
    if not path.exists():
        print("BAD: does not exist")
    else:
        data = path.read_bytes()
        if data == b"":
            print("BAD: empty file")
        elif data == seq(1, 4000):
            print("GOOD: old content")
        elif data == seq(2, 6000):
            print("GOOD: new content")
        else:
            print("BAD: unexpected content")
except Exception as ex:
    print(f"ERROR: {ex}")

Verify the end state is good:

outagefs mount --sudo --exec 'mount -o loop -t ext4 $1 ext4root && python3 verify.py; umount ext4root'
# should print 'GOOD: new content'

It's also good if all writes are discarded:

outagefs mount --filter 0 --sudo --exec 'mount -o loop -t ext4 $1 ext4root && python3 verify.py; umount ext4root'
# should print 'GOOD: old content'

Generating Tests

More interesting tests are those where some writes are discarded while others are not. In theory, it's possible to look at the outagefs show output, decide what to discard, encode that as a bit string "filter" (1: take, 0 or not mentioned: discard), and test it like:

outagefs mount --filter 1000000001000000011 --sudo --exec 'mount -o loop -t ext4 $1 ext4root && python3 verify.py; umount ext4root'
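
The filter semantics can be sketched in a few lines of Python. This is a toy model under stated assumptions, not outagefs code: it assumes one bit per recorded operation in recording order, and the `replay` function and the tuple layout are hypothetical. The real mapping between bits and the entries printed by outagefs show may differ.

```python
from typing import List, Optional, Tuple

# Each recorded operation is ("write", offset, data) or ("sync", 0, b"").
Op = Tuple[str, int, bytes]


def replay(base: bytes, ops: List[Op], filter_bits: Optional[str] = None) -> bytes:
    """Return the image after replaying ops onto base.

    With no filter, every operation is applied. With a filter, operation i
    is applied only if bit i is "1"; "0" or a missing bit discards it.
    """
    image = bytearray(base)
    for i, (kind, offset, data) in enumerate(ops):
        if filter_bits is None:
            keep = True
        else:
            keep = i < len(filter_bits) and filter_bits[i] == "1"
        if keep and kind == "write":
            image[offset:offset + len(data)] = data
    return bytes(image)


ops = [("write", 0, b"AA"), ("sync", 0, b""), ("write", 2, b"BB")]
base = b"...."
print(replay(base, ops))         # all operations applied
print(replay(base, ops, "0"))    # everything discarded
print(replay(base, ops, "001"))  # only the last write survives
```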

It is time consuming to figure out interesting test cases manually. outagefs provides a subcommand to generate test cases:

outagefs gen-tests

This will print strings in the offset:bits form, suitable for --filter. gen-tests respects Sync operations: if a Sync is not discarded, none of the Writes before it are discarded. It also tries to keep the number of test cases bounded so that the tests can complete.
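
The Sync rule can be illustrated with a small Python sketch. It is not the algorithm gen-tests uses: `is_valid` and `gen_filters` are hypothetical names, the sketch assumes one bit per operation, it enumerates candidates by brute force (only practical for very short logs), and it prints bare bit strings without the offset prefix.

```python
from itertools import product
from typing import Iterator, List


def is_valid(kinds: List[str], bits: str) -> bool:
    """A kept sync requires every earlier write to be kept as well."""
    all_writes_kept = True
    for kind, bit in zip(kinds, bits):
        if kind == "write":
            if bit == "0":
                all_writes_kept = False
        elif bit == "1" and not all_writes_kept:
            # A sync survived although an earlier write was dropped.
            return False
    return True


def gen_filters(kinds: List[str], limit: int = 100) -> Iterator[str]:
    """Yield at most `limit` bit strings that satisfy the sync rule."""
    count = 0
    for combo in product("01", repeat=len(kinds)):
        bits = "".join(combo)
        if is_valid(kinds, bits):
            yield bits
            count += 1
            if count >= limit:
                return


kinds = ["write", "sync", "write"]
print(list(gen_filters(kinds)))
```

For this three-operation log, the two candidates that keep the sync while dropping the first write are rejected, so six of the eight possible bit strings remain.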

Now, let's just use the generated tests and run the verify script on them:

for f in $(outagefs gen-tests); do
    outagefs mount --filter $f --sudo --exec 'mount -o loop -t ext4 $1 ext4root && python3 verify.py; umount ext4root'
done

Tips

More Challenging Tests

The tests above might not be challenging enough. For example, they assume that individual writes are atomic and that Sync works as expected. Hardware might have different properties: it might have a hardware-specific 2KB block size, might not always respect Sync, or might corrupt data during writes. To make it easier to exercise such behaviors, outagefs has a mutate subcommand to rewrite changes:

outagefs mutate --split-write --zero-fill --drop-sync

The changes file will be updated with the rewritten result. Note that the internal filesystem state can break more easily. Some tests will likely error out at the mount command. It's also easier to trigger errors like EUCLEAN, or hangs.
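
As a rough illustration of what such rewrites could look like, here is a Python sketch over a toy operation list. The semantics are assumptions, not a description of what mutate actually does: `split_write` is assumed to cut writes into 2KB pieces, `zero_fill` is assumed to put a zeroing write in front of each real write, and `drop_sync` is assumed to remove syncs. All three function names and the tuple layout are hypothetical.

```python
from typing import List, Tuple

# ("write", offset, data) or ("sync", 0, b""); a made-up layout for this sketch.
Op = Tuple[str, int, bytes]


def split_write(ops: List[Op], block: int = 2048) -> List[Op]:
    """Assumed model: cut each write into pieces of at most `block` bytes."""
    out: List[Op] = []
    for kind, offset, data in ops:
        if kind != "write":
            out.append((kind, offset, data))
            continue
        for start in range(0, len(data), block):
            out.append(("write", offset + start, data[start:start + block]))
    return out


def zero_fill(ops: List[Op]) -> List[Op]:
    """Assumed model: precede each write with zeros over the same range."""
    out: List[Op] = []
    for kind, offset, data in ops:
        if kind == "write":
            out.append(("write", offset, bytes(len(data))))
        out.append((kind, offset, data))
    return out


def drop_sync(ops: List[Op]) -> List[Op]:
    """Assumed model: remove every sync so no write is guaranteed durable."""
    return [op for op in ops if op[0] != "sync"]


ops = [("write", 0, b"x" * 5000), ("sync", 0, b""), ("write", 8192, b"y" * 100)]
mutated = drop_sync(zero_fill(split_write(ops)))
for kind, offset, data in mutated:
    print(kind, offset, len(data))
```

Each rewrite multiplies the number of droppable writes, which is why mutated recordings tend to produce many more test cases and more broken intermediate states.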

Convenient Way to Run Tests

It is verbose and error-prone to set up, record, and run tests manually. The run-suite subcommand can be used to make it easier:

outagefs run-suite --sudo suite-examples/rename-no-fsync-ext2.py

The above command creates a temporary directory, calls the script with prepare to create the base image, then with changes to make the changes to record, and finally with verify to verify the test cases. After testing, the temporary directory is deleted.
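
A suite script along those lines might be organized as a small dispatcher. The skeleton below is a sketch only: it assumes the phase name arrives as the first command-line argument, and the function bodies are empty placeholders. The scripts under suite-examples are the authoritative reference for the real interface.

```python
import sys
from typing import Callable, Dict, List


def prepare() -> int:
    """Placeholder: create the base image (e.g. truncate, mkfs, seed files)."""
    return 0


def changes() -> int:
    """Placeholder: make the filesystem changes that should be recorded."""
    return 0


def verify() -> int:
    """Placeholder: check the replayed state and return an exit code."""
    return 0


PHASES: Dict[str, Callable[[], int]] = {
    "prepare": prepare,
    "changes": changes,
    "verify": verify,
}


def main(argv: List[str]) -> int:
    # Assumption: argv[1] names the phase to run.
    if len(argv) < 2 or argv[1] not in PHASES:
        print("usage: suite.py {prepare|changes|verify}", file=sys.stderr)
        return 2
    return PHASES[argv[1]]()


# Demonstration call; a real script would end with sys.exit(main(sys.argv)).
print(main(["suite.py", "prepare"]))
```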

Bisecting Tests

For non-trivial changes, there are a lot of test cases. Most of them are not very interesting. Only those in the transition from a valid old state to a valid new state are:

Test Cases: |-------------|-------------|-----------|
State:      | Old State   | Interesting | New State |

It is more efficient to bisect toward the "Interesting" cases to find obviously broken ones. The verification script can return an exit code in the 10 to 19 range to indicate the state. Something like:

import sys

# is_good_old_state() and is_good_new_state() stand in for your own checks.
if is_good_old_state():
    sys.exit(11)
elif is_good_new_state():
    sys.exit(12)

The run-suite command can use the information to bisect the test cases. If there is nothing to bisect, run-suite will run the remaining tests in order.
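
One way a runner could use those exit codes is a plain binary search over the ordered test cases. The sketch below is not run-suite's actual algorithm. It assumes the cases are ordered as in the diagram above (old states first, new states last) and that 11 and 12 mean "good old state" and "good new state" as in the snippet. `bisect_interesting` and `run` are hypothetical names.

```python
from typing import Callable, Optional

OLD, NEW = 11, 12  # exit codes taken from the snippet above


def bisect_interesting(n: int, run: Callable[[int], int]) -> Optional[int]:
    """Return the index of a test that is neither OLD nor NEW, or None.

    `run(i)` executes test case i and returns the verify script's exit code.
    Virtual sentinels treat "before the first test" as OLD and "after the
    last test" as NEW, so the search works for any n.
    """
    lo, hi = -1, n
    while hi - lo > 1:
        mid = (lo + hi) // 2
        code = run(mid)
        if code == OLD:
            lo = mid   # the transition lies to the right
        elif code == NEW:
            hi = mid   # the transition lies to the left
        else:
            return mid  # landed inside the interesting region
    return None


codes = [11] * 7 + [1] + [12] * 2
print(bisect_interesting(len(codes), lambda i: codes[i]))
```

When the old and new regions touch with nothing in between, the search returns None, which corresponds to the "nothing to bisect" case where the remaining tests are simply run in order.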
