ZTS: Use QEMU for tests on Linux and FreeBSD #15838

Closed
wants to merge 3 commits into from

Conversation

mcmilk
Contributor

@mcmilk mcmilk commented Jan 30, 2024

Motivation and Context

We need to run the tests on more operating systems than just Ubuntu.

Description

ZTS: Use QEMU for tests on Linux and FreeBSD...

This commit adds functional tests for these systems:

  • AlmaLinux 8, AlmaLinux 9, ArchLinux

  • CentOS Stream 9, Fedora 39, Fedora 40

  • Debian 11, Debian 12

  • FreeBSD 13, FreeBSD 14, FreeBSD 15

  • Ubuntu 20.04, Ubuntu 22.04, Ubuntu 24.04

  • enabled by default:

    • AlmaLinux 8, AlmaLinux 9, CentOS Stream 9
    • Fedora 39, Fedora 40
    • FreeBSD 13, FreeBSD 14, FreeBSD 15
    • Ubuntu LTS 20.04, 22.04, 24.04

Workflow for each operating system:

  • install qemu on the github runner
  • download current cloud image of operating system
  • start and init that image via cloud-init
  • install dependencies and poweroff system
  • start system and build openzfs and then poweroff again
  • clone build system and start 3 instances of it
  • the full functional test run should complete in about 4h
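
The workflow steps above can be sketched as a dry run. This is only an illustration: the image names, the `stage` helper, the `OS` and `VMS` variables, and the use of qcow2 backing files for the clone step are assumptions, not taken from the PR's scripts. Nothing here boots a VM; the script only prints what each stage would do.

```sh
#!/bin/sh
# Dry-run sketch of the per-OS workflow. All names below (OS, VMS, stage,
# build.qcow2, vmN.qcow2) are hypothetical placeholders.
OS="${OS:-almalinux9}"
VMS="${VMS:-3}"

# Print one numbered stage of the plan.
stage() {
	printf 'stage %s: %s\n' "$1" "$2"
}

plan() {
	stage 1 "install qemu on the runner"
	stage 2 "download the current cloud image for $OS"
	stage 3 "boot the image and initialize it via cloud-init"
	stage 4 "install dependencies, then power off"
	stage 5 "boot again, build openzfs, then power off"
	i=1
	while [ "$i" -le "$VMS" ]; do
		# Assumption: each test VM is a thin overlay on the build image.
		stage 6 "qemu-img create -f qcow2 -b build.qcow2 -F qcow2 vm$i.qcow2"
		i=$((i + 1))
	done
	stage 7 "start $VMS instances and run the functional tests"
}

plan
```

Running it with `VMS=2 sh plan.sh` would print two clone lines instead of three, which is the kind of variation discussed later in this thread.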

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@mcmilk
Contributor Author

mcmilk commented Jan 30, 2024

I think most of the failing FreeBSD tests will be fixed by starting nfsd and samba.

@tonyhutter
Contributor

@mcmilk I see you currently have this marked as "Draft". When you think it's ready to be reviewed, please let us know and we can take a look.

@behlendorf behlendorf added the Status: Work in Progress Not yet ready for general review label Feb 2, 2024
@mcmilk mcmilk marked this pull request as ready for review March 3, 2024 20:53
@mcmilk
Contributor Author

mcmilk commented Mar 3, 2024

Seems ready. I included the FreeBSD src.txz within the FreeBSD cloud image.
But these tests will take some time ;-)

@tonyhutter
Contributor

Note: I'm actively testing this PR in #16195. Right now I'm running down a bunch of test failures.

@mcmilk
Contributor Author

mcmilk commented Jun 5, 2024

> Note: I'm actively testing this PR in #16195. Right now I'm running down a bunch of test failures.

I am back from holiday and will also help. I'll investigate the serial console thing first.

@mcmilk
Contributor Author

mcmilk commented Jun 16, 2024

It's not final.

The summary isn't ready and some debug things need to be removed.

Can I leave the Ubuntu tests out?
Reason: we have 20 Actions runners, and this PR needs 15:

  • 1x for checkstyle
  • 1x for CodeQL
  • 13x for the different systems

I would also like to add a SUSE distribution.

@mcmilk mcmilk force-pushed the qemu-machines branch 3 times, most recently from 3fd75b9 to 702642f Compare June 17, 2024 14:56
@tonyhutter
Contributor

Just to make things easier (and not use so many runners), you can exclude the debian*, centos-stream*, and archlinux runners, since we currently don't support them in buildbot. And when I say exclude, I mean just don't include them in zfs-linux.yml, but keep the rest of the support code you've written (like debian() and archlinux()).
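
A rough sketch of why keeping the support code costs nothing: if the per-distribution functions are only reached through a dispatch on the OS name, they stay dormant until a workflow asks for that OS. The function names `debian()` and `archlinux()` come from the comment above; their bodies, the `install_deps` wrapper, and the OS name patterns are hypothetical placeholders, not the PR's code.

```sh
#!/bin/sh
# Placeholder bodies: the real debian()/archlinux() are not shown in this thread.
debian() {
	echo "debian setup for $1"
}

archlinux() {
	echo "archlinux setup"
}

# Hypothetical dispatcher: the workflow passes an OS name, and only the
# matching function runs. An OS left out of the workflow is never requested.
install_deps() {
	case "$1" in
	debian*)   debian "$1" ;;
	archlinux) archlinux ;;
	*)
		echo "unsupported OS: $1" >&2
		return 1
		;;
	esac
}

install_deps debian12
```

Re-enabling a distribution later would then only need its name added back to the workflow's list of systems.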

@mcmilk
Contributor Author

mcmilk commented Jul 17, 2024

I think it's done now. We can remove the "Status: Work in Progress" badge....

@tonyhutter - What do you think?

@tonyhutter
Contributor

@mcmilk that's great news! I'll take a look once all the runners report back.

@mcmilk
Contributor Author

mcmilk commented Jul 18, 2024

> @mcmilk that's great news! I'll take a look once all the runners report back.

I force pushed again and removed centos-stream-9 and some debugging things within the scripts.

I have seen that you would like to split the tests into fractions like this: 1/3 2/3 ... do you want to add this later or is this just an idea?

@mcmilk
Contributor Author

mcmilk commented Jul 19, 2024

I have added FreeBSD 13.3 RELEASE and FreeBSD 14.1 RELEASE to the tests.
It would be nice if we could also add Debian 11 + 12 to the tests by default.

@tonyhutter
Contributor

tonyhutter commented Jul 22, 2024

> I have seen that you would like to split the tests into fractions like this: 1/3 2/3 ... do you want to add this later or is this just an idea?

Correct, right now it's just an idea. I think it might help with some timing-related failures like:

almalinux8: auto_replace_002_pos
Fedora 40: zpool_status_008_pos

I also vaguely remember buildbot giving me issues if I ran with instances that had less than 8GB of RAM as well. That's why I'm curious if running 2 VMs with 8GB RAM might make many of these failures go away. I'm starting to get my variable-number-of-VMs code working with 2 VMs in my testing PR (tonyhutter#1), but I haven't gotten a full run working yet. Once I can get a full run with 2 VMs tested, I want to compare its failures to the remaining failures in this PR. That will help us understand if the failures are timing/underpowered-VM related, or if we need to do some manual fixes to the tests.

@mcmilk
Contributor Author

mcmilk commented Jul 24, 2024

Oh no, I forgot the changed zfs-tests.sh script for this pull request :(

@mcmilk
Contributor Author

mcmilk commented Jul 24, 2024

Almalinux 8+9, Debian and the FreeBSD 13+14 systems should go green now.

@mcmilk
Contributor Author

mcmilk commented Aug 14, 2024

It would be easier - and faster - if the GitHub runners had 16 GB more RAM.
I think the PR is ready now.

@tonyhutter
Contributor

@mcmilk I think we might be missing some stderr output on the QEMU builders. For example, here's the same ZTS bug (#16439) on both builders:

QEMU:

  config.status: executing depfiles commands
  config.status: executing libtool commands
  config.status: executing po-directories commands
  make[2]: Entering directory '/tmp/zfs-build-zfs-yeSNFC5X/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64'
    GEN      gitrev
  make  all-recursive
  make[3]: Entering directory '/tmp/zfs-build-zfs-yeSNFC5X/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64'
  Making all in include
  make[4]: Entering directory '/tmp/zfs-build-zfs-yeSNFC5X/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64/include'
  make[4]: Nothing to be done for 'all'.
  make[4]: Leaving directory '/tmp/zfs-build-zfs-yeSNFC5X/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64/include'
  Making all in module
  make[4]: Entering directory '/tmp/zfs-build-zfs-yeSNFC5X/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64/module'
  mkdir -p os/linux/spl/
  mkdir -p avl/ icp/ icp/algs/aes/ icp/algs/blake3/ icp/algs/edonr/ icp/algs/modes/ icp/algs/sha2/ icp/algs/skein/ icp/api/ icp/asm-aarch64/blake3/ icp/asm-aarch64/sha2/ icp/asm-arm/sha2/ icp/asm-ppc64/blake3/ icp/asm-ppc64/sha2/ icp/asm-x86_64/aes/ icp/asm-x86_64/blake3/ icp/asm-x86_64/modes/ icp/asm-x86_64/sha2/ icp/core/ icp/io/ icp/spi/ lua/ lua/setjmp/ nvpair/ os/linux/zfs/ unicode/ zcommon/ zfs/ zstd/ zstd/lib/common/ zstd/lib/compress/ zstd/lib/decompress/
  make -C /usr/src/kernels/6.10.3-100.fc39.x86_64  \
  	  \
  	M="$PWD"  CONFIG_DEBUG_INFO=y CONFIG_ZFS=m modules
  make[5]: Entering directory '/usr/src/kernels/6.10.3-100.fc39.x86_64'
  make[5]: Leaving directory '/usr/src/kernels/6.10.3-100.fc39.x86_64'
  make[4]: Leaving directory '/tmp/zfs-build-zfs-yeSNFC5X/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64/module'
  make[3]: Leaving directory '/tmp/zfs-build-zfs-yeSNFC5X/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64'
  make[2]: Leaving directory '/tmp/zfs-build-zfs-yeSNFC5X/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64'
  
  RPM build warnings:
  
  RPM build errors:
  make[1]: Leaving directory '/home/zfs/zfs'

https://github.com/openzfs/zfs/actions/runs/10388084683/job/28762944809

BUILDBOT:

config.status: executing depfiles commands
config.status: executing libtool commands
config.status: executing po-directories commands
+ make -j2
make[2]: Entering directory '/tmp/zfs-build-buildbot-2Wo4V8Y2/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64'
  GEN      gitrev
make  all-recursive
make[3]: Entering directory '/tmp/zfs-build-buildbot-2Wo4V8Y2/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64'
Making all in include
make[4]: Entering directory '/tmp/zfs-build-buildbot-2Wo4V8Y2/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64/include'
make[4]: Nothing to be done for 'all'.
make[4]: Leaving directory '/tmp/zfs-build-buildbot-2Wo4V8Y2/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64/include'
Making all in module
make[4]: Entering directory '/tmp/zfs-build-buildbot-2Wo4V8Y2/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64/module'
mkdir -p os/linux/spl/
mkdir -p avl/ icp/ icp/algs/aes/ icp/algs/blake3/ icp/algs/edonr/ icp/algs/modes/ icp/algs/sha2/ icp/algs/skein/ icp/api/ icp/asm-aarch64/blake3/ icp/asm-aarch64/sha2/ icp/asm-arm/sha2/ icp/asm-ppc64/blake3/ icp/asm-ppc64/sha2/ icp/asm-x86_64/aes/ icp/asm-x86_64/blake3/ icp/asm-x86_64/modes/ icp/asm-x86_64/sha2/ icp/core/ icp/io/ icp/spi/ lua/ lua/setjmp/ nvpair/ os/linux/zfs/ unicode/ zcommon/ zfs/ zstd/ zstd/lib/common/ zstd/lib/compress/ zstd/lib/decompress/
make -C /usr/src/kernels/6.10.3-100.fc39.x86_64  \
	  \
	M="$PWD"  CONFIG_DEBUG_INFO=y CONFIG_ZFS=m modules
make[5]: Entering directory '/usr/src/kernels/6.10.3-100.fc39.x86_64'
make[7]: *** No rule to make target '/tmp/zfs-build-buildbot-2Wo4V8Y2/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64/module/os/linux/spl/spl-atomic.o', needed by '/tmp/zfs-build-buildbot-2Wo4V8Y2/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64/module/spl.o'.  Stop.
make[7]: *** Waiting for unfinished jobs....
make[6]: *** [/usr/src/kernels/6.10.3-100.fc39.x86_64/Makefile:1946: /tmp/zfs-build-buildbot-2Wo4V8Y2/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64/module] Error 2
make[5]: *** [Makefile:252: __sub-make] Error 2
make[5]: Leaving directory '/usr/src/kernels/6.10.3-100.fc39.x86_64'
make[4]: *** [Makefile:56: modules-Linux] Error 2
make[4]: Leaving directory '/tmp/zfs-build-buildbot-2Wo4V8Y2/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64/module'
make[3]: *** [Makefile:12324: all-recursive] Error 1
make[3]: Leaving directory '/tmp/zfs-build-buildbot-2Wo4V8Y2/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64'
make[2]: *** [Makefile:4652: all] Error 2
make[2]: Leaving directory '/tmp/zfs-build-buildbot-2Wo4V8Y2/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64'
error: Bad exit status from /tmp/zfs-build-buildbot-2Wo4V8Y2/TMP/rpm-tmp.egsMTM (%build)

RPM build warnings:
    source_date_epoch_from_changelog set but %changelog is missing

RPM build errors:
    Bad exit status from /tmp/zfs-build-buildbot-2Wo4V8Y2/TMP/rpm-tmp.egsMTM (%build)
make[1]: *** [Makefile:14511: rpm-common] Error 1
make[1]: Leaving directory '/var/lib/buildbot/slaves/zfs/Fedora_39_x86_64__TEST_/build/zfs'
make: *** [Makefile:14445: rpm-kmod] Error 2

https://build.openzfs.org/builders/Fedora%2039%20x86_64%20%28TEST%29/builds/2491/steps/shell_1/logs/make

The test needs some adjusting within the timings.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Co-authored-by: Tino Reichardt <milky-zfs@mcmilk.de>

Sometimes the pool may start an auto scrub.

Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>

This commit adds functional tests for these systems:
- AlmaLinux 8, AlmaLinux 9, ArchLinux
- CentOS Stream 9, Fedora 39, Fedora 40
- Debian 11, Debian 12
- FreeBSD 13, FreeBSD 14, FreeBSD 15
- Ubuntu 20.04, Ubuntu 22.04, Ubuntu 24.04

- enabled by default:
  - AlmaLinux 8, AlmaLinux 9
  - Fedora 39, Fedora 40
  - FreeBSD 13, FreeBSD 14, FreeBSD 15

Workflow for each operating system:
- install qemu on the github runner
- download current cloud image of operating system
- start and init that image via cloud-init
- install dependencies and poweroff system
- start system and build openzfs and then poweroff again
- clone build system and start 3 instances of it
- the functional tests complete in under 3h

Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>

@mcmilk
Contributor Author

mcmilk commented Aug 16, 2024

I fixed these things:

  • the stderr messages are sent to the github runner again now
  • I rewrote the run() function completely; the return value of a failed run command is now printed and used later
  • I also defined a DEBUG_MAX variable in qemu-7-summary.sh - so we don't output some really big debug file directly to the browser
  • rebased to master
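
The first three fixes can be sketched roughly as follows. This is not the `run()` from the PR or the real `qemu-7-summary.sh`; the `LAST_RC` variable, the `show_log` helper, and the default value are assumptions for illustration. The idea is that stderr is merged into stdout so it reaches the runner log, the exit status is printed and kept, and oversized debug files are reported instead of dumped.

```sh
#!/bin/sh
# Hypothetical sketch, not the PR's implementation.
# Default is a placeholder: the TODO in this thread mentions around 400KB,
# the value actually used in the PR is not shown here.
DEBUG_MAX="${DEBUG_MAX:-409600}"
LAST_RC=0

# Run a command with stderr merged into stdout, then print and keep its
# exit status so later steps can use it.
run() {
	"$@" 2>&1
	LAST_RC=$?
	echo "exit code: $LAST_RC ($*)"
	return "$LAST_RC"
}

# Print a log file only if it is not larger than DEBUG_MAX bytes.
show_log() {
	# Arithmetic expansion strips the padding some wc versions emit.
	size=$(( $(wc -c < "$1") ))
	if [ "$size" -gt "$DEBUG_MAX" ]; then
		echo "$1: $size bytes, over DEBUG_MAX=$DEBUG_MAX, not shown"
	else
		cat "$1"
	fi
}
```

For example, `run make -j2` would print the build's stderr inline and end with an `exit code:` line, which is the kind of output that was missing from the QEMU log quoted earlier.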

An older testrun with failing Fedora 39+40 is here: https://github.com/mcmilk/zfs/actions/runs/10414909636

TODO:

  • detect kernel hangs and show them explicitly
  • maybe restart such VMs and download the log files
  • increase DEBUG_MAX to around 400KB

@tonyhutter
Contributor

@mcmilk this will take care of the checkstyle issues:

diff --git a/scripts/zfs-tests.sh b/scripts/zfs-tests.sh
index 957e674be..fde2e4acb 100755
--- a/scripts/zfs-tests.sh
+++ b/scripts/zfs-tests.sh
@@ -1,4 +1,4 @@
-#!/usr/bin/env bash
+#!/bin/sh
 # shellcheck disable=SC2154
 #
 # CDDL HEADER START
@@ -215,8 +215,8 @@ find_runfile() {
 #
 split_tags() {
        # Get numerator and denominator
-       NUM=$(echo $TAGS | cut -d/ -f1)
-       DEN=$(echo $TAGS | cut -d/ -f2)
+       NUM=$(echo "$TAGS" | cut -d/ -f1)
+       DEN=$(echo "$TAGS" | cut -d/ -f2)
        # At the point this is called, RUNFILES will contain a comma separated
        # list of full paths to the runfiles, like:
        #
@@ -242,9 +242,12 @@ split_tags() {
        #
        # "append,atime,bootfs,cachefile,checksum,cp_files,deadman,dos_attributes, ..."
 
-       cat ${RUNFILES/,/ } | tr -d [],\' | awk '/tags = /{print $NF}' | sort | \
+       # Change the comma to a space for easy processing
+       _RUNFILES="$(echo """$RUNFILES""" | sed 's/,/ /g')"
+       # shellcheck disable=SC2002,SC2086
+       cat $_RUNFILES | tr -d "[],\'" | awk '/tags = /{print $NF}' | sort | \
                uniq | grep -v functional | \
-               awk -v num=$NUM -v den=$DEN '{ if(NR % den == (num - 1)) {printf "%s,",$0}}' | \
+               awk -v num="$NUM" -v den="$DEN" '{ if(NR % den == (num - 1)) {printf "%s,",$0}}' | \
                sed -E 's/,$//'
 }
 
@@ -568,7 +571,7 @@ RUNFILES=${R#,}
 #
 # "append,atime,bootfs,cachefile,checksum,cp_files,deadman,dos_attributes, ..."
 #
-if echo $TAGS | grep -Eq '^[0-9]+/[0-9]+$' ; then
+if echo "$TAGS" | grep -Eq '^[0-9]+/[0-9]+$' ; then
        TAGS=$(split_tags)
 fi
 

@mcmilk
Contributor Author

mcmilk commented Aug 16, 2024

I am testing zram disks again; it looks like they will speed up the whole thing a lot.

The checkstyle fixups will get included, thank you.

@mcmilk mcmilk closed this Aug 24, 2024
@mcmilk mcmilk deleted the qemu-machines branch August 24, 2024 13:36
@mcmilk
Contributor Author

mcmilk commented Aug 25, 2024

I will open another PR on this topic soon. Some more testing needs to be done first, sorry.

@tonyhutter
Contributor

@mcmilk I'm excited to see the new version!

@tonyhutter
Contributor

For those following - a newer version of this PR got merged: #16537
