openQA tests developer guide

Introduction

openQA is an automated test tool that makes it possible to test the whole installation process of an operating system. It’s free software released under the GPLv2 license. The source code and documentation are hosted in the os-autoinst organization on GitHub.

This document provides the information needed to start developing new tests for openQA or to improve the existing ones. It’s assumed that the reader is already familiar with openQA and has already read the Starter Guide, available at the official repository.

Basic

This section explains the basic layout of openQA tests and the API available in tests. openQA tests are written in the Perl programming language. Some basic but no in-depth knowledge of Perl is needed. This document assumes that the reader is already familiar with Perl.

API

os-autoinst provides the API used by the tests. The published documentation is available at http://open.qa/api/testapi/.

How to write tests

openQA tests need to implement at least the run subroutine to contain the actual test code and the test needs to be loaded in the distribution’s main.pm.

The test_flags subroutine specifies what happens when the test fails.

There are several callbacks defined:

  • post_fail_hook is called to upload log files or determine the state of the machine

  • pre_run_hook is called before the run function - mainly useful for a whole group of tests

  • post_run_hook is run after successful run function - mainly useful for a whole group of tests

The following example is a basic test that assumes some live image that boots into the desktop when pressing enter at the boot loader:

use base "basetest";
use strict;
use testapi;

sub run {
    # wait for bootloader to appear
    # with a timeout explicitly lower than the default because
    # the bootloader screen will timeout itself
    assert_screen "bootloader", 15;

    # press enter to boot right away
    send_key "ret";

    # wait for the desktop to appear
    assert_screen "desktop", 300;
}

sub test_flags {
    # 'fatal'          - abort whole test suite if this fails (and set overall state 'failed')
    # 'ignore_failure' - if this module fails, it will not affect the overall result at all
    # 'milestone'      - after this test succeeds, update 'lastgood'
    # 'norollback'     - don't roll back to 'lastgood' snapshot if this fails
    return { fatal => 1 };
}

1;
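
The post_fail_hook callback described above can be added to the same kind of module. The following is a minimal sketch, not taken from an existing test: it assumes that the distribution defines a console named 'root-console' and that the system under test keeps a log in /var/log/messages; both are illustrative assumptions.

```perl
use base "basetest";
use strict;
use testapi;

sub run {
    # wait for the desktop to appear
    assert_screen "desktop", 300;
}

sub post_fail_hook {
    my ($self) = @_;

    # keep a picture of the state the machine was left in
    save_screenshot;

    # assumption: 'root-console' is a console provided by the distribution
    select_console 'root-console';

    # assumption: the system under test writes its log to this file
    upload_logs '/var/log/messages';
}

sub test_flags {
    return { fatal => 1 };
}

1;
```

The hook only runs when the module fails, so it is a reasonable place for slow operations such as uploading logs.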

Test Case Examples

Console test that installs software from remote repository via zypper command
sub run {
    # change to root
    become_root;

    # output zypper repos to the serial
    script_run "zypper lr -d > /dev/$serialdev";

    # install xdelta and check that the installation was successful
    assert_script_run 'zypper --gpg-auto-import-keys -n in xdelta';

    # additionally write a custom string to serial port for later checking
    script_run "echo 'xdelta_installed' > /dev/$serialdev";

    # detecting whether 'xdelta_installed' appears in the serial within 200 seconds
    die "we could not see expected output" unless wait_serial "xdelta_installed", 200;

    # capture a screenshot and compare with needle 'test-zypper_in'
    assert_screen 'test-zypper_in';
}
Typical X11 test testing kate
sub run {
    # make sure kate was installed
    # if not ensure_installed will try to install it
    ensure_installed 'kate';

    # start kate
    x11_start_program 'kate';

    # check that kate execution succeeded
    assert_screen 'kate-welcome_window';

    # close kate's welcome window and wait for the window to disappear before
    # continuing
    wait_screen_change { send_key 'alt-c' };

    # typing a string in the editor window of kate
    type_string "If you can see this text kate is working.\n";

    # check the result
    assert_screen 'kate-text_shown';

    # quit kate
    send_key 'ctrl-q';

    # make sure kate was closed
    assert_screen 'desktop';
}

Variables

Test case behavior can be controlled via variables. Some basic variables like DISTRI, VERSION, ARCH are always set. Others like DESKTOP are defined by the 'Test suites' in the openQA web UI. Check the existing tests at os-autoinst-distri-opensuse on GitHub for examples.

Variables are accessible via the get_var and check_var functions.
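
As a minimal sketch of how these two functions are typically used, the module below branches on the DESKTOP variable. The needle name 'text-login' and the fallback value 'textmode' are assumptions chosen for illustration only.

```perl
use base "basetest";
use strict;
use testapi;

sub run {
    # get_var returns the value of a variable; the optional second
    # argument is returned when the variable is not set
    my $arch    = get_var('ARCH');
    my $desktop = get_var('DESKTOP', 'textmode');
    diag "running on $arch with desktop '$desktop'";

    # check_var is true only if the variable is set to the given value
    if (check_var('DESKTOP', 'textmode')) {
        # 'text-login' is a hypothetical needle name
        assert_screen 'text-login', 300;
    }
    else {
        assert_screen 'desktop', 300;
    }
}

1;
```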

Advanced test features

Capturing kernel exceptions and/or any other exceptions from the serial console

Soft and hard failures can be triggered on demand by regular expressions that are matched against the serial output after the test module has been executed. If it makes no sense to continue the test run even though the current test module does not have the fatal flag, use fatal as the serial failure type; all subsequent test modules are then skipped once such a failure is detected. To use this functionality, the test developer needs to define the patterns to look for in the serial output, either in main.pm or in the test itself. Any pattern change made in a test also applies to the tests that follow.

The patterns defined in the main.pm will be valid for all the tests.

To simplify the review of test results, automatic comment carryover also works when a job fails with the same message (as defined for the pattern) as the previous job, even if the two jobs failed in different test modules.

Defining serial exception capture in the main.pm
$testapi::distri->set_expected_serial_failures([
        {type => 'soft', message  => 'known issue',  pattern => quotemeta 'Error'},
        {type => 'hard', message  => 'broken build', pattern => qr/exception/},
        {type => 'fatal', message => 'critical issue build', pattern => qr/kernel oops/},
    ]
);
Defining serial exception capture in the test
sub run {
    my ($self) = @_;
    $self->{serial_failures} = [
        {type => 'soft', message  => 'known issue',  pattern => quotemeta 'Error'},
        {type => 'hard', message  => 'broken build', pattern => qr/exception/},
        {type => 'fatal', message => 'critical issue build', pattern => qr/kernel oops/},
    ];
    ...
}
Adding serial exception capture in the test
sub run {
    my ($self) = @_;
    push @{$self->{serial_failures}}, {type => 'soft', message => 'known issue',  pattern => quotemeta 'Error'};
    ...
}

Assigning jobs to workers

By default, any worker can get any job with the matching architecture.

This behavior can be changed by setting the job variable WORKER_CLASS. Jobs with this variable set (typically via the machines or test suites configuration) are assigned only to workers that have the same value set in their configuration file.

For example, the following configuration ensures that jobs with WORKER_CLASS=desktop can be assigned only to worker instances 1 and 2.

workers.ini
[1]
WORKER_CLASS = desktop

[2]
WORKER_CLASS = desktop

[3]
# WORKER_CLASS is not set

Writing multi-machine tests

Scenarios requiring more than one system under test (SUT), like High Availability testing, are covered as multi-machine tests (MM tests) in this section.

OpenQA approaches multi-machine testing by assigning dependencies between individual jobs. This means the following:

  • Everything needed for MM tests must run as a test job (or you are on your own). Even support infrastructure (custom DHCP, NFS, etc., if required), which in principle is not part of the actual testing, must have a defined test suite so that a test job can be created for it.

  • The openQA scheduler makes sure that tests are started as a group and in the right order, cancelled as a group if some dependencies are violated, and cloned as a group if requested.

  • openQA does not synchronize individual steps of the tests.

  • openQA provides a locking server for basic synchronization of tests (e.g. wait until services are ready for failover), but using the locks correctly is the test designer’s job (beware of deadlocks).

In short, writing multi-machine tests adds a few more layers of complexity:

  1. documenting the dependencies and order between individual tests

  2. synchronization between individual tests

  3. actual technical realization (i.e. custom networking)

Job dependencies

There are 2 types of dependencies: CHAINED and PARALLEL:

  • CHAINED describes the case where one test suite depends on another and both run sequentially, e.g. the KDE test suite runs only after the Installation test suite has finished successfully, and is cancelled if the Installation test suite fails.

To define a CHAINED dependency, add the variable START_AFTER_TEST with the name(s) of the test suite(s) after which the selected test suite is supposed to run. Use a comma-separated list for multiple test suite dependencies, e.g. START_AFTER_TEST="kde,dhcp-server".

  • PARALLEL describes an MM test: the test suites are scheduled to run at the same time and are managed as a group. On top of that, PARALLEL also describes dependencies between test suites, where some test suites (children) run in parallel with other test suites (parents) only while the parents are running.

To define a PARALLEL dependency, use the PARALLEL_WITH variable with the name(s) of the test suite(s) that act as parent suite(s) of the selected test suite. In other words, PARALLEL_WITH means "I need this test suite to be running during my run". Use a comma-separated list for multiple test suite dependencies, e.g. PARALLEL_WITH="web-server,dhcp-server". Keep in mind that the parent job must keep running until all children have finished; otherwise the scheduler cancels the child jobs once the parent is done.

Job dependencies are only resolved when using the iso controller to create new jobs from job templates. Posting individual jobs manually won’t work.

Job dependencies are currently only possible between tests that are scheduled for the same machine.

OpenQA worker requirements

A CHAINED dependency requires only one worker, since dependent jobs run only after the first one finishes. A PARALLEL dependency, on the other hand, requires at least 2 workers for simple scenarios.

Examples:
CHAINED - i.e. test basic functionality before going advanced - requires 1 worker
A <- B <- C

Define test suite A,
then define B with variable START_AFTER_TEST=A and then define C with START_AFTER_TEST=B

-or-

Define test suite A, B
and then define C with START_AFTER_TEST=A,B
In this case however the start order of A and B is not specified.
But C will start only after A, B are successfully done.
PARALLEL basic High-Availability
A
^
B

Define test suite A
and then define B with variable PARALLEL_WITH=A.
A in this case is parent test suite to B and must be running throughout B run.
PARALLEL with multiple parents - i.e. complex support requirements for one test - requires 4 workers
A B C
\ | /
  ^
  D

Define test suites A,B,C
and then define D with PARALLEL_WITH=A,B,C.
A,B,C run in parallel and are parent test suites for D and all must run until D finishes.
PARALLEL with one parent - i.e. running independent tests against one server - requires at least 2 workers
   A
   ^
  /|\
 B C D

Define test suite A
and then define B,C,D with PARALLEL_WITH=A
A is parent test suite for B, C, D (all can run in parallel).
Children B, C, D can run and finish anytime, but A must run until all of B, C, D have finished.

Test synchronization and locking API

openQA provides a locking server through the lock API. To use the lock API, import the lockapi package (use lockapi;) in your test file. The lock API provides the functions mutex_create, mutex_lock, mutex_unlock and mutex_wait. Each of these functions takes at least one parameter: the name of the lock. Note that a lock name can’t contain the "-" character. Locks are associated with the caller’s job: a lock can’t be unlocked by a job other than the one that locked it.

mutex_lock tries to lock the mutex for the caller’s job. If the lock is unavailable or locked by another job, the mutex_lock call blocks.

mutex_unlock tries to unlock the mutex. If the lock is locked by a different job, the mutex_unlock call blocks. When the lock becomes available, or if the lock does not exist, the call returns without doing anything.

mutex_wait is a combination of mutex_lock and mutex_unlock that displays more information about the mutex state (time spent waiting, location of the lock). Use it when you wait for a specific action from a single place (e.g. Apache is running on the master node).

mutex_create creates a new mutex. A lock created by mutex_create is automatically unlocked. If the mutex already exists, the call returns without doing anything.

Locks are addressed by their name. This name is valid in test group defined by their dependencies. If there are more groups running at the same time and the same lock name is used, these locks are independent of each other.

The mmapi package provides wait_for_children, which the parent can use to wait for the children to complete.

use lockapi;
use mmapi;

# On parent job
sub run {
    # ftp service started automatically on boot
    assert_screen 'login', 300;

    # unlock by creating the lock
    mutex_create 'ftp_service_ready';

    # wait until all children finish
    wait_for_children;
}

# On child we wait for ftp server to be ready
sub run {
    # wait until ftp service is ready
    # performs mutex lock & unlock internally
    mutex_wait 'ftp_service_ready';

    # connect to ftp and start downloading
    script_run 'ftp parent.job.ip';
    script_run 'get random_file';
}

# Mutexes can also be used to guarantee exclusive access to a resource
# Example on child when only one job should access ftp at a time
sub run {
    # wait until ftp service is ready
    mutex_lock 'ftp_service_ready';

    # Perform operation with exclusive access
    script_run 'ftp parent.job.ip';
    script_run 'put only_i_am_here';
    script_run 'bye';

    # Allow other jobs to connect afterwards
    mutex_unlock 'ftp_service_ready';
}

Sometimes it is useful to wait for a certain action from a child or sibling job rather than from the parent. In this case the child or sibling creates a mutex, and any job in the cluster can lock or unlock it.

The child can, however, die at any time. To prevent a parent deadlock in this situation, the job ID of the mutex owner must be passed as the second parameter to mutex_lock and mutex_wait. The mutex owner is the job that creates the mutex. If the child job with the given ID has already finished, mutex_lock() calls die. The job ID is also required when unlocking such a mutex.

Example of mmapi: Parent job waits until the child reaches a given point
use lockapi;
use mmapi;

sub run {
    my $children = get_children();

    # let's suppose there is only one child
    my $child_id = (keys %$children)[0];

    # this blocks until lock is available and then does nothing
    mutex_wait('child_reached_given_point', $child_id);

    # continue with the test
}
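
The child side of this handshake could look like the following minimal sketch. It assumes that the point of interest is reached once the login screen appears; the 'login' needle is reused from the earlier examples purely for illustration.

```perl
use base "basetest";
use strict;
use testapi;
use lockapi;

sub run {
    # assumption: the child has reached the given point once 'login' matches
    assert_screen 'login', 300;

    # a freshly created mutex is unlocked, so this releases the
    # parent's mutex_wait on the same name
    mutex_create 'child_reached_given_point';

    # continue with the rest of the child test
}

1;
```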

Mutexes are a way to wait for a specific event from a single job. When multiple jobs need to reach a required state, use barriers.

Before its first use, a barrier needs to be created with barrier_create, which takes 2 parameters: name and count. The name acts as an ID (as with mutexes); the count is the number of jobs that need to call barrier_wait to unlock the barrier.

barrier_wait has an optional parameter called check_dead_job. When it is used, all jobs waiting in barrier_wait are killed if one of the cluster jobs dies.

This prevents waiting for a state that will never be reached (and eventually dying on the job timeout). It should be set on only one of the barrier_wait calls.

An example is a situation with 1 master and 3 worker jobs. We need to wait until the 3 worker jobs have performed their initial setup. After that we can build a cluster from them, but if one of them fails it makes no sense to keep waiting.

Example of barriers: Check for dead jobs while waiting for barrier
use lockapi;

# In main.pm
barrier_create('NODES_CONFIGURED', 4);

# On master job
sub run {
    assert_screen 'login', 300;

    # Master is ready, waiting while workers are configured (check_dead_job is optional)
    barrier_wait {name => "NODES_CONFIGURED", check_dead_job => 1};

    # When 4 jobs called barrier_wait they are all unblocked
    script_run 'create_cluster';
    script_run 'test_cluster';

    # Notify all nodes we are finished
    mutex_create 'CLUSTER_CREATED';
    wait_for_children;
}

# On 3 worker jobs
sub run {
    assert_screen 'login', 300;

    # do initial worker setup
    script_run 'zypper in HA';
    script_run 'echo IP > /etc/HA/node_setup';

    # Join the group of jobs waiting for each other
    barrier_wait 'NODES_CONFIGURED';

    # Don't finish until cluster is created & tested
    mutex_wait 'CLUSTER_CREATED';
}

Getting information about parents and children

Example of mmapi: Getting info about parents / children
use base "basetest";
use strict;
use testapi;
use mmapi;

sub run {
    # returns a hash ref containing (id => state) for all children
    my $children = get_children();

    for my $job_id (keys %$children) {
      print "$job_id is cancelled\n" if $children->{$job_id} eq 'cancelled';
    }

    # returns an array ref with parent ids, all parents are in running state (see Job dependencies above)
    my $parents = get_parents();

    # let's suppose there is only one parent
    my $parent_id = $parents->[0];

    # any job id can be queried for details with get_job_info()
    # it returns a hash ref containing these keys:
    #   name priority state result worker_id
    #   t_started t_finished test
    #   group_id group settings
    my $parent_info = get_job_info($parent_id);

    # it is possible to query variables set by openqa frontend,
    # this does not work for variables set by backend or by the job at runtime
    my $parent_name = $parent_info->{settings}->{NAME};
    my $parent_desktop = $parent_info->{settings}->{DESKTOP};
    # !!! this does not work, VNC is set by backend !!!
    # my $parent_vnc = $parent_info->{settings}->{VNC};
}

Support Server based tests

The idea is to have a dedicated "helper server" to allow advanced network based testing.

The support server takes advantage of the basic parallel setup described in the previous section, with the support server being the parent test 'A' and the test needing it being the child test 'B'. This ensures that test 'B' always has the support server available.

Preparing the supportserver:

The support server image is created by calling a special test, based on the autoyast test:

/usr/share/openqa/script/client jobs post DISTRI=opensuse VERSION=13.2 \
    ISO=openSUSE-13.2-DVD-x86_64.iso  ARCH=x86_64 FLAVOR=Server-DVD \
    TEST=supportserver_generator MACHINE=64bit DESKTOP=textmode  INSTALLONLY=1 \
    AUTOYAST=supportserver/autoyast_supportserver.xml SUPPORT_SERVER_GENERATOR=1 \
    PUBLISH_HDD_1=supportserver.qcow2

This produces the qemu image 'supportserver.qcow2', which contains the supportserver. The 'autoyast_supportserver.xml' should define the correct user and password, as well as the packages and the common configuration.

The more specific role that the supportserver should take is then selected when the server is run in the actual test scenario.

Using the supportserver:

In the Test suites, the supportserver is defined by setting:

HDD_1=supportserver.qcow2
SUPPORT_SERVER=1
SUPPORT_SERVER_ROLES=pxe,qemuproxy
WORKER_CLASS=server,qemu_autoyast_tap_64

where the SUPPORT_SERVER_ROLES defines the specific role (see code in 'tests/support_server/setup.pm' for available roles and their definition), and HDD_1 variable must be the name of the supportserver image as defined via PUBLISH_HDD_1 variable during supportserver generation. If the support server is based on older SUSE versions (opensuse 11.x, SLE11SP4..) it may also be needed to add HDDMODEL=virtio-blk. In case of qemu backend, one can also use BOOTFROM=c, for faster boot directly from the HDD_1 image.

Then for the 'child' test using this supportserver, the following additional variable must be set: PARALLEL_WITH=supportserver-pxe-tftp where 'supportserver-pxe-tftp' is the name given to the supportserver in the test suites screen. Once the tests are defined, they can be added to openQA in the usual way:

/usr/share/openqa/script/client isos post DISTRI=opensuse VERSION=13.2 \
        ISO=openSUSE-13.2-DVD-x86_64.iso ARCH=x86_64 FLAVOR=Server-DVD

where the DISTRI, VERSION, FLAVOR and ARCH correspond to the job group containing the tests. Note that the networking is provided by tap devices, so both jobs should run on machines defined by (apart from others) having NICTYPE=tap, WORKER_CLASS=qemu_autoyast_tap_64.

Example of Support Server: a simple tftp test

Let’s assume that we want to test tftp client operation. For this, we set up the supportserver as a tftp server:

HDD_1=supportserver.qcow2
SUPPORT_SERVER=1
SUPPORT_SERVER_ROLES=dhcp,tftp
WORKER_CLASS=server,qemu_autoyast_tap_64

The test suite is named supportserver-opensuse-tftp.

The actual test 'child' job then has to set PARALLEL_WITH=supportserver-opensuse-tftp, along with other variables according to the test requirements. For convenience, we have also started a dhcp server on the supportserver, but even without it the network could be set up manually by assigning a free IP address (e.g. 10.0.2.15) on the system of the test job.

Example of Support Server: The code in the *.pm module doing the actual tftp test could then look something like the example below
use strict;
use base 'basetest';
use testapi;

sub run {
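  # placeholder (assumption): replace with the address of your support server
  my $server_ip = '10.0.2.1';

  # build a small shell script that uploads a file and downloads it again
  my $script="set -e -x\n";
  $script.="echo test >test.txt\n";
  $script.="time tftp ".$server_ip." -c put test.txt test2.txt\n";
  $script.="time tftp ".$server_ip." -c get test2.txt\n";
  $script.="diff -u test.txt test2.txt\n";
  script_output($script);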
}

assuming of course, that the tested machine was already set up with necessary infrastructure for tftp, e.g. network was set up, tftp rpm installed and tftp service started, etc. All of this could be conveniently achieved using the autoyast installation, as shown in the next section.

Example of Support Server: autoyast based tftp test

Here we will use autoyast to set up the system of the test job, together with the os-autoinst autoyast testing infrastructure. For the supportserver, this means using a proxy to access data provided by qemu, for downloading the autoyast profile and the tftp verify script:

HDD_1=supportserver.qcow2
SUPPORT_SERVER=1
SUPPORT_SERVER_ROLES=pxe,qemuproxy
WORKER_CLASS=server,qemu_autoyast_tap_64

The actual test 'child' job will then be defined as:

AUTOYAST=autoyast_opensuse/opensuse_autoyast_tftp.xml
AUTOYAST_VERIFY=autoyast_opensuse/opensuse_autoyast_tftp.sh
DESKTOP=textmode
INSTALLONLY=1
PARALLEL_WITH=supportserver-opensuse-tftp

again assuming that the support server’s name is supportserver-opensuse-tftp. Note that the pxe role already includes the tftp and dhcp server roles, since they are needed for the pxe boot to work.

Example of Support Server: The tftp test defined in the autoyast_opensuse/opensuse_autoyast_tftp.sh file could be something like:
set -e -x
echo test >test.txt
time tftp #SERVER_URL# -c put test.txt test2.txt
time tftp #SERVER_URL# -c get test2.txt
diff -u test.txt test2.txt && echo "AUTOYAST OK"

and the rest is done automatically, using already prepared test modules in tests/autoyast subdirectory.

Using text consoles and the serial terminal

Typically the OS you are testing will boot into a graphical shell, e.g. the GNOME desktop environment. This is fine if you wish to test a program with a GUI, but in many situations you will need to enter commands into a textual shell (e.g. Bash), TTY, text terminal, command prompt, TUI etc.

openQA has two basic methods for interacting with a text shell. The first uses the same input and output methods as when interacting with a GUI, plus a serial port for getting raw text output from the SUT. This is primarily implemented with VNC, so I will refer to it as the VNC text console.

The serial port device which is used with the VNC text console is the default virtual serial port device in QEMU (i.e. the device configured with the -serial command line option). I will refer to this as the "default serial port". OpenQA currently only uses this serial port for one way communication from the SUT to the host.

The second method uses another serial port for both input and output. The SUT attaches a TTY to the serial port which os-autoinst logs into. All communication is therefore text based, similar to when you SSH into a remote machine. This is called the serial terminal console (or the virtio console, see implementation section for details).

The VNC text console is very slow and expensive relative to the serial terminal console, but allows you to continue using assert_screen and is more widely supported. Below is an example of how to use the VNC text console.

To access a text based console or TTY, you can do something like the following.

use 5.018;
use warnings;
use base 'opensusebasetest';
use testapi;
use utils;

sub run {
    wait_boot;  # Utility function defined by the SUSE distribution
    select_console 'root-console';
}

1;

This will select a text TTY and login as the root user (if necessary). Now that we are on a text console it is possible to run scripts and observe their output either as raw text or on the video feed.

Note that root-console is defined by the distribution, so on different distributions or operating systems this can vary. There are also many utility functions that wrap select_console, so check your distribution’s utility library before using it directly.

Running a script: Using the assert_script_run and script_output commands
assert_script_run('cd /proc');
my $cpuinfo = script_output('cat cpuinfo');
if($cpuinfo =~ m/avx2/) {
    # Do something which needs avx2
}
else {
    # Do some workaround
}

This returns the contents of the SUT’s /proc/cpuinfo file to the test script and then searches it for the term 'avx2' using a regex.

The script_run and script_output are high level commands which use type_string and wait_serial underneath. Sometimes you may wish to use lower level commands which give you more control, but be warned that it may also make your code less portable.

The command wait_serial watches the SUT’s serial port for text output and matches it against a regex. type_string sends a string to the SUT like it was typed in by the user over VNC.
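
As a minimal sketch of using these two lower level commands together on the VNC text console, the module below writes a marker to the default serial port and waits for it. It assumes that a shell with permission to write to the serial device is already in focus; the marker string is arbitrary.

```perl
use base "basetest";
use strict;
use testapi;

sub run {
    # type the command, including the final newline, as if it were
    # entered on the keyboard
    type_string "echo 'marker_42' > /dev/$serialdev\n";

    # wait up to 10 seconds for the marker to show up on the serial port
    die "marker not seen on serial port" unless wait_serial qr/marker_42/, 10;
}

1;
```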

Using a serial terminal

Important
You need a QEMU version >= 2.6.1 and to set the VIRTIO_CONSOLE variable to 1 to use this with the QEMU backend.

Usually OpenQA controls the system under test using VNC. This allows the use of both graphical and text based consoles. Key presses are sent individually as VNC commands and output is returned in the form of screen images and text output from the SUT’s default serial port.

Sending key presses over VNC is very slow, so for tests which send a lot of text commands it is much faster to use a serial port for both sending shell commands and receiving program output.

Communicating entirely using text also means that you no longer have to worry about your needles being invalidated due to a font change or similar. It is also much cheaper to transfer text and test it against regular expressions than encode images from a VNC feed and test them against sample images (needles).

On the other hand you can no longer use assert_screen or take a screen shot because the text is never rendered as an image. A lot of programs will also send ANSI escape sequences which will appear as raw text to the test script instead of being interpreted by a terminal emulator which then renders the text.

select_console('root-virtio-terminal');  # Selects a virtio based serial terminal

The above code will cause type_string and wait_serial to write and read from a virtio serial port. A distribution specific callback will be made which allows os-autoinst to log into a serial terminal session running on the SUT. Once select_console returns you should be logged into a TTY as root.

If you are struggling to visualise what is happening, imagine SSH-ing into a remote machine as root, you can then type in commands and read the results as if you were sat at that computer. What we are doing is much simpler than using an SSH connection (it is more like using GNU screen with a serial port), but the end result looks quite similar.

As mentioned above, changing input and output to a serial terminal has the effect of changing where wait_serial reads output from. On a QEMU VM wait_serial usually reads from the default serial port which is also where the kernel log is usually output to.

When switching to a virtio based serial terminal, wait_serial will then read from a virtio serial port instead. However the default serial port still exists and can receive output. Some utility library functions are hard coded to redirect output to the default serial port and expect that wait_serial will be able to read it. Usually it is not too difficult to fix the utility function, you just need to remove some redirection from the relevant shell command.

Another common problem is that some library or utility function tries to take a screen shot. The hard part is finding what takes the screen shot, but then it is just a simple case of checking is_serial_terminal and not taking the screen shot if we are on a serial terminal console.
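
A guard of this kind can be sketched as a small helper. The helper name screenshot_if_possible is hypothetical and not part of os-autoinst or any distribution library.

```perl
use strict;
use testapi;

# hypothetical helper: take a screenshot only when the current console
# is rendered as an image
sub screenshot_if_possible {
    # on a serial terminal there is nothing to capture
    return if is_serial_terminal;
    save_screenshot;
}

1;
```

Library functions that currently call save_screenshot directly could call such a helper instead, so that they work on both kinds of console.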

Distributions usually wrap select_console, so instead of using it directly, you can use something like the following, which is from the openSUSE test suite.

if (select_virtio_console()) {
        # Do something which only works, or is necessary, on a serial terminal
}

This selects the virtio based serial terminal console if possible. If it is available then it returns true. It is also possible to check if the current console is a serial terminal by calling is_serial_terminal.

Once you have selected a serial terminal, the video feed will disappear from the live view, however at the bottom of the live screen there is a separate text feed. After the test has finished you can view the serial log(s) in the assets tab. You will probably have two serial logs; serial0.txt which is written from the default serial port and serial_terminal.txt.

Now that you are on a serial terminal console everything will start to go a lot faster. So much faster in fact that race conditions become a big issue. Generally these can be avoided by using the higher level functions such as script_run and script_output.

It is rarely necessary to use the lower level functions, however it helps to recognise problems caused by race conditions at the lower level, so please read the following section regardless.

So if you do need to use type_string and wait_serial directly then try to use the following pattern:

  1. Wait for the terminal prompt to appear.

  2. Send your command.

  3. Wait for your command text to be echoed by the shell (if applicable).

  4. Send enter.

  5. Wait for your command output (if applicable).

To illustrate, here is a snippet from the LTP test runner which uses the lower level commands to achieve a little bit more control. The comments mark the lines which correspond to the steps above.

my $fin_msg    = "### TEST $test->{name} COMPLETE >>> ";
my $cmd_text   = qq($test->{command}; echo "$fin_msg\$?");
my $klog_stamp = "echo 'OpenQA::run_ltp.pm: Starting $test->{name}' > /dev/$serialdev";

# More variables and other stuff

if (is_serial_terminal) {
        script_run($klog_stamp);
        wait_serial(serial_term_prompt(), undef, 0, no_regex => 1);    # Step 1
        type_string($cmd_text);                                        # Step 2
        wait_serial($cmd_text, undef, 0, no_regex => 1);               # Step 3
        type_string("\n");                                             # Step 4
} else {
        # Non-serial-terminal console code (e.g. the VNC console)
}
my $test_log = wait_serial(qr/$fin_msg\d+/, $timeout, 0, record_output => 1);    # Step 5

The first wait_serial (Step 1) ensures that the shell prompt has appeared. If we do not wait for the shell prompt then it is possible that we can send input to whatever command was run before. In this case that command would be 'echo' which is used by script_run to print a 'finished' message.

It is possible that echo was able to print the finish message, but was then suspended by the OS before it could exit. In which case the test script is able to race ahead and start sending input to echo which was intended for the shell. Waiting for the shell prompt stops this from happening.

INFO: It appears that echo does not read STDIN in this case, and so the input will stay inside STDIN’s buffer and be read by the shell (Bash). Unfortunately this results in the input being displayed twice: once by the terminal’s echo (explained later) and once by Bash. Depending on your configuration the behavior could be completely different.

The function serial_term_prompt is a distribution specific function which returns the characters previously set as the shell prompt (e.g. export PS1="# ", see the bash(1) or dash(1) man pages). If you are adapting a new distribution to use the serial terminal console, then we recommend setting a simple shell prompt and keeping track of it with utility functions.

The no_regex argument tells wait_serial to use simple string matching instead of regular expressions, see the implementation section for more details. The other arguments are the timeout (undef means we use the default) and a boolean which inverts the result of wait_serial. These are explained in the os-autoinst/testapi.pm documentation.

Then the test script enters our command with type_string (Step 2) and waits for the command’s text to be echoed back by the system under test. Terminals usually echo back the characters sent to them so that the user can see what they have typed.

However this can be disabled (see the stty(1) man page) or possibly even unimplemented on your terminal. So this step may not be applicable, but it provides some error checking so you should think carefully before disabling echo deliberately.

We then consume the echo text (Step 3) before sending enter, to both check that the correct text was received and also to separate it from the command output. It also ensures that the text has been fully processed before sending the newline character which will cause the shell to change state.

It is worth reminding oneself that we are sending and receiving data extremely quickly on an interface usually limited by human typing speed. So any string which results in a significant state change should be treated as a potential source of race conditions.

Finally we send the newline character and wait for our custom finish message. record_output is set to ensure all the output from the SUT is saved (see the next section for more info).

What we do not do at this point is wait for the shell prompt to appear. That would consume the prompt character, breaking the next call to script_run.

We choose to wait for the prompt just before sending a command, rather than after it, so that Step 5 can be deferred to a later time. In theory this allows the test script to perform some other work while the SUT is busy.
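The split between Steps 1 to 4 and the deferred Step 5 can be sketched in plain Perl. This is only an illustration under stated assumptions: type_string and wait_serial are replaced here by logging stand-ins so that the script runs without openQA, and the names start_command, finish_command and $PROMPT are hypothetical helpers, not part of testapi.

```perl
use strict;
use warnings;

# Stand-ins for the testapi functions so the sketch runs anywhere.
# They only log what would be sent to, or expected from, the SUT.
my @log;
sub type_string { my ($text)    = @_; push @log, "type:$text";    return; }
sub wait_serial { my ($pattern) = @_; push @log, "wait:$pattern"; return 1; }

my $PROMPT = '# ';    # assumed prompt, as serial_term_prompt would return it

# Steps 1 to 4: returns as soon as enter has been sent.
sub start_command {
    my ($cmd) = @_;
    wait_serial($PROMPT, undef, 0, no_regex => 1);    # Step 1
    type_string($cmd);                                # Step 2
    wait_serial($cmd, undef, 0, no_regex => 1);       # Step 3
    type_string("\n");                                # Step 4
    return;
}

# Step 5: may be called later, after the test script has done other work.
sub finish_command {
    my ($fin_msg, $timeout) = @_;
    return wait_serial(qr/\Q$fin_msg\E\d+/, $timeout, 0, record_output => 1);
}

start_command(q(true; echo "DONE-$?"));
# ... the test script could perform unrelated work here ...
finish_command('DONE-', 30);

print "$_\n" for map { s/\n/\\n/gr } @log;
```

In a real test module the stand-ins would be dropped and the functions imported from testapi would be used instead; the order of the calls is the part that matters.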

Sending new lines and continuation characters

The following command will time out: script_run("echo \"1\n2\""). The reason is that script_run will call wait_serial("echo \"1\n2\"") to check that the command was entered successfully and echoed back (see above for an explanation of serial terminal echo; note that the echo shell command has not been executed yet). However the shell will translate the newline characters into a newline character plus '>', so we will get something similar to the following output.

echo "1
> 2"

The '>' is unexpected and will cause the match to fail. One way to fix this is simply to do echo -e \"1\\n2\". In this case Perl will not replace \n with a newline character, instead it will be passed to echo which will do the substitution instead (note the '-e' switch for echo).
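The difference between the two command strings can be checked in plain Perl, without openQA. The following is a minimal sketch; the variable names are only illustrative.

```perl
use strict;
use warnings;

# Double-quoted string: Perl turns \n into a real newline character
# before the text is ever sent to the shell.
my $interpolated = "echo \"1\n2\"";

# Escaped backslash: Perl emits the two characters '\' and 'n',
# leaving the substitution to 'echo -e' on the SUT.
my $escaped = "echo -e \"1\\n2\"";

# Count the real newline characters in each command string.
my $newlines_interpolated = () = $interpolated =~ /\n/g;
my $newlines_escaped      = () = $escaped      =~ /\n/g;

print "interpolated: $newlines_interpolated newline(s)\n";
print "escaped:      $newlines_escaped newline(s)\n";
```

Only the first string contains a newline character, so only the first one triggers the shell's continuation prompt while it is being typed.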

In general you should be aware that Perl, the guest kernel and the shell may transform whatever character sequence you enter. Transformations can be spotted by comparing the input string with what wait_serial actually finds.
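One way to make such a comparison less tedious is to locate the first position where the two strings diverge. The helper below is a hypothetical sketch, not part of testapi; the received text is hard-coded here and would in practice come from the return value of wait_serial or from the serial log.

```perl
use strict;
use warnings;

# Return the index of the first differing character, or -1 if the
# strings are identical. (Hypothetical helper, not a testapi function.)
sub first_difference {
    my ($sent, $received) = @_;
    my $limit = length($sent) < length($received) ? length($sent) : length($received);
    for my $i (0 .. $limit - 1) {
        return $i if substr($sent, $i, 1) ne substr($received, $i, 1);
    }
    return length($sent) == length($received) ? -1 : $limit;
}

my $sent     = "echo \"1\n2\"";      # what the test typed
my $received = "echo \"1\n> 2\"";    # what the terminal echoed back

my $pos = first_difference($sent, $received);
printf "strings diverge at index %d: expected '%s', got '%s'\n",
    $pos, substr($sent, $pos, 1), substr($received, $pos, 1);
```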

Sending signals - ctrl-c and ctrl-d

On a VNC based console you simply use send_key as follows.

send_key('ctrl-c');

This usually (see termios(3)) has the effect of sending SIGINT to whatever command is running. Most commands terminate upon receiving this signal (see signal(7)).

On a serial terminal console the send_key command is not implemented (see implementation section). So instead the following can be done to achieve the same effect.

type_string('', terminate_with => 'ETX');

The ETX ASCII code means End of Text and usually results in SIGINT being raised. In fact pressing ctrl-c may just be translated into ETX, so you might consider this a more direct method. Also you can use 'EOT' to do the same thing as pressing ctrl-d.

You also have the option of using Perl’s control character escape sequences in the first argument to type_string. So you can also send ETX with:

type_string("\cC");

The terminate_with parameter just exists to display intention. It is also possible to send any character using the hex code like '\x0f' which may have the effect of pressing the magic SysRq key if you are lucky.
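The character codes involved can be verified in plain Perl. The sketch below only inspects the strings and sends nothing to a SUT. It also shows that hex escapes are interpreted only inside double quotes, which matters when passing such a code to type_string.

```perl
use strict;
use warnings;

# ETX (End of Text) is ASCII 0x03, the character ctrl-c usually produces.
my $etx = "\cC";
# EOT (End of Transmission) is ASCII 0x04, the character ctrl-d usually produces.
my $eot = "\cD";

printf "ETX: 0x%02x, EOT: 0x%02x\n", ord($etx), ord($eot);

# Hex escapes are only interpreted inside double quotes.
my $double = "\x0f";    # a single control character
my $single = '\x0f';    # four literal characters: \ x 0 f

printf "double-quoted length: %d, single-quoted length: %d\n",
    length($double), length($single);
```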

The virtio serial terminal implementation

The os-autoinst package supports several types of 'consoles' of which the virtio serial terminal is one. The majority of code for this console is located in consoles/virtio_terminal.pm and consoles/virtio_screen.pm. However there is also related code in backends/qemu.pm and distribution.pm.

You may find it useful to read the documentation in virtio_terminal.pm and virtio_screen.pm if you need to perform some special action on a terminal such as triggering a signal or simulating the SysRq key. There are also some console specific arguments to wait_serial and type_string such as record_output.

The virtio 'screen' essentially reads data from a socket created by QEMU into a ring buffer and scans it after every read with a regular expression. The ring buffer is large enough to hold anything you are likely to want to match against, but not so large as to cause performance issues. Usually the contents of this ring buffer, up to the end of the match, are returned by wait_serial. This means earlier output will be overwritten once the ring buffer’s length is exceeded. However you can pass record_output which saves the output to a separate unlimited buffer and returns that instead.
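The effect of a bounded buffer next to an unbounded record can be modelled in a few lines of Perl. This is a heavily simplified, hypothetical model and not the os-autoinst implementation: the names feed, $RING_SIZE, $ring and $record are invented for the illustration, and the buffer is kept tiny so that the overflow is visible.

```perl
use strict;
use warnings;

my $RING_SIZE = 16;    # tiny on purpose so that overflow is visible
my $ring      = '';    # bounded: only the most recent characters survive
my $record    = '';    # unbounded: filled only when recording is requested

# Append a chunk of SUT output to the buffers.
sub feed {
    my ($chunk, $record_output) = @_;
    $record .= $chunk if $record_output;
    $ring   .= $chunk;
    # Drop the oldest characters once the ring is over capacity.
    $ring = substr($ring, -$RING_SIZE) if length($ring) > $RING_SIZE;
    return;
}

feed("first line\n",  1);
feed("second line\n", 1);

print "ring:   ", $ring   =~ s/\n/\\n/gr, "\n";
print "record: ", $record =~ s/\n/\\n/gr, "\n";
```

After the second chunk the bounded buffer no longer contains the start of the first line, while the record still holds everything that was fed in.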

Like record_output, the no_regex argument is a console specific argument supported by the serial terminal console. It may or may not have some performance benefits, but more importantly it allows you to easily match arbitrary strings which may contain regex escape sequences. To be clear, no_regex hints that wait_serial should just treat its input as a plain string and use the Perl library function index to search for a match in the ring buffer.
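The difference between plain string matching and regular expression matching can be seen outside openQA. The sketch below uses made-up buffer contents and compares Perl's index with a regex built from the same text; it illustrates the idea only and is not the console's actual code.

```perl
use strict;
use warnings;

# Pretend this is what the serial terminal echoed back.
my $buffer = "# ls /tmp/*.log\n";
# The command text to look for; '*' and '.' are regex metacharacters.
my $needle = 'ls /tmp/*.log';

# Plain substring search, similar in spirit to no_regex => 1.
my $plain_pos = index($buffer, $needle);

# Treating the same text as a regex: '/*' now means "zero or more
# slashes" and '.' means "any character", so the literal text is missed.
my $regex_hit = $buffer =~ /$needle/ ? 1 : 0;

# Escaping with \Q...\E makes the regex behave like the plain search.
my $quoted_hit = $buffer =~ /\Q$needle\E/ ? 1 : 0;

print "index position: $plain_pos\n";
print "unescaped regex matched: $regex_hit\n";
print "escaped regex matched:   $quoted_hit\n";
```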

The send_key function is not implemented for the serial terminal console because the openQA console implementation would need to map key actions like ctrl-c to a character and then send that character. This may mislead some people into thinking they are actually sending ctrl-c to the SUT. It also requires openQA to choose what character ctrl-c represents, which varies across terminal configurations.

Very little of the code (perhaps none) is specific to a virtio based serial terminal, so it can be reused with a physical serial port, SSH socket, IPMI or some other text based interface. It is called the virtio console because the current implementation just uses a virtio serial device in QEMU (and it could easily be converted to an emulated port). It otherwise has nothing to do with the virtio standard, so you should avoid using the name 'virtio console' unless specifically referring to the QEMU virtio implementation.

As mentioned previously, ANSI escape sequences can be a pain. So we try to avoid them by informing the shell that it is running on a 'dumb' terminal (see the SUSE distribution’s serial terminal utility library). Some programs ignore this, but piping their output into tee is usually enough to stop them from outputting non-printable characters.

Test Development tricks

Modifying settings of an existing test

There is no interface to modify existing tests but the clone_job.pl script can be used to create a new job that adds, removes or changes settings. This script is located at /usr/share/openqa/script/.

/usr/share/openqa/script/clone_job.pl --from localhost --host localhost 42 FOO=bar BAZ=

If you do not want a cloned job to start up in the same job group as the job you cloned from, e.g. to not pollute build results, the job group can be overwritten, too, using the special variable _GROUP. Add the quoted group name, e.g.:

clone_job.pl --from localhost 42 _GROUP="openSUSE Tumbleweed"

The special group value 0 means that the group connection will be separated and the job will not appear as a job in any job group, e.g.:

clone_job.pl --from localhost 42 _GROUP=0

Backend variables for faster test execution

The os-autoinst backend offers multiple test variables which are helpful for test development. For example:

  • Set _EXIT_AFTER_SCHEDULE=1 if you only want to evaluate the test schedule before the test modules are executed

  • Use _SKIP_POST_FAIL_HOOKS=1 to prevent lengthy post_fail_hook execution in case of expected and known test failures, for example when you need to create needles anyway

Using snapshots to speed up development of tests

For lower turn-around times during test development based on virtual machines, the QEMU backend provides a feature that allows a job to start from a snapshot.

Depending on the use case, there are two options to help:

  • Create and preserve snapshots for every test module run (MAKETESTSNAPSHOTS)

    • Offers more flexibility as the test can be resumed almost at any point. However disk space requirements are high (expect more than 30GB for one job)

    • This mode is useful for fixing non-fatal issues in tests and debugging SUT as more than just the snapshot of the last failed module is saved.

  • Create a snapshot after every successful test module while always overwriting the existing snapshot to preserve only the latest (TESTDEBUG)

    • Allows skipping to just before the start of the first failed test module, which can be limiting, but preserves disk space in comparison to MAKETESTSNAPSHOTS.

    • This mode is useful for iterative test development

In both modes there is no need to modify tests (e.g. by adding the milestone test flag), as the behaviour is implied. In the latter mode every test module is also considered fatal. This means the job is aborted after the first failed test module.

Enable snapshots for each module

  • Run the worker with --no-cleanup parameter. This will preserve the hard disks after test runs.

  • Set MAKETESTSNAPSHOTS=1 on a job. This will make openQA save a snapshot for every test module run. One way to do that is by cloning an existing job and adding the setting:

clone_job.pl --from https://openqa.opensuse.org --host localhost 24 MAKETESTSNAPSHOTS=1

  • Create a job again, this time setting the SKIPTO variable to the snapshot you need. Again, clone_job.pl comes in handy here:

clone_job.pl --from https://openqa.opensuse.org --host localhost 24 SKIPTO=consoletest-yast2_i

  • Use qemu-img snapshot -l something.img to find out what snapshots are in the image. Snapshots are named "test module category"-"test module name" (e.g. installation-start_install).

Storing only the last successful snapshot

  • Run the worker with --no-cleanup parameter. This will preserve the hard disks after test runs.

  • Set TESTDEBUG=1 on a job. This will make openQA save a snapshot after each successful test module run. Snapshots are overwritten. The snapshot is named lastgood in all cases.

clone_job.pl --from https://openqa.opensuse.org --host localhost 24 TESTDEBUG=1

  • Create a job again, this time setting the SKIPTO variable to the test module which failed on the previous run. Make sure the new job also has TESTDEBUG=1 set. This can be ensured by running the clone_job script on the clone source job or by specifying the variable explicitly:

clone_job.pl --from https://openqa.opensuse.org --host localhost 24 TESTDEBUG=1 SKIPTO=consoletest-yast2_i