RTM - Regression Testing Manager
by Robert McLay
This document describes how to get started with the RTM testing manager. It is split into two parts. The first part describes how to install the Themis package, of which "rtm" is a part. The second part explains how to use the tool by way of an example.
What rtm provides is a way to run tests and easily know which tests passed, failed, or diffed. It is easy to run all tests, a single test, or a selected subset. Having an easy way to run tests makes refactoring much easier. rtm also completely captures the inputs, the outputs, and how to run your program. This is a convenient way to pass on knowledge of how your program works, either to others or to yourself six months to a year later when you can't remember how things work.
The Themis package is a collection of tools. The main part is "rtm", the testing manager; the rest are the tools necessary to run it. The package is written in python.
The Themis package is simple to install. Take the attached tar ball and untar it where you wish it installed:
$ tar xzf themis-x.y.z.tar.gz
then add themis-x.y.z/bin to your PATH environment variable, where x.y.z is the version number, such as 1.9.1.
The idea behind rtm is to provide a general testing manager that is independent of the program(s) under test. A test description file (*.desc) contains the information required to run each test. That file includes a section containing a (Bourne) shell script that runs your test. In order for "rtm" to know whether your test passed, your script must write the outcome to a particular file.
Probably the best way to explain rtm is by way of an example, which lives in the themis-x.y.z/rtm/doc directory. So change your directory to there and let's start.
$ cd themis-x.y.z/rtm/doc
$ make
The make command will build two programs: testprog and diffprog. The testprog program is there to represent a program under test and the diffprog is there as an example of a program to compute the difference between a "gold" solution file and a test one.
The testprog.c file is a very simple program that writes a file containing the numbers 0 to 99 with a little random noise added. The diffprog.c file takes two files and compares them. The diffprog program takes several arguments:
$ diffprog result.csv 1.0e-6 gold_file test_file
The first argument is the name of the csv file that tells rtm how your program did. The second argument is the tolerance used in the comparison. The final two arguments are the gold file, which holds the known good solution, and the test file.
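A minimal python sketch of what a diff program like diffprog might do; the file layout (whitespace-separated numbers) and the function name are assumptions based on the description above, not taken from diffprog.c:

```python
import sys

def compare(result_fn, tol, gold_fn, test_fn):
    # Write "passed", "diff", or "failed" to result_fn, as rtm expects:
    # "failed" if the test output is missing or unreadable, "diff" if
    # any value differs from the gold file by more than tol.
    status = "passed"
    try:
        gold = [float(x) for x in open(gold_fn).read().split()]
        test = [float(x) for x in open(test_fn).read().split()]
        if len(gold) != len(test) or \
           any(abs(g - t) > tol for g, t in zip(gold, test)):
            status = "diff"
    except (OSError, ValueError):
        status = "failed"
    with open(result_fn, "w") as f:
        f.write("# result written for rtm\n%s\n" % status)
    return status

if __name__ == "__main__" and len(sys.argv) == 5:
    # usage: diffprog.py result.csv tol gold_file test_file
    compare(sys.argv[1], float(sys.argv[2]), sys.argv[3], sys.argv[4])
```

The key point is only the contract: the first argument names the status file, and the status is one of the three words rtm understands.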
The result.csv file looks something like this:
passed
This is actually a simple csv file. Lines that start with a `#' are comment lines. The "passed" line should be either "passed", "failed", or "diff", depending on how your program did.
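Reading such a file back follows directly from the format: skip comments and blank lines, and the first remaining line is the status. A sketch (the helper name is mine, not part of Themis):

```python
def read_status(fn):
    # Return the first non-comment, non-blank line of a result.csv-style
    # file; lines starting with '#' are comments, per the format above.
    with open(fn) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#"):
                return line
    return "failed"   # treat an empty file as a failure
```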
The test directory structure can be quite flexible. I usually create a directory named "rt" for regression tests and place the tests under "rt". I place each test in a separate directory, although that is not required. The test directories can be nested, so similar tests can be grouped together. Generally, I recommend that tests live in leaf directories rather than branch directories.
As an example, the test directory structure given here is:
.:
Themis.py diffprog.c rt/
Makefile testprog.c
./rt:
diff_test/ fail_test/ simple_test/ multiple_tests/
./rt/diff_test:
diff_test.desc testprog.gold
./rt/fail_test:
fail_test.desc testprog.gold
./rt/simple_test:
simple_test.desc testprog.gold
./rt/multiple_tests:
multiple_tests.desc testprog.gold
In the top-level directory we have the Makefile and the two .c programs, along with the rt directory. There is also a file called "Themis.py", which is necessary for rtm to work. You need to create a zero-length file with that name above the directory that contains the tests.
So here we have four test directories underneath the rt directory, each with a test description file (*.desc) and a gold results file (testprog.gold).
Each of the *.desc files is similar, so let's look at the simple_test.desc file in detail.
# -*- python -*-
test_name = "simple_test"
test_descript = {
'description' : "a simple test",
'keywords' : [ "simple", test_name,],
'active' : True,
'test_name' : test_name,
'run_script' : """
PATH=$(projectDir):$PATH; export PATH
testprog testprog.soln
diffprog result.csv $(tol) $(testDir)/testprog.gold testprog.soln
finishTest -o $(resultFn) -t $(runtimeFn) result.csv
""",
'tests' : [
{ 'id' : 't1', 'tol' : 1.01e-6},
],
}
The .desc file is a valid python program. Here we are setting key-value pairs in the dictionary "test_descript" that describe the test. As you can see, several keywords are specified so that rtm can run the test. In python, upper- and lower-case variable names are distinct, so your test files need to match the case and spelling used here.
The file shown here has some required and some optional keys and values. Below is a table describing each:

Key          Required  Description
-----------  --------  -----------
description  no        An optional place to describe the test.
keywords     no        A list of keywords so that a test can be selected.
active       yes       Tells if the test is active. Set the value to
                       False to mark the test as inactive.
test_name    yes       A unique test name.
run_script   yes       A parameterized shell script for running the test.
tests        yes       An array of one or more tests. Each array entry
                       must set the "id" key.
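Since a .desc file is just python, the table above can be checked mechanically. A hedged sketch (the helper is mine, not part of Themis):

```python
REQUIRED_KEYS = ("active", "test_name", "run_script", "tests")

def check_desc(test_descript):
    # Return a list of problems; an empty list means the required keys
    # from the table above are present and every tests entry has an "id".
    problems = ["missing key: %s" % k
                for k in REQUIRED_KEYS if k not in test_descript]
    for entry in test_descript.get("tests", []):
        if "id" not in entry:
            problems.append("tests entry missing 'id'")
    return problems
```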
Later I'll describe how to run rtm with all of its features. For the moment, let's see how a single test is run. By changing your directory to "simple_test" we can run that individual test:
$ rtm .
The period tells rtm to look for all the test description files from the current directory on down. In this case there is only the simple_test.desc file. The output will look like this:
Starting Tests:
Started : 22:00:00 tst: 1/1 P/F: 0:0, rt/simple_test/simple_test/t1
passed : 22:00:00 tst: 1/1 P/F: 1:0, rt/simple_test/simple_test/t1
Finished Tests
**********************************************************************
*** Test Results ***
**********************************************************************
Date: Mon Jun 16 22:00:00 2014
TARGET: x86_64_dbg
Themis Version: Themis 0.3
Total Test Time: 00:00:00.06
**********************************************************************
*** Test Summary ***
**********************************************************************
Total: 1
passed: 1
******* * **** ********* ***************
Results R Time Test Name version/message
******* * **** ********* ***************
passed R 0.061 rt/simple_test/simple_test/t1
The output has two parts. The first part runs from "Starting Tests" to "Finished Tests". For each test it reports the start and end times, a running count of passed and failed tests, and the test name.
The test name looks like a path, but it is not. It starts with the path of the test directory relative to the directory that contains the "Themis.py" file, in this case "rt/simple_test". That is followed by the base name of the test file (simple_test from simple_test.desc) and the id tag, in this case "t1". The result is "rt/simple_test/simple_test/t1".
The second part is the report of results. There is header information about when the test was run and version information. Finally, there is a report on the individual tests. If any tests fail or diff, a list of them is given; I'll show that later.
An individual *.desc file is run as follows. The id value is used to name the test. Any other keywords listed in the entry are added to the list of parameters used in the substitution process (e.g. tol = 1.01e-6). The lines given for the run_script keyword are used to build the actual script.

The actual steps to run an individual *.desc file are:
- The id value is used to name the test.
- The output directory for running the test is created as a subdirectory next to the .desc file. Its name is $id/$date_name, where $id is the id value and $date_name has the form date-os-arch-testName (e.g. "t1/2010_02_22_14_58_04-Linux-x86_64-simple_test").
- The run_script value is converted to an actual shell script with the parameters substituted. The table below gives the list of predefined names.
- The generated script is run in the output directory.
- The output directory is checked for the result file and the runtime file to report the status of the test.
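The output-directory naming from the second step can be sketched as follows; the exact format rtm uses may differ, this just mirrors the example name above:

```python
import platform
import time

def output_dir(id_tag, test_name, when=None):
    # Build a name of the assumed form $id/$date_name, e.g.
    # "t1/2010_02_22_14_58_04-Linux-x86_64-simple_test".
    stamp = time.strftime("%Y_%m_%d_%H_%M_%S", time.localtime(when))
    return "%s/%s-%s-%s-%s" % (id_tag, stamp, platform.system(),
                               platform.machine(), test_name)
```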
There is a list of predefined parameters that are used in the expansion of the run_script value into the script:
Key Value
----------- ----------
idTag The id value
outputDir The absolute path to the output directory
projectDir The absolute path to the directory that has
Themis.py in it.
resultFn The absolute path to the results file ($id.results)
runtimeFn The absolute path to the runtime file ($id.runtime)
testDir The absolute path to the directory that has the
*.desc file in it.
testName The test_name value given in the .desc file.
tesdescriptFn The absolute path to the .desc file.
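One way the $(name) substitution could work is sketched below; this is an illustration, not rtm's actual implementation:

```python
import re

def expand(template, params):
    # Replace each $(name) with its value from params; unknown names
    # are left as-is so the mistake is visible in the generated script.
    def repl(m):
        name = m.group(1)
        return str(params[name]) if name in params else m.group(0)
    return re.sub(r"\$\(([A-Za-z_]\w*)\)", repl, template)

line = "diffprog result.csv $(tol) $(testDir)/testprog.gold testprog.soln"
print(expand(line, {"tol": 1.01e-6, "testDir": "/home/user/rt/simple_test"}))
```

With those parameters the $(tol) and $(testDir) markers become the tolerance and the absolute test-directory path, giving the command line that actually runs.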
The run_script value is repeated here:

'run_script' : """
PATH=$(projectDir):$PATH; export PATH
testprog testprog.soln
diffprog result.csv $(tol) $(testDir)/testprog.gold testprog.soln
finishTest -o $(resultFn) -t $(runtimeFn) result.csv
""",
What the script does should now be clear. Line 1 adds the project directory, which has the testprog and diffprog programs, to the PATH. Line 2 runs the test program (testprog). Line 3 runs "diffprog" to compare the results from "testprog" with the gold file in $(testDir). Line 4 is important for rtm to know what happened: the finishTest program converts the result.csv file into the results file that rtm needs. All of your run_scripts MUST end with that program; otherwise rtm will assume that the test program was not run.
So running "rtm ." in the simple_test directory ran one test. If we change our directory up one to the rt directory, we can run all the tests:
$ rtm .
You should see output similar to this:
Starting Tests:
Started : 21:28:12 tst: 1/5 P/F: 0:0, rt/simple_test/simple_test/t1
passed : 21:28:12 tst: 1/5 P/F: 1:0, rt/simple_test/simple_test/t1
Started : 21:28:12 tst: 2/5 P/F: 1:0, rt/fail_test/fail_test/t1
failed : 21:28:12 tst: 2/5 P/F: 1:1, rt/fail_test/fail_test/t1
Started : 21:28:12 tst: 3/5 P/F: 1:1, rt/multiple_tests/multiple_tests/t2
diff : 21:28:12 tst: 3/5 P/F: 1:2, rt/multiple_tests/multiple_tests/t2
Started : 21:28:12 tst: 4/5 P/F: 1:2, rt/diff_test/diff_test/t1
diff : 21:28:12 tst: 4/5 P/F: 1:3, rt/diff_test/diff_test/t1
Started : 21:28:12 tst: 5/5 P/F: 1:3, rt/multiple_tests/multiple_tests/t1
passed : 21:28:12 tst: 5/5 P/F: 2:3, rt/multiple_tests/multiple_tests/t1
Finished Tests
*** Test Results ***
Date: Mon Jun 16 21:28:12 2015
TARGET: x86_64_dbg
Themis Version: Themis 0.3
Total Test Time: 00:00:00.30
*** Test Summary ***
Total: 5
failed: 1
passed: 2
diff: 2
Results R Time Test Name version/message
passed R 0.064 rt/multiple_tests/multiple_tests/t1
passed R 0.060 rt/simple_test/simple_test/t1
diff R 0.059 rt/diff_test/diff_test/t1
diff R 0.059 rt/multiple_tests/multiple_tests/t2
failed R 0.061 rt/fail_test/fail_test/t1
Results Output Directory
diff    rt/diff_test/t1/x86_64-2015_06_16_21_28_12-Darwin-x86_64-diff_test
diff    rt/multiple_tests/t2/x86_64-2015_06_16_21_28_12-Darwin-x86_64-multiple_tests
failed  rt/fail_test/t1/x86_64-2015_06_16_21_28_12-Darwin-x86_64-fail_test
We see that two of the five tests passed, one test failed, and two tests diffed. Let's look at the three problem children.

The diff_test has a very similar .desc file, but if we look at the testprog.gold file we see that the first value, which should be 0, has been changed to 123. This is a large difference between the "gold" and the "test" values, so the test "diff'ed".
The fail_test.desc shows that I've removed the output file from "testprog" so the "diffprog" program states that the test failed:
'run_script' : """
PATH=$(projectDir):$PATH; export PATH
testprog testprog.soln
rm testprog.soln
diffprog result.csv $(tol) $(testDir)/testprog.gold testprog.soln
finishTest -o $(resultFn) -t $(runtimeFn) result.csv
""",
Finally, the multiple_tests test shows that the "t1" test passed and the "t2" test diffed. We can see why by looking at the run_script and tests parts:
'run_script' : """
PATH=$(projectDir):$PATH; export PATH
testprog testprog.soln
diffprog result.csv $(tol) $(testDir)/testprog.gold testprog.soln
finishTest -o $(resultFn) -t $(runtimeFn) result.csv
""",
'tests' : [
    { 'id' : 't1', 'tol' : 1.01e-6},
    { 'id' : 't2', 'tol' : 1.01e-12},
],
So the "t1" test has a reasonable tolerance of 1.01e-6, whereas the "t2" test has an impossibly tight tolerance of 1.01e-12, so that test diffs.
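The numbers can be checked directly. The sketch below mimics testprog's output; the 1e-7 noise scale is an assumption for illustration, not taken from testprog.c:

```python
import random

random.seed(0)
# Mimic testprog: the values 0..99 plus a little random noise.
noise = 1.0e-7
gold = [float(i) for i in range(100)]
test = [i + random.uniform(-noise, noise) for i in range(100)]

max_diff = max(abs(g - t) for g, t in zip(gold, test))
print(max_diff <= 1.01e-6)    # the loose tolerance absorbs the noise
print(max_diff <= 1.01e-12)   # the tight tolerance cannot
```

With noise of this size, the first comparison holds and the second does not, which is exactly the t1-passes, t2-diffs behavior above.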
Once you have run tests you'll wish to clean up the results. To make that simple you can run the command:
$ clr
and it will remove all the generated files and directories.
If you find that there are "diff'ed" and "failed" tests, you can rerun only those tests. You can run
$ rtm -r wrong
to rerun both the failed and diff'ed tests.
You can also run only those tests that have a certain keyword:
$ rtm -k multiple .
will run the multiple_tests.desc file (assuming that you are in the "rt" directory), because only the multiple_tests.desc file has the "multiple" keyword defined.
Whereas:
$ rtm -k single .
will run the simple_test.desc, diff_test.desc, and fail_test.desc files, because each has the "single" keyword defined.
Note that the keywords are case sensitive so "Single" and "single" and "SINGLE" are different keys.
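The keyword selection above amounts to a case-sensitive membership test against each test's keywords list. A sketch (not rtm's actual code), using the keywords shown in simple_test.desc:

```python
def select_tests(descs, keyword):
    # Case-sensitive keyword match, mirroring the -k selection above.
    return [d["test_name"] for d in descs
            if keyword in d.get("keywords", [])]

descs = [
    {"test_name": "simple_test",    "keywords": ["simple", "simple_test"]},
    {"test_name": "multiple_tests", "keywords": ["multiple", "multiple_tests"]},
]
print(select_tests(descs, "multiple"))   # → ['multiple_tests']
print(select_tests(descs, "Multiple"))   # → []
```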
This discussion covers the basic uses of "rtm". If you have questions, please send them to mclay@tacc.utexas.edu.