CANNIER-Framework is a command-line tool that automates the empirical evaluation for the ESE paper "Empirically Evaluating Flaky Test Detection Techniques Combining Test Case Rerunning and Machine Learning Models". Part of its job is to run pytest-CANNIER automatically.
The dependencies of CANNIER-Framework are listed in requirements.txt. It also requires git, docker, and virtualenv to be installed on the system. We have only tested CANNIER-Framework on Ubuntu 20.04 and Python 3.8. We cannot guarantee correct results in other environments.
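The requirements above can be checked before installing. The following is a minimal sketch, not part of CANNIER-Framework: the helper names missing_tools and is_tested_python are hypothetical, and it assumes each tool is exposed as an executable on PATH (virtualenv installed only as a Python module would be reported as missing).

```python
import shutil
import sys

# System tools the README lists as requirements.
REQUIRED_TOOLS = ("git", "docker", "virtualenv")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the required executables that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def is_tested_python(version_info=sys.version_info):
    """Return True if the interpreter is Python 3.8, the only tested version."""
    return tuple(version_info[:2]) == (3, 8)


if __name__ == "__main__":
    for tool in missing_tools():
        print(f"missing: {tool}")
    if not is_tested_python():
        print("warning: CANNIER-Framework has only been tested on Python 3.8")
```

The which parameter exists only so the check can be exercised without touching the real PATH.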
You can install CANNIER-Framework with pip install PATH, where PATH is the directory containing setup.py. This will also install the dependencies.
You can use CANNIER-Framework with cannier COMMAND *ARGS. COMMAND can be one of:
setup
Set up the subject projects as part of the build stage for the canner-experiment Docker image (see the CANNIER-Experiment repository for more details). This command is not for manual use.
manage
Execute a project's test suite with pytest-CANNIER inside a canner-experiment Docker container. ARGS must provide the name of the project, the mode for pytest-CANNIER, a unique number to differentiate this container from other containers for the same project and mode, the name of the victim test case when the mode is victim (empty string otherwise), and the commands required to execute the test suite (typically python -m pytest). This command is not for manual use.
run
Start containers to run CANNIER-Framework with the manage command for every project and the modes specified by ARGS. The containers running churn must finish before those running features. The containers running baseline and shuffle must finish before those running victim.
collate
Collate the outcome and feature data recorded by pytest-CANNIER.
shap
Train a machine learning model and apply the SHAP technique for each of the four flaky test classification problems described in the paper. Can only be used after collate.
preds
Execute machine learning pipelines to produce predicted probabilities for every test case. Can only be used after collate. Offers the following subcommands:
  config
  Execute the 96 pipelines with a feature sample size of one to address the first part of RQ1.
  best
  Execute the best pipelines from config with feature sample size values from two to 15 to address RQ2/4.
  features
  Execute the best pipelines from config with just the top 15, 12, 9, 6, and 3 most impactful features to address RQ3.
points
Find the Pareto front of detection performance and time cost for the three rerunning-based flaky test detection techniques described in the paper. Can only be used after preds.
figures
Generate the data for the tables and figures in the paper. Can only be used after points.
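The ordering constraints in the command descriptions can be written down as a small dependency graph. The sketch below is illustrative only and does not invoke cannier; the names COMMAND_PREREQS, MODE_PREREQS, and valid_order are hypothetical, and the edge from run to collate is an assumption based on collate reading the data that run records.

```python
# Prerequisites stated in the command descriptions. The "collate" -> "run"
# dependency is an assumption: collate reads the data recorded during run.
COMMAND_PREREQS = {
    "run": [],
    "collate": ["run"],
    "shap": ["collate"],
    "preds": ["collate"],
    "points": ["preds"],
    "figures": ["points"],
}

# Ordering constraints between pytest-CANNIER modes within "run".
MODE_PREREQS = {
    "churn": [],
    "features": ["churn"],
    "baseline": [],
    "shuffle": [],
    "victim": ["baseline", "shuffle"],
}


def valid_order(prereqs):
    """Return one ordering in which every item follows its prerequisites."""
    order, done = [], set()

    def visit(item, stack=()):
        if item in done:
            return
        if item in stack:
            raise ValueError(f"cyclic prerequisite involving {item!r}")
        for dep in prereqs[item]:
            visit(dep, stack + (item,))
        done.add(item)
        order.append(item)

    for item in prereqs:
        visit(item)
    return order


if __name__ == "__main__":
    print(valid_order(COMMAND_PREREQS))
    print(valid_order(MODE_PREREQS))
```

Any ordering that respects these edges is acceptable; for example, shap and preds both depend only on collate, so either can come first.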
CANNIER-Framework also offers the following options:
--processes={PROCESSES}
Maximum number of parallel processes to use (default is the result of calling os.cpu_count).
--timeout={TIMEOUT}
Maximum run time for containers in seconds (default 28800).
--n-repeats={N_REPEATS}
Number of test suite runs with pytest-CANNIER for each project when the mode is features and the number of times to repeat model training and evaluation (default 30).
--n-reruns={N_RERUNS}
Number of test suite runs with pytest-CANNIER for each project when the mode is either baseline or shuffle (default 2,500).
The output of CANNIER-Framework depends on COMMAND:
run
A directory named volume with subdirectories for each subject project. These will contain an SQLite database with the results of pytest-CANNIER.
collate
This will produce three files:
  items.npy
  A NumPy array with shape (N_TESTS, 7), where N_TESTS is the total number of test cases across all projects. From left to right, the columns indicate: which project the test case is from (an integer id), the number of times the test case failed in the baseline mode of pytest-CANNIER, the number of times the test case failed in the shuffle mode, whether the test case is NOD flaky (0 = false, 1 = true), whether the test case is a victim, and whether the test case is relevant to the NOD-vs-Victim flaky test classification problem.
  features.npy
  A NumPy array with shape (N_TESTS, N_REPEATS, 18) containing the N_REPEATS sets of the 18 test case features measured by pytest-CANNIER in the features mode.
  dependencies.pkl
  A pickle file containing a list of boolean NumPy arrays for each project. The arrays are packed with numpy.packbits and can be unpacked with numpy.unpackbits. Once an array is unpacked, its shape is (N_TESTS_PROJ, N_TESTS_PROJ), where N_TESTS_PROJ is the number of test cases in the project. The value at [i, j] indicates whether test case j is a polluter of test case i.
shap
A directory named shap containing the SHAP value matrix for each classification problem as a NumPy array with shape (N_TESTS, 18).
preds
A directory named preds containing the predicted probabilities for each test case from the machine learning pipelines as a NumPy array. The arrays are named {PROBLEM}_{N_FEATURES}_{MODEL_TYPE}_{N_TREES}_{BALANCING}_{N_SAMPLES}.npy, where PROBLEM is the classification problem, N_FEATURES is the number of features used to encode a test case, MODEL_TYPE is the type of machine learning model (RandomForest/ExtraTrees), N_TREES is the number of decision trees used by the model, BALANCING is the data balancing technique (SMOTE/SMOTE+ENN/SMOTE+Tomek), and N_SAMPLES is the feature sample size. Each array has the shape (N_TESTS, N_REPEATS). A given row contains the N_REPEATS predicted probabilities of the test case being in the positive class of PROBLEM.
points
A directory named points containing the detection performance, time cost, and parameters of the points on the Pareto fronts for the three rerunning-based flaky test detection techniques as a NumPy array.
figures
Directories named tables and plots containing LaTeX code.
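Two of the output formats above need a little handling before use: the bit-packed matrices in dependencies.pkl and the file names in the preds directory. The sketch below uses synthetic data rather than real output. The helper names unpack_dependencies and parse_preds_name are hypothetical, the example file name is made up, and the unpacking assumes each matrix was flattened before being packed, which the description above does not confirm.

```python
import numpy as np


def unpack_dependencies(packed, n_tests_proj):
    """Rebuild a boolean (n, n) polluter matrix from packed bits.

    Assumes the matrix was flattened before packing; `count` drops the
    padding bits that numpy.packbits adds to fill the last byte.
    """
    bits = np.unpackbits(packed, count=n_tests_proj * n_tests_proj)
    return bits.reshape(n_tests_proj, n_tests_proj).astype(bool)


def parse_preds_name(filename):
    """Split a preds file name into its six documented fields."""
    stem = filename[: -len(".npy")] if filename.endswith(".npy") else filename
    # Split from the right so that PROBLEM may itself contain underscores.
    problem, n_features, model_type, n_trees, balancing, n_samples = stem.rsplit("_", 5)
    return {
        "problem": problem,
        "n_features": int(n_features),
        "model_type": model_type,
        "n_trees": int(n_trees),
        "balancing": balancing,
        "n_samples": int(n_samples),
    }


# Synthetic stand-in for one project's entry in dependencies.pkl.
matrix = np.zeros((3, 3), dtype=bool)
matrix[0, 2] = True  # test case 2 is a polluter of test case 0
packed = np.packbits(matrix)
restored = unpack_dependencies(packed, 3)
polluters_of_0 = np.flatnonzero(restored[0])

# Made-up file name that follows the documented pattern.
fields = parse_preds_name("NOD_18_ExtraTrees_100_SMOTE+Tomek_1.npy")
```

For real data, N_TESTS_PROJ is not stored alongside the packed bits in this sketch, so it would have to come from elsewhere, for example by counting the rows of items.npy that share a project id.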
CANNIER-Framework has its own pytest test suite. To execute it, you must pass the --schema-file={SCHEMA_FILE} option, where SCHEMA_FILE is the path to the schema file for the database. This can be found in the CANNIER-Experiment repository.