Sampling Effect on Performance Prediction of Configurable Systems: A Case Study (Artifact)

In this repository, we provide general instructions on how to reproduce the results of the paper "Sampling Effect on Performance Prediction of Configurable Systems: A Case Study" and reuse our datasets of measurements.

DOI: 10.5281/zenodo.3614085

Overview

[Figure: overview of the study]

In the figure above, we give an overview of our study. We replicate a recent study through an in-depth analysis of x264, a popular and configurable video encoder. We systematically measure 1,152 configurations of x264 with 17 different input videos and two quantitative properties (encoding time and encoding size). Our goal is to understand whether one sampling strategy (e.g., random, coverage-based, distance-based) dominates the others over the very same subject system (x264), regardless of the workload and the targeted performance property.

Publication

Juliana Alves Pereira, Mathieu Acher, Hugo Martin, Jean-Marc Jézéquel. Sampling Effect on Performance Prediction of Configurable Systems: A Case Study. International Conference on Performance Engineering (ICPE), ACM, April 2020. Pages 277–288. Best paper award.

Data

In this repository, we provide two directories: Distance-Based_Data_Time and Distance-Based_Data_Size, which contain the data related to time and size prediction, respectively. Each of them provides two main subdirectories: MeasuredPerformanceValues and PerformancePredictions.

Measured Performance Values

This directory contains the feature model and the measurements of all 17 analysed input videos, as well as a file containing the description of the input videos we used to measure all valid configurations. To perform additional experiments with a new case study case_x, add the case-study name case_x to the files SPLConquerorExecuter.py and analyzeRuns.py, and place the files FeatureModel.xml and measurements.xml in a new folder case_x inside the directory Distance-Based_Data_Time/SupplementaryWebsite/MeasuredPerformanceValues/ (as an example, see these files for the x264 case study; more information about the format of these files can also be found at SPLConqueror).
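For illustration, adding a new case study would result in the following layout (case_x is a placeholder name):

  Distance-Based_Data_Time/SupplementaryWebsite/MeasuredPerformanceValues/
  └── case_x/
      ├── FeatureModel.xml
      └── measurements.xml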

Performance Predictions

This directory contains a set of prediction log files with the experiment results (i.e., error rate) of the different sampling approaches and input videos. Each sampling error rate is computed 100 times using different random seeds for each input video.

Step-by-Step Instructions

Our experiments consist of two main steps:

  1. performance prediction
  2. aggregation and visualization of results

1. Performance Prediction

To reproduce our results, we rely on the Docker container from the Distance-Based Sampling repository. For more information on this container, we refer to its documentation. To set up the Docker container, follow these steps:

  • Install Docker (use, e.g., systemctl status docker to make sure the Docker daemon is running).

  • Download the image: docker pull hmartinirisa/icpe2020:latest (this downloads all required resources, which might take several minutes).

  • Run the container: sudo docker run -it -v "$(pwd)":/docker hmartinirisa/icpe2020 bash
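As an optional sanity check, you can verify that the image is available and that a command runs inside the container (assuming the image uses the default shell entrypoint, as the bash invocation above suggests):

  • sudo docker images | grep icpe2020
  • sudo docker run --rm hmartinirisa/icpe2020 echo "container OK"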

Inside the Docker container, go to either the directory Distance-Based_Data_Time or Distance-Based_Data_Size to perform the sampling and learning processes:

  • cd /ICPE2020/Distance-Based_Data_Time or cd /ICPE2020/Distance-Based_Data_Size

Then, for each <sampling-approach> (twise, solvBased, henard, distBased, divDistBased, and rand) and each <case-study> (one of the 17 input videos, e.g., x264_0), run the following Python script:

  • ./SPLConquerorExecuter.py <case-study> <sampling-approach> <save-location>

By default, the experiments will run for 100 random seeds; a start and end seed can be passed as two additional arguments (see step 3 of the usage example below).
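To launch all sampling approaches for one case study in a single pass, a loop such as the following can be used; a minimal sketch, using the approach identifiers listed above, where the case study x264_0 and the save location /docker/results are placeholders:

  # x264_0 and /docker/results are placeholders; adapt them to your setup
  for s in twise solvBased henard distBased divDistBased rand; do
      ./SPLConquerorExecuter.py x264_0 "$s" /docker/results/"$s"
  done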

Files

  • learn_<sampling-approach>_<sampling-size>.a: file containing the list of all input commands used to run SPL Conqueror.
  • learnAll.a: file containing a super-script that invokes multiple .a files.
  • out_<sampling-approach>_<sampling-size>.log: file containing the error rate of applying a sampling strategy with a certain sample size to an input video (the error rate is the last number in the last line before "Analyze finished").
  • sampledConfigurations_<sampling-approach>_<sampling-size>.csv: file containing the set of configurations used as the sample; this sample set is used as input for the machine-learning technique.
  • out_<sampling-approach>_<sampling-size>.log_error: file containing the error(s) raised during the execution (if any).
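Since the error rate is the last number in the last line before "Analyze finished", it can be extracted from a log with a one-liner; a sketch, assuming the log file is named out_divDistBased_10.log (a placeholder):

  # print the line preceding the match, then take its last field
  grep -B 1 "Analyze finished" out_divDistBased_10.log | head -n 1 | awk '{print $NF}'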

2. Aggregation and Visualization of Results

analyzeRuns.py and ErrorRateTableCreator.py are the main scripts to aggregate and visualize the results. analyzeRuns.py collects all error rates from all 100 runs of all case studies in a single file all_error_<sampling-approach>_<sampling-size>.txt in the output directory. Then, the script ErrorRateTableCreator.py reads the generated file and invokes the R script PerformKruskalWallis.R to perform the significance tests (e.g., Kruskal-Wallis, Mann-Whitney U) on the collected error rates, and automatically creates tex files that can be compiled using LaTeX to generate Tables 2-8 shown in the paper.

  • ./analyzeRuns.py <run-directory> <output-directory>
  • ./ErrorRateTableCreator.py <input-directory> <sampling-approaches> <labels> <output-tex>

<run-directory> is the directory where all runs of all case studies are stored. <output-directory> and <input-directory> are the directories where the aggregated results should be written to and read from, respectively. <sampling-approaches> and <labels> contain the list of sampling approaches to consider and the labels that should be used in the table. <output-tex> is the directory where the tex files should be written to.

Files

  • all_error_<sampling-approach>_<sampling-size>.txt: file containing the aggregated error rates from all 100 runs of all case studies.
  • <significance-tests>Table.tex: multiple standalone tex files that contain the summarized tables from the paper. These files can be compiled using LaTeX.
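Each generated table can be rendered on its own, for example with pdflatex (the file name below is a placeholder following the pattern above):

  • pdflatex <significance-tests>Table.tex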

Usage Example (prediction of time for the input video x264_0)

For a better demonstration of the usage, we show it exemplarily for the diversified distance-based sampling approach, the input video x264_0, and the non-functional property time (for 10 random seeds; see the final parameters of the command line in step 3). The measured performance values are located in Distance-Based_Data_Time/SupplementaryWebsite/MeasuredPerformanceValues/.

  1. sudo docker run -it -v "$(pwd)":/docker hmartinirisa/icpe2020 bash
  2. cd ICPE2020/Distance-Based_Data_Time
  3. ./SPLConquerorExecuter.py x264_0 divDistBased /docker/ICPE2020/Distance-Based_Data_Time/SupplementaryWebsite/PerformancePredictions/AllExperiments 1 10
  4. ./analyzeRuns.py /docker/ICPE2020/Distance-Based_Data_Time/SupplementaryWebsite/PerformancePredictions/AllExperiments/ /docker/ICPE2020/Distance-Based_Data_Time/SupplementaryWebsite/PerformancePredictions/AllSummary/
  5. Create a folder to store the generated LaTeX files: mkdir latex_files
  6. ./ErrorRateTableCreator.py /docker/ICPE2020/Distance-Based_Data_Time/SupplementaryWebsite/PerformancePredictions/AllSummary/ "twise,solverBased,henard,distBased,divDistBased,random" "Coverage-based,Solver-based,Randomized solver-based,Distance-based,Diversified distance-based,Random" /docker/ICPE2020/Distance-Based_Data_Time/latex_files

In the replication process, the generated error rates (in the local directory /ICPE2020/Distance-Based_Data_Time/SupplementaryWebsite/PerformancePredictions/AllExperiments/x264_0) must be the same as those provided in this directory for the diversified distance-based sampling approach, with the same sample sizes and random seeds (1-10).
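A simple way to verify this is to diff the generated logs against the provided reference results; a sketch, assuming the reference copy of this directory has been placed in a local folder reference/x264_0 (a placeholder):

  • diff -r /docker/ICPE2020/Distance-Based_Data_Time/SupplementaryWebsite/PerformancePredictions/AllExperiments/x264_0 reference/x264_0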
