diff --git a/CHANGELOG.md b/CHANGELOG.md index ce72279a..4c927a86 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,19 +4,23 @@ - +## [0.10.1] - 2018-06-08 + + - fix experiment filters and reporting on codespeed submission errors (#77) + ## [0.10.0] - 2018-06-08 - - Restructure command-line options in help, and use argparse (#73) - - Add support for Python 3 and PyPy (#65) - - Add support for extra criteria (things beside run time) (#64) - - Add support for path names in ReBenchLog benchmark names + - restructure command-line options in help, and use argparse (#73) + - add support for Python 3 and PyPy (#65) + - add support for extra criteria (things beside run time) (#64) + - add support for path names in ReBenchLog benchmark names ## [0.9.1] - 2017-12-21 - - Fix time-left reporting of invalid times (#60) - - Take the number of data points per run into account for estimated time left (#62) - - Obtain process output on timeout to enable results of partial runs - - Fix incompatibility with latest setuptools + - fix time-left reporting of invalid times (#60) + - take the number of data points per run into account for estimated time left (#62) + - obtain process output on timeout to enable results of partial runs + - fix incompatibility with latest setuptools ## [0.9.0] - 2017-04-23 @@ -56,7 +60,8 @@ - [0.6.0] - 2014-05-19 - [0.5.0] - 2014-03-25 -[Unreleased]: https://github.com/smarr/ReBench/compare/v0.10.0...HEAD +[Unreleased]: https://github.com/smarr/ReBench/compare/v0.10.1...HEAD +[0.10.1]: https://github.com/smarr/ReBench/compare/v0.10.0...v0.10.1 [0.10.0]: https://github.com/smarr/ReBench/compare/v0.9.1...v0.10.0 [0.9.1]: https://github.com/smarr/ReBench/compare/v0.9.0...v0.9.1 [0.9.0]: https://github.com/smarr/ReBench/compare/v0.8.0...v0.9.0 diff --git a/INSTALL b/INSTALL deleted file mode 100644 index ad22305c..00000000 --- a/INSTALL +++ /dev/null @@ -1,5 +0,0 @@ -ReBench utilizes SciPy for its statistic calculations. - -Instructions to install SciPy can be found at -http://www.scipy.org/Installing_SciPy -This includes, that you will also need to install NumPy. \ No newline at end of file diff --git a/LICENSE b/LICENSE new file mode 100644 index 00000000..07ac6f9a --- /dev/null +++ b/LICENSE @@ -0,0 +1,19 @@ +Copyright (c) 2009-2018 Stefan Marr + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. 
diff --git a/README.md b/README.md new file mode 100644 index 00000000..d801f691 --- /dev/null +++ b/README.md @@ -0,0 +1,128 @@ +# ReBench: Execute and Document Benchmarks Reproducibly + +[![Build Status](https://travis-ci.org/smarr/ReBench.svg?branch=master)](https://travis-ci.org/smarr/ReBench) +[![Documentation](https://readthedocs.org/projects/rebench/badge/?version=latest)](https://rebench.readthedocs.io/) +[![Codacy Quality](https://api.codacy.com/project/badge/Grade/2f7210b65b414100be03f64fe6702d66)](https://www.codacy.com/app/smarr/ReBench) + +ReBench is a tool to run and document benchmark experiments. +Currently, it is mostly used for benchmarking language implementations, +but it can be used to monitor the performance of all +kind of other applications and programs, too. + +The ReBench [configuration format][docs] is a text format based on [YAML](http://yaml.org/). +A configuration file defines how to build and execute a set of *experiments*, +i.e. benchmarks. +It describe which binary was used, which parameters where given +to the benchmarks, and the number of iterations to be used to obtain +statistically reliable results. + +With this approach, the configuration contains all benchmark-specific +information to reproduce a benchmark run. However, it does not capture +the whole system. + +The data of all benchmark runs is recorded in a data file for later analysis. +Important for long-running experiments, benchmarks can be aborted and +continued at a later time. + +ReBench is focuses on the execution aspect and does not provide advanced +analysis facilities itself. Instead, it is used in combination with +for instance R scripts to process the results or [Codespeed][1] to do continuous +performance tracing. + +The documentation is hosted at [http://rebench.readthedocs.io/][docs]. + +## Goals and Features + +ReBench is designed to + + - enable reproduction of experiments + - document all benchmark parameters + - a flexible execution model, + with support for interrupting and continuing benchmarking + - defining complex sets of comparisons and executing them flexibly + - report results to continuous performance monitoring systems, e.g., [Codespeed][1] + - basic support to build/compile benchmarks/experiments on demand + - extensible support to read output of benchmark harnesses + +## Non-Goals + +ReBench isn't + + - a framework for microbenchmark. + Instead, it relies on existing harnesses and can be extended to parse their + output. + - a performance analysis tool. It is meant to execute experiments and + record the corresponding measurements. + - a data analysis tool. It provides only a bare minimum of statistics, + but has an easily readable data format that can be processed, e.g., with R. 
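+
+As an illustration of that last point, the data file can also be loaded with
+Python instead of R. The following is only a sketch: the tab-separated format
+is described in the configuration documentation, while the file name, the `#`
+comment prefix, and the absence of a header row are assumptions.
+
+```python
+# Minimal sketch: load a ReBench data file for a first look at the results.
+# Assumptions: tab-separated values, '#' comment lines, no header row.
+import pandas as pd
+
+data = pd.read_csv("example.data", sep="\t", comment="#", header=None)
+print(data.describe())
+```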
+ +## Installation and Usage + + + +ReBench is implemented in Python and can be installed via pip: + +```bash +pip install rebench +``` + +A minimal configuration file looks like: + +```yaml +# this run definition will be chosen if no parameters are given to rebench +default_experiment: all +default_data_file: 'example.data' + +# a set of suites with different benchmarks and possibly different settings +benchmark_suites: + ExampleSuite: + gauge_adapter: RebenchLog + command: Harness %(benchmark)s %(input)s %(variable)s + input_sizes: [2, 10] + variable_values: + - val1 + benchmarks: + - Bench1 + - Bench2 + +# a set of binaries use for the benchmark execution +virtual_machines: + MyBin1: + path: bin + binary: test-vm1.py %(cores)s + cores: [1] + MyBin2: + path: bin + binary: test-vm2.py + +# combining benchmark suites and benchmarks suites +experiments: + Example: + suites: + - ExampleSuite + executions: + - MyBin1 + - MyBin2 +``` + +Saved as `test.conf`, it could be executed with ReBench as follows: + +```bash +rebench test.conf +``` + +See the documentation for details: [http://rebench.readthedocs.io/][docs]. + +## Support and Contributions + +In case you encounter issues, +please feel free to [open an issue](https://github.com/smarr/rebench/issues/new) +so that we can help. + +For contributions, we use the [normal Github flow](https://guides.github.com/introduction/flow/) +of pull requests, discussion, and revisions. For larger contributions, +it is likely useful to discuss them upfront in an issue first. + + +[1]: https://github.com/tobami/codespeed/ +[docs]: http://rebench.readthedocs.io/ diff --git a/README.rst b/README.rst deleted file mode 100644 index 58a8fafc..00000000 --- a/README.rst +++ /dev/null @@ -1,75 +0,0 @@ -ReBench - Execute and Document Benchmarks Reproducibly -====================================================== - -ReBench is a tool to run and document benchmarks. Currently, its focus lies on -benchmarking virtual machines, but nonetheless it can be used to benchmark all -kind of other applications/programs, too. - -To facilitate the documentation of benchmarks, ReBench uses a text-based -configuration format. The configuration files contain all aspects of the -benchmark. They describe which binary was used, which parameters where given -to the benchmarks, and the number of iterations to be used to obtain -statistically reliable results. - -Thus, the documentation contains all benchmark-specific informations to -reproduce a benchmark run. However, it does not capture the whole systems -information, and also does not include build settings for the binary that -is benchmarked. These informations can be included as comments, but are not -captured automatically. - -The data of all benchmark runs is recorded in a data file and allows to -continue aborted benchmark runs at a later time. - -The data can be exported for instance as CSV or visualized with the help of -box plots. - -Current Build Status -==================== - -|BuildStatus|_ - -.. |BuildStatus| image:: https://api.travis-ci.org/smarr/ReBench.png -.. _BuildStatus: https://travis-ci.org/smarr/ReBench - -Credits -======= - -Even though, we do not share code with `JavaStats`_, it was a strong inspiration for the creation of ReBench. - -.. _JavaStats: http://www.elis.ugent.be/en/JavaStats - -Furthermore, our thanks go to `Travis CI`_ for their services. - -.. 
_Travis CI: http://travis-ci.org - -Related Work -============ - -As already mentioned `JavaStats`_ was an important inspiration and also comes -with an OOPSLA paper titled `Statistically Rigorous Java Performance -Evaluation`_. When you want to benchmark complex systems like virtual machines -this is definitely one of the important papers to read. - -Similar, `Caliper`_ is a framework for micro benchmarks and also discusses -important pitfalls not only for `Microbenchmarks`_. - -.. _Statistically Rigorous Java Performance Evaluation: https://buytaert.net/files/oopsla07-georges.pdf -.. _Caliper: http://code.google.com/p/caliper/ -.. _Microbenchmarks: http://code.google.com/p/caliper/wiki/JavaMicrobenchmarks - - -:: - - @article{1297033, - author = {Andy Georges and Dries Buytaert and Lieven Eeckhout}, - title = {Statistically rigorous java performance evaluation}, - journal = {SIGPLAN Not.}, - volume = {42}, - number = {10}, - year = {2007}, - issn = {0362-1340}, - pages = {57--76}, - doi = {http://doi.acm.org/10.1145/1297105.1297033}, - publisher = {ACM}, - address = {New York, NY, USA}, - } diff --git a/docs/CHANGELOG.md b/docs/CHANGELOG.md new file mode 120000 index 00000000..04c99a55 --- /dev/null +++ b/docs/CHANGELOG.md @@ -0,0 +1 @@ +../CHANGELOG.md \ No newline at end of file diff --git a/docs/LICENSE.md b/docs/LICENSE.md new file mode 120000 index 00000000..ea5b6064 --- /dev/null +++ b/docs/LICENSE.md @@ -0,0 +1 @@ +../LICENSE \ No newline at end of file diff --git a/docs/concepts.md b/docs/concepts.md new file mode 100644 index 00000000..675b148d --- /dev/null +++ b/docs/concepts.md @@ -0,0 +1,85 @@ +# Basic Concepts + +Some of the used terminology may not be usual. To avoid confusion, +the following defines the basic concepts + +
+
+**experiment**
+
+A combination of benchmark suites and virtual machines.
+ReBench executes experiments to collect the desired measurements.
+
+**benchmark suite**
+
+A set of benchmarks which is used to define experiments.
+
+**virtual machine**
+
+A named set of settings for the executor of a benchmark suite.
+Typically, this is one specific virtual machine with a set of startup
+parameters. It refers to an executable that will execute benchmarks
+from a suite. Thus, the virtual machine is the executor.
+
+**benchmark**
+
+A program to be executed by a virtual machine.
+A benchmark can define a number of different variables that can be
+varied, for instance, to change the input data set, the number of
+cores to be used, etc.
+
+**variable**
+
+A dimension of the benchmark that can be varied to influence
+execution characteristics.
+Currently, we have the notion of input sizes, cores, and other
+variable values. Each of them is varied independently and can
+potentially be used to enumerate a large number of runs.
+
+**run**
+
+A concrete execution of a benchmark by a specific virtual machine.
+A run is a specific combination of variables. It can be executed
+multiple times; each execution is referred to as an invocation.
+A run can also execute the benchmark multiple times within one
+invocation, which we refer to as iterations.
+One run can generate multiple data points.
+
+**invocation**
+
+The execution of a run. It may itself execute multiple iterations
+of a benchmark.
+
+**iteration**
+
+The execution of a benchmark within a virtual machine invocation.
+An iteration is expected to generate one data point, possibly
+including multiple measurements.
+
+**data point**
+
+A set of measurements belonging together. They are generated by an
+iteration.
+
+**measurement**
+
+One value for one specific criterion.
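+
+To make the relation between these terms concrete, the following sketch uses
+the `runs` keys described in the configuration documentation: each virtual
+machine is started 5 times, and the harness executes 10 iterations per
+invocation, so each run is expected to yield up to 5 × 10 data points.
+
+```yaml
+runs:
+  invocations: 5    # how often a virtual machine is started for a run
+  iterations: 10    # how often the benchmark is executed within one invocation
+```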
+
diff --git a/docs/conf.py b/docs/conf.py new file mode 100644 index 00000000..c2a6058e --- /dev/null +++ b/docs/conf.py @@ -0,0 +1,8 @@ +from recommonmark.parser import CommonMarkParser + +source_parsers = { + '.md': CommonMarkParser, +} + +source_suffix = ['.md'] +html_theme = 'sphinx_rtd_theme' diff --git a/docs/config.md b/docs/config.md new file mode 100644 index 00000000..552cdf74 --- /dev/null +++ b/docs/config.md @@ -0,0 +1,755 @@ +# ReBench Configuration Format + +The configuration format based on [YAML](http://yaml.org/) +and the most up-to-date documentation is generally the +[schema file](rebench-schema.yml). + +# Basic Configuration + +The main elements of each configuration are +[benchmarks suites](concepts.md#benchmark), [virtual machines (VMs)](concepts.md#vm), +and [experiments](concepts.md#experiment). + +Below a very basic configuration file: + +```YAML +# this run definition will be chosen if no parameters are given to rebench +default_experiment: all +default_data_file: 'example.data' + +# a set of suites with different benchmarks and possibly different settings +benchmark_suites: + ExampleSuite: + gauge_adapter: RebenchLog + command: Harness %(benchmark)s %(input)s + input_sizes: [2, 10] + benchmarks: + - Bench1 + - Bench2 + +# a set of binaries use for the benchmark execution +virtual_machines: + MyBin1: + path: bin + binary: test-vm2.py + +# combining benchmark suites and benchmarks suites +experiments: + Example: + suites: + - ExampleSuite + executions: + - MyBin1 +``` + +This example shows the general structure of a ReBench configuration. + +**General Settings.** +It can contain some general settings, for instance that all defined +experiments are going to be executed (as defined by the `default_experiment` key) +or that the data is to be stored in the `example.data` file. + +**Benchmark Suites.** The `benchmark_suites` key is used to define collections of benchmarks. +A suite is defined by its name, here `ExampleSuite`, and by: + +- a `gauge_adapter` to interpret the output of the suite's benchmark harness +- a `command` which is given to a virtual machine for execution +- possibly `input_sizes` to compare the behavior of benchmarks based on different parameters +- and a list of `benchmarks` + +The `command` uses Python format strings to compose the command line string. +Since there are two benchmarks (`Bench1` and `Bench2`) and two input sizes (`2` and `10`), +this configuration defines four different [runs](concepts.md#run), for which +to record the data. + +**Virtual Machines.** The `virtual_machines` key defines the VMs to use to +execute the runs defined by a benchmark suite. The `path` gives the relative +or absolute path where to find the `binary`. + +**Experiments.** The `experiments` then combine suites and VMs to executions. +In this example it is simply naming the suite and the VM. + +# Reference of the Configuration Format + +As said before, configurations are [YAML](http://yaml.org/) files, which means +standard YAML features are supported. Furthermore, the format of configuration +files is defined as a [schema](rebench-schema.yml). The schema is used to +check the structure of a configuration for validity when it is loaded. + +In the reminder of this page, we detail all elements of the configuration file. + +## Priority of Configuration Elements + +Different configuration elements can define the same settings. +For instance a benchmark, a suite, and a VM can define a setting for +`input_sizes`. 
If this is the case, there is a priority for the different +elements and the one with the highest priority will be chosen. +Highest means here in terms of ranking, so the first in the list (benchmark), +overrides any other setting. + +These priorities and the ability to define different benchmarks, suites, VMs, etc, +hopefully provides sufficient flexibility to encode all desired experiments. + +The priorities are, starting with highest: + +1. benchmark +2. benchmark suites +3. virtual machine +4. experiment +5. experiments +6. runs (as defined as the root element) + +## Root Elements + +**default_experiment:** + +Defines the experiment to be run, if no other experiment is specified as a +command line parameter. + +Default: `all`, i.e., all defined experiments are executed + +Example: + +```yaml +default_experiment: Example +``` + +--- + +**default_data_file:** + +Defines the data file to be used, if nothing more specific is defined by an +experiment. The data format is CSV, the used separator is a tab (`\t`), +which allows to load the file for instance in Excel (though not recommended) +for a basic analysis. + +Default: `rebench.data` + +Example: + +```yaml +default_data_file: my-experiment.data +``` + +--- + +**build_log:** + +Defines the file to be used for logging the output of build operations. + +Default: `build.log` + +Example: + +```yaml +build_log: my-experiment-build.log +``` + +--- + +**structured elements:** + +In addition to the basic settings mentioned above, the following keys can +be used, and each contains structural elements further detailed below. + +- `runs` +- `reporting` +- `benchmark_suites` +- `virtual_machines` +- `experiments` + +--- + +**dot keys i.e. ignored configuration keys:** + +To be able to use some YAML features, for instance [merge keys] or [node anchors], +it can be useful to define data that is not directly part of the configuration. +For this purpose, we allow dot keys on the root level that are ignored by the +schema check. + +Example: +```YAML +.my-data: data # excluded from schema validation +``` + +## Runs + +The `runs` key defines global run details for all experiments. +All keys that can be used in the `runs` mapping can also be used for the +definition of a benchmark, benchmark suite, VM, a concrete experiment, and +the experiment in general. + +**invocations:** + +The number of times a virtual machine is executed a run. + +Default: `1` + +Example: + +```yaml +runs: + invocations: 100 +``` + +--- + +**iterations:** + +The number of times a run is executed within a virtual machine +invocation. This needs to be supported by a benchmark harness and +ReBench passes this value on to the harness or benchmark. + +Default: `1` + +Example: + +```yaml +runs: + iterations: 42 +``` + +--- + +**warmup:** + +Consider the first N iterations as warmup and ignore them in ReBench's summary +statistics. Note ,they are still persisted in the data file. + +Default: `0` + +Example: + +```yaml +runs: + warmup: 330 +``` + +--- + +**min_iteration_time:** + +Give a warning if the average total run time of an iteration is below this +value in milliseconds. + +Default: `50` + +Example: + +```yaml +runs: + min_iteration_time: 140 +``` + +--- + +**max_invocation_time:** + +Time in second after which an invocation is terminated. +The value -1 indicates that there is no timeout intended. 
+ +Default: `-1` + +Example: + +```yaml +runs: + max_invocation_time: 600 +``` + +--- + +**parallel_interference_factor:** + +Setting used by parallel schedulers to determine the desirable degree of +parallelism. A higher factor means a lower degree of parallelism. + +The problem with parallel executions is that they increase the noise observed +in the results. +![Use not recommended](https://img.shields.io/badge/Use%20Not%20Recommended-Jun%202018-orange.svg) + +Example: + +```yaml +runs: + parallel_interference_factor: 10.5 +``` + +--- + +**execute_exclusively:** + +Determines whether the run is to be executed without any other runs being +executed in parallel. + +The problem with parallel executions is that they increase the noise observed +in the results. +![Use not recommended](https://img.shields.io/badge/Use%20Not%20Recommended-Jun%202018-orange.svg) + +Default: `true` + +Example: + +```yaml +runs: + execute_exclusively: false +``` + +## Reporting + +Currently, [Codespeed] is the only supported system for continuous +performance monitoring. It is configured with the `reporting` key. + +**codespeed:** + +Send results to Codespeed for continuous performance tracking. +The settings define the project that is configured in Codespeed, and the +URL to which the results are reported. Codespeed requires more information, +but since these details depend on the environment, they are passed set on +the [command line](usage.md#continuous-performance-tracking). + +Example: + +```yaml +reporting: + codespeed: + project: MyVM + url: http://example.org/result/add/json/ +``` + +--- + +## Benchmark Suites + +Benchmark suites are named collections of benchmarks and settings that apply to +all of them. + +**gauge_adapter:** + +Name of the parser that interpreters the output of the benchmark harness. +For a list of supported options see the list of [extensions](extensions.md#available-harness-support). + +This key is mandatory. + +Example: + +```yaml +benchmark_suites: + ExampleSuite: + gauge_adapter: ReBenchLog +``` + +--- + +**command:** + +The command for the benchmark harness. It's going to be combined with the +VM's command line. Thus, it should instruct the VM which harness to use +and how to map the various parameters to the corresponding harness settings. + +It supports various format variables, including: + + - benchmark (the benchmark's name) + - cores (the number of cores to be used by the benchmark) + - input (the input variable's value) + - iterations (the number of iterations) + - variable (another variable's value) + - warmup (the number of iterations to be considered warmup) + +This key is mandatory. + +Example: + +```yaml +benchmark_suites: + ExampleSuite: + command: Harness %(benchmark)s --problem-size=%(input)s --iterations=%(iterations)s +``` + +--- + +**location:** + +The path to the benchmark harness. Execution use this location as +working directory. It overrides the location/path of a VM. + +Example: + +```yaml +benchmark_suites: + ExampleSuite: + location: ../benchmarks/ +``` + +--- + +**build:** + +The given string is executed by the system's shell and can be used to +build a benchmark suite. It is executed once before any benchmarks of the suite +are executed. If `location` is set, it is used as working directory. +Otherwise, it is the current working directory of ReBench. 
+ +Example: + +```yaml +benchmark_suites: + ExampleSuite: + build: ./build-suite.sh +``` + +--- + +**description/desc:** + +The keys `description` and `desc` can be used to add a simple explanation of +the purpose of the suite. + +Example: + +```yaml +benchmark_suites: + ExampleSuite: + description: | + This is an example suite for this documentation. +``` + +--- + +**benchmarks:** + +The `benchmarks` key takes the list of benchmarks. Each benchmark is either a +simple name, or a name with additional properties. +See the section on [benchmark](#benchmark) for details. + +Example: + +```yaml +benchmark_suites: + ExampleSuite: + benchmark: + - Benchmark1 + - Benchmark2: + extra_args: "some additional arguments" +``` + +--- + +**run details and variables:** + +A benchmark suite can additional use the keys for [run details](#runs) and +[variables](#benchmark). +Thus, one can use: + +- `invocations` +- `iterations` +- `warmup` +- `min_iteration_time` +- `max_invocation_time` +- `parallel_interference_factor` +- `execute_exclusively` + +As well as: + +- input_sizes +- cores +- variable_values + +## Benchmark + +A benchmark can be define simply as a name. However, some times one might want +to define extra properties. + +**extra_args:** + +This extra argument is appended to the benchmark's command line. + +Example: + +```yaml +- Benchmark2: + extra_args: "some additional arguments" +``` + +--- + +**command:** + +ReBench will use this command instead of the name for the command line. + +Example: + +```yaml +- Benchmark2: + command: some.package.Benchmark2 +``` + +--- + +**codespeed_name:** + +A name used for this benchmark when sending data to Codespeed. +This gives more flexibility to keep Codespeed and these configurations or +source code details decoupled. + +Example: + +```yaml +- Benchmark2: + codespeed_name: "[peak] Benchmark2" +``` + +--- + +**input_sizes:** + +Many benchmark harnesses and benchmarks take an input size as a +configuration parameter. It might identify a data file, or some other +way to adjust the amount of computation performed. + +`input_sizes` expects a list, either as in the list notation below, or +in form of a sequence literal: `[small, large]`. + +Example: + +```yaml +- Benchmark2: + input_sizes: + - small + - large +``` + +--- + +**cores:** + +The number of cores to be used by the benchmark. +At least that's the original motivation for the variable. +In practice, it is more flexible and just another variable that can take +any list of strings. + +Example: + +```yaml +- Benchmark2: + cores: [1, 3, 4, 19] +``` + +--- + +**variable_values:** + +Another dimension by which the benchmark execution can be varied. +It takes a list of strings, or arbitrary values really. + +Example: + +```yaml +- Benchmark2: + variable_values: + - Sequential + - Parallel + - Random +``` + +--- + +**run details:** + +A benchmark suite can additional use the keys for [run details](#runs). + +--- + +## Virtual Machines + +The `virtual_machines` key defines the binaries and their settings to be used +to execute benchmarks. Each VM is a named set of properties. + +**path:** + +Path to the binary. If not given, it's up to the shell to find the binary. + +Example: + +```yaml +virtual_machines: + MyBin1: + path: . +``` + +--- + +**binary:** + +The name of the binary to be used. + +Example: + +```yaml +virtual_machines: + MyBin1: + binary: my-vm +``` + +--- + +**args:** + +The arguments given to the VM. They are given right after the binary. 
+ +Example: + +```yaml +virtual_machines: + MyBin1: + args: --enable-assertions +``` + +--- + +**description and desc:** + +The keys `description` and `desc` can be used to document the purpose of the +VM specified. + +Example: + +```yaml +virtual_machines: + MyBin1: + desc: A simple example for testing. +``` + +--- + +**build:** + +The given string is executed by the system's shell and can be used to +build a VM. It is executed once before any benchmarks are executed with +the VM. If `path` is set, it is used as working directory. Otherwise, +it is the current working directory of ReBench. + +Example: + +```yaml +virtual_machines: + MyBin1: + build: | + make clobber + make +``` + +--- + +**run details and variables:** + +A VM can additional use the keys for [run details](#runs) and [variables](#benchmark) +(`input_sizes`, `cores`, `variable_values`). + +## Experiments + +Experiments combine virtual machines and benchmark suites. +They can be defined by listing suites to be used and executions. +Executions can simply list VMs or also specify benchmark suites. +This gives a lot of flexibility to define the desired combinations. + +**description and desc:** + +Description of the experiment with `description` or `desc`. + +Example: + +```yaml +experiments: + Example: + description: My example experiment. +``` + +--- + +**data_file:** + +The data for this experiment goes into a separate file. +If not given, the `default_data_file` is used. + +Example: + +```yaml +experiments: + Example: + data_file: example.data +``` + +--- + +**reporting:** + +Experiments can define specific reporting options. +See the [reporting](#reporting) for details on the properties. + +Example: + +```yaml +experiments: + Example: + reporting: + codespeed: + ... +``` + +--- + +**suites:** + +List of benchmark suites to be used. + +Example: + +```yaml +experiments: + Example: + suites: + - ExampleSuite +``` + +--- + +**executions:** + +The VMs used for execution, possibly with specific suites assigned. +Thus `executions` takes a list of VM names, possibly with additional keys +to specify a suite and other details. + +Example, simple list of VM names: + +```yaml +experiments: + Example: + executions: + - MyBin1 +``` + +Example, execution with suite: + +```yaml +experiments: + Example: + executions: + - MyBin1: + suites: + - ExampleSuite + cores: [3, 5] +``` + +--- + +**run details and variables:** + +An experiment can additional use the keys for [run details](#runs) and +[variables](#benchmark) (`input_sizes`, `cores`, `variable_values`). +Note, this is possible on the main experiment, but also separately for each +of the defined executions. + +[merge keys]: http://yaml.org/type/merge.html +[node anchors]: http://yaml.org/spec/1.1/current.html#id899912 +[Codespeed]: https://github.com/tobami/codespeed/ diff --git a/docs/extensions.md b/docs/extensions.md new file mode 100644 index 00000000..5433a101 --- /dev/null +++ b/docs/extensions.md @@ -0,0 +1,31 @@ +# Extensibility + +ReBench is designed to be used with existing benchmarking harnesses. +It uses what we call 'gauge adapters' to parse the output generated by harnesses +and store it in its own data files for later processing. 
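+
+For a first impression, the following is a rough, hypothetical sketch of such
+an adapter. The `GaugeAdapter` base class and the `parse_data` hook are the
+extension points described under "Supporting other Benchmark Harness" below;
+the import paths and constructor signatures for `DataPoint` and `Measurement`
+are assumptions and need to be checked against the ReBench sources.
+
+```python
+# Hypothetical sketch of a gauge adapter that reads one total run time
+# (in milliseconds) per line of harness output. The base class and the
+# parse_data hook exist in ReBench; the DataPoint/Measurement imports and
+# constructor arguments are assumptions.
+from rebench.interop.adapter import GaugeAdapter
+from rebench.model.data_point import DataPoint      # assumed module path
+from rebench.model.measurement import Measurement   # assumed module path
+
+
+class PlainMillisecondsAdapter(GaugeAdapter):
+
+    def parse_data(self, data, run_id, invocation):
+        data_points = []
+        for line in data.split("\n"):
+            line = line.strip()
+            if not line:
+                continue
+            point = DataPoint(run_id)                         # assumed signature
+            point.add_measurement(                            # assumed API
+                Measurement(float(line), "ms", run_id, "total"))
+            data_points.append(point)
+        return data_points
+```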
+ +## Available Harness Support + +ReBench provides currently builtin support for the following benchmark harnesses: + +- `JMH`: [JMH](http://openjdk.java.net/projects/code-tools/jmh/), Java's mircobenchmark harness +- `PlainSecondsLog`: a plain seconds log, i.e., a floating point number per line +- `ReBenchLog`: the ReBench log format, which indicates benchmark name and run time in milliseconds or microseconds +- `SavinaLog`: the harness of the [Savina](https://github.com/shamsimam/savina) benchmarks +- `Time`: a harness that use automatically `/usr/bin/time` + +## Supporting other Benchmark Harness + +To add support for your own harness, check the `rebench.interop` module. +In there, the `adapter` module contains the `GaugeAdapter` base class. + +The key method to implement is `parse_data(self, data, run_id, invocation)`. +The method is expected to return a list of `DataPoint` objects. +Each data point can contain a number of `Measurement` objects, where one of +them needs to be indicated as the `total` value. +The idea here is that a harness can measure different phases of a benchmark +or different properties, for instance memory usage. +These can be encoded as different measurements. The overall run time is +assumed to be the final measurement to conclude the information for a single +iteration of a benchmark. +A good example to study is the `rebench_log_adapter` implementation. diff --git a/docs/index.md b/docs/index.md new file mode 120000 index 00000000..32d46ee8 --- /dev/null +++ b/docs/index.md @@ -0,0 +1 @@ +../README.md \ No newline at end of file diff --git a/docs/rebench-schema.yml b/docs/rebench-schema.yml new file mode 120000 index 00000000..05042ddb --- /dev/null +++ b/docs/rebench-schema.yml @@ -0,0 +1 @@ +../rebench/rebench-schema.yml \ No newline at end of file diff --git a/docs/release.md b/docs/release.md new file mode 100644 index 00000000..d470fcec --- /dev/null +++ b/docs/release.md @@ -0,0 +1,6 @@ +# Release Instructions + +```bash +python setup.py sdist build +python setup.py sdist upload +``` diff --git a/docs/usage.md b/docs/usage.md new file mode 100644 index 00000000..c5344dc7 --- /dev/null +++ b/docs/usage.md @@ -0,0 +1,158 @@ +# Usage + +ReBench is a command-line tool. In the following, we will discuss its usage. +A basic help can be displayed with the `--help` argument: + +```bash +$ rebench --help +Usage: rebench [options] [exp_name] [vm:$]* [s:$]* + +Argument: + config required argument, file containing the experiment to be executed + exp_name optional argument, the name of an experiment definition + from the config file + If not provided, the configured default_experiment is used. + If 'all' is given, all experiments will be executed. + + vm:$ filter experiments to only include the named VM, example: vm:VM1 vm:VM3 + s:$ filter experiments to only include the named suite and possibly benchmark + example: s:Suite1 s:Suite2:Bench3 + +... +``` + +[Configuration files](config.md) provide the setup for experiments by +defining benchmarks, benchmark suites, their parameters, and virtual machines +to execute them. + +### Basic Execution: Run Experiments + +ReBench takes a given configuration file, executes the experiments and stores +the measurement results into the configured data file. 
Assuming a basic +configuration as seen in the [README](index.md#install), the following command +line will execute all experiments and store the results in the `example.data` +file: + +```bash +$ rebench example.conf +``` + +This basic execution can be customized in various ways as explained below. + +### Partial Execution: Run Some of the Experiments + +Instead of executing the configured experiments, we can ask ReBench to only +execute a subset of them, a specific experiment, only selected VMs, suites, and +benchmarks. + +The [configuration file](config.md) allows us to select a +`default_experiment`. But we can override this setting with the `exp_name` +parameter. Thus, the following will execute only the `Example` experiment: + +```bash +$ rebench example.conf Example +``` + +We can further restrict what is executed. +To only execute the `MyBin1` virtual machine, we use: + +```bash +$ rebench example.conf Example vm:MyBin1 +``` + +To further limit the execution, we can also select a specific benchmark from a +suite: + +```bash +$ rebench example.conf Example vm:MyBin1 s:ExampleSuite:Bench1 +``` + +The filters are applied on the set of *runs* identified by a configuration and +the chosen experiments. Thus, the above says to execute only `MyBin1`, and no +other virtual machine. For the resulting runs, we also want to execute only +`Bench1` in the `ExampleSuite`. If we list additional VMs, they are all +considered. Similarly, naming more benchmarks will include them all. + +### Further Options + +ReBench supports a range of other options to control execution. + +#### Quick Runs, Iterations, Invocations + +The [configuration](config.md#invocation) uses the notion of iteration +and invocation to define how often a VM is started (invocation) and how many +times a benchmark is executed in the same VM (iteration). + +We can override this setting with the following parameters: + +```text +-in INVOCATIONS, --invocations INVOCATIONS + The number of times a VM is started to execute a run. +-it ITERATIONS, --iterations ITERATIONS + The number of times a benchmark is to be executed + within a VM invocation. + +-q, --quick Execute quickly. Identical with --iterations=1 --invocations=1 +``` + +#### Discarding Data, Rerunning Experiments + +ReBench's normal execution mode will assume that it should accumulate all data +until a complete data set is reached. +This means, we can interrupt execution at any point and continue later and +ReBench will continue where it left off. + +Some times, we may want to update some experiments and discard old data: + +```text +-c, --clean Discard old data from the data file (configured in the run description). +-r, --rerun Rerun selected experiments, and discard old data from data file. +``` + +#### Execution Order + +We may care for a different order for the benchmark execution. +This could either be to get a quicker impression of the performance results. +But possibly also to account for the complexity of benchmarking and ensure +that the order does not influence results. + +For this purpose we use *schedulers* to determine the execution order. + +```text +-s SCHEDULER, --scheduler=SCHEDULER + execution order of benchmarks: batch, round-robin, + random [default: batch] +``` + +#### Continuous Performance Tracking + +ReBench supports [Codespeed][1] as platform for continuous performance +tracking. To report data to a Codespeed setup, the [configuration](config.md#codespeed) +needs to have the corresponding details. 
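+
+A (hypothetical) reporting invocation might then look like the sketch below;
+the commit id, environment name, and configuration file are placeholders, and
+the options used are explained right after this example.
+
+```bash
+rebench --commit-id=1a2b3c4 --environment="Benchmark Machine" \
+        --branch=master my-experiments.conf
+```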
+ +And, Codespeed needs details on the concrete execution: + +```text +--commit-id=COMMIT_ID MANDATORY: when codespeed reporting is used, the + commit-id has to be specified. + +--environment=ENVIRONMENT MANDATORY: name the machine on which the results are + obtained. + +--branch=BRANCH The branch for which the results have to be recorded, + i.e., to which the commit belongs. Default: HEAD + +--executable=EXECUTABLE The executable name given to codespeed. Default: The + name used for the virtual machine. + +--project=PROJECT The project name given to codespeed. Default: Value + given in the config file. + +-I, --disable-inc-report Does a final report at the end instead of reporting + incrementally. + +-S, --disable-codespeed Override configuration and disable reporting to + codespeed. +``` + +[1]: https://github.com/tobami/codespeed/ diff --git a/mkdocs.yml b/mkdocs.yml new file mode 100644 index 00000000..09f18c80 --- /dev/null +++ b/mkdocs.yml @@ -0,0 +1,18 @@ +site_name: 'ReBench: Execute and Document Benchmarks Reproducibly' +site_description: ReBench Project documentation +site_author: ReBench contributors + +pages: + - 'ReBench: Execute and Document Benchmarks Reproducibly': index.md + - Basic Concepts: concepts.md + - Usage: usage.md + - Configuration: config.md + - Development: + - Extensions: extensions.md + - Release Steps: release.md + - License: LICENSE.md + - Change Log: CHANGELOG.md + +markdown_extensions: + - toc: + permalink: True diff --git a/rebench.conf b/rebench.conf index ef4ef5f1..912c2f89 100644 --- a/rebench.conf +++ b/rebench.conf @@ -2,8 +2,8 @@ # Config format is YAML (see http://yaml.org/ for detailed spec) # this run definition will be chosen if no parameters are given to rebench.py -standard_experiment: Test -standard_data_file: 'test.data' +default_experiment: Test +default_data_file: 'test.data' # general configuration for runs runs: diff --git a/rebench/configurator.py b/rebench/configurator.py index b99b43b5..d9fef1af 100644 --- a/rebench/configurator.py +++ b/rebench/configurator.py @@ -154,8 +154,8 @@ def __init__(self, raw_config, data_store, ui, cli_options=None, cli_reporter=No self._raw_config_for_debugging = raw_config # kept around for debugging only self._build_log = build_log or raw_config.get('build_log', 'build.log') - self._data_file = data_file or raw_config.get('standard_data_file', 'rebench.data') - self._exp_name = exp_name or raw_config.get('standard_experiment', 'all') + self._data_file = data_file or raw_config.get('default_data_file', 'rebench.data') + self._exp_name = exp_name or raw_config.get('default_experiment', 'all') # capture invocation and iteration settings and override when quick is selected invocations = cli_options.invocations if cli_options else None diff --git a/rebench/model/__init__.py b/rebench/model/__init__.py index bdda7f0e..ff4b7146 100644 --- a/rebench/model/__init__.py +++ b/rebench/model/__init__.py @@ -18,44 +18,6 @@ # FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS # IN THE SOFTWARE. -""" -Glossary: - - - data point: - a set of measurements belonging together. - generated by one specific run. - In some cases, a single run can produce multiple data points. - - measurement: - one value for one specific criterion - - virtual machine: - A named set of settings for the executor of a benchmark suite. - - Typically, this is one specific virtual machine with a specific set of - startup parameters. It refers to an executable that will execute - benchmarks from suite. 
Thus, the virtual machine is the executor. - - benchmark suite: - A set of benchmarks with a variety of parameters, i.e., dimension to be - explored by a benchmark. - - benchmark: - A set of experiments based on one program to be executed. - The set is described by parameters, i.e., dimensions that are to be - explored. - - run: - A run is one specific experiments based on the selected - parameters, benchmark, benchmark suite, and virtual machine. - One run can generate multiple data points. - - experiment: - Brings together benchmark suites, virtual machines, and their - various parameters. -""" - def none_or_int(value): if value: diff --git a/rebench/rebench-schema.yml b/rebench/rebench-schema.yml index 092a5b37..e247284e 100644 --- a/rebench/rebench-schema.yml +++ b/rebench/rebench-schema.yml @@ -8,28 +8,26 @@ schema;runs_type: type: int # default: 1 # can't specify this here, because the defaults override settings desc: | - The number of times a virtual machine is executed to run one experiment. - Thus, this is the number of runs per virtual machine. + The number of times a virtual machine is executed a run. iterations: type: int # default: 1 # can't specify this here, because the defaults override settings desc: | - The number of times a benchmark is executed within a virtual machine + The number of times a run is executed within a virtual machine invocation. This needs to be supported by a benchmark harness and ReBench passes this value on to the harness or benchmark. warmup: type: int desc: | - Consider the first N iterations as warmup and ignore them in summary - statistics. Note they are still persistet in the data file. + Consider the first N iterations as warmup and ignore them in ReBench's summary + statistics. Note ,they are still persisted in the data file. min_iteration_time: type: int - # TODO: make sure this is the correct default value # default: 50 # can't specify this here, because the defaults override settings desc: | - Give a warning if the average run time is below this value in - milliseconds. + Give a warning if the average total run time of an iteration is below + this value in milliseconds. max_invocation_time: type: int desc: | @@ -44,7 +42,7 @@ schema;runs_type: TODO: then again, we might want this for research on the impact execute_exclusively: type: bool - # default: false # can't specify this here, because the defaults override settings + # default: true # can't specify this here, because the defaults override settings desc: | TODO: probably needs to be removed, not sure. parallel exec of benchmarks introduced a lot of noise @@ -138,9 +136,11 @@ schema;benchmark_suite_type: The command for the benchmark harness. It's going to be combined with the VM's command line. It supports various format variables, including: - benchmark (the benchmark's name) + - cores (the number of cores to be used by the benchmark) - input (the input variable's value) + - iterations (the number of iterations) - variable (another variable's value) - - cores (the number of cores to be used by the benchmark) + - warmup (the number of iterations to be considered warmup) location: type: str desc: | @@ -162,10 +162,10 @@ schema;benchmark_suite_type: - include: benchmark_type_map description: type: str - desc: A description of the benchmark. + desc: A description of the benchmark suite. desc: type: str - desc: A description of the benchmark. + desc: A description of the benchmark suite. 
schema;vm_type: type: map @@ -252,11 +252,11 @@ type: map mapping: regex;(\..+): type: any - desc: dot properties, for example `.test` are going to be ignored - standard_experiment: + desc: dot keys, for example `.test` are going to be ignored + default_experiment: type: str default: all - standard_data_file: + default_data_file: type: str default: rebench.data build_log: diff --git a/rebench/rebench.py b/rebench/rebench.py index 1da90848..03d36924 100755 --- a/rebench/rebench.py +++ b/rebench/rebench.py @@ -57,8 +57,11 @@ def shell_options(self): Argument: config required argument, file containing the experiment to be executed - exp_name optional argument, the name of a experiment definition + exp_name optional argument, the name of an experiment definition from the config file + If not provided, the configured default_experiment is used. + If 'all' is given, all experiments will be executed. + vm:$ filter experiments to only include the named VM, example: vm:VM1 vm:VM3 s:$ filter experiments to only include the named suite and possibly benchmark example: s:Suite1 s:Suite2:Bench3 diff --git a/rebench/tests/broken-schema.conf b/rebench/tests/broken-schema.conf index dfb78de4..2eac7a51 100644 --- a/rebench/tests/broken-schema.conf +++ b/rebench/tests/broken-schema.conf @@ -2,8 +2,8 @@ # Config format is YAML (see http://yaml.org/ for detailed spec) # this run definition will be chosen if no parameters are given to rebench.py -standard_experiment: Test -standard_data_file: 'tests/small.data' +default_experiment: Test +default_data_file: 'tests/small.data' # general configuration for runs runs: diff --git a/rebench/tests/broken-yaml.conf b/rebench/tests/broken-yaml.conf index 03487171..ec2055e5 100644 --- a/rebench/tests/broken-yaml.conf +++ b/rebench/tests/broken-yaml.conf @@ -2,8 +2,8 @@ # Config format is YAML (see http://yaml.org/ for detailed spec) # this run definition will be chosen if no parameters are given to rebench.py -standard_experiment: Test -standard_data_file: 'tests/small.data' +default_experiment: Test +default_data_file: 'tests/small.data' # general configuration for runs runs: diff --git a/rebench/tests/bugs/issue_27.conf b/rebench/tests/bugs/issue_27.conf index d332c671..69fb4b30 100644 --- a/rebench/tests/bugs/issue_27.conf +++ b/rebench/tests/bugs/issue_27.conf @@ -1,4 +1,4 @@ -standard_experiment: Test +default_experiment: Test benchmark_suites: Suite: diff --git a/rebench/tests/codespeed.conf b/rebench/tests/codespeed.conf index 80e6bbb1..bec4c6a2 100644 --- a/rebench/tests/codespeed.conf +++ b/rebench/tests/codespeed.conf @@ -2,8 +2,8 @@ # Config format is YAML (see http://yaml.org/ for detailed spec) # this run definition will be choosen if no parameters are given to rebench.py -standard_experiment: Test -standard_data_file: 'codespeed.data' +default_experiment: Test +default_data_file: 'codespeed.data' # reporting should enable the configuration of the format of the out put # REM: not implement yet (STEFAN: 2011-01-19) diff --git a/rebench/tests/features/issue_15.conf b/rebench/tests/features/issue_15.conf index 4cd0f0bc..40c0b627 100644 --- a/rebench/tests/features/issue_15.conf +++ b/rebench/tests/features/issue_15.conf @@ -1,4 +1,4 @@ -standard_experiment: Test +default_experiment: Test runs: invocations: 10 diff --git a/rebench/tests/features/issue_16.conf b/rebench/tests/features/issue_16.conf index d7553894..44692413 100644 --- a/rebench/tests/features/issue_16.conf +++ b/rebench/tests/features/issue_16.conf @@ -1,4 +1,4 @@ -standard_experiment: Test 
+default_experiment: Test benchmark_suites: Suite: diff --git a/rebench/tests/features/issue_19.conf b/rebench/tests/features/issue_19.conf index ecb8659d..14604b33 100644 --- a/rebench/tests/features/issue_19.conf +++ b/rebench/tests/features/issue_19.conf @@ -1,4 +1,4 @@ -standard_experiment: Test +default_experiment: Test runs: invocations: 10 diff --git a/rebench/tests/features/issue_31.conf b/rebench/tests/features/issue_31.conf index 81502814..949757cf 100644 --- a/rebench/tests/features/issue_31.conf +++ b/rebench/tests/features/issue_31.conf @@ -1,4 +1,4 @@ -standard_experiment: Test +default_experiment: Test benchmark_suites: Suite: diff --git a/rebench/tests/features/issue_34.conf b/rebench/tests/features/issue_34.conf index 191708fa..e4802e96 100644 --- a/rebench/tests/features/issue_34.conf +++ b/rebench/tests/features/issue_34.conf @@ -1,4 +1,4 @@ -standard_experiment: Test +default_experiment: Test runs: invocations: 10 diff --git a/rebench/tests/features/issue_40.conf b/rebench/tests/features/issue_40.conf index 641138ab..25c846ae 100644 --- a/rebench/tests/features/issue_40.conf +++ b/rebench/tests/features/issue_40.conf @@ -1,4 +1,4 @@ -standard_experiment: Test +default_experiment: Test runs: invocations: 10 diff --git a/rebench/tests/features/issue_57.conf b/rebench/tests/features/issue_57.conf index a47b7620..b86fc32e 100644 --- a/rebench/tests/features/issue_57.conf +++ b/rebench/tests/features/issue_57.conf @@ -1,4 +1,4 @@ -standard_experiment: Test +default_experiment: Test runs: invocations: 10 diff --git a/rebench/tests/features/issue_58.conf b/rebench/tests/features/issue_58.conf index 52983192..3541b197 100644 --- a/rebench/tests/features/issue_58.conf +++ b/rebench/tests/features/issue_58.conf @@ -1,4 +1,4 @@ -standard_experiment: Test +default_experiment: Test build_log: build.log diff --git a/rebench/tests/features/issue_59.conf b/rebench/tests/features/issue_59.conf index 27c663a3..2645aa04 100644 --- a/rebench/tests/features/issue_59.conf +++ b/rebench/tests/features/issue_59.conf @@ -1,5 +1,5 @@ -standard_experiment: Test -standard_data_file: test.data +default_experiment: Test +default_data_file: test.data build_log: build.log diff --git a/rebench/tests/features/issue_81.conf b/rebench/tests/features/issue_81.conf index b27949ad..ed01fb87 100644 --- a/rebench/tests/features/issue_81.conf +++ b/rebench/tests/features/issue_81.conf @@ -1,5 +1,5 @@ -standard_experiment: Test -standard_data_file: test.data +default_experiment: Test +default_data_file: test.data build_log: build.log diff --git a/rebench/tests/persistency.conf b/rebench/tests/persistency.conf index 1418b77f..b481e4ac 100644 --- a/rebench/tests/persistency.conf +++ b/rebench/tests/persistency.conf @@ -2,8 +2,8 @@ # Config format is YAML (see http://yaml.org/ for detailed spec) # this run definition will be chosen if no parameters are given to rebench.py -standard_experiment: Test -standard_data_file: 'persistency.data' +default_experiment: Test +default_data_file: 'persistency.data' benchmark_suites: diff --git a/rebench/tests/small.conf b/rebench/tests/small.conf index 20f799be..5657d79b 100644 --- a/rebench/tests/small.conf +++ b/rebench/tests/small.conf @@ -2,8 +2,8 @@ # Config format is YAML (see http://yaml.org/ for detailed spec) # this run definition will be chosen if no parameters are given to rebench.py -standard_experiment: Test -standard_data_file: 'tests/small.data' +default_experiment: Test +default_data_file: 'tests/small.data' # general configuration for runs runs: diff --git 
a/rebench/tests/test.conf b/rebench/tests/test.conf index 93366dc2..4d6a1779 100644 --- a/rebench/tests/test.conf +++ b/rebench/tests/test.conf @@ -2,8 +2,8 @@ # Config format is YAML (see http://yaml.org/ for detailed spec) # this experiment is chosen if no parameter is given to rebench -standard_experiment: Test -standard_data_file: 'tests/test.data' +default_experiment: Test +default_data_file: 'tests/test.data' # reporting should enable the configuration of the format of the output # reporting: