
ICSE Artifact Evaluation


To track the pre-integration history of software changes, our tool maps emails containing patches to commits in a repository. The submitted artefacts allow evaluators, among other things, to compute such mappings, and to determine a set of optimal values for tuneable parameters that maximise the mapping quality, as quantified by comparing the mapping for each parameter combination against a given ground truth.
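The core idea of mapping an emailed patch to a repository commit can be illustrated with a minimal, hypothetical sketch: compare diff texts by similarity and accept the best match above a tuneable threshold. The function and parameter names below are illustrative assumptions, not the tool's actual API.

```python
from difflib import SequenceMatcher

def map_patches_to_commits(patches, commits, threshold=0.8):
    """Map each emailed patch to its most similar commit diff.

    `patches` and `commits` are dicts of id -> diff text. `threshold`
    stands in for the kind of tuneable parameter the evaluation
    optimises against the ground truth.
    """
    mapping = {}
    for pid, patch in patches.items():
        best_id, best_score = None, 0.0
        for cid, commit in commits.items():
            # Textual similarity as a stand-in for the tool's heuristics.
            score = SequenceMatcher(None, patch, commit).ratio()
            if score > best_score:
                best_id, best_score = cid, score
        if best_score >= threshold:
            mapping[pid] = best_id
    return mapping
```

Raising the threshold trades recall for precision, which is exactly the kind of trade-off the parameter optimisation explores.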

Our tool has been developed as an Open Source project on GitHub (GPLv2). We follow strict development quality standards: the history comprises only well-documented (i.e., comprehensive commit messages) and orthogonal (i.e., every revision is functional) changes. Industrial partners (such as the Linux Foundation) as well as academic partners actively use and work with our tool, which underlines its functionality and reusability.

Evaluators familiar with basic command line interaction can evaluate our artefacts on Linux or Mac OS X. We provide a docker container that can be downloaded from our institution's website (see below) or built from scratch. This makes the artefact evaluation process convenient, yet guarantees full replicability from scratch.

Artifact Overview

This is the list of artefacts required for the analysis:

For convenience, these artefacts are bundled in a prepared docker image (3.5 GB).

Installation and Analysis

Please find more details in the documentation that accompanies our artefact distribution. Note that there is no need to manually download any of the artefacts, as everything is automated by scripts.


In our paper, we determine the optimum parameter set for our algorithm by comparing a variety of parameter combinations against a manually created ground truth. This consumes a lot of computational power and memory: a full reproduction is possible, but requires several weeks of computation on a 48-core system with 300 GiB of RAM. We therefore do not recommend performing a full reproduction run (the scripts are nonetheless available).
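The search over parameter combinations amounts to an exhaustive grid search scored against the ground truth. The sketch below is a simplified assumption about this structure, not the project's actual evaluation code; the `evaluate` callback and the grid contents are hypothetical.

```python
from itertools import product

def grid_search(parameter_grid, evaluate):
    """Evaluate every parameter combination and keep the best one.

    `parameter_grid` maps parameter names to candidate values;
    `evaluate` scores one combination against the ground truth
    (higher is better). The Cartesian product grows multiplicatively
    with each parameter, which is why a full run is so expensive.
    """
    names = sorted(parameter_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(parameter_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

In practice such a search is embarrassingly parallel, which is how a many-core machine can be put to use.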

Our evaluation scripts use the optimal parameter set to measure the quality of the tool-reconstructed pre-integration history against the ground truth (quantified by the Fowlkes-Mallows score), demonstrating the high accuracy of the classification method presented in the paper.
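For reference, the Fowlkes-Mallows score compares two groupings of the same elements via pairwise co-membership: it is the geometric mean of pairwise precision and recall. A minimal self-contained implementation (the evaluation scripts themselves may compute it differently, e.g. via a library):

```python
from itertools import combinations
from math import sqrt

def fowlkes_mallows(labels_true, labels_pred):
    """Fowlkes-Mallows score from pairwise co-membership counts.

    TP: element pairs grouped together in both labelings;
    FP/FN: pairs grouped together in only the predicted/true one.
    Returns TP / sqrt((TP + FP) * (TP + FN)), in [0, 1].
    """
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        tp += same_true and same_pred
        fp += same_pred and not same_true
        fn += same_true and not same_pred
    if tp == 0:
        return 0.0
    return tp / sqrt((tp + fp) * (tp + fn))
```

A score of 1.0 means the reconstructed grouping agrees perfectly with the ground truth on every pair of elements.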


We apply for the badges Reusable, Available, and Replicated. Please find the rationale in STATUS.


The LICENSE file can be found in the root directory of the project's repository. The code and all results are published under the terms and conditions of the GPLv2.
