Currently, a few of our integration benchmarks execute only a single repetition, and some of them share input files. As a consequence, the first benchmark to read a shared file runs on a cold file cache, while the later ones run on a warm cache, so their runtimes are not directly comparable. We should normalize this, e.g. by having each integration benchmark run two iterations and discarding the runtime of the first.
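
As a rough illustration of the "two iterations, discard the first" idea, here is a minimal Python sketch. It is not taken from our benchmark harness: the helper name `timed_run` and its `iterations` and `warmup` parameters are hypothetical, and the benchmark is assumed to be a plain callable that reads its own input files.

```python
import time
from typing import Callable, List


def timed_run(
    benchmark: Callable[[], object],
    iterations: int = 2,
    warmup: int = 1,
) -> List[float]:
    """Run `benchmark` `iterations` times and return only the runtimes
    (in seconds) of the runs after the first `warmup` runs.

    Hypothetical helper: the warm-up runs exist only to populate the
    file cache, so their runtimes are measured but not reported.
    """
    if warmup < 0 or iterations <= warmup:
        raise ValueError("iterations must be greater than warmup")

    runtimes: List[float] = []
    for i in range(iterations):
        start = time.perf_counter()
        benchmark()
        elapsed = time.perf_counter() - start
        # Keep the measurement only once the warm-up runs are done.
        if i >= warmup:
            runtimes.append(elapsed)
    return runtimes
```

With the defaults, each benchmark is executed twice and only the second runtime is reported, so every benchmark is measured on a warm cache regardless of the order in which the benchmarks run.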