[benchmark-dtrace] Various test stability improvements. #29201

gottesmm · 2020-01-14T23:03:04Z

Specifically:

1. Add the ability to tell the runner to run all of the payloads a second time in reverse.
2. We previously just took a 2 iter run and a 3 iter run and used the difference to compute our value. Now we also do a 5 iter run and make sure that the delta between the 3 and 5 iter runs is twice the delta between the 2 and 3 iter runs. If it is not, we flag the benchmark as unstable.
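As a rough illustration of the first point, here is a minimal sketch of building a run order that goes through the payloads once forward and then once more in reverse. The function name `build_run_order` and the `run_reversed` parameter are hypothetical and are not taken from the actual script; "payloads" is assumed to be a plain list of work items handed to the runner.

```python
def build_run_order(payloads, run_reversed=False):
    """Return the order in which payloads are run.

    Hypothetical helper: when run_reversed is set, every payload is run
    a second time, in the opposite order, after the forward pass.
    """
    order = list(payloads)
    if run_reversed:
        order += list(reversed(payloads))
    return order


if __name__ == "__main__":
    print(build_run_order(["A", "B", "C"], run_reversed=True))
```

Running each payload in both orders is one way to expose results that depend on what ran immediately before.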

I also put in a couple of misc fixes that I found locally.

gottesmm · 2020-01-14T23:03:12Z

gottesmm · 2020-01-14T23:03:17Z


The way we already gather numbers for this test is that we run two runs of `Benchmark_O $TEST` with num-samples=2, iters={2,3}. Under the assumption that the only difference in counter numbers can be caused by that extra iteration, subtracting the counts for iter=2 from the counts for iter=3 gives us the number of counts in that one iteration.

In certain cases, I have found that a small subset of the benchmarks produce weird output and I haven't had the time to look into why. That being said, I do know what these weird results look like, so in this commit we do some extra validation work to see if we need to fail a test due to instability. We fail the test if either of the following checks does not hold:

1. We perform another run with num-samples=2, iter=5 and subtract the iter=3 counts from that. Under the assumption that overall work should increase linearly with iteration count in our benchmarks, we check that these counts are actually 2x the counts from `result[iter=3] - result[iter=2]`.
2. We check that neither `result[iter=3] - result[iter=2]` nor `result[iter=5] - result[iter=3]` is negative. None of the counters we gather should ever decrease with iteration count.
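The validation described in this commit message can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the commit: the names `counter_delta` and `is_unstable`, the dict-of-counters representation, and the `tolerance` parameter are all hypothetical, and all three runs are assumed to report the same counter names.

```python
def counter_delta(low, high):
    """Subtract two counter dicts key by key (high - low).

    Assumes both runs report the same counter names.
    """
    return {name: high[name] - low[name] for name in high}


def is_unstable(counts_2, counts_3, counts_5, tolerance=0.0):
    """Return True if the counters fail either validation check.

    counts_N maps a counter name (e.g. "retain") to the count observed
    in the run with N iterations. tolerance is a hypothetical relative
    slack on the 2x check; 0.0 demands an exact match.
    """
    delta_2_3 = counter_delta(counts_2, counts_3)  # one extra iteration
    delta_3_5 = counter_delta(counts_3, counts_5)  # two extra iterations
    for name, one_iter in delta_2_3.items():
        two_iters = delta_3_5[name]
        # Check 2: counters must never decrease with iteration count.
        if one_iter < 0 or two_iters < 0:
            return True
        # Check 1: two extra iterations should cost twice one iteration.
        if abs(two_iters - 2 * one_iter) > tolerance * max(one_iter, 1):
            return True
    return False


if __name__ == "__main__":
    stable = is_unstable({"retain": 10}, {"retain": 15}, {"retain": 25})
    print("unstable" if stable else "stable")
```

With exact linear growth (10, 15, 25) the deltas are 5 and 10, so the benchmark passes. A sequence such as 10, 15, 18 fails the 2x check, and any decreasing sequence fails the sign check.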

gottesmm · 2020-01-15T22:42:28Z

@swift-ci python lint

gottesmm · 2020-01-15T22:42:31Z

@swift-ci python lint

gottesmm · 2020-01-15T22:42:33Z

@swift-ci python lint

gottesmm · 2020-01-15T22:42:41Z

@swift-ci python lint

gottesmm · 2020-01-15T22:42:52Z

@swift-ci python lint

gottesmm · 2020-01-15T22:43:12Z

@swift-ci python lint

gottesmm · 2020-01-16T00:51:19Z

@swift-ci python lint

gottesmm · 2020-01-16T00:53:48Z

@swift-ci smoke test and merge

gottesmm · 2020-01-16T00:53:52Z

@swift-ci smoke test and merge

gottesmm · 2020-01-16T00:54:01Z

@swift-ci smoke test and merge

gottesmm · 2020-01-16T00:54:07Z

@swift-ci smoke test and merge

gottesmm added 4 commits January 15, 2020 14:39

Have dtrace aggregate rr opts and start tracking {retain,release}_n.

676411f

Otherwise, one can get results that seem to imply more rr traffic when in reality one was just not tracking {retain,release}_n calls that, as a result of better optimization, became simple retain, release calls.

Pattern match test names, not numbers, to capture test names from Benchmark_O --list

35aa040

This makes the output of the test more readable.

Change -csv flag to be --emit-csv.

461f17e

gottesmm force-pushed the pr-7c946eae74676ddfef960009700cb28c2a9e0192 branch from 980f448 to 2840a76 Compare January 15, 2020 22:42

swift-ci merged commit ffc10a5 into swiftlang:master Jan 16, 2020

gottesmm deleted the pr-7c946eae74676ddfef960009700cb28c2a9e0192 branch July 23, 2021 21:52
