[benchmark-dtrace] Various test stability improvements. #29201
Merged: swift-ci merged 4 commits into swiftlang:master from gottesmm:pr-7c946eae74676ddfef960009700cb28c2a9e0192 on Jan 16, 2020
@swift-ci python lint
@swift-ci smoke test
Otherwise, one can get results that seem to imply more retain/release (rr) traffic when in reality one was simply not tracking the {retain,release}_n calls that, as a result of better optimization, became plain retain and release calls.
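To illustrate the pitfall described above, here is a minimal sketch in Python. It is not the code from this PR: the helper `total_rr_calls` and the sample counts are hypothetical, and it assumes the counters are plain per-function call counts keyed by runtime entry point name.

```python
# Hypothetical helper: sums retain/release call counts from a counter dict.
def total_rr_calls(counts, include_n_variants=True):
    names = ["swift_retain", "swift_release"]
    if include_n_variants:
        # Also track the batched variants so they are not silently ignored.
        names += ["swift_retain_n", "swift_release_n"]
    return sum(counts.get(name, 0) for name in names)


# Made-up numbers: before the optimization only the _n variants are called,
# afterwards they have been replaced by fewer plain retain/release calls.
before = {"swift_retain_n": 10, "swift_release_n": 10}
after = {"swift_retain": 8, "swift_release": 8}

# Without the _n variants, traffic appears to go up (0 -> 16).
misleading = (total_rr_calls(before, include_n_variants=False),
              total_rr_calls(after, include_n_variants=False))

# With them, traffic is seen to go down (20 -> 16).
accurate = (total_rr_calls(before), total_rr_calls(after))
```

In this sketch the comparison only becomes meaningful once both the plain and the `_n` entry points are counted.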
…hmark_O --list This makes the output of the test more readable.
The way we already gather numbers for this test is that we perform two runs of `Benchmark_O $TEST` with num-samples=2, iters={2,3}. Under the assumption that the only difference in counter numbers can be caused by that extra iteration, subtracting the iters=2 counts from the iters=3 counts gives us the counts for a single iteration.

In certain cases, I have found that a small subset of the benchmarks produce weird output, and I haven't had the time to look into why. That being said, I do know what these weird results look like, so in this commit we do some extra validation work to see if we need to fail a test due to instability. Specifically, we fail the test if either of the following checks does not hold:

1. We perform another run with num-samples=2, iter=5 and subtract the iter=3 counts from that. Under the assumption that overall work should increase linearly with iteration count in our benchmarks, we check that this delta is actually 2x the `result[iter=3] - result[iter=2]` delta.
2. Neither `result[iter=3] - result[iter=2]` nor `result[iter=5] - result[iter=3]` is negative. None of the counters we gather should ever decrease with iteration count.
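The validation described in this commit message can be sketched roughly as follows. This is an illustrative Python sketch, not the code from the PR: the function names, the dict-of-counters representation, and the `tolerance` parameter are all assumptions made for the example.

```python
# Hypothetical helper: key-by-key difference of two counter dicts (high - low).
def counter_delta(low, high):
    return {name: high[name] - low.get(name, 0) for name in high}


# Hypothetical check: returns True if the three runs look unstable.
# Each argument maps a counter name to the count observed for that run.
def is_unstable(counts_iter2, counts_iter3, counts_iter5, tolerance=0.0):
    delta_2_3 = counter_delta(counts_iter2, counts_iter3)  # one extra iteration
    delta_3_5 = counter_delta(counts_iter3, counts_iter5)  # two extra iterations
    for name, one_iter in delta_2_3.items():
        two_iters = delta_3_5.get(name, 0)
        # Counters should never decrease as the iteration count grows.
        if one_iter < 0 or two_iters < 0:
            return True
        # Work is assumed to be linear in the iteration count, so the
        # (5 - 3) delta should be twice the (3 - 2) delta.
        expected = 2 * one_iter
        if abs(two_iters - expected) > tolerance * max(expected, 1):
            return True
    return False
```

For example, counts of 20, 30 and 50 for iterations 2, 3 and 5 give deltas of 10 and 20 and pass the check, while 20, 30 and 45 would be flagged. The `tolerance` knob is only there to show where slack could be added; the commit message describes an exact 2x check.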
Force-pushed from 980f448 to 2840a76.
@swift-ci python lint
@swift-ci smoke test and merge
Specifically:
value. Now we also do a 5 iter run and make sure that the delta between
(3/5) is twice the delta between (2/3). If it is not, we flag the
benchmark as unstable.
I also put in a couple of misc fixes that I found locally.