unit tests: Harden flaky decoy output selection tests #8024
Overview
Since the gamma tests are not deterministic, there is a nonzero chance that they fail. The recent set of changes to the decoy selection algorithm (#7821) seems to have caused a marginally higher failure rate in the tests. This PR makes small changes to the tests so that they pass at a much higher frequency and don't disrupt CI, while still maintaining a solid bar to pass.
It would be nice to eventually have deterministic tests using fixed random seeds so that results are reproducible.
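As a rough illustration of the seeded approach, the sketch below draws gamma samples from a fixed-seed engine, so that two runs with the same seed see identical data. The function name `draw_gamma_samples` and its parameters are hypothetical, and whether the wallet's output picker can accept an injected engine is an assumption, not something this PR provides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: draw n gamma-distributed samples from an engine
// seeded with a fixed value, so repeated runs see identical data.
std::vector<double> draw_gamma_samples(std::uint64_t seed, std::size_t n,
                                       double shape, double scale)
{
  assert(shape > 0.0 && scale > 0.0);
  std::mt19937_64 engine(seed);                         // engine output is fully specified by the standard
  std::gamma_distribution<double> gamma(shape, scale);  // distribution algorithm is implementation-defined
  std::vector<double> samples;
  samples.reserve(n);
  for (std::size_t i = 0; i < n; ++i)
    samples.push_back(gamma(engine));
  return samples;
}
```

One caveat: the standard fixes the output of `std::mt19937_64` but not the algorithm behind `std::gamma_distribution`, so the same seed reproduces results on a given standard library but not necessarily across different ones.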
Summary of changes
select_outputs.gamma
This tests that the median age of selected outputs matches the expected median. I decreased the expected median to account for recent changes to the algorithm, and added logic to re-attempt the test with a 10x larger sample size if it fails on the first try.
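The retry logic can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper names (`sample_median`, `median_in_range`, `median_test_with_retry`) and the explicit `[lo, hi]` acceptance range are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Median of a sample (upper median when the size is even).
double sample_median(std::vector<double> values)
{
  assert(!values.empty());
  const std::ptrdiff_t mid = static_cast<std::ptrdiff_t>(values.size() / 2);
  std::nth_element(values.begin(), values.begin() + mid, values.end());
  return values[static_cast<std::size_t>(mid)];
}

// Hypothetical check: draw n samples and test whether their median is in [lo, hi].
bool median_in_range(const std::function<double()> &draw, std::size_t n,
                     double lo, double hi)
{
  std::vector<double> values;
  values.reserve(n);
  for (std::size_t i = 0; i < n; ++i)
    values.push_back(draw());
  const double median = sample_median(std::move(values));
  return median >= lo && median <= hi;
}

// First attempt with n samples; on failure, one retry with 10 * n samples.
bool median_test_with_retry(const std::function<double()> &draw, std::size_t n,
                            double lo, double hi)
{
  if (median_in_range(draw, n, lo, hi))
    return true;
  return median_in_range(draw, 10 * n, lo, hi);
}
```

A larger sample narrows the spread of the sample median, so a correct sampler that fails by chance on the first attempt is very likely to pass the retry, while a sampler whose median is genuinely off should fail both attempts.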
select_outputs.density
This tests that outputs from blocks of various sizes are picked in sufficient proportion by the algorithm. I allowed a wider deviation between chain data and selected data for larger blocks, and a smaller deviation for smaller blocks (the allowed deviation is now proportional to block size), and tested some other sensible heuristics.
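One way to read "allowed deviation proportional to size" is sketched below. The function `share_within_tolerance`, its parameters, and the exact formula are assumptions for illustration; the actual test may scale the tolerance differently.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical check: a block holding chain_share of all outputs may see its
// share of picks deviate by at most tolerance_factor * chain_share, so larger
// blocks get a wider absolute tolerance and smaller blocks a tighter one.
bool share_within_tolerance(double chain_share, double picked_share,
                            double tolerance_factor)
{
  assert(chain_share >= 0.0 && tolerance_factor >= 0.0);
  const double allowed = tolerance_factor * chain_share;
  return std::fabs(picked_share - chain_share) <= allowed;
}
```

With a factor of 0.5, for example, a block holding 10% of outputs could be picked anywhere from 5% to 15% of the time, while a block holding 1% would have to land between 0.5% and 1.5%.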
select_outputs.same_distribution
This tests that the distribution of picked outputs matches the distribution of picks across the blocks they are part of. I allowed a slightly wider deviation between chain data and selected data.
Results
I ran all gamma tests 130k times before making the final changes included in this PR (special thank you to @Gingeropolous for letting me use his monster machines). Here are my results:
9 failures total
select_outputs.gamma: 0 failures
select_outputs.density: 2 failures
select_outputs.same_distribution: 7 failures
select_outputs.density
Using a MAX_DEVIATION of 2.0, I got the following 2 errors:
Going with a MAX_DEVIATION > 2.113 would have avoided both. So I increased the MAX_DEVIATION to 2.25 in this final PR just to add a decent buffer within reason.
select_outputs.same_distribution
Sticking with the original allowed avg_dev of 0.02, I got the following 7 errors:
Bumping to 0.0219 would have avoided all the above errors, so I went with 0.025 following the same logic.
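To put the two buffers on a common scale, the small sketch below computes the relative headroom of a chosen threshold over the value that would have been just enough to avoid the observed failures. The function name `relative_headroom` is hypothetical and exists only for this illustration.

```cpp
#include <cassert>

// Hypothetical helper: fraction by which the chosen threshold exceeds the
// smallest threshold that would have avoided the observed failures.
double relative_headroom(double chosen, double just_enough)
{
  assert(just_enough > 0.0);
  return chosen / just_enough - 1.0;
}
```

Under this measure, MAX_DEVIATION at 2.25 versus 2.113 leaves roughly 6.5% headroom, and avg_dev at 0.025 versus 0.0219 leaves roughly 14%.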