unit tests: Harden flaky decoy output selection tests #8024

j-berman · 2021-10-27T05:13:58Z

Overview

Since gamma tests are not deterministic, there is a non-0 chance they can fail. The recent set of changes to the decoy selection algo (#7821) seem to cause a marginally higher failure rate in the tests. This PR makes small changes to the tests to get them to pass at a much higher frequency, so that they don't disrupt CI, while still maintaining a solid bar to pass.

It would be nice to eventually have deterministic tests using a random seed(s) so that results are reproducible.

Summary of changes

select_outputs.gamma

This tests the median age of selected outputs is the expected median. I decreased the expected median to account for recent changes to the algorithm & added logic to re-attempt the test with a 10x larger sample size if the test fails on first try.

select_outputs.density

This tests that outputs from blocks of various sizes are picked in sufficient proportion by the algorithm. I allowed a wider deviation from chain data to selected data for larger blocks, and a smaller deviation for smaller blocks (the allowed deviation is proportional to size now) + tested some other sensible heuristics.

select_outputs.same_distribution

This tests that the distribution of outputs picked matches the distribution of blocks which they are part of are picked. I allowed a slightly wider deviation from chain data to selected data.

Results

I ran all gamma tests 130k times before making the final changes included in this PR (special thank you to @Gingeropolous for letting me use his monster machines), here were my results:

9 failures total
select_outputs.gamma: 0 failures
select_outputs.density: 2 failures
select_outputs.same_distribution: 7 failures

select_outputs.density

Using a MAX_DEVIATION of 2.0, I got the following 2 errors:

Expected: (MAX_DEVIATION * chain_ratio) > (selected_ratio), actual: 0.00381708 vs 0.003865
Expected: (MAX_DEVIATION * chain_ratio) > (selected_ratio), actual: 0.00743647 vs 0.007587

Going with a MAX_DEVIATION > 2.113 would have avoided both. So I increased the MAX_DEVIATION to 2.25 in this final PR just to add a decent buffer within reason.

select_outputs.same_distribution

Sticking with the original allowed avg_dev of 0.02, I got the following 7 errors:

Expected: (avg_dev) < (0.02), actual: 0.020848 vs 0.02
Expected: (avg_dev) < (0.02), actual: 0.0204963 vs 0.02
Expected: (avg_dev) < (0.02), actual: 0.0201313 vs 0.02
Expected: (avg_dev) < (0.02), actual: 0.0200123 vs 0.02
Expected: (avg_dev) < (0.02), actual: 0.0214195 vs 0.02
Expected: (avg_dev) < (0.02), actual: 0.0218481 vs 0.02
Expected: (avg_dev) < (0.02), actual: 0.020335 vs 0.02

Bumping to 0.0219 would have avoided all the above errors, so I went with 0.025 following the same logic.

- select_outputs.gamma: decrease expected median for recent changes to algorithm (monero-project#7821) & re-attempt test with 10x larger sample size if the test fails on first try - select_outputs.density: allow a wider deviation from chain data to selected data for larger blocks, and smaller deviation for smaller blocks (the allowed deviation is proportional to size now) + test some other sensible heuristics - select_outputs.same_distribution: allow slightly larger average deviation from picks to chain data - still not perfect, but only deterministic tests can be perfect

j-berman changed the title ~~Harden flaky decoy output selection tests~~ unit tests: Harden flaky decoy output selection tests Oct 27, 2021

j-berman mentioned this pull request Oct 27, 2021

unit tests: Harden flaky decoy output selection tests #8025

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

unit tests: Harden flaky decoy output selection tests #8024

unit tests: Harden flaky decoy output selection tests #8024

j-berman commented Oct 27, 2021

unit tests: Harden flaky decoy output selection tests #8024

Are you sure you want to change the base?

unit tests: Harden flaky decoy output selection tests #8024

Conversation

j-berman commented Oct 27, 2021

Overview

Summary of changes

Results

select_outputs.density

select_outputs.same_distribution