Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unit tests: Harden flaky decoy output selection tests #8024

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

j-berman
Copy link
Collaborator

Overview

Since gamma tests are not deterministic, there is a non-0 chance they can fail. The recent set of changes to the decoy selection algo (#7821) seem to cause a marginally higher failure rate in the tests. This PR makes small changes to the tests to get them to pass at a much higher frequency, so that they don't disrupt CI, while still maintaining a solid bar to pass.

It would be nice to eventually have deterministic tests using a random seed(s) so that results are reproducible.

Summary of changes

select_outputs.gamma

This tests the median age of selected outputs is the expected median. I decreased the expected median to account for recent changes to the algorithm & added logic to re-attempt the test with a 10x larger sample size if the test fails on first try.

select_outputs.density

This tests that outputs from blocks of various sizes are picked in sufficient proportion by the algorithm. I allowed a wider deviation from chain data to selected data for larger blocks, and a smaller deviation for smaller blocks (the allowed deviation is proportional to size now) + tested some other sensible heuristics.

select_outputs.same_distribution

This tests that the distribution of outputs picked matches the distribution of blocks which they are part of are picked. I allowed a slightly wider deviation from chain data to selected data.

Results

I ran all gamma tests 130k times before making the final changes included in this PR (special thank you to @Gingeropolous for letting me use his monster machines), here were my results:

9 failures total
select_outputs.gamma: 0 failures
select_outputs.density: 2 failures
select_outputs.same_distribution: 7 failures

select_outputs.density

Using a MAX_DEVIATION of 2.0, I got the following 2 errors:

Expected: (MAX_DEVIATION * chain_ratio) > (selected_ratio), actual: 0.00381708 vs 0.003865
Expected: (MAX_DEVIATION * chain_ratio) > (selected_ratio), actual: 0.00743647 vs 0.007587

Going with a MAX_DEVIATION > 2.113 would have avoided both. So I increased the MAX_DEVIATION to 2.25 in this final PR just to add a decent buffer within reason.

select_outputs.same_distribution

Sticking with the original allowed avg_dev of 0.02, I got the following 7 errors:

Expected: (avg_dev) < (0.02), actual: 0.020848 vs 0.02
Expected: (avg_dev) < (0.02), actual: 0.0204963 vs 0.02
Expected: (avg_dev) < (0.02), actual: 0.0201313 vs 0.02
Expected: (avg_dev) < (0.02), actual: 0.0200123 vs 0.02
Expected: (avg_dev) < (0.02), actual: 0.0214195 vs 0.02
Expected: (avg_dev) < (0.02), actual: 0.0218481 vs 0.02
Expected: (avg_dev) < (0.02), actual: 0.020335 vs 0.02

Bumping to 0.0219 would have avoided all the above errors, so I went with 0.025 following the same logic.

- select_outputs.gamma: decrease expected median for recent changes to algorithm (monero-project#7821) & re-attempt test with 10x larger sample size if the test fails on first try
- select_outputs.density: allow a wider deviation from chain data to selected data for larger blocks, and smaller deviation for smaller blocks (the allowed deviation is proportional to size now) + test some other sensible heuristics
- select_outputs.same_distribution: allow slightly larger average deviation from picks to chain data
- still not perfect, but only deterministic tests can be perfect
@j-berman j-berman changed the title Harden flaky decoy output selection tests unit tests: Harden flaky decoy output selection tests Oct 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant