Use deterministic latency in tests to avid intermittent failures #269

masih · 2024-05-22T17:23:19Z

The intermittent failures always occur for async tests, in form of going beyond the maximum number of rounds allowed in tests (mostly 10). The number of rounds is directly impacted by the order of message delivery to each node, dictated by the latency model.

The Async test options use a latency model that is instantiated when the simulation options are instantiated, not when the simulation itself is instantiated and run. This results in the reuse of the same latency model object across multiple tests.

In go tests may be run in parallel, but the same test never runs in parallel with itself when repeated multiple times via -count=n flag. When test are run in parallel using the same latency object across tests becomes the root cause of indeterministic behaviour, where at times a test case can require more rounds than allowed to complete.

The changes here introduce a latency model instantiator, called latency.Modeler which encapsulates the responsibility of constructing a latency model object used by a simulation. The simulation is then adopted to take modelers as option, not already instantiated models. This change dramatically reduces the scope of error that was previously occurring where latency models with unclean RNG state were being used by multiple test and assures that the order of message delivery in tests is deterministic regardless of the test execution parallelism.

Additionally, maximum test parallelism is enabled across all table tests and fuzz tests to reduce the total execution runtime.

Fixes #262

codecov-commenter · 2024-05-22T17:26:21Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.00%. Comparing base (a3670f6) to head (6af1dc2).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #269      +/-   ##
==========================================
+ Coverage   71.99%   72.00%   +0.01%     
==========================================
  Files          32       32              
  Lines        2389     2390       +1     
==========================================
+ Hits         1720     1721       +1     
  Misses        537      537              
  Partials      132      132

Files	Coverage Δ
sim/options.go	`75.67% <100.00%> (+0.33%)`	⬆️

github-actions · 2024-05-22T17:30:04Z

Fuzz test failed on commit 5fc1170. To troubleshoot locally, download the seed corpus using GitHub CLI by running:

gh run download 9195742636 -n testdata

Aleternatively, download directly from here.

Stebalien

LGTM although I'd consider breaking this into two PRs:

One to change from a latency model to latency modeler.
One to change the tests (parallel, naming, etc.).

masih · 2024-05-23T10:15:16Z

I'd consider breaking this into two PRs

Sure. Test refactors and extra parallelism increase is now removed from this one. Next one coming up shortly.

The intermittent failures always occur for async tests, in form of going beyond the maximum number of rounds allowed in tests (mostly 10). The number of rounds is directly impacted by the order of message delivery to each node, dictated by the latency model. The Async test options use a latency model that is instantiated when the simulation _options_ are instantiated, not when the simulation itself is instantiated and run. This results in the reuse of the same latency model object across multiple tests. In go tests may be run in parallel, but the same test never runs in parallel with itself when repeated multiple times via `-count=n` flag. When test are run in parallel using the same latency object across tests becomes the root cause of indeterministic behaviour, where at times a test case can require more rounds than allowed to complete. The changes here introduce a latency model instantiator, called `latency.Modeler` which encapsulates the responsibility of constructing a latency model object used by a simulation. The simulation is then adopted to take modelers as option, not already instantiated models. This change dramatically reduces the scope of error that was previously occurring where latency models with unclean RNG state were being used by multiple test and assures that the order of message delivery in tests is deterministic regardless of the test execution parallelism. Fixes #262

masih requested review from Kubuxu and anorth May 22, 2024 17:27

masih force-pushed the masih/fix-intermittent-test-failures branch from 5fc1170 to 6af1dc2 Compare May 22, 2024 17:32

Stebalien approved these changes May 22, 2024

View reviewed changes

masih force-pushed the masih/fix-intermittent-test-failures branch from 6af1dc2 to c683815 Compare May 23, 2024 10:14

masih force-pushed the masih/fix-intermittent-test-failures branch from c683815 to a18a596 Compare May 23, 2024 10:15

masih added this pull request to the merge queue May 23, 2024

Merged via the queue into main with commit 4415a55 May 23, 2024

masih deleted the masih/fix-intermittent-test-failures branch May 23, 2024 10:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Use deterministic latency in tests to avid intermittent failures #269

Use deterministic latency in tests to avid intermittent failures #269

Uh oh!

masih commented May 22, 2024 •

edited

Loading

Uh oh!

codecov-commenter commented May 22, 2024 •

edited

Loading

Uh oh!

github-actions bot commented May 22, 2024

Uh oh!

Stebalien left a comment

Uh oh!

masih commented May 23, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Use deterministic latency in tests to avid intermittent failures #269

Use deterministic latency in tests to avid intermittent failures #269

Uh oh!

Conversation

masih commented May 22, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov-commenter commented May 22, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

github-actions bot commented May 22, 2024

Uh oh!

Stebalien left a comment

Choose a reason for hiding this comment

Uh oh!

masih commented May 23, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

masih commented May 22, 2024 •

edited

Loading

codecov-commenter commented May 22, 2024 •

edited

Loading