Investigate fuzzing process hung or terminated unexpectedly #251

masih · 2024-05-20T10:04:10Z

Specific fuzz tests sporadically fail with the error `fuzzing process hung or terminated unexpectedly: exit status 2`. This could be a bug in the fuzzer itself, as outlined in golang/go#56238, or possibly a panic during simulation.

See example CI fuzz failure here.


masih · 2024-05-20T10:59:41Z

More context:

This failure is intermittent. For example, rerunning this build passed.
It seems to be sensitive to the fuzzer worker count. I can reproduce this locally with `-parallel=4`, i.e. what the CI fuzzer picks.

The intermittent failures always occur for async tests, in the form of exceeding the maximum number of rounds allowed in tests (mostly 10). The number of rounds is directly affected by the order of message delivery to each node, which is dictated by the latency model.

The Async test options use a latency model that is instantiated when the simulation _options_ are instantiated, not when the simulation itself is instantiated and run. This results in the same latency model object being reused across multiple tests. In Go, tests may run in parallel, but the same test never runs in parallel with itself when repeated multiple times via the `-count=n` flag. When tests run in parallel, sharing the same latency object across tests becomes the root cause of non-deterministic behaviour, where at times a test case can require more rounds than allowed to complete.

The changes here introduce a latency model instantiator, called `latency.Modeler`, which encapsulates the responsibility of constructing the latency model object used by a simulation. The simulation is then adapted to take modelers as options, not already instantiated models. This change dramatically reduces the scope of the error that was previously occurring, where latency models with unclean RNG state were being used by multiple tests, and ensures that the order of message delivery in tests is deterministic regardless of test execution parallelism.

Additionally, maximum test parallelism is enabled across all table tests and fuzz tests to reduce the total execution runtime.

Fixes #262, #251

masih · 2024-05-23T15:00:56Z

More context:

It seems to always fail for `FuzzHonestMultiInstance_AsyncAgreement`.

Kubuxu · 2024-05-23T20:13:28Z

Each fuzz input run cannot take longer than 10s: https://github.com/golang/go/blob/019353d5323fcbffde939f4e85a68bd0093c6e14/src/internal/fuzz/worker.go#L492-L494

masih · 2024-05-24T19:48:58Z

Yep, that's pretty much the conclusion in the golang issue linked in the description: the timeout is not configurable in fuzz tests.

`FuzzHonestMultiInstance_AsyncAgreement` intermittently fails on CI, most likely due to taking too long to complete a test. To avoid intermittent failures:

* Decrease the instance count to 3K.
* Change the honest multi instance tests to cover incremental network sizes, up to 5 for sync and up to 4 for async.
* Add additional static corpus to the async test for better coverage.

Fixes #251

`FuzzHonestMultiInstance_AsyncAgreement` intermittently fails on CI, most likely due to taking too long to complete a test. To avoid intermittent failures:

* Decrease the instance count to 3K.
* Change the honest multi instance tests to cover incremental network sizes, up to 4.
* Add additional static corpus that failed the fuzz test.

Fixes #251

`FuzzHonestMultiInstance_AsyncAgreement` intermittently fails on CI, most likely due to taking too long to complete a test. To avoid intermittent failures:

* Change the honest multi instance tests to cover incremental network sizes, up to 4.
* Reduce the length of randomly generated EC chains at each instance. This should not affect the quality of the test for what it is testing, but it should make it run faster due to less GC.

Additionally, add static corpus that failed locally after extended fuzz time to the fuzz test.

Fixes #251

Kubuxu · 2024-05-28T13:26:46Z

Might be still a bit too slow, at least for my machine:

--- FAIL: FuzzHonest_AsyncMajorityCommonPrefix (20.17s)
    fuzzing process hung or terminated unexpectedly: exit status 2
    Failing input written to testdata/fuzz/FuzzHonest_AsyncMajorityCommonPrefix/6bac56228674bce6
    To re-run:
    go test -run=FuzzHonest_AsyncMajorityCommonPrefix/6bac56228674bce6

Kubuxu · 2024-05-28T13:27:05Z

Or not, it is a different test

masih · 2024-05-28T13:48:22Z

I think I know why FuzzHonest_AsyncMajorityCommonPrefix fails. I'll do a re-scan of fuzz tests.

Did that happen on CI or locally @Kubuxu ?

Kubuxu · 2024-05-28T13:52:19Z

Locally

masih · 2024-05-28T13:55:40Z

OK thanks. Since this issue was closed, CI has been passing consistently, unless I have missed a failure. Having said that, that could be a fluke.

I'll take a closer look at the fuzz tests.

Kubuxu · 2024-05-28T14:34:19Z

It is more that we are close to the edge of the 10s limit, so fuzzing fails from time to time on my laptop.

masih added the testing Related to testing and validation label May 20, 2024

masih self-assigned this May 22, 2024

masih added this to F3 May 22, 2024

masih moved this to In progress in F3 May 22, 2024

masih added this to the Milestone 1: Passive Testing Readiness milestone May 22, 2024

masih mentioned this issue May 22, 2024

Use deterministic latency in tests to avid intermittent failures #269

Merged

masih moved this from In progress to Todo in F3 May 23, 2024

masih mentioned this issue May 24, 2024

Adjust Fuzz honest multi-instance to avoid intermittent failures #277

Merged

masih moved this from Todo to In progress in F3 May 24, 2024

masih closed this as completed in #277 May 27, 2024

github-project-automation bot moved this from In progress to Done in F3 May 27, 2024
