-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Investigate fuzzing process hung or terminated unexpectedly #251
Comments
More context:
|
The intermittent failures always occur for async tests, in form of going beyond the maximum number of rounds allowed in tests (mostly 10). The number of rounds is directly impacted by the order of message delivery to each node, dictated by the latency model. The Async test options use a latency model that is instantiated when the simulation _options_ are instantiated, not when the simulation itself is instantiated and run. This results in the reuse of the same latency model object across multiple tests. In go tests may be run in parallel, but the same test never runs in parallel with itself when repeated multiple times via `-count=n` flag. When test are run in parallel using the same latency object across tests becomes the root cause of indeterministic behaviour, where at times a test case can require more rounds than allowed to complete. The changes here introduce a latency model instantiator, called `latency.Modeler` which encapsulates the responsibility of constructing a latency model object used by a simulation. The simulation is then adopted to take modelers as option, not already instantiated models. This change dramatically reduces the scope of error that was previously occurring where latency models with unclean RNG state were being used by multiple test and assures that the order of message delivery in tests is deterministic regardless of the test execution parallelism. Additionally, maximum test parallelism is enabled across all table tests and fuzz tests to reduce the total execution runtime. Fixes #262, #251
More context:
|
Each fuzz input run cannot take longer than 10s: https://github.com/golang/go/blob/019353d5323fcbffde939f4e85a68bd0093c6e14/src/internal/fuzz/worker.go#L492-L494 |
Yep that's pretty much the conclusion in the golang issue linked in the description, in that timeout is not configurable in fuzz tests. |
`FuzzHonestMultiInstance_AsyncAgreement` intermittently fails on CI, most likely due to taking too long to complete a test. To avoid intermittent failures: * Decrease the instance count to 3K * change the honest multi instance tests to cover incremental network sizes, up to 5 for sync and up to 4 for async * add additional static corpus to the async test for better coverage. Fixes #251
`FuzzHonestMultiInstance_AsyncAgreement` intermittently fails on CI, most likely due to taking too long to complete a test. To avoid intermittent failures: * Decrease the instance count to 3K * change the honest multi instance tests to cover incremental network sizes, up to 4 * add additional static corpus that failed the fuzz test Fixes #251
`FuzzHonestMultiInstance_AsyncAgreement` intermittently fails on CI, most likely due to taking too long to complete a test. To avoid intermittent failures: * Decrease the instance count to 3K * change the honest multi instance tests to cover incremental network sizes, up to 4 * add additional static corpus that failed the fuzz test Fixes #251
`FuzzHonestMultiInstance_AsyncAgreement` intermittently fails on CI, most likely due to taking too long to complete a test. To avoid intermittent failures: * change the honest multi instance tests to cover incremental network sizes, up to 4 * reduce the length of randomly generated EC chains at each instance as this should not affect the test quality for what it is testing. But it should make it run faster due to less GC. Additionally, add static corpus that failed locally after extended fuzz time to the fuzz test. Fixes #251
`FuzzHonestMultiInstance_AsyncAgreement` intermittently fails on CI, most likely due to taking too long to complete a test. To avoid intermittent failures: * change the honest multi instance tests to cover incremental network sizes, up to 4 * reduce the length of randomly generated EC chains at each instance as this should not affect the test quality for what it is testing. But it should make it run faster due to less GC. Additionally, add static corpus that failed locally after extended fuzz time to the fuzz test. Fixes #251
Might be still a bit too slow, at least for my machine:
|
Or not, it is a different test |
I think I know why Did that happen on CI or locally @Kubuxu ? |
Locally |
OK thanks. Since this issue was closed CI has been passing consistently? unless I have missed a failure. Having said that, that could be fluke. I'll take a closer look at the fuzz tests. |
It is more that we are close to the edge of 10s that fuzzing fails from time to time on my laptop. |
Specific fuzz tests seem to sporadically fail with error:
fuzzing process hung or terminated unexpectedly: exit status 2
. This could be a bug of sorts in the fuzzer itself as outlined in golang/go#56238. Or possibly a panic during simulation?See example CI fuzz failure here.
The text was updated successfully, but these errors were encountered: