Jepsen transient failures under network partition conditions #7549
Hi! The Jepsen tests include five nemeses (test scenarios) that introduce different types of network partitions (see here). The tests add documents to an index before, during, and after these partitions, and verify that the documents acknowledged during the partitions are retrievable afterwards. Sometimes the tests indicate that a number of documents were indexed but are not retrievable; however, this does not happen on every run of the same scenario. For example, running each nemesis 20 times (against 598854d) reported the following :lost-frac amounts:
isolate-self-primaries-nemesis 244/361, 2/733, 1/607, 1/603, 1/213, 65/216 (and 14 times 0)
In total, out of 100 runs, 23 failed.
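For anyone reading the numbers above, here is a rough Python sketch of how a lost fraction like these could be computed. It is not the actual Jepsen checker (which is written in Clojure); the function name `lost_fraction` and its parameters are made up for illustration, and it assumes the denominator is the number of attempted writes. If the denominator is really the acknowledged count, pass that set as `attempted` instead.

```python
from fractions import Fraction


def lost_fraction(attempted, acknowledged, retrieved):
    """Hypothetical helper: return (lost_ids, lost / attempted).

    A document counts as lost when its write was acknowledged
    but it cannot be read back after the partition heals.
    """
    lost = set(acknowledged) - set(retrieved)
    total = len(set(attempted))
    if total == 0:
        return lost, Fraction(0)
    return lost, Fraction(len(lost), total)


# (lost, total) pairs reported above for isolate-self-primaries-nemesis.
reported = [(244, 361), (2, 733), (1, 607), (1, 603), (1, 213), (65, 216)]
clean_runs = 14  # runs that reported 0 lost documents
failing_runs = len(reported)

for lost, total in reported:
    # Print each reported fraction as a percentage for easier comparison.
    print(f"{lost}/{total} = {lost / total:.2%}")
```

Using exact fractions rather than floats keeps the result directly comparable to the `lost/total` form in the report.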
Hi @pilvitaneli, thanks for the testing results!
We're actively investigating Jepsen tests on top of our own tests, which resulted in #7572. The Jepsen tests helped verify that we fixed the split brain issue (it no longer happens). In all of our runs though, we couldn't simulate a result similar to your first run (the
I'll let you know how our continued testing with Jepsen goes. Thanks again for your results!