When a minority of members is removed from the network, read/write failures occur on the cluster #12240
Expected: no failures during read/write operations against the majority of the cluster.
Data structures on which failures were observed:
Can you share the link to the test?
How do these failures manifest? Does the client get some kind of exception?
Can you also clean up the client logs? I see lots of log entries unrelated to the client instance, e.g.
@mmedenjak thanks for the reply.
During the failures, the client side does not report an exception; we collect
For each data structure that we built there are simple predicate functions that we verify in
I attached logs for another, newer run that has
So when it says
It means these checks have failed?
And this is using just one atomic long?
Where is the test code?
I have analyzed this failure. For instance, the following code shows how IAtomicLong is tested:
The client performs the write -> read -> test loop on the cluster and expects all steps to return
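To make the shape of that loop concrete, here is a minimal sketch. It is not the test code from this issue: `Counter`, `healthy()`, `lossy()` and `run()` are hypothetical names, and a local `java.util.concurrent.atomic.AtomicLong` stands in for the `IAtomicLong` client proxy so the example runs without a cluster. It only illustrates how a check can fail without the client ever seeing an exception, for example when a write is silently lost.

```java
import java.util.concurrent.atomic.AtomicLong;

public class WriteReadTestLoop {

    /** Hypothetical minimal abstraction; stands in for the IAtomicLong client proxy. */
    public interface Counter {
        void set(long value);
        long get();
    }

    /** A counter that applies every write (assumed behaviour of a healthy majority). */
    public static Counter healthy() {
        final AtomicLong backing = new AtomicLong();
        return new Counter() {
            @Override public void set(long value) { backing.set(value); }
            @Override public long get() { return backing.get(); }
        };
    }

    /** A counter that silently drops every n-th write, simulating a lost update. */
    public static Counter lossy(final int n) {
        final AtomicLong backing = new AtomicLong();
        return new Counter() {
            @Override public void set(long value) {
                if (value % n != 0) {
                    backing.set(value); // writes divisible by n are dropped without an error
                }
            }
            @Override public long get() { return backing.get(); }
        };
    }

    /** Runs the write -> read -> test loop and returns the number of failed checks. */
    public static int run(Counter counter, int iterations) {
        int failures = 0;
        for (long expected = 1; expected <= iterations; expected++) {
            counter.set(expected);        // write
            long actual = counter.get();  // read
            if (actual != expected) {     // test: the predicate check, no exception involved
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("healthy failures: " + run(healthy(), 10)); // prints 0
        System.out.println("lossy failures: " + run(lossy(3), 9));     // prints 3
    }
}
```

Counting failed predicates rather than catching exceptions mirrors what was described above: the client reports no exception, so the only signal is the check itself.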
For quorum tests, I think we should only test two scenarios:
@lazerion confirmed that we already have tests for both of these scenarios. Any other correctness test is possibly irrelevant for now.
I suggest two options: we can either close this issue or move it to 3.11. Once we have a Raft implementation behind the atomic data structures, we can write proper correctness tests for them.