Read skew leading to write corruption #2143
Comments
@aphyr:
Sorry 'bout that @janardhan1993! This was a bug in the Jepsen test framework, introduced as a part of writing followup tests, which inadvertently broke the bank test. I've fixed that bug and re-tested with the latest nightly build (2018-03-26, 8f9eff3). In this case, 20180327T114802.000-0500.zip, two minutes of transfers on a healthy cluster can turn $100 into $180.
The test passes with https://transfer.sh/13hS3v/dgraph-linux-amd64.tar.gz but is still failing with network partitions.
This passes now with server_side_sequencing hardcoded in the server.
I've re-tested with the @upsert schema directive and dgraph4j 1.3.0 with both the current nightly (5353140) and the fix/jepsen_delete build (224b560) and both of them still fail after ~5-10 minutes of testing, with no nemeses. Is there a newer build I should try?
Ah, wait, my mistake--I didn't realize the default would be to use client-side sequencing. I'll add flags and re-try with that option!
Looks like with server-side sequencing, this is good to go.
In the nightly build for 2018/02/19, with Jepsen, I can reliably reproduce what appears to be a snapshot isolation bug in which the sum of accounts gradually drifts higher and lower over time. This history also appears to exhibit a sequential consistency violation, where clients fail to read their own writes on the same nodes.
This particular version of the bank workload uses a three-part schema:
Every account has type "account", and is identified by a numeric key. We query for accounts with
and write back the key and amount, but not the type.
In this five-node cluster, with replication factor three, shortly after a partition begins...
Process 39 initiates and completes a transfer of $3 from account 5 to 0.
Process 39 fails to make two transfers due to insufficient funds.
Now something very interesting happens. Process 39 makes three reads in rapid fire, and observes the previous state (summing to 100), then only the withdrawal half of its own transfer (summing to 97), then the previous state again. The value of account 5 goes from 3 to 0 to 3 again.
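To make the invariant concrete, here is a minimal Python sketch of the check a bank test applies to each read: every snapshot of all accounts should sum to the initial total. This is not Jepsen's checker. The function name and the balances of accounts 0 and 6 are made up for illustration; only the totals (100, 97, 100) and account 5 going 3, 0, 3 come from the history above.

```python
EXPECTED_TOTAL = 100  # total balance the bank workload should conserve


def find_skewed_reads(reads, expected_total=EXPECTED_TOTAL):
    """Return (index, total) for every read whose balances do not sum to expected_total."""
    skewed = []
    for i, balances in enumerate(reads):
        total = sum(balances.values())
        if total != expected_total:
            skewed.append((i, total))
    return skewed


# Hypothetical balances for accounts 0 and 6, chosen only so that the
# totals match the three reads described above.
reads = [
    {0: 47, 5: 3, 6: 50},  # previous state, sums to 100
    {0: 47, 5: 0, 6: 50},  # withdrawal from account 5 visible, deposit to 0 missing
    {0: 47, 5: 3, 6: 50},  # previous state again
]

print(find_skewed_reads(reads))
```

Under snapshot isolation a read-only transaction should never see half of a transfer, so any non-empty result is a violation.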
I believe this is not limited to read-only transactions: transfer transactions appear able to propagate these skewed reads into successful writes. A few transactions later, process 32 sees half of its own transaction moving $3 from account 1 to account 3.
These accounts sum to 104, instead of the expected 100. Moreover, process 39's concurrent transfer of $2 from 3 to 6 appears to observe, then write back, values based on that read skew: the total balance remains 104 for the next several transactions.
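As a toy model of how a skewed read can turn into a corrupted write, the Python sketch below has a transfer write back absolute balances computed from whatever it read, with no conflict detection. It assumes nothing about Dgraph's internals and does not reproduce the 104 total above; all balances and the `transfer_writes` helper are hypothetical.

```python
def transfer_writes(snapshot, src, dst, amount):
    """Compute the absolute balances a transfer writes back, based on what it read."""
    if snapshot[src] < amount:
        raise ValueError("insufficient funds")
    return {src: snapshot[src] - amount, dst: snapshot[dst] + amount}


# Hypothetical starting state, summing to 100.
before = {1: 13, 3: 17, 6: 70}
committed = dict(before)

# First transfer: $3 from account 1 to account 3, applied to a correct snapshot.
committed.update(transfer_writes(committed, src=1, dst=3, amount=3))

# Second transfer: $2 from account 3 to account 6, but its snapshot is skewed:
# it still sees account 3 as it was before the first transfer's deposit.
stale = dict(committed)
stale[3] = before[3]
committed.update(transfer_writes(stale, src=3, dst=6, amount=2))

print(committed, sum(committed.values()))
```

In this model the stale value overwrites the deposit, so the total drops and stays wrong for every later transaction. The opposite skew (a missed withdrawal) would push the total up instead, which is consistent with totals drifting in both directions.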
Total balances appear to random-walk over time; we observe totals as high as 116, which is more than 3x the maximum transfer amount. Moreover, note that each process is bound to a single client, against a single node, which means that a single client can write something to a node, get a successful acknowledgement, then fail to read back its write from that same node.
You can reproduce this with Jepsen 077bbff27120ab2950928cea64e5a113a4fad32c, by running
For future users trying to reproduce this: I think you'll want the next release after 2018/02/19; the behavior of the nightly package URL above will change every day.