Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read skew leading to write corruption #2143

Closed
aphyr opened this issue Feb 19, 2018 · 7 comments
Closed

Read skew leading to write corruption #2143

aphyr opened this issue Feb 19, 2018 · 7 comments
Assignees
Labels
kind/bug Something is broken.
Milestone

Comments

@aphyr
Copy link

aphyr commented Feb 19, 2018

In the nightly build for 2018/02/19, with Jepsen, I can reliably reproduce what appears to be a snapshot isolation bug in which the sum of accounts gradually drifts higher and lower over time. This history also appears to exhibit a sequential consistency violation, where clients fail to read their own writes on the same nodes.

This particular version of the bank workload uses a three-part schema:

key: int @index(int) .
type: string @index(exact) .
amount: int .

Every account has type "account", and is identified by a numeric key. We query for accounts with

{ q(func: eq(key, $key)) { uid key amount } }

and write back the key and amount, but not the type.

In this five-node cluster, with replication factor three, shortly after a partition begins...

{:type :info, :f :start, :process :nemesis, :time 30928687283, :value [:isolated {"n1" #{"n2" "n5" "n3"}, "n4" #{"n2" "n5" "n3"}, "n2" #{"n1" "n4"}, "n5" #{"n1" "n4"}, "n3" #{"n1" "n4"}}]}
{:type :invoke, :f :read, :process 39, :time 54175450678}
{:type :ok, :f :read, :process 39, :time 54177149097, :value {0 48, 1 11, 2 7, 3 1, 4 0, 5 3, 6 24, 7 6}}

Process 39 initiates and completes a transfer of $3 from account 5 to 0.

{:type :invoke, :f :transfer, :value {:from 5, :to 0, :amount 3}, :process 39, :time 54177292423}
{:type :ok, :f :transfer, :value {:from 5, :to 0, :amount 3}, :process 39, :time 54224717067}

Process 39 fails to read its own transfer; this is legal under SI, but might violate the sequential-consistency-within-a-group property that I think Dgraph intends to provide.

```clj
{:type :invoke, :f :read, :process 39, :time 54224966995}
{:type :ok, :f :read, :process 39, :time 54227006450, :value {0 48, 1 11, 2 7, 3 1, 4 0, 5 3, 6 24, 7 6}}

Process 39 fails to make two transfers due to insufficient funds.

{:type :invoke, :f :transfer, :value {:from 4, :to 1, :amount 1}, :process 39, :time 54227194310}
{:type :fail, :f :transfer, :value {:from 4, :to 1, :amount 1}, :process 39, :time 54230152613, :error :insufficient-funds}
{:type :invoke, :f :transfer, :value {:from 3, :to 6, :amount 4}, :process 39, :time 54230302494}
{:type :fail, :f :transfer, :value {:from 3, :to 6, :amount 4}, :process 39, :time 54233091057, :error :insufficient-funds}

Now something very interesting happens. Process 39 makes three reads in rapid fire, and observes the previous state (summing to 100), then the withdrawal it made only (summing to 97), then the previous state again. The value of account 5 goes from 3 to 0 to 3 again.

{:type :invoke, :f :read, :process 39, :time 54233226232}
{:type :ok, :f :read, :process 39, :time 54234963110, :value {0 48, 1 11, 2 7, 3 1, 4 0, 5 3, 6 24, 7 6}}
{:type :invoke, :f :read, :process 39, :time 54235118669}
{:type :ok, :f :read, :process 39, :time 54236829301, :value {0 48, 1 11, 2 7, 3 1, 4 0, 5 0, 6 24, 7 6}}
{:type :invoke, :f :read, :process 39, :time 54236973747}
{:type :ok, :f :read, :process 39, :time 54238624420, :value {0 48, 1 11, 2 7, 3 1, 4 0, 5 3, 6 24, 7 6}}

I believe this is not limited to read-only transactions: transfer transactions appear able to propagate these skewed reads into successful writes. A few transactions later, process 32 sees half of its own transaction moving $3 from account 1 to account 3.

{:type :ok, :f :read, :process 32, :time 54516345674, :value {0 42, 1 7, 2 7, 3 6, 4 0, 5 8, 6 25, 7 6}}
{:type :invoke, :f :transfer, :value {:from 0, :to 1, :amount 4}, :process 32, :time 54516628223}
{:type :ok, :f :transfer, :value {:from 1, :to 3, :amount 3}, :process 39, :time 54521564764}
{:type :invoke, :f :transfer, :value {:from 3, :to 6, :amount 2}, :process 39, :time 54521750272}
{:type :ok, :f :transfer, :value {:from 0, :to 1, :amount 4}, :process 32, :time 54601697974}
{:type :invoke, :f :read, :process 32, :time 54602020895}
{:type :ok, :f :read, :process 32, :time 54603332814, :value {0 38, 1 11, 2 7, 3 9, 4 0, 5 8, 6 25, 7 6}}
{:type :invoke, :f :read, :process 32, :time 54603836870}
{:type :ok, :f :read, :process 32, :time 54605064666, :value {0 38, 1 11, 2 7, 3 9, 4 0, 5 8, 6 25, 7 6}}

These accounts sum to 104, instead of the expected 100. Moreover, process 39's concurrent transfer of $2 from 3 to 6 appears to observe, then write back, values based on that read skew: the total balance remains 104 for the next several transactions.

{:type :invoke, :f :read, :process 32, :time 54605229770}
{:type :ok, :f :read, :process 32, :time 54606573195, :value {0 38, 1 11, 2 7, 3 7, 4 0, 5 8, 6 27, 7 6}}

Total balances appear to random-walk over time; we observe totals as high as 116, which is more than 3x the maximum transfer amount. Moreover, note that each process is bound to a single client, against a single node, which means that a single client can write something to a node, get a successful acknowledgement, then fail to read back its write from that same node.

You can reproduce this with Jepsen 077bbff27120ab2950928cea64e5a113a4fad32c, by running

lein run test -w bank --force-download --package-url https://github.com/dgraph-io/dgraph/releases/download/nightly/dgraph-linux-amd64.tar.gz --time-limit 120

For future users trying to reproduce this: I think you'll want the next release after 2018/2/19; the behavior of the nightly package url above will change every day.

@manishrjain manishrjain added the kind/bug Something is broken. label Feb 19, 2018
@manishrjain manishrjain assigned manishrjain and unassigned pawanrawal Mar 9, 2018
@manishrjain manishrjain added the kind/bug Something is broken. label Mar 21, 2018
@janardhan1993
Copy link
Contributor

@aphyr :
With latest nightly, i see the following error and can't find anything in the logs.

ERROR [2018-03-22 12:31:47,947] main - jepsen.cli Oh jeez, I'm sorry, Jepsen broke. Here's why:
java.lang.IllegalArgumentException: Don't know graphql+- type of nil
	at jepsen.dgraph.client$graphql_type.invokeStatic(client.clj:150) ~[classes/:na]
	at jepsen.dgraph.client$graphql_type.invoke(client.clj:136) ~[classes/:na]
	at jepsen.dgraph.client$query$fn__2308.invoke(client.clj:184) ~[classes/:na]
	at clojure.core$map$fn__5587.invoke(core.clj:2747) ~[clojure-1.9.0.jar:na]
	at clojure.lang.LazySeq.sval(LazySeq.java:40) ~[clojure-1.9.0.jar:na]
	at clojure.lang.LazySeq.seq(LazySeq.java:49) ~[clojure-1.9.0.jar:na]
	at clojure.lang.LazySeq.first(LazySeq.java:71) ~[clojure-1.9.0.jar:na]
	at clojure.lang.RT.first(RT.java:685) ~[clojure-1.9.0.jar:na]
	at clojure.core$first__5106.invokeStatic(core.clj:55) ~[clojure-1.9.0.jar:na]
	at clojure.string$join.invokeStatic(string.clj:180) ~[clojure-1.9.0.jar:na]
	at clojure.string$join.invoke(string.clj:180) ~[clojure-1.9.0.jar:na]
	at jepsen.dgraph.client$query.invokeStatic(client.clj:185) ~[classes/:na]
	at jepsen.dgraph.client$query.invoke(client.clj:171) ~[classes/:na]
	at jepsen.dgraph.client$upsert_BANG_.invokeStatic(client.clj:233) ~[classes/:na]
	at jepsen.dgraph.client$upsert_BANG_.invoke(client.clj:222) ~[classes/:na]
	at jepsen.dgraph.bank.Client.setup_BANG_(bank.clj:37) ~[na:na]
	at jepsen.client$open_compat_BANG_.invokeStatic(client.clj:44) ~[jepsen-0.1.9-SNAPSHOT.jar:na]
	at jepsen.client$open_compat_BANG_.invoke(client.clj:38) ~[jepsen-0.1.9-SNAPSHOT.jar:na]
	at jepsen.core.ClientWorker.setup_worker_BANG_(core.clj:346) ~[jepsen-0.1.9-SNAPSHOT.jar:na]
	at jepsen.core$do_worker_BANG_.invokeStatic(core.clj:165) ~[jepsen-0.1.9-SNAPSHOT.jar:na]
	at jepsen.core$do_worker_BANG_.invoke(core.clj:155) ~[jepsen-0.1.9-SNAPSHOT.jar:na]
	at jepsen.core$run_workers_BANG_$fn__4227$fn__4228.invoke(core.clj:221) ~[jepsen-0.1.9-SNAPSHOT.jar:na]
	at clojure.lang.AFn.applyToHelper(AFn.java:152) ~[clojure-1.9.0.jar:na]
	at clojure.lang.AFn.applyTo(AFn.java:144) ~[clojure-1.9.0.jar:na]
	at clojure.core$apply.invokeStatic(core.clj:657) ~[clojure-1.9.0.jar:na]
	at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1965) ~[clojure-1.9.0.jar:na]
	at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1965) ~[clojure-1.9.0.jar:na]
	at clojure.lang.RestFn.invoke(RestFn.java:425) ~[clojure-1.9.0.jar:na]
	at clojure.lang.AFn.applyToHelper(AFn.java:156) ~[clojure-1.9.0.jar:na]
	at clojure.lang.RestFn.applyTo(RestFn.java:132) ~[clojure-1.9.0.jar:na]
	at clojure.core$apply.invokeStatic(core.clj:661) ~[clojure-1.9.0.jar:na]
	at clojure.core$bound_fn_STAR_$fn__5471.doInvoke(core.clj:1995) ~[clojure-1.9.0.jar:na]
	at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.9.0.jar:na]
	at clojure.lang.AFn.run(AFn.java:22) ~[clojure-1.9.0.jar:na]
	at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_151]

@ghost ghost added the kind/bug Something is broken. label Mar 27, 2018
@aphyr
Copy link
Author

aphyr commented Mar 27, 2018

Sorry bout that @janardhan1993! This was a bug in the Jepsen test framework, introduced as a part of writing followup tests, which inadvertently broke the bank test.

I've fixed that bug and re-tested with the latest nightly build (2018-03-26, 8f9eff3). In this case, 20180327T114802.000-0500.zip, two minutes of transfers on a healthy cluster can turn $100 into $180,.

@ghost ghost added the kind/bug Something is broken. label Mar 28, 2018
@manishrjain manishrjain added this to the Sprint-000 milestone Mar 28, 2018
@janardhan1993
Copy link
Contributor

The test passes with https://transfer.sh/13hS3v/dgraph-linux-amd64.tar.gz
lein run test -w bank --force-download --package-url https://github.com/dgraph-io/dgraph/releases/download/nightly/dgraph-linux-amd64.tar.gz --time-limit 120

But is still failing with network partitions
lein run test -w bank --force-download --package-url https://github.com/dgraph-io/dgraph/releases/download/nightly/dgraph-linux-amd64.tar.gz --time-limit 120 --nemesis partition-random-halves

@janardhan1993
Copy link
Contributor

This passes now with server_side_sequencing hardcoded in server.
@aphyr Need to add @upsert directive for this test as well.
We also need to use latest client and set sequencing mode after this PR (https://github.com/dgraph-io/dgraph4j/pull/54/files) is merged.

@ghost ghost removed the backlog label Apr 4, 2018
@aphyr
Copy link
Author

aphyr commented Apr 4, 2018

I've re-tested with the @upsert schema directive and dgraph4j 1.3.0 with both the current nightly (5353140) and the fix/jepsen_delete build (224b560) and both of them still fail after ~5-10 minutes of testing, with no nemeses. Is there a newer build I should try?

lein run test --package-url https://github.com/dgraph-io/dgraph/releases/download/nightly/dgraph-linux-amd64.tar.gz -f -w bank --time-limit 120 --concurrency 10 --upsert-schema --test-count 10

@aphyr
Copy link
Author

aphyr commented Apr 4, 2018

Ah, wait, my mistake--I didn't realize the default would be to use client-side sequencing. I'll add flags and re-try with that option!

@aphyr
Copy link
Author

aphyr commented Apr 10, 2018

Looks like with server-side sequencing, this is good to go.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something is broken.
Development

No branches or pull requests

4 participants