Description
In healthy clusters, with @upsert schema directives on key and type attributes (but not on values), and server-side ordering set for all client transactions, version
Dgraph version : v1.0.4
Commit SHA-1 : 6fb69e2
Commit timestamp : 2018-04-09 21:26:31 +0530
Branch : jan/node_lockup
can exhibit a violation of snapshot isolation, allowing the total of account values to drift higher or lower over time. This may be related to #2143 (the previous bank test failure without any nemesis, now closed), but I suspect this issue might be different from #2290, where documents are lost or have nil values instead of their correct data: that would only cause an undercount, and in bank tests I can also observe overcounting. For instance, in 20180409T163239.000-0500.zip:
9 :invoke :transfer {:from 1, :to 3, :amount 2}
8 :ok :read {0 43, 1 4, 2 3, 3 3, 4 10, 5 7, 6 27, 7 3}
8 :invoke :transfer {:from 2, :to 6, :amount 4}
6 :fail :transfer {:from 7, :to 4, :amount 1} :conflict
6 :invoke :read nil
3 :fail :transfer {:from 5, :to 7, :amount 1} :conflict
3 :invoke :read nil
4 :ok :read {0 43, 1 4, 2 3, 3 3, 4 10, 5 7, 6 27, 7 3}
4 :invoke :read nil
2 :ok :read {0 43, 1 4, 2 5, 3 3, 4 10, 5 7, 6 25, 7 3}
6 :ok :read {0 43, 1 4, 2 3, 3 4, 4 10, 5 7, 6 27, 7 2}
2 :invoke :read nil
6 :invoke :read nil
6 :ok :read {0 43, 1 4, 2 3, 3 4, 4 10, 5 7, 6 27, 7 2}
6 :invoke :read nil
4 :ok :read {0 43, 1 4, 2 3, 3 4, 4 10, 5 7, 6 27, 7 2}
4 :invoke :read nil
3 :ok :read {0 43, 1 4, 2 3, 3 4, 4 10, 5 7, 6 27, 7 2}
3 :invoke :read nil
8 :fail :transfer {:from 2, :to 6, :amount 4} :insufficient-funds
8 :invoke :transfer {:from 4, :to 7, :amount 5}
0 :ok :transfer {:from 1, :to 6, :amount 2}
0 :invoke :read nil
6 :ok :read {0 43, 1 2, 2 3, 3 4, 4 10, 5 7, 6 29, 7 2}
6 :invoke :read nil
5 :fail :transfer {:from 1, :to 7, :amount 3} :conflict
5 :invoke :read nil
1 :fail :transfer {:from 6, :to 2, :amount 2} :conflict
1 :invoke :transfer {:from 6, :to 2, :amount 2}
0 :ok :read {0 43, 1 2, 2 3, 3 4, 4 10, 5 7, 6 29, 7 2}
3 :ok :read {0 43, 1 2, 2 3, 3 4, 4 10, 5 7, 6 29, 7 2}
0 :invoke :transfer {:from 2, :to 0, :amount 2}
3 :invoke :transfer {:from 7, :to 1, :amount 4}
5 :ok :read {0 43, 1 2, 2 3, 3 4, 4 10, 5 7, 6 29, 7 2}
4 :ok :read {0 43, 1 2, 2 3, 3 4, 4 10, 5 7, 6 29, 7 2}
5 :invoke :read nil
4 :invoke :transfer {:from 6, :to 5, :amount 3}
2 :ok :read {0 43, 1 4, 2 3, 3 4, 4 10, 5 7, 6 27, 7 2}
2 :invoke :read nil
5 :ok :read {0 43, 1 2, 2 3, 3 4, 4 10, 5 7, 6 29, 7 2}
6 :ok :read {0 43, 1 2, 2 3, 3 4, 4 10, 5 7, 6 29, 7 2}
5 :invoke :read nil
6 :invoke :read nil
5 :ok :read {0 43, 1 2, 2 3, 3 4, 4 10, 5 7, 6 29, 7 2}
5 :invoke :transfer {:from 1, :to 0, :amount 5}
6 :ok :read {0 43, 1 2, 2 3, 3 4, 4 10, 5 7, 6 29, 7 2}
6 :invoke :transfer {:from 1, :to 3, :amount 5}
9 :ok :transfer {:from 1, :to 3, :amount 2}
9 :invoke :transfer {:from 0, :to 2, :amount 2}
0 :ok :transfer {:from 2, :to 0, :amount 2}
0 :invoke :transfer {:from 3, :to 5, :amount 5}
5 :fail :transfer {:from 1, :to 0, :amount 5} :insufficient-funds
5 :invoke :read nil
2 :ok :read {0 43, 1 2, 2 3, 3 4, 4 10, 5 7, 6 29, 7 2}
2 :invoke :transfer {:from 0, :to 2, :amount 2}
6 :fail :transfer {:from 1, :to 3, :amount 5} :insufficient-funds
6 :invoke :transfer {:from 1, :to 2, :amount 5}
5 :ok :read {0 45, 1 2, 2 1, 3 6, 4 10, 5 7, 6 29, 7 2}
Here, the final read has a total of 102 instead of 100: 2 has been transferred from account 2 to account 0, but account 3 has also gained 2 out of nowhere. The only recent increment of account 3 comes from process 9's transfer of 2 from 1 to 3, but it doesn't appear to have applied atomically: account 3 shows the credit, while account 1 still reads 2.
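To make the arithmetic explicit, here is a minimal sketch in Python that compares the last two distinct reads in the excerpt. It assumes the read `{0 43, 1 2, ...}` reflects the state before both acknowledged transfers (process 0's 2 -> 0 and process 9's 1 -> 3) took effect; the helper name `apply_transfers` is only for illustration and is not part of Jepsen or Dgraph.

```python
# Balances copied from the last two distinct reads in the history excerpt.
before = {0: 43, 1: 2, 2: 3, 3: 4, 4: 10, 5: 7, 6: 29, 7: 2}
after = {0: 45, 1: 2, 2: 1, 3: 6, 4: 10, 5: 7, 6: 29, 7: 2}

# Transfers acknowledged :ok between those reads, as (from, to, amount).
# Assumption: neither was visible yet in `before`.
acked = [(2, 0, 2), (1, 3, 2)]


def apply_transfers(balances, transfers):
    """Return a copy of balances with each transfer applied atomically."""
    out = dict(balances)
    for src, dst, amount in transfers:
        out[src] -= amount
        out[dst] += amount
    return out


expected = apply_transfers(before, acked)

# Accounts whose observed value changed between the two reads.
delta = {k: after[k] - before[k] for k in before if after[k] != before[k]}

# Accounts where the observed value differs from the atomic expectation,
# as (expected, observed).
mismatch = {k: (expected[k], after[k]) for k in before if expected[k] != after[k]}

print(sum(before.values()), sum(after.values()))  # 100 102
print(delta)     # {0: 2, 2: -2, 3: 2}
print(mismatch)  # {1: (0, 2)}
```

Under that assumption, the only account that disagrees with an atomic application of both transfers is account 1, which would be expected to read 0 but reads 2.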
These don't appear to just be read-only anomalies; they can be promoted via writes back into the database state: the total remains 102 for the remainder of the test.
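To find where the total first goes wrong and whether it stays wrong, one could scan the history for completed reads whose balances do not sum to the expected total. The following is a rough sketch, not Jepsen's own bank checker: it assumes history lines shaped like the excerpt (`<process> :ok :read {<account> <balance>, ...}`) and an expected total of 100, which is inferred from the earlier reads. The function names are hypothetical.

```python
import re

# Matches completed reads such as: 5 :ok :read {0 45, 1 2, 2 1}
READ_OK = re.compile(r"^\s*(\d+)\s+:ok\s+:read\s+\{(.*)\}\s*$")


def parse_read(line):
    """Return (process, {account: balance}) for an :ok read line, else None."""
    m = READ_OK.match(line)
    if m is None:
        return None
    balances = {}
    for pair in m.group(2).split(","):
        if not pair.strip():
            continue  # tolerate an empty map
        account, value = pair.split()
        balances[int(account)] = int(value)
    return int(m.group(1)), balances


def inconsistent_reads(lines, expected_total):
    """Return (process, total) for every :ok read whose sum is off."""
    bad = []
    for line in lines:
        parsed = parse_read(line)
        if parsed is None:
            continue  # invocations, transfers, and failures are skipped
        process, balances = parsed
        total = sum(balances.values())
        if total != expected_total:
            bad.append((process, total))
    return bad


# Sample lines taken from the excerpt in this report.
sample = [
    "8 :ok :read {0 43, 1 4, 2 3, 3 3, 4 10, 5 7, 6 27, 7 3}",
    "6 :invoke :read nil",
    "2 :ok :read {0 43, 1 4, 2 5, 3 3, 4 10, 5 7, 6 25, 7 3}",
    "5 :ok :read {0 45, 1 2, 2 1, 3 6, 4 10, 5 7, 6 29, 7 2}",
]

print(inconsistent_reads(sample, expected_total=100))  # [(5, 102)]
```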
You can reproduce this with Jepsen 0ef6e711dfb07aad4afc84f7f9c3348961afa9d7 by running
lein run test --package-url https://transfer.sh/Z5CTJ/dgraph-linux-amd64.tar.gz --time-limit 300 --concurrency 2n --nemesis kill-alpha,fix-alpha,kill-zero --test-count 20 --workload bank --upsert-schema