SI violation with node crashes or predicate moves in bank tests #2321

Closed
@aphyr

In healthy clusters, with @upsert schema directives on key and type attributes (but not on values), and server-side ordering set for all client transactions, the following version

Dgraph version   : v1.0.4
Commit SHA-1     : 6fb69e2
Commit timestamp : 2018-04-09 21:26:31 +0530
Branch           : jan/node_lockup

can exhibit a violation of snapshot isolation, allowing account values to drift higher or lower over time. This may be related to #2143 (the previous bank test failure without any nemesis, now closed). I suspect this issue might be different from #2290, where documents are lost or have nil values instead of their correct data: that would cause an undercount, whereas in bank tests I can also observe overcounting. For instance, in 20180409T163239.000-0500.zip:

9	:invoke	:transfer	{:from 1, :to 3, :amount 2}
8	:ok	:read	{0 43, 1 4, 2 3, 3 3, 4 10, 5 7, 6 27, 7 3}
8	:invoke	:transfer	{:from 2, :to 6, :amount 4}
6	:fail	:transfer	{:from 7, :to 4, :amount 1}	:conflict
6	:invoke	:read	nil
3	:fail	:transfer	{:from 5, :to 7, :amount 1}	:conflict
3	:invoke	:read	nil
4	:ok	:read	{0 43, 1 4, 2 3, 3 3, 4 10, 5 7, 6 27, 7 3}
4	:invoke	:read	nil
2	:ok	:read	{0 43, 1 4, 2 5, 3 3, 4 10, 5 7, 6 25, 7 3}
6	:ok	:read	{0 43, 1 4, 2 3, 3 4, 4 10, 5 7, 6 27, 7 2}
2	:invoke	:read	nil
6	:invoke	:read	nil
6	:ok	:read	{0 43, 1 4, 2 3, 3 4, 4 10, 5 7, 6 27, 7 2}
6	:invoke	:read	nil
4	:ok	:read	{0 43, 1 4, 2 3, 3 4, 4 10, 5 7, 6 27, 7 2}
4	:invoke	:read	nil
3	:ok	:read	{0 43, 1 4, 2 3, 3 4, 4 10, 5 7, 6 27, 7 2}
3	:invoke	:read	nil
8	:fail	:transfer	{:from 2, :to 6, :amount 4}	:insufficient-funds
8	:invoke	:transfer	{:from 4, :to 7, :amount 5}
0	:ok	:transfer	{:from 1, :to 6, :amount 2}
0	:invoke	:read	nil
6	:ok	:read	{0 43, 1 2, 2 3, 3 4, 4 10, 5 7, 6 29, 7 2}
6	:invoke	:read	nil
5	:fail	:transfer	{:from 1, :to 7, :amount 3}	:conflict
5	:invoke	:read	nil
1	:fail	:transfer	{:from 6, :to 2, :amount 2}	:conflict
1	:invoke	:transfer	{:from 6, :to 2, :amount 2}
0	:ok	:read	{0 43, 1 2, 2 3, 3 4, 4 10, 5 7, 6 29, 7 2}
3	:ok	:read	{0 43, 1 2, 2 3, 3 4, 4 10, 5 7, 6 29, 7 2}
0	:invoke	:transfer	{:from 2, :to 0, :amount 2}
3	:invoke	:transfer	{:from 7, :to 1, :amount 4}
5	:ok	:read	{0 43, 1 2, 2 3, 3 4, 4 10, 5 7, 6 29, 7 2}
4	:ok	:read	{0 43, 1 2, 2 3, 3 4, 4 10, 5 7, 6 29, 7 2}
5	:invoke	:read	nil
4	:invoke	:transfer	{:from 6, :to 5, :amount 3}
2	:ok	:read	{0 43, 1 4, 2 3, 3 4, 4 10, 5 7, 6 27, 7 2}
2	:invoke	:read	nil
5	:ok	:read	{0 43, 1 2, 2 3, 3 4, 4 10, 5 7, 6 29, 7 2}
6	:ok	:read	{0 43, 1 2, 2 3, 3 4, 4 10, 5 7, 6 29, 7 2}
5	:invoke	:read	nil
6	:invoke	:read	nil
5	:ok	:read	{0 43, 1 2, 2 3, 3 4, 4 10, 5 7, 6 29, 7 2}
5	:invoke	:transfer	{:from 1, :to 0, :amount 5}
6	:ok	:read	{0 43, 1 2, 2 3, 3 4, 4 10, 5 7, 6 29, 7 2}
6	:invoke	:transfer	{:from 1, :to 3, :amount 5}
9	:ok	:transfer	{:from 1, :to 3, :amount 2}
9	:invoke	:transfer	{:from 0, :to 2, :amount 2}
0	:ok	:transfer	{:from 2, :to 0, :amount 2}
0	:invoke	:transfer	{:from 3, :to 5, :amount 5}
5	:fail	:transfer	{:from 1, :to 0, :amount 5}	:insufficient-funds
5	:invoke	:read	nil
2	:ok	:read	{0 43, 1 2, 2 3, 3 4, 4 10, 5 7, 6 29, 7 2}
2	:invoke	:transfer	{:from 0, :to 2, :amount 2}
6	:fail	:transfer	{:from 1, :to 3, :amount 5}	:insufficient-funds
6	:invoke	:transfer	{:from 1, :to 2, :amount 5}
5	:ok	:read	{0 45, 1 2, 2 1, 3 6, 4 10, 5 7, 6 29, 7 2}

Here, the final read has a total of 102, whereas every earlier read in this excerpt sums to 100: 2 has been transferred from account 2 to account 0, but account 3 has also gained 2 out of nowhere. The only recent increment of account 3 comes from process 9's transfer of 2 from 1 to 3, but it doesn't appear to have applied atomically? Account 1 still reads 2 in the final read.
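The totals are easy to check mechanically. The following is a minimal sketch, not part of the Jepsen checker: it assumes the tab-separated line format shown in the excerpt, and the names `parse_read`, `bad_reads` and `EXPECTED_TOTAL` are hypothetical helpers introduced only for illustration. It sums each successful read and reports those whose total differs from 100.

```python
import re

# Assumption: the bank's total is 100, since every earlier read in the excerpt sums to 100.
EXPECTED_TOTAL = 100

# Four lines copied from the history excerpt (fields are tab-separated).
HISTORY = (
    "8\t:ok\t:read\t{0 43, 1 4, 2 3, 3 3, 4 10, 5 7, 6 27, 7 3}\n"
    "8\t:fail\t:transfer\t{:from 2, :to 6, :amount 4}\t:insufficient-funds\n"
    "6\t:ok\t:read\t{0 43, 1 2, 2 3, 3 4, 4 10, 5 7, 6 29, 7 2}\n"
    "5\t:ok\t:read\t{0 45, 1 2, 2 1, 3 6, 4 10, 5 7, 6 29, 7 2}\n"
)

# Matches only completed reads: "<process> :ok :read {<account> <balance>, ...}".
READ_RE = re.compile(r"^(\d+)\s+:ok\s+:read\s+\{(.*)\}\s*$")


def parse_read(line):
    """Return (process, {account: balance}) for an :ok :read line, else None."""
    match = READ_RE.match(line)
    if match is None:
        return None
    balances = {}
    for pair in match.group(2).split(","):
        account, balance = pair.split()
        balances[int(account)] = int(balance)
    return int(match.group(1)), balances


def bad_reads(history, expected_total=EXPECTED_TOTAL):
    """Return (process, total, balances) for every read with the wrong total."""
    found = []
    for line in history.splitlines():
        parsed = parse_read(line)
        if parsed is None:
            continue  # invocations, transfers and failures are ignored
        process, balances = parsed
        total = sum(balances.values())
        if total != expected_total:
            found.append((process, total, balances))
    return found


if __name__ == "__main__":
    for process, total, balances in bad_reads(HISTORY):
        print(f"process {process} read total {total}: {balances}")
```

On these four lines, only process 5's final read is flagged.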

These don't appear to be just read-only anomalies: they can be promoted into the database state via writes, and the total remains 102 for the remainder of the test.
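To illustrate how an anomaly can end up persisted, here is a small simulation of one hypothetical interleaving that happens to reproduce the final balances. It is a guess made for illustration, not a diagnosis of what Dgraph actually did. It assumes process 9 computed its transfer from a snapshot taken before process 0's transfer from 1 to 6 committed, and that its write to account 1 was accepted anyway instead of aborting with a conflict. `transfer_from_snapshot` is a hypothetical helper, not Jepsen or Dgraph code.

```python
def transfer_from_snapshot(snapshot, src, dst, amount):
    """Compute the writes a transfer issues, based only on its own snapshot."""
    return {src: snapshot[src] - amount, dst: snapshot[dst] + amount}


# Balances as observed by the reads shortly before process 0's first transfer completes.
db = {0: 43, 1: 4, 2: 3, 3: 4, 4: 10, 5: 7, 6: 27, 7: 2}

# Hypothetical: both transfers take their snapshot of the same state.
snapshot_p0 = dict(db)
snapshot_p9 = dict(db)

# Process 0 commits {:from 1, :to 6, :amount 2}.
db.update(transfer_from_snapshot(snapshot_p0, 1, 6, 2))

# Process 9 commits {:from 1, :to 3, :amount 2} from its stale snapshot.
# Under snapshot isolation this should conflict on account 1; here it is applied.
db.update(transfer_from_snapshot(snapshot_p9, 1, 3, 2))

# Process 0 commits {:from 2, :to 0, :amount 2} from a fresh snapshot.
db.update(transfer_from_snapshot(dict(db), 2, 0, 2))

final_total = sum(db.values())

# Any later, correctly applied transfer preserves the inflated total.
later = dict(db)
later.update(transfer_from_snapshot(dict(later), 4, 7, 5))
later_total = sum(later.values())

if __name__ == "__main__":
    print(db, final_total, later_total)
```

In this sketch the debit of account 1 is effectively applied once while two credits go through, so the extra 2 is written into the stored balances, and subsequent transfers that behave correctly carry it forward.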

You can reproduce this with Jepsen 0ef6e711dfb07aad4afc84f7f9c3348961afa9d7 by running

lein run test --package-url https://transfer.sh/Z5CTJ/dgraph-linux-amd64.tar.gz --time-limit 300 --concurrency 2n --nemesis kill-alpha,fix-alpha,kill-zero --test-count 20 --workload bank --upsert-schema

Labels: kind/bug