Closed
Labels
area/testing/jepsen, kind/bug, status/accepted
Description
What version of Dgraph are you using?
v1.1.1-48-g157896305
Have you tried reproducing the issue with the latest release?
Yep! This is the current dev release.
What is the hardware spec (RAM, OS)?
A local Docker cluster, 48-way Xeon, 128 GB ECC.
Steps to reproduce the issue (command/config used to run Dgraph).
With Jepsen 3bff032adf3a4277e5cbbc2cd05ecec90c69f61e, try:
lein run test --local-binary dgraph-v1.1.1-48-g157896305 --concurrency 2n --nemesis move-tablet --time-limit 300 -w uid-set
This one may take a few runs; I don't have a good estimate of frequency yet.
Expected behaviour and actual result.
The UID set test appears to lose massive, contiguous regions of writes when tablet moves are allowed to occur. For instance, in this test run, everything looks fine until write 11350. From there through write 23715, 11544 acknowledged writes are missing from the final read; writes from 23761 onwards look OK again. Roughly 52% of acknowledged writes (11544 of 22187) were lost.
:stats {:valid? true,
:count 23879,
:ok-count 22188,
:fail-count 1491,
:info-count 200,
:by-f {:add {:valid? true,
:count 23878,
:ok-count 22187,
:fail-count 1491,
:info-count 200},
:read {:valid? true,
:count 1,
:ok-count 1,
:fail-count 0,
:info-count 0}}},
:workload {:ok-count 10652,
:valid? false,
:lost-count 11544,
:lost "#{11350..11958 12032..12095 12097 12099..12102 12106..12107 12112..12113 12115..12403 12481..13288 13291..13295 13298..13300 13303..13306 13311..13317 13370..13983 14074..14518 14520..14523 14525 14527..14530 14532 14534 14536..14538 14540 14543 14545..15490 15557..15762 15764..15772 15775..15776 15778..15779 15781 15783 15785..15791 15793..15795 15797 15799..16821 16823..16824 16902..16995 16998..16999 17001 17004 17006 17008..17011 17013..17016 17018..17021 17023..18171 18173..18182 18185 18187..18189 18192..18194 18197..18200 18203..19394 19396..19404 19406..19408 19412..19416 19420..19423 19425..19440 19442..19506 19562 19564..20189 20291..20481 20483..20489 20491 20493..20494 20498 20500 20502..20507 20509..20514 20516..21110 21179..21726 21728..21738 21742..21745 21747 21750..21759 21761..21762 21764..21765 21767..22653 22714..22763 22765..22766 22768..22779 22783..22787 22789..22791 22793..22798 22800..22808 22810..22813 22815..23713 23715}",
:acknowledged-count 22187,
:recovered "#{973 2250 3566 3568 4700 5968 7210 10952 10974}",
:ok "#{0..379 435..437 439..973 983..1100 1155..2247 2249..2250 2255..2256 2259 2261..2264 2266..2805 2860..3074 3125..3261 3316..3562 3564..3566 3568..3569 3574 3577..4691 4693..4697 4699..4700 4706..4710 4712..4715 4717..5772 5830..5958 5960 5962 5964 5966 5968..5972 5974..5975 5977 5979..5983 5985..6265 6330..6656 6716..7221 7224 7227..7228 7230..7237 7240 7242..7243 7245..7310 7357 7360..8135 8199..8438 8440..8443 8449..8450 8452..8454 8457..8463 8465..9351 9408..9409 9413..9721 9723..9731 9733..9734 9737..9738 9740 9743..9746 9750..10164 10211 10214..10946 10948..10959 10966..10970 10972..11281 23761..23877}",
:attempt-count 23878,
:unexpected "#{}",
:unexpected-count 0,
:recovered-count 9},
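
For anyone double-checking the numbers, here is a rough Python sketch of how the :workload fields appear to relate to each other. It is not Jepsen's actual checker code: `parse_int_set` and `classify` are hypothetical helpers, and the classification rules reflect my understanding of a set checker (lost = acknowledged but absent from the final read, recovered = present in the final read but never acknowledged). The counts at the bottom are copied from the map above.

```python
import re


def parse_int_set(text):
    """Expand a compact integer set such as "#{1..3 7}" into a Python set."""
    body = text.strip()
    if body.startswith("#{") and body.endswith("}"):
        body = body[2:-1]
    result = set()
    for token in body.split():
        match = re.fullmatch(r"(\d+)\.\.(\d+)", token)
        if match:
            # "a..b" is an inclusive range.
            low, high = int(match.group(1)), int(match.group(2))
            result.update(range(low, high + 1))
        else:
            result.add(int(token))
    return result


def classify(attempted, acknowledged, final_read):
    """Assumed set-checker rules; hypothetical, not taken from Jepsen's source."""
    ok = final_read & attempted
    return {
        "ok": ok,                              # attempted and present in the read
        "lost": acknowledged - final_read,     # acknowledged but missing
        "recovered": ok - acknowledged,        # present, but never acknowledged
        "unexpected": final_read - attempted,  # present, but never attempted
    }


# Counts copied from the :workload map in this report.
ok_count = 10652
lost_count = 11544
recovered_count = 9
acknowledged_count = 22187

# Under the rules above, every acknowledged write is either read back or lost.
assert ok_count - recovered_count + lost_count == acknowledged_count

loss_fraction = lost_count / acknowledged_count
print(f"{loss_fraction:.1%} of acknowledged writes lost")
```

Under these assumptions the counts are self-consistent: the 9 recovered writes account for the difference between :ok-count plus :lost-count and :acknowledged-count.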