Document that locks aren't really locks #11457

aphyr · 2019-12-16T22:36:05Z

As a part of our Jepsen testing, we've demonstrated that etcd's locks aren't actually locks: like all "distributed locks" they cannot guarantee mutual exclusion when processes are allowed to run slow or fast, or crash, or messages are delayed, or their clocks are unstable, etc. For example, this workload uses etcd's locks to protect read-modify-write updates to perfect
shared state. Here, the shared state is stored in memory, so we don't have to
worry about latency or failures, but realistically, we'd be sharing state in
something like a filesystem, object store, third-party database, etc.

https://github.com/jepsen-io/etcd/blob/c4787f4e71495584c276e998107ee811160dcea7/src/jepsen/etcd/lock.clj#L150-L163

This is directly adapted from the etcd 3.2 announcement, which demonstrates
using locks to increment a file on disk: https://coreos.com/blog/etcd-3.2-announcement:

Here’s a simple example that concurrently increments a file f in the shell;
the locking ensures everything adds up:

$ echo 0 >f
$ for `seq 1 100`; do
   ETCDCTL_API=3 etcdctl lock mylock -- bash -c 'expr 1 + $(cat f) > f' &
   pids="$pids $!"
done
$ wait $pids
$ cat f

Instead of updating a file on disk, our workload adds unique integers to a set
by reading a mutable variable, waiting some random amount of time from 0 to 2
seconds, and setting the variable to the value that was read, plus the given
integer.

We use the same lock acquisition and release strategy as described before: we
grant a lease with a 2-second TTL, keep it alive indefinitely using a watchdog
thread, then acquire a lock with that lease:

https://github.com/jepsen-io/etcd/blob/c4787f4e71495584c276e998107ee811160dcea7/src/jepsen/etcd/lock.clj#L33-L37

When we partition away leader nodes every 10 seconds or so, this workload
exhibits both lost updates and stale reads (due to the in-memory state
"flickering" as competing lock holders overwrite each other). This 60-second
test lost 10/42 successfully completed writes:

 :stats
  {:valid? true,
   :count 103,
   :ok-count 83,
   :fail-count 20,
   :info-count 0,
   :by-f
   {:add
    {:valid? true,
     :count 62,
     :ok-count 42,
     :fail-count 20,
     :info-count 0},
    :read
    {:valid? true,
     :count 41,
     :ok-count 41,
     :fail-count 0,
     :info-count 0}}},
  :exceptions {:valid? true},
  :workload
  {:set
   {:worst-stale
    ({:element 50,
      :outcome :stable,
      :stable-latency 886,
      :lost-latency nil,
      :known
      {:type :ok,
       :f :add,
       :value 50,
       :process 0,
       :time 51688906389,
       :error [:not-found "etcdserver: requested lease not found"],
       :index 191},
      :last-absent
      {:type :invoke,
       :f :read,
       :process 2,
       :time 52575661586,
       :index 200}}),
    :duplicated-count 0,
    :valid? false,
    :lost-count 10,
    :lost (21 22 30 32 33 41 42 51 52 53),
    :stable-count 31,
    :stale-count 1,
    :stale (50),
    :never-read-count 21,
    :stable-latencies {0 0, 0.5 0, 0.95 0, 0.99 886, 1 886},
    :attempt-count 62,
    :lost-latencies {0 0, 0.5 0, 0.95 119, 0.99 119, 1 119},
    :never-read
    (8 9 10 11 18 19 20 26 27 28 29 38 39 40 48 49 57 58 59 60 61),
    :duplicated {}},
   :timeline {:valid? true},
   :valid? false},

This problem was exacerbated by #11456, but fundamentally cannot be fixed. Users cannot use etcd as a naive locking system: they must carefully couple a fencing token (e.g. the etcd lock key revision number) to any systems they interact with in order to preserve exclusion boundaries.

etcd could remove locks altogether, but I don't think that's strictly necessary: it's still useful to have something which is mostly a lock. For example, users could use locks to ensure that most of the time, one node, rather than dozens, is performing a specific computation. Instead, I'd like to suggest changing the documentation to make these risks, and the correct use of locks, explicit. In particular, I think these pages could be revised:

https://coreos.com/blog/etcd-3.2-announcement

#10096

https://github.com/etcd-io/etcd/blob/master/Documentation/dev-guide/api_concurrency_reference_v3.md

https://github.com/etcd-io/etcd/tree/master/etcdctl#concurrency-commands

The text was updated successfully, but these errors were encountered:

maoling · 2019-12-17T02:41:07Z

@aphyr
fpj also had a great post of this related topic of zookeeper about Note on fencing and distributed locks

kien-truong · 2020-01-14T08:28:23Z

If you use lock to start a long running process, this child process will not even be terminated even if the lẹase is lost. As a result, etcd lock is only usable for short tasks.

mitake mentioned this issue Jan 27, 2020

RFC Documentation: enhance description of lock and lease #11490

Merged

2 tasks

xiang90 closed this as completed in #11490 Mar 5, 2020

michaljurecko mentioned this issue Feb 10, 2024

feat: Add AtomicOp.RequireLock keboola/keboola-as-code#1574

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Document that locks aren't really locks #11457

Document that locks aren't really locks #11457

aphyr commented Dec 16, 2019 •

edited

Loading

maoling commented Dec 17, 2019

kien-truong commented Jan 14, 2020

Document that locks aren't really locks #11457

Document that locks aren't really locks #11457

Comments

aphyr commented Dec 16, 2019 • edited Loading

maoling commented Dec 17, 2019

kien-truong commented Jan 14, 2020

aphyr commented Dec 16, 2019 •

edited

Loading