# ZooKeeper

- [Home](https://zookeeper.apache.org/)
  - ZooKeeper Programmer's Guide
  - ZooKeeper Internals
  - ZooKeeper-cli: the ZooKeeper command line interface
- [Builtin ACL Schemes](https://zookeeper.apache.org/doc/r3.7.2/zookeeperProgrammers.html#sc_BuiltinACLSchemes)

Code: `D:\workspace\rtfsc\zookeeper`

# Introduction

1. ZooKeeper: Because Coordinating Distributed Systems is a Zoo

ZooKeeper is a high-performance **coordination service for distributed applications**. 

It exposes common services, such as naming, configuration management, synchronization, and group services, through a simple interface so you don't have to write them from scratch.

You can use it off-the-shelf to implement *consensus*, *group management*, *leader election*, and *presence* **protocols**. And you can build on it for your own, specific needs.

2. Assumption

Our protocol assumes that we can construct **point-to-point FIFO channels between the servers**. While similar services usually assume message delivery that can lose or reorder messages, our assumption of FIFO channels is very practical given that we **use TCP for communication**. Specifically we rely on the following property of TCP:

- **Ordered delivery**: data is delivered in the same order it is sent, and a message `m` is delivered only after all messages sent before `m` have been delivered. (The corollary is that if message `m` is lost, all messages after `m` will be lost.)
- **No message after close**: once a FIFO channel is closed, no messages will be received from it.

# Conventions


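- `myid`: server id, unique within the ensemble.
- `all_server_count`: number of (voting) servers in the ensemble.
- `zxid`: ZooKeeper transaction id, a 64-bit number made of two 32-bit parts `(epoch, counter)`, with the epoch in the high 32 bits. zxids reflect the total ordering of all changes.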

Each time a new leader comes into power it will have its own `epoch` number.
We have a simple algorithm to assign a unique `zxid` to a proposal: the leader simply increments the `zxid` to obtain a unique zxid for each proposal.
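A minimal sketch of the zxid layout and the "just increment it" rule, assuming the 32/32 split described above. The helper names (`make_zxid`, `split_zxid`, `next_proposal_zxid`, `new_epoch_zxid`) are hypothetical and not ZooKeeper's API:

```python
MASK32 = 0xFFFFFFFF

def make_zxid(epoch: int, counter: int) -> int:
    """Pack (epoch, counter) into one 64-bit zxid: epoch in the high 32 bits."""
    return ((epoch & MASK32) << 32) | (counter & MASK32)

def split_zxid(zxid: int) -> tuple[int, int]:
    """Unpack a zxid back into (epoch, counter)."""
    return (zxid >> 32) & MASK32, zxid & MASK32

def next_proposal_zxid(zxid: int) -> int:
    """Leader side: each new proposal simply increments the zxid (the counter part)."""
    _, counter = split_zxid(zxid)
    if counter == MASK32:
        # this sketch refuses to let the counter spill over into the epoch bits
        raise OverflowError("zxid counter exhausted for this epoch")
    return zxid + 1

def new_epoch_zxid(zxid: int) -> int:
    """A newly elected leader starts counting from (epoch + 1, 0)."""
    epoch, _ = split_zxid(zxid)
    return make_zxid(epoch + 1, 0)
```

For example, `split_zxid(new_epoch_zxid(make_zxid(3, 41)))` gives `(4, 0)`. Any zxid of epoch 2 compares greater than every zxid of epoch 1, so plain integer comparison of zxids yields the total order.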

- server state: `LOOKING`, `FOLLOWING`, `LEADING`, `OBSERVING`
- proposal

```
// leader election(LE) proposal
<proposal>=(
epoch,
current_server_state,
self_myid,    // my knowledge
self_max_zxid,
vote_myid,    // voting
vote_max_zxid
)

// LE proposal bookkeeping
<current_epoch>
<bookkeeping>=
[
  (voter_myid, candidate_myid)
]
```


# Leader Activation

## FastLeaderElection

> Actions performed by the servers.

(1) initial/preferred LE proposal: each server votes for itself.

```
<self_proposal>=(epoch, LOOKING, myid, max_zxid, myid, max_zxid)
broadcast <self_proposal>
```

(2) on receiving a `<proposal>`, update the proposal bookkeeping

```
if <proposal>.epoch < <current_epoch>: // 1. old proposal
  drop <proposal>
else if <proposal>.epoch > <current_epoch>: // 2. new proposal
  <current_epoch> = <proposal>.epoch
  clear <bookkeeping>
  <self_proposal> votes for the better of <proposal>'s vote and my own initial vote
    (larger vote_max_zxid wins, ties go to the larger vote_myid)
  add <proposal> to <bookkeeping>
  resend <self_proposal>
else: // 3. current epoch proposal
  add/update <proposal> in <bookkeeping> // every current-epoch vote is recorded
  if <proposal>.vote_max_zxid < <self_proposal>.vote_max_zxid: // 3.1 know less
    keep my voting
  else if <proposal>.vote_max_zxid > <self_proposal>.vote_max_zxid: // 3.2 know more
    <self_proposal>.vote_myid = <proposal>.vote_myid // update my voting
    <self_proposal>.vote_max_zxid = <proposal>.vote_max_zxid
    resend <self_proposal>
  else: // 3.3 know same: the larger myid wins
    if <proposal>.vote_myid > <self_proposal>.vote_myid:
      <self_proposal>.vote_myid = <proposal>.vote_myid
      resend <self_proposal>
    // otherwise (smaller or same myid): keep my voting
```
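Steps (1) and (2) can be sketched as a single-process Python model. This is not ZooKeeper's `FastLeaderElection` class. It makes several assumptions: messages are delivered instantly through an in-memory `outbox`; `Proposal`, `Server`, `better` and `run_election` are hypothetical names; and votes are compared only by `(vote_max_zxid, vote_myid)`, whereas the real implementation also compares the peer epoch first.

```python
from dataclasses import dataclass, field

@dataclass
class Proposal:
    epoch: int           # election epoch of the sender
    state: str           # LOOKING / FOLLOWING / LEADING / OBSERVING
    self_myid: int
    self_max_zxid: int
    vote_myid: int
    vote_max_zxid: int

def better(a_zxid: int, a_myid: int, b_zxid: int, b_myid: int) -> bool:
    """True if vote a beats vote b: larger zxid wins, ties go to the larger myid."""
    return (a_zxid, a_myid) > (b_zxid, b_myid)

@dataclass
class Server:
    myid: int
    max_zxid: int
    current_epoch: int = 1
    vote_myid: int = 0
    vote_max_zxid: int = 0
    bookkeeping: dict = field(default_factory=dict)  # voter_myid -> candidate_myid
    outbox: list = field(default_factory=list)       # proposals waiting to be broadcast

    def __post_init__(self):
        self.vote_myid, self.vote_max_zxid = self.myid, self.max_zxid  # (1) vote for myself
        self.broadcast()

    def broadcast(self):
        self.bookkeeping[self.myid] = self.vote_myid  # my own vote is counted as well
        self.outbox.append(Proposal(self.current_epoch, "LOOKING", self.myid,
                                    self.max_zxid, self.vote_myid, self.vote_max_zxid))

    def receive(self, p: Proposal):
        if p.epoch < self.current_epoch:                 # 1. old proposal: drop it
            return
        if p.epoch > self.current_epoch:                 # 2. newer round: start over
            self.current_epoch = p.epoch
            self.bookkeeping.clear()
            if better(p.vote_max_zxid, p.vote_myid, self.max_zxid, self.myid):
                self.vote_myid, self.vote_max_zxid = p.vote_myid, p.vote_max_zxid
            else:
                self.vote_myid, self.vote_max_zxid = self.myid, self.max_zxid
            self.broadcast()
        elif better(p.vote_max_zxid, p.vote_myid, self.vote_max_zxid, self.vote_myid):
            # 3.2 / 3.3: the sender knows more, or ties on zxid with a larger myid
            self.vote_myid, self.vote_max_zxid = p.vote_myid, p.vote_max_zxid
            self.broadcast()
        # record the sender's vote in every non-dropped case (in 3.1 my vote is unchanged)
        self.bookkeeping[p.self_myid] = p.vote_myid

def run_election(specs):
    """specs: list of (myid, max_zxid). Deliver every broadcast to every other
    server until nobody has anything new to send; return the servers by myid."""
    servers = {myid: Server(myid, zxid) for myid, zxid in specs}
    busy = True
    while busy:
        busy = False
        for sender in servers.values():
            while sender.outbox:
                msg = sender.outbox.pop(0)
                for receiver in servers.values():
                    if receiver is not sender:
                        receiver.receive(msg)
                busy = True
    return servers
```

With `run_election([(1, 5), (2, 7), (3, 7)])`, every server ends up voting for server 3. Servers 2 and 3 tie on zxid 7, so the larger myid wins. A vote only ever moves to a "better" candidate, so the exchange terminates.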

(3) determine server state

```
SELECT <proposal>.vote_myid, COUNT(<proposal>.vote_myid) AS CANDIDATE_COUNT
FROM <bookkeeping>
GROUP BY <proposal>.vote_myid

if found any CANDIDATE_COUNT > (all_server_count / 2):
  // believe the state server (<proposal>.vote_myid) = LEADING
  if myid = <proposal>.vote_myid:
    my state = LEADING
    sync with follower
    <current_epoch> = <current_epoch> + 1 // (epoch+1, 0)
    send NEW_LEADER proposal
    keep eye on HEARTBEATs of followers
  else:
    my state = FOLLOWING
    prepare to send HEARTBEAT to leader
```
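The quorum check in step (3) can be sketched in Python, assuming `bookkeeping` is a dict from `voter_myid` to `candidate_myid`. `decide` is a hypothetical helper. ZooKeeper itself additionally counts only votes equal to the server's own current vote, and it waits briefly for late, better votes before deciding.

```python
from collections import Counter

def decide(myid, bookkeeping, all_server_count):
    """Return (state, leader_myid) once some candidate holds a strict majority,
    or None if the election must go on."""
    counts = Counter(bookkeeping.values())          # candidate_myid -> CANDIDATE_COUNT
    for candidate, n in counts.items():
        if n > all_server_count // 2:               # same as n > all_server_count / 2
            return ("LEADING" if candidate == myid else "FOLLOWING", candidate)
    return None
```

Note that with 4 servers, 2 votes are not enough: a strict majority means 3.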

## Fail-over

- Follower restart

```
 this follower
<self_proposal>=(epoch, LOOKING, myid, max_zxid, myid, max_zxid)
boradcast <self_proposal>

 leader
<proposal>=(epoch, LEADING, myid, max_zxid, myid, max_zxid)

 other followers
<proposal>=(epoch, FOLLOWING, myid, max_zxid, leader.myid, leader.max_zxid)
```

```
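// this follower: once a majority of the replies agree on the same leader,
// and the leader itself has replied as LEADING
my state = FOLLOWING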
```
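A sketch of how the restarted follower could decide to join the existing leader from those replies. The reply format `(sender_myid, sender_state, vote_myid)` and the name `join_existing_leader` are assumptions made for illustration:

```python
from collections import Counter

def join_existing_leader(replies, all_server_count):
    """replies: (sender_myid, sender_state, vote_myid) tuples received while LOOKING.
    Return the leader myid to follow, or None to keep looking."""
    support = Counter()
    leader_confirmed = set()
    for sender, state, vote in replies:
        if state in ("FOLLOWING", "LEADING"):        # only servers already out of the election
            support[vote] += 1
            if state == "LEADING" and sender == vote:
                leader_confirmed.add(vote)           # the leader speaks for itself
    for leader, n in support.items():
        # a majority must back the leader AND the leader must claim to be LEADING
        if n > all_server_count // 2 and leader in leader_confirmed:
            return leader
    return None
```

Requiring the leader's own LEADING reply avoids following a server that a stale majority still remembers but that is no longer leading.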

- Leader restart

```
// followers
if the leader is found down (HEARTBEATs missing):
  trigger FastLeaderElection

// old leader back online
<self_proposal>=(epoch, LOOKING, myid, max_zxid, myid, max_zxid)
branches:
  - a new leader has already been elected: join it as a follower
  - leader election is still in progress: take part in it
```
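The follower-side "leader is down" check can be sketched as a timeout-based failure detector. `LeaderMonitor`, `follower_tick` and the `timeout` value are hypothetical. In ZooKeeper the comparable bound is derived from the `tickTime` and `syncLimit` settings. Time is passed in explicitly so the logic is easy to test.

```python
class LeaderMonitor:
    """Consider the leader down if nothing was heard from it for more than `timeout` seconds."""

    def __init__(self, timeout: float, now: float):
        self.timeout = timeout
        self.last_heard = now

    def on_heartbeat(self, now: float) -> None:
        self.last_heard = now

    def leader_down(self, now: float) -> bool:
        return now - self.last_heard > self.timeout

def follower_tick(state: str, monitor: LeaderMonitor, now: float) -> str:
    """Return the follower's next state: go back to LOOKING (i.e. start
    FastLeaderElection) once the leader is considered down."""
    if state == "FOLLOWING" and monitor.leader_down(now):
        return "LOOKING"
    return state
```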


# Active Messaging

ZooKeeper messaging operates similarly to a classic two-phase commit.

All communication channels are FIFO, so everything is done in order. Specifically the following operating constraints are observed:

- (1) The leader sends proposals to all followers using the same order. Moreover, this order follows the order in which requests have been received. Because we use FIFO channels this means that followers also receive proposals in order.
- (2) Followers process messages in the order they are received. This means that messages will be `ACK`ed in order and the leader will receive `ACK`s from followers in order, due to the FIFO channels. It also means that if message `m` has been *written to non-volatile storage*, all messages that were proposed before m have been written to non-volatile storage.
- (3) The leader will issue a `COMMIT` to all followers as soon as a quorum of followers have `ACK`ed a message. Since messages are `ACK`ed in order, the leader sends `COMMIT`s in order, and the followers receive them in order.
- (4) `COMMIT`s are processed in order. Followers deliver a proposal message when that proposal is committed.
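Constraints (3) and (4) can be illustrated with a hypothetical leader-side sketch that counts ACKs per zxid and issues COMMITs strictly in proposal order. The following are assumptions, not ZooKeeper's code: `LeaderLog` is an invented name, zxids are plain increasing integers (the epoch part is omitted), and the leader's own ACK counts toward the quorum.

```python
class LeaderLog:
    """Leader-side bookkeeping for the ordered two-phase commit."""

    def __init__(self, leader_id, follower_ids):
        self.leader_id = leader_id
        self.quorum = (len(follower_ids) + 1) // 2 + 1   # strict majority of the ensemble
        self.acks = {}        # zxid -> set of server ids that ACKed it
        self.pending = []     # proposed but not yet committed, in proposal order
        self.committed = []   # zxids for which COMMIT was issued, in order
        self.next_zxid = 1

    def propose(self):
        """Phase 1: assign the next zxid (sending PROPOSAL to followers is not modeled)."""
        zxid = self.next_zxid
        self.next_zxid += 1
        self.acks[zxid] = {self.leader_id}   # assumption: the leader ACKs its own proposal
        self.pending.append(zxid)
        return zxid

    def on_ack(self, server_id, zxid):
        """Phase 2: COMMIT once a quorum has ACKed, but never skip an earlier proposal."""
        self.acks[zxid].add(server_id)
        while self.pending and len(self.acks[self.pending[0]]) >= self.quorum:
            self.committed.append(self.pending.pop(0))
```

With FIFO channels a later proposal should not reach a quorum before an earlier one. The `while` loop just makes the in-order rule explicit: a later zxid's COMMIT waits for all earlier ones.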

# Application: Distributed Lock

- znode: Persistent/Ephemeral, Sequence/Non-sequence
- Watch: attach a one-time watch on a read operation; when the watched node is updated, a notification is sent to the watcher.
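These two features combine into the usual lock recipe. Each client creates an ephemeral sequential child under a lock node. The lowest sequence number holds the lock, and every other client watches only its immediate predecessor. Below is an in-memory simulation, not a ZooKeeper client. `FakeLockDir` and `LockClient` are hypothetical names, and only sequence numbering and one-time delete watches are mimicked.

```python
class FakeLockDir:
    """Stand-in for a lock znode with EPHEMERAL_SEQUENTIAL children."""

    def __init__(self):
        self.seq = 0
        self.children = {}   # child name -> owning session
        self.watches = {}    # child name -> callback, fired once when that child is deleted

    def create_sequential(self, owner):
        name = f"lock-{self.seq:010d}"   # 10-digit, zero-padded sequence suffix
        self.seq += 1
        self.children[name] = owner
        return name

    def delete(self, name):
        # explicit delete, or session expiry removing an ephemeral node
        del self.children[name]
        callback = self.watches.pop(name, None)   # one-time watch: removed when fired
        if callback:
            callback()

class LockClient:
    def __init__(self, lock_dir, session):
        self.dir, self.session = lock_dir, session
        self.node = None
        self.acquired = False

    def lock(self):
        self.node = self.dir.create_sequential(self.session)
        self._check()

    def _check(self):
        ordered = sorted(self.dir.children)
        idx = ordered.index(self.node)
        if idx == 0:
            self.acquired = True   # lowest sequence number holds the lock
        else:
            # watch only the predecessor, so one release wakes one waiter (no herd effect)
            self.dir.watches[ordered[idx - 1]] = self._check

    def unlock(self):
        self.dir.delete(self.node)
        self.acquired = False
```

Because the nodes are ephemeral, a client that crashes releases the lock when its session expires, and the next waiter's watch fires just as it would on an explicit unlock.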

# More

UI:
- [ZooKeeper Assistant](https://www.redisant.com/za): ZooKeeper Desktop GUI; the free version can only connect to a local server.
- [PrettyZoo](https://github.com/vran-dev/PrettyZoo): archived; comes with a zkCli.sh-style terminal.
- [ZooNavigator](https://github.com/elkozmon/zoonavigator): Web-based ZooKeeper UI / editor / browser. ZooKeeper versions 3.4.x and 3.5.x are currently supported.

- Zookeeper in Depth (1): Zookeeper Architecture and the FastLeaderElection Mechanism: http://www.jasongj.com/zookeeper/fastleaderelection/
- Zookeeper in Depth (2): Distributed Locks and Leader Election Based on Zookeeper: http://www.jasongj.com/zookeeper/distributedlock/