[NEW] Improve "SENTINEL FAILOVER" by using the "FAILOVER" command

**The problem/use-case that the feature addresses**

The "SENTINEL FAILOVER" command offers no way to fail over in a coordinated manner,
e.g. for maintenance. It simply assumes that the current master is no longer reachable,
which can easily lead to situations with multiple masters.

Introduce a "coordinated" variant of the "SENTINEL FAILOVER" command that actually
takes the current master into account during the failover.

**Description of the feature**

Since version 6.2, Redis supports the "FAILOVER" command to
switch master and replica roles in a coordinated fashion. However, this command
cannot be used in a Sentinel setup, because Sentinel will usually fail over again.

Additionally, the semantics of "FAILOVER" differ from the semantics defined
in the Sentinel client protocol. In the latter, clients are disconnected
when nodes switch roles, whereas "FAILOVER" only disconnects connections with blocking commands.
(Interestingly, some clients, e.g. redis-py, disconnect on their own when they find
that a connection has turned read-only.)

We could try to make "FAILOVER" work nevertheless: Below is a proposal for a 
`SENTINEL FAILOVER <master> COORDINATED` command that uses "FAILOVER" 
in a modified forced failover procedure. 

**Alternatives you've considered**

Adapt the "FAILOVER" command for this use case, e.g. by killing client connections
after the failover. (But how do we keep alive the connection that is used to control
the failover?)

**Additional information**

A proof of concept implementation is at https://github.com/gmbnomis/redis/pull/1.

We can keep the current failover state machine with the following
changes for a coordinated failover:

**SENTINEL_FAILOVER_STATE_NONE**

No change required

**SENTINEL_FAILOVER_STATE_WAIT_START**

No change required

**SENTINEL_FAILOVER_STATE_SELECT_SLAVE**

No change required

**SENTINEL_FAILOVER_STATE_SEND_SLAVEOF_NOONE**

Instead of using "SLAVEOF NOONE", send (to the current master):

```
MULTI
CLIENT PAUSE <timeout> WRITE
FAILOVER TO <promoted replica host> <promoted replica port> TIMEOUT <timeout>
EXEC
```

and progress to `SENTINEL_FAILOVER_STATE_WAIT_PROMOTION`.
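The transaction above could be assembled as in the following minimal sketch. The helper name `coordinated_failover_commands` and its parameters are hypothetical and do not come from the PoC. It assumes a single timeout value in milliseconds is used for both CLIENT PAUSE and FAILOVER, as the shared `<timeout>` placeholder suggests.

```python
def coordinated_failover_commands(replica_host: str, replica_port: int,
                                  timeout_ms: int) -> list[list[str]]:
    """Build the command sequence sent to the current master.

    Hypothetical helper for illustration only. Each inner list is one
    command with its arguments, in the order they would be sent.
    """
    if timeout_ms <= 0:
        raise ValueError("timeout_ms must be positive")
    timeout = str(timeout_ms)
    return [
        ["MULTI"],
        # Pause writing clients first so that they stay paused after FAILOVER.
        ["CLIENT", "PAUSE", timeout, "WRITE"],
        ["FAILOVER", "TO", replica_host, str(replica_port), "TIMEOUT", timeout],
        ["EXEC"],
    ]
```

Returning the commands as data rather than sending them keeps the sketch independent of any particular client library or of Sentinel's internal connection handling.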

Rationale for the CLIENT PAUSE:

As said above, the Sentinel client protocol requires clients to be
disconnected when the master role changes to replica.

FAILOVER does not do that. It just pauses writing clients and unpauses them
once it gets the acknowledgement for the PSYNC FAILOVER.

Fortunately, a CLIENT PAUSE takes precedence over the client pause that is part
of FAILOVER. When the FAILOVER succeeds (in time), the clients remain paused and we 
can disconnect them in the next state.

**NB:** The CLIENT PAUSE is tricky, because we must avoid sending any write commands
ourselves in order not to become paused as well. INFO and PING are fine, but the PUBLISH
for the sentinel hello blocks. Thus, we must not send those while clients are paused.\
Additionally, other sentinels will begin to regard the master as not responding
once they issue a PUBLISH. That's why the timeout in the PoC implementation is
set to `down_after_period` instead of using the failover timeout. (Probably, we
need further adjustment here. Or do we even need a mechanism to get us elected
as the leader for this epoch?)
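The restriction on what may be sent during the pause can be pictured as a small guard. This is only a sketch: `SAFE_WHILE_PAUSED` and `may_send` are hypothetical names, and the safe set contains only the two commands named above. Anything else, including PUBLISH, is conservatively held back until the pause deadline has passed.

```python
# Commands the text above names as unaffected by CLIENT PAUSE ... WRITE.
SAFE_WHILE_PAUSED = frozenset({"INFO", "PING"})


def may_send(command: str, now_ms: int, pause_deadline_ms: int) -> bool:
    """Return True if `command` can be sent to the paused master right now.

    Hypothetical guard for illustration. While the pause is active, only
    commands in SAFE_WHILE_PAUSED are allowed; afterwards everything is.
    """
    pause_active = now_ms < pause_deadline_ms
    if not pause_active:
        return True
    return command.upper() in SAFE_WHILE_PAUSED
```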

**SENTINEL_FAILOVER_STATE_WAIT_PROMOTION**

No need to change the waiting part: Wait for the master role switch in `sentinelRefreshInstanceInfo`.

However, when we detect the change, we need to take care of the clients now. Send:

```
MULTI
CONFIG REWRITE
CLIENT KILL TYPE normal
CLIENT KILL TYPE pubsub
CLIENT UNPAUSE
EXEC
```

Send this to _both_ the former master (by now already a replica) and the new master,
before we call the client reconfiguration script (`client-reconfig-script`).
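As a sketch of this step, the same cleanup transaction is planned for both nodes. The names `CLEANUP_TRANSACTION` and `cleanup_plan` are hypothetical, and representing nodes as `(host, port)` tuples is an assumption made for illustration.

```python
# The cleanup transaction from the proposal, one tuple per command.
CLEANUP_TRANSACTION = (
    ("MULTI",),
    ("CONFIG", "REWRITE"),
    ("CLIENT", "KILL", "TYPE", "normal"),
    ("CLIENT", "KILL", "TYPE", "pubsub"),
    ("CLIENT", "UNPAUSE"),
    ("EXEC",),
)


def cleanup_plan(former_master: tuple[str, int], new_master: tuple[str, int]):
    """Pair each node with the cleanup transaction it should receive.

    Hypothetical helper: both the former master and the new master get
    the identical command sequence.
    """
    return [
        (former_master, CLEANUP_TRANSACTION),
        (new_master, CLEANUP_TRANSACTION),
    ]
```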

If no switch happens before the failover_timeout is reached, the sentinel failover
will be aborted. The FAILOVER command should already have timed out by this time.

If the sentinel crashes before we can issue the commands above, clients will remain
connected and become unpaused at some point in time. However, once the remaining sentinels
notice that the former master is not a master anymore, they will initiate a
failover to a replica. This will disconnect the clients from the former master.

**SENTINEL_FAILOVER_STATE_RECONF_SLAVES**

No change. (We can continue to ignore the former master, as it became a replica
of the new master through FAILOVER. In contrast to the current "SENTINEL FAILOVER", we
don't need to reconfigure this node later.)

**SENTINEL_FAILOVER_STATE_UPDATE_CONFIG**

No change


