When restarting a node, allow the client to restore their state #580

pgrange · 2022-10-25T12:09:50Z

Why

With the current implementation of the backup and restore feature, a restarted hydra node successfully restores its own state but does not re-emit to clients the events that happened before the restart.

This is a problem for clients that cannot restore their own state when they are restarted themselves.

What

When a new client connects to the node, it receives all the events that happened for this node in this head, whether they happened before or after a potential restart of the node.

Along with the events, we provide a way for the client to figure out whether it has already seen a given event, so that the client can handle replayed events idempotently. For instance, the client may have already seen an event, performed some side effect, and not want to perform that side effect twice. Or the client may maintain its own persisted state and need to distinguish events already reflected in that state from events that are new from its point of view.
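A minimal sketch of what idempotent handling could look like on the client side, assuming each event carries an increasing sequence number. The `seq` and `tag` field names and the `Client` class are hypothetical, chosen for illustration only; they are not the node's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    """Hypothetical client that applies each event's side effect at most once."""

    last_seen: int = -1  # highest sequence number already processed
    applied: list = field(default_factory=list)

    def handle(self, event: dict) -> bool:
        """Apply the event if it is new; return True if it was applied."""
        if event["seq"] <= self.last_seen:
            return False  # replayed event: skip the side effect
        self.applied.append(event["tag"])  # stand-in for a real side effect
        self.last_seen = event["seq"]
        return True


# Events seen before the node restarted (assumed shape).
history = [{"seq": 0, "tag": "HeadIsOpen"}, {"seq": 1, "tag": "TxValid"}]
client = Client()
for e in history:
    client.handle(e)

# After the restart, the node replays the full history plus one new event.
replayed = history + [{"seq": 2, "tag": "SnapshotConfirmed"}]
results = [client.handle(e) for e in replayed]
```

A client that is itself restarted would need to persist `last_seen` alongside its own state for this check to survive its restart.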

How

Persist & replay server outputs, such that
- The network or the chain shall not see any effect of the hydra node replaying the events.
- If several clients connect to the same hydra node, each of them receives the same events.
- After restart, clients can determine that no peers are connected anymore.
Associate some sort of monotonically increasing id with each event so that the client can notice that it has already seen that event in the past.
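One way to picture the persist-and-replay idea is an append-only log that numbers outputs as it writes them and can be read back for any connecting client. This is only a sketch under assumptions: the `OutputLog` class, the JSON-lines file format, and the `seq`/`tag` fields are all hypothetical and do not describe the node's real persistence.

```python
import json
import os
import tempfile


class OutputLog:
    """Hypothetical append-only log of server outputs, one JSON object per line."""

    def __init__(self, path):
        self.path = path
        # Resume numbering after whatever is already on disk.
        self.next_seq = len(self.replay())

    def append(self, tag):
        """Persist a new output with the next sequence number and return it."""
        event = {"seq": self.next_seq, "tag": tag}
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")
        self.next_seq += 1
        return event

    def replay(self):
        """Return the full history; reading it touches neither network nor chain."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


path = os.path.join(tempfile.mkdtemp(), "server-output.jsonl")
log = OutputLog(path)
log.append("HeadIsOpen")
log.append("TxValid")

restarted = OutputLog(path)  # simulate a node restart on the same file
restarted.append("SnapshotConfirmed")

client_a = restarted.replay()  # two clients connecting after the restart
client_b = restarted.replay()
```

Because replay only reads the log, every client gets the same numbered history, and the ids keep increasing across the restart.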

Acceptance Criteria

When using the TUI, if we open a head, then restart the hydra node and then the TUI, the TUI displays the same state as before the restart.

Out of Scope

After a head has been running for some time, there may be a large number of events to replay. What to do about that, and how to deal with problems caused by replaying too many events, is out of scope for this issue.

Quantumplation · 2022-10-25T21:31:42Z

It's worth noting that not all events are equal in this space from a consumer's perspective. That is, there are some events which we cannot recover from missing, and there are others which we effectively just ignore during playback. From our perspective (and at first blush, so I may have miscategorized one or two), here are the events we do/don't care about:

Must replay:
 - ReadyToCommit
 - Committed
 - HeadIsOpen
 - HeadIsClosed
 - HeadIsContested
 - ReadyToFanout
 - HeadIsAborted
 - HeadIsFinalized
 - TxValid
 - SnapshotConfirmed

Unimportant:
 - Greetings
 - PeerConnected
 - PeerDisconnected
 - GetUTXOResponse
 - RolledBack
 - InvalidInput
 - PostTxOnChainFailed
 - CommandFailed
 - TxInvalid
 - TxExpired
 - TxSeen

If you replay all events, it's not a problem, but I thought I would provide that flavor just in case.
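For illustration, a consumer that receives the full replay could filter it down to the first list like this. It is a rough sketch: the event shape (a dict with a `tag` field) and the function name are assumptions, and the set simply mirrors the categorization above.

```python
# Tags from the "Must replay" list above.
MUST_REPLAY = frozenset({
    "ReadyToCommit", "Committed", "HeadIsOpen", "HeadIsClosed",
    "HeadIsContested", "ReadyToFanout", "HeadIsAborted",
    "HeadIsFinalized", "TxValid", "SnapshotConfirmed",
})


def relevant_for_recovery(events):
    """Keep only the replayed events this consumer needs to rebuild its state."""
    return [e for e in events if e["tag"] in MUST_REPLAY]


replayed = [
    {"tag": "Greetings"},
    {"tag": "PeerConnected"},
    {"tag": "HeadIsOpen"},
    {"tag": "TxSeen"},
    {"tag": "TxValid"},
    {"tag": "SnapshotConfirmed"},
]
kept = [e["tag"] for e in relevant_for_recovery(replayed)]
```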

As for a monotonic timer for idempotence, that works perfectly.

pgrange · 2022-11-15T16:28:06Z

You're raising an excellent point @Quantumplation.

Thinking about it, we realize that it's not obvious which events should be sent on restart and which should not. For some it seems obvious, of course, but for others it can be a bit tricky.

Also, we're not sure what the client application does about the events regarding its own state. I mean, it might be the case that the restart of a node would have subtle impact on the client state that would not be possible to implement by just sending it the right events.

Another approach would be to notify the client application about the restart and when it happened, so that it can decide what to do with its current state when it sees the notification. We could do that with the Greetings event, which, by the way, carries some data that could change between two restarts.

That's the first approach we will try here: send a Greetings message after each restart in the events history.

For instance, it could look like this:

Greetings
PeerConnected
Greetings -- restart occurred before event 3, clean your state if needed
PeerConnected
...
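As a sketch of how a client might use such a history, the function below rebuilds the set of connected peers and clears it whenever it meets a Greetings, treating that as "the node restarted here". The `peer` field and the dict-shaped events are assumptions made for the example, not the actual message format.

```python
def connected_peers(history):
    """Recompute which peers are connected, resetting at every Greetings."""
    peers = set()
    for event in history:
        tag = event["tag"]
        if tag == "Greetings":
            peers.clear()  # node (re)started: earlier connections are gone
        elif tag == "PeerConnected":
            peers.add(event["peer"])
        elif tag == "PeerDisconnected":
            peers.discard(event["peer"])
    return peers


history = [
    {"tag": "Greetings"},
    {"tag": "PeerConnected", "peer": "alice"},
    {"tag": "PeerConnected", "peer": "bob"},
    {"tag": "Greetings"},  # restart marker in the replayed history
    {"tag": "PeerConnected", "peer": "bob"},
]
before_restart = connected_peers(history[:3])
after_restart = connected_peers(history)
```

The same pattern would apply to any other volatile client state that should not survive a node restart.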

pgrange changed the title ~~Replay stored events when restarting a hydra node~~ When restarting a node, allow the customer to restore their state Oct 25, 2022

pgrange changed the title ~~When restarting a node, allow the customer to restore their state~~ When restarting a node, allow the client to restore their state Oct 25, 2022

ffakenz mentioned this issue Nov 14, 2022

replay events to client upon connection #611

Merged

2 tasks

ch1bo assigned pgrange and ch1bo Nov 15, 2022

ffakenz mentioned this issue Nov 18, 2022

Add timestamps and sequence numbers to server output #618

Merged

2 tasks

ch1bo closed this as completed in #618 Nov 18, 2022

ch1bo unassigned pgrange and ch1bo Nov 29, 2022

