Handling reorgs #38

Open
lrettig opened this issue Jun 2, 2020 · 6 comments
lrettig (Member) commented Jun 2, 2020

Up to now, @avive and I have been operating under the assumption that we don't need any special handling of reorgs. The various streams can just re-send all updated data and let the client sort it out.

However, we need to think this through a little more thoroughly, as there are likely some corner cases we're not currently accounting for. It's a little easier to reason about the mesh, but what happens with respect to global state? What if, e.g., an account with a balance disappears after a reorg? Etc.

@talm proposed that, to signal a reorg, a stream sends a special "reorg" token and then terminates. @noamnelke has some ideas for how to handle reorgs that he will share.
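
For concreteness, here's a minimal Go sketch of what such a token could look like. All of these names and types are hypothetical, not part of any existing spacemesh API:

```go
// Hypothetical stream event carrying either a data payload or a reorg marker.
type EventKind int

const (
	EventData EventKind = iota
	EventReorg
)

type StreamEvent struct {
	Kind       EventKind
	Payload    []byte // serialized object (tx, block, ...) when Kind == EventData
	ReorgLayer uint32 // first invalidated layer when Kind == EventReorg
}

// sendReorgAndClose emits one reorg token and then closes the stream,
// matching the "send a token, then die" proposal.
func sendReorgAndClose(out chan<- StreamEvent, firstInvalidLayer uint32) {
	out <- StreamEvent{Kind: EventReorg, ReorgLayer: firstInvalidLayer}
	close(out)
}
```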

avive (Contributor) commented Apr 17, 2021

This might be a good way to do this: when a 'reorg' event is sent, explorer-like clients should probably re-sync from genesis to get old blocks, transactions, and possible new state, while wallet-like clients should query the API for fresh data for all the entities they store and cache locally, e.g., transactions and rewards.
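
To illustrate the wallet-like case, a rough Go sketch of refresh-on-reorg; the `Fetcher` interface and all names here are made up for illustration:

```go
// Fetcher stands in for the regular query API a wallet already uses.
type Fetcher interface {
	Transactions(addr string) ([][]byte, error)
	Rewards(addr string) ([][]byte, error)
}

type WalletCache struct {
	addr         string
	transactions [][]byte
	rewards      [][]byte
}

// OnReorg re-fetches every cached entity rather than re-syncing from genesis.
func (c *WalletCache) OnReorg(f Fetcher) error {
	txs, err := f.Transactions(c.addr)
	if err != nil {
		return err
	}
	rewards, err := f.Rewards(c.addr)
	if err != nil {
		return err
	}
	c.transactions, c.rewards = txs, rewards // drop stale data wholesale
	return nil
}
```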

noamnelke (Member) commented

Not sure re-syncing from genesis is required (it might become very expensive after a while).

For immutable objects (transactions, ATXs, etc.) we just need to tag them with the layer in which they were added to the database. When a re-org happens we should be able to know the depth of the re-org (how many layers back it affects) and then invalidate all objects created after this layer and recreate them as we re-sync from that layer to the current layer.
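
A rough sketch of the tag-and-invalidate step, assuming (hypothetically) a SQL store with a `created_layer` column on each immutable object; the schema and table name are illustrative only:

```go
import "database/sql"

// invalidateAfter drops every immutable object tagged with a layer later
// than the last layer unaffected by the reorg; re-syncing from that layer
// to the current layer then repopulates the table.
func invalidateAfter(db *sql.DB, lastGoodLayer uint32) error {
	_, err := db.Exec(
		`DELETE FROM objects WHERE created_layer > ?`,
		lastGoodLayer,
	)
	return err
}
```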

For mutable objects, like accounts (balance, nonce, etc.) and contracts, we must be able to recreate the updated state. Intuitively, it seems possible to me, but I need to look at the exact data we collect to know for sure.

lrettig (Member, Author) commented May 5, 2021

> For immutable objects (transactions, ATXs, etc.) we just need to tag them with the layer in which they were added to the database. When a re-org happens we should be able to know the depth of the re-org (how many layers back it affects) and then invalidate all objects created after this layer and recreate them as we re-sync from that layer to the current layer.

With respect to the API, though, and especially streams, this would require sending some sort of special "token" on the stream to indicate that a reorg happened, and the depth of the reorg. We then have two options (a rough sketch of both follows the list):

  • restream all updated data since the point of the reorg: this has the downside that it may overwhelm downstream clients that are slow to consume the data
  • instead, continue to stream only new data, and expect downstream clients to use a historical "query" endpoint to resync all data since the point of the reorg
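
A rough sketch of the client side under both options, reusing the hypothetical `StreamEvent` type from the sketch earlier in this thread; `HistoryAPI` and `Store` are likewise made up:

```go
// HistoryAPI stands in for a historical "query" endpoint (option 2).
type HistoryAPI interface {
	ObjectsSince(layer uint32) ([][]byte, error)
}

type Store struct{ objects map[string][]byte }

// Upsert overwrites any previous version of the object, which is what
// makes a restream (option 1) idempotent. Key derivation omitted.
func (s *Store) Upsert(payload []byte) { /* ... */ }

func (s *Store) ResyncFrom(q HistoryAPI, layer uint32) error {
	objs, err := q.ObjectsSince(layer)
	if err != nil {
		return err
	}
	for _, o := range objs {
		s.Upsert(o)
	}
	return nil
}

// consume handles both options: under option 1 restreamed objects simply
// overwrite their stale versions; under option 2 the reorg token triggers
// a backfill from the query endpoint before the stream resumes.
func consume(events <-chan StreamEvent, store *Store, query HistoryAPI) error {
	for ev := range events {
		if ev.Kind == EventReorg {
			if err := store.ResyncFrom(query, ev.ReorgLayer); err != nil {
				return err
			}
			continue
		}
		store.Upsert(ev.Payload)
	}
	return nil
}
```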

noamnelke (Member) commented

Streaming only new data would be considerably harder to implement. The first option is what would happen "automatically".

Correct me if I'm wrong, but as I imagine it, when a re-org happens the node internally invalidates everything back to a certain point and then works its way back to the "present" from there. This means all the normal calls to the processing methods happen and, unless we change anything, all the streams get everything along the way.

Since this isn't instantaneous (the node has to process everything), I don't see how this is different from when the node is syncing.

lrettig (Member, Author) commented May 6, 2021

You're probably right. It depends on how we implement it. That sounds like the most reasonable design to me. In any case, the API design should be isolated from the lower-level implementation. Let's go with this plan for now: we will restream things after a reorg.

Do you think we need the "token" indicating the reorg, and its depth? Or should downstream clients be expected to figure this out for themselves when they see old data being restreamed?

lrettig (Member, Author) commented Apr 26, 2024

Revisiting this as it's come up in the API v2 design and implementation (#319 (comment)), and adding @kacpersaw. I see two potential approaches here:

  1. As discussed above, send a special token in the stream (which streams exactly? just LayerStream?) indicating that a reorg has occurred, and the layer as of which it occurred. Then end the stream. The client can re-establish the stream and re-download/verify content beyond the reorg point.

  2. Make it easy for the client to detect that a reorg has occurred, and the layer as of which it occurred. I think we need this independent of approach (1), i.e., we need both anyway. The issue is that there's no straightforward way to check this today without a cumulative state hash (v2alpha1: Add LayerService with transaction and block definition #319 (comment)).

Perhaps a quick-and-dirty solution here is to implement a cumulative hash that's just a hash of the chain of all previous layer/block hashes (a rough sketch below), i.e., it doesn't include a state hash yet, since we don't yet know how we want to do that (spacemeshos/go-spacemesh#5677) and it may depend upon the VM design anyway. Thoughts?
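
Something like this, perhaps (SHA-256 is just a placeholder for whatever hash the node actually uses):

```go
import "crypto/sha256"

// cumulativeHash chains the previous cumulative value with the current
// layer's block/layer hash; no state hash is folded in yet.
func cumulativeHash(prev, layerHash [32]byte) [32]byte {
	h := sha256.New()
	h.Write(prev[:])
	h.Write(layerHash[:])
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}
```

A client would then detect a reorg by noticing that the cumulative hash it stored for some layer no longer matches what the node now reports for that layer.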

CC @kacpersaw, @dshulyak
