Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rolling back stake pools take ages #1281

Closed
KtorZ opened this issue Jan 17, 2020 · 3 comments
Closed

Rolling back stake pools take ages #1281

KtorZ opened this issue Jan 17, 2020 · 3 comments
Assignees

Comments

@KtorZ
Copy link
Member

KtorZ commented Jan 17, 2020

Context

Information -
Version v2020-12-23+
Platform All
Installation N/A

Steps to Reproduce

  1. Create a standalone private blockchain (e.g. using the integration config).
  2. Turn on the wallet server and wait for a few slots / epochs.
  3. Turn the wallet off.
  4. Change a setting in the genesis configuration and re-generate the genesis block (effectively creating another blockchain).
  5. Restart the server on the same state directory.

Expected behavior

  1. In cases of fork, the stake pool monitor can quickly rollback to a known point and keep syncing normally.

Actual behavior

  1. The pool monitor rolls back roughly 10 by 10 until it finds a common ancestor.
[cardano-wallet.stake-pool-monitor:Info:362] [2020-01-17 18:09:53.65 UTC] Monitoring stake pools. Currently at 351628d6-[96.3233#64202]
[cardano-wallet.stake-pool-monitor:Info:362] [2020-01-17 18:09:53.89 UTC] Monitoring stake pools. Currently at 04fc30db-[96.3073#64193]
[cardano-wallet.stake-pool-monitor:Info:362] [2020-01-17 18:09:54.66 UTC] Monitoring stake pools. Currently at 70bf9891-[96.2978#64184]
[cardano-wallet.stake-pool-monitor:Info:362] [2020-01-17 18:09:55.17 UTC] Monitoring stake pools. Currently at edbf30a9-[96.2861#64175]
[cardano-wallet.stake-pool-monitor:Info:362] [2020-01-17 18:09:55.68 UTC] Monitoring stake pools. Currently at 105f158c-[96.2785#64166]
[cardano-wallet.stake-pool-monitor:Info:362] [2020-01-17 18:09:56.19 UTC] Monitoring stake pools. Currently at 3c67a40b-[96.2681#64157]

Resolution

Why is this happening? Because:

a) The stake pool monitoring doesn't clean up old checkpoints.
b) In case where the chain follower is unable to find a suitable intersection, it'll instrument the client to enter in a "recovery" mode, by first, asking it to rollback to the oldest known checkpoint in the current window. If there's no older checkpoint, it recover from genesis.

Because the stake pool monitoring doesn't clean up old checkpoints and because a new window is used after every rollback, the monitoring algorithm will make small hops of 10 blocks by 10 blocks until it finds an intersection point, which could possibly be many many blocks in the past.

In theory, this shouldn't be an issue if the chain follower is at least of length k = epoch stability, but since Jörmungandr doesn't abide by the k parameter, an intersection point can be arbitrarily far in the past.

Possible resolutions:

a. Clean up old checkpoints in the stake pool monitoring to allow rolling back to genesis when relevant ==> hard to do correctly without deeper changes in the database schemas (we need the block height for this).

b. Tweak the chain follower such that the recovery doesn't first attempt to rollback to the latest known checkpoint, but instead, directly try to rollback to genesis (the rationale is that, this first attempt to rollback to the latest known checkpoint is more of a desperate move to avoid rolling back from genesis...) ==> not 100% sure about the consequences and UX for wallets.

c. Increase the size of the window for Jörmungandr to be arbitrarily bigger (haskell nodes uses k=2160). ==> #1285


QA

  • Reproducing steps as described aboved shows that resyncing back to genesis now happens with bigger gaps (10x bigger) and is faster.
@piotr-iohk
Copy link
Contributor

Looks good in terms of stake pools.

Question: is this expected? 👇

I've repeated steps above slightly modified:

  • Create a standalone private blockchain (e.g. using the integration config).
  • Turn on the wallet server and wait for a few slots / epochs.
  • Create a wallet
  • Turn the wallet off.
  • Change a setting in the genesis configuration and re-generate the genesis block (effectively creating another blockchain).
  • Restart the server on the same state directory.

For version v2020-12-23:
This works fine and I can then use the wallet after the rollback.

For current master
I get:

[cardano-wallet.wallet-engine:Error:29] [2020-01-21 11:04:31.30 UTC] Worker has exited unexpectedly: ErrCheckIntegrityDifferentGenesis (Hash {getHash = "\216\163\131S\189W\b]\213+\139U\169Ue\205W\215\158\SI\131}\151ed\ACK\250\186O~\233\&7"}) (Hash {getHash = "<\a\ETX\SO6\191\255\230~..\192\158R\147\211\132c|\210\240\EOT5n\243 \243\254lR\EOT\SUB"})

and then:

$ cardano-wallet-jormungandr wallet list
Something went wrong

@KtorZ
Copy link
Member Author

KtorZ commented Jan 21, 2020

That the wallet worker exits, is normal and the expected behavior. The problem is the following:

If the genesis hashes are different, it means that we are so-to-speak on a different blockchain. So we have two choices:

  • We could rollback the whole wallet state and silently resync to this new chain.
  • We could prevent the wallet from starting because it corresponds to a different chain.

I've chosen to go for the second (also seconded by @jonathanknowles on this decision). The rationale is that, if it's a different chain, it's also a different wallet even if the mnemonic is the same. It's also best to be explicit because, you may be surprise that nothing was reported on the server but it seems that transactions are missing from your wallet or whatever... Not realizing you are on the wrong chain can take some debugging time, whereas here, the worker explictely exits telling you that the genesis hashes were different.


Having said that.... the "Something went wrong" when listing wallets afterwards is not cool and not really expected. Seems like the workers being terminated has some effect on the API result which really shouldn't be the case. We should look into this, although pretty minor.

@piotr-iohk
Copy link
Contributor

OK. Issue created -> #1292.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants