
Slot rollback takes longer than syncing the db #397

Closed
PapaSwer opened this issue Nov 5, 2020 · 14 comments

PapaSwer commented Nov 5, 2020

After starting cardano-db-sync:

[db-sync-node:Info:31] [2020-11-05 05:02:14.10 UTC] Starting chainSyncClient
[db-sync-node:Info:31] [2020-11-05 05:02:14.15 UTC] Cardano.Db tip is at slot 7066517, block 4618318
[db-sync-node:Info:36] [2020-11-05 05:02:14.15 UTC] Running DB thread
[db-sync-node:Info:36] [2020-11-05 05:02:17.60 UTC] Rolling back to slot 6640000, hash dc3a7980681d38a4bbcbc1bec5f4e6f74d755848ee07889857ef83e78ca6765b
[db-sync-node:Info:36] [2020-11-05 05:02:18.03 UTC] Deleting slots numbered: [7066517..6640025]
[db-sync-node:Info:39] [2020-11-05 05:02:34.74 UTC] getHistoryInterpreter: acquired
-------------------FREEZE-------------------------------

Even after 24 hours, no progress has been made.

erikd commented Nov 5, 2020

If you look at top or similar, you will see that both postgres and db-sync are working really hard.

Whenever the number of slots to roll back is large (like you have there) it will be slow due to #256 .

The solution (while #256 remains unfixed) is to drop and recreate the database. That will take significantly less than 24 hours.
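
For reference, a minimal sketch of that workaround. The database name (cexplorer), service name, and postgres privileges here are assumptions; adjust them to your own deployment.

```bash
# Stop db-sync first so nothing holds connections to the database
# (the systemd service name is an assumption; stop your docker-compose
# service instead if that is how you run it).
sudo systemctl stop cardano-db-sync

# Drop and recreate an empty database; on the next start db-sync applies its
# schema migrations again and then resyncs from the chain.
psql -U postgres -c 'DROP DATABASE IF EXISTS cexplorer;'
psql -U postgres -c 'CREATE DATABASE cexplorer;'

# Start db-sync again and let it sync from scratch.
sudo systemctl start cardano-db-sync
```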

rcmorano commented Nov 5, 2020

The problem for me is that it never catches up to the tip, because it falls into one of these rollbacks no matter how many times I've tried a sync from scratch :(

erikd commented Nov 5, 2020

The only way this rollback could happen while syncing is if db-sync or the node it is connected to is being stopped and restarted (and even then it is unlikely).

If there is a problem, I need to be able to recreate it. Unfortunately, I have synced from scratch at least 50 times and I have not seen anything like this even once.

rcmorano commented Nov 5, 2020

> The only way this rollback could happen while syncing is if db-sync or the node it is connected to is being stopped and restarted (and even then it is unlikely).
>
> If there is a problem, I need to be able to recreate it. Unfortunately, I have synced from scratch at least 50 times and I have not seen anything like this even once.

Maybe also because of some postgres crash/hang-up? I've been trying to sync on very low-resource instances, and also on this "weird" setup where I share the socket through TCP with socat (as I told you about on some other issue). So yeah, in my case postgres/cardano-node probably hung for too long on some query and restarted... A myriad of bad things could have happened, but somehow version 5.0.2 managed to recover from those issues, while on 5.0.3+ I haven't been able to reach the tip :(

I've upgraded my instances to catch up to the tip and will then downgrade. Crossing fingers :D
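
For context, a rough sketch of the socket-over-TCP setup described above; the host name, port, and socket paths are illustrative assumptions, not the actual configuration used here.

```bash
# On the host running cardano-node: expose the node's unix socket on TCP port 3002.
socat TCP-LISTEN:3002,reuseaddr,fork UNIX-CONNECT:/opt/cardano/node.socket

# On the host running db-sync: recreate a local unix socket that tunnels to the
# node host, and point db-sync's socket path at it.
socat UNIX-LISTEN:/opt/cardano/db-node.socket,fork TCP:node-host:3002
```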

erikd commented Nov 6, 2020

> Maybe also because of some postgres crash/hang-up?

Yes, that is possible.

> I've been trying to sync on very low-resource instances and also on this "weird" setup where I share the socket through TCP with socat

That could easily be the problem. Also, if you do this, you MUST say so in the ticket.

rcmorano commented Nov 6, 2020

> That could easily be the problem. Also, if you do this, you MUST say so in the ticket.

I did on a different one (#304), but IMHO it's not very relevant to this one, since it's the cause of the rollback happening, not of the rollback being slow.

erikd commented Nov 9, 2020

I deal with a large number of tickets and it is easy to forget minor details. If there is a special circumstance with a ticket, that circumstance MUST be noted in every ticket where it is relevant.

Secondly, if db-sync is running in a resource-constrained environment, problems that only occur in that environment are highly likely to be caused by it, and any further problems are likely due to the first one.

Finally, for any problem to be fixed, I must be provided with sufficient information to recreate the problem. In most cases, if I cannot reproduce the problem, I cannot fix the problem.

xdzurman commented Nov 10, 2020

I can confirm that this happens whenever I need to restart db-sync 6.0.0 or start it from a copied postgres db. Notice that it wants to roll back to slot 0.

[db-sync-node:Info:72] [2020-11-10 10:05:59.84 UTC] Starting chainSyncClient
[db-sync-node:Info:72] [2020-11-10 10:05:59.86 UTC] Cardano.Db tip is at slot 13436216, block 4932157
[db-sync-node:Info:76] [2020-11-10 10:05:59.86 UTC] Running DB thread
[db-sync-node:Info:76] [2020-11-10 10:05:59.92 UTC] Rolling back to slot 0, hash f0f7892b5c333cffc4b3c4344de48af4cc63f55e44936196f365a9ef2244134f
[db-sync-node:Info:79] [2020-11-10 10:05:59.94 UTC] getHistoryInterpreter: acquired
[db-sync-node:Info:76] [2020-11-10 10:08:02.65 UTC] Deleting slots numbered: [13436216..1]

Db-sync 5.0.3's output looks like the following:
[db-sync-node:Info:79] [2020-11-10 10:02:10.00 UTC] Starting chainSyncClient
[db-sync-node:Info:79] [2020-11-10 10:02:10.49 UTC] Cardano.Db tip is at slot 13432357, block 4931976
[db-sync-node:Info:84] [2020-11-10 10:02:10.49 UTC] Running DB thread
[db-sync-node:Info:84] [2020-11-10 10:02:10.52 UTC] Rolling back to slot 13432357, hash 8ed30798ffe1fb981f7b4876b136d727c7cf71b136894101457232ad82fc8237
[db-sync-node:Info:84] [2020-11-10 10:02:10.52 UTC] Deleting blocks numbered: []
[db-sync-node:Info:87] [2020-11-10 10:02:10.54 UTC] getHistoryInterpreter: acquired
[db-sync-node:Info:84] [2020-11-10 10:02:17.90 UTC] insertShelleyBlock: epoch 228, slot 13436133, block 4932153, hash ee5692ae029858c895d71d254a8b35078c46437eb44d4c77dc3844d2fdc105a1
[db-sync-node:Info:84] [2020-11-10 10:02:21.84 UTC] insertShelleyBlock: epoch 228, slot 13436136, block 4932154, hash 46ad2af4eb675e3a9f6054d0998020c9827234ddb7b7460c7469502ad834473e
...
You can see that it does not roll back to slot 0 and starts inserting blocks right away.
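
(For anyone comparing these logs against the database itself, the reported tip can be cross-checked with a query like the one below; the database name cexplorer is an assumption, and the block table is the standard db-sync schema.)

```bash
# Cross-check the "Cardano.Db tip is at slot X, block Y" log line against the
# database contents. Adjust the connection options to your setup.
psql -U postgres -d cexplorer \
  -c 'SELECT max(slot_no) AS tip_slot, max(block_no) AS tip_block FROM block;'
```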

What's the problem, @erikd?

erikd commented Nov 10, 2020

> whenever I need to restart db-sync 6.0.0 or start it from a copied postgres db

I totally admit that it will happen in the latter case.

For the former case, I have personally restarted it (with the existing database and compatible state dir) hundreds of times without a rollback.

The important distinction is that there is a directory of state information (as specified by --state-dir). If that data is missing or from an incompatible db-sync version, it will try to roll back the database.
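
For reference, a sketch of an invocation with an explicit state directory; the paths and config file name are illustrative assumptions, so adjust them to your deployment.

```bash
# If --state-dir points at a missing or version-incompatible directory,
# db-sync falls back to rolling the database back as described above.
cardano-db-sync \
  --config /etc/cardano/db-sync-config.yaml \
  --socket-path /run/cardano/node.socket \
  --schema-dir /usr/local/share/cardano-db-sync/schema \
  --state-dir /var/lib/cardano-db-sync/ledger-state
```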

xdzurman commented Nov 10, 2020

I'm starting it with docker-compose similarly to https://github.com/input-output-hk/cardano-rest/blob/master/docker-compose.yml#L40; how do you specify the state dir?

Edit: found the answer at 6187081

erikd commented Nov 11, 2020

I have a PR which adds better logging. It's not a solution, but it should be more informative.

sjlee1125 commented Nov 12, 2020

I had the same issue after upgrading to 6.0.0 (I used the docker-compose.yaml in the cardano-rest repository).

The only difference between the cardano-db-sync and cardano-rest compose files is 6187081#diff-e45e45baeda1c1e73482975a664062aa56f20c03dd9d64a827aba57775bed0d3

After adding that and recreating postgres, the problem disappeared.

erikd commented Nov 12, 2020

@sjlee1125 if something is missing from the cardano-rest docker specification, you should raise a ticket on that repo to make sure it gets fixed.

Mercurial commented

Mine took 10 hours to roll back, then I just stopped and resynced because that is way faster.

erikd pushed a commit that referenced this issue Jan 5, 2021
erikd closed this as completed in 1227676 on Jan 5, 2021