
Reorg resistance #2320

Merged: 18 commits, Aug 10, 2023
Conversation

raphjaph (Collaborator) commented Aug 7, 2023

If a reorg is detected it rolls back the database 6 blocks and reindexes from there.

I had to disable some test optimisations, so now the tests run quite a bit longer. I'll try to find a way to get those back.
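Roughly, the detection boils down to checking whether each incoming block still builds on the block hash we last indexed; if it doesn't, the chain has reorged, so we restore a savepoint and reindex from there. A minimal sketch of that check (hypothetical helper name and signature, not the PR's exact code):

use bitcoin::{Block, BlockHash};

// Returns true if `incoming` does not extend the block we indexed for the
// previous height, i.e. the chain has reorged underneath the index.
// `indexed_prev` is the hash the index stored for height - 1, if any.
fn is_reorg(indexed_prev: Option<BlockHash>, incoming: &Block) -> bool {
  match indexed_prev {
    Some(hash) => hash != incoming.header.prev_blockhash,
    // Nothing indexed yet for the parent height, so nothing to contradict.
    None => false,
  }
}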

raphjaph mentioned this pull request Aug 7, 2023
raphjaph marked this pull request as ready for review August 7, 2023 17:01
raphjaph (Collaborator, Author) commented Aug 7, 2023

@victorkirov if you want to have a look here :)

victorkirov (Contributor) commented Aug 8, 2023

@raphjaph looks good overall. Some things to consider though:

  • if a reorg happens and it's more than 6 blocks deep, we won't be able to recover, so maybe it's worth keeping the reorged status
  • I think savepoints are pretty expensive to create and keep. I ended up getting an OOM exception on my cluster when I kept 10 savepoints, granted I took a savepoint every 5 blocks, so things were different. I'd suggest only taking savepoints when you're close to the head block. I also ended up making a savepoint every 5 blocks (block height % 5 == 0) and keeping just 2 savepoints. That gives you protection for 10-14 blocks and uses fewer savepoints.

veryordinally (Collaborator)

> @raphjaph looks good overall. Some things to consider though:
>
>   • if a reorg happens and it's more than 6 blocks deep, we won't be able to recover, so maybe it's worth keeping the reorged status

Outside of an attack scenario the likelihood of a more-than-6-block reorg is extremely low, and in Bitcoin's history we haven't seen one. But taking a snapshot only every few blocks seems like a good tradeoff. See https://blog.lopp.net/how-many-bitcoin-confirmations-is-enough/ for the math behind worst-case scenarios in an attack.

>   • I think savepoints are pretty expensive to create and keep. I ended up getting an OOM exception on my cluster when I kept 10 savepoints, granted I took a savepoint every 5 blocks, so things were different. I'd suggest only taking savepoints when you're close to the head block. I also ended up making a savepoint every 5 blocks (block height % 5 == 0) and keeping just 2 savepoints. That gives you protection for 10-14 blocks and uses fewer savepoints.

I'll do some more measurements, but I haven't seen a significant memory impact.

victorkirov (Contributor) commented Aug 8, 2023

> Outside of an attack scenario the likelihood of a more-than-6-block reorg is extremely low, and in Bitcoin's history we haven't seen one. But taking a snapshot only every few blocks seems like a good tradeoff. See https://blog.lopp.net/how-many-bitcoin-confirmations-is-enough/ for the math behind worst-case scenarios in an attack.

True, the only time there has been a reorg with a lot of blocks was during a fork. Having the status there would be in case something like that happens; otherwise, I think it would currently silently fail to continue indexing.

> I'll do some more measurements, but I haven't seen a significant memory impact.

Strange. It's possible that I did something wrong 😅 I need to read up on how the savepoints work a bit more.

…5. This gives us a savepoint every 5 blocks when initial indexing is completed, and every 5000 blocks during initial indexing. Only keep 2 savepoints, as that allows us under this scheme to go back by 10+ blocks.
veryordinally (Collaborator) left a comment


Reviewed and tested extensively. (Partially) incorporating suggestion from Victor.

veryordinally (Collaborator)

> > Outside of an attack scenario the likelihood of a more-than-6-block reorg is extremely low, and in Bitcoin's history we haven't seen one. But taking a snapshot only every few blocks seems like a good tradeoff. See https://blog.lopp.net/how-many-bitcoin-confirmations-is-enough/ for the math behind worst-case scenarios in an attack.
>
> True, the only time there has been a reorg with a lot of blocks was during a fork. Having the status there would be in case something like that happens; otherwise, I think it would currently silently fail to continue indexing.

I've incorporated your suggestion to do savepoints only every 5 blocks, and keep 2, so we can go back by 10+. If we get a longer reorg, I'd say we have a problem quite a bit more severe than the ord indexer stopping 😅 so I'm not sure it's worth handling that case. We could just panic at that point (I would personally panic myself, I think).

> > I'll do some more measurements, but I haven't seen a significant memory impact.
>
> Strange. It's possible that I did something wrong 😅 I need to read up on how the savepoints work a bit more.

How much RAM do you have on the machines in the cluster? This may be an issue in more memory-constrained environments.

victorkirov (Contributor)

> How much RAM do you have on the machines in the cluster? This may be an issue in more memory-constrained environments.

We had 32 GB with a bitcoin node, an electrs API, and 2 instances of Ord running. I upped it to 64 GB and it seems a lot more stable now.

Comment on lines 660 to 671
let savepoints = wtx.list_persistent_savepoints()?.collect::<Vec<u64>>();

if savepoints.len() >= 2 {
  wtx.delete_persistent_savepoint(savepoints.into_iter().min().unwrap())?;
}

Index::increment_statistic(&wtx, Statistic::Commits, 2)?;
wtx.commit()?;
let wtx = self.index.begin_write()?;
log::debug!("creating savepoint at height {}", self.height);
wtx.persistent_savepoint()?;
wtx.commit()?;
Contributor:


Just 2 things to note here:

  1. This first checks if there are more than 2 savepoints and deletes extra ones, then it creates a new one. That means that at the end you'd end up with 3 savepoints until it runs again. Maybe it should first create a savepoint and then do the greater than or equal to 2 check and clean.
  2. Since this is in the commit function, it will only run on a full commit of an update run and could potentially contain more than 1 block in the commit. In that case, the %5 check could potentially skip over a savepoint, but this would only realistically happen on initial indexing, or starting up after being down for a few blocks, or after recovering from a reorg.

Collaborator:


>   • This first checks if there are more than 2 savepoints and deletes extra ones, then it creates a new one. That means that at the end you'd end up with 3 savepoints until it runs again. Maybe it should first create a savepoint and then do the greater than or equal to 2 check and clean.

It first checks if there are 2 or more savepoints, and if so, deletes the oldest one. That means if somehow we ever get to more than 2 savepoints, we would never reduce them down to 2. But the normal case is that we have 0, 1, or 2. When we have 0 or 1, we just add a new one. When we already have 2, we delete the oldest one and add a new one. We could change the if statement in line 662 to a loop ... Not sure that's worth it, as the only way to get more than 2 savepoints in the first place is by manually creating them or running code that creates more than two.

>   • Since this is in the commit function, it will only run on a full commit of an update run and could potentially contain more than 1 block in the commit. In that case, the %5 check could potentially skip over a savepoint, but this would only realistically happen on initial indexing, or starting up after being down for a few blocks, or after recovering from a reorg.

Not sure I understand. In the initial indexing, commit is called every 5k blocks, so always on blocks divisible by 5000, which are also divisible by 5. In all other cases, commit() is called whenever the indexer reaches the tip, so it will eventually hit a block that meets the condition. I am not concerned about going back a few blocks more than absolutely required.

Contributor:


> It first checks if there are 2 or more savepoints, and if so, deletes the oldest one. That means if somehow we ever get to more than 2 savepoints, we would never reduce them down to 2. But the normal case is that we have 0, 1, or 2. When we have 0 or 1, we just add a new one. When we already have 2, we delete the oldest one and add a new one. We could change the if statement in line 662 to a loop ... Not sure that's worth it, as the only way to get more than 2 savepoints in the first place is by manually creating them or running code that creates more than two.

Sorry, I didn't describe the issue properly. It's not really an issue, just that in the normal case there will be 3 savepoints instead of 2. When it runs, if there are 3 savepoints, it'll delete one, bringing it down to 2, and then it'll create a new one. That results in there always being 3 savepoints in the normal case. If, instead, we first create a savepoint and then delete the oldest one, we would have 2 in the normal case. Not an issue, just bringing it up in case you want to have 2 savepoints in the normal case.
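A minimal sketch of that ordering (assuming savepoint IDs increase monotonically and that redb allows creating and deleting persistent savepoints in the same write transaction; if it doesn't, this would be split across two transactions like the current code):

// Create the new savepoint first...
log::debug!("creating savepoint at height {}", self.height);
wtx.persistent_savepoint()?;

// ...then prune the oldest savepoints so exactly two remain in the steady state.
let mut savepoints = wtx.list_persistent_savepoints()?.collect::<Vec<u64>>();
savepoints.sort();
while savepoints.len() > 2 {
  wtx.delete_persistent_savepoint(savepoints.remove(0))?;
}

wtx.commit()?;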

> Not sure I understand. In the initial indexing, commit is called every 5k blocks, so always on blocks divisible by 5000, which are also divisible by 5. In all other cases, commit() is called whenever the indexer reaches the tip, so it will eventually hit a block that meets the condition. I am not concerned about going back a few blocks more than absolutely required.

The user can do a ctrl+c on block 3 in the initial indexing (for argument's sake) and then the commits will happen on blocks 5003 + 5000n. Also, if they do a ctrl+c on block 802316 and then start up again on block 802321, a savepoint wouldn't be created. In the worst case for the web server, if a reorg happens and we jump back 5 blocks, then we end up reindexing those 5 blocks plus whatever is new, so we could potentially jump over a savepoint before we start doing single-block indexing. I know that the chances of 2 reorgs happening in such a short timespan are low, but it is possible.

The other argument for this is people who use Ord not in web-server mode with continuous indexing, but instead update their index on demand when they execute a command. If they happen to never execute a command on a block divisible by 5, they would never get a savepoint.

The solution is to move the savepoint logic into the update loop and create savepoints there, or to keep the logic here but also add a condition in that loop to commit on blocks divisible by 5 that are less than 10 (maybe a bit more) blocks away from the head block.

raphjaph (Collaborator, Author) commented Aug 9, 2023

@victorkirov @veryordinally

I've consolidated most of the logic into one file. I've added back the /status endpoint for unrecoverable reorgs (atm 14 blocks). Also added an error type for that. I haven't incorporated the comments from above yet. Looking at it now


let mut wtx = index.begin_write()?;

let oldest_savepoint =
Contributor:


For this, an alternative to consider is to roll back to the latest savepoint and delete it. That would only roll you back 1-4 blocks instead of 10-14. If it's not far enough, then when the updater runs again it will hit a reorg again and roll back to the older savepoint automatically.
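A minimal sketch of that alternative, reusing the redb savepoint calls already used in this PR (the max()-based selection and the follow-up deletion are the assumptions here, not the PR's actual code):

let mut wtx = index.begin_write()?;

// Pick the newest persistent savepoint so we only rewind a few blocks.
let newest_id = wtx
  .list_persistent_savepoints()?
  .max()
  .expect("a savepoint should exist once a reorg has been detected");

let newest_savepoint = wtx.get_persistent_savepoint(newest_id)?;
wtx.restore_savepoint(&newest_savepoint)?;
wtx.commit()?;

// Drop the savepoint we just restored in a follow-up transaction, so a
// second, deeper reorg would fall back to the remaining, older savepoint.
let wtx = index.begin_write()?;
wtx.delete_persistent_savepoint(newest_id)?;
wtx.commit()?;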

@@ -130,7 +117,7 @@ impl Updater {
self.commit(wtx, value_cache)?;
Contributor:


I think there should be a commit test as part of this if block. Something like:

let should_commit_for_savepoint =
  starting_height - self.height < 15 && (self.height + uncommitted) % 5 == 0;

if uncommitted == 5000 || should_commit_for_savepoint {
  self.commit(wtx, value_cache)?;
}

Disclaimer: Terrible variable naming and the logic might be off by 1 😋

Contributor:


Or maybe a helper function in the reorg file
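For example, something along these lines could live in the reorg module (the names and thresholds are assumptions that roughly mirror the suggestion above, not the PR's code):

// Decide whether this commit should also produce a savepoint: only bother
// near the chain tip, and only on heights divisible by the savepoint interval.
fn should_create_savepoint(current_height: u64, tip_height: u64) -> bool {
  const SAVEPOINT_INTERVAL: u64 = 5;
  const MAX_SAVEPOINT_LAG: u64 = 15;

  tip_height.saturating_sub(current_height) < MAX_SAVEPOINT_LAG
    && current_height % SAVEPOINT_INTERVAL == 0
}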

victorkirov (Contributor) commented Aug 9, 2023

> I've consolidated most of the logic into one file. I've added back the /status endpoint for unrecoverable reorgs (atm 14 blocks). Also added an error type for that. I haven't incorporated the comments from above yet. Looking at it now

Reorg in its own file is great 👍 Nice that it's contained in one central place

victorkirov (Contributor) commented Aug 10, 2023

Just a heads up, maybe run this branch for a few days to check memory usage. These are our 4 instances that have been running for 2 days. The top 2 are creating 2 savepoints, similar to this PR but with 10 blocks per savepoint, while the bottom 2 are not. Could be a memory issue in redb with savepoints?

[image: memory usage of the 4 instances over the last 2 days]

raphjaph (Collaborator, Author)

> Just a heads up, maybe run this branch for a few days to check memory usage. These are our 4 instances that have been running for 2 days. The top 2 are creating 2 savepoints, similar to this PR but with 10 blocks per savepoint, while the bottom 2 are not. Could be a memory issue in redb with savepoints?
>
> [image: memory usage of the 4 instances over the last 2 days]

To alleviate this a bit, we only create savepoints if we are within 100 blocks of the tip, so the initial sync should not incur a performance hit.

victorkirov (Contributor)

> To alleviate this a bit, we only create savepoints if we are within 100 blocks of the tip, so the initial sync should not incur a performance hit.

I'm doing that as well. I actually only start about 25 blocks away from the head.

Here's an even better example: I restarted all 4 instances about an hour ago, and this is their RAM usage over the last hour:

[image: RAM usage of the 4 instances over the last hour]

There are 2 instances per line, the top line being the 2 instances with savepoints and the bottom without. Maybe just run the same test with this branch and the main branch for an hour or two to confirm. I might've done something else wrong which isn't present in this PR.

raphjaph (Collaborator, Author) commented Aug 10, 2023

I think more RAM usage represents an acceptable trade-off for increased resilience. Would be great to see what you are doing exactly on your branch. I'm just going to merge this for now but if you find an improvement definitely open a PR!
