[Merged by Bors] - Deposit Cache Finalization & Fast WS Sync #2915

Closed
wants to merge 16 commits into from

Conversation

ethDreamer
Member

@ethDreamer ethDreamer commented Jan 14, 2022

Summary

The deposit cache now has the ability to finalize deposits. This will cause it to drop unneeded deposit logs and hashes in the deposit Merkle tree that are no longer required to construct deposit proofs. The cache is finalized whenever the latest finalized checkpoint has a new Eth1Data with all deposits imported.

This has three benefits:

  1. Improves the speed of constructing Merkle proofs for deposits as we can just replay deposits since the last finalized checkpoint instead of all historical deposits when re-constructing the Merkle tree.
  2. Significantly faster weak subjectivity sync as the deposit cache can be transferred to the newly syncing node in compressed form. The Merkle tree that stores N finalized deposits requires a maximum of log2(N) hashes. The newly syncing node then only needs to download deposits since the last finalized checkpoint to have a full tree.
  3. Future proofing in preparation for EIP-4444 as execution nodes will no longer be required to store logs permanently so we won't always have all historical logs available to us.
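
As a rough illustration of the second benefit: in an incremental Merkle tree, the finalized deposits can be summarized by the roots of the maximal full subtrees that cover them, one root per set bit in the binary representation of N. The sketch below counts those hashes and lists the subtree sizes. The function names are hypothetical and are not taken from Lighthouse. The count never exceeds floor(log2(N)) + 1, which matches the logarithmic bound quoted above.

```rust
/// Number of subtree roots needed to summarize `n` finalized deposits:
/// one per set bit of `n`.
fn finalized_hash_count(n: u64) -> u32 {
    n.count_ones()
}

/// Sizes (in leaves) of the full subtrees that cover the first `n` leaves,
/// largest (left-most) subtree first.
fn finalized_subtree_sizes(n: u64) -> Vec<u64> {
    (0..64u32)
        .rev()
        .map(|bit| 1u64 << bit)
        .filter(|size| n & size != 0)
        .collect()
}
```

For example, 5 finalized deposits are covered by one subtree of 4 leaves and one of 1 leaf, so two hashes are enough to represent them.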

More Details

Image to illustrate how the deposit contract merkle tree evolves and finalizes along with the resulting DepositTreeSnapshot
image
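
To make the snapshot idea concrete, here is a minimal sketch of how a tree root could be rebuilt from the finalized subtree roots alone, checked against a naive full-tree computation. Everything here is an assumption for illustration: `toy_hash` is a non-cryptographic stand-in for SHA-256 so the example needs no external crates, the function names are hypothetical, and the final step of the real deposit root (mixing in the deposit count) is omitted. It is not Lighthouse's implementation.

```rust
type Hash = [u8; 32];

/// Toy, NON-cryptographic stand-in for SHA-256(left || right).
/// Assumption for illustration only; use a real SHA-256 in practice.
fn toy_hash(left: &Hash, right: &Hash) -> Hash {
    let mut out = [0u8; 32];
    for i in 0..32 {
        out[i] = left[i].rotate_left(3)
            ^ right[(i + 7) % 32].wrapping_mul(31).wrapping_add(i as u8 + 1);
    }
    out
}

/// `zeros[l]` is the root of an empty subtree of height `l`.
fn zero_hashes(depth: usize) -> Vec<Hash> {
    let mut zeros = vec![[0u8; 32]];
    for level in 0..depth {
        let prev = zeros[level];
        zeros.push(toy_hash(&prev, &prev));
    }
    zeros
}

/// Rebuild the root from the finalized subtree roots alone.
/// `finalized` is ordered largest (left-most) subtree first and must hold
/// exactly one hash per set bit of `deposit_count`.
fn root_from_finalized(finalized: &[Hash], deposit_count: u64, depth: usize) -> Hash {
    assert!(depth < 64 && deposit_count <= 1u64 << depth);
    assert_eq!(finalized.len(), deposit_count.count_ones() as usize);
    // A completely full tree is represented by its single root hash.
    if deposit_count == 1u64 << depth {
        return finalized[0];
    }
    let zeros = zero_hashes(depth);
    let mut size = deposit_count;
    let mut index = finalized.len();
    let mut root = zeros[0];
    for level in 0..depth {
        if size & 1 == 1 {
            // A full subtree of height `level` sits to the left of `root`.
            index -= 1;
            root = toy_hash(&finalized[index], &root);
        } else {
            // Nothing finalized at this height: pad on the right with zeros.
            root = toy_hash(&root, &zeros[level]);
        }
        size >>= 1;
    }
    root
}

/// Reference computation: hash the whole zero-padded tree of 2^depth leaves.
fn naive_root(leaves: &[Hash], depth: usize) -> Hash {
    let mut layer: Vec<Hash> = leaves.to_vec();
    layer.resize(1 << depth, [0u8; 32]);
    for _ in 0..depth {
        layer = layer.chunks(2).map(|p| toy_hash(&p[0], &p[1])).collect();
    }
    layer[0]
}
```

With 5 leaves in a depth-3 tree, the two finalized hashes (the root of the first four leaves, plus the fifth leaf) are enough for `root_from_finalized` to reproduce the same root that `naive_root` computes from every leaf.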

Other Considerations

I've changed the structure of the SszDepositCache so once you load & save your database from this version of lighthouse, you will no longer be able to load it from older versions.

@michaelsproul michaelsproul added the waiting-on-author The reviewer has suggested changes and awaits their implementation. label Jan 18, 2022
@michaelsproul
Member

michaelsproul commented Jan 18, 2022

Looks like a few of the tests aren't compiling currently

@ethDreamer ethDreamer force-pushed the deposit_snapshot branch 5 times, most recently from 04e15b5 to 100e256 Compare January 28, 2022 01:24
@ethDreamer ethDreamer removed the waiting-on-author The reviewer has suggested changes and awaits their implementation. label Feb 1, 2022
@paulhauner paulhauner added the ready-for-review The code is ready for review label Feb 7, 2022
@michaelsproul michaelsproul self-requested a review March 28, 2022 22:05
@michaelsproul michaelsproul added the backwards-incompat Backwards-incompatible API change label Mar 29, 2022
Member

@michaelsproul michaelsproul left a comment

I found myself really wanting fast deposit sync the other day so I decided to come review this! 🤩

It's an awesome feature and you've done great work specifying and implementing it. I've left some comments that I think will help us get this closer to merging.

Before we merge I'd also like to run this on a lot of nodes (Prater would be good), and get Pawan or Paul to have a look over it as well (we can tag 'em when the time comes).

beacon_node/beacon_chain/src/beacon_chain.rs (outdated)
beacon_node/beacon_chain/src/beacon_chain.rs (outdated)
beacon_node/client/src/builder.rs (outdated)
beacon_node/client/src/builder.rs (outdated)
beacon_node/client/src/builder.rs (outdated)
consensus/ssz/src/decode/impls.rs
consensus/state_processing/Cargo.toml (outdated)
beacon_node/beacon_chain/src/eth1_cache.rs (outdated)
beacon_node/eth1/src/deposit_cache.rs (outdated)
beacon_node/eth1/src/deposit_cache.rs (outdated)
@michaelsproul michaelsproul added waiting-on-author The reviewer has suggested changes and awaits their implementation. and removed ready-for-review The code is ready for review labels Mar 29, 2022
bors bot pushed a commit that referenced this pull request Apr 1, 2022
## Proposed Changes

Add a `lighthouse db` command with three initial subcommands:

- `lighthouse db version`: print the database schema version.
- `lighthouse db migrate --to N`: manually upgrade (or downgrade!) the database to a different version.
- `lighthouse db inspect --column C`: log the key and size in bytes of every value in a given `DBColumn`.

This PR lays the groundwork for other changes, namely:

- Mark's fast-deposit sync (#2915), for which I think we should implement a database downgrade (from v9 to v8).
- My `tree-states` work, which already implements a downgrade (v10 to v8).
- Standalone purge commands like `lighthouse db purge-dht` per #2824.

## Additional Info

I updated the `strum` crate to 0.24.0, which necessitated some changes in the network code to remove calls to deprecated methods.

Thanks to @winksaville for the motivation, and implementation work that I used as a source of inspiration (#2685).
paulhauner pushed a commit to paulhauner/lighthouse that referenced this pull request Apr 4, 2022
paulhauner pushed a commit to paulhauner/lighthouse that referenced this pull request May 6, 2022
@ethDreamer ethDreamer added ready-for-review The code is ready for review backwards-incompat Backwards-incompatible API change and removed waiting-on-author The reviewer has suggested changes and awaits their implementation. backwards-incompat Backwards-incompatible API change labels May 16, 2022
Member

@michaelsproul michaelsproul left a comment

Looking really good!

There are only two things I'd like to resolve before we merge:

  1. What to do about the database schema change, i.e. this comment. I think you mentioned something in Amsterdam about why this wasn't required, but I didn't follow the reasoning. Do you think we should take the --eth1-purge-cache approach for downgrades? I still prefer the explicit schema migration, which would be v9 -> v10, with a downgrade v10 -> v9. The lighthouse db migrate command is in unstable now, so we'd just need to add the upgrade and downgrade to schema_change.rs. And we'll still have the option of removing them later, as in Mac's recent PR #3181 (Remove DB migrations for legacy database schemas).
  2. Whether we can impl Encode/Decode for Option<T> (this comment). I'll ping Paul to bring his attention to it.
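
For readers unfamiliar with the explicit-migration approach in the first point, the idea can be sketched as a dispatcher with one arm per supported (from, to) version pair. This is a hypothetical illustration only: the function signature, the error type, and the placeholder step descriptions are assumptions, not the contents of schema_change.rs.

```rust
#[derive(Debug, PartialEq)]
enum MigrationError {
    Unsupported { from: u64, to: u64 },
}

/// Hypothetical dispatcher: each supported (from, to) pair gets one arm.
/// `steps` records what would have been done, standing in for real DB work.
fn migrate_schema(from: u64, to: u64, steps: &mut Vec<String>) -> Result<(), MigrationError> {
    match (from, to) {
        // Already at the requested version: nothing to do.
        (f, t) if f == t => Ok(()),
        // Upgrade: rewrite the stored deposit cache in the new layout (placeholder).
        (9, 10) => {
            steps.push("upgrade v9 -> v10 (placeholder)".to_string());
            Ok(())
        }
        // Downgrade: restore a layout that older versions can load (placeholder).
        (10, 9) => {
            steps.push("downgrade v10 -> v9 (placeholder)".to_string());
            Ok(())
        }
        // Anything else is rejected rather than guessed at.
        (from, to) => Err(MigrationError::Unsupported { from, to }),
    }
}
```

Rejecting unknown pairs explicitly means an unsupported jump fails loudly instead of leaving the database in a half-migrated state.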

The other thing to discuss is release timing. I think we could either ship this in v2.3.0 along with the v9 schema upgrade for #3157, so that most users only perceive a single upgrade v8 -> v10, or wait until after the merge. Currently there's no urgent push for v2.3.0, so we may be able to get this in before.

beacon_node/eth1/src/deposit_cache.rs
Comment on lines 183 to 184
#[tokio::main]
pub async fn first_success_blocking<'a, F, O, R>(
Member

I reckon let's delete it. Sometimes I put snippets like this in a GitHub gist in case I ever want to come back to them.

};
}
if deposits == (0x1 << level) {
return Ok(MerkleTree::Finalized(finalized_branches[0]));
Member

My bad, I agree the way it is now is better

beacon_node/eth1/src/deposit_cache.rs (outdated)
consensus/merkle_proof/src/lib.rs
@michaelsproul michaelsproul added waiting-on-author The reviewer has suggested changes and awaits their implementation. and removed ready-for-review The code is ready for review labels May 17, 2022
@michaelsproul
Member

bors r+

bors bot pushed a commit that referenced this pull request Oct 29, 2022
## Summary

The deposit cache now has the ability to finalize deposits. This will cause it to drop unneeded deposit logs and hashes in the deposit Merkle tree that are no longer required to construct deposit proofs. The cache is finalized whenever the latest finalized checkpoint has a new `Eth1Data` with all deposits imported.

This has three benefits:

1. Improves the speed of constructing Merkle proofs for deposits as we can just replay deposits since the last finalized checkpoint instead of all historical deposits when re-constructing the Merkle tree.
2. Significantly faster weak subjectivity sync as the deposit cache can be transferred to the newly syncing node in compressed form. The Merkle tree that stores `N` finalized deposits requires a maximum of `log2(N)` hashes. The newly syncing node then only needs to download deposits since the last finalized checkpoint to have a full tree.
3. Future proofing in preparation for [EIP-4444](https://eips.ethereum.org/EIPS/eip-4444) as execution nodes will no longer be required to store logs permanently so we won't always have all historical logs available to us.

## More Details

Image to illustrate how the deposit contract merkle tree evolves and finalizes along with the resulting `DepositTreeSnapshot`
![image](https://user-images.githubusercontent.com/37123614/151465302-5fc56284-8a69-4998-b20e-45db3934ac70.png)

## Other Considerations

I've changed the structure of the `SszDepositCache` so once you load & save your database from this version of lighthouse, you will no longer be able to load it from older versions.

Co-authored-by: ethDreamer <37123614+ethDreamer@users.noreply.github.com>
@bors

bors bot commented Oct 29, 2022

Build failed:

@michaelsproul
Member

dang, when you have a sec @ethDreamer looks like there's some fixups required after the latest refactor

@michaelsproul
Member

clippy unhappy as well

@ethDreamer
Member Author

@michaelsproul looks like another spurious failure of the windows tests.

@michaelsproul
Member

let's try bors!

bors r+

bors bot pushed a commit that referenced this pull request Oct 30, 2022
@bors bors bot changed the title Deposit Cache Finalization & Fast WS Sync [Merged by Bors] - Deposit Cache Finalization & Fast WS Sync Oct 30, 2022
@bors bors bot closed this Oct 30, 2022
macladson pushed a commit to macladson/lighthouse that referenced this pull request Jan 5, 2023
@dawsbot

dawsbot commented Jul 27, 2023

It appears DappNode's sync URL does not yet support this. Is there a recommended alternative URL for DappNode?

WARN Remote BN does not support EIP-4881 fast deposit sync

Dappnode's default URL is https://checkpoint-sync.dappnode.io

@icculp

icculp commented Aug 1, 2023

I've tried several URLs (the main one, DappNode's, and a couple of others) and I'm getting the same error about 4881.

@michaelsproul
Member

@dawsbot @icculp You can safely ignore that message, it's just a warning. We're waiting on checkpoint sync providers to update their infra. The mainnet.checkpoint.sigp.io endpoint supports 4881 if you want to try it out

@icculp

icculp commented Aug 1, 2023

@michaelsproul it doesn't ignore the error for me. I don't know if it's because I'm on the Gnosis network, but it shuts down on that error: #4559

@michaelsproul
Member

@icculp That's not the true error, there must be something else amiss. If you get an exit code 132 then you need the portable binary. If the checkpoint sync URL is timing out then you can lengthen the timeout with --checkpoint-sync-url-timeout.

@fjvva

fjvva commented Aug 3, 2023

Well, I am in the same situation with Gnosis. I tried adding the --checkpoint-sync-url-timeout flag, but the result is still the same error:

Aug 03 09:41:05 gnosis lighthouse[382380]: Aug 03 07:41:05.374 INFO Starting checkpoint sync remote_url: https://checkpoint.gnosischain.com/, service: beacon
Aug 03 09:41:05 gnosis lighthouse[382380]: Aug 03 07:41:05.492 WARN Remote BN does not support EIP-4881 fast deposit sync, error: Error fetching deposit snapshot from remote: ServerMessage(ErrorMessage { code: 500, message: "not found", stacktraces: [] }), service: beacon
Aug 03 09:41:05 gnosis lighthouse[382380]: Aug 03 07:41:05.641 CRIT Failed to start beacon node reason: Error fetching finalized block from remote: ServerMessage(ErrorMessage { code: 500, message: "not found", stacktraces: [] })
Aug 03 09:41:05 gnosis lighthouse[382380]: Aug 03 07:41:05.641 INFO Internal shutdown received reason: Failed to start beacon node
Aug 03 09:41:05 gnosis lighthouse[382380]: Aug 03 07:41:05.641 INFO Shutting down.. reason: Failure("Failed to start beacon node")
Aug 03 09:41:05 gnosis lighthouse[382380]: Failed to start beacon node
Aug 03 09:41:05 gnosis systemd[1]: beacon-chain.service: Main process exited, code=exited, status=1/FAILURE
Aug 03 09:41:05 gnosis systemd[1]: beacon-chain.service: Failed with result 'exit-code'.

@michaelsproul
Member

@fjvva The issue there is that the endpoint isn't returning a block:

Aug 03 07:41:05.641 CRIT Failed to start beacon node reason: Error fetching finalized block from remote: ServerMessage(ErrorMessage { code: 500, message: "not found", stacktraces: [] })

It's this CRIT (critical) log that is causing the failure, not the warning about 4881.
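
When a log mixes warnings and fatal errors like this, it can help to filter by level before reading the messages. The helper below is a small sketch that assumes the level appears as a standalone token (`CRIT`, `WARN`, or `INFO`, the three levels seen in this thread); the function names are hypothetical and it is not part of Lighthouse.

```rust
/// Return the first token that looks like a log level, if any.
/// Only the levels that appear in this thread are recognised.
fn log_level(line: &str) -> Option<&str> {
    line.split_whitespace()
        .find(|tok| matches!(*tok, "CRIT" | "WARN" | "INFO"))
}

/// Keep only the lines logged at CRIT, i.e. the ones that explain a shutdown.
fn fatal_lines(log: &str) -> Vec<&str> {
    log.lines()
        .filter(|line| log_level(line) == Some("CRIT"))
        .collect()
}
```

Applied to the output above, this would drop the EIP-4881 warning and leave only the `Failed to start beacon node` line with its `Error fetching finalized block from remote` reason.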

I asked the Gnosis devs and they're aware of the issue and will update the checkpoint.gnosischain.com server with a bugfix in the next ~24h.

In the meantime, you'd be best off trying another checkpoint sync provider, although for Gnosis there don't seem to be many. I got a different error from gateway.fm's server. Might just need to wait for the fix.

@fjvva

fjvva commented Aug 4, 2023

@michaelsproul thanks for the heads up. I guess Gnosis users will just have to wait a bit then.

Edit: If anyone else has the same problem, this checkpoint provider works: https://checkpoint-sync-gnosis.dappnode.io/

@chilcano

Hi there,
I changed from --checkpoint-sync-url=https://checkpoint.gnosischain.com to --checkpoint-sync-url=https://checkpoint-sync-gnosis.dappnode.io and I'm still having the same issue (Failed to start beacon node); however, the error is slightly different.

Aug 10 21:08:16.544 INFO Logging to file                         path: "/var/lib/lighthouse/beacon/logs/beacon.log"
Aug 10 21:08:16.544 INFO Lighthouse started                      version: Lighthouse/v4.2.0-c547a11
Aug 10 21:08:16.544 INFO Configured for network                  name: gnosis
Aug 10 21:08:16.546 INFO Data directory initialised              datadir: /var/lib/lighthouse
Aug 10 21:08:16.546 INFO Deposit contract                        address: 0x0b98057ea310f4d31f2a452b414647007d1645d9, deploy_block: 19469077
Aug 10 21:08:16.722 INFO Starting checkpoint sync                remote_url: https://checkpoint.gnosischain.com/, service: beacon
Aug 10 21:08:16.850 WARN Remote BN does not support EIP-4881 fast deposit sync, error: Error fetching deposit snapshot from remote: ServerMessage(ErrorMessage { code: 415, message: "unsupported content-type: application/octet-stream", stacktraces: [] }), service: beacon
Aug 10 21:08:16.885 CRIT Failed to start beacon node             reason: Unable to parse SSZ: OffsetSkipsVariableBytes(388). Ensure the checkpoint-sync-url refers to a node for the correct network
Aug 10 21:08:16.885 INFO Internal shutdown received              reason: Failed to start beacon node
Aug 10 21:08:16.885 INFO Shutting down..                         reason: Failure("Failed to start beacon node")
Failed to start beacon node
Aug 10 21:44:04.213 INFO Logging to file                         path: "/var/lib/lighthouse/beacon/logs/beacon.log"
Aug 10 21:44:04.214 INFO Lighthouse started                      version: Lighthouse/v4.2.0-c547a11
Aug 10 21:44:04.214 INFO Configured for network                  name: gnosis
Aug 10 21:44:04.215 INFO Data directory initialised              datadir: /var/lib/lighthouse
Aug 10 21:44:04.216 INFO Deposit contract                        address: 0x0b98057ea310f4d31f2a452b414647007d1645d9, deploy_block: 19469077
Aug 10 21:44:04.268 INFO Starting checkpoint sync                remote_url: https://checkpoint-sync-gnosis.dappnode.io/, service: beacon
Aug 10 21:44:04.726 CRIT Failed to start beacon node             reason: Unable to parse SSZ: OffsetSkipsVariableBytes(388). Ensure the checkpoint-sync-url refers to a node for the correct network
Aug 10 21:44:04.727 INFO Internal shutdown received              reason: Failed to start beacon node
Aug 10 21:44:04.729 INFO Shutting down..                         reason: Failure("Failed to start beacon node")
Failed to start beacon node

I'm using sigp/lighthouse:v4.2.0.

I appreciate your support.
Regards.

@michaelsproul
Member

@chilcano You need Lighthouse v4.3.0, because the Shapella hard fork has happened on Gnosis.

@chilcano

Thanks @michaelsproul
I just updated and it worked!
Thanks again.
Regards.

@sigp sigp locked as off-topic and limited conversation to collaborators Aug 10, 2023
@michaelsproul
Member

I'll lock this thread now so that it doesn't continue as a generic checkpoint sync debugging thread.

If anyone else stumbles upon the EIP-4881 warning or can't get checkpoint sync to work, please open a new issue or contact us on Discord.

Labels
backwards-incompat Backwards-incompatible API change ready-for-merge This PR is ready to merge. v3.3.0 Minor release following v3.2.0
8 participants