Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

State-only checkpoint state startup #4251

Merged
merged 3 commits into from
Nov 2, 2022
Merged

State-only checkpoint state startup #4251

merged 3 commits into from
Nov 2, 2022

Conversation

arnetheduck
Copy link
Member

@arnetheduck arnetheduck commented Oct 21, 2022

Currently, we require genesis and a checkpoint block and state to start from an arbitrary slot - this PR relaxes this requirement so that we can start with a state alone.

The current trusted-node-sync algorithm works by first downloading blocks until we find an epoch aligned non-empty slot, then downloads the state via slot.

However, current proposals for checkpointing prefer finalized state as the main reference - this allows more simple access control and caching on the server side - in particular, this should help checkpoint-syncing from sources that have a fast finalized state download (like teku) but are slow when accessing state via slot.

Earlier versions of Nimbus will not be able to read databases created without a checkpoint block and genesis. In most cases, backfilling makes the database compatible except where genesis is also missing (custom networks).

  • backfill checkpoint block from libp2p instead of checkpoint source, when doing trusted node sync
  • allow starting the client without genesis / checkpoint block
  • perform epoch start slot lookahead when loading tail state, so as to deal with the case where the epoch start slot does not have a block
  • replace --blockId with --state-id in TNS command line
  • when replaying, also look at the parent of the last-known-block (even if we don't have the parent block data, we can still replay from a "parent" state) - in particular, this clears the way for implementing state pruning
  • deprecate --finalized-checkpoint-block option (no longer needed)

@github-actions
Copy link

Unit Test Results

       9 files  ±0     681 suites  +3   20m 58s ⏱️ -10s
1 998 tests +1  1 849 ✔️ +1  149 💤 ±0  0 ±0 
8 120 runs  +3  7 947 ✔️ +3  173 💤 ±0  0 ±0 

Results for commit 86469f9. ± Comparison against base commit 655e2fa.

let
missingSlots = block:
var total = 0
for i in 0..<checkpointSlot.int:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, one can avoid this Slot <-> int conversion by writing

for i in 0.Slot ..< checkpointSlot:
  discard dbCache.isKnown(i)

where the rest of the loop aside, i remains a slot the entire time. At some point, there were Nim issues around this construct, so maybe not reliable enough to use.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the range of int isn't necessarily the range of Slot -- seems better to at least use uint64 here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

68e5446 - that said, in constructs like for a in 0..<slot.int, there's a real advantage of doing the int conversion up-front in that the loop won't start at all if the slot is off (say because the code is missing a check or has an underflow) meaning less risk of corruption due to side effects in the loop..

@@ -23,13 +23,16 @@ type
summaries: Table[Eth2Digest, BeaconBlockSummary]
slots: seq[Option[Eth2Digest]]

proc updateSlots(cache: var DbCache, slot: Slot) =
if cache.slots.len() < slot.int + 1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another int/uint64 thing -- ultimately, it uses setLen in the line below, which means that it's dealing with an int regardless, but still seems better to avoid potentially lossy conversions when feasible, e.g.,

  if cache.slots.lenu64() < slot + 1

The dependent on int is ultimately a dependence on seq and that built-in set of types. My preference is, even if that's going to be the case indefinitely, centralize as much of these type conversions to/from int from not-at-all-int types in one place, even when they're currently not feasible to avoid entirely.

let tmp = StateIdent.decodeString(stateId).valueOr:
error "Cannot decode checkpoint state id, must be a slot, hash, 'finalized' or 'head'"
quit 1
if tmp.kind == StateQueryKind.Slot:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Equivalent to checking for tmp.kind == StateQueryKind.Slot and not tmp.slot.is_epoch() and else, tmp, no?

Seems clearer to avoid unnecessarily nested if statements.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, we require genesis and a checkpoint block and state to start
from an arbitrary slot - this PR relaxes this requirement so that we can
start with a state alone.

The current trusted-node-sync algorithm works by first downloading
blocks until we find an epoch aligned non-empty slot, then downloads the
state via slot.

However, current
[proposals](ethereum/beacon-APIs#226) for
checkpointing prefer finalized state as
the main reference - this allows more simple access control and caching
on the server side - in particular, this should help checkpoint-syncing
from sources that have a fast `finalized` state download (like infura
and teku) but are slow when accessing state via slot.

Earlier versions of Nimbus will not be able to read databases created
without a checkpoint block and genesis. In most cases, backfilling makes
the database compatible except where genesis is also missing (custom
networks).

* backfill checkpoint block from libp2p instead of checkpoint source,
when doing trusted node sync
* allow starting the client without genesis / checkpoint block
* perform epoch start slot lookahead when loading tail state, so as to
deal with the case where the epoch start slot does not have a block
* replace `--blockId` with `--state-id` in TNS command line
* when replaying, also look at the parent of the last-known-block (even
if we don't have the parent block data, we can still replay from a
"parent" state) - in particular, this clears the way for implementing
state pruning
* deprecate `--finalized-checkpoint-block` option (no longer needed)
@arnetheduck arnetheduck enabled auto-merge (squash) November 2, 2022 06:46
@arnetheduck arnetheduck merged commit d839b9d into unstable Nov 2, 2022
@arnetheduck arnetheduck deleted the state-tns branch November 2, 2022 10:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants