Checkpoint Sync 5/5 - end-to-end test #10387

kasey · 2022-03-17T04:42:57Z

What type of PR is this?

Other

What does this PR do? Why is it needed?

This PR adds a step to the existing end-to-end test to perform retrieve a checkpoint state+block from one of the running nodes once all nodes are in their final synced state.

Which issues(s) does this PR fix?

Fixes #

Other notes for review

Due to the way the checkpoint sync period is calculated and the small number of epochs in the end-to-end, this test will currently always fetch the genesis state. It would probably be better for it to fetch HEAD-1, but that would require modifying ws calculation. Alternatively we could tweak the parameters of the checkpoint sync algorithm (like the safety coefficient) to try to shorten the interval to just an epoch or so, but I need to look at the math again to check how many epochs e2e would need to run for this to be feasible.

michaelsproul · 2022-03-23T04:05:46Z

I just built Prysm using this branch and tried to checkpoint sync against a Lighthouse node, but it failed due to the absence of the /eth/v1/beacon/weak_subjectivity endpoint.

[2022-03-23 14:39:04]  INFO requesting http://localhost:7052/eth/v1/beacon/weak_subjectivity
[2022-03-23 14:39:05] ERROR main: Error retrieving checkpoint origin state and block: unexpected API response for prysm-only weak subjectivity checkpoint API: code=405, url=http://localhost:7052/eth/v1/beacon/weak_subjectivity, body=response body:
{"code":405,"message":"METHOD_NOT_ALLOWED","stacktraces":[]}: did not receive 2xx response from API

I noticed that you try to detect a 404 from the BN and fallback to a backwards-compatible impl if it doesn't exist, but Lighthouse returns a 405 for non-existent HTTP methods (this is our bad, I've opened an issue here to track that: sigp/lighthouse#3112). It might be cool to handle the 405 in the meantime 🙏

After trying Lighthouse I tried Teku via Infura, and got a 401 Unauthorized, which I think is due to Prysm not passing along the project_id:project_secret (username and password) from the first part of the URL:

./beacon-chain --checkpoint-sync-url "https://project_id:project_secret@eth2-beacon-mainnet.infura.io"
...
[2022-03-23 14:47:42]  INFO requesting http://eth2-beacon-mainnet.infura.io/eth/v1/beacon/weak_subjectivity
[2022-03-23 14:47:45] ERROR main: Error retrieving checkpoint origin state and block: unexpected API response for prysm-only weak subjectivity checkpoint API: code=401, url=https://eth2-beacon-mainnet.infura.io:443/eth/v1/beacon/weak_subjectivity, body=response body:
project id required
: did not receive 2xx response from API

rauljordan · 2022-04-05T17:16:26Z

Hi @kasey curious about the status of this one and if there's anything I can do to help

kasey · 2022-04-05T17:20:10Z

I just built Prysm using this branch and tried to checkpoint sync against a Lighthouse node, but it failed due to the absence of the /eth/v1/beacon/weak_subjectivity endpoint.
[2022-03-23 14:39:04]  INFO requesting http://localhost:7052/eth/v1/beacon/weak_subjectivity
[2022-03-23 14:39:05] ERROR main: Error retrieving checkpoint origin state and block: unexpected API response for prysm-only weak subjectivity checkpoint API: code=405, url=http://localhost:7052/eth/v1/beacon/weak_subjectivity, body=response body:
{"code":405,"message":"METHOD_NOT_ALLOWED","stacktraces":[]}: did not receive 2xx response from API
I noticed that you try to detect a 404 from the BN and fallback to a backwards-compatible impl if it doesn't exist, but Lighthouse returns a 405 for non-existent HTTP methods (this is our bad, I've opened an issue here to track that: sigp/lighthouse#3112). It might be cool to handle the 405 in the meantime pray

After trying Lighthouse I tried Teku via Infura, and got a 401 Unauthorized, which I think is due to Prysm not passing along the project_id:project_secret (username and password) from the first part of the URL:
./beacon-chain --checkpoint-sync-url "https://project_id:project_secret@eth2-beacon-mainnet.infura.io"
...
[2022-03-23 14:47:42]  INFO requesting http://eth2-beacon-mainnet.infura.io/eth/v1/beacon/weak_subjectivity
[2022-03-23 14:47:45] ERROR main: Error retrieving checkpoint origin state and block: unexpected API response for prysm-only weak subjectivity checkpoint API: code=401, url=https://eth2-beacon-mainnet.infura.io:443/eth/v1/beacon/weak_subjectivity, body=response body:
project id required
: did not receive 2xx response from API

Hi @michaelsproul, thanks for trying this out! I opened a bug last week on the basic auth issue #10467 and plan to handle 405 equivalent to 404. Sorry this PR got a little stalled, I had to take some personal time off in the last week - I will ping you when these issues are resolved.

kasey · 2022-05-26T13:59:30Z

@michaelsproul we just merged PR #10723 which significantly changes the way our checkpoint sync works. Instead of trying to fetch the block from the beginning of the latest weak subjectivity period, we simply fetch the latest finalized block, working on the assumption that this is preferable in terms of sync speed. This is also more compatible with Infura, which fails to serve the state-by-slot request we need to accompany the checkpoint block, but does a fine job fetching genesis and finalized states + block by root. I was convinced by Adrian Sutton that this is generally just as safe as the other method given that users can confirm the fetched roots are canonical using a trusted beacon API. Please give this a try and let me know how it works for you.

kasey · 2022-05-26T14:00:17Z

Closing this PR as checkpoint sync behavior changed heavily in #10723 which includes its own e2e test.

kasey requested a review from a team as a code owner March 17, 2022 04:42

kasey requested review from jmozah, rauljordan, uncdr, a team, prestonvanloon, potuz and terencechain and removed request for a team, jmozah, rauljordan, uncdr, prestonvanloon, potuz and terencechain March 17, 2022 04:42

kasey force-pushed the 4-checkpoint-sync-runtime branch from 5be0fa9 to a4e0afd Compare March 18, 2022 18:20

kasey force-pushed the 5-checkpoint-sync-endtoend branch from 164b6c7 to 34571cd Compare March 18, 2022 18:21

kasey force-pushed the 4-checkpoint-sync-runtime branch from a4e0afd to d17c1c7 Compare March 25, 2022 14:42

kasey requested review from rkapka and nisdas as code owners March 25, 2022 14:42

kasey force-pushed the 4-checkpoint-sync-runtime branch 3 times, most recently from feb5120 to c1a8f56 Compare March 28, 2022 18:26

delete-merged-branch bot deleted the branch develop March 28, 2022 21:02

kasey force-pushed the 5-checkpoint-sync-endtoend branch from 34571cd to 23baedc Compare April 5, 2022 17:50

kasey changed the base branch from 4-checkpoint-sync-runtime to develop April 5, 2022 18:09

kasey added 18 commits May 3, 2022 14:29

remove deleted dep

bb2433d

update ee patch to use merged block fetcher

2faafce

fixing tests that create duplicate fork versions

1c25b1c

filter out web3signer test for now

5fe318f

skipping all other tests except cp sync

01eca18

adding missing deps to mainnet bazel package

1f69c55

gaz

3d0e4a4

fix race test fork collision; cp sync - more time

3de12f6

more wiggle room for checkpoint sync node

dc7f6a2

yolo in some extra peer headroom

5f91ee5

separate contexts for deadline and request

ab5d5ea

gofmt

4a1b730

see if CI failure is somehow specific to bellatrix

b83482c

single source of bellatrix

06e9289

revert bellatrix to epoch 8 for testing

b43663c

push away bellatrix again for testing

e090f22

restore bellatrix fork epoch

f7f0253

post-merge fixes

03fba41

kasey force-pushed the 5-checkpoint-sync-endtoend branch from 39c2df1 to 03fba41 Compare May 3, 2022 19:29

kasey added 6 commits May 4, 2022 14:52

config test fixes

d7b254b

Merge branch 'develop' into 5-checkpoint-sync-endtoend

8b8c97c

lint cleanup

38adeaf

unskip all the e2e tests

f2dc2f9

switch to getBlock results in different err result

b9a050e

remove redundant config override

0c99026

kasey mentioned this pull request May 4, 2022

Checkpoint sync fixes #10630

Merged

kasey closed this May 26, 2022

terencechain deleted the 5-checkpoint-sync-endtoend branch May 13, 2024 04:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Checkpoint Sync 5/5 - end-to-end test #10387

Checkpoint Sync 5/5 - end-to-end test #10387

kasey commented Mar 17, 2022

michaelsproul commented Mar 23, 2022

rauljordan commented Apr 5, 2022

kasey commented Apr 5, 2022 •

edited

Loading

kasey commented May 26, 2022

kasey commented May 26, 2022

Checkpoint Sync 5/5 - end-to-end test #10387

Checkpoint Sync 5/5 - end-to-end test #10387

Conversation

kasey commented Mar 17, 2022

michaelsproul commented Mar 23, 2022

rauljordan commented Apr 5, 2022

kasey commented Apr 5, 2022 • edited Loading

kasey commented May 26, 2022

kasey commented May 26, 2022

kasey commented Apr 5, 2022 •

edited

Loading