Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bump news to fix concurrency bug with >1MB frames #4028

Merged
merged 2 commits into from
Aug 25, 2022

Conversation

etan-status
Copy link
Contributor

When calling newPayload on a >1MB payload (can happen post-merge),
news splits up that payload into 1MB chunks. The chunks are each sent
individually, though, with await in-between. This means that when we
send concurrent forkChoiceUpdated calls, that those may end up getting
in-between the newPayload chunks, leading to invalid data being sent.
The EL then returns an error message with a null id entry (as it
could not read the request id due to the mangling) and disconnects.
A PR has been submitted to fix this in news, and merged into status
branch early as this fix is critical for reliable post-merge operation:
Tormund/news#22

When calling `newPayload` on a >1MB payload (can happen post-merge),
`news` splits up that payload into 1MB chunks. The chunks are each sent
individually, though, with `await` in-between. This means that when we
send concurrent `forkChoiceUpdated` calls, that those may end up getting
in-between the `newPayload` chunks, leading to invalid data being sent.
The EL then returns an error message with a `null` `id` entry (as it
could not read the request `id` due to the mangling) and disconnects.
A PR has been submitted to fix this in `news`, and merged into `status`
branch early as this fix is critical for reliable post-merge operation:
Tormund/news#22
@tersec
Copy link
Contributor

tersec commented Aug 25, 2022

We've mostly kept the Status news repo used aligned with upstream, but this is an important enough fix that if we need to, it's okay to push it regardless.

The other possibility is trying to resurrect nim-websock once again, but maybe not the thing to gamble on given the not-great record the last few times.

@github-actions
Copy link

Unit Test Results

       12 files  ±0       864 suites  ±0   1h 16m 28s ⏱️ - 3m 33s
  1 989 tests ±0    1 842 ✔️ ±0  147 💤 ±0  0 ±0 
10 690 runs  ±0  10 500 ✔️ ±0  190 💤 ±0  0 ±0 

Results for commit 50f5486. ± Comparison against base commit b6488d5.

@arnetheduck
Copy link
Member

the workaround being to use http instead, I guess?

@etan-status
Copy link
Contributor Author

I guess, HTTP has a different set of issues. WebSocket based setups should work, though. They are not documented to be broken, and expecting node operators to update their configurations yet another time is not feasible. But yes, experience with HTTP may vary in case of trouble.

@etan-status etan-status enabled auto-merge (squash) August 25, 2022 18:12
@etan-status
Copy link
Contributor Author

Manually tested by doing a full Sepolia sync on empty Geth / Nimbus DB with websocket.

@etan-status etan-status merged commit ebfb624 into unstable Aug 25, 2022
@etan-status etan-status deleted the dev/etan/el-1mbfix branch August 25, 2022 21:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants