bump `news` to fix concurrency bug with >1MB frames #4028

etan-status · 2022-08-25T00:12:39Z

When calling newPayload on a >1MB payload (can happen post-merge),
news splits up that payload into 1MB chunks. The chunks are each sent
individually, though, with await in-between. This means that when we
send concurrent forkChoiceUpdated calls, that those may end up getting
in-between the newPayload chunks, leading to invalid data being sent.
The EL then returns an error message with a null id entry (as it
could not read the request id due to the mangling) and disconnects.
A PR has been submitted to fix this in news, and merged into status
branch early as this fix is critical for reliable post-merge operation:
Tormund/news#22

When calling `newPayload` on a >1MB payload (can happen post-merge), `news` splits up that payload into 1MB chunks. The chunks are each sent individually, though, with `await` in-between. This means that when we send concurrent `forkChoiceUpdated` calls, that those may end up getting in-between the `newPayload` chunks, leading to invalid data being sent. The EL then returns an error message with a `null` `id` entry (as it could not read the request `id` due to the mangling) and disconnects. A PR has been submitted to fix this in `news`, and merged into `status` branch early as this fix is critical for reliable post-merge operation: Tormund/news#22

tersec · 2022-08-25T00:49:36Z

We've mostly kept the Status news repo used aligned with upstream, but this is an important enough fix that if we need to, it's okay to push it regardless.

The other possibility is trying to resurrect nim-websock once again, but maybe not the thing to gamble on given the not-great record the last few times.

github-actions · 2022-08-25T04:56:53Z

Unit Test Results

      12 files ±0     864 suites ±0 1h 16m 28s ⏱️ - 3m 33s
  1 989 tests ±0   1 842 ✔️ ±0 147 💤 ±0 0 ❌ ±0
10 690 runs ±0 10 500 ✔️ ±0 190 💤 ±0 0 ❌ ±0

Results for commit 50f5486. ± Comparison against base commit b6488d5.

arnetheduck · 2022-08-25T08:55:39Z

the workaround being to use http instead, I guess?

etan-status · 2022-08-25T17:55:36Z

I guess, HTTP has a different set of issues. WebSocket based setups should work, though. They are not documented to be broken, and expecting node operators to update their configurations yet another time is not feasible. But yes, experience with HTTP may vary in case of trouble.

etan-status · 2022-08-25T18:26:39Z

Manually tested by doing a full Sepolia sync on empty Geth / Nimbus DB with websocket.

arnetheduck approved these changes Aug 25, 2022

View reviewed changes

bump news to avoid unnecessary memory copy

b44c467

etan-status enabled auto-merge (squash) August 25, 2022 18:12

etan-status merged commit ebfb624 into unstable Aug 25, 2022

etan-status deleted the dev/etan/el-1mbfix branch August 25, 2022 21:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bump `news` to fix concurrency bug with >1MB frames #4028

bump `news` to fix concurrency bug with >1MB frames #4028

etan-status commented Aug 25, 2022

tersec commented Aug 25, 2022

github-actions bot commented Aug 25, 2022

arnetheduck commented Aug 25, 2022

etan-status commented Aug 25, 2022

etan-status commented Aug 25, 2022

bump news to fix concurrency bug with >1MB frames #4028

bump news to fix concurrency bug with >1MB frames #4028

Conversation

etan-status commented Aug 25, 2022

tersec commented Aug 25, 2022

github-actions bot commented Aug 25, 2022

Unit Test Results

arnetheduck commented Aug 25, 2022

etan-status commented Aug 25, 2022

etan-status commented Aug 25, 2022

bump `news` to fix concurrency bug with >1MB frames #4028

bump `news` to fix concurrency bug with >1MB frames #4028