Reduce size of futures in HTTP API to prevent stack overflows #5104

Merged: 3 commits merged into sigp:unstable on Jan 23, 2024

Conversation

michaelsproul (Member)

Issue Addressed

Closes #5080.

Proposed Changes

  • Use Box::pin on a few futures in publish_block. This reduces the size of the related futures from 100KB to ~33KB.
  • Use Arcs to reduce the amount of data held on the stack in publish_block and related functions. This further reduces the publish block futures from ~33KB to ~16KB. (Sketches of both techniques follow below.)
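To make the second bullet concrete, here is a minimal sketch (with a placeholder Block type, not Lighthouse's real one) of why Arc-ing helps: a value held by value across an .await is stored inside the future's state machine, while an Arc contributes only a pointer:

```rust
use std::sync::Arc;

// Placeholder "block" type; real beacon blocks are far bigger than 16KB.
type Block = [u8; 16 * 1024];

// Holding the block by value across an .await forces all 16KB into the
// future's state machine (and, transitively, into every parent future).
async fn publish_by_value(block: Block) {
    std::future::ready(()).await;
    std::hint::black_box(&block);
}

// Holding an Arc stores only a pointer in the future; the block itself
// lives once on the heap and is shared by reference counting.
async fn publish_arced(block: Arc<Block>) {
    std::future::ready(()).await;
    std::hint::black_box(&block);
}

fn main() {
    let block = Arc::new([0u8; 16 * 1024]);
    // Calling an async fn only constructs the future, so we can measure
    // its size without an executor.
    println!("by value: {} bytes", std::mem::size_of_val(&publish_by_value(*block)));
    println!("arced:    {} bytes", std::mem::size_of_val(&publish_arced(block)));
}
```

On a 64-bit target the by-value future comes out a bit over 16KB, while the Arc'd one is only a few dozen bytes.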

One of the reasons for the stack size blowup in futures is described in this blog post: https://swatinem.de/blog/future-size/.
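The core mechanism from that post, and the reason Box::pin helps, can be shown in a few lines. This is an illustrative sketch, not code from this PR: a future stores every local that is live across an await point, and awaited sub-futures are embedded inline in their parent unless they are boxed onto the heap:

```rust
// A child future that keeps a large buffer alive across an await point,
// so the buffer must be stored inside the compiled state machine.
async fn big_child() {
    let buf = [0u8; 16 * 1024];
    std::future::ready(()).await;
    std::hint::black_box(&buf);
}

// The child future is embedded inline, so this future is ~16KB as well;
// the cost compounds at every level of nesting.
async fn parent_inline() {
    big_child().await;
}

// Box::pin moves the child to the heap; this future only stores a pointer.
async fn parent_boxed() {
    Box::pin(big_child()).await;
}

fn main() {
    println!("inline: {} bytes", std::mem::size_of_val(&parent_inline()));
    println!("boxed:  {} bytes", std::mem::size_of_val(&parent_boxed()));
}
```

Here parent_inline is roughly as large as big_child itself, whereas parent_boxed is a couple of machine words; that is the same effect the Box::pin changes exploit in publish_block.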

Even with that knowledge, I still found it quite hard to work out which interventions would be effective, and did a lot of trial and error, changing things and re-running the type-size analysis.

The commands I used for testing were:

```sh
RUSTFLAGS='-Zprint-type-sizes' cargo +nightly build --release > type_sizes.txt
```

Followed by post-processing:

```sh
cat type_sizes.txt | grep 'print-type-size type:' | sed 's/print-type-size type: `\(.*\)`: \([0-9]*\) bytes, alignment.*/"\1",\2/' | rg "warp" > output.csv
```
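For reference, the input and output of that pipeline look roughly like this (the type path and size below are invented for illustration; real entries are warp filter chains with much longer names):

```
print-type-size type: `warp::filter::and_then::AndThenFuture<...>`: 102400 bytes, alignment: 8 bytes
```

which the sed expression turns into the CSV row:

```
"warp::filter::and_then::AndThenFuture<...>",102400
```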

Additional Info

There's still more we could do here. Some types like the PreProcessingSnapshot end up on the stack during block publishing, and are quite large due to containing unarced blocks and states.
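As a sketch of what an intervention there might look like (placeholder types and sizes, not Lighthouse's actual definitions), Arc-ing the big fields shrinks the snapshot to a couple of pointers wherever it is held by value:

```rust
use std::sync::Arc;

// Placeholder types; Lighthouse's real block and state types are much larger.
type Block = [u8; 4096];
type BeaconState = [u8; 8192];

// Owned fields live inline, so the snapshot itself is huge by value.
#[allow(dead_code)]
struct SnapshotOwned {
    block: Block,
    state: BeaconState,
}

// Arc'd fields shrink it to two pointers, at the cost of heap indirection.
#[allow(dead_code)]
struct SnapshotArced {
    block: Arc<Block>,
    state: Arc<BeaconState>,
}

fn main() {
    println!("owned: {} bytes", std::mem::size_of::<SnapshotOwned>()); // 12288
    println!("arced: {} bytes", std::mem::size_of::<SnapshotArced>()); // 16
}
```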

There's also an outstanding mystery as to how 100KB futures can cause an 8MB stack to overflow, when they should only be nested ~1 at a time. Either I've misunderstood what happens when futures poll sub-futures recursively, or there's something in warp that creates the amplification.

NOTE: I have yet to measure whether the boxing and arcing have an impact on performance. It would be good to monitor metrics for block publication time after deploying this branch.

A backtrace implicating warp can be found here: https://gist.github.com/michaelsproul/d0a4b93bb9d21a5be1103acf4c0a3753.

Thanks to @dapplion for pairing on this and helping with some of the early progress.

@michaelsproul added labels on Jan 22, 2024: bug (Something isn't working), optimization (Something to make Lighthouse run more efficiently), v4.6.0 (ETA Q1 2024)
paulhauner added a commit that referenced this pull request Jan 22, 2024
Squashed commit of the following:

commit 4d51a99
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Mon Jan 22 17:08:43 2024 +1100

    Arc the blocks early in publication

commit 8a749db
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Mon Jan 22 15:19:57 2024 +1100

    Box::pin a few big futures
@paulhauner mentioned this pull request on Jan 22, 2024
@jxs (Member) left a comment:

LGTM Michael :)

```diff
@@ -866,14 +866,14 @@ where
         panic!("Should always be a full payload response");
     };

-    let signed_block = block_response.block.sign(
+    let signed_block = Arc::new(block_response.block.sign(
```
Member:
is this test-only code? If so, do we also have stack overflows during tests?
(Question applies to all the uses in this file.)

michaelsproul (author):

Yeah, this is just for tests. I ended up having to make quite a few changes to make the types consistent. There's a bit of inefficient Arc-ing and de-Arc-ing, but since it's only in tests it shouldn't be too bad.

Jimmy got some stack overflows in tests, but only in debug mode.
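For readers unfamiliar with the pattern being described, the inefficient "Arc-ing and de-Arc-ing" looks roughly like this (hypothetical helper and type names, not the actual test code):

```rust
use std::sync::Arc;

// Hypothetical stand-ins for the test helpers; not the actual Lighthouse code.
#[derive(Clone)]
struct SignedBlock(Vec<u8>);

fn make_block() -> SignedBlock {
    SignedBlock(vec![0u8; 1024])
}

fn main() {
    // "Arc-ing": the production API now takes Arc<SignedBlock>.
    let block = Arc::new(make_block());

    // "De-Arc-ing": a test needs an owned value back. Unwrapping (or cloning,
    // if the Arc is shared) is wasteful, but acceptable in test-only code.
    let owned: SignedBlock = Arc::try_unwrap(block).unwrap_or_else(|shared| (*shared).clone());
    assert_eq!(owned.0.len(), 1024);
}
```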

```diff
@@ -380,6 +379,10 @@ pub async fn reconstruct_block<T: BeaconChainTypes>(
        None
    };

+   // Perf: cloning the block here to unblind it is a little sub-optimal. This is considered an
+   // acceptable tradeoff to avoid passing blocks around on the stack (unarced), which blows up
```
Member:

Suggested change:

```diff
-// acceptable tradeoff to avoid passing blocks around on the stack (unarced), which blows up
+// acceptable tradeoff to avoid passing blocks around on the stack (unArc'ed), which blows up
```

michaelsproul (author):
I kinda like the lowercase spelling

@michaelsproul added the ready-for-review label on Jan 23, 2024
@paulhauner (Member) left a comment:

Nice! Simple changes that seem to make a lot of difference. It was an impressive research effort to get to the bottom of it.

@paulhauner (Member):

I'll merge this because there doesn't seem to be anything substantial outstanding and @michaelsproul said it's ready for "review/merge".

@paulhauner paulhauner merged commit a403138 into sigp:unstable Jan 23, 2024
27 of 28 checks passed
@michaelsproul michaelsproul deleted the http-stack-overflow-fix branch January 23, 2024 04:52
danielramirezch pushed a commit to danielramirezch/lighthouse that referenced this pull request on Feb 14, 2024:

Reduce size of futures in HTTP API to prevent stack overflows (sigp#5104)

* Box::pin a few big futures

* Arc the blocks early in publication

* Fix more tests