Skip to content

Conversation

@michaelsproul
Copy link
Member

Proposed Changes

@potuz on the Eth R&D Discord observed that Lighthouse blocks on Pyrmont were always arriving at other nodes after at least 1 second. Part of this could be due to processing and slow propagation, but metrics also revealed that the Lighthouse nodes were usually taking 400-600ms to even just produce a block before broadcasting it.

I tracked the slowness down to the lack of a pre-built tree hash cache (THC) on the states being used for block production. This was due to using the head state for block production, which lacks a THC in order to keep fork choice fast (cloning a THC takes at least 30ms for 100k validators). This PR modifies block production to clone a state from the snapshot cache rather than the head, which speeds things up by 200-400ms by avoiding the tree hash cache rebuild. In practice this seems to have cut block production time down to 300ms or less. Ideally we could remove the snapshot from the cache (and save the 30ms), but it is required for when we re-process the block after signing it with the validator client.

Alternatives

I experimented with 2 alternatives to this approach, before deciding on it:

  • Alternative 1: ensure the head has a tree hash cache. This is too slow, as it imposes a +30ms hit on fork choice, which currently takes ~5ms (with occasional spikes).
  • Alternative 2: use Arc<BeaconSnapshot> in the snapshot cache and share snapshots between the cache and the head. This made fork choice blazing fast (1ms), and block production the same as in this PR, but had a negative impact on block processing which I don't think is worth it. It ended up being necessary to clone the full state from the snapshot cache during block production, imposing the +30ms penalty there as well as in block production.

In contract, the approach in this PR should only impact block production, and it improves it! Yay for pareto improvements 🎉

Additional Info

This commit (ac59dfa) is currently running on all the Lighthouse Pyrmont nodes, and I've added a dashboard to the Pyrmont grafana instance with the metrics.

In future work we should optimise the attestation packing, which consumes around 30-60ms and is now a substantial contributor to the total.

@michaelsproul
Copy link
Member Author

will fix up the tests on Monday

@michaelsproul
Copy link
Member Author

This updated graph from @potuz shows this PR (ac59dfa) clearly outperforming v1.0.4 (1abc70e). The other commits were intermediate versions that I ran while testing.

lighthouse1

@michaelsproul
Copy link
Member Author

Ready.

validator_graffiti: Option<Graffiti>,
) -> Result<BeaconBlockAndState<T::EthSpec>, BlockProductionError> {
let state = self
.state_at_slot(slot - 1, StateSkipConfig::WithStateRoots)
Copy link
Member

@paulhauner paulhauner Dec 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you aware that the slot - 1 has been removed and we will no longer be able to produce blocks from slots earlier than the head block?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I did that intentionally, I'll message you

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I've restored the slot - 1 with a warning, as we discussed.

I think this will be particularly relevant on the first slot of an epoch when there might be two seemingly legitimate proposers because of propagation delay of the last block of the previous epoch.

Copy link
Member

@paulhauner paulhauner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, it's good to be doing less hashing!

I'm happy to merge this into unstable, regardless of the nit.

"message" => "this block is more likely to be orphaned",
"slot" => slot,
);
self.state_at_slot(slot - 1, StateSkipConfig::WithStateRoots)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does feel like this and L1783-1874 should be de-duped, but I wont block on it.

@michaelsproul
Copy link
Member Author

bors r+

bors bot pushed a commit that referenced this pull request Dec 21, 2020
## Proposed Changes

`@potuz` on the Eth R&D Discord observed that Lighthouse blocks on Pyrmont were always arriving at other nodes after at least 1 second. Part of this could be due to processing and slow propagation, but metrics also revealed that the Lighthouse nodes were usually taking 400-600ms to even just produce a block before broadcasting it.

I tracked the slowness down to the lack of a pre-built tree hash cache (THC) on the states being used for block production. This was due to using the head state for block production, which lacks a THC in order to keep fork choice fast (cloning a THC takes at least 30ms for 100k validators). This PR modifies block production to clone a state from the snapshot cache rather than the head, which speeds things up by 200-400ms by avoiding the tree hash cache rebuild. In practice this seems to have cut block production time down to 300ms or less. Ideally we could _remove_ the snapshot from the cache (and save the 30ms), but it is required for when we re-process the block after signing it with the validator client.

## Alternatives

I experimented with 2 alternatives to this approach, before deciding on it:

* Alternative 1: ensure the `head` has a tree hash cache. This is too slow, as it imposes a +30ms hit on fork choice, which currently takes ~5ms (with occasional spikes).
* Alternative 2: use `Arc<BeaconSnapshot>` in the snapshot cache and share snapshots between the cache and the `head`. This made fork choice blazing fast (1ms), and block production the same as in this PR, but had a negative impact on block processing which I don't think is worth it. It ended up being necessary to clone the full state from the snapshot cache during block production, imposing the +30ms penalty there _as well_ as in block production.

In contract, the approach in this PR should only impact block production, and it improves it! Yay for pareto improvements 🎉

## Additional Info

This commit (ac59dfa) is currently running on all the Lighthouse Pyrmont nodes, and I've added a dashboard to the Pyrmont grafana instance with the metrics.

In future work we should optimise the attestation packing, which consumes around 30-60ms and is now a substantial contributor to the total.
@bors
Copy link

bors bot commented Dec 21, 2020

@bors bors bot changed the title Optimise tree hash caching for block production [Merged by Bors] - Optimise tree hash caching for block production Dec 21, 2020
@bors bors bot closed this Dec 21, 2020
@paulhauner
Copy link
Member

paulhauner commented May 14, 2021

This branch can potentially be deleted :)

@michaelsproul michaelsproul deleted the optimise-block-proposal branch May 14, 2021 04:59
@michaelsproul
Copy link
Member Author

done 😇

@michaelsproul michaelsproul added the consensus An issue/PR that touches consensus code, such as state_processing or block verification. label Nov 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A0 consensus An issue/PR that touches consensus code, such as state_processing or block verification. ready-for-review The code is ready for review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants