
Conversation

@michaelsproul

Proposed Changes

Merges unstable into capella. This PR should not be squash merged.

I had to patch up the block verification stuff from #3794 to work with the new abstract payload types. I did not add RLP withdrawal root verification, so blockHash verification is disabled until we add that (tracking issue: #3870).

Currently the capella branch doesn't compile, due to an error introduced by combining the Arbitrary PR with historic summaries. This PR will fix that up as well.

AgeManning and others added 9 commits January 5, 2023 17:18
Our custom RPC implementation is lagging behind the libp2p v50 version.

We are going to need to change a bunch of function names, and it would be nice to have consistent ordering of functions inside the handlers.

This is a precursor to the libp2p upgrade to minimize merge conflicts in function ordering.
I've needed to do this work in order to do some episub testing. 

This version of libp2p has not yet been released, so this is left as a draft for when we wish to update.

Co-authored-by: Diva M <divma@protonmail.com>
## Issue Addressed

Recent discussions with other client devs about optimistic sync have revealed a conceptual issue with the optimisation implemented in sigp#3738. In designing that feature I failed to consider that the execution node checks the `blockHash` of the execution payload before responding with `SYNCING`, and that omitting this check entirely results in a degradation of the full node's validation. A node omitting the `blockHash` checks could be tricked by a supermajority of validators into following an invalid chain, something which is ordinarily impossible.

## Proposed Changes

I've added verification of the `payload.block_hash` in Lighthouse. In case of failure we log a warning and fall back to verifying the payload with the execution client.
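
At its core the check is keccak256 over the RLP-encoded execution block header, compared against the hash the payload claims. A minimal sketch, assuming our existing `ethers_core` dependency and leaving the header encoding abstract (the function and parameter names here are illustrative, not the real API):

```rust
use ethers_core::utils::keccak256;

/// Minimal sketch of the check. `rlp_encoded_header` stands in for the
/// output of the real header-encoding code; it is not an actual
/// Lighthouse API.
fn block_hash_matches(rlp_encoded_header: &[u8], claimed_hash: [u8; 32]) -> bool {
    // The canonical execution block hash is keccak256(rlp(header)).
    keccak256(rlp_encoded_header) == claimed_hash
}
```

On a mismatch we take the conservative path and defer to the EL rather than rejecting the block outright.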

I've used our existing dependency on `ethers_core` for RLP support, and a new dependency on Parity's `triehash` crate for the Merkle-Patricia trie. Although the `triehash` crate is currently unmaintained it seems like our best option at the moment (it is also used by Reth, and requires vastly less boilerplate than Parity's generic `trie-root` library).
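
For the `transactionsRoot` check, `triehash` does the heavy lifting. A sketch of the pattern, additionally assuming Parity's `keccak-hasher` crate for the trie hasher (illustrative, not the exact Lighthouse code):

```rust
use ethereum_types::H256;
use keccak_hasher::KeccakHasher;
use triehash::ordered_trie_root;

/// Root of an ordered Merkle-Patricia trie over the opaque transaction
/// bytes: `ordered_trie_root` inserts item `i` under the key `rlp(i)`,
/// which is exactly the layout of the `transactionsRoot` trie.
fn transactions_root(transactions: &[Vec<u8>]) -> H256 {
    ordered_trie_root::<KeccakHasher, _>(transactions)
}
```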

Block hash verification is pretty quick, about 500µs per block on my machine (mainnet).

The optimistic finalized sync feature can be disabled using `--disable-optimistic-finalized-sync` which forces full verification with the EL.

## Additional Info

This PR also introduces a new dependency on our [`metastruct`](https://github.com/sigp/metastruct) library, which was perfectly suited to the RLP serialization method. There will likely be changes as `metastruct` grows, but I think this is a good way to start dogfooding it.
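
For the curious, the pattern looks roughly like this. The struct, its fields, and the generated mapping macro below are illustrative stand-ins for the real execution block header type, with the macro shape following the `metastruct` README (treat it as an assumption rather than gospel):

```rust
use ethers_core::utils::rlp::RlpStream;
use metastruct::metastruct;

// Illustrative header; the real type has many more fields.
#[metastruct(mappings(map_header_fields()))]
pub struct Header {
    pub number: u64,
    pub gas_used: u64,
    pub timestamp: u64,
}

pub fn rlp_encode_header(header: &Header) -> Vec<u8> {
    let mut stream = RlpStream::new();
    stream.begin_unbounded_list();
    // Visit every field in declaration order, appending each to the RLP list.
    map_header_fields!(header, |_, field| {
        stream.append(&field);
    });
    stream.finalize_unbounded_list();
    stream.out().to_vec()
}
```

The appeal is that the RLP field order is tied directly to the struct definition, so adding a field (e.g. for withdrawals) can't silently desynchronise the encoding.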

I took inspiration from some Parity and Reth code while writing this, and have preserved the relevant license headers on the files containing code that was copied and modified.
## Proposed Changes

Add the latest long-running Gnosis chain bootnodes provided to us by the Gnosis team.
…3728)

## Issue Addressed

NA

## Proposed Changes

Others and I (sigp#3678) have observed that when running with lots of validators (e.g., 1000s) the cardinality is too much for Prometheus. I've seen Prometheus instances just grind to a halt when we turn the validator monitor on for our testnet validators (we have 10,000s of Goerli validators). Additionally, the debug log volume can get very high with one log per validator, per attestation.

To address this, the `bn --validator-monitor-individual-tracking-threshold <INTEGER>` flag has been added to *disable* per-validator (i.e., non-aggregated) metrics/logging once the validator monitor exceeds the threshold of validators. The default value is `64`, which is a finger-to-the-wind value. I don't actually know the value at which Prometheus starts to become overwhelmed, but I've seen it work with ~64 validators and I've seen it *not* work with 1000s of validators. A default of `64` seems like it will result in a breaking change to users who are running millions of dollars worth of validators whilst resulting in a no-op for low-validator-count users. I'm open to changing this number, though.
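
Internally the gate can be as simple as a size comparison; a sketch of the assumed shape (not the verbatim implementation):

```rust
/// Assumed shape of the gate: per-validator metrics and logs are emitted
/// only while the number of monitored validators stays at or below the
/// threshold set by the new flag.
pub struct ValidatorMonitor {
    monitored_validators: usize,
    /// From `--validator-monitor-individual-tracking-threshold`; default 64.
    individual_tracking_threshold: usize,
}

impl ValidatorMonitor {
    /// Returns true if per-validator (non-aggregated) metrics/logs should
    /// be emitted.
    fn individual_tracking(&self) -> bool {
        self.monitored_validators <= self.individual_tracking_threshold
    }
}
```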

Additionally, this PR starts collecting aggregated Prometheus metrics (e.g., total count of head hits across all validators), so that high-validator-count users still have some interesting metrics. We already had logging for aggregated values, so nothing has been added there.

I've opted to make this a breaking change since it can be rather damaging to your Prometheus instance to accidentally enable the validator monitor with large numbers of validators. I've crashed a Prometheus instance myself and had a report from another user who's done the same thing.

## Additional Info

NA

## Breaking Changes Note

A new label has been added to the validator monitor Prometheus metrics: `total`. This label tracks the aggregated metrics of all validators in the validator monitor (as opposed to each validator being tracked individually using its pubkey as the label).

Additionally, a new flag has been added to the Beacon Node: `--validator-monitor-individual-tracking-threshold`. The default value is `64`, which means that once the validator monitor is tracking more than 64 validators it will stop tracking per-validator metrics and only track the `all_validators` metric. It will also stop emitting per-validator logs and only emit aggregated logs (the exception being that exit and slashing logs are always emitted).

These changes were introduced in sigp#3728 to address issues with untenable Prometheus cardinality and log volume when using the validator monitor with high validator counts (e.g., 1000s of validators). Users with fewer than 65 validators will see no change in behavior (apart from the added `all_validators` metric). Users with 65 or more validators who wish to maintain the previous behavior can set something like `--validator-monitor-individual-tracking-threshold 999999`.
## Issue Addressed

sigp#3795


Co-authored-by: realbigsean <sean@sigmaprime.io>
## Proposed Changes

Update all dependencies to new semver-compatible releases with `cargo update`. Importantly this patches a Tokio vuln: https://rustsec.org/advisories/RUSTSEC-2023-0001. I don't think we were affected by the vuln because it only applies to named pipes on Windows, but it's still good hygiene to patch.
## Issue Addressed

NA

## Proposed Changes

Bump versions

## Additional Info

- [x] ~~Blocked on sigp#3728, sigp#3801~~
- [x] ~~Blocked on sigp#3866~~
- [x] Requires additional testing
Fixing the conflicts involved patching up some of the `block_hash` verification; the rest will be done as part of sigp#3870.
@michaelsproul added the ready-for-review and capella labels Jan 12, 2023

@ethDreamer left a comment

lgtm

@michaelsproul merged commit b2c2d31 into sigp:capella Jan 12, 2023
@michaelsproul deleted the capella-v3.4.0 branch January 12, 2023 05:44