Reduce validator client duty traffic #1828

@michaelsproul

Description

A user on Discord is reporting their VC falling behind on duties when running with a high number of validators (1k), particularly when running the VC and BN on different machines. They're seeing errors like this:

beacon node:
19:33:44.010 WARN Error processing HTTP API request       method: GET, path: /eth/v1/validator/duties/proposer/18548, status: 400 Bad Request, elapsed: 2.045724ms

validator:
19:33:44.326 ERRO Failed to download validator duties     error: Failed to get proposer indices: ServerMessage(ErrorMessage { code: 400, message: "BAD_REQUEST: requested epoch is 18548 but only current epoch 18549 is allowed", stacktraces: [] }), service: duties

19:33:44.327 WARN Skipping block production for expired slot, info: Your machine could be overloaded, notification_slot: 593567, current_slot: 593568, service: block

I suspect the cause of the issue is the (serial) loading of individual duties every slot, here:

    for pubkey in self.validator_store.voting_pubkeys() {
        let remote_duties = match ValidatorDuty::download(
            &self.beacon_node,
            current_epoch,
            request_epoch,
            pubkey,
        )
        .await
        {
            Ok(duties) => duties,
            Err(e) => {
                error!(
                    log,
                    "Failed to download validator duties";
                    "error" => e
                );
                continue;
            }
        };

We could improve the situation by requesting duties less often (not as safe), in bulk (not sure if this is supported by the standard API), or in parallel. More thought and testing required.
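To make the "in parallel" option concrete, here is a minimal sketch of downloading duties with a bounded number of requests in flight. It is not Lighthouse code: `PublicKey`, `Duty`, `download_duty`, `download_all` and `max_in_flight` are all hypothetical stand-ins, and the sketch uses blocking calls on scoped standard-library threads to stay dependency-free, whereas the real loop is async and would more likely use a stream combinator from the `futures` crate.

```rust
use std::thread;

// Hypothetical stand-in for a validator public key (not the Lighthouse type).
type PublicKey = u64;

// Hypothetical stand-in for the duties returned for one validator.
#[derive(Debug, Clone, PartialEq)]
struct Duty {
    pubkey: PublicKey,
    epoch: u64,
}

// Placeholder for the per-validator HTTP request to the beacon node.
// Assumed to be blocking here; pubkey 0 simulates a failed request.
fn download_duty(pubkey: PublicKey, epoch: u64) -> Result<Duty, String> {
    if pubkey == 0 {
        return Err(format!("no duty for pubkey {}", pubkey));
    }
    Ok(Duty { pubkey, epoch })
}

// Download duties for every key, with at most `max_in_flight` requests
// running at once. Results come back in the same order as `pubkeys`.
fn download_all(
    pubkeys: &[PublicKey],
    epoch: u64,
    max_in_flight: usize,
) -> Vec<Result<Duty, String>> {
    let mut results = Vec::with_capacity(pubkeys.len());
    // `chunks` panics on a size of 0, so clamp to at least 1.
    for batch in pubkeys.chunks(max_in_flight.max(1)) {
        // One scoped thread per key in the batch; the scope joins them all
        // before the next batch starts.
        let batch_results: Vec<Result<Duty, String>> = thread::scope(|s| {
            let handles: Vec<_> = batch
                .iter()
                .map(|&pk| s.spawn(move || download_duty(pk, epoch)))
                .collect();
            handles
                .into_iter()
                .map(|h| h.join().expect("worker thread panicked"))
                .collect()
        });
        results.extend(batch_results);
    }
    results
}
```

A failed request is returned as an `Err` entry rather than aborting the whole batch, so the caller can log it and move on to the next validator, as the existing loop does with `continue`. With 1k validators, the total time drops from roughly 1000 round trips to roughly 1000 / `max_in_flight` round trips, at the cost of more concurrent load on the BN.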

Version

Lighthouse v0.3.x (presumably)
