Closed
Description
A user on Discord is reporting their VC falling behind on duties when running with a high number of validators (1k), particularly when running the VC and BN on different machines. They're seeing errors like this:
beacon node:
19:33:44.010 WARN Error processing HTTP API request method: GET, path: /eth/v1/validator/duties/proposer/18548, status: 400 Bad Request, elapsed: 2.045724ms
validator:
19:33:44.326 ERRO Failed to download validator duties error: Failed to get proposer indices: ServerMessage(ErrorMessage { code: 400, message: "BAD_REQUEST: requested epoch is 18548 but only current epoch 18549 is allowed", stacktraces: [] }), service: duties
19:33:44.327 WARN Skipping block production for expired slot, info: Your machine could be overloaded, notification_slot: 593567, current_slot: 593568, service: block
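The slot and epoch numbers in these logs are consistent with a duties request that was issued during the last slot of epoch 18548 and only reached the beacon node once epoch 18549 had begun. A minimal sketch of that arithmetic, assuming the standard 32 slots per epoch (the helper names are illustrative, not Lighthouse's):

```rust
/// Assumption: 32 slots per epoch, as in the mainnet spec preset.
const SLOTS_PER_EPOCH: u64 = 32;

/// Epoch that contains `slot` (integer division rounds down).
fn epoch_of(slot: u64) -> u64 {
    slot / SLOTS_PER_EPOCH
}

/// True when `slot` is the first slot of its epoch.
fn is_epoch_start(slot: u64) -> bool {
    slot % SLOTS_PER_EPOCH == 0
}
```

With these helpers, `notification_slot` 593567 maps to epoch 18548 and `current_slot` 593568 is the first slot of epoch 18549, which matches the epochs named in the 400 response.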
I suspect the cause of the issue is the (serial) loading of individual duties every slot, here:
lighthouse/validator_client/src/duties_service.rs
Lines 592 to 610 in eba51f0
    for pubkey in self.validator_store.voting_pubkeys() {
        let remote_duties = match ValidatorDuty::download(
            &self.beacon_node,
            current_epoch,
            request_epoch,
            pubkey,
        )
        .await
        {
            Ok(duties) => duties,
            Err(e) => {
                error!(
                    log,
                    "Failed to download validator duties";
                    "error" => e
                );
                continue;
            }
        };
We could improve the situation by requesting duties less often (not as safe), in bulk (not sure if this is supported by the standard API), or in parallel. More thought and testing required.
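As a rough illustration of the "in parallel" option, here is a minimal sketch that fetches duties in bounded batches and collects failures instead of stopping on them. Everything in it is hypothetical: `PublicKey`, `Duty`, `download_all` and the `download` callback are stand-ins rather than Lighthouse types, and it uses scoped OS threads only to stay self-contained. The real duties service is async, so an actual fix would more likely apply a concurrency limit to futures (for example with `buffer_unordered` from the `futures` crate).

```rust
use std::collections::HashMap;
use std::thread;

// Hypothetical stand-ins for the validator client's real types.
type PublicKey = u64;

#[derive(Debug, Clone, PartialEq)]
struct Duty {
    pubkey: PublicKey,
    epoch: u64,
}

/// Downloads duties for every key, running at most `max_in_flight`
/// requests at a time. Failures are collected rather than aborting,
/// mirroring the `continue` in the serial loop.
fn download_all<F>(
    pubkeys: &[PublicKey],
    epoch: u64,
    max_in_flight: usize,
    download: F,
) -> (HashMap<PublicKey, Duty>, Vec<(PublicKey, String)>)
where
    F: Fn(PublicKey, u64) -> Result<Duty, String> + Sync,
{
    let mut duties = HashMap::new();
    let mut failures = Vec::new();
    let download = &download;

    // Process keys in batches so the beacon node is never hit with
    // more than `max_in_flight` concurrent requests.
    for batch in pubkeys.chunks(max_in_flight.max(1)) {
        let results: Vec<(PublicKey, Result<Duty, String>)> = thread::scope(|s| {
            // Spawn every request in the batch before joining any of them.
            let handles: Vec<_> = batch
                .iter()
                .map(|&pk| s.spawn(move || (pk, download(pk, epoch))))
                .collect();
            handles
                .into_iter()
                .map(|h| h.join().expect("download thread panicked"))
                .collect()
        });

        for (pk, result) in results {
            match result {
                Ok(duty) => {
                    duties.insert(pk, duty);
                }
                Err(e) => failures.push((pk, e)),
            }
        }
    }

    (duties, failures)
}
```

With 1k validators and a limit of, say, 32 in flight, total wall-clock time is roughly 32 round trips instead of 1000. The limit itself would need tuning against what the beacon node can absorb.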
Version
Lighthouse v0.3.x (presumably)