Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid "Failure verifying attestation for gossip" #3266

Closed
mcdee opened this issue Jun 14, 2022 · 3 comments
Closed

Invalid "Failure verifying attestation for gossip" #3266

mcdee opened this issue Jun 14, 2022 · 3 comments
Assignees

Comments

@mcdee
Copy link
Contributor

mcdee commented Jun 14, 2022

Description

Lighthouse generates a "Failure verifying attestation for gossip" when seeing duplicate attestations.

Version

2.3.1

Present Behaviour

Logs are present of the form:

Jun 14 13:26:48.674 ERRO Failure verifying attestation for gossip, attestation_slot: 107534, committee_index: 17, request_index: 3, error: PriorAttestationKnown { validator_index: 98025, epoch: Epoch(3360) }

There are two issues here:

  1. this is logged as an error, however it doesn't appear that this is an error for the network, validator or node.
  2. the error states that this is a verification issue, however the verification appears to be successful and the issue is not propagating a duplicate attestation.

Expected Behaviour

Given that the "error" is that a prior attestation is known, there seems to be no impact on the network, validator or node. As such, it would be preferable for this error (and those similar to it) to be logged at a level below info.

It would also be useful for the message to more closely match the condition i.e. stating that lighthouse has already seen this attestation, rather than it failing verification.

Note that the same issue occurs for sync committee messages:

Jun 14 13:11:24.456 ERRO Failure verifying sync committee signature for gossip, validator_index: 96417, slot: 107457, request_index: 38, error: PriorSyncCommitteeMessageKnown { validator_index: 96417, slot: Slot(107457) }

Steps to resolve

Reduce logging level of message.

@michaelsproul
Copy link
Member

The reason we currently log this as such a loud error is because this is also the sort of thing you'd expect to see while running multiple validator clients with a key (i.e. a slashable configuration). I'm in favour of reducing the noisiness of this log when running with multiple beacon nodes, particularly as it causes all sorts of mess when timeouts and failovers are involved.

Maybe we could keep the current behaviour by default and add a runtime flag for dropping the severity of these messages to debug? Or if that's too complex maybe it's fine to lose the ability to warn loudly, and direct users to use doppelganger protection if they want this sort of assurance.

@mcdee
Copy link
Contributor Author

mcdee commented Jun 25, 2022

I can understand why you would want this to be loud by default, because running multiple beacon nodes is not going to be the common case and those that do should be able to handle another step or two with configuration. I think that a flag to tell the beacon node it is expected to see duplicate operations on the network and via its API, and hence to relax about them with respect to logging (and peer scoring?), would make sense here.

bors bot pushed a commit that referenced this issue Aug 10, 2022
## Issue Addressed

- Resolves #3266

## Proposed Changes

Return 200 OK rather than an error when a block, attestation or sync message is already known.

Presently, we will log return an error which causes a BN to go "offline" from the VCs perspective which causes the fallback mechanism to do work to try and avoid and upcheck offline nodes. This can be observed as instability in the `vc_beacon_nodes_available_count` metric.

The current behaviour also causes scary logs for the user. There's nothing to *actually* be concerned about when we see duplicate messages, this can happen on fallback systems (see code comments).

## Additional Info

NA
@paulhauner
Copy link
Member

Resolved via #3341.

The new behaviour will be to return a 200 OK when duplicate messages are found, effectively declaring that an already-seen message has been successfully broadcast (just not in this call).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants