Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test suite for attestation service timing #886

Closed
AgeManning opened this issue Mar 3, 2020 · 5 comments
Closed

Test suite for attestation service timing #886

AgeManning opened this issue Mar 3, 2020 · 5 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@AgeManning
Copy link
Member

Description

The naive attestation aggregation strategy involves quite a bit of timing logic. Depending on connected validators, a beacon node needs to subscribe to a set of long-lived subnets for a period of time. On top of this, the node needs to subscribe, unsubscribe and search for new peers when a validator needs to attest on a specific slot.

There is significant logic and edge cases that can occur here. Getting the timing of these events wrong could mean a validator missing attestations and loss of money.

A collection of tests should be built on the attester_service to ensure the it is behaving as expected.

If you are looking at picking this issue up, hit up @AgeManning either here or our discord for further direction

@AgeManning AgeManning added enhancement New feature or request good first issue Good for newcomers labels Mar 3, 2020
@paulhauner paulhauner added this to To Do (Must-Have) in Security Review Mar 4, 2020
@realbigsean
Copy link
Member

Hey, I'd be interested in picking this up.

@AgeManning
Copy link
Member Author

AgeManning commented Mar 9, 2020

Hey @realbigsean, thanks for the interest and the help!

This I've tried to make fairly localised and is a good start to learning some localised sections of the code base.

This issue in particular deals with an object I've called the attester_service. This currently exists on the naive_attestation_aggregation branch (which I will clean up so that tests can run fine on this branch).

The attestation_service can be found here: https://github.com/sigp/lighthouse/blob/naive-attestation-aggregation/beacon_node/network/src/attestation_service/mod.rs

In it's current (incomplete) form it has a single public function: https://github.com/sigp/lighthouse/blob/naive-attestation-aggregation/beacon_node/network/src/attestation_service/mod.rs#L115

This takes in a list of subscriptions then builds various timeouts for events that need to happen. The attester_service then needs to be regularly poll'd in order for the events to be emitted.

I imagine there are a few bugs in the current implentation, and it would be nice to find them by building some tests, that create an attestation_service give it a few subscriptions and see if it outputs events as expected.

To do this, we need to first create the service. This can be done with a dummy BeaconChain and NetworkGlobals. I'd recommend looking at these tests as an example of how to build a dummy beacon chain: https://github.com/sigp/lighthouse/blob/naive-attestation-aggregation/beacon_node/network/src/service/tests.rs#L38
And a NetworkGlobals can just be created with a random peer id, i.e PeerId::random()

Next, in order to do some tests, I need to explain the logic of what this is supposed to be doing.

When a validator connects to the beacon node, it looks two epochs in advance to see when it needs to submit an attestation. Note that a validator client can have many validators, so there could be many subscriptions, ultimately in this service you will just see a list of subscriptions for all kinds of validators and they come in at assorted times.

The general premise is that validators on a given slot, need to listen and obtain attestations from a particular gossipsub topic. This means the validator needs to search for peers that may be on this subnet, connect to them, subscribe to the topics and then unsubscribe after the slot. The discovery needs to happen no more than 1 epoch in advance and we should subscribe 1/3 of a slot prior to the slot (to allow for the subscription to reach our peers). We should then unsubscribe from the slot.

The above is the general principle. However to complicate things further, for each validator connected, the beacon node needs to subscribe to random_subnets_per_validator for a long-lived time epochs_per_random_subnet_subscription. See here: https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/validator.md#phase-0-attestation-subnet-stability for the spec.

There are only 64 subnets. So if 64 validators are connected, we should be subscribed to all subnets and shouldn't disconnect from any. The logic here should also account for the case that if we are already subscribed to a long-lived subnet, it shouldn't need to discover new peers or send a subscription request.

As there is quite a bit of logic going on here, some thought about what to test should also be considered. Some initial things to get started are some of the logic cases I mentioned above plus:

  • A new validator subscription should create a subscription to a long-lived subnet (if there are any left, and not if we are already subscribed to everything)
  • subscriptions for a subnet that is not long-lived should unsubscribe after a slot
  • If all local subscriptions are complete, we should only be subscribed to random_subnet_per_validator * seen_validator subnets. i.e we have unsubscribed from all unnecessary ones
  • Two subscriptions for the same slot, subnet should not produce multiple requests for discovery/subscription
  • Subscriptions for consecutive slots on the same subnet should not subscribe, unsubscribe then re-subscribe. I.e we should just stay connected to the subnet for consecutive slots. This might need to be tested for not just two, perhaps three.

Possibly a lot more, but this might do to start it off.

Let me know if you need help starting or building a framework to start the testing.

We're probably going to need to adjust the times, and slot definition to make the tests run faster than using realistic times. We can deal with this when we get to it i guess

@realbigsean
Copy link
Member

Thanks for the detailed explanation! I've started working on this and will let you know when I have some basic scenarios working, or if I run into any issues.

@divagant-martian
Copy link
Collaborator

I think this one can be closed?

@AgeManning
Copy link
Member Author

Yep, resolved in #1070

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
No open projects
Security Review
To Do (Must-Have)
Development

No branches or pull requests

3 participants