Test suite for attestation service timing #886

AgeManning · 2020-03-03T13:32:52Z

Description

The naive attestation aggregation strategy involves quite a bit of timing logic. Depending on connected validators, a beacon node needs to subscribe to a set of long-lived subnets for a period of time. On top of this, the node needs to subscribe, unsubscribe and search for new peers when a validator needs to attest on a specific slot.

There is significant logic and edge cases that can occur here. Getting the timing of these events wrong could mean a validator missing attestations and loss of money.

A collection of tests should be built on the attester_service to ensure the it is behaving as expected.

If you are looking at picking this issue up, hit up @AgeManning either here or our discord for further direction

The text was updated successfully, but these errors were encountered:

realbigsean · 2020-03-07T21:51:16Z

Hey, I'd be interested in picking this up.

AgeManning · 2020-03-09T03:58:53Z

Hey @realbigsean, thanks for the interest and the help!

This I've tried to make fairly localised and is a good start to learning some localised sections of the code base.

This issue in particular deals with an object I've called the attester_service. This currently exists on the naive_attestation_aggregation branch (which I will clean up so that tests can run fine on this branch).

The attestation_service can be found here: https://github.com/sigp/lighthouse/blob/naive-attestation-aggregation/beacon_node/network/src/attestation_service/mod.rs

In it's current (incomplete) form it has a single public function: https://github.com/sigp/lighthouse/blob/naive-attestation-aggregation/beacon_node/network/src/attestation_service/mod.rs#L115

This takes in a list of subscriptions then builds various timeouts for events that need to happen. The attester_service then needs to be regularly poll'd in order for the events to be emitted.

I imagine there are a few bugs in the current implentation, and it would be nice to find them by building some tests, that create an attestation_service give it a few subscriptions and see if it outputs events as expected.

To do this, we need to first create the service. This can be done with a dummy BeaconChain and NetworkGlobals. I'd recommend looking at these tests as an example of how to build a dummy beacon chain: https://github.com/sigp/lighthouse/blob/naive-attestation-aggregation/beacon_node/network/src/service/tests.rs#L38
And a NetworkGlobals can just be created with a random peer id, i.e PeerId::random()

Next, in order to do some tests, I need to explain the logic of what this is supposed to be doing.

When a validator connects to the beacon node, it looks two epochs in advance to see when it needs to submit an attestation. Note that a validator client can have many validators, so there could be many subscriptions, ultimately in this service you will just see a list of subscriptions for all kinds of validators and they come in at assorted times.

The general premise is that validators on a given slot, need to listen and obtain attestations from a particular gossipsub topic. This means the validator needs to search for peers that may be on this subnet, connect to them, subscribe to the topics and then unsubscribe after the slot. The discovery needs to happen no more than 1 epoch in advance and we should subscribe 1/3 of a slot prior to the slot (to allow for the subscription to reach our peers). We should then unsubscribe from the slot.

The above is the general principle. However to complicate things further, for each validator connected, the beacon node needs to subscribe to random_subnets_per_validator for a long-lived time epochs_per_random_subnet_subscription. See here: https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/validator.md#phase-0-attestation-subnet-stability for the spec.

There are only 64 subnets. So if 64 validators are connected, we should be subscribed to all subnets and shouldn't disconnect from any. The logic here should also account for the case that if we are already subscribed to a long-lived subnet, it shouldn't need to discover new peers or send a subscription request.

As there is quite a bit of logic going on here, some thought about what to test should also be considered. Some initial things to get started are some of the logic cases I mentioned above plus:

A new validator subscription should create a subscription to a long-lived subnet (if there are any left, and not if we are already subscribed to everything)
subscriptions for a subnet that is not long-lived should unsubscribe after a slot
If all local subscriptions are complete, we should only be subscribed to random_subnet_per_validator * seen_validator subnets. i.e we have unsubscribed from all unnecessary ones
Two subscriptions for the same slot, subnet should not produce multiple requests for discovery/subscription
Subscriptions for consecutive slots on the same subnet should not subscribe, unsubscribe then re-subscribe. I.e we should just stay connected to the subnet for consecutive slots. This might need to be tested for not just two, perhaps three.

Possibly a lot more, but this might do to start it off.

Let me know if you need help starting or building a framework to start the testing.

We're probably going to need to adjust the times, and slot definition to make the tests run faster than using realistic times. We can deal with this when we get to it i guess

realbigsean · 2020-03-11T04:27:02Z

Thanks for the detailed explanation! I've started working on this and will let you know when I have some basic scenarios working, or if I run into any issues.

divagant-martian · 2020-05-12T21:57:32Z

I think this one can be closed?

AgeManning · 2020-05-13T13:38:19Z

Yep, resolved in #1070

AgeManning added enhancement New feature or request good first issue Good for newcomers labels Mar 3, 2020

paulhauner added this to To Do (Must-Have) in Security Review Mar 4, 2020

realbigsean mentioned this issue Mar 26, 2020

Update any references to beacon_chain.spec.milliseconds_per_slot to b… #968

Merged

realbigsean mentioned this issue Apr 28, 2020

Attestation service test suite #1070

Merged

AgeManning closed this as completed May 13, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Test suite for attestation service timing #886

Test suite for attestation service timing #886

AgeManning commented Mar 3, 2020

realbigsean commented Mar 7, 2020

AgeManning commented Mar 9, 2020 •

edited

Loading

realbigsean commented Mar 11, 2020

divagant-martian commented May 12, 2020

AgeManning commented May 13, 2020

Test suite for attestation service timing #886

Test suite for attestation service timing #886

Comments

AgeManning commented Mar 3, 2020

Description

realbigsean commented Mar 7, 2020

AgeManning commented Mar 9, 2020 • edited Loading

realbigsean commented Mar 11, 2020

divagant-martian commented May 12, 2020

AgeManning commented May 13, 2020

AgeManning commented Mar 9, 2020 •

edited

Loading