Choose signature aggregation dissemination strategy for mainnet #1331

Closed

arnetheduck opened this issue Aug 2, 2019 · 3 comments

Comments

@arnetheduck
Contributor

On mainnet, the expectation is that unaggregated attestations will be disseminated to a shard-specific topic, then aggregated and forwarded to a beacon-chain-wide topic so that block proposers can propose blocks without having to listen to all shard attestation channels.

In the networking spec, the aggregation strategy is left open, with a few notable alternatives having been discussed in the past (please add any that I missed):

  1. A random selection of validators is responsible for packaging attestations and forwarding them to the beacon topic - for example, the first N in the committee

  2. As 1), but the random selection is instead a probability function: validators roll a local die, with the probability of doing the work increasing as time passes and nobody has forwarded an aggregate - the function could weigh certain validators higher to prevent collisions (see the sketch after this list)

  3. Handel
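
A minimal sketch of option 2, assuming a 6-second slot; the function name, the weighting scheme, and the parameters are illustrative, not part of any spec:

```python
import random

SLOT_DURATION = 6.0  # seconds per slot (assumed; matches the value used elsewhere in this thread)

def should_aggregate(validator_index, committee, elapsed_seconds, seen_aggregate, base_weight=1.0):
    """Roll a local die; the probability of taking on aggregation duty grows
    as the slot progresses and no aggregate has been observed yet."""
    if seen_aggregate:
        return False  # someone else already forwarded an aggregate
    # Weigh validators by committee position so they don't all fire at once
    # (the "weigh certain validators higher" idea from option 2).
    position = committee.index(validator_index)
    weight = base_weight / (1 + position)
    probability = min(1.0, weight * elapsed_seconds / SLOT_DURATION)
    return random.random() < probability
```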

@ericsson49
Contributor

We have been investigating the attestation aggregation and dissemination problem for a while.
We've found many issues, and I intend to write a more elaborate and comprehensive overview at some point. Here I try to outline the main issues (not so) briefly.

Preliminary scalability considerations

There will be many participant nodes, even if we consider only the part of the overall network that is active at a given slot, e.g. attesters, intermediate nodes (aggregators, transient nodes) and proposers (of next blocks). Initially, around 300K attesters are expected (10M ether), which means about 300 attesters per shard/committee. Given 16 committees per slot, that is around 5K nodes. In the future, the number of validators may grow: with 1M validators there will be about 1K attesters per committee, i.e. around 15-16K nodes with at least one attester (assuming 100K nodes overall, it will be rare for a node to host more than one attester).
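
As a back-of-envelope check of these figures (the constants are this comment's own assumptions plus 64 slots per epoch, not normative values):

```python
# Rough reproduction of the estimates above.
validators      = 300_000   # ~10M ether staked
shards_per_slot = 16        # committees handled per slot
slots_per_epoch = 64        # spec value at the time of writing

committee_size = validators // (shards_per_slot * slots_per_epoch)
print(committee_size)                        # ~292  -> "about 300 attesters per committee"
print(shards_per_slot * committee_size)      # ~4672 -> "around 5K nodes" per slot

# With 1M validators:
print(1_000_000 // (shards_per_slot * slots_per_epoch))   # ~976 -> ~1K attesters per committee
```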

The same issue arises with shard subnets: it's expected that around 300 validators plus 200 standard nodes listen to a shard. There are several shards per subnet, so a subnet is several times larger than a committee. Up to 16 subnets are active in a slot, so this is a lot of nodes too.

Given all of the above, one should be very careful when designing an aggregation/dissemination protocol. We'll look at this in more detail in the following sections.

NB The estimates of 300 validators per shard and 200 standard nodes per shard are based on p2p/issues/1 and p2p/issues/6.

Aggregation and result delivery are separate problems

The network specification states:

Unaggregated attestations are sent to the subnet topic. Aggregated attestations are sent to the beacon_attestation topic.

This implies that only a subset of aggregators should send their results to proposers (beacon_attestation). Otherwise it's too much traffic, and it would probably be simpler to send individual attestations directly (unless the number of aggregators is much smaller than the number of attesters).
This means the subset should be chosen somehow (e.g. by rolling a die).
The subset size is a scalability vs BFT tradeoff: too small a subset means less Byzantine fault tolerance (it's easier to block result propagation), while too big a subset means too many messages.
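
To make the tradeoff concrete, here is an illustrative calculation (numbers assumed, not from the spec): if each of the m aggregators forwards its result independently with probability p = k/m, the expected number of forwarded aggregates is k, and the chance that no honest aggregator forwards anything shrinks exponentially in k:

```python
# Expected forwarder count k vs the probability that no honest aggregator publishes,
# assuming a committee of m = 300 with up to one third faulty/offline members.
m = 300
faulty_fraction = 1 / 3
honest = int(m * (1 - faulty_fraction))

for k in (2, 4, 8, 16):
    p = k / m                      # each aggregator forwards with probability k/m
    p_none = (1 - p) ** honest     # probability that no honest member forwards
    print(k, round(p_none, 5))
# Larger k -> harder to censor, but more duplicate messages on beacon_attestation;
# smaller k -> fewer messages, but easier to block result propagation.
```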

This holds regardless of which aggregation protocol is employed. E.g. if Handel is used to aggregate attestations, the aggregating nodes still have to decide who sends the results.

Partial aggregates may be okay

Since several aggregators have to send their results to proposers, it may be okay not to wait until aggregates become complete or near complete (i.e. include all or almost all individual attestations).
If several aggregators are to send aggregates, then we can weaken the requirement so that the union of the aggregates should cover the committee. In other words, a proposer can participate in the aggregation protocol at the final round, performing the final merge.

Given network or Byzantine failures, this is a highly desirable property, since some attestations may be lost for various reasons. It allows the aggregation protocol to stop before a final solution is obtained (which may be too late). However, it raises additional problems (see below).

Coordinated vs random aggregation

Let's look at the aggregation part in more detail. There are three general kinds of protocols to aggregate data in p2p networks:

  • Tree-like protocols. Very efficient communication-wise, but not tolerant to failures, especially Byzantine ones.
  • Gossip-like protocols. Participants send partial aggregates to several (randomly) selected peers, to avoid traffic amplification. After some rounds, most participants should have received most individual items. It may take too long to wait until all nodes receive all items, especially in a Byzantine context.
  • Hybrid protocols. Try to aggregate data in a coordinated manner (via tree-like structures), while handling network/Byzantine failures (falling back to some kind of gossip protocol). Handel is a good example of this approach.

When node/link failures cannot be ignored, we have only two options: either a gossip or a hybrid approach. A gossip approach has a significant drawback: since partial aggregates are sent in a random way, at some point it becomes difficult or impossible to merge two partial aggregates because their attester sets overlap, i.e. there are one or more attesters whose attestations are included in both partial aggregates. The problem is caused by the Attestation class, which uses bitfields to account for attesters.
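
A minimal sketch of the merge constraint (simplified types, not the actual spec classes; the real BLS signature aggregation is elided):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PartialAggregate:
    bits: List[bool]    # aggregation bitfield over the committee
    signature: bytes    # aggregated BLS signature (treated as opaque here)

def can_merge(a: PartialAggregate, b: PartialAggregate) -> bool:
    """Two partial aggregates can only be merged if their bitfields are disjoint:
    a BLS aggregate cannot 'subtract' a signature that is counted twice."""
    return not any(x and y for x, y in zip(a.bits, b.bits))

def merge(a: PartialAggregate, b: PartialAggregate) -> PartialAggregate:
    assert can_merge(a, b), "overlapping attesters: cannot merge"
    bits = [x or y for x, y in zip(a.bits, b.bits)]
    # A real client would combine the signatures with BLS aggregation here.
    return PartialAggregate(bits, a.signature + b.signature)
```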

A coordinated approach is required to avoid this: nodes should communicate in a way that yields non-overlapping partial aggregates. Organizing nodes in a tree is an ideal choice in a fault-free setup, but in a Byzantine context a forest of trees should rather be constructed to mask failures and message omissions.
Handel follows this route. However, it imposes some overhead. E.g. Handel requires pairwise connections between nodes, which is not compatible with the p2p-graph approach without modifications (e.g. messages will pass through transient nodes, which may happen to be Byzantine).

Medium-sized partial aggregation

Gossip-like protocols are attractive because they require less coordination and are well matched to the p2p communication graph. It is also beneficial (and may even be required, if the slot duration is about to elapse) to stop the aggregation stage before a final result is reached.
So, if the partial aggregate merge problem is resolved, gossip protocols are a very attractive solution.
I.e. aggregators send their peers partial aggregates for several rounds, and when an aggregate becomes big enough (around sqrt(m), where m is the committee size) or the slot is about to end, an aggregator rolls a die and sends its partial aggregate to proposers. It may also roll the die several times.
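
A hedged sketch of one such round (pure decision logic, no networking; it reuses PartialAggregate, can_merge and merge from the sketch above, and publish_probability is an illustrative parameter):

```python
import math
import random

def gossip_round(local, incoming_aggregates, committee_size, seconds_left,
                 publish_probability=0.1):
    """Merge whatever partial aggregates arrived this round, then decide whether
    to publish the local aggregate towards the proposers."""
    for other in incoming_aggregates:
        if can_merge(local, other):
            local = merge(local, other)
    big_enough = sum(local.bits) >= math.isqrt(committee_size)  # ~sqrt(m) attesters covered
    almost_over = seconds_left < 1.0                            # slot about to end
    publish = (big_enough or almost_over) and random.random() < publish_probability
    return local, publish
```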

Actually, the beacon block structure allows storing multiple partial attestations of a committee. The main obstacle is the 128 limit on the total number of Attestations. More importantly, storing too many attestations will bloat a beacon block, which can be a problem for scalability. However, we think the problem can be resolved with a proper block structure and/or smart compression. See here for details.

Handel is a partial solution

Handel is an interesting protocol; however, as follows from the notes above, it's not a complete solution.

First, Handel requires pairwise connections between nodes, which doesn't fit the p2p graph well, i.e. instead of direct connections, messages will pass through transient nodes, which means: a) additional delays, b) opportunities for Byzantine attacks. The latter is not Handel-specific, though.

Second, after Handel is complete or partially complete, the results should somehow be sent to proposers in a reliable fashion - a problem common to all attestation aggregation/dissemination strategies (discussed above).

Third, the Handel paper says that Handel is able to aggregate 4K attestations in under 1 second with a UDP setup. However, when using QUIC, the Handel developers report it is three times slower. In the case of a p2p graph, where a pairwise connection between nodes has to be emulated by sending messages via transient nodes, this means additional latency. So, when implemented in the context of Ethereum 2.0 requirements, it's not clear whether it's performant enough.

Overall, if we follow a coordinated route, Handel seems to be a very good starting point, which should be augmented to resolve the above issues.

Topic-based Publish-Subscribe pattern seems to be a poor match

As quoted before, the network specification states:

Unaggregated attestations are sent to the subnet topic. Aggregated attestations are sent to the beacon_attestation topic.

However, full delivery of individual attestations to a subnet topic is very resource-consuming. Earlier, we estimated that there are 16 committees of 300-1000 senders, and each sender should publish to a subnet topic whose size is several times larger, i.e. on the order of a thousand subscribers.

Actually, it's excessive, since individual attestations have to be delivered to only some of the aggregators. The final aggregation is obtained via several rounds of the aggregation protocol. If all individual attestations are propagated to all members of a shard subnet, then there is no need for an aggregation protocol at all, since they can be sent to proposers directly with less effort (assuming the number of proposers in beacon_attestation is much smaller than the number of subscribers to a shard subnet topic).
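
A rough message-count comparison behind this remark, with illustrative numbers only:

```python
committee_size   = 300     # senders per committee
subnet_size      = 1_500   # subscribers per shard subnet (several committees plus standard nodes, assumed)
beacon_listeners = 20      # assumed subscribers (mostly proposers) on beacon_attestation

flood_subnet = committee_size * subnet_size        # 450,000 deliveries per committee
send_direct  = committee_size * beacon_listeners   #   6,000 deliveries per committee
print(flood_subnet, send_direct)
```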

An aggregation protocol also doesn't match the topic-based publish-subscribe pattern, since aggregators send partial aggregates which grow with each round, i.e. the messages differ from round to round.

The final stage, where aggregators send their results to proposers, looks like a good match at a high level. However, considering implementation details, the subscribers to the beacon_attestation topic are constantly changing. This poses a serious problem for topic membership management, which is discussed in the following section.

Overlay management and Topic discovery

New proposers should subscribe to the topic beforehand to be able to receive results, and later unsubscribe (to keep the set of topic subscribers small). The appropriate information about topic membership changes should be propagated to aggregator nodes, so that they know whom to send their results to.

As the specification assumes that subscribers to beacon_attestation are mostly proposers, we assume there won't be many of them -- around tens. From a scalability point of view, ideally there should be one subscriber each slot -- the proposer of the next slot. However, it's safer to assume there will be the proposers of some slots before and after the current one.

If the topic membership is small and changes rapidly, then it will be a problem for gossipsub to maintain the mesh for the topic. Basically, we should assume that a gossipsub router at a node has to query the Topic Discovery service beforehand for information about the latest topic changes. Moreover, for a validator assigned as an attester for a particular slot, it's most important that the topic membership information includes the entry of the next slot's proposer.
This means the next slot's proposer should advertise itself with Topic Discovery as a topic member beforehand. Since this is a lengthy process and proposers are assigned in the preceding epoch, it becomes a serious problem. Basically, the next slot's proposer and the current slot's attesters have from 64 to 128 slots (6 seconds each), i.e. roughly 6-13 minutes, to exchange the necessary information.

Topic Discovery and BFT

Another critical problem is the Byzantine fault tolerance properties of the Topic Discovery service. An adversary can advertise wrong records in the Topic Discovery service, or run a Topic Discovery service instance which provides honest nodes with wrong records about who the members of the beacon_attestation topic are. The honest nodes would then send their attestations in the wrong direction.
It's not clear how Byzantine fault tolerant Topic Discovery is, but an excerpt from here suggests it's not:

An attacker might wish to direct discovery traffic to a chosen address by returning records pointing to that address.

TBD: this is not solved.

Basically, Topic Discovery is based around a Kademlia DHT, and p2p DHTs are known to have problems with BFT. BFT in the context of p2p and DHTs is also discussed here.

@nkeywal

nkeywal commented Aug 22, 2019

Good stuff :-)

On Handel:

First, Handel requires pairwise connections between nodes
Yes. It's a drawback (and a tradeoff).

the results should be sent somehow to proposers
So it could depend, but I would expect the proposers to participate in the aggregation. If not, yes, you need to publish the final aggregation on a p2p network, adding some extra costs & time.

However, when using QUIC, Handel developers report it's three times slower.
I asked @bkolad: the test was done when the 0-RTT handshake was not available (I don't know if it's available now) and we were creating a new session (e.g. handshake) for each message. In real life we should be able to cache all the sessions, the cache should be reused between aggregations, and the 0-RTT handshake should be there... But it's also possible to mix protocols (e.g. udp/quic/tor-like-hopefully-a-day).

@djrtwo
Contributor

djrtwo commented Dec 12, 2019

The intention is to use the simple approach currently in the network/validator specs. Will revisit this if we run into issues on testnets.

djrtwo closed this as completed on Dec 12, 2019