Use bloom filters in collator protocol #6071
Comments
Very much related to: #2888
You can only share filters by sharing the key, which increases false-positive risks, which in turn increases risk if you also record reputation or misbehavior in the filter. You'd need a cuckoo filter though if you want to record reputation in the filter.
Aren't parachains mostly running like 15ish nodes though? Any idea how big the largest is? We do not necessarily need many collators per parachain. We're thinking parathread blocks come accompanied by a transaction, perhaps even in opening the connection. We're also thinking deterministic runtimes require that parachains or their collators have some bond. If all collators of all or most parachains wind up with relay chain accounts, then maybe we already know them all anyway? We might need large parachains for gav's special low-security recursive stuff, but then we could add special support later, while most parachains still run under some simple capped collator count mode.
There are concerns about leaving the collator set as open as possible. I personally like the idea of nodes having accounts - that is actually a favorite of mine. Having a list/filter of previously known good connections is already a step in that direction. That list/filter could equally well become nodes having an account >some DOT amount at some later point. |
I like this bloom filter idea, but we should separate that discussion from any discussion about having collators be registered on the relay-chain (it's come up multiple times in the past and it's not a degree of freedom I feel comfortable removing at this stage). |
I really asked: Do we expect this many collators? In fact, there exists a smooth-ish transition plan which morally goes via cuckoo filters, although not necessarily in practice: We could plan to make parachains provide a collator upper bound, and implement this using cuckoo filters, sized using this upper bound, which gives some advantages:
We do not need the upper bound while the cuckoo filter holds actual data, as filters could be rebuilt like
Note: If any node is able to provide a collation (received it from actual collators), even if it is not a collator (or even if it is a collator, in case the set is large/unlimited), it might still be possible to fill up all available slots slowly over time, by simply providing a good collation once and then never again. Counter measures:
I am currently pondering an alternative design to solve more issues better.
This can be seen as an intermediate step to the alternate proposal by @rphmeier here. It improves things on its own, but with candidate providers != candidate creators this becomes even more solid.
Idea being, validators maintain two bloom filters, with collator `PeerId`s getting added every time they provide a valid collation. Those bloom filters have a lifetime of, let's say, a day or two days. One bloom filter is the active one to which new `PeerId`s are added; the other is inactive and stays at the state of the previous day. Both are used for prioritizing incoming connections: if the `PeerId` passes one of the bloom filters, it is accepted, regardless of the free slots in the peerset.

The filtering should ideally be performed in Substrate at the peerset level, similar to how the reserved set works. There are two differences:

-> That filter function should likely take the `PeerId` of the connecting peer and the number of already opened connections, so it can enforce a hard upper limit.

Incoming connections will then be admitted if we either have free incoming slots or the `PeerId` passes the filter.
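A minimal sketch of what this could look like, assuming a simplified hand-rolled bloom filter and a byte slice standing in for the libp2p `PeerId`; the real implementation would live at the peerset level, and the names here (`KnownCollators`, `note_good_collation`, `filter`) are hypothetical:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::time::{Duration, Instant};

// ~2 KiB per filter, in line with the sizing estimate further down.
const FILTER_BITS: usize = 16 * 1024;
const NUM_HASHES: u64 = 10;

struct Bloom {
    bits: Vec<bool>,
    // Non-guessable random seed, so an attacker cannot precompute colliding PeerIds.
    seed: u64,
}

impl Bloom {
    fn new(seed: u64) -> Self {
        Bloom { bits: vec![false; FILTER_BITS], seed }
    }

    fn bit_indices(&self, peer: &[u8]) -> Vec<usize> {
        (0..NUM_HASHES)
            .map(|i| {
                let mut h = DefaultHasher::new();
                (self.seed, i, peer).hash(&mut h);
                (h.finish() as usize) % FILTER_BITS
            })
            .collect()
    }

    fn insert(&mut self, peer: &[u8]) {
        for idx in self.bit_indices(peer) {
            self.bits[idx] = true;
        }
    }

    fn contains(&self, peer: &[u8]) -> bool {
        self.bit_indices(peer).iter().all(|&idx| self.bits[idx])
    }
}

/// Active + inactive filter pair, rotated roughly once per day or two.
struct KnownCollators {
    active: Bloom,
    inactive: Bloom,
    last_rotation: Instant,
    rotation_period: Duration,
    hard_limit: usize,
    // Assumed to be supplied from the node's RNG.
    fresh_seed: fn() -> u64,
}

impl KnownCollators {
    /// Called whenever a peer provided a valid collation.
    fn note_good_collation(&mut self, peer_id: &[u8]) {
        self.maybe_rotate();
        self.active.insert(peer_id);
    }

    /// The filter function handed to the peerset: given the connecting peer and the
    /// number of already opened filter-admitted connections, accept beyond the regular
    /// incoming slots, while still enforcing a hard upper limit.
    fn filter(&mut self, peer_id: &[u8], open_filtered_connections: usize) -> bool {
        self.maybe_rotate();
        if open_filtered_connections >= self.hard_limit {
            return false;
        }
        self.active.contains(peer_id) || self.inactive.contains(peer_id)
    }

    fn maybe_rotate(&mut self) {
        if self.last_rotation.elapsed() >= self.rotation_period {
            // Yesterday's active filter becomes the inactive one; start a fresh active filter.
            self.inactive = std::mem::replace(&mut self.active, Bloom::new((self.fresh_seed)()));
            self.last_rotation = Instant::now();
        }
    }
}
```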
Seeding initial filters
Validators should persist their filters to disk, and in addition, when coming up for the first time (or after a longer downtime), they can request bloom filters from random other validators. This gives us a 2/3 probability of starting with a good set (since at most a third of validators are assumed to be malicious), which is better than an empty set if the limit enforced by the above filter function is relatively low; otherwise it would probably be better to start with an empty filter. We should probably limit the number of bloom-filter-allowed connections quite aggressively if the bloom filter is coming from an untrusted source, but have a rather large limit if it was generated locally.
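To make the untrusted-vs-local distinction concrete, here is one hypothetical way to pick the per-filter connection limit; the enum, function name, and numbers are placeholders, not values from the proposal:

```rust
/// Where a filter came from determines how many connections we let it admit.
enum FilterOrigin {
    /// Built locally from observed valid collations (possibly restored from disk).
    Local,
    /// Received from another validator while bootstrapping.
    Remote,
}

/// Aggressive limit for filters from untrusted sources, generous limit for local ones.
fn filter_admission_limit(origin: FilterOrigin) -> usize {
    match origin {
        FilterOrigin::Local => 100,
        FilterOrigin::Remote => 10,
    }
}
```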
Sizes
Some back-of-the-envelope calculations suggest that a bloom filter with a false positive rate of <0.1% for maintaining ~1000 good nodes (to be done properly) would be around 2 KiB; therefore for 100 parachains this would be around 200 KiB (400 KiB for 2 filters).
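For reference, a quick check of that estimate using the standard bloom filter sizing formula (a sketch; exact numbers depend on the implementation):

```rust
/// Optimal bloom filter size in bits for `n` elements at false positive rate `p`:
/// m = -n * ln(p) / (ln 2)^2, with k = (m / n) * ln 2 hash functions (here k ≈ 10).
fn bloom_filter_bits(n: f64, p: f64) -> f64 {
    -n * p.ln() / (2f64.ln().powi(2))
}

fn main() {
    let bits = bloom_filter_bits(1000.0, 0.001);
    // ~14_378 bits ≈ 1.8 KiB per filter, in line with the ~2 KiB figure above;
    // 100 parachains * ~2 KiB ≈ 200 KiB, doubled for the active/inactive pair.
    println!("{bits:.0} bits ≈ {:.1} KiB", bits / 8.0 / 1024.0);
}
```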
Security Considerations
With a false positive rate of 0.1%, an attacker needs on average 1000 connection attempts per false positive to compete with honest collators, which makes attacks way harder. The hashing should definitely be based on some non-guessable random seed, so the attacker actually has to attempt the connections rather than precomputing `PeerId`s that pass the filter.
Implementation
This has to be done in Substrate in the peerset implementation; otherwise we would need to replicate the whole implementation, which would be slower and easier to attack. The issue in particular: having some fixed number of incoming free slots won't work anymore, as that budget would be shared by both bloom-filtered connections and new connections.
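A rough illustration of what the changed accounting could look like inside the peerset; the type and field names are assumptions, not the actual sc-peerset API. The point is that regular incoming slots and filter-admitted connections are tracked separately, so filter-approved collators do not consume the normal budget:

```rust
/// Sketch of incoming-connection accounting with a separate budget for
/// filter-admitted peers (hypothetical, not the actual sc-peerset types).
struct IncomingPolicy {
    max_incoming: usize,   // regular incoming slots
    max_filtered: usize,   // hard upper limit for filter-admitted peers
    regular_in: usize,
    filtered_in: usize,
}

impl IncomingPolicy {
    /// Accept if a regular slot is free, or if the peer passes the bloom filters
    /// and the filtered budget is not yet exhausted.
    fn accept_incoming(&mut self, passes_filter: bool) -> bool {
        if self.regular_in < self.max_incoming {
            self.regular_in += 1;
            true
        } else if passes_filter && self.filtered_in < self.max_filtered {
            self.filtered_in += 1;
            true
        } else {
            false
        }
    }
}
```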