Lifecycle Stage | Maturity | Status | Latest Revision |
---|---|---|---|
2A | Candidate Recommendation | Active | r8, 2021-12-14 |
Authors: @vyzo
Interest Group: @yusefnapora, @raulk, @whyrusleeping, @Stebalien, @daviddias, @protolambda, @djrtwo, @dryajov, @mpetrunic, @AgeManning, @Nashatyrev, @mhchia
See the lifecycle document for context about maturity level and spec status.
- Overview
- Protocol extensions
This document specifies extensions to gossipsub v1.0 intended to improve
bootstrapping and protocol attack resistance. The extensions change the algorithms that
prescribe local peer behaviour and are fully backwards compatible with v1.0 of the protocol.
Peers that implement these extensions advertise v1.1 of the protocol using /meshsub/1.1.0
as the protocol string.
The protocol now supports explicit peering agreements between node operators. With explicit peering, the application can specify a list of peers to remain connected to and unconditionally forward messages to each other outside of the vagaries of the peer scoring system and other defensive measures.
For every explicit peer, the router must establish and maintain a connection. The connections are initially established when the router boots and are periodically checked; the router reconnects if connectivity is lost. The recommended period for connectivity checks is 5 minutes.
Peering agreements are established out of band and are reciprocal. Explicit peers exist outside the mesh: every new valid incoming message is forwarded to the explicit peers, and incoming RPCs are always accepted from them. It is an error to GRAFT on an explicit peer, and such an attempt should be logged and rejected with a PRUNE.
Gossipsub relies on ambient peer discovery in order to find peers within a topic of interest. This puts pressure on implementations to provide a scalable peer discovery service that can support the protocol. With Peer Exchange, the protocol can now bootstrap from a small set of nodes, without relying on an external peer discovery service.
Peer Exchange (PX) kicks in when pruning a mesh because of oversubscription. Instead of simply telling the pruned peer to go away, the pruning peer may provide a set of other peers where the pruned peer can connect to reform its mesh (see Peer Scoring below).
In addition, both the pruned and the pruning peer add a backoff period from each other, within which
they will not try to regraft. Both the pruning and the pruned peer will immediately prune a GRAFT
within the backoff period and extend it.
When a peer tries to regraft too early, the pruning peer may apply a behavioural penalty
for the action, and penalize the peer through P₇ (see Peer Scoring below).
When unsubscribing from a topic, the backoff period should be finished before subscribing to the topic again, otherwise a healthy mesh will be difficult to reach. A shorter backoff period can be used in case of an unsubscribe event, allowing faster resubscribing.
The recommended duration for the backoff period is 1 minute, while the recommended number of peers
to exchange is larger than D_hi so that the pruned peer can reliably form a full mesh.
In order to correctly synchronize the two peers, the pruning peer should include the backoff period
in the PRUNE message. The peer has to wait the full backoff period before attempting to graft again
—plus some slack to account for the offset until the next heartbeat that clears the backoff—
otherwise it risks getting its graft rejected and being penalized in its score if it attempts to
graft too early.
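The backoff bookkeeping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the reference implementation: the `backoffTracker` type, its methods, the string key, the `pruneBackoff` constant, and the `at` helper are hypothetical names introduced here. A real router would also clear expired entries on heartbeat and decide whether to apply the P₇ penalty.

```go
package main

import (
	"fmt"
	"time"
)

// pruneBackoff mirrors the recommended 1 minute backoff period.
const pruneBackoff = time.Minute

// backoffTracker is a hypothetical helper; the spec does not prescribe a data structure.
type backoffTracker struct {
	expiry map[string]time.Time // keyed by "topic/peer" for brevity
	period time.Duration
}

func newBackoffTracker(period time.Duration) *backoffTracker {
	return &backoffTracker{expiry: make(map[string]time.Time), period: period}
}

// at is a demo helper returning a fixed epoch plus the given number of seconds.
func at(sec int) time.Time {
	return time.Unix(0, 0).Add(time.Duration(sec) * time.Second)
}

// onPrune records (or extends) the backoff for a peer in a topic.
func (b *backoffTracker) onPrune(key string, now time.Time) {
	b.expiry[key] = now.Add(b.period)
}

// onGraft reports whether a GRAFT received at time now is acceptable.
// A GRAFT inside the backoff window is rejected and the backoff is extended;
// the caller would answer with a PRUNE and may bump the P₇ counter.
func (b *backoffTracker) onGraft(key string, now time.Time) bool {
	if until, ok := b.expiry[key]; ok && now.Before(until) {
		b.onPrune(key, now)
		return false
	}
	delete(b.expiry, key)
	return true
}

func main() {
	b := newBackoffTracker(pruneBackoff)
	b.onPrune("topic/peerA", at(0))
	fmt.Println(b.onGraft("topic/peerA", at(30)))  // false: still inside the backoff window
	fmt.Println(b.onGraft("topic/peerA", at(120))) // true: the extended backoff has expired
}
```

Extending the backoff on an early GRAFT means a peer that keeps retrying never gets back in, which is the behaviour the paragraph above describes.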
In order to implement PX, we extend the PRUNE control message to include an optional set of
peers the pruned peer can connect to. This set of peers includes the Peer ID and a signed peer
record for each peer exchanged.
In order to facilitate the transition to the usage of signed peer records within the libp2p ecosystem,
the emitting peer is allowed to omit the signed peer record if it doesn't have one.
In this case, the pruned peer will have to rely on the ambient peer discovery service (if set up) to discover the addresses for the peer.
The ControlPrune message is extended with a peers field as follows.
syntax = "proto2";
message ControlPrune {
  optional string topicID = 1;
  repeated PeerInfo peers = 2; // gossipsub v1.1 PX
  optional uint64 backoff = 3; // gossipsub v1.1 backoff time (in seconds)
}
message PeerInfo {
  optional bytes peerID = 1;
  optional bytes signedPeerRecord = 2;
}
In gossipsub v1.0, peers publish new messages to the members of their mesh if they are subscribed to the topic to which they're publishing. A peer can also publish to topics they are not subscribed to, in which case they will select peers from their fanout map.
In gossipsub v1.1 publishing is (optionally) done by publishing the message to all connected peers with a score above a publish threshold (see Peer Scoring below). This applies regardless of whether the publisher is subscribed to the topic. With flood publishing enabled, the mesh is used when propagating messages from other peers, but a peer's own messages will always be published to all known peers in the topic.
This behaviour is prescribed to counter eclipse attacks and ensure that a newly published message from an honest node will reach all connected honest nodes and get out to the network at large. When flood publishing is in use there is no point in utilizing a fanout map or emitting gossip when the peer is a pure publisher not subscribed in the topic.
This behaviour also reduces message propagation latency as the message is injected to more points in the network.
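As a rough sketch of the peer selection for flood publishing: the function name `floodPublishTargets` and the score map are assumptions made for illustration, and the sort is only there to make the output deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// floodPublishTargets returns the peers in the topic whose score has not
// dropped below publishThreshold. The scores map is a stand-in for the
// router's view of connected peers in the topic.
func floodPublishTargets(scores map[string]float64, publishThreshold float64) []string {
	var targets []string
	for peer, score := range scores {
		if score >= publishThreshold {
			targets = append(targets, peer)
		}
	}
	sort.Strings(targets) // deterministic order for the example only
	return targets
}

func main() {
	scores := map[string]float64{"peerA": 12, "peerB": -60, "peerC": -50}
	fmt.Println(floodPublishTargets(scores, -50)) // [peerA peerC]
}
```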
In gossipsub v1.0 gossip is emitted to a fixed number of peers, as specified by the D_lazy
parameter. In gossipsub v1.1 the dissemination of gossip is adaptive; instead of emitting gossip
to a fixed number of peers, we emit gossip to a percentage of our peers with a minimum of D_lazy peers.
The parameter controlling the emission of gossip is called the gossip factor. When a node wants to
emit gossip during the heartbeat, first it selects all peers with a peer score above a gossip
threshold (see Peer Scoring below). From these peers, it randomly selects a fraction equal to the
gossip factor, with a minimum of D_lazy peers, and emits gossip to the selected peers.
The recommended value for the gossip factor is 0.25, which with the default of 3 rounds of gossip
per message ensures that each peer has at least a 50% chance of receiving gossip about a message.
More specifically, for 3 rounds of gossip, the probability of a peer not receiving gossip about
a fresh message is (3/4)³=27/64=0.421875. So each peer receives gossip about a fresh message with
a 0.578125 probability.
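The arithmetic above can be checked with a short sketch. The function names are hypothetical, and truncating the product to an integer is an assumption; the spec only says "a percentage of our peers with a minimum of D_lazy".

```go
package main

import "fmt"

// gossipTargetCount returns how many of the eligible peers receive gossip:
// gossipFactor of them, at least dLazy, and never more than are available.
func gossipTargetCount(eligible int, gossipFactor float64, dLazy int) int {
	n := int(gossipFactor * float64(eligible)) // truncation is an assumption
	if n < dLazy {
		n = dLazy
	}
	if n > eligible {
		n = eligible
	}
	return n
}

// missProbability is the chance that a given peer is not selected in any of
// the gossip rounds, assuming independent selection with probability gossipFactor.
func missProbability(gossipFactor float64, rounds int) float64 {
	p := 1.0
	for i := 0; i < rounds; i++ {
		p *= 1 - gossipFactor
	}
	return p
}

func main() {
	fmt.Println(gossipTargetCount(100, 0.25, 6)) // 25
	fmt.Println(gossipTargetCount(10, 0.25, 6))  // 6: the D_lazy minimum applies
	fmt.Println(missProbability(0.25, 3))        // 0.421875
	fmt.Println(1 - missProbability(0.25, 3))    // 0.578125
}
```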
This behaviour is prescribed to counter sybil attacks and ensures that a message from an honest node propagates in the network with high probability.
In gossipsub v1.0 mesh peers are randomly selected, without any weight given to the direction of the connection. In contrast, gossipsub v1.1 implements outbound connection quotas, so that a peer tries to always maintain a number of outbound connections in the mesh.
Specifically, we define a new overlay parameter D_out, which must be set below D_lo and
at most D/2, such that:
- When the peer prunes because of oversubscription, it selects survivor peers under the constraint that at least D_out peers are outbound connections; see also Peer Scoring below.
- When the peer receives a GRAFT while oversubscribed (with mesh degree at D_hi or higher), it only accepts the new peer in the mesh if it is an outbound connection.
- During heartbeat maintenance, if the peer already has at least D_lo peers in the mesh but not enough outbound connections, then it selects as many peers as needed to fill the quota and grafts them in the mesh.
This behaviour is prescribed to counter sybil attacks and ensures that a coordinated inbound attack can never fully take over the mesh of a target peer.
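The outbound quota rules listed above reduce to a few small predicates. The following is a sketch with hypothetical function names; it leaves out the survivor selection during pruning, which also involves peer scores.

```go
package main

import "fmt"

// validDOut checks the constraint on the overlay parameter:
// D_out must be below D_lo and at most D/2.
func validDOut(d, dLo, dOut int) bool {
	return dOut < dLo && 2*dOut <= d
}

// acceptGraft reports whether an incoming GRAFT may be accepted as far as the
// outbound quota is concerned: while oversubscribed, only outbound peers are accepted.
func acceptGraft(meshSize, dHi int, outbound bool) bool {
	return meshSize < dHi || outbound
}

// outboundToGraft returns how many outbound peers the heartbeat should graft
// to fill the quota once the mesh already holds at least D_lo peers.
func outboundToGraft(meshSize, outboundInMesh, dLo, dOut int) int {
	if meshSize >= dLo && outboundInMesh < dOut {
		return dOut - outboundInMesh
	}
	return 0
}

func main() {
	fmt.Println(validDOut(6, 4, 2))          // true
	fmt.Println(acceptGraft(12, 12, false))  // false: oversubscribed and inbound
	fmt.Println(outboundToGraft(5, 0, 4, 2)) // 2
}
```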
In gossipsub v1.1 we introduce a peer scoring component: each individual peer maintains a score for other peers. The score is locally computed by each individual peer based on observed behaviour and is not shared. The score is a real value, computed as a weighted mix of parameters, with pluggable application-specific scoring. The score is computed across all (configured) topics with a weighted mix, such that faulty behaviour in one topic percolates to other topics. Furthermore, the score is retained for some period of time when a peer disconnects, so that malicious peers cannot easily reset their score when it drops to negative and well behaving peers don't lose their status because of a disconnection.
The intention is to detect malicious or faulty behaviour and penalize the misbehaving peers with a negative score.
The score is plugged into various gossipsub algorithms such that peers with negative scores are removed from the mesh. Peers with a heavily negative score are further penalized or even ignored if the score drops too low.
More specifically, the following thresholds apply:
- 0: the baseline threshold; peers with a score below this threshold are pruned from the mesh during the heartbeat and ignored when looking for peers to graft. Furthermore, no PX information is emitted towards those peers and PX is ignored from them. In addition, when performing PX only peers with non-negative scores are exchanged.
- GossipThreshold: when a peer's score drops below this threshold, no gossip is emitted towards that peer and gossip from that peer is ignored. This threshold should be negative, such that some information can be propagated to/from mildly negatively scoring peers.
- PublishThreshold: when a peer's score drops below this threshold, self-published messages are not propagated towards this peer when (flood) publishing. This threshold should be negative, and less than or equal to the gossip threshold.
- GraylistThreshold: when a peer's score drops below this threshold, the peer is graylisted and its RPCs are ignored. This threshold must be negative, and less than the gossip/publish threshold.
- AcceptPXThreshold: when a peer sends us PX information with a prune, we only accept it and connect to the supplied peers if the originating peer's score exceeds this threshold. This threshold should be non-negative and for increased security a large positive score attainable only by bootstrappers and other trusted well-connected peers.
- OpportunisticGraftThreshold: when the median peer score in the mesh drops below this value, the router may select more peers with a score above the median to opportunistically graft on the mesh (see Opportunistic Grafting below). This threshold should be positive, with a relatively small value compared to scores achievable through topic contributions.
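The ordering constraints between the thresholds can be checked mechanically. The sketch below assumes a hypothetical `Thresholds` struct and `Validate` method; the numeric values in `main` are illustrative only and are not recommendations.

```go
package main

import (
	"errors"
	"fmt"
)

// Thresholds groups the score thresholds described above (hypothetical type).
type Thresholds struct {
	Gossip, Publish, Graylist, AcceptPX, OpportunisticGraft float64
}

// Validate enforces the relative ordering stated in the text:
// Graylist < Publish <= Gossip < 0 <= AcceptPX, OpportunisticGraft.
func (t Thresholds) Validate() error {
	switch {
	case t.Gossip >= 0:
		return errors.New("GossipThreshold must be negative")
	case t.Publish > t.Gossip:
		return errors.New("PublishThreshold must be <= GossipThreshold")
	case t.Graylist >= t.Publish:
		return errors.New("GraylistThreshold must be < PublishThreshold")
	case t.AcceptPX < 0:
		return errors.New("AcceptPXThreshold must be >= 0")
	case t.OpportunisticGraft < 0:
		return errors.New("OpportunisticGraftThreshold must be >= 0")
	}
	return nil
}

func main() {
	// Illustrative values only.
	t := Thresholds{Gossip: -10, Publish: -50, Graylist: -80, AcceptPX: 100, OpportunisticGraft: 5}
	fmt.Println(t.Validate()) // <nil>
	t.Graylist = -20
	fmt.Println(t.Validate()) // GraylistThreshold must be < PublishThreshold
}
```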
The score is checked explicitly during heartbeat maintenance such that:
- Peers with a negative score are pruned from all meshes.
- When pruning because of oversubscription, the peer keeps the best D_score scoring peers and selects the remaining peers to keep at random. This protects the mesh from takeover attacks and ensures that the best scoring peers are kept in the mesh. At the same time, we do keep some peers as random so that the protocol is responsive to new peers joining the mesh. The selection is done under the constraint that D_out peers are outbound connections; if the scoring plus random selection does not result in enough outbound connections, then we replace the random and lower scoring peers in the selection with outbound connection peers.
- When selecting peers to graft because of undersubscription, peers with a negative score are ignored.
It may be possible that the router gets stuck with a mesh of poorly performing peers, either due to churn of good peers or because of a successful large scale cold boot or covert flash attack. When this happens, the router will normally react through mesh failure penalties (see The Score Function below), but this reaction time may be slow: the peers selected to replace the negative scoring peers are selected at random among the non-negative scoring peers, which may result in multiple rounds of selections amongst a sybil poisoned pool. Furthermore, the population of sybils may be so large that the sticky mesh failure penalties completely decay before any good peers are selected, thus making sybils re-eligible for grafting.
In order to recover from such disaster scenarios and generally adaptively optimize the mesh over time,
gossipsub v1.1 introduces an opportunistic grafting mechanism.
Periodically, the router checks the median score of peers in the mesh against the OpportunisticGraftThreshold.
If the median score is below the threshold, the router opportunistically grafts (at least) two peers
with score above the median in the mesh.
This improves an underperforming mesh by introducing good scoring peers that may have been gossiping
at us.
This also allows the router to get out of sticky disaster situations by replacing sybils attempting
an eclipse with peers which have actually forwarded messages through gossip recently.
The recommended period for opportunistic grafting is 1 minute, while the router should graft 2 peers (with the default parameters) so that it has the opportunity to become a conduit between them and establish a score in the mesh. Nonetheless, the number of peers that are opportunistically grafted is controlled by the application. It may be desirable to graft more peers if the application has configured a larger mesh than the default parameters.
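The periodic check can be sketched as below. Several details are assumptions made for the example: the median is taken as the element at the middle index of the sorted scores, candidates are picked highest score first to keep the output deterministic (the text does not say how to choose among peers above the median), and all type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

type candidate struct {
	ID    string
	Score float64
}

// medianScore returns the score at the middle index of the sorted mesh scores.
// The caller must pass a non-empty slice.
func medianScore(mesh []float64) float64 {
	s := append([]float64(nil), mesh...)
	sort.Float64s(s)
	return s[len(s)/2]
}

// opportunisticGraft returns up to n non-mesh peers whose score is above the
// mesh median, but only when that median is below the threshold.
func opportunisticGraft(mesh []float64, pool []candidate, threshold float64, n int) []string {
	if len(mesh) == 0 {
		return nil
	}
	median := medianScore(mesh)
	if median >= threshold {
		return nil
	}
	var better []candidate
	for _, c := range pool {
		if c.Score > median {
			better = append(better, c)
		}
	}
	sort.Slice(better, func(i, j int) bool { return better[i].Score > better[j].Score })
	if len(better) > n {
		better = better[:n]
	}
	ids := make([]string, 0, len(better))
	for _, c := range better {
		ids = append(ids, c.ID)
	}
	return ids
}

func main() {
	mesh := []float64{0, 0, 1, 0.5, 0}
	pool := []candidate{{"x", 3}, {"y", 0}, {"z", 2}, {"w", 5}}
	fmt.Println(opportunisticGraft(mesh, pool, 1, 2)) // [w x]
}
```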
The score function is a weighted mix of parameters, 4 of them per topic and 3 of them globally applicable.
Score(p) = TopicCap(Σtᵢ*(w₁(tᵢ)*P₁(tᵢ) + w₂(tᵢ)*P₂(tᵢ) + w₃(tᵢ)*P₃(tᵢ) + w₃b(tᵢ)*P₃b(tᵢ) + w₄(tᵢ)*P₄(tᵢ))) + w₅*P₅ + w₆*P₆ + w₇*P₇
where tᵢ is the topic weight for each topic where per topic parameters apply.
The parameters are defined as follows:
- P₁: Time in Mesh for a topic. This is the time a peer has been in the mesh, capped to a small value and mixed with a small positive weight. This is intended to boost peers already in the mesh so that they are not prematurely pruned because of oversubscription.
- P₂: First Message Deliveries for a topic. This is the number of messages first delivered by the peer in the topic, mixed with a positive weight. This is intended to reward peers who first forward a valid message.
- P₃: Mesh Message Delivery Rate for a topic. This parameter is a threshold for the expected message delivery rate within the mesh in the topic. If the number of deliveries is above the threshold, then the value is 0. If the number is below the threshold, then the value of the parameter is the square of the deficit. This is intended to penalize peers in the mesh who are not delivering the expected number of messages so that they can be removed from the mesh. The parameter is mixed with a negative weight.
- P₃b: Mesh Message Delivery Failures for a topic. This is a sticky parameter that counts the number of mesh message delivery failures. Whenever a peer is pruned with a negative score, the parameter is augmented by the rate deficit at the time of prune. This is intended to keep history of prunes so that a peer that was pruned because of underdelivery cannot quickly get re-grafted into the mesh. The parameter is mixed with negative weight.
- P₄: Invalid Messages for a topic. This is the number of invalid messages delivered in the topic. This is intended to penalize peers who transmit invalid messages, according to application-specific validation rules. It is mixed with a negative weight.
- P₅: Application-Specific score. This is the score component assigned to the peer by the application itself, using application-specific rules. The weight is positive, but the parameter itself has an arbitrary real value, so that the application can signal misbehaviour with a negative score or gate peers before an application-specific handshake is completed.
- P₆: IP Colocation Factor. This parameter is a threshold for the number of peers using the same IP address. If the number of peers in the same IP exceeds the threshold, then the value is the square of the surplus, otherwise it is 0. This is intended to make it difficult to carry out sybil attacks by using a small number of IPs. The parameter is mixed with a negative weight.
- P₇: Behavioural Penalty. This parameter captures penalties applied for misbehaviour. The parameter has an associated (decaying) counter, which is explicitly incremented by the router on specific events. The value of the parameter is the square of the counter and is mixed with a negative weight.
The TopicCap function allows the application to specify an optional cap to the contribution to the
score across all topics.
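Put together, the score formula can be evaluated as in the sketch below. The struct layout and names are hypothetical, and treating a cap of 0 as "no cap" is an assumption of this example; the parameter values P₁..P₇ are taken as already computed.

```go
package main

import "fmt"

// TopicParams holds the topic weight plus per-topic weights and parameter values.
type TopicParams struct {
	TopicWeight         float64
	W1, W2, W3, W3b, W4 float64
	P1, P2, P3, P3b, P4 float64
}

// GlobalParams holds the globally applicable weights and parameter values.
type GlobalParams struct {
	W5, W6, W7 float64
	P5, P6, P7 float64
}

// Score evaluates the weighted mix: the capped sum of per-topic contributions
// plus the three global terms. A topicCap of 0 disables the cap (assumption).
func Score(topics []TopicParams, g GlobalParams, topicCap float64) float64 {
	var topicSum float64
	for _, t := range topics {
		topicSum += t.TopicWeight * (t.W1*t.P1 + t.W2*t.P2 + t.W3*t.P3 + t.W3b*t.P3b + t.W4*t.P4)
	}
	if topicCap > 0 && topicSum > topicCap {
		topicSum = topicCap
	}
	return topicSum + g.W5*g.P5 + g.W6*g.P6 + g.W7*g.P7
}

func main() {
	topics := []TopicParams{{TopicWeight: 0.5, W1: 1, P1: 2, W2: 1, P2: 4, W3: -1, P3: 1, W3b: -1, W4: -10}}
	global := GlobalParams{W5: 1, P5: 3, W6: -1, W7: -1, P7: 4}
	fmt.Println(Score(topics, global, 0)) // 1.5: 0.5*(2+4-1) + 3 - 4
	fmt.Println(Score(topics, global, 2)) // 1: topic contribution capped at 2
}
```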
The topic parameters are implemented using counters maintained internally by the router whenever an event of interest occurs. The counters decay periodically so that their values are not continuously increasing and ensure that a large positive or negative score isn't sticky for the lifetime of the peer.
The decay interval is configurable by the application, with shorter intervals resulting in faster decay.
Each decaying parameter can have its own decay factor, which is a configurable parameter that controls how much the parameter will decay during each decay period.
The decay factor is a float in the range of (0.0, 1.0) that will be multiplied with the current
parameter value at each decay interval update. For example, suppose the value for P₂ (First
Message Deliveries) is 120, with a decay factor FirstMessageDeliveriesDecay = 0.97. At the decay
interval, the value will be updated to 120 * 0.97 == 116.4.
The decay factor and interval together determine the absolute rate of decay for each parameter. With
a decay interval of 1 second and a decay factor of 0.97, a parameter will decrease by 3% every
second, while 0.90 would cause it to lose 10%/sec, etc.
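To see how the decay factor and the zero threshold interact, the following sketch counts how many decay intervals it takes for a counter to be treated as zero. The helper names are hypothetical, and the threshold plays the role of the DecayToZero parameter described later in this document.

```go
package main

import "fmt"

// decay applies one decay step and snaps small values to zero.
func decay(value, factor, decayToZero float64) float64 {
	value *= factor
	if value < decayToZero {
		return 0
	}
	return value
}

// intervalsUntilZero counts decay steps until the value is considered zero.
// It returns -1 for inputs that would never terminate.
func intervalsUntilZero(value, factor, decayToZero float64) int {
	if factor <= 0 || factor >= 1 || decayToZero <= 0 {
		return -1
	}
	n := 0
	for value > 0 {
		value = decay(value, factor, decayToZero)
		n++
	}
	return n
}

func main() {
	fmt.Printf("%.1f\n", decay(120, 0.97, 0.01))   // 116.4
	fmt.Println(intervalsUntilZero(1, 0.5, 0.1))   // 4: 0.5, 0.25, 0.125, then below 0.1
	fmt.Println(intervalsUntilZero(120, 0.97, 0.01)) // intervals needed for the example above
}
```

Multiplying the returned count by the decay interval gives the wall-clock time for a counter to be forgotten, which is one way to reason about how long a penalty or reward stays in effect.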
In order to compute P₁, the router records the time when the peer is GRAFTed. The time in mesh
is calculated lazily during the decay update to avoid a large number of calls to gettimeofday.
The parameter value is the division of the time elapsed since the GRAFT with an application
configurable quantum.
For example, with a quantum of one second, a peer's P₁ value will be equal to the number of
seconds elapsed since they were GRAFTed onto the mesh. With a quantum of 5 minutes, the P₁ value
will be the number of 5 minute intervals elapsed since GRAFTing. The P₁ value will be capped to an
application configurable maximum.
In pseudo-go:
// topic configuration parameters
var TimeInMeshQuantum time.Duration
var TimeInMeshCap float64
// lazily updated time in mesh
var meshTime time.Duration
// P₁
p1 := float64(meshTime / TimeInMeshQuantum)
if p1 > TimeInMeshCap {
	p1 = TimeInMeshCap
}
In order to compute P₂, the router maintains a counter that increments whenever a message
is first delivered in the topic by the peer. The parameter has a cap that applies at the time
of increment.
In pseudo-go:
// topic configuration parameters
var FirstMessageDeliveriesCap float64
// counter updated every time a first message delivery occurs
var firstMessageDeliveries float64
// counter update
firstMessageDeliveries += 1
if firstMessageDeliveries > FirstMessageDeliveriesCap {
	firstMessageDeliveries = FirstMessageDeliveriesCap
}
// P₂
p2 := firstMessageDeliveries
In order to compute P₃, the router maintains a counter that increments whenever a first
or near-first message delivery occurs in the topic by a peer in the mesh. A near-first message
delivery is a message delivery that occurs while a message has been first received and is being
validated or it has been received within a configurable window of validation of first message
delivery. The window is configurable but should be small (in the order of milliseconds) to avoid
allowing a mesh peer to build score by simply replaying back the messages received by the current
router. The parameter has a cap that applies at the time of increment.
In order to avoid triggering the penalty too early, the parameter has an activation window. This is a configurable value that is the time that the peer must have been in the mesh before the parameter applies.
In pseudo-go:
// topic configuration parameters
var MeshMessageDeliveriesCap, MeshMessageDeliveriesThreshold float64
var MeshMessageDeliveriesWindow, MeshMessageDeliveriesActivation time.Duration
// time in mesh, lazily updated
var meshTime time.Duration
// counter updated every time a first or near-first message delivery occurs by a mesh peer
var meshMessageDeliveries float64
// counter update
meshMessageDeliveries += 1
if meshMessageDeliveries > MeshMessageDeliveriesCap {
	meshMessageDeliveries = MeshMessageDeliveriesCap
}
// calculation of P₃
var deficit float64
if meshTime > MeshMessageDeliveriesActivation && meshMessageDeliveries < MeshMessageDeliveriesThreshold {
	deficit = MeshMessageDeliveriesThreshold - meshMessageDeliveries
}
p3 := deficit * deficit
In order to calculate P₃b, the router maintains a counter that is updated whenever the peer is pruned with an active deficit in message delivery. The parameter is uncapped.
In pseudo-go:
// counter updated at prune time
var meshFailurePenalty float64
// counter update: only applies when the peer is pruned with an active delivery deficit
if meshTime > MeshMessageDeliveriesActivation && meshMessageDeliveries < MeshMessageDeliveriesThreshold {
	deficit := MeshMessageDeliveriesThreshold - meshMessageDeliveries
	meshFailurePenalty += deficit * deficit
}
// P₃b
p3b := meshFailurePenalty
In order to compute P₄, the router maintains a counter that increments whenever a message fails
validation. The value of the parameter is the square of the counter, which is uncapped.
In pseudo-go:
// counter updated every time a message fails validation
var invalidMessageDeliveries float64
// counter update
invalidMessageDeliveries += 1
// P₄
p4 := invalidMessageDeliveries * invalidMessageDeliveries
The counters associated with P₂, P₃, P₃b, and P₄ decay periodically by multiplying with a configurable
decay factor. When the value drops below a threshold it is considered zero.
In pseudo-go:
// decay factors
var FirstMessageDeliveriesDecay, MeshMessageDeliveriesDecay, MeshFailurePenaltyDecay, InvalidMessageDeliveriesDecay float64
// 0-threshold
var DecayToZero float64
// periodic decay of counters
firstMessageDeliveries *= FirstMessageDeliveriesDecay
if firstMessageDeliveries < DecayToZero {
	firstMessageDeliveries = 0
}
meshMessageDeliveries *= MeshMessageDeliveriesDecay
if meshMessageDeliveries < DecayToZero {
	meshMessageDeliveries = 0
}
meshFailurePenalty *= MeshFailurePenaltyDecay
if meshFailurePenalty < DecayToZero {
	meshFailurePenalty = 0
}
invalidMessageDeliveries *= InvalidMessageDeliveriesDecay
if invalidMessageDeliveries < DecayToZero {
	invalidMessageDeliveries = 0
}
TBD: We are currently developing multiple types of simulations that will inform us on how to best recommend tuning the Scoring function. We will update this section once that work is complete.
The pubsub subsystem incorporates application-specific message validators so that the application can signal invalid message delivery, and trigger the P₄ penalty. However, it is possible to have circumstances where a message should not be delivered to the application or forwarded to the network, but without triggering the P₄ penalty. A known use-case where this need exists is in the case of duplicate beacon messages or while an application is syncing its blockchain, in which case it would be unable to ascertain the validity of new messages.
In order to address this situation, all gossipsub v1.1 implementations must support extended validators with an enumerated decision interface. The outcome of extended validation can be at a minimum one of three things:
- Accept message; in this case the message is considered valid, and it should be delivered and forwarded to the network.
- Reject message; in this case the message is considered invalid, and it should be rejected and trigger the P₄ penalty.
- Ignore message; in this case the message is neither delivered nor forwarded to the network, but the router does not trigger the P₄ penalty.
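The three outcomes map naturally onto an enumerated type. The sketch below uses illustrative names (`ValidationResult`, `action`, `actionFor`) that are assumptions of this example, and it treats any unrecognized value like Ignore, which is a design choice rather than something the text prescribes.

```go
package main

import "fmt"

// ValidationResult is the enumerated decision returned by an extended validator.
type ValidationResult int

const (
	ValidationAccept ValidationResult = iota
	ValidationReject
	ValidationIgnore
)

// action describes what the router does with a message after validation.
type action struct {
	Deliver    bool // hand the message to the application
	Forward    bool // propagate the message to the network
	PenalizeP4 bool // count the message as an invalid delivery
}

// actionFor maps a validation outcome to router behaviour.
func actionFor(r ValidationResult) action {
	switch r {
	case ValidationAccept:
		return action{Deliver: true, Forward: true}
	case ValidationReject:
		return action{PenalizeP4: true}
	default: // ValidationIgnore and anything unrecognized: drop silently
		return action{}
	}
}

func main() {
	fmt.Printf("%+v\n", actionFor(ValidationAccept))
	fmt.Printf("%+v\n", actionFor(ValidationReject))
	fmt.Printf("%+v\n", actionFor(ValidationIgnore))
}
```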
The extensions that make up gossipsub v1.1 introduce several new application configurable parameters. This section summarizes all the new parameters along with a brief description.
The following parameters apply globally:
Parameter | Type | Description | Reasonable Default |
---|---|---|---|
PruneBackoff | Duration | Time after pruning a mesh peer before we consider grafting them again. | 1 minute |
UnsubscribeBackoff | Duration | Backoff to use when unsubscribing from a topic. The peer should not resubscribe to this topic before the backoff has expired. | 10 seconds |
FloodPublish | Boolean | Whether to enable flood publishing. | true |
GossipFactor | Float [0.0, 1.0] | Fraction of peers to send gossip to, if we have more than D_lazy available. | 0.25 |
D_score | Integer | Number of peers to retain by score when pruning because of oversubscription. | 4 or 5 for a D of 6 |
D_out | Integer | Number of outbound connections to keep in the mesh. Must be less than D_lo and at most D/2. | 2 for a D of 6 |
The remaining parameters apply to Peer Scoring. Because many parameters are interrelated and may be application-specific, reasonable defaults are not shown here. See Guidelines for Tuning the Scoring Function to understand how to tune the parameters to the needs of an application.
The following peer scoring parameters apply globally to all peers and topics:
Parameter | Type | Description | Constraints |
---|---|---|---|
GossipThreshold | Float | No gossip emitted to peers below threshold; incoming gossip is ignored. | Must be < 0 |
PublishThreshold | Float | No self-published messages are sent to peers below threshold. | Must be <= GossipThreshold |
GraylistThreshold | Float | All RPC messages are ignored from peers below threshold. | Must be < PublishThreshold |
AcceptPXThreshold | Float | PX information by peers below this threshold is ignored. | Must be >= 0 |
OpportunisticGraftThreshold | Float | If the median score in the mesh drops below this threshold, then the router may opportunistically graft better scoring peers. | Must be >= 0 |
DecayInterval | Duration | Interval at which parameter decay is calculated. | |
DecayToZero | Float | Limit below which we consider a decayed param to be "zero". | Should be close to 0.0 |
RetainScore | Duration | Time to remember peer scores after a peer disconnects. | |
The remaining peer score parameters affect how scores are computed for each peer based on their observed behavior.
Parameters with type Weight are floats that determine
how much a score parameter contributes to the overall score for a peer. See The Score
Function for details.
There are some parameters that apply to the peer "as a whole", regardless of which topics they are subscribed to:
Parameter | Type | Description | Constraints |
---|---|---|---|
AppSpecificWeight | Weight | Weight of P₅, the application-specific score. | Must be positive, however score values may be negative. |
IPColocationFactorWeight | Weight | Weight of P₆, the IP colocation score. | Must be negative, to penalize multiple peers sharing an IP. |
IPColocationFactorThreshold | Integer | Number of peers that may share an IP address before being penalized. | Must be at least 1. Values above threshold will be penalized. |
BehaviourPenaltyWeight | Weight | Weight of P₇, the behaviour penalty. | Must be negative to penalize peers for misbehaviour. |
BehaviourPenaltyDecay | Float | Decay factor for P₇. | Must be between 0 and 1. |
The remaining parameters are applied to a peer's behavior within a single topic. Implementations
should be able to accept configurations for multiple topics, keyed by topic ID string. Each topic
may be configured with the following params. If a topic is not configured, a peer's behavior in that
topic will not contribute to their score. If a peer is in multiple configured topics, each topic
will contribute to their total score according to the TopicWeight parameter.
Parameter | Type | Description | Constraints |
---|---|---|---|
TopicWeight | Weight | How much does behavior in this topic contribute to the overall score? | |
P₁ | Time in Mesh | | |
TimeInMeshWeight | Weight | Weight of P₁. | Should be a small positive value. |
TimeInMeshQuantum | Duration | Time a peer must be in mesh to accrue one "point" for P₁. | |
TimeInMeshCap | Float | Maximum value for P₁. | Should be a small positive value. |
P₂ | First Message Deliveries | | |
FirstMessageDeliveriesWeight | Weight | Weight of P₂. | Should be positive, to reward fast peers. |
FirstMessageDeliveriesDecay | Float | Decay factor for P₂. | |
FirstMessageDeliveriesCap | Float | Maximum value for P₂. | |
P₃ | Mesh Message Delivery Rate | | |
MeshMessageDeliveriesWeight | Weight | Weight of P₃. | Should be negative, to penalize peers below threshold. |
MeshMessageDeliveriesDecay | Float | Decay factor for P₃. | |
MeshMessageDeliveriesThreshold | Float | Value for P₃ below which we start penalizing peers. | Should be positive. Value depends on expected message rate for topic. |
MeshMessageDeliveriesCap | Float | Maximum value for P₃. | Must be >= MeshMessageDeliveriesThreshold. |
MeshMessageDeliveriesActivation | Duration | Time a peer must be in the mesh before we start applying P₃ score. | |
MeshMessageDeliveriesWindow | Duration | Time after first delivery that is considered "near-first". | Should be small, e.g. 1-5 ms. |
P₃b | Mesh Message Delivery Failures | | |
MeshFailurePenaltyWeight | Weight | Weight of P₃b. | Should be negative, to penalize failed deliveries. |
MeshFailurePenaltyDecay | Float | Decay factor for P₃b. | |
P₄ | Invalid Messages | | |
InvalidMessageDeliveriesWeight | Weight | Weight of P₄. | Should be negative, to penalize invalid messages. |
InvalidMessageDeliveriesDecay | Float | Decay factor for P₄. | |
In order to counter spam that elicits responses and consumes resources, some measures have been taken:
- GRAFT messages for unknown topics are ignored; in gossipsub v1.0 the router would always respond with a PRUNE, which opens up an avenue for flooding with spam GRAFT messages and consuming resources.
- IWANT message responses are limited in the number of retransmissions to a certain peer; in gossipsub v1.0 the router always responds to IWANT messages when the message is in the cache. In gossipsub v1.1 the router responds a limited number of times to each peer so that IWANT spam does not cause a significant drain of resources.
- IHAVE messages are capped to a certain number of IHAVE messages and aggregate number of message IDs advertised per heartbeat, in order to reduce the exposure to floods. If more IHAVE advertisements are received than the limit (or more messages are advertised than the limit), then additional IHAVE messages are ignored.
- In flight IWANT requests, sent as a response to an IHAVE advertisement, are probabilistically tracked. For each IHAVE advertisement which elicits an IWANT request, the router tracks a random message ID within the advertised set. If the message is not received (from any peer) within a period of time, then a behavioural penalty is applied to the advertising peer through P₇. This measure helps protect against spam IHAVE floods by quickly flagging and graylisting peers who advertise bogus message IDs and/or do not follow up to the IWANT requests.
- Invalid message spam, either directly transmitted or as a response to an IHAVE message, is penalized by the score function. A peer transmitting lots of spam will quickly get graylisted, reducing the surface of spam-induced computation (eg validation). The application can take further steps and blacklist the peer if the spam persists after the negative score decays.
An important issue to consider when deploying gossipsub is the peer discovery mechanism, which must provide a secure way of discovering new peers. Prior to gossipsub v1.1, operators were required to utilize an external peer discovery mechanism to locate peers participating in particular topics; with gossipsub v1.1 this is now entirely optional and the network can bootstrap purely through a small set of network entry points (bootstrappers) by utilizing Peer Exchange. In other words, gossipsub 1.1 is now self-sufficient in this regard, as long as the node manages to find at least one peer participating in the topic of interest.
In order to successfully bootstrap the network without a discovery service, network operators should:
- Create and operate a set of stable bootstrapper nodes, whose addresses are known ahead of time by the application.
- The bootstrappers should be configured without a mesh (ie set D=D_lo=D_hi=D_out=0) and with Peer Exchange enabled, utilizing Signed Peer Records.
- The application should assign a high application-specific score to the bootstrappers and set AcceptPXThreshold to a high enough value attainable only by the bootstrappers.
In this manner, the bootstrappers act purely as gossip and peer exchange nodes that facilitate the formation and maintenance of the network. Note that the score function is still present in the bootstrappers, which ensures that invalid messages, colocation, and behavioural penalties apply to misbehaving nodes such that they do not receive PX and are not advertised to the rest of the network. In addition, network operators may configure the application-specific scoring function such that the bootstrappers enforce further constraints on accepting new nodes (eg protocol handshakes, staked participation, and so on).
It should be emphasized that the security of the peer discovery service affects the ability of the system to bootstrap securely and recover from large-scale attacks. Network operators must take care to ensure that whichever peer discovery mechanism they opt to utilize is resilient to attacks and can always return some honest peers so that connections between honest peers can be established. Furthermore, it is strongly recommended that any external discovery service is augmented by bootstrappers/directory nodes configured with Peer Exchange and high application-specific scores, as outlined above.