diff --git a/pubsub/README.md b/pubsub/README.md
index 687851ec5..3404e32d7 100644
--- a/pubsub/README.md
+++ b/pubsub/README.md
@@ -32,6 +32,7 @@ and spec status.
  - [The RPC](#the-rpc)
  - [The Message](#the-message)
  - [Message Signing](#message-signing)
+ - [Message Identification](#message-identification)
  - [The Topic Descriptor](#the-topic-descriptor)
  - [AuthOpts](#authopts)
  - [AuthMode 'NONE'](#authmode-none)
@@ -112,6 +113,9 @@ message Message {
 }
 ```
+The `optional` fields may be omitted, depending on the
+[signature policy](#message-signing) and [message ID function](#message-identification)
+
 The `from` field denotes the author of the message, note that this is not necessarily the peer who sent the RPC this message is contained in. This is done to allow content to be routed through a swarm of pubsubbing peers.
@@ -123,14 +127,7 @@ The `seqno` field is a 64-bit big-endian uint that is a linearly increasing
 number that is unique among messages originating from each given peer. No two messages on a pubsub topic from the same peer should have the same `seqno` value, however messages from different peers may have the same sequence number,
-so this number alone cannot be used to address messages. Notably the
-'timecache' in use by the go implementation contains a `message_id`,
-which is constructed from the concatenation of the `seqno` and the `from`
-fields. This `message_id` is then unique among messages. It was also proposed
-in [#116](https://github.com/libp2p/specs/issues/116) to use a `message_hash`,
-however, it was noted: "a potential caveat with using hashes instead of seqnos:
-the peer won't be able to send identical messages (e.g. keepalives) within the
-timecache interval, as they will get rejected as duplicates."
+so this number alone cannot be used to address messages by origin-stamping.
 The `topicIDs` field specifies a set of topics that this message is being published to.
@@ -149,17 +146,52 @@ economics (see e.g.
 and [here](https://ethresear.ch/t/improving-the-ux-of-rent-with-a-sleeping-waking-mechanism/1480)).
+## Message Identification
+
+To uniquely identify a message in a set of topics (for de-duplication, tracking, scoring and other purposes), a `message_id` is calculated based on the message.
+How the calculation happens can be configured on the application layer by supplying a function `message_id_fn`, such that `message_id_fn(*Message) => message_id`.
+
+> [[ Implementation note ]]: At the time of writing this section, go-libp2p-pubsub (reference implementation of this spec) only allows configuring a single top-level `message_id_fn`. This function may, however, vary its behaviour based on the topic (contained inside its `*Message`) argument. Thus, it's feasible to implement a per-topic policy using branch selection control flow logic. go-libp2p-pubsub plans to push down the configuration of the `message_id_fn` to the topic level. Other implementations are encouraged to do the same.
+
+The message ID calculation approach generally fits in two flavors:
+- **origin-stamped** messaging: the combination of the `seqno` and `from` fields
+  uniquely identifies a message based on the *author*.
+- **content-addressed** messaging: a message ID derived from the `data` field
+  uniquely identifies a message based on the *data*.
+
+**The default `message_id_fn` is origin-stamped,** and defined as the string concatenation of `from` and `seqno`.
+
+If fabricated collisions are not a concern, or are difficult enough to produce within the window in which the message is relevant,
+a `message_id` based on a short digest of inputs may benefit performance. Whichever the choice, it is crucial that **all peers** participating in a topic implement the same message ID calculation logic, or the topic may function suboptimally.
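The two flavors above can be sketched as follows. This is a minimal illustration in Python, not the go-libp2p-pubsub API. The `Message` class, the `from_peer` field name (standing in for `from`, which is a Python keyword), the function names, and the choice of SHA-256 for the content-addressed variant are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    """Hypothetical stand-in for the pubsub `Message` protobuf."""
    data: bytes
    from_peer: Optional[bytes] = None  # the `from` field; may be omitted
    seqno: Optional[bytes] = None      # 64-bit big-endian uint; may be omitted


def origin_stamped_id(msg: Message) -> bytes:
    """Origin-stamped flavor: concatenation of `from` and `seqno`."""
    if msg.from_peer is None or msg.seqno is None:
        raise ValueError("origin-stamped IDs need both `from` and `seqno`")
    return msg.from_peer + msg.seqno


def content_addressed_id(msg: Message) -> bytes:
    """Content-addressed flavor: a digest of `data` (SHA-256 is an assumption)."""
    return hashlib.sha256(msg.data).digest()
```

With these definitions, two messages from the same author that carry identical `data` but different `seqno` values get distinct origin-stamped IDs and the same content-addressed ID.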
+
+Note that different specialized pubsub components, such as the 'timecache' used in the Go implementation, scoring functions or circuit-breakers
+may use the `message_id` to key and track messages.
+
+It was also proposed in [#116](https://github.com/libp2p/specs/issues/116)
+to use a `message_hash`, however, it was noted:
+> a potential caveat with using hashes instead of seqnos:
+the peer won't be able to send identical messages (e.g. keepalives) within the
+timecache interval, as they will get treated as duplicates.
+
+Some applications may not need keepalives, or choose to implement something more specific than a message hash. In those cases where duplicate payloads are not desirable, a `content-based` message ID function may be more appropriate.
+
 ## Message Signing
 Messages can be optionally signed, and it is up to the peer whether to accept and forward unsigned messages.
+With the default choice of origin-stamped messaging, the receiver should enforce signatures strictly (`StrictSign`).
+When the receiver expects unsigned content-stamped messages, and thus does not expect
+the `from`, `seqno`, `signature`, or `key` fields, it may reject messages that carry them (`StrictNoSign`).
+
+This optionality is configurable with the signature policy options starting from gossipsub v1.1.
 For signing purposes, the `signature` and `key` fields are used:
 - The `signature` field contains the signature.
- The `key` field contains the signing key when it cannot be inlined in the source peer ID.
+ - The `key` field contains the signing key when it cannot be inlined in the source peer ID (`from`). When present, it must match the peer ID.
-The signature is computed over the marshalled message protobuf _excluding_ the key field.
+The signature is computed over the marshalled message protobuf _excluding_ the `signature` field itself.
+This includes any fields that are not recognized, but still included in the marshalled data.
 The protobuf blob is prefixed by the string `libp2p-pubsub:` before signing.
 When signature validation fails for a signed message, the implementation must
diff --git a/pubsub/gossipsub/gossipsub-v1.1.md b/pubsub/gossipsub/gossipsub-v1.1.md
index 6b754606e..466074169 100644
--- a/pubsub/gossipsub/gossipsub-v1.1.md
+++ b/pubsub/gossipsub/gossipsub-v1.1.md
@@ -37,6 +37,8 @@ See the [lifecycle document][lifecycle-spec] for context about maturity level an
  - [Explicit Peering Agreements](#explicit-peering-agreements)
  - [PRUNE Backoff and Peer Exchange](#prune-backoff-and-peer-exchange)
  - [Protobuf](#protobuf)
+ - [Signature Policy](#signature-policy)
+ - [Signature Policy Options](#signature-policy-options)
  - [Flood Publishing](#flood-publishing)
  - [Adaptive Gossip Dissemination](#adaptive-gossip-dissemination)
  - [Outbound Mesh Quotas](#outbound-mesh-quotas)
@@ -134,6 +136,50 @@ message PeerInfo {
 }
 ```
+### Signature Policy
+
+The usage of the `signature`, `key`, `from`, and `seqno` fields in `Message` is now configurable per topic, in the manners specified in this section.
+> [[ Implementation note ]]: At the time of writing this section, go-libp2p-pubsub (reference implementation of this spec) allows for configuring the signature policy at a global pubsub instance level. This needs to be pushed down to topic-level configuration. Other implementations are encouraged to support topic-level configuration, as the spec mandates.
+
+In the default origin-stamped messaging, the fields need to be strictly enforced:
+the `seqno` and `from` fields form the `message_id`, and should be verified to avoid `message_id` collisions.
+
+In content-stamped messaging, the fields may negatively affect privacy:
+revealing the relationship between `data` and `from`/`seqno`.
+
+#### Signature Policy Options
+
+In gossipsub v1.1, these fields are either strictly present and verified, or omitted altogether:
+- `StrictSign`:
+  - On the producing side:
+    - Build messages with the `signature`, `key` (`from` may be enough for certain inlineable public key types), `from` and `seqno` fields.
+  - On the consuming side:
+    - Enforce the fields to be present, reject otherwise.
+    - Propagate only if the fields are valid and signature can be verified, reject otherwise.
+- `StrictNoSign`:
+  - On the producing side:
+    - Build messages without the `signature`, `key`, `from` and `seqno` fields.
+    - The corresponding protobuf key-value pairs are absent from the marshalled message, not just empty.
+  - On the consuming side:
+    - Enforce the fields to be absent, reject otherwise.
+    - Propagate only if the fields are absent, reject otherwise.
+    - A `message_id` function will not be able to use the above fields, and should instead rely on the `data` field. A commonplace strategy is to calculate a hash.
+
+In gossipsub v1.0, a legacy "lax" signing policy could be configured, to only verify signatures when present. For security reasons, this strategy is discarded in subsequent versions, but MAY still be supported for backwards-compatibility. If so, its use should be discouraged through prominent deprecation warnings. These strategies will be entirely dropped in the future.
+- `LaxSign`: *this was never an original gossipsub 1.0 option, but it's defined here for completeness, and considered insecure*. Always sign, and verify incoming signatures, but accept unsigned messages.
+  - On the producing side:
+    - Build messages with the `signature`, `key` (`from` may be enough), `from` and `seqno` fields.
+  - On the consuming side:
+    - `signature` may be absent, and not verified.
+    - Verify `signature`, iff the `signature` is present, then reject if `signature` is invalid.
+- `LaxNoSign`: *Previous default for no-verification*. Do not sign nor origin-stamp, but verify incoming signatures, and accept unsigned messages.
+  - On the producing side:
+    - Build messages without the `signature`, `key`, `from` and `seqno` fields.
+  - On the consuming side:
+    - Accept and propagate messages with the above fields.
+    - Verify `signature`, iff the `signature` is present, then reject if `signature` is invalid.
+
+
 ### Flood Publishing
 In gossipsub v1.0, peers publish new messages to the members of their mesh if they are subscribed to