Optional Content addressed messages #247

AgeManning · 2019-12-16T00:46:54Z

Currently message IDs are source_id + sequence_number. For my particular use case (Eth 2), it would be nice if we could optionally set this ID based on the content of the message, for example: hash(message).

This would allow us to filter messages at the pubsub layer based on content rather than source peer.

I was thinking of adding a configuration parameter to rust-libp2p that takes a function f: (source_id, seq_no, message) -> String, allowing the user to specify the message ID given source_id, seq_no, and message.

For my case, this would simply be hash(message).

This would be a general way to allow users to specify how messages are addressed.
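A minimal Go sketch of the idea above, for readers following along from the go implementation. The type name `MsgIDFunc`, the two helper functions, the `/` separator in the default ID, and the choice of SHA-256 are all assumptions made for illustration; the post only specifies the shape f: (source_id, seq_no, message) -> String and hash(message).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// MsgIDFunc is a hypothetical type mirroring f: (source_id, seq_no, message) -> String.
type MsgIDFunc func(sourceID string, seqNo uint64, message []byte) string

// DefaultMsgID models the current scheme: source_id + sequence_number.
// The "/" separator is illustrative only.
func DefaultMsgID(sourceID string, seqNo uint64, _ []byte) string {
	return fmt.Sprintf("%s/%d", sourceID, seqNo)
}

// ContentMsgID ignores the sender and derives the ID from the payload alone.
// SHA-256 is an assumption; the proposal only says hash(message).
func ContentMsgID(_ string, _ uint64, message []byte) string {
	sum := sha256.Sum256(message)
	return hex.EncodeToString(sum[:])
}

func main() {
	payload := []byte("attestation")

	var f MsgIDFunc = ContentMsgID
	// Same payload from two different peers maps to one ID.
	fmt.Println(f("peerA", 1, payload) == f("peerB", 7, payload)) // true

	f = DefaultMsgID
	// Under the current scheme the same payload gets two distinct IDs.
	fmt.Println(f("peerA", 1, payload) == f("peerB", 7, payload)) // false
}
```

With a content-derived ID, the pubsub layer can treat identical payloads from different peers as one message, which is the filtering behaviour requested here.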

I'm open to all suggestions on this, and to thoughts on its feasibility in the go implementation.


aschmahmann · 2019-12-16T09:19:01Z

TLDR: We should do this! Sorry this post is a bit long, I figured more context was better here. My proposal is at the bottom.

@AgeManning 💯 for having content-addressed (or configurable) messageIDs. However, it'd be great to understand why you think it's necessary for your particular case, as well as to make sure we cover a number of the edge cases.

Current State

The places messageIDs are currently used in this repo are:

- Pubsub Validators: We prevent duplicate calls to the validator by checking messageIDs
- Pubsub Propagation: Prevent republishing messages we've already published
- Gossipsub
  - Gossip Propagation: messageIDs are the keys in the list of messages to emit gossip about to the network
  - Gossip Response: messageIDs are used by Gossipsub to decide if it wants to retrieve some message that was gossiped to it

The two Pubsub usages do not require a custom messageID (although it might be nice), since it could be implemented as part of the Validator for the given topic. However, making Gossipsub efficiently deal with many parties publishing the same content would require changes.
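To make the deduplication role of messageIDs concrete, here is a small sketch of a "seen" set keyed by message ID. The `seenCache` type and the `peer/seqno` ID strings are hypothetical and are not taken from go-libp2p-pubsub; the real implementation also expires entries over time, which is omitted here.

```go
package main

import "fmt"

// seenCache is a hypothetical stand-in for the set of message IDs already handled.
type seenCache map[string]struct{}

// markSeen returns true the first time an ID is observed and false on repeats.
func (s seenCache) markSeen(id string) bool {
	if _, ok := s[id]; ok {
		return false
	}
	s[id] = struct{}{}
	return true
}

func main() {
	seen := seenCache{}
	// Two peers publish the same content; with source+seqno IDs they look distinct,
	// so only the literal repeat of "peerA/1" is dropped.
	for _, id := range []string{"peerA/1", "peerB/1", "peerA/1"} {
		if seen.markSeen(id) {
			fmt.Println("validate and forward:", id)
		} else {
			fmt.Println("drop duplicate:", id)
		}
	}
}
```

Because the cache only compares IDs, many parties publishing identical content are deduplicated only if the ID function maps that content to the same key.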

Issues to Consider

A few cases to consider if/when implementing this in go-libp2p-pubsub:

1. Pubsub Routers (e.g. Floodsub, Gossipsub, etc.) need to support multiple topics
2. go-libp2p-pubsub is not currently set up to support multiple instances (e.g. your application chooses to use a single pubsub instance with Gossipsub, not one with Gossipsub and another with Floodsub)
3. The pubsub protocol supports a single message belonging to multiple topics (although there's currently no way to access this behavior from go-libp2p-pubsub)
4. Some developers would like to rebroadcast messages, whether to compensate for lost pubsub messages or because the application logic relies on the duplicate messages

1+2 together imply that we cannot just add an option to Pubsub or Gossipsub, but might be able to have a per-topic option.

Combining this with 3 implies that any modifications we make to messageID sets in Pubsub or Gossipsub should be made per-topic.

4 is a reminder that there are existing pubsub users + topics that wouldn't be served if we simply started switching to content addressed messages (which seems generally like a pretty useful idea)

Proposal

A proposal that could certainly be improved with greater understanding of why Eth2 wants to use content addressed messages is:

1. Add topic option for a custom messageID
2. Use a separate messageCache per topic. Use the topic's messageID function as the key
3. a) Ignore the Pubsub uses of messageID and use validator based checks
   b) Use a separate messageSeen cache per topic

Note: we could be more efficient if we could use one cache per messageID function instead of per topic, but this is likely not worth the complexity.
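The per-topic variant of the proposal (a custom messageID option plus a separate seen cache per topic) could look roughly like the sketch below. Everything here is hypothetical: `Router`, `Join`, `Accept`, `topicState`, and the single-topic `Message` struct are simplifications invented for illustration, not the go-libp2p-pubsub API, and SHA-256 stands in for whatever hash an application would pick.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Message is a simplified, hypothetical message that belongs to exactly one topic.
type Message struct {
	From  string
	Seqno uint64
	Topic string
	Data  []byte
}

// MsgIDFunc computes the ID used as the cache key for a topic.
type MsgIDFunc func(m *Message) string

func defaultID(m *Message) string { return fmt.Sprintf("%s/%d", m.From, m.Seqno) }

func contentID(m *Message) string {
	sum := sha256.Sum256(m.Data)
	return hex.EncodeToString(sum[:])
}

// topicState pairs a topic's ID function with its own seen cache.
type topicState struct {
	idFn MsgIDFunc
	seen map[string]struct{}
}

// Router is a hypothetical container for per-topic state.
type Router struct {
	topics map[string]*topicState
}

func NewRouter() *Router { return &Router{topics: map[string]*topicState{}} }

// Join registers a topic; a nil idFn falls back to the source+seqno default.
func (r *Router) Join(topic string, idFn MsgIDFunc) {
	if idFn == nil {
		idFn = defaultID
	}
	r.topics[topic] = &topicState{idFn: idFn, seen: map[string]struct{}{}}
}

// Accept reports whether m is new for its topic, recording its ID if so.
func (r *Router) Accept(m *Message) bool {
	ts, ok := r.topics[m.Topic]
	if !ok {
		return false // not subscribed to this topic
	}
	id := ts.idFn(m)
	if _, dup := ts.seen[id]; dup {
		return false
	}
	ts.seen[id] = struct{}{}
	return true
}

func main() {
	r := NewRouter()
	r.Join("attestations", contentID) // content-addressed topic
	r.Join("chat", nil)               // default IDs, rebroadcast still works

	data := []byte("same payload")
	fmt.Println(r.Accept(&Message{"peerA", 1, "attestations", data})) // true
	fmt.Println(r.Accept(&Message{"peerB", 1, "attestations", data})) // false: same content
	fmt.Println(r.Accept(&Message{"peerA", 1, "chat", data}))         // true
	fmt.Println(r.Accept(&Message{"peerA", 2, "chat", data}))         // true: rebroadcast, new seqno
}
```

The sketch sidesteps the multi-topic message case from the issues list by giving each message a single topic; handling a message that belongs to several topics with different ID functions is exactly the complexity discussed in this thread.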

@AgeManning @protolambda @vyzo does this seem like a reasonable solution?

vyzo · 2019-12-16T09:55:10Z

I am very open to having this functionality, and we can implement it quite easily.

vyzo · 2019-12-16T09:57:29Z

@aschmahmann per topic custom message IDs will greatly complicate things, as we can have a message being published to multiple topics.
I think for eth2's usage it suffices to have a single option that applies to all topics and eschew this complexity.

The simple thing to do is to have an optional parameter specifying the ID function, as @AgeManning suggested.

aschmahmann · 2019-12-16T10:27:46Z

> The simple thing to do is to have an optional parameter specifying the ID function

If you do the simple thing and make the optional parameter global then we run into problems because go-libp2p-pubsub's Pubsub implementation is a singleton. There is no way for me to both publish using content addressed messageIDs and also rebroadcast a message to a topic that expects rebroadcasting to function.
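The conflict described here can be shown with a toy model. The `pubsub` struct and `deliver` method below are hypothetical and only model a single instance with one global ID function and one seen set; SHA-256 is an assumed hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// pubsub is a hypothetical singleton with one global ID function for all topics.
type pubsub struct {
	idFn func(from string, seqno uint64, data []byte) string
	seen map[string]bool
}

// deliver returns false when the computed ID has already been seen.
func (p *pubsub) deliver(from string, seqno uint64, data []byte) bool {
	id := p.idFn(from, seqno, data)
	if p.seen[id] {
		return false
	}
	p.seen[id] = true
	return true
}

// contentID hashes only the payload, ignoring sender and sequence number.
func contentID(_ string, _ uint64, data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

func main() {
	ps := &pubsub{idFn: contentID, seen: map[string]bool{}}
	heartbeat := []byte("still alive")

	fmt.Println(ps.deliver("peerA", 1, heartbeat)) // true
	// An intentional rebroadcast with a fresh seqno is now treated as a duplicate.
	fmt.Println(ps.deliver("peerA", 2, heartbeat)) // false
}
```

Once the global function ignores the sequence number, a topic that relies on rebroadcasting has no way to get its repeated messages through the same instance.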

@vyzo If you'd like to move ahead and use an optional parameter in PubSub that's fine, but the protocol IDs should not be meshsub or floodsub since they are no longer spec compliant.

If the optional parameter is at least in Gossipsub instead of PubSub then we won't break rebroadcasting and so the protocols can still be compliant.

vyzo · 2019-12-16T10:31:18Z

This is application-specific configuration that is good for now; when we move to per-topic routers we can make the msgID function a topic parameter.

vyzo · 2019-12-16T10:33:50Z

Also note that we can revise the spec to allow for arbitrary message ID functions; we'll have to revise for gossipsub v1.1 anyway.

aschmahmann · 2019-12-16T10:51:55Z

> when we move to per-topic routers

Is there an issue or ETA on that I can track?

> Also note that we can revise the spec to allow for arbitrary message ID functions; we'll have to revise for gossipsub v1.1 anyway.

Sure we can allow for arbitrary message ID functions in the spec, but until the per-topic routers issue is resolved these nodes will act like malfunctioning nodes when subscribed to topics where not everyone is using the same custom message ID function. If users wanted to run a single libp2p daemon on their machine to serve many applications they're now totally sunk.

protolambda · 2019-12-16T11:43:45Z

See PR #248. I think the function type MsgIdFunction func(pmsg *pb.Message) string allows for the customizability we need, and based on pmsg.TopicIDs you can always dispatch to different msg ID functions. Say MyAttestationIdFn for the attestations topic to get content-based IDs, and then DefaultMsgIdFn for everything else. Composition, yay :)
There are some complexities if you want different message IDs on different topics for the same message, but none of the other code accounts for that either. And at the point where you change functionality per topic, I think it is fair to just send two separate message objects with the same contents but different topics.
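A rough sketch of the composition described above. To stay self-contained it uses a local `Message` struct as a stand-in for `pb.Message`, carrying only the fields the example needs; the bodies of `DefaultMsgIdFn` and `MyAttestationIdFn`, the `ComposedMsgIdFn` name, the topic string `"attestations"`, and the use of SHA-256 are assumptions for illustration and are not taken from PR #248.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Message is a local stand-in for pb.Message with only the fields used here.
type Message struct {
	From     []byte
	Seqno    []byte
	Data     []byte
	TopicIDs []string
}

// MsgIdFunction follows the shape quoted above, but over the stand-in type.
type MsgIdFunction func(pmsg *Message) string

// DefaultMsgIdFn models the source + sequence number scheme.
func DefaultMsgIdFn(pmsg *Message) string {
	return string(pmsg.From) + string(pmsg.Seqno)
}

// MyAttestationIdFn derives the ID from the payload only (SHA-256 is an assumption).
func MyAttestationIdFn(pmsg *Message) string {
	sum := sha256.Sum256(pmsg.Data)
	return hex.EncodeToString(sum[:])
}

// ComposedMsgIdFn dispatches on the message's topics: content IDs for the
// hypothetical "attestations" topic, the default scheme for everything else.
func ComposedMsgIdFn(pmsg *Message) string {
	for _, t := range pmsg.TopicIDs {
		if t == "attestations" {
			return MyAttestationIdFn(pmsg)
		}
	}
	return DefaultMsgIdFn(pmsg)
}

func main() {
	var idFn MsgIdFunction = ComposedMsgIdFn

	a := &Message{From: []byte("peerA"), Seqno: []byte{1}, Data: []byte("vote"), TopicIDs: []string{"attestations"}}
	b := &Message{From: []byte("peerB"), Seqno: []byte{9}, Data: []byte("vote"), TopicIDs: []string{"attestations"}}
	c := &Message{From: []byte("peerA"), Seqno: []byte{2}, Data: []byte("vote"), TopicIDs: []string{"chat"}}

	fmt.Println(idFn(a) == idFn(b)) // true: same content on the attestations topic
	fmt.Println(idFn(a) == idFn(c)) // false: chat uses the default scheme
}
```

Note that a message listing both topics would get the content-based ID for all of them, which is the multi-topic complexity the comment acknowledges.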

protolambda added a commit to protolambda/go-libp2p-pubsub that referenced this issue Dec 16, 2019

fixes libp2p#247: implement msg id function as pubsub option

2f0a6fa

protolambda mentioned this issue Dec 16, 2019

Configurable message id function #248

Merged

protolambda added a commit to protolambda/go-libp2p-pubsub that referenced this issue Dec 16, 2019

fixes libp2p#247: implement msg id function as pubsub option

7981f9b

vyzo closed this as completed in #248 Dec 16, 2019
