Optional Content addressed messages #247

Closed
AgeManning opened this issue Dec 16, 2019 · 8 comments · Fixed by #248

Comments

@AgeManning

Currently, message IDs are source_id + sequence_number. For my particular use case (Eth 2), it would be nice if we could optionally set this ID based on the content of the message, for example: hash(message).

This would allow us to filter messages at the pubsub layer based on content rather than source peer.

I was thinking of adding a configuration parameter to rust-libp2p that takes a function f: (source_id, seq_no, message) -> String, letting the user specify the message ID given source_id, seq_no, and message.

For my case, this would simply be hash(message)

This would be a general way to allow users to specify how messages are addressed.
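
To make the idea concrete, here is a minimal Go sketch of such a pluggable ID function. The names MessageIDFn, DefaultID and ContentID are made up for illustration and are not an existing rust-libp2p or go-libp2p API; SHA-256 is an arbitrary choice of hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// MessageIDFn mirrors the proposed f: (source_id, seq_no, message) -> String.
// The name and exact signature are illustrative assumptions.
type MessageIDFn func(sourceID string, seqNo uint64, message []byte) string

// DefaultID reproduces the current behaviour: source_id + sequence_number.
func DefaultID(sourceID string, seqNo uint64, _ []byte) string {
	return fmt.Sprintf("%s%d", sourceID, seqNo)
}

// ContentID ignores the sender and derives the ID from the payload alone.
func ContentID(_ string, _ uint64, message []byte) string {
	sum := sha256.Sum256(message)
	return hex.EncodeToString(sum[:])
}

func main() {
	payload := []byte("attestation")

	// Same payload from two different peers.
	var id MessageIDFn = DefaultID
	fmt.Println(id("peerA", 1, payload) == id("peerB", 7, payload)) // false

	id = ContentID
	fmt.Println(id("peerA", 1, payload) == id("peerB", 7, payload)) // true
}
```

With the content-based function, identical payloads collapse to one ID regardless of who published them, which is what makes filtering on content possible at the pubsub layer.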

I'm open to all suggestions on this and thoughts of feasibility in the go implementation.

protolambda added a commit to protolambda/go-libp2p-pubsub that referenced this issue Dec 16, 2019
@aschmahmann
Contributor

aschmahmann commented Dec 16, 2019

TLDR: We should do this! Sorry this post is a bit long, I figured more context was better here. My proposal is at the bottom.

@AgeManning 💯 for having content-addressed (or configurable) messageIDs. However, it'd be great to understand why you think it's necessary for your particular case, as well as to make sure we cover a number of the edge cases.

Current State

The places messageIDs are currently used in this repo are:

  • Pubsub Validators: We prevent duplicate calls to the validator by checking messageIDs
  • Pubsub Propagation: Prevent republishing messages we've already published
  • Gossipsub
    • Gossip Propagation: messageIDs are the keys in the list of messages to emit gossip about to the network
    • Gossip Response: messageIDs are used by Gossipsub to decide if it wants to retrieve some message that was gossiped to it

The two Pubsub usages do not require a custom messageID (although it might be nice), since it could be implemented as part of the Validator for the given topic. However, making Gossipsub efficiently deal with many parties publishing the same content would require changes.
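
As a rough illustration of the deduplication role described above, the following Go sketch runs the same messages through a seen-set under both ID schemes. The msg type and function names are simplified assumptions, not the actual go-libp2p-pubsub internals.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// msg is a simplified, hypothetical message; it is not the real protobuf type.
type msg struct {
	from  string
	seqno uint64
	data  []byte
}

// idFn computes a message ID.
type idFn func(m msg) string

// senderSeqID models the current scheme: source ID plus sequence number.
func senderSeqID(m msg) string { return fmt.Sprintf("%s/%d", m.from, m.seqno) }

// contentID models a content-addressed scheme (SHA-256 chosen arbitrarily).
func contentID(m msg) string {
	sum := sha256.Sum256(m.data)
	return hex.EncodeToString(sum[:])
}

// countDelivered feeds msgs through a seen-set keyed by id(m) and returns how
// many would reach the validator and be forwarded.
func countDelivered(id idFn, msgs []msg) int {
	seen := make(map[string]struct{})
	delivered := 0
	for _, m := range msgs {
		key := id(m)
		if _, dup := seen[key]; dup {
			continue // duplicate ID: skip validation and propagation
		}
		seen[key] = struct{}{}
		delivered++
	}
	return delivered
}

func main() {
	msgs := []msg{
		{from: "peerA", seqno: 1, data: []byte("block 42")},
		{from: "peerB", seqno: 1, data: []byte("block 42")}, // same content, other publisher
		{from: "peerA", seqno: 2, data: []byte("block 42")}, // deliberate rebroadcast
	}
	fmt.Println(countDelivered(senderSeqID, msgs)) // 3
	fmt.Println(countDelivered(contentID, msgs))   // 1
}
```

The same run also shows the trade-off: content-based IDs drop the deliberate rebroadcast along with the duplicate from the second publisher.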

Issues to Consider

A few cases to consider if/when implementing this in go-libp2p-pubsub:

  1. Pubsub Routers (e.g. Floodsub, Gossipsub, etc.) need to support multiple topics
  2. go-libp2p-pubsub is not currently set up to support multiple instances (e.g. your application chooses to use a single pubsub instance with Gossipsub, not one with Gossipsub and another with Floodsub)
  3. The pubsub protocol supports a single message belonging to multiple topics (although there's currently no way to access this behavior from go-libp2p-pubsub)
  4. Some developers would like to rebroadcast messages, whether to compensate for lost pubsub messages or because the application logic relies on the duplicate messages

1+2 together imply that we cannot just add an option to Pubsub or Gossipsub, but might be able to have a per-topic option.

Combining this with 3 implies that any modifications we make to messageID sets in Pubsub or Gossipsub should be made per-topic.

4 is a reminder that there are existing pubsub users + topics that wouldn't be served if we simply switched everything to content addressed messages (which generally seems like a pretty useful idea).

Proposal

A proposal that could certainly be improved with greater understanding of why Eth2 wants to use content addressed messages is:

  1. Add topic option for a custom messageID
  2. Use a separate messageCache per topic. Use the topic's messageID function as the key
  3. a) Ignore the Pubsub uses of messageID and use validator based checks
    b) Use a separate messageSeen cache per topic

Note: we could be more efficient if we could use one cache per messageID function instead of per topic, but this is likely not worth the complexity.
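
A minimal Go sketch of what steps 1, 2 and 3b might look like, assuming each message belongs to exactly one topic. Router, Join, Accept and the topic names are hypothetical and only illustrate the shape of a per-topic ID function with a per-topic seen cache; they are not the go-libp2p-pubsub API.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Message is a simplified, hypothetical message with a single topic.
type Message struct {
	From  string
	Seqno uint64
	Topic string
	Data  []byte
}

// MsgIDFn is the per-topic option from step 1.
type MsgIDFn func(m Message) string

// DefaultID models the existing sender + sequence number scheme.
func DefaultID(m Message) string { return fmt.Sprintf("%s/%d", m.From, m.Seqno) }

// ContentID models a content-addressed scheme (SHA-256 chosen arbitrarily).
func ContentID(m Message) string {
	sum := sha256.Sum256(m.Data)
	return hex.EncodeToString(sum[:])
}

// topicState pairs a topic's ID function with its own seen cache.
type topicState struct {
	id   MsgIDFn
	seen map[string]struct{}
}

// Router keeps one topicState per joined topic.
type Router struct {
	topics map[string]*topicState
}

func NewRouter() *Router { return &Router{topics: make(map[string]*topicState)} }

// Join registers a topic; a nil id falls back to DefaultID.
func (r *Router) Join(topic string, id MsgIDFn) {
	if id == nil {
		id = DefaultID
	}
	r.topics[topic] = &topicState{id: id, seen: make(map[string]struct{})}
}

// Accept reports whether m is new for its topic, recording it if so.
// Messages for topics that were never joined are rejected.
func (r *Router) Accept(m Message) bool {
	ts, ok := r.topics[m.Topic]
	if !ok {
		return false
	}
	key := ts.id(m)
	if _, dup := ts.seen[key]; dup {
		return false
	}
	ts.seen[key] = struct{}{}
	return true
}

func main() {
	r := NewRouter()
	r.Join("attestations", ContentID) // content-addressed topic
	r.Join("chat", nil)               // keeps rebroadcast semantics

	a := Message{From: "peerA", Seqno: 1, Topic: "attestations", Data: []byte("vote")}
	b := Message{From: "peerB", Seqno: 9, Topic: "attestations", Data: []byte("vote")}
	fmt.Println(r.Accept(a), r.Accept(b)) // true false

	c1 := Message{From: "peerA", Seqno: 1, Topic: "chat", Data: []byte("ping")}
	c2 := Message{From: "peerA", Seqno: 2, Topic: "chat", Data: []byte("ping")}
	fmt.Println(r.Accept(c1), r.Accept(c2)) // true true
}
```

Keeping the caches separate means a content-addressed topic can deduplicate aggressively while a topic that relies on rebroadcasts is unaffected.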

@AgeManning @protolambda @vyzo does this seem like a reasonable solution?

@vyzo
Collaborator

vyzo commented Dec 16, 2019

I am very open to having this functionality, and we can implement it quite easily.

@vyzo
Collaborator

vyzo commented Dec 16, 2019

@aschmahmann per topic custom message IDs will greatly complicate things, as we can have a message being published to multiple topics.
I think for eth2's usage it suffices to have a single option that applies to all topics and eschew this complexity.

The simple thing to do is to have an optional parameter specifying the ID function, as @AgeManning suggested.

@aschmahmann
Contributor

aschmahmann commented Dec 16, 2019

> The simple thing to do is to have an optional parameter specifying the ID function

If you do the simple thing and make the optional parameter global then we run into problems because go-libp2p-pubsub's Pubsub implementation is a singleton. There is no way for me to both publish using content addressed messageIDs and also rebroadcast a message to a topic that expects rebroadcasting to function.


@vyzo If you'd like to move ahead and use an optional parameter in PubSub that's fine, but the protocol IDs should not be meshsub or floodsub since they are no longer spec compliant.

If the optional parameter is at least in Gossipsub instead of PubSub then we won't break rebroadcasting and so the protocols can still be compliant.

@vyzo
Collaborator

vyzo commented Dec 16, 2019

This is application-specific configuration that is good for now; when we move to per-topic routers we can make the msgID function a topic parameter.

@vyzo
Collaborator

vyzo commented Dec 16, 2019

Also note that we can revise the spec to allow for arbitrary message ID functions; we'll have to revise for gossipsub v1.1 anyway.

@aschmahmann
Contributor

> when we move to per-topic routers

Is there an issue or ETA on that I can track?

> Also note that we can revise the spec to allow for arbitrary message ID functions; we'll have to revise for gossipsub v1.1 anyway.

Sure, we can allow for arbitrary message ID functions in the spec, but until the per-topic routers issue is resolved these nodes will act like malfunctioning nodes when subscribed to topics where not everyone is using the same custom message ID function. If users wanted to run a single libp2p daemon on their machine to serve many applications, they're now totally sunk.

@protolambda
Contributor

See PR #248. I think the function type MsgIdFunction func(pmsg *pb.Message) string allows for the customizability we need, and based on pmsg.TopicIDs you can always dispatch to different msg ID functions: say, MyAttestationIdFn for the attestations topic to get content-based IDs, and DefaultMsgIdFn for everything else. Composition, yay :)
There are some complexities if you want different message IDs on different topics for the same message, but none of the other code accounts for that either. And at the point where you change functionality per topic, I think it is fair to just send two separate message objects with the same contents but different topics.
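
A rough sketch of that composition, to show the dispatch. The local Message struct is a stand-in for pb.Message so the example runs on its own, and the bodies of DefaultMsgIdFn and MyAttestationIdFn, the "attestations" topic name, and ComposedMsgIdFn are assumptions for illustration rather than code from the PR.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Message is a local stand-in for pb.Message with only the fields used here.
type Message struct {
	From     []byte
	Seqno    []byte
	Data     []byte
	TopicIDs []string
}

// MsgIdFunction has the same shape as the type quoted above, but takes the
// local stand-in instead of *pb.Message.
type MsgIdFunction func(pmsg *Message) string

// DefaultMsgIdFn is assumed to be sender + sequence number.
func DefaultMsgIdFn(pmsg *Message) string {
	return string(pmsg.From) + string(pmsg.Seqno)
}

// MyAttestationIdFn is a hypothetical content-based ID (SHA-256 of the data).
func MyAttestationIdFn(pmsg *Message) string {
	sum := sha256.Sum256(pmsg.Data)
	return hex.EncodeToString(sum[:])
}

// attestationTopic is a made-up topic name for this sketch.
const attestationTopic = "attestations"

// ComposedMsgIdFn dispatches on the message's topics: content-based IDs for
// the attestation topic, the default scheme for everything else.
func ComposedMsgIdFn(pmsg *Message) string {
	for _, t := range pmsg.TopicIDs {
		if t == attestationTopic {
			return MyAttestationIdFn(pmsg)
		}
	}
	return DefaultMsgIdFn(pmsg)
}

func main() {
	var idFn MsgIdFunction = ComposedMsgIdFn

	att1 := &Message{From: []byte("peerA"), Seqno: []byte{1}, Data: []byte("vote"), TopicIDs: []string{attestationTopic}}
	att2 := &Message{From: []byte("peerB"), Seqno: []byte{2}, Data: []byte("vote"), TopicIDs: []string{attestationTopic}}
	fmt.Println(idFn(att1) == idFn(att2)) // true

	blk1 := &Message{From: []byte("peerA"), Seqno: []byte{1}, Data: []byte("blk"), TopicIDs: []string{"blocks"}}
	blk2 := &Message{From: []byte("peerB"), Seqno: []byte{2}, Data: []byte("blk"), TopicIDs: []string{"blocks"}}
	fmt.Println(idFn(blk1) == idFn(blk2)) // false
}
```

Note that a message carrying both the attestation topic and another topic gets the content-based ID everywhere in this sketch, which is the complexity mentioned above.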

@vyzo closed this as completed in #248 on Dec 16, 2019