Per-topic MsgIdFunction #464

Wondertan · 2021-12-28T16:31:02Z

Pubsub only allows setting global MsgIdFunction, which applies to all the topics. Still, It would be useful to have the ability to set custom per topic msg id generation, which, e.g., can be dependant on the msg format of some topic. For this to be implemented using the global MsgIdFunction, it would need to be aware of all the msg types of each topic and do some ugly switching to determine what MsgIdFunction to use for what msg.

Fortunately, an unused TopicOpt can set custom values while joining the topic, so some per topic customizations are conceived by design, and the described feature can be implemented.

The text was updated successfully, but these errors were encountered:

vyzo · 2021-12-28T17:05:28Z

Is this actually necessary?
The msgID function has access to the message and can look at the topic.
So the desired functionality is already there.

vyzo · 2021-12-28T17:09:01Z

Note that there is no inherent need for ugly switching, one can use a map with a readonable default.

Wondertan · 2021-12-28T17:31:53Z

Imagine there are two completely different protocols relying on PubSub. One of them needs custom msg id generation and another one is not. The application which relies on these two protocols and creates a PubSub instance would need to be aware of the protocol internals and import their msg types to do the switch. So the ugliness comes not from the switch itself but from the ugly architecture requiring the application to know the internals. Sorry for the confusion.

Wondertan · 2021-12-28T17:41:12Z

The point is to allow msg id generation to be application/protocol/layer specific.

In my case, I need to exclude one non-deterministic field of a structure from id generation with blake2b, but I don't want to have one global MsgIdFunction importing all msg types from all the protocols to switch over them.

vyzo · 2021-12-28T17:42:05Z

Well ok, but it complicates things.

Wondertan · 2021-12-28T17:42:30Z

How?

vyzo · 2021-12-28T17:43:46Z

The right way to do it internally is to have one msgid function, which dsipatches on a map of funcs set potentially per topic and a default that can be overwritten as is now with the option.

vyzo · 2021-12-28T17:45:54Z

this way the internal changes will be minimal, and we dont break existing applications.

Wondertan · 2021-12-28T17:48:22Z

The solution I proposed keeps the logic and API the same and still allows customization. If WithCustomMessageIDFunc is not given for a topic, then the topic uses the global default or opted one, so nothing should break.

In other words,WithCustomMessageIDFunc optionally overrides global MsgIdFunction only for a specific topic.

vyzo · 2021-12-28T18:18:22Z

That's not very elegant i think, it makes users reinvent the wheel and messy internals.
We can do better.

Lets create a struct type that has a map and a default, with a RWMutex.
It also has a method for computing the msgid, which caches and we pass it where we otherwise use the msgid func.
Lets also add the topic option, which mutates the map with exclusive lock.

This way nothing changes in the external interface and on the same time we get a maximally flexible solution.
The extant msgid option would set the default.

vyzo · 2021-12-28T18:23:24Z

Btw, for future reference, it is preferrable to discuss design in an issue and reach consensus before opening the pr, so that you dont do unnecessary work.

Wondertan · 2021-12-28T19:23:20Z

I am in 👍🏻

We can do better.

Thank you for the detailed description. Qs:

The map like map[string]map[*pb.Message]string? // topic -> msg -> id
- So we don't need to add ID field on the Message(wrapper). Is this bad and/or introduces messy internals? IMHO, adding ID to the Message(wrapper) is clean, but I might be missing something.
Do we want to lock the processLoop? Although in practice map mutates would happen only on application start as rarely topics are joined in the runtime, so this is not a concern.
WDYT about msgIDGen as the name for the struct?
Should I still split the implementation into two PRs? For once only compute and for the option.
This requires each topic to have a ptr to the struct so the option can access it. Are you ok with that?

Lets also add the topic option, which mutates the map with exclusive lock.

Agree and I wanted to write a proposed solution but it turned out to be easier to implement it instead of describing it in English for this case :)

Btw, for future reference, it is preferrable to discuss design in an issue and reach consensus before opening the pr, so that you dont do unnecessary work.

vyzo · 2021-12-28T19:35:55Z

The map will be a map[string]func(*pb.Message)string. Basically a map of topic names to functions that compute their msgid. The entry point in the object will be a func (m *Message) string method, which does the following:

Checks the cache on m; if it is already computed, it returns it directly.
RLocks the map, and retrieves the compute func
Invokes the compute func with the *pb.Message, caches and returns the result

There is absolutely no need to lock the process loop; the lock is rw and scoped in the object that manages the msgid func map.
Not bad, but we can use more vowels. Maybe MsgIDGenerator?
No, just do one pr. We don't ahve to change the code that touches the msgid at all actually, we just pass the method in our MsgIdGen object to SetMsgID and we are set.
No, it doesn't; it has a pointer to the pubsub object and this is sufficient. We can invoke methods in the msgIDGen instance in pubsub directly.

Agree and I wanted to write a proposed solution but it turned out to be easier to implement it instead of describing it in English for this case :)

Yeah, sometimes it is easier and that's totally fine, I just don't want you to do work that might be thrown away at design stage. Then again you learn the internals better by doing that, so it's not a total loss.

vyzo · 2021-12-28T19:37:34Z

Note that the cache is new field we add on the Message wrapper struct.
It doesn't have to be thread safe, as the first thing we do with the message before any concurrency is involved is compute its message id.

vyzo · 2021-12-28T19:38:22Z

Also note that we should relinquish the rlock before invoking the compute function, so that we don't block other stuff while it is computing.

Wondertan · 2021-12-28T23:48:33Z

We still need to change MessageCache a bit as func (m *Message) string != func (m *pb.Message) string. This includes both Add(to rcv a wrapper) and SetMsgIdFn(to rcv a generator with wrapper as param) methods.

We don't ahve to change the code that touches the msgid at all actually, we just pass the method in our MsgIdGen object to SetMsgID and we are set.

This is also why I originally thought of the map with *pb.Message, as we don't have access to the wrapper in most places.

vyzo · 2021-12-29T06:55:55Z

Ah yes indeed, but thats a very small change.

BigLep · 2022-01-07T16:35:18Z

2022-01-07: there is a linked PR (#465 ) that we're trying to get across the line.

Wondertan mentioned this issue Dec 28, 2021

feat: custom per-topic MsgIdFunction #465

Closed

vyzo mentioned this issue Jan 21, 2022

feat: msgIdGenerator #468

Merged

vyzo closed this as completed in #468 Jan 23, 2022

cce mentioned this issue Jul 13, 2023

fix(pubsub): remove "Implementation note" about setting message_id_fn libp2p/specs#557

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Per-topic MsgIdFunction #464

Per-topic MsgIdFunction #464

Wondertan commented Dec 28, 2021

vyzo commented Dec 28, 2021

vyzo commented Dec 28, 2021

Wondertan commented Dec 28, 2021

Wondertan commented Dec 28, 2021 •

edited

vyzo commented Dec 28, 2021

Wondertan commented Dec 28, 2021

vyzo commented Dec 28, 2021

vyzo commented Dec 28, 2021

Wondertan commented Dec 28, 2021 •

edited

vyzo commented Dec 28, 2021 •

edited

vyzo commented Dec 28, 2021

Wondertan commented Dec 28, 2021

vyzo commented Dec 28, 2021

vyzo commented Dec 28, 2021

vyzo commented Dec 28, 2021

Wondertan commented Dec 28, 2021 •

edited

vyzo commented Dec 29, 2021

BigLep commented Jan 7, 2022

Per-topic MsgIdFunction #464

Per-topic MsgIdFunction #464

Comments

Wondertan commented Dec 28, 2021

vyzo commented Dec 28, 2021

vyzo commented Dec 28, 2021

Wondertan commented Dec 28, 2021

Wondertan commented Dec 28, 2021 • edited

vyzo commented Dec 28, 2021

Wondertan commented Dec 28, 2021

vyzo commented Dec 28, 2021

vyzo commented Dec 28, 2021

Wondertan commented Dec 28, 2021 • edited

vyzo commented Dec 28, 2021 • edited

vyzo commented Dec 28, 2021

Wondertan commented Dec 28, 2021

vyzo commented Dec 28, 2021

vyzo commented Dec 28, 2021

vyzo commented Dec 28, 2021

Wondertan commented Dec 28, 2021 • edited

vyzo commented Dec 29, 2021

BigLep commented Jan 7, 2022

Wondertan commented Dec 28, 2021 •

edited

Wondertan commented Dec 28, 2021 •

edited

vyzo commented Dec 28, 2021 •

edited

Wondertan commented Dec 28, 2021 •

edited