Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Per-topic MsgIdFunction #464

Closed
Wondertan opened this issue Dec 28, 2021 · 18 comments · Fixed by #468
Closed

Per-topic MsgIdFunction #464

Wondertan opened this issue Dec 28, 2021 · 18 comments · Fixed by #468

Comments

@Wondertan
Copy link
Contributor

Pubsub only allows setting global MsgIdFunction, which applies to all the topics. Still, It would be useful to have the ability to set custom per topic msg id generation, which, e.g., can be dependant on the msg format of some topic. For this to be implemented using the global MsgIdFunction, it would need to be aware of all the msg types of each topic and do some ugly switching to determine what MsgIdFunction to use for what msg.

Fortunately, an unused TopicOpt can set custom values while joining the topic, so some per topic customizations are conceived by design, and the described feature can be implemented.

@vyzo
Copy link
Collaborator

vyzo commented Dec 28, 2021

Is this actually necessary?
The msgID function has access to the message and can look at the topic.
So the desired functionality is already there.

@vyzo
Copy link
Collaborator

vyzo commented Dec 28, 2021

Note that there is no inherent need for ugly switching, one can use a map with a readonable default.

@Wondertan
Copy link
Contributor Author

Imagine there are two completely different protocols relying on PubSub. One of them needs custom msg id generation and another one is not. The application which relies on these two protocols and creates a PubSub instance would need to be aware of the protocol internals and import their msg types to do the switch. So the ugliness comes not from the switch itself but from the ugly architecture requiring the application to know the internals. Sorry for the confusion.

@Wondertan
Copy link
Contributor Author

Wondertan commented Dec 28, 2021

The point is to allow msg id generation to be application/protocol/layer specific.

In my case, I need to exclude one non-deterministic field of a structure from id generation with blake2b, but I don't want to have one global MsgIdFunction importing all msg types from all the protocols to switch over them.

@vyzo
Copy link
Collaborator

vyzo commented Dec 28, 2021

Well ok, but it complicates things.

@Wondertan
Copy link
Contributor Author

How?

@vyzo
Copy link
Collaborator

vyzo commented Dec 28, 2021

The right way to do it internally is to have one msgid function, which dsipatches on a map of funcs set potentially per topic and a default that can be overwritten as is now with the option.

@vyzo
Copy link
Collaborator

vyzo commented Dec 28, 2021

this way the internal changes will be minimal, and we dont break existing applications.

@Wondertan
Copy link
Contributor Author

Wondertan commented Dec 28, 2021

The solution I proposed keeps the logic and API the same and still allows customization. If WithCustomMessageIDFunc is not given for a topic, then the topic uses the global default or opted one, so nothing should break.

In other words,WithCustomMessageIDFunc optionally overrides global MsgIdFunction only for a specific topic.

@vyzo
Copy link
Collaborator

vyzo commented Dec 28, 2021

That's not very elegant i think, it makes users reinvent the wheel and messy internals.
We can do better.

Lets create a struct type that has a map and a default, with a RWMutex.
It also has a method for computing the msgid, which caches and we pass it where we otherwise use the msgid func.
Lets also add the topic option, which mutates the map with exclusive lock.

This way nothing changes in the external interface and on the same time we get a maximally flexible solution.
The extant msgid option would set the default.

@vyzo
Copy link
Collaborator

vyzo commented Dec 28, 2021

Btw, for future reference, it is preferrable to discuss design in an issue and reach consensus before opening the pr, so that you dont do unnecessary work.

@Wondertan
Copy link
Contributor Author

I am in 👍🏻

We can do better.

Thank you for the detailed description. Qs:

  • The map like map[string]map[*pb.Message]string? // topic -> msg -> id
    • So we don't need to add ID field on the Message(wrapper). Is this bad and/or introduces messy internals? IMHO, adding ID to the Message(wrapper) is clean, but I might be missing something.
  • Do we want to lock the processLoop? Although in practice map mutates would happen only on application start as rarely topics are joined in the runtime, so this is not a concern.
  • WDYT about msgIDGen as the name for the struct?
  • Should I still split the implementation into two PRs? For once only compute and for the option.
  • This requires each topic to have a ptr to the struct so the option can access it. Are you ok with that?

    Lets also add the topic option, which mutates the map with exclusive lock.

Agree and I wanted to write a proposed solution but it turned out to be easier to implement it instead of describing it in English for this case :)

Btw, for future reference, it is preferrable to discuss design in an issue and reach consensus before opening the pr, so that you dont do unnecessary work.

@vyzo
Copy link
Collaborator

vyzo commented Dec 28, 2021

  1. The map will be a map[string]func(*pb.Message)string. Basically a map of topic names to functions that compute their msgid. The entry point in the object will be a func (m *Message) string method, which does the following:
  • Checks the cache on m; if it is already computed, it returns it directly.
  • RLocks the map, and retrieves the compute func
  • Invokes the compute func with the *pb.Message, caches and returns the result
  1. There is absolutely no need to lock the process loop; the lock is rw and scoped in the object that manages the msgid func map.
  2. Not bad, but we can use more vowels. Maybe MsgIDGenerator?
  3. No, just do one pr. We don't ahve to change the code that touches the msgid at all actually, we just pass the method in our MsgIdGen object to SetMsgID and we are set.
  4. No, it doesn't; it has a pointer to the pubsub object and this is sufficient. We can invoke methods in the msgIDGen instance in pubsub directly.

Agree and I wanted to write a proposed solution but it turned out to be easier to implement it instead of describing it in English for this case :)

Yeah, sometimes it is easier and that's totally fine, I just don't want you to do work that might be thrown away at design stage. Then again you learn the internals better by doing that, so it's not a total loss.

@vyzo
Copy link
Collaborator

vyzo commented Dec 28, 2021

Note that the cache is new field we add on the Message wrapper struct.
It doesn't have to be thread safe, as the first thing we do with the message before any concurrency is involved is compute its message id.

@vyzo
Copy link
Collaborator

vyzo commented Dec 28, 2021

Also note that we should relinquish the rlock before invoking the compute function, so that we don't block other stuff while it is computing.

@Wondertan
Copy link
Contributor Author

Wondertan commented Dec 28, 2021

We still need to change MessageCache a bit as func (m *Message) string != func (m *pb.Message) string. This includes both Add(to rcv a wrapper) and SetMsgIdFn(to rcv a generator with wrapper as param) methods.

We don't ahve to change the code that touches the msgid at all actually, we just pass the method in our MsgIdGen object to SetMsgID and we are set.

This is also why I originally thought of the map with *pb.Message, as we don't have access to the wrapper in most places.

@vyzo
Copy link
Collaborator

vyzo commented Dec 29, 2021

Ah yes indeed, but thats a very small change.

@BigLep
Copy link

BigLep commented Jan 7, 2022

2022-01-07: there is a linked PR (#465 ) that we're trying to get across the line.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants