-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Message Identifier Discussion #3
Comments
Id be curious on any and all scenarios that have requirements around this But the other question is around the 'internal' vs 'external' ids. like an id for the message itself and an id for the pub/sub model. Each app could have its own id system that is determined by them, but the pub/sub model could use an stricter id system for handling of ordering events or being able to replay/find certain events |
If we're talking about the minimal version so far, a separate id seems redundant. I think the timestamp is enough for most applications. |
We can make this pretty complex or pretty simple. First the complicated way. Take the following key attributes we're looking for and score them across a set of different ID systems. Key Attributes:
Evaluation Matrix (Empty):
We can go through each of these, score them, and see where we see the most highs. The simple way: Reduce it to two questions:
For example: the_quick_brown_fox can be represented as: 28690326576869079995668384064297231324925944108933159956634009002664434410779435694378716319761052827297390018201760082566703737107491700197379652288500795186180221022085578872 ...which is a nat. Not a great nat, but a nat. Ours would be a bit more thoughtful. Additionally, an Nat can be converted into a blob and represented as a lexically orderable string. Interesting hacker News article: https://news.ycombinator.com/item?id=18768909 Crockford Base32 looks like a good way to create a standard way of converting these to human-readable and shorten them up a bit: https://www.crockford.com/base32.html.
ie
For a distributed system, the prev id can be a complex encoded nat that includes the partition. The subscriber client can detect if it has missed a message from a particular partition by deciding the prev_id and making sure they have seen it, or else they wait or request. How this is interpreted may depend on if the event is a single-source or multi-source event. Single Source - Through strict partitioning, a subscriber can universally know if it is missing messages by the single source using a nonce based on the number of partitions to order messages. All of this to say: Unless there is some magic ID in the list above, or that we haven't thought of I'd propose some language along the lines of below: Event Ids are represented as Natural numbers with infinite precision. These numbers MAY be blob representations of more complicated numbering schemes converted to natural numbers. If the ID is encoded, it should be encoded as Crockford Base32 as specified in https://www.crockford.com/base32.html Events MAY specify a previous id in prev_id. to indicate the immediately preceding message id known by the broadcasting system. Event systems SHOULD provide null in systems where event ordering is not important or where ordering is dependent on details internal to the ID. Event systems MAY interpret the previous id as a matter of implementation such that: Single Publisher, Single Broadcaster systems should have a consistent chain of messages and no messages should be dropped. Single Publisher, Multi Broadcaster systems should have a consistent chain of messaging according to nonce partitioning and no messages should be dropped. Multi Publisher, Multi-Brodcaster systems should have consistent chains across publisher-based partitions such that either eacht partition stays consistent, or that all messages may be ordered provided there is an event-specific epoch close-out schema. I need to think about this last one a bit more....but I think this is basically emulating how the IC works in knowing that everyone has agreed on the ordering of messages in a block. Keep in mind that while this may seem complicated in specification language, we are creating something that very simple implementations can check themselves against. A simple private pub-sub that doesn't care about ordering and just increases a nonce can check itself against this and see that it doesn't violate the schema. |
Another thing that popped up...and along with some other comments is what to do with the difference between the EventID(assigned by the publisher) and the NotificationID(Assigned by the Broadcaster). Does the subscriber need to know this? Is it always just EventID + SubscriberID? Is is helpful in a debugging situation for the Subscriber to know it? If a downstream system is just looking at a reporting dashboard they may just have the event attributes and may be oblivious to canister ids...in which case, wouldn't they want to know their notification id? Perhaps this should be a separate Issue? My gut is that it would be helpful and simplify error reporting, message confirmation, etc, and we can make it optional as it won't be very useful for Single publisher, single broadcaster, single subscriber implementations. It kind of sucks that you can't just not include the value (at least in motoko) and have it default to null. But I guess if were building libraries, the simple libraries can just not include it in the client types and devs won't really know it is there. This would split the types between an EventType and a DeliveredEventType:
|
Java 'int' has a very small size of 4 bytes. Java 'long' has 8 bytes and can't contain very long numbers (max is 9,223,372,036,854,775,807). The pair <publisher's canisterId:timestamp> or (publisher's canisterId, timestamp) as eventId seems sufficient to uniquely identify a message even in a distributed system. It is simple, clear, and unambiguous. So my vote for Lamport timestamps with publisher ID. A pair (eventId, eventFilter) could be used to handle specific subscription events, since subscribers use the same filters as publishers. eventFilter could be a pair (text, blob) with a topic name and a filter within the topic or just text with candyPath. The optional prev_id seems logical and allows chaining to be handled. |
As I see it, the Broadcaster role is a middleware to send event messages to unknown (and untrusted) recipients (canisters). If so, the broadcaster's own id doesn't seem to be required, as the middleware only accepts messages with the event id from the publisher and forwards them to all subscribers via a one-way message. It will then send replies back to the publisher via a one-shot message with eventId and subscriberId. |
Vote passed |
The message identifier should be flexible enough so that for simple systems it can just be an increasing nonce, but for more complex systems we want deterministic ordering and the ability to know you missed a message.
How do we accomplish this?
Vector Clocks?
Lamport Timestamps?
Including previous complex nonce in each message?
{timestamp : Nat64} | {PublisherID: Nat64} | {(increasing nonce * broadcaster count)} //for partitioning
The text was updated successfully, but these errors were encountered: