MSC4295: Bot bounce limit - a better loop prevention mechanism #4295
m13253 wants to merge 30 commits into
Implementation requirements:
- Sending bot
- Receiving bot (the one that would loop)
3. For a room purposed for technical support, the operator can run an AI-powered bot to automatically answer common questions. Such AI bot is allowed to trigger other bots for certain helpful tasks.
4. The room operator can run a "UTD notification bot" that notifies room members that their messages can't be decrypted by others. However, it is very important to prevent it from replying another bot's message.
5. When bridging rooms across three or more platforms (e.g., Matrix ⇌ Telegram ⇌ IRC ⇌ Matrix), it is necessary to make sure each bridge doesn't pick up another bridge's messages.
6. Bridges supporting double-puppeting needs to ignore messages sent by a reverse puppet. Although they already employ proprietary methods (e.g., vendor-prefixed tags like `fi.mau.double_puppet_source` or a list of ignored user IDs), it could be very useful to provide a standardized loop-preventing mechanism, allowing bridges from different vendors to work in harmony at the same room.
This proposal doesn't appear to solve the problem that fi.mau.double_puppet_source was made for. Specifically, when a bridge sends a message from a double puppet (not a bridge ghost user), it must have some flag to prevent echoing back the message to the remote network where it came from. The flag is not meant to stop any other bridge or bot from reacting to the message, it's only meant to be detected by the origin bridge.
> This proposal doesn't appear to solve the problem that fi.mau.double_puppet_source was made for. Specifically, when a bridge sends a message from a double puppet (not a bridge ghost user), it must have some flag to prevent echoing back the message to the remote network where it came from. The flag is not meant to stop any other bridge or bot from reacting to the message, it's only meant to be detected by the origin bridge.
It’s an honor to get feedback from Mautrix’s side!
After reading your explanation, I admit this proposal doesn’t solve fi.mau.double_puppet_source’s problem. In fact, this proposal seems to solve a completely different problem, orthogonal to the one fi.mau.double_puppet_source addresses.
- Although both mechanisms are designed to prevent infinite loops, fi.mau.double_puppet_source prevents looping between two platforms, while m.bounce_limit prevents looping within Matrix.
- fi.mau.double_puppet_source doesn’t prevent other bots or bridges from interacting with the message, while m.bounce_limit sets a bounce limit to do so.
- On the other hand, fi.mau.double_puppet_source doesn't deal with the situation where a room has two independent bridge instances (e.g., one relaybot maintained by the room operator, and one double puppet maintained by a room member on a separate homeserver), while m.bounce_limit tries to solve this problem.
(Please correct me if my understanding is still inaccurate.)
I will probably need to rephrase or remove this sentence. I think m.bounce_limit won’t replace fi.mau.double_puppet_source. They will coexist, because the two mechanisms solve two different problems.
Update: I replaced this example in the Background section with another example.
However, there are a few disadvantages of `m.notice`:
1. It is analogous to `m.text`, which doesn't support attached files or encrypted images.
Extensible events already have a solution for this #3955
> Extensible events already have a solution for this #3955
Looks interesting. The difference is that MSC3955 uses a boolean to mark automated messages, while MSC4295 uses an integer.
The advantage of an integer TTL is that it allows multiple bots to work together, which is the biggest motivation of this proposal.
Do you think combining both ideas (Extensible Events + integer TTL) would be a good way to go?
(Informally I’ll call it TTL, as networking people may be more familiar with this term. Formally it should be called “Bounce Limit.”)
Update: I included MSC3955 in the Existing solutions section, parallel to the m.notice subsection.
1. It is analogous to `m.text`, which doesn't support attached files or encrypted images.
2. It is designed for automated messages, not bridged messages sent originally by a human.
3. Similarly, `m.notice` won't be picked up by bridges to forward to a bridged platform.
Bridges can and do pick up m.notice if configured to do so
> Bridges can and do pick up m.notice if configured to do so
Thanks for your confirmation!
Indeed, I just checked the Matrix API Spec: The spec doesn’t say whether bridges should or shouldn’t pick up m.notice.
I was careless. I will rephrase this sentence.
Signed-off-by: Star Brilliant <coder@poorlab.com>
These are invalid forms, and their normalization rules upon receiving:
1. The number 0, which should be treated as missing. (This design is to simplify the development of bots in certain programming languages, such as Go.)
This doesn't make semantic sense. Logically a value of 0 would mean "do not forward".
> This doesn't make semantic sense. Logically a value of 0 would mean "do not forward".
Thanks for your comment.
Here are two explanations of this decision:
- This is an analogy to Hop Limit in IP networks.
  An IP packet with a Hop Limit of 1 can be delivered to the recipient, but if the recipient is a router, it isn’t allowed to forward the packet anywhere else.
  An IP packet with a Hop Limit of 0, if I remember correctly, is invalid.
  Therefore, a developer with IP networking experience may find the current design more familiar than one where 0 is a valid value.
- If we make 0 an invalid value, some programming languages need fewer steps to distinguish 0 from a missing value.
  One example of such a programming language is Go.
  To distinguish a 0 value from a missing value, the Go struct needs to be written as:

      type OriginalRoomMessageEventContent struct {
          Body        string `json:"body"`
          BounceLimit *int64 `json:"m.bounce_limit"` // *int64 instead of int64
          ...
      }

  which uses one layer of pointer indirection to distinguish 0 from missing, meaning slower performance (although negligible), more memory fragmentation, and more work on the developer side to get the logic right.
  C++ may be similar, depending on which JSON library you use.
  In other programming languages that support nullable data types, such as Rust’s Option, C#’s Nullable, or TypeScript’s T | undefined, at least one more check is required to detect the missing value and to extract the valid numbers.
Therefore, making 0 an invalid value is just simpler, faster to develop and run, and more similar to other existing network protocols.
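The Go point above can be checked with the standard `encoding/json` package. The following is a minimal sketch, not code from the proposal: the type names `PlainContent` and `PointerContent` and the helpers `decodePlain` and `decodePointer` are hypothetical, chosen only to contrast the two field types.

```go
package main

import "encoding/json"

// PlainContent (hypothetical name) uses a plain int64, so an absent
// "m.bounce_limit" and an explicit 0 both decode to the zero value 0.
type PlainContent struct {
	Body        string `json:"body"`
	BounceLimit int64  `json:"m.bounce_limit"`
}

// PointerContent (hypothetical name) uses *int64, so an absent field
// decodes to nil while an explicit 0 decodes to a pointer to 0.
type PointerContent struct {
	Body        string `json:"body"`
	BounceLimit *int64 `json:"m.bounce_limit"`
}

// decodePlain parses raw JSON event content into PlainContent.
func decodePlain(raw string) (PlainContent, error) {
	var c PlainContent
	err := json.Unmarshal([]byte(raw), &c)
	return c, err
}

// decodePointer parses raw JSON event content into PointerContent.
func decodePointer(raw string) (PointerContent, error) {
	var c PointerContent
	err := json.Unmarshal([]byte(raw), &c)
	return c, err
}
```

With the plain field, `{"body":"hi"}` and `{"body":"hi","m.bounce_limit":0}` are indistinguishable after decoding. With the pointer field, the caller has to add a nil check before reading the number, which is the extra step the comment refers to.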
By the way, I have another question: in the existing Matrix protocol, if a field is invalid, how should an implementation treat it?
Should an implementation treat it as 0, treat it as missing, or reject the message altogether? Perhaps this new proposal needs to be consistent in this respect…
Sorry, seems your replies got lost in my mailbox...
> Regarding the existing Matrix protocol, if any field is invalid, how should an implementation treat it?
It depends on the context. Generally, if this is event auth, the event should be rejected, whereas for regular client-server operation, the spec explicitly mentions that event contents are untrusted.
In response to your top-level message from 9 hours ago: I feel that, logically, a missing TTL on event forwarding should be treated the same as having no hops left (if you want to stick to the IPv4 analogy). One could also frame it the way any person would interpret it: there's still a ticket left to pass on, so we should pass it on. Hence, a missing value would logically be the same as zero. (See also null pointers in C/C++, which are usually a literal 0x00000000 pointer.)
After I evaluated @TheArcaneBrony’s comment carefully again and talked with other people who have worked with the […], I plan to change the logic from “1 = stop” to “0 = stop.”
I will change the proposal Markdown file in a couple of days. But before I make the change, I want to announce it here to make sure no one else is simultaneously working on an implementation.
P.S.: I am currently working on a Rust implementation. First, a proof-of-concept echo-bot is coming soon; then I will extend it to languages other than Rust and incorporate it into some real-world bots.
A new version has been pushed to the branch.
The biggest modification is the behavioral change from “1 = stop” to “0 = stop.” I also defined clear rules for dealing with multiple input messages rather than leaving it as an open question. I also removed the description of Telegram’s bot-to-bot restrictions, which is now outdated. I added some more discussion about whether […]
A working PoC called echo-bot is available at: https://codeberg.org/m13253/matrix-echo-bot or https://github.com/m13253/matrix-echo-bot (They are mirrors of each other.)
Finally, since I have been iterating on the branch for multiple commits, I am keeping all the historical commits so we don’t lose track of the comments tagged to those commits. Please tell me whether, when, and how I should squash the git history.
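The “0 = stop” behavior mentioned above can be illustrated with a small sketch. It rests on an assumption that this thread does not spell out: that a bot's reply carries the incoming limit minus one, by analogy with the IP Hop Limit discussed earlier. The function names `nextBounceLimit` and `countResponses` are hypothetical, and the handling of missing or invalid values is deliberately left out.

```go
package main

// nextBounceLimit (hypothetical helper) decides whether a bot may respond
// to a message carrying the given bounce limit. Under the "0 = stop"
// convention, a limit of 0 (or less) means the bot must stay silent.
// ASSUMPTION: a reply carries the incoming limit minus one.
func nextBounceLimit(incoming int64) (int64, bool) {
	if incoming <= 0 {
		return 0, false // no bounces left: do not respond
	}
	return incoming - 1, true
}

// countResponses (hypothetical helper) simulates a chain in which every
// message is picked up by exactly one further bot, and returns how many
// automated responses are produced before the chain stops.
func countResponses(initial int64) int {
	responses := 0
	limit := initial
	for {
		next, ok := nextBounceLimit(limit)
		if !ok {
			return responses
		}
		responses++
		limit = next
	}
}
```

Under these assumptions, a message sent with a limit of 2 lets a first bot respond (its reply carries 1) and a second bot respond to that (its reply carries 0), after which any further bot stays silent. A boolean "automated" flag cannot express this, since the second bot would already be blocked.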
…s bots" Signed-off-by: Star Brilliant <coder@poorlab.com>
Rendered
(About me: I develop E2EE-capable Matrix bots and bridges tailored for two communities. Recently, I open-sourced my matrixbot-ezlogin Rust library to help people build Matrix bots without worrying about the authentication and E2EE bootstrap process.)