New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Formatting of replies #647
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd also like to see how this reacts when someone replies to a multiline message, particularly with short lines.
lib/bridge/MatrixHandler.js
Outdated
const REPLY_NAME_MAX_LENGTH = 12; | ||
if (event.content["m.relates_to"] && event.content["m.relates_to"]["m.in_reply_to"]) { | ||
// This is a reply. | ||
const match = /> <(@.*:.*)>(.*)\n\n(.*)/.exec(text); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unsafe to rely on the body being accurate. I highly suggest fetching the event with GET /rooms/:roomId/event/:eventId
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would love to, but I'm not sure I can risk the perf of doing so.
EDIT: To explain further, I don't want to call any APIs while parsing messages M->I as we cannot risk an extra 100-500ms of latency. Arguably we could keep a cache of events we have seen as a compromise...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the fallback you're relying on is intended to be removed so relying on it will eventually even for non-malicious events screw things up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also the /event/
API unlike the context API should be rather cheap
lib/bridge/MatrixHandler.js
Outdated
const match = /> <(@.*:.*)>(.*)\n\n(.*)/.exec(text); | ||
if (match.length === 4) { | ||
let rplName; | ||
let sourceLength = REPLY_SOURCE_MAX_LENGTH - "...".length; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I think it should be okay to have the ellipsis push the message to 35 characters. Although one could use the utf8 ellipsis instead: …
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be tempted to..but there is the risk of blowing up non UTF8 supporting clients.
lib/bridge/MatrixHandler.js
Outdated
Math.min(REPLY_NAME_MAX_LENGTH, userid.indexOf(":") - 1) | ||
); | ||
} | ||
text = `<${rplName} "${rplSource.trim()}"> ${rplText}`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going to trim, surely we should trim ahead of getting to this point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Point
@turt2live I've added a recent event cache containing events we've seen so you can pull the source from it, if it exists. And newlines now get shredded into periods, because I'm mean. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels lose-lose to have a cache or make a call to /event
. I'm sure there's some engineering that can be applied to make the branch under a 100ms and not consume the host, although thinking of that solution is hard at the moment.
lib/bridge/MatrixHandler.js
Outdated
@@ -19,6 +19,7 @@ const MSG_PMS_DISABLED_FEDERATION = "[Bridge] Sorry, PMs are disabled on " + | |||
|
|||
const KICK_RETRY_DELAY_MS = 15000; | |||
const KICK_DELAY_JITTER = 30000; | |||
const EVENT_CACHE_SIZE = 4096; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubt this is enough for Freenode. Matrix.org is handling ~25Hz of messages, so we'll round up to 30 for math+future capabilities. At 30Hz, this cache is overturned every 3 minutes. Given most replies I've seen (note: not actually tested) happen 5-10+ minutes after the original message, your cache is too small.
Having a large cache means you're more likely to run out of memory though. Combined with the fact that the cache is not self-expiring, you'll never reclaim that memory during low volume periods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is assuming a lot of things.
Matrix.org is handling ~25Hz of messages.
You're assuming the bridge can see 25Hz of events. Even freenode isn't in that many rooms and is excluding things like PM traffic and the ~90% of rooms that aren't in a bridge. Let's not also forget that some of that traffic is IRC, which won't hit the cache. I seriously doubt the bridge is seeing an average of 30Hz M->I traffic a second.
likely to run out of memory though
We could up this to 16384 and be laughing to be honest, it's not a lot of memory.
cache is not self-expiring
It's really not something I want to waste CPU cycles on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, you're right - I was thinking of a similar nightmare from a different project and started having flashbacks. Would be nice to have the cache size configurable.
lib/bridge/MatrixHandler.js
Outdated
* Cache events in here so we can refer to them for replies. | ||
* We deliberately avoid fetching events from the homeserver due to speed concerns. | ||
*/ | ||
this._eventCache.set(event.event_id, { body: event.content.body, sender: event.sender }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no protection from massive events landing in the cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, we should trim this to how much of the body we actually need.
lib/bridge/MatrixHandler.js
Outdated
} | ||
// Replace newlines so we can fit this in the message body. | ||
// It's not a perfect solution but it's legible. | ||
rplSource = rplSource.replace(/\v/, '. '); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or just grab the first line and call it a day. Modifying user's text like this feels too invasive for a bridge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grabbing the first line makes sense...until someone does some really strange paragraphing. I'm not overly worried about editing out some newlines given the original text will be in IRC scrollback, and this version isn't too far removed from the original.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to be the thing I'm going to push back on fairly strongly: where possible we shouldn't be modifying a user's message - that's not the job of the bridge. This includes format markers which may introduce less clarity.
If someone does something weird with paragraphs, that's their own fault in my honest opinion. For a safety net, we can grab the first non-empty line of text.
lib/bridge/MatrixHandler.js
Outdated
const delKey = this._eventCache.entries().next().value[0]; | ||
this._eventCache.delete(delKey); | ||
} | ||
|
||
let text = event.content.body; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's no read from the cache?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also if we cache miss, we should do a hit to /event
. Synapse's cache is likely going to be faster and more useful than ours anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Synapse's cache is likely going to be faster and more useful than ours anyways.
I doubt it's going to be faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
matrix-appservice-irc/lib/bridge/MatrixHandler.js
Lines 1286 to 1289 in 6e0ac53
if (this._eventCache.has(eventId)) { | |
rplName = this._eventCache.get(eventId).sender; | |
rplSource = this._eventCache.get(eventId).body; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Synapse's cache itself is faster. The HTTP overhead obviously won't be good.
Still should have a cache miss lookup. Put it behind an option if you'd like.
The event body is not looked up from the cache, therefore making the unsafe reliance on the sender's event still there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The JS SDK doesn't support the /event endpoint, which is going to really bloat this PR out. I've made the caching configurable anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be great to have a js-sdk PR for it then. I'm quite surprised the endpoint doesn't already exist, given it's age and usefulness. Having a cache miss is important to deal with to prevent unexpected and unpredictable messages being sent, particularly by those who would be unaware of the problem in the first place.
lib/bridge/MatrixHandler.js
Outdated
body: event.content.body.substr(0, REPLY_SOURCE_MAX_LENGTH), | ||
sender: event.sender } | ||
); | ||
console.log("SET!", this._eventCache, this._eventCache.size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gah, github showed me the wrong diff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(oops, was dropped)
So to drag comments out from inline for a second, the caching thing is a problem. Frankly the reply format sucks as is, because I can't actually distinguish what is reply text and what is source text without running a regex through it so this is going to break at some point (according to @t3chguy, we are going to be dropping the source text from Separately, cache misses should mean we fetch from /event except the JS-SDK doesn't actually support it yet. I could do a PR, but then that is going to add more delay on a PR I'd like to make the 0.11 release. It's very tempting for cache misses to forgo the source text entirely. |
Yes this is expected as it is ONLY there for backwards-compatibility |
I really do appreciate that a release is imminent, however I don't want to have an incomplete feature be merged. I'm okay with pinning the js-sdk to the commit for the bridge (for now) once the PR is merged. |
@turt2live I've made a few changes:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is largely okay now (yay!), although I do wonder if we should use the HTML instead of the fallback text to parse the reply. This is mostly because things that implement the feature inevitably mess up the formatting on the fallback, leading to regex not matching. Might not be something to block a release on (yet).
There's also some comments about instructions not working - those are why this is flagged as "changes requested".
config.sample.yaml
Outdated
|
||
# Config for Matrix -> IRC bridging | ||
matrixHandler: | ||
# Cache this many matrix events in memory to be used for m.relates_to messages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor thing: we should probably mention that m.relates_to
is effectively replies somewhere
lib/bridge/MatrixHandler.js
Outdated
|
||
/** | ||
* Cache events in here so we can refer to them for replies. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a large comment for a single line
@@ -1486,6 +1517,89 @@ MatrixHandler.prototype._onUserQuery = Promise.coroutine(function*(req, userId) | |||
yield this.ircBridge.getMatrixUser(ircUser); | |||
}); | |||
|
|||
MatrixHandler.prototype._textForReplyEvent = Promise.coroutine(function*(event, ircRoom) { | |||
const REPLY_REGEX = /> <(@.*:.*)>(.*)\n\n(.*)/; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if we actually need to perform this regex at all - the existence of m.relates_to/m.in_reply_to
should be enough for us to consider it a reply, regardless of how silly the sender's client is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nevermind, we need the actual text the user sent... I wonder if it's worth using the html instead as one common thing amoung reply implementers is mixing up how the fallback actually works.
lib/bridge/MatrixHandler.js
Outdated
// Cut to a maximum length. | ||
rplSource = lines[0].substr(0, REPLY_SOURCE_MAX_LENGTH); | ||
// Ellipsis if needed. | ||
if (rplSource.length > REPLY_SOURCE_MAX_LENGTH) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This condition won't be activated because the instruction directly above trims it to a safe length.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh wow, that's pretty silly
lib/bridge/MatrixHandler.js
Outdated
rplSource = rplSource + "..."; | ||
} | ||
// Wrap in formatting | ||
rplSource = ` "${lines[0]}"`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this throws away the trimmed body
lib/bridge/MatrixHandler.js
Outdated
rplName = sourceClient.nick; | ||
} | ||
else { | ||
// Somehow we failed, so fallback to userid. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/userid/localpart (minor thing)
spec/integ/matrix-to-irc.spec.js
Outdated
msgtype: "m.text", | ||
"m.relates_to": { | ||
"m.in_reply_to": { | ||
"event_id": "$original:bar.com" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indent
spec/integ/matrix-to-irc.spec.js
Outdated
content: { | ||
body: "> <@somedude:bar.com> This is the fake message\n\nReply Text", | ||
msgtype: "m.text", | ||
"m.relates_to": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
technically this event isn't spec compliant because it doesn't populate the html component (which through this review I'm starting to think is a better idea to use anyways). Spec coming Soon™️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
applies to other tests too
spec/integ/matrix-to-irc.spec.js
Outdated
msgtype: "m.text", | ||
"m.relates_to": { | ||
"m.in_reply_to": { | ||
"event_id": "$original:bar.com" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indent
spec/integ/matrix-to-irc.spec.js
Outdated
msgtype: "m.text", | ||
"m.relates_to": { | ||
"m.in_reply_to": { | ||
"event_id": "$first:bar.com" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
half the indent is here, but not all of it
My reason for not doing HTML is because I'd have to strip away all the HTML (and also parse the HTML). I'm fairly confident this will be fine on Riot, and really this is something that needs speccing asap. I'd really love it if we either dropped the source text & sender from the body, or put the reply in it's own key. Going to fix the other bits now! |
yea, that's all fair. Having the reply text somewhere in the event body without having to parse it out would be nice - have filed https://github.com/matrix-org/matrix-doc/issues/1541 |
Also, looks like my replies totally break M->I formatting so I will be fixing that. EDIT: Nevermind, we cowardly give up if the HTML body contains anything we don't support and fall back to plain text so there isn't a good way to fix this :( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉 (or 🌮 as github really wants me to autocomplete with)
|
ftr if you have to pin the dependencies to get the bridge sdk all happy and working, by all means do so. Would be nice to have it in a bridge sdk release that coincides with an IRC bridge release though :) |
Honestly once the js-sdk PR goes I might bump both. |
Fixes matrix-org/matrix-spec-proposals#636
The format turns it into something like
<source sender "source text"> reply text
Depends on matrix-org/matrix-appservice-bridge#79