Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Formatting of replies #647

Merged
merged 32 commits into from Aug 21, 2018
Merged

Formatting of replies #647

merged 32 commits into from Aug 21, 2018

Conversation

Half-Shot
Copy link
Contributor

@Half-Shot Half-Shot commented Aug 13, 2018

Fixes matrix-org/matrix-spec-proposals#636

The format turns it into something like <source sender "source text"> reply text

Depends on matrix-org/matrix-appservice-bridge#79

@Half-Shot Half-Shot requested a review from a team August 13, 2018 17:42
@Half-Shot Half-Shot changed the base branch from master to develop August 13, 2018 17:43
Copy link
Member

@turt2live turt2live left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also like to see how this reacts when someone replies to a multiline message, particularly with short lines.

const REPLY_NAME_MAX_LENGTH = 12;
if (event.content["m.relates_to"] && event.content["m.relates_to"]["m.in_reply_to"]) {
// This is a reply.
const match = /> <(@.*:.*)>(.*)\n\n(.*)/.exec(text);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unsafe to rely on the body being accurate. I highly suggest fetching the event with GET /rooms/:roomId/event/:eventId.

Copy link
Contributor Author

@Half-Shot Half-Shot Aug 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would love to, but I'm not sure I can risk the perf of doing so.

EDIT: To explain further, I don't want to call any APIs while parsing messages M->I as we cannot risk an extra 100-500ms of latency. Arguably we could keep a cache of events we have seen as a compromise...

Copy link
Member

@t3chguy t3chguy Aug 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the fallback you're relying on is intended to be removed so relying on it will eventually even for non-malicious events screw things up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also the /event/ API unlike the context API should be rather cheap

const match = /> <(@.*:.*)>(.*)\n\n(.*)/.exec(text);
if (match.length === 4) {
let rplName;
let sourceLength = REPLY_SOURCE_MAX_LENGTH - "...".length;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I think it should be okay to have the ellipsis push the message to 35 characters. Although one could use the utf8 ellipsis instead: …

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be tempted to..but there is the risk of blowing up non UTF8 supporting clients.

Math.min(REPLY_NAME_MAX_LENGTH, userid.indexOf(":") - 1)
);
}
text = `<${rplName} "${rplSource.trim()}"> ${rplText}`;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to trim, surely we should trim ahead of getting to this point?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Point

@Half-Shot
Copy link
Contributor Author

Half-Shot commented Aug 14, 2018

@turt2live I've added a recent event cache containing events we've seen so you can pull the source from it, if it exists.

And newlines now get shredded into periods, because I'm mean.

Copy link
Member

@turt2live turt2live left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels lose-lose to have a cache or make a call to /event. I'm sure there's some engineering that can be applied to make the branch under a 100ms and not consume the host, although thinking of that solution is hard at the moment.

@@ -19,6 +19,7 @@ const MSG_PMS_DISABLED_FEDERATION = "[Bridge] Sorry, PMs are disabled on " +

const KICK_RETRY_DELAY_MS = 15000;
const KICK_DELAY_JITTER = 30000;
const EVENT_CACHE_SIZE = 4096;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt this is enough for Freenode. Matrix.org is handling ~25Hz of messages, so we'll round up to 30 for math+future capabilities. At 30Hz, this cache is overturned every 3 minutes. Given most replies I've seen (note: not actually tested) happen 5-10+ minutes after the original message, your cache is too small.

Having a large cache means you're more likely to run out of memory though. Combined with the fact that the cache is not self-expiring, you'll never reclaim that memory during low volume periods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is assuming a lot of things.

Matrix.org is handling ~25Hz of messages.

You're assuming the bridge can see 25Hz of events. Even freenode isn't in that many rooms and is excluding things like PM traffic and the ~90% of rooms that aren't in a bridge. Let's not also forget that some of that traffic is IRC, which won't hit the cache. I seriously doubt the bridge is seeing an average of 30Hz M->I traffic a second.

likely to run out of memory though

We could up this to 16384 and be laughing to be honest, it's not a lot of memory.

cache is not self-expiring

It's really not something I want to waste CPU cycles on.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, you're right - I was thinking of a similar nightmare from a different project and started having flashbacks. Would be nice to have the cache size configurable.

* Cache events in here so we can refer to them for replies.
* We deliberately avoid fetching events from the homeserver due to speed concerns.
*/
this._eventCache.set(event.event_id, { body: event.content.body, sender: event.sender });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no protection from massive events landing in the cache.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, we should trim this to how much of the body we actually need.

}
// Replace newlines so we can fit this in the message body.
// It's not a perfect solution but it's legible.
rplSource = rplSource.replace(/\v/, '. ');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or just grab the first line and call it a day. Modifying user's text like this feels too invasive for a bridge.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grabbing the first line makes sense...until someone does some really strange paragraphing. I'm not overly worried about editing out some newlines given the original text will be in IRC scrollback, and this version isn't too far removed from the original.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be the thing I'm going to push back on fairly strongly: where possible we shouldn't be modifying a user's message - that's not the job of the bridge. This includes format markers which may introduce less clarity.

If someone does something weird with paragraphs, that's their own fault in my honest opinion. For a safety net, we can grab the first non-empty line of text.

const delKey = this._eventCache.entries().next().value[0];
this._eventCache.delete(delKey);
}

let text = event.content.body;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's no read from the cache?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also if we cache miss, we should do a hit to /event. Synapse's cache is likely going to be faster and more useful than ours anyways.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Synapse's cache is likely going to be faster and more useful than ours anyways.

I doubt it's going to be faster.

Copy link
Contributor Author

@Half-Shot Half-Shot Aug 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (this._eventCache.has(eventId)) {
rplName = this._eventCache.get(eventId).sender;
rplSource = this._eventCache.get(eventId).body;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Synapse's cache itself is faster. The HTTP overhead obviously won't be good.

Still should have a cache miss lookup. Put it behind an option if you'd like.

The event body is not looked up from the cache, therefore making the unsafe reliance on the sender's event still there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JS SDK doesn't support the /event endpoint, which is going to really bloat this PR out. I've made the caching configurable anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be great to have a js-sdk PR for it then. I'm quite surprised the endpoint doesn't already exist, given it's age and usefulness. Having a cache miss is important to deal with to prevent unexpected and unpredictable messages being sent, particularly by those who would be unaware of the problem in the first place.

body: event.content.body.substr(0, REPLY_SOURCE_MAX_LENGTH),
sender: event.sender }
);
console.log("SET!", this._eventCache, this._eventCache.size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gah, github showed me the wrong diff

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(oops, was dropped)

@Half-Shot
Copy link
Contributor Author

So to drag comments out from inline for a second, the caching thing is a problem. Frankly the reply format sucks as is, because I can't actually distinguish what is reply text and what is source text without running a regex through it so this is going to break at some point (according to @t3chguy, we are going to be dropping the source text from body).

Separately, cache misses should mean we fetch from /event except the JS-SDK doesn't actually support it yet. I could do a PR, but then that is going to add more delay on a PR I'd like to make the 0.11 release. It's very tempting for cache misses to forgo the source text entirely.

@t3chguy
Copy link
Member

t3chguy commented Aug 15, 2018

Frankly the reply format sucks as is, because I can't actually distinguish what is reply text and what is source text without running a regex through it so this is going to break at some point (according to @t3chguy, we are going to be dropping the source text from body).

Yes this is expected as it is ONLY there for backwards-compatibility

@turt2live
Copy link
Member

I really do appreciate that a release is imminent, however I don't want to have an incomplete feature be merged. I'm okay with pinning the js-sdk to the commit for the bridge (for now) once the PR is merged.

@Half-Shot
Copy link
Contributor Author

@turt2live I've made a few changes:

  • Rejoice, we now use /event to fetch events that aren't cached. PRs for those coming soon.
  • Replies of replies previously formatted awfully, now we seperate out the reply from the source guff in the cache and it looks better.
  • More tests

Copy link
Member

@turt2live turt2live left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is largely okay now (yay!), although I do wonder if we should use the HTML instead of the fallback text to parse the reply. This is mostly because things that implement the feature inevitably mess up the formatting on the fallback, leading to regex not matching. Might not be something to block a release on (yet).

There's also some comments about instructions not working - those are why this is flagged as "changes requested".


# Config for Matrix -> IRC bridging
matrixHandler:
# Cache this many matrix events in memory to be used for m.relates_to messages.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor thing: we should probably mention that m.relates_to is effectively replies somewhere


/**
* Cache events in here so we can refer to them for replies.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a large comment for a single line

@@ -1486,6 +1517,89 @@ MatrixHandler.prototype._onUserQuery = Promise.coroutine(function*(req, userId)
yield this.ircBridge.getMatrixUser(ircUser);
});

MatrixHandler.prototype._textForReplyEvent = Promise.coroutine(function*(event, ircRoom) {
const REPLY_REGEX = /> <(@.*:.*)>(.*)\n\n(.*)/;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we actually need to perform this regex at all - the existence of m.relates_to/m.in_reply_to should be enough for us to consider it a reply, regardless of how silly the sender's client is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nevermind, we need the actual text the user sent... I wonder if it's worth using the html instead as one common thing amoung reply implementers is mixing up how the fallback actually works.

// Cut to a maximum length.
rplSource = lines[0].substr(0, REPLY_SOURCE_MAX_LENGTH);
// Ellipsis if needed.
if (rplSource.length > REPLY_SOURCE_MAX_LENGTH) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition won't be activated because the instruction directly above trims it to a safe length.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh wow, that's pretty silly

rplSource = rplSource + "...";
}
// Wrap in formatting
rplSource = ` "${lines[0]}"`;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this throws away the trimmed body

rplName = sourceClient.nick;
}
else {
// Somehow we failed, so fallback to userid.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/userid/localpart (minor thing)

msgtype: "m.text",
"m.relates_to": {
"m.in_reply_to": {
"event_id": "$original:bar.com"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent

content: {
body: "> <@somedude:bar.com> This is the fake message\n\nReply Text",
msgtype: "m.text",
"m.relates_to": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically this event isn't spec compliant because it doesn't populate the html component (which through this review I'm starting to think is a better idea to use anyways). Spec coming Soon™️

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applies to other tests too

msgtype: "m.text",
"m.relates_to": {
"m.in_reply_to": {
"event_id": "$original:bar.com"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent

msgtype: "m.text",
"m.relates_to": {
"m.in_reply_to": {
"event_id": "$first:bar.com"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

half the indent is here, but not all of it

@Half-Shot
Copy link
Contributor Author

This is mostly because things that implement the feature inevitably mess up the formatting on the fallback, leading to regex not matching.

My reason for not doing HTML is because I'd have to strip away all the HTML (and also parse the HTML). I'm fairly confident this will be fine on Riot, and really this is something that needs speccing asap. I'd really love it if we either dropped the source text & sender from the body, or put the reply in it's own key.

Going to fix the other bits now!

@turt2live
Copy link
Member

yea, that's all fair. Having the reply text somewhere in the event body without having to parse it out would be nice - have filed https://github.com/matrix-org/matrix-doc/issues/1541

@Half-Shot
Copy link
Contributor Author

Half-Shot commented Aug 20, 2018

Also, looks like my replies totally break M->I formatting so I will be fixing that.

EDIT: Nevermind, we cowardly give up if the HTML body contains anything we don't support and fall back to plain text so there isn't a good way to fix this :(

Copy link
Member

@turt2live turt2live left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉 (or 🌮 as github really wants me to autocomplete with)

@Half-Shot
Copy link
Contributor Author

Half-Shot commented Aug 20, 2018

TACO MERGE TACO wait for js-sdk

@turt2live
Copy link
Member

ftr if you have to pin the dependencies to get the bridge sdk all happy and working, by all means do so. Would be nice to have it in a bridge sdk release that coincides with an IRC bridge release though :)

@Half-Shot
Copy link
Contributor Author

Half-Shot commented Aug 20, 2018

Honestly once the js-sdk PR goes I might bump both.

@Half-Shot Half-Shot merged commit c5a048a into matrix-org:develop Aug 21, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Our doc for the right syntax of filters is cryptic (SPEC-348)
3 participants