[multiline] Multiline message batches #398

Open: wants to merge 5 commits into base: master


@jwheare
Member

jwheare commented Sep 23, 2019

A solution to multiline messages (#208) using client to server batches based on https://gist.github.com/jesopo/092866e52b40cda6f6205263d04dbcf9


Currently implemented and testable with the IRCCloud client and servers (prattle: teams, bnc, slack) and bitbot.


Client sending a multiline batch

Client: BATCH +123 draft/multiline #channel
Client: @batch=123 PRIVMSG #channel hello
Client: @batch=123 privmsg #channel :how is<SPACE>
Client: @batch=123;draft/multiline-concat PRIVMSG #channel :everyone?
Client: BATCH -123

Server sending the same batch

Server: @msgid=xxx;account=account :n!u@h BATCH +123 draft/multiline #channel
Server: @batch=123 :n!u@h PRIVMSG #channel hello
Server: @batch=123 :n!u@h PRIVMSG #channel :how is<SPACE>
Server: @batch=123;draft/multiline-concat :n!u@h PRIVMSG #channel :everyone?
Server: BATCH -123

Server sending messages to clients without multiline support

Server: @msgid=xxx;account=account :n!u@h PRIVMSG #channel hello
Server: @account=account :n!u@h PRIVMSG #channel :how is<SPACE>
Server: @account=account :n!u@h PRIVMSG #channel :everyone?
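The fallback shown above could be produced along these lines. This is only a sketch: the function name and argument shapes are hypothetical, it always uses the trailing-parameter form (`:text`), and it does no tag-value escaping. It puts `msgid` on the first line only and `account` on every line, mirroring the example.

```python
def fallback_lines(source, target, msgid, account, lines):
    """Render a multiline batch as plain PRIVMSGs for clients without
    multiline support. `lines` holds the text of each PRIVMSG in the batch."""
    out = []
    for i, text in enumerate(lines):
        tags = []
        if i == 0:
            # Only the first fallback line carries the message ID.
            tags.append(f"msgid={msgid}")
        tags.append(f"account={account}")
        out.append(f"@{';'.join(tags)} :{source} PRIVMSG {target} :{text}")
    return out


fallback = fallback_lines(
    "n!u@h", "#channel", "xxx", "account", ["hello", "how is ", "everyone?"]
)
```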

Final concatenated message

hello
how is everyone?
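On the sending side, a client might split text into batch lines roughly as follows. This is a minimal sketch under assumptions: `split_for_batch` and `max_len` are hypothetical names, and lengths are counted in characters rather than bytes for simplicity. Chunks that continue an over-long line are flagged so the sender can attach `draft/multiline-concat`, and the space is kept at the end of the preceding chunk.

```python
def split_for_batch(text, max_len):
    """Split text into (concat, chunk) pairs for a multiline batch.

    concat=True means the chunk continues the previous one without a newline.
    """
    result = []
    for line in text.split("\n"):
        concat = False
        while len(line) > max_len:
            # Break just after the last space that fits, so the space
            # stays at the end of the chunk being emitted.
            cut = line.rfind(" ", 0, max_len) + 1
            if cut == 0:
                cut = max_len  # no space available: hard split
            result.append((concat, line[:cut]))
            line = line[cut:]
            concat = True
        result.append((concat, line))
    return result


chunks = split_for_batch("hello\nhow is everyone?", 10)
```

With a limit of 10 characters this reproduces the three lines of the example batch, with only the last one flagged for concatenation.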
@jwheare jwheare added this to the Roadmap milestone Sep 23, 2019
@jwheare jwheare mentioned this pull request Sep 23, 2019

### Message ids ([spec][message ids])

Servers MUST only include a message ID on the first message of a batch when sending a fallback to non-supporting clients.

@ProgVal

ProgVal Sep 23, 2019

Contributor

Why the first one? If the message ID is used eg for a reply or a reaction emoji, it visually makes more sense to have it attached to the last message.

@slingamn

slingamn Nov 19, 2019

If I understand correctly, this is saying that (some of) the fallback PRIVMSG should be sent without any msgid tag at all? This seems to upend some expectations (e.g., msgids being available for deduplication purposes).

@slingamn

slingamn Nov 19, 2019

For comparison, oragono.io/maxline-2 generates n+1 distinct message IDs for the original message and all n lines of the split message. This isn't perfect, but it maintains the expectation that PRIVMSG should have an ID.

The msgid spec says: "If this tag is being used, it SHOULD be attached to all PRIVMSG and NOTICE events."

@jwheare

jwheare Nov 20, 2019

Author Member

msgids can't appear more than once. Generating unique n+1 msgids and only sending them on the fallback creates undesirable situations where people might reply to a fallback message and the context would then be missing for people who fully support maxline (i.e. discard fallback message tags)

If you're concerned about missing msgids, the solution would be to properly support the maxline batch spec.

@edk0

edk0 Nov 20, 2019

I don't think that's true. It'd be easy to generate unique fallback msgids that the server can recognise as such and translate back or refuse replies for or something.

@slingamn

slingamn Nov 20, 2019

@jwheare, just to be clear, when you say "maxline" you're referring to the present multiline spec, not to oragono.io/maxline-2?

Some degradation of experience is inevitable under any proposal to support PRIVMSG over 512 bytes. I'm much less concerned about the case you're describing than I am about sending PRIVMSG without IDs, which seems to complicate the handling of chat history at a more fundamental level.

I'm not 100% sold on @edk0's suggestion because it violates an expectation I have that servers shouldn't need to care about client-only tags, but I could be convinced.

Hypothetically, we could define a new tag like draft/fmsgid ("fallback msgid") for transmitting the fallback msgids to clients that support the multiline spec.

@edk0

edk0 Nov 21, 2019

I'm going to care about client-only tags anyway, so it's not a bother for me. You don't have to implement my suggestion; I'm just saying a solution is possible.

@slingamn

slingamn commented Dec 20, 2019

Are there any constraints on which messages in the client batch can carry client-only tags, and what those tags can be?

@jwheare

Member Author

jwheare commented Dec 20, 2019

The way I've implemented this in the client, the only tag within the batch that I use is @draft/multiline-concat; every other tag must appear on the opening BATCH message to be recognised.

Any tags that would have been added to the batch, e.g. message IDs, account tags etc MUST be included on the first message line. Tags MAY also be included on subsequent lines where it makes sense to do so.

@jwheare

Member Author

jwheare commented Dec 20, 2019

Actually that wording is from the fallback handling and refers to unbatched messages. Could use some clearer guidance on non-fallback BATCHed messages and where to include the tag, though I feel it is implied by the fact that the BATCH indicates a single logical message.

@jwheare

Member Author

jwheare commented Dec 20, 2019

Oh it is mentioned actually:

Clients MUST only send non-batch tags on the opening BATCH command when sending multiline batches.

That means only @batch and @draft/multiline-concat are allowed on messages within the batch.

extensions/multiline.md

This section is non-normative.

In the context of flood protection, keep in mind that clients might send an entire batched multiline message all at once.

@slingamn

slingamn Dec 20, 2019

I haven't thought this through fully, but there's an issue here with sendqs. If the sendq is small relative to the value of max-bytes, you can DoS clients by completing multiple multiline batches at once and filling up their sendqs.

@slingamn

slingamn Dec 22, 2019

Following up on a clarifying question from #ircv3: yeah, client A can try to disconnect client B from a server by filling up its sendq. Normally, flood control is a check on this. It's not clear what the best practices are for applying flood control to multiline messages --- the way to get parity with pre-multiline behavior w.r.t. the sendq would be to apply artificial delays to the outgoing batch.

@slingamn

slingamn Dec 23, 2019

If max-lines is not set, the worst case for a DoS on the sendq is to send max-bytes lines, with a single byte on each line, and multiline-concat on every line after the first. Assuming the longest possible nickmask, the total size of the relayed message will be about 100 times max-bytes. Even for a conservative value of max-bytes, this should be large enough to overwhelm both the sendq and the TCP send buffer (I think it's typically around 85kB on modern Linux machines).

extensions/multiline.md
@slingamn

slingamn commented Dec 23, 2019

re. interactions between multiline and labeled-response: on which message does the client send the label tag, the BATCH + or the BATCH - line?

@jwheare

Member Author

jwheare commented Dec 23, 2019

BATCH +

@slingamn

slingamn commented Dec 27, 2019

I have a (likely buggy) implementation of this up on testnet.oragono.io. It uses fallback msgids (each individual message in the multiline batch has an alternative msgid, which is sent to multiline-capable clients under the draft/fmsgid tag, and to legacy clients under the msgid tag).

Please report any bugs at https://github.com/oragono/oragono :-)

@xPaw

xPaw commented Jan 6, 2020

Do I understand correctly that the <SPACE> at the end of the line in these examples means there shouldn't be a new line inserted in the concatenated message?

foreach msg in messages
 concat_msg += msg
 if msg does not end with space
  concat_msg += "\n"

This is almost explained in splitting long lines, but for the sake of verbosity, it probably warrants an explicit explanation.

@jwheare

Member Author

jwheare commented Jan 6, 2020

No, the draft/multiline-concat tag indicates that. The <SPACE> is there because without a newline, the words are separated with a space.
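To make the distinction concrete, here is a small sketch of receiver-side joining driven by the tag rather than by trailing whitespace. The `(tags, text)` pair shape and the `join_batch` name are assumptions for illustration, not taken from any implementation.

```python
def join_batch(lines):
    """Join (tags, text) pairs from a multiline batch into one message.

    `tags` is the set of tag names present on that PRIVMSG.
    """
    message = ""
    for i, (tags, text) in enumerate(lines):
        # A newline separates lines unless the tag asks for concatenation.
        if i > 0 and "draft/multiline-concat" not in tags:
            message += "\n"
        message += text
    return message


joined = join_batch([
    ({"batch"}, "hello"),
    ({"batch"}, "how is "),
    ({"batch", "draft/multiline-concat"}, "everyone?"),
])
```

The trailing space in `"how is "` is ordinary message content; it only matters because the next line is appended directly to it.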

@xPaw

xPaw commented Jan 6, 2020

Oh duh, I completely missed the tag there.

The "leave the space character at the end of the line. This ensures that concatenated lines don't lose the space" paragraph misled me. I do see why it's this way for fallback messages though.

@jwheare

Member Author

jwheare commented Jan 6, 2020

Any suggestions for clarifying that paragraph to make it less misleading?

@xPaw

xPaw commented Jan 6, 2020

Probably mentioning the tag there would help clarify it.

I can't think of an example where "This ensures that concatenated lines don't lose the space" is relevant besides buggy code that trims the messages?

jwheare added 3 commits Jan 17, 2020
@jwheare

Member Author

jwheare commented Jan 17, 2020

Any outstanding comments before we merge this as a draft?

@jesopo
jesopo approved these changes Jan 17, 2020
@slingamn

slingamn commented Jan 17, 2020

Two issues I raised haven't been addressed:

  1. The question of "fallback msgids"
  2. Denial-of-service attacks that exhaust the sendq
@jwheare

Member Author

jwheare commented Jan 17, 2020

  1. I'm not in favour of them.
  2. Any suggested wording that should be in the spec to address that?
@jwheare

Member Author

jwheare commented Jan 17, 2020

To expand on 1) I feel the benefit to complexity ratio is skewed. It's aimed at the small subset of people who do support message tags, but don't support multiline, which will likely only become smaller as time goes on.

@slingamn

slingamn commented Jan 17, 2020

On the one hand, I'm concerned about a spec that mandates sending any PRIVMSG without a message ID tag, because it seems like this compromises the use of message IDs as "infrastructure" (e.g., for deduplication). On the other hand, the possibility of skipping directly to the world where everyone supports both message IDs and multiline batches is tempting. So I'm not sure what I think right now.

re. DoS, I think at the very least the spec should require that max-lines be set. Beyond that, I think this is a serious problem to which I do not have a complete solution. A possibility that occurred to me is suggesting that servers apply fakelag to the outgoing messages as necessary. This would add a lot of implementation complexity, though.

| command | `PRIVMSG` | 7 |
| channel | `#channel` | variable |
| message separator | ` :` | 2 |
| crlf | `\n\r` | 2 |

@xPaw

xPaw Jan 17, 2020

crlf is \r\n

@slingamn

slingamn commented Jan 20, 2020

Two questions:

  1. The stated rationale for sending PRIVMSG without msgids to legacy clients is that this is a transitional period, and eventually every client that supports tags will also support multiline, rendering the question moot. Does this imply that after some point in the future, we will no longer allow new specifications that require sending a PRIVMSG without a msgid, at which point client developers will be able to rely on the presence of msgids?
  2. Suppose a client wants to implement support for draft/chathistory (possibly at a very rudimentary level, like just sending a LATEST query after every JOIN) during this transitional period, but doesn't want to implement draft/multiline (e.g., because it breaks too many assumptions). Is there any way for them to correctly deduplicate the history? There's seemingly no way to distinguish between duplicate playback of the intermediate lines of a multiline message, and intermediate lines that are genuinely repeated in the message, because the time tag will be the same on each line.
@jwheare

Member Author

jwheare commented Jan 20, 2020

  1. There is no hard requirement that all PRIVMSGs have a msgid. It's only a SHOULD in the spec.
  2. Are you aware of real world implementors this would affect or is it a hypothetical? Some concrete spelled out examples of how this might manifest might help.
@slingamn

slingamn commented Jan 20, 2020

  1. Yes, but apart from this, there's no good reason to send a PRIVMSG without a msgid, so it would be reasonable for clients to degrade in that case (e.g., by assuming that anything sent without a msgid is always a duplicate, or is never a duplicate, depending which side of the line they prefer to err on).
  2. It's mostly a hypothetical. I'm thinking of adding this kind of support to either Hexchat, irssi, or https://github.com/MCMrARM/revolution-irc, but I haven't looked too deeply at what would be required. It's true that implementing CHATHISTORY sensibly already requires support for BATCH, so the added burden is just to support nested batches.
@slingamn

slingamn commented Jan 24, 2020

I tentatively withdraw my objection: I've come around to thinking that instead CHATHISTORY should be designed so that rudimentary use is possible without any awareness of BATCH, msgids, or deduplication.

@jwheare

Member Author

jwheare commented Jan 25, 2020

OK, so without further comments, this will be merged as draft on Monday.

@slingamn

slingamn commented Jan 26, 2020

My security concerns still haven't been addressed. I would propose:

  1. Make max-lines mandatory
  2. Add a "security considerations" section, with language along the following lines: "Server implementations and operators should consider the potential effects of relaying large amounts of message data at once. In particular, large multiline messages have the potential to exhaust client send queues, which will either result in client disconnection (a denial-of-service attack) or degraded client performance. Size on the wire will typically be constrained by max-lines, not max-bytes, because each additional protocol line can carry a large amount of metadata (in particular the sender's full nickmask and the channel name). Operators should set max-bytes, max-lines, and the send queue length in tandem to mitigate these attacks."
@jwheare

Member Author

jwheare commented Jan 26, 2020

Strongly disagree that max lines needs to be mandatory. if a server uses a line based sendq then it might make sense but not all servers do and implementation details shouldn’t leak into the spec. Bytes are a less arbitrary limit on data that more directly maps to universal resource constraints.

@jwheare

Member Author

jwheare commented Jan 26, 2020

Happy to include some server considerations based on the rest of your suggestion though.

@jwheare

Member Author

jwheare commented Jan 26, 2020

Would be happier still to get feedback from other server implementers as well!

@slingamn

slingamn commented Jan 26, 2020

if a server uses a line based sendq then it might make sense but not all servers do and implementation details shouldn’t leak into the spec.

The motivation is not line-based sendq's, but byte-based sendq's. The problem is that if max-lines is unset, the total size of the relayed message can be approximately 100 times max-bytes (after including the full nickmask, the channel target, and the concat tag on every line).

@jwheare

Member Author

jwheare commented Jan 26, 2020

I want to avoid a situation where servers apply arbitrarily low limits on lines because they’re forced to pick a limit. Bearing in mind people use this feature to post code snippets.

If we want servers to consider the byte budget of repeated line envelopes, we should probably talk about it in terms of bytes. e.g. if you’re trying to account for 100 extra bytes per line, and your byte limit is 4000, the max possible lines is 4000 if each new line counts for a 1B \n. That would require a sendq of at most 400KB. If you want to reduce that to 40KB, you’d need a max-lines of 400.
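The arithmetic in that paragraph can be written out as a short sketch. The function name is hypothetical, and like the comment it counts only the per-line envelope overhead, assuming the worst case of one content byte per line.

```python
def worst_case_envelope_bytes(max_bytes, envelope_bytes, max_lines=None):
    """Upper bound on envelope overhead for one relayed multiline message.

    Worst case: every content byte sits on its own line, so the line count
    is bounded by max_bytes unless max_lines caps it lower.
    """
    lines = max_bytes if max_lines is None else min(max_bytes, max_lines)
    return lines * envelope_bytes


unbounded = worst_case_envelope_bytes(4000, 100)             # 400000 bytes, i.e. 400KB
capped = worst_case_envelope_bytes(4000, 100, max_lines=400)  # 40000 bytes, i.e. 40KB
```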

What are typical sendq values in default or common configurations or on popular servers in the real world? Would we consider them overly conservative on the whole?

It’s not for the spec to decide resource constraints for servers, but we should help them come to a non arbitrary decision by thinking about them appropriately.

@jwheare

Member Author

jwheare commented Jan 26, 2020

Side note: we should probably clarify how many bytes of content a line break takes up in terms of max-bytes. Internally on the server, in our implementation, it’s a single \n byte, rather than the CRLF of the protocol separator.

@slingamn

slingamn commented Jan 26, 2020

Unreal's default is 200 KB. Inspircd's default is 1MB. Oragono's is 16 KB (probably too low, we should fix that).

@jwheare

Member Author

jwheare commented Jan 26, 2020

The orders of magnitude there suggest that there often isn’t really a need for a separate max-lines, unless you’re unreal and don’t want to raise the sendq but still allow 4000 max bytes. So yeah, I’m still against it being mandatory, but there should certainly be some discussion and guidance around it.

@slingamn

slingamn commented Jan 26, 2020

You can exceed the default inspircd sendq using three coordinating clients who terminate their batches at the same time.

It's also worth thinking about this in terms of raw bandwidth. If we accept that a single multiline message can take up 400 KB on the wire, relaying it to a 1000-client channel is 400 MB, which will exhaust a 100 mbps connection for approximately 30 seconds.
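As a quick check of that estimate, the sketch below works through the numbers, taking 1 KB as 1000 bytes and 100 Mbps as 100,000,000 bits per second (both assumptions about the units intended). The function name is hypothetical.

```python
def relay_seconds(message_bytes, recipients, link_bits_per_second):
    """Time to push one message to every recipient over a single link."""
    total_bits = message_bytes * recipients * 8
    return total_bits / link_bits_per_second


# 400 KB message, 1000-client channel, 100 Mbps uplink
seconds = relay_seconds(400_000, 1000, 100_000_000)
```

Under those assumptions the result is 32 seconds, in line with the "approximately 30 seconds" figure.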

@slingamn

slingamn commented Jan 27, 2020

This doesn't eliminate the problem (since it does nothing for legacy clients), but a potential amelioration is to say that the server MAY send * in place of both the nickmask and the target to fully capable clients, on the PRIVMSGs inside the batch. This could be about a 3x reduction in wire size?

To be clear, I still think max-lines should be required; values between 128 and 512 seem sane to me.

@jwheare

Member Author

jwheare commented Jan 27, 2020

I’d be up for such an optimisation or similar even if it doesn’t help legacy.

One reason i don’t want to mandate max-lines: we’ve implemented this in a Slack gateway, and Slack has no such limitation.

Definitely worth mentioning the case for having such a limit though.

@slingamn

slingamn commented Jan 27, 2020

One reason i don’t want to mandate max-lines: we’ve implemented this in a Slack gateway, and Slack has no such limitation.

Totally fair, it's much less of a concern in any setting where you're not worried about abuse. I'd be fine with recommending it instead of mandating it.

Suggested language: "To save space on the wire, when relaying multiline batches to fully compatible clients, servers MAY omit the source nickmask and send a * in place of the channel name on the PRIVMSG lines inside the batch (since this information is redundant with the initial BATCH line)."

@jwheare

Member Author

jwheare commented Feb 7, 2020

I think we decided in IRC that this optimisation is probably not worth it. But we still need some more advice on setting a sensible max-lines value.

@slingamn

slingamn commented Feb 7, 2020

As I remember it, we decided that the most aggressive optimization possible (sending * for the command and sending the message as a single parameter) wasn't worth it. I'm still very interested in doing a more modest optimization (omitting the prefix, keeping PRIVMSG as the command, and sending * as the first parameter and the message as the second). Suggested language for this is in my previous comment.

The main "upwards" constraint you identified earlier on max-lines is the need to hold code snippets. I think 128 lines is a reasonable limit for this.

@SadieCat

Contributor

SadieCat commented Feb 7, 2020

Messages without a prefix are from the locally connected server. Also, this "optimisation" seems like it would complicate both client and server code for very little benefit.

@slingamn

slingamn commented Feb 7, 2020

These messages would only be sent to fully compliant clients, which in any case need to take the batch tag into account when interpreting the message. So there should be no increase in client complexity over the baseline required to implement multiline. As for server complexity, it's a MAY, so servers don't have to implement it if they don't want it.

The potential benefit is a 60% reduction in the size of relayed messages, which seems worthwhile to me.

@kythyria

Contributor

kythyria commented Feb 7, 2020

Compromise: Put only the nick as the sender. Could even put * as the sender, though that's perhaps inconsistent with other extensions using * to mean "yourself".

@slingamn

slingamn commented Feb 7, 2020

Implementations might want to use the target for routing messages (in addition to the batch tag), so I withdraw my proposal to send * as the target.

I still think we should allow servers to omit the source nickmask: this is syntactically unproblematic, and should be semantically unproblematic for clients as well (at least one implementation supports this "by default", since sources for messages within the relayed S2C batch are ignored).

A lot of numbers have been bandied back and forth, so here's what I think is a realistic estimate: for a 100-line code snippet with an average line length of 60 characters, relayed to a channel with an 8-character name by a user with a 40-character nickmask, I compute a 30% reduction in wire size.
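One way to reproduce an estimate of this kind is sketched below. The details are assumptions rather than the exact method used above: each relayed line is taken to carry a `@batch=123` tag, a three-character batch reference, and CRLF, and the opening and closing BATCH lines are ignored.

```python
def line_bytes(nickmask_len, channel, text_len, with_source=True, batch_ref="123"):
    """Approximate wire size of one PRIVMSG inside a relayed batch."""
    tags = len(f"@batch={batch_ref} ")
    # ":" + nickmask + " " when the source prefix is included
    source = (1 + nickmask_len + 1) if with_source else 0
    command = len(f"PRIVMSG {channel} :")
    return tags + source + command + text_len + len("\r\n")


full = line_bytes(40, "#channel", 60)                     # with nickmask
bare = line_bytes(40, "#channel", 60, with_source=False)  # nickmask omitted
reduction = 1 - bare / full
```

Because every line has the same shape, the per-line ratio equals the ratio over the 100-line snippet; with these assumptions it comes out a little over 30%.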

8 participants