Allow increased IRC message lengths #281

Closed
wants to merge 8 commits into from


@DanielOaks
Member

@DanielOaks DanielOaks commented Nov 18, 2016

One of the big issues we want to solve is that IRC lines are capped at 512 octets. This... works, but it would be very nice to allow for longer messages and things like longer topics without needing to implement dodgy hacks for every single command we want to allow longer lengths on.

This should ensure that things stay 100% backwards compatible and work correctly for clients that do not support longer lines, while allowing more up-to-date clients to negotiate the longer message length allowed by the server.

This is gonna be something that is a bit controversial, but it would be extremely useful to allow and something we've been looking at for a while.

@HelixSpiral

@HelixSpiral HelixSpiral commented Nov 18, 2016

Replace every instance of octets with chars or characters

Other than that, seems like a good addition. Variable line lengths are something I've been wanting for a while now.

@DarthGandalf
Member

@DarthGandalf DarthGandalf commented Nov 18, 2016

@shawn-smith a character can take more than one octet, especially as UTF-8 is recommended. The RFC talks about octets (bytes), not about Unicode characters.

@clokep
Contributor

@clokep clokep commented Nov 18, 2016

Any thoughts on how the server is supposed to split the text? We (Instantbird and Thunderbird) will split a user's message on the closest space that makes the message small enough to send (if there's no space, then we'll split in the middle of a "word"). This is somewhat sub-optimal though, depending on the language... If I recall correctly, certain spaces in French grammar are essentially the 'middle' of a word.

Probably out of scope for the specification, but figured I'd ask.

Also 👍 on being clear about octets vs. characters.

@HelixSpiral

@HelixSpiral HelixSpiral commented Nov 18, 2016

@DarthGandalf IRC line length goes by number of characters regardless of how many octets it takes per character.

IRC messages are always lines of characters terminated with a CR-LF
   (Carriage Return - Line Feed) pair, and these messages shall not
   exceed 512 characters in length, counting all characters including
   the trailing CR-LF. Thus, there are 510 characters maximum allowed
   for the command and its parameters.  There is no provision for
   continuation message lines.  See section 7 for more details about
   current implementations.
@DanielOaks
Member Author

@DanielOaks DanielOaks commented Nov 18, 2016

The RFCs refer to bytes, octets and characters (which is sketchy, but it's what we need to work with). Octets seems to be the most specific term and the one least likely to cause confusion.

@clokep I don't really define how to split messages in this on purpose, but my thoughts essentially boil down to "somewhere that makes sense", which is difficult to codify. Some will split on spaces, some on dashes and things as well, etc.

@HelixSpiral

@HelixSpiral HelixSpiral commented Nov 18, 2016

@DanielOaks An octet is a group of 8. You're talking specifically about characters in your specification.

Member

@DarthGandalf DarthGandalf left a comment

@shawn-smith octet is 8 bit = 1 byte.
@DanielOaks to make it less confusing, let's use "byte" instead of octet. These days there are no non-8-bit bytes anymore AFAIK.


Similarly to standard message handling, tags and the rest of the message have separate length values. The value of the `maxline` capability represents the maximum number of octets that the tags section, and that the rest of the message, can take up. Line length calculation is done this way in order to better integrate with methods currently used by IRC software to limit line lengths.

As an example, if `maxline` is 1024 then the maximum size of a full IRC message would be 2048 bytes (1024 for the tags, 1024 for the rest of the message).
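
As a rough illustration of the arithmetic in the draft text above, the following Python sketch treats `maxline` as a per-section limit. The function names `line_budget` and `within_budget` are hypothetical and only serve this example; they are not part of the proposal.

```python
def line_budget(maxline: int) -> dict:
    """Byte budgets implied by the draft: the tags section and the rest of
    the message each get `maxline` bytes, so the full line is twice that."""
    return {"tags": maxline, "rest": maxline, "total": 2 * maxline}


def within_budget(tags: bytes, rest: bytes, maxline: int) -> bool:
    """Check each section against its own limit (not the combined size)."""
    return len(tags) <= maxline and len(rest) <= maxline


budget = line_budget(1024)  # total is 2048 bytes, as in the example above
```

Under the alternative raised in the review comments (dividing instead of multiplying), the same `maxline` value would give each section half as much room.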


@DarthGandalf

DarthGandalf Nov 18, 2016
Member

Let's divide by 2 instead of multiplying by 2.
The name "maxline" says the "line", not "half of line", it's just too confusing.
If maxline is 2048, then tags and non-tags part should be up to 1024.


@digitalcircuit

digitalcircuit Nov 18, 2016
Contributor

I agree that's less confusing from the IRC server/client developer point of view, but (personally) it seems more confusing from the user/IRC operator point of view.

E.g. "I see maxline is set to 4096, why can't I send messages that long?" And dividing also means the minimum of 1024 is no longer as easily recognized as the minimum of 512.


@DanielOaks

DanielOaks Nov 19, 2016
Author Member

Yeah my thoughts are similar to @digitalcircuit's. If a user sees maxline=2048, I think they'd expect that they can send ~2000-long lines, which is why I think it should stay this way.


@DarthGandalf

DarthGandalf Nov 19, 2016
Member

In that case please emphasize this explanation in the text.


If a client has negotiated the `maxline` capability and sends a `PRIVMSG` or a `NOTICE` message that is longer than 512 octets, the receiving server MUST split this into multiple regular (512-octet) length messages when sending it to clients that have not negotiated the `maxline` capability.

Servers SHOULD split on whitespace, but may use whatever method is easiest for them to implement. Splitting does not need to occur at the exact max length of the message, and servers can instead opt to split a number of characters earlier to simplify processing.
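
One possible reading of "split on whitespace, possibly a little early" is sketched below in Python. This is only an illustration under stated assumptions: `split_on_whitespace` is a hypothetical helper, `budget` is the number of bytes available for the message text after the prefix, command, target and CR-LF have been accounted for, and the space at each split point is dropped (a server could equally keep it).

```python
from typing import List


def split_on_whitespace(text: bytes, budget: int) -> List[bytes]:
    """Split `text` into chunks of at most `budget` bytes, preferring the
    last space that fits and falling back to a hard cut when none exists."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    chunks = []
    while len(text) > budget:
        # Look for the last space at or before the budget boundary.
        cut = text.rfind(b" ", 0, budget + 1)
        if cut <= 0:
            # No usable space: split in the middle of a "word".
            chunks.append(text[:budget])
            text = text[budget:]
        else:
            chunks.append(text[:cut])
            text = text[cut + 1:]  # drop the space we split on
    if text:
        chunks.append(text)
    return chunks


parts = split_on_whitespace(b"Lorem ipsum dolor sit amet", 11)
```

Because the split works on bytes, it can still land inside a multi-byte character when it has to fall back to a hard cut.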


@DarthGandalf

DarthGandalf Nov 18, 2016
Member

Lines SHOULD NOT be split in the middle of a UTF-8 character (or a surrogate pair).


@DarthGandalf

DarthGandalf Nov 19, 2016
Member

Oops, by surrogate pairs I actually meant combining characters. But as the number of them in a row is potentially unlimited, detecting combining characters can be more trouble than it's worth.


@jwheare

jwheare Nov 19, 2016
Member

This can be simplified if you convert to Unicode code points to find the split point before converting back, but might depend on an assumed character encoding.

It would be interesting to look at various clients' splitting routines to compare and see if any general guidelines can be included.


Servers SHOULD split on whitespace, but may use whatever method is easiest for them to implement. Splitting does not need to occur at the exact max length of the message, and servers can instead opt to split a number of characters earlier to simplify processing.

Servers MAY split other commands/numerics into multiple lines in a way similar to `PRIVMSG` and `NOTICE` above, if it is purely for display purposes.


@DarthGandalf

DarthGandalf Nov 18, 2016
Member

Is logging a display purpose?
Need to clarify what is display purpose.


@DanielOaks

DanielOaks Nov 19, 2016
Author Member

Yeah this language is a bit sketch, I'll just remove that clarifier.

In this example, C1 has negotiated `maxline` but C2 has not.

C1 -> PRIVMSG coolfriend :Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
C2 <- :c1!test@localhost PRIVMSG coolfriend :Lorem ipsum dolor sit amet, consectetur adipiscing elit,


@DarthGandalf

DarthGandalf Nov 18, 2016
Member

I think split lines also need to be marked with some tag.


@DarthGandalf

DarthGandalf Nov 18, 2016
Member

Maybe a batch.


@DanielOaks

DanielOaks Nov 19, 2016
Author Member

I'd consider that to be something to look at after we introduce message IDs


@DarthGandalf

DarthGandalf Nov 19, 2016
Member

I don't see how IDs are relevant. If anything, the ID would just apply to the batch as a whole:

@id=foo :irc.server.net BATCH +x too-long-line
@batch=x :c1!test@localhost PRIVMSG coolfriend :Lorem ipsum dolor
@batch=x :c1!test@localhost PRIVMSG coolfriend :sit amet
:irc.server.net BATCH -x


@DanielOaks

DanielOaks Nov 19, 2016
Author Member

Fair, my bad. In regards to marking them at all... maybe. I'm iffy on introducing a batch here because if they don't implement longer lines, they probably won't implement the batch either and it introduces a fairly large amount of overhead on the server side for a case which could be reasonably common.

Do we really need this and would it actually be useful for clients in the real world, or would it just unnecessarily complicate sending PRIVMSG/NOTICEs?


@DarthGandalf

DarthGandalf Nov 19, 2016
Member

Clients can already implement batches, and some of them do. So a situation where a client supports batches but not longer lines is very possible.

If a client doesn't want to receive the lines in a batch, it's free not to request batch, and it will still receive the lines. E.g. old pre-IRCv3 clients would do that.


@DanielOaks

DanielOaks Nov 19, 2016
Author Member

Would clients that don't implement support for longer lines implement the batch type instead? @attilamolnar @jwheare @dequis @SaberUK could we get some more thoughts here from both the ircd and client sides? I'm not convinced it'll be used, but if people really want it then I can write it up.


@TingPing

TingPing Nov 19, 2016
Contributor

I think the most likely situation is a client supports neither. Speaking for HexChat I am more likely to implement long lines than batch.


@dequis

dequis Nov 19, 2016
Contributor

From the client perspective I feel like tingping, from the server perspective I don't mind providing a batch fallback when splitting lines.


@jwheare

jwheare Nov 20, 2016
Member

I'm not really in favour of speculatively specifying transitional solutions that offer half-solutions, it muddies the overall spec.

I feel similarly about the suggestion that a client might want to request a longer than 512 but shorter than advertised limit. I'd prefer a clearer, simpler, all-or-nothing spec.

@digitalcircuit
Contributor

@digitalcircuit digitalcircuit commented Nov 18, 2016

@clokep In some cases, the language/libraries might handle this automatically - e.g. Quassel uses Qt's QTextBoundaryFinder to handle word and grapheme splitting in different languages.

That might be overkill for simpler/low-resource servers, though, and I'd agree with @DanielOaks on not firmly defining it in the spec. The ideal path involves all clients using the new maxline capability anyways, eventually removing this issue entirely.

However, it might help to also suggest looking into whatever tools already exist for your given language/framework (if any).

@HelixSpiral

@HelixSpiral HelixSpiral commented Nov 18, 2016

@shawn-smith octet is 8 bit = 1 byte.
@DanielOaks to make it less confusing, let's use "byte" instead of octet. These days there are no non-8-bit bytes anymore AFAIK.

@DarthGandalf The amount of bytes required for a character varies between charset and encoding. The reason RFC1459 states the max line length in characters and not octets and bytes is because it's charset/encoding agnostic. Regardless of what you use there will be 510 usable characters in an IRC line + the terminating CRLF.

This should be done the same way. Using characters and not octets or bytes.

@DarthGandalf
Member

@DarthGandalf DarthGandalf commented Nov 18, 2016

@shawn-smith

2.2 Character codes
[...] The protocol is based on a set of codes which are composed of eight (8) bits, making up an octet.

The RFC assumes that a character is 8 bits, which is why it uses these terms interchangeably.

@dwfreed

@dwfreed dwfreed commented Nov 18, 2016

@shawn-smith Section 8.2 of RFC1459 specifies a buffer of 512 bytes holds 1 full message. Section 2.2 can easily be interpreted to mean that a character is 8 bits (as @DarthGandalf pointed out while I was writing this). Furthermore, every implementation I've seen has used C char arrays (or something semantically similar), with 512 usable locations, to hold a line. Specifying that the line length limit is based on encoding-agnostic characters means that the IRCd must know the encoding used (not always possible) and that the storage is variable, which does not work well in C-based IRCds. Therefore, the line length should be specified in a unit that does not vary (eg bytes) for simplicity and ease of implementation.
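
To make the bytes-versus-characters distinction in this exchange concrete, here is a small Python sketch. The sample string and the `fits_in_line` helper are made up for illustration; the only fact taken from the thread is the 512-byte limit that includes the trailing CR-LF.

```python
# The same text occupies a different number of octets depending on the
# encoding, so a limit counted in characters depends on knowing the
# encoding, while a limit counted in bytes does not.
text = "héllo wörld ✓"

char_count = len(text)                                # Unicode code points
utf8_bytes = len(text.encode("utf-8"))                # octets on the wire as UTF-8
latin1_bytes = len("héllo wörld".encode("latin-1"))   # one octet per character


def fits_in_line(payload: bytes, limit: int = 512) -> bool:
    """True if `payload` plus the trailing CR-LF fits in `limit` bytes."""
    return len(payload) + 2 <= limit
```

A server that only counts bytes, as `fits_in_line` does, never needs to know which encoding the client used.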

@dequis
Contributor

@dequis dequis commented Nov 19, 2016

What if the server advertises a line length that is larger than what the client is willing to accept?

If this is enabled directly through CAP REQ, that leaves no place for negotiation.

My idea of that was that the value in CAP LS is just a suggestion, and doing CAP REQ just enables a new verb to set maximum line length, which is client initiated and acked by the server. Something like this:

C: CAP LS
S: CAP * LS :maxline=4096
C: CAP REQ :maxline
S: CAP * ACK :maxline
C: MAXLINE 2048
S: MAXLINE * ACK 2048
(or)
C: MAXLINE 8192
S: MAXLINE * NAK 8192 4096

(verb name subject to change)

Downside: this may add too much additional complexity for servers and reduce their chances to reuse messages. For other ircd devs: How does this look from your point of view? My ircd isn't real enough to care about reusing messages.
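
The client-initiated exchange proposed in this comment could be handled on the server side roughly as follows. This is a hedged sketch: the `MAXLINE` verb and its `ACK`/`NAK` replies come from the example above (with the verb name explicitly subject to change), while `handle_maxline`, the 512-byte lower bound, and returning `None` after a `NAK` are assumptions made only for this illustration.

```python
from typing import Optional, Tuple


def handle_maxline(requested: int, advertised: int) -> Tuple[str, Optional[int]]:
    """Return the reply line and the line length now in force for this
    client, or None if the request was refused."""
    # Assumption: anything from the legacy 512 bytes up to the advertised
    # value is acceptable.
    if 512 <= requested <= advertised:
        return f"MAXLINE * ACK {requested}", requested
    # Echo the rejected value followed by the largest value the server allows.
    return f"MAXLINE * NAK {requested} {advertised}", None


reply, in_force = handle_maxline(2048, 4096)
```

Keeping a per-client value like `in_force` is exactly what makes message reuse harder, which is the downside mentioned above.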

@dequis
Contributor

@dequis dequis commented Nov 19, 2016

Another option: Specify a maximum upper bound that clients must accept.

Say, up to 64kb or some other arbitrary high-but-not-too-high-number. 640kb should be enough for everyone.

One could say that clients aren't as resource constrained as servers, or, at least, can't easily take advantage of message reuse like servers often do.

If the server advertises something higher than $upperbound, the client may reject it and choose to stay with 512+512 message lengths. Or it might accept it anyway, if its internal limit is higher than what the server advertises. But to be compliant with this spec that internal limit must not be lower than $upperbound

@IotaSpencer

@IotaSpencer IotaSpencer commented Nov 19, 2016

If IRC is a protocol, which it is, clients should, if anything, use what the server sends them. Granted, there should be an upper limit that servers can set, so we aren't having 1000+ character topics, etc.

@DanielOaks
Member Author

@DanielOaks DanielOaks commented Nov 19, 2016

RFC1459:

2.3 Messages

   IRC messages are always lines of characters terminated with a CR-LF
   (Carriage Return - Line Feed) pair, and these messages shall not
   exceed 512 characters in length, counting all characters including
   the trailing CR-LF. Thus, there are 510 characters maximum allowed

5.8 Ison Message

   set of nicks  given  in  the parameter.  The only limit on the number
   of nicks that may be checked is that the combined length must not be
   too large as to cause the server to chop it off so it fits in 512
   characters.

8.2 Command Parsing

   results of the most recent read and parsing are kept.  A buffer size
   of 512 bytes is used so as to hold 1 full message, although, this

RFC refers to both bytes and characters. For this spec, I think just saying bytes makes sense (afaik most IRC servers should already be doing it based on bytes) so I'll do that.

@DanielOaks
Member Author

@DanielOaks DanielOaks commented Nov 19, 2016

@dequis The main reason I've done it this way is to avoid that complexity. Just with mine, having to regenerate where to split messages for every single user based on whatever line length they've accepted seems really dodgy and could be more resource-intensive than it needs to be. At least with this (taking into account max nicklen/userlen/hostlen and all), you can generally split it once and use it across all the clients that haven't accepted longer line lengths.

If the client's not willing to accept the larger line length, I think they'd just not request the cap and leave it there.
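
The "split once and reuse" idea above depends on budgeting for the longest source prefix any recipient could see. A minimal Python sketch of that calculation follows; `legacy_text_budget` is a hypothetical helper, and the `nicklen`, `userlen` and `hostlen` values in the usage line are placeholders rather than limits from any particular server.

```python
def legacy_text_budget(target: str, nicklen: int, userlen: int, hostlen: int,
                       command: str = "PRIVMSG", line_limit: int = 512) -> int:
    """Bytes left for the message text when the source prefix is as long as
    the server permits, so a single split is valid for every legacy client.

    Layout assumed: ":nick!user@host COMMAND target :text\r\n"
    """
    prefix = 1 + nicklen + 1 + userlen + 1 + hostlen  # ":" nick "!" user "@" host
    overhead = (
        prefix
        + 1 + len(command)                 # space + command
        + 1 + len(target.encode("utf-8"))  # space + target
        + 2                                # " :" before the text
        + 2                                # trailing CR-LF
    )
    return line_limit - overhead


budget = legacy_text_budget("#chan", nicklen=30, userlen=10, hostlen=63)
```

Splitting against this worst-case budget wastes a few bytes per line for users with short prefixes, in exchange for computing the split only once per message.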


If a client has negotiated the `maxline` capability and sends a `PRIVMSG` or a `NOTICE` message that is longer than 512 bytes, the receiving server MUST split this into multiple regular (512-byte) length messages when sending it to clients that have not negotiated the `maxline` capability.

Servers SHOULD split on whitespace, but may use whatever method is easiest for them to implement. Splitting does not need to occur at the exact max length of the message, and servers can instead opt to split a number of characters earlier to simplify processing. Lines SHOULD NOT be split in the middle of a UTF-8 character.


@ProgVal

ProgVal Nov 19, 2016
Contributor

What about users who use non-UTF-8 encodings?
Would it be possible to make UTF-8 mandatory for IRCv3.3 clients so this is not an issue?


@dequis

dequis Nov 19, 2016
Contributor

If it's not utf-8, split on byte boundaries. This is an optional "SHOULD NOT", it's a suggestion for servers to avoid introducing invalid utf-8 if possible. Most other encodings aren't multibyte, and it's okay to break those that are.


@TingPing

TingPing Nov 19, 2016
Contributor

Or just say don't split multi-byte characters.


@ProgVal

ProgVal Nov 19, 2016
Contributor

A server cannot reliably know if the client uses UTF-8 or not.


@dequis

dequis Nov 19, 2016
Contributor

Sure you can, it's trivial if you limit your scope to knowing if an individual message is valid UTF-8 or not.

You could even do it without validating the whole message. UTF-8 has a set of properties that make it easy to detect (roughly: start byte & 0xc0 == 0xc0, continuation byte & 0xc0 == 0x80, and the number of required continuation bytes). If the bytes at the splitting point follow those properties, you split the line before the start character. If they don't, it's not valid utf-8, split at the original point. This is a pretty common thing to do.

It's harder if you have to deal with other non-utf8 multibyte encodings at the same time which is one of the reasons i wouldn't bother.
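
The local check described in this comment can be sketched in Python as below. It is an illustration, not a reference implementation: `utf8_safe_cut` is a hypothetical name, and it uses the exact lead-byte masks (`110xxxxx`, `1110xxxx`, `11110xxx`) instead of the rough `& 0xc0 == 0xc0` test quoted above. It does not reject overlong or otherwise invalid sequences; it only looks at the bytes around the split point.

```python
def utf8_safe_cut(data: bytes, cut: int) -> int:
    """Move `cut` left to the start of a UTF-8 sequence if it would fall
    inside one; return `cut` unchanged if the bytes do not look like UTF-8."""
    if cut <= 0 or cut >= len(data):
        return cut
    # A byte that is not a continuation byte starts a new character,
    # so cutting in front of it is already safe.
    if data[cut] & 0xC0 != 0x80:
        return cut
    # Walk back over continuation bytes to the candidate lead byte
    # (a sequence has at most three continuation bytes).
    start = cut - 1
    while start > 0 and data[start] & 0xC0 == 0x80 and cut - start < 3:
        start -= 1
    lead = data[start]
    if lead & 0xE0 == 0xC0:
        need = 2
    elif lead & 0xF0 == 0xE0:
        need = 3
    elif lead & 0xF8 == 0xF0:
        need = 4
    else:
        return cut  # not a UTF-8 lead byte: split at the original point
    end = start + need
    if cut < end <= len(data) and all(b & 0xC0 == 0x80 for b in data[start + 1:end]):
        return start  # split before the start byte
    return cut


safe = utf8_safe_cut("ab✓".encode("utf-8"), 4)
```

Combining characters, mentioned earlier in the review, are outside what this byte-level check can detect.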


@DanielOaks

DanielOaks Nov 19, 2016
Author Member

Optional SHOULD NOT to tell servers to avoid unnecessarily breaking UTF-8 messages. We like UTF-8, this sort of a line makes sense for us.


### The `truncated` Tag

The `truncated` tag, when present, indicates that a message has been truncated due to the client's line length. It may be sent to any client which supports message tags, as deemed appropriate by the IRCd.


@ProgVal

ProgVal Nov 19, 2016
Contributor

Maybe add the number of truncated bytes/characters as a value to the tag, so clients can show something like “and X more bytes/characters”?


@DanielOaks

DanielOaks Nov 19, 2016
Author Member

Hmm, the truncated tag itself also reduces the number of characters allowed in the tags section, which makes this difficult and makes for some slightly annoying math if we want it to spit out this value. I feel like just being able to say "this message is truncated" or similar for the clients should be fine.

C: CAP LS
S: CAP * LS :maxline=2048

Similarly to standard message handling, tags and the rest of the message have separate length values. The value of the `maxline` capability represents the maximum number of bytes that the tags section, and that the rest of the message, can take up. Line length calculation is done this way in order to better integrate with methods currently used by IRC software to limit line lengths.


@luxaritas

luxaritas Nov 20, 2016

Could you explain why the tag limit and message limit can't be specified separately? I could imagine an environment (potentially such as one which I support) where many tags are needed to provide rich client capability, but a restricted message length would be preferred.


@DanielOaks

DanielOaks Jan 13, 2017
Author Member

Done, thanks very much for the feedback! This also lets it work much more nicely with the new message-tags changes that are coming.

@djahandarie

@djahandarie djahandarie commented Nov 20, 2016

A few thoughts on the basic idea of longer message lengths:

  • Rate limiting: Most IRCds rate limit on lines/second for a given PRIVMSG/NOTICE target. With longer potential line lengths, those rate limits would need to be reduced to account for worst-case floods. Unfortunately, simply reducing the rate limit can cause problems for active channels. To support both active channels and protection of users' sendq buffers, IRCds would likely need to implement bandwidth-based rate limiting.

  • Denial of service: One effective protection most IRC-based software has is that the length of any message they need to parse is bound to 512 bytes. With longer messages, inefficient parsers become exposed to a far wider range of potentially dangerous inputs.

    Putting services and normal clients/bots aside, even IRCds themselves have various parsers/algorithms that I'd be fairly worried about under long message lengths: what happens when someone provides a max-length input to a cryptographic algorithm (like SASL or oper passwords or CHALLENGE), or to a channel mode parser, or to an extban parser, etc.

    Every single reasonably complex block of code which had been able to take this max-length protection for granted would need to be tested for performance issues under long inputs.

I think there should likely be some guidance about these points in the spec.

@DanielOaks
Member Author

@DanielOaks DanielOaks commented Nov 20, 2016

Hmm, considering most users' messages (privmsgs/notices) are likely to be under 512 chars, could simply suggest something along the lines of this in a non-normative implementation considerations section:

Servers should carefully consider their rate-limiting policies when implementing this extension. For example, with a rate-limiter based on penalties applied to each client, where the server currently applies one rate-limiting penalty for messages 512-bytes and under, they may apply multiple rate-limiting penalties for larger messages to compensate.

This is particularly useful on the PRIVMSG and NOTICE commands, where the required message splitting may increase the workload on servers by a significant amount on larger messages.

Those general denial-of-service issues on the other hand... yeah that's hard to address, and becomes very interesting particularly as you look at passwords on registration/authentication. Unfortunately I don't think we can address that in the spec itself, more just something that implementers need to keep a close eye on. Maybe a similar sort of non-normative suggestion:

Most current algorithms and parsers are tuned and intended specifically to be used with 512-length messages. This extension allows servers to vastly increase these limits. As such, server, client, and services authors should all take a very close look at the algorithms and message-parsing code while investigating this extension. In particular, certain algorithms may present certain overflows or degraded performance when dealing with lines longer than were intended when those algorithms were written.

As an example, certain clients may find it worth having a minimum and a maximum limit to allow.

One example of an area to pay specific attention to is usernames/passwords when used for registration and authentication. As an example, clients may be locked out of accounts and channels when registering on a client with maxline, but later using a client without maxline to authenticate, and similar issues.

Thoughts?

edit: Better specified section and improved language, thanks @jwheare
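
The penalty-based rate limiting suggested in this comment (one penalty for messages of 512 bytes and under, several for larger ones) could look like the following Python sketch. The name `rate_limit_penalty` and the choice of one penalty per started 512-byte unit are assumptions for illustration; the comment itself leaves the exact scaling to implementers.

```python
import math


def rate_limit_penalty(message_len: int, unit: int = 512) -> int:
    """Number of rate-limiting penalties to apply to one message: one per
    started `unit` bytes, and never fewer than one."""
    return max(1, math.ceil(message_len / unit))


penalties = [rate_limit_penalty(n) for n in (100, 512, 513, 2048)]
```

With this scaling, a client sending one 2048-byte line is charged the same as a client sending four legacy-length lines.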

@grawity
Contributor

@grawity grawity commented Nov 20, 2016

SASL, at least, shouldn't be a problem since AUTHENTICATE already supports continuations, so mechanisms already expect huge messages.

Thanks @djahandarie for the advice here and for bringing this issue up.
@nomis
Contributor

@nomis nomis commented Nov 25, 2016

Is the problem with wanting increased line lengths or just with knowing when your line is going to be truncated?

I don't think defining a limit in terms of line length alone works because the final message limit varies based on the hostname that the recipient sees, which may be different from what the sender thinks their hostname is or for privileged users that can see real hostnames. The end result is that clients may still try to guess where the text will be truncated and get it wrong.

If you split messages on behalf of clients, what happens when a message with colours and other formatting characters gets split?

What happens when a message with commands (in the middle of the line) intended for a bot gets split?
i.e. a long response from bot A (negotiating very long lines) prefixed with "nickname: " is split and another bot B (negotiating very short lines) interprets text inside that long response as a command.

I'd be inclined to add a MAXMSGLEN= to 005 and reduce the current limit based on the maximum length of other components of the message...

@jwheare jwheare added the protocol label Jan 7, 2017
@jwheare jwheare modified the milestone: Roadmap Jan 7, 2017
@DanielOaks
Member Author

@DanielOaks DanielOaks commented Jan 13, 2017

I'll write the changes to let it interact much more nicely with message-tags' extension of tag lengths once those changes are merged in.

@DarthGandalf


Member

@DarthGandalf DarthGandalf commented on extensions/line-lengths.md in 73f538c Jan 13, 2017

CAP LS 302

Otherwise, server isn't allowed to add =digits to the response
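
For context, a client that sent `CAP LS 302` has to be ready for `name=value` tokens such as `maxline=2048` in the reply. A minimal Python sketch of that parsing follows; `parse_cap_ls` is a hypothetical helper that takes only the trailing parameter of a `CAP * LS` line and ignores multi-line replies.

```python
from typing import Dict, Optional


def parse_cap_ls(params: str) -> Dict[str, Optional[str]]:
    """Map each advertised capability to its value, or None if it has none."""
    caps: Dict[str, Optional[str]] = {}
    for token in params.split():
        name, sep, value = token.partition("=")
        caps[name] = value if sep else None
    return caps


caps = parse_cap_ls("maxline=2048 batch")
maxline = int(caps["maxline"]) if caps.get("maxline") else None
```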

@DanielOaks
Member Author

@DanielOaks DanielOaks commented Jan 19, 2017

There are still some q's to answer in the spec itself, but for peeps looking to test I've got an example implementation here.

@jwheare
Member

@jwheare jwheare commented Jan 19, 2017

How will this interact with message IDs?

Given a stupidly short maxline (just for the sake of the example), a maxline supporting client might see:

@msgid=123 PRIVMSG #chan :long line that may or may not get split

While a legacy client might see:

@msgid=123 PRIVMSG #chan :long line that may
@msgid=123 PRIVMSG #chan : or may not get split

Is something like a @continuation=123 tag needed for the second line so the client can treat it as a single logical message? So e.g.

@msgid=123 PRIVMSG #chan :long line that may
@continuation=123 PRIVMSG #chan : or may not get split

@DanielOaks
Member Author

@DanielOaks DanielOaks commented Jan 19, 2017

I could see a continuation tag make sense there (that works akin to that last example you posted). I'll throw that into the spec.

@DarthGandalf
Member

@DarthGandalf DarthGandalf commented Jan 19, 2017

@jwheare
Member

@jwheare jwheare commented Jan 19, 2017

Yeah perhaps a @split or @continued tag might make sense. Maybe with a value indicating how many continuations are to follow? So e.g. a theoretical message split into 3:

@msgid=123;continued=2 PRIVMSG #chan :long line that may
@continuation=123;continued=1 PRIVMSG #chan : or may not
@continuation=123 PRIVMSG #chan : get split

Dunno if the tag value is necessary, would it enable any valuable use cases?
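
To show how the tagging scheme in the three-line example above could be generated, here is a short Python sketch. The `msgid`, `continuation` and `continued` tag names are the ones floated in this thread, not a settled specification, and `tag_continuations` is a hypothetical helper.

```python
from typing import List


def tag_continuations(msgid: str, target: str, parts: List[str]) -> List[str]:
    """Tag the first part with msgid, later parts with continuation, and
    add continued=N while N more parts are still to follow."""
    lines = []
    for i, part in enumerate(parts):
        remaining = len(parts) - 1 - i
        tags = [f"msgid={msgid}" if i == 0 else f"continuation={msgid}"]
        if remaining:
            tags.append(f"continued={remaining}")
        lines.append(f"@{';'.join(tags)} PRIVMSG {target} :{part}")
    return lines


lines = tag_continuations("123", "#chan", ["long line that may", " or may not", " get split"])
```

One use of the count is that a client can tell when the final part has arrived without waiting for an unrelated message.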

@jwheare
Member

@jwheare jwheare commented Jan 19, 2017

Oops, forgot about @ mentioning usernames :/

@luxaritas

@luxaritas luxaritas commented Jan 19, 2017

Any reason why that continuation type can't be a batch tag?

@luxaritas

@luxaritas luxaritas commented Jan 19, 2017

*Continuation tag

@jwheare
Member

@jwheare jwheare commented Jan 19, 2017

Yes it could be I suppose.

@msgid=123 BATCH +foo split-msg
@batch=foo PRIVMSG #chan :long line that may
@batch=foo PRIVMSG #chan : or may not
@batch=foo PRIVMSG #chan : get split
BATCH -foo

And I guess if the client didn't enable batch, only the first message would get the tags. Although all messages could safely have e.g. an @account or @time tag. But msgid, label and client tags should probably not be duplicated.
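
The two delivery paths described above (wrapped in a batch, or plain lines where only the first carries the tag) can be sketched in Python as follows. The `split-msg` batch type is the placeholder name from the example, `deliver_split` is a hypothetical helper, and only the `msgid` tag is modelled.

```python
from typing import List


def deliver_split(msgid: str, ref: str, target: str, parts: List[str],
                  batch_enabled: bool) -> List[str]:
    """Build the lines sent to a legacy-length client for one split message."""
    if batch_enabled:
        # The msgid applies to the batch as a whole.
        lines = [f"@msgid={msgid} BATCH +{ref} split-msg"]
        lines += [f"@batch={ref} PRIVMSG {target} :{part}" for part in parts]
        lines.append(f"BATCH -{ref}")
        return lines
    # Without batch, only the first line carries the msgid tag.
    lines = []
    for i, part in enumerate(parts):
        tag = f"@msgid={msgid} " if i == 0 else ""
        lines.append(f"{tag}PRIVMSG {target} :{part}")
    return lines


parts = ["long line that may", " or may not", " get split"]
batched = deliver_split("123", "foo", "#chan", parts, batch_enabled=True)
plain = deliver_split("123", "foo", "#chan", parts, batch_enabled=False)
```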

@jwheare
Member

@jwheare jwheare commented Jan 19, 2017

@lp0:

What happens when a message with commands (in the middle of the line) intended for a bot gets split?
i.e. a long response from bot A (negotiating very long lines) prefixed with "nickname: " is split and another bot B (negotiating very short lines) interprets text inside that long response as a command.

Is this a new or real world issue? Client side splitting has the same theoretical issue but does it cause problems?

@ProgVal
Contributor

@ProgVal ProgVal commented Jan 19, 2017

Is there a client that these continuation tags or batches would benefit?

To be more specific: is there a client that will be updated to support continuation tags or batches, but not to support arbitrary line length?

@jwheare
Member

@jwheare jwheare commented Jan 19, 2017

It's a good point. It does seem like quite an edge case and may not be worth the bother.

@DanielOaks
Member Author

@DanielOaks DanielOaks commented Jan 19, 2017

Exactly, that's why I'll just go with whatever option is simplest and throw in an update to the proposal. Should go well

@SadieCat
Contributor

@SadieCat SadieCat commented Jan 19, 2017

To be more specific: is there a client that will be updated to support continuation tags or batches, but not to support arbitrary line length?

Having completely arbitrary line lengths is risky, what else should happen if the server limit is beyond that which the client limits to?

@dequis
Contributor

@dequis dequis commented Jan 19, 2017

Having completely arbitrary line lengths is risky, what else should happen if the server limit is beyond that which the client limits to?

Yeah, since this spec explicitly doesn't cover length negotiation to simplify things, clients have to choose to accept it or reject the length offered by the server, but they might still want to reassemble split messages. Either continuation tags or batches are fine for that.

@ProgVal
Contributor

@ProgVal ProgVal commented Jan 19, 2017

Why would a client refuse a given message size if it can reassemble a message of the same size afterward?

@dequis
Contributor

@dequis dequis commented Jan 19, 2017

Not necessarily the same size. For example, if the server offers 1mbyte and the client accepts up to 64kbytes, it can choose to stop reassembling incoming messages after appending that much to the original message and show them as separate messages. Most manually-written messages are going to be way smaller than the limit anyway, I'd expect almost everything to be in the range of 1-4 parts.
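
The capped reassembly described in this comment might look like the Python sketch below. It assumes the client already knows which parts belong to one logical message (for example via continuation tags or a batch); `reassemble` and `client_limit` are hypothetical names for this illustration.

```python
from typing import List


def reassemble(parts: List[bytes], client_limit: int) -> List[bytes]:
    """Join consecutive parts of one logical message, starting a new
    displayed message whenever appending would exceed `client_limit`."""
    messages = []
    current = b""
    for part in parts:
        if current and len(current) + len(part) > client_limit:
            messages.append(current)
            current = part
        else:
            current += part
    if current:
        messages.append(current)
    return messages


shown = reassemble([b"aaaa", b"bbbb", b"cc"], client_limit=8)
```

With the expected 1-4 parts per message and a limit such as 64 kbytes, the cap would rarely trigger, and everything would be shown as a single message.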

luxaritas pushed a commit to eternagame/HTML-Chat that referenced this pull request Feb 5, 2017
Hopefully this can be boosted as time goes on, particularly via ircv3/ircv3-specifications#281
luxaritas added a commit to eternagame/HTML-Chat that referenced this pull request Sep 28, 2017
Hopefully this can be boosted as time goes on, particularly via ircv3/ircv3-specifications#281
@DanielOaks
Member Author

@DanielOaks DanielOaks commented Oct 29, 2017

I'd rather go with some sort of continuation cap that works with labels than this. The boosted tag space in the new message-tags spec helps out to a decent extent anyway, and given the complicated nature of this spec, I'd rather focus on something grounded a bit more in reality.
