JSON-based protocol #55

Closed
DarthGandalf opened this Issue · 43 comments
@DarthGandalf
ircv3 member

Enableable by a cap json

E.g.

{
    "source": "nick!user@host",
    "verb": "PRIVMSG",
    "params": [
        "#channel",
        "Hello world!"
    ],
    "tags": {
        "server-time": "2014-03-10T08:28:43.126Z"
    }
}
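As a rough illustration of how a traditional line might map onto the object above, here is a minimal sketch. The function name `irc_line_to_json` is hypothetical, and the sketch assumes tag values need no unescaping; it is not a full RFC 1459 parser.

```python
import json


def irc_line_to_json(line: str) -> str:
    """Convert one traditional IRC line into the JSON shape proposed above."""
    line = line.rstrip("\r\n")
    tags = {}
    source = None

    # Optional message tags: "@key=value;key2=value2 "
    if line.startswith("@"):
        raw_tags, line = line[1:].split(" ", 1)
        for item in raw_tags.split(";"):
            key, _, value = item.partition("=")
            tags[key] = value

    # Optional source prefix: ":nick!user@host "
    if line.startswith(":"):
        source, line = line[1:].split(" ", 1)

    # The trailing parameter starts at " :" and may contain spaces.
    head, sep, trailing = line.partition(" :")
    parts = head.split()
    verb, params = parts[0], parts[1:]
    if sep:
        params.append(trailing)

    message = {"verb": verb, "params": params}
    if source is not None:
        message["source"] = source
    if tags:
        message["tags"] = tags
    return json.dumps(message)


example = (
    "@server-time=2014-03-10T08:28:43.126Z "
    ":nick!user@host PRIVMSG #channel :Hello world!\r\n"
)
print(irc_line_to_json(example))
```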

One of the issues solved by this: #54

The following has been written by @Aerdan and has been added to the first post for newcomers.

Why JSON?

The IRC protocol is structurally limited and doesn't allow for sideband information (e.g. message tags) without extending the protocol format. In addition, the IRC protocol is standardized on a strict 512-byte message, CRLF included. This means that any extension which aims to add functionality of interest to users is inherently limited in what it can do. (See extended-join for an existing example of how adding features alters standard commands.)

JSON allows us to add new properties to messages without requiring clients to support those properties. They don't have to know, mind, or care about the size of those properties, either, because they are effectively invisible to clients which don't support them. This is not the case with message-tags, because every tag has a cost proportionate to the length of its name, plus the length + 1 of its contents, if it has a value, + 1 for each tag after the first.
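The tag cost described above can be sketched as a small calculation. The helper name `tag_cost` is hypothetical, and the sketch counts only what the paragraph lists (name, `=` plus value, and `;` separators), not the leading `@` or the space after the tag block.

```python
from typing import Dict, Optional


def tag_cost(tags: Dict[str, Optional[str]]) -> int:
    """Bytes that a set of message tags takes out of the line budget."""
    cost = 0
    for index, (name, value) in enumerate(tags.items()):
        cost += len(name.encode("utf-8"))
        if value is not None:
            # '=' plus the value itself
            cost += 1 + len(value.encode("utf-8"))
        if index > 0:
            # ';' separator for each tag after the first
            cost += 1
    return cost


print(tag_cost({"server-time": "2014-03-10T08:28:43.126Z"}))
```

Every byte counted here is a byte that cannot be used for the message itself, which is the limitation the paragraph is pointing at.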

Well, why not XML/protobuf/msgpack/insert message format here?

JSON is the most appropriate choice because we want a text-friendly protocol that plays well with real humans. It is also widely-supported, with implementations for every language anyone could possibly want to write IRC software in, unlike other formats. We don't want XML because XML is inherently unsuitable for messaging, particularly when people routinely write messages with control codes inside.

But this is going to break IRC! Waaaaah!

What? You mean it's not already broken? CAP exists because there was no way for clients to negotiate new features. Many other IRCv3 features exist to make the IRC experience better, and almost all of them could not exist without CAP.

But... But...

Any comments from this point on which are not constructive or additive to this debate will be purged with extreme prejudice. (Most of) You are adults. Act like it.

@DarthGandalf DarthGandalf added this to the IRCv3.3 milestone
@kaniini

Too much semantic data, needs to be simpler to make sure IRCd actually implements this. Think more like:

{'tags': {...}, 'payload': [...]}.

@Shawn-Smith

@kaniini I'm kind of on edge regarding what you're saying about it being simpler. If we make it too simple it'll be the exact same thing as before, just being sent with the additional overhead of JSON.

Eg:

{'tags': {sometags}, 'payload': 'PRIVMSG #derp SomeMessage' }

I don't think we should try to aim for something like this because it doesn't really help us extend functionality.

Now something like this for example:

{
    'tags':{
        'accountname': 'Shawn',
        'server-time': '2011-10-19T16:40:51.620Z',
        'some-other-tag': 'derpderp'
    },

    'source': {
        'nick': 'nick-name',
        'ident': 'ident',
        'host': 'host'
    },

    'verb': 'PRIVMSG',

    'destination': 'nick/#chan',

    'message': 'Hey! Look at me.'
}

Would give us more room to improve on. Say we wanted to expand PRIVMSG to also send the real-name of the user sending the message, no problem, just throw in a 'realname': 'name' and clients that haven't added that will ignore it and only use the first 3, while others can be updated to use this.

Since we're not changing the way source is parsed for older clients we don't need to worry about the breakage like we would if we were to change source from nick!ident@host to nick!ident@host:real
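A minimal sketch of that "ignore what you don't know" behaviour, assuming the structured `source` object from the example above. `describe_sender` is a hypothetical client helper, and the input uses valid double-quoted JSON rather than the single quotes used in this comment.

```python
import json

# The only source keys this (older) client knows about.
KNOWN_SOURCE_KEYS = ("nick", "ident", "host")


def describe_sender(raw: str) -> str:
    """Build nick!ident@host from the keys the client understands."""
    message = json.loads(raw)
    source = message.get("source", {})
    nick, ident, host = (source.get(key, "") for key in KNOWN_SOURCE_KEYS)
    return f"{nick}!{ident}@{host}"


raw = json.dumps({
    "verb": "PRIVMSG",
    "source": {
        "nick": "nick-name",
        "ident": "ident",
        "host": "host",
        "realname": "name",  # new key; silently ignored by this client
    },
    "destination": "#chan",
    "message": "Hey! Look at me.",
})
print(describe_sender(raw))
```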

Here's an example for JOIN

{
    'tags':{
        'accountname': 'Shawn',
        'server-time': '2011-10-19T16:40:51.620Z'
    },

    'source': {
        'nick': 'nick-name',
        'ident': 'ident',
        'host': 'host'
    },

    'verb': 'JOIN',

    'destination': '#channel',

    'key': 'optional key'
}

Now we could also expand on the destination in this command, possibly making it accept more than one channel, like 'destination': ['#chan1', '#chan2'].
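If `destination` may be either a single string or a list, a receiver would probably want to normalise it first. A small sketch, where the helper name `destinations` is an assumption:

```python
def destinations(message: dict) -> list:
    """Return the destination(s) of a message as a list of strings."""
    dest = message.get("destination", [])
    if isinstance(dest, str):
        return [dest]
    return list(dest)


print(destinations({"verb": "JOIN", "destination": ["#chan1", "#chan2"]}))
```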

All-in-all we are already wildly changing the protocol by wanting to use JSON, we should use this opportunity to give room for future expansion.

@DarthGandalf This doesn't really solve issue #54 as we still have to support users without this capability.

@attilamolnar
ircv3 member

Too much semantic data, needs to be simpler to make sure IRCd actually implements this

This is actually the most important thing. :+1:

@wodim

Interesting joke we have in here.

@SaberUK
'verb': 'PRIVMSG'

I think we should try to stick to the terminology used in RFC 1459 which is "command" rather than "verb".

@Shawn-Smith

@SaberUK I'm fine with that, I wasn't sure what to use so I used verb since @DarthGandalf did as well

@kythyria

Enableable by a cap json

Then you'd still have to have enough of an RFC1459 parser to do capability negotiation.

@Shawn-Smith Having a "destination" field would be nice. Although how would that go with services?

{
    "verb": "privmsg",
    "destination": "chanserv",
    "params": ["access", "#ircv3", "list"]
}

This is not a significant improvement. Moving params[1] out into a separate target field would be better, as would something like this:

{
    "verb": "cs",
    "destination": "#ircv3",
    "params": ["access", "list"]
}

Not quite target but it matches commands that are built into the ircd better.

@jess-lawrence-axomic

Reinvent the wheel, because?

@wodim

As @jess-lawrence-axomic says, the fact that almost everything in computers can be expressed in JSON, doesn't mean that we should do it.

@dwfreed

Then you'd still have to have enough of an RFC1459 parser to do capability negotiation.

That's always going to be required, regardless. You can't convert the entire protocol, because you still have to support old clients (irssi is nearly dead, for example, but is one of the largest used non-graphical IRC clients). But realistically, all you need to understand is 'CAP REQ json' and 'CAP * ACK :json', and then you can just drop into JSON.
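A minimal sketch of that "just enough RFC 1459 to switch over" idea. The `Connection` class is hypothetical, and it assumes that after the ACK each JSON message arrives on its own line, which this thread has not actually decided.

```python
import json


class Connection:
    """Tracks whether a connection has dropped into JSON mode."""

    def __init__(self):
        self.json_mode = False

    def hello(self) -> bytes:
        # Request the (proposed) json capability.
        return b"CAP REQ json\r\n"

    def feed_line(self, line: bytes):
        """Handle one received line; return a parsed object in JSON mode."""
        text = line.decode("utf-8").rstrip("\r\n")
        if self.json_mode:
            return json.loads(text)

        # Drop an optional ":server.name " prefix.
        if text.startswith(":"):
            _, _, text = text.partition(" ")

        parts = text.split()
        acked = [word.lstrip(":") for word in parts[3:]]
        if parts[:3] == ["CAP", "*", "ACK"] and "json" in acked:
            self.json_mode = True
        return None


conn = Connection()
conn.feed_line(b":irc.example.org CAP * ACK :json\r\n")
print(conn.json_mode)
```

Everything other than the `CAP ... ACK` line is ignored before the switch, so no general-purpose parser for the old format is needed on the client side.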

@Shawn-Smith Having a "destination" field would be nice. Although how would that go with services?

It doesn't. The IRCd you're connecting to would have to translate the JSON into something else, usually TS6, to hand off to services, because services never actually handles client-style IRC messages, only server-to-server messages (though server-to-server PRIVMSG is almost identical to client PRIVMSG).

This is not a significant improvement. Moving params[1] out into a separate target field would be better

This won't work. For the reason above, the IRCd must translate to the server-to-server protocol, and would have no idea where your specified channel or nick should go in the parameters list (and you can't assume last, because it could go in the middle), and the IRCd really shouldn't care what services you have attached to it. Services accommodating the IRCd through protocol modules in the services daemon (the current way) is much better than the IRCd accommodating services, especially in large networks.

@Ell

Why JSON and not something like protobuf?

@kythyria

@dwfreed I don't really agree with so much separation between ircd and services. In any case, this could be an opportunity to regularise the user-to-services protocol(s), and then have new protocol modules in the services daemon. It's not like it'll break any clients.

Besides, current ircds and services implementations have a tendency to not be well enough integrated: In unreal, for instance, +h makes you able to kick whether or not services thinks you should be able to.

@Ell Because JSON is at least textual. I'm not sure protobuf can cope with unexpected schema changes as well as JSON, either.

@Shawn-Smith

Besides, current ircds and services implementations have a tendency to not be well enough integrated: In unreal, for instance, +h makes you able to kick whether or not services thinks you should be able to.

In all honesty I hate +qah, I see no reason why +ov is not enough. Especially since +q was used as a quiet mode before it was added as an owner/founder mode, but that's getting a bit off-topic.

I think we should focus on getting a client<->server json protocol working, and then after that's good and working release an update-cap json-services that will better implement user<->services.

Edit :: Also, before we start working on user<->services integration I think we should work more on standardizing services themselves, things like db format, which flags do what, etc.

Some things I would like to see done for services:

  • Flags Flags Flags Flags! Flags are an amazing thing, and are incredibly extensible. Instead of having +AOo, etc though, we should map what we currently have to words. Eg: Flags +CanReadAccess +GainsOpOnJoin +CanOpOtherUsers, etc. Adding new ones would not be hard after this.

  • XOP system should be converted from whatever it currently does in its specific implementation to setting certain flags on the user. Eg: XOP OP would give flags +CanReadAccess +GainsOpOnJoin +CanOpOtherUsers, or something.

  • Templates would be an amazing replacement for the current XOP stuff. Set templates for VOP, HOP, AOP, SOP, and Founder that can be used network-wide and allow users to fine-tune the templates on a per-channel basis. Atheme already allows for this.

  • Access. I have mixed feelings about access. It was what I 'grew up' using. It's something many networks use, but it's not very extensible. It's basically using a scale rule [-----|--]. The farther up the rule you go (the higher the access number) the more privileges you get. However you can not pick and choose which privileges you want to give a user. It's sort of a "you get everything from here and below" deal.
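The flags and templates ideas in the list above could be sketched roughly as follows. Only the OP mapping comes from the list; the data layout, the `flags_for` helper and the per-channel override example are assumptions for illustration.

```python
# Network-wide templates: a template name maps to a set of word flags.
NETWORK_TEMPLATES = {
    "OP": frozenset({"CanReadAccess", "GainsOpOnJoin", "CanOpOtherUsers"}),
}


def flags_for(template: str, channel_templates: dict = None) -> frozenset:
    """Resolve a template name, letting a channel override the network default."""
    templates = dict(NETWORK_TEMPLATES)
    templates.update(channel_templates or {})
    return templates[template]


# A channel that fine-tunes OP so that ops cannot op other users.
custom = {"OP": frozenset({"CanReadAccess", "GainsOpOnJoin"})}
print(sorted(flags_for("OP")))
print(sorted(flags_for("OP", custom)))
```

Unlike a numeric access level, each flag here can be granted or withheld independently.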

Edit 2 :: Issue #56 has been opened to work on standardizing services

@seshakiran

why can't it be msgpack?

@SaberUK

I don't really see any benefits of MessagePack over JSON which would be relevant to IRC.

@dwfreed

As @jess-lawrence-axomic says, the fact that almost everything in computers can be expressed in JSON, doesn't mean that we should do it.

The reason for this proposal, and this entire working group in general, is to extend the IRC protocol to make it more feature rich, and improve its extensibility. The current RFC1459 based protocol is very limiting, as you can only extend existing messages by tacking on parameters to the end of commands or numerics, or adding new ones (which becomes a mess in coordinating the additions). This JSON proposal opens the field for a lot more extensions that would have been limited or difficult to do within the limits of RFC1459 (though I'm not sure on whether one cap can require another cap). As an example, WHOX is a major pain to handle, as the numeric it uses (354) doesn't have a fixed format. The creators of WHOX worked around this by allowing you to specify a 3 character tag which would come first, and allow you to identify which WHOX you asked for so you know what parts translate to what, but it's a bit of a hack. With JSON, a proposal could easily be made to say WHOX replies are just objects, and everything you asked for is a property of the object, so you can just iterate over the list of properties, and pull out that data in a simpler fashion.
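To make the WHOX point concrete, here is a sketch of what consuming such a reply could look like. The reply shape (a `WHOX` verb with `nick`, `account` and `host` properties) and the helper name `whox_fields` are purely hypothetical, since no JSON WHOX format has been proposed here.

```python
import json

# Keys that belong to the message envelope rather than the WHOX data.
ENVELOPE_KEYS = ("verb", "source", "tags")


def whox_fields(raw: str) -> dict:
    """Return only the requested WHOX properties from a JSON reply."""
    reply = json.loads(raw)
    return {key: value for key, value in reply.items() if key not in ENVELOPE_KEYS}


raw_reply = json.dumps({
    "verb": "WHOX",
    "nick": "alice",
    "account": "alice_acct",
    "host": "example.org",
})
for name, value in whox_fields(raw_reply).items():
    print(name, value)
```

The client never has to remember which positional layout it asked for, because each value arrives with its own name.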

Why JSON and not something like protobuf?

@Ell Because JSON is at least textual. I'm not sure protobuf can cope with unexpected schema changes as well as JSON, either.

why can't it be msgpack?

I don't really see any benefits of MessagePack over JSON which would be relevant to IRC.

Indeed. The main point of protobuf and msgpack is to take up as little overhead on the wire and be as fast as possible. For IRC requirements, the speed of JSON generation is fast enough for large scale use, and the overhead on the wire isn't large enough to be a major issue.

@attilamolnar attilamolnar added RFC and removed RFC labels
@khm

Nobody has specified any reason for switching to json aside from "it will make it easier to implement other bad ideas."

If you want XMPP, you know where to find it.

@kythyria

@Shawn-Smith If I wanted to keep +vhoaq I'd probably add some way of letting services define what they mean, so yes, templates, I guess. When going that route I was thinking being able to do +halfOpsCan kick,voice and so forth, and then granting people +h. But writing that as +CanKick ~v +CanVoice ~v or whatever and allowing individual names seems better in terms of allowing fine grainedness.

I kind of like that +hoaq displays roughly the hierarchy of who's got powers.

In short, I think all those bullets are :+1:, at least with the caveat that some easily displayable representation of authority remains, and that this enables using MODE, KICK, etc in preference to messaging chanserv.

@Ell

And what of the bandwidth concerns? If I have a channel with 1000 people in it actively conversing the overhead of sending JSON back and forth will be immense.
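The per-message overhead can be measured rather than guessed. This sketch compares the example message from the first post in both encodings; it assumes compact JSON (no insignificant whitespace) terminated by a single newline, which is only one possible framing.

```python
import json

message = {
    "source": "nick!user@host",
    "verb": "PRIVMSG",
    "params": ["#channel", "Hello world!"],
    "tags": {"server-time": "2014-03-10T08:28:43.126Z"},
}

# The same message as a traditional line with a message tag.
irc_line = (
    "@server-time=2014-03-10T08:28:43.126Z "
    ":nick!user@host PRIVMSG #channel :Hello world!\r\n"
)
json_line = json.dumps(message, separators=(",", ":")) + "\n"

irc_size = len(irc_line.encode("utf-8"))
json_size = len(json_line.encode("utf-8"))
print(irc_size, json_size, json_size - irc_size)
```

The difference is a fixed cost per message (key names and punctuation), so it matters most for short messages sent to many recipients.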

@robotoad

@wodim Using a widely known data exchange format like JSON seems like a decent attempt to move IRC forward into this century. The standardized text encoding alone would be worth it. Implementing a modern client for the Telnet-based protocol and its collection of hacks and subprotocols is mildly depressing. From an advocacy standpoint, a bold update like this would draw new interest from developers and drastically improve support for future extensibility.

@Ell

@prestonsumner why does IRC need to be moved forward in this way? The RFC for IRC is simple as hell and adding JSON to it won't make it simpler to parse (IRC lines are indeed super easy to parse, you should try it). It honestly sounds like you just want XMPP but.. with JSON? Why even use IRC?

@tef

One major caveat for JSON is there is enough unspecified to make things a bit awkward, so you may want to specify some things left out in the RFC.

- JSON is defined as being 'Unicode', but will you accept UTF-16, UTF-32, or the myriad of other encodings? There are even awful variants of UTF-8 too.
- JSON is actually closer to UCS-2 in practice, so characters outside of the BMP may be represented as two unicode escape sequences, for the low and high surrogate pair.
- Will the Unicode be normalized after parsing?
- How should invalid codepoints be handled?
- What happens to duplicate keys in objects?

(You can see some of these in action on the README for JSONKit)

The key problem is that each of these introduces an ambiguity when parsing or interpreting for commands, and there is plenty of history (cf. langsec) of these ambiguities being exploited.

This could be a good reason to consider other formats, but mostly it's something to bear in mind if you do pick JSON.
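Some of these ambiguities can be closed off at the parser. Below is a sketch of one possible strict policy (UTF-8 only, duplicate keys rejected). The policy itself is an assumption and not something agreed in this thread; the function names are hypothetical. Python's `json.loads` silently keeps the last duplicate by default, which is exactly the kind of ambiguity described above.

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that refuses objects with repeated keys."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result


def parse_strict(raw: bytes) -> dict:
    """Parse one message, accepting only valid UTF-8 and unique keys."""
    text = raw.decode("utf-8", errors="strict")  # raises on invalid bytes
    return json.loads(text, object_pairs_hook=reject_duplicates)


print(parse_strict(b'{"verb": "PING"}'))
```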

@ednos

@prestonsumner Using a widely known data exchange format, like, I dunno, the mind-bogglingly simple IRC protocol specification, seems like a decent attempt to keep IRC right where it belongs, in any century with sane humans. From an advocacy standpoint, bogging down a trivially easy-to-implement protocol like IRC with all this nonsense overhead will make some laugh, others cry, and people on low-bandwidth connections rage. Find a real benefit that doesn't involve puking buzzwords.

@pmrowla

@Ell why does IRC need to be moved forward in this way? The RFC for IRC is simple as hell and adding JSON to it won't make it simpler to parse (IRC lines are indeed super easy to parse, you should try it). It honestly sounds like you just want XMPP but.. with JSON? Why even use IRC?

this

@hlandau

This seems like a dubious proposal to me, in no small part because I have yet to hear any use cases justifying the added complexity, or any way in which the IRC user experience could be improved by it.

Compatibility Issues, Network Effect, Adoption

The IRC data model and core semantics are basically concrete at this phase. By moving to JSON, you gain the freedom to move beyond the existing message semantics and data model, but you'd have to mandate mappability to the traditional IRC protocol to maintain compatibility, in which case there's no point moving to JSON in the first place. Conversely, if you don't maintain compatibility, there seems little point in labelling this as 'IRC' at all; without compatibility, you lose the Network Effect-based hope of reaching widespread adoption and the ability to take advantage of existing code deployments. See SILC and PSYC, and the general tendency for the first protocol to do X put into production on the internet to become the unreplaceable one, except by gradual compatible changes. AFAIK, even these changes rarely accomplish significant changes to the protocol's semantics and data model (one exception which comes to mind is Server Push in SPDY, but this is acceptably compatible with HTTP caching semantics).

If you -do- retain mappability, the transition itself is useless, but may create the opportunity for non-compatible changes down the line. But IRC software developers are unlikely to implement JSON without a compelling reason to, and any such compelling reason would have to break compatibility to justify JSON, etc. etc. etc.

Choice of Format

I'm also a little dubious that JSON represents the best choice of format. Alternatives such as bencoding or Canonical S-expressions or a variant thereof could be considered. Though I concede that in practice, the recent popularity of JSON is likely to mean anything else would have lesser political viability by comparison.

Existing IRC software is able to make assumptions about maximum line length, which aids processing efficiency. How is the JSON proposal reconciled with this? Will JSON messages have a maximum byte length for their encoded form?

Further, will JSON messages be newline-terminated, or length prefixed, or will bare JSON be sent? The latter would require a resumable parser to be used and would complicate matters in any involved software.
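One of the options raised here, newline termination with a maximum encoded length, can be sketched as follows. The 8192-byte limit and the `LineFramer` class are assumptions for illustration. Newline framing works because a JSON string cannot contain a raw newline (it has to be written as `\n`), so a sender that emits compact JSON never produces one inside a message.

```python
import json

MAX_MESSAGE_BYTES = 8192  # hypothetical limit, not taken from any spec


class LineFramer:
    """Splits a byte stream into newline-terminated JSON messages."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, data: bytes) -> list:
        """Add received bytes and return every complete message so far."""
        self._buffer.extend(data)
        messages = []
        while True:
            newline = self._buffer.find(b"\n")
            if newline == -1:
                break
            line = bytes(self._buffer[:newline])
            del self._buffer[:newline + 1]
            if len(line) > MAX_MESSAGE_BYTES:
                raise ValueError("message too long")
            if line.strip():
                # A trailing "\r" is JSON whitespace, so CRLF also works.
                messages.append(json.loads(line))
        if len(self._buffer) > MAX_MESSAGE_BYTES:
            raise ValueError("message too long")
        return messages


framer = LineFramer()
print(framer.feed(b'{"verb":"PI'))
print(framer.feed(b'NG"}\r\n{"verb":"PONG"}\n'))
```

With this framing the receiver can keep a bounded buffer, much as existing software does with the 512-byte line limit, and no resumable JSON parser is required.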

As for @kaniini's simplified proposal
{'tags': {sometags}, 'payload': 'PRIVMSG #derp SomeMessage' }
We already have IRCv3 message tags for this, so why go full-on JSON?

Miscellanea

@prestonsumner "The standardized text encoding alone would be worth it."
If you want an i_promise_to_use_utf8_at_all_times cap, why not suggest one? It sounds like a good idea to me.

@Shawn-Smith Shawn-Smith referenced this issue
Closed

IRC Services #56

@robotoad

@Ell You can't go by the RFC to implement modern IRC support. It's incomplete and in some cases inaccurate, as the popular networks long ago superseded it with new conventions, and there are subprotocols in use that the RFC doesn't cover.

@ednos If you're parsing PRIVMSG lines for your bot using some regex you found online, then IRC can be mind-bogglingly simple, but if you're implementing a full-featured client, it's not. When I saw this proposal, I expected pushback over "buzzwords" from naysayers to pop up, and I roll my eyes at it.

@tef YOSPOS

@grawity grawity added the bikeshed label
@ednos

@prestonsumner Thanks for labeling your opponents "naysayers". If trivial text-parsing is too hard for you, maybe online communications isn't really your thing. Why don't you find an easier way to pass the time? The difficulties of creating a full-featured client have little to do with the protocol and much to do with the interface.

@Aerdan

Okay. Here we go.

Why JSON?

The IRC protocol is structurally limited and doesn't allow for sideband information (e.g. message tags) without extending the protocol format. In addition, the IRC protocol is standardized on a strict 512-byte message, CRLF included. This means that any extension which aims to add functionality of interest to users is inherently limited in what it can do. (See extended-join for an existing example of how adding features alters standard commands.)

JSON allows us to add new properties to messages without requiring clients to support those properties. They don't have to know, mind, or care about the size of those properties, either, because they are effectively invisible to clients which don't support them. This is not the case with message-tags, because every tag has a cost proportionate to the length of its name, plus the length + 1 of its contents, if it has a value, + 1 for each tag after the first.

Well, why not XML/protobuf/msgpack/insert message format here?

JSON is the most appropriate choice because we want a text-friendly protocol that plays well with real humans. It is also widely-supported, with implementations for every language anyone could possibly want to write IRC software in, unlike other formats. We don't want XML because XML is inherently unsuitable for messaging, particularly when people routinely write messages with control codes inside.

But this is going to break IRC! Waaaaah!

What? You mean it's not already broken? CAP exists because there was no way for clients to negotiate new features. Many other IRCv3 features exist to make the IRC experience better, and almost all of them could not exist without CAP.

But... But...

Any comments from this point on which are not constructive or additive to this debate will be purged with extreme prejudice. (Most of) You are adults. Act like it.

@Ell

@prestonsumner the pushback isn't from buzzwords, it's from shoving JSON into things where it doesn't belong. It sounds like the main issue is IRCds not implementing things to spec, not the spec itself. The protocol is working fine, it's just the edge cases involved in the various IRCds that are the issue. Wouldn't a better goal be to have these IRCds get together and agree on specification implementations? Right now it sounds like you want to create a new chat system that calls itself IRC but is in fact not IRC.

@pmrowla

@Aerdan there are protobuf bindings for just about any language I can think of that you would want to write an IRC client in. And as it has been said already, using something like protobuf would save a substantial amount of bandwidth, especially in channels with very large numbers of users.

@kythyria

@tef
Unicode: RFC-compliant UTF-8 everywhere. Surrogate pairs: see previous answer. Normalisation: Doesn't matter which one is picked so long as one is. Invalid codepoints: Client complains or silently converts to U+FFFD as desired, server kills the connection. Duplicate keys: That's an error. Kill the connection.
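The client-side half of those answers (convert invalid input to U+FFFD, pick one normalisation) could look like the sketch below. NFC is an arbitrary choice here, since the comment only says that one form should be picked, and the helper names are hypothetical.

```python
import unicodedata


def decode_lenient(raw: bytes) -> str:
    """Decode as UTF-8, replacing invalid sequences with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


def normalise(text: str) -> str:
    """Apply a single agreed normalisation form (NFC assumed here)."""
    return unicodedata.normalize("NFC", text)


print(repr(decode_lenient(b"caf\xff")))
print(repr(normalise("e\u0301")))
```

A server following the same answers would instead decode with `errors="strict"` and close the connection on failure.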

@Aerdan That people routinely send messages with control codes in isn't pleasant, but I don't see any significantly less shitty way to do rich text (CTCP needs to die in a fire, so it doesn't count).

XML is... for all I'm okay with it, it's not really optimal here. Nor is protobufs, given that it's not eyeballable. And the bandwidth you lose on JSON is probably more than made up for by the things that get rid of WHO polling.

@Aerdan

@pmrowla Maximal byte-wise efficiency is not a goal. That being said, protobuf is not extensible in the ways we want (you specifically have to mark extension points in the schema, I'm given to understand), which makes it unsuitable for our needs even if we were considering maximal efficiency.

(Incidentally, I meant it when I said I'd be purging nonconstructive/non-additive comments.)

@Aerdan Aerdan added RFC and removed bikeshed labels
@Ell

@Aerdan is mentioning XMPP nonconstructive? That's what the deleted comment mentioned.

@hlandau

@Aerdan
This doesn't make sense. You're implying that message size is a concern with the IRC protocol (as extended with message tags) and not with JSON, but limitations on message size are orthogonal to the choice of format. If you want to be able to attach additional data to messages willy-nilly, the simplest solution is to negotiate an expanded maximum message size, or some sort of dynamic message size restrictions. JSON vs. RFC protocol has no bearing on this issue.

Additionally, I reassert my above comments on parsing and message size issues for JSON. Are you suggesting that the JSON encoding should be free of message size limits? And how would you prefer for JSON messages to be delimited? Newlines? Length prefixes? Grammatically, via a resumable parser?
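For comparison with newline delimiting, here is a sketch of the length-prefix option. The 4-byte big-endian prefix and the function names are assumptions; nothing in this thread specifies a prefix format.

```python
import json
import struct


def encode_frame(message: dict) -> bytes:
    """Serialise a message as a 4-byte big-endian length plus a JSON body."""
    body = json.dumps(message, separators=(",", ":")).encode("utf-8")
    return struct.pack("!I", len(body)) + body


def decode_frames(buffer: bytes):
    """Return (complete messages, leftover bytes) from a receive buffer."""
    messages = []
    offset = 0
    while len(buffer) - offset >= 4:
        (length,) = struct.unpack_from("!I", buffer, offset)
        if len(buffer) - offset - 4 < length:
            break  # body not fully received yet
        start = offset + 4
        messages.append(json.loads(buffer[start:start + length]))
        offset = start + length
    return messages, buffer[offset:]


stream = encode_frame({"verb": "PING"}) + encode_frame({"verb": "PONG"})
print(decode_frames(stream + b"\x00\x00"))
```

A length prefix lets the receiver reject oversized messages before reading them, at the cost of no longer being readable line by line over a plain telnet-style connection.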

@pmrowla
I'd be against protobufs because it tends to be oriented towards nightmarish code generators. I don't feel that this is a particularly elegant solution.

@Aerdan

@ell We are discussing JSON as a protocol format. Unless you are drawing comparisons to XMPP which are actually germane to this discussion, no.

@ednos

How is referring to another protocol that has already solved this problem not constructive? The suggestions here clearly resemble XMPP more than they resemble IRC, so is it really that unreasonable to suggest that starting from a more modern foundation could greatly simplify this discussion?

@Aerdan

Any conversation that draws XMPP into it in a way that doesn't assist in formulating a JSON protocol is going to get emotional, if it hasn't already. So, no. Knock it off, please.

More to the point, XMPP does not solve the 'multiple users chatting at once' problem in any meaningful way.

@ednos

@Aerdan If you're discussing JSON as a protocol format, referring to related protocols (and their formats) seems constructive, unless you intend to construct the wheel yourself. If JSON is truly the best format for messaging, you should be open to adopting the best protocol for it, rather than using the shoehorn of dogma.

@Aerdan

But...you're not discussing XMPP in any meaningful way. A fair few comments talking about XMPP have been of the "why not just go use XMPP and forget about IRC?" variety, which is certainly not constructive.

@robotoad

@ednos My interest in this proposal has little to do with text parsing. I don't even care much that it's JSON. I'm interested in addressing real issues with the protocol, such as the example @dwfreed gave regarding WHOX. Today's protocol is a bit of a mess when you go beyond simple chat bots and try to implement a full-featured client.

What interests me most about this proposal is the extensibility it would provide, and that's what I meant about drawing in new developer interest by removing many of the limits imposed today. I believe more developer support ultimately leads to more users exposed to IRC.

@Ell It's not my intent to dismiss criticism, but you can see that the mention of JSON has brought out that contingent. Is there an alternative you'd suggest since you believe JSON doesn't "belong"? Or do you believe the protocol is fine as it is?

@attilamolnar attilamolnar removed this from the IRCv3.3 milestone
@kythyria

@Aerdan, @ednos: XML doesn't map onto JSON entirely cleanly, so XMPP isn't really relevant given we're discussing wire formats.

Some things from it are relevant, like being a better concept than CTCP (except for ACTION), but those concepts aren't much related to how things look on the wire. And I can't really recommend XMPP over this hypothetical JSON-IRC, for all I like it. A lot of the places where being made of XML really shines aren't something IRC is meant for.

@prestonsumner Most of the complexity of IRC isn't in the wire format, but I'll grant fishing things out of JSON might make it a bit nicer.

@attilamolnar attilamolnar removed the bikeshed label
@kaniini

Great, we have a bunch of idiots on the bug.

I guess we need to go back to JIRA so we can set appropriate participation levels.

@ednos

@prestonsumner If new developers aren't coming out of the woodwork to write IRC clients (was that your goal?), it might indicate the need to build robust IRC libraries. What do you envision a client requiring, protocol-wise, that a chat bot doesn't also require?

@Aerdan Did you miss http://xmpp.org/extensions/xep-0045.html or does that not satisfy the "multiple users chatting at once" problem?

@kythyria It doesn't need to be a trivial s/XML/JSON/ operation, but it seems clear to me that XMPP already solves many of the problems with chat protocol extensibility, and beginning from XMPP would require significantly less work than starting from IRC, for many of the reasons cited on JSON's behalf above.

@kaniini That was rather insightful; you've brought a lot to the table here that those XMPP fanatics will just never understand. Let's pity their small minds together.

@robotoad

@Ell How would you propose implementing new extensions to the current protocol while maintaining compatibility with existing clients? That's the intent here.

@grawity grawity added the enhancement label
@kaniini

@ednos so you believe it is acceptable for someone to post something on hacker news and then have a bunch of yuppies disrupt actual work? github refuses to do anything about this trend, and this is the latest victim in it. people seem to believe that actual bugs being used for actual work are a place for their soapboxing -- they are not.

Since this bug is trashed, I am closing it out. We will create another issue at a later date since we do not have JIRA available anymore and do not wish for people to bikeshed and soapbox in our workspace. I will certainly use this bug as an argument for bringing JIRA back at Atheme, though.

@kaniini kaniini closed this