Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add `resume` extension #306

Open
wants to merge 26 commits into
base: master
from

Conversation

@DanielOaks
Copy link
Member

DanielOaks commented Apr 13, 2017

Occasionally, clients disconnect from IRC. What happens these days is that the client connects with a different nick, joins all their old channels again, waits for the old connection to time out (or manually kills it using services), and then changes back to their original nickname.

This feature intends to vastly simplify this form of reconnection, and reduce the amount of nick-switching, JOIN and QUIT notices, and general disruption that other clients see when this happens.

Note that this is not related to detaching and re-attaching an active connection (such as if a client is forced to change servers), but only to streamline the reconnection process for unintended connection drops.

Rendered specification

Show resolved Hide resolved extensions/resume.md Outdated
Show resolved Hide resolved extensions/resume.md Outdated
Show resolved Hide resolved extensions/resume.md Outdated
@DanielOaks

This comment has been minimized.

Copy link
Member Author

DanielOaks commented Apr 13, 2017

The idea around using CHGHOST rather than an explicit separate verb for noting resumed connections.

One of the main things to convey when a reconnection happens is that message history may have been lost, which is why we send an explicit QUIT+JOIN for clients that don't support this capability. If we used CHGHOST with a resume tag, clients with message tags and resume would receive the CHGHOST, clients with resume but no tags would have to receive the QUIT+JOIN so the user knows that the connection was explicitly disconnected and reconnected. As well, not requiring message tags means that this can be implemented on clients and servers that don't currently support them.

Using an explicit new message reduces the dependencies on other extensions, which in this case I think is a good thing.

Show resolved Hide resolved extensions/resume.md Outdated
Show resolved Hide resolved extensions/resume.md Outdated
Show resolved Hide resolved extensions/resume.md Outdated
Show resolved Hide resolved extensions/resume.md
Show resolved Hide resolved extensions/resume.md Outdated

@jwheare jwheare modified the milestone: Roadmap May 9, 2017

@jwheare
Copy link
Member

jwheare left a comment

How might this interact with a METADATA spec or anything else like that? I would expect all the client set metadata to be restored when resuming.

Probably needs to be documented in either this spec or the revised metadata spec or both, but not urgent now as neither of them are mergeable yet.

Show resolved Hide resolved extensions/resume.md Outdated
Show resolved Hide resolved extensions/resume.md Outdated

## Examples

A client with the nick `dan` reconnecting. The old connection used the username `~old` and the host `192.168.0.5`, and the new connection uses the username `~d` and the host `127.0.0.1`:

This comment has been minimized.

@jwheare

jwheare Aug 9, 2017

Member

Mention new nick is dan- and ideally choose a more obviously different name.

Show resolved Hide resolved extensions/resume.md Outdated
Show resolved Hide resolved extensions/resume.md Outdated
@prawnsalad

This comment has been minimized.

Copy link

prawnsalad commented Aug 21, 2017

There's no mention of the length of time a server may allow a connection to be resumed. Is the intended use to be able to resume after 5mins of disconnection? 1 hour? Or x2 ping timeout length?

And I'm assuming that once the RESUME cap has been negotiated that's when the server will not simply quit the user after a ping timeout. Which leads me to think the cap should also advertise the length of time a connection can be resumed within?

@prawnsalad

This comment has been minimized.

Copy link

prawnsalad commented Aug 21, 2017

As mentioned on other specs, scoping any errors to the command will help tidy things up and remove the need for extra numerics. Eg. resume ERR_CANNOT_RESUME.

@DanielOaks

This comment has been minimized.

Copy link
Member Author

DanielOaks commented Aug 21, 2017

Honestly I haven't given thought to the possibility of letting/having the server extend the ping timeout length when the cap's negotiated, or similar. I suppose servers could definitely do so if they want to, that'd be interesting. Maybe the cap's value could be a series of k/v's similar to sts' params.

Right now, it's intended to just allow reconnection up to the point where the server disconnects the client due to ping timeout, but allowing servers to make it more clear (and extend how long the client connection stays open if they've negotiated the right cap) definitely makes sense, thanks for the yell with that!

@prawnsalad

This comment has been minimized.

Copy link

prawnsalad commented Aug 21, 2017

So if I've understood this correctly, the only way to make use of this is to guess the servers ping timeout length, and ping the server from the client at an interval shorter than that to try catch it before the server does. This sounds messy and ripe for a connection race condition.

Having an increased timeout once the cap has been negotiated will make this more useful.

@slingamn

This comment has been minimized.

Copy link

slingamn commented Feb 12, 2019

That sounds correct. So this would be an alternative server-side implementation strategy for a proposal with a single opaque token.

@DanielOaks

This comment has been minimized.

Copy link
Member Author

DanielOaks commented Feb 12, 2019

Updated to a method which only uses a single opaque token value. Allows implementers to build the tokens however they like (for example, they could use that HMAC system described above), with a secure standard method described and recommended that fulfills our security requirements.

Give us a yell for how this looks! Hopefully a lot easier for clients, while still being good and secure.

@jesopo

This comment has been minimized.

Copy link

jesopo commented Feb 12, 2019

draft/resume-0.3 implemented in bitbot as of https://github.com/jesopo/bitbot/blob/14418286/modules/resume.py

Show resolved Hide resolved extensions/resume.md Outdated
@slingamn

This comment has been minimized.

Copy link

slingamn commented Feb 12, 2019

oragono's master branch (anything containing oragono/oragono@5bd2133) now implements the server side of draft/resume-0.3.

@slingamn

This comment has been minimized.

Copy link

slingamn commented Feb 12, 2019

It might help to explicitly address how RESUME interacts with the PASS command, e.g., by specifying that a successful RESUME will complete the registration without requiring PASS.

Either way, it would help to add some language like this to the security considerations section:

A successful resume immediately concludes the registration process. Implementations should therefore verify that RESUME cannot be used to evade any checks that are normally enforced during registration, such as the PASS command, IP bans, or nickmask bans: each such check must be imposed on at least one of the old connection and the new connection.

@Alexendoo

This comment has been minimized.

Copy link

Alexendoo commented Feb 14, 2019

Brought up in channel:

There is a gap left between resume and brb, where the client loses connection to the server but the TCP connection is closed for whatever reason.

The flow of resume could be altered slightly to accommodate it, by considering the client active for the timeout duration even if the TCP connection closes (without a QUIT).

brb would continue to be a way for clients to manually enter into the active but disconnected state, inheriting the same timeout logic.

@jesopo

This comment has been minimized.

Copy link

jesopo commented Feb 14, 2019

A clean TCP socket disconnect without a brb should always fire off QUITs imo - whether or not the client actually sent a QUIT or not. It's very common for clients to disconnect without properly QUITing and if we start counting this as a pingout scenario, all we'll see if an increase in pingouts.

@SaberUK

This comment has been minimized.

Copy link
Contributor

SaberUK commented Feb 14, 2019

It's very common for clients to disconnect without properly QUITing

I don't see why we can't have sending QUIT when disconnecting as a mandatory part of the resume spec tbh.

@jesopo

This comment has been minimized.

Copy link

jesopo commented Feb 14, 2019

It's very common for clients to disconnect without properly QUITing

I don't see why we can't have sending QUIT when disconnecting as a mandatory part of the resume spec tbh.

2 things.

  1. it's not always the irc client's fault that the tcp socket was closed
  2. do we really want to make a tcp connection close without a QUIT be an implicit "please hold this session open"?
@SaberUK

This comment has been minimized.

Copy link
Contributor

SaberUK commented Feb 14, 2019

  • it's not always the irc client's fault that the tcp socket was closed

Yes, and they'd probably want to resume rather than disconnecting if possible in those cases.

2. do we really want to make a tcp connection close without a QUIT be an implicit "please hold this session open"?

I don't see any significant reasons why not?

@Alexendoo

This comment has been minimized.

Copy link

Alexendoo commented Feb 14, 2019

Plus it requires opting in via a cap, so it's not entirely implicit

@jesopo

This comment has been minimized.

Copy link

jesopo commented Feb 14, 2019

I just don't think the majority of these scenarios are ones in which people will want to RESUME

@dequis

This comment has been minimized.

Copy link
Contributor

dequis commented Feb 14, 2019

Quoting from upthread:

prawnsalad commented on Aug 21, 2017

There's no mention of the length of time a server may allow a connection to be resumed. Is the intended use to be able to resume after 5mins of disconnection? 1 hour? Or x2 ping timeout length?

And I'm assuming that once the RESUME cap has been negotiated that's when the server will not simply quit the user after a ping timeout. Which leads me to think the cap should also advertise the length of time a connection can be resumed within?

DanielOaks commented on Aug 21, 2017

Honestly I haven't given thought to the possibility of letting/having the server extend the ping timeout length when the cap's negotiated, or similar. I suppose servers could definitely do so if they want to, that'd be interesting. Maybe the cap's value could be a series of k/v's similar to sts' params.

Right now, it's intended to just allow reconnection up to the point where the server disconnects the client due to ping timeout, but allowing servers to make it more clear (and extend how long the client connection stays open if they've negotiated the right cap) definitely makes sense, thanks for the yell with that!

I think we never addressed these, and I was left with the idea that any disconnect that isn't an explicit QUIT would mean the server keeps the client alive as if it were a longer ping timeout. Apparently that wasn't actually part of the spec.

FWIW, Oragono seems to have slightly larger timeouts when the resume cap is enabled (IdleTimeoutWithResumeCap)

It would be a bad idea to restrict this feature to pure ping timeouts only, there are many things that can kill a connection and some of them look like clean disconnections.

BitlBee's session takeover feature is very similar to this spec, but relies on the old connection being still up, which makes it very unreliable. I dislike that feature because it fails most of the time, and it would be very easy for me to implement this spec based on that code, but that'd be just as unpleasant.

So IMO we need to:

  • Have a timeout after non-QUIT disconnections
  • Make QUIT mandatory when actually quitting
  • Mention that servers might want to have a longer ping timeout time when resume is enabled.
  • Optional: roll up BRB in this spec? It can be separate but it's very closely related, and complements this spec nicely.
@slingamn

This comment has been minimized.

Copy link

slingamn commented Feb 14, 2019

tentative +1 for including BRB in this spec

Rationale: this gives us a clean view of the matrix of possibilities: cap enabled or disabled, brb sent or not, quit sent or not, ping timeout or not, what's the expected behavior? Seems to me this gets cleaner and easier to understand if we specify it all in one place, and with only a single cap that can vary.


This message is sent to indicate that a client has reconnected. `<nick>`, `<olduser>`, and `<oldhost>` indicate the client that has reconnected, and are the details of the old client. `<user>` and `<host>` indicate the reconnecting client's new username and hostname, and the client MUST process this information as they would a regular [`CHGHOST`](https://ircv3.net/specs/extensions/chghost-3.2.html) message. `<timestamp>`, if given, is a timestamp of the form described above which indicates the last message received by the reconnecting clent.

Upon receiving a `RESUMED` message, clients MUST display in some way that the given user has reconnected. If `<timestamp>` is given, clients SHOULD use this to display how much message history seems to have been lost.

This comment has been minimized.

@Alexendoo

Alexendoo Feb 14, 2019

I think this is better left as a decision for clients to make, clients may wish not to display reconnects. They may not be displaying JOINs and QUITs at all

This comment has been minimized.

@DanielOaks

DanielOaks Feb 14, 2019

Author Member

Makes sense. I'll make the text here more exact as to why clients should make sure their users know that a reconnection happened and history might've been lost, and less demanding as it is right now.

@DanielOaks

This comment has been minimized.

Copy link
Member Author

DanielOaks commented Feb 14, 2019

For what it's worth, the spec already mentions extending the ping timeout length explicitly: "In addition, servers MAY extend the "ping timeout" length for that client (with the expectation that if the client disconnects, it will try to resume its connection as described here)." Referring to clients that have negotiated the cap. I'll make this more explicit, though the specific duration will remain up to implementer's choice.

What I'm going to do, with jes's blessing:

  • Pull BRB wholesale into this spec, and present it as a command that can be used if the resume cap is advertised/negotiated (since it already depends on having a resume token, etc).
  • Make the server hold the connection open for some server-decided length of time if a client that's negotiated resume disconnects uncleanly in any way, in addition to just something like a ping timeout happens and their connection stays open. For example, if the socket closes without a QUIT or a BRB, and the client has negotiated resume capabilities, the server SHOULD leave the session open for the same length of time that they'd normally hold it open for a BRB.

Thanks for the comments on this, much appreciated. Should end up with something better.

@jesopo

This comment has been minimized.

Copy link

jesopo commented Feb 15, 2019

I'd like the wording around missed-chat-playback to get a little tighter on this; there was confusion on IRC about whether this spec mandates chat playback or not.

I propose this spec mandates that servers automatically send chat playback on RESUME (if possible) unless the client also supports CHATHISTORY (#349, would need to become a CAP though) - this would allow clients to lazy-load (when needed) channel chat playback as well as actually having control of what is played back.

@DanielOaks

This comment has been minimized.

Copy link
Member Author

DanielOaks commented Feb 15, 2019

Makes sense to go with lazy-loading in that way, for clients that do support actually grabbing chathistory on their own. If chathistory is made a cap it's a simple check on when the resume command is sent after reconnecting. If chathistory isn't a cap it's an optional argument between RESUME and the token which indicates flags to resume with, the one specified one being 'lazychathistory'.

note from e: If a server supports replaying chat history automagically via resume but not the separate CHATHISTORY command, then that server should ignore the lazychathistory directive if it's sent by a client.

note from dan: if chathistory is made a cap that can be req'd instead, ignore all this advice whoo.

@jwheare

This comment has been minimized.

Copy link
Member

jwheare commented Feb 15, 2019

CAP is probably the best bet, even though it might seem weird that it's mainly needed to suppress the auto-playback feature behaviour that isn't chathistory.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.