Skip to content
This repository has been archived by the owner on Jun 30, 2022. It is now read-only.

Auto reconnect capability #20

Open
shusugmt opened this issue Jan 5, 2015 · 13 comments
Open

Auto reconnect capability #20

shusugmt opened this issue Jan 5, 2015 · 13 comments

Comments

@shusugmt
Copy link
Contributor

shusugmt commented Jan 5, 2015

It would be nice if it reconnects automatically when connection to HipChat has lost.

I run my bot on my laptop. Once I close my PC, lita never reconnects automatically so I need to stop and launch it again every time, which is a bit annoying.

@shusugmt
Copy link
Contributor Author

shusugmt commented Jan 5, 2015

I've written a poc. @jimmycuadra before I update specs, can you see my implementation?
shusugmt/lita-hipchat@0d9369d

@jimmycuadra
Copy link
Collaborator

Thanks for the POC. I think trying to manage reconnection within the client like this is probably a little too complex. What do you think about making Lita crash (or just calling shut_down explicitly) when the connection is lost? Then it can easily be restarted by the user's preferred process manager/init system. I like the idea of just letting it crash and then starting up again with a totally clean slate.

@shusugmt
Copy link
Contributor Author

shusugmt commented Jan 9, 2015

@jimmycuadra Thank you. Your idea sounds better. I'll write a new poc on other branch.

@justintime
Copy link

Just an opinion, but it feels to me like you're hitting a nail with a sledgehammer by crashing lita. What if lita is using more than 1 adapter, like slack and hipchat? In this case, if hipchat goes down for more than a few seconds, lita will be in a join/leave cycle in slack until hipchat comes back up.

Also, many process managers will only attempt to restart a service a finite number of times before it gives up. If hipchat were down for long enough, it wouldn't ever be restarted if the process manager gave up.

@jimmycuadra
Copy link
Collaborator

A single Lita process can't be used with more than one adapter, so that is not a problem in this case. I'm not aware of a process manager that can't keep attempting to restart a process infinitely – do you have examples? What would you suggest rather than letting the process exit?

@justintime
Copy link

While I'm not new to relying on external services from long running processes, I'm really new to lita. Thanks for pointing out the single adapter thing.

systemd will often use restart=on-failure, which if lita exits with status 0, it wouldn't restart the service. In upstart, there's respawn-limit.

As to how to deal with it, I'd think that applying graceful degradation similar to the way @s2ugimot did in shusugmt/lita-hipchat@0d9369d would require less work from the sysadmin and would ultimately be a more "set-it-and-forget-it" approach.

Hubot does it similarly: hipchat/hubot-hipchat#174

BTW: didn't get to thank you for your ChatOps panel at DevOpsDaysRox. I was the one in the openspace that talked about the "clock in" feature that was the killer feature that got buy-in from the users. I've been working on replacing Hubot with Lita this week and have been really happy so far.

@jimmycuadra
Copy link
Collaborator

Interesting... thanks for the additional details. I'm not sure how I feel about where/how this issue should be handled. I'll keep thinking about it.

Glad you've been enjoying Lita so far! Getting feedback about it that can lead to more improvements is the best thanks I could ask for. :}

@dblessing
Copy link

Another point to consider - a process supervisor is not a given. sysvinit is still widely used.

I also imagine how I might feel if a Java or Rails app I manage simply exited because a 3rd party service went away for a moment and it didn't try to reconnect.

Thanks for continuing to think about this.

@willejs
Copy link

willejs commented May 28, 2015

@jimmycuadra I'm not a huge fan of this approach. If the websocket is closed, its an error? We should handle errors right?

Would you be adverse to some retry logic that is configurable, perhaps an exponential backoff?

@jimmycuadra
Copy link
Collaborator

lita-hipchat doesn't use WebSockets, but in any case, what type of error are you referring to? If there's something specific that is recoverable, then probably yes. I'm still hesitant about adding reconnection logic inside the adapter. I'd love to hear more specifics on how letting a process manager restart it has been a problem for folks.

@dblessing
Copy link

@jimmycuadra Do you have any examples of other applications that take this approach? I'm not sure you'll get many specifics on how it's worked for people because I'm not sure it's a common thing to do. Currently we use CentOS 6 and sysvinit so there is no process supervisor. We don't have any processes that routinely bomb out when external services fail and I would call it a bug if they did.

@jimmycuadra
Copy link
Collaborator

I guess I'm amenable to this feature after more thought (thank you for all the comments). I don't love the idea of having to maintain it, but clearly there is demand and the goal is to make users' lives easier. I'm really just kind of gun shy about handling additional complexity in the adapter since XMPP and the xmpp4r library are already quite nasty, and I don't want to introduce things that might create more subtle bugs in the future. But if someone wants to take this on, I will accept a really solid PR. I think it will need to handle reconnection with an exponential backoff and a maximum number of attempts to be safe.

@sjernigan
Copy link

We're running lita in kubernetes with built in process monitoring. We have other processes running under God and even custom scripts. While I appreciate the views about it reconnecting. I also like simplicity. I think process monitoring isn't that unusual and would suggest running lita under something even if it did handle this reconnect issue. Afterall, this isn't only reason lita might fail and process monitors have a rich feature set for dealing with these events. So practically, if making lita crash is an easy development task, I'd welcome it as an acceptable solution that moves lita forward. When/if a pull request comes with reconnect feature, I'll heed Jimmy's caution and let it bake a while before upgrading. Till then, it's manual intervention every time Hipchat's server's have a blip. Thanks for everyone who's contributed to lita.

alindeman added a commit to alindeman/lita-hipchat that referenced this issue Feb 19, 2016
If the socket has already disconnected, e.g., we don't want to raise _another_
error (see litaio#34, litaio#20) when trying to close it cleanly.
alindeman added a commit to alindeman/lita-hipchat that referenced this issue Feb 21, 2016
If the socket has already disconnected, e.g., we don't want to raise _another_
error (see litaio#34, litaio#20) when trying to close it cleanly.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants