Frequent bot restart required #118

Open
jrrbelmont opened this issue Aug 9, 2017 · 16 comments

@jrrbelmont

We've been running the bot for over a year here, and every week or so it stops responding to requests (and we see it go offline in Slack). The Node process is still running and there isn't anything interesting in the logs. We have to kill the current process and restart it.

Bot 0.0.13
Ubuntu 14.04

Is this a known issue with Node / Slack / Looker?

@wilg
Contributor

wilg commented Aug 9, 2017

Hmm... I haven't heard of this. The bot is supposed to exit if the Slack connection terminates, so I don't know why the process would stay running. Are you using slash commands or @ messages to communicate with the bot?
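As a rough illustration of the intended behavior (exit when the Slack connection goes away, so that something outside the process restarts it), here is a minimal TypeScript sketch. `RtmConnection`, `superviseConnection`, and `onFatal` are hypothetical names, not Lookerbot or Botkit APIs; a real implementation would hook whatever close or disconnect event the Slack client library actually emits.

```typescript
// Hypothetical sketch: RtmConnection, superviseConnection and onFatal are
// illustrative names, not Lookerbot or Botkit APIs.

type CloseListener = (reason: string) => void;

// Stand-in for a realtime connection that reports when it closes.
class RtmConnection {
  private readonly listeners: CloseListener[] = [];

  onClose(listener: CloseListener): void {
    this.listeners.push(listener);
  }

  // Simulates the socket closing, e.g. Slack ending the RTM session.
  simulateClose(reason: string): void {
    for (const listener of this.listeners) {
      listener(reason);
    }
  }
}

// Route the close event to a fatal handler instead of recovering in-process.
// In production, onFatal would log the reason and call process.exit(exitCode)
// so an external supervisor (systemd, Docker, Kubernetes, ...) starts a fresh bot.
function superviseConnection(
  conn: RtmConnection,
  onFatal: (exitCode: number, reason: string) => void,
): void {
  conn.onClose((reason) => onFatal(1, `RTM connection closed: ${reason}`));
}
```

The symptom reported here (process alive, bot offline) would correspond to the close event never firing, which a sketch like this cannot catch on its own.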

@jrrbelmont
Author

Thanks for the quick response. No slash commands. Only @ mentions or, more often, direct messages in the Looker user channel.

@wilg
Contributor

wilg commented Aug 10, 2017

This definitely sounds like the realtime API connection is shutting down. The bot should be exiting in that case, though. Looks like this needs some investigation.

@jrrbelmont
Author

What can I provide for debugging? Thanks.

@jtao27

jtao27 commented Aug 22, 2017

+1, we have a customer who also needs frequent restarts.

@tomdev
Contributor

tomdev commented Aug 22, 2017

Yeah, running into this a lot as well. Needed to restart lookerbot several times last week.

Please let me know if there is anything I can do to help debug. Are there any log files I can dig into?

@jrrbelmont
Author

I'd love to help debug this. This is very annoying and too frequent. What logging can I add to see if this is a problem talking to Slack or Looker?

@wilg
Contributor

wilg commented Oct 19, 2017

I am pretty sure it’s a problem talking to Slack, but unfortunately I'm not sure how to debug it off the top of my head, and I haven’t had the bandwidth to do a deep dive on this yet.

@wilg
Contributor

wilg commented Nov 29, 2017

Please reopen if this still happens under 0.0.14.

@wilg wilg closed this as completed Nov 29, 2017
@wilg
Contributor

wilg commented Dec 11, 2017

I'm seeing this internally: the bot continues responding to slash commands but goes offline. The Slack RTM connection is disconnecting somehow.

@wilg wilg reopened this Dec 11, 2017
@hporter

hporter commented Mar 3, 2018

Experienced this today: the bot was not responding to commands and was offline in Slack. Restarting the Docker container in Kubernetes fixed it instantly. We're running the latest version, which we deployed for the first time on Thursday.

We will try to debug and share anything we find, as everyone is pretty excited to see this working!

Even if it is tough to work out why this is happening, a /status route within the app would at least allow Kubernetes to restart the container automatically when it happens, assuming the problem can be detected from within Lookerbot. More logging to trawl through might help as well.
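One way such a route could decide that the bot is unhealthy is sketched below in TypeScript. It assumes the app tracks whether the realtime socket is open and when it last received an event; `livenessStatus`, `LivenessState`, and the silence threshold are hypothetical, not existing Lookerbot code. Kubernetes HTTP liveness probes treat any status from 200 to 399 as success, so returning 503 is enough to make a failing probe trigger a restart.

```typescript
// Hypothetical liveness check for a /status-style route. The state fields,
// function name and threshold are assumptions, not existing Lookerbot code.

interface LivenessState {
  socketOpen: boolean; // whether the realtime socket is believed to be open
  lastEventAtMs: number; // when the last realtime event was received
}

interface StatusResponse {
  httpStatus: 200 | 503;
  body: string;
}

// 503 when the socket is closed or has been silent for longer than maxSilenceMs.
// Kubernetes HTTP liveness probes treat 200-399 as success, so repeated 503s
// would eventually make the kubelet restart the container.
function livenessStatus(
  state: LivenessState,
  nowMs: number,
  maxSilenceMs: number,
): StatusResponse {
  if (!state.socketOpen) {
    return { httpStatus: 503, body: "realtime socket closed" };
  }
  const silentForMs = nowMs - state.lastEventAtMs;
  if (silentForMs > maxSilenceMs) {
    return { httpStatus: 503, body: `no realtime events for ${silentForMs} ms` };
  }
  return { httpStatus: 200, body: "ok" };
}
```

A quiet workspace can legitimately go a long time without events, so silence on its own is a weak signal; the threshold would need tuning, or `lastEventAtMs` would need to be refreshed by some periodic keepalive.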

@wilg
Contributor

wilg commented Mar 5, 2018

There is already a /health_check route, but I think the issue is that the process stays up while only the real-time websocket closes or stops listening. The bot should restart itself if the socket closes, though, so I'm not sure what's going on.
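A socket that "stops listening" without ever firing a close event is generally detected by sending traffic and timing the reply. The TypeScript sketch below assumes an application-level ping that the server answers with a matching pong, as Slack's RTM API documents (`{"id": 1, "type": "ping"}` answered by `{"reply_to": 1, "type": "pong"}`). `HeartbeatTracker` is a hypothetical helper, not part of Lookerbot or Botkit.

```typescript
// Hypothetical heartbeat tracker (not a Lookerbot or Botkit API).
// Assumes the server answers each {"id": N, "type": "ping"} with
// {"reply_to": N, "type": "pong"}, as Slack's RTM API documents.

interface PingMessage {
  id: number;
  type: "ping";
}

class HeartbeatTracker {
  private readonly timeoutMs: number;
  private nextId = 1;
  // Ping id -> time (ms) the ping was sent, for pings still awaiting a pong.
  private readonly outstanding = new Map<number, number>();

  constructor(timeoutMs: number) {
    this.timeoutMs = timeoutMs;
  }

  // Build the next ping message and remember when it was sent.
  makePing(nowMs: number): PingMessage {
    const id = this.nextId++;
    this.outstanding.set(id, nowMs);
    return { id, type: "ping" };
  }

  // Clear the matching ping; unknown reply_to values are ignored.
  handlePong(replyTo: number): void {
    this.outstanding.delete(replyTo);
  }

  // Dead if any ping has gone unanswered for longer than timeoutMs.
  isDead(nowMs: number): boolean {
    let dead = false;
    this.outstanding.forEach((sentAtMs) => {
      if (nowMs - sentAtMs > this.timeoutMs) {
        dead = true;
      }
    });
    return dead;
  }
}
```

The caller would send `makePing()` on a timer (the interval is a free choice), pass each pong's `reply_to` to `handlePong`, and treat `isDead()` returning true the same as a close: tear down the socket and exit or reconnect.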

@DylanBaker
Copy link

+1. We are seeing this a few times a week, running in Elastic Beanstalk. Restarting fixes it every time.

@reedloden
Contributor

@wilg Have you considered upgrading botkit? Noticed lookerbot is using 0.0.15, while the latest is 0.6.12. https://github.com/howdyai/botkit/blob/master/changelog.md lists a ton of fixes to Slack stuff, including various RTM fixes.

@hporter

hporter commented Mar 27, 2018

I haven't had too much time to find a solution beyond restarting the container - though @reedloden suggests an idea worth trying out.

If it helps, when it broke today I was getting this error in the logs, repeated every second:

-- ERROR rpc error: code = InvalidArgument desc = failed ot read query: expected label name, got EOF
-- Reconnecting
-- ERROR rpc error: code = InvalidArgument desc = failed ot read query: expected label name, got EOF
-- Reconnecting
-- ERROR rpc error: code = InvalidArgument desc = failed ot read query: expected label name, got EOF
-- Reconnecting
-- ERROR rpc error: code = InvalidArgument desc = failed ot read query: expected label name, got EOF

@samjbobb
Contributor

samjbobb commented May 29, 2018

We were experiencing this: Lookerbot disconnected every few days and never reconnected.

Based on the suggestion by @reedloden, I updated botkit.

Our lookerbot has stayed connected since about April 20.

It's possible this breaks something in lookerbot that we're not using, but so far, we haven't found any problems.

PR here: #141

8 participants