Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pongbot does not reconnect #107

Closed
yoshinari-nomura opened this issue Dec 31, 2016 · 17 comments
Closed

pongbot does not reconnect #107

yoshinari-nomura opened this issue Dec 31, 2016 · 17 comments
Labels

Comments

@yoshinari-nomura
Copy link

Hi, I'm playing with your slack-ruby-bot in my holiday.
I started from pongbot in README and talked with the bot, it's 👍

However, after several minutes of network disconnection,
I noticed that pongbot does not try to reconnect to Slack.

For example, turn-off my WiFi network, wait several minutes, and then reconnect WiFi,
pongbot is left disconnected even if WiFi is back.
Terminal does not print any line during that experiment.

I believe handle_exceptions in server.rb kicks restart!.
However, it seems not to catch any exception in this situation.
Any hint to add reconnect feature to pongbot ?

I'm using Ruby 2.3.3 and macOS Sierra and slack-ruby-bot 0.9.
Here is the Gemfile.lock for the pongbot

GEM
  remote: https://rubygems.org/
  specs:
    activesupport (5.0.1)
      concurrent-ruby (~> 1.0, >= 1.0.2)
      i18n (~> 0.7)
      minitest (~> 5.1)
      tzinfo (~> 1.1)
    celluloid (0.17.3)
      celluloid-essentials
      celluloid-extras
      celluloid-fsm
      celluloid-pool
      celluloid-supervision
      timers (>= 4.1.1)
    celluloid-essentials (0.20.5)
      timers (>= 4.1.1)
    celluloid-extras (0.20.5)
      timers (>= 4.1.1)
    celluloid-fsm (0.20.5)
      timers (>= 4.1.1)
    celluloid-io (0.17.3)
      celluloid (>= 0.17.2)
      nio4r (>= 1.1)
      timers (>= 4.1.1)
    celluloid-pool (0.20.5)
      timers (>= 4.1.1)
    celluloid-supervision (0.20.6)
      timers (>= 4.1.1)
    concurrent-ruby (1.0.4)
    faraday (0.10.1)
      multipart-post (>= 1.2, < 3)
    faraday_middleware (0.10.1)
      faraday (>= 0.7.4, < 1.0)
    gli (2.14.0)
    hashie (3.4.6)
    hitimes (1.2.4)
    i18n (0.7.0)
    json (2.0.2)
    minitest (5.10.1)
    multipart-post (2.0.0)
    nio4r (2.0.0)
    slack-ruby-bot (0.9.0)
      hashie
      slack-ruby-client (>= 0.6.0)
    slack-ruby-client (0.7.7)
      activesupport
      faraday
      faraday_middleware
      gli
      hashie
      json
      websocket-driver
    thread_safe (0.3.5)
    timers (4.1.2)
      hitimes
    tzinfo (1.2.2)
      thread_safe (~> 0.1)
    websocket-driver (0.6.4)
      websocket-extensions (>= 0.1.0)
    websocket-extensions (0.1.2)

PLATFORMS
  ruby

DEPENDENCIES
  celluloid-io
  slack-ruby-bot

BUNDLED WITH
   1.13.6
@dblock dblock added the bug? label Dec 31, 2016
@dblock
Copy link
Collaborator

dblock commented Dec 31, 2016

If you can reproduce then we probably do have a bug. Could use a test/fix. Thanks.

@yoshinari-nomura
Copy link
Author

Thank you. I can reproduce that WiFi-case error on macOS and Linux (Raspberry PI).

I don't know which layer should raise the error. In this case, the client looks still connected,
but Slack RTM server recognizes the client disconnected
(watching from the other Slack client, pong-bot looks disconnected).

So, I think pong-bot needs to detect the disconnection using a kind of watchdog-mechanism
such as Websocket ping or Slack API ping. That is, slack-bot client have to keep sending
ping frame/message to the Slack RTM server and detect the absence of pong.
Is it already implemented? If so, we could restart the client on catching the long-absence of pong.

@dblock
Copy link
Collaborator

dblock commented Jan 5, 2017

Try with a bot that's implemented on top of https://github.com/dblock/slack-ruby-bot-server, I am pretty sure it has reconnect logic that handles it. For example https://github.com/dblock/slack-market.

@yoshinari-nomura
Copy link
Author

Thanks for your kind suggestion. I'll try it.

Let me confirm one more thing.
Do you mean pong-bot sample itself does not have the reconnection feature in question?
Should I use slack-ruby-bot-server?

@dblock
Copy link
Collaborator

dblock commented Jan 6, 2017

In theory we should have complete reconnect logic here. So what you described initially still looks like a bug to me, but you'll have to dig through it and possibly PR a fix.

@yoshinari-nomura
Copy link
Author

Maybe, this issue is related with faye/websocket-driver-ruby#42 and celluloid/celluloid-io#176.
restart! mechanism in server.rb depends on the :close event from celluloid-io.

Also, slack-ruby/slack-ruby-client#96 looks very similar. @tikh reproduced the same situation by turning off his WiFi. One thing different was periodic pining created a catchable exception, while my pongbot is too shy to sending pings.

I'm not sure I should make pongbot chatty, or hack celluloid-io.

@yoshinari-nomura
Copy link
Author

I settled to add such ping thread to my pongbot:

class PingThread  < SlackRubyBot::Server
  on 'hello' do |client, data|
    puts "PingThread: Hello!"

    @reconnect_count ||= 0

    if @ping_thread
      @ping_thread.exit
      @reconnect_count += 1
    end

    @ping_thread = Thread.new do
      sequence = 1
      loop do
        client.ping({id: sequence})
        sleep 30
        sequence += 1
      end
    end
  end
end

This thread emits Errno::ETIMEDOUT on disconnection, then I can catch the exception in my main loop.

@dblock
Copy link
Collaborator

dblock commented Apr 23, 2018

If you're using Celluloid you might want to change this not to use a thread. You can add include Celluloid and then use every. Like here.

@dblock
Copy link
Collaborator

dblock commented Apr 23, 2018

I'd love for this to be turned into a feature, something that checks bot connections periodically and is built in.

@yoshinari-nomura
Copy link
Author

Thanks for the comment. I've tried to replace thread by every like below:

class PingThread  < SlackRubyBot::Server
  extend Celluloid

  on 'hello' do |client, data|
    puts "PingThread: Hello!"

    @reconnect_count ||= 0

    if @timer
      @timer.cancel
      @reconnect_count += 1
    end

    sequence = 1
    @timer = every 30 do
      client.ping({id: sequence})
      puts "PingThread: ping #{sequence}, reconnect: #{@reconnect_count}."
      sequence += 1
    end
  end
end

I had to use extend instead of include.
I think I should make sure to cancel @timer after reconnect. Is it correct?

@dblock
Copy link
Collaborator

dblock commented May 2, 2018

Looks good to me.

@yoshinari-nomura
Copy link
Author

Thanks.

I hope this mechanism becomes a built-in feature.
But I have no idea where to slip this stuff into your code.

@gsmetal
Copy link

gsmetal commented Jun 18, 2018

I have similar problem, but it's worse as I understand. I have bot running without bot server, just on celluloid by .run. The problem is in environment, where network can change. After network change (I now test it by connecting/disconnecting to VPN while bot running) there is no disconnections, bot is still running, but no new messages are recieved, it looks like stuck.
To resolve problem I turn to ping/pong solution, found this issue and take code from here. But it does not work because ping is successfully sent:

D, [2018-06-18T19:35:35.201296 #29157] DEBUG -- SlackRubyBot::Client#send_json: {:type=>"ping", :id=>4}
D, [2018-06-18T19:35:35.201823 #29157] DEBUG -- Slack::RealTime::Concurrency::Celluloid::Socket#send_data: {"type":"ping","id":4}
D, [2018-06-18T19:35:35.202035 #29157] DEBUG -- Slack::RealTime::Concurrency::Celluloid::Socket#write: ��8L�C%8�Hbn��w%�_%`�Qcn�
                                                                                                                                z
D, [2018-06-18T19:35:35.202306 #29157] DEBUG -- : PingThread: ping 4, reconnect: 0.

but no pong recieved. Don't know if it's bug or not in bot or client as it thinks that message is sent, but server does not actually recieve it and response for it.

Now I fixed this by listening for the pong response and (a little dirty, but don't know how to do it cleaner) restarting if there was not:

def run
  instance.on 'hello', ->(client, data) do
    logger.debug "PingThread: Hello!"

    @reconnect_count ||= 0

    if @timer
      @timer.cancel
      @reconnect_count += 1
    end

    @sequence = 1
    @timer = every 15 do
      if @pong_recieved == false
        @pong_recieved = nil

        logger.warn 'Pong for previous ping not recieved, restarting.'
        instance.instance_variable_set('@client', nil)
        instance.restart!
        next
      end

      client.ping(id: @sequence)
      logger.debug "PingThread: ping #{@sequence}, reconnect: #{@reconnect_count}."
      @pong_recieved = false
    end
  end

  instance.on 'pong', ->(client, data) do
    logger.debug "PingThread: Pong received!"

    if data['reply_to'] == @sequence
      @pong_recieved = true
      @sequence += 1
    end
  end

  super
end

@dblock
Copy link
Collaborator

dblock commented Jun 18, 2018

I've recently been doing this. I definitely think we're not properly handling a server-side disconnect, see slack-ruby/slack-ruby-client#208, with someone at Slack looking at helping us.

@yoshinari-nomura
Copy link
Author

To tell the truth, my first version was very similar to @gsmetal 's one like below.
But I noticed that waiting pong was over-killing for my case.

class PingThread  < SlackRubyBot::Server
  extend Celluloid

  PONG_ID_QUEUE = []

  def self.pong_received?(id)
    !PONG_ID_QUEUE.delete(id).nil?
  end

  on 'hello' do |client, data|
    puts "Ping Thread HELLO!"
    PONG_ID_QUEUE.clear
    @reconnect_count ||= 0

    if @timer
      @timer.cancel
      @reconnect_count += 1
    end

    sequence = 1

    @timer = every 30 do
      client.ping({id: sequence})
      puts "ping #{sequence}, reconnect: #{@reconnect_count}."
      sleep 3
      raise Errno::ETIMEDOUT unless pong_received?(sequence)
      sequence += 1
    end
  end

  on 'pong' do |client, data|
    sequence = data[:reply_to].to_i
    puts "pong #{sequence}"
    PONG_ID_QUEUE << sequence
  end
end

@atul86244
Copy link

Hi,

I am trying to use the simple pong example but I keep getting this error:

D, [2019-02-15T12:39:50.493263 #1637] DEBUG -- request: Accept: "application/json; charset=utf-8"
User-Agent: "Slack Ruby Client/0.13.1"
Content-Type: "application/x-www-form-urlencoded"
I, [2019-02-15T12:40:13.718209 #1637]  INFO -- response: Status 200
D, [2019-02-15T12:40:13.718583 #1637] DEBUG -- response: content-type: "application/json; charset=utf-8"
transfer-encoding: "chunked"
connection: "close"
date: "Fri, 15 Feb 2019 12:39:51 GMT"
server: "Apache"
expires: "Mon, 26 Jul 1997 05:00:00 GMT"
x-accepted-oauth-scopes: "rtm:stream,client"
x-slack-req-id: "ee5bbe76-061b-4214-a598-c2502c77a5df"
pragma: "no-cache"
vary: "Accept-Encoding"
x-content-type-options: "nosniff"
x-xss-protection: "0"
x-oauth-scopes: "identify,read,post,client,apps"
cache-control: "private, no-cache, no-store, must-revalidate"
referrer-policy: "no-referrer"
strict-transport-security: "max-age=31536000; includeSubDomains; preload"
access-control-allow-origin: "*"
x-via: "haproxy-www-xwfn"
x-cache: "Miss from cloudfront"
via: "1.1 ba54b87be4c2c5972b040f7bc946173f.cloudfront.net (CloudFront)"
x-amz-cf-id: "A_T8oLVZ5YiLenwmSOGlUOxOFqt6nYMEhVSNJkZqUJW3eAtNsYaVOA=="
Killed

@dblock
Copy link
Collaborator

dblock commented Mar 25, 2019

Resolved via slack-ruby/slack-ruby-client#208

@dblock dblock closed this as completed Mar 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants