This repository has been archived by the owner on Jan 1, 2020. It is now read-only.

Sensu server stops by itself #1368

Closed

kaushiksriram100 opened this issue Jul 12, 2016 · 6 comments

Comments

@kaushiksriram100

kaushiksriram100 commented Jul 12, 2016

Seeing a weird issue on only 1 server in a 4-node Sensu cluster (version 0.21). The sensu-server simply stops and then starts by itself after some time. I presume it starts because the chef-client daemon is running and may be restarting it. I see mixed log messages. On one occasion the log shows the following.

Any pointers?

{"timestamp":"2016-07-09T05:46:51.006604-0700","level":"fatal","message":"transport connection error","error":"rabbitmq channel closed"}
{"timestamp":"2016-07-09T05:46:51.006776-0700","level":"warn","message":"reconnecting to transport"}
/opt/sensu/embedded/lib/ruby/gems/2.2.0/gems/amqp-1.5.0/lib/amqp/session.rb:738:in `send_frame': The connection is closed, you can't use it anymore! (AMQP::ConnectionClosedError)
from /opt/sensu/embedded/lib/ruby/gems/2.2.0/gems/amqp-1.5.0/lib/amqp/channel.rb:1022:in `acknowledge'
from /opt/sensu/embedded/lib/ruby/gems/2.2.0/gems/amqp-1.5.0/lib/amqp/header.rb:35:in `ack'
from /opt/sensu/embedded/lib/ruby/gems/2.2.0/gems/sensu-transport-3.3.0/lib/sensu/transport/rabbitmq.rb:74:in `acknowledge'
from /opt/sensu/embedded/lib/ruby/gems/2.2.0/gems/sensu-transport-3.3.0/lib/sensu/transport/base.rb:117:in `ack'
from /opt/sensu/embedded/lib/ruby/gems/2.2.0/gems/sensu-0.21.0/lib/sensu/server/process.rb:460:in `block (2 levels) in setup_results'
from /opt/sensu/embedded/lib/ruby/gems/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:976:in `call'
from /opt/sensu/embedded/lib/ruby/gems/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:976:in `block in run_deferred_callbacks'
from /opt/sensu/embedded/lib/ruby/gems/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:973:in `times'
from /opt/sensu/embedded/lib/ruby/gems/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:973:in `run_deferred_callbacks'
from /opt/sensu/embedded/lib/ruby/gems/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:193:in `run_machine'
from /opt/sensu/embedded/lib/ruby/gems/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:193:in `run'
from /opt/sensu/embedded/lib/ruby/gems/2.2.0/gems/sensu-0.21.0/lib/sensu/server/process.rb:23:in `run'
from /opt/sensu/embedded/lib/ruby/gems/2.2.0/gems/sensu-0.21.0/exe/sensu-server:10:in `<top (required)>'
from /opt/sensu/bin/sensu-server:23:in `load'
from /opt/sensu/bin/sensu-server:23:in `<main>'
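The trace shows the failure mode: a results callback tries to acknowledge a message on an AMQP connection that the reconnect logic has already torn down, and the closed connection raises instead of no-op'ing. A minimal pure-Ruby sketch of that pattern (hypothetical `Connection`/`ack` names, not Sensu's actual code):

```ruby
# Hypothetical sketch (not Sensu's code): an in-flight callback acks a
# message on a connection the reconnect logic has already closed.
class Connection
  class ClosedError < StandardError; end

  def initialize
    @open = true
  end

  def close
    @open = false
  end

  def open?
    @open
  end

  def ack(delivery_tag)
    # Mirrors AMQP::ConnectionClosedError: using a closed connection raises.
    raise ClosedError, "The connection is closed, you can't use it anymore!" unless @open
    :acked
  end
end

conn = Connection.new
conn.close # transport drops, e.g. "rabbitmq channel closed"

# Unguarded ack -> the fatal error seen in the log:
begin
  conn.ack(42)
rescue Connection::ClosedError => e
  warn "transport connection error: #{e.message}"
end

# Guarding the ack while reconnecting avoids the crash; the unacked
# message is simply redelivered by the broker after reconnect:
result = conn.open? ? conn.ack(42) : :skipped_redelivered
puts result
```

The broker redelivers any unacknowledged message after the connection is re-established, so skipping the ack is safe; Sensu's event handling is expected to tolerate the resulting duplicate delivery.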

In another scenario it shows these:
{"timestamp":"2016-07-12T13:53:01.032871-0700","level":"warn","message":"unsubscribing from keepalive and result queues"}
/opt/sensu/embedded/lib/ruby/gems/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:331:in `add_oneshot_timer': ran out of timers; use #set_max_timers to increase limit (RuntimeError)
from /opt/sensu/embedded/lib/ruby/gems/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:331:in `add_timer'
from /opt/sensu/embedded/lib/ruby/gems/2.2.0/gems/eventmachine-1.0.8/lib/em/timers.rb:12:in `initialize'
from /opt/sensu/embedded/lib/ruby/gems/2.2.0/gems/sensu-0.21.0/lib/sensu/utilities.rb:21:in `new'
from /opt/sensu/embedded/lib/ruby/gems/2.2.0/gems/sensu-0.21.0/lib/sensu/utilities.rb:21:in `retry_until_true'
from /opt/sensu/embedded/lib/ruby/gems/2.2.0/gems/sensu-0.21.0/lib/sensu/utilities.rb:23:in `block in retry_until_true'
from /opt/sensu/embedded/lib/ruby/gems/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:193:in `call'
from /opt/sensu/embedded/lib/ruby/gems/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:193:in `run_machine'
from /opt/sensu/embedded/lib/ruby/gems/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:193:in `run'
from /opt/sensu/embedded/lib/ruby/gems/2.2.0/gems/sensu-0.21.0/lib/sensu/server/process.rb:23:in `run'
from /opt/sensu/embedded/lib/ruby/gems/2.2.0/gems/sensu-0.21.0/exe/sensu-server:10:in `<top (required)>'
from /opt/sensu/bin/sensu-server:23:in `load'
from /opt/sensu/bin/sensu-server:23:in `<main>'

@portertech
Contributor

@kaushiksriram100 are you using a proxy in front of RabbitMQ?

I have an open pull request for the second scenario, #1370 👍

@kaushiksriram100
Author

@portertech: no, not using a proxy.

@portertech
Contributor

@kaushiksriram100 does the RabbitMQ log suggest why the channels are being closed?
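A common cause of broker-side channel/connection closures is missed heartbeats, or idle TCP connections being dropped by a firewall or load balancer between the Sensu server and RabbitMQ; the RabbitMQ log would show a heartbeat timeout in that case. If that turns out to be the cause, a heartbeat interval can be set in the Sensu RabbitMQ transport configuration. This is a sketch only; the host/credential values are placeholders and the `heartbeat`/`prefetch` values are illustrative, so check the configuration reference for your Sensu version:

```json
{
  "rabbitmq": {
    "host": "rabbitmq.example.com",
    "port": 5672,
    "vhost": "/sensu",
    "user": "sensu",
    "password": "secret",
    "heartbeat": 30,
    "prefetch": 50
  }
}
```

A heartbeat shorter than the network's idle-connection timeout keeps the TCP session alive and also lets the client detect a dead broker connection promptly instead of failing on the next ack.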

@portertech
Contributor

@kaushiksriram100 in regards to the timer issue, how many check definitions does the install have?

@kaushiksriram100
Author

kaushiksriram100 commented Jul 21, 2016

@portertech This is a staging setup and there are 892 checks in this install, but not all clients in this staging install are subscribed to run all of those checks. There are fewer clients in staging. The production setup also has the same 892 checks, with more clients and more subscriptions.

The issue is currently observed in staging and in one of the prod installs.

@portertech
Contributor

Closing this due to inactivity, please feel free to create a new issue if this hasn't been resolved.
