Unable to restart the sidekiq process using monit. #3679

karthyfd · 2017-12-01T07:52:38Z

Ruby version: 2.3.4
Sidekiq / Pro / Enterprise version(s): 3.4.1

Please include your initializer and any error message with the full backtrace.

Hi,
This is the first time I am opening an issue on GitHub, so please bear with me if I have missed anything.

I have spent a good amount of time trying to figure out the issue below but could not, so I need your suggestions.

We monitor the Sidekiq processes through monit, which restarts a process once it reaches 2048 MB of memory. The issue is that some of the Sidekiq processes go into the "execution failed" state when monit tries to restart them.

All of the Sidekiq processes in the "execution failed" state show a status like "1 of 10 busy", so I think one of the jobs is stuck. I am not able to find the stuck job. Could you please help me find it? I checked the corresponding worker log file but could not find anything.
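One way to look for a stuck job, sketched under assumptions: the Sidekiq API exposes `Sidekiq::Workers`, which enumerates the jobs currently being processed as `(process_id, thread_id, work)` triples, where `work` is hash-like with `queue`, `run_at` and `payload` keys. The helper name `long_running` and the 10-minute threshold are hypothetical, and the exact shape of `work` differs between Sidekiq versions, so this is a starting point rather than a verified recipe for 3.4.1.

```ruby
# Hypothetical helper: pick out in-progress entries that started more than
# `threshold` seconds ago. `entries` is any enumerable of
# [process_id, thread_id, work] triples; work["run_at"] is assumed to be an
# epoch timestamp in seconds.
def long_running(entries, now: Time.now.to_i, threshold: 600)
  entries.select do |_process_id, _thread_id, work|
    now - work["run_at"].to_i > threshold
  end
end

# Assumed usage from a console with Sidekiq loaded:
#   require "sidekiq/api"
#   long_running(Sidekiq::Workers.new).each do |process_id, thread_id, work|
#     puts "#{process_id} #{thread_id} #{work['queue']} #{work['payload']}"
#   end
```

Any entry this returns for a process showing "1 of 10 busy" is a candidate for the stuck job; the `payload` should include the job class and `jid`.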

We restart by first stopping and then starting the process. We use kill -TERM to stop the process, but the stop fails. So I thought about killing the process forcefully with -9, but I am not sure whether the stuck job will get pushed to Redis again if reliable fetch is activated. Can we forcefully kill the Sidekiq process after some timeout?
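A minimal sketch of the "TERM first, KILL after a timeout" idea, using only Ruby's standard `Process` calls. The method names `stop_with_timeout` and `alive?` and the 40-second default are hypothetical; the default is simply chosen to be longer than a 30-second Sidekiq shutdown timeout. It does not answer whether a job survives a KILL; it only shows the escalation itself.

```ruby
# Returns true if a process with this pid currently exists.
def alive?(pid)
  Process.kill(0, pid) # signal 0 only checks for existence
  true
rescue Errno::EPERM
  true                 # exists, but owned by another user
rescue Errno::ESRCH
  false
end

# Send TERM, wait up to `timeout` seconds for the process to exit, then send
# KILL. Returns :not_running, :terminated or :killed.
# Note: if the target is a child of the caller, it must be reaped
# (e.g. Process.detach) or it will still look alive as a zombie.
def stop_with_timeout(pid, timeout: 40, poll: 0.5)
  begin
    Process.kill("TERM", pid)
  rescue Errno::ESRCH
    return :not_running
  end

  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    return :terminated unless alive?(pid)
    sleep poll
  end

  begin
    Process.kill("KILL", pid)
    :killed
  rescue Errno::ESRCH
    :terminated # exited between the last check and the KILL
  end
end
```

A stop script invoked by monit could call `stop_with_timeout(Integer(File.read(pidfile)))`, where `pidfile` is whatever path the Sidekiq process was started with.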

Are you using an old version? Yes
Have you checked the changelog to see if your issue has been fixed in a later version? Yes

https://github.com/mperham/sidekiq/blob/master/Changes.md

karthyfd · 2017-12-05T06:57:06Z

@mperham -> I got the logs below in the worker log when trying to soft-kill the Sidekiq process with the TERM signal.

2017-12-05T05:37:44.104Z 15005 TID-44az98 WARN: Terminating 2 busy worker threads
2017-12-05T05:37:44.104Z 15005 TID-44az98 WARN: Work still in progress [#<struct Sidekiq::Pro::ReliableFetch::UnitOfWork queue="", message="{"args":[{}],"account_id":,"class":"r","":true,"uuid":null,"retry":0,"queue":"supervisor","backtrace":true,"jid":"7d9385390bf019c655a6eb72","created_at":1512448515.6960254,"enqueued_at":1512448515.6962037}", local_queue="-sidekiq-5_5">, #<struct Sidekiq::Pro::ReliableFetch::UnitOfWork queue="queue:", message="{"args":[{}],"account_id":**,"class":"","belongs_to_account":true,"uuid":null,"retry":0,"queue":"supervisor","backtrace":true,"jid":"92741e4d66257afdeb96d0b3","created_at":1512452087.1528904,"enqueued_at":1512452087.1567256}", local_queue="queue:supervisor_tickets-sidekiq-5_5">]
2017-12-05T05:37:44.459Z 15005 TID-44az98 INFO: Moving work from queue:supervisor_tickets-sidekiq-5_5 back to queue:supervisor
2017-12-05T05:37:44.460Z 15005 TID-44az98 INFO: Moving work from queue:supervisor_tickets-sidekiq-5_5 back to queue:supervisor

Does this mean that the job has been successfully moved back to Redis?

mperham · 2017-12-05T17:29:37Z

It's totally normal - Sidekiq moved the long-running job back to the public queue so it can be restarted once Sidekiq restarts.

You should upgrade your Sidekiq version and switch to super_fetch. reliable_fetch has been deprecated for a year now.

karthyfd · 2017-12-07T14:27:14Z

@mperham -> Thanks for your suggestion and the response. I need another clarification though.

Whenever we send TERM to the Sidekiq process, Sidekiq waits until the timeout (we configured 30 seconds) for the job to finish. If the job does not finish, it is pushed back to Redis. That part works fine, but the Sidekiq process should exit after the timeout (30 seconds), and it does not. If we send TERM to the same process again, it does not respond at all (I could not see anything about it in the Sidekiq worker log). The signal is ignored from then on.

Have you faced this issue before and fixed it? Please let me know how we can handle this.

P.S.: We will move to the latest Sidekiq Pro version; in the meantime we need to fix this issue.

mperham · 2017-12-07T16:52:19Z

If your Sidekiq process is not shutting down after TERM, there is something in your app that is preventing shutdown. Use TTIN to get a backtrace dump, see the Signals wiki page.
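For reference, a small sketch of sending TTIN from Ruby instead of the shell. The method name `request_thread_dump` and the pidfile argument are hypothetical; it assumes the Sidekiq process was started with a pidfile containing its PID. The backtraces themselves are written by Sidekiq to its own log, not returned here.

```ruby
# Hypothetical helper: read the PID from `pidfile` and send it TTIN, which
# asks Sidekiq to log a backtrace for each of its threads.
# Returns true if the signal was delivered, false if no such process exists.
def request_thread_dump(pidfile)
  pid = Integer(File.read(pidfile).strip)
  Process.kill("TTIN", pid)
  true
rescue Errno::ESRCH
  false
end
```

Running this against a process that is hanging after TERM, then checking the worker log, should show where each remaining thread is blocked.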

mperham closed this as completed Dec 5, 2017
