Unable to restart the sidekiq process using monit. #3679

Closed
karthyfd opened this issue Dec 1, 2017 · 4 comments

Comments

karthyfd commented Dec 1, 2017

Ruby version: 2.3.4
Sidekiq / Pro / Enterprise version(s): 3.4.1

Please include your initializer and any error message with the full backtrace.

Hi,
This is the first time I am opening an issue on GitHub, so please bear with me if I have missed anything.

I have spent a good amount of time trying to figure out the issue below but could not, so I need your suggestions.

We monitor our Sidekiq processes with monit, which restarts a Sidekiq process once it reaches 2048 MB of memory. The issue is that some of the Sidekiq processes go into the "execution failed" state when monit tries to restart them.

Every Sidekiq process in the "execution failed" state shows a status like "1 of 10 busy", so I think one of its jobs is stuck, but I am not able to find which one. Could you please help me find the stuck job? I checked the corresponding worker log file but could not find it there.

We restart by first stopping and then starting the process. We stop the process with kill -TERM, but the stop fails. So I thought about killing the process forcefully with -9, but I am not sure whether the stuck job will be pushed back to Redis if reliable fetch is activated. Is it possible to forcefully kill the Sidekiq process after some timeout?
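
For reference, the "TERM first, force-kill after a timeout" idea asked about here could be sketched in Ruby roughly as follows. This is only an illustration under stated assumptions: `stop_with_timeout` and `process_alive?` are hypothetical helpers, not part of Sidekiq or monit, and the 40-second grace period is an assumed value chosen to be longer than the 30-second Sidekiq timeout mentioned later in this thread. Whether an in-flight job survives a KILL depends on the fetch strategy in use, which this sketch does not address.

```ruby
# Hypothetical helpers for illustration only; not part of Sidekiq or monit.

# Returns true if a process with this pid currently exists.
# Signal 0 performs the existence/permission check without delivering a signal.
def process_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false # no such process
rescue Errno::EPERM
  true  # process exists but belongs to another user
end

# Sends TERM, polls until the process exits or the grace period elapses,
# then escalates to KILL. Returns :terminated, :killed or :not_running.
# grace_seconds is an assumption: it should exceed Sidekiq's own shutdown timeout.
def stop_with_timeout(pid, grace_seconds: 40, poll_interval: 0.5)
  Process.kill("TERM", pid)

  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + grace_seconds
  while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    return :terminated unless process_alive?(pid)
    sleep poll_interval
  end
  return :terminated unless process_alive?(pid)

  # Last resort: KILL cannot be trapped, so the process gets no chance to clean up.
  Process.kill("KILL", pid)
  :killed
rescue Errno::ESRCH
  :not_running # the process was already gone when a signal was sent
end
```

A stop script could call `stop_with_timeout(Integer(File.read(pidfile).strip))`, where `pidfile` is whatever path the process was started with. Note that a process still counts as alive while it is an unreaped zombie, so the caller should not be the un-waited parent of the process it is stopping.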

Are you using an old version? Yes
Have you checked the changelog to see if your issue has been fixed in a later version? Yes

https://github.com/mperham/sidekiq/blob/master/Changes.md

Author

karthyfd commented Dec 5, 2017

@mperham -> I got the logs below in the worker log when I tried to soft-kill the Sidekiq process with the TERM signal.

2017-12-05T05:37:44.104Z 15005 TID-44az98 WARN: Terminating 2 busy worker threads
2017-12-05T05:37:44.104Z 15005 TID-44az98 WARN: Work still in progress [#<struct Sidekiq::Pro::ReliableFetch::UnitOfWork queue="", message="{"args":[{}],"account_id":,"class":"r","":true,"uuid":null,"retry":0,"queue":"supervisor","backtrace":true,"jid":"7d9385390bf019c655a6eb72","created_at":1512448515.6960254,"enqueued_at":1512448515.6962037}", local_queue="-sidekiq-5_5">, #<struct Sidekiq::Pro::ReliableFetch::UnitOfWork queue="queue:", message="{"args":[{}],"account_id":**,"class":"","belongs_to_account":true,"uuid":null,"retry":0,"queue":"supervisor","backtrace":true,"jid":"92741e4d66257afdeb96d0b3","created_at":1512452087.1528904,"enqueued_at":1512452087.1567256}", local_queue="queue:supervisor_tickets-sidekiq-5_5">]
2017-12-05T05:37:44.459Z 15005 TID-44az98 INFO: Moving work from queue:supervisor_tickets-sidekiq-5_5 back to queue:supervisor
2017-12-05T05:37:44.460Z 15005 TID-44az98 INFO: Moving work from queue:supervisor_tickets-sidekiq-5_5 back to queue:supervisor

Does this mean that the job has been successfully moved back to Redis?

Collaborator

mperham commented Dec 5, 2017

It's totally normal - Sidekiq moved the long-running job back to the public queue so it can be restarted once Sidekiq restarts.

You should upgrade your Sidekiq version and switch to super_fetch. reliable_fetch has been deprecated for a year now.

mperham closed this as completed Dec 5, 2017

Author

karthyfd commented Dec 7, 2017

@mperham -> Thanks for your suggestion and the response. I need another clarification though.

Whenever we send TERM to the Sidekiq process, Sidekiq waits up to the timeout (we configured 30 seconds) for the job to finish. If the job does not finish, it is pushed back to Redis. That part works fine. However, the Sidekiq process should ideally exit after the timeout (30 seconds), and it does not. If we send TERM to the same process again, it does not respond at all (I could not see any corresponding entry in the Sidekiq worker log). From then on, the signal is ignored.

Have you faced this issue before and fixed it? Please let me know how we can handle this.

P.S.: We will move to the latest Sidekiq Pro version; in the meantime we need to fix this issue.

Collaborator

mperham commented Dec 7, 2017

If your Sidekiq process is not shutting down after TERM, there is something in your app that is preventing shutdown. Use TTIN to get a backtrace dump, see the Signals wiki page.
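
As a rough sketch of that suggestion, the TTIN signal can be sent from Ruby as well as from a shell with `kill -TTIN`. The helper name `request_thread_dump` and the pidfile path in the usage note are assumptions made for illustration; they are not part of Sidekiq. On receiving TTIN, Sidekiq is expected to write a backtrace for each of its threads to its log, which is where a thread blocking shutdown should show up.

```ruby
# Hypothetical helper for illustration only; not part of Sidekiq.
# Reads a pid from a pidfile and sends that process the TTIN signal.
# Returns the pid so the caller can log which process was signalled.
def request_thread_dump(pidfile_path)
  pid = Integer(File.read(pidfile_path).strip)
  Process.kill("TTIN", pid) # raises Errno::ESRCH if the process does not exist
  pid
end
```

For example, `request_thread_dump("tmp/pids/sidekiq.pid")` (an assumed path) would be followed by reading the Sidekiq log for the dumped backtraces.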
