
Sidekiq::Shutdown duplicates jobs after sidekiq:stop/sidekiq:start #897

Closed
SergeyKishenin opened this issue May 3, 2013 · 21 comments

@SergeyKishenin commented May 3, 2013

During every deploy, stopping Sidekiq's workers with sidekiq:stop makes all running jobs fail with a Sidekiq::Shutdown error and go to the retry set; these jobs are also pushed back onto the queue, where they wait until Sidekiq starts again.

When Sidekiq is started again with sidekiq:start, the jobs run again, but duplicated. I guess this happens because they come from two origins: retries and queues.

@mperham closed this in 541ff1f on May 3, 2013

@SergeyKishenin (Author) commented May 3, 2013

Awesome, thanks!

@queso (Contributor) commented Aug 28, 2013

It looks like this problem still exists in some form. Somehow Sidekiq::Shutdown seems to bubble into the MySQL connection, which then throws:

ActiveRecord::StatementInvalid: Sidekiq::Shutdown: Sidekiq::Shutdown: SELECT twitter_users.* FROM twitter_users WHERE twitter_users.twitter_id = 123 LIMIT 1

Should I open a new issue? Would be happy to help test/code a patch too.

@mperham (Owner) commented Aug 28, 2013

@queso It isn't clear what the problem is. Sidekiq has to kill any lingering workers so it can shut down in a timely manner. Jobs that are shut down in that manner are pushed back onto the queue so they can be re-executed when Sidekiq starts back up.
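
Since an interrupted job can be executed again after a restart, one common way to make that re-execution harmless (a general Sidekiq recommendation, not something specific to this thread) is to keep the job idempotent. A minimal sketch, with SyncTwitterUserWorker and TwitterUser as hypothetical names:

```ruby
# Hypothetical worker: running it twice for the same twitter_id produces
# the same end state, so a shutdown/requeue cycle does no harm.
class SyncTwitterUserWorker
  include Sidekiq::Worker

  def perform(twitter_id)
    user = TwitterUser.find_or_initialize_by(twitter_id: twitter_id)
    user.synced_at = Time.now
    user.save!
  end
end
```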

@queso (Contributor) commented Aug 28, 2013

Yes, and the requeue happens, but we also see this ActiveRecord::StatementInvalid thrown with "Sidekiq::Shutdown" in it, so those get rescued and tossed onto our retry queue, and we end up with two jobs.

The bigger issue is that our job has a memory leak and bluepill is set to restart Sidekiq at a certain point, so the really leaky jobs end up with hundreds of queued and retried copies.
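
To spell out the double scheduling described here, a rough sketch (simplified pseudocode, not Sidekiq's actual retry code) of the two paths the same job can take when the shutdown exception arrives wrapped:

```ruby
# Simplified sketch of retry handling; Sidekiq::Shutdown is the constant
# defined by the sidekiq gem, schedule_retry is a stand-in helper.
def run_with_retry(job)
  yield
rescue Sidekiq::Shutdown
  # A bare shutdown is recognized: no retry is recorded, and the
  # hard-shutdown path pushes the in-progress job back onto its queue.
  raise
rescue StandardError => e
  # ActiveRecord::StatementInvalid is a StandardError, so a shutdown that
  # ActiveRecord has wrapped lands here and is scheduled as a retry as
  # well, which is how the same job ends up both requeued and retried.
  schedule_retry(job, e)
  raise
end
```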

@queso (Contributor) commented Aug 28, 2013

I guess what I am saying is that the Sidekiq::Shutdown raise seems to happen during a mysql2 connection call, and mysql2 is somehow catching it and raising its own error.

@queso (Contributor) commented Aug 28, 2013

@queso (Contributor) commented Aug 28, 2013

@brianmario thoughts? Line 19 seems a bit heavy handed...

@mperham (Owner) commented Aug 28, 2013

@queso You shouldn't be using the EM client within Sidekiq jobs. Use the blocking client since you have multiple threads.

@queso (Contributor) commented Aug 28, 2013

I am using ActiveRecord to hit mysql2, so I assume that is the blocking client. I don't see any other obvious reason that the exception from Sidekiq is being rescued in the mysql2 gem. Perhaps AR is doing the rescue?

@brianmario commented Aug 28, 2013

Are you guys using the EM ActiveRecord adapter? That line you pointed out seems fine to me because we need to bubble the exception up to the deferrable instead of raising it in the executing thread (for EventMachine). But if you're using Sidekiq you should be using the regular mysql2 adapter. I'd track that down first, then report back to us if you still have issues once you've switched to the regular adapter.

@queso (Contributor) commented Aug 28, 2013

No, I am not using the EM adapter. Sorry, I didn't catch that; I just did a search for any rescue and figured that might be the trouble. I am using the regular adapter.

@brianmario commented Aug 28, 2013

Ah ok... what line is the backtrace leading to in mysql2?

@queso (Contributor) commented Aug 29, 2013

@rajofchennai commented Nov 19, 2013

I am having a similar issue. If the worker is processing an ActiveRecord query during a Sidekiq shutdown, the Sidekiq::Shutdown exception thrown by Sidekiq is caught by ActiveRecord and rethrown as

ActiveRecord::StatementInvalid: "Sidekiq::Shutdown: Sidekiq::Shutdown"

https://github.com/rails/rails/blob/3-2-stable/activerecord/lib/active_record/connection_adapters/abstract_adapter.rb#L281
In this case, instead of the job being re-queued, it is added to the list of failed jobs. I think this is not the expected behaviour.
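
Roughly what the linked adapter code does (a heavily simplified sketch, not the actual Rails source) is rescue anything raised mid-query, including the shutdown signal, and re-raise it as a StatementInvalid whose message embeds the original error and the SQL:

```ruby
# Simplified sketch of the Rails 3.2 behaviour linked above.
def log(sql, name)
  yield
rescue Exception => e
  # The original Sidekiq::Shutdown is swallowed here, which is why the
  # job looks like a plain failure instead of an interrupted job.
  raise ActiveRecord::StatementInvalid, "#{e.class.name}: #{e.message}: #{sql}"
end
```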

@benjaminwood commented Jun 4, 2014

I'm having the same issue with long-running jobs. When I shut down Sidekiq, I get:

ActiveRecord::StatementInvalid - Sidekiq::Shutdown: Sidekiq::Shutdown: SELECT logins.* FROM logins WHERE.... rest of my db query.

The most troubling part is that I have set the worker to not retry (sidekiq_options :retry => false) and yet it restarts the job when sidekiq starts up again. That's not expected behaviour, is it?

@rajofchennai - I see you've created your own solution to this bug. How's that working for you?

It'd be awesome to have this bug patched in v2.x!

@mperham (Owner) commented Jun 4, 2014

@benjaminwood That's expected behavior. Jobs which are in-progress when Sidekiq shuts down will be immediately restarted. It's not a retry.

@benjaminwood commented Jun 5, 2014

@mperham, ah okay. Many thanks for your reply!

Is there a bug here with Sidekiq::Shutdown showing up inside the db query? Not a huge deal now that I understand it's not what is causing my job to restart. Just curious! Thanks again...

@mperham (Owner) commented Jun 5, 2014

No bug. ActiveRecord catches any unexpected error and calls it a StatementInvalid error. Sidekiq has to use an exception to stop all busy threads so that any transactions in progress roll back.
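
A minimal sketch of that rollback behaviour, assuming a hypothetical Login model (borrowed from the query in the earlier comment):

```ruby
ActiveRecord::Base.transaction do
  Login.create!(note: "partial work")  # hypothetical model and attribute
  # If Sidekiq's hard shutdown raises Sidekiq::Shutdown in this busy
  # thread, the exception propagates out of the block and the whole
  # transaction rolls back, so the partial work is never committed.
  sleep 60  # long-running work interrupted by the shutdown
end
```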

@benjaminwood commented Jun 5, 2014

Gotcha! Thanks again for your reply and thanks for all your hard work on sidekiq! I live in Vancouver, perhaps I'll have the opportunity to buy you a beer someday. 🍻

sentience added a commit to avalancheau/sidekiq that referenced this issue Jun 27, 2014

Prevent exceptions caused by Sidekiq::Shutdown from causing jobs to be retried.

Jobs in progress during a Sidekiq shutdown/restart are requeued for immediate execution. They should not also be queued for retry.

This is an attempt at a cleaner fix than that proposed with mperham#1354, also discussed on mperham#897. Because it depends on Exception#cause, this fix will only be effective on Ruby 2.1.0; however, the code will run on earlier Rubies.
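
As a rough illustration of the approach that commit message describes (a sketch, not the actual patch), retry logic can walk Exception#cause to see whether a wrapped error was ultimately triggered by the shutdown signal and skip the retry in that case:

```ruby
# Sketch only; Exception#cause requires Ruby 2.1+.
def caused_by_shutdown?(error)
  until error.nil?
    return true if error.is_a?(Sidekiq::Shutdown)
    error = error.cause
  end
  false
end

# A retry handler using this check would re-raise without scheduling a
# retry when the job was merely interrupted, since the hard-shutdown path
# already pushes the in-progress job back onto its queue.
```
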
@nyddogghr commented Feb 28, 2019

Hi,

I'm sorry to bump this old issue, but this seems to be exactly what I'm hitting right now.
If you'd like me to create a new issue instead, just tell me.

I'm using Sidekiq 5.2.5 (neither Pro nor Enterprise) with a Rails 5.2 application, and I can sometimes see jobs being duplicated on Sidekiq restart (on Heroku). If a job fails during the restart due to Sidekiq::Shutdown, it can be sent to both the retry and failed queues.
For instance, here are logs showing the issue:

2019-02-28T15:57:59.930830+00:00 app background.1 - - 4 TID-oxr838ra4 INFO: Running in ruby 2.4.5p335 (2018-10-18 revision 65137) [x86_64-linux]
2019-02-28T15:57:59.931251+00:00 app background.1 - - 4 TID-oxr838ra4 INFO: See LICENSE and the LGPL-3.0 for licensing details.
2019-02-28T15:57:59.931492+00:00 app background.1 - - 4 TID-oxr838ra4 INFO: Upgrade to Sidekiq Pro for more features and support: http://sidekiq.org
2019-02-28T15:57:59.931652+00:00 app background.1 - - 4 TID-oxr838ra4 INFO: Booting Sidekiq 5.2.5 with redis options {:id=>"Sidekiq-server-PID-4", :url=>"redis://h:REDACTED@ec2-63-32-218-34.eu-west-1.compute.amazonaws.com:6729"}
2019-02-28T15:57:59.939714+00:00 app background.1 - - 4 TID-oxr838ra4 WARN: Sidekiq 6.0 will require Redis 4.0+, you are using Redis v3.2.12
2019-02-28T15:57:59.939794+00:00 app background.1 - - 4 TID-oxr838ra4 DEBUG: Client Middleware:
2019-02-28T15:57:59.939842+00:00 app background.1 - - 4 TID-oxr838ra4 DEBUG: Server Middleware: Raven::SidekiqCleanupMiddleware, NewRelic::SidekiqInstrumentation
2019-02-28T15:57:59.939869+00:00 app background.1 - - 4 TID-oxr838ra4 INFO: Starting processing, hit Ctrl-C to stop
2019-02-28T15:57:59.939976+00:00 app background.1 - - 4 TID-oxr838ra4 DEBUG: {:queues=>["critical", "critical", "critical", "critical", "critical", "critical", "critical", "background_api", "background_api", "background_api", "default", "default", "default", "default", "default", "thumb_image", "thumb_image", "thumb_image", "thumb_image", "thumb_image", "low", "low", "low"], :labels=>[], :concurrency=>7, :require=>".", :environment=>"production", :timeout=>8, :poll_interval_average=>nil, :average_scheduled_poll_interval=>5, :error_handlers=>[#<Sidekiq::ExceptionHandler::Logger:0x00007ff82736b330>, #<Raven::SidekiqErrorHandler:0x00007ff814911900>, #<Proc:0x00007ff818953a90@/app/vendor/bundle/ruby/2.4.0/gems/newrelic_rpm-5.2.0.345/lib/new_relic/agent/instrumentation/sidekiq.rb:66>], :death_handlers=>[], :lifecycle_events=>{:startup=>[], :quiet=>[], :shutdown=>[], :heartbeat=>[]}, :dead_max_jobs=>10000, :dead_timeout_in_seconds=>15552000, :reloader=>#<Sidekiq::Rails::Reloader @app=HerokuService::Application>, :verbose=>true, :limits=>{:thumb_image=>4}, :strict=>false, :config_file=>"./config/sidekiq.yml", :tag=>"app", :identity=>"background.1:4:931b719f8bfd", :fetch=>Sidekiq::LimitFetch}
2019-02-28T15:58:00.001045+00:00 app background.1 - - 4 TID-oxr4317cs Api::BackgroundRequestJob JID-938441f1737c768516eb805a INFO: start
2019-02-28T15:58:19.938971+00:00 app background.1 - - 4 TID-oxr838ra4 DEBUG: Got TERM signal
2019-02-28T15:58:19.938998+00:00 app background.1 - - 4 TID-oxr838ra4 INFO: Shutting down
2019-02-28T15:58:19.939252+00:00 app background.1 - - 4 TID-oxr838ra4 INFO: Terminating quiet workers
2019-02-28T15:58:19.939255+00:00 app background.1 - - 4 TID-oxr4o5ehs INFO: Scheduler exiting...
2019-02-28T15:58:20.441705+00:00 app background.1 - - 4 TID-oxr838ra4 INFO: Pausing to allow workers to finish...
2019-02-28T15:58:27.515236+00:00 app background.1 - - 4 TID-oxr838ra4 WARN: Terminating 1 busy worker threads
2019-02-28T15:58:27.515323+00:00 app background.1 - - 4 TID-oxr838ra4 WARN: Work still in progress [#<struct Sidekiq::LimitFetch::UnitOfWork queue="queue:background_api", job="{\"class\":\"Api::BackgroundRequestJob\",\"args\":[],\"retry\":3,\"queue\":\"background_api\",\"jid\":\"938441f1737c768516eb805a\",\"created_at\":1551369418.3886776,\"enqueued_at\":1551369418.3892639}">]
2019-02-28T15:58:27.515386+00:00 app background.1 - - 4 TID-oxr838ra4 DEBUG: Re-queueing terminated jobs
2019-02-28T15:58:27.516756+00:00 app background.1 - - 4 TID-oxr838ra4 INFO: Pushed 1 jobs back to Redis
2019-02-28T15:58:27.518465+00:00 app background.1 - - 4 TID-oxr838ra4 INFO: Bye!
2019-02-28T15:58:27.614586+00:00 app background.1 - - Sending event f745ed721b36484ba521c8cc30d62cb2 to Sentry
2019-02-28T15:58:28.354806+00:00 app background.1 - -
2019-02-28T15:58:28.354825+00:00 app background.1 - - Sidekiq::Shutdown (Sidekiq::Shutdown):
2019-02-28T15:58:28.354831+00:00 app background.1 - -
2019-02-28T15:58:28.354833+00:00 app background.1 - - app/lib/api/response.rb:43:in `to_json'
2019-02-28T15:58:28.354835+00:00 app background.1 - - app/controllers/api/application_controller.rb:15:in `respond_with'
2019-02-28T15:58:28.354836+00:00 app background.1 - - app/controllers/api/v2/media_controller.rb:199:in `index'
2019-02-28T15:58:28.354838+00:00 app background.1 - - app/jobs/api/background_request_job.rb:48:in `perform'
2019-02-28T15:58:31.649134+00:00 app background.1 - - /app/vendor/bundle/ruby/2.4.0/gems/logging-1.8.2/lib/logging/appender.rb:139: warning: constant ::Fixnum is deprecated
2019-02-28T15:58:34.166322+00:00 app background.1 - - 4 TID-ou72l12ck INFO: Running in ruby 2.4.5p335 (2018-10-18 revision 65137) [x86_64-linux]
2019-02-28T15:58:34.166344+00:00 app background.1 - - 4 TID-ou72l12ck INFO: See LICENSE and the LGPL-3.0 for licensing details.
2019-02-28T15:58:34.166346+00:00 app background.1 - - 4 TID-ou72l12ck INFO: Upgrade to Sidekiq Pro for more features and support: http://sidekiq.org
2019-02-28T15:58:34.166531+00:00 app background.1 - - 4 TID-ou72l12ck INFO: Booting Sidekiq 5.2.5 with redis options {:id=>"Sidekiq-server-PID-4", :url=>"redis://h:REDACTED@ec2-63-32-218-34.eu-west-1.compute.amazonaws.com:6729"}
2019-02-28T15:58:34.173212+00:00 app background.1 - - 4 TID-ou72l12ck WARN: Sidekiq 6.0 will require Redis 4.0+, you are using Redis v3.2.12
2019-02-28T15:58:34.173309+00:00 app background.1 - - 4 TID-ou72l12ck DEBUG: Client Middleware:
2019-02-28T15:58:34.173357+00:00 app background.1 - - 4 TID-ou72l12ck DEBUG: Server Middleware: Raven::SidekiqCleanupMiddleware, NewRelic::SidekiqInstrumentation
2019-02-28T15:58:34.173385+00:00 app background.1 - - 4 TID-ou72l12ck INFO: Starting processing, hit Ctrl-C to stop
2019-02-28T15:58:34.173541+00:00 app background.1 - - 4 TID-ou72l12ck DEBUG: {:queues=>["critical", "critical", "critical", "critical", "critical", "critical", "critical", "background_api", "background_api", "background_api", "default", "default", "default", "default", "default", "thumb_image", "thumb_image", "thumb_image", "thumb_image", "thumb_image", "low", "low", "low"], :labels=>[], :concurrency=>7, :require=>".", :environment=>"production", :timeout=>8, :poll_interval_average=>nil, :average_scheduled_poll_interval=>5, :error_handlers=>[#<Sidekiq::ExceptionHandler::Logger:0x00007f76403fae18>, #<Raven::SidekiqErrorHandler:0x00007f7636ab2300>, #<Proc:0x00007f7636683d10@/app/vendor/bundle/ruby/2.4.0/gems/newrelic_rpm-5.2.0.345/lib/new_relic/agent/instrumentation/sidekiq.rb:66>], :death_handlers=>[], :lifecycle_events=>{:startup=>[], :quiet=>[], :shutdown=>[], :heartbeat=>[]}, :dead_max_jobs=>10000, :dead_timeout_in_seconds=>15552000, :reloader=>#<Sidekiq::Rails::Reloader @app=HerokuService::Application>, :verbose=>true, :limits=>{:thumb_image=>4}, :strict=>false, :config_file=>"./config/sidekiq.yml", :tag=>"app", :identity=>"background.1:4:a230ea8abb57", :fetch=>Sidekiq::LimitFetch}
2019-02-28T15:58:34.262481+00:00 app background.1 - - 4 TID-ou6yv8vjs Api::BackgroundRequestJob JID-938441f1737c768516eb805a INFO: start
2019-02-28T15:58:58.681202+00:00 app background.1 - - 4 TID-ou6yvb7jw DEBUG: enqueued retry: {"class":"Api::BackgroundRequestJob","args":[],"retry":3,"queue":"background_api","jid":"938441f1737c768516eb805a","created_at":1551369418.3886776,"enqueued_at":1551369418.3892639,"error_message":"Call received non OK status 500 for path '/api/v2/media.json'.","error_class":"Api::Exception::Unknown","failed_at":1551369508.460359,"retry_count":0}
2019-02-28T15:58:58.683340+00:00 app background.1 - - 4 TID-ou6yv8vtg Api::BackgroundRequestJob JID-938441f1737c768516eb805a INFO: start

Sometimes the error shown is indeed Sidekiq::Shutdown, but it can also be other errors that are caused by it.

Is this indeed the same issue? Would you have any recommendation to prevent it?

Thanks.

@mperham (Owner) commented Feb 28, 2019
