
Supervise Sidekiq::Manager #1194

Merged 1 commit into sidekiq:master on Sep 21, 2013

Conversation

cpuguy83
Contributor

This should resolve issues where Sidekiq freezes after the manager actor crashes.

#1188

mperham added a commit that referenced this pull request Sep 21, 2013
mperham merged commit 2c69728 into sidekiq:master on Sep 21, 2013
@mgroebner

Are you sure this works? I'm not really familiar with Celluloid, but as far as I understand it, a new manager gets initialized after the old one crashes, but it never gets started. Or am I missing something?

I tried to clarify this with the following gist.

https://gist.github.com/cyrez/6654091

With JRuby 1.7.4 and Celluloid 0.15.1, the manager crashes in this example. It then gets initialized again but never starts processing again.

@mperham
Collaborator

mperham commented Sep 21, 2013

@cyrez That's a good point; you could very well be right. How do we restart the Manager properly?

@cpuguy83
Contributor Author

Actually, I think we'll need to make the ".manager" accessor method always call Celluloid::Actor[:manager].
The way it currently works, @manager will always be the old actor.

Does that make sense? Sorry, writing this as I wait at the Apple store.
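
A minimal sketch of what that accessor change could look like, assuming the manager is registered with Celluloid's supervisor under the :manager key:

Sidekiq::Manager.supervise_as :manager, options

def manager
  # Resolve through the registry on every call instead of caching the
  # actor; a cached @manager would keep pointing at the crashed instance
  # after the supervisor restarts it.
  Celluloid::Actor[:manager]
end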

@halorgium
Contributor

In general, you'll want to hold the state of the system in an actor which does not crash often (read: one that does not perform user-controlled actions). The main pattern is to hold a reference to the Manager in this actor and then trap_exit on the manager actor.

You might consider a Manager crash a full system reboot.
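
A rough sketch of that pattern against Celluloid 0.15; the Launcher name and the restart-everything policy are illustrative, not Sidekiq's actual code:

require 'celluloid'

class Launcher
  include Celluloid
  # Ask Celluloid to call manager_died instead of crashing this actor
  # when a linked actor exits abnormally.
  trap_exit :manager_died

  def initialize(options)
    @options = options
    start_manager
  end

  def start_manager
    # new_link creates the actor and links it to us in one step.
    @manager = Sidekiq::Manager.new_link(@options)
  end

  # Celluloid invokes this with the dead actor and the exception that
  # killed it; treat the crash as a full restart of the manager.
  def manager_died(actor, reason)
    start_manager
  end
end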

@halorgium
Contributor

@cpuguy83 is correct.
One side note: relying on the registry to reach the manager means that old processors will always talk to the latest manager rather than the one that started them.
If you head down this route, it would pay to check that Celluloid::Actor[:manager] is the original Manager.

Also, I would suggest namespacing the registry entry, as the key is currently global to the process.
We have plans for this in Trello.
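
For example (the :sidekiq_manager key here is purely illustrative; the actual naming scheme was still on the roadmap):

# A Sidekiq-specific key avoids colliding with another library that
# registers its own :manager in the same process-wide Celluloid registry.
Sidekiq::Manager.supervise_as :sidekiq_manager, options
Celluloid::Actor[:sidekiq_manager]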

@mperham
Collaborator

mperham commented Sep 22, 2013

How's that look?

@mgroebner

@halorgium wouldn't it be nice if it were possible to tell a supervisor what to do when an actor crashes?

@mperham looks good to me

- @manager = Sidekiq::Manager.new(options)
+ Sidekiq::Manager.supervise_as :manager, options
+ @manager = Celluloid::Actor[:manager]


This will refer to a stale version of the manager after the supervisor restarts it.

You should probably do:

def manager
  Celluloid::Actor[:manager]
end

@halorgium
Contributor

@cyrez we've thought about this.

@fred

fred commented Oct 20, 2013

I'm still facing issues where the workers do nothing after some period of time.

It processes jobs at first, but after a few hours of running, Sidekiq just stops doing anything; jobs go to the queue and stay there.

This just started happening recently, maybe since version 2.15.0.

I upgraded to Sidekiq 2.16.0 to use "resolv-replace" and the problem still persists.

I will do some debugging to check what is going on. I have many different jobs, all running often.

@mperham
Collaborator

mperham commented Oct 20, 2013

I'm positive this has little to do with Sidekiq and more to do with your Ruby VM and gems. I bet you added code or a worker which triggers this behavior.


cpuguy83 deleted the supervise_manager branch on April 14, 2020.