
Supervise Sidekiq::Manager #1194

Merged 1 commit into sidekiq:master on Sep 21, 2013

Conversation

cpuguy83
Contributor

This should resolve issues where Sidekiq freezes after the manager actor crashes.

#1188

mperham added a commit that referenced this pull request Sep 21, 2013
mperham merged commit 2c69728 into sidekiq:master on Sep 21, 2013
@mgroebner

Are you sure this works? I'm not really familiar with Celluloid, but as far as I understand it, a new manager gets initialized after the old one crashes, but it never gets started. Or am I missing something?

I tried to clarify this with the following gist.

https://gist.github.com/cyrez/6654091

With JRuby 1.7.4 and Celluloid 0.15.1, the manager crashes in this example. It then gets initialized again but never starts processing again.

@mperham
Collaborator

mperham commented Sep 21, 2013

@cyrez That's a good point; you could very well be right. How do we restart the Manager properly?

@cpuguy83
Contributor Author

Actually, I think we'll need to make the ".manager" accessor method always call Celluloid::Actor[:manager].
The way it currently works, @manager will always be the old actor.

Does that make sense? Sorry, writing this as I wait at the Apple store.
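
A minimal sketch of what that accessor change could look like, assuming the manager is registered with Celluloid's supervisor under the :manager key:

Sidekiq::Manager.supervise_as :manager, options

def manager
  # Resolve through the registry on every call instead of caching the
  # actor; a cached @manager would keep pointing at the crashed instance
  # after the supervisor restarts it.
  Celluloid::Actor[:manager]
end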

@halorgium
Contributor

In general, you'll want to hold the state of the system in an actor which does not crash often (read: one that does not perform user-controlled actions). The main pattern is to hold a reference to the Manager in this actor and then trap_exit on the manager actor.

You might consider a Manager crash a full system reboot.
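
A rough sketch of that pattern against Celluloid 0.15; the Launcher name and the restart-everything policy are illustrative, not Sidekiq's actual code:

require 'celluloid'

class Launcher
  include Celluloid
  # Ask Celluloid to call manager_died instead of crashing this actor
  # when a linked actor exits abnormally.
  trap_exit :manager_died

  def initialize(options)
    @options = options
    start_manager
  end

  def start_manager
    # new_link creates the actor and links it to us in one step.
    @manager = Sidekiq::Manager.new_link(@options)
  end

  # Celluloid invokes this with the dead actor and the exception that
  # killed it; treat the crash as a full restart of the manager.
  def manager_died(actor, reason)
    start_manager
  end
end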

@halorgium
Contributor

@cpuguy83 is correct.
One side note: relying on the registry to reach the manager means that old processors will always talk to the latest manager rather than the one that started them.
If you head down this route, it would pay to check that Celluloid::Actor[:manager] is the original Manager.

Also, I would suggest namespacing the registry entry, as the key is currently global to the process.
We have plans for this in Trello.
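
For example (the :sidekiq_manager key here is purely illustrative; the actual naming scheme was still on the roadmap):

# A Sidekiq-specific key avoids colliding with another library that
# registers its own :manager in the same process-wide Celluloid registry.
Sidekiq::Manager.supervise_as :sidekiq_manager, options
Celluloid::Actor[:sidekiq_manager]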

@mperham
Collaborator

mperham commented Sep 22, 2013

How's that look?

@mgroebner

@halorgium wouldn't it be nice if it were possible to tell a supervisor what to do when an actor crashes?

@mperham looks good to me

- @manager = Sidekiq::Manager.new(options)
+ Sidekiq::Manager.supervise_as :manager, options
+ @manager = Celluloid::Actor[:manager]


This will refer to a stale version of the manager after the supervisor restarts it.

You should probably do:

def manager
  Celluloid::Actor[:manager]
end

@halorgium
Contributor

@cyrez we've thought about this.

@fred

fred commented Oct 20, 2013

I'm still facing issues where the workers do nothing after some period of time.

It processes jobs at first, but after a few hours of running, Sidekiq just stops doing anything; jobs go to the queue and stay there.

This just started happening recently, maybe since version 2.15.0.

I upgraded to Sidekiq 2.16.0 to use "resolv-replace" and the problem still persists.

I will do some debugging to check what is going on. I have many different jobs, all running often.

@mperham
Collaborator

mperham commented Oct 20, 2013

I'm positive this has little to do with Sidekiq and more to do with your Ruby VM and gems. I bet you added code or a worker which triggers this behavior.


cpuguy83 deleted the supervise_manager branch on April 14, 2020.