
Dead actors? #233

Closed
dim opened this issue Jun 10, 2012 · 9 comments

@dim
Contributor

dim commented Jun 10, 2012

I'm running Sidekiq 2.0.1 (on Heroku) and workers simply seem to die during processing. I'm not sure how to debug this issue, but here's what happens:

$ heroku run "sidekiq -c 1 -v"

Running `sidekiq -c 1 -v` attached to terminal... up, run.1
2012-06-10T14:47:52Z 7 TID-e0kq0 INFO: Booting sidekiq 2.0.1 with Redis at redis://....
2012-06-10T14:47:52Z 7 TID-e0kq0 INFO: Running in ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-linux]
2012-06-10T14:47:52Z 7 TID-e0kq0 DEBUG: {:queues=>["default"], :concurrency=>1, :require=>".", :environment=>"production", :timeout=>8, :enable_rails_extensions=>true}
2012-06-10T14:47:53Z 7 TID-3lu3s INFO: Starting processing, hit Ctrl-C to stop
2012-06-10T14:47:53Z 7 TID-d8b30 BulkWorker MSG-d8bks INFO: start
2012-06-10T14:47:53Z 7 TID-d8b30 BulkWorker MSG-d8bks INFO: done: 0.173 sec
2012-06-10T14:47:54Z 7 TID-d8b30 BulkWorker MSG-dtzp8 INFO: start
2012-06-10T14:47:54Z 7 TID-d8b30 BulkWorker MSG-dtzp8 INFO: done: 0.151 sec
2012-06-10T14:47:54Z 7 TID-d8b30 BulkWorker MSG-iwwu8 INFO: start
2012-06-10T14:47:54Z 7 TID-d8b30 BulkWorker MSG-iwwu8 INFO: done: 0.411 sec
2012-06-10T14:47:55Z 7 TID-d8b30 BulkWorker MSG-kdqs8 INFO: start
2012-06-10T14:47:55Z 7 TID-d8b30 BulkWorker MSG-kdqs8 INFO: done: 0.168 sec

The processing stops here, although there are still items in the queue. Pressing Ctrl+C reveals:

2012-06-10T14:50:59Z 7 TID-3lu3s INFO: Shutting down
attempted to call a dead actor
/app/vendor/bundle/ruby/1.9.1/gems/celluloid-0.11.0/lib/celluloid/actor.rb:47:in `rescue in call'
/app/vendor/bundle/ruby/1.9.1/gems/celluloid-0.11.0/lib/celluloid/actor.rb:44:in `call'
/app/vendor/bundle/ruby/1.9.1/gems/celluloid-0.11.0/lib/celluloid/actor_proxy.rb:99:in `method_missing'
/app/vendor/bundle/ruby/1.9.1/gems/sidekiq-2.0.1/lib/sidekiq/cli.rb:78:in `rescue in run'
/app/vendor/bundle/ruby/1.9.1/gems/sidekiq-2.0.1/lib/sidekiq/cli.rb:69:in `run'
/app/vendor/bundle/ruby/1.9.1/gems/sidekiq-2.0.1/bin/sidekiq:8:in `<top (required)>'
bin/sidekiq:16:in `load'
bin/sidekiq:16:in `<main>'

Any clues? Thanks, Dim

@mperham
Collaborator

mperham commented Jun 11, 2012

That backtrace would imply that the manager died somehow. You don't see an error in the log?

@dim
Contributor Author

dim commented Jun 11, 2012

Unfortunately, no. Anything I can do to debug? Where should I start?

@mperham
Collaborator

mperham commented Jun 11, 2012

Try setting Celluloid's global error handler and see if that catches the problem.
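For reference, registering a global handler could look roughly like this. This is a sketch for an initializer that runs before Sidekiq boots; it assumes Celluloid exposes an `exception_handler` hook in the version you're running (check yours) and that `Sidekiq.logger` is available:

```ruby
# Sketch, not verified against Celluloid 0.11: register a global
# exception handler so silently crashing actors at least get logged.
Celluloid.exception_handler do |ex|
  Sidekiq.logger.error "Actor crashed: #{ex.class} - #{ex.message}"
  Sidekiq.logger.error ex.backtrace.join("\n") if ex.backtrace
end
```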


@dim
Contributor Author

dim commented Jun 11, 2012

Hmm, all I get is SystemStackError: stack level too deep and a single stack-trace line:

~/.rvm/.../gems/celluloid-0.11.0/lib/celluloid/task.rb:57

As said, it doesn't happen every time, only every now and then.
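For what it's worth, the fiber connection can be demonstrated in plain Ruby: Celluloid 0.11 ran actor tasks on fibers, and unbounded recursion inside a Fiber raises SystemStackError (on 1.9, after only a few KB of stack). A minimal sketch:

```ruby
# Sketch: deep recursion inside a Fiber blows the fiber's stack and
# raises SystemStackError, which is rescuable like any other error.
def deepen(n)
  deepen(n + 1)
end

result = Fiber.new do
  begin
    deepen(0)
  rescue SystemStackError
    :stack_overflow
  end
end.resume

puts result  # => stack_overflow
```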

@dim
Contributor Author

dim commented Jun 12, 2012

Not sure if this additional info helps, but I don't think there is anything wrong with my worker. Here's what currently happens when I run heroku run "sidekiq -c 1":

2012-06-12T09:57:11Z 7 TID-pqo4g INFO: Booting sidekiq 2.0.1 with Redis at redis://50.30.35.9:2362/6
2012-06-12T09:57:11Z 7 TID-pqo4g INFO: Running in ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-linux]
2012-06-12T09:57:12Z 7 TID-b9i20 INFO: Starting processing, hit Ctrl-C to stop

There are 33 items in the queue, but they are simply not picked up. I would expect to see something like:

2012-06-12T09:57:12Z 7 TID-abcde BulkWorker MSG-abcde INFO: start

But the worker doesn't even start. Pressing Ctrl+C then returns the usual:

2012-06-12T10:03:06Z 7 TID-2usnw INFO: Shutting down
attempted to call a dead actor
/app/vendor/bundle/ruby/1.9.1/gems/celluloid-0.11.0/lib/celluloid/actor.rb:47:in `rescue in call'
/app/vendor/bundle/ruby/1.9.1/gems/celluloid-0.11.0/lib/celluloid/actor.rb:44:in `call'
/app/vendor/bundle/ruby/1.9.1/gems/celluloid-0.11.0/lib/celluloid/actor_proxy.rb:99:in `method_missing'
/app/vendor/bundle/ruby/1.9.1/gems/sidekiq-2.0.1/lib/sidekiq/cli.rb:78:in `rescue in run'
/app/vendor/bundle/ruby/1.9.1/gems/sidekiq-2.0.1/lib/sidekiq/cli.rb:69:in `run'
/app/vendor/bundle/ruby/1.9.1/gems/sidekiq-2.0.1/bin/sidekiq:8:in `<top (required)>'
bin/sidekiq:16:in `load'
bin/sidekiq:16:in `<main>'

@mperham
Collaborator

mperham commented Jun 12, 2012

@dim I don't know what to tell you. Others aren't reporting this issue so I have to assume it is specific to your environment. Maybe open a Celluloid issue because the actor appears to be dying silently?

@dim
Contributor Author

dim commented Jun 14, 2012

Hey! I finally figured it out and am happy to share. Although it was almost impossible to debug, I traced the actor's death to: https://github.com/mperham/sidekiq/blob/2c4be4c/lib/sidekiq/manager.rb#L113.

My jobs (often) contain a significant amount of information and there is a lot to parse. Sidekiq offloads that work to MultiJson (https://github.com/mperham/sidekiq/blob/222442/lib/sidekiq.rb#L89), which manages to blow through the 4KB stack limit of Ruby fibers (I found your bug report, btw: http://bugs.ruby-lang.org/issues/3187). Switching the MultiJson backend from :oj to :yajl solved the problem, as Yajl is apparently less greedy with the stack.

Interesting stuff, it just took me a whole week of debugging to "get there". Not sure what to do about it. I suspect most Sidekiq users have small jobs, so the 4KB limit isn't a big deal, but it would still be nice to mention it somewhere in the Troubleshooting section.
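For anyone hitting the same thing, the workaround is a one-liner in an initializer (assuming the yajl-ruby gem is in your Gemfile):

```ruby
# Workaround sketch: force MultiJson onto the Yajl backend instead of
# Oj. Yajl parses on the C stack, so it stays clear of the 4KB Ruby
# fiber stack that the Oj backend exhausted here.
require 'multi_json'

MultiJson.use :yajl
```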

Thanks, D

@dim dim closed this as completed Jun 14, 2012
@mperham
Collaborator

mperham commented Jun 14, 2012

Nice, thanks for tracking this down. Yajl is native, so it uses the C stack and doesn't have that 4k limit.

I had considered adding a log warning to Sidekiq if a message is greater than 64k. Ideally Sidekiq messages should be a set of IDs to process and not include a lot of state.
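Such a warning could be sketched along these lines (a hypothetical helper, not actual Sidekiq code): serialize the job arguments and flag anything over 64KB, since oversized payloads are what blew the fiber stack here.

```ruby
require 'json'

# Hypothetical sketch of a payload-size check, not part of Sidekiq:
# serialize the job args and warn when the JSON exceeds 64KB.
MAX_PAYLOAD_BYTES = 64 * 1024

def oversized_payload?(args)
  JSON.generate(args).bytesize > MAX_PAYLOAD_BYTES
end

puts oversized_payload?([1, 2, 3])             # => false
puts oversized_payload?(["x" * (128 * 1024)])  # => true
```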

@dim
Contributor Author

dim commented Jun 14, 2012

Hmm, Oj is native too, maybe I should open a ticket there.
