Failed jobs count increments, but no failures logged. #1533

Closed
mlangenberg opened this Issue Mar 6, 2014 · 4 comments

Using Sidekiq v2.17.7 and Rails 3.2

I am investigating why 10,000 of 50,000 jobs failed for a given day.
[Screenshot: sidekiq_graph]

For the jobs that increase the failed jobs count, the Sidekiq stdout log only shows INFO: start and INFO: done: 0.144 sec. I don't see anything reported to Honeybadger for those jobs (I do for other jobs), and the Rails log looks fine as well.

Why is the failure count incremented when there is no 'fail' log entry (see server/logging.rb)?

Retries tab stays empty as well.

Owner

mperham commented Mar 7, 2014

A failure generally means an exception is raised and logged. If this is not the behavior you're seeing, something is probably wrong with your environment. Maybe you have a 3rd party sidekiq extension which is screwing things up.

Thanks, I disabled the only 3rd party extension, mhenrixon/sidekiq-unique-jobs@c4bade7

I will update this issue after the next production deploy.

Okay, removing the sidekiq-unique-jobs gem did not mitigate the issue.

When I tail the logs of the Sidekiq worker processes running on 4 machines, I do not see any errors or 'fail:' entries.

Nevertheless, failures still appear in the graph and the failed jobs count increments, but the Retries tab stays empty.

Yet when I start my worker from a Rails production console with bad arguments, I see the correct behavior.

  • MyWorker.perform_async('invalid_json')
  • sidekiq log:
    • INFO: fail: 0.01 sec
    • WARN: {"retry"=>true, "queue"=>"default", "class"=>"MyWorker", "args"=>["invalid_json"], "jid"=>"...", "enqueued_at"=>..., "error_message"=>"795: unexpected token at 'invalid_json'", "error_class"=>"MultiJson::ParseError", "failed_at"=>..., "retry_count"=>0}
  • Job appears in Retries tab.
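
For anyone wanting to repeat this console test, here is a minimal sketch of a worker that fails the same way. The class body is an assumption, since the thread never shows the real `MyWorker`. It uses the standard library's `JSON` instead of MultiJson, and only mixes in `Sidekiq::Worker` when Sidekiq is loaded, so the file also runs on its own.

```ruby
require 'json'

# Hypothetical stand-in for the MyWorker class mentioned above; the real
# implementation is not shown in this thread.
class MyWorker
  # Mix in Sidekiq only when it is loaded, so the sketch also runs without it.
  include Sidekiq::Worker if defined?(Sidekiq::Worker)

  # Parses the payload and returns the resulting object. JSON.parse raises
  # JSON::ParserError on malformed input. Inside a Sidekiq process, an
  # exception escaping perform is what leads to the 'fail:' log line and
  # the entry in the Retries tab.
  def perform(payload)
    JSON.parse(payload)
  end
end
```

Calling `MyWorker.new.perform('invalid_json')` directly raises the parse error without going through Redis, which helps separate problems in the job code from problems in the processes consuming the queue.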

Of course, in a running system there are many other things going on. What other debug options are there? Can I try the sidekiq_retries_exhausted hook, or is there a good chance it will not be called either?

Could this be a Redis timeout?

The cause was unrelated to Sidekiq: an old process was still picking up jobs from the queue and failing them.
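
For anyone who hits the same symptom, one way to spot a stray process might be to group the worker identities that Sidekiq records in Redis by host and PID, then compare them with the processes you expect to be running. The helper below is a hypothetical sketch, not part of Sidekiq. It assumes each identity string starts with `hostname:pid`, which should be checked against your Sidekiq version. In Sidekiq 2.x the identities could presumably be collected with `Sidekiq::Workers.new.map { |name, _work, _started_at| name }`, which only covers workers that are busy at that moment.

```ruby
# Hypothetical helper, not part of Sidekiq: groups worker identity strings
# by host and returns the sorted, unique PIDs seen on each host.
# ASSUMPTION: every identity starts with "hostname:pid", for example
# "app1:4242-70100:default". Strings that do not match are skipped.
def pids_by_host(identities)
  result = Hash.new { |hash, host| hash[host] = [] }
  identities.each do |identity|
    match = /\A(?<host>[^:]+):(?<pid>\d+)/.match(identity)
    next unless match

    result[match[:host]] << match[:pid].to_i
  end
  result.each_value do |pids|
    pids.uniq!
    pids.sort!
  end
  result
end

# Made-up sample identities, for illustration only.
sample = ['app1:4242-70100:default', 'app1:4242-70101:default',
          'old-box:311-70200:default']
pids_by_host(sample)
# returns {"app1"=>[4242], "old-box"=>[311]}
```

Any host or PID in the result that does not match a process started by the current deploy is a candidate for an old process that is still consuming jobs.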
