Possible severe impact on resque performance #186

paneq · 2016-04-18T12:43:18Z

I described the investigation process in detail in my blog post.

TL;DR:

Sometimes the 60-second `sleep` in the `ensure` block below is reached after the resque child process has finished processing a job. Ruby may wait for this thread to finish, so for up to a minute resque waits for the child process (and its Honeybadger thread) to exit. Until that happens, resque does not process any new jobs.

  def work
    flush_metrics if metrics.flush?
    flush_traces if traces.flush?
  rescue StandardError => e
    error {
      msg = "error in agent thread class=%s message=%s\n\t%s"
      sprintf(msg, e.class, e.message.dump, Array(e.backtrace).join("\n\t"))
    }
  ensure
    sleep(delay)
  end
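
The following sketch reproduces the effect in isolation. It assumes the agent thread is stopped with `Thread#kill` followed by `join` (as in the `worker.rb` line quoted later in this thread); `start_worker`, `seconds_to_stop`, and the `sleep(10)` standing in for a slow flush are hypothetical. If the kill lands while the thread is still in the loop body, the `ensure` clause runs and its full `delay` sleep completes before `join` returns. If the thread is already sleeping inside `ensure` when it is killed, that sleep is interrupted instead, which may explain why the stall only happens sometimes.

```ruby
# Hypothetical stand-in for the agent thread: a slow "flush" in the body
# and the delay sleep in `ensure`, like the `work` method quoted above.
def start_worker(delay, ready)
  Thread.new do
    loop do
      begin
        ready << true # signal that the thread is inside the body
        sleep(10)     # pretend flush_metrics/flush_traces is still running
      ensure
        sleep(delay)  # still runs in full after Thread#kill
      end
    end
  end
end

# Kill the worker mid-"flush" and measure how long `join` blocks.
def seconds_to_stop(delay)
  ready = Queue.new
  worker = start_worker(delay, ready)
  ready.pop # wait until the thread is inside the begin block
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  worker.kill
  worker.join
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

elapsed = seconds_to_stop(0.5)
puts format("join returned after %.2fs", elapsed)
```

Instead of returning immediately, `join` blocks for roughly `delay` (half a second here); with a 60-second delay the same pattern stalls for a full minute.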

Related Ruby bug: https://bugs.ruby-lang.org/issues/12298


joshuap · 2016-04-18T13:32:13Z

Thanks for reporting! This is a weird one; interested to hear what the Ruby team has to say. In any case, we'll take a closer look at this also.

joshuap · 2016-04-20T22:11:03Z

After studying this some more, I think the solution in our case is simply not to sleep in the ensure block. We can probably avoid this entirely with something like:

  def work
    flush_metrics if metrics.flush?
    flush_traces if traces.flush?
    sleep(delay)
  rescue StandardError => e
    error {
      msg = "error in agent thread class=%s message=%s\n\t%s"
      sprintf(msg, e.class, e.message.dump, Array(e.backtrace).join("\n\t"))
    }
    sleep(delay)
  end

(This assumes the rescue clause isn't invoked when the thread is killed.) I should have a fix this week.
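
The reordered loop, including its parenthetical assumption, can be checked with a small sketch. `start_fixed_worker`, `ready`, and `rescued` are hypothetical names used only for this demo. `Thread#kill` does not raise a `StandardError` inside the target thread, so the `rescue` branch is skipped and, with no `ensure`, the thread exits as soon as it is killed.

```ruby
# Sketch of the reordered loop: the delay sleep is no longer in `ensure`.
# `rescued` records whether the `rescue StandardError` branch ever runs.
def start_fixed_worker(delay, ready, rescued)
  Thread.new do
    loop do
      begin
        ready << true # stand-in for flush_metrics / flush_traces
        sleep(delay)
      rescue StandardError => e
        rescued << e
        sleep(delay)
      end
    end
  end
end

ready = Queue.new
rescued = []
worker = start_fixed_worker(2, ready, rescued)
ready.pop # wait until the thread is inside the loop body
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
worker.kill
worker.join
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
puts format("join returned after %.2fs, rescue ran %d time(s)", elapsed, rescued.size)
```

Here `join` returns almost immediately and `rescued` stays empty, even though `delay` is two seconds.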

@paneq Really great writeup, btw.

mbell697 · 2016-04-20T23:20:42Z

In general, using `Thread.kill` is not safe; the best solution is not to rely on it. This bit of code in particular is also extremely suspect:

honeybadger-ruby/lib/honeybadger/agent/worker.rb, line 86 in 84ba2d3:

  thread.join # Allow ensure blocks to execute.
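
One way to avoid relying on `Thread.kill` is a cooperative shutdown. This is sketched here as an assumption, not as what Honeybadger shipped. Between runs, the worker waits on a `ConditionVariable` with a timeout. A hypothetical `stop` method sets a flag, signals the worker, and joins. Because the thread is never killed, the signal can always cut the `delay` wait short.

```ruby
# Hypothetical cooperative worker: it never relies on Thread#kill. Between
# runs it waits on a ConditionVariable with a timeout, so #stop can wake
# it immediately and it exits on its own.
class StoppableWorker
  def initialize(delay)
    @delay = delay
    @mutex = Mutex.new
    @cond = ConditionVariable.new
    @stopping = false
    @runs = 0
    @thread = Thread.new { run }
  end

  # Number of completed work cycles; read it after #stop has returned.
  attr_reader :runs

  def stop
    @mutex.synchronize do
      @stopping = true
      @cond.signal # wake the worker if it is currently waiting
    end
    @thread.join # wait for the thread to finish on its own; no Thread#kill
  end

  private

  def run
    loop do
      work
      @mutex.synchronize do
        # Checking the flag under the mutex avoids missing a stop request
        # sent while #work was running.
        @cond.wait(@mutex, @delay) unless @stopping
        return if @stopping
      end
    end
  end

  def work
    @runs += 1 # stand-in for flush_metrics / flush_traces
  end
end

worker = StoppableWorker.new(60)
sleep(0.1)
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
worker.stop
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
puts format("stopped after %.2fs, %d work cycle(s)", elapsed, worker.runs)
```

`stop` returns almost immediately even though `delay` is 60 seconds. A spurious or timed-out wakeup simply triggers another work cycle.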


paneq mentioned this issue on Apr 18, 2016 in resque/resque#1451, "Honeybadger possible severe impact on resque performance" (closed).

joshuap added a commit that referenced this issue on Apr 21, 2016: "Don't sleep in ensure. Fixes #186." (0821ac6)

joshuap mentioned this issue on Apr 21, 2016 in #188, "Don't sleep in ensure." (merged)

joshuap closed this as completed in #188 Apr 22, 2016
