Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible severe impact on resque performance #186

Closed
paneq opened this issue Apr 18, 2016 · 3 comments · Fixed by #188
Closed

Possible severe impact on resque performance #186

paneq opened this issue Apr 18, 2016 · 3 comments · Fixed by #188

Comments

@paneq
Copy link

paneq commented Apr 18, 2016

I described the investigation process deeply in my blogpost

TLDR:

Sometimes this 60s sleep in ensure block is reached after resque child finished processing a job. Ruby might wait for this thread to finish and in that time for one minute resque is waiting for the child process with Honeybadger thread to die. Until this happens resque is not processing new jobs.

  def work
    flush_metrics if metrics.flush?
    flush_traces if traces.flush?
  rescue StandardError => e
    error {
      msg = "error in agent thread class=%s message=%s\n\t%s"
      sprintf(msg, e.class, e.message.dump, Array(e.backtrace).join("\n\t"))
    }
  ensure
    sleep(delay)
  end

Related Ruby bug: https://bugs.ruby-lang.org/issues/12298

@joshuap
Copy link
Member

joshuap commented Apr 18, 2016

Thanks for reporting! This is a weird one; interested to hear what the Ruby team has to say. In any case, we'll take a closer look at this also.

@joshuap
Copy link
Member

joshuap commented Apr 20, 2016

After studying this some more I'm thinking the solution in our case is to just not sleep during the ensure block. I think we can avoid this entirely with something like:

  def work
    flush_metrics if metrics.flush?
    flush_traces if traces.flush?
    sleep(delay)
  rescue StandardError => e
    error {
      msg = "error in agent thread class=%s message=%s\n\t%s"
      sprintf(msg, e.class, e.message.dump, Array(e.backtrace).join("\n\t"))
    }
    sleep(delay)
  end

(assuming the rescue isn't invoked when the thread is killed.) I should have a fix this week.

@paneq Really great writeup, btw.

@mbell697
Copy link

mbell697 commented Apr 20, 2016

In general using Thread.kill is not safe, the best solution is to not rely on it. In particular this bit of code is also extremely suspect:

thread.join # Allow ensure blocks to execute.
.

joshuap added a commit that referenced this issue Apr 21, 2016
ivanvc added a commit to ivanvc/honeybadger-ruby that referenced this issue May 18, 2016
* honeybadger/master: (31 commits)
  Freudian slip?
  Start TROUBLESHOOTING.md doc.
  Ignore development gems in appraisal test runs.
  Update json gem.
  Allow custom ca bundle path for secure connections
  Release 2.6.0
  Update appraisal gemfiles.
  Bump version to 2.6.0.
  Update CHANGELOG for honeybadger-io#188.
  Update CHANGELOG for honeybadger-io#187.
  Don't sleep in ensure. Fixes honeybadger-io#186.
  add method to get current context
  Always convert to string when sanitizing strings.
  Filter Authorization header by default. Fixes honeybadger-io#184
  Adds OpenSSL exception class to the set of Exceptions that are rescued by the honey badger backend.
  Remove extra output.
  Try upgrading RubyGems.
  Try bundler 1.10.6.
  Try bundler fix suggested by CircleCI.
  Fix typo.
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants