New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible severe impact on resque performance #186

Closed
paneq opened this Issue Apr 18, 2016 · 3 comments

Comments

Projects
None yet
3 participants
@paneq

paneq commented Apr 18, 2016

I described the investigation process deeply in my blogpost

TLDR:

Sometimes this 60s sleep in ensure block is reached after resque child finished processing a job. Ruby might wait for this thread to finish and in that time for one minute resque is waiting for the child process with Honeybadger thread to die. Until this happens resque is not processing new jobs.

  def work
    flush_metrics if metrics.flush?
    flush_traces if traces.flush?
  rescue StandardError => e
    error {
      msg = "error in agent thread class=%s message=%s\n\t%s"
      sprintf(msg, e.class, e.message.dump, Array(e.backtrace).join("\n\t"))
    }
  ensure
    sleep(delay)
  end

Related Ruby bug: https://bugs.ruby-lang.org/issues/12298

@joshuap

This comment has been minimized.

Show comment
Hide comment
@joshuap

joshuap Apr 18, 2016

Member

Thanks for reporting! This is a weird one; interested to hear what the Ruby team has to say. In any case, we'll take a closer look at this also.

Member

joshuap commented Apr 18, 2016

Thanks for reporting! This is a weird one; interested to hear what the Ruby team has to say. In any case, we'll take a closer look at this also.

@joshuap

This comment has been minimized.

Show comment
Hide comment
@joshuap

joshuap Apr 20, 2016

Member

After studying this some more I'm thinking the solution in our case is to just not sleep during the ensure block. I think we can avoid this entirely with something like:

  def work
    flush_metrics if metrics.flush?
    flush_traces if traces.flush?
    sleep(delay)
  rescue StandardError => e
    error {
      msg = "error in agent thread class=%s message=%s\n\t%s"
      sprintf(msg, e.class, e.message.dump, Array(e.backtrace).join("\n\t"))
    }
    sleep(delay)
  end

(assuming the rescue isn't invoked when the thread is killed.) I should have a fix this week.

@paneq Really great writeup, btw.

Member

joshuap commented Apr 20, 2016

After studying this some more I'm thinking the solution in our case is to just not sleep during the ensure block. I think we can avoid this entirely with something like:

  def work
    flush_metrics if metrics.flush?
    flush_traces if traces.flush?
    sleep(delay)
  rescue StandardError => e
    error {
      msg = "error in agent thread class=%s message=%s\n\t%s"
      sprintf(msg, e.class, e.message.dump, Array(e.backtrace).join("\n\t"))
    }
    sleep(delay)
  end

(assuming the rescue isn't invoked when the thread is killed.) I should have a fix this week.

@paneq Really great writeup, btw.

@mbell697

This comment has been minimized.

Show comment
Hide comment
@mbell697

mbell697 Apr 20, 2016

In general using Thread.kill is not safe, the best solution is to not rely on it. In particular this bit of code is also extremely suspect:

thread.join # Allow ensure blocks to execute.
.

mbell697 commented Apr 20, 2016

In general using Thread.kill is not safe, the best solution is to not rely on it. In particular this bit of code is also extremely suspect:

thread.join # Allow ensure blocks to execute.
.

joshuap added a commit that referenced this issue Apr 21, 2016

@joshuap joshuap closed this in #188 Apr 22, 2016

ivanvc added a commit to ivanvc/honeybadger-ruby that referenced this issue May 18, 2016

Merge remote-tracking branch 'honeybadger/master'
* honeybadger/master: (31 commits)
  Freudian slip?
  Start TROUBLESHOOTING.md doc.
  Ignore development gems in appraisal test runs.
  Update json gem.
  Allow custom ca bundle path for secure connections
  Release 2.6.0
  Update appraisal gemfiles.
  Bump version to 2.6.0.
  Update CHANGELOG for #188.
  Update CHANGELOG for #187.
  Don't sleep in ensure. Fixes #186.
  add method to get current context
  Always convert to string when sanitizing strings.
  Filter Authorization header by default. Fixes #184
  Adds OpenSSL exception class to the set of Exceptions that are rescued by the honey badger backend.
  Remove extra output.
  Try upgrading RubyGems.
  Try bundler 1.10.6.
  Try bundler fix suggested by CircleCI.
  Fix typo.
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment