Add thread reaping to thread pool #702

Merged: 1 commit merged into puma:master on Jun 10, 2015
Conversation

OleMchls (Contributor)

Hi there.

The Problem

As already discussed in #691 and #681, Puma does not recover from dead threads, even though this is a fairly uncommon state and might only happen in edge cases. One of these edge cases seems to be using rack-timeout, as you can see in zombocom/rack-timeout#76 and rails/rails#1627 (comment).

To reproduce this behavior, use the following example Rack app:

# dummy.ru
class SimpleRacktimeout # stand-in for a more complex Rails app using all kinds of non-Ruby-code resources
  def initialize(app)
    @app = app
  end

  def call(env)
    p '########### HELLO FROM APP ###########'
    begin
      app_thread     = Thread.current
      timeout_thread = Thread.start do
        app_thread.raise(Exception) # inject an exception into the request thread, as rack-timeout does on timeout
      end

      `sleep 1` # shell out to simulate slow, non-Ruby work being interrupted
    ensure
      timeout_thread.kill
      timeout_thread.join
    end
  end
end

app = Rack::Builder.new do |b|
  b.use SimpleRacktimeout
  b.run lambda { |_env| [200, {}, ['ok']] }
end.to_app

run app

Start it with puma -w 1 -b tcp://localhost -p 9292 -t 3:3 dummy.ru, then force Puma to serve more concurrent requests than it has (healthy) threads available by using Apache Bench: ab -n 8 -c 4 http://127.0.0.1:9292/.

Solution

This PR implements a thread reaping mechanism modeled on the existing auto trim functionality. It removes dead threads and decrements the @spawned counter in the ThreadPool so that the pool is allowed to spawn new threads in their place.
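
Roughly, the idea looks like the following sketch (assuming, as in Puma's ThreadPool, that live threads are tracked in @workers, the count in @spawned, and both are guarded by @mutex; this is an illustration, not the exact code in this PR):

class ThreadPool
  # Remove threads that died (e.g. from an exception injected by rack-timeout)
  # and decrement the counter so the pool may grow back to its configured size.
  def reap
    @mutex.synchronize do
      dead = @workers.reject(&:alive?)
      dead.each { @spawned -= 1 }
      @workers -= dead
    end
  end

  # Companion to AutoTrim: periodically calls reap on the pool.
  class Reaper
    def initialize(pool, timeout = 5)
      @pool    = pool
      @timeout = timeout
      @running = false
    end

    def start!
      @running = true
      @thread = Thread.new do
        while @running
          @pool.reap
          sleep @timeout
        end
      end
    end

    def stop
      @running = false
      @thread.wakeup
    end
  end
end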

Alternative Solution

Instead you can set th.abort_on_exception = true in https://github.com/puma/puma/blob/master/lib/puma/thread_pool.rb#L112, which also solves the problem described above. We decided on the former solution because removing dead threads from the pool seemed like the cleaner approach.
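
For illustration only (plain Ruby, independent of Puma): with abort_on_exception set on a thread, an unhandled exception in that thread is re-raised in the main thread instead of silently killing just that one thread.

worker = Thread.new do
  sleep 0.1
  raise 'boom' # stand-in for the exception injected by rack-timeout
end
worker.abort_on_exception = true

sleep 1 # RuntimeError 'boom' is re-raised here, in the main thread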

As mentioned in #691, we are not sure if and how Active Record connections held by dead threads will be cleaned up.

  • any ideas on how to release the Active Record connections of dead threads?

/cc @Partyschaum

@evanphx (Member) commented May 20, 2015

Seems like we should attack the initial problem, i.e. why threads in the pool are dying. I guess an exception is bubbling up to the top level?

@OleMchls (Contributor, Author)

I'm not completely sure what is killing the threads; my educated guess is that the exception is raised from another thread, bubbles up unhandled, and kills the worker thread.
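
That guess is easy to check in plain Ruby, independent of Puma (a small illustrative snippet, not Puma code):

worker = Thread.new do
  loop { sleep 0.1 } # stand-in for request handling
end

sleep 0.2 # let the worker start
worker.raise(Exception, 'injected from another thread')

sleep 0.2
p worker.alive? # => false; a pool that only counts spawned threads still treats this slot as occupied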

All the issues we've linked above, as well as our own initial problem, trace back to Heroku recently switching their recommended web server from Unicorn to Puma (ref). Because Puma does not have a request timeout of its own, rack-timeout is recommended alongside it. From what I've seen in our app, and in a bunch of issues across different Ruby repos (rails/puma/rack-timeout), this combination is not working very well, and I'm not sure exactly why. It seems the ActiveRecord connection pool has a lot of trouble with it, as does the Puma thread pool. From our investigations, all of the reported problems above are caused by rack-timeout not playing well with Puma.

Pulling in @schneems and @kch might be a good idea.

@OleMchls (Contributor, Author)

@evanphx we are wondering what exactly you mean by top level?

@kch (Contributor) commented May 21, 2015

rack-timeout is kind of terrible and currently doesn't play well with puma. I'm working on a bunch of changes to it which should remedy that.

But it seems to me threads could die for a number of valid reasons and puma should reap them when that happens.

@OleMchls (Contributor, Author)

@kch I agree that dead threads should be reaped; that is why we proposed this PR. On the other hand, could you maybe be a bit more transparent (in the rack-timeout repo) about what kind of changes you are working on and whether you might need help? Happy to help :) Because of the issue mentioned, we have switched back to unicorn for now.

@knightq commented May 28, 2015

+1 for merge and release

@sgranata82
+1

@SebastianEdwards
+1

@deepj (Contributor) commented Jun 3, 2015

+1

@knightq commented Jun 8, 2015

We'll switch back to unicorn too until this is resolved: it's a source of dangerous rises in request queueing during high request burst periods.

@schneems (Contributor) commented Jun 8, 2015

Or you could run with 1 thread and multiple Puma workers, which is essentially doing the same thing as Unicorn.
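
For anyone who wants to try that, a minimal sketch of such a configuration (file name, env vars, and numbers are just examples, not a recommendation from this thread):

# config/puma.rb: process-based concurrency only, one thread per worker,
# which mirrors Unicorn's model.
workers Integer(ENV.fetch('WEB_CONCURRENCY', 4))
threads 1, 1

preload_app!
port ENV.fetch('PORT', 3000)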

evanphx added a commit that referenced this pull request on Jun 10, 2015: Add thread reaping to thread pool
@evanphx merged commit cd82b02 into puma:master on Jun 10, 2015
@acrolink
I encountered the issue of request timeouts when I was testing my app with the httperf tool (conns 500, rate 50, timeout 10). Puma failed badly: out of 500 requests, 181 returned status 200 OK and 319 timed out.

Any new updates on this?

@emerleite
@acrolink same here. Sometimes I have only one thread working :(

@acrolink
@emerleite .. If you don't mind trying a new framework, I highly recommend Phoenix (built on Elixir). It is super fast compared to Ruby's Rails (5 or 6 times faster). As for Puma, I have kept using Unicorn in the meantime.

@emerleite
We're currently doing that, but the Rails app must be running for now :)

@evanphx (Member) commented Aug 3, 2016

@acrolink Can you tell me more about your setup?
@emerleite Can you tell me more about your setup?
