Live streaming threads not dying #10989

Open
egru opened this Issue · 40 comments

Eric Gruber
egru commented

Hello,

I'm running into a weird issue with live streams in Rails 4. I noticed that when I refresh a page that is streaming data, the thread created for it sometimes doesn't die. Here's an example controller I'm using:

class GetJobController < ApplicationController
  include ActionController::Live

  def get_job
    response.headers["Content-Type"] = "text/event-stream"
    redis = Redis.new
    redis.subscribe('job') do |on|
      on.message do |event, data|
        response.stream.write("event: #{event}\n")
        response.stream.write("data: #{data}\n\n")
      end
    end
  rescue IOError
    puts "Stream closed"
  ensure
    redis.quit
    response.stream.close
  end
end

Here is the CoffeeScript I am using:

source = new EventSource("/job")
source.addEventListener 'job', (e) ->
  message = e.data
  $("#job.id").text message

When I start up Puma, the number of threads starts at two. When I refresh the page, the number of threads is three. If I refresh the page too much, the thread limit for Puma gets reached and the server just freezes. I narrowed it down to the controller causing the problem. If I remove the redis.subscribe portion, I obviously don't get any data back and no threads are created. Is there something I am doing wrong? My first thought was something with thread safety, but I am not sure.

Thanks

Yves Senn
Collaborator
senny commented
Joe Ellis

I'm also seeing the same problem here. I expect the threads to die as I move off this controller, but they seem to hang around and I eventually run out.

When running puma config.ru instead of rails s with Puma, I've also come across this message when trying to force-shutdown the server, which may explain the issue:

Puma caught this error: Attempt to unlock a mutex which is locked by another thread
/Users/joe/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/rack-1.5.2/lib/rack/lock.rb:22:in `unlock'
/Users/joe/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/rack-1.5.2/lib/rack/lock.rb:22:in `ensure in call'
/Users/joe/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/rack-1.5.2/lib/rack/lock.rb:23:in `call'
/Users/joe/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/actionpack-4.0.0/lib/action_dispatch/middleware/static.rb:64:in `call'
/Users/joe/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/railties-4.0.0/lib/rails/engine.rb:511:in `call'
/Users/joe/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/railties-4.0.0/lib/rails/application.rb:97:in `call'
/Users/joe/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/railties-4.0.0/lib/rails/railtie/configurable.rb:30:in `method_missing'
/Users/joe/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/rack-1.5.2/lib/rack/commonlogger.rb:33:in `call'
/Users/joe/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/puma-2.1.0/lib/puma/configuration.rb:66:in `call'
/Users/joe/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/puma-2.1.0/lib/puma/server.rb:369:in `handle_request'
/Users/joe/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/puma-2.1.0/lib/puma/server.rb:246:in `process_client'
/Users/joe/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/puma-2.1.0/lib/puma/server.rb:145:in `block in run'
/Users/joe/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/puma-2.1.0/lib/puma/thread_pool.rb:92:in `call'
/Users/joe/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/puma-2.1.0/lib/puma/thread_pool.rb:92:in `block in spawn_thread'
Sebastian.Kim

Having the same issue...

Rack app error: #<ThreadError: Attempt to unlock a mutex which is locked by another thread>
/Users/seb/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/rack-1.5.2/lib/rack/lock.rb:22:in `unlock'
/Users/seb/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/rack-1.5.2/lib/rack/lock.rb:22:in `ensure in call'
/Users/seb/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/rack-1.5.2/lib/rack/lock.rb:23:in `call'
/Users/seb/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/actionpack-4.0.0/lib/action_dispatch/middleware/static.rb:64:in `call'
/Users/seb/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/railties-4.0.0/lib/rails/engine.rb:511:in `call'
/Users/seb/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/railties-4.0.0/lib/rails/application.rb:97:in `call'
/Users/seb/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/rack-1.5.2/lib/rack/content_length.rb:14:in `call'
/Users/seb/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/puma-2.3.2/lib/puma/server.rb:371:in `handle_request'
/Users/seb/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/puma-2.3.2/lib/puma/server.rb:248:in `process_client'
/Users/seb/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/puma-2.3.2/lib/puma/server.rb:147:in `block in run'
/Users/seb/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/puma-2.3.2/lib/puma/thread_pool.rb:92:in `call'
/Users/seb/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/puma-2.3.2/lib/puma/thread_pool.rb:92:in `block in spawn_thread'
2013-07-16 03:17:24 -0700: Rack app error: #<ThreadError: Attempt to unlock a mutex which is not locked>
/Users/seb/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/rack-1.5.2/lib/rack/lock.rb:22:in `unlock'
/Users/seb/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/rack-1.5.2/lib/rack/lock.rb:22:in `ensure in call'
/Users/seb/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/rack-1.5.2/lib/rack/lock.rb:23:in `call'
/Users/seb/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/actionpack-4.0.0/lib/action_dispatch/middleware/static.rb:64:in `call'
/Users/seb/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/railties-4.0.0/lib/rails/engine.rb:511:in `call'
Completed 401 Unauthorized in 11710ms
/Users/seb/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/railties-4.0.0/lib/rails/application.rb:97:in `call'
/Users/seb/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/rack-1.5.2/lib/rack/content_length.rb:14:in `call'
/Users/seb/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/puma-2.3.2/lib/puma/server.rb:371:in `handle_request'
/Users/seb/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/puma-2.3.2/lib/puma/server.rb:248:in `process_client'
/Users/seb/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/puma-2.3.2/lib/puma/server.rb:147:in `block in run'
/Users/seb/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/puma-2.3.2/lib/puma/thread_pool.rb:92:in `call'
/Users/seb/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/puma-2.3.2/lib/puma/thread_pool.rb:92:in `block in spawn_thread'
huetsch

I believe this is a problem with your controller code as opposed to a problem with the library. If you watch your network requests and look at the W3C spec for EventSource, you'll notice that an EventSource's behavior is to create a new connection upon receiving data. What happens is redis.subscribe blocks until it picks up a message, then it keeps on blocking. In the meantime, your EventSource in the browser has closed its connection (and the rails server has no way of knowing unless it attempts to write to the closed stream), and created a new request to the server. So now you have two threads, both blocking on redis.subscribe. Keep repeating that and you will eventually run out of threads.

If you call redis.unsubscribe('job') at the end of your message block you will not have this problem.

All that said, I do wish I knew of a clean way to create a global redis listener that just stayed constantly subscribed and broadcast to my IO streams upon message receipt. This subscribing and unsubscribing once per data write seems ugly.
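For what it's worth, here's a rough sketch of what I mean, assuming redis-rb and ActionController::Live. The JobListener module, its names, and the 'job' channel are made up for illustration, not an existing API:

module JobListener
  STREAMS = []
  MUTEX   = Mutex.new

  # Called once at boot: a single background thread stays subscribed
  # and fans each message out to every registered stream.
  def self.start
    Thread.new do
      Redis.new.subscribe('job') do |on|
        on.message do |_channel, data|
          MUTEX.synchronize do
            STREAMS.delete_if do |stream|
              begin
                stream.write("data: #{data}\n\n")
                false
              rescue IOError
                true # client went away; drop its stream
              end
            end
          end
        end
      end
    end
  end

  def self.register(stream)
    MUTEX.synchronize { STREAMS << stream }
  end
end

The catch is that each controller action would still have to keep its own thread alive (e.g. by sleeping in a loop) while its stream is registered, which is part of why this pattern never feels clean.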

huetsch

Sorry, I went and re-read the W3C spec. The information above regarding the behavior of EventSource is wrong. EventSource will NOT automatically close the connection and then re-open it, so you should NOT call redis.unsubscribe as I mentioned above. I was confused because of the network request behavior I was seeing in my browser. I suspect you are seeing the same behavior, because the end result is maxing out a bunch of blocked threads.

My behavior was caused by an incorrectly configured nginx reverse proxy sitting in front of my Rails app. If you're using nginx too, the following configuration solved the problem for me:

  proxy_set_header X-Accel-Buffering no; 
  proxy_set_header Connection ''; 
  proxy_http_version 1.1;
  chunked_transfer_encoding off;
Eric Gruber
egru commented

I'm using puma as my server.

jtomasrl

Same problem here, and when I Ctrl+C my Puma server I get:

Started GET "/orders/51e88b433e16c03bbc000012" for 127.0.0.1 at 2013-07-20 22:52:43 -0400
Processing by StreamerController#orders as text/event-stream
  Parameters: {"store_id"=>"51e88b433e16c03bbc000012"}
^CExiting
Completed 401 Unauthorized in 242743ms
2013-07-20 22:56:46 -0400: Rack app error: #<ThreadError: Attempt to unlock a mutex which is locked by another thread>
/Users/jtomasrl/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/rack-1.5.2/lib/rack/lock.rb:22:in `unlock'
/Users/jtomasrl/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/rack-1.5.2/lib/rack/lock.rb:22:in `ensure in call'
/Users/jtomasrl/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/rack-1.5.2/lib/rack/lock.rb:23:in `call'
/Users/jtomasrl/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/actionpack-4.0.0/lib/action_dispatch/middleware/static.rb:64:in `call'
/Users/jtomasrl/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/railties-4.0.0/lib/rails/engine.rb:511:in `call'
/Users/jtomasrl/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/railties-4.0.0/lib/rails/application.rb:97:in `call'
/Users/jtomasrl/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/rack-1.5.2/lib/rack/content_length.rb:14:in `call'
/Users/jtomasrl/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/puma-2.3.2/lib/puma/server.rb:371:in `handle_request'
/Users/jtomasrl/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/puma-2.3.2/lib/puma/server.rb:248:in `process_client'
/Users/jtomasrl/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/puma-2.3.2/lib/puma/server.rb:147:in `block in run'
/Users/jtomasrl/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/puma-2.3.2/lib/puma/thread_pool.rb:92:in `call'
/Users/jtomasrl/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/puma-2.3.2/lib/puma/thread_pool.rb:92:in `block in spawn_thread'

Joe Ellis

Also wanted to add that I tried switching to Unicorn and have been running into the same issues (although no locked-mutex error).

huetsch

As far as I know, there is no good way for the Rails server to know that your client has closed a connection short of attempting to write to response.stream. I periodically send a heartbeat through Redis to trigger the IOError and clean up old threads; otherwise you end up with a bunch of blocked threads. Try refreshing a page until the server (I use Puma) hangs, then publish to your Redis channel. If you add some logging and watch your logs while you do this, you should be able to see where and why your threads are hanging, and the purpose of the "flushing heartbeat".

If there is a better approach than the heartbeat one, i.e. a way for the server to better know when a client has disconnected, I would love to hear about it.
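For reference, the "flushing heartbeat" can be as small as a separate process (or cron job) publishing to the same channel the controllers are subscribed to. A minimal sketch under that assumption — the 'job' channel name and 15-second interval are arbitrary:

require 'redis'

redis = Redis.new
loop do
  # Every subscriber will try to write this to its stream; writing to a
  # closed stream raises IOError, which frees the dead thread.
  redis.publish('job', 'heartbeat')
  sleep 15
end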

Maximilian Busch

+1

Nicholas Jakobsen

I worked around what may be the same issue in this gist: https://gist.github.com/njakobsen/6257887. However, the solution is not exactly the Mona Lisa, so take it with a grain of salt; see the sleep hack comment.

Adam Fortuna

+1

Used something like this in the meantime: http://stackoverflow.com/questions/14268690/actioncontrollerlive-is-it-possible-to-check-if-connection-is-still-alive. Not the most beautiful solution, but it worked for killing the connection off.

teeg82

I've posted a question on StackOverflow to try to find other solutions regarding this problem: http://stackoverflow.com/q/18970458/877472. The interim solution I've come up with (which you can see in the question) is an implementation of the "flushing heartbeat" idea suggested by @huetsch. It works, but I find it difficult to believe there isn't a better way. I'll soon refactor it so that it behaves more like a single long-running connection-thread garbage collector (as opposed to creating a new flusher thread per connection), but other than that I cannot presently come up with a better solution either. Any suggestions are welcome.

huetsch

Thanks for summarizing all of this and putting it on StackOverflow @teeg82. My final code is a lot like yours, except I use a cron script to do my "flushing heartbeat" and, as you mentioned over there, I used psubscribe.

I'm still hoping for a better solution. SSE ultimately runs on TCP, so it seems like there should be some FIN packet sent from the client when the browser is done with its connection, but perhaps browser implementations just trash the connections without notifying the remote server that they're closing the socket. That would leave the connection half-open and mean we would need a solution like our current one. Maybe browsers don't actually do this and something isn't being caught in Rails that should. Or maybe browsers have a very good reason to not send a FIN packet to the server. I don't have enough knowledge of the situation to hypothesize further.

What I do know is that without this kind of fix, Live Streaming in Rails feels pretty crippled. If it really is a client-side issue, I'd hope that we could get a solution like above (or better) bundled into Rails. Making everyone roll their own seems... redundant. If it's just a Rails issue where something isn't being caught, that would probably be easier to track down and solve. @tenderlove wrote quite a bit about this feature on his blog (maybe he even authored most of the streaming code?) - perhaps he has something to say about the matter.

teeg82

Using Wireshark, I can indeed see FIN packets being sent. Admittedly, I am not very knowledgeable or experienced with protocol-level stuff; however, from what I can tell, I definitely detect a FIN packet being sent from the browser when I establish the connection using the EventSource library, and NO packet sent if I suppress that line of code (meaning I disallow that connection). Though I'm not terribly up on my TCP knowledge, this seems to indicate that the connection is at least being properly terminated by the client. So like you said, @huetsch, it might indeed indicate a bug in Puma or Rails (at least to me). I'm going to try another server (Unicorn) and see if that makes any difference, though I'm not hopeful, since @joeellis earlier indicated it made no difference.

huetsch

I believe I recall reading that Goliath had a solution to this problem - which would seem to confirm the theory that this is a problem in the Rails stack. I glanced at Goliath because of that, but I have a large codebase and porting everything to Goliath for a potential solution to this specific problem was definitely out of the picture. Still, it's a direction you may want to consider.

Alexey Medvedev

+1

Sergey Alexandrovich

+1

boutcheratwest

Same problem here. I have ruled out networking problems, as I am proxying everything through Apache into Puma; I've killed Apache and the threads in Puma don't die. I've also published messages to Redis that would normally have been consumed by the clients, but the ActionController::Live methods aren't detecting that the client isn't there.

boutcheratwest

Is this problem specific to that blocking redis.subscribe call? If you look at the Redis documentation page for SUBSCRIBE, you'll see the top comment is someone asking for a timeout on the subscribe so it doesn't block forever: http://redis.io/commands/subscribe#comment-720420850

teeg82

@boutcheratwest Yeah, that post is very much related to the issue. As that person described, there's no way for your Rails code to execute the (p)unsubscribe method; the end result is a perpetually running thread. Ultimately, that's why the connection-poker concept works: we deliberately cause an error to be thrown, which allows the connection and the thread to shut down properly. Which is goofy and, as that person implies, inherently broken. A timeout would work too, I think, because your client code can simply re-establish the connection when it dies; I'm not sure which method is better. Ideally, for my and huetsch's implementation of the solution, I'd like to find a way to determine whether the connection is closed on the other end without writing to the stream. As it is right now, I have to implement client-side code to handle my "poking" message separately, which is obtrusive and goofy as heck.

boutcheratwest

Taking a lot from @huetsch and @teeg82, I ended up with a solution that I tossed on SO: http://stackoverflow.com/a/19485363/928963. One single thread publishes a heartbeat to a dedicated Redis channel; the controller then subscribes to the heartbeat channel as well as the one it is interested in, and ensures any heartbeat message causes a heartbeat event to be sent down the pipe to the client. Upon disconnection, the thread is destroyed 30-60 seconds later. A rough sketch of the controller side is below. (Thanks guys for the help.)
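Something like this — a hedged sketch of the idea, not the exact code from the SO answer; the channel names and the SSE comment line are assumptions:

def events
  response.headers['Content-Type'] = 'text/event-stream'
  redis = Redis.new
  # One subscription covers both the data channel and the shared
  # heartbeat channel that a single global thread publishes to.
  redis.subscribe('job', 'heartbeat') do |on|
    on.message do |channel, data|
      if channel == 'heartbeat'
        response.stream.write(": hb\n\n") # SSE comment; clients ignore it
      else
        response.stream.write("data: #{data}\n\n")
      end
    end
  end
rescue IOError
  # The heartbeat write failed: the client is gone, so this thread exits.
ensure
  redis.quit
  response.stream.close
end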

huetsch

I still believe this is a poor solution and that the answer lies further down the Rails stack.

It looks like it's definitely possible for the server to pick up client disconnects automatically. Check out Goliath:

https://github.com/postrank-labs/goliath/wiki/Streaming

They're breaking the incoming connection into headers, body, and close and giving you a chance to react with every event. "on_close" looks like it would give us exactly the hook we need for an elegant solution.

I wonder how hard it would be to re-architect Rails to do something similar. Maybe Rails was engineered in such a way that this isn't a quick fix. It would be great if someone more knowledgeable could chime in.

John Ledbetter

A heartbeat solution is not ideal because 1) you are pushing more data than you need to every client, and 2) in the worst case your client disconnects immediately after a heartbeat and your worker sits idle for the full heartbeat interval, not doing anything useful.

With that said I may have to go with a heartbeat based approach until someone has a better solution :)

teeg82

Yup, @ledbettj and @huetsch are absolutely right; it is not ideal, for exactly the reasons ledbettj pointed out. With this solution you basically have a trade-off between the two points he mentioned: either you decrease the time between heartbeats to minimize how long threads/connections are left open, resulting in MORE data being pushed through the open stream, or you increase that time to minimize the data being sent, leaving threads/connections potentially open longer.

Sadly, it is the only one we have at the moment, other than perhaps periodically closing the connection (which would require the client code to periodically re-establish it). I suppose you could also implement an "activity" timer that shuts down the connection when there has been no activity for a period of time. The obvious downsides are that the client would still need to re-establish the connection, and, depending on how long you consider "inactivity" to be, you'd still have dangling connections/threads before a connection is considered "inactive".

Interesting find, huetsch. If Goliath can do it, Rails should be able to do it too (I hope). I'll try to look into the Goliath stack to see how they implemented their solution, and see what we can find (time permitting...).

Johnny Robeson

related for puma users: puma/puma#389

Quoting Evan:

I'm going to add a timed terminate and some code to request exit and then actually wait for it to exit to 2.7.0. That should fix it.
panlilu

+1

Nathan Palmer nathanpalmer referenced this issue from a commit
Commit has since been removed from the repository and is no longer available.
Henry Hwangbo

+1

Pierre Pretorius

This issue can be reproduced without Redis, right? Any controller stream that never writes will be kept open indefinitely; see the sketch below. I've also seen some other strange behaviour, like adding a sleep(0) giving different behaviour.
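If that's right, a minimal Redis-free reproduction might look like this (hypothetical controller; the point is only that the action blocks without ever writing to the stream):

class ReproController < ApplicationController
  include ActionController::Live

  def hang
    response.headers['Content-Type'] = 'text/event-stream'
    sleep # block forever without writing; the server never sees the disconnect
  ensure
    response.stream.close
  end
end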

Nathan Palmer nathanpalmer referenced this issue from a commit
Commit has since been removed from the repository and is no longer available.
José Francisco Calvo

+1. The same here. @tenderlove, any idea about the status of this?

Aaron Patterson
Owner

Can someone give me a sample app to work with?

José Francisco Calvo

@tenderlove here you have one that I have just created: https://github.com/jfcalvo/rails-live-streaming-closing to help you replicate the problem.

It is really simple; you can send some SSE messages to the app with redis-cli, like:

publish events-channel "hello"

I have created it with Rails 4.1.0. I think it also has some problems reloading with Puma threads and Spring (it's my first time using Spring, so I'm not sure if that's the problem).

Nathan Palmer nathanpalmer referenced this issue from a commit
Commit has since been removed from the repository and is no longer available.
johnklawlor

If I wanted to submit a patch to Rails to fix this issue, does anyone have suggestions as to where I would begin? Does ActionDispatch seem like a reasonable place?

Eric Boehs ericboehs referenced this issue in gottfrois/dashing-rails
Closed

Redis / PostGreSQL Timeout in Heroku #12

Kyle d'Oliveira

+1

Rafael Mendonça França rafaelfranca locked and limited conversation to collaborators