Problem with Passenger Enterprise 4.0.14 #10

Closed
fokcep opened this Issue Sep 24, 2013 · 61 comments

Projects

None yet

9 participants

@fokcep
fokcep commented Sep 24, 2013

Just wondering if someone had issues running tubesock with the Passenger Enterprise Server. Could be nothing to do with tubesock at all. It supposed to run ok. But getting a long delay when calling the following bit of code from tubesock:

def self.hijack(env)
  .......    
  handshake.from_rack env
  .......
end

it takes around 50 seconds to execute this method and eventually it fails to get WS connection.

Dump of env['rack.hijack']

Proc:0x00000004505060@/opt/passenger/passenger-enterprise-server-4.0.14/lib/phusion_passenger/rack/thread_handler_extension.rb:53 (lambda)

any ideas?

Thanks

@ngauthier
Owner

I haven't tried it on anything but Puma and Rails. Seems like they have supported it for a while:

http://blog.phusion.nl/2013/01/23/the-new-rack-socket-hijacking-api/

Can you run through that tutorial and make sure you have proper rack.hijack support going? After that, you're braving new territory! Let me know how it goes.

@fokcep
fokcep commented Sep 24, 2013

hm, it's interesting
I went through this tutorial which I've seen before and replaced the problem line of code in your hijack method (https://github.com/ngauthier/tubesock/blob/master/lib/tubesock.rb#L25) with the piece of code from the tutorial.

i.e. now instead of

def self.hijack(env)
  ...
  handshake = WebSocket::Handshake::Server.new
  handshake.from_rack env
  ...
end

I have

def self.rack_env_key_to_http_header_name(key)
    name = key.downcase.gsub('_', '-')
    name[0] = name[0].upcase
    name.gsub!(/-(.)/) do |chr|
      chr.upcase
    end
    name
end

def self.hijack(env)
  ...
  handshake = WebSocket::Handshake::Server.new
  data = "#{env['REQUEST_METHOD']} #{env['REQUEST_URI']} #{env['SERVER_PROTOCOL']}\r\n"
        env.each_pair do |key, val|
          if key =~ /^HTTP_(.*)/
            name = rack_env_key_to_http_header_name($1)
            data << "#{name}: #{val}\r\n"
          end
        end
   data << "\r\n"
   handshake << data
  ...
end

and now Puma seems to work fine and Passenger works too. I will do more tests tomorrow and let you know. But can you explain why 'handshake.from_rack env' might not work for Passenger but it works for Puma?

Cheers

@ngauthier
Owner

I added that specifically to websocket-ruby for tubesock's sake:

imanel/websocket-ruby#15

I think the best thing to do would be to take the env and make a test for the headers and post an issue at websocket-ruby. Then all projects would benefit from it.

Nick

@fokcep
fokcep commented Sep 26, 2013

Thanks for that!

I've done few tests with Passenger and looks like it's opening connections just fine but then no requests coming through from the clients or from the server. I.e. pings are not coming through.
So need to find out what it is. It's definitely not my load balancer. Something else

@ngauthier
Owner

Actually, it could be anything in front of your web server. A year ago when I was playing w/ websockets we couldn't use nginx, we had to use haproxy in a special mode to keep the websocket connections alive. Load balancers usually don't keep tcp connections like websockets open like that.

I believe nginx has websocket support now, but you may have to compile from scratch.

Try without the load balancer (directly connect) just to troubleshoot. If that still doesn't work at least we really can rule out the balancer.

What is your balancer by the way?

@fokcep
fokcep commented Sep 26, 2013

No, it's definitely not load balancer. I've tried to connect directly from the same box. So it could be nginx or something else.

I've noticed the following code doesn't return anything, it's stuck there on IO.select, which is strange

def each_frame
    ....
      while IO.select([@socket])
    ...
end

i.e. https://github.com/ngauthier/tubesock/blob/master/lib/tubesock.rb#L92

Will do some more digging

@ngauthier
Owner

Hmm. Just to check, are you on Ruby 2?

@fokcep
fokcep commented Sep 26, 2013

yep, ofc. I always check that everything works with Puma after I've done all my changes.

@fokcep
fokcep commented Sep 26, 2013

Obviously we've built nginx 1.4.2 and passenger enterprise 4.0 and they both have web socket support.

@fokcep
fokcep commented Sep 30, 2013

By the way, it seems that 'websocket-rails' library (which in turn uses Faye WebSocket which seems to use rack hijack but based on event machine) works on Passenger and nginx, and this chat application - https://github.com/themgt/ws42-chat works fine. I will try to find out whether tubesock can be improved based on that info.

@ngauthier
Owner

OK cool. Yeah I know the project, but at the time it didn't have
fine-grained enough control for me, hence tubesock. Also yeah, the EM
difference.

On Mon, Sep 30, 2013 at 12:34 PM, fokcep notifications@github.com wrote:

By the way, it seems that 'websocket-rails' library (which in turn uses
Faye WebSocket which seems to use rack hijack but based on event machine)
works on Passenger and nginx, and this chat application -
https://github.com/themgt/ws42-chat works fine. I will try to find out
whether tubesock can be improved based on that info.


Reply to this email directly or view it on GitHubhttps://github.com/ngauthier/tubesock/issues/10#issuecomment-25379416
.

@fokcep
fokcep commented Oct 1, 2013

Looks like EM should run in the different thread, and if it runs in the same thread it locks up.
So websocket-rails doesn't work with Passenger after all. So still no luck.
Any idea why IO.select([@socket]) could lock up when using tubesock?
I am gonna ask Passenger support and see if they can make it work.

@FooBarWidget

Phusion here. We got a support request from an Enterprise customer a while ago. We got Tubesock working now, and this is what we've found.

There are Rack specification violations in both Tubesock and the Websocket gem. Let me first address the freezing problem that you guys have already encountered. The freeze is caused by the fact that WebSocket::Handshake::Server#handshake calls the following code snippet:

@leftovers = env['rack.input'].read

Obviously, the intention is to slurp all remaining data on the socket.

This code path is not supposed to work because env['rack.input'].read is supposed to be blocking, and at this point the client is waiting for the server to send a reply, so the code would never have done what it intended to. And this is actually the reason why the code freezes in #handshake when run on Phusion Passenger.

This code path happens to work on Puma because Puma replaces env['rack.input'] with a NullIO object right after performing hijacking. The NullIO object turns the env['rack.input'].read call into a no-op. So things work on Puma by accident, while on Passenger the code gets stuck on the read, forever trying to read data.

There is also a Rack specification violation in Tubesock itself. From tubesock.rb line 93:

data, addrinfo = @socket.recvfrom(2000)

According to the Rack specification, the hijacked socket object is not guaranteed to implement #recvfrom. An indeed, Phusion Passenger does not, while Puma does. Both Phusion Passenger and Puma happen to implement #readpartial so you can use that as a more portable (albeit still non-standard) substitute. Starting from version 4.0.20, Phusion Passenger will implement #recvfrom as well.

Finally, we confirm that Tubesock works fine even with the open source version of Phusion Passenger. Enterprise is not necessary.

@fokcep
fokcep commented Oct 1, 2013

Interesting, thanks for that.

I don't seem to have a problem calling from_rack after fixing it in a different way, see above. I.e. opening connection now works but sending any subsequent WS message failed i.e. locks up in the LIne 92. Could it be something else in Passenger? A config option or something?

@FooBarWidget

What do you mean by "locks up"? Isn't it normal that the select call there blocks until the client sends something over the websocket?

Replacing the @leftovers = ... line in the websocket gem with @leftovers = "", and replacing the recvfrom call with readpartial, makes the sock-chat demo work for us.

@fokcep
fokcep commented Oct 1, 2013

The problem is that it blocks forever no matter if the client sends something or not. Have you tested with the open source Passenger as well as with the Enterprise version?

@FooBarWidget

Yes, it works fine with the open source version. Which demo app are you
using?

Sent from my Android phone.
Op 1 okt. 2013 20:42 schreef "fokcep" notifications@github.com het
volgende:

The problem is that it blocks forever no matter if the client sends
something or not. Have you tested with the open source Passenger as well as
with the Enterprise version?


Reply to this email directly or view it on GitHubhttps://github.com/ngauthier/tubesock/issues/10#issuecomment-25476462
.

@ngauthier
Owner

Hi @FooBarWidget, thanks for helping out!

Sounds like we should file an issue with websocket-ruby to fix up the read calls. Then in tubesock we can either switch to readpartial, or wait for recvfrom to be implemented there too.

@FooBarWidget, do you happen to know what the API of ENV['rack.input'] is guaranteed to be?

@FooBarWidget

It's documented in the Rack specification. See section "The Input Stream". Basically, only gets, each, read and rewind are guaranteed. Everything else is vendor-specific.

@ngauthier
Owner

OK, looks like we need to revisit the implementation of websocket-ruby and
also tubesock to ensure compliance with the rack spec. Hopefully we can be
compliant and also achieve the same functionality.

Thanks for your help so far!

Nick

On Tue, Oct 1, 2013 at 7:01 PM, Hongli Lai notifications@github.com wrote:

It's documented in the Rack specificationhttp://rack.rubyforge.org/doc/SPEC.html.
See section "The Input Stream". Basically, only gets, each, read and rewind
are guaranteed. Everything else is vendor-specific.


Reply to this email directly or view it on GitHubhttps://github.com/ngauthier/tubesock/issues/10#issuecomment-25500056
.

@fokcep
fokcep commented Oct 18, 2013

Hi again Nick,

Just a quick question if you know the answer before I post a new issue on
the git project's board.
I've been testing your chat app again. What I was trying to do is to use db
connection to save all the messages to the database i.e.

    Redis.new.subscribe "chat" do |on|
      on.message do |channel, message|
        Message.create!(:text => message) # <- my new line
        tubesock.send_data message
      end
    end

but it seems that it's probably a bad idea i.e. looks like every web socket
holds a single db connection. So if for instance I start 6 chat clients and
only have 5 connections in the pool the 6th client won't connect.
What would you recommend to do? Or is it a known limitation i.e. I should
use a message queue to get messages and then process them from the rails
app?

Thanks

Pavel

@ngauthier
Owner

I just turn the db pool way up. I like your queuing idea, that's way
smarter!

I wonder what a reasonable amount of maximum connections to a postgresql db
is...

Nick
On Oct 18, 2013 6:41 AM, "fokcep" notifications@github.com wrote:

Hi again Nick,

Just a quick question if you know the answer before I post a new issue on
the git project's board.
I've been testing your chat app again. What I was trying to do is to use
db
connection to save all the messages to the database i.e.

Redis.new.subscribe "chat" do |on|
on.message do |channel, message|
Message.create!(:text => message) # <- my new line
tubesock.send_data message
end
end

but it seems that it's probably a bad idea i.e. looks like every web
socket
holds a single db connection. So if for instance I start 6 chat clients
and
only have 5 connections in the pool the 6th client won't connect.
What would you recommend to do? Or is it a known limitation i.e. I should
use a message queue to get messages and then process them from the rails
app?

Thanks

Pavel

On Wed, Oct 2, 2013 at 1:52 PM, Nick Gauthier notifications@github.comwrote:

OK, looks like we need to revisit the implementation of websocket-ruby
and
also tubesock to ensure compliance with the rack spec. Hopefully we can
be
compliant and also achieve the same functionality.

Thanks for your help so far!

Nick

On Tue, Oct 1, 2013 at 7:01 PM, Hongli Lai notifications@github.com
wrote:

It's documented in the Rack specification<
http://rack.rubyforge.org/doc/SPEC.html>.
See section "The Input Stream". Basically, only gets, each, read and
rewind
are guaranteed. Everything else is vendor-specific.


Reply to this email directly or view it on GitHub<
https://github.com/ngauthier/tubesock/issues/10#issuecomment-25500056>
.


Reply to this email directly or view it on GitHub<
https://github.com/ngauthier/tubesock/issues/10#issuecomment-25535575>
.


Reply to this email directly or view it on GitHubhttps://github.com/ngauthier/tubesock/issues/10#issuecomment-26586364
.

@fokcep
fokcep commented Oct 18, 2013

You are right, but increasing db pool is not really scalable. So do you
know if there is a way to reuse db connection at all in the current
architecture? Or is it limitation of rails/rack app? I don't see any
particular reason why web socket should hold db connection. Could it be
just a bug somewhere (rails 4?)

On Fri, Oct 18, 2013 at 12:08 PM, Nick Gauthier notifications@github.comwrote:

I just turn the db pool way up. I like your queuing idea, that's way
smarter!

I wonder what a reasonable amount of maximum connections to a postgresql
db
is...

Nick
On Oct 18, 2013 6:41 AM, "fokcep" notifications@github.com wrote:

Hi again Nick,

Just a quick question if you know the answer before I post a new issue
on
the git project's board.
I've been testing your chat app again. What I was trying to do is to use
db
connection to save all the messages to the database i.e.

Redis.new.subscribe "chat" do |on|
on.message do |channel, message|
Message.create!(:text => message) # <- my new line
tubesock.send_data message
end
end

but it seems that it's probably a bad idea i.e. looks like every web
socket
holds a single db connection. So if for instance I start 6 chat clients
and
only have 5 connections in the pool the 6th client won't connect.
What would you recommend to do? Or is it a known limitation i.e. I
should
use a message queue to get messages and then process them from the rails
app?

Thanks

Pavel

On Wed, Oct 2, 2013 at 1:52 PM, Nick Gauthier notifications@github.comwrote:

OK, looks like we need to revisit the implementation of websocket-ruby
and
also tubesock to ensure compliance with the rack spec. Hopefully we
can
be
compliant and also achieve the same functionality.

Thanks for your help so far!

Nick

On Tue, Oct 1, 2013 at 7:01 PM, Hongli Lai notifications@github.com
wrote:

It's documented in the Rack specification<
http://rack.rubyforge.org/doc/SPEC.html>.
See section "The Input Stream". Basically, only gets, each, read and
rewind
are guaranteed. Everything else is vendor-specific.


Reply to this email directly or view it on GitHub<
https://github.com/ngauthier/tubesock/issues/10#issuecomment-25500056>

.


Reply to this email directly or view it on GitHub<
https://github.com/ngauthier/tubesock/issues/10#issuecomment-25535575>
.


Reply to this email directly or view it on GitHub<
https://github.com/ngauthier/tubesock/issues/10#issuecomment-26586364>
.


Reply to this email directly or view it on GitHubhttps://github.com/ngauthier/tubesock/issues/10#issuecomment-26587626
.

@ngauthier
Owner

Check out the rails file in the source. I explicitly grab a connection for
the whole request, and release afterwards.

It's much easier, but less scalable.

I think it would make more sense to not hold a connection, then let the
user exec a block with a connection.

Maybe we wrap the on message, on open, on close with a connection, so you
just grab a connection when the code actually runs. That would be awesome.

Nick
On Oct 18, 2013 7:12 AM, "fokcep" notifications@github.com wrote:

You are right, but increasing db pool is not really scalable. So do you
know if there is a way to reuse db connection at all in the current
architecture? Or is it limitation of rails/rack app? I don't see any
particular reason why web socket should hold db connection. Could it be
just a bug somewhere (rails 4?)

On Fri, Oct 18, 2013 at 12:08 PM, Nick Gauthier notifications@github.comwrote:

I just turn the db pool way up. I like your queuing idea, that's way
smarter!

I wonder what a reasonable amount of maximum connections to a postgresql
db
is...

Nick
On Oct 18, 2013 6:41 AM, "fokcep" notifications@github.com wrote:

Hi again Nick,

Just a quick question if you know the answer before I post a new issue
on
the git project's board.
I've been testing your chat app again. What I was trying to do is to
use
db
connection to save all the messages to the database i.e.

Redis.new.subscribe "chat" do |on|
on.message do |channel, message|
Message.create!(:text => message) # <- my new line
tubesock.send_data message
end
end

but it seems that it's probably a bad idea i.e. looks like every web
socket
holds a single db connection. So if for instance I start 6 chat
clients
and
only have 5 connections in the pool the 6th client won't connect.
What would you recommend to do? Or is it a known limitation i.e. I
should
use a message queue to get messages and then process them from the
rails
app?

Thanks

Pavel

On Wed, Oct 2, 2013 at 1:52 PM, Nick Gauthier <
notifications@github.com>wrote:

OK, looks like we need to revisit the implementation of
websocket-ruby
and
also tubesock to ensure compliance with the rack spec. Hopefully we
can
be
compliant and also achieve the same functionality.

Thanks for your help so far!

Nick

On Tue, Oct 1, 2013 at 7:01 PM, Hongli Lai notifications@github.com

wrote:

It's documented in the Rack specification<
http://rack.rubyforge.org/doc/SPEC.html>.
See section "The Input Stream". Basically, only gets, each, read
and
rewind
are guaranteed. Everything else is vendor-specific.


Reply to this email directly or view it on GitHub<

https://github.com/ngauthier/tubesock/issues/10#issuecomment-25500056>

.


Reply to this email directly or view it on GitHub<
https://github.com/ngauthier/tubesock/issues/10#issuecomment-25535575>

.


Reply to this email directly or view it on GitHub<
https://github.com/ngauthier/tubesock/issues/10#issuecomment-26586364>
.


Reply to this email directly or view it on GitHub<
https://github.com/ngauthier/tubesock/issues/10#issuecomment-26587626>
.


Reply to this email directly or view it on GitHubhttps://github.com/ngauthier/tubesock/issues/10#issuecomment-26587826
.

@FooBarWidget

When you use ActiveRecord in a new thread, ActiveRecord will checkout a connection from its connection pool if the current thread doesn't have one already. It is then up to you to release that connection with ActiveRecord::Base.clear_active_connections. If you don't then the pool will be stay exhausted.

@ngauthier
Owner

Yup. So we would release it immediately after threading, then during the
callback methods we could checkout a connection for the duration of the
method.

On Fri, Oct 18, 2013 at 8:00 AM, Hongli Lai notifications@github.comwrote:

When you use ActiveRecord in a new thread, ActiveRecord will checkout a
connection from its connection pool if the current thread doesn't have one
already. It is then up to you to release that connection with
ActiveRecord::Base.clear_active_connections. If you don't then the pool
will be stay exhausted.


Reply to this email directly or view it on GitHubhttps://github.com/ngauthier/tubesock/issues/10#issuecomment-26590199
.

@fokcep
fokcep commented Oct 18, 2013

@Nick - yes, I can see what you mean, it's in the code.
Would be great not to hold a connection, and pass it with a block

@fokcep fokcep closed this Oct 18, 2013
@fokcep fokcep reopened this Oct 18, 2013
@renchap renchap referenced this issue in imanel/websocket-ruby Oct 23, 2013
Closed

env['rack.input'].read is blocking in rack spec #19

@renchap
renchap commented Oct 23, 2013

I can confirm that the sock-chat and my own app works with Passenger 4.0.20 and by changing @leftovers = "" in websocket gem.

I reported the issue as imanel/websocket-ruby#19 and I hope for a quick fix.

@ngauthier
Owner

Awesome thanks.

On Wed, Oct 23, 2013 at 7:10 AM, Renaud Chaput notifications@github.comwrote:

I can confirm that the sock-chat and my own app works with Passenger
4.0.20 and by changing @leftovers https://github.com/leftovers = "" in
websocket gem.

I reported the issue as imanel/websocket-ruby#19https://github.com/imanel/websocket-ruby/issues/19and I hope for a quick fix.


Reply to this email directly or view it on GitHubhttps://github.com/ngauthier/tubesock/issues/10#issuecomment-26896547
.

@imanel
imanel commented Nov 13, 2013

Does changing read to readpartial in websocket-ruby solves this problem? I couldn't reproduce your error and it was tested on thin, unicorn and puma.

@FooBarWidget

Passenger supports readpartial.

@imanel
imanel commented Nov 13, 2013

Version 1.1.2 released that should fix this bug

@ngauthier
Owner

thanks @imanel!

@fokcep can you confirm it's fixed? If so, we can up tubesock's websocket-ruby version to ~> 1.1.2

@fokcep
fokcep commented Nov 14, 2013

Hi @ngauthier ,

nope, it doesn't work for me. @FooBarWidget solution worked i.e. setting @leftovers = ""
but in the 1.1.2 it's still doing input.read

i.e. https://github.com/imanel/websocket-ruby/blob/master/lib/websocket/handshake/server.rb#L92

@ngauthier
Owner

so, @FooBarWidget says that passenger supports readpartial, but you're still showing websocket ruby calling read, and not readpartial? Can you confirm your passenger version from your Gemfile.lock?

@FooBarWidget

Sorry, I should have been clearer. Passenger supports readpartial on the hijacked socket, but not on rack.input object. I haven't been following this topic so I was thinking that you guys would be using the Rack socket hijacking API.

Speaking about that: can you guys use the socket hijacking API? Our rack.input object is not a pure socket object because it has to stay compatible with a lot of broken Rack middlewares (e.g. middlewares that assume rack.input is seekable, although sockets are never seekable).

@ngauthier
Owner

@FooBarWidget it is the websocket-ruby gem which is reading from rack.input and parsing the handshake, so that's where the fix would have to be. websocket-ruby doesn't do rack hijacking, it's tubesock that does that. So this is split across two libraries with different intentions.

@imanel any intuition here?

@imanel
imanel commented Nov 14, 2013

I'm not sure if I'm correct, but from my understanding it looks like it might be bug in Passenger. If it blocks on read (which imho it shouldn't) and doesn't support readpartial (...) then it should probably be fixed. All other servers (webrick, mongrel, thin, unicorn, puma and goliath) are working with just read... But if you have some idea about how to fix it on WeSocket-Ruby side then I'm open for implementing it.

@FooBarWidget

If it blocks on read (which imho it shouldn't)

This is not a bug. It is exactly per the specification, as I've explained in detail in comment 25471793.

The whole purpose of read is to block until exactly N bytes are read. It will never return anything less than N bytes, unless on EOF. It always blocks if there is no data available.

All other servers (webrick, mongrel, thin, unicorn, puma and goliath) are working with just read

It doesn't work on the other servers, as I've explained in the comment before. It only appears that way.

The read call doesn't block on Puma because, after socket hijacking, Puma replaces rack.input with a NullIO object. In other words, reading from rack.input becomes a no-op, which is why it doesn't block in Puma.

In imanel/websocket-ruby#19, you mentioned the following:

Unfortunately changing @leftovers empty string is not an option - it's needed for draft 76, and the only way to distinguish draft 76 and draft 75.

However, your current code wouldn't do what you intend to do. You intend to read from the socket, but Puma just makes all read calls on rack.input return nothing, irregardless of the actual socket state!

What do you intend to do with the read call? What is the distinction between draft 76 and draft 75, and what are you doing to detect it?

@imanel
imanel commented Nov 14, 2013

So it looks like solution would be to check if rack.hijack_io is nil, if not then read from it using readpartial or read, if not then fallback to rack.input and try readpartial and read, and if all that fill fail then use empty string - is that right?

@FooBarWidget

I think so, but to be sure I need to understand what you intend to do with that read/readpartial call. What is the distinction between draft 76 and draft 75, and what are you doing to detect it?

@imanel
imanel commented Nov 14, 2013

Draft 76 (still most popular) is sending 8 bytes after headers. This is only way to distinguish it from other drafts and we need those bytes to confirm that it's valid websocket connection and send response.

@FooBarWidget

So what does draft 75 do? Does it not send anything at all after the headers?

However, detecting draft 76 by trying to read those 8 bytes is inherently error-prone, no matter which server you're running the app on. If you try to defect the draft version by reading those 8 bytes then it would be impossible to distinguish between the following two situations:

  • The 8 bytes will never arrive.
  • The 8 bytes will arrive, but have merely been delayed for an arbitrary amount of time.

I noticed that draft 75 does not contain the "Sec-WebSocket-Key1" header. Can't you use that instead for version detection?

@edwardvalentini

hi @FooBarWidget -- would you mind pasting the nginx configuration that enables you to get tubesock working with purely nginx and passenger enterprise? I am only able to get it to work by proxying nginx connection to puma on a specific url such as /websocket (which bypasses passenger altogether). My understanding is that we dont need puma - we can service the websocket entirely with passenger.

@FooBarWidget

@edwardvalentini It's not a Passenger Enterprise configuration issue. The issue lies in Tubesock and websocket-ruby. I'm trying to work with them to have this resolved properly.

@imanel So what do you think about my proposal? Any updates on this?

@imanel
imanel commented Jan 27, 2014

Hi - sorry for late reply, I'm very busy those days.

The place where you detect version is here:
https://github.com/imanel/websocket-ruby/blob/master/lib/websocket/handshake/server.rb#L147

It looks like lack of "Sec-WebSocket-Key1" key should be enough to detect it. Could you prepare pull request?

@jonathanhoskin
Contributor

Hi, I'm another Passenger (enterprise) user affected by this bug. What is the current status and/or workaround?

@imanel
imanel commented Jul 24, 2014

@jonathanhoskin please test it under websocket-ruby 1.2.0 (it was just released) and confirm in this problem still persist. All changes suggested by @FooBarWidget was already implemented there so hopefully it will work now.

@jonathanhoskin
Contributor

Hi, just confirming that it's now working ok: websocket-ruby 1.2.0 + Passenger 4.0.48 + nginx 1.7.3 + rails 4.0.8 + tubesock 0.2.3.

@jriesen
jriesen commented Aug 29, 2014

The sock-chat demo works correctly for me on OS/X with Passenger 4.0.50 + Nginx 1.6.1 + websocket 1.20 + tubesock 0.2.3 + rails 4.1.5. But it operates very intermittently, crashing passenger constantly, if config.cache_classes = true.

At first, I couldn't get it to work: it would crash horribly after every other request. I reproduced the exact versions of everything above (i.e. nginx 1.7.3 and passenger 4.0.48 and everything) and it still failed. Turns out, just like #17, the issue seemed to be with cache reloading; switching to production (or just setting config.cache_classes = true in development.rb) resolves the issue, though inconveniently. (I don't understand why nobody else seems to run into this problem -- doesn't anybody run in development? Then again, I'm probably one of three people in the world trying to use tubesock with Passenger.)

In this particular case, it fails up with a deadlock in Rack. Interestingly, it fails in basically the exact same way under Apache 2.2 + Passenger 4.0.50. (Heh, don't ask... my current production site is in Apache+Passenger, so my local dev environment mirrors it.) As suggested in #17 this probably has something to do with connected threads being unhappy about having their classes ganked out from under them.

For whatever reason Puma doesn't have this problem; a little testing indicates that open-websocket threads don't pick up development code changes... but they don't crash horribly either. In all cases I'm using rack-1.5.2... and it's failed in the same way on both Rails 4.0.8 and 4.1.5 (with slightly different line numbers in the stack trace for railties etc. naturally).

I suspect that makes this a Passenger-specific problem, but I figured I'd follow up here first? If I should have made a new Issue instead of posting it here, I apologize.

Here's the relevant nginx error_log lines:

App 67561 stderr: [ 2014-08-28 23:02:33.0010 67691/0x007f9689678c08(Worker 1) utils.rb:84 ]: *** Exception ThreadError in Rack application object (deadlock; recursive locking) (process 67691, thread 0x007f9689678c08(Worker 1)):
App 67561 stderr:   from /Users/jriesen/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/rack-1.5.2/lib/rack/lock.rb:16:in `lock'
App 67561 stderr:   from /Users/jriesen/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/rack-1.5.2/lib/rack/lock.rb:16:in `call'
App 67561 stderr:   from /Users/jriesen/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/actionpack-4.1.5
App 67561 stderr: /lib/action_dispatch/middleware/static.rb:64:in `call'
App 67561 stderr:   from /Users/jriesen/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/rack-1.5.2/lib/rack/sendfile.rb:112:in `call'
App 67561 stderr:   from /Users/jriesen/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/railties-4.1.5/lib/rails/engine.rb:514:in `call'
App 67561 stderr:   from /Users/jriesen/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/railties-4.1.5/lib/rails/application.rb:144:in `call'
App 67561 stderr:   from /usr/local/Cellar/passenger/4.0.50/libexec/lib/phusion_passenger/rack/thread_handler_extension.rb:74:in `process_request'
App 67561 stderr:   from /usr/local/Cellar/passenger/4.0.50/libexec/lib/phusion_passenger/request_handler/thread_handler.rb:141:in `accept_and_process_next_request'
App 67561 stderr:   from /usr/local/Cellar/passenger/4.0.50/libexec/lib/phusion_passenger/request_handler/thread_handler.rb:109:in `main_loop'
App 67561 stderr:   from /usr/local/Cellar/passenger/4.0.50/libexec/lib/phusion_passenger/request_handler.rb:455:in `block (3 levels) in start_threads'
2014/08/28 23:02:33 [error] 67559#0: *19 upstream prematurely closed connection while reading response header from upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", upstream: "passenger:/tmp/passenger.1.0.67551/generation-0/request:", host: "sockchat.localhost", referrer: "http://sockchat.localhost/"

EDIT: It's not inherently a caching issue -- config.autoload_once_paths has no effect. Looking at the middleware, when config.cache_classes = true, Rack::Lock is removed from the stack, thus Passenger doesn't crash every other time I hit 'refresh.' rack/rack#495 indicates that this deadlock "is most likely caused by one of your middleware dropping the body without calling close on it." It's possible that ActionDispatch::Reloader (also present in the middleware stack when config.cache_classes = false) is dropping the ball somehow... but I'm not smart enough to figure out what's going on.

@FooBarWidget

But it operates very intermittently, crashing passenger constantly, if config.cache_classes = true.

I think you meant config.cache_classes = false here?

@jriesen
jriesen commented Aug 29, 2014

Err... yes, correct. My mistake. I went back and tweaked my post and got mixed up, heh.

@jonathanhoskin
Contributor

@jriesen I don't use Passenger in development so I wouldn't know. I have a separate "staging" and "integration" Rails environments which are mostly clones of the production Rails environment. We run tubesock under Puma in "integration".

@jriesen
jriesen commented Aug 29, 2014

Good point, @jonathanhoskin, thank you. My current Apache+Passenger development environment is basically a holdover from early app testing/planning and I simply hadn't taken the time to revisit it. We're fine with using Puma for development. In all of my testing so far, Passenger (open source edition, even!) works great with tubesock with config.cache_classes = true (test/production), so I'm content.

@FooBarWidget: Thank you for looking into this. Given that it isn't a show-stopper for me or anything, I wouldn't lose any sleep over it; I'm fine with changing my dev envirionment for the time being.

@akshayrawat

I can confirm this is still a problem in development env (with Passenger) where cache_classes = false.

Removing Rack::Lock in development environment fixes it. But I think doing that might cause inconsistent class reloading, as then the requests wouldn't be run under a mutex, causing class reloads to not always be seen by all threads (or be inconsistent in some other way.).

Ideally, this should work without messing with the Rails development env defaults of cache_classes or Rack::Lock.

Any insights into the current state of this issue would be much appreciated.


Passenger Enterprise - 4.0.49, Tubesock - 0.2.5, Websocket - 1.2.1, Rails 4.1.8.

@akshayrawat akshayrawat referenced this issue in websocket-rails/websocket-rails Jan 21, 2015
Closed

Removing the Rack::Lock middleware? #119

@FooBarWidget

Phusion Passenger author here. I've found out the cause of the Rack::Lock problems. It boils down to a difference in Rack socket hijacking behavior bewteen Passenger and Puma.

The story begins with Rack::Lock. The idea of Rack::Lock is to ensure requests are handled serially, in order to prevent code reloading from breaking during development. This is because code reloading and multithreading are fundamentally incompatible. So Rack::Lock grabs the mutex before the request is processed. Rack::Lock releases the mutex in one of two cases:

  1. If the request finishes with a normal body, then Rack::Lock releases the mutex immediately.
  2. If the request finishes with a body that responds to #close, e.g. a File object, then Rack::Lock releases the mutex when #close is called on the body.

Passenger never calls #close on the body when the socket is hijacked, and so the the Rack::Lock mutex is never released, giving rise to the "recursive locking" problem. Puma does, and so Rack::Lock happens to work.

I've looked into how other servers behave. Thin calls #close on the body even after hijacking, though Tubesock doesn't work properly on Thin. Unicorn does not, and so it would suffer from the same problem that Passenger experiences, but Tubesock doesn't work on Unicorn for a totally different reason (namely that in Unicorn, the hijacked IO object is a Rack::Lint::HijackWrapper and Tubesock doesn't like that).

I've also looked into what the Rack specification says, but it's extremely vague. Here is the relevant paragraph:

Response (after headers)

It is also possible to hijack a response after the status and headers have been sent. [...] Servers must ignore the body part of the response tuple when the rack.hijack response API is in use.

So the specification says that servers must ignore the body object when hijacked. But does it mean servers must ignore the body upon a partial hijack (i.e. after headers are sent)? Or does it mean that servers must ALSO ignore the body upon a full hijack? I have no idea, and it's something I need to ask the Rack mailing list.

@FooBarWidget

So you can close this issue now. The Rack::Lock problem must be fixed in either Passenger+Unicorn or Puma+Thin+Rack::Lock, not in Tubesock. Passenger seems to otherwise work fine with Tubesock.

@ngauthier
Owner

Thanks for all the context!

Also, I am completely in support of adding server-specific plugins for tubesock if it comes to that (even though like you say this should be done via the Rack spec). We can follow in Rails's footsteps and have an abstract adapter that follows Rack, then have other adapters subclass from it and implement server-specific differences. Then someone can simply require tubesock then require tubesock/passenger for example.

@ngauthier ngauthier closed this Feb 2, 2015
@FooBarWidget FooBarWidget referenced this issue in phusion/passenger Feb 18, 2015
Closed

Rails app run with Passenger get stuck #1380

@FooBarWidget FooBarWidget added a commit to phusion/passenger that referenced this issue May 29, 2015
@FooBarWidget FooBarWidget Ruby handler: close Rack response body even when socket is hijacked
Fixes the issue documented at ngauthier/tubesock#10 (comment).
e1c0112
@FooBarWidget

I've made a change in Passenger 5.0.9. We now emulate Puma behavior; that is, we close the body even when the socket is hijacked. That should fix the Rack::Lock problems.

Unfortunately, I never heard back from the Rack guys about what behavior is considered official.

@hubot hubot pushed a commit to defunkt/unicorn that referenced this issue Jun 10, 2015
Eric Wong ensure body is closed during hijack
Middlewares such as Rack::Lock (used by Rails) break badly unless
the response body is closed on hijack, so we will close it to follow
the lead of other popular Rack servers.

While it's unclear if there's anybody using rack.hijack with unicorn,
we'll try to emulate the behavior of other servers as much as
possible.

ref: ngauthier/tubesock#10
02a0727
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment