Problem with Passenger Enterprise 4.0.14 #10
Comments
I haven't tried it on anything but Puma and Rails. Seems like they have supported it for a while: http://blog.phusion.nl/2013/01/23/the-new-rack-socket-hijacking-api/ Can you run through that tutorial and make sure you have proper rack.hijack support going? After that, you're braving new territory! Let me know how it goes. |
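For anyone following along, a quick way to check whether the server exposes hijacking at all is a tiny Rack app like the sketch below. The `rack.hijack?` and `rack.hijack` env keys come from the hijacking API described in the linked post; the `HijackProbe` name and the response text are made up for illustration.

```ruby
# Minimal Rack app that reports whether the server offers socket hijacking.
# HijackProbe is a hypothetical name used only for this sketch.
class HijackProbe
  def call(env)
    # Both keys must be present for a full hijack to be possible.
    supported = env['rack.hijack?'] && env['rack.hijack'].respond_to?(:call)
    body = supported ? "rack.hijack is available\n" : "rack.hijack is NOT available\n"
    [200, { 'Content-Type' => 'text/plain' }, [body]]
  end
end
```

Mounting it with `run HijackProbe.new` in a config.ru and hitting it through Passenger and through Puma should show whether the two servers differ at this level.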
hm, it's interesting i.e. now instead of
I have
and now Puma seems to work fine and Passenger works too. I will do more tests tomorrow and let you know. But can you explain why 'handshake.from_rack env' might not work for Passenger but it works for Puma? Cheers |
I added that specifically to websocket-ruby for tubesock's sake: I think the best thing to do would be to take the env and make a test for the headers and post an issue at websocket-ruby. Then all projects would benefit from it. Nick |
Thanks for that! I've done a few tests with Passenger, and it looks like it's opening connections just fine, but then no messages come through from the clients or from the server, i.e. pings are not coming through. |
Actually, it could be anything in front of your web server. A year ago when I was playing w/ websockets we couldn't use nginx, we had to use haproxy in a special mode to keep the websocket connections alive. Load balancers usually don't keep tcp connections like websockets open like that. I believe nginx has websocket support now, but you may have to compile from scratch. Try without the load balancer (directly connect) just to troubleshoot. If that still doesn't work at least we really can rule out the balancer. What is your balancer by the way? |
No, it's definitely not the load balancer. I've tried connecting directly from the same box, so it could be nginx or something else. I've noticed the following code doesn't return anything; it's stuck on IO.select, which is strange
i.e. https://github.com/ngauthier/tubesock/blob/master/lib/tubesock.rb#L92 Will do some more digging |
Hmm. Just to check, are you on Ruby 2? |
yep, ofc. I always check that everything works with Puma after I've done all my changes. |
Obviously we've built nginx 1.4.2 and passenger enterprise 4.0 and they both have web socket support. |
By the way, it seems that 'websocket-rails' library (which in turn uses Faye WebSocket which seems to use rack hijack but based on event machine) works on Passenger and nginx, and this chat application - https://github.com/themgt/ws42-chat works fine. I will try to find out whether tubesock can be improved based on that info. |
OK cool. Yeah I know the project, but at the time it didn't have On Mon, Sep 30, 2013 at 12:34 PM, fokcep notifications@github.com wrote:
|
Looks like EM should run in a different thread; if it runs in the same thread, it locks up. |
Phusion here. We got a support request from an Enterprise customer a while ago. We got Tubesock working now, and this is what we've found. There are Rack specification violations in both Tubesock and the Websocket gem. Let me first address the freezing problem that you guys have already encountered. The freeze is caused by the fact that WebSocket::Handshake::Server#handshake calls the following code snippet:
Obviously, the intention is to slurp all remaining data on the socket. This code path is not supposed to work because This code path happens to work on Puma because Puma replaces There is also a Rack specification violation in Tubesock itself. From tubesock.rb line 93:
According to the Rack specification, the hijacked socket object is not guaranteed to implement #recvfrom. And indeed, Phusion Passenger does not, while Puma does. Both Phusion Passenger and Puma happen to implement #readpartial, so you can use that as a more portable (albeit still non-standard) substitute. Starting from version 4.0.20, Phusion Passenger will implement #recvfrom as well. Finally, we confirm that Tubesock works fine even with the open source version of Phusion Passenger. Enterprise is not necessary. |
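As a rough illustration of the #readpartial suggestion above, a read helper could prefer #readpartial and only fall back to #recvfrom. This is a sketch, not Tubesock's actual code: `read_chunk` is a hypothetical name, and an `IO.pipe` stands in for the hijacked socket.

```ruby
# Read whatever is currently available from a hijacked IO without assuming
# that #recvfrom exists. read_chunk is a hypothetical helper, not Tubesock's API.
def read_chunk(io, max_bytes = 2000)
  if io.respond_to?(:readpartial)
    io.readpartial(max_bytes)
  else
    io.recvfrom(max_bytes).first
  end
rescue EOFError
  nil # the peer closed the connection
end

# Usage, with a pipe standing in for the hijacked socket:
reader, writer = IO.pipe
writer.write('ping')
puts read_chunk(reader) # reads the bytes that were just written
```

Note that #readpartial still blocks when nothing has been sent yet, so this only removes the dependency on #recvfrom.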
Interesting, thanks for that. I don't seem to have a problem calling from_rack after fixing it in a different way, see above. That is, opening the connection now works, but sending any subsequent WS message fails: it locks up at line 92. Could it be something else in Passenger? A config option or something? |
What do you mean by "locks up"? Isn't it normal that the select call there blocks until the client sends something over the websocket? Replacing the |
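One way to tell "blocks until the client sends something" apart from "blocks forever" while debugging is to give the select call a timeout. The sketch below assumes nothing about Tubesock internals; `readable_within?` is a hypothetical helper and the pipe stands in for the websocket.

```ruby
# Wait for data with a timeout so a silent peer cannot block the thread forever.
# readable_within? is a hypothetical helper for illustration.
def readable_within?(io, seconds)
  # IO.select returns nil when the timeout expires with nothing readable.
  ready = IO.select([io], nil, nil, seconds)
  !ready.nil?
end

reader, writer = IO.pipe
puts readable_within?(reader, 0.1) # nothing has been written yet
writer.write('x')
puts readable_within?(reader, 0.1) # a byte is now waiting
```

If the second check never reports data even though the client has sent a frame, the bytes are being held somewhere between the client and the app.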
The problem is that it blocks forever no matter if the client sends something or not. Have you tested with the open source Passenger as well as with the Enterprise version? |
Yes, it works fine with the open source version. Which demo app are you
|
Hi @FooBarWidget, thanks for helping out! Sounds like we should file an issue with websocket-ruby to fix up the @FooBarWidget, do you happen to know what the API of |
It's documented in the Rack specification. See section "The Input Stream". Basically, only gets, each, read and rewind are guaranteed. Everything else is vendor-specific. |
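To make the list above concrete, here is a small sketch that checks an input object against those four methods and then uses only them. `portable_input?` and `RACK_INPUT_METHODS` are hypothetical names, and a `StringIO` stands in for `rack.input`.

```ruby
require 'stringio'

# The four methods listed above as guaranteed for rack.input.
RACK_INPUT_METHODS = %i[gets each read rewind].freeze

# portable_input? is a hypothetical check, not part of Rack.
def portable_input?(input)
  RACK_INPUT_METHODS.all? { |m| input.respond_to?(m) }
end

input = StringIO.new('a=1&b=2')
puts portable_input?(input)
puts input.read # read is one of the guaranteed methods
input.rewind    # so is rewind
```

Anything beyond these, such as #readpartial on `rack.input`, would have to be feature-tested with `respond_to?` before use.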
OK, looks like we need to revisit the implementation of websocket-ruby and Thanks for your help so far! Nick On Tue, Oct 1, 2013 at 7:01 PM, Hongli Lai notifications@github.com wrote:
|
Hi again Nick, Just a quick question if you know the answer before I post a new issue on
but it seems that it's probably a bad idea i.e. looks like every web socket Thanks Pavel |
I just turn the db pool way up. I like your queuing idea, that's way I wonder what a reasonable amount of maximum connections to a postgresql db Nick
|
You are right, but increasing the db pool is not really scalable. So do you On Fri, Oct 18, 2013 at 12:08 PM, Nick Gauthier notifications@github.com wrote:
|
Check out the rails file in the source. I explicitly grab a connection for It's much easier, but less scalable. I think it would make more sense to not hold a connection, then let the Maybe we wrap the on message, on open, on close with a connection, so you Nick
|
When you use ActiveRecord in a new thread, ActiveRecord will check out a connection from its connection pool if the current thread doesn't have one already. It is then up to you to release that connection with ActiveRecord::Base.clear_active_connections. If you don't, the pool will stay exhausted. |
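The checkout-and-release pattern described above can be sketched without Rails. `TinyPool` below is a hypothetical stand-in for ActiveRecord's connection pool, written only to show why each thread should hand its connection back in an `ensure`; in a real app the equivalent would be wrapping the work in `ActiveRecord::Base.connection_pool.with_connection`.

```ruby
# TinyPool is a hypothetical stand-in for ActiveRecord's connection pool,
# used only to show the checkout/release pattern in worker threads.
class TinyPool
  def initialize(size)
    @free = Queue.new
    size.times { |i| @free << "conn-#{i}" }
  end

  # Number of connections currently checked in.
  def available
    @free.size
  end

  # Check a connection out for the duration of the block, then return it,
  # even if the block raises.
  def with_connection
    conn = @free.pop
    yield conn
  ensure
    @free << conn if conn
  end
end

pool = TinyPool.new(2)
threads = 5.times.map do
  Thread.new { pool.with_connection { |conn| conn.length } }
end
threads.each(&:join)
puts pool.available # every connection has been returned to the pool
```

Holding the connection only for the duration of each message handler, rather than for the lifetime of the socket, is what keeps the pool from being exhausted by idle websockets.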
Yup. So we would release it immediately after threading, then during the On Fri, Oct 18, 2013 at 8:00 AM, Hongli Lai notifications@github.com wrote:
|
@nick - yes, I can see what you mean, it's in the code. |
I'm not sure if I'm correct, but from my understanding it looks like it might be a bug in Passenger. If it blocks on read (which imho it shouldn't) and doesn't support readpartial (...) then it should probably be fixed. All other servers (webrick, mongrel, thin, unicorn, puma and goliath) are working with just |
This is not a bug. It is exactly per the specification, as I've explained in detail in comment 25471793. The whole purpose of
It doesn't work on the other servers, as I've explained in the comment before. It only appears that way. The In imanel/websocket-ruby#19, you mentioned the following:
However, your current code wouldn't do what you intend to do. You intend to read from the socket, but Puma just makes all What do you intend to do with the read call? What is the distinction between draft 76 and draft 75, and what are you doing to detect it? |
So it looks like the solution would be to check whether rack.hijack_io is nil; if it isn't, read from it using readpartial or read; otherwise fall back to rack.input and try readpartial and read; and if all of that fails, use an empty string. Is that right? |
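The fallback chain proposed above could look roughly like the following. This is a sketch of the idea only, not what websocket-ruby ended up shipping: `read_leftover` is a hypothetical helper, `StringIO` objects stand in for the real streams, and the sketch does nothing about a real socket blocking when no bytes have arrived.

```ruby
require 'stringio'

# Try the hijacked socket first, then rack.input, and fall back to "".
# read_leftover is a hypothetical helper, not the websocket-ruby implementation.
def read_leftover(env, max_bytes = 8)
  [env['rack.hijack_io'], env['rack.input']].compact.each do |io|
    %i[readpartial read].each do |meth|
      next unless io.respond_to?(meth)

      begin
        data = io.public_send(meth, max_bytes)
        return data unless data.nil? || data.empty?
      rescue EOFError, IOError
        next # this method had nothing to offer; try the next one
      end
    end
  end
  ''
end

env = { 'rack.input' => StringIO.new('12345678') }
puts read_leftover(env).bytesize
```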
I think so, but to be sure I need to understand what you intend to do with that read/readpartial call. What is the distinction between draft 76 and draft 75, and what are you doing to detect it? |
Draft 76 (still the most popular) sends 8 bytes after the headers. This is the only way to distinguish it from other drafts, and we need those bytes to confirm that it's a valid websocket connection and to send the response. |
So what does draft 75 do? Does it not send anything at all after the headers? However, detecting draft 76 by trying to read those 8 bytes is inherently error-prone, no matter which server you're running the app on. If you try to detect the draft version by reading those 8 bytes then it would be impossible to distinguish between the following two situations:
I noticed that draft 75 does not contain the "Sec-WebSocket-Key1" header. Can't you use that instead for version detection? |
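A header-based check along those lines might look like this. It is only a sketch: `legacy_draft` is a hypothetical helper, and it assumes the request has already been identified as one of the two old drafts. Rack exposes the `Sec-WebSocket-Key1` request header as `HTTP_SEC_WEBSOCKET_KEY1` in env.

```ruby
# Tell draft 76 from draft 75 by looking at the headers instead of reading
# the 8 bytes that follow them. legacy_draft is a hypothetical helper and
# assumes the request is already known to be one of these two drafts.
def legacy_draft(env)
  env.key?('HTTP_SEC_WEBSOCKET_KEY1') ? 76 : 75
end

puts legacy_draft({ 'HTTP_SEC_WEBSOCKET_KEY1' => 'example key' })
puts legacy_draft({})
```

The 8 bytes are still needed to compute the draft 76 response, but with this check the read only happens once the draft is already known.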
hi @FooBarWidget -- would you mind pasting the nginx configuration that enables you to get tubesock working with purely nginx and passenger enterprise? I am only able to get it to work by proxying nginx connection to puma on a specific url such as /websocket (which bypasses passenger altogether). My understanding is that we dont need puma - we can service the websocket entirely with passenger. |
@edwardvalentini It's not a Passenger Enterprise configuration issue. The issue lies in Tubesock and websocket-ruby. I'm trying to work with them to have this resolved properly. @imanel So what do you think about my proposal? Any updates on this? |
Hi - sorry for the late reply, I'm very busy these days. The place where you detect version is here: It looks like the absence of the "Sec-WebSocket-Key1" header should be enough to detect it. Could you prepare a pull request? |
Hi, I'm another Passenger (enterprise) user affected by this bug. What is the current status and/or workaround? |
@jonathanhoskin please test it under websocket-ruby 1.2.0 (it was just released) and confirm whether this problem still persists. All changes suggested by @FooBarWidget were already implemented there, so hopefully it will work now. |
Hi, just confirming that it's now working ok: websocket-ruby 1.2.0 + Passenger 4.0.48 + nginx 1.7.3 + rails 4.0.8 + tubesock 0.2.3. |
The sock-chat demo works correctly for me on OS/X with Passenger 4.0.50 + Nginx 1.6.1 + websocket 1.20 + tubesock 0.2.3 + rails 4.1.5. But it operates very intermittently, crashing passenger constantly, if config.cache_classes = true. At first, I couldn't get it to work: it would crash horribly after every other request. I reproduced the exact versions of everything above (i.e. nginx 1.7.3 and passenger 4.0.48 and everything) and it still failed. Turns out, just like #17, the issue seemed to be with cache reloading; switching to production (or just setting config.cache_classes = true in development.rb) resolves the issue, though inconveniently. (I don't understand why nobody else seems to run into this problem -- doesn't anybody run in development? Then again, I'm probably one of three people in the world trying to use tubesock with Passenger.) In this particular case, it fails with a deadlock in Rack. Interestingly, it fails in basically the exact same way under Apache 2.2 + Passenger 4.0.50. (Heh, don't ask... my current production site is in Apache+Passenger, so my local dev environment mirrors it.) As suggested in #17 this probably has something to do with connected threads being unhappy about having their classes ganked out from under them. For whatever reason Puma doesn't have this problem; a little testing indicates that open-websocket threads don't pick up development code changes... but they don't crash horribly either. In all cases I'm using rack-1.5.2... and it's failed in the same way on both Rails 4.0.8 and 4.1.5 (with slightly different line numbers in the stack trace for railties etc. naturally). I suspect that makes this a Passenger-specific problem, but I figured I'd follow up here first? If I should have made a new Issue instead of posting it here, I apologize. Here are the relevant nginx error_log lines:
EDIT: It's not inherently a caching issue -- config.autoload_once_paths has no effect. Looking at the middleware, when config.cache_classes = true, Rack::Lock is removed from the stack, thus Passenger doesn't crash every other time I hit 'refresh.' rack/rack#495 indicates that this deadlock "is most likely caused by one of your middleware dropping the body without calling close on it." It's possible that ActionDispatch::Reloader (also present in the middleware stack when config.cache_classes = false) is dropping the ball somehow... but I'm not smart enough to figure out what's going on. |
I think you meant |
Err... yes, correct. My mistake. I went back and tweaked my post and got mixed up, heh. |
@jriesen I don't use Passenger in development so I wouldn't know. I have a separate "staging" and "integration" Rails environments which are mostly clones of the production Rails environment. We run tubesock under Puma in "integration". |
Good point, @jonathanhoskin, thank you. My current Apache+Passenger development environment is basically a holdover from early app testing/planning and I simply hadn't taken the time to revisit it. We're fine with using Puma for development. In all of my testing so far, Passenger (open source edition, even!) works great with tubesock with config.cache_classes = true (test/production), so I'm content. @FooBarWidget: Thank you for looking into this. Given that it isn't a show-stopper for me or anything, I wouldn't lose any sleep over it; I'm fine with changing my dev environment for the time being. |
I can confirm this is still a problem in development env (with Passenger) where Removing Ideally, this should work without messing with the Rails development env defaults of Any insights into the current state of this issue would be much appreciated. Passenger Enterprise - 4.0.49, Tubesock - 0.2.5, Websocket - 1.2.1, Rails 4.1.8. |
Phusion Passenger author here. I've found out the cause of the Rack::Lock problems. It boils down to a difference in Rack socket hijacking behavior between Passenger and Puma. The story begins with Rack::Lock. The idea of Rack::Lock is to ensure requests are handled serially, in order to prevent code reloading from breaking during development. This is because code reloading and multithreading are fundamentally incompatible. So Rack::Lock grabs the mutex before the request is processed. Rack::Lock releases the mutex in one of two cases:
Passenger never calls #close on the body when the socket is hijacked, and so the Rack::Lock mutex is never released, giving rise to the "recursive locking" problem. Puma does, and so Rack::Lock happens to work. I've looked into how other servers behave. Thin calls #close on the body even after hijacking, though Tubesock doesn't work properly on Thin. Unicorn does not, and so it would suffer from the same problem that Passenger experiences, but Tubesock doesn't work on Unicorn for a totally different reason (namely that in Unicorn, the hijacked IO object is a Rack::Lint::HijackWrapper and Tubesock doesn't like that). I've also looked into what the Rack specification says, but it's extremely vague. Here is the relevant paragraph:
So the specification says that servers must ignore the body object when hijacked. But does it mean servers must ignore the body upon a partial hijack (i.e. after headers are sent)? Or does it mean that servers must ALSO ignore the body upon a full hijack? I have no idea, and it's something I need to ask the Rack mailing list. |
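For readers who want to see the mechanism described above in isolation, here is a stripped-down imitation of a locking middleware. `SimpleLock` and its `BodyProxy` are hypothetical and deliberately much simpler than the real Rack::Lock; the point is only that the mutex is released when the server calls #close on the body, and that a second request on the same thread fails if that never happens.

```ruby
# SimpleLock is a hypothetical, stripped-down imitation of a locking
# middleware; it is not the actual Rack::Lock source.
class SimpleLock
  # Wraps the body so the mutex is released when the server calls #close.
  class BodyProxy
    def initialize(body, &on_close)
      @body = body
      @on_close = on_close
    end

    def each(&block)
      @body.each(&block)
    end

    def close
      @body.close if @body.respond_to?(:close)
    ensure
      @on_close.call
    end
  end

  def initialize(app)
    @app = app
    @mutex = Mutex.new
  end

  def call(env)
    @mutex.lock # raises ThreadError if this thread already holds the lock
    begin
      status, headers, body = @app.call(env)
      [status, headers, BodyProxy.new(body) { @mutex.unlock }]
    rescue Exception
      @mutex.unlock # the app failed, so no body will ever be closed
      raise
    end
  end
end

app = SimpleLock.new(->(_env) { [200, {}, ['ok']] })

_status, _headers, body = app.call({})
body.close   # a server that closes the body releases the mutex
app.call({}) # so the next request on this thread goes through

begin
  app.call({}) # the previous body was never closed: the mutex is still held
rescue ThreadError => e
  puts e.message
end
```

A server that skips #close after a hijack leaves the middleware in the state of the last call above, which matches the symptom reported earlier in this thread.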
My Rack mailing list question: https://groups.google.com/forum/#!topic/rack-devel/afpYGMoFP0c |
So you can close this issue now. The Rack::Lock problem must be fixed in either Passenger+Unicorn or Puma+Thin+Rack::Lock, not in Tubesock. Passenger seems to otherwise work fine with Tubesock. |
Thanks for all the context! Also, I am completely in support of adding server-specific plugins for tubesock if it comes to that (even though like you say this should be done via the Rack spec). We can follow in Rails's footsteps and have an abstract adapter that follows Rack, then have other adapters subclass from it and implement server-specific differences. Then someone can simply require tubesock then require tubesock/passenger for example. |
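To give a feel for that adapter idea, here is one possible shape. Everything in it is hypothetical: none of these classes exist in Tubesock, and the base class simply assumes the hijacked IO responds to #read.

```ruby
# Hypothetical adapter layout; none of these classes exist in Tubesock.
module SocketAdapters
  # Baseline adapter: assumes only that the hijacked IO responds to #read.
  class Base
    def initialize(io)
      @io = io
    end

    def read_chunk(max_bytes)
      @io.read(max_bytes)
    end
  end

  # Example of a subclass overriding one behaviour for servers whose
  # hijacked IO also offers #readpartial.
  class PartialRead < Base
    def read_chunk(max_bytes)
      @io.readpartial(max_bytes)
    end
  end

  # Pick an adapter by feature-testing the IO rather than naming a server.
  def self.for(io)
    io.respond_to?(:readpartial) ? PartialRead.new(io) : Base.new(io)
  end
end

reader, writer = IO.pipe
writer.write('hello')
puts SocketAdapters.for(reader).read_chunk(5)
```

A `require 'tubesock/passenger'` style file could then register its own subclass instead of relying on feature tests.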
Fixes the issue documented at ngauthier/tubesock#10 (comment).
I've made a change in Passenger 5.0.9. We now emulate Puma behavior; that is, we close the body even when the socket is hijacked. That should fix the Rack::Lock problems. Unfortunately, I never heard back from the Rack guys about what behavior is considered official. |
Middlewares such as Rack::Lock (used by Rails) break badly unless the response body is closed on hijack, so we will close it to follow the lead of other popular Rack servers. While it's unclear if there's anybody using rack.hijack with unicorn, we'll try to emulate the behavior of other servers as much as possible. ref: ngauthier/tubesock#10
Just wondering if anyone has had issues running tubesock with Passenger Enterprise Server. It could have nothing to do with tubesock at all. It's supposed to run OK, but I'm getting a long delay when calling the following bit of code from tubesock:
It takes around 50 seconds to execute this method, and eventually it fails to get a WS connection.
Dump of env['rack.hijack']
Proc:0x00000004505060@/opt/passenger/passenger-enterprise-server-4.0.14/lib/phusion_passenger/rack/thread_handler_extension.rb:53 (lambda)
Any ideas?
Thanks