Provide a configuration option for buffering responses in order to better cope with slow clients #519
Comments
From honglilai on November 22, 2009 01:55:19

It looks like Apache is stuck while writing the response back to the web server. What's the value of your TimeOut config option? I think you need to decrease it, not increase it.
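For reference, the TimeOut directive under discussion is Apache core configuration. A minimal fragment (the 30-second value is the one the reporter mentions below, not a recommendation from this thread):

```apache
# httpd.conf -- Apache core directive. Among other things it bounds how long
# Apache waits on a blocked write to the client before aborting the request.
TimeOut 30
```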
From dustym on November 22, 2009 12:04:37

Thanks for the response Hongli. TimeOut is 30, which I think is pretty reasonable; I only increase it to recreate the issue. Which is to say, a low TimeOut isn't an adequate fix, as it doesn't address the real problem. Do you have any recommendations on how I might debug this issue further? I forgot to […]

[ pid=12365 file=ext/apache2/Hooks.cpp:654 time=2009-11-22 15:01:58.440 ]: […]

We've traced that to code/interaction within our app where in certain situations the […]
From honglilai on November 23, 2009 14:28:54

Well, looking at your backtraces it's definitely slow HTTP clients that are causing […]. In ext/apache2/Hooks.cpp, change the line

    return APR_EAGAIN;

into:

    // return APR_EAGAIN;

Check whether this helps.
From travisdbell on November 24, 2009 15:37:45

We are having the EXACT same problem. I'll give the source change a try and report back.
From andy.paul.007 on November 30, 2009 08:23:16

I've been getting a lot of these error messages as well. I'm not sure if they're […] I'm going to try commenting out the code line and see how that goes, although I'm not […] When you guys say "Passenger hangs", does your web app stay available or does the […]

thx, cheers
From honglilai on November 30, 2009 12:41:16

andy.paul.007, your issue is unrelated, but if you get "cannot fork process" then […]
From dustym on November 30, 2009 13:14:33

Just to give you guys an update: I patched passenger on one of our web servers in […] This is fine, and we plan on rolling passenger across our cluster soon, as we've […] Hongli, is it possible there could be an issue in the interaction somewhere between […] Additionally, do you think it would be possible to provide a buffering enable/disable option?

Thanks again for the time and help.
From honglilai on November 30, 2009 13:25:22

Javascript redirection: in that case the 'Stop' warning is completely legit and you […]

Buffering: yes, providing an option for buffering would be a good idea. I've marked […]

Summary: Provide a configuration option for buffering responses in order to better cope with slow clients
From honglilai on November 15, 2010 05:27:31

The Nginx version already provides such an option. Next up is Apache.

Labels: Milestone-3.0.2
From honglilai on November 15, 2010 05:28:25

It should be noted that buffering responses conflicts with apps that try to stream large responses, at least without some very intelligent form of buffering such as implemented by mod_accel. This fact should be documented.
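For reference, the directive names sketched below are what the option looks like in later Passenger releases: `passenger_buffer_response` on the Nginx side (already available per the comment above) and `PassengerBufferResponse` as the Apache counterpart this issue requests. Treat the exact spellings and defaults as something to verify against your Passenger version's documentation:

```
# Nginx integration:
passenger_buffer_response off;

# Apache integration:
PassengerBufferResponse off
```

Turning buffering off is what a streaming app would want, per the caveat in the comment above.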
From honglilai on November 15, 2010 06:10:05

Issue 396 has been merged into this issue.
From Martin.Kammerlander on November 24, 2010 08:33:44

We had the same issue:

[ pid=16687 file=ext/apache2/Hooks.cpp:654 time=2009-11-10 11:36:25.519 ]: […]
From ryoqun on November 11, 2011 01:32:59

Hi, I faced this issue on my deployed server. Applying the patch from comment #3 fixed the slow clients' problem. Thank you for the info! So I thought it would be a good idea to make this configurable. Here is the patch: ryoqun@247a719. I'll also submit a pull request shortly. Can anybody review this patch?

regards,
From ryoqun on November 15, 2011 20:15:52

Since then, I've improved my patch. Here is a list of the improvements: […]
From ryoqun on November 15, 2011 20:22:12

I updated the description in the pull request as well. Also, I forgot to mention that this patch changes Passenger's default behavior from NOT buffering responses to buffering them.
From honglilai on November 23, 2011 09:01:18

Labels: -Milestone-3.0.2 Milestone-3.0.10
From dustym on November 20, 2009 02:10:45
What steps will reproduce the problem?
1. Have only seen this in production.

What is the expected output? What do you see instead?
Passenger freezes.

What version of Phusion Passenger are you using? Which version of Rails? On what operating system?
Passenger 2.2.5
Rails 2.2.2

Please provide any additional information below.

We are running 64-bit Debian Etch. Each box has 8 gigs of RAM, 4 of which
are dedicated to Passenger and Apache. Here is our Passenger config:
PassengerRoot /usr/local/lib/ruby/gems/1.8/gems/passenger-2.2.5
PassengerRuby /usr/local/bin/ruby
PassengerUseGlobalQueue on
PassengerUserSwitching off
PassengerDefaultUser www-data
PassengerPoolIdleTime 0
RailsFrameworkSpawnerIdleTime 0
RailsAppSpawnerIdleTime 0
RailsSpawnMethod smart
PassengerMaxPoolSize 10
PassengerLogLevel 3
We are running REE 1.8.6:
ruby 1.8.6 (2008-08-11 patchlevel 287) [x86_64-linux]
Ruby Enterprise Edition 20090610
Our server header:
Server: Apache/2.2.3 (Debian) Phusion_Passenger/2.2.5
When we first put passenger into production, a couple of times a day we
would see requests back up in the global queue to the point that apache
would stop accepting connections due to MaxClients. This was with a timeout
of 300s in apache. We dug around the issue tracker and found this bug ( https://code.google.com/p/phusion-passenger/issues/detail?id=318 ) which
suggested disabling the global queue. We also lowered the apache timeout.
After that we would periodically see a backup at the handler level that
would last for a brief period of time and would only affect a couple of
handlers, but the other handlers would continue to process requests
quickly. Overall the system seems more stable with a 30s timeout, and we've
since switched back to the global queueing. But we are wary of what is
going on; Apache might just be "unhanging" Passenger when it's frozen by
dropping the connection. Indeed when passenger does occasionally completely
hang, we'll see a burst of these errors:
[ pid=16687 file=ext/apache2/Hooks.cpp:654 time=2009-11-10 11:36:25.519 ]:
Either the visitor clicked on the 'Stop' button in the web browser, or the
visitor's connection has stalled and couldn't receive the data that Apache
is sending to it. As a result, you will probably see a 'Broken Pipe' error
in this log file. Please ignore it, this is normal. You might also want to
increase Apache's TimeOut configuration option if you experience this
problem often.
Generally the box never swaps, and we kill rails handlers that go above a
certain memory threshold. Load is usually 1.5 to 5 on a quad core box.
At some point I found this support request for New Relic's RPM plugin: https://newrelic.tenderapp.com/discussions/support/1204-timeout-issues-should-switch-to-systemtimer-timeout-library

The issue explained in that support request is pretty much identical to
ours. We also had the new relic plugin installed, but we've since
completely removed it and the problem persists.
When I send a kill -3 to the passenger processes during a freeze I'll get a
dump like that in the attached file kill_3.txt.
I installed Aman Gupta's gdb.rb ( http://github.com/tmm1/gdb.rb ) and
modified it to connect to each passenger related process (including
spawners) and output each thread's backtrace. This can be found in the
attached file gdb_rb.txt. I don't understand what's going on completely in
those backtraces, but I believe the first 3 sets of backtraces (split by
the lines DEBUGGING ) are the Passenger spawn server, Passenger
FrameworkSpawner: 2.2.2 and the Passenger ApplicationSpawner. The rest are
rails handlers. Seven of which are in this state:
node_fcall select in
/usr/local/lib/ruby/gems/1.8/gems/passenger-2.2.5/lib/phusion_passenger/abstract_request_handler.rb:367
The other three (pids 29327,29368,29247) are in this state:
node_call write in
/usr/local/lib/ruby/gems/1.8/gems/actionpack-2.2.2/lib/action_controller/cgi_process.rb:176
When I took this snapshot, the global queue had something like 450 backed
up connections and all of the processes in passenger-status reported to
be working on 1 session.
Based on Ludwig's notes in the new relic bug report above I'm leaning
toward this being a blocking IO issue, but I can't be sure.
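The write() backtraces above are consistent with handlers stuck in a plain blocking write to a client that isn't draining the socket. A minimal Ruby sketch (hypothetical illustration, not Passenger's actual code) of how a writer can bound that wait using write_nonblock guarded by IO.select with a timeout:

```ruby
require 'socket'

class SlowClientError < StandardError; end

# Write all of `data` to `io`, but never wait more than `timeout` seconds
# for the socket to become writable. A plain io.write would block
# indefinitely once the kernel send buffer fills up (a slow client).
def write_with_timeout(io, data, timeout)
  written = 0
  while written < data.bytesize
    begin
      written += io.write_nonblock(data.byteslice(written, data.bytesize - written))
    rescue IO::WaitWritable
      # Kernel send buffer is full: the client is not reading. Wait until
      # the socket is writable again, but give up after `timeout` seconds
      # instead of stalling the whole request handler.
      raise SlowClientError, "client too slow" unless IO.select(nil, [io], nil, timeout)
    end
  end
  written
end
```

With response buffering enabled, the web server absorbs the response instead, so the application process is released quickly regardless of client speed.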
If you guys need any other instrumentation or experiments done I can
somewhat consistently recreate the issue by upping the Apache Timeout value
to 300s on one of our cluster nodes during peak traffic. At some point
passenger will freeze.
Thanks for the help.
Attachment: kill_3.txt gdb_rb.txt
Original issue: http://code.google.com/p/phusion-passenger/issues/detail?id=419