Constantly getting error: Exception Errno::EPIPE in Passenger RequestHandler (Broken pipe) - Apache stopped forwarding the backend's response ... #535

Closed
FooBarWidget opened this issue May 29, 2014 · 46 comments

Comments

@FooBarWidget
Member

From thomaswittold on December 09, 2009 17:12:29

I am using Passenger 2.2.7 with Apache/2.2.14 on a Gentoo Linux 2.6.21-xen
x86_64 box. Apache is compiled with several modules, including SSL and the
xsendfile module ( http://tn123.ath.cx/mod_xsendfile/ ). I am delivering a
Ruby on Rails application using MRI 1.8.7 patchlevel 174.

Apparently the Rails application delivers the pages correctly, but I keep
getting this Broken Pipe error with every request in my error_log:

[ pid=15779 file=ext/apache2/Hooks.cpp:658 time=2009-12-09 17:08:03.124 ]:
Apache stopped forwarding the backend's response, even though the HTTP
client did not close the connection. Is this an Apache bug?
*** Exception Errno::EPIPE in Passenger RequestHandler (Broken pipe) (process 17153):
from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/rack/request_handler.rb:112:in `write'
from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/rack/request_handler.rb:112:in `process_request'
from /usr/lib64/ruby/gems/1.8/gems/actionpack-2.3.5/lib/action_controller/string_coercion.rb:10:in `each'
from /usr/lib64/ruby/gems/1.8/gems/actionpack-2.3.5/lib/action_controller/response.rb:156:in `each'
from /usr/lib64/ruby/gems/1.8/gems/actionpack-2.3.5/lib/action_controller/string_coercion.rb:9:in `each'
from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/rack/request_handler.rb:111:in `process_request'
from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/abstract_request_handler.rb:207:in `main_loop'
from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/railz/application_spawner.rb:374:in `start_request_handler'
from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/railz/application_spawner.rb:332:in `handle_spawn_application'
from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/utils.rb:184:in `safe_fork'
from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/railz/application_spawner.rb:330:in `handle_spawn_application'
from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/abstract_server.rb:352:in `__send__'
from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/abstract_server.rb:352:in `main_loop'
from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/abstract_server.rb:196:in `start_synchronously'
from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/abstract_server.rb:163:in `start'
from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/railz/application_spawner.rb:209:in `start'
from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/spawn_manager.rb:262:in `spawn_rails_application'
from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/abstract_server_collection.rb:126:in `lookup_or_add'
from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/spawn_manager.rb:256:in `spawn_rails_application'
from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/abstract_server_collection.rb:80:in `synchronize'
from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/abstract_server_collection.rb:79:in `synchronize'
from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/spawn_manager.rb:255:in `spawn_rails_application'
from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/spawn_manager.rb:154:in `spawn_application'
from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/spawn_manager.rb:287:in `handle_spawn_application'
from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/abstract_server.rb:352:in `__send__'
from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/abstract_server.rb:352:in `main_loop'
from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/abstract_server.rb:196:in `start_synchronously'
from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/bin/passenger-spawn-server:61

Original issue: http://code.google.com/p/phusion-passenger/issues/detail?id=435

@FooBarWidget
Member Author

From misi@planet-punk.de on January 01, 2010 06:20:29

Same here
Debian 5, 2.6.26-2-amd64
Rails 2.3.5
Passenger 2.2.8
ruby 1.8.7 (2009-06-12 patchlevel 174) [x86_64-linux], MBARI 0x6770, Ruby Enterprise Edition 2009.10

@FooBarWidget
Member Author

From born70s on January 02, 2010 23:55:47

I'm seeing the same error.

CentOS 5.2 x86_64
Rails 2.3.5
Passenger 2.2.8
ruby 1.9.1p376 (2009-12-07 revision 26041 ) [x86_64-linux]

I'm not sure whether this is the cause, but I have to restart Apache a couple of times a day because the
number of Apache processes keeps increasing until it eventually stops responding.

@FooBarWidget
Member Author

From honglilai on January 27, 2010 01:42:20

Does increasing the web server's maximum file descriptor limit work?
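
For anyone who wants to check this, here is a minimal Python sketch that reads (and tries to raise) the open-file limit of the current process. The function names are illustrative only. Note that limits are per-process and inherited, so the value a login shell reports via ulimit is not necessarily the one the running web server has; on Linux, /proc/<pid>/limits shows the value for a specific process.

```python
import resource


def nofile_limits() -> tuple[int, int]:
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit() -> int:
    """Try to raise the soft limit up to the hard limit; return the resulting soft limit."""
    soft, hard = nofile_limits()
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    except (ValueError, OSError):
        # Some platforms refuse certain values; keep the current limit in that case.
        pass
    return nofile_limits()[0]


if __name__ == "__main__":
    print("before:", nofile_limits())
    print("soft limit after raise attempt:", raise_soft_limit())
```

A child process started after the limit is raised inherits the new value, which is why the limit normally has to be changed in whatever starts the web server rather than in an interactive shell.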

@FooBarWidget
Member Author

From jason.lapier on February 05, 2010 13:33:01

Also seeing this, and according to ulimit, I'm already at unlimited file descriptors.

Linux ubuntu 2.6.24-26-server #1 SMP Tue Dec 1 19:19:20 UTC 2009 i686 GNU/Linux
Rails 2.3.5
Passenger 2.2.9
ruby 1.8.6 (2007-09-24 patchlevel 111) [i486-linux]
Apache2-mpm-prefork package - version 2.2.8-1ubuntu0.14

@FooBarWidget
Member Author

From honglilai on February 07, 2010 03:20:43

I suspect that this problem might have something to do with a long-standing Safari
bug: https://bugs.webkit.org/show_bug.cgi?id=5760 Could you disable keep-alive and check whether the problem still occurs?

@FooBarWidget
Member Author

From steve.quinlan on February 08, 2010 07:20:41

Readers of this comment may be interested in https://code.google.com/p/phusion-passenger/issues/detail?id=378 and the workarounds mentioned in it.

@FooBarWidget
Member Author

From honglilai on February 10, 2010 02:11:29

Issue 459 has been merged into this issue.

@FooBarWidget
Member Author

From honglilai on February 10, 2010 02:13:47

There are quite a lot of people reporting similar problems, though so far I've been
completely unable to reproduce it locally. The observed behavior also doesn't make
any sense: it's as if the kernel suddenly decided to close the socket between the
different Passenger processes for no apparent reason. I've posted a question
on StackOverflow in the hope that someone else might know more about this: http://stackoverflow.com/questions/2235938/what-can-cause-an-spontaneous-epipe-error-without-either-end-calling-close-or-c

@FooBarWidget
Member Author

From honglilai on February 10, 2010 03:05:16

Also asked here: http://www.developerweb.net/forum/showthread.php?p=28779#post28779

@FooBarWidget
Member Author

From kent.thomas@medoraco.com on February 10, 2010 06:21:36

Would a dtruss of the passenger or the root apache process help at all?

@FooBarWidget
Member Author

From honglilai on February 10, 2010 07:01:53

You can give it a try. Please try to keep the data as small as possible.

If there's a way for me to reproduce the problem locally then that would be even better.

@FooBarWidget
Member Author

From kent.thomas@medoraco.com on February 10, 2010 07:56:05

The way I can reliably produce the errors is to do multiple refreshes of one page in the app. Simply hitting the
keyboard shortcut (command+r for mac)(f5 for windows) about 20 to 30 times in a row will produce the error.

Just a curious thought, but is there something in the way that mod_proxy handles their requests to apache that
dramatically differs from passenger?

@FooBarWidget
Member Author

From honglilai on February 10, 2010 08:33:42

Not that I know of.

@FooBarWidget
Member Author

From kent.thomas@medoraco.com on February 10, 2010 08:39:47

I should have specified in my last comment that hitting the keyboard shortcut for the refresh should be done in
rapid succession.

@FooBarWidget
Member Author

From honglilai on February 10, 2010 13:26:52

Actually in your case that behavior is normal. If you refresh the browser too
quickly, and the browser was sending a request at the time the refresh button was
clicked, then the browser will abort that request and continue with the refresh,
causing an EPIPE error message.
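
The "legit" case described above can be illustrated outside Passenger with a minimal Python sketch. It assumes a Unix-like system and uses a connected socket pair to stand in for the backend and the browser; the function name is illustrative only and none of this is Passenger code.

```python
import errno
import socket


def write_after_peer_close() -> int:
    """Return the errno raised when writing to a socket whose peer already closed."""
    backend, client = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    client.close()  # the "browser" aborts before the response is written
    try:
        backend.sendall(b"HTTP/1.1 200 OK\r\n\r\nhello")
    except OSError as exc:
        # Python ignores SIGPIPE at startup, so the failure surfaces as an exception.
        return exc.errno
    finally:
        backend.close()
    return 0


if __name__ == "__main__":
    code = write_after_peer_close()
    print(code, errno.errorcode.get(code))
```

The write fails only because the other end is gone, which is the same condition Ruby reports as Errno::EPIPE in the request handler.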

@FooBarWidget
Member Author

From kent.thomas@medoraco.com on February 10, 2010 13:36:04

Ok, I've been testing further. Here's another way I can reliably generate the errors: ab -n 200 -c 4 http://my-rails-site.com
Of course I test against my own Rails app site. This particular test will generate at least 3 errors in the log file.
Could we get a way to see some more debug logging from Passenger and REE?

@FooBarWidget
Member Author

From honglilai on February 11, 2010 02:16:46

Issue 378 has been merged into this issue.

@FooBarWidget
Member Author

From honglilai on February 11, 2010 02:21:38

A summary of what has been found so far:

  • Some EPIPE errors are apparently caused by a kernel bug in Mac OS X. Passenger 3
    solves this problem by using alternative facilities that do not trigger the kernel
    bug. This problem only occurs for those who are running Phusion Passenger on Mac OS X.
  • Some EPIPE errors are caused by low file descriptor limits. At least one person has
    reported that raising the limit helped.
  • Some EPIPE errors are caused by a long-standing Safari bug: https://bugs.webkit.org/show_bug.cgi?id=5760 . This can be worked around by disabling
    keep-alive in the web server.
  • Some EPIPE errors are caused by buggy routers, load balancers and stuff.
  • Some EPIPE errors are legit, i.e. the browser really aborted the connections.
  • The cause of the rest is still unknown.

@FooBarWidget
Member Author

From honglilai on February 11, 2010 02:26:18

I have a patch which may help for problems that fall in the "rest" category: http://github.com/FooBarWidget/passenger/commit/c056b19d2cd8f754efe88aeb208dc54e15fe224b Does this work?

Please note that this patch will likely have no effect on OS X because of the kernel
bug, which is not fixed until Passenger 3.

@FooBarWidget
Member Author

From steve.quinlan on February 11, 2010 05:05:53

Just confirming that disabling keep-alive in nginx seems to have solved my problem. App has been running 3
days without a problem (so far!). Thanks for the suggestion @honglilai and the work in dissecting this defect

@FooBarWidget
Member Author

From willieabrams on February 11, 2010 07:34:44

@honglilai Can you provide details on the OS X kernel bug? Any link or pointer would be helpful. We can file tech
incidents with Apple to get it investigated.

Also, when will Passenger 3 be out? We run entirely on OS X and this bug is biting us all the time.

@FooBarWidget
Member Author

From honglilai on February 11, 2010 08:39:00

We hope to be able to post more about Passenger 3 in a month.

As for the OS X kernel bug, the problem occurs in the following setup:

  1. Given two processes, A and B, connected to each other via a Unix domain socket.
  2. Given a process C, which listens on either a Unix domain socket server or a TCP
    server.
  3. A sends a request to B, and as a result B connects to C.
  4. The client socket that B obtained is sent to A via Unix domain socket file
    descriptor passing.
  5. A uses this passed file descriptor to communicate with C. Let's call this file
    descriptor X. Most of the time this works, but sometimes a read() call to X returns
    0, even though C did not close the connection. Retrying the same read() on X in a
    busy loop makes it work again later.

Furthermore:

  • The read() problem occurs the most often when C is listening on a Unix socket. It
    occurs less often when C is listening on a TCP socket. In Phusion Passenger 2, C
    listens on a TCP socket when running on OS X in order to minimize the impact of this
    problem.
  • The read() problem mostly occurs during benchmarking. About 2 in 1000 requests fail.
  • Passenger 3 works around this problem by no longer using the file descriptor passing
    routines in step (4). Instead, A now directly connects to C.
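
For readers unfamiliar with file descriptor passing, steps 1 to 5 can be sketched in Python as follows. This is only an illustration under several assumptions: it needs Python 3.9+ for socket.send_fds and socket.recv_fds, it plays the roles of A, B and C inside a single process rather than three, C listens on a Unix domain socket in a temporary directory, and all names are made up. It is not Passenger's implementation and it is not expected to trigger the OS X bug.

```python
import os
import socket
import tempfile


def pass_connection_via_fd() -> bytes:
    """Walk through steps 1-5 and return what A reads from C over the passed descriptor."""
    with tempfile.TemporaryDirectory() as tmp:
        # Step 1: A and B are connected to each other via a Unix domain socket.
        a_end, b_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

        # Step 2: C listens on a Unix domain socket.
        c_listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        c_listener.bind(os.path.join(tmp, "c.sock"))
        c_listener.listen(1)

        # Step 3: on A's request, B connects to C.
        b_to_c = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        b_to_c.connect(c_listener.getsockname())
        c_conn, _ = c_listener.accept()

        # Step 4: B passes its client socket to A (SCM_RIGHTS under the hood).
        socket.send_fds(b_end, [b"x"], [b_to_c.fileno()])
        _data, fds, _flags, _addr = socket.recv_fds(a_end, 1, 1)
        b_to_c.close()  # A now holds its own duplicate of the descriptor
        x = socket.socket(fileno=fds[0])  # "file descriptor X" from step 5

        # Step 5: A communicates with C through X.
        message = b"response from C"
        c_conn.sendall(message)
        received = b""
        while len(received) < len(message):
            chunk = x.recv(1024)
            if not chunk:
                # A zero-byte read means end of stream; in the bug described
                # above this happened although C had not closed the connection.
                break
            received += chunk

        for sock in (x, c_conn, c_listener, a_end, b_end):
            sock.close()
        return received


if __name__ == "__main__":
    print(pass_connection_via_fd())
```

The Passenger 3 approach mentioned above corresponds to dropping step 4 entirely: A would call connect() on C's address itself instead of receiving a descriptor from B.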

@FooBarWidget
Member Author

From willieabrams on February 11, 2010 14:28:33

We see this particular error across our application server cluster hundreds to thousands of times per day per server.
If you need to see it at work, send email to my name here @gmail.com.

@FooBarWidget
Member Author

From honglilai on February 12, 2010 01:03:29

So your cluster is running on OS X? And just curious, which website is it?

@FooBarWidget
Member Author

From willieabrams on February 12, 2010 07:15:55

http://www.vitalsource.com/
http://store.vitalsource.com/ (and several other white label stores like textbooks.vitalsource.com)
http://online.vitalsource.com/ (and several other white label versions of this Bookshelf client)
notes.vitalbook.com (no UI, just services - heavily used sync server used by Bookshelf Mac and Windows clients)

Currently, we have 10 or so Xserves in production, we plan to scale back to 6 or so as some of our pilot project
traffic gets more predictable. We run MySQL on OS X as well.

@FooBarWidget
Member Author

From willieabrams on February 12, 2010 07:18:30

I should add that store, online and notes run under Passenger, while www still runs under Mongrel. We also have
some other, less public sites on that stack running under Passenger.

@FooBarWidget
Member Author

From honglilai on February 14, 2010 04:47:31

Phusion Passenger for Nginx does not suffer from the kernel bug because it uses
different mechanisms. You can try switching to Nginx for now.

@FooBarWidget
Member Author

From vitalaaron on February 18, 2010 00:11:41

@honglilai

This reply is in reference to your question in issue 378, which was merged with this
issue.

For reference, my setup is Ubuntu 8.04, Passenger 2.2.5, Rails 2.1.2,
nginx/0.7.61.

To answer your question from issue 378: yes, they are the same - multiple Application
Spawners for the same app have been verified to be present at the same time as the
problem.

To prevent this from occurring I've been running this every hour:

kill $(ps -eo pid,args | grep "Passenger spawn server" | grep -v grep | awk '{print $1}')
touch /opt/nginx/html/mx/tmp/restart.txt

This seemed to be working well, but sometime in the last hour the problem started
again ("upstream prematurely closed connection while reading response header from
upstream" for every single request). Running the above script manually (vs. cron)
fixed the problem.

@FooBarWidget
Member Author

From steve.quinlan on February 18, 2010 01:13:35

@vitalaaron have you tried setting keep-alive to 0 in nginx.conf? Doing so solved my problem. If you try it,
kill nginx and start it again rather than doing an nginx reload or a Passenger restart.

@FooBarWidget
Member Author

From vitalaaron on February 18, 2010 10:34:41

@steve.quinlan I hadn't tried altering the keep-alive setting, but I just made the
change. Hopefully this will take care of the problem, at least until the ultimate
cause is determined. Thanks.

@FooBarWidget
Member Author

From vitalaaron on February 19, 2010 22:14:18

@steve.quinlan & @honglilai

I set keep-alive to 0 and killed/started nginx the other day as you stated. The
problem, however, returned tonight (every request was returning the previously
mentioned error until the cron job ran to remedy the problem).

Unfortunately, this is a high-traffic website monetized solely by advertisers and I
cannot continue to experiment to work around this issue (can't risk more downtime and
upset users). Until this is resolved, I'm going to have to move back to a Thin setup.
Sorry that I cannot be of any more help :(

@FooBarWidget
Member Author

From honglilai on February 21, 2010 07:54:48

Issue 461 has been merged into this issue.

@FooBarWidget
Member Author

From naimlissone on February 24, 2010 09:19:39

Are we any closer to finding a solution to this issue?

@FooBarWidget
Member Author

From vitalaaron on February 25, 2010 03:17:09

FYI - I am still running Passenger on my development server and got the same error
with the latest versions of both Nginx (0.7.64) and Passenger (2.2.10) installed.

@FooBarWidget
Member Author

From pierre.y on March 03, 2010 00:29:51

Debian Lenny / 2.6.26-2-amd64
Apache 2.2.9-10+lenny6 / Timeout 1200
Ruby 1.8.7 (2008-08-11 patchlevel 72) [x86_64-linux]
Passenger 2.2.5 and 2.2.10
Rails 2.3.4 and 2.3.5

Something interesting: with Passenger 2.2.5 the website stays alive, while with
Passenger 2.2.10 the website becomes unreachable every 5 minutes and I have to
restart Apache.

@FooBarWidget
Member Author

From honglilai on March 03, 2010 02:03:58

pierre.y: that's a regression in 2.2.10: http://groups.google.com/group/phusion-passenger/t/d5bb2f17c8446ea0?hl=en
It's got nothing to do with this issue. I've already posted a patch which needs
confirmation.

@FooBarWidget
Member Author

From kivanio on March 10, 2010 04:58:03

I'm having this issue in 2.2.11:

[ pid=25077 file=ext/apache2/Hooks.cpp:656 time=2010-03-10 08:36:41.715 ]:
Either the vistor clicked on the 'Stop' button in the web browser, or the visitor's connection has stalled and
couldn't receive the data that Apache is sending to it. As a result, you will probably see a 'Broken Pipe' error in
this log file. Please ignore it, this is normal. You might also want to increase Apache's TimeOut configuration
option if you experience this problem often.
*** Exception Errno::EPIPE in Passenger RequestHandler (Broken pipe) (process 24433):
from /usr/lib/ruby/gems/1.8/gems/passenger-2.2.11/lib/phusion_passenger/rack/request_handler.rb:109:in `write'
from /usr/lib/ruby/gems/1.8/gems/passenger-2.2.11/lib/phusion_passenger/rack/request_handler.rb:109:in `process_request'
from /usr/lib/ruby/gems/1.8/gems/actionpack-2.3.4/lib/action_controller/response.rb:155:in `each'

Opensuse 11.2
Server version: Apache/2.2.13 (Linux/SUSE)

@FooBarWidget
Member Author

From cyn0nrautha on April 08, 2010 10:12:13

I'm also having this issue in 2.2.11 and 2.2.10. I know I can safely roll back to
2.2.2 without having this issue. I see it at startup, and for a while everything
seems fine, until 3-4 hours later I come back to see 40-50+ Apache processes, at
which point the site is not responding.

My ulimit is the default of 1024, but I see no reason why each Apache process would need more
than that.
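
One way to test that assumption is to compare the number of descriptors a process actually holds against its limit. The following Python sketch assumes Linux (it reads /proc/<pid>/fd) and permission to inspect the target process; the function names are illustrative only.

```python
import os


def open_fd_count(pid: int) -> int:
    """Count the descriptors currently open in process `pid` (Linux /proc only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_headroom(pid: int, soft_limit: int) -> int:
    """Return how many more descriptors `pid` could open before hitting `soft_limit`."""
    return soft_limit - open_fd_count(pid)


if __name__ == "__main__":
    me = os.getpid()
    print("open descriptors:", open_fd_count(me))
    print("headroom at a limit of 1024:", fd_headroom(me, 1024))
```

Running this against the PIDs of the Apache children while the process count is climbing would show whether any of them is actually close to the 1024 limit.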

@FooBarWidget
Member Author

From christophe.lucas on May 03, 2010 10:47:50

I am getting the same errors on Centos, Nginx 0.8.36, Ruby Enterprise Edition and passenger 2.2.11. I only see it
when I use an ELB in front of Centos instances and when passenger starts queueing.

@FooBarWidget
Member Author

From bogdan.ionescu on July 22, 2010 17:28:37

I have been getting this error for at least one year, on various versions of nginx and passenger.
It happens on Centos and it happens on Fedora.
At the moment I am using passenger 2.2.15 and nginx 0.7.67
I am just mystified that a problem that has been reported for more than 18 months has still not been reproduced; to me, the surprising part is that some people do not run into it.

@FooBarWidget
Member Author

From steve.quinlan on July 22, 2010 23:20:26

@bogdan.ionescu - I temporarily changed my apps to use nginx + thin. I think Passenger 3 once it comes out will solve the problem.

@FooBarWidget
Member Author

From bogdan.ionescu on July 23, 2010 06:27:59

@steve.quinlan thanks for your suggestion, I'm giving thin a try since the stability of the application is more important than the elusive memory footprint or autospawning advantages.

@FooBarWidget
Member Author

From sahil.cooner on September 09, 2010 13:48:30

Running the following ruby
RubyGems Environment:

  • RUBYGEMS VERSION: 1.3.7
  • RUBY VERSION: 1.8.7 (2010-04-19 patchlevel 253) [x86_64-linux]
  • INSTALLATION DIRECTORY: /opt/ruby-enterprise/lib/ruby/gems/1.8
  • RUBY EXECUTABLE: /opt/ruby-enterprise/bin/ruby
  • EXECUTABLE DIRECTORY: /opt/ruby-enterprise/bin
  • RUBYGEMS PLATFORMS:
    • ruby
    • x86_64-linux
  • GEM PATHS:
    • /opt/ruby-enterprise/lib/ruby/gems/1.8
    • /root/.gem/ruby/1.8
  • GEM CONFIGURATION:
    • :update_sources => true
    • :verbose => true
    • :benchmark => false
    • :backtrace => false
    • :bulk_threshold => 1000
    • :sources => ["http://gems.rubyforge.org/", "http://gemcutter.org", "http://gems.github.com"]
  • REMOTE SOURCES:
    • http://gems.rubyforge.org/
    • http://gemcutter.org
    • http://gems.github.com

Apache
Server version: Apache/2.2.14 (Ubuntu)
Server built: Apr 13 2010 20:22:19
Server's Module Magic Number: 20051115:23
Server loaded: APR 1.3.8, APR-Util 1.3.9
Compiled using: APR 1.3.8, APR-Util 1.3.9
Architecture: 64-bit
Server MPM: Worker
threaded: yes (fixed thread count)
forked: yes (variable process count)
Server compiled with....
-D APACHE_MPM_DIR="server/mpm/worker"
-D APR_HAS_SENDFILE
-D APR_HAS_MMAP
-D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
-D APR_USE_SYSVSEM_SERIALIZE
-D APR_USE_PTHREAD_SERIALIZE
-D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
-D APR_HAS_OTHER_CHILD
-D AP_HAVE_RELIABLE_PIPED_LOGS
-D DYNAMIC_MODULE_LIMIT=128
-D HTTPD_ROOT=""
-D SUEXEC_BIN="/usr/lib/apache2/suexec"
-D DEFAULT_PIDLOG="/var/run/apache2.pid"
-D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
-D DEFAULT_ERRORLOG="logs/error_log"
-D AP_TYPES_CONFIG_FILE="/etc/apache2/mime.types"
-D SERVER_CONFIG_FILE="/etc/apache2/apache2.conf"

I am also running into the same issue. I haven't really dug into this yet; if anyone has found a solution, I would love to hear it. If not, I'll post back with any new findings :).

@FooBarWidget
Member Author

From sahil.cooner on September 10, 2010 00:09:29

The workaround is to change the KeepAlive setting from On to Off; then you won't experience the crash. This stops Passenger from dying, but the underlying behavior is still a bug.

@FooBarWidget
Member Author

From daniel.thor on September 10, 2010 01:01:33

Changing the KeepAlive directive never worked for us. After several days of tinkering we have stopped trying to figure this one out and we've backed down to mongrel cluster again pending the release of Passenger 3.

Daniel

@FooBarWidget
Member Author

From honglilai on September 15, 2010 11:59:47

Passenger 3 should fix this. I'm closing the bug for now. Feel free to comment or open a new issue if anyone still has problems.

Status: Fixed
Labels: Milestone-3.0.0
