debug read error on connection #70

Open
sorinv opened this Issue Oct 30, 2011 · 109 comments

@sorinv
sorinv commented Oct 30, 2011

Hi there,

I am getting a lot of "read error on connection" errors. Server is under load, but not a lot. I traced this to always happen here:

  • library.c
  • redis_sock_read
  • php_stream_gets -> this returns NULL, so it triggers the exception.

Any way I can further debug this to figure out why it happens?

Thanks,
Sorin

@nicolasff
Contributor

Hello,

Could you check your timeout settings and try changing them to a higher value?
Also look at the number of clients Redis allows; you might be hitting a limit.

Nicolas

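For readers landing here from search: the two timeouts Nicolas mentions live in different places. A minimal sketch of where to check and raise them, assuming the phpredis 2.x API (host and values are placeholders):

```php
// Client side: the third argument to connect()/pconnect() is the
// timeout in seconds (it bounds reads too, not just the handshake).
$redis = new Redis();
$redis->connect('127.0.0.1', 6379, 10); // raised from the default

// Server side: "timeout 0" in redis.conf disables idle-client
// disconnects; CONFIG GET shows the value currently in effect.
$timeout = $redis->config('GET', 'timeout');

// The client limit lives in redis.conf as "maxclients".
$max = $redis->config('GET', 'maxclients');
```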

@sorinv
sorinv commented Oct 31, 2011

Hey Nicolas,

Thanks for the reply.

Timeout is set to 5 seconds (which is supposed to be like forever for redis). Number of clients is set to unlimited.

The server is very large and there are fewer than 200 clients connected at the same time. The redis process never maxes out a processor, so I assume it isn't being worked hard enough to cause timeouts.

I am using PHP-FPM and persistent connections (supposedly they save the connect time, so requests are a bit faster). Could this be an issue?

Thanks,
Sorin

@rmarscher

Related to this... I was confused about the timeout setting in connect/pconnect. I thought it pertained only to establishing a connection to the server, but it turns out it sets the timeout for all requests. I set it to 2.5 seconds and was wondering why I kept getting the "read error on connection" exception. The server also has its own timeout, which will cause the exception to be thrown when it is reached.

@nicolasff
Contributor

@sorinv, I'm not sure how PHP-FPM is working internally. If it doesn't reuse processes, there's no advantage to using pconnect instead of connect. pconnect is useful with Apache in prefork mode, where processes are reused a number of times.

php_stream_gets returning NULL means that we can't read from the server, either because there is a connection issue, a timeout, or that the server closed the connection.
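Since the extension can't distinguish those three causes, one application-level workaround is to treat the exception as "socket is dead" and retry once on a fresh connection. A sketch, with placeholder host/port:

```php
function getWithRetry(Redis $r, $key) {
    try {
        return $r->get($key);
    } catch (RedisException $e) {
        // "read error on connection" leaves the socket unusable:
        // reconnect and retry exactly once before giving up.
        $r->connect('127.0.0.1', 6379, 5); // placeholder host/port
        return $r->get($key);
    }
}
```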

@sorinv
sorinv commented Nov 1, 2011

The PHP-FPM does reuse processes, so pconnect should be a gain (albeit small).

On php_stream_gets returning NULL, I assume there is no way to know exactly what the cause is? If the server closed the connection, for example, I would know to look at the server side.

@bobrik
bobrik commented Nov 1, 2011

@sorinv, what does ulimit -n say for the redis user?

Btw, providing more meaningful exceptions would be a big plus for debugging.

@sorinv
sorinv commented Nov 1, 2011

@bobrik, that command shows 100000, so open files should not be an issue.

@bobrik
bobrik commented Nov 1, 2011

+1 for this issue; we get errors like this and have no idea what the reason is.

@michael-grunder
Member

This specific error has plagued us as well, and it is incredibly hard to track down. I decided to hack at it a bit tonight and may have found something.

You can cause this error by issuing a $connection->subscribe() command on a channel where you never publish anything. After a certain amount of time the idle connection times out and you get this error. This very simple program demonstrates it:

$start = time();

$r = new Redis();
$r->connect('localhost',6379, 0);

try {
    $r->subscribe(array('test-channel'),'SubCallback');
} catch(Exception $ex) {
    $elapsed = time() - $start;

    echo "Exception: " . $ex->getMessage() . "\n";
    echo "Elapsed: " . $elapsed . "\n";
}

function SubCallback($objConnection, $strChannel, $message) {
    echo "Channel: $strChannel, message: $message\n";   
}

When I run this, I get the "read error on connection" message at 60 seconds, every time, even though the timeout in my redis.conf is zero.
If I set the timeout to something huge (like 300 seconds), it times out at 300 seconds instead. Perhaps passing zero is being overridden to 60 seconds somewhere?

@nicolasff: I just wanted to mention how awesome phpredis is. Truly fantastic! :)

@nicolasff
Contributor

@michael-grunder, could you please try setting default_socket_timeout to 0 in php.ini? phpredis uses the php sockets, which will timeout after a set amount of time.

@daviddoran

For me, setting default_socket_timeout = 0 made the connection time out immediately.

I've been using default_socket_timeout = -1 and that seems to be working as "no timeout".

@RVN-BR
RVN-BR commented Feb 7, 2012

+1 on this issue...

When using a long timeout, PHP occasionally lags trying to connect while the redis server is in a bad state (it should error out and move on to the next server); with a short timeout, the read errors start showing up. Catch-22?
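One way out of that catch-22 is a short connect timeout combined with explicit client-side failover over a server list. A sketch (server addresses are made up; phpredis accepts a float timeout):

```php
function connectAny(array $servers, $timeout = 0.5) {
    foreach ($servers as $s) {
        $r = new Redis();
        try {
            // Short timeout: a wedged server fails fast and we move on.
            if ($r->connect($s['host'], $s['port'], $timeout)) {
                return $r;
            }
        } catch (RedisException $e) {
            // unreachable or slow server, try the next one
        }
    }
    throw new RuntimeException('no redis server reachable');
}
```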

@wa0x6e
wa0x6e commented Feb 20, 2012

+1

There are two patterns in the OS X crash report:

Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: 0x000000000000000d, 0x0000000000000000

VM Regions Near 0:
-->
__TEXT 0000000109a21000-0000000109a5a000 [ 228K] r-x/rwx SM=COW /opt/local/apache2/*

Application Specific Information:
objc[381]: garbage collection is OFF

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libphp5.so 0x000000010a297df0 _php_stream_set_option + 16
1 libphp5.so 0x000000010a2a32e2 _php_stream_xport_create + 194
2 redis.so 0x000000010b6e9b41 redis_sock_connect + 321
3 redis.so 0x000000010b6e99c2 redis_check_eof + 162
4 redis.so 0x000000010b6e9e5c redis_sock_read + 44
5 redis.so 0x000000010b6eaaa4 redis_send_discard + 180
6 redis.so 0x000000010b6e7e74 zim_Redis_pconnect + 132

And

Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x00000000000000c0

VM Regions Near 0xc0:
-->
__TEXT 0000000109a21000-0000000109a5a000 [ 228K] r-x/rwx SM=COW /opt/local/apache2/*

Application Specific Information:
objc[381]: garbage collection is OFF

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libphp5.so 0x000000010a297e79 _php_stream_eof + 9
1 redis.so 0x000000010b6e9932 redis_check_eof + 18
2 redis.so 0x000000010b6e9c24 redis_sock_write + 68
3 redis.so 0x000000010b6d7377 zim_Redis_hGet + 247

The rest of the log doesn't concern redis.

  • Bug appears in versions 2.1.1 and 2.1.3
  • Redis version 2.4.7
  • Appears on both 32-bit and 64-bit builds

Using connect() instead of pconnect() seems to be a temporary fix.

@timsweb
timsweb commented Feb 20, 2012

Another +1 on this when attempting to subscribe to a channel.

Setting default_socket_timeout = 0 throws the exception immediately; with -1 the app hangs forever. All running on PHP 5.3 on CentOS.

@rookie7799

+1 to the issue

@akitsukada

+1

@wayne530

+1

@colinmollenhour

Just adding another data point: I ran into this when benchmarking Redis as a cache backend. It does not occur for me when connecting over TCP, only when using a unix socket, and it occurs more frequently with more concurrent processes. For example, when testing with 64 concurrent clients, all but 17 hit the error. With a standalone PHP driver, all else being equal, there were no such errors.

Debian 64bit with Redis 2.4.9 (dotdeb), phpredis 2.1.3 (master), PHP 5.3.10 (dotdeb) from CLI.

Standalone PHP driver used with no errors is Credis_Client which connects with stream_socket_client and reads with fgets and stream_get_contents.

@colinmollenhour

Forgot to mention: I am not using persistent connections with either phpredis or the standalone driver.

@bobrik
bobrik commented Mar 27, 2012

If you have a BIG bunch of servers (over 100 is enough), this error may occur because syncookies are enabled in the kernel (FreeBSD and Linux). Disabling syncookies fixed the issue for us.
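For reference, on Linux this is the net.ipv4.tcp_syncookies sysctl (FreeBSD has net.inet.tcp.syncookies). Whether disabling it is safe depends on your exposure to SYN floods, so treat this as a sketch rather than a recommendation:

```shell
# Check the current setting (1 = enabled)
sysctl net.ipv4.tcp_syncookies

# Disable at runtime
sysctl -w net.ipv4.tcp_syncookies=0

# Persist across reboots
echo "net.ipv4.tcp_syncookies = 0" >> /etc/sysctl.conf
```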

@rookie7799

Should syncookies be turned off on the redis servers, on the clients' servers, or both?
Thanks!

@bobrik
bobrik commented Mar 27, 2012

We disabled it everywhere, but I think server side is more important.

@colinmollenhour

I am only running one instance of Redis. What baffles me is that the standalone PHP driver has no read errors on the socket for the exact same benchmark, but phpredis does. So there is something different about phpredis' connection handling that makes it less stable.

@asuth
asuth commented Apr 11, 2012

Same issue on Redis 2.4.10 and phpredis 2.1.3 on OS X 10.7.3 / macports. I call $redis->blpop('mykey', 0); and get "read error on connection" after 5.0 seconds, every time.

@michael-grunder
Member

@asuth How are you connecting to Redis? Are you setting a 5 second timeout? Also, have you checked your redis.conf timeout setting and the default_socket_timeout php.ini setting?

I don't get this problem with the following script:

$r = new Redis();
$r->connect('localhost',6379);

$st = microtime(true);
$r->blpop('mykey',0);
$et = microtime(true);
echo "took: " . ($et-$st) . " to die\n";

I let it run for about ten minutes, and it blocked until I pushed something onto 'mykey'.

@rookie7799

@michael-grunder not everyone connects to localhost ...

@michael-grunder
Member

@rookie7799 They don't? I thought the whole internet was just 'localhost'! Seriously though, I just used localhost as an example rather than put the IP address for one of our production instances. Seemed prudent.

@rookie7799

I'm just saying that the problem lies somewhere in the network layer, and since you're testing on localhost you can't really use that as an argument, no?

@michael-grunder
Member

@rookie7799 For sure, there are just other variables. For instance if you connect with $r->connect($svr,$port,5), you'll get a 5 second timeout. I believe you'll also get that if your default_socket_timeout is 5 in php.ini

The following script will time out at 5 seconds with a "read error on connection" message, for example:

try {
        $r = new Redis();
        if(!$r->connect('localhost',6379,5)) {
                echo "Couldn't connect\n";
                die();
        }
        $st = microtime(true);
        $val = $r->blpop('mykey',0);
        $et = microtime(true);
} catch(Exception $ex) {
        echo "ex: " . $ex->getMessage() . "\n";
        $et = microtime(true);
}

echo "Took: " . ($et-$st) . " to get the key\n";

It could totally be a bug; it's just good to track those bits down first.

@asuth
asuth commented Apr 16, 2012

derp, I'm silly; I had pconnect with a timeout of 5 seconds in my redis wrapper class. It now seems, though, that if I set a timeout of 0, or don't set a timeout, I get an immediate failure, whereas if I set the timeout to 999 or whatever, it works fine. Still using 2.1.3.

@fruit
fruit commented Apr 27, 2012

I've got the same problem as @asuth (a fatal error when passing 0 as the pconnect timeout).
The default_socket_timeout option is set to -1.

@simonify

Getting this when I keep a connection open potentially indefinitely (using redis subscribe). It seems to time out and return that error after a certain amount of time (default_socket_timeout set to -1).

@fraserredmond

I just had this same problem, and may have one more piece of info to help you track it down.

I spent a few hours trying to work it out, including opening up all the timeouts suggested above, and found that my test scripts all worked fine :-(

We'd already decided to move to predis, as we're planning on using hiphop soon, so I gave up and decided to do the change sooner rather than just before starting with hiphop.

After changing to predis I still got an error message in about the same place:
"server error: 'Protocol error: too big bulk count string'"

I had a couple of really long strings (> 64kb) in some hashes that I was trying to set, and removing them from the array that I was passing in worked in predis (it was too late to try it in phpredis though, I'd already converted too much code.)

Hope that helps.

@michael-grunder
Member

That's a great find. Thanks for that. Looks like this is a hard limit in Redis for BULK replies. From redis.h:

#define REDIS_INLINE_MAX_SIZE (1024*64) /* Max size of inline reads */

I think reporting the specific error message from phpredis may be the way to go here.

Thanks for that!

Cheers,
Mike

@chuckdrew

I encountered this same error message. Using the tip from @redzarf, I did some research and found that some of the strings we were trying to store are indeed more than 64KB (65,536 characters).

As a quick solution I started gzip'ing data before storing it in redis. Haven't seen the "read error on connection" error since.

I agree with @michael-grunder - a more specific error message would have saved me many hours of debugging...

Cheers!

Chuck
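Chuck's workaround can be sketched as a pair of wrappers that compress anything near the 64KB limit on write and transparently decompress on read (the function names and the "gz:" prefix are made up; gzcompress/gzuncompress are standard PHP):

```php
define('BIG_VALUE_LIMIT', 64 * 1024); // mirrors REDIS_INLINE_MAX_SIZE

function cacheSet(Redis $r, $key, $value) {
    if (strlen($value) >= BIG_VALUE_LIMIT) {
        // Prefix so reads can tell compressed payloads apart.
        $value = 'gz:' . gzcompress($value);
    }
    return $r->set($key, $value);
}

function cacheGet(Redis $r, $key) {
    $value = $r->get($key);
    if ($value !== false && substr($value, 0, 3) === 'gz:') {
        $value = gzuncompress(substr($value, 3));
    }
    return $value;
}
```

This assumes raw values never start with "gz:" themselves; pick a prefix that can't collide with your data.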

@kotas
Contributor
kotas commented Oct 4, 2012

I have sent a pull request to fix this issue.
#260

@fr0x fr0x referenced this issue in colinmollenhour/Cm_Cache_Backend_Redis Oct 17, 2012
Closed

read error on connection #18

@cma
cma commented Feb 5, 2013

Just a warning: setting default_socket_timeout to -1 in php.ini has a side effect on file_get_contents(). It sometimes still works for http sites, but it seems to always fail for https sites.

~] php -r "ini_set('default_socket_timeout', 60); var_dump(file_get_contents('https://www.google.com'));"
string(46547) "<!doctype html><html itemscope="itemscope" itemtype="http://schema.org/WebPage"><head><meta itemprop="image" content="/images/google_favicon_128.png"><title>Google</title><script>(function(){\nwindow.google={kEI:"Ro8RUcTxLYf2rAHfu4CABw",getEI:function(a){for(var b;a&&(!a.getAttribute||!(b=a.getAttribute("eid")));)a=a.parentNode;return b||google.kEI},https:function(){return"https:"==window.location.protocol},kEXPI:"17259,39523,4000116,4001569,4001959,4001975,4002001,4002159,4002208,4002436,4002562,4002733,40"...
~] php -r "ini_set('default_socket_timeout', -1); var_dump(file_get_contents('https://www.google.com'));"
PHP Warning:  file_get_contents(https://www.google.com): failed to open stream: Operation now in progress in Command line code on line 1
PHP Stack trace:
PHP   1. {main}() Command line code:0
PHP   2. file_get_contents() Command line code:1

Warning: file_get_contents(https://www.google.com): failed to open stream: Operation now in progress in Command line code on line 1

Call Stack:
    0.0002     222912   1. {main}() Command line code:0
    0.0002     223872   2. file_get_contents() Command line code:1

bool(false)

@vetinary

+1 to the issue. We are using Redis as a session handler, and from time to time we get lots of "read error on connection" errors when calling session_start(). This can last for 5-10 seconds across two servers working with the same Redis server, and while it is going on we get "Connection closed" errors when calling setExt() on a redis handler in PHP.

PHP: PHP 5.3.3-7+squeeze14
Redis extension: 2.1.3

@michael-grunder
Member

I've merged in the changes from @kotas, can everybody have a go and see if it solves the issue?

Cheers,
Mike

@marcin-github

Hi! I have a similar problem to the one described in this issue, but I'm not 100% sure it's the same. The server runs apache 2.2 (prefork), php 5.3, phpredis-2.2.3, edit: redis 2.6.13 (was: redis 2.2.13). The connection to redis is configured over a unix socket (I had the same problem using tcp). When traffic rises above about 200-300 req/s, errors start to appear from zend & php:
Exception: RedisException#012Message: read error on connection

I ran strace on the redis process; this is what it shows:
5197 16:15:22.029070 read(9, "*2\r\n$4\r\nAUTH\r\n$7\r\nXXXX\r\n", 16384) = 27 <0.000012>
5197 16:15:22.029120 epoll_ctl(3, EPOLL_CTL_MOD, 9, {...}) = 0 <0.000008>
5197 16:15:22.029156 write(9, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000009>
5197 16:15:22.029193 --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=5197, si_uid=75} ---
5197 16:15:22.029231 epoll_ctl(3, EPOLL_CTL_MOD, 9, {...}) = 0 <0.000007>
5197 16:15:22.029262 epoll_ctl(3, EPOLL_CTL_DEL, 9, {...}) = 0 <0.000008>
5197 16:15:22.029292 close(9) = 0 <0.000012>

So immediately after AUTH the connection becomes broken. The interesting thing (though I don't know how to interpret it) is that the problem appears in bursts on certain file descriptors:
grep "(9,|(9)" /tmp/strace.log
[...]
5197 16:15:32.058722 fcntl(9, F_GETFL) = 0x2 (flags O_RDWR) <0.000006>
5197 16:15:32.058751 fcntl(9, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000006>
5197 16:15:32.058779 setsockopt(9, SOL_TCP, TCP_NODELAY, [1], 4) = -1 EOPNOTSUPP (Operation not supported) <0.000006>
5197 16:15:32.133849 read(9, "*2\r\n$4\r\nAUTH\r\n$7\r\nXXXX\r\n", 16384) = 27 <0.000010>
5197 16:15:32.133927 write(9, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000008>
5197 16:15:32.134066 close(9) = 0 <0.000012>

(this is repeated dozens of times) and then suddenly:

5197 16:15:34.112396 fcntl(9, F_GETFL) = 0x2 (flags O_RDWR) <0.000006>
5197 16:15:34.112424 fcntl(9, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000005>
5197 16:15:34.112450 setsockopt(9, SOL_TCP, TCP_NODELAY, [1], 4) = -1 EOPNOTSUPP (Operation not supported) <0.000006>
5197 16:15:34.113145 read(9, "_2\r\n$4\r\nAUTH\r\n$7\r\nXXXX\r\n", 16384) = 27 <0.000009>
5197 16:15:34.113505 write(9, "+OK\r\n", 5) = 5 <0.000062>
5197 16:15:34.115298 read(9, "_2\r\n$7\r\nHGETALL\r\n$17\r\nplace::1796\r\n", 16384) = 41 <0.000008>

And now there is no problem with this particular fd (but there is with others):
5197 16:15:21.776421 write(8, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000014>
5197 16:15:22.029156 write(9, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000009>
5197 16:15:22.608229 write(8, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000009>
5197 16:15:22.695842 write(9, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000009>
5197 16:15:22.898302 write(8, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000008>
5197 16:15:23.057223 write(9, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000008>
[...]
5197 16:15:32.429691 write(6, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000007>
5197 16:15:32.438394 write(8, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000008>
5197 16:15:32.470813 write(6, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000008>
5197 16:15:32.475295 write(8, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000007>
5197 16:15:32.513270 write(6, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000012>
5197 16:15:32.549257 write(8, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000008>
[...]
5197 16:15:33.990477 write(6, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000007>
5197 16:15:33.990981 write(7, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000007>
5197 16:15:33.991401 write(6, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000007>
5197 16:15:33.991811 write(7, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000007>
5197 16:15:33.992220 write(6, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000007>
5197 16:15:33.992677 write(7, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000007>
[...]
5197 16:15:37.970107 write(98, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000012>
5197 16:15:38.065368 write(100, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000009>
5197 16:15:38.679323 write(108, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000010>
5197 16:15:39.107080 write(110, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000009>
5197 16:15:39.350717 write(108, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000012>
[...]
197 16:15:49.464131 write(130, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000007>
5197 16:15:49.467389 write(129, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000007>
5197 16:15:49.469742 write(130, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000008>
5197 16:15:49.472135 write(129, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000007>
5197 16:15:54.288204 write(206, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000010>
5197 16:15:54.989417 write(48, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000008>
5197 16:15:55.303994 write(7, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000008>
5197 16:15:55.849363 write(18, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000010>
5197 16:15:56.273513 write(7, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000009>
5197 16:15:56.292993 write(18, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000007>
[... and again fd 6 and 7]
5197 16:16:25.875449 write(7, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000007>
5197 16:16:25.876019 write(6, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000007>
5197 16:16:25.876429 write(7, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000007>
5197 16:16:25.876847 write(6, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000007>
5197 16:16:25.877254 write(7, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000007>
5197 16:16:25.877663 write(6, "+OK\r\n", 5) = -1 EPIPE (Broken pipe) <0.000007>

@michael-grunder
Member

Are you possibly running out of file descriptors? Can you up the limit with ulimit?

@marcin-github

Hi Michael, thanks for the reply.
Redis has its limit set to 11000; apache had a limit of 1024, but now I've set it to 300000:

# grep "open files" /proc/$(cat /run/redis/redis.pid)/limits
Max open files            11000     11000     files
# grep "open files" /proc/$(cat /run/apache2.pid)/limits
Max open files            300000    300000    files
# cat /proc/sys/fs/file-max
818191
# lsof -n | wc -l
20158

but it doesn't change the behavior of the application. If the limit of open file descriptors were reached, shouldn't apache have problems connecting to redis? As it stands, it looks like redis can't answer apache (I forgot to mention I'm using PHP as mod_php) on an already-open fd. I'm just guessing.

@marcing
marcing commented May 27, 2013

For me this error happens on flushDB() (phpredis); any ideas?

@michael-grunder
Member

@marcing Generally this ends up being a timeout issue. Are you trying to flush a very large database or do you have a short timeout?

@marcing
marcing commented May 27, 2013

It happens right away when I invoke flushDB, so I don't think it is the timeout. Yes, the database is large (or maybe not, depending on what we compare it to ;))

@marcin-github

@michael-grunder, what do you think about my issue? Could it be an issue with phpredis, or is it something outside phpredis?

@michael-grunder
Member

@marcin-github These are always tricky to track down. File descriptors is always the first place I check, but there can be other issues. Are you using pconnect or connect? Because of how apache runs so many forks, it can end up holding connections forever (under pconnect), and this will become a problem if you are, say, running 20 web servers at the same time.

@marcin-github

I tested with pconnect() and with connect(); there was no difference. I turned on debug verbosity in redis, and in the redis log I can see:
[10544] 28 May 17:14:12.653 - Error writing to client: Broken pipe
[10544] 28 May 17:14:12.653 - Accepted connection to /run/redis/redis.sock
[10544] 28 May 17:14:12.680 - Error writing to client: Broken pipe
[10544] 28 May 17:14:12.680 - Accepted connection to /run/redis/redis.sock
[10544] 28 May 17:14:12.718 - Error writing to client: Broken pipe
[10544] 28 May 17:14:12.718 - DB 0: 33 keys (14 volatile) in 64 slots HT.
[10544] 28 May 17:14:12.718 - 289 clients connected (0 slaves), 23212448 bytes in use
[10544] 28 May 17:14:12.719 - Accepted connection to /run/redis/redis.sock
[10544] 28 May 17:14:12.732 - Error writing to client: Broken pipe

@michael-grunder
Member

Are you able to check how many file descriptors are available when this occurs? We run phpredis in a very high volume setup as well, and that was almost always what was happening when we got this kind of error.

It's also possible that you're getting timeouts under such a heavy load, which would also cause a similar issue. I wouldn't think so, however, unless you have a low timeout and/or are issuing commands to Redis that take a very long time.

@marcin-github

Is this what you meant?

# cat /proc/sys/fs/file-nr
5248 0 818317
# lsof -n | wc -l
68523

It seems it's not a question of timeouts: I changed the timeout from 3s to 13s with no change. Strace shows that redis answers very quickly; the EPIPE appears right after the first response, whether that is the "OK" for "AUTH" or the values returned for "HGETALL".

@michael-grunder
Member

Just to be sure, also set the "read" timeout to something bigger within phpredis. You can do this like so:

$redis->setOption(Redis::OPT_READ_TIMEOUT, $value);

It doesn't actually look like Redis is running out of FDs because if it were, you should get an error like this:

Error registering fd event for the new client...

You said you also tried TCP/IP with the same results? If you use TCP you should get a different message (more like "read error on connection"), since you aren't dealing with pipes. Is that what happens?

@marcin-github

I set OPT_READ_TIMEOUT to 100 with no change. In both cases (TCP & socket) php throws messages like:
"Redis server went away in file ..." and "read error on connection in file ...".

@michael-grunder
Member

Hey,

Just to be totally sure, how are you issuing the connect command? Are you passing an explicit timeout, or are you relying on default_socket_timeout in php.ini? If you aren't passing a timeout to connect, it'll use that value.

Also, I just noticed (my bad) that you're running Redis 2.2.13. Is there a reason you're running such an old version? Is it possible to upgrade to the current stable version of Redis to see if the problem persists? There have been massive changes since that version.

@marcin-github

Michael, I made a typo; I'm using redis-2.6.13. Sorry about that.

The original code looks like this:

if (is_null(self::$redis)) {
    self::$redis = new Redis();
    self::$redis->connect(self::$config->redis->host, self::$config->redis->port, self::$config->redis->timeout);
    if (self::$config->redis->password) self::$redis->auth(self::$config->redis->password);
}

I'm not a PHP programmer; I hope I did it correctly. I put the line:

self::$redis->setOption(Redis::OPT_READ_TIMEOUT, 100);

before the "if" statement.

Thanks.

@ghost
ghost commented Jun 10, 2013

We're seeing the same "read error on connection" issue when redis has over a few hundred connected clients. We have about 100 webservers connecting and querying redis every second, which sometimes brings the connected client count to over 1000. We made a small hack and reduced the number of connected clients by using a webservice, which drastically cut the number of connections to a few hundred. The errors went away, almost. We batch-load redis with data once a day, early in the morning, and we noticed the errors showed up again even though the number of connections to redis was quite low (130-170).

The next step we're going to try is issuing a ping() before each get(). We suspect the OS might be closing down sockets and the client isn't able to reconnect properly.

  • We're using pconnect with a timeout of 600s.
  • Syncookies are disabled.
  • We're using php-fpm

@michael-grunder
Member

Hey there,

As you may have noticed by this thread, this issue can be a tough one to track down.

In our live environment, we're running phpredis with php-fpm in nginx. We're also running some ridiculous number of web servers, so our setups are probably pretty similar to you guys. πŸ˜ƒ

The first thing I would suggest is moving from pconnect to connect. The problem with pconnect in an environment with so many servers (under php-fpm) is that you can end up with hundreds of connections per web server that never die. This can run you out of file descriptors (or, even if your limits are unlimited, of resources in general).

Along the same lines, I would suggest monitoring this when you're going to go into a heavier load period (e.g. with netstat).

What we do is a hybrid approach (between connect and pconnect). Getting a connection to redis is pretty fast, and it's not going to make you CPU bound. This introduces more of a delay (vs. pconnect), but it should scale very well. We built logic into our handler classes so they cache a connection (or multiple connections) per request.
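That hybrid might look roughly like this: a lazily created connection held in a static for the lifetime of one request (the class name and error handling here are illustrative, not Mike's actual code):

```php
class RedisPool {
    private static $conn = null;

    // One connection per request: PHP resets static state between
    // requests, so this amortizes the connect cost within a request
    // without holding sockets open forever the way pconnect can.
    public static function get($host, $port) {
        if (self::$conn === null) {
            $r = new Redis();
            if (!$r->connect($host, $port, 2)) {
                throw new RuntimeException('redis connect failed');
            }
            self::$conn = $r;
        }
        return self::$conn;
    }
}
```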

One more thing to look at is an interesting open source offering from Twitter, called twemproxy. It would run on your web server, and proxy connections to Redis. This way, you can keep just a few open connections from the web server to the Redis server.

Cheers!
Mike

@ghost
ghost commented Jun 11, 2013

Thanks Mike! We were thinking of trying connect but were somewhat worried about the latency of creating/destroying connections on every page. As I mentioned in my original post, we tried doing a ping() after retrieving a pconnection. When the ping caused the error message, we logged it and made another pconnect call, which then allowed every operation afterwards to succeed. Error rates remained the same, 60-80 per hour during heavy load, but it was smooth sailing in terms of functionality.

-Viktor

@marcin-github

I've tried twemproxy; it doesn't help much. I get connection timeouts between apache+php and twemproxy.

@ghost
ghost commented Jun 20, 2013

Update 2:
Errors came back... Ignore the post below.


We have updated our Redis version and moved to a different OS and the issue went away.

Redis 2.4 was on FreeBSD, while the 2.6 cluster is on CentOS 6.3.


-Viktor

@evilsocket

Just my 2 cents... since php-fpm is multi-process, shouldn't the extension also use getpid() to compute the persistent socket id? (Look here: you use only host, port, and timeout for it: https://github.com/nicolasff/phpredis/blob/master/library.c#L1038.)
You could end up using a socket created by another php process, resulting in an error.
Am I wrong?

@parhamr
parhamr commented Jul 22, 2013

πŸ‘ @evilsocket’s findings look like the source of some bugs

@michael-grunder
Member

Hey @evilsocket and @parhamr,

I am by no means an expert on the PHP source code or the stream handling methods, but I don't think this is how it works. I have so far only taken a cursory glance, but I don't see any obvious interprocess communication going on.

Phpredis uses php_stream_xport_create to create the stream itself, passing in the generated persistent_id that it creates.

In turn, php_stream_xport_create uses php_stream_from_persistent_id in main/streams/streams.c to see if it can find a persistent connection.

Specifically, it looks it up like so:

    if (zend_hash_find(&EG(persistent_list), (char*)persistent_id, strlen(persistent_id)+1, (void*) &le) == SUCCESS) {

To my knowledge, zend_hash_find doesn't do any kind of interprocess communication. It's just a hash lookup, and one should exist per executing process.

@nicolasff Am I totally confused here?

Cheers guys,
Mike

@nicolasff
Contributor

Persistent connections are disabled in multi-threaded (ZTS) mode: https://github.com/nicolasff/phpredis/blob/e47e18c4b29b9887b0fe472e76ed461c9407c273/redis.c#L605 to avoid multiple threads accessing the same shared data structures.
I'm not sure whether there's an easy way to support this feature...

@nicolasff
Contributor

Oh my bad I misunderstood the context. There is indeed no IPC, the socket ID is local to the process.

@parhamr
parhamr commented Jul 23, 2013

@nicolasff if you’re able to provide an outline of the changes needed, I do have some senior-level systems and software engineers who might be able to assist with changes and tests.

@michael-grunder
Member

@parhamr,

I can take a crack at it. πŸ˜ƒ

Essentially, the problem with using persistent connections in the context of a multi-threaded build of PHP is that we can't have two threads gaining access to the same cached socket, as that will cause all sorts of havoc.

I am guessing that @nicolasff disabled persistent connections when they were first implemented, because it was unclear what the socket creation mechanism would do in that case.

So the big question, it would seem, is how PHP handles a &EG(persistent_list) lookup when ZTS is enabled. If it's truly global then we would have to add context to each id (thread id, etc). If there is one per thread, however, it might just work.

From this discussion, it looks like PHP maintains one list per thread.

This might mean we can use persistent connections as-is within the context of ZTS.

That being said, I will have a play with it to see what I can find.

Cheers,
Mike

@evilsocket

seems quite reasonable :)

@parhamr
parhamr commented Jul 23, 2013

Thanks, @michael-grunder! :)

@parhamr
parhamr commented Oct 16, 2013

Any updates?

@ghost
ghost commented Oct 16, 2013

Nope. We basically just do a ping before running any operation. We combine it in a pipeline to reduce the overhead.
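A minimal sketch of that ping-in-a-pipeline idea (the host, port, and single-reconnect policy here are my assumptions, not from this thread):

```php
<?php
// Sketch of the ping-before-work approach described above: send a PING in
// the same pipeline as the real command, so a dead socket is detected in a
// single round trip instead of on the real operation.

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

function safeGet(Redis $redis, $key)
{
    try {
        $replies = $redis->multi(Redis::PIPELINE)
            ->ping()
            ->get($key)
            ->exec();
        return $replies[1];
    } catch (RedisException $e) {
        // Socket was silently closed: reconnect once and retry.
        $redis->connect('127.0.0.1', 6379);
        return $redis->get($key);
    }
}
```

This keeps the overhead to one extra reply per batch rather than one extra round trip.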

@parhamr
parhamr commented Oct 16, 2013

Sorry, @muzhikas, but I’m wondering if there are any updates yet regarding persistent connection support for php-fpm. This is specifically a question about @nicolasff’s link from 7-23

@ghost
ghost commented Oct 16, 2013

No worries.


Sincerely,

Viktoras Truchanovicius
@UnclaimedPants

I had the same issue. It was driving me mad. Here's what helped me get rid of the problem forever:

  1. Replacing pconnect() with:
     $redis->connect(REDIS_HOST, REDIS_PORT, 0);
  2. Disabling the iptables invalid-packet drop rule
  3. Setting fs.file-max to 300,000:
     sysctl -w fs.file-max=300000
  4. Setting ulimit -n to 100,000, then restarting the redis service:
     ulimit -n 100000

Bam, fixed.

@ikandars

I had the same issue, so I use connect() instead of pconnect() and set the timeout to 1 second. Everything works well. My environment:

PHP 5.5.9 FPM
Nginx 1.5.7
Phpredis 2.2.4

@andygrunwald

Hey @all,

we are using phpredis and hit the same issue under heavy load (> 2,200 requests per second).
We have more than 20 web servers (PHP only) and four Redis servers on two physical machines in our local network.
Both (Redis + web servers) run FreeBSD 9.x.
We are using Apache 2.2 (prefork) with PHP 5.2.24.
Redis is at version 2.6.4.

We got "read error on connection" several times within minutes / seconds.
We noticed that the connect call returned true; the error occurred at the subsequent select call.
Here is the code we are using:

$_bConnected = null; // Dummy value to initialize the variable
try {
   $_bConnected = $oRedis->connect($sHost, $iPort, $fTimeout);
   if (defined('\Redis::OPT_READ_TIMEOUT')) {
      $oRedis->setOption(\Redis::OPT_READ_TIMEOUT, 10);
   }
   if($_bConnected) {
      $oRedis->select($oConfiguration->iDbIndex);
   }
} catch(\Exception $_oException) {
   $_sExceptionInfo  = ' (Connected: ' . var_export($_bConnected, true) . ')';
   throw new ConnectException('Unable to connect: ' . $_oException->getMessage() . ' ' . $_sExceptionInfo, 1, $_oException);
}

We have tested everything:

  • File descriptors (current / max): ~2,000 / ~64,000 on our web servers, ~150 / ~64,000 on the Redis servers
  • Disabling syncookies
  • Raising the mbuf limits
  • Switching to predis, with and without phpiredis. We got the same errors.
  • Updating phpredis to master
  • Setting connect / read timeouts to >= 10 seconds
  • Checking default_socket_timeout. Current value: 60 seconds
  • Reducing max requests per Apache child

Nothing changed.

The Redis server itself does not log any warnings / errors.
The server has 0 rejected connections, no heavy load, and enough free connections. Everything seems to be fine.
We have several Java services (running on Debian) which have no problems with their Redis connections.

We have system ops who can help track down this issue and tweak settings. If anyone has more information, we will try it out. Thanks.

@parhamr
parhamr commented Mar 28, 2014

@andygrunwald this issue has been open for a long time and people who have previously claimed they were working on it have never delivered. I suggest you be bold and take charge to get it fixed! :)

@andygrunwald

@parhamr Thanks for this advice :D
We are bold and trying to fix it, but we currently lack ideas about what the cause could be and are running out of things to try. That is the current state and (of course) the reason why I posted here. Maybe someone has a new idea.

@ghost
ghost commented Mar 28, 2014

The problem is that sockets get closed by the OS. We got around it by doing a ping once on every page load, and then again whenever the last ping was more than 30s ago. Pinging refreshes the connection and there are no more errors.

We have 8 Redis clusters, each master/slave, running on VMs. There are about 4k connections to all 8 masters on average.
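A sketch of that time-guarded ping, where the static timestamp and the 30-second threshold are illustrative choices rather than details from this thread:

```php
<?php
// Ping at most once per $maxIdleSeconds: cheap keep-alive that detects a
// socket the OS has silently closed before the real commands run.

function keepAlive(Redis $redis, $maxIdleSeconds = 30)
{
    static $lastPing = 0;
    if (time() - $lastPing >= $maxIdleSeconds) {
        $redis->ping();      // refreshes the connection; throws if it is dead
        $lastPing = time();
    }
}
```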


Sincerely,

Viktoras Truchanovicius
@michael-grunder
Member

Hey there,

This issue is one that is particularly hard to track down, as you might have noticed from the sheer length and number of people on this thread. That being said, a couple of things stick out to me.

First off, this error is almost always one of the following:

  • An actual timeout. "read error on connection" specifically means that there was an error reading on an already open connection.
  • Nefarious network layer gremlins. :)

Given the description you pasted, a timeout here seems unlikely. select is such a fast command, that I doubt you are timing out. Now, one thing that would be very useful would be to instrument your debug code to get a microtime(true) value before you connect, and then one in the exception block. Print or log that value. If it is always at or near your timeout value (or even better, if it's not at the timeout you think you're setting but always around a whole number), then you've found it. It's timing out for some reason.
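That instrumentation could look something like the sketch below. Variable names follow the snippet earlier in the thread ($sHost, $iPort, $fTimeout); $iDbIndex and the log format are my own additions:

```php
<?php
// Time the failing call so a genuine timeout (elapsed roughly equals your
// configured timeout) can be told apart from an immediate network failure
// (elapsed near zero).

$redis = new Redis();
$start = microtime(true);
try {
    $redis->connect($sHost, $iPort, $fTimeout);
    $redis->select($iDbIndex);
} catch (Exception $e) {
    $elapsed = microtime(true) - $start;
    error_log(sprintf('redis call failed after %.4fs (timeout %.1fs): %s',
        $elapsed, $fTimeout, $e->getMessage()));
}
```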

The fact that you say you got the error using predis both with and without phpiredis also suggests that it's not related specifically to phpredis. Perhaps I read this incorrectly?

The other reasons this has happened to people are varied, but generally come down to something in the system configuration regarding the network. Unfortunately I'm a coder, and therefore terrible at sysops, so that is something I'm less able to help with.

@parhamr We do try to help track this down, but this is almost always related to timeouts or something to do with the networking setup. We use phpredis in production at extremely high levels of load/distribution, and do not ever see this problem -- unless we hit a timeout (think huge ZUNIONSTORE operations). I'm not in any way saying that it's not in phpredis because it doesn't happen to us, but it's just really hard to track down.

Cheers guys,
Mike

@parhamr
parhamr commented Mar 28, 2014

@michael-grunder thanks for the details! Could these steps be added to the official documentation? Is there an appropriate place for additional debug data to be logged?

@michael-grunder
Member

@parhamr That's actually not a bad idea, especially with how often this issue bites people. A specific section in the docs about this error is likely a good idea. Also, I'll trawl through the PHP stream code and see if there is a way to get more detailed error information (like an actual error code), which could at least verify why PHP thinks the read failed. In terms of documentation, you can put it anywhere, really. The rule of thumb: if the length of time between the call and the exception matches your timeout, it's a timeout; if it's very, very fast, it's likely something else.

@atdt atdt added a commit to wikimedia/mediawiki that referenced this issue May 5, 2014
@AaronSchulz @atdt AaronSchulz + atdt Set Redis::OPT_READ_TIMEOUT by default
* The value is also configurable just like connectTimeout
* This deals with problems like phpredis/phpredis#99
  and phpredis/phpredis#70

Change-Id: I05e91e05764020854d04673b7decae30359f57e9
c6a7a3a
@bmeynell

Also encountering this error quite often on a single node setup with the client on the same node.

@parhamr
parhamr commented Mar 26, 2015

@bmeynell are you using TCP or Unix sockets? (e.g. does your config have bind 127.0.0.1 or unixsocket /tmp/redis.sock?)

@bmeynell

@parhamr - Both are commented out in the redis config file. So I think that's equivalent to bind 0.0.0.0.

@andygrunwald

@bmeynell my experience with this error is: timeouts / long-running operations on Redis.
We had this error and tracked it down.
Please have a look at what operations you fire at the Redis node. keys *? O(n) operations where n is big? A useful tool for this is Watchdog. It is like the "MySQL slow query log" for Redis.

The next thing is: how do you connect to Redis? Do you open a new connection on every request? Or do you use a proxy like twemproxy?
We found during high load (> 2,500 req/s) that twemproxy solved a lot of these connection issues. Before that, we said "hi", fired one command, and said "bye" to Redis more than 2,500 times per second.
It seems that Redis was "quite busy" with connection handling instead of doing command work :)

@aleric323

I see the thread is not active anymore, but was anything done in phpredis to fix this?

@virgofx
virgofx commented Oct 27, 2015

This can still easily be duplicated in PHP by creating a connection to Redis and then writing keys in a loop over more than one fork. (Happens in as few as 10 writes in a for loop).

@parhamr
parhamr commented Oct 27, 2015

@virgofx would you please provide your sample code in here? That would be helpful to reproduce and fix the bug (by whomever finally takes it on).

@virgofx
virgofx commented Oct 27, 2015

@parhamr Hopefully this script can provide insight into the issue. I've adapted it from our enterprise app into a small little test runner.

Note: You need at least 2 forks and about 10 iterations to begin seeing errors.
Note: All of my tests have Redis installed on a separate system (not 127.0.0.1) Testing vagrant to vagrant.

Usage

  1. Update the following constants:
  • host
  • password
  • forks
  • iterationsPerFork
  2. Run via php redis-test.php

redis-test.php

<?php

/**
 * Utility class to easily fork processes (Adapted for phpredis/phpredis - Issue #70)
 */
class Forker
{
    private $childProcesses = [];

    public function fork($callback)
    {
        $sockets = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
        $pid = pcntl_fork();

        if ($pid === -1) {
            throw new \Exception('Could not fork process');
        } elseif ($pid) {
            fclose($sockets[1]);
            $this->childProcesses[] = [
                'pid'    => $pid,
                'socket' => $sockets[0]
            ];
        } else {
            fclose($sockets[0]);
            $return = $callback();

            if (fwrite($sockets[1], serialize($return))) {
                fclose($sockets[1]);
            }

            exit(0);
        }

        return $pid;
    }

    public function wait()
    {
        foreach ($this->childProcesses as $index => $childProcess) {
            pcntl_waitpid($childProcess['pid'], $status, WUNTRACED);

            if ($status > 0) {
                throw new \Exception('Child process exited unclean');
            }

            unset($this->childProcesses[$index]);
        }
    }

}

//////////////////////////////////////////////////////////////////////////////////////////////
// PHP CLI Runner
//////////////////////////////////////////////////////////////////////////////////////////////
$host = '127.0.0.1';
$port = 6379;
$password = 'your password';
$forks = 2;
$iterationsPerFork = 10;


$redis = new \Redis();
$redis->connect($host, $port);

if ($redis->auth($password) === false) {
    throw new \Exception('Redis authentication failed');
}

$forker = new \Forker();

for ($i = 0; $i < $forks; $i++) {
    $forker->fork(function() use ($iterationsPerFork, $forks, $i, $redis) {
        for ($idx = 0; $idx < $iterationsPerFork; $idx++) {
            try {
                if (! $redis->setex('S:' . md5(uniqid($i . $idx, true)), 1440, 'small-data')) {
                    echo '[fork ' . $i . '] [' . $idx . '] Unable to setex()' . PHP_EOL;
                } else {
                    echo '[fork ' . $i . '] [' . $idx . '] βœ“' . PHP_EOL;
                }
            } catch (\Exception $e) {
                echo '[fork ' . $i . '] [' . $idx . '] ' . $e->getMessage() . PHP_EOL;
            }
        }
    });
}
$forker->wait();

php redis-test.php (2 forks x 10 iterations) (Bugs out, see errors)

[fork 0] [0] βœ“
[fork 1] [0] βœ“
[fork 1] [1] βœ“
[fork 0] [1] read error on connection
[fork 1] [2] βœ“
[fork 1] [3] βœ“
[fork 0] [2] Unable to setex()
[fork 0] [3] Unable to setex()
[fork 0] [4] Unable to setex()
[fork 0] [5] Unable to setex()
[fork 0] [6] Unable to setex()
[fork 0] [7] Unable to setex()
[fork 0] [8] Unable to setex()
[fork 1] [4] βœ“
[fork 0] [9] Unable to setex()
[fork 1] [5] βœ“
[fork 1] [6] βœ“
[fork 1] [7] βœ“
[fork 1] [8] βœ“
[fork 1] [9] βœ“

Re-test with just one fork - Works, no issues.
php redis-test.php (1 x 10000)

[fork 0] [0] βœ“
....
[fork 0] [9999] βœ“
@wotek
wotek commented Nov 3, 2015

+1 on the issue

@jianhuihi

+1 on the issue

@mightydok

@virgofx @michael-grunder test script works well with phpiredis.
For phpredis, even the patch for php-src from @shafreeck doesn't help.

@virgofx
virgofx commented Nov 20, 2015

Bummer, thanks for testing @mightydok ... I was hoping with a reproducible test script that it wouldn't take long to get a fix in 😞

@mightydok

@virgofx as I see it, phpredis needs its network code rewritten, because hiredis (the C library behind phpiredis) looks more mature and has a better-organized code structure.

@virgofx
virgofx commented Nov 30, 2015

@mightydok I'm just concerned about getting the phpredis package updated, as this is the most widely used PHP extension for Redis. If there were a suitable replacement the community would certainly move in that direction, but right now phpredis is the de facto standard.

@michael-grunder
Member

I don't know what is causing this to happen in the context of a fork, but I'll try to figure it out with your script.

I am updating phpredis and fixing bugs, but we do need a new official package that includes cluster support. Truth is, I'm finishing up a huge project for my "real" job, so I should have more time for this now.

Also, I totally agree phpredis should use hiredis for its communication layer, but that is a big task.

Cheers,
Mike


@virgofx
virgofx commented Nov 30, 2015

@michael-grunder Thanks. A network-layer switch would be a big task and could certainly be scheduled for sometime in the future. Since we have a reproducible script for this long-outstanding bug, I figured it would be easier to trace the issue down and at least make a temporary fix.

@virgofx
virgofx commented Jan 12, 2016

@michael-grunder Figured you've been busy .. just wondering if you or anyone else has been able to make any progress on this issue?

@fucongcong

pconnect and connect both throw the exception

@eeroniemi
eeroniemi commented Apr 27, 2016 edited

This is also happening to us in production (with the latest stable driver version 2.2.7), and I was able to reproduce the issue with @virgofx's fork script, with both 2.2.7 and the current phpredis devel branch build.

Edit: I tried to take a shot in the dark and see if I could work around the issue by catching the exception and doing a ping, or by creating a new Redis object + reconnecting + calling setex again when this error happens, but it seems my script didn't work as intended. I'll update if I can either get a workaround or confirm that the workaround doesn't help.

@eeroniemi
eeroniemi commented Apr 27, 2016 edited

Okay, it seems my test script had a bug: I didn't pass $host, $port and $timeout to the closure. With that fixed, it seems this workaround could actually work. Fixed script:

<?php

/**
 * Utility class to easily fork processes (Adapted for phpredis/phpredis - Issue #70)
 */
class Forker
{
    private $childProcesses = [];

    public function fork($callback)
    {
        $sockets = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
        $pid = pcntl_fork();

        if ($pid === -1) {
            throw new \Exception('Could not fork process');
        } elseif ($pid) {
            fclose($sockets[1]);
            $this->childProcesses[] = [
                'pid'    => $pid,
                'socket' => $sockets[0]
            ];
        } else {
            fclose($sockets[0]);
            $return = $callback();

            if (fwrite($sockets[1], serialize($return))) {
                fclose($sockets[1]);
            }

            exit(0);
        }

        return $pid;
    }

    public function wait()
    {
        foreach ($this->childProcesses as $index => $childProcess) {
            pcntl_waitpid($childProcess['pid'], $status, WUNTRACED);

            if ($status > 0) {
                throw new \Exception('Child process exited unclean');
            }

            unset($this->childProcesses[$index]);
        }
    }

}

//////////////////////////////////////////////////////////////////////////////////////////////
// PHP CLI Runner
//////////////////////////////////////////////////////////////////////////////////////////////
$host = '127.0.0.1';
$port = 6379;
$forks = 3;
$iterationsPerFork = 10;
$timeout = 2;

$redis = new \Redis();
$redis->connect($host, $port, $timeout);

$forker = new \Forker();

function setRedisData($redis, $key, $data, $i, $idx) {
    if (! $redis->setex($key, 1440, $data)) {
        echo '[fork ' . $i . '] [' . $idx . '] Unable to setex()' . PHP_EOL;
    } else {
        echo '[fork ' . $i . '] [' . $idx . '] βœ“' . PHP_EOL;
    }
}

for ($i = 0; $i < $forks; $i++) {
    $forker->fork(function() use ($iterationsPerFork, $forks, $i, $redis, $timeout, $host, $port) {
        for ($idx = 0; $idx < $iterationsPerFork; $idx++) {
            $key = 'S:' . md5(uniqid($i . $idx, true));
            $data = 'small-data';
            try {
                setRedisData($redis, $key, $data, $i, $idx);
            } catch (\Exception $e) {
                echo '[fork ' . $i . '] [' . $idx . '] ' . $e->getMessage() . ' - trying to reconnect and set data again' . PHP_EOL;

                try {
                    unset($redis);
                    $redis = new \Redis();
                    $redis->connect($host, $port, $timeout);
                } catch (\Exception $e) {
                    echo '[fork ' . $i . '] [' . $idx . '] ' . $e->getMessage() . ' - reconnection attempt failed' . PHP_EOL;
                }

                try {
                    setRedisData($redis, $key, $data, $i, $idx);
                } catch (\Exception $e) {
                    echo '[fork ' . $i . '] [' . $idx . '] ' . $e->getMessage() . ' - reconnection ok, setting data failed' . PHP_EOL;
                }
            }
        }
    });
}
$forker->wait();

Output:

[fork 0] [0] βœ“
[fork 0] [1] βœ“
[fork 0] [2] βœ“
[fork 0] [3] βœ“
[fork 0] [4] βœ“
[fork 0] [5] βœ“
[fork 0] [6] βœ“
[fork 0] [7] βœ“
[fork 0] [8] βœ“
[fork 0] [9] βœ“
[fork 1] [0] βœ“
[fork 2] [0] βœ“
[fork 2] [1] βœ“
[fork 1] [1] read error on connection - trying to reconnect and set data again
[fork 2] [2] βœ“
[fork 2] [3] βœ“
[fork 1] [1] βœ“
[fork 2] [4] βœ“
[fork 1] [2] βœ“
[fork 2] [5] βœ“
[fork 2] [6] βœ“
[fork 1] [3] βœ“
[fork 2] [7] βœ“
[fork 2] [8] βœ“
[fork 1] [4] βœ“
[fork 2] [9] βœ“
[fork 1] [5] βœ“
[fork 1] [6] βœ“
[fork 1] [7] βœ“
[fork 1] [8] βœ“
[fork 1] [9] βœ“

Notice these two lines - connection fails:

[fork 1] [1] read error on connection - trying to reconnect and set data again

And then on re-try it's able to store the data:

[fork 1] [1] βœ“

It's quite an ugly workaround, but it should help with data loss until a proper fix is found. It also seems to work with phpredis driver version 2.2.7. Tested with PHP 5.6.20; no idea whether this works with PHP < 5.6.13 (where this PHP bug was fixed).

Comments?

@virgofx
virgofx commented Apr 27, 2016

@eeroniemi Yes, creating a new Redis connection on error in each fork will work. This is only a tentative workaround, albeit quite an ugly one :/ High-throughput systems with lots of forks may not have the option of opening tons of connections. It would be better to fix this server side, as that would also improve performance.

@eeroniemi

@virgofx I think this is a reasonable trade-off if you're using non-persistent connections. I haven't tried persistent connections, so I don't really know how this workaround behaves under those conditions.

Our PHP app runs on top of nginx + php-fpm (via FastCGI), and we use non-persistent connections to Redis. During the last 40 hours we've had on average 126,000 calls (set + get + connect + del) to Redis per minute, and this error has happened twice, so that is 2 times out of 30M+ calls. At this rate my workaround basically doesn't affect performance at all.

@sfai05 sfai05 referenced this issue in wa0x6e/php-resque-ex May 13, 2016
Open

Use `connect` instead of `pconnect` #46

@Shreeshag
Shreeshag commented Jul 1, 2016 edited

Hi there,

This works like a charm for me :-)

No configuration changes required

Try the code below
@sorinv

$redis = new Redis();
ini_set('default_socket_timeout', -1);
$redis->pconnect('localhost', 6379);
@tshuy tshuy referenced this issue in ichadhr/wamp-redis Jul 23, 2016
Open

still use the old credentials of hosting server #1

@tmuras
tmuras commented Nov 27, 2016

I have the same issue with "read error on connection" on CentOS, PHP 7 (4 web nodes), Redis 3.2.5, and I strongly suspect it's a network/system issue: connections are being dropped outside of Redis and the phpredis library.
I can reproduce it with a CLI script similar to the one from eeroniemi, and the only workaround I have at the moment is the same: re-open the connection and retry.
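That reconnect-and-retry pattern can be written as a generic wrapper. This is a sketch; the factory closure and the two-attempt limit are illustrative choices, not code from this thread:

```php
<?php
// Run an operation against Redis; on a RedisException, drop the dead
// socket, open a fresh connection, and retry (up to $attempts total tries).

function withRetry($makeRedis, $op, $attempts = 2)
{
    $redis = $makeRedis();
    for ($try = 1; ; $try++) {
        try {
            return $op($redis);
        } catch (RedisException $e) {
            if ($try >= $attempts) {
                throw $e;              // give up after the last attempt
            }
            $redis = $makeRedis();     // reconnect with a fresh socket
        }
    }
}

// Usage (hypothetical host/key):
// $make  = function () { $r = new Redis(); $r->connect('127.0.0.1', 6379, 2); return $r; };
// $value = withRetry($make, function (Redis $r) { return $r->get('some-key'); });
```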

@Tomdarkness

We've just started running into this issue. However, all we've done is upgrade from PHP 5.6 to 7.1. It's running on the same servers (EC2), connecting to the same Redis server (ElastiCache). We haven't changed any Redis/PHP configuration or any code related to Redis.

We started seeing these errors literally as soon as we started the 7.1 rollout.

@vingrad
vingrad commented Feb 10, 2017

@Tomdarkness do you have any solution?

@andygrunwald

Hey,

this issue is not an easy one; I think we all agree. Even more, it differs from setup to setup. We had issues with this in the past as well, and in the meantime we have solved all of them. We wrote down our experience on our blog. Check out Learn Redis the hard way (in production).

This is not meant as an advertisement, just some information that might help some of you get rid of this issue in your setup.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment