async redis client segmentation fault upon brpop() if redis client connection is reconnected #1502

simonhf · 2018-03-01T19:16:02Z

Please answer these questions before submitting your issue. Thanks!

What did you do? If possible, provide a recipe for reproducing the error.

An async redis client connects to redis using the following code which connects to redis, reconnects to redis if the connection is lost, upon connection does a redis brpop() with 5 second timeout, upon brpop() timeout does another brpop() with a 5 second timeout:

$ cat swoole-brpop-issue.php 
<?php
$redis_client = new swoole_redis;
$redis_client->on("close", function(swoole_redis $redis_client) {
        printf("%f - disconnected from redis by peer\n", microtime(true));
        redis_connect($redis_client);
});
redis_connect($redis_client);

function redis_connect(swoole_redis $redis_client) {
        $redis_client->connect('127.0.0.1', 6379, function (swoole_redis $redis_client, $result) {
                printf("%f - connected to redis and executing BRPOP\n", microtime(true));
                redis_brpop_callback($redis_client, NULL, "myqueue");
        });
}

function redis_brpop_callback(swoole_redis $redis_client, $result, $queue_name) {
        printf("%f - BRPOP(%s) callback; restart/timeout/result\n", microtime(true), $queue_name);
        if($result) {
                # process result here
        }
        $redis_client->brpop($queue_name, 5, function(swoole_redis $redis_client, $result) use($queue_name){ redis_brpop_callback($redis_client, $result, $queue_name); });
}

swoole_process::signal(SIGINT, function() {
        printf("\n%f - caught ctrl-c; exiting\n", microtime(true));
        swoole_event_exit();
});

printf("%f - entering event loop\n", microtime(true));
swoole_event_wait();

Unfortunately if e.g. redis server is restarted (sudo service redis-server restart) the next brpop() executed with the new redis connection causes the following segmentation fault:

$ php swoole-brpop-issue.php 
1519932571.340852 - entering event loop
1519932571.340909 - connected to redis and executing BRPOP
1519932571.340914 - BRPOP(myqueue) callback; restart/timeout/result
1519932576.414955 - BRPOP(myqueue) callback; restart/timeout/result
1519932581.473059 - BRPOP(myqueue) callback; restart/timeout/result
1519932583.618951 - disconnected from redis by peer
1519932584.618865 - connected to redis and executing BRPOP
1519932584.619085 - BRPOP(myqueue) callback; restart/timeout/result
PHP Warning:  swoole_redis::__call(): redis client connection is closed. in /home/simon/20180226-swoole-proxy/swoole-brpop-issue.php on line 33
Segmentation fault (core dumped)

What did you expect to see?

No segmentation fault and just an error than brpop() cannot work if there is no redis connection.

What did you see instead?

The segfault.

What version of Swoole are you using (php --ri swoole)?

$ php --ri swoole

swoole

swoole support => enabled
Version => 1.9.23
Author => tianfeng.han[email: mikan.tenny@gmail.com]
epoll => enabled
eventfd => enabled
timerfd => enabled
signalfd => enabled
cpu affinity => enabled
spinlock => enabled
rwlock => enabled
async redis client => enabled
async http/websocket client => enabled
sockets => enabled
openssl => enabled
http2 => enabled
Linux Native AIO => enabled
pcre => enabled
zlib => enabled
mutex_timedlock => enabled
pthread_barrier => enabled
futex => enabled

Directive => Local Value => Master Value
swoole.aio_thread_num => 2 => 2
swoole.display_errors => On => On
swoole.use_namespace => Off => Off
swoole.fast_serialize => Off => Off
swoole.unixsock_buffer_size => 8388608 => 8388608

What is your machine environment used (including version of kernel & php & gcc) ?

$ php -v
PHP 7.1.11-0ubuntu0.17.10.1 (cli) (built: Nov  1 2017 16:30:52) ( NTS )
Copyright (c) 1997-2017 The PHP Group
Zend Engine v3.1.0, Copyright (c) 1998-2017 Zend Technologies
    with Zend OPcache v7.1.11-0ubuntu0.17.10.1, Copyright (c) 1999-2017, by Zend Technologies

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.2.0-8ubuntu3.2' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.2.0 (Ubuntu 7.2.0-8ubuntu3.2)

If you are using ssl, what is your openssl version?

No SSL.

The text was updated successfully, but these errors were encountered:

simonhf · 2018-03-01T22:39:13Z

As a workaround I also tried the modified script which creates a brand new redis client instance instead of trying to reuse the last one and connect it again:

$ cat swoole-brpop-issue-workaround.php 
<?php
redis_new_and_connect();

function redis_new_and_connect() {
        unset($GLOBALS['redis_client']);
        $GLOBALS['redis_client'] = new swoole_redis;
        $GLOBALS['redis_client']->on("close", function(swoole_redis $redis_client) {
                printf("%f - disconnected from redis by peer\n", microtime(true));
                redis_new_and_connect();
        });
        $GLOBALS['redis_client']->connect('127.0.0.1', 6379, function (swoole_redis $redis_client, $result) {
                printf("%f - connected to redis and executing BRPOP\n", microtime(true));
                redis_brpop_callback($redis_client, NULL, "myqueue");
        });
}

function redis_brpop_callback(swoole_redis $redis_client, $result, $queue_name) {
        printf("%f - BRPOP(%s) callback; restart/timeout/result\n", microtime(true), $queue_name);
        if($result) {
                # process result here
        }
        $redis_client->brpop($queue_name, 5, function(swoole_redis $redis_client, $result) use($queue_name){ redis_brpop_callback($redis_client, $result, $queue_name); });
}

swoole_process::signal(SIGINT, function() {
        printf("\n%f - caught ctrl-c; exiting\n", microtime(true));
        swoole_event_exit();
});

printf("%f - entering event loop\n", microtime(true));
swoole_event_wait();

But it also has the same issue when redis server is restarted:

$ php swoole-brpop-issue-workaround.php 
1519943575.353548 - entering event loop
1519943575.353626 - connected to redis and executing BRPOP
1519943575.353632 - BRPOP(myqueue) callback; restart/timeout/result
1519943580.447851 - BRPOP(myqueue) callback; restart/timeout/result
1519943585.504992 - BRPOP(myqueue) callback; restart/timeout/result
1519943587.962970 - disconnected from redis by peer
1519943587.963174 - connected to redis and executing BRPOP
1519943587.963186 - BRPOP(myqueue) callback; restart/timeout/result
PHP Warning:  swoole_redis::__call(): redis client connection is closed. in /home/simon/20180226-swoole-proxy/swoole-brpop-issue-workaround.php on line 22
^C
1519943615.483674 - caught ctrl-c; exiting

Interestingly with this version of the code then no segfault happens, just the unexpected error message. So it's a slight improvement.

This makes me wonder if there is something weird going on with all the closure stuff? Any ideas?

matyhtf · 2018-03-02T03:38:45Z

I have fixed this bug.

When reconnecting need to set a timer, or the connection may be rejected because the redis server is starting.

$redis_client->on("close", function(swoole_redis $redis_client) {
        printf("%f - disconnected from redis by peer\n", microtime(true));
        swoole_timer_after(2000, function () use ($redis_client) {
            redis_connect($redis_client);
        });
});

simonhf · 2018-03-02T18:17:30Z

Thanks @matyhtf !

For other people interested in this redis client code. Here is some with a gentle redis retry using a single global 1,000 ms timer tick:

$ cat swoole-brpop-issue-workaround.php    
<?php
$redis_client = new swoole_redis;
$redis_client->on("close", function(swoole_redis $redis_client) {
        printf("%f - disconnected from redis by peer; %s=getState()\n", microtime(true), $redis_client->getState());
});

swoole_timer_tick(1000, function($id) {
        #printf("%f - timer; %s=getState()\n", microtime(true), $GLOBALS['redis_client']->getState());
        if ((0 == $GLOBALS['redis_client']->getState()) || (4 == $GLOBALS['redis_client']->getState())) {
                printf("%f - connecting to redis; %s=getState()\n", microtime(true), $GLOBALS['redis_client']->getState());
                $GLOBALS['redis_client']->connect('127.0.0.1', 6379, function (swoole_redis $redis_client, $result) {
                        if ($result) {
                                printf("%f - connected to redis and executing BRPOP; %s=getState()\n", microtime(true), $redis_client->getState());
                                redis_brpop_callback($redis_client, NULL, "myqueue");
                        }
                        else {
                                printf("%f - connection to redis failed; %s=getState()\n", microtime(true), $redis_client->getState());
                        }
                }); # redis connect event
        } # redis disconnected
});

function redis_brpop_callback(swoole_redis $redis_client, $result, $queue_name) {
        printf("%f - BRPOP(%s) callback; restart/timeout/result; %s=getState()\n", microtime(true), $queue_name, $redis_client->getState());
        if($result) {
                # process result here
        }
        $redis_client->brpop($queue_name, 5, function(swoole_redis $redis_client, $result) use($queue_name){ redis_brpop_callback($redis_client, $result, $queue_name); });
}

swoole_process::signal(SIGINT, function() {
        printf("\n%f - caught ctrl-c; exiting\n", microtime(true));
        swoole_event_exit();
});

printf("%f - entering event loop\n", microtime(true));
swoole_event_wait();

The the output when redis-server is restarted twice:

$ php swoole-brpop-issue-workaround.php 
1520014397.205857 - entering event loop
1520014398.208137 - connecting to redis; 0=getState()
1520014398.208607 - connected to redis and executing BRPOP; 1=getState()
1520014398.208629 - BRPOP(myqueue) callback; restart/timeout/result; 1=getState()
1520014403.248286 - BRPOP(myqueue) callback; restart/timeout/result; 1=getState()
1520014405.389933 - disconnected from redis by peer; 4=getState()
1520014406.208031 - connecting to redis; 4=getState()
1520014406.208365 - connected to redis and executing BRPOP; 1=getState()
1520014406.208422 - BRPOP(myqueue) callback; restart/timeout/result; 1=getState()
1520014411.228521 - BRPOP(myqueue) callback; restart/timeout/result; 2=getState()
1520014414.178245 - disconnected from redis by peer; 4=getState()
1520014414.205553 - connecting to redis; 4=getState()
1520014414.205756 - connection to redis failed; 4=getState()
1520014415.207930 - connecting to redis; 4=getState()
1520014415.208212 - connected to redis and executing BRPOP; 1=getState()
1520014415.208230 - BRPOP(myqueue) callback; restart/timeout/result; 1=getState()
1520014420.225662 - BRPOP(myqueue) callback; restart/timeout/result; 2=getState()
^C
1520014425.117773 - caught ctrl-c; exiting

simonhf · 2018-03-02T18:37:55Z

@matyhtf this also reminds me of another ticket I have open [1] because the tests don't run for me and other people. I'm guessing it's because there is environment setup that needs to be done before the tests are run, but this setup does not seem to be documented anywhere. And also, the tests do not test for whether the test environment is setup etc.

This got me to thinking that perhaps the example code presented in this ticket would be useful as a new redis client test for reconnecting to a redis-server?

On other projects I have worked on then it's standard policy to -- before fixing the bug, e.g. segfault -- create a failing test, then fix the bug, and then check that the new test is no longer failing. This helps in a few ways: Hopefully the bug never regresses because the test is now continually testing. And the test code itself acts as useful example code to new users of the library. Perhaps you and your fellow developers could adopt such a policy for swoole in the hopes that it will then become much more bullet proof over time in terms of quality? :-)

[1] #1403

simonhf changed the title ~~async redis client segmentation fault upon brpop() if redis client connection is closed~~ async redis client segmentation fault upon brpop() if redis client connection is reconnected Mar 1, 2018

matyhtf closed this as completed in cec84c0 Mar 2, 2018

matyhtf added the bug label Mar 2, 2018

matyhtf added a commit that referenced this issue Mar 2, 2018

fixed #1502

0b11daf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

async redis client segmentation fault upon brpop() if redis client connection is reconnected #1502

async redis client segmentation fault upon brpop() if redis client connection is reconnected #1502

simonhf commented Mar 1, 2018 •

edited

simonhf commented Mar 1, 2018 •

edited

matyhtf commented Mar 2, 2018

simonhf commented Mar 2, 2018

simonhf commented Mar 2, 2018

async redis client segmentation fault upon brpop() if redis client connection is reconnected #1502

async redis client segmentation fault upon brpop() if redis client connection is reconnected #1502

Comments

simonhf commented Mar 1, 2018 • edited

simonhf commented Mar 1, 2018 • edited

matyhtf commented Mar 2, 2018

simonhf commented Mar 2, 2018

simonhf commented Mar 2, 2018

simonhf commented Mar 1, 2018 •

edited

simonhf commented Mar 1, 2018 •

edited