Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RedisCluster segfaults after second connection with cache_slots enabled #1591

Closed
zorro-fr24 opened this issue Jul 5, 2019 · 25 comments
Closed
Labels

Comments

@zorro-fr24
Copy link
Contributor

zorro-fr24 commented Jul 5, 2019

Expected behaviour

phpredis should be able to open a connection more than once without segfaulting with the slots cache enabled

Actual behaviour

We are seeing a segfault occur on the second connection attempt. With php-cli we can reproduce by simply connecting twice to the same endpoint in a row ( see below ). With php-fpm behind nginx we see a SIGSEGV on the second http request to the same endpoint.

Setting other cluster distribution and connection persistence and pooling options appear to make no difference

Let me know if you need any more details.

I'm seeing this behaviour on

  • OS: Centos 7 - Linux 3.10.0-957.21.3.el7.x86_64 renameNx delivers false on success #1 SMP Tue Jun 18 16:35:19 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Redis: 4.0.9
  • PHP: 7.2.20 (cli/fpm) (built: Jul 2 2019 13:37:16) ( NTS )
  • phpredis: 5.0.0

Steps to reproduce, backtrace or example script

Installed packages:

yum list installed | grep php
newrelic-php5.x86_64              8.7.0.242-1                    @newrelic      
newrelic-php5-common.noarch       8.7.0.242-1                    @newrelic      
php-cli.x86_64                    7.2.20-1.el7.remi              @remi-php72    
php-common.x86_64                 7.2.20-1.el7.remi              @remi-php72    
php-debuginfo.x86_64              7.2.20-1.el7.remi              @remi-php72-debuginfo
php-fedora-autoloader.noarch      1.0.0-1.el7                    @epel          
php-fpm.x86_64                    7.2.20-1.el7.remi              @remi-php72    
php-intl.x86_64                   7.2.20-1.el7.remi              @remi-php72    
php-json.x86_64                   7.2.20-1.el7.remi              @remi-php72    
php-mbstring.x86_64               7.2.20-1.el7.remi              @remi-php72    
php-mysqlnd.x86_64                7.2.20-1.el7.remi              @remi-php72    
php-opcache.x86_64                7.2.20-1.el7.remi              @remi-php72    
php-pdo.x86_64                    7.2.20-1.el7.remi              @remi-php72    
php-pear.noarch                   1:1.10.9-3.el7.remi            @remi-php72    
php-pecl-apcu.x86_64              5.1.17-1.el7.remi.7.2          @remi-php72    
php-pecl-apcu-debuginfo.x86_64    5.1.17-1.el7.remi.7.2          @remi-php72-debuginfo
php-pecl-igbinary.x86_64          3.0.1-1.el7.remi.7.2           @remi-php72    
php-pecl-igbinary-debuginfo.x86_64
                                  3.0.1-1.el7.remi.7.2           @remi-php72-debuginfo
php-pecl-imagick.x86_64           3.4.4-1.el7.remi.7.2           @remi-php72    
php-pecl-imagick-debuginfo.x86_64 3.4.4-1.el7.remi.7.2           @remi-php72-debuginfo
php-pecl-memcached.x86_64         3.1.3-1.el7.remi.7.2           @remi-php72    
php-pecl-memcached-debuginfo.x86_64
                                  3.1.3-1.el7.remi.7.2           @remi-php72-debuginfo
php-pecl-msgpack.x86_64           2.0.3-1.el7.remi.7.2           @remi-php72    
php-pecl-msgpack-debuginfo.x86_64 2.0.3-1.el7.remi.7.2           @remi-php72-debuginfo
php-pecl-redis5.x86_64            5.0.0-1.el7.remi.7.2           @remi-php72    
php-pecl-redis5-debuginfo.x86_64  5.0.0-1.el7.remi.7.2           @remi-php72-debuginfo
php-pgsql.x86_64                  7.2.20-1.el7.remi              @remi-php72    
php-php-gettext.noarch            1.0.12-1.el7                   @epel          
php-process.x86_64                7.2.20-1.el7.remi              @remi-php72    
php-soap.x86_64                   7.2.20-1.el7.remi              @remi-php72    
php-xml.x86_64                    7.2.20-1.el7.remi              @remi-php72    
php72-php-common.x86_64           7.2.20-1.el7.remi              @remi-safe     
php72-php-json.x86_64             7.2.20-1.el7.remi              @remi-safe     
php72-php-pecl-igbinary.x86_64    3.0.1-1.el7.remi               @remi-safe     
php72-php-pecl-msgpack.x86_64     2.0.3-1.el7.remi               @remi-safe     
php72-runtime.x86_64              2.0-1.el7.remi                 @remi-safe

Testcase:

<?php

$conn = new RedisCluster(null, [ "redis_cluster_001:6379"  ]);

$conn = new RedisCluster(null, [ "redis_cluster_001:6379"  ]);

echo $conn->ping("hello") . "\n";

Backtrace:

GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/php...Reading symbols from /usr/lib/debug/usr/bin/php.debug...done.
done.
(gdb) run cache_slots_testcase.php
Starting program: /usr/bin/php cache_slots_testcase.php
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
1
[Inferior 1 (process 92190) exited normally]
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7_6.6.x86_64
(gdb) run -d redis.clusters.cache_slots=1 cache_slots_testcase.php
Starting program: /usr/bin/php -d redis.clusters.cache_slots=1 cache_slots_testcase.php
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
1

Program received signal SIGSEGV, Segmentation fault.
0x00007fffdd6aa2df in cluster_disconnect (c=c@entry=0x7ffff38bc000, force=force@entry=0) at /usr/src/debug/php-pecl-redis5-5.0.0/NTS/cluster_library.c:1269
1269	        ZEND_HASH_FOREACH_PTR(node->slaves, slave) {
(gdb) bt
#0  0x00007fffdd6aa2df in cluster_disconnect (c=c@entry=0x7ffff38bc000, force=force@entry=0) at /usr/src/debug/php-pecl-redis5-5.0.0/NTS/cluster_library.c:1269
#1  0x00007fffdd6aa342 in cluster_free (c=c@entry=0x7ffff38bc000, free_ctx=free_ctx@entry=0) at /usr/src/debug/php-pecl-redis5-5.0.0/NTS/cluster_library.c:871
#2  0x00007fffdd694b42 in free_cluster_context (object=0x7ffff38e05c0) at /usr/src/debug/php-pecl-redis5-5.0.0/NTS/redis_cluster.c:342
#3  0x0000555555872fac in zend_objects_store_del (object=0x7ffff38e05c0) at /usr/src/debug/php-7.2.20/Zend/zend_objects_API.c:190
#4  0x00005555558467b9 in _zend_hash_del_el_ex (prev=<optimized out>, p=<optimized out>, idx=<optimized out>, ht=<optimized out>) at /usr/src/debug/php-7.2.20/Zend/zend_hash.c:998
#5  _zend_hash_del_el (p=0x7ffff385f200, idx=8, ht=0x555555c709d0 <executor_globals+304>) at /usr/src/debug/php-7.2.20/Zend/zend_hash.c:1021
#6  zend_hash_reverse_apply (ht=ht@entry=0x555555c709d0 <executor_globals+304>, apply_func=apply_func@entry=0x5555558214c0 <zval_call_destructor>) at /usr/src/debug/php-7.2.20/Zend/zend_hash.c:1603
#7  0x0000555555821985 in shutdown_destructors () at /usr/src/debug/php-7.2.20/Zend/zend_execute_API.c:238
#8  0x0000555555833487 in zend_call_destructors () at /usr/src/debug/php-7.2.20/Zend/zend.c:1017
#9  0x00005555557ccca5 in php_request_shutdown (dummy=dummy@entry=0x0) at /usr/src/debug/php-7.2.20/main/main.c:1851
#10 0x00005555558e7603 in do_cli (argc=4, argv=0x555555c75ff0) at /usr/src/debug/php-7.2.20/sapi/cli/php_cli.c:1178
#11 0x000055555563dd9b in main (argc=4, argv=0x555555c75ff0) at /usr/src/debug/php-7.2.20/sapi/cli/php_cli.c:1403

I've checked

  • [:heavy_check_mark:] There is no similar issue from other users
  • [:heavy_check_mark:] Issue isn't fixed in develop branch
yatsukhnenko added a commit that referenced this issue Jul 5, 2019
@yatsukhnenko
Copy link
Member

@zorro-fr24 thank you for this report!

@taylorotwell
Copy link

taylorotwell commented Jul 5, 2019

Can confirm we also experienced this issue with a Laravel product on PHP 7.3.6.

michael-grunder pushed a commit that referenced this issue Jul 5, 2019
* Issue #1591

* Add notes to Changelog
@michael-grunder
Copy link
Member

This should now be fixed in the develop branch.

@michael-grunder
Copy link
Member

@zorro-fr24 Are you able to verify that the fix works on your end. We probably want to get 5.0.1 out soon since this bug crashes phpredis.

@zorro-fr24
Copy link
Contributor Author

zorro-fr24 commented Jul 9, 2019

@zorro-fr24 Are you able to verify that the fix works on your end. We probably want to get 5.0.1 out soon since this bug crashes phpredis.

Apologies, I was planning to test it directly, but I went on a poorly timed vacation ;)

I was able to login and compile the latest develop (5276474) in our test environment and I am no longer seeing segfaults. Our applications seem to run/pass tests etc.

Thanks for the quick fix!

@yatsukhnenko
Copy link
Member

We released 5.0.1 with the fix of this issue. Thank again to @zorro-fr24 for detailed report which allowed quickly fix this critical bug :

yatsukhnenko added a commit that referenced this issue Jul 12, 2019
@kalmykov
Copy link

kalmykov commented Jul 15, 2019

Hi guys - we are still experiencing some problems with cache_slots enabled. The problems we are seeing currently: number of redis connections grows indefinitely, and we are seeing random SIGABRT in php-fpm logs.

When trying to run same test script as in issue description (under same environment except that now we run phpredis 5.0.1), we see segmentation fault:

Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/php...Reading symbols from /usr/lib/debug/usr/bin/php.debug...done.
done.
(gdb) run -d redis.clusters.cache_slots=0 cache_slots_testcase.php
Starting program: /usr/bin/php -d redis.clusters.cache_slots=0 cache_slots_testcase.php
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
1
[Inferior 1 (process 93728) exited normally]
(gdb) run -d redis.clusters.cache_slots=1 cache_slots_testcase.php
Starting program: /usr/bin/php -d redis.clusters.cache_slots=1 cache_slots_testcase.php
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
1

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff4bb87b9 in malloc_consolidate (av=av@entry=0x7ffff4efe760 <main_arena>) at malloc.c:4153
4153		    unlink(av, p, bck, fwd);
(gdb) bt
#0  0x00007ffff4bb87b9 in malloc_consolidate (av=av@entry=0x7ffff4efe760 <main_arena>) at malloc.c:4153
#1  0x00007ffff4bb94ee in _int_free (av=0x7ffff4efe760 <main_arena>, p=0x555555ce3a80, have_lock=0) at malloc.c:4053
#2  0x0000555555833278 in zend_shutdown () at /usr/src/debug/php-7.2.20/Zend/zend.c:910
#3  0x00005555557cdc7b in php_module_shutdown () at /usr/src/debug/php-7.2.20/main/main.c:2453
#4  0x000055555563dceb in main (argc=6, argv=0x555555c75ec0) at /usr/src/debug/php-7.2.20/sapi/cli/php_cli.c:1418 ```

@yatsukhnenko
Copy link
Member

yatsukhnenko commented Jul 15, 2019

diff --git a/redis.c b/redis.c
index b087ac4..504fcda 100644
--- a/redis.c
+++ b/redis.c
@@ -559,8 +559,10 @@ free_reply_callbacks(RedisSock *redis_sock)
 
 /* Passthru for destroying cluster cache */
 static void cluster_cache_dtor(zend_resource *rsrc) {
-    redisCachedCluster *rcc = (redisCachedCluster*)rsrc->ptr;
-    cluster_cache_free(rcc);
+    if (rsrc->ptr) {
+        redisCachedCluster *rcc = (redisCachedCluster*)rsrc->ptr;
+        cluster_cache_free(rcc);
+    }
 }

@kalmykov could you try to apply this patch and test?

@michael-grunder
Copy link
Member

michael-grunder commented Jul 15, 2019

Hi, @kalmykov Let us know if this solves your issue.

II'll see if there's a simple way to add automatic tests that do things like use the connection pool and cluster cache across multiple connections. Right now all of our tests are via cli which makes bugs like this less likely to present themselves.

Edit: Also it would be awesome if others can help test slot caching in more production like environment (staging or similar) before we prepare a 5.0.2 release. Neither @yatsukhnenko or myself use RedisCluster (let alone slot caching) in production so it's harder for us to do this.

@michael-grunder
Copy link
Member

A bit more info for everybody.

I've been running the RedisCluster unit tests under php-fpm all day and no longer see any crashes with @yatsukhnenko's patch applied. Without the patch I see the exact same SIGABRT problem.

I'm going to keep the tests running for a while to make sure we don't see any other issues.

@yatsukhnenko
Copy link
Member

Ping @kalmykov, @zorro-fr24

@kalmykov
Copy link

unfortunately I still see same almost the segmentation fault when running example from issue description. There is small difference in backtrace though (new is on the right side):
Screenshot 2019-07-16 11 28 25

@yatsukhnenko
Copy link
Member

Are you sure you have applied the patch? 😄

@kalmykov
Copy link

Yes, I suspected that as well 😄 On my local machine (OSX) I stopped getting this error, but not on our web servers. Here is what git diff says:

index ad25b43..15ec3ce 100644
--- a/php_redis.h
+++ b/php_redis.h
@@ -25,7 +25,7 @@
 #define PHP_REDIS_H

 /* phpredis version */
-#define PHP_REDIS_VERSION "5.1.0-dev"
+#define PHP_REDIS_VERSION "5.1.0-cluster-cache-patch"

 PHP_METHOD(Redis, __construct);
 PHP_METHOD(Redis, __destruct);
diff --git a/redis.c b/redis.c
index b087ac4..504fcda 100644
--- a/redis.c
+++ b/redis.c
@@ -559,8 +559,10 @@ free_reply_callbacks(RedisSock *redis_sock)

 /* Passthru for destroying cluster cache */
 static void cluster_cache_dtor(zend_resource *rsrc) {
-    redisCachedCluster *rcc = (redisCachedCluster*)rsrc->ptr;
-    cluster_cache_free(rcc);
+    if (rsrc->ptr) {
+        redisCachedCluster *rcc = (redisCachedCluster*)rsrc->ptr;
+        cluster_cache_free(rcc);
+    }
 }

After that in phpredis dir I executed:

make clean
phpize
./configure --enable-redis-igbinary --enable-redis-json
make
sudo make install
echo "extension=redis.so" >> /etc/php.d/90-php.ini

If I run echo(phpversion("redis")); I get 5.1.0-cluster-cache-patch. But I still get segmentation fault when running example 🤔 Its like I am missing something

@yatsukhnenko
Copy link
Member

Have you restarted php-fpm?

@kalmykov
Copy link

kalmykov commented Jul 16, 2019

Yes, however I am running php-cli when getting segfault so I don't think it makes a difference?

Edit: I also ran php-fpm for a short while and got the same errors as before

@michael-grunder
Copy link
Member

I'll see if I can get it to break in a CentOS docker environment.

And just to be clear, you're able to segfault redis by just running that three line script above? Theoretically that should make it much simpler to fix once we can replicate the issue.

@kalmykov
Copy link

If I run a script in cli, I get segfault in ~80%. If I run it through gdb I get segfault all the time

@michael-grunder
Copy link
Member

Awesome, thanks for the info. We actually had a recent issue on production that only presented itself under CentOS, so hopefully it's as simple as running it there.

Are you on the most current patch of CentOS 7 (1810 I think)?

@kalmykov
Copy link

yes: centos-release-7-6.1810.2.el7.centos.x86_64

michael-grunder added a commit that referenced this issue Jul 17, 2019
@michael-grunder
Copy link
Member

Hi @kalmykov I've pushed a fix that incorporates @yatsukhnenko's changes and one more correction.

This bug would've been detected if I had run the tests through valgrind as the double free was reported. It's probably smart to add valgrind to the automatic testing for situations like this.

Hopefully this fixes the issue for you.

@kalmykov
Copy link

@michael-grunder @yatsukhnenko thanks a lot! I compiled extension from issue.1591-segfault-redux branch and I stopped getting any segfaults in both cli and fpm modes. I also been running extension on our production server for 2 hours and so far haven't seen any problems 🎉

@michael-grunder
Copy link
Member

Glad to hear it! I'll go ahead and close the issue but don't hesitate to let us know if you have any other issues.

@kalmykov
Copy link

Hi guys, do you know when this fix is planned to be released in PECL?

@yatsukhnenko
Copy link
Member

On Monday :)

yatsukhnenko pushed a commit that referenced this issue Oct 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants