Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hung apaches on pthread wrlocks #19

Closed
anoakie opened this issue Jun 14, 2013 · 51 comments
Closed

Hung apaches on pthread wrlocks #19

anoakie opened this issue Jun 14, 2013 · 51 comments
Labels
bug

Comments

@anoakie
Copy link

@anoakie anoakie commented Jun 14, 2013

We seem to be getting hung apaches w/PHP 5.5.0 RC3 w/APCu.

0x00007f1a9572853d in pthread_rwlock_wrlock () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0 0x00007f1a9572853d in pthread_rwlock_wrlock () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f1a90debd49 in apc_lock_wlock (lock=) at /home/phpbuild/buildphp-5.5.0RC3/apcu/apc_lock.c:129
#2 0x00007f1a90df05b9 in apc_cache_insert (cache=0x7f1a970d2960, key=..., value=0x7f1a83591880, ctxt=0x7fff1c983920, t=1371246459, exclusive=0 '\000') at /home/phpbuild/buildphp-5.5.0RC3/apcu/apc_cache.c:837
#3 0x00007f1a90df0b85 in apc_cache_store (cache=0x7f1a970d2960, strkey=0x7f1a9778ccc0 "AutoLoad::prefix::6::xxxxxxxxxxxxxxxxxxx::xxxxxxxxxxxxxxxxxx", keylen=52, val=0x7f1a977b5238, ttl=600, exclusive=0 '\000')

at /home/phpbuild/buildphp-5.5.0RC3/apcu/apc_cache.c:443

#4 0x00007f1a90dedb5b in apc_store_helper (ht=, return_value=0x7f1a977b5208, exclusive=0 '\000', return_value_ptr=, this_ptr=, return_value_used=)

at /home/phpbuild/buildphp-5.5.0RC3/apcu/php_apc.c:568

#5 0x00007f1a93f80de6 in zend_do_fcall_common_helper_SPEC (execute_data=0x7f1a96df65b8) at /home/phpbuild/buildphp-5.5.0RC3/php-src-php-5.5.0RC3/Zend/zend_vm_execute.h:543
#6 0x00007f1a93f450d8 in execute_ex (execute_data=0x7f1a96df65b8) at /home/phpbuild/buildphp-5.5.0RC3/php-src-php-5.5.0RC3/Zend/zend_vm_execute.h:356
#7 0x00007f1a93ec8b70 in zend_call_function (fci=0x7fff1c983cb0, fci_cache=) at /home/phpbuild/buildphp-5.5.0RC3/php-src-php-5.5.0RC3/Zend/zend_execute_API.c:939
#8 0x00007f1a93eee3d7 in zend_call_method (object_pp=0x0, obj_ce=, fn_proxy=0x7f1a977137e0, function_name=0x7f1a977136a8 "autoload::loadone", function_name_len=,

retval_ptr_ptr=0x7fff1c983de0, param_count=1, arg1=0x7f1a97772900, arg2=0x0) at /home/phpbuild/buildphp-5.5.0RC3/php-src-php-5.5.0RC3/Zend/zend_interfaces.c:97

#9 0x00007f1a93dd8507 in zif_spl_autoload_call (ht=, return_value=, return_value_ptr=, this_ptr=, return_value_used=)

at /home/phpbuild/buildphp-5.5.0RC3/php-src-php-5.5.0RC3/ext/spl/php_spl.c:436

#10 0x00007f1a93ec8b14 in zend_call_function (fci=0x7fff1c983fc0, fci_cache=0x7fff1c984010) at /home/phpbuild/buildphp-5.5.0RC3/php-src-php-5.5.0RC3/Zend/zend_execute_API.c:957
#11 0x00007f1a93ec93fb in zend_lookup_class_ex (name=0x7f1a96f7d988 "xxxxxxxxx", name_length=9, key=0x7f1a977b52b0, use_autoload=1, ce=0x7fff1c9840d8)

at /home/phpbuild/buildphp-5.5.0RC3/php-src-php-5.5.0RC3/Zend/zend_execute_API.c:1107

#12 0x00007f1a93ec9b22 in zend_fetch_class_by_name (class_name=0x7f1a96f7d988 "xxxxxxxxx", class_name_len=, key=, fetch_type=4)

at /home/phpbuild/buildphp-5.5.0RC3/php-src-php-5.5.0RC3/Zend/zend_execute_API.c:1587

#13 0x00007f1a93f12c1a in ZEND_FETCH_CLASS_SPEC_CONST_HANDLER (execute_data=0x7f1a96df4ed8) at /home/phpbuild/buildphp-5.5.0RC3/php-src-php-5.5.0RC3/Zend/zend_vm_execute.h:1188
#14 0x00007f1a93f450d8 in execute_ex (execute_data=0x7f1a96df4ed8) at /home/phpbuild/buildphp-5.5.0RC3/php-src-php-5.5.0RC3/Zend/zend_vm_execute.h:356
#15 0x00007f1a93ed83b3 in zend_execute_scripts (type=8, retval=0x0, file_count=3) at /home/phpbuild/buildphp-5.5.0RC3/php-src-php-5.5.0RC3/Zend/zend.c:1316
#16 0x00007f1a93e7661c in php_execute_script (primary_file=0x7fff1c9865b0) at /home/phpbuild/buildphp-5.5.0RC3/php-src-php-5.5.0RC3/main/main.c:2481
#17 0x00007f1a93f8408d in php_handler (r=0x7f1a960f70a0) at /home/phpbuild/buildphp-5.5.0RC3/php-src-php-5.5.0RC3/sapi/apache2handler/sapi_apache2.c:667
#18 0x00007f1a96239508 in ap_run_handler ()
#19 0x00007f1a9623997e in ap_invoke_handler ()
#20 0x00007f1a96249570 in ap_process_request ()
#21 0x00007f1a96246398 in ?? ()
#22 0x00007f1a9623ffa8 in ap_run_process_connection ()
#23 0x00007f1a9624e1d0 in ?? ()
#24 0x00007f1a9624e93a in ?? ()
#25 0x00007f1a9624f4e7 in ap_mpm_run ()
#26 0x00007f1a962244a4 in main ()

@krakjoe
Copy link
Owner

@krakjoe krakjoe commented Jun 25, 2013

Could we know a bit about the software you are running, or anything else about your setup that might help to reproduce the same behaviour ...

frame appears significant, but it's difficult to guess what has caused it ...

@weltling
Copy link
Collaborator

@weltling weltling commented Jul 27, 2013

@anoakie ping ... any update on this, a repro script maybe?

@anoakie
Copy link
Author

@anoakie anoakie commented Jul 27, 2013

We tried with both PHP 5.5 RC3 and PHP 5.5.0 final. It's quite hard to reproduce. It happens to random servers in our apache cluster during heavy load every few days.
We're running Ubuntu's apache 2.2.22 prefork. We build a custom PHP 5.5.0 package.
We've reverted to PHP 5.3 + APC for now. I'm currently on sabbatical, so I'm not able to look into this further.
I'm adding my coworkers to this thread so they hopefully expand on this while I'm gone.

@krakjoe
Copy link
Owner

@krakjoe krakjoe commented Aug 12, 2013

@anoakie it's been a while ... I got to assume you still have the error (I think others have mentioned the same thing) ... without a way to reproduce I'm not able to debug ... it may be that your operating systems default rwlock is not suitable in some way (which we'll eventually find and fix when we can reproduce reliably and determine that to be definite cause) ... the only workaround I can suggest to get things going is to disable the use of rwlocks with --disable-apcu-rwlocks on configure for apcu ...

@magicmonkey
Copy link

@magicmonkey magicmonkey commented Oct 10, 2013

We're experiencing what I think is this problem as well - all our Apache children hang, and a GDB trace shows this:

#0  0x00002b50542905f0 in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1  0x00002b505faf253d in apc_lock_wlock (lock=0x2b5071483088) at /usr/src/redhat/BUILD/php-pecl-apcu-4.0.2/apc_lock.c:252
#2  0x00002b505faf61a5 in apc_cache_insert (cache=0x2b5071449d60, key=..., value=0x2b5071943a38, ctxt=0x7fffeae5fc60, t=1381421665, exclusive=0 '\000')
    at /usr/src/redhat/BUILD/php-pecl-apcu-4.0.2/apc_cache.c:841
#3  0x00002b505faf64a5 in apc_cache_store (cache=0x2b5071449d60, 
    strkey=0x2b5067d86480 "/var/lamp/code/v-2013-10-r63/application-apps--/etc/environment.ini-config", keylen=75, val=0x2b5052a11250, ttl=3600, 
    exclusive=0 '\000') at /usr/src/redhat/BUILD/php-pecl-apcu-4.0.2/apc_cache.c:440
#4  0x00002b505faf3bff in apc_store_helper (ht=<value optimized out>, return_value=0x2b5067c137c0, return_value_ptr=<value optimized out>, 
    this_ptr=<value optimized out>, return_value_used=<value optimized out>, exclusive=0 '\000')
    at /usr/src/redhat/BUILD/php-pecl-apcu-4.0.2/php_apc.c:659
#5  0x00002b505d1aee59 in zend_do_fcall_common_helper_SPEC (execute_data=0x2b50527e5578) at /usr/src/redhat/BUILD/php-5.5.3/Zend/zend_vm_execute.h:543
#6  0x00002b505d1ed7f8 in execute_ex (execute_data=0x2b50527e5578) at /usr/src/redhat/BUILD/php-5.5.3/Zend/zend_vm_execute.h:356
#7  0x00002b505d17b7da in zend_execute_scripts (type=8, retval=0x0, file_count=3) at /usr/src/redhat/BUILD/php-5.5.3/Zend/zend.c:1316
#8  0x00002b505d11c699 in php_execute_script (primary_file=0x7fffeae62340) at /usr/src/redhat/BUILD/php-5.5.3/main/main.c:2484
#9  0x00002b505d22828d in php_handler (r=0x2b509186a968) at /usr/src/redhat/BUILD/php-5.5.3/sapi/apache2handler/sapi_apache2.c:667
#10 0x00002b50527a5e8a in ap_run_handler ()
#11 0x00002b50527a9318 in ap_invoke_handler ()
#12 0x00002b50527b3c7a in ap_internal_redirect ()
#13 0x00002b505b5f7c70 in ap_make_dirstr_parent () from /etc/httpd/modules/mod_rewrite.so
#14 0x00002b50527a5e8a in ap_run_handler ()
#15 0x00002b50527a9318 in ap_invoke_handler ()
#16 0x00002b50527b3e28 in ap_process_request ()
#17 0x00002b50527b1010 in ?? ()
#18 0x00002b50527ad112 in ap_run_process_connection ()
#19 0x00002b50527b82c9 in ?? ()
#20 0x00002b50527b855a in ?? ()
#21 0x00002b50527b8dbd in ap_mpm_run ()
#22 0x00002b5052792fd8 in main ()

This is on a not-particularly-busy server. There are a couple of places in our code where we write to APCu, and various Apache children are stuck on writing various different keys (ie they're at different points in our code but all are hung when writing to APCu)

I'm guessing something has obtained the futex and not released it for some reason, but I'm not sure how to find out what.

Here's the APC bit from "php -i" (I'm pretty sure we're using the same config in Apache as on the CLI):

apc

APC support => Emulated

apcu

APCu Support => enabled
Version => 4.0.2
APCu Debugging => Disabled
MMAP Support => Enabled
MMAP File Mask => /tmp/apc.Zk0HWP
Serialization Support => php, eval
Revision => $Revision: 328290 $
Build Date => Oct  9 2013 10:17:53

Directive => Local Value => Master Value
apc.coredump_unmap => Off => Off
apc.enable_cli => On => On
apc.enabled => On => On
apc.entries_hint => 8192 => 8192
apc.gc_ttl => 3600 => 3600
apc.mmap_file_mask => /tmp/apc.Zk0HWP => /tmp/apc.Zk0HWP
apc.preload_path => no value => no value
apc.rfc1867 => Off => Off
apc.rfc1867_freq => 0 => 0
apc.rfc1867_name => APC_UPLOAD_PROGRESS => APC_UPLOAD_PROGRESS
apc.rfc1867_prefix => upload_ => upload_
apc.rfc1867_ttl => 3600 => 3600
apc.serializer => php => php
apc.shm_segments => 1 => 1
apc.shm_size => 512M => 512M
apc.slam_defense => On => On
apc.smart => 0 => 0
apc.ttl => 7200 => 7200
apc.use_request_time => On => On
apc.writable => /tmp => /tmp
@magicmonkey
Copy link

@magicmonkey magicmonkey commented Oct 10, 2013

(gdb) frame 1
#1  0x00002b505faf253d in apc_lock_wlock (lock=0x2b5071483088) at /usr/src/redhat/BUILD/php-pecl-apcu-4.0.2/apc_lock.c:252
252 /usr/src/redhat/BUILD/php-pecl-apcu-4.0.2/apc_lock.c: No such file or directory.
    in /usr/src/redhat/BUILD/php-pecl-apcu-4.0.2/apc_lock.c
(gdb) p *lock
$1 = {__data = {__lock = 0, __nr_readers = 4294967293, __readers_wakeup = 6, __writer_wakeup = 5, __nr_readers_queued = 0, __nr_writers_queued = 50, 
    __writer = 0, __shared = 128, __pad1 = 0, __pad2 = 0, __flags = 0}, 
  __size = "\000\000\000\000\375\377\377\377\006\000\000\000\005\000\000\000\000\000\000\000\062\000\000\000\000\000\000\000\200", '\000' <repeats 26 times>, __align = -12884901888}
@rathers
Copy link

@rathers rathers commented Oct 14, 2013

I believe I have managed to reproduce this issue with a test script and some load injection. I've uploaded the test script and instructions here:

https://github.com/rathers/apcu-repro

Essentially the test script does a load of apc_store and apc_fetch calls against the same key. Loading this at increasing rates eventually triggers the number of apache children to spiral out of control and lock up.

krakjoe added a commit that referenced this issue Oct 14, 2013
@krakjoe
Copy link
Owner

@krakjoe krakjoe commented Oct 14, 2013

@rathers Thanks for your effort ... got some time this evening to have a look, think we got it ... can I get some feedback ??

@glenjamin
Copy link

@glenjamin glenjamin commented Oct 14, 2013

Hi, I'm trying to get a handle on what has changed in that diff to reason about how the fix works.

As far as I can tell, it:

  • removes eval serialiser (unrelated)
  • ensures apache cannot kill a thread while it has a lock
  • checks the state of cache-busy more often

Under what circumstances is the cache considered "busy"?

@krakjoe
Copy link
Owner

@krakjoe krakjoe commented Oct 14, 2013

The eval serializer is unrelated, it was out in the wild for testing, and caused more harm than good ... so it's gotta go ...

The cache is considered busy during gc, which is invoked implicitly throughout, sometimes by apc itself, sometimes by the allocator underneath ... allowing php to call a function that operates on a busy cache is futile ...

The thing that fixes the problem with apache is utilizing [un]block interruptions on locks, just like APC did ...

@rathers
Copy link

@rathers rathers commented Oct 15, 2013

@krakjoe, thanks for the commit. We've rebuilt apcu including this commit and deployed it to a test box. It hasn't made any difference :( We're still seeing the same locking problem, reproducible in exactly the same way as before. Any further ideas to try out?

@krakjoe
Copy link
Owner

@krakjoe krakjoe commented Oct 15, 2013

Damn ... I have that pulse you get under your eye when things get a bit stressful ...

I'm a bit baffled, let me have some thinking time ...

I'm not able to make apache spiral out of control anymore, the number of processes remain steady and apache remains responsive but slow ...

@rathers
Copy link

@rathers rathers commented Oct 15, 2013

What version of apache are you using? We're still on 2.2 which may or may not make a difference!

@krakjoe
Copy link
Owner

@krakjoe krakjoe commented Oct 15, 2013

[joe@fiji php-src]$ httpd-nts -v
Server version: Apache/2.2.23 (Unix)
Server built:   Mar 15 2013 10:39:59
[joe@fiji php-src]$ httpd-zts -v
Server version: Apache/2.2.23 (Unix)
Server built:   Sep 16 2013 09:55:35
@krakjoe
Copy link
Owner

@krakjoe krakjoe commented Oct 15, 2013

try now ??

bit of a mistake on my part there, and an omission ... this has got to work ...

@krakjoe
Copy link
Owner

@krakjoe krakjoe commented Oct 15, 2013

here is the options I am running http_load with:

./http_load -parallel 100 -rate 10 -seconds 100 apc-bench

I've run it several times (lots and lots and still am) like that without a problem ... is that enough ??

@rathers
Copy link

@rathers rathers commented Oct 15, 2013

That looks a lot higher than I've managed to achieve. How beefy is your box? Try it without -parallel as I'm not sure which would take precedence.

We're just redeploying APCu with your latest patch then will try again

@krakjoe
Copy link
Owner

@krakjoe krakjoe commented Oct 15, 2013

8 (4+HT) cores @ 3.4ghz with 16GB DDR3 ... when I run it more agressively (x10) the amount of processes spawned to handle requests does shoot up, but they are all, eventually, shutdown gracefully and so cannot be blocking waiting to acquire a mutex ...

@krakjoe
Copy link
Owner

@krakjoe krakjoe commented Oct 15, 2013

@rathers I can't stand the suspense ??

@rathers
Copy link

@rathers rathers commented Oct 15, 2013

I'm starting to get very confused. I think there may be two things at play here:

1 - apache lockups
2 - apache procs exploding in number

(2) is still happenning, even on the latest build.
(1) Seems to be happening sometimes depending on the ini settings!

It is possible we only ever encountered (1) because (2) can occur, I'm really not sure.

If we set apc.ttl and apc.gc_ttl to 0 (as opposed to some number) then (1) doesn't happen. The apache procs will recover and reduce in number if we remove the load.

I've been trying to think what would cause (2) and wonder if there is some hard limit in APC/PHP/Apache/Linux somewhere that limits how many locks can be processed in a second. Do you think the kernel could have anything to do with it? Our test boxes are fairly old (CentOS 5.9) with a 2.6.18 kernel.

WDYT?

@krakjoe
Copy link
Owner

@krakjoe krakjoe commented Oct 15, 2013

This is APC:

[joe@fiji http_load-12mar2006]$ ./http_load -parallel 1000 -rate 100 -seconds 10 apc-bench
21 fetches, 979 max parallel, 147 bytes, in 10.0003 seconds
7 mean bytes/connection
2.09993 fetches/sec, 14.6995 bytes/sec
msecs/connect: 0.132667 mean, 0.224 max, 0.086 min
msecs/first-response: 3733.66 mean, 9703.39 max, 1079.31 min
HTTP response codes:
  code 200 -- 21

This is APCu:

[joe@fiji http_load-12mar2006]$ ./http_load -parallel 1000 -rate 100 -seconds 10 apc-bench
161 fetches, 839 max parallel, 4.00724e+07 bytes, in 10.0004 seconds
248897 mean bytes/connection
16.0994 fetches/sec, 4.0071e+06 bytes/sec
msecs/connect: 0.132323 mean, 0.216 max, 0.082 min
msecs/first-response: 1676.21 mean, 3298.75 max, 3.642 min
HTTP response codes:
  code 200 -- 161

They both behave with regard to apache processes in the same way now, lots are created but eventually shutdown.

I don't think there is a bug present anymore, tried comparing behaviour with normal APC ??

krakjoe added a commit that referenced this issue Oct 15, 2013
@krakjoe
Copy link
Owner

@krakjoe krakjoe commented Oct 15, 2013

This is APCu now:

[joe@fiji http_load-12mar2006]$ ./http_load -parallel 1000 -rate 100 -seconds 10 apc-bench
173 fetches, 827 max parallel, 4.30592e+07 bytes, in 10.0005 seconds
248897 mean bytes/connection
17.2991 fetches/sec, 4.3057e+06 bytes/sec
msecs/connect: 0.121665 mean, 0.231 max, 0.071 min
msecs/first-response: 1792.06 mean, 3494.57 max, 3.738 min
HTTP response codes:
  code 200 -- 173
@krakjoe
Copy link
Owner

@krakjoe krakjoe commented Oct 15, 2013

something does smell a bit fishy about that result ... I'm still looking, haven't given up ...

krakjoe added a commit that referenced this issue Oct 15, 2013
@rathers
Copy link

@rathers rathers commented Oct 15, 2013

When you're comparing APC and APCu are they using different PHP versions?

@rathers
Copy link

@rathers rathers commented Oct 15, 2013

The apache process explosion thing is quite subtle and hard to explain but using -rate 100 wont reveal it as it's too heavyweight. I found there was an implicit maximum rate that could be sustained (in our case around -rate 3 or 4) with apache beahving perfectly normal with a just a few busy procs and response times constant at around 300ms. Then just by injecting a few manual requests with curl (and i really do mean a few, like 3 or 4!) the whole thing then explodes. Response times increase into multiple seconds and beyond and apache spawns procs until it hits MaxClients. This just from a few extra requests! Never seen apache behave like that before.

Depending on the setup (APCu version, compile settings, ini settings etc) the apache procs may or may not recover

@krakjoe krakjoe added the feedback label Dec 7, 2015
@masterdead
Copy link

@masterdead masterdead commented Jan 5, 2016

Still persist

#0  0x00007f5da2862b2d in pthread_rwlock_wrlock () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f5da1a38359 in apc_lock_wlock (lock=<optimized out>) at /usr/src/apcu/apcu-4.0.10/apc_lock.c:245
#2  0x00007f5da1a3cbe8 in apc_cache_insert (cache=cache@entry=0x2d73120, key=..., value=0x7f5d95412ce8, ctxt=ctxt@entry=0x7ffe99094b80, t=t@entry=1452024027, exclusive=exclusive@entry=0 '\000')
    at /usr/src/apcu/apcu-4.0.10/apc_cache.c:748
#3  0x00007f5da1a3d243 in apc_cache_store (cache=0x2d73120, strkey=0x7f5d899de0f8 "core-init-cache-fantasy", keylen=40, val=0x7f5d899130c8, ttl=59, exclusive=exclusive@entry=0 '\000')
    at /usr/src/apcu/apcu-4.0.10/apc_cache.c:347
#4  0x00007f5da1a39fab in apc_store_helper (ht=<optimized out>, return_value=0x7f5d899e4288, exclusive=<optimized out>, return_value_used=<error reading variable: Unhandled dwarf expression opcode 0xfa>, 
    this_ptr=<error reading variable: Unhandled dwarf expression opcode 0xfa>, return_value_ptr=<error reading variable: Unhandled dwarf expression opcode 0xfa>) at /usr/src/apcu/apcu-4.0.10/php_apc.c:662
#5  0x00000000006e86b9 in dtrace_execute_internal ()
#6  0x000000000079dd01 in ?? ()
#7  0x0000000000762e58 in execute_ex ()
#8  0x00000000006e858d in dtrace_execute_ex ()
#9  0x000000000079e338 in ?? ()
#10 0x0000000000762e58 in execute_ex ()
#11 0x00000000006e858d in dtrace_execute_ex ()
#12 0x000000000079e338 in ?? ()
#13 0x0000000000762e58 in execute_ex ()
#14 0x00000000006e858d in dtrace_execute_ex ()
#15 0x000000000079e338 in ?? ()
#16 0x0000000000762e58 in execute_ex ()
#17 0x00000000006e858d in dtrace_execute_ex ()
#18 0x000000000079e338 in ?? ()
#19 0x0000000000762e58 in execute_ex ()
#20 0x00000000006e858d in dtrace_execute_ex ()
#21 0x000000000079fb02 in ?? ()
#22 0x0000000000762e58 in execute_ex ()
#23 0x00000000006e858d in dtrace_execute_ex ()
#24 0x00000000006faee8 in zend_execute_scripts ()
#25 0x00000000006963d2 in php_execute_script ()
#26 0x0000000000473b32 in main ()
(gdb) frame 0
#0  0x00007f143d000b2d in pthread_rwlock_wrlock () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) frame 1
#1  0x00007f143c1d6359 in apc_lock_wlock (lock=<optimized out>) at /usr/src/apcu/apcu-4.0.10/apc_lock.c:245
245 /usr/src/apcu/apcu-4.0.10/apc_lock.c: No such file or directory.
(gdb) frame 2
#2  0x00007f143c1dabe8 in apc_cache_insert (cache=cache@entry=0x29d0120, key=..., value=0x7f142c94f1f8, ctxt=ctxt@entry=0x7ffd4e182b20, t=t@entry=1452028220, exclusive=exclusive@entry=0 '\000')
    at /usr/src/apcu/apcu-4.0.10/apc_cache.c:748
748 /usr/src/apcu/apcu-4.0.10/apc_cache.c: No such file or directory.
(gdb) frame 3
#3  0x00007f143c1db243 in apc_cache_store (cache=0x29d0120, strkey=0x7f142417a418 "hobby-blesk-cz-magsets-props-103", keylen=33, val=0x7f1424192430, ttl=237, exclusive=exclusive@entry=0 '\000')
    at /usr/src/apcu/apcu-4.0.10/apc_cache.c:347
347 in /usr/src/apcu/apcu-4.0.10/apc_cache.c
(gdb) frame 4
#4  0x00007f143c1d7fab in apc_store_helper (ht=<optimized out>, return_value=0x7f1424180d80, exclusive=<optimized out>, return_value_used=<error reading variable: Unhandled dwarf expression opcode 0xfa>, 
    this_ptr=<error reading variable: Unhandled dwarf expression opcode 0xfa>, return_value_ptr=<error reading variable: Unhandled dwarf expression opcode 0xfa>) at /usr/src/apcu/apcu-4.0.10/php_apc.c:662
662 /usr/src/apcu/apcu-4.0.10/php_apc.c: No such file or directory.
@jr997
Copy link

@jr997 jr997 commented Feb 29, 2016

I've experienced the same issue, but found the cause (at least in my case)

At one point I called apc_cache_info(), at a time where the result of that call was too big to fit in PHPs memory_limit. This gave me a "500 Internal Server Error", which seems acceptable, BUT it failed to release the lock. Therefor subsequent calls, in that or other apache child processes, all waited for a lock, that was never going to be released.

This does seem like a bug in my humble opinion: the system should never end up in this state, no matter what flaw the PHP programmer might introduce.

One thing I'd like to suggest, is that the apc_cache_info() returns something like FALSE if the memory_limit prevents the APC system from returning the requested data.

ps. to whoever needs a quick-fix: I simply changed the memory_limit in the script that called apc_cache_info(), and I haven't seen the problem since.

I use 4.0.10

@mattsmith0308
Copy link

@mattsmith0308 mattsmith0308 commented May 2, 2016

Has anyone looked into this issue recently? I was able to reproduce this in the same manner @jr997 mentioned using 4.0.10. It looks like a fix would be as simple as allocating all necessary memory for the cache_info function prior to taking this lock https://github.com/krakjoe/apcu/blob/PHP5/apc_cache.c#L1539.

@muxx
Copy link

@muxx muxx commented May 22, 2016

The same problem on php7.0.5 and apcu 5.1.2

@Zlender
Copy link

@Zlender Zlender commented Sep 26, 2016

@muxx do you have a script that would show how your reproduce it with php7. I was trying to do it the way it locks with php5.6 but were not able to.

@muxx
Copy link

@muxx muxx commented Sep 26, 2016

@Zlender Unfortunally I don't have script which can reproduce this case :( Problem appears on the workload at ~20-30 rps and can trigger after hour or after 3 hours of working or after 5 hours. And 20-30 rps prevent to understand what particular request causes the problem.

@Zlender
Copy link

@Zlender Zlender commented Sep 27, 2016

Instructions on how to reproduce this with PHP 5.6 https://github.com/Zlender/apcu_19 similar way also works on latest PHP 7 and apcu 5

@krakjoe krakjoe added bug and removed feedback labels Sep 29, 2016
@krakjoe krakjoe closed this in 9749246 Sep 29, 2016
krakjoe added a commit that referenced this issue Sep 29, 2016
@sarumpaet
Copy link

@sarumpaet sarumpaet commented Sep 29, 2016

@krakjoe I think that fix is too narrow? Similar code is in apc_cache.c/apc_cache_stat (APC_RLOCK, then array_init()) and several places in apc_iterator.c. Shouldn't basically all parts after any APC_RLOCK (and possibly APC_LOCK, too) be guarded by zend_try?

krakjoe added a commit that referenced this issue Sep 29, 2016
@krakjoe
Copy link
Owner

@krakjoe krakjoe commented Sep 29, 2016

@sarumpaet I think you are right, after a quick review I've added some more tries ...

I'm looking for feedback now on stability ... hopefully if nothing bad happens @remicollet can do a release in the coming days ...

@remicollet
Copy link
Collaborator

@remicollet remicollet commented Sep 29, 2016

hopefully if nothing bad happens @remicollet can do a release in the coming days ...

No problem, just ping me when ready.

krakjoe added a commit that referenced this issue Sep 29, 2016
@sarumpaet
Copy link

@sarumpaet sarumpaet commented Sep 29, 2016

I don't know how the zend_try mechanism works exactly - is it possible that some of the functions return ill defined values now in the out of memory case? E.g., zend_try { ... array_init(stat); ... } zend_end_try(); return stat; looks suspicious, possibly same for some boolean returns.

@krakjoe
Copy link
Owner

@krakjoe krakjoe commented Sep 29, 2016

Fixed that on last commit, forgot to tag issue ...

krakjoe added a commit that referenced this issue Oct 15, 2016
This reverts commit ac97f34.

for #203
krakjoe added a commit that referenced this issue Oct 15, 2016
This reverts commit 9749246.

for #203
@krakjoe krakjoe reopened this Oct 15, 2016
@krakjoe
Copy link
Owner

@krakjoe krakjoe commented Oct 15, 2016

Crap ...

@krakjoe krakjoe closed this in f0e9bed Oct 16, 2016
krakjoe added a commit that referenced this issue Oct 16, 2016
@patrickbajao
Copy link

@patrickbajao patrickbajao commented Nov 9, 2016

Hi,

Will there be a 4.0.12 release? I can see that the fix was already backported in the PHP5 branch but I don't know when it'll be released.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
You can’t perform that action at this time.