New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hung apaches on pthread wrlocks #19

Closed
anoakie opened this Issue Jun 14, 2013 · 51 comments

Comments

Projects
None yet
@anoakie

anoakie commented Jun 14, 2013

We seem to be getting hung apaches w/PHP 5.5.0 RC3 w/APCu.

0x00007f1a9572853d in pthread_rwlock_wrlock () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0 0x00007f1a9572853d in pthread_rwlock_wrlock () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f1a90debd49 in apc_lock_wlock (lock=) at /home/phpbuild/buildphp-5.5.0RC3/apcu/apc_lock.c:129
#2 0x00007f1a90df05b9 in apc_cache_insert (cache=0x7f1a970d2960, key=..., value=0x7f1a83591880, ctxt=0x7fff1c983920, t=1371246459, exclusive=0 '\000') at /home/phpbuild/buildphp-5.5.0RC3/apcu/apc_cache.c:837
#3 0x00007f1a90df0b85 in apc_cache_store (cache=0x7f1a970d2960, strkey=0x7f1a9778ccc0 "AutoLoad::prefix::6::xxxxxxxxxxxxxxxxxxx::xxxxxxxxxxxxxxxxxx", keylen=52, val=0x7f1a977b5238, ttl=600, exclusive=0 '\000')

at /home/phpbuild/buildphp-5.5.0RC3/apcu/apc_cache.c:443

#4 0x00007f1a90dedb5b in apc_store_helper (ht=, return_value=0x7f1a977b5208, exclusive=0 '\000', return_value_ptr=, this_ptr=, return_value_used=)

at /home/phpbuild/buildphp-5.5.0RC3/apcu/php_apc.c:568

#5 0x00007f1a93f80de6 in zend_do_fcall_common_helper_SPEC (execute_data=0x7f1a96df65b8) at /home/phpbuild/buildphp-5.5.0RC3/php-src-php-5.5.0RC3/Zend/zend_vm_execute.h:543
#6 0x00007f1a93f450d8 in execute_ex (execute_data=0x7f1a96df65b8) at /home/phpbuild/buildphp-5.5.0RC3/php-src-php-5.5.0RC3/Zend/zend_vm_execute.h:356
#7 0x00007f1a93ec8b70 in zend_call_function (fci=0x7fff1c983cb0, fci_cache=) at /home/phpbuild/buildphp-5.5.0RC3/php-src-php-5.5.0RC3/Zend/zend_execute_API.c:939
#8 0x00007f1a93eee3d7 in zend_call_method (object_pp=0x0, obj_ce=, fn_proxy=0x7f1a977137e0, function_name=0x7f1a977136a8 "autoload::loadone", function_name_len=,

retval_ptr_ptr=0x7fff1c983de0, param_count=1, arg1=0x7f1a97772900, arg2=0x0) at /home/phpbuild/buildphp-5.5.0RC3/php-src-php-5.5.0RC3/Zend/zend_interfaces.c:97

#9 0x00007f1a93dd8507 in zif_spl_autoload_call (ht=, return_value=, return_value_ptr=, this_ptr=, return_value_used=)

at /home/phpbuild/buildphp-5.5.0RC3/php-src-php-5.5.0RC3/ext/spl/php_spl.c:436

#10 0x00007f1a93ec8b14 in zend_call_function (fci=0x7fff1c983fc0, fci_cache=0x7fff1c984010) at /home/phpbuild/buildphp-5.5.0RC3/php-src-php-5.5.0RC3/Zend/zend_execute_API.c:957
#11 0x00007f1a93ec93fb in zend_lookup_class_ex (name=0x7f1a96f7d988 "xxxxxxxxx", name_length=9, key=0x7f1a977b52b0, use_autoload=1, ce=0x7fff1c9840d8)

at /home/phpbuild/buildphp-5.5.0RC3/php-src-php-5.5.0RC3/Zend/zend_execute_API.c:1107

#12 0x00007f1a93ec9b22 in zend_fetch_class_by_name (class_name=0x7f1a96f7d988 "xxxxxxxxx", class_name_len=, key=, fetch_type=4)

at /home/phpbuild/buildphp-5.5.0RC3/php-src-php-5.5.0RC3/Zend/zend_execute_API.c:1587

#13 0x00007f1a93f12c1a in ZEND_FETCH_CLASS_SPEC_CONST_HANDLER (execute_data=0x7f1a96df4ed8) at /home/phpbuild/buildphp-5.5.0RC3/php-src-php-5.5.0RC3/Zend/zend_vm_execute.h:1188
#14 0x00007f1a93f450d8 in execute_ex (execute_data=0x7f1a96df4ed8) at /home/phpbuild/buildphp-5.5.0RC3/php-src-php-5.5.0RC3/Zend/zend_vm_execute.h:356
#15 0x00007f1a93ed83b3 in zend_execute_scripts (type=8, retval=0x0, file_count=3) at /home/phpbuild/buildphp-5.5.0RC3/php-src-php-5.5.0RC3/Zend/zend.c:1316
#16 0x00007f1a93e7661c in php_execute_script (primary_file=0x7fff1c9865b0) at /home/phpbuild/buildphp-5.5.0RC3/php-src-php-5.5.0RC3/main/main.c:2481
#17 0x00007f1a93f8408d in php_handler (r=0x7f1a960f70a0) at /home/phpbuild/buildphp-5.5.0RC3/php-src-php-5.5.0RC3/sapi/apache2handler/sapi_apache2.c:667
#18 0x00007f1a96239508 in ap_run_handler ()
#19 0x00007f1a9623997e in ap_invoke_handler ()
#20 0x00007f1a96249570 in ap_process_request ()
#21 0x00007f1a96246398 in ?? ()
#22 0x00007f1a9623ffa8 in ap_run_process_connection ()
#23 0x00007f1a9624e1d0 in ?? ()
#24 0x00007f1a9624e93a in ?? ()
#25 0x00007f1a9624f4e7 in ap_mpm_run ()
#26 0x00007f1a962244a4 in main ()

@krakjoe

This comment has been minimized.

Show comment
Hide comment
@krakjoe

krakjoe Jun 25, 2013

Owner

Could we know a bit about the software you are running, or anything else about your setup that might help to reproduce the same behaviour ...

frame appears significant, but it's difficult to guess what has caused it ...

Owner

krakjoe commented Jun 25, 2013

Could we know a bit about the software you are running, or anything else about your setup that might help to reproduce the same behaviour ...

frame appears significant, but it's difficult to guess what has caused it ...

@weltling

This comment has been minimized.

Show comment
Hide comment
@weltling

weltling Jul 27, 2013

Collaborator

@anoakie ping ... any update on this, a repro script maybe?

Collaborator

weltling commented Jul 27, 2013

@anoakie ping ... any update on this, a repro script maybe?

@anoakie

This comment has been minimized.

Show comment
Hide comment
@anoakie

anoakie Jul 27, 2013

We tried with both PHP 5.5 RC3 and PHP 5.5.0 final. It's quite hard to reproduce. It happens to random servers in our apache cluster during heavy load every few days.
We're running Ubuntu's apache 2.2.22 prefork. We build a custom PHP 5.5.0 package.
We've reverted to PHP 5.3 + APC for now. I'm currently on sabbatical, so I'm not able to look into this further.
I'm adding my coworkers to this thread so they hopefully expand on this while I'm gone.

anoakie commented Jul 27, 2013

We tried with both PHP 5.5 RC3 and PHP 5.5.0 final. It's quite hard to reproduce. It happens to random servers in our apache cluster during heavy load every few days.
We're running Ubuntu's apache 2.2.22 prefork. We build a custom PHP 5.5.0 package.
We've reverted to PHP 5.3 + APC for now. I'm currently on sabbatical, so I'm not able to look into this further.
I'm adding my coworkers to this thread so they hopefully expand on this while I'm gone.

@denji

This comment has been minimized.

Show comment
Hide comment
@krakjoe

This comment has been minimized.

Show comment
Hide comment
@krakjoe

krakjoe Aug 12, 2013

Owner

@anoakie it's been a while ... I got to assume you still have the error (I think others have mentioned the same thing) ... without a way to reproduce I'm not able to debug ... it may be that your operating systems default rwlock is not suitable in some way (which we'll eventually find and fix when we can reproduce reliably and determine that to be definite cause) ... the only workaround I can suggest to get things going is to disable the use of rwlocks with --disable-apcu-rwlocks on configure for apcu ...

Owner

krakjoe commented Aug 12, 2013

@anoakie it's been a while ... I got to assume you still have the error (I think others have mentioned the same thing) ... without a way to reproduce I'm not able to debug ... it may be that your operating systems default rwlock is not suitable in some way (which we'll eventually find and fix when we can reproduce reliably and determine that to be definite cause) ... the only workaround I can suggest to get things going is to disable the use of rwlocks with --disable-apcu-rwlocks on configure for apcu ...

@magicmonkey

This comment has been minimized.

Show comment
Hide comment
@magicmonkey

magicmonkey Oct 10, 2013

We're experiencing what I think is this problem as well - all our Apache children hang, and a GDB trace shows this:

#0  0x00002b50542905f0 in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1  0x00002b505faf253d in apc_lock_wlock (lock=0x2b5071483088) at /usr/src/redhat/BUILD/php-pecl-apcu-4.0.2/apc_lock.c:252
#2  0x00002b505faf61a5 in apc_cache_insert (cache=0x2b5071449d60, key=..., value=0x2b5071943a38, ctxt=0x7fffeae5fc60, t=1381421665, exclusive=0 '\000')
    at /usr/src/redhat/BUILD/php-pecl-apcu-4.0.2/apc_cache.c:841
#3  0x00002b505faf64a5 in apc_cache_store (cache=0x2b5071449d60, 
    strkey=0x2b5067d86480 "/var/lamp/code/v-2013-10-r63/application-apps--/etc/environment.ini-config", keylen=75, val=0x2b5052a11250, ttl=3600, 
    exclusive=0 '\000') at /usr/src/redhat/BUILD/php-pecl-apcu-4.0.2/apc_cache.c:440
#4  0x00002b505faf3bff in apc_store_helper (ht=<value optimized out>, return_value=0x2b5067c137c0, return_value_ptr=<value optimized out>, 
    this_ptr=<value optimized out>, return_value_used=<value optimized out>, exclusive=0 '\000')
    at /usr/src/redhat/BUILD/php-pecl-apcu-4.0.2/php_apc.c:659
#5  0x00002b505d1aee59 in zend_do_fcall_common_helper_SPEC (execute_data=0x2b50527e5578) at /usr/src/redhat/BUILD/php-5.5.3/Zend/zend_vm_execute.h:543
#6  0x00002b505d1ed7f8 in execute_ex (execute_data=0x2b50527e5578) at /usr/src/redhat/BUILD/php-5.5.3/Zend/zend_vm_execute.h:356
#7  0x00002b505d17b7da in zend_execute_scripts (type=8, retval=0x0, file_count=3) at /usr/src/redhat/BUILD/php-5.5.3/Zend/zend.c:1316
#8  0x00002b505d11c699 in php_execute_script (primary_file=0x7fffeae62340) at /usr/src/redhat/BUILD/php-5.5.3/main/main.c:2484
#9  0x00002b505d22828d in php_handler (r=0x2b509186a968) at /usr/src/redhat/BUILD/php-5.5.3/sapi/apache2handler/sapi_apache2.c:667
#10 0x00002b50527a5e8a in ap_run_handler ()
#11 0x00002b50527a9318 in ap_invoke_handler ()
#12 0x00002b50527b3c7a in ap_internal_redirect ()
#13 0x00002b505b5f7c70 in ap_make_dirstr_parent () from /etc/httpd/modules/mod_rewrite.so
#14 0x00002b50527a5e8a in ap_run_handler ()
#15 0x00002b50527a9318 in ap_invoke_handler ()
#16 0x00002b50527b3e28 in ap_process_request ()
#17 0x00002b50527b1010 in ?? ()
#18 0x00002b50527ad112 in ap_run_process_connection ()
#19 0x00002b50527b82c9 in ?? ()
#20 0x00002b50527b855a in ?? ()
#21 0x00002b50527b8dbd in ap_mpm_run ()
#22 0x00002b5052792fd8 in main ()

This is on a not-particularly-busy server. There are a couple of places in our code where we write to APCu, and various Apache children are stuck on writing various different keys (ie they're at different points in our code but all are hung when writing to APCu)

I'm guessing something has obtained the futex and not released it for some reason, but I'm not sure how to find out what.

Here's the APC bit from "php -i" (I'm pretty sure we're using the same config in Apache as on the CLI):

apc

APC support => Emulated

apcu

APCu Support => enabled
Version => 4.0.2
APCu Debugging => Disabled
MMAP Support => Enabled
MMAP File Mask => /tmp/apc.Zk0HWP
Serialization Support => php, eval
Revision => $Revision: 328290 $
Build Date => Oct  9 2013 10:17:53

Directive => Local Value => Master Value
apc.coredump_unmap => Off => Off
apc.enable_cli => On => On
apc.enabled => On => On
apc.entries_hint => 8192 => 8192
apc.gc_ttl => 3600 => 3600
apc.mmap_file_mask => /tmp/apc.Zk0HWP => /tmp/apc.Zk0HWP
apc.preload_path => no value => no value
apc.rfc1867 => Off => Off
apc.rfc1867_freq => 0 => 0
apc.rfc1867_name => APC_UPLOAD_PROGRESS => APC_UPLOAD_PROGRESS
apc.rfc1867_prefix => upload_ => upload_
apc.rfc1867_ttl => 3600 => 3600
apc.serializer => php => php
apc.shm_segments => 1 => 1
apc.shm_size => 512M => 512M
apc.slam_defense => On => On
apc.smart => 0 => 0
apc.ttl => 7200 => 7200
apc.use_request_time => On => On
apc.writable => /tmp => /tmp

magicmonkey commented Oct 10, 2013

We're experiencing what I think is this problem as well - all our Apache children hang, and a GDB trace shows this:

#0  0x00002b50542905f0 in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1  0x00002b505faf253d in apc_lock_wlock (lock=0x2b5071483088) at /usr/src/redhat/BUILD/php-pecl-apcu-4.0.2/apc_lock.c:252
#2  0x00002b505faf61a5 in apc_cache_insert (cache=0x2b5071449d60, key=..., value=0x2b5071943a38, ctxt=0x7fffeae5fc60, t=1381421665, exclusive=0 '\000')
    at /usr/src/redhat/BUILD/php-pecl-apcu-4.0.2/apc_cache.c:841
#3  0x00002b505faf64a5 in apc_cache_store (cache=0x2b5071449d60, 
    strkey=0x2b5067d86480 "/var/lamp/code/v-2013-10-r63/application-apps--/etc/environment.ini-config", keylen=75, val=0x2b5052a11250, ttl=3600, 
    exclusive=0 '\000') at /usr/src/redhat/BUILD/php-pecl-apcu-4.0.2/apc_cache.c:440
#4  0x00002b505faf3bff in apc_store_helper (ht=<value optimized out>, return_value=0x2b5067c137c0, return_value_ptr=<value optimized out>, 
    this_ptr=<value optimized out>, return_value_used=<value optimized out>, exclusive=0 '\000')
    at /usr/src/redhat/BUILD/php-pecl-apcu-4.0.2/php_apc.c:659
#5  0x00002b505d1aee59 in zend_do_fcall_common_helper_SPEC (execute_data=0x2b50527e5578) at /usr/src/redhat/BUILD/php-5.5.3/Zend/zend_vm_execute.h:543
#6  0x00002b505d1ed7f8 in execute_ex (execute_data=0x2b50527e5578) at /usr/src/redhat/BUILD/php-5.5.3/Zend/zend_vm_execute.h:356
#7  0x00002b505d17b7da in zend_execute_scripts (type=8, retval=0x0, file_count=3) at /usr/src/redhat/BUILD/php-5.5.3/Zend/zend.c:1316
#8  0x00002b505d11c699 in php_execute_script (primary_file=0x7fffeae62340) at /usr/src/redhat/BUILD/php-5.5.3/main/main.c:2484
#9  0x00002b505d22828d in php_handler (r=0x2b509186a968) at /usr/src/redhat/BUILD/php-5.5.3/sapi/apache2handler/sapi_apache2.c:667
#10 0x00002b50527a5e8a in ap_run_handler ()
#11 0x00002b50527a9318 in ap_invoke_handler ()
#12 0x00002b50527b3c7a in ap_internal_redirect ()
#13 0x00002b505b5f7c70 in ap_make_dirstr_parent () from /etc/httpd/modules/mod_rewrite.so
#14 0x00002b50527a5e8a in ap_run_handler ()
#15 0x00002b50527a9318 in ap_invoke_handler ()
#16 0x00002b50527b3e28 in ap_process_request ()
#17 0x00002b50527b1010 in ?? ()
#18 0x00002b50527ad112 in ap_run_process_connection ()
#19 0x00002b50527b82c9 in ?? ()
#20 0x00002b50527b855a in ?? ()
#21 0x00002b50527b8dbd in ap_mpm_run ()
#22 0x00002b5052792fd8 in main ()

This is on a not-particularly-busy server. There are a couple of places in our code where we write to APCu, and various Apache children are stuck on writing various different keys (ie they're at different points in our code but all are hung when writing to APCu)

I'm guessing something has obtained the futex and not released it for some reason, but I'm not sure how to find out what.

Here's the APC bit from "php -i" (I'm pretty sure we're using the same config in Apache as on the CLI):

apc

APC support => Emulated

apcu

APCu Support => enabled
Version => 4.0.2
APCu Debugging => Disabled
MMAP Support => Enabled
MMAP File Mask => /tmp/apc.Zk0HWP
Serialization Support => php, eval
Revision => $Revision: 328290 $
Build Date => Oct  9 2013 10:17:53

Directive => Local Value => Master Value
apc.coredump_unmap => Off => Off
apc.enable_cli => On => On
apc.enabled => On => On
apc.entries_hint => 8192 => 8192
apc.gc_ttl => 3600 => 3600
apc.mmap_file_mask => /tmp/apc.Zk0HWP => /tmp/apc.Zk0HWP
apc.preload_path => no value => no value
apc.rfc1867 => Off => Off
apc.rfc1867_freq => 0 => 0
apc.rfc1867_name => APC_UPLOAD_PROGRESS => APC_UPLOAD_PROGRESS
apc.rfc1867_prefix => upload_ => upload_
apc.rfc1867_ttl => 3600 => 3600
apc.serializer => php => php
apc.shm_segments => 1 => 1
apc.shm_size => 512M => 512M
apc.slam_defense => On => On
apc.smart => 0 => 0
apc.ttl => 7200 => 7200
apc.use_request_time => On => On
apc.writable => /tmp => /tmp
@magicmonkey

This comment has been minimized.

Show comment
Hide comment
@magicmonkey

magicmonkey Oct 10, 2013

(gdb) frame 1
#1  0x00002b505faf253d in apc_lock_wlock (lock=0x2b5071483088) at /usr/src/redhat/BUILD/php-pecl-apcu-4.0.2/apc_lock.c:252
252 /usr/src/redhat/BUILD/php-pecl-apcu-4.0.2/apc_lock.c: No such file or directory.
    in /usr/src/redhat/BUILD/php-pecl-apcu-4.0.2/apc_lock.c
(gdb) p *lock
$1 = {__data = {__lock = 0, __nr_readers = 4294967293, __readers_wakeup = 6, __writer_wakeup = 5, __nr_readers_queued = 0, __nr_writers_queued = 50, 
    __writer = 0, __shared = 128, __pad1 = 0, __pad2 = 0, __flags = 0}, 
  __size = "\000\000\000\000\375\377\377\377\006\000\000\000\005\000\000\000\000\000\000\000\062\000\000\000\000\000\000\000\200", '\000' <repeats 26 times>, __align = -12884901888}

magicmonkey commented Oct 10, 2013

(gdb) frame 1
#1  0x00002b505faf253d in apc_lock_wlock (lock=0x2b5071483088) at /usr/src/redhat/BUILD/php-pecl-apcu-4.0.2/apc_lock.c:252
252 /usr/src/redhat/BUILD/php-pecl-apcu-4.0.2/apc_lock.c: No such file or directory.
    in /usr/src/redhat/BUILD/php-pecl-apcu-4.0.2/apc_lock.c
(gdb) p *lock
$1 = {__data = {__lock = 0, __nr_readers = 4294967293, __readers_wakeup = 6, __writer_wakeup = 5, __nr_readers_queued = 0, __nr_writers_queued = 50, 
    __writer = 0, __shared = 128, __pad1 = 0, __pad2 = 0, __flags = 0}, 
  __size = "\000\000\000\000\375\377\377\377\006\000\000\000\005\000\000\000\000\000\000\000\062\000\000\000\000\000\000\000\200", '\000' <repeats 26 times>, __align = -12884901888}
@rathers

This comment has been minimized.

Show comment
Hide comment
@rathers

rathers Oct 14, 2013

I believe I have managed to reproduce this issue with a test script and some load injection. I've uploaded the test script and instructions here:

https://github.com/rathers/apcu-repro

Essentially the test script does a load of apc_store and apc_fetch calls against the same key. Loading this at increasing rates eventually triggers the number of apache children to spiral out of control and lock up.

rathers commented Oct 14, 2013

I believe I have managed to reproduce this issue with a test script and some load injection. I've uploaded the test script and instructions here:

https://github.com/rathers/apcu-repro

Essentially the test script does a load of apc_store and apc_fetch calls against the same key. Loading this at increasing rates eventually triggers the number of apache children to spiral out of control and lock up.

krakjoe added a commit that referenced this issue Oct 14, 2013

@krakjoe

This comment has been minimized.

Show comment
Hide comment
@krakjoe

krakjoe Oct 14, 2013

Owner

@rathers Thanks for your effort ... got some time this evening to have a look, think we got it ... can I get some feedback ??

Owner

krakjoe commented Oct 14, 2013

@rathers Thanks for your effort ... got some time this evening to have a look, think we got it ... can I get some feedback ??

@glenjamin

This comment has been minimized.

Show comment
Hide comment
@glenjamin

glenjamin Oct 14, 2013

Hi, I'm trying to get a handle on what has changed in that diff to reason about how the fix works.

As far as I can tell, it:

  • removes eval serialiser (unrelated)
  • ensures apache cannot kill a thread while it has a lock
  • checks the state of cache-busy more often

Under what circumstances is the cache considered "busy"?

glenjamin commented Oct 14, 2013

Hi, I'm trying to get a handle on what has changed in that diff to reason about how the fix works.

As far as I can tell, it:

  • removes eval serialiser (unrelated)
  • ensures apache cannot kill a thread while it has a lock
  • checks the state of cache-busy more often

Under what circumstances is the cache considered "busy"?

@krakjoe

This comment has been minimized.

Show comment
Hide comment
@krakjoe

krakjoe Oct 14, 2013

Owner

The eval serializer is unrelated, it was out in the wild for testing, and caused more harm than good ... so it's gotta go ...

The cache is considered busy during gc, which is invoked implicitly throughout, sometimes by apc itself, sometimes by the allocator underneath ... allowing php to call a function that operates on a busy cache is futile ...

The thing that fixes the problem with apache is utilizing [un]block interruptions on locks, just like APC did ...

Owner

krakjoe commented Oct 14, 2013

The eval serializer is unrelated, it was out in the wild for testing, and caused more harm than good ... so it's gotta go ...

The cache is considered busy during gc, which is invoked implicitly throughout, sometimes by apc itself, sometimes by the allocator underneath ... allowing php to call a function that operates on a busy cache is futile ...

The thing that fixes the problem with apache is utilizing [un]block interruptions on locks, just like APC did ...

@rathers

This comment has been minimized.

Show comment
Hide comment
@rathers

rathers Oct 15, 2013

@krakjoe, thanks for the commit. We've rebuilt apcu including this commit and deployed it to a test box. It hasn't made any difference :( We're still seeing the same locking problem, reproducible in exactly the same way as before. Any further ideas to try out?

rathers commented Oct 15, 2013

@krakjoe, thanks for the commit. We've rebuilt apcu including this commit and deployed it to a test box. It hasn't made any difference :( We're still seeing the same locking problem, reproducible in exactly the same way as before. Any further ideas to try out?

@krakjoe

This comment has been minimized.

Show comment
Hide comment
@krakjoe

krakjoe Oct 15, 2013

Owner

Damn ... I have that pulse you get under your eye when things get a bit stressful ...

I'm a bit baffled, let me have some thinking time ...

I'm not able to make apache spiral out of control anymore, the number of processes remain steady and apache remains responsive but slow ...

Owner

krakjoe commented Oct 15, 2013

Damn ... I have that pulse you get under your eye when things get a bit stressful ...

I'm a bit baffled, let me have some thinking time ...

I'm not able to make apache spiral out of control anymore, the number of processes remain steady and apache remains responsive but slow ...

@rathers

This comment has been minimized.

Show comment
Hide comment
@rathers

rathers Oct 15, 2013

What version of apache are you using? We're still on 2.2 which may or may not make a difference!

rathers commented Oct 15, 2013

What version of apache are you using? We're still on 2.2 which may or may not make a difference!

@krakjoe

This comment has been minimized.

Show comment
Hide comment
@krakjoe

krakjoe Oct 15, 2013

Owner
[joe@fiji php-src]$ httpd-nts -v
Server version: Apache/2.2.23 (Unix)
Server built:   Mar 15 2013 10:39:59
[joe@fiji php-src]$ httpd-zts -v
Server version: Apache/2.2.23 (Unix)
Server built:   Sep 16 2013 09:55:35
Owner

krakjoe commented Oct 15, 2013

[joe@fiji php-src]$ httpd-nts -v
Server version: Apache/2.2.23 (Unix)
Server built:   Mar 15 2013 10:39:59
[joe@fiji php-src]$ httpd-zts -v
Server version: Apache/2.2.23 (Unix)
Server built:   Sep 16 2013 09:55:35
@krakjoe

This comment has been minimized.

Show comment
Hide comment
@krakjoe

krakjoe Oct 15, 2013

Owner

try now ??

bit of a mistake on my part there, and an omission ... this has got to work ...

Owner

krakjoe commented Oct 15, 2013

try now ??

bit of a mistake on my part there, and an omission ... this has got to work ...

@krakjoe

This comment has been minimized.

Show comment
Hide comment
@krakjoe

krakjoe Oct 15, 2013

Owner

here is the options I am running http_load with:

./http_load -parallel 100 -rate 10 -seconds 100 apc-bench

I've run it several times (lots and lots and still am) like that without a problem ... is that enough ??

Owner

krakjoe commented Oct 15, 2013

here is the options I am running http_load with:

./http_load -parallel 100 -rate 10 -seconds 100 apc-bench

I've run it several times (lots and lots and still am) like that without a problem ... is that enough ??

@rathers

This comment has been minimized.

Show comment
Hide comment
@rathers

rathers Oct 15, 2013

That looks a lot higher than I've managed to achieve. How beefy is your box? Try it without -parallel as I'm not sure which would take precedence.

We're just redeploying APCu with your latest patch then will try again

rathers commented Oct 15, 2013

That looks a lot higher than I've managed to achieve. How beefy is your box? Try it without -parallel as I'm not sure which would take precedence.

We're just redeploying APCu with your latest patch then will try again

@krakjoe

This comment has been minimized.

Show comment
Hide comment
@krakjoe

krakjoe Oct 15, 2013

Owner

8 (4+HT) cores @ 3.4ghz with 16GB DDR3 ... when I run it more agressively (x10) the amount of processes spawned to handle requests does shoot up, but they are all, eventually, shutdown gracefully and so cannot be blocking waiting to acquire a mutex ...

Owner

krakjoe commented Oct 15, 2013

8 (4+HT) cores @ 3.4ghz with 16GB DDR3 ... when I run it more agressively (x10) the amount of processes spawned to handle requests does shoot up, but they are all, eventually, shutdown gracefully and so cannot be blocking waiting to acquire a mutex ...

@krakjoe

This comment has been minimized.

Show comment
Hide comment
@krakjoe

krakjoe Oct 15, 2013

Owner

@rathers I can't stand the suspense ??

Owner

krakjoe commented Oct 15, 2013

@rathers I can't stand the suspense ??

@rathers

This comment has been minimized.

Show comment
Hide comment
@rathers

rathers Oct 15, 2013

I'm starting to get very confused. I think there may be two things at play here:

1 - apache lockups
2 - apache procs exploding in number

(2) is still happenning, even on the latest build.
(1) Seems to be happening sometimes depending on the ini settings!

It is possible we only ever encountered (1) because (2) can occur, I'm really not sure.

If we set apc.ttl and apc.gc_ttl to 0 (as opposed to some number) then (1) doesn't happen. The apache procs will recover and reduce in number if we remove the load.

I've been trying to think what would cause (2) and wonder if there is some hard limit in APC/PHP/Apache/Linux somewhere that limits how many locks can be processed in a second. Do you think the kernel could have anything to do with it? Our test boxes are fairly old (CentOS 5.9) with a 2.6.18 kernel.

WDYT?

rathers commented Oct 15, 2013

I'm starting to get very confused. I think there may be two things at play here:

1 - apache lockups
2 - apache procs exploding in number

(2) is still happenning, even on the latest build.
(1) Seems to be happening sometimes depending on the ini settings!

It is possible we only ever encountered (1) because (2) can occur, I'm really not sure.

If we set apc.ttl and apc.gc_ttl to 0 (as opposed to some number) then (1) doesn't happen. The apache procs will recover and reduce in number if we remove the load.

I've been trying to think what would cause (2) and wonder if there is some hard limit in APC/PHP/Apache/Linux somewhere that limits how many locks can be processed in a second. Do you think the kernel could have anything to do with it? Our test boxes are fairly old (CentOS 5.9) with a 2.6.18 kernel.

WDYT?

@krakjoe

This comment has been minimized.

Show comment
Hide comment
@krakjoe

krakjoe Oct 15, 2013

Owner

This is APC:

[joe@fiji http_load-12mar2006]$ ./http_load -parallel 1000 -rate 100 -seconds 10 apc-bench
21 fetches, 979 max parallel, 147 bytes, in 10.0003 seconds
7 mean bytes/connection
2.09993 fetches/sec, 14.6995 bytes/sec
msecs/connect: 0.132667 mean, 0.224 max, 0.086 min
msecs/first-response: 3733.66 mean, 9703.39 max, 1079.31 min
HTTP response codes:
  code 200 -- 21

This is APCu:

[joe@fiji http_load-12mar2006]$ ./http_load -parallel 1000 -rate 100 -seconds 10 apc-bench
161 fetches, 839 max parallel, 4.00724e+07 bytes, in 10.0004 seconds
248897 mean bytes/connection
16.0994 fetches/sec, 4.0071e+06 bytes/sec
msecs/connect: 0.132323 mean, 0.216 max, 0.082 min
msecs/first-response: 1676.21 mean, 3298.75 max, 3.642 min
HTTP response codes:
  code 200 -- 161

They both behave with regard to apache processes in the same way now, lots are created but eventually shutdown.

I don't think there is a bug present anymore, tried comparing behaviour with normal APC ??

Owner

krakjoe commented Oct 15, 2013

This is APC:

[joe@fiji http_load-12mar2006]$ ./http_load -parallel 1000 -rate 100 -seconds 10 apc-bench
21 fetches, 979 max parallel, 147 bytes, in 10.0003 seconds
7 mean bytes/connection
2.09993 fetches/sec, 14.6995 bytes/sec
msecs/connect: 0.132667 mean, 0.224 max, 0.086 min
msecs/first-response: 3733.66 mean, 9703.39 max, 1079.31 min
HTTP response codes:
  code 200 -- 21

This is APCu:

[joe@fiji http_load-12mar2006]$ ./http_load -parallel 1000 -rate 100 -seconds 10 apc-bench
161 fetches, 839 max parallel, 4.00724e+07 bytes, in 10.0004 seconds
248897 mean bytes/connection
16.0994 fetches/sec, 4.0071e+06 bytes/sec
msecs/connect: 0.132323 mean, 0.216 max, 0.082 min
msecs/first-response: 1676.21 mean, 3298.75 max, 3.642 min
HTTP response codes:
  code 200 -- 161

They both behave with regard to apache processes in the same way now, lots are created but eventually shutdown.

I don't think there is a bug present anymore, tried comparing behaviour with normal APC ??

krakjoe added a commit that referenced this issue Oct 15, 2013

@krakjoe

This comment has been minimized.

Show comment
Hide comment
@krakjoe

krakjoe Oct 15, 2013

Owner

This is APCu now:

[joe@fiji http_load-12mar2006]$ ./http_load -parallel 1000 -rate 100 -seconds 10 apc-bench
173 fetches, 827 max parallel, 4.30592e+07 bytes, in 10.0005 seconds
248897 mean bytes/connection
17.2991 fetches/sec, 4.3057e+06 bytes/sec
msecs/connect: 0.121665 mean, 0.231 max, 0.071 min
msecs/first-response: 1792.06 mean, 3494.57 max, 3.738 min
HTTP response codes:
  code 200 -- 173
Owner

krakjoe commented Oct 15, 2013

This is APCu now:

[joe@fiji http_load-12mar2006]$ ./http_load -parallel 1000 -rate 100 -seconds 10 apc-bench
173 fetches, 827 max parallel, 4.30592e+07 bytes, in 10.0005 seconds
248897 mean bytes/connection
17.2991 fetches/sec, 4.3057e+06 bytes/sec
msecs/connect: 0.121665 mean, 0.231 max, 0.071 min
msecs/first-response: 1792.06 mean, 3494.57 max, 3.738 min
HTTP response codes:
  code 200 -- 173
@krakjoe

This comment has been minimized.

Show comment
Hide comment
@krakjoe

krakjoe Oct 15, 2013

Owner

something does smell a bit fishy about that result ... I'm still looking, haven't given up ...

Owner

krakjoe commented Oct 15, 2013

something does smell a bit fishy about that result ... I'm still looking, haven't given up ...

krakjoe added a commit that referenced this issue Oct 15, 2013

@rathers

This comment has been minimized.

Show comment
Hide comment
@rathers

rathers Oct 15, 2013

When you're comparing APC and APCu are they using different PHP versions?

rathers commented Oct 15, 2013

When you're comparing APC and APCu are they using different PHP versions?

@rathers

This comment has been minimized.

Show comment
Hide comment
@rathers

rathers Oct 15, 2013

The apache process explosion thing is quite subtle and hard to explain but using -rate 100 wont reveal it as it's too heavyweight. I found there was an implicit maximum rate that could be sustained (in our case around -rate 3 or 4) with apache beahving perfectly normal with a just a few busy procs and response times constant at around 300ms. Then just by injecting a few manual requests with curl (and i really do mean a few, like 3 or 4!) the whole thing then explodes. Response times increase into multiple seconds and beyond and apache spawns procs until it hits MaxClients. This just from a few extra requests! Never seen apache behave like that before.

Depending on the setup (APCu version, compile settings, ini settings etc) the apache procs may or may not recover

rathers commented Oct 15, 2013

The apache process explosion thing is quite subtle and hard to explain but using -rate 100 wont reveal it as it's too heavyweight. I found there was an implicit maximum rate that could be sustained (in our case around -rate 3 or 4) with apache beahving perfectly normal with a just a few busy procs and response times constant at around 300ms. Then just by injecting a few manual requests with curl (and i really do mean a few, like 3 or 4!) the whole thing then explodes. Response times increase into multiple seconds and beyond and apache spawns procs until it hits MaxClients. This just from a few extra requests! Never seen apache behave like that before.

Depending on the setup (APCu version, compile settings, ini settings etc) the apache procs may or may not recover

@krakjoe

This comment has been minimized.

Show comment
Hide comment
@krakjoe

krakjoe Oct 15, 2013

Owner

I have adjusted a few things in the last few minutes, we should be completely consistent with apc now ... I get bad byte count messages though under extreme load ...

I am using same version, their timings are about now equal as we should expect them to be ...

I'm a little/lot tired, I need thinking time, I'll have to come back to it tomorrow ...

Thanks for your patience ... you're going to need some more :)

Owner

krakjoe commented Oct 15, 2013

I have adjusted a few things in the last few minutes, we should be completely consistent with apc now ... I get bad byte count messages though under extreme load ...

I am using same version, their timings are about now equal as we should expect them to be ...

I'm a little/lot tired, I need thinking time, I'll have to come back to it tomorrow ...

Thanks for your patience ... you're going to need some more :)

@fauvel

This comment has been minimized.

Show comment
Hide comment
@fauvel

fauvel Apr 10, 2014

We are experiencing the same issue and could reproduce it with a very simple CLI script – so it's not directly Apache related!

<?php

$text = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.';

$i = 0;
while (true) {
    echo "\r" . ++$i;
    apc_add((string) $i, $text);
}

The script hangs up after only 13'248 iterations if apc.shm_size has the "magical" value 16M. It runs through (so to say) with 15M!

The good thing is that you can reproduce it on our vagrant box:

git clone git@github.com:cargomedia/CM.git
cd CM
vagrant up
vagrant ssh # from now on in the vagrant box
sudo apt-get install emacs
sudo emacs /etc/php5/cli/conf.d/20-apcu.ini # set apc.shm_size to 16M
emacs apc.php # copy-paste the above script
php apc.php # hangs up

To get the backtrace in the vagrant box, run:

sudo apt-get install gdb
gdb # from now on in the gdb console
file /usr/bin/php5
run apc.php
^C # Interrupt the process hanging up
backtrace full

The relevant part of the backtrace is:

#0  0x00007ffff4d1cabd in pthread_rwlock_wrlock () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#1  0x00007ffff3b823b4 in apc_lock_wlock (lock=<optimized out>) at /tmp/91cdb7ca50775aa4c7a277127669e38b/apcu-4.0.4/apc_lock.c:252
No locals.
#2  0x00007ffff3b86cd8 in apc_cache_insert (cache=cache@entry=0xfe76f0, key=..., value=0x7fffe9222438, ctxt=ctxt@entry=0x7fffffffaac0, t=t@entry=1397131216, exclusive=exclusive@entry=1 '\001')
    at /tmp/91cdb7ca50775aa4c7a277127669e38b/apcu-4.0.4/apc_cache.c:736
No locals.
#3  0x00007ffff3b87333 in apc_cache_store (cache=0xfe76f0, strkey=0x7ffff7e72410 "13248", keylen=6, val=0x7ffff7e70960, ttl=0, exclusive=exclusive@entry=1 '\001')
    at /tmp/91cdb7ca50775aa4c7a277127669e38b/apcu-4.0.4/apc_cache.c:335
        entry = <optimized out>
        key = {str = 0x7ffff7e72410 "13248", len = 6, h = 6951369112743, mtime = 1397131216, owner = -21640}
        t = 1397131216
        ctxt = {pool = 0x7fffe92223a8, copy = APC_COPY_IN, force_update = 0, copied = {nTableSize = 0, nTableMask = 0, nNumOfElements = 0, nNextFreeElement = 0, pInternalPointer = 0x0, 
            pListHead = 0x0, pListTail = 0x0, arBuckets = 0x0, pDestructor = 0, persistent = 0 '\000', nApplyCount = 0 '\000', bApplyProtection = 0 '\000'}, serializer = 0x7ffff3d921a0, 
          key = 0x7fffffffaa90}
        ret = 0 '\000'
#4  0x00007ffff3b8409b in apc_store_helper (ht=<optimized out>, return_value=0x7ffff7e72780, exclusive=<optimized out>, 
    return_value_used=<error reading variable: Unhandled dwarf expression opcode 0xfa>, this_ptr=<error reading variable: Unhandled dwarf expression opcode 0xfa>, 
    return_value_ptr=<error reading variable: Unhandled dwarf expression opcode 0xfa>) at /tmp/91cdb7ca50775aa4c7a277127669e38b/apcu-4.0.4/php_apc.c:662
        key = 0x7ffff7e70900
        val = 0x7ffff7e70960
        ttl = 0

I hope this can help figuring out what's happening!

fauvel commented Apr 10, 2014

We are experiencing the same issue and could reproduce it with a very simple CLI script – so it's not directly Apache related!

<?php

$text = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.';

$i = 0;
while (true) {
    echo "\r" . ++$i;
    apc_add((string) $i, $text);
}

The script hangs up after only 13'248 iterations if apc.shm_size has the "magical" value 16M. It runs through (so to say) with 15M!

The good thing is that you can reproduce it on our vagrant box:

git clone git@github.com:cargomedia/CM.git
cd CM
vagrant up
vagrant ssh # from now on in the vagrant box
sudo apt-get install emacs
sudo emacs /etc/php5/cli/conf.d/20-apcu.ini # set apc.shm_size to 16M
emacs apc.php # copy-paste the above script
php apc.php # hangs up

To get the backtrace in the vagrant box, run:

sudo apt-get install gdb
gdb # from now on in the gdb console
file /usr/bin/php5
run apc.php
^C # Interrupt the process hanging up
backtrace full

The relevant part of the backtrace is:

#0  0x00007ffff4d1cabd in pthread_rwlock_wrlock () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#1  0x00007ffff3b823b4 in apc_lock_wlock (lock=<optimized out>) at /tmp/91cdb7ca50775aa4c7a277127669e38b/apcu-4.0.4/apc_lock.c:252
No locals.
#2  0x00007ffff3b86cd8 in apc_cache_insert (cache=cache@entry=0xfe76f0, key=..., value=0x7fffe9222438, ctxt=ctxt@entry=0x7fffffffaac0, t=t@entry=1397131216, exclusive=exclusive@entry=1 '\001')
    at /tmp/91cdb7ca50775aa4c7a277127669e38b/apcu-4.0.4/apc_cache.c:736
No locals.
#3  0x00007ffff3b87333 in apc_cache_store (cache=0xfe76f0, strkey=0x7ffff7e72410 "13248", keylen=6, val=0x7ffff7e70960, ttl=0, exclusive=exclusive@entry=1 '\001')
    at /tmp/91cdb7ca50775aa4c7a277127669e38b/apcu-4.0.4/apc_cache.c:335
        entry = <optimized out>
        key = {str = 0x7ffff7e72410 "13248", len = 6, h = 6951369112743, mtime = 1397131216, owner = -21640}
        t = 1397131216
        ctxt = {pool = 0x7fffe92223a8, copy = APC_COPY_IN, force_update = 0, copied = {nTableSize = 0, nTableMask = 0, nNumOfElements = 0, nNextFreeElement = 0, pInternalPointer = 0x0, 
            pListHead = 0x0, pListTail = 0x0, arBuckets = 0x0, pDestructor = 0, persistent = 0 '\000', nApplyCount = 0 '\000', bApplyProtection = 0 '\000'}, serializer = 0x7ffff3d921a0, 
          key = 0x7fffffffaa90}
        ret = 0 '\000'
#4  0x00007ffff3b8409b in apc_store_helper (ht=<optimized out>, return_value=0x7ffff7e72780, exclusive=<optimized out>, 
    return_value_used=<error reading variable: Unhandled dwarf expression opcode 0xfa>, this_ptr=<error reading variable: Unhandled dwarf expression opcode 0xfa>, 
    return_value_ptr=<error reading variable: Unhandled dwarf expression opcode 0xfa>) at /tmp/91cdb7ca50775aa4c7a277127669e38b/apcu-4.0.4/php_apc.c:662
        key = 0x7ffff7e70900
        val = 0x7ffff7e70960
        ttl = 0

I hope this can help figuring out what's happening!

@bogdan-plevit

This comment has been minimized.

Show comment
Hide comment
@bogdan-plevit

bogdan-plevit Oct 29, 2014

Hello,

We are experiencing the same issue described by rathers. In our setup we are using Nginx (version: nginx/1.7.3 ) and PHP-FPM (version 5.5.18). The APCu (version: 4.0.6 Revision: 328290) extension is loaded in PHP-FPM. This setup is used for now for our development infrastructure.

We have replicated this issue using the steps described here: https://github.com/rathers/apcu-repro. In our development setup we are using PHP-FPM with multiple pools and we have managed to lock up only the pool who is processing the requests from the http_load tool. The other pools are processing requests fine until they hit the same issue.

If we run http_load at a relatively low rate (10) for a couple of times, we are experiencing also an increase in response times when we run: time curl http://localhost/apc-bench.php.

We are planning to migrate our production environment (which also uses Nginx) from PHP-FPM (version 5.3.29) and APC (version 3.1.13) and we would like to use APCu as a drop-in replacement.

We are willing to provide any information or do debugging on the development environment.

Here is the output of php -i|grep -iE "^apc":
apc
APC support => Emulated
apcu
APCu Support => Disabled
APCu Debugging => Disabled
apc.coredump_unmap => Off => Off
apc.enable_cli => Off => Off
apc.enabled => On => On
apc.entries_hint => 4096 => 4096
apc.gc_ttl => 3600 => 3600
apc.mmap_file_mask => no value => no value
apc.preload_path => no value => no value
apc.rfc1867 => Off => Off
apc.rfc1867_freq => 0 => 0
apc.rfc1867_name => APC_UPLOAD_PROGRESS => APC_UPLOAD_PROGRESS
apc.rfc1867_prefix => upload_ => upload_
apc.rfc1867_ttl => 3600 => 3600
apc.serializer => php => php
apc.shm_segments => 1 => 1
apc.shm_size => 2048M => 2048M
apc.slam_defense => On => On
apc.smart => 0 => 0
apc.ttl => 1800 => 1800
apc.use_request_time => On => On
apc.writable => /tmp => /tmp

bogdan-plevit commented Oct 29, 2014

Hello,

We are experiencing the same issue described by rathers. In our setup we are using Nginx (version: nginx/1.7.3 ) and PHP-FPM (version 5.5.18). The APCu (version: 4.0.6 Revision: 328290) extension is loaded in PHP-FPM. This setup is used for now for our development infrastructure.

We have replicated this issue using the steps described here: https://github.com/rathers/apcu-repro. In our development setup we are using PHP-FPM with multiple pools and we have managed to lock up only the pool who is processing the requests from the http_load tool. The other pools are processing requests fine until they hit the same issue.

If we run http_load at a relatively low rate (10) for a couple of times, we are experiencing also an increase in response times when we run: time curl http://localhost/apc-bench.php.

We are planning to migrate our production environment (which also uses Nginx) from PHP-FPM (version 5.3.29) and APC (version 3.1.13) and we would like to use APCu as a drop-in replacement.

We are willing to provide any information or do debugging on the development environment.

Here is the output of php -i|grep -iE "^apc":
apc
APC support => Emulated
apcu
APCu Support => Disabled
APCu Debugging => Disabled
apc.coredump_unmap => Off => Off
apc.enable_cli => Off => Off
apc.enabled => On => On
apc.entries_hint => 4096 => 4096
apc.gc_ttl => 3600 => 3600
apc.mmap_file_mask => no value => no value
apc.preload_path => no value => no value
apc.rfc1867 => Off => Off
apc.rfc1867_freq => 0 => 0
apc.rfc1867_name => APC_UPLOAD_PROGRESS => APC_UPLOAD_PROGRESS
apc.rfc1867_prefix => upload_ => upload_
apc.rfc1867_ttl => 3600 => 3600
apc.serializer => php => php
apc.shm_segments => 1 => 1
apc.shm_size => 2048M => 2048M
apc.slam_defense => On => On
apc.smart => 0 => 0
apc.ttl => 1800 => 1800
apc.use_request_time => On => On
apc.writable => /tmp => /tmp

@procesorrr

This comment has been minimized.

Show comment
Hide comment
@procesorrr

procesorrr Oct 31, 2014

@bogdan-plevit @krakjoe

Hi guys,

I've done some additional testing with @bogdan-plevit and compiled apcu with
"--disable-apcu-mmap", and now the apc works better it still causes memory leakage but lower fragementation. While i run the http_load test i can now acces the apc info page and the curl works better. From what i see this only a workaround, the main issue is still at large 👍 .
If you need the steps to reproduce let me know. I will do some additional testing and hopefully see the problem.

procesorrr commented Oct 31, 2014

@bogdan-plevit @krakjoe

Hi guys,

I've done some additional testing with @bogdan-plevit and compiled apcu with
"--disable-apcu-mmap", and now the apc works better it still causes memory leakage but lower fragementation. While i run the http_load test i can now acces the apc info page and the curl works better. From what i see this only a workaround, the main issue is still at large 👍 .
If you need the steps to reproduce let me know. I will do some additional testing and hopefully see the problem.

@denji

This comment has been minimized.

Show comment
Hide comment
@denji

denji Oct 31, 2014

[apc_cache.c:1220]: (error) va_list 'args' used before va_start() was called.
[apc_mmap.c:77] -> [apc_mmap.c:67]: (warning) Possible null pointer dereference: file_mask - otherwise it is redundant to check it against null.
[apc_mmap.c:86] -> [apc_mmap.c:67]: (warning) Possible null pointer dereference: file_mask - otherwise it is redundant to check it against null.
[apc_shm.c:58]: (warning) Return value of function strerror() is not used.
[php_apc.c:540] -> [php_apc.c:534]: (warning) Possible null pointer dereference: info - otherwise it is redundant to check it against null.

apcu/apc_mmap.c:97 (warning) Call to function 'mktemp' is insecure as it always creates or uses insecure temporary file.  Use 'mkstemp' instead
apcu/apc_pool.c:291 (warning) Value stored to 'redzone' is never read
apcu/apc_pool.c:252 (warning) Value stored to 'redsize' is never read

denji commented Oct 31, 2014

[apc_cache.c:1220]: (error) va_list 'args' used before va_start() was called.
[apc_mmap.c:77] -> [apc_mmap.c:67]: (warning) Possible null pointer dereference: file_mask - otherwise it is redundant to check it against null.
[apc_mmap.c:86] -> [apc_mmap.c:67]: (warning) Possible null pointer dereference: file_mask - otherwise it is redundant to check it against null.
[apc_shm.c:58]: (warning) Return value of function strerror() is not used.
[php_apc.c:540] -> [php_apc.c:534]: (warning) Possible null pointer dereference: info - otherwise it is redundant to check it against null.

apcu/apc_mmap.c:97 (warning) Call to function 'mktemp' is insecure as it always creates or uses insecure temporary file.  Use 'mkstemp' instead
apcu/apc_pool.c:291 (warning) Value stored to 'redzone' is never read
apcu/apc_pool.c:252 (warning) Value stored to 'redsize' is never read

@fauvel

This comment has been minimized.

Show comment
Hide comment
@fauvel

fauvel Nov 3, 2014

Thank you guys for working on this issue, I would really appreciate any improvement of APCu stability at this point!

fauvel commented Nov 3, 2014

Thank you guys for working on this issue, I would really appreciate any improvement of APCu stability at this point!

@jerrylindahl

This comment has been minimized.

Show comment
Hide comment
@jerrylindahl

jerrylindahl Jan 28, 2015

I seem to have encountered the same issue. On ubuntu 14.04 with Apache 2.4.7 and php5.5.9 with APCu4.0.2 (also tested with APCu 4.0.6 from utopic) apache will stop responding at a random point while running our test suite.

Apache will stop responding and running strace shows processes waiting in a futex call.

Not able to reproduce with the script provided by @fauvel

Could this issue be related to this? #85

We haven't seen the issue in prod where we run an old oracle linux with Apache 2.4.10, php5.5.20 and APCu 4.0.7.

jerrylindahl commented Jan 28, 2015

I seem to have encountered the same issue. On ubuntu 14.04 with Apache 2.4.7 and php5.5.9 with APCu4.0.2 (also tested with APCu 4.0.6 from utopic) apache will stop responding at a random point while running our test suite.

Apache will stop responding and running strace shows processes waiting in a futex call.

Not able to reproduce with the script provided by @fauvel

Could this issue be related to this? #85

We haven't seen the issue in prod where we run an old oracle linux with Apache 2.4.10, php5.5.20 and APCu 4.0.7.

@adaladam

This comment has been minimized.

Show comment
Hide comment
@adaladam

adaladam Mar 31, 2015

I had the same issue with owncloud 8.0.0. APCu hung apache. This issue was solved by updating APCu to 4.0.6 version. See also this thread https://bugs.launchpad.net/ubuntu/+source/php-apcu/+bug/1422484, this might help you.

adaladam commented Mar 31, 2015

I had the same issue with owncloud 8.0.0. APCu hung apache. This issue was solved by updating APCu to 4.0.6 version. See also this thread https://bugs.launchpad.net/ubuntu/+source/php-apcu/+bug/1422484, this might help you.

@krakjoe

This comment has been minimized.

Show comment
Hide comment
@krakjoe

krakjoe Aug 30, 2015

Owner

It's reporting that stability improved in later versions, so closing the bug ... if I'm wrong, open a new bug.

Owner

krakjoe commented Aug 30, 2015

It's reporting that stability improved in later versions, so closing the bug ... if I'm wrong, open a new bug.

@krakjoe krakjoe closed this Aug 30, 2015

@krakjoe krakjoe reopened this Nov 20, 2015

@krakjoe

This comment has been minimized.

Show comment
Hide comment
@krakjoe

krakjoe Nov 20, 2015

Owner

Please try with the latest pecl release and report back ?

Owner

krakjoe commented Nov 20, 2015

Please try with the latest pecl release and report back ?

@krakjoe

This comment has been minimized.

Show comment
Hide comment
@krakjoe

krakjoe Dec 7, 2015

Owner

Bump, can you try with 4.0.10 or 5.1.2 please ?

Owner

krakjoe commented Dec 7, 2015

Bump, can you try with 4.0.10 or 5.1.2 please ?

@krakjoe krakjoe added the feedback label Dec 7, 2015

@masterdead

This comment has been minimized.

Show comment
Hide comment
@masterdead

masterdead Jan 5, 2016

Still persist

#0  0x00007f5da2862b2d in pthread_rwlock_wrlock () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f5da1a38359 in apc_lock_wlock (lock=<optimized out>) at /usr/src/apcu/apcu-4.0.10/apc_lock.c:245
#2  0x00007f5da1a3cbe8 in apc_cache_insert (cache=cache@entry=0x2d73120, key=..., value=0x7f5d95412ce8, ctxt=ctxt@entry=0x7ffe99094b80, t=t@entry=1452024027, exclusive=exclusive@entry=0 '\000')
    at /usr/src/apcu/apcu-4.0.10/apc_cache.c:748
#3  0x00007f5da1a3d243 in apc_cache_store (cache=0x2d73120, strkey=0x7f5d899de0f8 "core-init-cache-fantasy", keylen=40, val=0x7f5d899130c8, ttl=59, exclusive=exclusive@entry=0 '\000')
    at /usr/src/apcu/apcu-4.0.10/apc_cache.c:347
#4  0x00007f5da1a39fab in apc_store_helper (ht=<optimized out>, return_value=0x7f5d899e4288, exclusive=<optimized out>, return_value_used=<error reading variable: Unhandled dwarf expression opcode 0xfa>, 
    this_ptr=<error reading variable: Unhandled dwarf expression opcode 0xfa>, return_value_ptr=<error reading variable: Unhandled dwarf expression opcode 0xfa>) at /usr/src/apcu/apcu-4.0.10/php_apc.c:662
#5  0x00000000006e86b9 in dtrace_execute_internal ()
#6  0x000000000079dd01 in ?? ()
#7  0x0000000000762e58 in execute_ex ()
#8  0x00000000006e858d in dtrace_execute_ex ()
#9  0x000000000079e338 in ?? ()
#10 0x0000000000762e58 in execute_ex ()
#11 0x00000000006e858d in dtrace_execute_ex ()
#12 0x000000000079e338 in ?? ()
#13 0x0000000000762e58 in execute_ex ()
#14 0x00000000006e858d in dtrace_execute_ex ()
#15 0x000000000079e338 in ?? ()
#16 0x0000000000762e58 in execute_ex ()
#17 0x00000000006e858d in dtrace_execute_ex ()
#18 0x000000000079e338 in ?? ()
#19 0x0000000000762e58 in execute_ex ()
#20 0x00000000006e858d in dtrace_execute_ex ()
#21 0x000000000079fb02 in ?? ()
#22 0x0000000000762e58 in execute_ex ()
#23 0x00000000006e858d in dtrace_execute_ex ()
#24 0x00000000006faee8 in zend_execute_scripts ()
#25 0x00000000006963d2 in php_execute_script ()
#26 0x0000000000473b32 in main ()
(gdb) frame 0
#0  0x00007f143d000b2d in pthread_rwlock_wrlock () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) frame 1
#1  0x00007f143c1d6359 in apc_lock_wlock (lock=<optimized out>) at /usr/src/apcu/apcu-4.0.10/apc_lock.c:245
245 /usr/src/apcu/apcu-4.0.10/apc_lock.c: No such file or directory.
(gdb) frame 2
#2  0x00007f143c1dabe8 in apc_cache_insert (cache=cache@entry=0x29d0120, key=..., value=0x7f142c94f1f8, ctxt=ctxt@entry=0x7ffd4e182b20, t=t@entry=1452028220, exclusive=exclusive@entry=0 '\000')
    at /usr/src/apcu/apcu-4.0.10/apc_cache.c:748
748 /usr/src/apcu/apcu-4.0.10/apc_cache.c: No such file or directory.
(gdb) frame 3
#3  0x00007f143c1db243 in apc_cache_store (cache=0x29d0120, strkey=0x7f142417a418 "hobby-blesk-cz-magsets-props-103", keylen=33, val=0x7f1424192430, ttl=237, exclusive=exclusive@entry=0 '\000')
    at /usr/src/apcu/apcu-4.0.10/apc_cache.c:347
347 in /usr/src/apcu/apcu-4.0.10/apc_cache.c
(gdb) frame 4
#4  0x00007f143c1d7fab in apc_store_helper (ht=<optimized out>, return_value=0x7f1424180d80, exclusive=<optimized out>, return_value_used=<error reading variable: Unhandled dwarf expression opcode 0xfa>, 
    this_ptr=<error reading variable: Unhandled dwarf expression opcode 0xfa>, return_value_ptr=<error reading variable: Unhandled dwarf expression opcode 0xfa>) at /usr/src/apcu/apcu-4.0.10/php_apc.c:662
662 /usr/src/apcu/apcu-4.0.10/php_apc.c: No such file or directory.

masterdead commented Jan 5, 2016

Still persist

#0  0x00007f5da2862b2d in pthread_rwlock_wrlock () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f5da1a38359 in apc_lock_wlock (lock=<optimized out>) at /usr/src/apcu/apcu-4.0.10/apc_lock.c:245
#2  0x00007f5da1a3cbe8 in apc_cache_insert (cache=cache@entry=0x2d73120, key=..., value=0x7f5d95412ce8, ctxt=ctxt@entry=0x7ffe99094b80, t=t@entry=1452024027, exclusive=exclusive@entry=0 '\000')
    at /usr/src/apcu/apcu-4.0.10/apc_cache.c:748
#3  0x00007f5da1a3d243 in apc_cache_store (cache=0x2d73120, strkey=0x7f5d899de0f8 "core-init-cache-fantasy", keylen=40, val=0x7f5d899130c8, ttl=59, exclusive=exclusive@entry=0 '\000')
    at /usr/src/apcu/apcu-4.0.10/apc_cache.c:347
#4  0x00007f5da1a39fab in apc_store_helper (ht=<optimized out>, return_value=0x7f5d899e4288, exclusive=<optimized out>, return_value_used=<error reading variable: Unhandled dwarf expression opcode 0xfa>, 
    this_ptr=<error reading variable: Unhandled dwarf expression opcode 0xfa>, return_value_ptr=<error reading variable: Unhandled dwarf expression opcode 0xfa>) at /usr/src/apcu/apcu-4.0.10/php_apc.c:662
#5  0x00000000006e86b9 in dtrace_execute_internal ()
#6  0x000000000079dd01 in ?? ()
#7  0x0000000000762e58 in execute_ex ()
#8  0x00000000006e858d in dtrace_execute_ex ()
#9  0x000000000079e338 in ?? ()
#10 0x0000000000762e58 in execute_ex ()
#11 0x00000000006e858d in dtrace_execute_ex ()
#12 0x000000000079e338 in ?? ()
#13 0x0000000000762e58 in execute_ex ()
#14 0x00000000006e858d in dtrace_execute_ex ()
#15 0x000000000079e338 in ?? ()
#16 0x0000000000762e58 in execute_ex ()
#17 0x00000000006e858d in dtrace_execute_ex ()
#18 0x000000000079e338 in ?? ()
#19 0x0000000000762e58 in execute_ex ()
#20 0x00000000006e858d in dtrace_execute_ex ()
#21 0x000000000079fb02 in ?? ()
#22 0x0000000000762e58 in execute_ex ()
#23 0x00000000006e858d in dtrace_execute_ex ()
#24 0x00000000006faee8 in zend_execute_scripts ()
#25 0x00000000006963d2 in php_execute_script ()
#26 0x0000000000473b32 in main ()
(gdb) frame 0
#0  0x00007f143d000b2d in pthread_rwlock_wrlock () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) frame 1
#1  0x00007f143c1d6359 in apc_lock_wlock (lock=<optimized out>) at /usr/src/apcu/apcu-4.0.10/apc_lock.c:245
245 /usr/src/apcu/apcu-4.0.10/apc_lock.c: No such file or directory.
(gdb) frame 2
#2  0x00007f143c1dabe8 in apc_cache_insert (cache=cache@entry=0x29d0120, key=..., value=0x7f142c94f1f8, ctxt=ctxt@entry=0x7ffd4e182b20, t=t@entry=1452028220, exclusive=exclusive@entry=0 '\000')
    at /usr/src/apcu/apcu-4.0.10/apc_cache.c:748
748 /usr/src/apcu/apcu-4.0.10/apc_cache.c: No such file or directory.
(gdb) frame 3
#3  0x00007f143c1db243 in apc_cache_store (cache=0x29d0120, strkey=0x7f142417a418 "hobby-blesk-cz-magsets-props-103", keylen=33, val=0x7f1424192430, ttl=237, exclusive=exclusive@entry=0 '\000')
    at /usr/src/apcu/apcu-4.0.10/apc_cache.c:347
347 in /usr/src/apcu/apcu-4.0.10/apc_cache.c
(gdb) frame 4
#4  0x00007f143c1d7fab in apc_store_helper (ht=<optimized out>, return_value=0x7f1424180d80, exclusive=<optimized out>, return_value_used=<error reading variable: Unhandled dwarf expression opcode 0xfa>, 
    this_ptr=<error reading variable: Unhandled dwarf expression opcode 0xfa>, return_value_ptr=<error reading variable: Unhandled dwarf expression opcode 0xfa>) at /usr/src/apcu/apcu-4.0.10/php_apc.c:662
662 /usr/src/apcu/apcu-4.0.10/php_apc.c: No such file or directory.
@jr997

This comment has been minimized.

Show comment
Hide comment
@jr997

jr997 Feb 29, 2016

I've experienced the same issue, but found the cause (at least in my case)

At one point I called apc_cache_info(), at a time where the result of that call was too big to fit in PHPs memory_limit. This gave me a "500 Internal Server Error", which seems acceptable, BUT it failed to release the lock. Therefor subsequent calls, in that or other apache child processes, all waited for a lock, that was never going to be released.

This does seem like a bug in my humble opinion: the system should never end up in this state, no matter what flaw the PHP programmer might introduce.

One thing I'd like to suggest, is that the apc_cache_info() returns something like FALSE if the memory_limit prevents the APC system from returning the requested data.

ps. to whoever needs a quick-fix: I simply changed the memory_limit in the script that called apc_cache_info(), and I haven't seen the problem since.

I use 4.0.10

jr997 commented Feb 29, 2016

I've experienced the same issue, but found the cause (at least in my case)

At one point I called apc_cache_info(), at a time where the result of that call was too big to fit in PHPs memory_limit. This gave me a "500 Internal Server Error", which seems acceptable, BUT it failed to release the lock. Therefor subsequent calls, in that or other apache child processes, all waited for a lock, that was never going to be released.

This does seem like a bug in my humble opinion: the system should never end up in this state, no matter what flaw the PHP programmer might introduce.

One thing I'd like to suggest, is that the apc_cache_info() returns something like FALSE if the memory_limit prevents the APC system from returning the requested data.

ps. to whoever needs a quick-fix: I simply changed the memory_limit in the script that called apc_cache_info(), and I haven't seen the problem since.

I use 4.0.10

@mattsmith0308

This comment has been minimized.

Show comment
Hide comment
@mattsmith0308

mattsmith0308 May 2, 2016

Has anyone looked into this issue recently? I was able to reproduce this in the same manner @jr997 mentioned using 4.0.10. It looks like a fix would be as simple as allocating all necessary memory for the cache_info function prior to taking this lock https://github.com/krakjoe/apcu/blob/PHP5/apc_cache.c#L1539.

mattsmith0308 commented May 2, 2016

Has anyone looked into this issue recently? I was able to reproduce this in the same manner @jr997 mentioned using 4.0.10. It looks like a fix would be as simple as allocating all necessary memory for the cache_info function prior to taking this lock https://github.com/krakjoe/apcu/blob/PHP5/apc_cache.c#L1539.

@muxx

This comment has been minimized.

Show comment
Hide comment
@muxx

muxx May 22, 2016

The same problem on php7.0.5 and apcu 5.1.2

muxx commented May 22, 2016

The same problem on php7.0.5 and apcu 5.1.2

@dawehner dawehner referenced this issue Jun 3, 2016

Closed

Update php56-apcu.rb #3242

3 of 3 tasks complete
@Zlender

This comment has been minimized.

Show comment
Hide comment
@Zlender

Zlender Sep 26, 2016

@muxx do you have a script that would show how your reproduce it with php7. I was trying to do it the way it locks with php5.6 but were not able to.

Zlender commented Sep 26, 2016

@muxx do you have a script that would show how your reproduce it with php7. I was trying to do it the way it locks with php5.6 but were not able to.

@muxx

This comment has been minimized.

Show comment
Hide comment
@muxx

muxx Sep 26, 2016

@Zlender Unfortunally I don't have script which can reproduce this case :( Problem appears on the workload at ~20-30 rps and can trigger after hour or after 3 hours of working or after 5 hours. And 20-30 rps prevent to understand what particular request causes the problem.

muxx commented Sep 26, 2016

@Zlender Unfortunally I don't have script which can reproduce this case :( Problem appears on the workload at ~20-30 rps and can trigger after hour or after 3 hours of working or after 5 hours. And 20-30 rps prevent to understand what particular request causes the problem.

@Zlender

This comment has been minimized.

Show comment
Hide comment
@Zlender

Zlender Sep 27, 2016

Instructions on how to reproduce this with PHP 5.6 https://github.com/Zlender/apcu_19 similar way also works on latest PHP 7 and apcu 5

Zlender commented Sep 27, 2016

Instructions on how to reproduce this with PHP 5.6 https://github.com/Zlender/apcu_19 similar way also works on latest PHP 7 and apcu 5

@krakjoe krakjoe added bug and removed feedback labels Sep 29, 2016

@krakjoe krakjoe closed this in 9749246 Sep 29, 2016

krakjoe added a commit that referenced this issue Sep 29, 2016

@sarumpaet

This comment has been minimized.

Show comment
Hide comment
@sarumpaet

sarumpaet Sep 29, 2016

@krakjoe I think that fix is too narrow? Similar code is in apc_cache.c/apc_cache_stat (APC_RLOCK, then array_init()) and several places in apc_iterator.c. Shouldn't basically all parts after any APC_RLOCK (and possibly APC_LOCK, too) be guarded by zend_try?

sarumpaet commented Sep 29, 2016

@krakjoe I think that fix is too narrow? Similar code is in apc_cache.c/apc_cache_stat (APC_RLOCK, then array_init()) and several places in apc_iterator.c. Shouldn't basically all parts after any APC_RLOCK (and possibly APC_LOCK, too) be guarded by zend_try?

krakjoe added a commit that referenced this issue Sep 29, 2016

@krakjoe

This comment has been minimized.

Show comment
Hide comment
@krakjoe

krakjoe Sep 29, 2016

Owner

@sarumpaet I think you are right, after a quick review I've added some more tries ...

I'm looking for feedback now on stability ... hopefully if nothing bad happens @remicollet can do a release in the coming days ...

Owner

krakjoe commented Sep 29, 2016

@sarumpaet I think you are right, after a quick review I've added some more tries ...

I'm looking for feedback now on stability ... hopefully if nothing bad happens @remicollet can do a release in the coming days ...

@remicollet

This comment has been minimized.

Show comment
Hide comment
@remicollet

remicollet Sep 29, 2016

Collaborator

hopefully if nothing bad happens @remicollet can do a release in the coming days ...

No problem, just ping me when ready.

Collaborator

remicollet commented Sep 29, 2016

hopefully if nothing bad happens @remicollet can do a release in the coming days ...

No problem, just ping me when ready.

krakjoe added a commit that referenced this issue Sep 29, 2016

@sarumpaet

This comment has been minimized.

Show comment
Hide comment
@sarumpaet

sarumpaet Sep 29, 2016

I don't know how the zend_try mechanism works exactly - is it possible that some of the functions return ill defined values now in the out of memory case? E.g., zend_try { ... array_init(stat); ... } zend_end_try(); return stat; looks suspicious, possibly same for some boolean returns.

sarumpaet commented Sep 29, 2016

I don't know how the zend_try mechanism works exactly - is it possible that some of the functions return ill defined values now in the out of memory case? E.g., zend_try { ... array_init(stat); ... } zend_end_try(); return stat; looks suspicious, possibly same for some boolean returns.

@krakjoe

This comment has been minimized.

Show comment
Hide comment
@krakjoe

krakjoe Sep 29, 2016

Owner

Fixed that on last commit, forgot to tag issue ...

Owner

krakjoe commented Sep 29, 2016

Fixed that on last commit, forgot to tag issue ...

krakjoe added a commit that referenced this issue Oct 15, 2016

Revert "more for #19"
This reverts commit ac97f34.

for #203

krakjoe added a commit that referenced this issue Oct 15, 2016

Revert "fix #19"
This reverts commit 9749246.

for #203

@krakjoe krakjoe reopened this Oct 15, 2016

@krakjoe

This comment has been minimized.

Show comment
Hide comment
@krakjoe

krakjoe Oct 15, 2016

Owner

Crap ...

Owner

krakjoe commented Oct 15, 2016

Crap ...

@krakjoe krakjoe closed this in f0e9bed Oct 16, 2016

krakjoe added a commit that referenced this issue Oct 16, 2016

@patrickbajao

This comment has been minimized.

Show comment
Hide comment
@patrickbajao

patrickbajao Nov 9, 2016

Hi,

Will there be a 4.0.12 release? I can see that the fix was already backported in the PHP5 branch but I don't know when it'll be released.

patrickbajao commented Nov 9, 2016

Hi,

Will there be a 4.0.12 release? I can see that the fix was already backported in the PHP5 branch but I don't know when it'll be released.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment