Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opcache + MPM ITK might cause apache2 to hang #9910

Open
diab303 opened this issue Nov 9, 2022 · 3 comments
Open

opcache + MPM ITK might cause apache2 to hang #9910

diab303 opened this issue Nov 9, 2022 · 3 comments

Comments

@diab303
Copy link

diab303 commented Nov 9, 2022

Description

This is referring to the bug that is present since at least 5.6. It still happens in 7.4 (just confirmed) and the same code is present in 8.x so I assume the behaviour is the same there, too.

Apache + PHP + MPM_ITK hangs at random intervals spamming "Cannot kill process" in the apache error log.

The original bug report is here: https://bugs.php.net/bug.php?id=76482

The fix looks pretty trivial (break the loop when kill fails) but I'm not sure if it would cause other problems.

Quoting the bug history:

It was reported in 5.6.36 but the bug is in 7.1 (reproduced here) and by looking into source code also in current 7.3 branch.

Not only mpm itk is affected but any MPM which switches child processes to different UIDs (via setuid()). Other example is mod_ruid2 - mind04/mod-ruid2#17

kill_all_lockers() has a loop which never ends if killing fails
as kill(mem_usage_check->l_pid, SIGKILL) returns EPERM in errno.

There is

                if (!success) {
                        /* errno is not ESRCH or we ran out of tries to kill the locker */
                        ZCSG(force_restart_time) = time(NULL); /* restore forced restart request */
                        /* cannot kill the locker, bail out with error */
                        zend_accel_error(ACCEL_LOG_ERROR, "Cannot kill process %d: %s!", mem_usage_check->l_pid, strerror(errno));
                }

but it doesn't break the loop. Is brake missing here?

PHP Version

PHP 7.4

Operating System

Debian 11

@cmb69
Copy link
Contributor

cmb69 commented Nov 9, 2022

The fix looks pretty trivial (break the loop when kill fails) but I'm not sure if it would cause other problems.

I don't think that either of the two patches attached to https://bugs.php.net/76482 would help, because on the previous line the process is supposed to be terminated, so the break would never be executed.

@diab303
Copy link
Author

diab303 commented Dec 10, 2022

I've finally found a way to reliably reproduce this, that should help with debugging.

  • Install Apache2 + MPM ITK + mod php 7.4
  • create two virtual hosts running as two different users (assignuserid in apache conf)
  • set opcache.force_restart_timeout=10
  • call vhost1/wait.php: <?php sleep(1000);
  • call vhost2/kill.php: <?php opcache_reset();
    After 10 seconds calling any php script will trigger the forced restart (I kept reloading kill.php a few times)
    This reliably results in the following in apache log, then all PHP scripts return an empty page until apache is restarted

Sat Dec 10 18:26:19 2022 (508078): Message Cached script '/var/www/test1/wait.php'
Sat Dec 10 18:26:21 2022 (508080): Message Cached script '/var/www/test2/kill.php'
Sat Dec 10 18:26:21 2022 (508080): Debug Restart Scheduled! Reason: user
Sat Dec 10 18:26:32 2022 (508083): Warning Forced restart at 1670693192 (after 10 seconds), locked by 508078
Sat Dec 10 18:26:32 2022 (508083): Warning Attempting to kill locker 508078
Sat Dec 10 18:26:32 2022 (508083): Error Cannot kill process 508078: Operation not permitted!

[edit]

Tried patching /ext/opcache/ZendAccelerator.c (this is Debian's 7.4 src) to fix this behaviour, based on this patch: https://bugs.php.net/patch-display.php?bug_id=76482&patch=dont-while-loop-for-ever-php-7.3-only-for-noperm&revision=latest

--- /ext/opcache/ZendAccelerator.c	2022-06-07 10:38:20.000000000 +0200
+++ ZendAccelerator.c	2022-12-10 20:22:56.763530921 +0100
@@ -780,21 +780,25 @@
 		success = 0;
 		tries = 10;
 
 		while (tries--) {
 			zend_accel_error(ACCEL_LOG_WARNING, "Attempting to kill locker %d", mem_usage_check->l_pid);
 			if (kill(mem_usage_check->l_pid, signal)) {
 				if (errno == ESRCH) {
 					/* Process died before the signal was sent */
 					success = 1;
 					zend_accel_error(ACCEL_LOG_WARNING, "Process %d died before SIGKILL was sent", mem_usage_check->l_pid);
-				}
+				} else if (errno == EPERM) {
+					/* We have no permission to kill the other process, so no point in trying */
+					success = 1;
+					zend_accel_error(ACCEL_LOG_WARNING, "No permission to kill Process %d", mem_usage_check->l_pid);
+ 				}
 				break;
 			}
 			/* give it a chance to die */
 			usleep(20000);
 			if (kill(mem_usage_check->l_pid, 0)) {
 				if (errno == ESRCH) {
 					/* successfully killed locker, process no longer exists  */
 					success = 1;
 					zend_accel_error(ACCEL_LOG_WARNING, "Killed locker %d", mem_usage_check->l_pid);
 				}

It improved it (web requests didn't hang instantly) then I eventually ended up with an endless loop like this, so there is definitely room for improvement..

Sat Dec 10 20:31:22 2022 (1108306): Warning Attempting to kill locker 1108241
Sat Dec 10 20:31:22 2022 (1108306): Warning No permission to kill Process 1108241
Sat Dec 10 20:31:22 2022 (1108306): Warning Attempting to kill locker 1108241
Sat Dec 10 20:31:22 2022 (1108306): Warning No permission to kill Process 1108241
Sat Dec 10 20:31:22 2022 (1108306): Warning Attempting to kill locker 1108241
Sat Dec 10 20:31:22 2022 (1108306): Warning No permission to kill Process 1108241

@arekm
Copy link

arekm commented Mar 20, 2024

I guess the same problem will happen if there is an apparmor policy preventing kill signal from being delivered to other httpd processes.

Code should handle EPERM in saner way than delay loop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants