FPM: keepalived connection with fastcgi_finish_request causes dead lock #10335

Appla · 2023-01-16T08:44:33Z

Description

The following code:

<?php

$pid = getmypid();
echo 'pid:', $pid, ', start@', $_SERVER['REQUEST_TIME_FLOAT'], ', elapsed: ', microtime(true) - $_SERVER['REQUEST_TIME_FLOAT'], PHP_EOL;

if (isset($_GET['later'])) {
    if ($_GET['later'] === '1') {
        // using PR's version
        // $c = fastcgi_finish_request(true) ? 'modified:true' : 'modified:false';
        $c = fastcgi_finish_request() ? 'raw:true' : 'raw:false';
    } else {
        $c = fastcgi_finish_request() ? 'raw:true' : 'raw:false';
    }
    $ret = req(['pid' => $pid, 'sleep' => 5, 'c' => $c]);
    var_dump($ret);
} elseif (isset($_GET['sleep'])) {
    sleep((int)$_GET['sleep']);
} else {
    echo 'nothing', PHP_EOL;
}
$now = microtime(true);

echo 'pid:', $pid, ', end@', $now, ', elapsed: ', $now - $_SERVER['REQUEST_TIME_FLOAT'], PHP_EOL;


function req(array $args) {
    $tFmt = (new DateTime())->format('Y.m.d.H-i-s.u');
    $path = basename(__FILE__);
    $url = "{$_SERVER['REQUEST_SCHEME']}://{$_SERVER['HTTP_HOST']}/{$path}?reqAt={$tFmt}&" . http_build_query($args);
    return file_get_contents($url, false, stream_context_create([
        'http' => [
            'method'  => 'POST',
            'timeout' => 10,
            'header'  => 'Content-Type: application/x-www-form-urlencoded',
            'content' => http_build_query([
                'a' => 1,
                'b' => 2,
            ]),
        ],
    ]));
}

with FPM settings:

pm = static
pm.max_children = 2

// uncomment if wanna to see access info via fpm access log
; access.format = "%R - %u %t - %T \"%m %r%Q%q\" %s %f %{kilo}M %C%% %p-%P %{mili}d"
; access.log = /path/to/php/fpm-$pool.access.log

and NGINX settings:

## PHP-FPM upstream
upstream PHP_FCGI {
    ## path to fpm listen address
    server 127.0.0.1:9007;

    keepalive 2;
    keepalive_requests 1000;
    keepalive_timeout 60s;
}

## enable fastcgi keepalive
fastcgi_keep_conn on;

server {
    ....
    location = /fpm.php {
        root /path/to/php-file-root/;

        fastcgi_pass PHP_FCGI;
        fastcgi_split_path_info  ^((?U).+\.php)((?s)/.*)$;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_param SCRIPT_NAME $fastcgi_script_name;
        fastcgi_param  PATH_INFO  $fastcgi_path_info;
        include        fastcgi_params;
    }
    ....
}

Resulted in this log:

time curl -i 'http://t.fcgi.net/t_fpm.php?later=0'
HTTP/1.1 200 OK
Date: Mon, 16 Jan 2023 08:23:13 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive

pid:680637, start@1673857393.355, elapsed: 6.8187713623047E-5
curl -i 'http://t.fcgi.net/t_fpm.php?later=0'  0.00s user 0.00s system 77% cpu 0.006 total

==> /path/to/nginx/logs/access.log <==
127.0.0.1 - - [16/Jan/2023:16:23:13 +0800] http://t.fcgi.net:80 "GET /t_fpm.php?later=0 HTTP/1.1" 200 73 83 "-" "-" - - "curl/7.68.0" "-" 683465 231 - 127.0.0.1:9007 0.001 0.001
127.0.0.1 - - [16/Jan/2023:16:23:23 +0800] http://t.fcgi.net:80 "POST /t_fpm.php?reqAt=2023.01.16.16-23-13.355151&pid=680637&sleep=5&c=raw%3Atrue HTTP/1.0" 499 0 197 "-" "-" - - "-" "-" 683465 233 - 127.0.0.1:9007 10.008 10.007

==> /path/to/php/fpm-www.access.log <==
127.0.0.1 -  16/Jan/2023:16:23:13 +0800 - 16/Jan/2023:16:23:23 +0800 "GET /t_fpm.php?later=0" 200 /var/www/t_fpm.php 2048 0.00% 680637-680635 10008.038

==> /path/to/php/error/log <==
Jan 16 16:23:23 KB-111 ool www[680637]: PHP Warning:  file_get_contents(http://t.fcgi.net/t_fpm.php?reqAt=2023.01.16.16-23-13.355151&amp;pid=680637&amp;sleep=5&amp;c=raw%3Atrue): failed to open stream: HTTP request failed! in /var/www/t_fpm.php on line 35

==> /path/to/php/fpm-www.access.log <==
127.0.0.1 -  16/Jan/2023:16:23:23 +0800 - 16/Jan/2023:16:23:28 +0800 "POST /t_fpm.php?reqAt=2023.01.16.16-23-13.355151&pid=680637&sleep=5&c=raw%3Atrue" 200 /var/www/fpm.php 2048 0.00% 680637-680635 5000.215

But I expected this output instead:

time curl -i 'http://t.fcgi.net/t_fpm.php?later=1'
HTTP/1.1 200 OK
Date: Mon, 16 Jan 2023 08:27:39 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive

pid:680638, start@1673857659.7505, elapsed: 0.00011897087097168
time curl -i 'http://t.fcgi.net/t_fpm.php?later=1'  0.01s user 0.00s system 73% cpu 0.007 total

==> /path/to/nginx/logs/access.log <==
127.0.0.1 - - [16/Jan/2023:16:27:39 +0800] http://t.fcgi.net:80 "GET /t_fpm.php?later=1 HTTP/1.1" 200 75 83 "-" "-" - - "curl/7.68.0" "-" 683465 234 - 127.0.0.1:9007 0.001 0.000

==> /path/to/php/fpm-www.access.log <==
127.0.0.1 -  16/Jan/2023:16:27:39 +0800 - 16/Jan/2023:16:27:44 +0800 "POST /t_fpm.php?reqAt=2023.01.16.16-27-39.750866&pid=680638&sleep=5&c=raw%3Atrue" 200 /var/www/t_fpm.php 2048 0.00% 680637-680635 5000.297
127.0.0.1 -  16/Jan/2023:16:27:39 +0800 - 16/Jan/2023:16:27:44 +0800 "GET /t_fpm.php?later=1" 200 /var/www/t_fpm.php 2048 0.00% 680638-680635 5001.523

==> /path/to/nginx/logs/access.log <==
127.0.0.1 - - [16/Jan/2023:16:27:44 +0800] http://t.fcgi.net:80 "POST /t_fpm.php?reqAt=2023.01.16.16-27-39.750866&pid=680638&sleep=5&c=true%7Ctrue HTTP/1.0" 200 121 198 "-" "-" - - "-" "-" 683465 236 - 127.0.0.1:9007 5.001 5.000

This is easy to reproduced by the above desc, IMHO the log is clearly show’s the current PHP‘s behavior is something error.
the reason I think is after proc-X called fastcgi_finish_request this connection is marked reuseable by proxy and if the same process soon req same host the proxy might select the same connection then deadlock happens.

this also can be showing by GDB/tcpdump

so I made this #10273

PHP Version

7.4-8.2

Operating System

Ubuntu 20.04

The text was updated successfully, but these errors were encountered:

bukka · 2024-04-19T12:02:51Z

I have been doing some debugging and this is really an issue. Basically fastcgi_finish_request is pretty much useless when used with keepalive for busy servers. It will return the response sooner but it will introduce start delay for the next request because nginx thinks that it can send other request immediately after. Also as you show here, it introduces dead lock if calling itself. Same reason as nginx thinks that it can send other request so it re-uses connection.

I have been thinking about possible solutions and I can't see a way how to easily address it without closing the connection. Basically it would either require some sort of forking which is most likely more expensive than closing the connection or it would require ZTS which is not usual for FPM and probably not a good idea either due to complexity.

In terms of closing the connection, I'm not sure if it should be done in the function as it all depends on server / FPM configuration so maybe we should just add option that will control if the connection should be always closed after fastcgi_finish_request which might even make sense to be enabled by default as this behaviour is not what users expect. I think no one is expecting fastcgi_finish_request to introduce delay for the next request on busy servers. We should also update docs.

Appla · 2024-04-24T02:41:33Z

In terms of closing the connection, I'm not sure if it should be done in the function as it all depends on server / FPM configuration so maybe we should just add option that will control if the connection should be always closed after fastcgi_finish_request which might even make sense to be enabled by default as this behaviour is not what users expect. I think no one is expecting fastcgi_finish_request to introduce delay for the next request on busy servers. We should also update docs.

The issue arises when keepalive is used alongside fastcgi_finish_request.
To address this, I propose adding a new optional parameter to fastcgi_finish_request that defaults to closing the connection.
This approach not only resolves the problem but also gives users the flexibility to modify the default behavior if needed. Introducing a new configuration option to control this default behavior seems reasonable.

bukka · 2024-05-18T13:37:28Z

The problem is that it requires changes to code for something which is more an operational thing. To me it makes sense to give administrator control over this which is essentially configuration option that effects every fastcgi_finish_request call.

Appla · 2024-05-21T05:38:52Z

The problem is that it requires changes to code for something which is more an operational thing. To me it makes sense to give administrator control over this which is essentially configuration option that effects every fastcgi_finish_request call.

Sure, I have just introduced a new option as the default value for the fastcgi_finish_request parameter to control connection closure.

Appla added Bug Status: Needs Triage labels Jan 16, 2023

Appla mentioned this issue Jan 16, 2023

FPM: fastcgi_finish_request supports force close connection param #10273

Open

bukka added the SAPI: fpm label Jan 25, 2023

bukka removed the Status: Needs Triage label Apr 19, 2024

bukka mentioned this issue Apr 19, 2024

Add FPM early bootstrapping mode #6772

Draft

bwoebi mentioned this issue Apr 19, 2024

Move collecting and flushing telemetry to after final request flush DataDog/dd-trace-php#2555

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FPM: keepalived connection with fastcgi_finish_request causes dead lock #10335

FPM: keepalived connection with fastcgi_finish_request causes dead lock #10335

Appla commented Jan 16, 2023

bukka commented Apr 19, 2024

Appla commented Apr 24, 2024

bukka commented May 18, 2024 •

edited

Loading

Appla commented May 21, 2024

FPM: keepalived connection with fastcgi_finish_request causes dead lock #10335

FPM: keepalived connection with fastcgi_finish_request causes dead lock #10335

Comments

Appla commented Jan 16, 2023

Description

PHP Version

Operating System

bukka commented Apr 19, 2024

Appla commented Apr 24, 2024

bukka commented May 18, 2024 • edited Loading

Appla commented May 21, 2024

bukka commented May 18, 2024 •

edited

Loading