fpm: some workers get stuck on "\0fscf\0" write to master process over time #11447

Open
thias opened this issue Jun 13, 2023 · 8 comments


thias commented Jun 13, 2023

Description

After updating from PHP 7.3 to 8.1, we are now seeing the number of "active" php-fpm processes slowly grow over time, which is not expected. Looking at the fpm-status output, many processes report a duration of multiple days.

name                  value
pool                  www1
process manager       static
start time            6/5/2023, 11:25:15 PM
start since           7d 13h 181
accepted conn         1109651666
listen queue          0
max listen queue      0
listen queue len      0
idle processes        678
active processes      90
total processes       768
max active processes  768
max children reached  0
slow requests         466772

In the above, after restarting php-fpm the number of active processes goes back down to around 60, so only about 30 got stuck out of over a billion accepted connections. Really not that many, but it results in roughly 50% more active processes after a week, and it keeps getting worse as time goes on.

Looking at the stuck processes, all are on the exact same write. Here's one example:

# strace -p 5843
strace: Process 5843 attached
write(2, "\0fscf\0", 6

That fd is a pipe to the master php-fpm process:

php-fpm 5843 phpwww    2w  FIFO               0,13       0t0 2303333681 pipe
php-fpm   91227            root 1392r     FIFO               0,13       0t0 2303333681 pipe
91227 ?        Ss   339:39 php-fpm: master process (/etc/php-fpm1.conf)
79811 ?        S      1:37  \_ php-fpm: pool www1
80146 ?        S      1:38  \_ php-fpm: pool www1
80914 ?        S      1:36  \_ php-fpm: pool www1
[...]

The fscf string can be found in sapi/fpm/fpm/fpm_stdio.c:

#define FPM_STDIO_CMD_FLUSH "\0fscf"

So it looks like, in some rare cases, sending this command from a worker back to the master blocks forever.
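
For illustration only (a standalone sketch, not php-src code), this minimal C program reproduces the state the strace output shows: once a pipe's buffer is full and the reader stops draining it, a blocking write() never returns.

/* Standalone illustration (not php-src code): a write() to a pipe whose
 * reader is not draining it blocks forever once the pipe buffer is full,
 * which is exactly the state the stuck worker is in. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fds[2];                          /* fds[0] = read end, fds[1] = write end */
    if (pipe(fds) != 0) {
        perror("pipe");
        return 1;
    }

    /* Fill the pipe buffer (typically 64 KiB on Linux) without blocking. */
    fcntl(fds[1], F_SETFL, O_NONBLOCK);
    char chunk[4096];
    memset(chunk, 'x', sizeof(chunk));
    while (write(fds[1], chunk, sizeof(chunk)) > 0)
        ;                                /* stops once the buffer is full (EAGAIN) */

    /* Back to blocking mode: with nobody reading the other end, this write
     * hangs forever -- the same syscall strace shows for the stuck worker. */
    fcntl(fds[1], F_SETFL, 0);
    fprintf(stderr, "pipe buffer full, the next write blocks...\n");
    write(fds[1], "\0fscf\0", 6);

    return 0;                            /* never reached */
}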

PHP Version

PHP 8.1.20

Operating System

RHEL 7.9


thias commented Jun 13, 2023

For reference, this is the full configuration file used:

[global]
error_log = /var/log/php-fpm/error.log
log_level = notice
daemonize = no
rlimit_files = 10240

[www1]
listen = /run/php-fpm-www1.sock
listen.backlog = 1024
listen.owner = nginx
listen.group = nginx
listen.mode = 0660
user = phpwww
group = phpwww
pm = static
pm.max_children = 768
pm.max_requests = 128000
pm.status_path = /internal/php-fpm-status-www1
request_slowlog_timeout = 2s
slowlog = /var/log/php-fpm/www1-slow.log
catch_workers_output = yes
security.limit_extensions = .php .js
php_value[session.save_path] = /var/www/session
php_admin_value[error_log] = syslog
php_admin_value[syslog.facility] = local3

And the systemd unit file (here we run two separate php-fpm instances, each pinned to half of the NUMA nodes for performance reasons, but single-CPU/single-NUMA servers are also affected):

[Unit]
Description=The PHP FastCGI Process Manager
After=syslog.target network.target
Before=nginx.target google-startup-scripts.service

[Service]
Type=notify
ExecStart=/usr/bin/numactl -m 0-7 -N 0-7 -- /usr/sbin/php-fpm --nodaemonize --fpm-config /etc/php-fpm1.conf
ExecReload=/bin/kill -USR2 $MAINPID
PrivateTmp=true
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target


bukka commented Jun 15, 2023

I have been thinking about this one and I'm not sure why it blocks on that write. This message is sent after the request ends, and it is there to flush a log message composed of multiple writes that are not finalized by a newline (e.g. https://github.com/php/php-src/blob/06dd1d78a7ec1678b53ef657033c2021f4dc902f/sapi/fpm/tests/log-bwd-multiple-msgs.phpt ). This pipe is used for all logs from the child, so I'm trying to figure out when this can actually block, because normally it doesn't: the master should read from it just fine. It looks like, for some reason, this pipe is open but the master does not read from it.
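
As a rough mental model only (this is a simplified sketch, not the actual fpm_stdio.c implementation), the flush command can be thought of as a sentinel telling a line-buffering log reader to emit whatever partial line it is still holding:

/* Simplified sketch of the idea behind FPM_STDIO_CMD_FLUSH (not the real
 * fpm_stdio.c code): the master line-buffers a child's stderr, and the
 * "\0fscf" sentinel sent at request end makes it emit any partial line
 * built from multiple writes that were not terminated by a newline. */
#include <stdio.h>
#include <string.h>

#define CMD_FLUSH      "\0fscf"   /* matches FPM_STDIO_CMD_FLUSH */
#define CMD_FLUSH_LEN  5

static char   buf[1024];
static size_t buf_len = 0;

/* Called for each chunk the master reads from the child's stderr pipe. */
static void on_child_stderr(const char *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf_len < sizeof(buf))
            buf[buf_len++] = data[i];

        if (data[i] == '\n') {
            /* Complete line: log it and reset the buffer. */
            fwrite(buf, 1, buf_len, stdout);
            buf_len = 0;
        } else if (buf_len >= CMD_FLUSH_LEN &&
                   memcmp(buf + buf_len - CMD_FLUSH_LEN, CMD_FLUSH, CMD_FLUSH_LEN) == 0) {
            /* Flush sentinel at request end: emit the buffered partial line. */
            buf_len -= CMD_FLUSH_LEN;
            if (buf_len > 0) {
                fwrite(buf, 1, buf_len, stdout);
                fputc('\n', stdout);
            }
            buf_len = 0;
        }
    }
}

int main(void)
{
    /* Two writes without a newline, then the flush command at request end. */
    on_child_stderr("partial ", 8);
    on_child_stderr("message", 7);
    on_child_stderr(CMD_FLUSH, CMD_FLUSH_LEN);
    return 0;
}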

Would you be able to change log_level to debug and see if there is anything interesting for the children that are getting stuck, or anything else that does not look right to you? Feel free to share a snippet from the time when at least one of the children got stuck (you can email it to me - it does not really contain any secret info as it is just from the master, but better to share it privately just in case).


thias commented Jun 21, 2023

I've set log_level = debug on one of our production servers, but so far none of the php-fpm workers have gotten stuck there. I'll keep waiting, and hopefully it ends up happening again so that I can provide the debug logs for that pid.

I really don't know what was triggering this, but I do remember often seeing multiple php-fpm workers stuck for the exact same amount of time, so something was probably happening that managed to affect more than one worker while it was ongoing.


thias commented Jun 21, 2023

[Screenshot: PHP-FPM status page, 2023-06-21 11:08]
Looking at one server where it has still been happening (but without debug logs, unfortunately), I just noticed that all of the "stuck" workers report having served exactly 10913 requests. The non-stuck ones look normal, reporting many different values consistent with the pm.max_requests = 128000 setting.
Either this is some cosmetic weirdness, or it could be useful information...
[Screenshot: PHP-FPM status page, 2023-06-21 11:12]


thias commented Jun 26, 2023

Ever since enabling debug, the problem hasn't surfaced again.
In the meantime, we did manage to fix some long-standing issues with excessive CouchBase accesses that would end up causing problems with moxi-server, which php-fpm connects to using a UNIX socket. Maybe that was related, especially since the problem had been happening on the vast majority of our servers, across multiple datacenters... and no longer is, with the same OS, PHP version and PHP modules.

Feel free to close this issue if nothing can be done with the current information, and I'll reopen attaching debug output if it ever reappears on our servers.


bukka commented Jul 13, 2023

It might signal some issue, as this should not normally happen. Can you describe the CouchBase issue in a bit more detail, and specifically the moxi-server setup? It might give me some idea of what happened.


thias commented Jul 13, 2023

The moxi-server is just a local process that accepts memcache GET/SET calls and forwards them to a CouchBase cluster it is connected to and whose topology it is aware of. From php-fpm, the code is basically just using the memcached pecl module (3.2.0, latest as of writing this) to connect to that socket and send the GETs and SETs, many of which are large multi-GETs, at an average rate of about 3000/s across two different php-fpm backends of 768 processes each.
The CouchBase issues were due to excessive GET operations being sent from php-fpm through moxi-server.


bukka commented Jul 15, 2023

I have been looking into this and thinking about it, and it looks to me like the primary cause might have been the UNIX socket overload and/or something related to the high load, which according to your report happened at the same time. We probably won't figure that part out, but the more important question for me is why FPM needs to wait forever for this write. Obviously something went wrong with the pipe buffer (or something related to it), as it did not unblock the waiting process. It's a blocking pipe, so it might potentially get into such a state, but why does this happen only for that request-flushing write and not for other logs?
The answer is that all other logs go to syslog, so this is the only thing that is logged to the pipe. I realised that after re-reading your config.

So how could we better recover from this issue? I have a few ideas:

  • use non blocking pipe for stderr
  • add select / poll around that request flushing write with some timeout to make sure it goes through

Well, the second one makes sense only with a non-blocking pipe, so it is really just one idea. :)

I think the above could potentially help and also improve pipe throughput. It requires some changes to the reading and writing logic though, and it might have some side effects, so it needs to be properly tested under heavy load. For those reasons it will be treated more as a feature and will only target the master branch, due to the impact of the changes. A rough sketch of the idea follows below.
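
For what it's worth, a minimal sketch of that recovery idea (a non-blocking pipe plus a poll()-bounded write, using a hypothetical write_with_timeout() helper; this is not an actual FPM patch) could look like this:

/* Sketch of the recovery idea above (not an actual FPM patch): put the
 * worker's end of the stderr pipe into non-blocking mode and bound the
 * request-flush write with poll(), so the worker gives up after a timeout
 * instead of hanging forever when the master stops draining the pipe. */
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Write `len` bytes to a non-blocking fd, waiting up to `timeout_ms`
 * milliseconds whenever the pipe buffer is full.
 * Returns 0 on success, -1 on timeout or error. */
static int write_with_timeout(int fd, const char *buf, size_t len, int timeout_ms)
{
    size_t written = 0;

    while (written < len) {
        ssize_t n = write(fd, buf + written, len - written);
        if (n > 0) {
            written += (size_t)n;
            continue;
        }
        if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
            return -1;                             /* real I/O error */

        /* Pipe buffer full: wait (bounded) until it is writable again. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        int r = poll(&pfd, 1, timeout_ms);
        if (r <= 0)
            return -1;                             /* timeout or poll error */
    }
    return 0;
}

int main(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return 1;

    /* The write end would be switched to non-blocking mode once at startup. */
    fcntl(fds[1], F_SETFL, fcntl(fds[1], F_GETFL) | O_NONBLOCK);

    /* On an empty pipe this returns immediately; if the reader stopped
     * draining it and the buffer filled up, it would fail after ~1s
     * instead of blocking the process forever. */
    return write_with_timeout(fds[1], "\0fscf\0", 6, 1000) == 0 ? 0 : 1;
}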
