Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

php-fpm master processes runs 100% CPU and keeps spawning new ones #12157

Closed
rhavenn opened this issue Sep 8, 2023 · 22 comments
Closed

php-fpm master processes runs 100% CPU and keeps spawning new ones #12157

rhavenn opened this issue Sep 8, 2023 · 22 comments

Comments

@rhavenn
Copy link

rhavenn commented Sep 8, 2023

Description

We've been using php-fpm on FreeBSD since the PHP 7.x days and since FreeBSD 11.x and haven't seen this issue until starting to work with FreeBSD 13.2 . This issue does NOT present itself on older FreeBSD 13.0 systems with older versions of PHP 8.1.x

side note: since building for FreeBSD 13.2 I started seeing errors around needing procfs which I have now enabled. We never have had it enabled prior to this. I don't know if the issue is related. The 13.0 systems with earlier version of PHP8.1.x (circa .4) don't need procfs and throw no errors regarding it.

The issue we're seeing is that newer versions of PHP 8.1.18+ on FreeBSD 13.2 keep spawning new php-fpm master processes and they start running at 100% CPU. This happens within about an 30 minutes to an hour or so of PHP FPM being restarted. If I let it run for a few days I end up with 5 or 6 of them. I turned on FPM debug mode and the only line I'm seeing is:
[08-Sep-2023 13:44:17.388809] DEBUG: pid 4361, fpm_unix_init_main(), line 531: The calling process is waiting for the master process to ping via fd=9
[08-Sep-2023 13:44:27.404184] ERROR: pid 4361, fpm_unix_init_main(), line 559: the master process didn't send back its status (via the pipe to the calling process)

and then a little further down:
[08-Sep-2023 13:44:41.500023] DEBUG: pid 17136, fpm_unix_init_main(), line 531: The calling process is waiting for the master process to ping via fd=9
[08-Sep-2023 13:44:41.501321] DEBUG: pid 17580, fpm_scoreboard_init_main(), line 38: got clock tick '128'
[08-Sep-2023 13:44:41.501430] DEBUG: pid 17580, fpm_signals_init_main(), line 219: Unblocking all signals
[08-Sep-2023 13:44:41.501967] DEBUG: pid 17580, fpm_event_init_main(), line 354: event module is kqueue and 1 fds have been reserved
[08-Sep-2023 13:44:41.502024] NOTICE: pid 17580, fpm_init(), line 83: fpm is running, pid 17580
[08-Sep-2023 13:44:41.502030] DEBUG: pid 17580, main(), line 1844: Sending "1" (OK) to parent via fd=10
[08-Sep-2023 13:44:41.502049] DEBUG: pid 17136, fpm_unix_init_main(), line 550: I received a valid acknowledge from the master process, I can exit without error
[08-Sep-2023 13:44:41.507431] DEBUG: pid 17580, fpm_pctl_heartbeat(), line 478: heartbeat have been set up with a timeout of 3333ms
[08-Sep-2023 13:44:41.507461] DEBUG: pid 17580, fpm_event_loop(), line 382: 190480 bytes have been reserved in SHM
[08-Sep-2023 13:44:41.507467] NOTICE: pid 17580, fpm_event_loop(), line 383: ready to handle connections

and that seems accurate, timewise, after the last restart. I don't know if any specific request is triggering this. We run sites through dedicated "site" pools. So, each site gets its own dedicated php-fpm "pool" user. They seem to spawn and die fine and sites still work until eventually we'll run into "max pool" issues due to all the master-processes running.

Let me know what additional information might be helpful.

PHP Version

8.1.23

Operating System

FreeBSD 13.2-p3

@devnexen
Copy link
Member

devnexen commented Sep 13, 2023

  • Would you mind sharing what you can in term of FPM config ?
  • Would the issue still occur with FreeBSD 13.2 but with older versions of PHP, procfs or not ?

Will try to give it a shot at some point with procfs in my own FreeBSD instance, I never needed nor used procfs (I do not need the Linux ABI layer).

@rhavenn
Copy link
Author

rhavenn commented Sep 13, 2023

Sure. Appreciate you taking the time.

I'm trying to track PID references in the debug log based on the original "master" PID and the children it's spawning. These "new" master processes could be children that are getting "killed" by the master, but don't fully exit.

I don't need the Linux ABI layer either. It shouldn't be using it. Does needing procfs imply it's using / trying to use the Linux ABI layer??? I'll double check my poudriere build and config options.

/var/run/php-fpm.pid (master PID file; 0644 perms)
/var/run/php-fpm.sockets.d/ (directory containing all the sockets for the child processes)

All process comms are via local sockets.

/usr/local/etc/php-fpm.conf:

;;;;;;;;;;;;;;;;;;;;;
; FPM Configuration ;
;;;;;;;;;;;;;;;;;;;;;

; All relative paths in this configuration file are relative to PHP's install
; prefix (/usr/local). This prefix can be dynamically changed by using the
; '-p' argument from the command line.

;;;;;;;;;;;;;;;;;;
; Global Options ;
;;;;;;;;;;;;;;;;;;

[global]
; Pid file
; Note: the default prefix is /var
; Default Value: none
pid = run/php-fpm.pid

; Error log file
; If it's set to "syslog", log is sent to syslogd instead of being written
; into a local file.
; Note: the default prefix is /var
; Default Value: log/php-fpm.log
error_log = log/php-fpm.log

; syslog_facility is used to specify what type of program is logging the
; message. This lets syslogd specify that messages from different facilities
; will be handled differently.
; See syslog(3) for possible values (ex daemon equiv LOG_DAEMON)
; Default Value: daemon
;syslog.facility = daemon

; syslog_ident is prepended to every message. If you have multiple FPM
; instances running on the same server, you can change the default value
; which must suit common needs.
; Default Value: php-fpm
;syslog.ident = php-fpm

; Log level
; Possible Values: alert, error, warning, notice, debug
; Default Value: notice
;log_level = notice
;log_level = warning
log_level = debug

; Log limit on number of characters in the single line (log entry). If the
; line is over the limit, it is wrapped on multiple lines. The limit is for
; all logged characters including message prefix and suffix if present. However
; the new line character does not count into it as it is present only when
; logging to a file descriptor. It means the new line character is not present
; when logging to syslog.
; Default Value: 1024
;log_limit = 4096

; Log buffering specifies if the log line is buffered which means that the
; line is written in a single write operation. If the value is false, then the
; data is written directly into the file descriptor. It is an experimental
; option that can potentially improve logging performance and memory usage
; for some heavy logging scenarios. This option is ignored if logging to syslog
; as it has to be always buffered.
; Default value: yes
;log_buffering = no

; If this number of child processes exit with SIGSEGV or SIGBUS within the time
; interval set by emergency_restart_interval then FPM will restart. A value
; of '0' means 'Off'.
; Default Value: 0
;emergency_restart_threshold = 0

; Interval of time used by emergency_restart_interval to determine when
; a graceful restart will be initiated.  This can be useful to work around
; accidental corruptions in an accelerator's shared memory.
; Available Units: s(econds), m(inutes), h(ours), or d(ays)
; Default Unit: seconds
; Default Value: 0
;emergency_restart_interval = 0

; Time limit for child processes to wait for a reaction on signals from master.
; Available units: s(econds), m(inutes), h(ours), or d(ays)
; Default Unit: seconds
; Default Value: 0
;process_control_timeout = 0

; The maximum number of processes FPM will fork. This has been designed to control
; the global number of processes when using dynamic PM within a lot of pools.
; Use it with caution.
; Note: A value of 0 indicates no limit
; Default Value: 0
; process.max = 128
process.max = 100

; Specify the nice(2) priority to apply to the master process (only if set)
; The value can vary from -19 (highest priority) to 20 (lowest priority)
; Note: - It will only work if the FPM master process is launched as root
;       - The pool process will inherit the master process priority
;         unless specified otherwise
; Default Value: no set
; process.priority = -19

; Send FPM to background. Set to 'no' to keep FPM in foreground for debugging.
; Default Value: yes
;daemonize = yes

; Set open file descriptor rlimit for the master process.
; Default Value: system defined value
;rlimit_files = 1024

; Set max core size rlimit for the master process.
; Possible Values: 'unlimited' or an integer greater or equal to 0
; Default Value: system defined value
;rlimit_core = 0

; Specify the event mechanism FPM will use. The following is available:
; - select     (any POSIX os)
; - poll       (any POSIX os)
; - epoll      (linux >= 2.5.44)
; - kqueue     (FreeBSD >= 4.1, OpenBSD >= 2.9, NetBSD >= 2.0)
; - /dev/poll  (Solaris >= 7)
; - port       (Solaris >= 10)
; Default Value: not set (auto detection)
;events.mechanism = epoll
events.mechanism = kqueue

; When FPM is built with systemd integration, specify the interval,
; in seconds, between health report notification to systemd.
; Set to 0 to disable.
; Available Units: s(econds), m(inutes), h(ours)
; Default Unit: seconds
; Default value: 10
;systemd_interval = 10

;;;;;;;;;;;;;;;;;;;;
; Pool Definitions ;
;;;;;;;;;;;;;;;;;;;;

; Multiple pools of child processes may be started with different listening
; ports and different management options.  The name of the pool will be
; used in logs and stats. There is no limitation on the number of pools which
; FPM can handle. Your system will tell you anyway :)

; Include one or more files. If glob(3) exists, it is used to include a bunch of
; files from a glob(3) pattern. This directive can be used everywhere in the
; file.
; Relative path can also be used. They will be prefixed by:
;  - the global prefix if it's been set (-p argument)
;  - /usr/local otherwise
include=/usr/local/etc/php-fpm.d/*.conf
include=/usr/local/etc/webman/php-fpm.d/*.conf

turning on "debug" and specifying "kqueue" ( I normally leave it to auto) are the only things I've changed from the configs I've been using for awhile.

example pool config from the webman directory:


;#pool name
[localuser-www]

; Unix user/group of processes
user = localuser-www
group = localuser-www

; The address:port on which to accept FastCGI requests.
listen.allowed_clients = 127.0.0.1
listen = /var/run/php-fpm.sockets.d/localuser-www.sock
listen.owner = localuser-www
listen.group = www
listen.mode = 0660


; FPM status URL
pm.status_path = /fpm-status

; child process management
pm = ondemand
pm.max_children = 40

;only for dynamic (ignored for ondemand)
pm.start_servers = 1
pm.min_spare_servers = 1
pm.max_spare_servers = 3

; ondemand process idle timeout
pm.process_idle_timeout = 10m

;logging
;access log
access.log = /data/www/webroot/localuser/logs/phplogs/localuser-www.access.log

; slow log
slowlog = /data/www/webroot/localuser/logs/phplogs/localuser-www.slow.log
request_slowlog_timeout = 10s

;error log
php_admin_value[error_log] = /data/www/webroot/localuser/logs/phplogs/localuser-www.error.log
php_admin_flag[log_errors] = on

; misc settings
chdir = /data/www/webroot/localuser/

; PHP process settings
php_value[include_path]  = ".:/data/www/webroot/localuser/"
php_value[short_open_tags] = "On"

php_admin_value[session.save_path] = "/data/www/webroot/localuser/.sessions"
php_admin_value[upload_tmp_dir] = "/data/www/webroot/localuser/.temp"

The snippet of the Apache config that calls it:

<VirtualHost *:80>
        ServerName domain.com


        DocumentRoot /data/www/webroot/localuser/domain.com/public_html/
        CustomLog /data/www/webroot/localuser/domain.com/logs/domain.com.log combined
        ErrorLog /data/www/webroot/localuser/domain.com/logs/domain.com.errors

        #PHP settings -
        ProxyTimeout 600
        <Proxy "fcgi://localuser-www/" acquire=3000 connectiontimeout=3 disablereuse=on max=20>
                #leave empty - matches the SetHandler directive

        </Proxy>

        <FilesMatch ".+\.php$">
                SetHandler "proxy:unix:/var/run/php-fpm.sockets.d/localuser-www.sock|fcgi://localuser-www/"
        </FilesMatch>
</VirtualHost>


#SNI enabled IP
<VirtualHost *:443>
        ServerName domain.com


        DocumentRoot /data/www/webroot/localuser/domain.com/public_html/
        CustomLog /data/www/webroot/localuser/domain.com/logs/domain.com.log combined
        ErrorLog /data/www/webroot/localuser/domain.com/logs/domain.com.errors

        #SSL settings
        SSLEngine on
        SSLCertificateFile /usr/local/etc/webman/ssl.crts/domain.com.crt
        SSLCertificateKeyFile /usr/local/etc/webman/ssl.crts/domain.com.key
        #intermediate CA cert
        SSLCACertificateFile /usr/local/etc/webman/ssl.crts/domain.com.ca


        #PHP settings -
        ProxyTimeout 600
        <Proxy "fcgi://localuser-www/" acquire=3000 connectiontimeout=3 disablereuse=on max=20>
                #leave empty - matches the SetHandler directive

        </Proxy>

        <FilesMatch ".+\.php$">
                SetHandler "proxy:unix:/var/run/php-fpm.sockets.d/localuser-www.sock|fcgi://localuser-www/"
        </FilesMatch>

</VirtualHost>

@rhavenn
Copy link
Author

rhavenn commented Sep 13, 2023

I can confirm the linux ABI layer is not enabled. kldstat doesn't show it and /etc/rc.conf certainly doesn't have it. There isn't any reference to procfs or ABI or Linux in the PHP packages. I have NOLINUX=yes in my poudriere.conf .

I was puzzled by the need for procfs (native FreeBSD version) in/ etc/fstab:

#PROCFS
proc    /proc   procfs  rw      0 0

as it has not ever been required. I figured it was something "new" in more recent PHP 8.1.x releases or something to do with 13.2 release. I have a 13.0 system with php8.1.6 that doesn't need it.

@devnexen
Copy link
Member

I was just mentioning Linux compatilibilty layer because of the manpage. Activating procfs does not enable Linux however due to its deprecation, to me it seems this is the only use case left which really needs it.

@devnexen
Copy link
Member

I can confirm the linux ABI layer is not enabled. kldstat doesn't show it and /etc/rc.conf certainly doesn't have it. There isn't any reference to procfs or ABI or Linux in the PHP packages. I have NOLINUX=yes in my poudriere.conf .

I was puzzled by the need for procfs (native FreeBSD version) in/ etc/fstab:

#PROCFS
proc    /proc   procfs  rw      0 0

as it has not ever been required. I figured it was something "new" in more recent PHP 8.1.x releases or something to do with 13.2 release. I have a 13.0 system with php8.1.6 that doesn't need it.

So I gave a go and procfs does not seem to affect that much the processes. Do you run it in jails ?

turning on "debug" and specifying "kqueue" ( I normally leave it to auto) are the only things I've changed from the configs I've been using for awhile.

auto or kqueue does not matter, on freebsd auto uses kqueue.

@rhavenn
Copy link
Author

rhavenn commented Sep 19, 2023

No, I don't use jails. It's all EC2 VMs. I thought the problem had gone away. It was fine for a few hours, but I checked this morning and the server is back to multiple root processes. However, I disabled procfs since you said it wasn't needed and switch back to "auto". I will enable debug again, but leave it on auto.

I now see a few of these errors again:

[18-Sep-2023 01:54:22] ERROR: failed to open /proc/43726/mem: No such file or directory (2)
[18-Sep-2023 02:36:23] ERROR: failed to open /proc/43726/mem: No such file or directory (2)
[18-Sep-2023 02:53:02] ERROR: failed to open /proc/43726/mem: No such file or directory (2)
[18-Sep-2023 03:56:07] ERROR: failed to open /proc/14295/mem: No such file or directory (2)
[18-Sep-2023 05:44:26] ERROR: failed to open /proc/58925/mem: No such file or directory (2)
[18-Sep-2023 07:04:36] ERROR: failed to open /proc/3002/mem: No such file or directory (2)
[18-Sep-2023 07:09:57] ERROR: failed to open /proc/3326/mem: No such file or directory (2)
[18-Sep-2023 07:44:27] ERROR: failed to open /proc/28753/mem: No such file or directory (2)
[18-Sep-2023 07:45:00] ERROR: failed to open /proc/29183/mem: No such file or directory (2)
[18-Sep-2023 07:51:04] ERROR: failed to open /proc/42751/mem: No such file or directory (2)
[18-Sep-2023 08:36:39] ERROR: failed to open /proc/29183/mem: No such file or directory (2)
[18-Sep-2023 08:39:10] ERROR: failed to open /proc/29183/mem: No such file or directory (2)
[18-Sep-2023 08:50:02] ERROR: failed to open /proc/84306/mem: No such file or directory (2)
[18-Sep-2023 11:12:44] ERROR: failed to open /proc/50393/mem: No such file or directory (2)
[18-Sep-2023 12:12:04] ERROR: failed to open /proc/63333/mem: No such file or directory (2)
[18-Sep-2023 12:23:19] ERROR: failed to open /proc/71502/mem: No such file or directory (2)
[18-Sep-2023 13:02:50] ERROR: failed to open /proc/35915/mem: No such file or directory (2)
[18-Sep-2023 13:07:17] ERROR: failed to open /proc/36004/mem: No such file or directory (2)
[18-Sep-2023 16:19:15] ERROR: failed to open /proc/38415/mem: No such file or directory (2)

if you're saying procfs shouldn't be used then I have no idea what's going on or why it would do that.

@devnexen
Copy link
Member

Thanks.
Did you rebuild php after having disabled procfs ? If you did not, would you mind make clean and rebuild php and if it fails with those lines, would you mind looking if PROC_MEM_FILE is set to mem in your config.log ?

@rhavenn
Copy link
Author

rhavenn commented Sep 19, 2023

In the debug logs I can see these spawn as child pool processes:

[19-Sep-2023 06:13:56.091911] DEBUG: pid 18681, fpm_children_make(), line 430: blocking signals before child birth
[19-Sep-2023 06:13:56.092976] DEBUG: pid 18681, fpm_children_make(), line 454: unblocking signals, child born
[19-Sep-2023 06:13:56.093018] DEBUG: pid 18681, fpm_children_make(), line 460: [pool userA-www] child 59524 started
[19-Sep-2023 06:13:56.093027] DEBUG: pid 18681, fpm_pctl_on_socket_accept(), line 553: [pool userA-www] got accept without idle child available .... I forked
[19-Sep-2023 06:13:56.093033] DEBUG: pid 18681, fpm_event_loop(), line 435: event module triggered 1 events
[19-Sep-2023 06:13:56.131737] DEBUG: pid 18681, fpm_children_make(), line 430: blocking signals before child birth
[19-Sep-2023 06:13:56.132822] DEBUG: pid 18681, fpm_children_make(), line 454: unblocking signals, child born
[19-Sep-2023 06:13:56.132849] DEBUG: pid 18681, fpm_children_make(), line 460: [pool userA-www] child 59540 started
[19-Sep-2023 06:13:56.132856] DEBUG: pid 18681, fpm_pctl_on_socket_accept(), line 553: [pool userA-www] got accept without idle child available .... I forked
[19-Sep-2023 06:13:56.132863] DEBUG: pid 18681, fpm_event_loop(), line 435: event module triggered 1 events

that PID, 59540, is now running as a root process and hogging the CPU with 2 other PIDs that did the same thing. So, it's like it's reverting back to a "master" process after spawning or something. It also is spawning at the same time as another. Is it possible PHP is running into some sort of "race" condition in the spawn logic?

@rhavenn
Copy link
Author

rhavenn commented Sep 19, 2023

Thanks. Did you rebuild php after having disabled procfs ? If you did not, would you mind make clean and rebuild php and if it fails with those lines, would you mind looking if PROC_MEM_FILE is set to mem in your config.log ?

I'm building PHP on a different box via a poudriere setup so I have a custom package repo for all my servers. procfs has never been enabled on it. Here are my 2 options file for php 8.1:

# This file is auto-generated by 'make config'.
# Options for php81-8.1.1
_OPTIONS_READ=php81-8.1.1
_FILE_COMPLETE_OPTIONS_LIST=CGI CLI DEBUG DTRACE EMBED FPM IPV6 LINKTHR  MYSQL80 MYSQLND PHPDBG ZTS
OPTIONS_FILE_SET+=CGI
OPTIONS_FILE_SET+=CLI
OPTIONS_FILE_UNSET+=DEBUG
OPTIONS_FILE_SET+=DTRACE
OPTIONS_FILE_UNSET+=EMBED
OPTIONS_FILE_SET+=FPM
OPTIONS_FILE_SET+=IPV6
OPTIONS_FILE_SET+=LINKTHR
OPTIONS_FILE_UNSET+=MYSQL80
OPTIONS_FILE_SET+=MYSQLND
OPTIONS_FILE_UNSET+=PHPDBG
OPTIONS_FILE_UNSET+=ZTS

and

# This file is auto-generated by 'make config'.
# Options for php81-extensions-1.0
_OPTIONS_READ=php81-extensions-1.0
_FILE_COMPLETE_OPTIONS_LIST=BCMATH BZ2 CALENDAR CTYPE CURL DBA DOM ENCHANT EXIF FFI FILEINFO FILTER FTP GD GETTEXT GMP ICONV IMAP INTL LDAP MBSTRING MYSQLI ODBC OPCACHE OPENSSL PCNTL PDO PDO_DBLIB PDO_FIREBIRD PDO_MYSQL PDO_ODBC PDO_PGSQL PDO_SQLITE PGSQL PHAR POSIX PSPELL READLINE SESSION SHMOP SIMPLEXML SNMP SOAP SOCKETS SODIUM SQLITE3 SYSVMSG SYSVSEM SYSVSHM TIDY TOKENIZER XML XMLREADER XMLWRITER XSL ZIP ZLIB
OPTIONS_FILE_SET+=BCMATH
OPTIONS_FILE_SET+=BZ2
OPTIONS_FILE_UNSET+=CALENDAR
OPTIONS_FILE_SET+=CTYPE
OPTIONS_FILE_SET+=CURL
OPTIONS_FILE_UNSET+=DBA
OPTIONS_FILE_SET+=DOM
OPTIONS_FILE_UNSET+=ENCHANT
OPTIONS_FILE_SET+=EXIF
OPTIONS_FILE_UNSET+=FFI
OPTIONS_FILE_SET+=FILEINFO
OPTIONS_FILE_SET+=FILTER
OPTIONS_FILE_SET+=FTP
OPTIONS_FILE_SET+=GD
OPTIONS_FILE_SET+=GETTEXT
OPTIONS_FILE_UNSET+=GMP
OPTIONS_FILE_SET+=ICONV
OPTIONS_FILE_UNSET+=IMAP
OPTIONS_FILE_SET+=INTL
OPTIONS_FILE_UNSET+=LDAP
OPTIONS_FILE_SET+=MBSTRING
OPTIONS_FILE_SET+=MYSQLI
OPTIONS_FILE_UNSET+=ODBC
OPTIONS_FILE_SET+=OPCACHE
OPTIONS_FILE_SET+=OPENSSL
OPTIONS_FILE_UNSET+=PCNTL
OPTIONS_FILE_SET+=PDO
OPTIONS_FILE_UNSET+=PDO_DBLIB
OPTIONS_FILE_UNSET+=PDO_FIREBIRD
OPTIONS_FILE_SET+=PDO_MYSQL
OPTIONS_FILE_UNSET+=PDO_ODBC
OPTIONS_FILE_UNSET+=PDO_PGSQL
OPTIONS_FILE_SET+=PDO_SQLITE
OPTIONS_FILE_UNSET+=PGSQL
OPTIONS_FILE_SET+=PHAR
OPTIONS_FILE_SET+=POSIX
OPTIONS_FILE_SET+=PSPELL
OPTIONS_FILE_UNSET+=READLINE
OPTIONS_FILE_SET+=SESSION
OPTIONS_FILE_UNSET+=SHMOP
OPTIONS_FILE_SET+=SIMPLEXML
OPTIONS_FILE_UNSET+=SNMP
OPTIONS_FILE_SET+=SOAP
OPTIONS_FILE_SET+=SOCKETS
OPTIONS_FILE_UNSET+=SODIUM
OPTIONS_FILE_SET+=SQLITE3
OPTIONS_FILE_UNSET+=SYSVMSG
OPTIONS_FILE_UNSET+=SYSVSEM
OPTIONS_FILE_UNSET+=SYSVSHM
OPTIONS_FILE_SET+=TIDY
OPTIONS_FILE_SET+=TOKENIZER
OPTIONS_FILE_SET+=XML
OPTIONS_FILE_SET+=XMLREADER
OPTIONS_FILE_SET+=XMLWRITER
OPTIONS_FILE_UNSET+=XSL
OPTIONS_FILE_SET+=ZIP
OPTIONS_FILE_SET+=ZLIB

@rhavenn
Copy link
Author

rhavenn commented Sep 19, 2023

The poudriere config / build log is showing checking for proc mem file... mem which I think what we want? I can't seem to get poudriere to keep the config.log file around for the successful build.

@devnexen
Copy link
Member

devnexen commented Sep 19, 2023

The poudriere config / build log is showing checking for proc mem file... mem which I think what we want? I can't seem to get poudriere to keep the config.log file around for the successful build.

As a matter of fact no, this is the opposite on a typical freebsd setting without procfs mounted, it should not output it at all.
Maybe another package (or dependency) in your poudriere does need procfs tough.

@rhavenn
Copy link
Author

rhavenn commented Sep 19, 2023

I'm not aware of anything and my OPTIONS files aren't grepping anything. I've made no major changes to my builds since the spring and it worked fine until August or there abouts, but certainly some other package could have forced it on.

@rhavenn
Copy link
Author

rhavenn commented Sep 19, 2023

I've added:

USE_PROCFS=no

to my poudriere.conf . I can't find anything info on it in the poudriere man pages, but some googlefu references to the poudriere source code seem to imply it defaults to yes or something like that. I re-ran the PHP build and I no longer see the checking for proc mem file... mem line. Let's see if that helped.

@rhavenn
Copy link
Author

rhavenn commented Sep 19, 2023

Looking at the poudriere source. The commits for that were in 2014 originally and I see no mention of recent changes, so I do think something changed in the php configure process perhaps for that to "flip" like that?

@rhavenn
Copy link
Author

rhavenn commented Sep 19, 2023

So, I'm not seeing anymore proc errors, but still seeing the weird process. It's like it forks, but doesn't "finish" the fork or process "init" to flip over to the pool user.

[19-Sep-2023 17:40:35.388524] DEBUG: pid 57428, fpm_children_make(), line 460: [pool userB-www] child 70978 started
[19-Sep-2023 17:40:35.388531] DEBUG: pid 57428, fpm_pctl_on_socket_accept(), line 553: [pool userB-www] got accept without idle child available .... I forked
[19-Sep-2023 17:40:35.388537] DEBUG: pid 57428, fpm_event_loop(), line 435: event module triggered 1 events

then this new PID, 70978, starts running 100% on 1 core. I can't kill it. A kill -9 does kill it.

ps aux shows it:

root          70978 100.0  0.8  208676  61532  -  R    17:40      70:01.53 php-fpm: master process (/usr/local/etc/php-fpm.conf) (php-fpm)

but the log after the kill -9 shows:

[19-Sep-2023 18:50:53.136804] DEBUG: pid 57428, fpm_got_signal(), line 82: received SIGCHLD
[19-Sep-2023 18:50:53.136829] DEBUG: pid 57428, fpm_event_loop(), line 435: event module triggered 1 events
[19-Sep-2023 18:50:53.136843] WARNING: pid 57428, fpm_children_bury(), line 281: [pool userB-www] child 70978 exited on signal 9 (SIGKILL) after 4217.748327 seconds from start
[19-Sep-2023 18:50:53.136857] DEBUG: pid 57428, fpm_children_make(), line 430: blocking signals before child birth
[19-Sep-2023 18:50:53.137677] DEBUG: pid 57428, fpm_children_make(), line 454: unblocking signals, child born
[19-Sep-2023 18:50:53.137699] NOTICE: pid 57428, fpm_children_make(), line 460: [pool userB-www] child 43924 started

So, ps thinks the process owner is root (it shouldn't be), but php is thinking it's the userB-www user. It's odd once that's killed that it immediately starts another one and that one has the right perms and is running as the right user.

@rhavenn
Copy link
Author

rhavenn commented Sep 29, 2023

I have switched back to an old PHP 8.1.18 package, from /var/cache/pkg, which I had built April 14th which now also exhibits this behavior. This has me heavily leaning in the direction that something changed at the OS layer to make PHP act this way. Still no idea what that is or where to even start. I still also have no idea why /procfs needed a hard "NO" in poudriere when it's been working fine for 5+ years across FreeBSD 11.x, 12.x and 13.x up until now. I only use RELEASE and RELEASE-p* branches.

EDIT: However, I see nothing in 13.2-RELEASE errata notices pointing at this, so it's also possible this is an issue that cropped up at release of 13.2 and I just didn't notice it due to low usage / dev ramp up of a few months to actually put loads on it.

@rhavenn
Copy link
Author

rhavenn commented Sep 29, 2023

I ran ktrace and a kdump on the php-fpm process that is running hot and it's just looping with:

 50600 php-fpm  RET   sched_yield 0
 50600 php-fpm  CALL  sched_yield
 50600 php-fpm  RET   sched_yield 0
 50600 php-fpm  CALL  sched_yield
 50600 php-fpm  RET   sched_yield 0
 50600 php-fpm  CALL  sched_yield
 50600 php-fpm  RET   sched_yield 0
 50600 php-fpm  CALL  sched_yield
 50600 php-fpm  RET   sched_yield 0
 50600 php-fpm  CALL  sched_yield
 50600 php-fpm  RET   sched_yield 0
 50600 php-fpm  CALL  sched_yield
 50600 php-fpm  RET   sched_yield 0

but I have no idea what that means, if anything. truss has similar output.

@devnexen
Copy link
Member

I ran ktrace and a kdump on the php-fpm process that is running hot and it's just looping with:

 50600 php-fpm  RET   sched_yield 0
 50600 php-fpm  CALL  sched_yield
 50600 php-fpm  RET   sched_yield 0
 50600 php-fpm  CALL  sched_yield
 50600 php-fpm  RET   sched_yield 0
 50600 php-fpm  CALL  sched_yield
 50600 php-fpm  RET   sched_yield 0
 50600 php-fpm  CALL  sched_yield
 50600 php-fpm  RET   sched_yield 0
 50600 php-fpm  CALL  sched_yield
 50600 php-fpm  RET   sched_yield 0
 50600 php-fpm  CALL  sched_yield
 50600 php-fpm  RET   sched_yield 0

but I have no idea what that means, if anything. truss has similar output.

those have trustworthy output in general indeed. If you think it comes from the system, did you try by any chance to fill a report in the FreeBSD's bugzilla tracker ?

@rhavenn
Copy link
Author

rhavenn commented Oct 6, 2023

Yea, I put in a ticket there 9/27: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=274135

EDIT: Seems I spoke too soon. The problem took 2 days to surface again, but it did come back. So, ASLR is not the culprit.

It seems FreeBSD's ASLR (default enabled in 13.2+) is to blame. I disabled it and no issues for the last 2 days. If nothing happens by Monday I'll see about escalating the issue...although not quite sure where to :)

disabled ASLR by editing /etc/sysctl.conf and rebooting:

#disable ASLR
kern.elf64.aslr.enable=0
kern.elf64.aslr.pie_enable=0

@rhavenn
Copy link
Author

rhavenn commented Nov 15, 2023

So, after more Google-Fu...this issue seems to be caused by compiling ImageMagick or GraphicsMagick with OpenMP enabled. I re-built my ports tree and both servers that were having the problem exhibited no further issues in the last 10 days. see: https://lists.freebsd.org/archives/freebsd-apache/2022-November/000289.html

So, I don't know if PHP should be responsible for still forking correctly or if it's FreeBSD problem or a OpenMP problem. @Mainteners. Close as you see fit.

@devnexen
Copy link
Member

let's not close it, still worth keeping an eye. thanks for your reports.

@bukka
Copy link
Member

bukka commented Dec 4, 2023

There are known issue with OpenMP and it is recommended to disable it. See a note about it in imagick docs. Reading the above it seems OpenMP somehow messes up the whole forking. I don't think we really want to fix this as OpenMP is not recommended by extension. I think packagers should rather make sure that PHP is not compiled with it.

@bukka bukka closed this as not planned Won't fix, can't repro, duplicate, stale Dec 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants