PHP-FPM requests are stuck and childs are killed after elapsing of the timeout #14149
Comments
Please try to identify the stuck worker and attach a debugger ( |
If getting a backtrace is problematic for you (e.g. when running in a container), please consider configuring the slowlog, which should give you more info and also dump the backtrace to the logs. It means set |
At the moment I can share the results identified and gathered so far. strace was launched:
What's fd=11?
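On Linux, a question like this can usually be answered by resolving the descriptor through /proc/<pid>/fd/<n>, which is the same information `ls -l /proc/<pid>/fd` prints. Below is a minimal sketch of that lookup. The helper name `describe_fd` is hypothetical and not part of php-src, and it assumes /proc is mounted and readable for the target process.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from php-src): resolve what descriptor `fd`
 * of process `pid` points at by reading the /proc/<pid>/fd/<fd> symlink.
 * Returns the length of the target string, or -1 on error. */
ssize_t describe_fd(pid_t pid, int fd, char *buf, size_t len)
{
    char path[64];
    ssize_t n;

    if (len == 0) {
        return -1;
    }
    snprintf(path, sizeof(path), "/proc/%d/fd/%d", (int)pid, fd);
    n = readlink(path, buf, len - 1);
    if (n >= 0) {
        buf[n] = '\0'; /* readlink() does not NUL-terminate */
    }
    return n;
}
```

For a pipe the target looks like `pipe:[<inode>]`, and for a socket like `socket:[<inode>]`, so matching the inode against the master's descriptors shows which process holds the other end.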
As a reminder, the config is as follows:
After that, operations on fd=2 were checked (to confirm that catch_workers_output = yes creates a pipe in the master process, see fpm_stdio_prepare_pipes).
18 and 25 were inherited from the master process, so the next step is to check the master process.
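This matters because a pipe has a finite kernel buffer. If the reading side never drains it, a blocking write() on the other end eventually stops returning. The sketch below shows that limit, assuming (not confirmed by the logs shown here) that the worker's fd 2 is the write end of such a pipe. The helper `fill_pipe` is hypothetical and switches the descriptor to non-blocking mode so that the full pipe shows up as EAGAIN instead of a hang.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write to `wfd` until the pipe buffer is full.
 * Returns the number of bytes the kernel accepted, or -1 on an
 * unexpected error. A blocking writer would hang at the same point
 * where this non-blocking one gets EAGAIN. */
long fill_pipe(int wfd)
{
    char chunk[4096] = {0};
    long total = 0;
    int flags = fcntl(wfd, F_GETFL);

    if (flags < 0 || fcntl(wfd, F_SETFL, flags | O_NONBLOCK) < 0) {
        return -1;
    }
    for (;;) {
        ssize_t n = write(wfd, chunk, sizeof(chunk));
        if (n > 0) {
            total += n;
            continue;
        }
        if (n < 0 && errno == EINTR) {
            continue;     /* interrupted by a signal, retry */
        }
        if (n < 0 && errno == EAGAIN) {
            return total; /* pipe is full, nobody is reading */
        }
        return -1;
    }
}
```

On a pipe whose read end is never read, the returned value is the pipe capacity. After that many bytes of output, a worker doing blocking writes would stall until the master reads again.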
So, it was the call of fpm_stdio_prepare_pipes (link to target tag 8.1.15) and fpm_stdio_parent_use_pipes. So, the code may fail at one of these lines:
fpm_event_set(&child->ev_stderr, child->fd_stderr, FPM_EV_READ, fpm_stdio_child_said, child);
fpm_event_add(&child->ev_stderr, 0);
At the moment there is no answer why only 1 fd was added via |
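To illustrate what a missing registration looks like from the event loop's side, here is a minimal sketch using plain epoll instead of the FPM event wrappers. The helpers `watch_fd` and `count_ready` are hypothetical stand-ins and do not reproduce the fpm_event_set / fpm_event_add internals. The point is only that a descriptor which was never added to the epoll set is never reported as readable, no matter how much data is waiting on it.

```c
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: register `fd` for read events on `epfd`.
 * Returns 0 on success, -1 on error (same as epoll_ctl). */
int watch_fd(int epfd, int fd)
{
    struct epoll_event ev;

    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;   /* level-triggered read readiness */
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Hypothetical helper: wait up to `timeout_ms` and return how many
 * watched descriptors are currently readable (or -1 on error). */
int count_ready(int epfd, int timeout_ms)
{
    struct epoll_event evs[8];

    return epoll_wait(epfd, evs, 8, timeout_ms);
}
```

With two pipes standing in for a child's stdout and stderr, registering only the first and writing to both makes `count_ready` report a single ready descriptor. The second pipe's data stays unread until `watch_fd` is called for it as well.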
I just had a proper look into this and the code around it and can't really see why adding the stderr event could be omitted. The only conditions that could trigger it are around the queue checks in If you are only able to reproduce it in production, there might be some external factors causing this. It is actually quite similar to #11447, so it might be worth checking the application as well. |
@bukka updates.
php-src/sapi/fpm/fpm/fpm_events.c Lines 162 to 180 in 1a3d870
php-src/sapi/fpm/fpm/fpm_events.c Lines 145 to 160 in 1a3d870
This happens because the pointer to php-src/sapi/fpm/fpm/fpm_events.c Lines 44 to 47 in 1a3d870
After this, we started to track Below you may find logs and comments.
Ok, after adding
As you may see
and in the same line with
What next? We checked the code and have found...
php-src/sapi/fpm/fpm/fpm_children.c Lines 440 to 480 in 1a3d870
php-src/sapi/fpm/fpm/fpm_children.c Lines 350 to 377 in 1a3d870
And php-src/sapi/fpm/fpm/fpm_children.c Lines 41 to 54 in 1a3d870
After this, php-src/sapi/fpm/fpm/fpm_stdio.c Lines 318 to 335 in 1a3d870
php-src/sapi/fpm/fpm/fpm_events.c Lines 505 to 523 in 1a3d870
So, looks like Proofs for this. When the value of php-src/sapi/fpm/fpm/fpm_events.c Lines 505 to 523 in 1a3d870
Later, in php-src/sapi/fpm/fpm/events/epoll.c Lines 156 to 183 in 1a3d870
Our current hypothesis states:
@bukka do you have any idea why |
Additionally, we've checked occurrences of the 6 minutes before it was spotted with
Thid
Later, we've found
Let's check the log
So, |
We actually had an issue with freeing the event too soon, which got fixed in 8.1.20: 1029537 . Could you try to update and patch your version to see if it helps by any chance? Ideally it would be great if you could update to the latest 8.2, as we cannot really make a fix to 8.1 if it's something else. |
Description
php-fpm child processes periodically get stuck, regardless of which script is executing. A script may get stuck, and the same script with the same params will execute fine right after. My application writes its logs to stdout/stderr, but during these hangs no logs are present at all, so the logger was never initialized, and logger initialization is the first step, before any external connections to databases/queues/caches. Looks like a bug.
I use the base docker image 8.1.15-fpm. nginx is used as the webserver with FastCGI support, and it routes requests via an upstream to the same host machine (different container) over TCP.
Configuration:
Additional info and logs captured while a worker is stuck:
HTTP status code is 499, because nginx closed the connection due to max wait time.
upstream_connect_time="0.000", upstream_bytes_sent="1888" upstream_bytes_received="0".
There is no pattern present. Simple health checks fail as well as more complicated logic. The only thing in common is that the "requests" count of the affected child is always low, and the child started less than 60 s before getting stuck.
Any ideas?
Paid help for investigation and resolution is welcomed too.
PHP Version
8.1.15
Operating System
Ubuntu 20.04 in docker