Skip to content

Conversation

plmnikulin
Copy link
Contributor

epoll event backend does not guarantee that child input/output events
are reported before SIGCHILD due to finished worker. While a bunch of
events received by epoll is being processed, child-related structures
may be removed before dispatching of an I/O event for the same child.
The result may be attempt to access to memory region allocated for
another purpose, segfault of the master process, and unavailable web
sites.

Postpone processing of SIGCHILD events till other events in the same
bunch are processed.

Fix Bug #62418 php-fpm master process crashes
Fix Bug #65398 Race condition between SIGCHLD and child stdout/stderr event leads to segfault
Fix Bug #75112 php-fpm crashing, hard to reproduce
Fix Bug #77114 php-fpm master segfaults in fpm_event_epoll_wait/fpm_event_fire
Fix Bug #77185 Use-after-free in FPM master event handling

epoll event backend does not guarantee that child input/output events
are reported before SIGCHILD due to finished worker. While a bunch of
events received by epoll is being processed, child-related structures
may be removed before dispatching of an I/O event for the same child.
The result may be attempt to access to memory region allocated for
another purpose, segfault of the master process, and unavailable web
sites.

Postpone processing of SIGCHILD events till other events in the same
bunch are processed.

Fix Bug #62418 php-fpm master process crashes
Fix Bug #65398 Race condition between SIGCHLD and child stdout/stderr event leads to segfault
Fix Bug #75112 php-fpm crashing, hard to reproduce
Fix Bug #77114 php-fpm master segfaults in fpm_event_epoll_wait/fpm_event_fire
Fix Bug #77185 Use-after-free in FPM master event handling
@plmnikulin
Copy link
Contributor Author

The bug has been reported several times.

Bug #77185 suggests a way to reproduce the issue.
With older libc (e.g. in CentOS-7) the following environment
variable may be set to certainly catch use after free cases

MALLOC_PERTURB_=165

The patch fpm-race-condition.patch attached to the Bug #65398
does not eliminate the problem.

@nikic
Copy link
Member

nikic commented Jul 11, 2019

This looks conceptually right. I'm not very familiar with FPM APIs, so not totally sure whether registering a timer is the right way to delay the signal handling.

I'm assuming that the handling for other signals is not affected by this, because it doesn't do anything more than changing process control state.

@plmnikulin
Copy link
Contributor Author

Signals are converted by the handler to one-byte messages and passed through a dedicated file so for event backends they are regular "ready for read" events for a file descriptor.

At a first glance it seems that events should be sorted within the event backend but event stream is opaque at this level so it is hard to posptone just SIGCHILD's and process master restart or stop requests as soon as possible completely in the epoll backend. Frankly speaking, I am not happy that such change affects all event backends, not just the epoll one.

The timer, certainly, is a kind of hack but changing protocol of communication with event backend to implement e.g. "postpone this event" handler response requires much more work and testing with unclear benefits. Assigning some priority values to event handlers is not an option due to child exit and restart request are passed through the same handler. I hope, overhead for the timer could be neglected.

@Novynn
Copy link

Novynn commented Jul 20, 2019

One of our staging environments (Ubuntu 16.04, PHP 7.1.29) had FPMs consistently crashing when sending USR2 (catch_workers_output=yes).

With this patch applied I can confirm that the problem is fixed. I haven't been able to find any side affects as of yet.

@bukka
Copy link
Member

bukka commented Jul 21, 2019

It's a bit hack but I agree that a proper fix would require much bigger changes. I don't have much time for testing but the patch makes sense and if it works I would be fine with this going to 7.3+. I would prefer not to merge it to 7.2 as we have had some fixes in FPM recently that broke other stuff. I'm not saying that this is gonna break anything but think it's better to be safe and see how it works in 7.3 first. :)

@nikic
Copy link
Member

nikic commented Jul 22, 2019

Merged as bdf24f8 into 7.3, together with the test from https://bugs.php.net/bug.php?id=77185. Thanks everyone!

@nikic nikic closed this Jul 22, 2019
@plmnikulin plmnikulin deleted the issue-77114 branch February 18, 2020 07:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants