ext/opcache: Retry waitpid on EINTR from repeated reload signals#20051
Closed
welcomycozyhom wants to merge 1 commit into
Closed
ext/opcache: Retry waitpid on EINTR from repeated reload signals#20051welcomycozyhom wants to merge 1 commit into
welcomycozyhom wants to merge 1 commit into
Conversation
In production, reload signals may arrive multiple times before previous restart operations complete. This occurs when: - Legacy deployment scripts trigger rapid reloads - Monitoring systems aggressively check service health - Orchestration platforms retry operations The SAPI registers signal handlers without SA_RESTART, causing system calls to return EINTR. Without retry logic, waitpid() during preload can fail non-deterministically, terminating the master process unexpectedly. This adds EINTR handling to ensure stable operation in signal-heavy environments.
dstogov
reviewed
Oct 6, 2025
Comment on lines
+4891
to
+4893
| do { | ||
| chld_pid = waitpid(pid, &status, 0); | ||
| } while (chld_pid < 0 && errno == EINTR); |
Member
There was a problem hiding this comment.
This looks right, except we probably have to check for a stronger signals (e.g. SIGQUIT).
Otherwise we may hung in this loop forever.
@arnaud-lb can you please take care about this.
This was referenced Oct 6, 2025
Contributor
Author
|
Thanks for the review, @dstogov |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Background
Our company has been running a self-developed e-commerce solution on Apache/httpd
with mod_php for over 20 years. This runtime environment is deployed across
thousands of physical servers with different configurations.
Recently, we started using
FFI::scope()from ext/ffi, which required enablingopcache.preload. During this implementation, we discovered an issue with howpreload handles graceful restarts.
The Problem
In our production environment, Apache graceful restarts (triggered by SIGUSR1)
occur frequently as part of our operational workflows. We found that when
multiple restart signals are accidentally sent to the master process in rapid
succession, the
waitpid()call inaccel_finish_startup()can be interruptedwith
EINTR.The current code doesn't handle this:
When waitpid() returns -1 due to EINTR, the master process terminates
unexpectedly, causing a complete service outage that looks like a system-wide
shutdown.
Notably, Apache's internal SIGUSR1 signal handler is registered without
SA_RESTART, which means interrupted system calls return EINTR rather than
auto-resuming.
This failure is completely unrelated to the actual success or failure of the
preload subprocess itself - it's purely a signal handling timing issue. While
the occurrence is rare and non-deterministic, it has significant impact when
it happens.
Given our large-scale on-premises infrastructure with 20+ years of accumulated
workflows, eliminating all scenarios where duplicate restart signals might be
sent is impractical. We believe a defensive approach in the PHP code is more
pragmatic.
Reproduction
While our production scenario is complex, here's a simplified test case that
demonstrates the issue:
Test script
php.ini
Before test - Normal process tree
Error log when issue occurs
After this error, the master process exits and all worker processes are lost,
requiring manual intervention to restore service.