Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Fix recovery conflict SIGUSR1 handling.
We shouldn't be doing non-trivial work in signal handlers in general, and in this case the handler could reach unsafe code and corrupt state. It also clobbered its own "reason" code. Move all recovery conflict decision logic into the next CHECK_FOR_INTERRUPTS(), and have the signal handler just set flags and the latch, following the standard pattern. Since there are several different "reasons", use a separate flag for each. With this refactoring, the recovery conflict system no longer piggy-backs on top of the regular query cancelation mechanism, but instead raises an error directly if it decides that is necessary. It still needs to respect QueryCancelHoldoffCount, because otherwise the FEBE protocol might get out of sync (see commit 2b3a8b2). This fixes one class of intermittent failure in the new 031_recovery_conflict.pl test added by commit 9f8a050, though the buggy coding is much older. Failures outside contrived testing seem to be very rare (or perhaps incorrectly attributed) in the field, based on lack of reports. No back-patch for now due to complexity and release schedule. We have the option to back-patch into 16 later, as 16 has prerequisite commit bea3d7e. Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version) Reviewed-by: Michael Paquier <michael@paquier.xyz> (earlier version) Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier version) Tested-by: Christoph Berg <myon@debian.org> Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com Discussion: https://postgr.es/m/CALj2ACVr8au2J_9D88UfRCi0JdWhyQDDxAcSVav0B0irx9nXEg%40mail.gmail.com
- Loading branch information