From f663b009189ff175a8d134a6cc6e26b5940a1449 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Tue, 20 Jun 2023 09:36:35 +0900
Subject: [PATCH] Fix failure at promotion with 2PC transactions and archiving enabled

When archiving is enabled, a promotion request would fail with the
following error when some 2PC transaction needs to be recovered from
WAL, preventing the promotion from completing:
FATAL: requested WAL segment pg_wal/000000010000000000000001 has already been removed

The origin of the problem is that the last partial segment of the old
timeline is renamed before recovering the 2PC data via
RecoverPreparedTransactions() at the end of recovery, causing the FATAL
because the wanted segment has already been renamed with a .partial
suffix.

This commit slightly reorders the end-of-recovery actions so that the
execution of recovery_end_command, the cleanup of the old segments of
the old timeline (RemoveNonParentXlogFiles) and the rename of the last
partial segment are done after the 2PC transaction data is recovered
with RecoverPreparedTransactions(). This makes the order of these
end-of-recovery actions more consistent with 15~, with the exception of
the end-of-recovery checkpoint that still needs to happen before all the
actions reordered here in v13 and v14, contrary to what 15~ does.

v15 and newer versions have "fixed" this problem somewhat accidentally
with 811051c, where the end-of-recovery actions got reordered. In this
case, the recovery of 2PC transactions happens before the renaming of
the last partial segment of the old timeline.

v13 and v14 are the versions that can easily see this problem because of
the refactoring of 38a95731, where XLogReaderState is reset in
XLogBeginRead() before reading the 2PC transaction data. v11 and v12
could also see this problem, but may end up reading the 2PC data from
some of the WAL buffers instead.
Perhaps something could be done for these two branches, but I am not
really excited about changing them given the lack of complaints and the
fact that v11 is going to be EOL'd soon (there is always a risk of
breaking something).

Note that the TAP test 009_twophase.pl is able to exhibit the issue if
it enables archiving on the primary node, which does not impact the test
coverage as restore_command would remain unused. This is something that
should be changed on v15 and HEAD as well, so this will be changed in a
separate commit for clarity.

Author: Julian Markwort
Reviewed-by: Kyotaro Horiguchi, Michael Paquier
Discussion: https://postgr.es/m/743b9b45a2d4013bd90b6a5cba8d6faeb717ee34.camel@cybertec.at
Backpatch-through: 13
---
 src/backend/access/transam/xlog.c | 102 +++++++++++++++---------------
 1 file changed, 51 insertions(+), 51 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6c2181ce8eccd..fcc04e4acf813 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8016,6 +8016,57 @@ StartupXLOG(void)
 			CreateCheckPoint(CHECKPOINT_END_OF_RECOVERY | CHECKPOINT_IMMEDIATE);
 	}
 
+	/*
+	 * Preallocate additional log files, if wanted.
+	 */
+	PreallocXlogFiles(EndOfLog);
+
+	/*
+	 * Okay, we're officially UP.
+	 */
+	InRecovery = false;
+
+	/* start the archive_timeout timer and LSN running */
+	XLogCtl->lastSegSwitchTime = (pg_time_t) time(NULL);
+	XLogCtl->lastSegSwitchLSN = EndOfLog;
+
+	/* also initialize latestCompletedXid, to nextXid - 1 */
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ShmemVariableCache->latestCompletedXid = ShmemVariableCache->nextXid;
+	FullTransactionIdRetreat(&ShmemVariableCache->latestCompletedXid);
+	LWLockRelease(ProcArrayLock);
+
+	/*
+	 * Start up subtrans, if not already done for hot standby. (commit
+	 * timestamps are started below, if necessary.)
+	 */
+	if (standbyState == STANDBY_DISABLED)
+		StartupSUBTRANS(oldestActiveXID);
+
+	/*
+	 * Perform end of recovery actions for any SLRUs that need it.
+	 */
+	TrimCLOG();
+	TrimMultiXact();
+
+	/* Reload shared-memory state for prepared transactions */
+	RecoverPreparedTransactions();
+
+	/* Shut down xlogreader */
+	if (readFile >= 0)
+	{
+		close(readFile);
+		readFile = -1;
+	}
+	XLogReaderFree(xlogreader);
+
+	/*
+	 * If any of the critical GUCs have changed, log them before we allow
+	 * backends to write WAL.
+	 */
+	LocalSetXLogInsertAllowed();
+	XLogReportParameters();
+
 	if (ArchiveRecoveryRequested)
 	{
 		/*
@@ -8097,57 +8148,6 @@ StartupXLOG(void)
 		}
 	}
 
-	/*
-	 * Preallocate additional log files, if wanted.
-	 */
-	PreallocXlogFiles(EndOfLog);
-
-	/*
-	 * Okay, we're officially UP.
-	 */
-	InRecovery = false;
-
-	/* start the archive_timeout timer and LSN running */
-	XLogCtl->lastSegSwitchTime = (pg_time_t) time(NULL);
-	XLogCtl->lastSegSwitchLSN = EndOfLog;
-
-	/* also initialize latestCompletedXid, to nextXid - 1 */
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
-	ShmemVariableCache->latestCompletedXid = ShmemVariableCache->nextXid;
-	FullTransactionIdRetreat(&ShmemVariableCache->latestCompletedXid);
-	LWLockRelease(ProcArrayLock);
-
-	/*
-	 * Start up subtrans, if not already done for hot standby. (commit
-	 * timestamps are started below, if necessary.)
-	 */
-	if (standbyState == STANDBY_DISABLED)
-		StartupSUBTRANS(oldestActiveXID);
-
-	/*
-	 * Perform end of recovery actions for any SLRUs that need it.
-	 */
-	TrimCLOG();
-	TrimMultiXact();
-
-	/* Reload shared-memory state for prepared transactions */
-	RecoverPreparedTransactions();
-
-	/* Shut down xlogreader */
-	if (readFile >= 0)
-	{
-		close(readFile);
-		readFile = -1;
-	}
-	XLogReaderFree(xlogreader);
-
-	/*
-	 * If any of the critical GUCs have changed, log them before we allow
-	 * backends to write WAL.
-	 */
-	LocalSetXLogInsertAllowed();
-	XLogReportParameters();
-
 	/*
 	 * Local WAL inserts enabled, so it's time to finish initialization of
 	 * commit timestamp.