Skip to content

Commit

Permalink
Make recovery report error message when invalid page header is found.
Browse files Browse the repository at this point in the history
Commit 0668719 changed XLogPageRead() so that it validated the page
header, if invalid page header was found reset the error message and
retried reading the page, to fix the scenario where streaming standby
got stuck at a continuation record. This change hid the error message
about invalid page header, which would make it harder for users to
investigate what the actual issue was found in WAL.

To fix the issue, this commit makes XLogPageRead() report the error
message when invalid page header is found.

When not in standby mode, an invalid page header should cause recovery
to end, not retry reading the page, so XLogPageRead() doesn't need to
validate the page header for the retry. Instead, ReadPageInternal() should
be responsible for the validation in that case. Therefore this commit
changes XLogPageRead() so that if not in standby mode it doesn't validate
the page header for the retry.

This commit has been originally pushed as of 6860198 for 15 and
newer versions, but not to the older branches.  A recent investigation
related to WAL replay failures has showed up that the lack of this patch
in 12~14 is an issue, as we want to be able to improve the WAL reader to
make a correct distinction between the end-of-wal and OOM cases when
validating record headers.  REL_11_STABLE is left out as it will be
EOL'd soon.

Reported-by: Yugo Nagata
Author: Yugo Nagata, Kyotaro Horiguchi
Reviewed-by: Ranier Vilela, Fujii Masao
Discussion: https://postgr.es/m/20210718045505.32f463ed6c227111038d8ae4@sraoss.co.jp
Discussion: https://postgr.es/m/17928-aa92416a70ff44a2@postgresql.org
Backpatch-through: 12
  • Loading branch information
MasaoFujii authored and michaelpq committed Sep 12, 2023
1 parent c8cea81 commit 7b03d3a
Showing 1 changed file with 16 additions and 2 deletions.
18 changes: 16 additions & 2 deletions src/backend/access/transam/xlog.c
Expand Up @@ -12041,7 +12041,7 @@ XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen,

/*
* Check the page header immediately, so that we can retry immediately if
* it's not valid. This may seem unnecessary, because XLogReadRecord()
* it's not valid. This may seem unnecessary, because ReadPageInternal()
* validates the page header anyway, and would propagate the failure up to
* ReadRecord(), which would retry. However, there's a corner case with
* continuation records, if a record is split across two pages such that
Expand All @@ -12064,9 +12064,23 @@ XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen,
*
* Validating the page header is cheap enough that doing it twice
* shouldn't be a big deal from a performance point of view.
*
* When not in standby mode, an invalid page header should cause recovery
* to end, not retry reading the page, so we don't need to validate the
* page header here for the retry. Instead, ReadPageInternal() is
* responsible for the validation.
*/
if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr, readBuf))
if (StandbyMode &&
!XLogReaderValidatePageHeader(xlogreader, targetPagePtr, readBuf))
{
/*
* Emit this error right now then retry this page immediately. Use
* errmsg_internal() because the message was already translated.
*/
if (xlogreader->errormsg_buf[0])
ereport(emode_for_corrupt_record(emode, EndRecPtr),
(errmsg_internal("%s", xlogreader->errormsg_buf)));

/* reset any error XLogReaderValidatePageHeader() might have set */
xlogreader->errormsg_buf[0] = '\0';
goto next_record_is_invalid;
Expand Down

0 comments on commit 7b03d3a

Please sign in to comment.