Skip to content

Commit

Permalink
nbtree VACUUM: cope with right sibling link corruption.
Browse files Browse the repository at this point in the history
Avoid "right sibling's left-link doesn't match" errors when vacuuming a
corrupt nbtree index.  Just LOG the issue and press on.  That way VACUUM
will have a decent chance of finishing off all required processing for
the index (and for the table as a whole).

This error was seen in the field from time to time (it's more than a
theoretical risk), so giving VACUUM the ability to press on like this
has real value.  Nothing short of a REINDEX is expected to fix the
underlying index corruption, so giving up (by throwing an error) risks
making a bad situation far worse.  Anything that blocks forward progress
by VACUUM like this might go unnoticed for a long time.  This could
eventually lead to a wraparound/xidStopLimit outage.

Note that _bt_unlink_halfdead_page() has always been able to bail on
page deletion when the target page's left sibling page was in an
inconsistent state.  It now does the same thing (returns false to back
out of the second phase of deletion) when it notices sibling link
corruption in the target page's right sibling page.

This is similar to the work from commit 5b861ba (later backpatched as
commit 43e409c), which taught nbtree to press on with vacuuming an
index when page deletion fails to "re-find" a downlink in the target
page's parent page.  The "re-find" check seems to make VACUUM bail on
page deletion more often in practice, but there is no reason to take any
chances here.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CAH2-Wzko2q2kP1+UvgJyP9g0mF4hopK0NtQZcxwvMv9_ytGhkQ@mail.gmail.com
Backpatch: 11- (all supported versions).
  • Loading branch information
petergeoghegan committed May 25, 2023
1 parent 956c625 commit 8f876d1
Show file tree
Hide file tree
Showing 2 changed files with 35 additions and 13 deletions.
45 changes: 34 additions & 11 deletions src/backend/access/nbtree/nbtpage.c
Expand Up @@ -2032,13 +2032,6 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
leftsib = opaque->btpo_next;
_bt_relbuf(rel, lbuf);

/*
* It'd be good to check for interrupts here, but it's not easy to
* do so because a lock is always held. This block isn't
* frequently reached, so hopefully the consequences of not
* checking interrupts aren't too bad.
*/

if (leftsib == P_NONE)
{
elog(LOG, "no left sibling (concurrent deletion?) of block %u in \"%s\"",
Expand All @@ -2057,6 +2050,9 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
}
return false;
}

CHECK_FOR_INTERRUPTS();

lbuf = _bt_getbuf(rel, leftsib, BT_WRITE);
page = BufferGetPage(lbuf);
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
Expand Down Expand Up @@ -2118,13 +2114,40 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
rbuf = _bt_getbuf(rel, rightsib, BT_WRITE);
page = BufferGetPage(rbuf);
opaque = (BTPageOpaque) PageGetSpecialPointer(page);

/*
* Validate target's right sibling page. Its left link must point back to
* the target page.
*/
if (opaque->btpo_prev != target)
ereport(ERROR,
{
/*
* This is known to fail in the field; sibling link corruption is
* relatively common. Press on with vacuuming rather than just
* throwing an ERROR (same approach used for left-sibling's-right-link
* validation check a moment ago).
*/
ereport(LOG,
(errcode(ERRCODE_INDEX_CORRUPTED),
errmsg_internal("right sibling's left-link doesn't match: "
"block %u links to %u instead of expected %u in index \"%s\"",
rightsib, opaque->btpo_prev, target,
RelationGetRelationName(rel))));
"right sibling %u of target %u with leafblkno %u "
"and scanblkno %u spuriously links to non-target %u "
"on level %u of index \"%s\"",
rightsib, target, leafblkno,
scanblkno, opaque->btpo_prev,
targetlevel, RelationGetRelationName(rel))));

/* Must release all pins and locks on failure exit */
if (BufferIsValid(lbuf))
_bt_relbuf(rel, lbuf);
_bt_relbuf(rel, rbuf);
_bt_relbuf(rel, buf);
if (target != leafblkno)
_bt_relbuf(rel, leafbuf);

return false;
}

rightsib_is_rightmost = P_RIGHTMOST(opaque);
*rightsib_empty = (P_FIRSTDATAKEY(opaque) > PageGetMaxOffsetNumber(page));

Expand Down
3 changes: 1 addition & 2 deletions src/backend/access/nbtree/nbtree.c
Expand Up @@ -1141,8 +1141,7 @@ btvacuumpage(BTVacState *vstate, BlockNumber scanblkno)
* can't be half-dead because only an interrupted VACUUM process can
* leave pages in that state, so we'd definitely have dealt with it
* back when the page was the scanblkno page (half-dead pages are
* always marked fully deleted by _bt_pagedel()). This assumes that
* there can be only one vacuum process running at a time.
* always marked fully deleted by _bt_pagedel(), barring corruption).
*/
if (!opaque || !P_ISLEAF(opaque) || P_ISHALFDEAD(opaque))
{
Expand Down

0 comments on commit 8f876d1

Please sign in to comment.