Skip to content

Commit

Permalink
Handle interleavings between CREATE DATABASE steps and base backup.
Browse files Browse the repository at this point in the history
Restoring a base backup taken in the middle of CreateDirAndVersionFile()
or write_relmap_file() would lose the function's effects.  The symptom
was absence of the database directory, PG_VERSION file, or
pg_filenode.map.  If missing the directory, recovery would fail.  Either
missing file would not fail recovery but would render the new database
unusable.  Fix CreateDirAndVersionFile() with the transam/README "action
first and then write a WAL entry" strategy.  That has a side benefit of
moving filesystem mutations out of a critical section, reducing the ways
to PANIC.  Fix the write_relmap_file() call with a lock acquisition, so
it interacts with checkpoints like non-CREATE DATABASE calls do.
Back-patch to v15, where commit 9c08aea
introduced STRATEGY=WAL_LOG and made it the default.

Discussion: https://postgr.es/m/20240130195003.0a.nmisch@google.com
  • Loading branch information
nmisch committed Feb 1, 2024
1 parent b4fb76f commit 6d423e9
Show file tree
Hide file tree
Showing 2 changed files with 32 additions and 28 deletions.
44 changes: 18 additions & 26 deletions src/backend/commands/dbcommands.c
Original file line number Diff line number Diff line change
Expand Up @@ -462,35 +462,12 @@ CreateDirAndVersionFile(char *dbpath, Oid dbid, Oid tsid, bool isRedo)
char buf[16];

/*
* Prepare version data before starting a critical section.
*
* Note that we don't have to copy this from the source database; there's
* only one legal value.
* Note that we don't have to copy version data from the source database;
* there's only one legal value.
*/
sprintf(buf, "%s\n", PG_MAJORVERSION);
nbytes = strlen(PG_MAJORVERSION) + 1;

/* If we are not in WAL replay then write the WAL. */
if (!isRedo)
{
xl_dbase_create_wal_log_rec xlrec;
XLogRecPtr lsn;

START_CRIT_SECTION();

xlrec.db_id = dbid;
xlrec.tablespace_id = tsid;

XLogBeginInsert();
XLogRegisterData((char *) (&xlrec),
sizeof(xl_dbase_create_wal_log_rec));

lsn = XLogInsert(RM_DBASE_ID, XLOG_DBASE_CREATE_WAL_LOG);

/* As always, WAL must hit the disk before the data update does. */
XLogFlush(lsn);
}

/* Create database directory. */
if (MakePGDirectory(dbpath) < 0)
{
Expand Down Expand Up @@ -534,9 +511,24 @@ CreateDirAndVersionFile(char *dbpath, Oid dbid, Oid tsid, bool isRedo)
/* Close the version file. */
CloseTransientFile(fd);

/* Critical section done. */
/* If we are not in WAL replay then write the WAL. */
if (!isRedo)
{
xl_dbase_create_wal_log_rec xlrec;

START_CRIT_SECTION();

xlrec.db_id = dbid;
xlrec.tablespace_id = tsid;

XLogBeginInsert();
XLogRegisterData((char *) (&xlrec),
sizeof(xl_dbase_create_wal_log_rec));

(void) XLogInsert(RM_DBASE_ID, XLOG_DBASE_CREATE_WAL_LOG);

END_CRIT_SECTION();
}
}

/*
Expand Down
16 changes: 14 additions & 2 deletions src/backend/utils/cache/relmapper.c
Original file line number Diff line number Diff line change
Expand Up @@ -303,14 +303,15 @@ RelationMapCopy(Oid dbid, Oid tsid, char *srcdbpath, char *dstdbpath)
* Write the same data into the destination database's relmap file.
*
* No sinval is needed because no one can be connected to the destination
* database yet. For the same reason, there is no need to acquire
* RelationMappingLock.
* database yet.
*
* There's no point in trying to preserve files here. The new database
* isn't usable yet anyway, and won't ever be if we can't install a relmap
* file.
*/
LWLockAcquire(RelationMappingLock, LW_EXCLUSIVE);
write_relmap_file(&map, true, false, false, dbid, tsid, dstdbpath);
LWLockRelease(RelationMappingLock);
}

/*
Expand Down Expand Up @@ -633,10 +634,12 @@ RelationMapFinishBootstrap(void)
Assert(pending_local_updates.num_mappings == 0);

/* Write the files; no WAL or sinval needed */
LWLockAcquire(RelationMappingLock, LW_EXCLUSIVE);
write_relmap_file(&shared_map, false, false, false,
InvalidOid, GLOBALTABLESPACE_OID, "global");
write_relmap_file(&local_map, false, false, false,
MyDatabaseId, MyDatabaseTableSpace, DatabasePath);
LWLockRelease(RelationMappingLock);
}

/*
Expand Down Expand Up @@ -891,6 +894,15 @@ write_relmap_file(RelMapFile *newmap, bool write_wal, bool send_sinval,
char mapfilename[MAXPGPATH];
char maptempfilename[MAXPGPATH];

/*
* Even without concurrent use of this map, CheckPointRelationMap() relies
* on this locking. Without it, a restore of a base backup taken after
* this function's XLogInsert() and before its durable_rename() would not
* have the changes. wal_level=minimal doesn't need the lock, but this
* isn't performance-critical enough for such a micro-optimization.
*/
Assert(LWLockHeldByMeInMode(RelationMappingLock, LW_EXCLUSIVE));

/*
* Fill in the overhead fields and update CRC.
*/
Expand Down

0 comments on commit 6d423e9

Please sign in to comment.