Add support for incremental backup.
To take an incremental backup, you use the new replication command
UPLOAD_MANIFEST to upload the manifest for the prior backup. This
prior backup could either be a full backup or another incremental
backup.  You then use BASE_BACKUP with the INCREMENTAL option to take
the backup.  pg_basebackup now has an --incremental=PATH_TO_MANIFEST
option to trigger this behavior.

An incremental backup is like a regular full backup except that
some relation files are replaced with files with names like
INCREMENTAL.${ORIGINAL_NAME}, and the backup_label file contains
additional lines identifying it as an incremental backup. The new
pg_combinebackup tool can be used to reconstruct a data directory
from a full backup and a series of incremental backups.

Patch by me.  Reviewed by Matthias van de Meent, Dilip Kumar, Jakub
Wartak, Peter Eisentraut, and Álvaro Herrera. Thanks especially to
Jakub for incredibly helpful and extensive testing.

Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com
robertmhaas committed Dec 20, 2023
1 parent 174c480 commit dc21234
Showing 49 changed files with 5,834 additions and 52 deletions.
89 changes: 85 additions & 4 deletions doc/src/sgml/backup.sgml
@@ -857,12 +857,79 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 && cp pg_wal/0
</para>
</sect2>

<sect2 id="backup-incremental-backup">
<title>Making an Incremental Backup</title>

<para>
You can use <xref linkend="app-pgbasebackup"/> to take an incremental
backup by specifying the <literal>--incremental</literal> option. You must
supply, as an argument to <literal>--incremental</literal>, the backup
manifest of an earlier backup from the same server. In the resulting
backup, non-relation files will be included in their entirety, but some
relation files may be replaced by smaller incremental files which contain
only the blocks which have been changed since the earlier backup and enough
metadata to reconstruct the current version of the file.
</para>
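As an illustration only, and not part of the patch, the two invocations described above can be sketched as argument lists. The directory paths are hypothetical. The sketch assumes the prior backup was taken in plain format, so that its manifest is the file named backup_manifest at the top of the backup directory, and that WAL summarization (summarize_wal) is enabled on the server.

```python
from pathlib import PurePosixPath


def full_backup_cmd(target_dir: str) -> list[str]:
    """Build the argv for an ordinary full base backup written to target_dir."""
    return ["pg_basebackup", "-D", target_dir]


def incremental_backup_cmd(prior_backup_dir: str, target_dir: str) -> list[str]:
    """Build the argv for an incremental backup relative to an earlier backup."""
    # Assumption: the earlier backup is in plain format, so its manifest is
    # the file backup_manifest directly inside prior_backup_dir.
    manifest = PurePosixPath(prior_backup_dir) / "backup_manifest"
    return ["pg_basebackup", f"--incremental={manifest}", "-D", target_dir]
```

The resulting lists could be passed to subprocess.run; the earlier backup may itself be either full or incremental.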

<para>
To figure out which blocks need to be backed up, the server uses WAL
summaries, which are stored in the data directory, inside the directory
<literal>pg_wal/summaries</literal>. If the required summary files are not
present, an attempt to take an incremental backup will fail. The summaries
present in this directory must cover all LSNs from the start LSN of the
prior backup to the start LSN of the current backup. Since the server looks
for WAL summaries just after establishing the start LSN of the current
backup, the necessary summary files probably won't be instantly present
on disk, but the server will wait for any missing files to show up.
This also helps if the WAL summarization process has fallen behind.
However, if the necessary files have already been removed, or if the WAL
summarizer doesn't catch up quickly enough, the incremental backup will
fail.
</para>
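The coverage requirement can be pictured with a small sketch. This is not the server's implementation; it only illustrates the rule that the available summaries must cover every LSN from the start of the prior backup to the start of the current one. The function names and the representation of summaries as (start_lsn, end_lsn) pairs are assumptions made for the example, and timelines are ignored for simplicity.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN written as 'HIGH/LOW' in hexadecimal to an integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def summaries_cover(summaries, prior_start: str, current_start: str) -> bool:
    """Return True if the ranges cover [prior_start, current_start) with no gap."""
    need = parse_lsn(prior_start)    # first LSN not yet known to be covered
    goal = parse_lsn(current_start)  # coverage must reach this LSN
    ranges = sorted((parse_lsn(s), parse_lsn(e)) for s, e in summaries)
    for start, end in ranges:
        if need >= goal:
            break
        if start > need:
            # Ranges are sorted by start, so no later range can fill this gap.
            return False
        need = max(need, end)
    return need >= goal
```

A gap corresponds to the failure case described above: a summary file that was already removed, or one that the summarizer has not yet produced.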

<para>
When restoring an incremental backup, it will be necessary to have not
only the incremental backup itself but also all earlier backups that
are required to supply the blocks omitted from the incremental backup.
See <xref linkend="app-pgcombinebackup"/> for further information about
this requirement.
</para>

<para>
Note that all of the requirements for making use of a full backup also
apply to an incremental backup. For instance, you still need all of the
WAL segment files generated during and after the file system backup, and
any relevant WAL history files. And you still need to create a
<literal>recovery.signal</literal> (or <literal>standby.signal</literal>)
and perform recovery, as described in
<xref linkend="backup-pitr-recovery" />. The requirement to have earlier
backups available at restore time and to use
<literal>pg_combinebackup</literal> is an additional requirement on top of
everything else. Keep in mind that <application>PostgreSQL</application>
has no built-in mechanism to figure out which backups are still needed as
a basis for restoring later incremental backups. You must keep track of
the relationships between your full and incremental backups on your own,
and be certain not to remove earlier backups if they might be needed when
restoring later incremental backups.
</para>

<para>
Incremental backups typically only make sense for relatively large
databases where a significant portion of the data does not change, or only
changes slowly. For a small database, it's easier to ignore the existence
of incremental backups and just take full backups, which are simpler
to manage. For a large database all of which is heavily modified,
incremental backups won't be much smaller than full backups.
</para>
</sect2>
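Because PostgreSQL does not track which backups depend on which, that bookkeeping falls to the administrator. The following is a minimal sketch of one way to do it, not something the patch provides: the parents mapping, the backup names, and the helper functions are all hypothetical. The last helper assumes the pg_combinebackup interface described on its reference page (backup directories listed from oldest to newest, output directory given with -o), which is not shown in this excerpt.

```python
from typing import Optional


def restore_chain(parents: dict[str, Optional[str]], backup: str) -> list[str]:
    """Return the backups needed to restore `backup`, oldest (full) first."""
    chain: list[str] = []
    current: Optional[str] = backup
    while current is not None:
        if current in chain:
            raise ValueError(f"cycle in backup dependencies at {current!r}")
        chain.append(current)
        current = parents[current]  # None marks a full backup
    chain.reverse()
    return chain


def still_needed(parents: dict[str, Optional[str]], keep: list[str]) -> set[str]:
    """Return every backup that must be retained to restore the ones in `keep`."""
    needed: set[str] = set()
    for backup in keep:
        needed.update(restore_chain(parents, backup))
    return needed


def combine_cmd(chain_dirs: list[str], output_dir: str) -> list[str]:
    """Build the pg_combinebackup argv for a chain of backup directories."""
    # Assumption: directories are given oldest first and -o names the output.
    return ["pg_combinebackup", *chain_dirs, "-o", output_dir]
```

Any backup outside the set returned by still_needed is a candidate for removal; everything inside it must be kept until the backups that depend on it are themselves retired.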

<sect2 id="backup-lowlevel-base-backup">
<title>Making a Base Backup Using the Low Level API</title>
<para>
- The procedure for making a base backup using the low level
- APIs contains a few more steps than
- the <xref linkend="app-pgbasebackup"/> method, but is relatively
Instead of taking a full or incremental base backup using
<xref linkend="app-pgbasebackup"/>, you can take a base backup using the
low-level API. This procedure contains a few more steps than
the <application>pg_basebackup</application> method, but is relatively
simple. It is very important that these steps are executed in
sequence, and that the success of a step is verified before
proceeding to the next step.
@@ -1118,14 +1185,28 @@ SELECT * FROM pg_backup_stop(wait_for_archive => true);
</listitem>
<listitem>
<para>
- Restore the database files from your file system backup. Be sure that they
If you're restoring a full backup, you can restore the database files
directly into the target directories. Be sure that they
are restored with the right ownership (the database system user, not
<literal>root</literal>!) and with the right permissions. If you are using
tablespaces,
you should verify that the symbolic links in <filename>pg_tblspc/</filename>
were correctly restored.
</para>
</listitem>
<listitem>
<para>
If you're restoring an incremental backup, you'll need to restore the
incremental backup and all earlier backups upon which it directly or
indirectly depends to the machine where you are performing the restore.
These backups will need to be placed in separate directories, not the
target directories where you want the running server to end up.
Once this is done, use <xref linkend="app-pgcombinebackup"/> to pull
data from the full backup and all of the subsequent incremental backups
and write out a synthetic full backup to the target directories. As above,
verify that permissions and tablespace links are correct.
</para>
</listitem>
<listitem>
<para>
Remove any files present in <filename>pg_wal/</filename>; these came from the
2 changes: 0 additions & 2 deletions doc/src/sgml/config.sgml
@@ -4153,13 +4153,11 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows
<sect2 id="runtime-config-wal-summarization">
<title>WAL Summarization</title>

- <!--
<para>
These settings control WAL summarization, a feature which must be
enabled in order to perform an
<link linkend="backup-incremental-backup">incremental backup</link>.
</para>
- -->

<variablelist>
<varlistentry id="guc-summarize-wal" xreflabel="summarize_wal">
24 changes: 24 additions & 0 deletions doc/src/sgml/protocol.sgml
@@ -2599,6 +2599,19 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
</listitem>
</varlistentry>

<varlistentry id="protocol-replication-upload-manifest">
<term>
<literal>UPLOAD_MANIFEST</literal>
<indexterm><primary>UPLOAD_MANIFEST</primary></indexterm>
</term>
<listitem>
<para>
Uploads a backup manifest in preparation for taking an incremental
backup.
</para>
</listitem>
</varlistentry>

<varlistentry id="protocol-replication-base-backup" xreflabel="BASE_BACKUP">
<term><literal>BASE_BACKUP</literal> [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ]
<indexterm><primary>BASE_BACKUP</primary></indexterm>
@@ -2838,6 +2851,17 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
</para>
</listitem>
</varlistentry>

<varlistentry>
<term><literal>INCREMENTAL</literal></term>
<listitem>
<para>
Requests an incremental backup. The
<literal>UPLOAD_MANIFEST</literal> command must be executed
before running a base backup with this option.
</para>
</listitem>
</varlistentry>
</variablelist>
</para>

1 change: 1 addition & 0 deletions doc/src/sgml/ref/allfiles.sgml
@@ -202,6 +202,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY pgBasebackup SYSTEM "pg_basebackup.sgml">
<!ENTITY pgbench SYSTEM "pgbench.sgml">
<!ENTITY pgChecksums SYSTEM "pg_checksums.sgml">
<!ENTITY pgCombinebackup SYSTEM "pg_combinebackup.sgml">
<!ENTITY pgConfig SYSTEM "pg_config-ref.sgml">
<!ENTITY pgControldata SYSTEM "pg_controldata.sgml">
<!ENTITY pgCtl SYSTEM "pg_ctl-ref.sgml">
37 changes: 32 additions & 5 deletions doc/src/sgml/ref/pg_basebackup.sgml
@@ -38,11 +38,25 @@ PostgreSQL documentation
</para>

<para>
- <application>pg_basebackup</application> makes an exact copy of the database
- cluster's files, while making sure the server is put into and
- out of backup mode automatically. Backups are always taken of the entire
- database cluster; it is not possible to back up individual databases or
- database objects. For selective backups, another tool such as
<application>pg_basebackup</application> can take a full or incremental
base backup of the database. When used to take a full backup, it makes an
exact copy of the database cluster's files. When used to take an incremental
backup, some files that would have been part of a full backup may be
replaced with incremental versions of the same files, containing only those
blocks that have been modified since the reference backup. An incremental
backup cannot be used directly; instead,
<xref linkend="app-pgcombinebackup"/> must first
be used to combine it with the previous backups upon which it depends.
See <xref linkend="backup-incremental-backup" /> for more information
about incremental backups, and <xref linkend="backup-pitr-recovery" />
for steps to recover from a backup.
</para>

<para>
In any mode, <application>pg_basebackup</application> makes sure the server
is put into and out of backup mode automatically. Backups are always taken of
the entire database cluster; it is not possible to back up individual
databases or database objects. For selective backups, another tool such as
<xref linkend="app-pgdump"/> must be used.
</para>

@@ -197,6 +211,19 @@ PostgreSQL documentation
</listitem>
</varlistentry>

<varlistentry>
<term><option>-i <replaceable class="parameter">old_manifest_file</replaceable></option></term>
<term><option>--incremental=<replaceable class="parameter">old_manifest_file</replaceable></option></term>
<listitem>
<para>
Performs an <link linkend="backup-incremental-backup">incremental
backup</link>. The backup manifest for the reference
backup must be provided, and will be uploaded to the server, which will
respond by sending the requested incremental backup.
</para>
</listitem>
</varlistentry>

<varlistentry>
<term><option>-R</option></term>
<term><option>--write-recovery-conf</option></term>