Expanded feature bullet points into a new section with detailed explanations.

Copy editing by John Harvey (@crunchyjohn), Jason O'Donnell (@Dwaligon), and Stephen Frost (@sfrost).
dwsteele committed Apr 17, 2016
1 parent ed20c2e commit 7bd9b28
Showing 2 changed files with 154 additions and 45 deletions.
86 changes: 64 additions & 22 deletions README.md
@@ -4,35 +4,77 @@

pgBackRest aims to be a simple, reliable backup and restore system that can seamlessly scale up to the largest databases and workloads.

Primary pgBackRest features:

- Local or remote backup
- Multi-threaded backup/restore for performance
- Checksums
- Safe backups (checks that logs required for consistency are present before backup completes)
- Full, differential, and incremental backups
- Backup rotation (and minimum retention rules with optional separate retention for archive)
- In-stream compression/decompression
- Archiving and retrieval of logs for replicas/restores built in
- Async archiving for very busy systems (including space limits)
- Backup directories are consistent PostgreSQL clusters (when hardlinks are on and compression is off)
- Tablespace support
- Restore delta option
- Restore using timestamp/size or checksum
- Restore remapping base/tablespaces
- Support for PostgreSQL >= 8.3

Instead of relying on traditional backup tools like tar and rsync, pgBackRest implements all backup features internally and uses a custom protocol for communicating with remote systems. Removing reliance on tar and rsync allows for better solutions to database-specific backup issues. The custom remote protocol limits the types of connections that are required to perform a backup which increases security.
Instead of relying on traditional backup tools like tar and rsync, pgBackRest implements all backup features internally and uses a custom protocol for communicating with remote systems. Removing reliance on tar and rsync allows for better solutions to database-specific backup problems. The custom remote protocol allows for more flexibility and limits the types of connections that are required to perform a backup which increases security.

## Features

### Multithreaded Backup & Restore

Compression is usually the bottleneck during backup operations but, even with now ubiquitous multi-core servers, most database backup solutions are still single-threaded. pgBackRest solves the compression bottleneck with multithreading.

Utilizing multiple cores for compression makes it possible to achieve 1TB/hr raw throughput even on a 1Gb/s link. More cores and a larger pipe lead to even higher throughput.
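As a rough illustration of the idea (not pgBackRest's actual implementation), the sketch below compresses several payloads on a pool of worker threads. The function name `compress_files` and its parameters are hypothetical; it relies on CPython's `zlib` releasing the GIL while compressing, which lets the threads run on separate cores.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_files(payloads, workers=4, level=6):
    """Compress each bytes payload on a pool of worker threads.

    Hypothetical helper for illustration only. Results are returned in the
    same order as the input payloads.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map() preserves input order even when tasks finish out of order.
        return list(pool.map(lambda data: zlib.compress(data, level), payloads))
```

For scale: a 1Gb/s link carries at most 125MB/s, about 450GB/hr, so the 1TB/hr raw figure implies data that compresses roughly 2.2:1 or better.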

### Local or Remote Operation

A custom protocol allows pgBackRest to back up, restore, and archive locally or remotely via SSH with minimal configuration. An interface to query PostgreSQL is also provided via the protocol layer so that remote access to PostgreSQL is never required, which enhances security.

### Full, Incremental, & Differential Backups

Full, differential, and incremental backups are supported. pgBackRest is not susceptible to the time resolution issues of rsync, making differential and incremental backups completely safe.

### Backup Rotation & Archive Expiration

Retention policies can be set for full and differential backups to create coverage for any timeframe. WAL archive can be maintained for all backups or strictly for the most recent backups. In the latter case, WAL required to make older backups consistent will be maintained in the archive.

### Backup Integrity

Checksums are calculated for every file in the backup and rechecked during a restore. After a backup finishes copying files, it waits until every WAL segment required to make the backup consistent reaches the repository.

Backups in the repository are stored in the same format as a standard PostgreSQL cluster (including tablespaces). If compression is disabled and hard links are enabled, it is possible to snapshot a backup in the repository and bring up a PostgreSQL cluster directly on the snapshot. This is advantageous for terabyte-scale databases that are time-consuming to restore in the traditional way.

All operations utilize file and directory level fsync to ensure durability.

### Backup Resume

An aborted backup can be resumed from the point where it was stopped. Files that were already copied are compared with the checksums in the manifest to ensure integrity. Since this operation can take place entirely on the backup server, it reduces load on the database server and saves time since checksum calculation is faster than compressing and retransmitting data.

### Streaming Compression & Checksums

Compression and checksum calculations are performed in stream while files are being copied to the repository, whether the repository is located locally or remotely.

If the repository is on a backup server, compression is performed on the database server and files are transmitted in a compressed format and simply stored on the backup server. When compression is disabled, a lower level of compression is still applied during transfer to make efficient use of available bandwidth while keeping CPU cost to a minimum.
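The in-stream approach can be sketched as a single pass over the source file in which every chunk feeds both a checksum and a compressor. This is a minimal sketch, not pgBackRest code: the function name `copy_compressed` is hypothetical, and the choice of SHA-1 and zlib is an assumption made for the example.

```python
import hashlib
import zlib


def copy_compressed(src, dst, chunk_size=64 * 1024, level=6):
    """Copy src to dst in compressed form, reading src only once.

    Hypothetical helper: src and dst are binary file-like objects.
    Returns the SHA-1 hex digest of the uncompressed data (hash choice
    is an assumption for this sketch).
    """
    digest = hashlib.sha1()
    compressor = zlib.compressobj(level)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)                    # checksum the raw bytes
        dst.write(compressor.compress(chunk))   # compress the same bytes
    dst.write(compressor.flush())               # emit any buffered output
    return digest.hexdigest()
```

Because the checksum is computed on the same chunks that are compressed, no second read of the source file is needed.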

### Delta Restore

The manifest contains checksums for every file in the backup so that during a restore it is possible to use these checksums to speed processing enormously. On a delta restore any files not present in the backup are first removed and then checksums are taken for the remaining files. Files that match the backup are left in place and the rest of the files are restored as usual. Since this process is multithreaded, it can lead to a dramatic reduction in restore times.
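The delta steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the actual restore code: the manifest is modeled as a plain dict of relative path to SHA-1 hex digest, and `file_sha1` and `plan_delta_restore` are hypothetical names. The sketch is single-threaded for clarity.

```python
import hashlib
import os


def file_sha1(path):
    """Return the SHA-1 hex digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_delta_restore(target_dir, manifest):
    """Prepare target_dir for a delta restore.

    manifest maps relative paths (with "/" separators) to expected SHA-1
    hex digests. Returns the sorted list of paths that still need to be
    restored from the backup.
    """
    # Step 1: remove any file that is not present in the backup.
    for root, _dirs, files in os.walk(target_dir):
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, target_dir).replace(os.sep, "/")
            if rel not in manifest:
                os.remove(full)

    # Step 2: checksum what remains; matching files are left in place.
    to_restore = []
    for rel, checksum in sorted(manifest.items()):
        full = os.path.join(target_dir, *rel.split("/"))
        if not os.path.isfile(full) or file_sha1(full) != checksum:
            to_restore.append(rel)
    return to_restore
```

Only the returned paths need to be copied from the repository, which is where the time savings come from when most files are unchanged.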

### Advanced Archiving

Dedicated commands are included for both pushing WAL to the archive and retrieving WAL from the archive.

The push command automatically detects WAL segments that are pushed multiple times and de-duplicates when the segment is identical; otherwise an error is raised. The push and get commands both ensure that the database and repository match by comparing PostgreSQL versions and system identifiers. This precludes the possibility of misconfiguring the WAL archive location.
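The duplicate-handling rule can be illustrated with a small sketch. It is not the real push command: `push_segment` and `ArchiveMismatchError` are hypothetical names, the archive is modeled as a flat directory, and the version and system identifier checks are omitted.

```python
import hashlib
import os


class ArchiveMismatchError(Exception):
    """Raised when a segment name already exists with different content."""


def push_segment(archive_dir, name, data):
    """Store a WAL segment in archive_dir.

    Returns True if the segment was written, False if an identical copy
    was already archived. Raises ArchiveMismatchError if a different
    segment is already stored under the same name.
    """
    path = os.path.join(archive_dir, name)
    if os.path.exists(path):
        with open(path, "rb") as handle:
            existing = handle.read()
        if hashlib.sha1(existing).digest() == hashlib.sha1(data).digest():
            return False  # identical duplicate: nothing to do
        raise ArchiveMismatchError(f"{name} already archived with different content")
    with open(path, "wb") as handle:
        handle.write(data)
        handle.flush()
        os.fsync(handle.fileno())  # make the write durable before reporting success
    return True
```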

Asynchronous archiving allows compression and transfer to be offloaded to another process which maintains a continuous connection to the remote server, improving throughput significantly. This can be a critical feature for databases with extremely high write volume.

### Tablespace & Link Support

Tablespaces are fully supported and on restore tablespaces can be remapped to any location. It is also possible to remap all tablespaces to one location with a single command which is useful for development restores.

File and directory links are supported for any file or directory in the PostgreSQL cluster. When restoring it is possible to restore all links to their original locations, remap some or all links, or restore some or all links as normal files or directories within the cluster directory.

### Compatibility with PostgreSQL >= 8.3

pgBackRest includes support for versions down to 8.3, since older versions of PostgreSQL are still regularly utilized.

## Getting Started

pgBackRest strives to be easy to configure and operate:

- [User guide](http://www.pgbackrest.org/user-guide.html) for Ubuntu 12.04 & 14.04 / PostgreSQL 9.4.
- [Command reference](http://www.pgbackrest.org/command.html) for command-line operations.
- [Configuration reference](http://www.pgbackrest.org/configuration.html) for creating rich pgBackRest configurations.
- [Configuration reference](http://www.pgbackrest.org/configuration.html) for creating pgBackRest configurations.

## Contributing
## Contributions

Contributions to pgBackRest are always welcome!

@@ -48,7 +90,7 @@ pgBackRest is completely free and open source under the [MIT](https://github.com

Creating a robust disaster recovery policy with proper replication and backup strategies can be a very complex and daunting task. You may find that you need help during the architecture phase and ongoing support to ensure that your enterprise continues running smoothly.

[Crunchy Data](http://www.crunchydata.com) provides packaged versions of pgBackRest for major operating systems and expert full life-cycle commercial support for pgBackRest and all things PostgreSQL. [Crunchy Data](http://www.crunchydata.com) is committed to providing open source solutions with no vendor lock-in so cross-compatibility with the community version of pgBackRest is always strictly maintained.
[Crunchy Data](http://www.crunchydata.com) provides packaged versions of pgBackRest for major operating systems and expert full life-cycle commercial support for pgBackRest and all things PostgreSQL. [Crunchy Data](http://www.crunchydata.com) is committed to providing open source solutions with no vendor lock-in, ensuring that cross-compatibility with the community version of pgBackRest is always strictly maintained.

Please visit [Crunchy Data](http://www.crunchydata.com) for more information.

113 changes: 90 additions & 23 deletions doc/xml/index.xml
@@ -24,26 +24,93 @@
<section id="introduction">
<title>Introduction</title>

<p><backrest/> aims to be a simple, reliable backup and restore system that can seamlessly scale up to the largest databases and workloads.
<p><backrest/> aims to be a simple, reliable backup and restore system that can seamlessly scale up to the largest databases and workloads.</p>

Primary <backrest/> features:
<ul>
<li>Local or remote backup</li>
<li>Multi-threaded backup/restore for performance</li>
<li>Checksums</li>
<li>Safe backups (checks that logs required for consistency are present before backup completes)</li>
<li>Full, differential, and incremental backups</li>
<li>Backup rotation (and minimum retention rules with optional separate retention for archive)</li>
<li>In-stream compression/decompression</li>
<li>Archiving and retrieval of logs for replicas/restores built in</li>
<li>Async archiving for very busy systems (including space limits)</li>
<li>Backup directories are consistent <postgres/> clusters (when hardlinks are on and compression is off)</li>
<li>Tablespace support</li>
<li>Restore delta option</li>
<li>Restore using timestamp/size or checksum</li>
<li>Restore remapping base/tablespaces</li>
<li>Support for <postgres/> >= 8.3</li></ul>
Instead of relying on traditional backup tools like tar and rsync, <backrest/> implements all backup features internally and uses a custom protocol for communicating with remote systems. Removing reliance on tar and rsync allows for better solutions to database-specific backup issues. The custom remote protocol limits the types of connections that are required to perform a backup which increases security.</p>
<p>Instead of relying on traditional backup tools like tar and rsync, <backrest/> implements all backup features internally and uses a custom protocol for communicating with remote systems. Removing reliance on tar and rsync allows for better solutions to database-specific backup problems. The custom remote protocol allows for more flexibility and limits the types of connections that are required to perform a backup which increases security.</p>
</section>

<section id="features">
<title>Features</title>

<section id="multi-threaded">
<title>Multithreaded Backup &amp; Restore</title>

<p>Compression is usually the bottleneck during backup operations but, even with now ubiquitous multi-core servers, most database backup solutions are still single-threaded. <backrest/> solves the compression bottleneck with multithreading.</p>

<p>Utilizing multiple cores for compression makes it possible to achieve 1TB/hr raw throughput even on a 1Gb/s link. More cores and a larger pipe lead to even higher throughput.</p>
</section>

<section id="local-or-remote">
<title>Local or Remote Operation</title>

<p>A custom protocol allows <backrest/> to back up, restore, and archive locally or remotely via SSH with minimal configuration. An interface to query <postgres/> is also provided via the protocol layer so that remote access to <postgres/> is never required, which enhances security.</p>
</section>

<section id="backup-types">
<title>Full, Incremental, &amp; Differential Backups</title>

<p>Full, differential, and incremental backups are supported. <backrest/> is not susceptible to the time resolution issues of rsync, making differential and incremental backups completely safe.</p>
</section>

<section id="backup-rotation">
<title>Backup Rotation &amp; Archive Expiration</title>

<p>Retention policies can be set for full and differential backups to create coverage for any timeframe. WAL archive can be maintained for all backups or strictly for the most recent backups. In the latter case, WAL required to make older backups consistent will be maintained in the archive.</p>
</section>

<section id="backup-intregrity">
<title>Backup Integrity</title>

<p>Checksums are calculated for every file in the backup and rechecked during a restore. After a backup finishes copying files, it waits until every WAL segment required to make the backup consistent reaches the repository.</p>

<p>Backups in the repository are stored in the same format as a standard <postgres/> cluster (including tablespaces). If compression is disabled and hard links are enabled, it is possible to snapshot a backup in the repository and bring up a <postgres/> cluster directly on the snapshot. This is advantageous for terabyte-scale databases that are time-consuming to restore in the traditional way.</p>

<p>All operations utilize file and directory level fsync to ensure durability.</p>
</section>

<section id="backup-resume">
<title>Backup Resume</title>

<p>An aborted backup can be resumed from the point where it was stopped. Files that were already copied are compared with the checksums in the manifest to ensure integrity. Since this operation can take place entirely on the backup server, it reduces load on the database server and saves time since checksum calculation is faster than compressing and retransmitting data.</p>
</section>

<section id="stream-compression-checksums">
<title>Streaming Compression &amp; Checksums</title>

<p>Compression and checksum calculations are performed in stream while files are being copied to the repository, whether the repository is located locally or remotely.</p>

<p>If the repository is on a backup server, compression is performed on the database server and files are transmitted in a compressed format and simply stored on the backup server. When compression is disabled, a lower level of compression is still applied during transfer to make efficient use of available bandwidth while keeping CPU cost to a minimum.</p>
</section>

<section id="delta-restore">
<title>Delta Restore</title>

<p>The manifest contains checksums for every file in the backup so that during a restore it is possible to use these checksums to speed processing enormously. On a delta restore any files not present in the backup are first removed and then checksums are taken for the remaining files. Files that match the backup are left in place and the rest of the files are restored as usual. Since this process is multithreaded, it can lead to a dramatic reduction in restore times.</p>
</section>

<section id="advanced-archiving">
<title>Advanced Archiving</title>

<p>Dedicated commands are included for both pushing WAL to the archive and retrieving WAL from the archive.</p>

<p>The push command automatically detects WAL segments that are pushed multiple times and de-duplicates when the segment is identical; otherwise an error is raised. The push and get commands both ensure that the database and repository match by comparing <postgres/> versions and system identifiers. This precludes the possibility of misconfiguring the WAL archive location.</p>

<p>Asynchronous archiving allows compression and transfer to be offloaded to another process which maintains a continuous connection to the remote server, improving throughput significantly. This can be a critical feature for databases with extremely high write volume.</p>
</section>

<section id="tablespace-link-support">
<title>Tablespace &amp; Link Support</title>

<p>Tablespaces are fully supported and on restore tablespaces can be remapped to any location. It is also possible to remap all tablespaces to one location with a single command which is useful for development restores.</p>

<p>File and directory links are supported for any file or directory in the <postgres/> cluster. When restoring it is possible to restore all links to their original locations, remap some or all links, or restore some or all links as normal files or directories within the cluster directory.</p>
</section>

<section id="postgres-compatibility">
<title>Compatibility with <postgres/> >= 8.3</title>

<p><backrest/> includes support for versions down to 8.3, since older versions of PostgreSQL are still regularly utilized.</p>
</section>
</section>

<section id="getting-started">
@@ -53,11 +120,11 @@
<ul>
<li><link page="{[backrest-page-user-guide]}">User guide</link> for Ubuntu 12.04 &amp; 14.04 / <postgres/> 9.4.</li>
<li><link page="{[backrest-page-command]}">Command reference</link> for command-line operations.</li>
<li><link page="{[backrest-page-configuration]}">Configuration reference</link> for creating rich <backrest/> configurations.</li></ul></p>
<li><link page="{[backrest-page-configuration]}">Configuration reference</link> for creating <backrest/> configurations.</li></ul></p>
</section>

<section id="contributing">
<title>Contributing</title>
<section id="contributions">
<title>Contributions</title>

<p>Contributions to <backrest/> are always welcome!

@@ -75,7 +142,7 @@

Creating a robust disaster recovery policy with proper replication and backup strategies can be a very complex and daunting task. You may find that you need help during the architecture phase and ongoing support to ensure that your enterprise continues running smoothly.

<link url="{[crunchy-url-base]}">Crunchy Data</link> provides packaged versions of <backrest/> for major operating systems and expert full life-cycle commercial support for <backrest/> and all things <postgres/>. <link url="{[crunchy-url-base]}">Crunchy Data</link> is committed to providing open source solutions with no vendor lock-in so cross-compatibility with the community version of <backrest/> is always strictly maintained.
<link url="{[crunchy-url-base]}">Crunchy Data</link> provides packaged versions of <backrest/> for major operating systems and expert full life-cycle commercial support for <backrest/> and all things <postgres/>. <link url="{[crunchy-url-base]}">Crunchy Data</link> is committed to providing open source solutions with no vendor lock-in, ensuring that cross-compatibility with the community version of <backrest/> is always strictly maintained.

Please visit <link url="{[crunchy-url-base]}">Crunchy Data</link> for more information.</p>
</section>