Allow read only connections during recovery, known as Hot Standby.
Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record.

New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far.

This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.

Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit.

Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
simonat2ndQuadrant committed Dec 19, 2009
1 parent 78a0914 commit efc16ea
Showing 87 changed files with 6,160 additions and 423 deletions.
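
The commit message describes conflicting standby queries being cancelled once max_standby_delay seconds have expired, and the config.sgml hunk below documents -1 as "wait forever". The following is a minimal sketch of that wait policy only, not the code from this commit: the function name `conflict_wait_expired` and its parameters are hypothetical, and it assumes the wait is measured in whole seconds.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not part of PostgreSQL: decide whether recovery has
 * waited long enough on a conflicting query that it may stop waiting and
 * resolve the conflict (for example by cancelling the query).
 *
 * max_standby_delay: -1 means wait forever; 0 or more is a limit in seconds.
 * waited_secs:       seconds already spent waiting on the conflict.
 */
bool
conflict_wait_expired(int max_standby_delay, int waited_secs)
{
    assert(waited_secs >= 0);

    /* A negative setting (-1) disables the limit: never give up waiting. */
    if (max_standby_delay < 0)
        return false;

    /* Otherwise the wait expires once the configured delay has elapsed. */
    return waited_secs >= max_standby_delay;
}
```

Under these assumptions a setting of 0 expires immediately, which matches the documented recommendation of 0 for plain archive recovery, where there is no reason to hold up replay for queries.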
776 changes: 773 additions & 3 deletions doc/src/sgml/backup.sgml

Large diffs are not rendered by default.

114 changes: 113 additions & 1 deletion doc/src/sgml/config.sgml
@@ -1,4 +1,4 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/config.sgml,v 1.238 2009/12/17 14:36:16 rhaas Exp $ -->
<!-- $PostgreSQL: pgsql/doc/src/sgml/config.sgml,v 1.239 2009/12/19 01:32:31 sriggs Exp $ -->

<chapter Id="runtime-config">
<title>Server Configuration</title>
@@ -376,6 +376,12 @@ SET ENABLE_SEQSCAN TO OFF;
allows. See <xref linkend="sysvipc"> for information on how to
adjust those parameters, if necessary.
</para>

<para>
When running a standby server, you must set this parameter to the
same or higher value than on the master server. Otherwise, queries
will not be allowed in the standby server.
</para>
</listitem>
</varlistentry>

@@ -826,6 +832,12 @@ SET ENABLE_SEQSCAN TO OFF;
allows. See <xref linkend="sysvipc"> for information on how to
adjust those parameters, if necessary.
</para>

<para>
When running a standby server, you must set this parameter to the
same or higher value than on the master server. Otherwise, queries
will not be allowed in the standby server.
</para>
</listitem>
</varlistentry>

@@ -1733,6 +1745,51 @@ archive_command = 'copy "%p" "C:\\server\\archivedir\\%f"' # Windows

</variablelist>
</sect2>

<sect2 id="runtime-config-standby">
<title>Standby Servers</title>

<variablelist>

<varlistentry id="recovery-connections" xreflabel="recovery_connections">
<term><varname>recovery_connections</varname> (<type>boolean</type>)</term>
<listitem>
<para>
Parameter has two roles. During recovery, specifies whether or not
you can connect and run queries to enable <xref linkend="hot-standby">.
During normal running, specifies whether additional information is written
to WAL to allow recovery connections on a standby server that reads
WAL data generated by this server. The default value is
<literal>on</literal>. It is thought that there is little
measurable difference in performance from using this feature, so
feedback is welcome if any production impacts are noticeable.
It is likely that this parameter will be removed in later releases.
This parameter can only be set at server start.
</para>
</listitem>
</varlistentry>

<varlistentry id="max-standby-delay" xreflabel="max_standby_delay">
<term><varname>max_standby_delay</varname> (<type>string</type>)</term>
<listitem>
<para>
When server acts as a standby, this parameter specifies a wait policy
for queries that conflict with incoming data changes. Valid settings
are -1, meaning wait forever, or a wait time of 0 or more seconds.
If a conflict should occur the server will delay up to this
amount before it begins trying to resolve things less amicably, as
described in <xref linkend="hot-standby-conflict">. Typically,
this parameter makes sense only during replication, so when
performing an archive recovery to recover from data loss a
parameter setting of 0 is recommended. The default is 30 seconds.
This parameter can only be set in the <filename>postgresql.conf</>
file or on the server command line.
</para>
</listitem>
</varlistentry>

</variablelist>
</sect2>
</sect1>

<sect1 id="runtime-config-query">
@@ -4161,6 +4218,29 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
</listitem>
</varlistentry>

<varlistentry id="guc-vacuum-defer-cleanup-age" xreflabel="vacuum_defer_cleanup_age">
<term><varname>vacuum_defer_cleanup_age</varname> (<type>integer</type>)</term>
<indexterm>
<primary><varname>vacuum_defer_cleanup_age</> configuration parameter</primary>
</indexterm>
<listitem>
<para>
Specifies the number of transactions by which <command>VACUUM</> and
<acronym>HOT</> updates will defer cleanup of dead row versions. The
default is 0 transactions, meaning that dead row versions will be
removed as soon as possible. You may wish to set this to a non-zero
value when planning or maintaining a <xref linkend="hot-standby">
configuration. The recommended value is <literal>0</> unless you have
clear reason to increase it. The purpose of the parameter is to
allow the user to specify an approximate time delay before cleanup
occurs. However, it should be noted that there is no direct link with
any specific time delay and so the results will be application and
installation specific, as well as variable over time, depending upon
the transaction rate (of writes only).
</para>
</listitem>
</varlistentry>

<varlistentry id="guc-bytea-output" xreflabel="bytea_output">
<term><varname>bytea_output</varname> (<type>enum</type>)</term>
<indexterm>
@@ -4689,6 +4769,12 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
allows. See <xref linkend="sysvipc"> for information on how to
adjust those parameters, if necessary.
</para>

<para>
When running a standby server, you must set this parameter to the
same or higher value than on the master server. Otherwise, queries
will not be allowed in the standby server.
</para>
</listitem>
</varlistentry>

@@ -5546,6 +5632,32 @@ plruby.use_strict = true # generates error: unknown class name
</listitem>
</varlistentry>

<varlistentry id="guc-trace-recovery-messages" xreflabel="trace_recovery_messages">
<term><varname>trace_recovery_messages</varname> (<type>string</type>)</term>
<indexterm>
<primary><varname>trace_recovery_messages</> configuration parameter</primary>
</indexterm>
<listitem>
<para>
Controls which message levels are written to the server log
for system modules needed for recovery processing. This allows
the user to override the normal setting of log_min_messages,
but only for specific messages. This is intended for use in
debugging Hot Standby.
Valid values are <literal>DEBUG5</>, <literal>DEBUG4</>,
<literal>DEBUG3</>, <literal>DEBUG2</>, <literal>DEBUG1</>,
<literal>INFO</>, <literal>NOTICE</>, <literal>WARNING</>,
<literal>ERROR</>, <literal>LOG</>, <literal>FATAL</>, and
<literal>PANIC</>. Each level includes all the levels that
follow it. The later the level, the fewer messages are sent
to the log. The default is <literal>WARNING</>. Note that
<literal>LOG</> has a different rank here than in
<varname>client_min_messages</>.
Parameter should be set in the postgresql.conf only.
</para>
</listitem>
</varlistentry>

<varlistentry id="guc-zero-damaged-pages" xreflabel="zero_damaged_pages">
<term><varname>zero_damaged_pages</varname> (<type>boolean</type>)</term>
<indexterm>
34 changes: 33 additions & 1 deletion doc/src/sgml/func.sgml
@@ -1,4 +1,4 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/func.sgml,v 1.493 2009/12/15 17:57:46 tgl Exp $ -->
<!-- $PostgreSQL: pgsql/doc/src/sgml/func.sgml,v 1.494 2009/12/19 01:32:31 sriggs Exp $ -->

<chapter id="functions">
<title>Functions and Operators</title>
@@ -13132,6 +13132,38 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
<xref linkend="continuous-archiving">.
</para>

<indexterm>
<primary>pg_is_in_recovery</primary>
</indexterm>

<para>
The functions shown in <xref
linkend="functions-recovery-info-table"> provide information
about the current status of Hot Standby.
These functions may be executed during both recovery and in normal running.
</para>

<table id="functions-recovery-info-table">
<title>Recovery Information Functions</title>
<tgroup cols="3">
<thead>
<row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry>
</row>
</thead>

<tbody>
<row>
<entry>
<literal><function>pg_is_in_recovery</function>()</literal>
</entry>
<entry><type>bool</type></entry>
<entry>True if recovery is still in progress.
</entry>
</row>
</tbody>
</tgroup>
</table>

<para>
The functions shown in <xref linkend="functions-admin-dbsize"> calculate
the disk space usage of database objects.
7 changes: 6 additions & 1 deletion doc/src/sgml/ref/checkpoint.sgml
@@ -1,4 +1,4 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/ref/checkpoint.sgml,v 1.16 2008/11/14 10:22:45 petere Exp $ -->
<!-- $PostgreSQL: pgsql/doc/src/sgml/ref/checkpoint.sgml,v 1.17 2009/12/19 01:32:31 sriggs Exp $ -->

<refentry id="sql-checkpoint">
<refmeta>
@@ -42,6 +42,11 @@ CHECKPOINT
<xref linkend="wal"> for more information about the WAL system.
</para>

<para>
If executed during recovery, the <command>CHECKPOINT</command> command
will force a restartpoint rather than writing a new checkpoint.
</para>

<para>
Only superusers can call <command>CHECKPOINT</command>. The command is
not intended for use during normal operation.
6 changes: 5 additions & 1 deletion src/backend/access/gin/ginxlog.c
@@ -8,7 +8,7 @@
* Portions Copyright (c) 1994, Regents of the University of California
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/access/gin/ginxlog.c,v 1.19 2009/06/11 14:48:53 momjian Exp $
* $PostgreSQL: pgsql/src/backend/access/gin/ginxlog.c,v 1.20 2009/12/19 01:32:31 sriggs Exp $
*-------------------------------------------------------------------------
*/
#include "postgres.h"
@@ -621,6 +621,10 @@ gin_redo(XLogRecPtr lsn, XLogRecord *record)
{
uint8 info = record->xl_info & ~XLR_INFO_MASK;

/*
* GIN indexes do not require any conflict processing.
*/

RestoreBkpBlocks(lsn, record, false);

topCtx = MemoryContextSwitchTo(opCtx);
8 changes: 7 additions & 1 deletion src/backend/access/gist/gistxlog.c
@@ -8,7 +8,7 @@
* Portions Copyright (c) 1994, Regents of the University of California
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/access/gist/gistxlog.c,v 1.32 2009/01/20 18:59:36 heikki Exp $
* $PostgreSQL: pgsql/src/backend/access/gist/gistxlog.c,v 1.33 2009/12/19 01:32:32 sriggs Exp $
*-------------------------------------------------------------------------
*/
#include "postgres.h"
@@ -396,6 +396,12 @@ gist_redo(XLogRecPtr lsn, XLogRecord *record)
uint8 info = record->xl_info & ~XLR_INFO_MASK;
MemoryContext oldCxt;

/*
* GIST indexes do not require any conflict processing. NB: If we ever
* implement a similar optimization we have in b-tree, and remove killed
* tuples outside VACUUM, we'll need to handle that here.
*/

RestoreBkpBlocks(lsn, record, false);

oldCxt = MemoryContextSwitchTo(opCtx);