Revert recovery prefetching feature.
This set of commits has some bugs with known fixes, but at this late
stage in the release cycle it seems best to revert and resubmit next
time, along with some new automated test coverage for this whole area.

Commits reverted:

dc88460: Doc: Review for "Optionally prefetch referenced data in recovery."
1d25757: Optionally prefetch referenced data in recovery.
f003d9f: Add circular WAL decoding buffer.
323cbe7: Remove read_page callback from XLogReader.

Remove the new GUC group WAL_RECOVERY recently added by a55a984, as the
corresponding section of config.sgml is now reverted.

Discussion: https://postgr.es/m/CAOuzzgrn7iKnFRsB4MHp3UisEQAGgZMbk_ViTN4HV4-Ksq8zCg%40mail.gmail.com
macdice committed May 10, 2021
1 parent 63db0ac commit c2dc193
Showing 35 changed files with 815 additions and 3,080 deletions.
83 changes: 0 additions & 83 deletions doc/src/sgml/config.sgml
@@ -3588,89 +3588,6 @@ include_dir 'conf.d'
</variablelist>
</sect2>

-<sect2 id="runtime-config-wal-recovery">
-
-<title>Recovery</title>
-
-<indexterm>
-<primary>configuration</primary>
-<secondary>of recovery</secondary>
-<tertiary>general settings</tertiary>
-</indexterm>
-
-<para>
-This section describes the settings that apply to recovery in general,
-affecting crash recovery, streaming replication and archive-based
-replication.
-</para>
-
-
-<variablelist>
-<varlistentry id="guc-recovery-prefetch" xreflabel="recovery_prefetch">
-<term><varname>recovery_prefetch</varname> (<type>boolean</type>)
-<indexterm>
-<primary><varname>recovery_prefetch</varname> configuration parameter</primary>
-</indexterm>
-</term>
-<listitem>
-<para>
-Whether to try to prefetch blocks that are referenced in the WAL that
-are not yet in the buffer pool, during recovery. Prefetching blocks
-that will soon be needed can reduce I/O wait times in some workloads.
-See also the <xref linkend="guc-wal-decode-buffer-size"/> and
-<xref linkend="guc-maintenance-io-concurrency"/> settings, which limit
-prefetching activity.
-This setting is disabled by default.
-</para>
-<para>
-This feature currently depends on an effective
-<function>posix_fadvise</function> function, which some
-operating systems lack.
-</para>
-</listitem>
-</varlistentry>
-
-<varlistentry id="guc-recovery-prefetch-fpw" xreflabel="recovery_prefetch_fpw">
-<term><varname>recovery_prefetch_fpw</varname> (<type>boolean</type>)
-<indexterm>
-<primary><varname>recovery_prefetch_fpw</varname> configuration parameter</primary>
-</indexterm>
-</term>
-<listitem>
-<para>
-Whether to prefetch blocks that were logged with full page images,
-during recovery. Often this doesn't help, since such blocks will not
-be read the first time they are needed and might remain in the buffer
-pool after that. However, on file systems with a block size larger
-than
-<productname>PostgreSQL</productname>'s, prefetching can avoid a
-costly read-before-write when blocks are later written.
-The default is off.
-</para>
-</listitem>
-</varlistentry>
-
-<varlistentry id="guc-wal-decode-buffer-size" xreflabel="wal_decode_buffer_size">
-<term><varname>wal_decode_buffer_size</varname> (<type>integer</type>)
-<indexterm>
-<primary><varname>wal_decode_buffer_size</varname> configuration parameter</primary>
-</indexterm>
-</term>
-<listitem>
-<para>
-A limit on how far ahead the server can look in the WAL, to find
-blocks to prefetch. Setting it too high might be counterproductive,
-if it means that data falls out of the
-kernel cache before it is needed. If this value is specified without
-units, it is taken as bytes.
-The default is 512kB.
-</para>
-</listitem>
-</varlistentry>
-
-</variablelist>
-</sect2>
-
<sect2 id="runtime-config-wal-archive-recovery">

<title>Archive Recovery</title>
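For readers unfamiliar with the mechanism that the removed recovery_prefetch documentation refers to, the following is a minimal sketch of issuing a read-ahead hint with posix_fadvise and POSIX_FADV_WILLNEED. It is not the code from the reverted commits: the helper name sketch_prefetch_block and the constant SKETCH_BLOCK_SIZE are hypothetical, and the 8192-byte block size is an assumption chosen to match PostgreSQL's default block size.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* ask for the posix_fadvise declaration */
#endif
#include <fcntl.h>
#include <sys/types.h>

/* Assumed block size for this sketch (PostgreSQL's default is 8 kB). */
#define SKETCH_BLOCK_SIZE 8192

/*
 * Hypothetical helper: advise the kernel that one block of an open file
 * will be read soon, so that it may start the I/O in the background.
 *
 * Returns 0 on success, or an error number on failure. Note that
 * posix_fadvise reports errors through its return value, not errno.
 */
int
sketch_prefetch_block(int fd, long block_number)
{
    off_t offset = (off_t) block_number * SKETCH_BLOCK_SIZE;

    return posix_fadvise(fd, offset, SKETCH_BLOCK_SIZE, POSIX_FADV_WILLNEED);
}
```

The hint is advisory only: a later read of the same range behaves the same whether or not the kernel acted on it, which is why the removed documentation speaks of an "effective" posix_fadvise.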
86 changes: 2 additions & 84 deletions doc/src/sgml/monitoring.sgml
@@ -337,13 +337,6 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
</entry>
</row>

-<row>
-<entry><structname>pg_stat_prefetch_recovery</structname><indexterm><primary>pg_stat_prefetch_recovery</primary></indexterm></entry>
-<entry>Only one row, showing statistics about blocks prefetched during recovery.
-See <xref linkend="pg-stat-prefetch-recovery-view"/> for details.
-</entry>
-</row>
-
<row>
<entry><structname>pg_stat_subscription</structname><indexterm><primary>pg_stat_subscription</primary></indexterm></entry>
<entry>At least one row per subscription, showing information about
@@ -2948,78 +2941,6 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
copy of the subscribed tables.
</para>

-<table id="pg-stat-prefetch-recovery-view" xreflabel="pg_stat_prefetch_recovery">
-<title><structname>pg_stat_prefetch_recovery</structname> View</title>
-<tgroup cols="3">
-<thead>
-<row>
-<entry>Column</entry>
-<entry>Type</entry>
-<entry>Description</entry>
-</row>
-</thead>
-
-<tbody>
-<row>
-<entry><structfield>prefetch</structfield></entry>
-<entry><type>bigint</type></entry>
-<entry>Number of blocks prefetched because they were not in the buffer pool</entry>
-</row>
-<row>
-<entry><structfield>skip_hit</structfield></entry>
-<entry><type>bigint</type></entry>
-<entry>Number of blocks not prefetched because they were already in the buffer pool</entry>
-</row>
-<row>
-<entry><structfield>skip_new</structfield></entry>
-<entry><type>bigint</type></entry>
-<entry>Number of blocks not prefetched because they were new (usually relation extension)</entry>
-</row>
-<row>
-<entry><structfield>skip_fpw</structfield></entry>
-<entry><type>bigint</type></entry>
-<entry>Number of blocks not prefetched because a full page image was included in the WAL and <xref linkend="guc-recovery-prefetch-fpw"/> was set to <literal>off</literal></entry>
-</row>
-<row>
-<entry><structfield>skip_seq</structfield></entry>
-<entry><type>bigint</type></entry>
-<entry>Number of blocks not prefetched because of repeated access</entry>
-</row>
-<row>
-<entry><structfield>distance</structfield></entry>
-<entry><type>integer</type></entry>
-<entry>How far ahead of recovery the prefetcher is currently reading, in bytes</entry>
-</row>
-<row>
-<entry><structfield>queue_depth</structfield></entry>
-<entry><type>integer</type></entry>
-<entry>How many prefetches have been initiated but are not yet known to have completed</entry>
-</row>
-<row>
-<entry><structfield>avg_distance</structfield></entry>
-<entry><type>float4</type></entry>
-<entry>How far ahead of recovery the prefetcher is on average, while recovery is not idle</entry>
-</row>
-<row>
-<entry><structfield>avg_queue_depth</structfield></entry>
-<entry><type>float4</type></entry>
-<entry>Average number of prefetches in flight while recovery is not idle</entry>
-</row>
-</tbody>
-</tgroup>
-</table>
-
-<para>
-The <structname>pg_stat_prefetch_recovery</structname> view will contain only
-one row. It is filled with nulls if recovery is not running or WAL
-prefetching is not enabled. See <xref linkend="guc-recovery-prefetch"/>
-for more information. The counters in this view are reset whenever the
-<xref linkend="guc-recovery-prefetch"/>,
-<xref linkend="guc-recovery-prefetch-fpw"/> or
-<xref linkend="guc-maintenance-io-concurrency"/> setting is changed and
-the server configuration is reloaded.
-</para>
-
<table id="pg-stat-subscription" xreflabel="pg_stat_subscription">
<title><structname>pg_stat_subscription</structname> View</title>
<tgroup cols="1">
@@ -5152,11 +5073,8 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
all the counters shown in
the <structname>pg_stat_bgwriter</structname>
view, <literal>archiver</literal> to reset all the counters shown in
-the <structname>pg_stat_archiver</structname> view,
-<literal>wal</literal> to reset all the counters shown in the
-<structname>pg_stat_wal</structname> view or
-<literal>prefetch_recovery</literal> to reset all the counters shown
-in the <structname>pg_stat_prefetch_recovery</structname> view.
+the <structname>pg_stat_archiver</structname> view or <literal>wal</literal>
+to reset all the counters shown in the <structname>pg_stat_wal</structname> view.
</para>
<para>
This function is restricted to superusers by default, but other users
15 changes: 0 additions & 15 deletions doc/src/sgml/wal.sgml
@@ -803,21 +803,6 @@
counted as <literal>wal_write</literal> and <literal>wal_sync</literal>
in <structname>pg_stat_wal</structname>, respectively.
</para>
-
-<para>
-The <xref linkend="guc-recovery-prefetch"/> parameter can
-be used to improve I/O performance during recovery by instructing
-<productname>PostgreSQL</productname> to initiate reads
-of disk blocks that will soon be needed but are not currently in
-<productname>PostgreSQL</productname>'s buffer pool.
-The <xref linkend="guc-maintenance-io-concurrency"/> and
-<xref linkend="guc-wal-decode-buffer-size"/> settings limit prefetching
-concurrency and distance, respectively. The
-prefetching mechanism is most likely to be effective on systems
-with <varname>full_page_writes</varname> set to
-<varname>off</varname> (where that is safe), and where the working
-set is larger than RAM. By default, prefetching in recovery is disabled.
-</para>
</sect1>

<sect1 id="wal-internals">
1 change: 0 additions & 1 deletion src/backend/access/transam/Makefile
@@ -31,7 +31,6 @@ OBJS = \
xlogarchive.o \
xlogfuncs.o \
xloginsert.o \
-xlogprefetch.o \
xlogreader.o \
xlogutils.o

6 changes: 3 additions & 3 deletions src/backend/access/transam/generic_xlog.c
@@ -482,10 +482,10 @@ generic_redo(XLogReaderState *record)
uint8 block_id;

/* Protect limited size of buffers[] array */
-Assert(XLogRecMaxBlockId(record) < MAX_GENERIC_XLOG_PAGES);
+Assert(record->max_block_id < MAX_GENERIC_XLOG_PAGES);

/* Iterate over blocks */
-for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
+for (block_id = 0; block_id <= record->max_block_id; block_id++)
{
XLogRedoAction action;

@@ -525,7 +525,7 @@ generic_redo(XLogReaderState *record)
}

/* Changes are done: unlock and release all buffers */
-for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
+for (block_id = 0; block_id <= record->max_block_id; block_id++)
{
if (BufferIsValid(buffers[block_id]))
UnlockReleaseBuffer(buffers[block_id]);
14 changes: 6 additions & 8 deletions src/backend/access/transam/twophase.c
@@ -1330,21 +1330,19 @@ XlogReadTwoPhaseData(XLogRecPtr lsn, char **buf, int *len)
char *errormsg;
TimeLineID save_currtli = ThisTimeLineID;

-xlogreader = XLogReaderAllocate(wal_segment_size, NULL, wal_segment_close);
-
+xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+XL_ROUTINE(.page_read = &read_local_xlog_page,
+.segment_open = &wal_segment_open,
+.segment_close = &wal_segment_close),
+NULL);
if (!xlogreader)
ereport(ERROR,
(errcode(ERRCODE_OUT_OF_MEMORY),
errmsg("out of memory"),
errdetail("Failed while allocating a WAL reading processor.")));

XLogBeginRead(xlogreader, lsn);
-while (XLogReadRecord(xlogreader, &record, &errormsg) ==
-XLREAD_NEED_DATA)
-{
-if (!read_local_xlog_page(xlogreader))
-break;
-}
+record = XLogReadRecord(xlogreader, &errormsg);

/*
* Restore immediately the timeline where it was previously, as
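The twophase.c hunk above switches between two ways of feeding data to a reader: a loop in which the caller supplies data whenever the reader reports that it needs more (the reverted code), and a page-read callback that the reader invokes itself (the restored code). The following self-contained sketch illustrates the difference on a toy reader. Every name in it (SketchReader, sketch_read_record, and so on) is hypothetical and none of it is PostgreSQL's XLogReader API; the "WAL" is just a string and each "record" is one byte.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for the caller-driven style. */
typedef enum { SKETCH_SUCCESS, SKETCH_NEED_DATA } SketchResult;

typedef struct SketchReader SketchReader;
typedef int (*sketch_page_read_fn) (SketchReader *reader);

struct SketchReader
{
    const char *source;             /* pretend WAL: a NUL-terminated string */
    size_t      source_pos;         /* next byte to load from source */
    char        page[4];            /* tiny "page" buffer */
    size_t      page_len;           /* number of valid bytes in page */
    size_t      page_pos;           /* next unread byte in page */
    sketch_page_read_fn page_read;  /* used only by the callback style */
};

/* Load the next page from the source; returns 0 when no data is left. */
static int
sketch_load_page(SketchReader *r)
{
    size_t remaining = strlen(r->source) - r->source_pos;
    size_t n = remaining < sizeof(r->page) ? remaining : sizeof(r->page);

    if (n == 0)
        return 0;
    memcpy(r->page, r->source + r->source_pos, n);
    r->source_pos += n;
    r->page_len = n;
    r->page_pos = 0;
    return 1;
}

/* Caller-driven style: the reader never does I/O, it only asks for data. */
static SketchResult
sketch_read_record(SketchReader *r, char *record)
{
    if (r->page_pos == r->page_len)
        return SKETCH_NEED_DATA;
    *record = r->page[r->page_pos++];
    return SKETCH_SUCCESS;
}

/* Callback style: the reader fetches pages itself through page_read. */
static int
sketch_read_record_cb(SketchReader *r, char *record)
{
    if (r->page_pos == r->page_len && !r->page_read(r))
        return 0;
    *record = r->page[r->page_pos++];
    return 1;
}

/* Count records, with the caller supplying data on SKETCH_NEED_DATA. */
size_t
sketch_count_caller_driven(const char *wal)
{
    SketchReader r = {.source = wal};
    size_t      n = 0;
    char        rec;

    for (;;)
    {
        if (sketch_read_record(&r, &rec) == SKETCH_NEED_DATA)
        {
            if (!sketch_load_page(&r))
                break;          /* no more data: stop */
            continue;           /* retry now that a page is loaded */
        }
        n++;
    }
    return n;
}

/* Count records, letting the reader call back for each new page. */
size_t
sketch_count_callback(const char *wal)
{
    SketchReader r = {.source = wal, .page_read = sketch_load_page};
    size_t      n = 0;
    char        rec;

    while (sketch_read_record_cb(&r, &rec))
        n++;
    return n;
}
```

Both drivers read the same records. What differs is where the I/O decision lives: in the caller-driven style the caller decides when and how to fetch more data, while the callback style keeps each call site down to a single read call.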
