Add pg_stat_io view, providing more detailed IO statistics

Builds on 28e626b and f30d62c. See the former for motivation. Rows of the view show IO operations for a particular backend type, IO target object, IO context combination (e.g. a client backend's operations on permanent relations in shared buffers) and each column in the view is the total number of IO Operations done (e.g. writes). So a cell in the view would be, for example, the number of blocks of relation data written from shared buffers by client backends since the last stats reset. In anticipation of tracking WAL IO and non-block-oriented IO (such as temporary file IO), the "op_bytes" column specifies the unit of the "reads", "writes", and "extends" columns for a given row. Rows for combinations of IO operation, backend type, target object and context that never occur, are ommitted entirely. For example, checkpointer will never operate on temporary relations. Similarly, if an IO operation never occurs for such a combination, the IO operation's cell will be null, to distinguish from 0 observed IO operations. For example, bgwriter should not perform reads. Note that some of the cells in the view are redundant with fields in pg_stat_bgwriter (e.g. buffers_backend). For now, these have been kept for backwards compatibility. Bumps catversion. Author: Melanie Plageman <melanieplageman@gmail.com> Author: Samay Sharma <smilingsamay@gmail.com> Reviewed-by: Maciek Sakrejda <m.sakrejda@gmail.com> Reviewed-by: Lukas Fittl <lukas@fittl.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/20200124195226.lth52iydq2n2uilq@alap3.anarazel.de
postgres · Feb 11, 2023 · a9c70b4 · a9c70b4
1 parent 44e56ba
commit a9c70b4
Show file tree

Hide file tree

Showing 7 changed files with 486 additions and 15 deletions.
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
@@ -469,6 +469,16 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_io</structname><indexterm><primary>pg_stat_io</primary></indexterm></entry>
+      <entry>
+       One row for each combination of backend type, context, and target object
+       containing cluster-wide I/O statistics.
+       See <link linkend="monitoring-pg-stat-io-view">
+       <structname>pg_stat_io</structname></link> for details.
+     </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_replication_slots</structname><indexterm><primary>pg_stat_replication_slots</primary></indexterm></entry>
       <entry>One row per replication slot, showing statistics about the
@@ -665,20 +675,16 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
   </para>
 
   <para>
-   The <structname>pg_statio_</structname> views are primarily useful to
-   determine the effectiveness of the buffer cache.  When the number
-   of actual disk reads is much smaller than the number of buffer
-   hits, then the cache is satisfying most read requests without
-   invoking a kernel call. However, these statistics do not give the
-   entire story: due to the way in which <productname>PostgreSQL</productname>
-   handles disk I/O, data that is not in the
-   <productname>PostgreSQL</productname> buffer cache might still reside in the
-   kernel's I/O cache, and might therefore still be fetched without
-   requiring a physical read. Users interested in obtaining more
-   detailed information on <productname>PostgreSQL</productname> I/O behavior are
-   advised to use the <productname>PostgreSQL</productname> statistics views
-   in combination with operating system utilities that allow insight
-   into the kernel's handling of I/O.
+   The <structname>pg_stat_io</structname> and
+   <structname>pg_statio_</structname> set of views are useful for determining
+   the effectiveness of the buffer cache. They can be used to calculate a cache
+   hit ratio. Note that while <productname>PostgreSQL</productname>'s I/O
+   statistics capture most instances in which the kernel was invoked in order
+   to perform I/O, they do not differentiate between data which had to be
+   fetched from disk and that which already resided in the kernel page cache.
+   Users are advised to use the <productname>PostgreSQL</productname>
+   statistics views in combination with operating system utilities for a more
+   complete picture of their database's I/O performance.
   </para>
 
  </sect2>
@@ -3669,6 +3675,293 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
     <structfield>last_archived_wal</structfield> have also been successfully
     archived.
   </para>
+ </sect2>
+
+ <sect2 id="monitoring-pg-stat-io-view">
+  <title><structname>pg_stat_io</structname></title>
+
+  <indexterm>
+   <primary>pg_stat_io</primary>
+  </indexterm>
+
+  <para>
+   The <structname>pg_stat_io</structname> view will contain one row for each
+   combination of backend type, target I/O object, and I/O context, showing
+   cluster-wide I/O statistics. Combinations which do not make sense are
+   omitted.
+  </para>
+
+  <para>
+   Currently, I/O on relations (e.g. tables, indexes) is tracked. However,
+   relation I/O which bypasses shared buffers (e.g. when moving a table from one
+   tablespace to another) is currently not tracked.
+  </para>
+
+  <table id="pg-stat-io-view" xreflabel="pg_stat_io">
+   <title><structname>pg_stat_io</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        Column Type
+       </para>
+       <para>
+        Description
+       </para>
+      </entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>backend_type</structfield> <type>text</type>
+       </para>
+       <para>
+        Type of backend (e.g. background worker, autovacuum worker). See <link
+        linkend="monitoring-pg-stat-activity-view">
+        <structname>pg_stat_activity</structname></link> for more information
+        on <varname>backend_type</varname>s. Some
+        <varname>backend_type</varname>s do not accumulate I/O operation
+        statistics and will not be included in the view.
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>io_object</structfield> <type>text</type>
+       </para>
+       <para>
+        Target object of an I/O operation. Possible values are:
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>relation</literal>: Permanent relations.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>temp relation</literal>: Temporary relations.
+         </para>
+        </listitem>
+       </itemizedlist>
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>io_context</structfield> <type>text</type>
+       </para>
+       <para>
+        The context of an I/O operation. Possible values are:
+       </para>
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>normal</literal>: The default or standard
+          <varname>io_context</varname> for a type of I/O operation. For
+          example, by default, relation data is read into and written out from
+          shared buffers. Thus, reads and writes of relation data to and from
+          shared buffers are tracked in <varname>io_context</varname>
+          <literal>normal</literal>.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>vacuum</literal>: I/O operations performed outside of shared
+          buffers while vacuuming and analyzing permanent relations. Temporary
+          table vacuums use the same local buffer pool as other temporary table
+          IO operations and are tracked in <varname>io_context</varname>
+          <literal>normal</literal>.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>bulkread</literal>: Certain large read I/O operations
+          done outside of shared buffers, for example, a sequential scan of a
+          large table.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>bulkwrite</literal>: Certain large write I/O operations
+          done outside of shared buffers, such as <command>COPY</command>.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>reads</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Number of read operations, each of the size specified in
+        <varname>op_bytes</varname>.
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>writes</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Number of write operations, each of the size specified in
+        <varname>op_bytes</varname>.
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>extends</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Number of relation extend operations, each of the size specified in
+        <varname>op_bytes</varname>.
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>op_bytes</structfield> <type>bigint</type>
+       </para>
+       <para>
+        The number of bytes per unit of I/O read, written, or extended.
+       </para>
+       <para>
+        Relation data reads, writes, and extends are done in
+        <varname>block_size</varname> units, derived from the build-time
+        parameter <symbol>BLCKSZ</symbol>, which is <literal>8192</literal> by
+        default.
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>evictions</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Number of times a block has been written out from a shared or local
+        buffer in order to make it available for another use.
+       </para>
+       <para>
+        In <varname>io_context</varname> <literal>normal</literal>, this counts
+        the number of times a block was evicted from a buffer and replaced with
+        another block. In <varname>io_context</varname>s
+        <literal>bulkwrite</literal>, <literal>bulkread</literal>, and
+        <literal>vacuum</literal>, this counts the number of times a block was
+        evicted from shared buffers in order to add the shared buffer to a
+        separate, size-limited ring buffer for use in a bulk I/O operation.
+        </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>reuses</structfield> <type>bigint</type>
+       </para>
+       <para>
+        The number of times an existing buffer in a size-limited ring buffer
+        outside of shared buffers was reused as part of an I/O operation in the
+        <literal>bulkread</literal>, <literal>bulkwrite</literal>, or
+        <literal>vacuum</literal> <varname>io_context</varname>s.
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>fsyncs</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Number of <literal>fsync</literal> calls. These are only tracked in
+        <varname>io_context</varname> <literal>normal</literal>.
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
+       </para>
+       <para>
+        Time at which these statistics were last reset.
+       </para>
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   Some backend types never perform I/O operations on some I/O objects and/or
+   in some I/O contexts. These rows are omitted from the view. For example, the
+   checkpointer does not checkpoint temporary tables, so there will be no rows
+   for <varname>backend_type</varname> <literal>checkpointer</literal> and
+   <varname>io_object</varname> <literal>temp relation</literal>.
+  </para>
+
+  <para>
+   In addition, some I/O operations will never be performed either by certain
+   backend types or on certain I/O objects and/or in certain I/O contexts.
+   These cells will be NULL. For example, temporary tables are not
+   <literal>fsync</literal>ed, so <varname>fsyncs</varname> will be NULL for
+   <varname>io_object</varname> <literal>temp relation</literal>. Also, the
+   background writer does not perform reads, so <varname>reads</varname> will
+   be NULL in rows for <varname>backend_type</varname> <literal>background
+   writer</literal>.
+  </para>
+
+  <para>
+   <structname>pg_stat_io</structname> can be used to inform database tuning.
+   For example:
+   <itemizedlist>
+    <listitem>
+     <para>
+      A high <varname>evictions</varname> count can indicate that shared
+      buffers should be increased.
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      Client backends rely on the checkpointer to ensure data is persisted to
+      permanent storage. Large numbers of <varname>fsyncs</varname> by
+      <literal>client backend</literal>s could indicate a misconfiguration of
+      shared buffers or of the checkpointer. More information on configuring
+      the checkpointer can be found in <xref linkend="wal-configuration"/>.
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      Normally, client backends should be able to rely on auxiliary processes
+      like the checkpointer and the background writer to write out dirty data
+      as much as possible. Large numbers of writes by client backends could
+      indicate a misconfiguration of shared buffers or of the checkpointer.
+      More information on configuring the checkpointer can be found in <xref
+      linkend="wal-configuration"/>.
+     </para>
+    </listitem>
+   </itemizedlist>
+  </para>
+
 
  </sect2>
 

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
@@ -1117,6 +1117,21 @@ CREATE VIEW pg_stat_bgwriter AS
         pg_stat_get_buf_alloc() AS buffers_alloc,
         pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
 
+CREATE VIEW pg_stat_io AS
+SELECT
+       b.backend_type,
+       b.io_object,
+       b.io_context,
+       b.reads,
+       b.writes,
+       b.extends,
+       b.op_bytes,
+       b.evictions,
+       b.reuses,
+       b.fsyncs,
+       b.stats_reset
+FROM pg_stat_get_io() b;
+
 CREATE VIEW pg_stat_wal AS
     SELECT
         w.wal_records,