Permalink
Browse files

Initial version from PostgreSQL feature branch

  • Loading branch information...
1 parent cf23e30 commit 94087531fbeb5565f07567742f87be31c52514de @petergeoghegan committed Apr 28, 2016
View
@@ -0,0 +1,5 @@
+*.o
+*.so
+*.swp
+.deps
+.vagrant
View
@@ -0,0 +1,18 @@
+Copyright (c) 2016, Peter Geoghegan
+
+Permission to use, copy, modify, and distribute this software and its
+documentation for any purpose, without fee, and without a written agreement
+is hereby granted, provided that the above copyright notice and this
+paragraph and the following two paragraphs appear in all copies.
+
+IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR
+DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
+LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
+DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGE.
+
+THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES,
+INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
+AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS
+ON AN "AS IS" BASIS, AND THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS TO
+PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
View
@@ -0,0 +1,25 @@
+short_ver = 0.1
+long_ver = $(shell (git describe --tags --long '--match=v*' 2>/dev/null || echo $(short_ver)-0-unknown) | cut -c2-)
+
+MODULE_big = amcheck
+OBJS = amcheck.o $(WIN32RES)
+
+EXTENSION = amcheck
+DATA = amcheck--0.1.sql
+PGFILEDESC = "amcheck - verify the logical consistency of indexes"
+DOCS = README.md
+REGRESS = install_amcheck extern_sort_bytea \
+ extern_sort_collations extern_sort_numeric
+
+PG_CONFIG = pg_config
+PGXS = $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+
+DEBUILD_ROOT = /tmp/amcheck
+
+deb:
+ mkdir -p $(DEBUILD_ROOT) && rm -rf $(DEBUILD_ROOT)/*
+ rsync -Ca --exclude=build/* ./ $(DEBUILD_ROOT)/
+ cd $(DEBUILD_ROOT) && make -f debian/rules orig
+ cd $(DEBUILD_ROOT) && debuild -us -uc -sa
+ cp -a /tmp/amcheck_* /tmp/postgresql-9.* build/
View
@@ -0,0 +1,232 @@
+# amcheck: Verify the logical consistency of PostgreSQL B-Tree indexes
+
+Current version: 0.1
+
+Author: Peter Geoghegan [`<pg@bowt.ie>`](mailto:pg@bowt.ie)
+
+License: <a href="https://opensource.org/licenses/postgresql">PostgreSQL license</a>
+
+Minimum supported version: PostgreSQL 9.4
+
+## Overview
+
+The `amcheck` module provides functions that allow you to verify the logical
+consistency of the structure of PostgreSQL indexes. If the structure appears
+to be valid, no error is raised. Currently, only B-Tree indexes are supported,
+although since in practice the majority of PostgreSQL indexes are B-Tree
+indexes, `amcheck` is likely to be effective as a general corruption smoke-test
+in production PostgreSQL installations.
+
+See "Using amcheck effectively" below for information about the kinds of
+real-world problems `amcheck` is intended to detect.
+
+### Invariants
+
+`amcheck` provides functions that specifically verify various *invariants* in
+the structure of the representation of particular indexes. The correctness of
+the access method functions behind index scans and other important operations
+is predicated on these invariants always holding. For example, certain
+functions verify, among other things, that all B-Tree pages have items in
+"logical" order; if that particular invariant somehow fails to hold, we can
+expect binary searches on any affected page to produce wrong answers. As a
+consequence of that, index scans may subtly produce wrong answers.
+
+Verification is performed using the same procedures as those used by index
+scans themselves, which may be user-defined operator class code. For example,
+B-Tree index verification relies on comparisons made with one or more B-Tree
+support function 1 routines, much like B-Tree index scans rely on the routines
+to guide the scan to a point in the underlying table; see
+http://www.postgresql.org/docs/current/static/xindex.html for details of
+operator class support functions.
+
+### Project background
+
+`amcheck` is proposed as a contrib extension for PostgreSQL 9.7. This
+externally maintained version of the extension exists to target earlier
+versions of PostgreSQL (PostgreSQL 9.4+).
+
+### Bugs
+
+Report bugs using the <a href="https://github.com/petergeoghegan/amcheck/issues">Github
+issue tracker</a>. Feature requests will not accepted, because `amcheck` is
+targeted for future inclusion into PostgreSQL as a contrib module.
+
+## Installation
+
+### Building using PGXS (generic)
+
+The module can be built using the standard PGXS infrastructure. For this to
+work, you will need to have the `pg_config` program available in your $PATH.
+
+If you are using a packaged PostgreSQL build and have `pg_config` available
+(and in your OS user's $PATH), the procedure is as follows:
+
+```shell
+tar xvzf amcheck-0.1.tar.gz
+cd amcheck-0.1
+make
+make install
+```
+
+Note that just because `pg_config` is located in one user's $PATH does not
+necessarily make it so for the root user.
+
+#### Building Debian/Ubuntu packages
+
+The Makefile also provides a target for building Debian packages. The target
+has a dependency on `debhelper`, `devscripts`, `postgresql-server-dev-all`, and
+the PostgreSQL source package itself (e.g. `postgresql-server-dev-9.4`).
+
+The packages can be created and installed from the amcheck directory as
+follows:
+
+```shell
+sudo aptitude install debhelper devscripts postgresql-server-dev-all
+make deb
+sudo dpkg -i ./build/postgresql-9.4-amcheck_*.deb
+```
+
+### Setting up PostgreSQL
+
+Once `amcheck` is built and installed, it should be created as a PostgreSQL
+extension in every database that requires it:
+
+`mydb=# CREATE EXTENSION amcheck;`
+
+`amcheck` functions may be used only by superusers.
+
+## Interface
+
+The `amcheck` extension has a simple interface. `amcheck` consists of just a
+few functions that can be used for verification of a named B-Tree index. Note
+that currently, no function inspects the structure of the underlying heap
+representation (table).
+
+`regclass` function arguments are used by `amcheck` to identify particular
+index relations. See the <a href="http://www.postgresql.org/docs/current/static/datatype-oid.html">PostgreSQL
+documentation on Object identifier types</a> for more information.
+
+### `bt_index_check(index regclass) returns void`
+
+`bt_index_check` tests that its target, a B-Tree index, respects a variety of
+invariants. Example usage:
+
+```sql
+ SELECT bt_index_check(c.oid), c.relname, c.relpages
+ FROM pg_index i
+ JOIN pg_opclass op ON i.indclass[0] = op.oid
+ JOIN pg_am am ON op.opcmethod = am.oid
+ JOIN pg_class c ON i.indexrelid = c.oid
+ JOIN pg_namespace n ON c.relnamespace = n.oid
+ WHERE am.amname = 'btree' AND n.nspname = 'pg_catalog'
+ -- Don't check pg_class (bt_index_parent_check() requires this):
+ AND c.relname NOT LIKE 'pg_class%'
+ -- Function may throw an error when this is omitted:
+ AND i.indisready AND i.indisvalid
+ ORDER BY c.relpages DESC LIMIT 10;
+```
+```shell
+ bt_index_check | relname | relpages
+ ----------------+---------------------------------+----------
+ | pg_depend_reference_index | 43
+ | pg_depend_depender_index | 40
+ | pg_proc_proname_args_nsp_index | 31
+ | pg_description_o_c_o_index | 21
+ | pg_attribute_relid_attnam_index | 14
+ | pg_proc_oid_index | 10
+ | pg_attribute_relid_attnum_index | 9
+ | pg_amproc_fam_proc_index | 5
+ | pg_amop_opr_fam_index | 5
+ | pg_amop_fam_strat_index | 5
+```
+
+This example shows a session that performs verification of every catalog index
+in the database "test" (except those associated with the pg_class catalog).
+Details of just the 10 largest indexes verified are displayed. Since no error
+is raised, all indexes tested appear to be logically consistent. Naturally,
+this query could easily be changed to call `bt_index_check` for every index in
+the database where verification is supported. An `AccessShareLock` is acquired
+on the target index by `bt_index_check`. This lock mode is the same lock mode
+acquired on relations by simple `SELECT` statements.
+
+`bt_index_check` does not verify invariants that span child/parent
+relationships, nor does it verify that the target index is consistent with its
+heap relation. When a routine, lightweight test for corruption is required in
+a live production environment, using `bt_index_check` often provides the best
+trade-off between thoroughness of verification and limiting the impact on
+application performance and availability.
+
+### `bt_index_parent_check(index regclass) returns void`
+
+`bt_index_parent_check` tests that its target, a B-Tree index, respects a
+variety of invariants. The checks performed by `bt_index_parent_check` are a
+superset of the checks performed by `bt_index_check`. `bt_index_parent_check`
+can be thought of as a more thorough variant of `bt_index_check`: unlike
+`bt_index_check`, `bt_index_parent_check` also checks invariants that span
+parent/child relationships. However, it does not verify that the target index
+is consistent with its heap relation. `bt_index_parent_check` follows the
+general convention of raising an error if it finds a logical inconsistency or
+other problem.
+
+An `ExclusiveLock` is required on the target index by bt_index_parent_check (a
+`ShareLock` is also acquired on the heap relation). These locks prevent
+concurrent data modification from `INSERT`, `UPDATE`, and `DELETE` commands.
+The locks also prevent the underlying relation from being concurrently
+processed by `VACUUM` and certain other utility commands. Note that the
+function holds locks for as short a duration as possible, so there is no
+advantage to verifying each index individually in a series of transactions.
+
+`bt_index_parent_check`'s additional verification is more likely to detect
+various pathological cases. These cases may involve an incorrectly implemented
+B-Tree operator class used by the index that is checked, or, hypothetically,
+undiscovered bugs in the underlying B-Tree index access method code. Note that
+`bt_index_parent_check` requires locks in a way that is incompatible with Hot
+Standby, unlike `bt_index_check`.
+
+## Using amcheck effectively
+
+`amcheck` can be effective at detecting various types of failure modes that
+data page checksums will always fail to catch. These include:
+
+* Filesystem or storage subsystem faults where checksums happen to simply not
+ be enabled. Note that `amcheck` examines a page as represented in some
+ shared memory buffer at the time of verification if there is only a shared
+ buffer hit when accessing the block. Consequently, `amcheck` does not
+ necessarily examine data read from the filesystem at the time of
+ verification. Note that when checksums are enabled, `amcheck` may raise an
+ error due to a checksum failure when a corrupt block is read into a buffer.
+
+* Corruption caused by faulty RAM, and the broader memory subsystem and
+ operating system. PostgreSQL does not protect against correctable memory
+ errors and it is assumed you will operate using RAM that uses industry
+ standard Error Correcting Codes (ECC) or better protection. However, ECC
+ memory is typically only immune to single-bit errors, and should not be
+ assumed to provide *absolute* protection against failures that result in
+ memory corruption.
+
+* Structural inconsistencies caused by incorrect operator class
+ implementations, including issues due to the comparison rules of operating
+ system collations changing the ordering implied by comparisons of an
+ underlying collatable type, such as text. Though rare, updates to operating
+ system collation rules can cause these issues. More commonly, an
+ inconsistency in the collation order between a master server and a standby
+ server is implicated, possibly because the *major* operating system version
+ in use is inconsistent. Such inconsistencies will generally only arise on
+ standby servers, and so can generally only be detected on standby servers.
+ If a problem like this arises, it may not affect each individual index that
+ is ordered using an affected collation, simply because *indexed* values might
+ happen to have the same absolute ordering regardless of the behavioral
+ inconsistency.
+
+* Corruption caused by hypothetical undiscovered bugs in the underlying
+ PostgreSQL access method code. Automatic verification of the structural
+ integrity of indexes plays a role in the general testing of new or proposed
+ PostgreSQL features that could plausibly allow a logical inconsistency to be
+ introduced. One obvious testing strategy is to call `amcheck` functions
+ continuously when running the standard regression tests.
+
+No error concerning corruption raised by `amcheck` should ever be a false
+positive. Users are strongly advised to perform a `REINDEX` for any index
+where corruption is indicated, although it is also important to identify and
+correct the underlying problem. In general, `amcheck` can only prove the
+presence of corruption; it cannot prove its absence.
View
@@ -0,0 +1,11 @@
+# -*- mode: ruby -*-
+# vi: set ft=ruby :
+
+# Vagrantfile API/syntax version. Don't touch unless you know what you're doing!
+VAGRANTFILE_API_VERSION = "2"
+
+Vagrant.configure("2") do |config|
+ config.vm.box = "debian/jessie64"
+
+ config.vm.provision :shell, :path => "bootstrap.sh"
+end
View
@@ -0,0 +1,20 @@
+/* contrib/amcheck/amcheck--0.1.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION amcheck" to load this file. \quit
+
+--
+-- bt_index_check()
+--
+CREATE FUNCTION bt_index_check(index regclass)
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'bt_index_check'
+LANGUAGE C STRICT;
+
+--
+-- bt_index_parent_check()
+--
+CREATE FUNCTION bt_index_parent_check(index regclass)
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'bt_index_parent_check'
+LANGUAGE C STRICT;
Oops, something went wrong.

0 comments on commit 9408753

Please sign in to comment.