Review the README, introduce a new first simple tutorial. (#942)
DimCitus committed Oct 6, 2022
1 parent 69441fe commit 62f982b
Showing 13 changed files with 866 additions and 674 deletions.
283 changes: 41 additions & 242 deletions README.md
# pg_auto_failover

[![Documentation Status](https://readthedocs.org/projects/pg-auto-failover/badge/?version=main)](https://pg-auto-failover.readthedocs.io/en/main/?badge=main)

pg_auto_failover is an extension and service for PostgreSQL that monitors
and manages automated failover for a Postgres cluster. It is optimized for
simplicity and correctness.

pg_auto_failover supports several Postgres architectures and implements a
safe automated failover for your Postgres service. It is possible to get
started with only two data nodes which will be given the roles of primary
and secondary by the monitor.

![pg_auto_failover Architecture with 2 nodes](docs/tikz/arch-single-standby.svg?raw=true "pg_auto_failover Architecture with 2 nodes")

The pg_auto_failover Monitor implements a state machine and relies on
in-core PostgreSQL facilities to deliver HA. For example, when the
**secondary** node is detected to be unavailable, or when it lags too far
behind, the Monitor removes it from the `synchronous_standby_names` setting
on the **primary** node. Until the **secondary** is back to a healthy,
monitored state, failover and switchover operations are not allowed,
preventing data loss.
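A quick way to observe this behavior is to inspect the setting on the
primary node. This is only an illustrative check (the host and port are
assumptions, matching the local walkthrough further down), not a
pg_auto_failover command:

```bash
# Show the current synchronous_standby_names value on the primary node;
# pg_auto_failover updates this setting as standby nodes change state.
$ psql -h localhost -p 5001 -d postgres -c 'SHOW synchronous_standby_names;'
```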

pg_auto_failover consists of the following parts:

- a PostgreSQL extension named `pgautofailover`
- a PostgreSQL service to operate the pg_auto_failover monitor
- a pg_auto_failover keeper to operate your PostgreSQL instances, see `pg_autoctl run`
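
For example, once a node has been created, the keeper is typically started
with `pg_autoctl run`; a minimal sketch, where the data directory path is
only an assumption:

```bash
# Start the keeper for an existing node: it runs Postgres and keeps
# reporting the node's local state to the monitor.
$ pg_autoctl run --pgdata ./node_1
```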

## Multiple Standbys

It is possible to implement a production architecture with any number of
Postgres nodes, for better data availability guarantees.

![pg_auto_failover Architecture with 3 nodes](docs/tikz/arch-multi-standby.svg?raw=true "pg_auto_failover Architecture with 3 nodes")

Every node that reaches the secondary state is added to `synchronous_standby_names` on
the primary. With pg_auto_failover 1.4 it is possible to remove a node from
the _replication quorum_ of Postgres.
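
For example, a standby can be taken out of the replication quorum with
`pg_autoctl set node replication-quorum`; a sketch, where the data
directory path is an assumption:

```bash
# Remove the local node from the replication quorum; the monitor then
# adjusts synchronous_standby_names on the primary accordingly.
$ pg_autoctl set node replication-quorum false --pgdata ./node_2
```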

## Dependencies

At runtime, pg_auto_failover depends only on Postgres. Postgres versions 10,
11, 12, 13, and 14 are currently supported.

At build time, pg_auto_failover depends on the Postgres server development
package, like any other Postgres extension (the server development package
for Postgres 11 on Debian or Ubuntu is named `postgresql-server-dev-11`);
`libssl-dev` and `libkrb5-dev` are also needed for the client side when
building with all the `libpq` authentication options.

## Citus HA

Starting with pg_auto_failover 2.0 it is now possible to also implement High
Availability for a Citus cluster.

![pg_auto_failover Architecture with Citus](docs/tikz/arch-citus.svg?raw=true "pg_auto_failover Architecture with Citus")

## Documentation

Please check out the project
[documentation](https://pg-auto-failover.readthedocs.io/en/main/) for
tutorials, manual pages, detailed design coverage, and troubleshooting
information.

## Installing pg_auto_failover from packages

### Ubuntu or Debian:

Binary packages for Debian and derivatives (Ubuntu) are available from the
[apt.postgresql.org](https://wiki.postgresql.org/wiki/Apt) repository;
install them by following the linked documentation and then:

```bash
$ sudo apt-get install pg-auto-failover-cli
$ sudo apt-get install postgresql-14-auto-failover
```


When using Debian, two packages are provided for pg_auto_failover: the
monitor Postgres extension is packaged separately and depends on the
Postgres version you want to run for the monitor itself.

```bash
$ apt-get update
$ apt-get install -y --no-install-recommends postgresql-14
```

### Fedora, CentOS, or Red Hat:

```bash
# Add the repository to your system
curl https://install.citusdata.com/community/rpm.sh | sudo bash

# Install pg_auto_failover
sudo yum install -y pg-auto-failover10_11

# Confirm installation
/usr/pgsql-11/bin/pg_autoctl --version
```
### Other installation methods

Please see our extended documentation chapter [Installing
pg_auto_failover](https://pg-auto-failover.readthedocs.io/en/main/install.html)
for details.

## Building pg_auto_failover from source

To build the project, make sure you have installed the build-dependencies,
then just type `make`. You can install the resulting binary using `make
install`.

Build dependencies example on debian for Postgres 11:

~~~ bash
$ sudo apt-get install postgresql-server-dev-11 libssl-dev libkrb5-dev libncurses6
~~~

Then build pg_auto_failover from sources with the following instructions:

~~~ bash
$ make
$ sudo make install -j10
~~~
For this to work, the PostgreSQL client (libpq) and server
(postgresql-server-dev) libraries must be available in your standard include
and link paths.

The `make install` step will deploy the `pgautofailover` PostgreSQL extension
in the PostgreSQL directory for extensions as pointed to by `pg_config`, and
install the `pg_autoctl` binary command in the directory pointed to by
`pg_config --bindir`, alongside other PostgreSQL tools such as `pg_ctl` and
`pg_controldata`.

## Trying pg_auto_failover on your local computer

The main documentation for pg_auto_failover includes the following three
tutorials:

- The main [pg_auto_failover
  Tutorial](https://pg-auto-failover.readthedocs.io/en/main/tutorial.html)
  uses docker-compose on your local computer to start multiple Postgres
  nodes and implement your first failover.

- The complete [pg_auto_failover Azure VM
  Tutorial](https://pg-auto-failover.readthedocs.io/en/main/azure-tutorial.html)
  guides you through creating an Azure network and Azure VMs in that network,
  provisioning those VMs, running Postgres nodes with pg_auto_failover, and
  then introducing hard failures and witnessing an automated failover.

- The [Citus Cluster Quick
  Start](https://pg-auto-failover.readthedocs.io/en/main/citus-quickstart.html)
  tutorial uses docker-compose to create a full Citus cluster and guides you
  through a worker failover and then a coordinator failover.

Alternatively, once you have built and installed pg_auto_failover from
source, you can try it on your local computer by following these steps:

0. If you're building from source and you already use tmux, then try the
following command:

~~~ bash
$ make cluster
~~~

This creates a tmux session with multiple panes, each running a
pg_auto_failover node: the monitor, a first Postgres node, and a second
Postgres node, plus another tmux pane for interactive commands.

1. Install and run a monitor

~~~ bash
$ export PGDATA=./monitor
$ export PGPORT=5000
$ pg_autoctl create monitor --ssl-self-signed --hostname localhost --auth trust --run
~~~

2. Get the Postgres URI (connection string) for the monitor node:

~~~ bash
$ pg_autoctl show uri --formation monitor
postgres://autoctl_node@localhost:5000/pg_auto_failover?sslmode=require
~~~

The following two steps are going to use the option `--monitor` which
expects that connection string. So copy/paste your actual Postgres URI
for the monitor in the next steps.
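
As a convenience (this is not part of the original steps), you can keep
that URI in a shell variable and pass it as `--monitor "$MONITOR_URI"` in
the commands below:

~~~ bash
# Optional: store the monitor URI in a shell variable for the next steps.
$ MONITOR_URI='postgres://autoctl_node@localhost:5000/pg_auto_failover?sslmode=require'
~~~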

3. Install and run a primary PostgreSQL instance:

~~~ bash
$ export PGDATA=./node_1
$ export PGPORT=5001
$ pg_autoctl create postgres \
--hostname localhost \
--auth trust \
--ssl-self-signed \
--monitor 'postgres://autoctl_node@localhost:5000/pg_auto_failover?sslmode=require' \
--run
~~~

4. Install and run a secondary PostgreSQL instance, using exactly the same
command, but with a different PGDATA and PGPORT, because we're running
everything on the same host:

~~~ bash
$ export PGDATA=./node_2
$ export PGPORT=5002
$ pg_autoctl create postgres \
--hostname localhost \
--auth trust \
--ssl-self-signed \
--monitor 'postgres://autoctl_node@localhost:5000/pg_auto_failover?sslmode=require' \
--run
~~~

5. See the state of the new system:

~~~ bash
$ export PGDATA=./monitor
$ export PGPORT=5000
$ pg_autoctl show state
Name | Node | Host:Port | LSN | Reachable | Current State | Assigned State
-------+-------+----------------+-----------+-----------+---------------------+--------------------
node_1 | 1 | localhost:5001 | 0/30000D8 | yes | primary | primary
node_2 | 2 | localhost:5002 | 0/30000D8 | yes | secondary | secondary
~~~

That's it! You now have a running pg_auto_failover setup with two PostgreSQL nodes
using Streaming Replication to implement fault-tolerance.

## Your first failover

Now that we have two nodes set up and running, we can initiate a manual
failover, also named a switchover. It is possible to trigger such an
operation without any node having to actually fail when using
pg_auto_failover.

The command `pg_autoctl perform switchover` can be used to force
pg_auto_failover to orchestrate a failover. Because all the nodes are
actually running fine (meaning that `pg_autoctl` actively reports the local
state of each node to the monitor), the failover process does not have to
rely on carefully implemented timeouts to avoid split-brain.

~~~ bash
$ pg_autoctl perform switchover
19:06:41 63977 INFO Listening monitor notifications about state changes in formation "default" and group 0
19:06:41 63977 INFO Following table displays times when notifications are received
Time | Name | Node | Host:Port | Current State | Assigned State
---------+--------+-------+----------------+---------------------+--------------------
19:06:43 | node_1 | 1 | localhost:5001 | primary | draining
19:06:43 | node_2 | 2 | localhost:5002 | secondary | prepare_promotion
19:06:43 | node_2 | 2 | localhost:5002 | prepare_promotion | prepare_promotion
19:06:43 | node_2 | 2 | localhost:5002 | prepare_promotion | stop_replication
19:06:43 | node_1 | 1 | localhost:5001 | primary | demote_timeout
19:06:43 | node_1 | 1 | localhost:5001 | draining | demote_timeout
19:06:43 | node_1 | 1 | localhost:5001 | demote_timeout | demote_timeout
19:06:44 | node_2 | 2 | localhost:5002 | stop_replication | stop_replication
19:06:44 | node_2 | 2 | localhost:5002 | stop_replication | wait_primary
19:06:44 | node_1 | 1 | localhost:5001 | demote_timeout | demoted
19:06:44 | node_1 | 1 | localhost:5001 | demoted | demoted
19:06:44 | node_2 | 2 | localhost:5002 | wait_primary | wait_primary
19:06:45 | node_1 | 1 | localhost:5001 | demoted | catchingup
19:06:46 | node_1 | 1 | localhost:5001 | catchingup | catchingup
19:06:47 | node_1 | 1 | localhost:5001 | catchingup | secondary
19:06:47 | node_2 | 2 | localhost:5002 | wait_primary | primary
19:06:47 | node_1 | 1 | localhost:5001 | secondary | secondary
19:06:48 | node_2 | 2 | localhost:5002 | primary | primary
~~~

The promotion of the secondary node is finished when the node reaches the
goal state *wait_primary*. At this point, applications connecting to the
newly promoted node are allowed to proceed with write traffic.

Because this is a switchover and no nodes have failed, `node_1`, which used
to be the primary, completes its cycle and joins as a secondary within the
same operation. The Postgres tool `pg_rewind` is used to implement that
transition.
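
If you want to review the sequence of transitions after the fact, the
monitor keeps a history of events. A sketch, assuming the monitor data
directory created earlier in this walkthrough:

~~~ bash
# List the events recorded by the monitor during the switchover.
$ export PGDATA=./monitor
$ pg_autoctl show events
~~~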

And there you have it: a full failover from `node_1`, the former primary, to
`node_2`, the new primary. We can have a look at the state now:

~~~ bash
$ pg_autoctl show state
Name | Node | Host:Port | LSN | Reachable | Current State | Assigned State
-------+-------+----------------+-----------+-----------+---------------------+--------------------
node_1 | 1 | localhost:5001 | 0/3001648 | yes | secondary | secondary
node_2 | 2 | localhost:5002 | 0/3001648 | yes | primary | primary
~~~

## Cleaning up your local setup

You can use the commands `pg_autoctl stop`, `pg_autoctl drop node
--destroy`, and `pg_autoctl drop monitor --destroy` if you want to get rid
of everything set up so far.
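
A sketch of that cleanup, using the data directories created earlier in
this walkthrough:

~~~ bash
# Remove the two Postgres nodes and the monitor, destroying their data
# directories.
$ pg_autoctl drop node --destroy --pgdata ./node_1
$ pg_autoctl drop node --destroy --pgdata ./node_2
$ pg_autoctl drop monitor --destroy --pgdata ./monitor
~~~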

## Formations and Groups

In the previous example, the options `--formation` and `--group` are not
used. This means we've been using the default values: the default formation
is named *default* and the default group id is zero (0).

It's possible to add other services to the same running monitor by using
another formation.
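
For example, a new Postgres node can be registered in a separate formation
on the same monitor. This is a sketch only: the formation name, data
directory, and port are assumptions, and depending on your version you may
need to create the formation on the monitor first (see `pg_autoctl create
formation` in the documentation):

~~~ bash
# Register a node in a formation named "other", re-using the same monitor.
$ export PGDATA=./other_node_1
$ export PGPORT=5011
$ pg_autoctl create postgres \
    --hostname localhost \
    --auth trust \
    --ssl-self-signed \
    --formation other \
    --monitor 'postgres://autoctl_node@localhost:5000/pg_auto_failover?sslmode=require' \
    --run
~~~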

## Installing pg_auto_failover on top of an existing Postgres setup

The `pg_autoctl create postgres --pgdata ${PGDATA}` step can be used with an
existing Postgres installation running at `${PGDATA}`, but only for the
primary node.

On a secondary node, it is possible to re-use an existing data directory
when it has the same `system_identifier` as the other node(s) already
registered in the same formation and group.
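
A sketch of that usage, where the data directory, hostname, and monitor URI
are assumptions to adapt to your environment:

~~~ bash
# Register an existing primary Postgres instance with the monitor; the
# existing data directory is kept as-is.
$ pg_autoctl create postgres \
    --pgdata /var/lib/postgresql/14/main \
    --hostname node1.example.com \
    --auth trust \
    --ssl-self-signed \
    --monitor 'postgres://autoctl_node@monitor.example.com:5432/pg_auto_failover?sslmode=require' \
    --run
~~~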

## Application and Connection Strings

To retrieve the connection string to use at the application level, use the
following command:

~~~ bash
$ pg_autoctl show uri --formation default --pgdata ...
postgres://localhost:5002,localhost:5001/postgres?target_session_attrs=read-write&sslmode=require
~~~

You can use that connection string from within your application, adjusting
the username that is used to connect. By default, pg_auto_failover edits the
Postgres HBA rules to allow the `--username` given at `pg_autoctl create
postgres` time to connect to this URI from the database node itself.

To allow application servers to connect to the Postgres database, edit your
`pg_hba.conf` file as documented in [the pg_hba.conf
file](https://www.postgresql.org/docs/current/auth-pg-hba-conf.html) chapter
of the PostgreSQL documentation.
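
For example, a rule like the following could be appended to `pg_hba.conf`
on each Postgres node and then reloaded. The role, database, network, and
authentication method are assumptions to adapt to your setup:

~~~ bash
# Allow the application role "myapp" to connect to the "postgres" database
# from the 10.0.0.0/8 network over TLS, then reload the configuration.
$ echo "hostssl postgres myapp 10.0.0.0/8 scram-sha-256" >> "$PGDATA/pg_hba.conf"
$ pg_ctl reload -D "$PGDATA"
~~~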

## Reporting Security Issues

