From 92c8e08ed8726041ac338e3a5ffff58a7adbca36 Mon Sep 17 00:00:00 2001
From: Peter Matulis <peter.matulis@canonical.com>
Date: Thu, 13 Aug 2015 15:53:41 -0400
Subject: [PATCH] for issue #99; HA logging + reviewed HA in general

---
 src/en/juju-ha.md                   | 154 +++++++++++++++-------------
 src/en/troubleshooting-debug-log.md |   4 +
 2 files changed, 88 insertions(+), 70 deletions(-)

diff --git a/src/en/juju-ha.md b/src/en/juju-ha.md
index 092299e1e..3e05e5b3e 100644
--- a/src/en/juju-ha.md
+++ b/src/en/juju-ha.md
@@ -1,70 +1,84 @@
-# Juju High Availability
-
-As of version 1.20, Juju supports high availability for its state server.
-
-High Availability in general terms means that a Juju environment has 3 or more
-(up to 7) redundant state servers. One of these is the master with automatic
-failover occurring should something happen to the master.
-
-This section describes how Juju's high availability features works.
-
-### MongoDB
-
-Juju's High Availability (HA) mode is tightly integrated and dependent on
-MongoDB.  Juju stores all its data about the environment in a MongoDB database.
-MongoDB has an HA implementation that connects multiple MongoDB databases in a
-cluster called a [replicaset](http://docs.mongodb.org/manual/replication/).  The
-replicaset has a 1:1 relation to Juju state servers, and the master of the
-replicaset is the master of the Juju state servers.
-
-### Ensure Availability
-
-Juju's HA mode is turned on using the `juju ensure-availability` command. By
-default this sets the desired number of state machines in the environment to 3.
-There is an optional `-n` parameter which can be used to set this number higher.
-
-As stated above, there is a 1:1 relationship between the number of state
-machines and the number of MongoDB instances. This means that the number of
-state machines must be an odd number to prevent ties during voting for master,
-and the number of state servers cannot exceed 7, due to MongoDB limits.  In
-practice, this means the possibilities are 3, 5 or 7 state machines.
- 
-Currently the number of state machines can be increased using the -n flag on
-`juju ensure-availability`, but not decreased. The only way to decrease the
-number of machines is to create a backup of your environment and then restore
-the backup to a new environment, which starts with a single state server.
-
-Whenever you run ensure-availability, the command will report the changes that
-it made to the system's desired model, which will shortly be reflected in
-reality.
-
-### When State Servers Fail
-
-Juju does not automatically re-spawn state machines if one or more fail.
-However, if an environment is already in HA mode, you can recover from state
-machine failure by manually re-running `juju ensure-availability`. This can be
-done as long as more than half the original number of machines are still
-running.
-
-The process to recovering when state servers have failed is:
-
-* Run `juju ensure-availability`. New state servers will be created to replace
-  the dead ones.
-* After some time the new state servers will be ready and the dead state servers
-  will be removed from Juju's set of high availability state servers. This will
-  take on the order of 30 seconds to 20 minutes depending on variables like the
-  load on the machines and the amount of Juju configuration data to
-  replicate. In the output from `juju status` the new state servers will have a
-  `state-server-member-status` value of `has-vote` and the dead state servers
-  will have `no-vote`. At this point, the state servers in the environment are
-  fully-redundant again.
-* To have Juju not treat the dead state server machines as state servers any
-  more, run `juju ensure-availability` again. The `state-server-member-status`
-  attribute will disappear from these machines in the `juju status` output.
-* The dead state server hosts can now be completely removed from Juju's
-  configuration by using `juju remove-machine`.
-  
-If fewer than half of the original state servers are still running, you cannot
-recover by using the ensure-availability command because the MongoDB replicaset
-does not have a quorum with which to elect a new master.  In this case, you must
-restore from a previous backup.
+Title: Juju High Availability
+
+
+# High Availability
+
+Juju High Availability (HA) means that a Juju environment has 3 or more (up to
+7) state servers (bootstrap nodes), one of which is the *master*. Automatic
+failover occurs should the master lose connectivity.
+
+
+## Juju HA and MongoDB
+
+Juju HA is tightly integrated with the MongoDB database since that is where all
+environment data is stored. The MongoDB software has a native HA implementation
+and this is what Juju HA uses. A MongoDB cluster is called a
+[replica set](http://docs.mongodb.org/manual/replication/) and, in the context
+of Juju HA, i) each of its members corresponds to a different Juju state server
+and ii) the MongoDB replica set master corresponds to the Juju state server master.
+
+The number of state servers must be an odd number to prevent ties during voting
+for master, and the number of state servers cannot exceed 7, due to MongoDB
+limitations. This means a Juju HA cluster can have 3, 5 or 7 state servers.
+
+
+## Activating and modifying HA
+
+Juju HA is activated and modified with the `juju ensure-availability` command.
+As will be shown in the next section, it is also used to recover from failed
+state servers.
+
+When activating HA, this command sets the number of state servers in the
+environment to 3 by default. The optional `-n` switch can be used to set a
+higher number (5 or 7).
+
+When modifying HA, the `-n` switch can be used to increase the number of state
+servers, but not to decrease it. The only way to decrease the number is to
+create a backup of your environment and restore it to a new environment, which
+starts with a single state server. You can then increase to the desired number.
+
+Whenever you run `juju ensure-availability`, the command reports the changes it
+intends to make; these are carried out shortly afterwards.
+
+For complete syntax, see the
+[command reference page](./commands.html#ensure-availability).
+
+
+## Recovering from state server failure
+
+In the event of state server failure, Juju neither automatically re-spawns new
+state servers nor removes the failed ones. However, as long as more than half of
+the original state servers remain available, you can recover manually. The
+process is detailed below.
+
+1. Run `juju ensure-availability`.
+1. Verify that the output of `juju status` shows a value of `has-vote` for the
+   `state-server-member-status` attribute for each new server and a value of
+   `no-vote` for each old server. At that point, the new servers are fully
+   operational as cluster members and the old servers have been demoted (no
+   longer part of HA). This can take between 30 seconds and 20 minutes,
+   depending on machine resources and the volume of Juju data to replicate.
+1. Run `juju ensure-availability` again to have Juju no longer consider the
+   old machines as state servers. The `state-server-member-status` attribute
+   should disappear from these machines.
+1. Use the `juju remove-machine` command to remove the old machines entirely.
+
+You cannot repair the cluster as outlined above if fewer than half of the
+original state servers remain available, because the MongoDB replica set will
+not have the quorum necessary to elect a new master. In this case, you must
+restore from a backup.
+
+
+## HA and logging
+
+All Juju units send their logs to every state server in the HA cluster. You
+access those logs in the usual manner: with `juju debug-log` (see
+[Troubleshooting with debug-log](./troubleshooting-debug-log.html)) or by
+viewing them directly on any state server (under `/var/log/juju`).
+
+Logging to a state server begins once it becomes fully operational. One caveat
+is that past cluster logs are not sent to a newly added state server, so the
+state servers are not guaranteed to hold the same logs. Keep this in mind when
+using `juju debug-log`, which connects to a random state server; its output can
+therefore differ depending on which server it reaches.
+This will be corrected in the near future (past logs will be synced).
diff --git a/src/en/troubleshooting-debug-log.md b/src/en/troubleshooting-debug-log.md
index 7f060b142..cc492a6f7 100644
--- a/src/en/troubleshooting-debug-log.md
+++ b/src/en/troubleshooting-debug-log.md
@@ -1,5 +1,6 @@
 Title: Troubleshooting with debug-log
 
+
 # Troubleshooting with debug-log
 
 When problems arise the first step in determining the cause is to look at the
@@ -18,6 +19,9 @@ For complete syntax, see the [command reference page](./commands.html).
 You can also learn more by running `juju debug-log --help` and `juju help 
 logging`.
 
+See [Juju High Availability](./juju-ha.html#ha-and-logging) when viewing logs
+in an HA context.
+
 
 ## Examples: