From a3460c1845df3159eef59ae7b9da46f598890f6e Mon Sep 17 00:00:00 2001
From: Nick Vatamaniuc
Date: Wed, 17 Mar 2021 11:20:53 -0400
Subject: [PATCH] 3.x fair share scheduler documentation (#629)

A short description of how the algorithm works along with the
configuration sections.

Main PR: https://github.com/apache/couchdb/pull/3364
---
 src/docs/src/config/replicator.rst      | 49 +++++++++++++++++++++++++
 src/docs/src/replication/replicator.rst | 45 +++++++++++++++++++++++
 2 files changed, 94 insertions(+)

diff --git a/src/docs/src/config/replicator.rst b/src/docs/src/config/replicator.rst
index 9e78b591e48..9e460266d31 100644
--- a/src/docs/src/config/replicator.rst
+++ b/src/docs/src/config/replicator.rst
@@ -249,3 +249,52 @@ Replicator Database Configuration
 
     .. note:: In version 2.2, the session plugin is considered
         experimental and is not enabled by default.
+
+    .. config:option:: usage_coeff
+
+        .. versionadded:: 3.2.0
+
+        Usage coefficient decays historic fair share usage every
+        scheduling cycle. The value must be between 0.0 and 1.0. Lower
+        values mean historic usage decays more quickly, while higher
+        values mean it is remembered longer::
+
+            [replicator]
+            usage_coeff = 0.5
+
+    .. config:option:: priority_coeff
+
+        .. versionadded:: 3.2.0
+
+        Priority coefficient decays all job priorities such that they
+        slowly drift towards the front of the run queue. This
+        coefficient defines a maximum time window over which the
+        algorithm operates. For example, if this value is too small
+        (0.1), after a few cycles quite a few jobs would end up at
+        priority 0, rendering the algorithm useless. The default value
+        of 0.98 is picked such that if a job ran for one scheduler
+        cycle, then didn't get to run for 7 hours, it would still have
+        priority > 0. 7 hours was picked as it is close to 8 hours,
+        which is the default maximum error backoff interval::
+
+            [replicator]
+            priority_coeff = 0.98
+
+.. _config/replicator.shares:
+
+Fair Share Replicator Share Allocation
+======================================
+
+.. config:section:: replicator.shares :: Per-Database Fair Share Allocation
+
+    .. config:option:: $replicator_db
+
+        .. versionadded:: 3.2.0
+
+        Fair share configuration section. More shares result in a
+        higher chance that jobs from that database get to run. The
+        default value is 100, the minimum is 1, and the maximum is
+        1000. The configuration may be set even if the database does
+        not exist::
+
+            [replicator.shares]
+            _replicator_db = 100
+            $another/_replicator_db = 100
diff --git a/src/docs/src/replication/replicator.rst b/src/docs/src/replication/replicator.rst
index de5393074f6..05a55e6005a 100644
--- a/src/docs/src/replication/replicator.rst
+++ b/src/docs/src/replication/replicator.rst
@@ -21,6 +21,11 @@ Replicator Database
    anymore. There are new replication job states and new API endpoints
    ``_scheduler/jobs`` and ``_scheduler/docs``.
 
+.. versionchanged:: 3.2.0 Fair share scheduling was introduced. Multiple
+   ``_replicator`` databases get an equal chance (configurable) of running
+   their jobs. Previously, replication jobs were scheduled without regard
+   to their originating database.
+
 The ``_replicator`` database works like any other in CouchDB, but
 documents added to it will trigger replications. Create (``PUT`` or
 ``POST``) a document to start replication. ``DELETE`` a replication
@@ -539,6 +544,46 @@ After this operation, replication pulling from server X will be stopped
 and the replications in the ``_replicator`` database (pulling from
 servers A and B) will continue.
 
+Fair Share Job Scheduling
+=========================
+
+When multiple ``_replicator`` databases are used, and the total number
+of jobs on any node is greater than ``max_jobs``, replication jobs
+will be scheduled such that each of the ``_replicator`` databases, by
+default, gets an equal chance of running its jobs.
+
+This is accomplished by assigning a number of "shares" to each
+``_replicator`` database and then automatically adjusting the
+proportion of running jobs to match each database's proportion of
+shares. By default, each ``_replicator`` database is assigned 100
+shares. It is possible to alter the share assignments for each
+individual ``_replicator`` database in the :ref:`[replicator.shares]
+<config/replicator.shares>` configuration section.
+
+The fair share behavior is perhaps easiest to describe with a set of
+examples. Each example assumes the default of ``max_jobs = 500``, and
+two replicator databases: ``_replicator`` and ``another/_replicator``.
+
+Example 1: If ``_replicator`` has 1000 jobs and
+``another/_replicator`` has 10, the scheduler will run about 490 jobs
+from ``_replicator`` and 10 jobs from ``another/_replicator``.
+
+Example 2: If ``_replicator`` has 200 jobs and ``another/_replicator``
+also has 200 jobs, all 400 jobs will get to run, as the sum of all the
+jobs is less than the ``max_jobs`` limit.
+
+Example 3: If both replicator databases have 1000 jobs each, the
+scheduler will run about 250 jobs from each database on average.
+
+Example 4: If both replicator databases have 1000 jobs each, but
+``_replicator`` was assigned 400 shares, then on average the scheduler
+would run about 400 jobs from ``_replicator`` and 100 jobs from
+``another/_replicator``.
+
+The proportions described in the examples are approximate, might
+oscillate a bit, and might take anywhere from tens of minutes to an
+hour to converge.
+
 Replicating the replicator database
 ===================================
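
The steady-state proportions described in the four examples above can be
reproduced with a short sketch. This is an illustration only, not the actual
CouchDB scheduler (which converges gradually over many cycles by decaying
usage and priorities); the ``allocate_jobs`` function and its signature are
hypothetical.

```python
def allocate_jobs(max_jobs, dbs):
    """Approximate steady-state fair share allocation.

    dbs maps a _replicator db name to (shares, pending_jobs);
    returns a map of db name to the number of jobs that get to run.
    """
    total_pending = sum(pending for _, pending in dbs.values())
    if total_pending <= max_jobs:
        # Under the limit, every pending job gets to run (Example 2).
        return {db: pending for db, (_, pending) in dbs.items()}
    running = {db: 0 for db in dbs}
    capacity = max_jobs
    while capacity > 0:
        # Databases that still have unscheduled jobs.
        active = [db for db in dbs if running[db] < dbs[db][1]]
        if not active:
            break
        total_shares = sum(dbs[db][0] for db in active)
        allocated = 0
        for db in active:
            shares, pending = dbs[db]
            # Share-proportional slice of the remaining capacity, capped
            # by how many jobs this database actually has left.
            fair = capacity * shares // total_shares
            take = min(fair, pending - running[db])
            running[db] += take
            allocated += take
        if allocated == 0:
            # Integer division left crumbs; hand them out one at a time.
            for db in active[:capacity]:
                running[db] += 1
                allocated += 1
        capacity -= allocated
    return running

# Example 1: about a 490 / 10 split.
print(allocate_jobs(500, {"_replicator": (100, 1000),
                          "another/_replicator": (100, 10)}))
# Example 4: 400 shares vs 100 shares gives about a 400 / 100 split.
print(allocate_jobs(500, {"_replicator": (400, 1000),
                          "another/_replicator": (100, 1000)}))
```

Databases with fewer jobs than their fair share (Example 1) free up capacity
that is redistributed, share-proportionally, among the remaining databases,
which is why ``_replicator`` ends up with roughly 490 jobs rather than 250.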