From a3460c1845df3159eef59ae7b9da46f598890f6e Mon Sep 17 00:00:00 2001
From: Nick Vatamaniuc
Date: Wed, 17 Mar 2021 11:20:53 -0400
Subject: [PATCH] 3.x fair share scheduler documentation (#629)

A short description of how the algorithm works along with the
configuration sections.

Main PR: https://github.com/apache/couchdb/pull/3364
---
 src/docs/src/config/replicator.rst      | 49 +++++++++++++++++++++++++
 src/docs/src/replication/replicator.rst | 45 +++++++++++++++++++++++
 2 files changed, 94 insertions(+)

diff --git a/src/docs/src/config/replicator.rst b/src/docs/src/config/replicator.rst
index 9e78b591e48..9e460266d31 100644
--- a/src/docs/src/config/replicator.rst
+++ b/src/docs/src/config/replicator.rst
@@ -249,3 +249,52 @@ Replicator Database Configuration
 
     .. note:: In version 2.2, the session plugin is considered
         experimental and is not enabled by default.
+
+    .. config:option:: usage_coeff
+
+        .. versionadded:: 3.2.0
+
+        Usage coefficient decays historic fair share usage every
+        scheduling cycle. The value must be between 0.0 and 1.0. Lower
+        values mean historic usage decays more quickly, while higher
+        values mean it is remembered longer::
+
+            [replicator]
+            usage_coeff = 0.5
+
+    .. config:option:: priority_coeff
+
+        .. versionadded:: 3.2.0
+
+        Priority coefficient decays all job priorities such that they
+        slowly drift towards the front of the run queue. This
+        coefficient defines a maximum time window over which the
+        algorithm operates. For example, if this value is too small
+        (0.1), after a few cycles quite a few jobs would end up at
+        priority 0, rendering the algorithm useless. The default value
+        of 0.98 is picked such that if a job ran for one scheduler
+        cycle, then didn't get to run for 7 hours, it would still have
+        priority > 0. 7 hours was picked as it is close to 8 hours,
+        which is the default maximum error backoff interval::
+
+            [replicator]
+            priority_coeff = 0.98
+
+.. _config/replicator.shares:
+
+Fair Share Replicator Share Allocation
+======================================
+
+.. config:section:: replicator.shares :: Per-Database Fair Share Allocation
+
+    .. config:option:: $replicator_db
+
+        .. versionadded:: 3.2.0
+
+        Fair share configuration section. More shares result in a
+        higher chance that jobs from that database get to run. The
+        default value is 100, the minimum is 1, and the maximum is
+        1000. The configuration may be set even if the database does
+        not exist::
+
+            [replicator.shares]
+            _replicator_db = 100
+            $another/_replicator_db = 100
diff --git a/src/docs/src/replication/replicator.rst b/src/docs/src/replication/replicator.rst
index de5393074f6..05a55e6005a 100644
--- a/src/docs/src/replication/replicator.rst
+++ b/src/docs/src/replication/replicator.rst
@@ -21,6 +21,11 @@ Replicator Database
    anymore. There are new replication job states and new API endpoints
    ``_scheduler/jobs`` and ``_scheduler/docs``.
 
+.. versionchanged:: 3.2.0 Fair share scheduling was introduced. Multiple
+   ``_replicator`` databases get an equal chance (configurable) of running
+   their jobs. Previously, replication jobs were scheduled without regard
+   to their originating database.
+
 The ``_replicator`` database works like any other in CouchDB, but
 documents added to it will trigger replications. Create (``PUT`` or
 ``POST``) a document to start replication. ``DELETE`` a replication
@@ -539,6 +544,46 @@ After this operation, replication pulling from server X will be stopped
 and the replications in the ``_replicator`` database (pulling from
 servers A and B) will continue.
 
+Fair Share Job Scheduling
+=========================
+
+When multiple ``_replicator`` databases are used, and the total number
+of jobs on any node is greater than ``max_jobs``, replication jobs
+will be scheduled such that each of the ``_replicator`` databases, by
+default, gets an equal chance of running its jobs.
+
+This is accomplished by assigning a number of "shares" to each
+``_replicator`` database and then automatically adjusting the
+proportion of running jobs to match each database's proportion of
+shares. By default, each ``_replicator`` database is assigned 100
+shares. It is possible to alter the share assignments for each
+individual ``_replicator`` database in the :ref:`[replicator.shares]
+<config/replicator.shares>` configuration section.
+
+The fair share behavior is perhaps easiest to describe with a set of
+examples. Each example assumes the default of ``max_jobs = 500``, and
+two replicator databases: ``_replicator`` and ``another/_replicator``.
+
+Example 1: If ``_replicator`` has 1000 jobs and
+``another/_replicator`` has 10, the scheduler will run about 490 jobs
+from ``_replicator`` and 10 jobs from ``another/_replicator``.
+
+Example 2: If ``_replicator`` has 200 jobs and ``another/_replicator``
+also has 200 jobs, all 400 jobs will get to run, as the sum of all the
+jobs is less than the ``max_jobs`` limit.
+
+Example 3: If both replicator databases have 1000 jobs each, the
+scheduler will run about 250 jobs from each database on average.
+
+Example 4: If both replicator databases have 1000 jobs each, but
+``_replicator`` was assigned 400 shares, then on average the scheduler
+would run about 400 jobs from ``_replicator`` and 100 jobs from
+``another/_replicator``.
+
+The proportions described in the examples are approximate, might
+oscillate a bit, and might take anywhere from tens of minutes to an
+hour to converge.
+
 Replicating the replicator database
 ===================================
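
The steady-state proportions described in the four examples above can be
reproduced with a short sketch. This is an illustration only, not the actual
CouchDB scheduler (which converges gradually over many cycles by decaying
usage and priorities); the ``allocate_jobs`` function and its signature are
hypothetical.

```python
def allocate_jobs(max_jobs, dbs):
    """Approximate steady-state fair share allocation.

    dbs maps a _replicator db name to (shares, pending_jobs);
    returns a map of db name to the number of jobs that get to run.
    """
    total_pending = sum(pending for _, pending in dbs.values())
    if total_pending <= max_jobs:
        # Under the limit, every pending job gets to run (Example 2).
        return {db: pending for db, (_, pending) in dbs.items()}
    running = {db: 0 for db in dbs}
    capacity = max_jobs
    while capacity > 0:
        # Databases that still have unscheduled jobs.
        active = [db for db in dbs if running[db] < dbs[db][1]]
        if not active:
            break
        total_shares = sum(dbs[db][0] for db in active)
        allocated = 0
        for db in active:
            shares, pending = dbs[db]
            # Share-proportional slice of the remaining capacity, capped
            # by how many jobs this database actually has left.
            fair = capacity * shares // total_shares
            take = min(fair, pending - running[db])
            running[db] += take
            allocated += take
        if allocated == 0:
            # Integer division left crumbs; hand them out one at a time.
            for db in active[:capacity]:
                running[db] += 1
                allocated += 1
        capacity -= allocated
    return running

# Example 1: about a 490 / 10 split.
print(allocate_jobs(500, {"_replicator": (100, 1000),
                          "another/_replicator": (100, 10)}))
# Example 4: 400 shares vs 100 shares gives about a 400 / 100 split.
print(allocate_jobs(500, {"_replicator": (400, 1000),
                          "another/_replicator": (100, 1000)}))
```

Databases with fewer jobs than their fair share (Example 1) free up capacity
that is redistributed, share-proportionally, among the remaining databases,
which is why ``_replicator`` ends up with roughly 490 jobs rather than 250.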