finalize discovery mode
julien6387 committed Aug 14, 2023
1 parent a2f8c99 commit 1880dfe
Showing 26 changed files with 924 additions and 661 deletions.
4 changes: 2 additions & 2 deletions CHANGES.md
@@ -20,7 +20,7 @@
before they are published, so that it remains functional despite a network failure.
The internal TCP sockets are rebound when a network interface becomes up (requires `psutil`).

* Provide a discovery mode where the **Supvisors** instances are established on-the-fly without declaring them in
* Provide a discovery mode where the **Supvisors** instances are added on-the-fly without declaring them in
the `supvisors_list` option. The function relies on a Multicast Group definition (options `multicast_group`,
`multicast_interface` and `multicast_ttl` added to that purpose).
The attribute `discovery_mode` is added to the `get_state` and `get_instance_info` XML-RPCs.
@@ -63,7 +63,7 @@

* Do not catch XmlRpc exceptions in the JAVA client.

* Refactoring of the **Supvisors** TCP Publish-Subscribe.
* Refactoring of the **Supvisors** internal communications.


## 0.16 (2023-03-12)
21 changes: 10 additions & 11 deletions docs/configuration.rst
@@ -110,10 +110,9 @@ behavior may happen. The present section details where it is applicable.

``multicast_group``

The IP address and port number of the Multicast Group where the |Supvisors| instances will share information
The IP address and port number of the Multicast Group where the |Supvisors| instances will share their identity
between them, separated by a colon (example: ``239.0.0.1:1234``).
This is an alternative to the ``supvisors_list`` option, replacing the internal TCP Publish / Subscribe,
and it allows |Supvisors| to work in a discovery mode *(Not stable yet)*.
This is an alternative to the ``supvisors_list`` option, which allows |Supvisors| to work in a discovery mode.

*Default*: None.
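
As an illustration of the ``address:port`` format expected by ``multicast_group``, here is a minimal Python sketch that splits such a value and checks that the address belongs to the multicast range. The helper name ``parse_multicast_group`` is hypothetical and is not part of the |Supvisors| code base.

```python
import ipaddress
from typing import Tuple


def parse_multicast_group(value: str) -> Tuple[str, int]:
    """Split an 'address:port' string and validate both parts.

    Hypothetical helper for illustration only (not the Supvisors parser).
    """
    host, sep, port_text = value.rpartition(':')
    if not sep or not host:
        raise ValueError(f'expected <address>:<port>, got {value!r}')
    address = ipaddress.ip_address(host)
    # 224.0.0.0/4 (IPv4) and ff00::/8 (IPv6) are the multicast ranges
    if not address.is_multicast:
        raise ValueError(f'{host} is not a multicast address')
    port = int(port_text)
    if not 0 < port < 65536:
        raise ValueError(f'port out of range: {port}')
    return str(address), port
```

With the example value from the text, ``parse_multicast_group('239.0.0.1:1234')`` yields the address and the port as separate values, whereas a unicast address such as ``192.168.1.1`` is rejected.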

@@ -123,11 +122,9 @@ behavior may happen. The present section details where it is applicable.

.. hint::

Although it is an alternative, this option can still be combined with ``supvisors_list`` and ``core_identifiers``.
In this case, the multicast group is definitely used to exchange information.
The impact is on the ``INITIALIZATION`` state where the status of the declared |Supvisors| instances (as defined
in the ``supvisors_list`` option) will be evaluated before exiting this state and the phase could eventually be
ended when the ``core_identifiers`` are all in a known state before the ``synchro_timeout`` is reached.
Although it is an alternative, this option can still be combined with ``supvisors_list``.
In this case, the |Supvisors| instances declared in the ``supvisors_list`` option will form an initial group
that may grow when other unknown |Supvisors| instances declare themselves.

``multicast_interface``

@@ -219,8 +216,10 @@ behavior may happen. The present section details where it is applicable.

The conditions applied by |Supvisors| to exit the ``INITIALIZATION`` state.
Multiple values in [``LIST`` ; ``TIMEOUT`` ; ``CORE`` ; ``USER``], separated by commas.
If ``LIST`` is selected, |Supvisors| exits the ``INITIALIZATION`` state when all the |Supvisors| instances
declared in the ``supvisors_list`` option are no longer in the ``UNKNOWN`` state.
If ``STRICT`` is selected, |Supvisors| exits the ``INITIALIZATION`` state when all the |Supvisors| instances
declared in the ``supvisors_list`` option are in the ``RUNNING`` state.
If ``LIST`` is selected, |Supvisors| exits the ``INITIALIZATION`` state when all known |Supvisors| instances
(including those declared in the ``supvisors_list`` option **AND** those discovered) are in the ``RUNNING`` state.
If ``TIMEOUT`` is selected, |Supvisors| exits the ``INITIALIZATION`` state after the duration defined in the
``synchro_timeout`` option.
If ``CORE`` is selected, |Supvisors| exits the ``INITIALIZATION`` state when all the |Supvisors| instances
@@ -230,7 +229,7 @@ behavior may happen. The present section details where it is applicable.
command of ``supervisorctl``).
The use of this option is more detailed in :ref:`synchronizing`.

*Default*: ``LIST,TIMEOUT,CORE``.
*Default*: ``STRICT,TIMEOUT,CORE``.

*Required*: No.

166 changes: 122 additions & 44 deletions docs/special.rst
@@ -13,19 +13,53 @@ To that end, a communication protocol needs to be put in place between all
Given the objectives of |Supvisors|, a polling mechanism doesn't fit. All |Supervisor| events have to be processed, so
an event-driven protocol is naturally considered.

The XML-RPC protocol provided by |Supervisor| is discarded as it is synchronous and improper to deal with a system
The XML-RPC protocol provided by |Supervisor| is discarded as it is synchronous and thus improper to deal with a system
involving multiple clients.

Communication protocols
~~~~~~~~~~~~~~~~~~~~~~~

2 internal communication protocols have been implemented in |Supvisors|.

TCP Publish-Subscribe
* in standard mode (TCP):
- ``internal_port`` ;
*********************

The main protocol implemented in |Supvisors| is based on a **Publish-Subscribe pattern over TCP**.

Although it was originally based on a PyZmq PUB-SUB, it has been replaced by a custom implementation to limit the
mandatory dependencies and to have a better control over the underlying threads and sockets.

This protocol is initially made up of all |Supvisors| instances declared in the ``supvisors_list`` option of the
``[supvisors]`` section in the |Supervisor| configuration file.

Each entry in the ``supvisors_list`` option defines (even implicitly) the TCP server host and port of each |Supvisors|
instance that the local |Supvisors| instance has to connect to publish its events.

.. note::

Depending on the |Supvisors| configuration, only the TCP server host may be defined in the ``supvisors_list``,
in which case |Supvisors| will take the value of the ``internal_port`` option as the applicable port for all TCP servers.
UDP Multicast
* in discovery mode (UDP Multicast) :
- ``multicast_group``
*************
The second protocol implemented in |Supvisors| is based on **UDP Multicast**. It relies on the following options
in the ``[supvisors]`` section in the |Supervisor| configuration file:

* ``multicast_group`` ;
* ``multicast_interface`` ;
* ``multicast_ttl``.

With this protocol, the |Supvisors| instances could be unknown at start-up and will be discovered on-the-fly.
The UDP Multicast group is used to exchange ticks. Upon reception of a tick coming from an unknown |Supvisors| instance,
the local |Supvisors| instance adds the remote |Supvisors| instance into its internal model and opens the TCP
connections with it.
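
The discovery step described above can be sketched in Python as follows. This is only a simplified model under stated assumptions: the class ``DiscoveryModel``, the JSON payload and its ``identifier`` field are hypothetical, and the TCP connections are represented by a plain list rather than real sockets.

```python
import json
from typing import Iterable, List, Set


class DiscoveryModel:
    """Toy model of the instances known to the local Supvisors instance."""

    def __init__(self, declared: Iterable[str]):
        # instances declared in supvisors_list form the initial group
        self.known: Set[str] = set(declared)
        # stands in for the TCP connections opened towards discovered instances
        self.connections: List[str] = []

    def on_multicast_tick(self, payload: bytes) -> bool:
        """Process a tick received on the multicast group.

        Returns True if the tick revealed a previously unknown instance.
        The JSON payload layout is an assumption made for this sketch.
        """
        identifier = json.loads(payload.decode('utf-8'))['identifier']
        if identifier in self.known:
            return False
        self.known.add(identifier)
        self.connections.append(identifier)
        return True
```

A tick coming from an instance that is already known changes nothing, so repeated ticks from the same remote instance do not open additional connections.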

.. note::

Although it has been considered at some point, the idea of having |Supvisors| working only in UDP Multicast,
without the TCP Publish / Subscribe, has been discarded. |Supvisors| cannot afford to lose events or to receive them
in an inappropriate sequence.

Principles of Synchronization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -40,73 +74,114 @@ used for synchronizing multiple instances of |Supervisor|:
* ``synchro_options`` ;
* ``synchro_timeout`` ;
* ``core_identifiers`` ;
* ``auto_fence`` .
* ``auto_fence``.


Once started, all |Supvisors| instances publish the events received, especially the ``TICK`` events that are
triggered every 5 seconds, on their *Publish* socket bound to the ``internal_port``.
Common part
***********

On the other side, all |Supvisors| instances start a thread that subscribes to the internal events
through an internal *Subscribe* socket connected to the ``internal_port`` of all |Supvisors| instances
of the ``supvisors_list``.
Once started, all |Supvisors| instances publish the events received from |Supervisor|, especially the ``TICK`` events
that are triggered every 5 seconds.

At the beginning, all |Supvisors| instances are declared in an ``UNKNOWN`` state.
When the first ``TICK`` event is received from a remote |Supvisors| instance, a hand-shake is performed
between the 2 |Supvisors| instances. The local |Supvisors| instance:

* sets the remote |Supvisors| instance state to ``CHECKING`` ;
* performs a couple of XML-RPC to the remote |Supvisors| instance:

+ ``supvisors.get_master_identifier()`` and ``supvisors.get_supvisors_state()`` in order to know if the remote
instance is already in an established state ;
+ ``supvisors.get_instance_info(local_identifier)`` in order to know how the local |Supvisors| instance is
perceived by the remote |Supvisors| instance.
* performs a ``supvisors.get_instance_info(local_identifier)`` XML-RPC to the remote |Supvisors| instance
in order to know how the local |Supvisors| instance is perceived by the remote |Supvisors| instance.

At this stage, 2 possibilities:

* the local |Supvisors| instance is seen as ``ISOLATED`` by the remote instance:

+ the remote |Supvisors| instance is then reciprocally set to ``ISOLATED`` ;
+ the *URL* of the remote |Supvisors| instance is disconnected from the *Subscribe* socket ;
+ the remote |Supvisors| instance status is then reciprocally set to ``ISOLATED`` ;

* the local |Supvisors| instance is NOT seen as ``ISOLATED`` by the remote instance:

+ a ``supervisor.getAllProcessInfo()`` XML-RPC is requested to the remote instance ;
+ the processes information is loaded into the internal data structure ;
+ the remote |Supvisors| instance is finally set to ``RUNNING``.

When all |Supvisors| instances are identified as ``RUNNING`` or ``ISOLATED``, the synchronization is completed.
|Supvisors| then is able to work with the set (or subset) of |Supvisors| instances declared in ``supvisors_list``.
+ the remote |Supvisors| instance status is set to ``CHECKED``, then ``RUNNING``.
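
The hand-shake above can be read as a small state machine. The following Python sketch lists the successive states taken by the remote instance; the XML-RPC calls are replaced by a plain argument and a callable, and the function name ``handshake`` is hypothetical.

```python
from enum import Enum
from typing import Callable, List


class InstanceState(Enum):
    UNKNOWN = 'UNKNOWN'
    CHECKING = 'CHECKING'
    CHECKED = 'CHECKED'
    RUNNING = 'RUNNING'
    SILENT = 'SILENT'
    ISOLATED = 'ISOLATED'


def handshake(local_state_seen_by_remote: InstanceState,
              load_processes: Callable[[], None]) -> List[InstanceState]:
    """Return the successive states of the remote instance after its first TICK.

    local_state_seen_by_remote stands in for the result of the
    supvisors.get_instance_info(local_identifier) XML-RPC, and load_processes
    stands in for supervisor.getAllProcessInfo() followed by the model update.
    """
    states = [InstanceState.CHECKING]
    if local_state_seen_by_remote is InstanceState.ISOLATED:
        # the remote instance isolates us, so it is isolated reciprocally
        states.append(InstanceState.ISOLATED)
    else:
        load_processes()
        states.extend([InstanceState.CHECKED, InstanceState.RUNNING])
    return states
```

Passing the remote calls as arguments keeps the sketch runnable without a live |Supervisor|; a real implementation would issue the XML-RPCs at those two points.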

However, it may happen that some |Supvisors| instances do not publish (very late starting, no starting at all,
system down, network down, etc). Each |Supvisors| instance waits for ``synchro_timeout`` seconds to give a chance
to all other instances to publish. When this delay is exceeded, all the |Supvisors| instances that are **not**
identified as ``RUNNING`` or ``ISOLATED`` are set to:

* ``SILENT`` if `Auto-Fencing`_ is **not** activated ;
* ``ISOLATED`` if `Auto-Fencing`_ is activated.

Another possibility is when it is predictable that some |Supvisors| instances may be started later.
For example, the pool of nodes may include servers that will always be started from the very beginning and consoles
that may be started only on demand.
In this case, it would be a pity to always wait for ``synchro_timeout`` seconds. That's why the ``core_identifiers``
attribute has been introduced so that the synchronization phase is considered completed
when a subset of the |Supvisors| instances declared in ``supvisors_list`` are ``RUNNING``.
What happens next will depend on the conditions selected in the ``synchro_options`` option.

Whatever the number of available |Supvisors| instances, |Supvisors| elects a *Master* among the active |Supvisors|
instances and enters the ``DEPLOYMENT`` state to automatically start the applications.

By default, the |Supvisors| *Master* instance is the |Supvisors| instance having the smallest deduced name among all
the active |Supvisors| instances, unless the attribute ``core_identifiers`` is used. In the latter case, candidates
are taken from this list in priority.
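
A minimal Python sketch of this default election rule is given below. The function name ``elect_master`` is hypothetical, and picking the smallest name among the core candidates is an assumption: the text only states that candidates are taken from ``core_identifiers`` in priority.

```python
from typing import Iterable


def elect_master(active: Iterable[str], core_identifiers: Iterable[str] = ()) -> str:
    """Pick a Master among the active instances.

    Core identifiers that are active are preferred; otherwise any active
    instance is a candidate. The smallest name wins (plain string ordering).
    """
    active_set = set(active)
    if not active_set:
        raise ValueError('no active Supvisors instance')
    core_candidates = active_set & set(core_identifiers)
    return min(core_candidates or active_set)
```

When none of the core identifiers is active, the sketch falls back to the whole set of active instances, which matches the "in priority" wording rather than a strict restriction.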

.. important:: *About late Supvisors instances*

Back to this case, here is what happens when a |Supvisors| instance is started while the others are already in
``OPERATION``.
Here is what happens when a |Supvisors| instance is started while the others are already in ``OPERATION``.
During the hand-shake, the local |Supvisors| instance gets the *Master* identified by the remote |Supvisors|.
That confirms that the local |Supvisors| instance is a late starter and thus the local |Supvisors| instance adopts
this *Master* too and skips the synchronization phase.


``STRICT`` option
*****************

When the ``STRICT`` option is selected, the synchronization is complete when all |Supvisors| instances declared
in the ``supvisors_list`` option are marked as ``RUNNING``.
This excludes any |Supvisors| instance that has been added to |Supvisors| in discovery mode.

This option prevails over the ``LIST`` and ``USER`` options if combined with them.

``LIST`` option
***************

When the ``LIST`` option is selected, the synchronization is complete when all known |Supvisors| instances are marked
as ``RUNNING``.
This includes the |Supvisors| instances declared in the ``supvisors_list`` option **AND** the |Supvisors| instances
that have been added to |Supvisors| in discovery mode.

This option prevails over the ``USER`` option if combined with it.

``TIMEOUT`` option
******************

It may happen that some declared |Supvisors| instances do not publish (very late starting, no starting at all,
system down, network down, etc).

When the ``TIMEOUT`` option is selected, each |Supvisors| instance waits for ``synchro_timeout`` seconds
to give a chance to all other instances to publish. When this delay is exceeded, all the |Supvisors| instances
that are **not** identified as ``RUNNING`` or ``ISOLATED`` are set to:

* ``SILENT`` if `Auto-Fencing`_ is **not** activated ;
* ``ISOLATED`` if `Auto-Fencing`_ is activated.

This option prevails over all other ``synchro_options`` options if combined with them.
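
The effect of the timeout on the instance states can be sketched in Python as follows. The function name ``apply_timeout`` and the use of plain strings for the states are choices made for this illustration only.

```python
from typing import Dict


def apply_timeout(states: Dict[str, str], auto_fence: bool) -> Dict[str, str]:
    """Return the instance states once synchro_timeout has elapsed.

    Instances already RUNNING or ISOLATED are left untouched; all the others
    become ISOLATED when auto-fencing is activated, SILENT otherwise.
    """
    fallback = 'ISOLATED' if auto_fence else 'SILENT'
    return {identifier: state if state in ('RUNNING', 'ISOLATED') else fallback
            for identifier, state in states.items()}
```

The function returns a new dictionary rather than mutating its input, so the states before and after the timeout can be compared.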

``CORE`` option
***************

Another possibility is when it is predictable that some |Supvisors| instances may be started later.
For example, the pool of nodes may include servers that will always be started from the very beginning and consoles
that may be started only on demand.

In this case, it would be a pity to always wait for ``synchro_timeout`` seconds.
That's why the ``core_identifiers`` attribute has been introduced so that the synchronization phase is considered
completed when a subset of the |Supvisors| instances declared in ``supvisors_list`` are ``RUNNING``.

This option prevails over the ``LIST`` and ``USER`` options if combined with them.

``USER`` option
***************

This option is useful in a context where |Supvisors| is running in a system made up of many nodes that may be started
on a random basis and where core |Supvisors| instances cannot be easily identified.

When the ``USER`` option is selected, it allows the user to put an end to the synchronization phase when the set
of running |Supvisors| instances is suitable to the user.

This action can be performed through the |Supvisors| ``end_sync`` XML-RPC (via code, ``supervisorctl`` or
the |Supvisors| Web UI).
This XML-RPC has an optional parameter that allows the user to select the |Supvisors| *Master* instance. If not set,
the default election mechanism applies.
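
To summarize the five conditions, the Python sketch below evaluates each selected ``synchro_options`` value against a snapshot of the instance states. It is a simplified model, not the |Supvisors| implementation: the names ``SyncContext`` and ``satisfied_conditions`` are hypothetical, the default timeout value is arbitrary, and the precedence rules between options are deliberately not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class SyncContext:
    declared: List[str]                    # instances from supvisors_list
    states: Dict[str, str]                 # every known instance, declared or discovered
    core_identifiers: List[str] = field(default_factory=list)
    elapsed: float = 0.0                   # seconds spent in INITIALIZATION
    synchro_timeout: float = 15.0          # arbitrary value for this sketch
    end_sync_requested: bool = False       # set when the end_sync XML-RPC is called


def _all_running(identifiers: Iterable[str], states: Dict[str, str]) -> bool:
    """True if the list is not empty and every listed instance is RUNNING."""
    identifiers = list(identifiers)
    return bool(identifiers) and all(states.get(i) == 'RUNNING' for i in identifiers)


def satisfied_conditions(options: Set[str], ctx: SyncContext) -> Set[str]:
    """Return the subset of the selected options whose condition is met."""
    checks = {
        'STRICT': lambda: _all_running(ctx.declared, ctx.states),
        'LIST': lambda: _all_running(ctx.states, ctx.states),
        'TIMEOUT': lambda: ctx.elapsed >= ctx.synchro_timeout,
        'CORE': lambda: _all_running(ctx.core_identifiers, ctx.states),
        'USER': lambda: ctx.end_sync_requested,
    }
    return {name for name in options if checks[name]()}
```

The difference between ``STRICT`` and ``LIST`` shows up as soon as a discovered instance is still being checked: ``STRICT`` only looks at the declared instances, whereas ``LIST`` waits for every known instance.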


.. _auto_fencing:

Auto-Fencing
@@ -196,11 +271,12 @@ The following rules are applicable whatever the chosen strategy:

* the process must not be already in a *running* state in a broad sense, i.e. ``RUNNING``, ``STARTING``
or ``BACKOFF`` ;
* the process must be known to the |Supervisor| of the |Supvisors| instance ;
* the |Supvisors| instance must be ``RUNNING`` ;
* the |Supvisors| instance must be allowed in the ``identifiers`` rule of the process ;
* the *load* of the node where multiple |Supvisors| instances may be running must not exceed 100% when adding
the ``expected_loading`` of the program to be started.
* the process must be known to the |Supervisor| of the targeted |Supvisors| instance ;
* the related program must be enabled in the targeted |Supvisors| instance ;
* the targeted |Supvisors| instance must be ``RUNNING`` ;
* the targeted |Supvisors| instance must be allowed in the ``identifiers`` rule of the process ;
* the *load* of the targeted node where multiple |Supvisors| instances may be running must not exceed 100%
when adding the ``expected_loading`` of the program to be started.

The *load* of a |Supvisors| instance is defined as the sum of the ``expected_loading`` of each process running in this
|Supvisors| instance.
@@ -210,6 +286,8 @@ The *load* of a node is defined as the sum of the loads of the |Supvisors| insta
When applying the ``CONFIG`` strategy, |Supvisors| chooses the first |Supvisors| instance available in the
``supvisors_list``.

TODO: discovery

When applying the ``LESS_LOADED`` strategy, |Supvisors| chooses the |Supvisors| instance in the ``supvisors_list``
having the lowest *load*.
The aim is to distribute the process load among the available |Supvisors| instances.
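
The load definitions and the ``CONFIG`` and ``LESS_LOADED`` strategies can be sketched in Python as follows. Only two of the eligibility rules are modeled here (the instance must be ``RUNNING`` and the node load must not exceed 100%); the ``Instance`` class and the function names are hypothetical, and the list order is assumed to follow ``supvisors_list``.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Instance:
    identifier: str
    node: str      # several Supvisors instances may share the same node
    state: str
    load: int      # sum of expected_loading of the processes running in it


def node_loads(instances: List[Instance]) -> Dict[str, int]:
    """The load of a node is the sum of the loads of its instances."""
    loads: Dict[str, int] = {}
    for instance in instances:
        loads[instance.node] = loads.get(instance.node, 0) + instance.load
    return loads


def eligible(instances: List[Instance], expected_loading: int) -> List[Instance]:
    """Keep RUNNING instances whose node can absorb the extra load."""
    loads = node_loads(instances)
    return [i for i in instances
            if i.state == 'RUNNING' and loads[i.node] + expected_loading <= 100]


def choose_config(instances: List[Instance], expected_loading: int) -> Optional[str]:
    """CONFIG: first eligible instance in configuration order."""
    candidates = eligible(instances, expected_loading)
    return candidates[0].identifier if candidates else None


def choose_less_loaded(instances: List[Instance], expected_loading: int) -> Optional[str]:
    """LESS_LOADED: eligible instance having the lowest load."""
    candidates = eligible(instances, expected_loading)
    return min(candidates, key=lambda i: i.load).identifier if candidates else None
```

Both functions return ``None`` when no instance qualifies, which leaves the caller free to decide how to report a process that cannot be started.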
