diff --git a/CHANGES.md b/CHANGES.md index d6d31855..de618325 100644 --- a/CHANGES.md +++ b/CHANGES.md @@ -1,6 +1,6 @@ # Change Log -## 0.11 (2022-01-xx) +## 0.11 (2022-01-02) * Fixed [Issue #99](https://github.com/julien6387/supvisors/issues/99). Update the **Supvisors** design so that it can be used to supervise multiple Supervisor instances on multiple nodes. @@ -23,8 +23,10 @@ configuration file. This option accepts a more complex definition: `host_name:http_port:internal_port`. Note that the simple `host_name` is still supported in the event where **Supvisors** doesn't have to deal with multiple Supervisor instances on the same node. + - The `core_identifiers` option has been added to replace `force_synchro_if` in the **Supvisors** section of the + Supervisor configuration file. It targets the names deduced from the `supvisors_list` option. - The `identifiers` option has been added to replace the `addresses` option in the **Supvisors** rules file. - This option targets the `identifier` elements of the `supvisors_list` option (or the `host_name`, as previously). + This option targets the names deduced from the `supvisors_list` option. - The `address`-like attributes, XML-RPCs and options are deprecated and will be removed in the next version. * Fixed [Issue #98](https://github.com/julien6387/supvisors/issues/98). @@ -84,12 +86,8 @@ * Add class "action" to Web UI buttons that trigger an XML-RPC. -* Apply python f-strings. - * Switch from Travis-CI to GitHub Actions for continuous integration. -* Update documentation. - ## 0.10 (2021-09-05) @@ -98,8 +96,6 @@ * Add targets **Python 3.7** and **Python 3.8** to Travis-CI. -* Update documentation. - ## 0.9 (2021-08-31) @@ -124,7 +120,7 @@ * When ``stop_sequence`` is not set in the rules files, it is defaulted to the ``start_sequence`` value. With the new stop sequence logic, the stop sequence is by default exactly the opposite of the start sequence. -* Fixed Nodes column width for `supervisorctl application_rules`. 
+* Fixed Nodes' column width for `supervisorctl application_rules`. * `CHANGES.rst` replaced with `CHANGES.md`. @@ -132,8 +128,6 @@ * A 'Gathering' configuration has been added to the **Supvisors** use cases. It combines all uses cases. -* Update documentation. - ## 0.8 (2021-08-22) @@ -188,12 +182,10 @@ * 'Scenario 2' has been added to the **Supvisors** use cases. -* A script `breed.py` has been added to the install package. +* A script `breed.py` has been added to the installation package. It can be used to duplicate the applications based on a template configuration and more particularly used to prepare the Scenario 2 of the **Supvisors** use cases. -* Update documentation. - ## 0.7 (2021-08-15) @@ -240,8 +232,6 @@ * Start adding use cases to documentation, inspired by real examples. 'Scenario 1' has been added. -* Documentation updated. - ## 0.6 (2021-08-01) @@ -284,12 +274,10 @@ * Include this Change Log to documentation. -* Documentation updated. - ## 0.5 (2021-03-01) -* New option `force_synchro_if` to force the end of the synchronization phase when a subset of nodes are active. +* New option `force_synchro_if` to force the end of the synchronization phase when a subset of nodes is active. * New starting strategy `LOCAL` added to command the starting of an application on the local node only. @@ -319,8 +307,6 @@ * Logs (especially `debug` and `trace`) updated to remove printed objects. -* Documentation updated. - ## 0.4 (2021-02-14) @@ -332,8 +318,6 @@ * Fixed exception when rules files is not provided. -* Documentation updated. - ## 0.3 (2020-12-29) @@ -349,8 +333,6 @@ * 100% coverage reached in unit tests. -* Documentation updated. - ## 0.2 (2020-12-14) @@ -381,8 +363,6 @@ * Docs target added to Travis-CI. -* Documentation formatting issues fixed. 
- ## 0.1 (2017-08-11) diff --git a/README.md b/README.md index c2205b75..88441f1e 100644 --- a/README.md +++ b/README.md @@ -5,7 +5,7 @@ **Supvisors** is a Control System for Distributed Applications, based on -multiple instances of Supervisor. +multiple instances of Supervisor running over multiple nodes. The main features are: * a new web-based dashboard that replaces the default dashboard of Supervisor, @@ -47,13 +47,13 @@ but is not maintained anymore. **Supvisors** has dependencies on: -Package | Release | Optional -----------------------------------------------------|------------|--------- -[Supervisor](http://supervisord.org) | 4.2.1 | -[PyZMQ](http://pyzmq.readthedocs.io) | 20.0.0 | -[psutil](https://pypi.python.org/pypi/psutil) | 5.7.3 | X -[matplotlib](http://matplotlib.org) | 3.3.3 | X -[lxml](http://lxml.de) | 4.6.2 | X +| Package | Release | Optional | +|-----------------------------------------------|---------|----------| +| [Supervisor](http://supervisord.org) | 4.2.4 | | +| [PyZMQ](http://pyzmq.readthedocs.io) | 20.0.0 | | +| [psutil](https://pypi.python.org/pypi/psutil) | 5.7.3 | X | +| [matplotlib](http://matplotlib.org) | 3.3.3 | X | +| [lxml](http://lxml.de) | 4.6.2 | X | Please note that some of these dependencies may have their own dependencies. diff --git a/docs/configuration.rst b/docs/configuration.rst index e00f844a..4c0449c9 100644 --- a/docs/configuration.rst +++ b/docs/configuration.rst @@ -9,6 +9,20 @@ Supervisor's Configuration File This section explains how |Supvisors| uses and complements the `Supervisor configuration <http://supervisord.org/configuration.html>`_. +As written in the introduction, all |Supervisor| instances **MUST** be configured with an internet socket. +``username`` and ``password`` can be used, provided that the same values are used for all |Supervisor| instances. + +..
code-block:: ini + + [inet_http_server] + port=:60000 + ;username=lecleach + ;password=p@$$w0rd + +Apart from the ``rpcinterface`` and ``ctlplugin`` sections related to |Supvisors|, all |Supervisor| instances can have +a completely different configuration, including the list of programs. + + .. _supvisors_section: rpcinterface extension point supervisor.rpcinterface_factory = supvisors.plugin:make_supvisors_rpcinterface The parameters of |Supvisors| are set in this section of the |Supervisor| configuration file. -It is expected that all |Supvisors| *instances* use the same configuration (excluding included files and logger -parameters) or it may lead to unpredictable behavior. - -``address_list`` - - The list of node names where |Supvisors| will be running, separated by commas. +It is expected that some parameters are strictly identical for all |Supvisors| instances, otherwise unpredictable +behavior may happen. The present section details the parameters concerned. + +``supvisors_list`` + + The exhaustive list of |Supvisors| instances to handle, separated by commas. + Each element should match the following format: ``<identifier>host_name:http_port:internal_port``, + where ``identifier`` is the optional but **unique** |Supervisor| identifier (it can be set in the |Supervisor| + configuration or in the command line when starting the ``supervisord`` program); + ``host_name`` is the name of the node where the |Supvisors| instance is running; + ``http_port`` is the port of the internet socket used to communicate with the |Supervisor| instance (obviously + unique per node); + ``internal_port`` is the port of the socket used by the |Supvisors| instance to publish internal events (also + unique per node). + The value of ``supvisors_list`` defines how the |Supvisors| instances will share information between them and must + be identical for all |Supvisors| instances, otherwise unpredictable behavior may happen. *Default*: the local host name.
*Required*: No. + *Identical*: Yes. + + .. note:: + + Actually, only the ``host_name`` is strictly required. + + If ``http_port`` or ``internal_port`` is not provided, the local |Supvisors| instance assumes + that the other |Supvisors| instance uses the same ``http_port`` and ``internal_port``. + In this case, the outcome is that there cannot be two |Supvisors| instances on the same node. + + ``identifier`` can be seen as a nickname that may be more user-friendly than a ``host_name`` or a + ``host_name:http_port`` when displayed in the |Supvisors| Web UI or used in the `Supvisors' Rules File`_. + + .. important:: *About the deduced names* + + Depending on the value chosen, the *deduced name* of the |Supvisors| instance may vary. As this name is expected + to be used in the rules files to define where the processes can be started, it is important to understand how + it is built. + + As a general rule, ``identifier`` takes precedence as a deduced name when set. Otherwise ``host_name`` is + used when set alone, unless an ``http_port`` is explicitly defined, in which case ``host_name:http_port`` + will be used. + A few examples: + + +------------------------------------+---------------------+ + | Configured name | Deduced name | + +====================================+=====================+ + | ``<supervisor_01>10.0.0.1:8888:`` | ``supervisor_01`` | + +------------------------------------+---------------------+ + | ``<supervisor_01>10.0.0.1`` | ``supervisor_01`` | + +------------------------------------+---------------------+ + | ``10.0.0.1`` | ``10.0.0.1`` | + +------------------------------------+---------------------+ + | ``10.0.0.1:8888:8889`` | ``10.0.0.1:8888`` | + +------------------------------------+---------------------+ + + In case of doubt, the |Supvisors| Web UI displays the deduced names in the Supervisors navigation menu. + The names can also be found at the beginning of the |Supvisors| log traces. + + The recommendation is to uniformly use the |Supervisor| identifier.
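To make the deduced-name precedence concrete, here is a minimal, illustrative sketch in Python. It is not Supvisors code — just the rules above (explicit identifier first, then ``host_name:http_port``, then ``host_name`` alone) expressed as a function, assuming the optional identifier is written between angle brackets:

```python
def deduce_name(entry: str) -> str:
    """Sketch of the deduced-name rules for a supvisors_list entry
    (illustrative only, not the actual Supvisors implementation)."""
    identifier = ''
    # the optional identifier prefixes the entry, e.g. '<supervisor_01>10.0.0.1:8888:'
    if entry.startswith('<'):
        identifier, _, entry = entry[1:].partition('>')
    host_name, _, ports = entry.partition(':')
    http_port = ports.split(':')[0] if ports else ''
    if identifier:
        # an explicit identifier always takes precedence
        return identifier
    if http_port:
        # host name with an explicit HTTP port
        return f'{host_name}:{http_port}'
    return host_name

print(deduce_name('<supervisor_01>10.0.0.1:8888:'))  # supervisor_01
print(deduce_name('10.0.0.1'))                       # 10.0.0.1
print(deduce_name('10.0.0.1:8888:8889'))             # 10.0.0.1:8888
```

Applying such a function to each element of ``supvisors_list`` is a quick sanity check that the names used in a rules file are really the deduced names.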
+ .. attention:: - The node names are expected to be known to every nodes in the list. + The host names are expected to be known to every node in the list. If it's not the case, check the network configuration. .. hint:: - If the |psutil| module is installed, it is possible to use IP addresses in addition to node names. + If the |psutil| module is installed, IP addresses can be used in place of host names. - Like the node names, the IP addresses are expected to be known to every nodes in the list. + Like the host names, the IP addresses are expected to be known to every node in the list. If it's not the case, check the network configuration. - Choosing an IP address may change the network interface used by |Supvisors| to share information. +``address_list`` + *DEPRECATED* Please use ``supvisors_list``. This parameter will be removed in the next |Supvisors| version. ``rules_files`` - A list of paths to XML rules files, in the same format as the one used for + A space-separated sequence of file globs, in the same vein as `supervisord include section <http://supervisord.org/configuration.html#include-section-settings>`_. - Their content is described in `Supvisors' Rules File`_. + Instead of ``ini`` files, XML rules files are expected here. Their content is described in `Supvisors' Rules File`_. + It is highly recommended that this parameter be identical for all |Supvisors| instances, otherwise the startup + sequence would differ depending on which |Supvisors| instance is the *Master*. *Default*: None. *Required*: No. -``rules_file`` - - **Obsolete. Will be removed in next version. Please use ``rules_files`` instead.** - The absolute or relative path of the XML rules file. The contents of this file is described in - `Supvisors' Rules File`_. + *Identical*: Yes. - *Default*: None. - - *Required*: No. +``rules_file`` + *DEPRECATED* Please use ``rules_files``. This parameter will be removed in the next |Supvisors| version. ``auto_fence`` - When true, |Supvisors| won't try to reconnect to a |Supvisors| instance that is inactive.
+ When true, |Supvisors| will permanently disconnect a |Supvisors| instance that is inactive. This functionality is detailed in :ref:`auto_fencing`. *Default*: ``false``. *Required*: No. + *Identical*: No. + ``internal_port`` - The internal port number used to publish local events to remote |Supvisors| instances. + The internal port number used to publish the local events to the other |Supvisors| instances. Events are published through a PyZMQ TCP socket. + The value must match the ``internal_port`` value of the corresponding |Supvisors| instance in ``supvisors_list``. - *Default*: ``65001``. + *Default*: local |Supervisor| HTTP port + 1. *Required*: No. + *Identical*: No. + ``event_port`` - The port number used to publish all |Supvisors| events (Address, Application and Process events). - Events are published through a PyZMQ TCP socket. The protocol of this interface is explained + The port number used to publish all |Supvisors| events (Instance, Application and Process events). + Events are published through a PyZMQ TCP socket. The protocol of this interface is detailed in :ref:`event_interface`. - *Default*: ``65002``. + *Default*: local |Supervisor| HTTP port + 2. *Required*: No. + *Identical*: No. + ``synchro_timeout`` - The time in seconds that |Supvisors| waits for all expected |Supvisors| instances to publish. + The time in seconds that |Supvisors| waits for all expected |Supvisors| instances to publish their TICK. Value in [``15`` ; ``1200``]. The use of this option is detailed in :ref:`synchronizing`. @@ -108,36 +177,54 @@ parameters) or it may lead to unpredictable behavior. *Required*: No. -``force_synchro_if`` + *Identical*: No. + +``core_identifiers`` - A subset of ``address_list``, separated by commas. If the nodes of this subset are all ``RUNNING``, this will put - an end to the synchronization phase in |Supvisors|. - If not set, |Supvisors| waits for all expected |Supvisors| instances to publish until ``synchro_timeout``.
+ A subset of the names deduced from ``supvisors_list``, separated by commas. If the |Supvisors| instances of this + subset are all in a ``RUNNING`` state, this will put an end to the synchronization phase in |Supvisors|. + When not set, |Supvisors| waits for all expected |Supvisors| instances to publish their TICK for up to + ``synchro_timeout`` seconds. + This parameter must be identical for all |Supvisors| instances, otherwise unpredictable behavior may happen. *Default*: None. *Required*: No. + *Identical*: Yes. + +``force_synchro_if`` + + *DEPRECATED* Please use ``core_identifiers``. This parameter will be removed in the next |Supvisors| version. + ``starting_strategy`` - The strategy used to start applications on nodes. + The strategy used to start applications on |Supvisors| instances. Possible values are in { ``CONFIG``, ``LESS_LOADED``, ``MOST_LOADED``, ``LOCAL`` }. The use of this option is detailed in :ref:`starting_strategy`. + It is highly recommended that this parameter be identical for all |Supvisors| instances, otherwise the startup + sequence would differ depending on which |Supvisors| instance is the *Master*. *Default*: ``CONFIG``. *Required*: No. + *Identical*: Yes. + ``conciliation_strategy`` The strategy used to solve conflicts upon detection that multiple instances of the same program are running. Possible values are in { ``SENICIDE``, ``INFANTICIDE``, ``USER``, ``STOP``, ``RESTART``, ``RUNNING_FAILURE`` }. The use of this option is detailed in :ref:`conciliation`. + It is highly recommended that this parameter be identical for all |Supvisors| instances, otherwise the conciliation + phase would behave differently depending on which |Supvisors| instance is the *Master*. *Default*: ``USER``. *Required*: No. + *Identical*: Yes. + ``stats_enabled`` By default, |Supvisors| can provide basic statistics on the node and the processes spawned by |Supervisor| @@ -148,6 +235,8 @@ parameters) or it may lead to unpredictable behavior. *Required*: No. + *Identical*: No.
+ ``stats_periods`` The list of periods for which the statistics will be provided in the |Supvisors| :ref:`dashboard`, separated by @@ -157,6 +246,8 @@ parameters) or it may lead to unpredictable behavior. *Required*: No. + *Identical*: No. + ``stats_histo`` The depth of the statistics history. Value in [``10`` ; ``1500``]. @@ -165,6 +256,8 @@ parameters) or it may lead to unpredictable behavior. *Required*: No. + *Identical*: No. + ``stats_irix_mode`` The way of presenting process CPU values. @@ -175,6 +268,8 @@ parameters) or it may lead to unpredictable behavior. *Required*: No. + *Identical*: No. + The logging options are strictly identical to |Supervisor|'s. By the way, it is the same logger that is used. These options are more detailed in `supervisord Section values `_. @@ -190,6 +285,8 @@ These options are more detailed in *Required*: No. + *Identical*: No. + ``logfile_maxbytes`` The maximum number of bytes that may be consumed by the |Supvisors| activity log file before it is rotated @@ -200,6 +297,8 @@ These options are more detailed in *Required*: No. + *Identical*: No. + ``logfile_backups`` The number of backups to keep around resulting from |Supvisors| activity log file rotation. @@ -209,6 +308,8 @@ These options are more detailed in *Required*: No. + *Identical*: No. + ``loglevel`` The logging level, dictating what is written to the |Supvisors| activity log. @@ -220,11 +321,13 @@ These options are more detailed in *Required*: No. + *Identical*: No. + ctlplugin extension point ~~~~~~~~~~~~~~~~~~~~~~~~~ -|Supvisors| extends also `supervisorctl `_. +|Supvisors| also extends `supervisorctl `_. This feature is not described in |Supervisor| documentation. .. 
code-block:: ini @@ -240,14 +343,13 @@ Configuration File Example [inet_http_server] port=:60000 + ;username=lecleach + ;password=p@$$w0rd [supervisord] logfile=./log/supervisord.log - logfile_backups=2 loglevel=info pidfile=/tmp/supervisord.pid - nodaemon=false - umask=002 [rpcinterface:supervisor] supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface @@ -256,26 +358,27 @@ Configuration File Example serverurl=http://localhost:60000 [include] - files = */*.ini + files = common/*/*.ini %(host_node_name)s/*.ini %(host_node_name)s/*/*.ini - # Supvisors dedicated part [rpcinterface:supvisors] supervisor.rpcinterface_factory = supvisors.plugin:make_supvisors_rpcinterface - address_list = cliche01,cliche03,cliche02,cliche04 - rules_file = ./etc/my_movies.xml + supvisors_list = cliche81,192.168.1.49,cliche83:60000:60001,cliche84 + rules_files = ./etc/my_movies*.xml auto_fence = false internal_port = 60001 event_port = 60002 synchro_timeout = 20 - starting_strategy = LESS_LOADED - conciliation_strategy = INFANTICIDE + core_identifiers = cliche81,cliche82 + starting_strategy = CONFIG + conciliation_strategy = USER stats_enabled = true stats_periods = 5,60,600 stats_histo = 100 + stats_irix_mode = false logfile = ./log/supvisors.log logfile_maxbytes = 50MB logfile_backups = 10 - loglevel = info + loglevel = debug [ctlplugin:supvisors] supervisor.ctl_factory = supvisors.supvisorsctl:make_supvisors_controller_plugin @@ -295,13 +398,14 @@ and the quality of service expected. It relies on the |Supervisor| group and pro It is important to notice that all applications declared in this file will be considered as *Managed* by |Supvisors|. The main consequence is that |Supvisors| will try to ensure that one single instance of the program - is running over all the nodes considered. If two instances of the same program are running on two different nodes, - |Supvisors| will consider this as a conflict. 
Only the *Managed* applications have an entry in the navigation menu - of the |Supvisors| web page. + is running over all the |Supvisors| instances considered. If two instances of the same program are running in two + different |Supvisors| instances, |Supvisors| will consider this as a conflict. + Only the *Managed* applications have an entry in the navigation menu of the |Supvisors| Web UI. The groups declared in |Supervisor| configuration files and not declared in a rules file will thus be considered as *Unmanaged* by |Supvisors|. So they have no entry in the navigation menu of the |Supvisors| web page. - There can be as many running instances of the same program as |Supervisor| allows over the available nodes. + There can be as many running instances of the same program as |Supervisor| allows over the available |Supvisors| + instances. If the `lxml `_ package is available on the system, |Supvisors| uses it to validate the XML rules files before they are used. @@ -343,40 +447,45 @@ Here follows the definition of the attributes and rules applicable to an ``appli In the introduction, it is written that the aim of |Supvisors| is to manage distributed applications. However, it may happen that some applications are not designed to be distributed (for example due to inter-process - communication design) and thus distributing the application processes over a set of nodes would just make + communication design) and thus distributing the application processes over multiple nodes would just make the application non operational. - If set to ``true``, |Supvisors| will start all the application processes on the same node, provided that a node - can be found based on the application rules ``starting_strategy`` and ``addresses``. + If set to ``true``, |Supvisors| will start all the application processes on the same |Supvisors| instance, + provided that such a |Supvisors| instance can be found based on the application rules ``starting_strategy`` + and ``identifiers``. 
*Default*: ``true``. *Required*: No. -``addresses`` +``identifiers`` - This element is only used when ``distributed`` is set to ``false`` and gives the list of nodes where the application - programs can be started. The node names are to be taken from the ``address_list`` defined in - `rpcinterface extension point`_ or from the declared `Node aliases`_, and separated by commas. + This element is only used when ``distributed`` is set to ``false`` and gives the list of |Supvisors| instances + where the application programs can be started. The names are to be taken from the names deduced from the + ``supvisors_list`` parameter defined in `rpcinterface extension point`_ or from the declared `Instance aliases`_, + and separated by commas. Special values can be applied. - The wildcard ``*`` stands for all node names in ``address_list``. - Any node list including a ``*`` is strictly equivalent to ``*`` alone. + The wildcard ``*`` stands for all names deduced from ``supvisors_list``. + Any name list including a ``*`` is strictly equivalent to ``*`` alone. - The hashtag ``#`` can be used in a ``pattern`` definition and eventually complemented by a list of nodes. - The aim is to assign the Nth node of either ``address_list`` or of the subsequent node list to the Nth instance - of the application, **assuming that 'N' is provided at the end of the application name, preceded by a dash or - an underscore**. - An example will be given in `Using patterns and hashtags`_. + The hashtag ``#`` can be used in a ``pattern`` definition and optionally complemented by a list of deduced names. + The aim is to assign the Nth deduced name of ``supvisors_list`` or the Nth name of the subsequent list (made of + names deduced from ``supvisors_list``) to the Nth instance of the application, **assuming that 'N' is provided + at the end of the application name, preceded by a dash or an underscore**. + Admittedly, this is a bit tricky to explain; examples will be given in `Using patterns and hashtags`_.
*Default*: ``*``. *Required*: No. -.. note:: +.. attention:: - When the application is not to be distributed (``distributed`` set to ``false``), the rule ``addresses`` of the + When the application is not to be distributed (``distributed`` set to ``false``), the rule ``identifiers`` of the application programs is not considered. +``addresses`` + + *DEPRECATED* Please use ``identifiers``. This parameter will be removed in the next |Supvisors| version. ``start_sequence`` @@ -410,7 +519,7 @@ Here follows the definition of the attributes and rules applicable to an ``appli ``starting_strategy`` - The strategy used to start applications on nodes. + The strategy used to start applications on |Supvisors| instances. Possible values are in { ``CONFIG``, ``LESS_LOADED``, ``MOST_LOADED``, ``LOCAL`` }. The use of this option is detailed in :ref:`starting_strategy`. @@ -430,8 +539,9 @@ Here follows the definition of the attributes and rules applicable to an ``appli ``running_failure_strategy`` - This element gives the strategy applied when the application loses running processes due to a node that becomes - silent (crash, power down, network failure, etc). This value can be superseded by the value set at program level. + This element gives the strategy applied when the application loses running processes due to a |Supvisors| instance + that becomes silent (crash, power down, network failure, etc). + This value can be superseded by the value set at program level. The possible values are { ``CONTINUE``, ``RESTART_PROCESS``, ``STOP_APPLICATION``, ``RESTART_APPLICATION`` } and are detailed in :ref:`running_failure_strategy`. @@ -477,23 +587,29 @@ The ``program`` element defines the rules applicable to at least one program. Th *Required*: Yes, unless an attribute ``name`` is provided. -``addresses`` +``identifiers`` - This element gives the list of nodes where the program can be started. 
The node names are to be taken from - the ``address_list`` defined in `rpcinterface extension point`_ or from the declared `Node aliases`_, - and separated by commas. Special values can be applied. + This element gives the list of |Supvisors| instances where the program can be started. + The names are to be taken from the names deduced from the ``supvisors_list`` parameter defined in the + `rpcinterface extension point`_ or from the declared `Instance aliases`_, and separated by commas. + Special values can be applied. - The wildcard ``*`` stands for all node names in ``address_list``. - Any node list including a ``*`` is strictly equivalent to ``*`` alone. + The wildcard ``*`` stands for all names deduced from ``supvisors_list``. + Any name list including a ``*`` is strictly equivalent to ``*`` alone. - The hashtag ``#`` can be used in a ``pattern`` definition and eventually complemented by a list of nodes. - The aim is to assign the Nth node of either ``address_list`` or the subsequent node list to the Nth instance - of the program in a homogeneous process group. An example will be given in `Using patterns and hashtags`_. + The hashtag ``#`` can be used in a ``pattern`` definition and eventually complemented by a list of deduced names. + The aim is to assign the Nth deduced name of ``supvisors_list`` or the Nth name of the subsequent list (made of + names deduced from ``supvisors_list``) to the Nth instance of the program in a homogeneous process group. + Examples will be given in `Using patterns and hashtags`_. *Default*: ``*``. *Required*: No. +``addresses`` + + *DEPRECATED* Please use ``identifiers``. This parameter will be removed in the next |Supvisors| version. + ``required`` This element gives the importance of the program for the application. @@ -539,8 +655,8 @@ The ``program`` element defines the rules applicable to at least one program. Th This element gives the expected percent usage of *resources*. 
The value is a estimation and the meaning in terms of resources (CPU, memory, network) is in the user's hands. - When multiple nodes are available, |Supvisors| uses the ``expected_loading`` value to distribute the processes over - the available nodes, so that the system remains safe. + When multiple |Supvisors| instances are available, |Supvisors| uses the ``expected_loading`` value to distribute + the processes over the available |Supvisors| instances, so that the system remains safe. *Default*: ``0``. @@ -557,8 +673,8 @@ The ``program`` element defines the rules applicable to at least one program. Th ``running_failure_strategy`` - This element gives the strategy applied when the process is running on a node that becomes silent (crash, power - down, network failure, etc). This value supersedes the value set at application level. + This element gives the strategy applied when the process is running in a |Supvisors| instance that becomes silent + (crash, power down, network failure, etc). This value supersedes the value set at application level. The possible values are { ``CONTINUE``, ``RESTART_PROCESS``, ``STOP_APPLICATION``, ``RESTART_APPLICATION`` } and their impact is detailed in :ref:`running_failure_strategy`. @@ -589,7 +705,7 @@ Here follows an example of a ``program`` definition: .. code-block:: xml - cliche01,cliche03,cliche02 + cliche01,cliche03,cliche02 true 1 1 @@ -610,12 +726,12 @@ It can be used to configure a set of programs in a more flexible way than just c like |Supervisor| does. The same ``program`` options are applicable, whatever a ``name`` attribute or a ``pattern`` attribute is used. -For a ``pattern`` attribute, a substring matching one |Supervisor| program name or more is expected. +For a ``pattern`` attribute, a substring (*not a regexp*) matching one |Supervisor| program name or more is expected. .. code-block:: xml - cliche01,cliche03,cliche02 + cliche01,cliche03,cliche02 2 true @@ -649,6 +765,8 @@ their own application. 
Unfortunately, using *homogeneous* program groups with ``numprocs`` set to N cannot help in the present case because |Supervisor| considers the program name in the group and not the ``process_name``. +.. hint:: + As it may be a bit clumsy to define the N definition sets, a script :command:`supvisors_breed` is provided in |Supvisors| package to help the user to duplicate an application from a template. Use examples can be found in the |Supvisors| use cases :ref:`scenario_2` and :ref:`scenario_3`. @@ -659,8 +777,8 @@ their own application. Using patterns and hashtags ~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Using a hashtag ``#`` in the program ``addresses`` is designed for a program that is meant to be started on every nodes -of the node list, or on a subset of them. +Using a hashtag ``#`` in the program ``identifiers`` is designed for a program that is meant to be started on every +|Supvisors| instances available, or on a subset of them. As an example, based on the following simplified |Supervisor| configuration: @@ -668,7 +786,7 @@ As an example, based on the following simplified |Supervisor| configuration: [rpcinterface:supvisors] supervisor.rpcinterface_factory = supvisors.plugin:make_supvisors_rpcinterface - address_list = cliche01,cliche02,cliche03,cliche04,cliche05 + supvisors_list = cliche01,cliche02,cliche03,cliche04,cliche05 [program:prg] process_name=prg_%(process_num)02d @@ -680,13 +798,13 @@ Without this option, it is necessary to define rules for all instances of the pr .. code-block:: xml - cliche01 + cliche01 - cliche05 + cliche05 Now with this option, the rule becomes more simple. @@ -694,21 +812,21 @@ Now with this option, the rule becomes more simple. .. code-block:: xml - # + # -It is also possible to give a subset of nodes only. +It is also possible to give a subset of deduced names. .. code-block:: xml - #,cliche04,cliche02 + #,cliche04,cliche02 .. note:: - Nodes are chosen in accordance with the sequence given in ``address_list`` or in the subsequent list. 
- In the second example above, :program:`prg_01` will be assigned to ``cliche04`` and :program:`prg_02` to + |Supvisors| instances are chosen in accordance with the sequence given in ``supvisors_list`` or in the subsequent + list. In the second example above, :program:`prg_01` will be assigned to ``cliche04`` and :program:`prg_02` to ``cliche02``. |Supvisors| does take into account the start index defined in ``numprocs_start``. @@ -716,30 +834,31 @@ It is also possible to give a subset of nodes only. .. important:: In the program configuration file, it is expected that the ``numprocs`` value matches the number of elements in - ``address_list``. + ``supvisors_list``. - If the number of nodes in ``address_list`` is greater than the ``numprocs`` value, programs will - be assigned to the ``numprocs`` first nodes. + If the number of elements in ``supvisors_list`` is greater than the ``numprocs`` value, programs will + be assigned to the ``numprocs`` first |Supvisors| instances. - On the other side, if the number of nodes in ``address_list`` is lower than the ``numprocs`` value, - the last programs won't be assigned to any node and it won't be possible to start them using |Supvisors|, - as the list of applicable nodes will be empty. - Nevertheless, in this case, it will be still possible to start them with |Supervisor|. + On the other side, if the number of elements in ``supvisors_list`` is lower than the ``numprocs`` value, + the last programs won't be assigned to any |Supvisors| instance and it won't be possible to start them using + |Supvisors|. Nevertheless, in this case, it will be still possible to start them with |Supervisor| directly. .. attention:: - As pointed out just before, |Supvisors| takes the information from the program configuration file. So this function - will definitely NOT work if the program is unknown to the local |Supervisor|. + As pointed out just before, |Supvisors| takes the information from the program configuration. 
So this function + will definitely NOT work if the program is unknown to the local |Supervisor|, which is a relevant use case. + As written before, the |Supervisor| configuration can be different for all |Supvisors| instances, including + the definition of groups and programs. .. important:: *Convention for application names when using patterns and hashtags* - When the hashtag is used for application ``addresses``, |Supvisors| cannot rely on the |Supervisor| configuration - to map the application instances to the nodes. + When the hashtag is used for the application ``identifiers``, |Supvisors| cannot rely on the |Supervisor| + configuration to map the application instances to the |Supvisors| instances. By convention, the application name MUST end with ``-N`` or ``_N``. The Nth application will be mapped to the Nth - node of the list, i.e. the node at index ``N-1`` in the list. + deduced name of the list, i.e. the name at index ``N-1`` in the list. - ``N`` must be striclty positive. Zero-padding is allowed, as long as ``N`` can be converted into an integer. + ``N`` must be strictly positive. Zero-padding is allowed, as long as ``N`` can be converted into an integer. ```` rules @@ -757,7 +876,7 @@ Here follows an example of model: .. code-block:: xml - cliche01,cliche02,cliche03 + cliche01,cliche02,cliche03 1 false false @@ -778,12 +897,12 @@ Here follows examples of ``program`` definitions referencing a model: -Node aliases -~~~~~~~~~~~~ +Instance aliases +~~~~~~~~~~~~~~~~ -When dealing with long lists of nodes, the content of application or program ``addresses`` options may impair -the readability of the rules file. It is possible to declare node aliases and to use the alias names in place -of the node names in the ``addresses`` option. +When dealing with long lists of |Supvisors| instances, the content of application or program ``identifiers`` options +may impair the readability of the rules file. 
It is possible to declare instance aliases and to use the alias names +in place of the deduced names in the ``identifiers`` option. Here follows a few usage examples: @@ -796,11 +915,11 @@ Here follows a few usage examples: servers,consoles - consoles + consoles - servers,consoles + servers,consoles .. hint:: *About aliases referencing other aliases* @@ -808,8 +927,8 @@ Here follows a few usage examples: Based on the previous example, an alias referencing other aliases will only work if it is placed *before* the aliases referenced. - At some point, the resulting node names are checked against the ``address_list`` - of the `rpcinterface extension point`_ so any unknown node name or remaining alias will simply be discarded. + At some point, the resulting names are checked against the names deduced from the ``supvisors_list`` parameter + of the `rpcinterface extension point`_ so any unknown name or remaining alias will simply be discarded. .. code-block:: xml @@ -833,57 +952,46 @@ Here follows a complete example of a rules file. It is used in |Supvisors| self + + #,cliche82,cliche83:60000,cliche84 + cliche82,cliche81 + - cliche81 + cliche81 5 disk_01 - 192.168.1.49 + cliche82 disk_01 - cliche83 + cliche83:60000 - * + * 25 - - - 1 - 4 - - - * - 1 - 1 - - - - 2 STOP - #,192.168.1.49,cliche83,cliche84 + distribute_sublist 1 - 2 true 0 - cliche81 + cliche81 2 - 1 true true 25 @@ -894,18 +1002,16 @@ Here follows a complete example of a rules file. It is used in |Supvisors| self 3 - 3 - # + # 1 - 1 5 CONTINUE - #,cliche81,cliche83 + #,cliche81,cliche83:60000 2 true 25 @@ -916,28 +1022,27 @@ Here follows a complete example of a rules file. It is used in |Supvisors| self 4 - 2 CONFIG CONTINUE - * + * 1 - 2 + 3 true 5 RESTART_APPLICATION - cliche84 + cliche84 2 true 3 - 192.168.1.49,cliche81 + consoles 3 1 10 @@ -958,17 +1063,17 @@ Here follows a complete example of a rules file. 
It is used in |Supvisors| self disk_01 - * + * converter - cliche83,cliche81,192.168.1.49 + cliche83:60000,cliche81,cliche82 converter - cliche81,cliche83,192.168.1.49 + cliche81,cliche83:60000,cliche82 @@ -980,7 +1085,7 @@ Here follows a complete example of a rules file. It is used in |Supvisors| self false - cliche81,cliche83 + cliche81,cliche83:60000 5 MOST_LOADED ABORT @@ -1002,11 +1107,11 @@ Here follows a complete example of a rules file. It is used in |Supvisors| self 6 - 1 + 2 LESS_LOADED - * + * 1 4 RESTART_PROCESS diff --git a/docs/dashboard.rst b/docs/dashboard.rst index 7f8afb92..17fb940f 100644 --- a/docs/dashboard.rst +++ b/docs/dashboard.rst @@ -4,7 +4,13 @@ Dashboard ========= Each |Supervisor| instance provides a `Web Server `_ -and the |Supvisors| extension provides its own web user interface, as a replacement of the |Supervisor| one. +and the |Supvisors| extension provides its own Web User Interface, as a replacement of the |Supervisor| one but using +the same infrastructure. + +.. note:: + + The information displayed in the Web User Interface is a synthesis of the information provided by all |Supvisors| + instances and as perceived by the |Supvisors| instance that displays the web pages. .. important:: *About the browser compliance*. @@ -13,9 +19,9 @@ and the |Supvisors| extension provides its own web user interface, as a replacem All pages are divided into 3 parts: - * the `Common Menu`_ on the left side, - * a header on the top right, - * the contents itself on the lower right. + * the `Common Menu`_ on the left side ; + * a header on the top right ; + * the content itself on the lower right. Common Menu @@ -26,31 +32,32 @@ Common Menu :align: center Clicking on the 'Supvisors' title brings the `Main page`_ back or the `Conciliation page`_ if it blinks in red. -The version of |Supvisors| is displayed underneath. +The version of |Supvisors| is displayed underneath. 
There's also a reminder of the |Supvisors| instance that provides +the information. -Below is the Addresses part that lists all the nodes defined in the :ref:`supvisors_section` of the |Supervisor| -configuration file. -The color gives the state of the Address, as seen by the |Supvisors| instance that is displaying this page: +Below is the **Supervisors** part that lists all the |Supvisors| instances defined in the :ref:`supvisors_section` +of the |Supervisor| configuration file. +The color gives the state of the |Supvisors| instance: - * grey for ``UNKNOWN``, - * grey-to-green gradient for ``CHECKING``, - * yellow for ``SILENT``, - * green for ``RUNNING``, + * grey for ``UNKNOWN`` ; + * grey-to-green gradient for ``CHECKING`` ; + * yellow for ``SILENT`` ; + * green for ``RUNNING`` ; * red for ``ISOLATED``. -Only the hyperlinks of the ``RUNNING`` nodes are active. The browser is redirected to the `Address page`_ -of the corresponding Web Server. +Only the hyperlinks of the ``RUNNING`` |Supvisors| instances are active. +The browser is redirected to the `Supervisor page`_ of the targeted |Supvisors| instance. The |Supvisors| instance playing the role of *Master* is pointed out with the ✪ sign. -Below is the Application part that lists all the *Managed* applications defined through the +Below is the **Applications** part that lists all the *Managed* applications defined through the `group sections `_ of the |Supervisor| configuration file and also declared in the |Supvisors| :ref:`rules_file`. The color gives the state of the Application, as seen by the |Supvisors| instance that is displaying this page: - * grey for ``UNKNOWN``, - * yellow for ``STOPPED``, - * yellow-to-green gradient for ``STARTING``, - * green-to-yellow gradient for ``STOPPING``, + * grey for ``UNKNOWN`` ; + * yellow for ``STOPPED`` ; + * yellow-to-green gradient for ``STARTING`` ; + * green-to-yellow gradient for ``STOPPING`` ; * green for ``RUNNING``. 
An additional red light is displayed in the event where a failure has been raised on the application. @@ -79,13 +86,13 @@ The state of |Supvisors| is displayed on the left side of the header: This is the |Supvisors| starting phase, waiting for all |Supvisors| instances to connect themselves. Refer to the :ref:`synchronizing` section for more details. - In this state, the |Supvisors| :ref:`xml_rpc` is restricted so that version, master and node information only - are available. + In this state, the |Supvisors| :ref:`xml_rpc` is restricted so that only version, master and |Supvisors| instance + information are available. ``DEPLOYMENT`` - In this state, |Supvisors| is automatically starting applications (here for more details). - Refer to the :ref:`starting_strategy` section for more details. + In this state, |Supvisors| is automatically starting applications. Refer to the :ref:`starting_strategy` section + for more details. The whole :ref:`xml_rpc_status` part and the :ref:`xml_rpc_supvisors` part of the |Supvisors| :ref:`xml_rpc` are available from this state. @@ -94,10 +101,10 @@ The state of |Supvisors| is displayed on the left side of the header: In this state, |Supvisors| is mainly: - * listening to |Supervisor| events, - * publishing the events on its :ref:`event_interface`, - * checking the activity of all remote |Supvisors| instances, - * detecting eventual multiple running instances of the same program, + * listening to |Supervisor| events ; + * publishing the events on its :ref:`event_interface` ; + * checking the activity of all remote |Supvisors| instances ; + * detecting eventual multiple running instances of the same program ; * providing statistics to its Dashboard. The whole |Supvisors| :ref:`xml_rpc` is available in this state. @@ -114,7 +121,7 @@ The state of |Supvisors| is displayed on the left side of the header: ``RESTARTING`` |Supvisors| is stopping all processes before commanding its own restart, i.e. 
the restart
-    of all |Supervisor| instances.
+    of all |Supvisors| instances including a restart of their related |Supervisor|.
     Refer to the :ref:`stopping_strategy` section for more details.

     The |Supvisors| :ref:`xml_rpc` is NOT available in this state.

@@ -122,33 +129,38 @@

 ``SHUTTING_DOWN``
     |Supvisors| is stopping all processes before commanding its own shutdown, i.e. the shutdown
-    of all |Supervisor| instances.
+    of all |Supvisors| instances including a shutdown of their related |Supervisor|.
     Refer to the :ref:`stopping_strategy` section for more details.

+    The |Supvisors| :ref:`xml_rpc` is NOT available in this state.
+
 ``SHUTDOWN``
     This is the final state of |Supvisors|, in which it remains inactive and waits for the |Supervisor| stopping event.
+    This state is unlikely to be displayed.

     The |Supvisors| :ref:`xml_rpc` is NOT available in this state.

 On the right side, 3 buttons are available:

-    * |restart| restarts |Supvisors| through all |Supervisor| instances,
-    * |shutdown| shuts down |Supvisors| through all |Supervisor| instances,
-    * |refresh| refreshes the current page,
+    * |restart| restarts |Supvisors| through all |Supvisors| instances ;
+    * |shutdown| shuts down |Supvisors| through all |Supvisors| instances ;
+    * |refresh| refreshes the current page ;
     * |autorefresh| refreshes the current page and sets a periodic 5s refresh to the page.

 Main Page Contents
 ~~~~~~~~~~~~~~~~~~

-For every nodes, a box is displayed in the contents of the |Supvisors| Main Page.
+For each |Supvisors| instance, a box is displayed in the contents of the |Supvisors| Main Page.
Each box contains:

-    * the Address name, which is a hyperlink to the corresponding `Address Page`_ if the Address state is ``RUNNING``,
-    * the Address state, colored with the same rules used in the `Common Menu`_,
-    * the Address process loading,
-    * the list of all processes that are running on this node, whatever they belong to a *managed* application or not.
+    * the |Supvisors| instance deduced name, which is a hyperlink to the corresponding `Supervisor Page`_
+      if the |Supvisors| instance is in the ``RUNNING`` state ;
+    * the |Supvisors| instance state, colored with the same rules used in the `Common Menu`_ ;
+    * the |Supvisors| instance process loading ;
+    * the list of all processes that are running in this |Supvisors| instance, whether they belong to a *Managed*
+      application or not.

 Conciliation Page
@@ -157,8 +169,8 @@ Conciliation Page
 If the page is refreshed when |Supvisors| is in ``CONCILIATION`` state, the 'Supvisors' label in the top left
 of the `Common Menu`_ becomes red and blinks.
 This situation is unlikely to happen if the ``conciliation_strategy`` chosen in the :ref:`supvisors_section`
-of the |Supervisor| configuration file is different from ``USER``, as the other values will lead
-to an immediate conciliation of the conflicts.
+of the |Supervisor| configuration file is different from ``USER``, as the other values will trigger an immediate and
+automatic conciliation of the conflicts.

 The Conciliation Page can be reached by clicking on this blinking red label.

@@ -176,46 +188,50 @@ Conciliation Page Contents
 ~~~~~~~~~~~~~~~~~~~~~~~~~~

 On the right side of the page, the list of process conflicts is displayed into a table.
-A process conflict is raised when the same program is running on several hosts.
+A process conflict is raised when the same program is running in multiple |Supvisors| instances.
So the table lists, for each conflict:

-    * the name of the program incriminated,
-    * the list of nodes where it is running,
-    * the uptime of the corresponding process on each node,
+    * the name of the incriminated program ;
+    * the list of |Supvisors| instances where it is running ;
+    * the uptime of the corresponding process in each |Supvisors| instance ;
     * for each process, a list of actions helping to the solving of this conflict:
-        + Stop the process,
-        + Keep this process (and Stop all others),
+        + Stop the process ;
+        + Keep this process (and stop all others) ;
     * for each process, a list of automatic strategies (refer to :ref:`conciliation`) helping to the solving
       of this conflict.

 The left side of the page contains a simple box that enables the user to perform a global conciliation on all conflicts,
-using one of the automatic strategies.
+using one of the automatic strategies proposed by |Supvisors|.

-Address Page
-------------
+Supervisor Page
+---------------

-The Address Page of |Supvisors| is a bit less "sparse" than the web page provided by |Supervisor|.
-It shows the status of the node, as seen by the local |Supvisors| instance.
-It also enables the user to command the processes declared on this node and provides statistics that may be useful
-at software integration time.
+The *Supervisor* Page of |Supvisors| is the page that most closely resembles the legacy |Supervisor| page,
+hence its name, although it is a bit less "sparse" than the web page provided by |Supervisor|.
+It shows the status of the |Supvisors| instance, as seen by the |Supvisors| instance itself, since this page is always
+redirected accordingly.
+It also enables the user to command the processes declared in this |Supvisors| instance and provides statistics
+that may be useful at software integration time.
-Address Page Header
-~~~~~~~~~~~~~~~~~~~
+Supervisor Page Header
+~~~~~~~~~~~~~~~~~~~~~~

-The status of the Address is displayed on the left side of the header:
+The status of the |Supvisors| instance is displayed on the left side of the header:

-    * the Address name, marked with the ✪ sign if it corresponds to the *Master*,
-    * the current loading of the processes running on this node,
-    * the Address state,
-    * the date of the last tick received from the |Supervisor| running on this node.
+    * the |Supvisors| instance deduced name, marked with the ✪ sign if it is the *Master* ;
+    * the current loading of the processes running in this |Supvisors| instance ;
+    * the |Supvisors| instance state ;
+    * the date of the last tick received by the |Supvisors| instance (hopefully less than 5 seconds from the current
+      system time).

 In the middle of the header, the 'Statistics View' box enables the user to choose the information presented
 on this page.
-By default, the `Processes Section`_ is displayed. The other choice is the `Host Section`_.
+By default, the `Processes Section`_ is displayed. The other choice is the `Host Section`_. The *Host* button is named
+after the node hosting the |Supvisors| instance.

 The periods can be updated in the :ref:`supvisors_section` of the |Supervisor| configuration file.

 Next to it, the 'Statistics Period' box enables the user to choose the period used for the statistics of this page.
@@ -223,10 +239,10 @@ The periods can be updated in the :ref:`supvisors_section` of the |Supervisor| c On the right side, 5 buttons are available: - * |stop| stops all processes handled by |Supervisor| on this node, - * |restart| restarts |Supervisor| on this node, - * |shutdown| shuts down |Supervisor| on this node, - * |refresh| refreshes the current page, + * |stop| stops all the processes handled by |Supervisor| in this |Supvisors| instance ; + * |restart| restarts this |Supvisors| instance, including |Supervisor| ; + * |shutdown| shuts down this |Supvisors| instance, including |Supervisor| ; + * |refresh| refreshes the current page ; * |autorefresh| refreshes the current page and sets a periodic 5s refresh to the page. Processes Section @@ -236,58 +252,58 @@ Processes Section :alt: Processes Section of Supvisors Address Page :align: center -The Processes Section looks like the page provided by |Supervisor|. +The **Processes Section** looks like the page provided by |Supervisor|. Indeed, it lists the programs that are configured in |Supervisor|, it presents their current state with an associated description and enables the user to perform some actions on them: - * Log tail (with a refresh button, click on the program name itself), - * Start, - * Stop, - * Restart, - * Clear log, - * Tail stdout log (auto-refreshed), + * Log tail (with a refresh button, click on the program name itself) ; + * Start ; + * Stop ; + * Restart ; + * Clear log ; + * Tail stdout log (auto-refreshed) ; * Tail stderr log (auto-refreshed). 
|Supvisors| shows additional information for each process, such as:

-    * the loading declared for the process in the rules file,
-    * the CPU usage of the process during the last period (only if the process is ``RUNNING``),
+    * the loading declared for the process in the rules file ;
+    * the CPU usage of the process during the last period (only if the process is ``RUNNING``) ;
     * the instant memory (Resident Set Size) occupation of the process at the last period tick (only if the process
-      is ``RUNNING``),
+      is ``RUNNING``).

-All processes are grouped by their application and |Supvisors| provides expand / shrink actions per application
+All processes are grouped by their application name and |Supvisors| provides expand / shrink actions per application
 to enable the user to show / hide blocks of processes.
 The application line displays:

-    * the overall state of the application, considering all nodes,
-    * a description of the operational status of the application,
-    * considering the application processes that are running on this node:
+    * the overall state of the application, considering all |Supvisors| instances where it may be distributed,
+    * a basic description of the operational status of the application,
+    * considering the application processes that are running in this |Supvisors| instance:

-        * the sum of their expected loading,
-        * the sum of their CPU usage,
+        * the sum of their expected loading ;
+        * the sum of their CPU usage ;
         * the sum of their instant memory occupation.

 A click on the CPU or RAM measures shows detailed statistics about the process. This is not active on the application
 values.
 More particularly, |Supvisors| displays on the right side of the page a table showing for both CPU and Memory:

-    * the last measure,
-    * the mean value,
-    * the value of the slope of the linear regression,
+    * the last measure ;
+    * the mean value ;
+    * the value of the slope of the linear regression ;
+    * the value of the standard deviation.
A color and a sign are associated to the last value, so that: - * green and ↗ point out a significant increase of the value since the last measure, - * red and ↘ point out a significant decrease of the value since the last measure, - * blue and ↝ point out the stability of the value since the last measure, + * green and ↗ point out an increase of the value since the last measure ; + * red and ↘ point out a decrease of the value since the last measure ; + * blue and ↝ point out the stability of the value since the last measure. Underneath, |Supvisors| shows two graphs (CPU and Memory) built from the series of measures taken from the selected process: - * the history of the values with a plain line, - * the mean value with a dashed line and value in the top right corner, - * the linear regression with a straight dotted line, + * the history of the values with a plain line ; + * the mean value with a dashed line and value in the top right corner ; + * the linear regression with a straight dotted line ; * the standard deviation with a colored area around the mean value. Host Section @@ -316,9 +332,9 @@ Application Page The Application Page of |Supvisors|: - * shows the status of the *managed* application, as seen by the considered |Supvisors| instance, - * enables the user to command the application and its processes - * and provides statistics that may be useful at software integration time. + * shows the status of the *managed* application, as seen by the considered |Supvisors| instance ; + * enables the user to command the application and its processes ; + * provides statistics that may be useful at software integration time. .. 
image:: images/supvisors_application_page.png :alt: Supvisors Application page @@ -329,13 +345,13 @@ Application Page Header The status of the Application is displayed on the left side of the header, including: - * the name of the application, - * the state of the application, + * the name of the application ; + * the state of the application ; * a led corresponding to the operational status of the application: - + empty if not ``RUNNING``, - + red if ``RUNNING`` and at least one major failure is detected, - + orange if ``RUNNING`` and at least one minor failure is detected, and no major failure, + + empty if not ``RUNNING`` ; + + red if ``RUNNING`` and at least one major failure is detected ; + + orange if ``RUNNING`` and at least one minor failure is detected, and no major failure ; + green if ``RUNNING`` and no failure is detected. The second part of the header is the 'Starting strategy' box that enables the user to choose the strategy @@ -349,10 +365,10 @@ of the |Supervisor| configuration file. On the right side, 4 buttons are available: - * |start| starts the application, - * |stop| stops the application, - * |restart| restarts the application, - * |refresh| refreshes the current page, + * |start| starts the application ; + * |stop| stops the application ; + * |restart| restarts the application ; + * |refresh| refreshes the current page ; * |autorefresh| refreshes the current page and sets a periodic 5s refresh to the page. 
Application Page Contents
@@ -360,26 +376,28 @@

 The table lists all the programs belonging to the application, and it shows:

-    * the 'synthetic' state of the process (refer to this note for details about the synthesis),
-    * the node where it runs, if appropriate,
-    * the description (initialized from |Supervisor|, node_name added depending on the state),
-    * the loading declared for the process in the rules file,
-    * the CPU usage of the process during the last period (only if the process is ``RUNNING``),
+    * the 'synthetic' state of the process (refer to this note for details about the synthesis) ;
+    * the |Supvisors| instances where it runs, if appropriate ;
+    * the description (after initialization from |Supervisor|, the deduced name of the corresponding |Supvisors|
+      instance is added depending on the state) ;
+    * the loading declared for the process in the rules file ;
+    * the CPU usage of the process during the last period (only if the process is ``RUNNING``) ;
     * the instant memory (Resident Set Size) occupation of the process at the last period tick (only if the process
       is ``RUNNING``).

-Like the `Address page`_, the Application page enables the user to perform some actions on programs:
+Like the `Supervisor page`_, the Application page enables the user to perform some actions on programs:

-    * Start,
-    * Stop,
-    * Restart,
-    * Clear log,
-    * Tail stdout log (auto-refreshed),
+    * Start ;
+    * Stop ;
+    * Restart ;
+    * Clear log ;
+    * Tail stdout log (auto-refreshed) ;
     * Tail stderr log (auto-refreshed).

-The difference is that the process is not started necessarily on the node that displays this page.
+The difference is that the process is not necessarily started in the |Supvisors| instance that displays this page.
 Indeed, |Supvisors| uses the rules of the program (as defined in the rules file) and the starting strategy selected
-in the header part to choose a relevant node.
If no rule is defined for the program, the starting will fail. +in the header part to choose a relevant |Supvisors| instance. If no rule is defined for the program, the starting +will fail. As previously, a click on the CPU or Memory measures shows detailed statistics about the process. diff --git a/docs/event_interface.rst b/docs/event_interface.rst index 5b7bd03b..9073bf8b 100644 --- a/docs/event_interface.rst +++ b/docs/event_interface.rst @@ -8,8 +8,8 @@ Protocol The |Supvisors| Event Interface relies on a PyZMQ_ socket. To receive the |Supvisors| events, the client application must configure a socket with a ``SUBSCRIBE`` pattern -and connect it on localhost using the ``event_port`` defined in the :ref:`supvisors_section` -of the |Supervisor| configuration file. +and connect it on localhost using the ``event_port`` defined in the :ref:`supvisors_section` of the |Supervisor| +configuration file. |Supvisors| publishes the events in multi-parts messages. @@ -23,7 +23,7 @@ defined as follows in the ``supvisors.utils`` module: .. code-block:: python SUPVISORS_STATUS_HEADER = u'supvisors' - NODE_STATUS_HEADER = u'node' + INSTANCE_STATUS_HEADER = u'instance' APPLICATION_STATUS_HEADER = u'application' PROCESS_STATUS_HEADER = u'process' PROCESS_EVENT_HEADER = u'event' @@ -58,17 +58,18 @@ Key Value ================== ================== -Node status -~~~~~~~~~~~ +|Supvisors| instance status +~~~~~~~~~~~~~~~~~~~~~~~~~~~ ================== ================== Key Value ================== ================== -'address_name' *DEPRECATED* The Node name. -'node_name' The Node name. -'statecode' The Node state, in [0;5]. -'statename' The Node state as string, among { ``'UNKNOWN'``, ``'CHECKING'``, ``'RUNNING'``, ``'SILENT'``, - ``'ISOLATING'``, ``'ISOLATED'`` }. +'address_name' *DEPRECATED* The deduced name of the |Supvisors| instance. + This entry will be removed in the next version. +'identifier' The deduced name of the |Supvisors| instance. 
+'statecode'        The |Supvisors| instance state, in [0;5].
+'statename'        The |Supvisors| instance state as string, among { ``'UNKNOWN'``, ``'CHECKING'``, ``'RUNNING'``,
+                   ``'SILENT'``, ``'ISOLATING'``, ``'ISOLATED'`` }.
 'remote_time'      The date of the last ``TICK`` event received from this node, in ms.
 'local_time'       The local date of the last ``TICK`` event received from this node, in ms.
 'loading'          The sum of the expected loading of the processes running on the node, in [0;100]%.
@@ -106,8 +107,9 @@ Key                Value
 'expected_exit'    True if the exit status is expected (only when state is ``'EXITED'``).
 'last_event_time'  The date of the last process event received for this process, regardless of the originating
                    |Supvisors| instance.
-'addresses'        *DEPRECATED* The list of nodes where the process is running.
-'nodes'            The list of nodes where the process is running.
+'addresses'        *DEPRECATED* The deduced names of the |Supvisors| instances where the process is running.
+                   This entry will be removed in the next version.
+'identifiers'      The deduced names of the |Supvisors| instances where the process is running.
 'extra_args'       The additional arguments passed to the command line of the process.
 ================== ==================

@@ -115,7 +117,8 @@ Key                Value
 The ``expected_exit`` information of this event provides an answer to the following |Supervisor| request:

-    * `#1150 - Why do event listeners not report the process exit status when stopped/crashed? `_
+    * `#1150 - Why do event listeners not report the process exit status when stopped/crashed?
+      `_

 Process event
 ~~~~~~~~~~~~~
@@ -130,8 +133,9 @@ Key                Value
 'expected'         True if the exit status is expected (only when state is 100 - ``EXITED``).
 'now'              The date of the event in the reference time of the node.
 'pid'              The UNIX process ID (only when state is 20 - ``RUNNING`` or 40 - ``STOPPING``).
-'address'          *DEPRECATED* The node where the event comes from.
-'node'             The node where the event comes from.
+'address' *DEPRECATED* The deduced name of the |Supvisors| instance that sent the initial event. + This entry will be removed in the next version. +'identifier' The deduced name of the |Supvisors| instance that sent the initial event. 'extra_args' The additional arguments passed to the command line of the process. ================== ================== @@ -211,7 +215,7 @@ The binary JAR of :program:`Google Gson 2.8.6` is available in the } @Override - public void onNodeStatus(final SupvisorsNodeInfo status) { + public void onInstanceStatus(final SupvisorsInstanceInfo status) { System.out.println(status); } diff --git a/docs/images/supvisors_address_host_section.png b/docs/images/supvisors_address_host_section.png index 9766e009..45dbf2e4 100644 Binary files a/docs/images/supvisors_address_host_section.png and b/docs/images/supvisors_address_host_section.png differ diff --git a/docs/images/supvisors_address_process_section.png b/docs/images/supvisors_address_process_section.png index a64c3a4b..285d8795 100644 Binary files a/docs/images/supvisors_address_process_section.png and b/docs/images/supvisors_address_process_section.png differ diff --git a/docs/images/supvisors_application_page.png b/docs/images/supvisors_application_page.png index f3f6a4f8..3129186a 100644 Binary files a/docs/images/supvisors_application_page.png and b/docs/images/supvisors_application_page.png differ diff --git a/docs/images/supvisors_conciliation_page.png b/docs/images/supvisors_conciliation_page.png index 6c4f1fc6..1100d9c9 100644 Binary files a/docs/images/supvisors_conciliation_page.png and b/docs/images/supvisors_conciliation_page.png differ diff --git a/docs/images/supvisors_main_page.png b/docs/images/supvisors_main_page.png index db877ab9..b11bbdec 100644 Binary files a/docs/images/supvisors_main_page.png and b/docs/images/supvisors_main_page.png differ diff --git a/docs/images/supvisors_menu.png b/docs/images/supvisors_menu.png index 674b5b30..70296f7e 100644 Binary files 
a/docs/images/supvisors_menu.png and b/docs/images/supvisors_menu.png differ diff --git a/docs/images/supvisors_scenario_1.png b/docs/images/supvisors_scenario_1.png index 65ad7317..47fc6adf 100644 Binary files a/docs/images/supvisors_scenario_1.png and b/docs/images/supvisors_scenario_1.png differ diff --git a/docs/images/supvisors_scenario_2.png b/docs/images/supvisors_scenario_2.png index 37a97677..dd225365 100644 Binary files a/docs/images/supvisors_scenario_2.png and b/docs/images/supvisors_scenario_2.png differ diff --git a/docs/images/supvisors_scenario_3.png b/docs/images/supvisors_scenario_3.png index febacdee..3ef566d9 100644 Binary files a/docs/images/supvisors_scenario_3.png and b/docs/images/supvisors_scenario_3.png differ diff --git a/docs/introduction.rst b/docs/introduction.rst index 925090ef..e546a91d 100644 --- a/docs/introduction.rst +++ b/docs/introduction.rst @@ -1,3 +1,5 @@ +.. _introduction: + Introduction ============ @@ -6,20 +8,22 @@ Overview |Supvisors| is a control system for distributed applications over multiple |Supervisor| instances. -This piece of software was born from a common need in embedded systems where applications are distributed over several -nodes. +A few definitions first: + + * The term "Application" here refers to a group of programs designed to carry out a specific task. + * The term "Node" here refers to an operating system having a dedicated host name and IP address. -This problematic comes with the following challenges: +The |Supvisors| software is born from a common need in embedded systems where applications are distributed over several +nodes. 
This problem comes with the following challenges: - * have a detailed status of the applications, - * have basic statistics about the resources taken by the applications, - * have a basic status of the nodes, - * start / stop processes dynamically, - * distribute the same application over different platforms, - * control the applications from nodes outside of the platform, - * secure the control / status interfaces, - * deal with loading (CPU, memory, etc), - * deal with failures: + * to have a status of the processes, + * to have a synthetic status of the applications based on the process statuses, + * to have basic statistics about the resources taken by the applications, + * to have a basic status of the nodes, + * to control applications and processes dynamically, + * to distribute the same application over different platforms (developer machine, integration platform, etc), + * to deal with resources (CPU, memory, network, etc), + * to deal with failures: + missing node when starting, + crash of a process, @@ -30,20 +34,17 @@ As a bonus: * it should be free, open source, without export control, * it shouldn't require specific administration rights (root). -After some researches on the net - at the time -, it seemed that there was no simple, free and open source solution -meeting all these requirements. Of course, there are now orchestration solutions like Kubernetes, Docker Swarm, Mesos, -etc. coming with more or less complexity, some working on containers only. - |Supervisor| can handle a part of the requirements but it only works on a single UNIX-like operating system. The |Supervisor| website references some `third parties `_ -that deal with multiple |Supervisor| instances but they only consist in dashboards and they focus on the nodes rather than -on the applications and their possible distribution over nodes.
+that deal with multiple |Supervisor| instances but they only consist in dashboards and they focus on the nodes rather +than on the applications and their possible distribution over nodes. Nevertheless, the extensibility of |Supervisor| makes it possible to implement the missing requirements. |Supvisors| works as a |Supervisor| plugin and is intended for those who are already familiar with |Supervisor| or -who have neither the time nor the resources to invest in a complex orchestration tool. +who have neither the time nor the resources to invest in a complex orchestration tool like Kubernetes. -In this documentation, a |Supvisors| *instance* refers to a Supervisor *instance* including a |Supvisors| extension. +In the present documentation, a |Supvisors| *instance* refers to a |Supervisor| *instance* including a |Supvisors| +extension. Platform Requirements @@ -55,9 +56,10 @@ Platform Requirements |Supvisors| works with Python 3.6 or later but will not work under any version of Python 2. -A previous release of |Supvisors| (version 0.1, available on PyPi) works with Python 2.7 (and previous versions of |Supervisor|, i.e. 3.3.0) but is not maintained anymore. +A previous release of |Supvisors| (version 0.1, available on PyPi) works with Python 2.7 (and previous versions +of |Supervisor|, i.e. 3.3.0) but is not maintained anymore. -The CSS of the Dashboard has been written for Firefox ESR 60.3.0. +The CSS of the Dashboard has been written for Firefox ESR 78.5.0. The compatibility with other browsers or other versions of Firefox is unknown. 
@@ -69,7 +71,7 @@ Installation +---------------+------------+-----------------------------------------------------------------+ | Package | Release | Usage | +===============+============+=================================================================+ -| |Supervisor| | 4.2.1 | Base software, extended by |Supvisors| | +| |Supervisor| | 4.2.4 | Base software, extended by |Supvisors| | +---------------+------------+-----------------------------------------------------------------+ | PyZMQ_ | 22.0.3 | Python binding of ZeroMQ | +---------------+------------+-----------------------------------------------------------------+ @@ -90,17 +92,17 @@ Supvisors can be installed with ``pip install``: # minimal install (including Supervisor and PyZMQ) [bash] > pip install supvisors - # extra install for all optional dependencies + # install including all optional dependencies [bash] > pip install supvisors[all] - # extra install for dashboard statistics and graphs only + # install for dashboard statistics and graphs only # (includes psutil and matplotlib) [bash] > pip install supvisors[statistics] - # extra install for XML validation only (includes lxml) + # install for XML validation only (includes lxml) [bash] > pip install supvisors[xml_valid] - # extra install for use of IP aliases only (includes psutil) + # install for use of IP aliases only (includes psutil) [bash] > pip install supvisors[ip_address] Without an Internet access @@ -125,12 +127,18 @@ Running |Supvisors| |Supvisors| runs as a plugin of |Supervisor| so it follows the same principle as `Running Supervisor `_ but using multiple UNIX-like operating systems. +Although |Supvisors| was originally designed to handle exactly one Supervisor instance per node, it can handle +multiple Supervisor instances on each node since the version 0.11. 
+ However, the |Supervisor| configuration file **MUST**: - * be configured with an internet socket (refer to the `inet-http-server `_ section settings) ; + * be configured with an internet socket (refer to the + `inet-http-server `_ + section settings) ; * include the ``[rpcinterface:supvisors]`` and the ``[ctlplugin:supvisors]`` sections (refer to the :ref:`Configuration` part) ; - * be identical on all considered nodes. + * be consistent on all considered nodes, more particularly attention must be paid to the list of declared + |Supvisors| instances and the IP ports used. .. important:: diff --git a/docs/special.rst b/docs/special.rst index e9fa89c5..24d4a9a0 100644 --- a/docs/special.rst +++ b/docs/special.rst @@ -12,72 +12,75 @@ are mutually aware of each other. The following options defined in the :ref:`supvisors_section` of the |Supervisor| configuration file are particularly used for synchronizing multiple instances of |Supervisor|: - * ``address_list``, - * ``force_synchro_if``, - * ``internal_port``, - * ``synchro_timeout``, + * ``supvisors_list`` ; + * ``core_identifiers`` ; + * ``internal_port`` ; + * ``synchro_timeout`` ; * ``auto_fence``. Once started, all |Supvisors| instances publish the events received, especially the ``TICK`` events that are triggered every 5 seconds, on their ``PUBLISH`` PyZMQ_ socket bound on the ``internal_port``. On the other side, all |Supvisors| instances start a thread that subscribes to the internal events -through an internal ``SUBSCRIBE`` PyZMQ_ socket connected to the ``internal_port`` of **all** nodes -of the ``address_list``. +through an internal ``SUBSCRIBE`` PyZMQ_ socket connected to the ``internal_port`` of all |Supvisors| instances +of the ``supvisors_list``. -At the beginning, all nodes are in an ``UNKNOWN`` state. +At the beginning, all |Supvisors| instances are declared in an ``UNKNOWN`` state. 
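For illustration, each ``supvisors_list`` entry of the form ``host_name:http_port:internal_port`` maps to the ZeroMQ endpoint that the internal ``SUBSCRIBE`` socket connects to. The following is a minimal sketch, not the actual Supvisors parsing code, and the default port values are assumptions:

```python
# Illustrative sketch (not the actual Supvisors code): turn supvisors_list
# entries of the form "host_name:http_port:internal_port" into the tcp://
# endpoints that the internal SUBSCRIBE socket would connect to.
# The simple "host_name" form falls back to default ports (values assumed here).

DEFAULT_HTTP_PORT = 65000      # hypothetical defaults for the sketch
DEFAULT_INTERNAL_PORT = 65001

def parse_supvisors_entry(entry: str):
    """Return (host_name, http_port, internal_port) for one entry."""
    parts = entry.split(':')
    if len(parts) == 1:
        return parts[0], DEFAULT_HTTP_PORT, DEFAULT_INTERNAL_PORT
    if len(parts) == 3:
        return parts[0], int(parts[1]), int(parts[2])
    raise ValueError(f'unexpected supvisors_list entry: {entry}')

def internal_endpoints(supvisors_list):
    """ZeroMQ URLs the SUBSCRIBE socket would connect to, one per instance."""
    return ['tcp://%s:%d' % (host, internal)
            for host, _, internal in map(parse_supvisors_entry, supvisors_list)]

endpoints = internal_endpoints(['rocky51', 'rocky52:30000:30001'])
```

This also illustrates why the configuration must be consistent across nodes: every instance derives the same endpoint list from the same ``supvisors_list``.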
When the first ``TICK`` event is received from a remote |Supvisors| instance, a hand-shake is performed -between the 2 nodes. The local |Supvisors| instance: +between the 2 |Supvisors| instances. The local |Supvisors| instance: - * sets the remote node state to ``CHECKING``, + * sets the remote |Supvisors| instance state to ``CHECKING`` ; * performs a couple of XML-RPC to the remote |Supvisors| instance: - + ``supvisors.get_master_address()`` and ``supvisors.get_supvisors_state()`` in order to know if the remote - instance is already in an established state. - + ``supvisors.get_address_info(local_address)`` in order to know how the local node is perceived - by the remote instance. + + ``supvisors.get_master_identifier()`` and ``supvisors.get_supvisors_state()`` in order to know if the remote + instance is already in an established state ; + + ``supvisors.get_instance_info(local_identifier)`` in order to know how the local |Supvisors| instance is + perceived by the remote |Supvisors| instance. At this stage, 2 possibilities: * the local |Supvisors| instance is seen as ``ISOLATED`` by the remote instance: - + the remote node is then reciprocally set to ``ISOLATED``, - + the *URL* of the remote |Supvisors| instance is disconnected from the ``SUBSCRIBE`` PyZMQ_ socket, + + the remote |Supvisors| instance is then reciprocally set to ``ISOLATED`` ; + + the *URL* of the remote |Supvisors| instance is disconnected from the ``SUBSCRIBE`` PyZMQ_ socket ; * the local |Supvisors| instance is NOT seen as ``ISOLATED`` by the remote instance: - + a ``supervisor.getAllProcessInfo()`` XML-RPC is requested to the remote instance, - + the processes information is loaded into the internal data structure, - + the remote node is finally set to ``RUNNING``. + + a ``supervisor.getAllProcessInfo()`` XML-RPC is requested to the remote instance ; + + the processes information is loaded into the internal data structure ; + + the remote |Supvisors| instance is finally set to ``RUNNING``. 
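The hand-shake outcome described above can be modeled as a small state update. This is an illustrative sketch of the two possibilities, not the real implementation:

```python
# Simplified model of the hand-shake outcome (not the real Supvisors code):
# given how the remote instance perceives the local one, decide the state
# assigned to the remote instance.

ISOLATED, RUNNING, CHECKING = 'ISOLATED', 'RUNNING', 'CHECKING'

def handshake(local_seen_as_isolated: bool, instance_states: dict, remote: str):
    """Update instance_states[remote] according to the hand-shake rules."""
    instance_states[remote] = CHECKING
    if local_seen_as_isolated:
        # reciprocal isolation: the remote URL would also be disconnected
        # from the SUBSCRIBE socket at this point
        instance_states[remote] = ISOLATED
    else:
        # here the real system loads the supervisor.getAllProcessInfo() results
        instance_states[remote] = RUNNING
    return instance_states[remote]

states = {'rocky51': 'UNKNOWN'}
result = handshake(False, states, 'rocky51')
```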
When all |Supvisors| instances are identified as ``RUNNING`` or ``ISOLATED``, the synchronization is completed. -|Supvisors| then is able to work with the set (or subset) of nodes declared in ``address_list``. +|Supvisors| is then able to work with the set (or subset) of |Supvisors| instances declared in ``supvisors_list``. However, it may happen that some |Supvisors| instances do not publish (very late starting, no starting at all, system down, network down, etc). Each |Supvisors| instance waits for ``synchro_timeout`` seconds to give a chance to all other instances to publish. When this delay is exceeded, all the |Supvisors| instances that are **not** identified as ``RUNNING`` or ``ISOLATED`` are set to: - * ``SILENT`` if `Auto-Fencing`_ is **not** activated, + * ``SILENT`` if `Auto-Fencing`_ is **not** activated ; * ``ISOLATED`` if `Auto-Fencing`_ is activated. -Another possibility is when it is predictable that some nodes may be started later. For example, the pool of nodes -may include servers that will always be started from the very beginning and consoles that may be started only -on demand. In this case, it would be a pity to always wait for ``synchro_timeout`` seconds. That's why the -``force_synchro_if`` attribute has been introduced so that the synchronization phase is considered completed -when a subset of the nodes declared in ``address_list`` are ``RUNNING``. +Another possibility is when it is predictable that some |Supvisors| instances may be started later. +For example, the pool of nodes may include servers that will always be started from the very beginning and consoles +that may be started only on demand. +In this case, it would be a pity to always wait for ``synchro_timeout`` seconds. That's why the ``core_identifiers`` +attribute has been introduced so that the synchronization phase is considered completed +when a subset of the |Supvisors| instances declared in ``supvisors_list`` is ``RUNNING``.
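The end-of-synchronization decision can be sketched as follows. This is an assumption-level model of the rules above, not the actual Supvisors code:

```python
# Illustrative sketch: synchronization completes when every declared instance
# is RUNNING or ISOLATED, or earlier when all core_identifiers are RUNNING,
# or when synchro_timeout expires (remaining instances then become SILENT or
# ISOLATED depending on auto_fence).

def synchronization_completed(states, core_identifiers, elapsed, synchro_timeout):
    """states: dict of deduced name -> state string."""
    running = {name for name, state in states.items() if state == 'RUNNING'}
    settled = {name for name, state in states.items()
               if state in ('RUNNING', 'ISOLATED')}
    if settled == set(states):
        return True
    if core_identifiers and set(core_identifiers) <= running:
        return True   # the core subset is enough to leave the phase early
    return elapsed >= synchro_timeout
```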
-Whatever the number of available nodes, |Supvisors| elects a *Master* among the active nodes and enters -the ``DEPLOYMENT`` state to start automatically the applications. -By default, the *Master* node is the node having the smallest name among all the active nodes, unless the attribute -``force_synchro_if`` is used. In the latter case, candidates are taken from this list in priority. +Whatever the number of available |Supvisors| instances, |Supvisors| elects a *Master* among the active |Supvisors| +instances and enters the ``DEPLOYMENT`` state to automatically start the applications. +By default, the |Supvisors| *Master* instance is the |Supvisors| instance having the smallest deduced name among all +the active |Supvisors| instances, unless the attribute ``core_identifiers`` is used. In the latter case, candidates +are taken from this list in priority. -.. important:: *About late nodes* - Back to this case, here is what happens when a node is started while the others are already in ``OPERATION``. +.. important:: *About late |Supvisors| instances* - Back to this case, here is what happens when a |Supvisors| instance is started while the others are already in + ``OPERATION``. During the hand-shake, the local |Supvisors| instance gets the *Master* identified by the remote |Supvisors|. - That confirms that the local node is a late starter and thus the local |Supvisors| instance adopts this *Master* - too and skips the synchronization phase. + That confirms that the local |Supvisors| instance is a late starter and thus the local |Supvisors| instance adopts + this *Master* too and skips the synchronization phase. .. _auto_fencing: @@ -90,11 +93,11 @@ It takes place when one of the |Supvisors| instances is seen as inactive (crash, from the other |Supvisors| instances. In this case, the running |Supvisors| instances disconnect the corresponding URL from their subscription socket.
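The Master election rule above can be sketched in a few lines. This is an illustrative model, not the real implementation; the deduced names are examples:

```python
# Sketch of the Master election: the Master is the smallest deduced name
# among the active instances, with the core_identifiers taken as candidates
# in priority when that option is set.

def elect_master(running_identifiers, core_identifiers=()):
    """Return the deduced name of the elected Master, or None."""
    candidates = [name for name in core_identifiers
                  if name in running_identifiers]
    if not candidates:
        candidates = list(running_identifiers)
    return min(candidates) if candidates else None
```

Being deterministic, the same election run on every active instance yields the same Master.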
-The Address is marked as ``ISOLATED`` and, in accordance with the program rules defined, |Supvisors| may restart -somewhere else the processes that were eventually running on that node. +The |Supvisors| instance is marked as ``ISOLATED`` and, in accordance with the program rules defined, +|Supvisors| may restart somewhere else the processes that were possibly running in that |Supvisors| instance. -If the incriminated node restarts, and the |Supvisors| instance is restarted on that system too, the isolation -doesn't prevent the new |Supvisors| instance to receive events from the other instances that have isolated it. +If the incriminated |Supvisors| instance is restarted, the isolation doesn't prevent the new |Supvisors| instance +from receiving events from the other instances that have isolated it. Indeed, it is not possible to filter the subscribers from the ``PUBLISH`` side of a PyZMQ_ socket. That's why the hand-shake is performed in :ref:`synchronizing`. @@ -109,10 +112,10 @@ If the network failure is fixed, both sets of |Supvisors| are still running but .. attention:: |Supvisors| does NOT isolate the nodes at the Operating System level, so that when the incriminated nodes - become active again, it is still possible to perform network requests between all nodes, despite the - |Supvisors| instances do not communicate anymore. + become active again, it is still possible to perform network requests between all nodes, even though the |Supvisors| + instances do not communicate anymore. - Similarly, it is outside the scope of |Supvisors| to isolate the nodes at application level. + Similarly, it is outside the scope of |Supvisors| to isolate the communication at application level. It is the user's responsibility to isolate his applications.
@@ -121,42 +124,35 @@ If the network failure is fixed, both sets of |Supvisors| are still running but Extra Arguments ---------------- -When using |Supervisor|, colleagues have often asked if it would be possible to add extra arguments to the command -line of a program without declaring them in the ini file. Indeed, the applicative context is evolving at runtime -and it may be quite useful to give some information to the new process (options, path, URL of a server, -URL of a display, ...), especially when dealing with distributed applications. +|Supervisor| users have requested the possibility to add extra arguments to the command line of a program without +having to update and reload the program configuration in |Supervisor|. -With |Supervisor|, it is possible to inform the process with a ``supervisor.sendProcessStdin`` XML-RPC. -The first drawback is that it requires to update the source code of an existing program that is already capable of -reading instructions from its command line. That is not always possible. -On the other hand, colleagues found the solution so clumsy that they finally preferred to use a dedicated com -to configure the process. + `#1023 - Pass arguments to program when starting a job? `_ + +Indeed, the applicative context is evolving at runtime and it may be quite useful to give some information +to the new process (options, path, URL of a server, URL of a display, etc), especially when dealing with +distributed applications. |Supvisors| introduces new XML-RPCs that are capable of taking into account extra arguments that are passed to the command line before the process is started: - * ``supvisors.start_args``: start a process on the local system, + * ``supvisors.start_args``: start a process in the local |Supvisors| instance ; * ``supvisors.start_process``: start a process using a starting strategy. -.. 
hint:: - - These additional commands are an answer to the following |Supervisor| request: - - * `#1023 - Pass arguments to program when starting a job? `_ - .. note:: - The extra arguments of the program are shared by all |Supervisor| instances. + The extra arguments of the program are shared by all |Supvisors| instances. Once used, they are published through a |Supvisors| internal event and are stored directly into the |Supervisor| internal configuration of the programs. - In other words, considering 2 nodes A and B, a process that is started on node A with extra arguments and - configured to restart on node crash (refer to `Running Failure strategy`_), if the node A crashes (or simply - becomes unreachable), the process will be restarted on node B with the same extra arguments. + In other words, considering 2 |Supvisors| instances A and B, and a process started in |Supvisors| instance A + with extra arguments and configured to restart on node crash (refer to `Running Failure strategy`_), + if the |Supvisors| instance A crashes (or simply becomes unreachable), the process will be restarted in the + |Supvisors| instance B with the same extra arguments. .. attention:: - A limitation however: the extra arguments are reset each time a new node connects to the other ones, + A limitation however: the extra arguments are reset each time a new |Supvisors| instance connects to the other ones, either because it has started later or because it has been disconnected for a while due to a network issue. @@ -166,40 +162,50 @@ Starting strategy ----------------- |Supvisors| provides a means to start a process without telling explicitly where it has to be started, -and in accordance with the rules defined for this program, i.e. the ``address_list``. +and in accordance with the rules defined for this program.
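The ``supvisors.start_args`` XML-RPC described above can be reached with the standard Python ``xmlrpc.client`` module. The host, port, group and process names below are hypothetical, and the exact signature should be checked against the Supvisors XML-RPC API reference:

```python
# Hypothetical usage sketch of the supvisors.start_args XML-RPC: the URL and
# the group/process names are made up, and the call signature should be
# verified against the Supvisors XML-RPC API documentation.
from xmlrpc.client import ServerProxy

def namespec(group: str, process: str) -> str:
    """Build a Supervisor namespec such as 'my_app:my_process'."""
    return f'{group}:{process}'

def start_with_extra_args(url: str, group: str, process: str, extra_args: str):
    """Start a process in the local Supvisors instance, appending extra_args
    to its command line (assumed signature)."""
    proxy = ServerProxy(url)
    return proxy.supvisors.start_args(namespec(group, process), extra_args)
```

A call such as ``start_with_extra_args('http://localhost:60000/RPC2', 'my_app', 'my_process', '--url http://srv')`` would then target the inet HTTP server of the local |Supervisor|.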
-Choosing a node -~~~~~~~~~~~~~~~ +Choosing a |Supvisors| instance +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The following rules are applicable whatever the chosen strategy: * the process must not be already in a *running* state in a broad sense, i.e. ``RUNNING``, ``STARTING`` or ``BACKOFF`` ; - * the program definition must be known to the node ; - * the node must be ``RUNNING`` ; - * the *load* of the node must not exceed 100% when adding the ``expected_loading`` of the program to be started. + * the process must be known to the |Supervisor| of the |Supvisors| instance ; + * the |Supvisors| instance must be ``RUNNING`` ; + * the |Supvisors| instance must be allowed in the ``identifiers`` rule of the process ; + * the *load* of the |Supvisors| instance must not exceed 100% when adding the ``expected_loading`` of the program + to be started. -The *load* of the chosen node is defined as the sum of the ``expected_loading`` of each process running on this node. +The *load* of the chosen |Supvisors| instance is defined as the sum of the ``expected_loading`` of each process running +in this |Supvisors| instance. -When applying the ``CONFIG`` strategy, |Supvisors| chooses the first node available in the ``address_list``. +When applying the ``CONFIG`` strategy, |Supvisors| chooses the first |Supvisors| instance available in the +``supvisors_list``. -When applying the ``LESS_LOADED`` strategy, |Supvisors| chooses the node in the ``address_list`` having the lowest -*load*. The aim is to distribute the process load among the available nodes. +When applying the ``LESS_LOADED`` strategy, |Supvisors| chooses the |Supvisors| instance in the ``supvisors_list`` +having the lowest *load*. +The aim is to distribute the process load among the available |Supvisors| instances. -When applying the ``MOST_LOADED`` strategy, with respect of the common rules, |Supvisors| chooses the node in -the ``address_list`` having the greatest *load*. 
-The aim is to maximize the loading of a node before starting to load another node. +When applying the ``MOST_LOADED`` strategy, |Supvisors| chooses the |Supvisors| instance in the ``supvisors_list`` +having the greatest *load*. +The aim is to maximize the loading of a |Supvisors| instance before starting to load another |Supvisors| instance. This strategy is more interesting when the resources are limited. -When applying the ``LOCAL`` strategy, |Supvisors| chooses the local node provided that it is compliant -with the ``address_list``. A typical use case is to start an HCI application on a given console, -while other applications / services may be distributed over other nodes. +When applying the ``LOCAL`` strategy, |Supvisors| chooses the local |Supvisors| instance. +A typical use case is to start an HCI application on a given console, while other applications / services may be +distributed over other nodes. .. attention:: A consequence of choosing the ``LOCAL`` strategy as the default ``starting_strategy`` - in the :ref:`supvisors_section` is that all programs will be started on the *Master* node. + in the :ref:`supvisors_section` is that all programs will be started on the |Supvisors| *Master* instance. + +.. attention:: + + When multiple |Supvisors| instances are running on the same node, it would be relevant to consider the *load* + of the node. This may be considered as a future enhancement in |Supvisors|. 
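The instance choice described in this section can be sketched as follows. This is an illustrative model; the field names and data shapes are assumptions, not the real Supvisors structures:

```python
# Illustrative model of choosing a Supvisors instance: filter out instances
# violating the common rules, then apply the CONFIG / LESS_LOADED /
# MOST_LOADED / LOCAL strategy.

def eligible(instances, identifiers_rule, expected_loading):
    """instances: dict name -> {'state': str, 'load': int, 'known': bool}."""
    return [name for name, info in instances.items()
            if info['state'] == 'RUNNING'
            and info['known']                    # program known to its Supervisor
            and name in identifiers_rule         # allowed by the identifiers rule
            and info['load'] + expected_loading <= 100]

def choose_instance(strategy, supvisors_list, instances, identifiers_rule,
                    expected_loading, local_identifier):
    names = eligible(instances, identifiers_rule, expected_loading)
    # keep the declaration order of supvisors_list for the CONFIG strategy
    ordered = [name for name in supvisors_list if name in names]
    if not ordered:
        return None
    if strategy == 'CONFIG':
        return ordered[0]
    if strategy == 'LESS_LOADED':
        return min(ordered, key=lambda name: instances[name]['load'])
    if strategy == 'MOST_LOADED':
        return max(ordered, key=lambda name: instances[name]['load'])
    if strategy == 'LOCAL':
        return local_identifier if local_identifier in ordered else None
    raise ValueError(strategy)
```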
Starting a process @@ -207,32 +213,38 @@ Starting a process The internal *Starter* of |Supvisors| applies the following logic to start a process: -| if process state is not ``RUNNING``: -| choose a starting node for the program in accordance with the rules defined above -| perform a ``supvisors.start_args(namespec)`` XML-RPC to the |Supvisors| instance running on the chosen node +| if the process is stopped: +| choose a |Supvisors| instance for the process in accordance with the rules defined in the previous section +| perform a ``supvisors.start_args(namespec)`` XML-RPC to the chosen |Supvisors| instance | This single job is considered completed when: - * a ``RUNNING`` event is received and the ``wait_exit`` rule is **not** set for this process, - * an ``EXITED`` event with an expected exit code is received and the ``wait_exit`` rule is set for this process, - * an error is encountered (``FATAL`` event, ``EXITED`` event with an unexpected exit code), - * no ``STARTING`` event has still been received 10 seconds after the XML-RPC. + * a ``RUNNING`` event is received and the ``wait_exit`` rule is **not** set for this process ; + * an ``EXITED`` event is received with an expected exit code and the ``wait_exit`` rule is set for this process ; + * an error is encountered (``FATAL`` event, ``EXITED`` event with an unexpected exit code) ; + * no ``STARTING`` event has been received 3 seconds after the XML-RPC ; + * no ``RUNNING`` event has been received X+3 seconds after the XML-RPC, X corresponding to the ``startsecs`` of the + program definition in the |Supvisors| instance where the process has been requested to start. This principle is used for starting a single process using a ``supvisors.start_process`` XML-RPC. +.. attention:: *About using the wait_exit rule* + + If the process is expected to exit and does not exit, it will block the *Starter* until |Supvisors| is restarted. 
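The completion conditions of a single start job can be modeled as a pure function of the observed events. This is an illustrative sketch of the rules above, not the actual *Starter* code:

```python
# Sketch of the Starter's completion check for one start job. events is the
# list of (name, payload) tuples observed since the start_args XML-RPC; the
# handling of an expected EXITED without wait_exit is an assumption.

def job_outcome(events, wait_exit, elapsed, startsecs):
    """Return 'completed', 'failed' or 'pending'."""
    names = [name for name, _ in events]
    if 'FATAL' in names:
        return 'failed'
    for name, payload in events:
        if name == 'EXITED':
            # an expected exit code completes the job only when wait_exit is set
            return 'completed' if wait_exit and payload.get('expected') else 'failed'
    if 'RUNNING' in names and not wait_exit:
        return 'completed'
    if 'STARTING' not in names and elapsed > 3:
        return 'failed'          # no STARTING event within 3 seconds
    if 'RUNNING' not in names and elapsed > startsecs + 3:
        return 'failed'          # no RUNNING event within startsecs + 3 seconds
    return 'pending'
```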
+ Starting an application ~~~~~~~~~~~~~~~~~~~~~~~ -The application start sequence is re-evaluated every time a new node becomes active in |Supvisors|. Indeed, as -explained above, the internal data structure is updated with the programs configured in the |Supervisor| instance -of the new node and this new data may have an impact on the application start sequence. +The application start sequence is re-evaluated every time a new |Supvisors| instance becomes active in |Supvisors|. +Indeed, as explained above, the internal data structure is updated with the programs configured in the new |Supervisor| +instance and this may have an impact on the application start sequence. The start sequence corresponds to a dictionary where: - * the keys correspond to the list of ``start_sequence`` values defined in the program rules of the application, - * the value associated to a key is the list of programs having this key as ``start_sequence``. + * the keys correspond to the list of ``start_sequence`` values defined in the program rules of the application ; + * the value associated to a key contains the list of programs having this key as ``start_sequence``. .. hint:: @@ -271,39 +283,38 @@ the ``start_sequence`` rule configured for the applications and processes. The global start sequence corresponds to a dictionary where: - * the keys correspond to the list of ``start_sequence`` values defined in the application rules, + * the keys correspond to the list of ``start_sequence`` values defined in the application rules ; * the value associated to a key is the list of application start sequences whose applications have this key as ``start_sequence``. -The |Supvisors| *Master* instance uses the global start sequence to start the applications in the defined order. +The |Supvisors| *Master* instance starts the applications using the global start sequence. 
The following pseudo-code explains the logic used: | while global start sequence is not empty: -| pop the application start sequences having the lower (strictly positive) ``start_sequence`` -| -| while application start sequences are not empty: +| pop the application list having the lowest (strictly positive) ``start_sequence`` | -| for each sequence in application start sequences: -| pop the process list having the lower (strictly positive) ``start_sequence`` +| for each application in application list: +| apply `Starting an application`_ | -| for each process in process list: -| apply `Starting a process`_ -| -| wait for the jobs to complete +| wait for the jobs to complete | .. note:: The applications having a ``start_sequence`` lower or equal to 0 are not considered, as they are not meant to be - autostarted. + automatically started. .. important:: When leaving the ``DEPLOYMENT`` state, it may happen that some applications are not started properly - due to missing nodes. When a node is started later and is authorized in the |Supvisors| ensemble, - |Supvisors| transitions back to the ``DEPLOYMENT`` state to repair such applications. - May the new node arrive during a ``DEPLOYMENT`` or ``CONCILIATION`` phase, the transition to the ``DEPLOYMENT`` - state is deferred until the current deployment or conciliation jobs are completed. + due to missing relevant |Supvisors| instances. + + When a |Supvisors| instance is started later and is authorized in the |Supvisors| ensemble, |Supvisors| transitions + back to the ``DEPLOYMENT`` state and tries to **repair** such applications. The applications are **not** restarted. + Only the stopped processes are considered. + + Should the new |Supvisors| instance arrive during a ``DEPLOYMENT`` or ``CONCILIATION`` phase, the transition to the + ``DEPLOYMENT`` state is deferred until the current deployment or conciliation jobs are completed.
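The pseudo-code above boils down to iterating the strictly positive keys of the global start sequence in increasing order. A minimal sketch, using hypothetical application names:

```python
# Illustrative model of the global start sequence: a dict mapping a
# start_sequence value to the list of applications having that value.
# Applications with a start_sequence lower than or equal to 0 are skipped,
# as they are not meant to be automatically started.

def start_batches(global_start_sequence):
    """Yield the application batches in starting order."""
    for seq in sorted(key for key in global_start_sequence if key > 0):
        yield global_start_sequence[seq]
        # the real system waits for the jobs to complete between batches

batches = list(start_batches({0: ['not_autostarted'], 2: ['hci'], 1: ['services']}))
```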
It has been chosen NOT to transition back to the ``INITIALIZATION`` state to avoid a new synchronization phase. @@ -312,18 +323,22 @@ Starting Failure strategy ------------------------- -When an application is starting, it may happen that any of its programs cannot be started due to various reasons -(the program command line is wrong ; third parties are missing ; none of the nodes defined in the ``address_list`` -of the program rules are started ; the applicable nodes are already too much loaded ; etc). +When an application is starting, it may happen that some of its programs cannot be started for various reasons: + + * the program command line is wrong ; + * third parties are missing ; + * none of the |Supvisors| instances defined in the ``identifiers`` of the program rules are started ; + * the applicable |Supvisors| instances are already too heavily loaded ; + * etc. |Supvisors| uses the ``starting_failure_strategy`` option of the rules file to determine the behavior to apply -when a ``required`` program cannot be started. Program having the ``required`` set to False are not considered as +when a ``required`` process cannot be started. Programs having the ``required`` set to False are not considered as their absence is minor by definition. Possible values are: - * ``ABORT``: Abort the application starting. - * ``STOP``: Stop the application. + * ``ABORT``: Abort the application starting ; + * ``STOP``: Stop the application ; * ``CONTINUE``: Skip the failure and continue the application starting. @@ -338,22 +353,23 @@ the other |Supervisor| instances cannot do anything about that. |Supvisors| uses the ``running_failure_strategy`` option of the rules file to warm restart a process that was running on a node that has crashed, in accordance with the default ``starting_strategy`` set in the -:ref:`supvisors_section` and with the ``address_list`` program rules set in the :ref:`rules_file`.
+:ref:`supvisors_section` and with the ``supvisors_list`` program rules set in the :ref:`rules_file`. This option can be also used to stop or restart the whole application after a process crash. Indeed, it may happen -that some applications cannot survive if one of their programs is just restarted. +that some applications cannot survive if one of their processes is just restarted. Possible values are: - * ``CONTINUE``: Skip the failure. The application keeps running. - * ``RESTART_PROCESS``: Restart the lost process on another node. - * ``STOP_APPLICATION``: Stop the application. + * ``CONTINUE``: Skip the failure and the application keeps running ; + * ``RESTART_PROCESS``: Restart the lost process on another |Supvisors| instance ; + * ``STOP_APPLICATION``: Stop the application ; * ``RESTART_APPLICATION``: Restart the application. .. important:: - The ``RESTART_PROCESS`` is NOT intended to replace the |Supervisor| ``autorestart`` on the local node. - Provided a program definition where ``autorestart`` is set to ``false`` in the |Supervisor| configuration file + The ``RESTART_PROCESS`` is NOT intended to replace the |Supervisor| ``autorestart`` for the local |Supvisors| + instance. + Provided a program definition where ``autorestart`` is set to ``false`` in the |Supervisor| configuration and where the ``running_failure_strategy`` option is set to ``RESTART_PROCESS`` in the |Supvisors| rules file, if the process crashes, |Supvisors| will NOT restart the process. 
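The ``running_failure_strategy`` values can be sketched as a simple dispatch. This is an illustrative model of the behavior described above; the callback-based design is an assumption, not the real Supvisors code:

```python
# Illustrative dispatch of the running_failure_strategy values upon the loss
# of a Supvisors instance (not the actual implementation).

def on_running_failure(strategy, process, application, starter, stopper):
    if strategy == 'CONTINUE':
        return                      # skip the failure, the application keeps running
    if strategy == 'RESTART_PROCESS':
        starter(process)            # restart the lost process elsewhere
    elif strategy == 'STOP_APPLICATION':
        stopper(application)
    elif strategy == 'RESTART_APPLICATION':
        stopper(application)
        starter(application)
    else:
        raise ValueError(strategy)

actions = []
on_running_failure('RESTART_APPLICATION', 'proc', 'app',
                   lambda target: actions.append(('start', target)),
                   lambda target: actions.append(('stop', target)))
```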
@@ -390,15 +406,17 @@ Stopping a process The internal *Stopper* of |Supvisors| applies the following logic to stop a process: -| if process state is ``RUNNING``: -| perform a ``supervisor.stopProcess(namespec)`` XML-RPC to the |Supervisor| instance where the process is running +| if the process is running: +| perform a ``supervisor.stopProcess(namespec)`` XML-RPC to the |Supervisor| instances where the process is running | This single job is considered completed when: - * a ``STOPPED`` event is received for this process, - * an error is encountered (``FATAL`` event, ``EXITED`` event whatever the exit code), - * no ``STOPPING`` event has still been received 10 seconds after the XML-RPC. + * a ``STOPPED`` event is received for this process ; + * an error is encountered (``FATAL`` event, ``EXITED`` event whatever the exit code) ; + * no ``STOPPING`` event has been received 3 seconds after the XML-RPC ; + * no ``STOPPED`` event has been received X+3 seconds after the XML-RPC, X corresponding to the ``stopwaitsecs`` + of the program definition in the |Supvisors| instance where the process has been requested to stop. This principle is used for stopping a single process using a ``supvisors.stop_process`` XML-RPC. @@ -409,7 +427,7 @@ Stopping an application The application stop sequence is defined at the same moment than the application start sequence. It corresponds to a dictionary where: - * the keys correspond to the list of ``stop_sequence`` values defined in the program rules of the application, + * the keys correspond to the list of ``stop_sequence`` values defined in the program rules of the application ; * the value associated to a key is the list of programs having this key as ``stop_sequence``. .. note:: @@ -454,7 +472,7 @@ using the ``stop_sequence`` rule configured for the applications and processes. 
The global stop sequence corresponds to a dictionary where: - * the keys correspond to the list of ``stop_sequence`` values defined in the application rules, + * the keys correspond to the list of ``stop_sequence`` values defined in the application rules ; * the value associated to a key is the list of application stop sequences whose applications have this key as ``stop_sequence``. @@ -463,39 +481,37 @@ the global stop sequence to stop all the running applications in the defined ord The following pseudo-code explains the logic used: | while global stop sequence is not empty: -| pop the application stop sequences having the greater ``stop_sequence`` +| pop the application list having the greatest ``stop_sequence`` | -| while application stop sequences are not empty: +| for each application in application list: +| apply `Stopping an application`_ | -| for each sequence in application stop sequences: -| pop the process list having the lower ``stop_sequence`` -| -| for each process in process list: -| apply `Stopping a process`_ -| -| wait for the jobs to complete +| wait for the jobs to complete | - .. _conciliation: Conciliation ------------ |Supvisors| is designed so that there should be only one instance of the same process running on a set of nodes, although all of them may have the capability to start it. Nevertheless, it is still likely to happen in a few cases: - * using a request to |Supervisor| itself (through Web UI, :program:`supervisorctl`, XML-RPC), + * using a request to |Supervisor| itself (through Web UI, :program:`supervisorctl`, XML-RPC) ; * upon a network failure. .. attention:: - In the case of a network failure, as described in :ref:`auto_fencing`, and if the ``auto_fence`` option is not set, - the Address is set to ``SILENT`` instead of ``ISOLATED`` and its URL is not disconnected from the subscriber socket.
+ In the event of a network failure (a network cable is unplugged, say), if the ``auto_fence`` option is not + set, a |Supvisors| instance running on the isolated node will be set to ``SILENT`` instead of ``ISOLATED`` and its + URL will not be disconnected from the subscriber socket. + + Depending on the rules set, this situation may lead |Supvisors| to warm restart the processes that were running in + the lost |Supvisors| instance onto other |Supvisors| instances. - When the network failure is fixed, |Supvisors| has likely to deal with a duplicated list of applications + When the network failure is fixed, |Supvisors| will likely have to deal with a bunch of duplicated applications and processes. When such a conflict is detected, |Supvisors| enters the ``CONCILIATION`` state. @@ -513,7 +529,7 @@ of all duplicates: ``USER`` - That's the easy one. When applying the ``USER`` strategy, |Supvisors| just waits for an user application + That's the easy one. When applying the ``USER`` strategy, |Supvisors| just waits for a third party to solve the conflicts using Web UI, :program:`supervisorctl`, XML-RPC, process signals, or any other solution.
``STOP`` diff --git a/docs/supervisorctl.rst b/docs/supervisorctl.rst index 876989eb..55d68cb7 100644 --- a/docs/supervisorctl.rst +++ b/docs/supervisorctl.rst @@ -28,12 +28,13 @@ The additional commands provided by |Supvisors| are available by typing :command supvisors commands (type help <topic>): ======================================= - address_status loglevel sshutdown start_process_args - application_info master sstate stop_application - application_rules process_rules sstatus stop_process - conciliate restart_application start_application strategies - conflicts restart_process start_args sversion - local_status sreload start_process update_numprocs + address_status loglevel sshutdown stop_application + application_info master sstate stop_process + application_rules process_rules sstatus strategies + conciliate restart_application start_application sversion + conflicts restart_process start_args update_numprocs + instance_status restart_sequence start_process + local_status sreload start_process_args .. _extended_status: @@ -50,23 +51,38 @@ Status ``master`` - Get the |Supvisors| master address. + Get the deduced name of the |Supvisors| *Master* instance. ``strategies`` Get the strategies applied in |Supvisors|. +``instance_status`` + + Get the status of all |Supvisors| instances. + +``instance_status identifier`` + + Get the status of the |Supvisors| instance identified by its deduced name. + +``instance_status identifier1 identifier2`` + + Get the status for multiple |Supvisors| instances identified by their deduced name. + ``address_status`` - Get the status of all |Supervisor| instances managed in |Supvisors|. + *DEPRECATED* Get the status of all |Supvisors| instances. + This command will be removed in the next version. ``address_status addr`` - Get the status of the |Supervisor| instance managed in |Supvisors| and running on addr. + *DEPRECATED* Get the status of the |Supvisors| instance identified by its deduced name.
+ This command will be removed in the next version. ``address_status addr1 addr2`` - Get the status for multiple addresses. + *DEPRECATED* Get the status for multiple |Supvisors| instances identified by their deduced name. + This command will be removed in the next version. ``application_info`` @@ -152,7 +168,7 @@ Status ``loglevel level`` - Change the level of |Supvisors| logger. + Change the level of the |Supvisors| logger. ``conciliate strategy`` @@ -165,11 +181,11 @@ Status ``sreload`` - Restart |Supvisors| through all |Supervisor| instances. + Restart all |Supvisors| instances. ``sshutdown`` - Shutdown |Supvisors| through all |Supervisor| instances. + Shut down all |Supvisors| instances. .. _application_control: @@ -234,7 +250,7 @@ Process Control ``start_args proc arg_list`` - Start the process named proc on the local node and with the additional arguments arg_list passed + Start the process named proc in the local |Supvisors| instance, with the additional arguments arg_list passed to the command line. ``start_process_args strategy proc arg_list`` diff --git a/docs/xml_rpc.rst b/docs/xml_rpc.rst index 79956d24..2fe572e2 100644 --- a/docs/xml_rpc.rst +++ b/docs/xml_rpc.rst @@ -9,8 +9,7 @@ Detailed information can be found in the The ``supvisors`` namespace has been added to the :program:`supervisord` XML-RPC interface. -The XML-RPC :command:`system.listMethods` now provides the list of methods supported for both |Supervisor| and -|Supvisors|. +The XML-RPC :command:`system.listMethods` provides the list of methods supported for both |Supervisor| and |Supvisors|. .. code-block:: python @@ -68,8 +67,9 @@ Status ================== ========= =========== Key Type Description ================== ========= =========== - 'address_name' ``str`` *DEPRECATED* The |Supvisors| instance node. - 'node_name' ``str`` The |Supvisors| instance node. + 'address_name' ``str`` *DEPRECATED* The deduced name of the |Supvisors| instance.
+ This entry will be removed in the next version. + 'identifier' ``str`` The deduced name of the |Supvisors| instance. 'statecode' ``int`` The |Supvisors| instance state, in [0;5]. 'statename' ``str`` The |Supvisors| instance state as string, in [``'UNKNOWN'``, ``'CHECKING'``, ``'RUNNING'``, ``'SILENT'``, ``'ISOLATING'``, ``'ISOLATED'``]. @@ -114,8 +114,11 @@ Status ``'UNKNOWN'``]. 'expected_exit' ``bool`` A status telling if the process has exited expectedly. 'last_event_time' ``int`` The timestamp of the last event received for this process. - 'addresses' ``list(str)`` *DEPRECATED* The list of all nodes where the process is running. - 'nodes' ``list(str)`` The list of all nodes where the process is running. + 'addresses' ``list(str)`` *DEPRECATED* The deduced names of all |Supvisors| instances where the + process is running. + This entry will be removed in the next version. + 'identifiers' ``list(str)`` The deduced names of all |Supvisors| instances where the process is + running. 'extra_args' ``str`` The extra arguments used in the command line of the process. ================== =============== =========== @@ -127,7 +130,7 @@ Status .. note:: - If there is more than one element in the 'addresses' list, a conflict is in progress. + If there is more than one element in the 'identifiers' list, a conflict is in progress. .. automethod:: get_all_process_info() @@ -143,8 +146,8 @@ Status 'start' ``int`` The Process start date. 'now' ``int`` The Process current date. 'pid' ``int`` The UNIX process identifier. - 'startsecs' ``int`` The duration between process STARTING and RUNNING. - 'stopwaitsecs' ``int`` The duration between process STOPPING and STOPPED. + 'startsecs' ``int`` The configured duration between process STARTING and RUNNING. + 'stopwaitsecs' ``int`` The configured duration between process STOPPING and STOPPED. 'extra_args' ``str`` The extra arguments used in the command line of the process.
================== =============== =========== @@ -160,12 +163,13 @@ Status 'managed' ``bool`` The Application managed status in |Supvisors|. When ``False``, the following attributes are not provided. 'distributed' ``bool`` The Application distribution status in |Supvisors|. - 'addresses' ``list(str)`` *DEPRECATED* The list of all nodes where the non-distributed - application processes can be started, provided only if - ``distributed`` is ``False``. - 'nodes' ``list(str)`` The list of all nodes where the non-distributed application - processes can be started, provided only if ``distributed`` - is ``False``. + 'addresses' ``list(str)`` *DEPRECATED* The deduced names of all |Supvisors| instances + where the non-distributed application processes can be started, + provided only if ``distributed`` is ``False``. + This entry will be removed in the next version. + 'identifiers' ``list(str)`` The deduced names of all |Supvisors| instances where the + non-distributed application processes can be started, provided + only if ``distributed`` is ``False``. 'start_sequence' ``int`` The Application starting rank when starting all applications, in [0;127]. 'stop_sequence' ``int`` The Application stopping rank when stopping all applications, @@ -187,9 +191,11 @@ Status ========================== =============== =========== 'application_name' ``str`` The Application name the process belongs to. 'process_name' ``str`` The Process name. - 'addresses' ``list(str)`` *DEPRECATED* The list of all nodes where the process can be - started. - 'nodes' ``list(str)`` The list of all nodes where the process can be started. + 'addresses' ``list(str)`` *DEPRECATED* The deduced names of all |Supvisors| instances + where the process can be started. + This entry will be removed in the next version. + 'identifiers' ``list(str)`` The deduced names of all |Supvisors| instances where the process + can be started. 
'start_sequence' ``int`` The Process starting rank when starting the related application, in [0;127]. 'stop_sequence' ``int`` The Process stopping rank when stopping the related application, @@ -280,7 +286,7 @@ The parameter requires a dictionary with the following variables set: * ``SUPERVISOR_PASSWORD``: the password for the HTTP authentication (may be void). If the Python client has been spawned by Supervisor, the environment already contains these parameters but they are -configured to communicate with the local |Supervisor| instance. If the Python client has to communicate with a distant +configured to communicate with the local |Supervisor| instance. If the Python client has to communicate with another |Supervisor| instance, the parameters must be set accordingly. .. code-block:: python diff --git a/setup.py b/setup.py index 7c748357..e5707a1f 100644 --- a/setup.py +++ b/setup.py @@ -53,6 +53,7 @@ "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Programming Language :: Python :: 3.8", + "Programming Language :: Python :: 3.9", "Topic :: System :: Boot", "Topic :: System :: Monitoring", "Topic :: System :: Software Distribution" @@ -61,31 +62,30 @@ version_txt = os.path.join(here, 'supvisors/version.txt') supvisors_version = open(version_txt).read().split('=')[1].strip() -dist = setup( - name='supvisors', - version=supvisors_version, - description="A Control System for Distributed Applications", - long_description=README + '\n\n' + CHANGES, - long_description_content_type='text/markdown', - classifiers=CLASSIFIERS, - author="Julien Le Cléach", - author_email="julien.6387.dev@gmail.com", - url="https://github.com/julien6387/supvisors", - download_url='https://github.com/julien6387/supvisors/archive/%s.tar.gz' % supvisors_version, - platforms=[ - "CentOS 8.3" - ], - packages=find_packages(), - install_requires=requires, - extras_require={'ip_address': ip_require, - 'statistics': statistics_require, - 'xml_valid': 
xml_valid_require, - 'all': ip_require + statistics_require + xml_valid_require, - 'testing': testing_extras}, - include_package_data=True, - zip_safe=False, - namespace_packages=['supvisors'], - test_suite="supvisors.tests", - entry_points={'console_scripts': ['supvisorsctl = supvisors.supvisorsctl:main', - 'supvisors_breed = supvisors.tools.breed:main']} -) +setup(name='supvisors', + version=supvisors_version, + description="A Control System for Distributed Applications", + long_description=README + '\n\n' + CHANGES, + long_description_content_type='text/markdown', + classifiers=CLASSIFIERS, + author="Julien Le Cléach", + author_email="julien.6387.dev@gmail.com", + url="https://github.com/julien6387/supvisors", + download_url='https://github.com/julien6387/supvisors/archive/%s.tar.gz' % supvisors_version, + platforms=[ + "CentOS 8.3" + ], + packages=find_packages(), + install_requires=requires, + extras_require={'ip_address': ip_require, + 'statistics': statistics_require, + 'xml_valid': xml_valid_require, + 'all': ip_require + statistics_require + xml_valid_require, + 'testing': testing_extras}, + include_package_data=True, + zip_safe=False, + namespace_packages=['supvisors'], + test_suite="supvisors.tests", + entry_points={'console_scripts': ['supvisorsctl = supvisors.supvisorsctl:main', + 'supvisors_breed = supvisors.tools.breed:main']} + ) diff --git a/supvisors/commander.py b/supvisors/commander.py index b28e635f..c10718d2 100644 --- a/supvisors/commander.py +++ b/supvisors/commander.py @@ -39,7 +39,7 @@ class ProcessCommand(object): - identifiers: the identifiers of the Supvisors instances where the commands are requested ; - request_time: the date when the command is requested. """ - TIMEOUT = 5 + TIMEOUT = 3 def __init__(self, process: ProcessStatus) -> None: """ Initialization of the attributes. 
@@ -47,6 +47,7 @@ def __init__(self, process: ProcessStatus) -> None: :param process: the process status to wrap """ self.process: ProcessStatus = process + self.logger: Logger = process.logger self.identifiers: NameList = [] self.request_time: int = 0 @@ -117,7 +118,10 @@ def timed_out(self, now: float) -> bool: local_state = local_info['state'] if local_state in [ProcessStates.BACKOFF, ProcessStates.STARTING]: # the RUNNING state is expected after startsecs seconds - if self.request_time + local_info['startsecs'] + ProcessCommand.TIMEOUT < now: + delay = local_info['startsecs'] + ProcessCommand.TIMEOUT + if self.request_time + delay < now: + self.logger.error(f'ProcessStartCommand.timed_out: {self.process.namespec}' + f' still not RUNNING after {delay} seconds so abort') return True elif local_state == ProcessStates.RUNNING: # if the evaluation is done in this state, the EXITED state must be expected @@ -125,9 +129,11 @@ def timed_out(self, now: float) -> bool: pass else: # from STOPPED_STATES, STARTING or BACKOFF event is expected quite immediately - # a STOPPING state is unexpected - # an external request may have been performed (e.g. stop request while in STARTING state) + # a STOPPING state is unexpected unless an external request has been performed (e.g. 
stop request while + # in STARTING state) if self.request_time + ProcessCommand.TIMEOUT < now: + self.logger.error(f'ProcessStartCommand.timed_out: {self.process.namespec}' + f' still not STARTING or BACKOFF after {ProcessCommand.TIMEOUT} seconds so abort') return True return False @@ -155,12 +161,17 @@ def timed_out(self, now: float) -> bool: local_state = local_info['state'] if local_state == ProcessStates.STOPPING: # the STOPPED state is expected after stopwaitsecs seconds - if self.request_time + local_info['stopwaitsecs'] + ProcessCommand.TIMEOUT < now: + delay = local_info['stopwaitsecs'] + ProcessCommand.TIMEOUT + if self.request_time + delay < now: + self.logger.error(f'ProcessStopCommand.timed_out: {self.process.namespec}' + f' still not STOPPED after {delay} seconds so abort') return True else: # from RUNNING_STATES, STOPPING event is expected quite immediately - # STOPPED_STATES are unexpected. this wrapper should have been removed + # STOPPED_STATES are unexpected because this wrapper would have been removed if self.request_time + ProcessCommand.TIMEOUT < now: + self.logger.error(f'ProcessStopCommand.timed_out: {self.process.namespec}' + f' still not STOPPING after {ProcessCommand.TIMEOUT} seconds so abort') return True return False @@ -286,7 +297,11 @@ def next(self) -> None: self.logger.debug(f'ApplicationJobs.next: application_name={self.application_name}' f' - next sequence={sequence_number} group={group}') # trigger application jobs - self.current_jobs = [command for command in group if self.process_job(command)] + # do NOT use a list comprehension as pending requests will not be considered in instance load + # process the jobs one by one and insert them in current_jobs asap + for command in group: + if self.process_job(command): + self.current_jobs.append(command) self.logger.trace(f'ApplicationJobs.next: current_jobs={self.current_jobs}') # recursive call in the event where there's already nothing left to do self.next() @@ -321,8 +336,6 @@ def 
check(self) -> None: for command in list(self.current_jobs): # get the ProcessStatus method corresponding to condition and call it if command.timed_out(now): - self.logger.error(f'ApplicationJobs.check_current_jobs: {command.process.namespec}' - f' still not acknowledged after {ProcessCommand.TIMEOUT} seconds so abort') # generate a process event for this process to inform all Supvisors instances reason = f'no process event received in time' self.supvisors.listener.force_process_state(command.process.namespec, self.failure_state, reason) diff --git a/supvisors/context.py b/supvisors/context.py index 3ae73ec6..38fce830 100644 --- a/supvisors/context.py +++ b/supvisors/context.py @@ -93,9 +93,9 @@ def running_core_identifiers(self) -> bool: :return: True if all core SupvisorsInstanceStatus are in RUNNING state """ - if self.supvisors.options.force_synchro_if: + if self.supvisors.supvisors_mapper.core_identifiers: identifiers = self.running_identifiers() - return all(identifier in identifiers for identifier in self.supvisors.options.force_synchro_if) + return all(identifier in identifiers for identifier in self.supvisors.supvisors_mapper.core_identifiers) def isolating_instances(self) -> NameList: """ Return the identifiers of the Supervisor instances in ISOLATING state. 
""" diff --git a/supvisors/initializer.py b/supvisors/initializer.py index 986bab77..5c8eb1f3 100644 --- a/supvisors/initializer.py +++ b/supvisors/initializer.py @@ -61,7 +61,7 @@ def __init__(self, supervisor: Supervisor, **config) -> None: # get declared Supvisors instances and check local identifier self.supvisors_mapper = SupvisorsMapper(self) try: - self.supvisors_mapper.configure(self.options.supvisors_list) + self.supvisors_mapper.configure(self.options.supvisors_list, self.options.core_identifiers) except ValueError as exc: self.logger.critical(f'Supvisors: {exc}') raise RPCError(Faults.SUPVISORS_CONF_ERROR, str(exc)) diff --git a/supvisors/options.py b/supvisors/options.py index fdefdcd6..194bfb80 100644 --- a/supvisors/options.py +++ b/supvisors/options.py @@ -44,7 +44,7 @@ class SupvisorsOptions(object): - event_port: port number used to publish all Supvisors events, - auto_fence: when True, Supvisors won't try to reconnect to a Supvisors instance that has been inactive, - synchro_timeout: time in seconds that Supvisors waits for all expected Supvisors instances to publish, - - force_synchro_if: subset of supvisors_list that will force the end of synchro when all RUNNING, + - core_identifiers: subset of supvisors_list identifiers that will force the end of synchro when all RUNNING, - conciliation_strategy: strategy used to solve conflicts when Supvisors has detected multiple running instances of the same program, - starting_strategy: strategy used to start processes on Supvisors instances, @@ -82,8 +82,12 @@ def __init__(self, supervisord, **config): self.event_port = self.to_port_num(config.get('event_port', '0')) self.auto_fence = boolean(config.get('auto_fence', 'false')) self.synchro_timeout = self.to_timeout(config.get('synchro_timeout', str(self.SYNCHRO_TIMEOUT_MIN))) - self.force_synchro_if = filter(None, list_of_strings(config.get('force_synchro_if', None))) - self.force_synchro_if = {node for node in self.force_synchro_if if node in 
self.supvisors_list} + # get the minimum list of identifiers to end the synchronization phase + if 'force_synchro_if' in config: + print('SupvisorsOptions: force_synchro_if is DEPRECATED. please use core_identifiers') + core_identifiers = config.get('core_identifiers', config.get('force_synchro_if', None)) + self.core_identifiers = set(filter(None, list_of_strings(core_identifiers))) + # get strategies self.conciliation_strategy = self.to_conciliation_strategy(config.get('conciliation_strategy', 'USER')) self.starting_strategy = self.to_starting_strategy(config.get('starting_strategy', 'CONFIG')) # configure statistics @@ -101,7 +105,7 @@ def __str__(self): """ Contents as string. """ return (f'supvisors_list={self.supvisors_list} rules_files={self.rules_files}' f' internal_port={self.internal_port} event_port={self.event_port} auto_fence={self.auto_fence}' - f' synchro_timeout={self.synchro_timeout} force_synchro_if={self.force_synchro_if}' + f' synchro_timeout={self.synchro_timeout} core_identifiers={self.core_identifiers}' f' conciliation_strategy={self.conciliation_strategy.name}' f' starting_strategy={self.starting_strategy.name}' f' stats_enabled={self.stats_enabled} stats_periods={self.stats_periods} stats_histo={self.stats_histo}' @@ -234,6 +238,7 @@ def _processes_from_section(self, parser, section, group_name, klass=None) -> Li """ This method is overridden to: store the program number of a homogeneous program. This is originally used in Supervisor to set the real program name from the format defined in the ini file. However, Supervisor does not keep this information in its internal structure. 
+ :param parser: the config parser :param section: the program section :param group_name: the group that embeds the program definition @@ -258,6 +263,11 @@ def _processes_from_section(self, parser, section, group_name, klass=None) -> Li return process_configs def get_section(self, program_name: str): + """ Get the Supervisor relevant section name depending on the program name. + + :param program_name: the name of the program configured + :return: the Supervisor section name + """ klass = self.program_class[program_name] if klass is FastCGIProcessConfig: return f'fcgi-program:{program_name}' @@ -267,6 +277,7 @@ def get_section(self, program_name: str): def update_numprocs(self, program_name: str, numprocs: int) -> str: """ This method updates the numprocs value directly in the configuration parser. + :param program_name: the program name, as found in the sections of the Supervisor configuration files :param numprocs: the new numprocs value :return: The section updated @@ -278,6 +289,7 @@ def update_numprocs(self, program_name: str, numprocs: int) -> str: def reload_processes_from_section(self, section: str, group_name: str) -> List[ProcessConfig]: """ This method rebuilds the ProcessConfig instances for the program. + :param section: the program section in the configuration files :param group_name: the group that embeds the program definition :return: the list of ProcessConfig diff --git a/supvisors/rpcinterface.py b/supvisors/rpcinterface.py index 6785ab3b..cb9f5827 100644 --- a/supvisors/rpcinterface.py +++ b/supvisors/rpcinterface.py @@ -72,17 +72,17 @@ def get_supvisors_state(self): return self.supvisors.fsm.serial() def get_master_identifier(self): - """ Get the identificqtion of the Supvisors instance elected as **Supvisors** Master. + """ Get the identification of the Supvisors instance elected as **Supvisors** Master. - *@return* ``str``: the IPv4 address or host name. + *@return* ``str``: the Supvisors identifier. 
""" return self.supvisors.context.master_identifier def get_master_address(self): - """ Get the identificqtion of the Supvisors instance elected as **Supvisors** Master. + """ Get the identification of the Supvisors instance elected as **Supvisors** Master. *DEPRECATED* use ``get_master_identifier``. - *@return* ``str``: the IPv4 address or host name. + *@return* ``str``: the Supvisors identifier. """ self.logger.warn('RPCInterface.get_master_address: DEPRECATED. use get_master_identifier') return self.get_master_identifier() @@ -139,9 +139,9 @@ def get_address_info(self, node_name): """ Get information about the **Supvisors** instance running on the host named node. *DEPRECATED* use ``get_instance_info``. - *@param* ``str identifier``: the node where the Supervisor daemon is running. + *@param* ``str node_name``: the identifier of the Supvisors instance where the Supervisor daemon is running. - *@throws* ``RPCError``: with code ``Faults.INCORRECT_PARAMETERS`` if node is unknown to **Supvisors**. + *@throws* ``RPCError``: with code ``Faults.INCORRECT_PARAMETERS`` if the identifier is unknown to **Supvisors**. *@return* ``dict``: a structure containing data about the **Supvisors** instance. """ diff --git a/supvisors/sparser.py b/supvisors/sparser.py index 9bc8689d..f871e159 100644 --- a/supvisors/sparser.py +++ b/supvisors/sparser.py @@ -127,7 +127,7 @@ def load_application_rules(self, application_name: str, rules: ApplicationRules) def get_application_element(self, application_name: str) -> Optional[Any]: """ Try to find the definition of an application in rules files. - First try to to find the definition based on the exact application name. + First try to find the definition based on the exact application name. If not found, second try to find a corresponding pattern. 
:param application_name: the application name @@ -214,7 +214,7 @@ def get_element_name(elt: Any): def get_program_element(self, namespec: str) -> Optional[Any]: """ Try to find the definition of a program in rules files. - First try to to find the definition based on the exact program name. + First try to find the definition based on the exact program name. If not found, second try to find a corresponding pattern. :param namespec: the process namespec @@ -271,8 +271,8 @@ def check_identifier_list(self, identifier_list: str): for alias_name, alias in self.aliases.items(): if alias_name in identifiers: pos = identifiers.index(alias_name) - identifiers[pos:pos] = alias - # keep reference to hashtag as it will be removed by the filters + identifiers[pos:pos+1] = alias + # keep reference to hashtag as it will be removed by supvisors_mapper.filter ref_hashtag = '#' in identifiers if '*' in identifiers: identifiers = ['*'] diff --git a/supvisors/statemachine.py b/supvisors/statemachine.py index e7f983fe..9274ef2e 100644 --- a/supvisors/statemachine.py +++ b/supvisors/statemachine.py @@ -38,7 +38,7 @@ class AbstractState(object): - supvisors: the reference to the global Supvisors structure ; - context: the reference to the context of the global Supvisors structure ; - logger: the reference to the logger of the global Supvisors structure ; - - local_identifier: the identifier of the local Supvisors insatnce. + - local_identifier: the identifier of the local Supvisors instance. """ def __init__(self, supvisors: Any) -> None: @@ -100,7 +100,7 @@ class InitializationState(AbstractState): """ In the INITIALIZATION state, Supvisors synchronizes to all known Supvisors instances. """ def enter(self) -> None: - """ When entering in the INITIALIZATION state, reset the context. + """ When entering the INITIALIZATION state, reset the context. 
:return: None """ @@ -153,11 +153,13 @@ def exit(self) -> None: # force state of missing Supvisors instances running_identifiers = self.context.running_identifiers() self.logger.info(f'InitializationState.exit: working with Supvisors instances {running_identifiers}') - # elect master insatnce among working instances only if not fixed before + # elect master instance among working instances only if not fixed before # of course master instance must be running + self.logger.debug(f'InitializationState.exit: master_identifier={self.context.master_identifier}') if not self.context.master_identifier or self.context.master_identifier not in running_identifiers: # choose Master among the core instances because these instances are expected to be more stable - core_identifiers = self.supvisors.options.force_synchro_if + core_identifiers = self.supvisors.supvisors_mapper.core_identifiers + self.logger.info(f'InitializationState.exit: core_identifiers={core_identifiers}') if core_identifiers: running_core_identifiers = set(running_identifiers).intersection(core_identifiers) if running_core_identifiers: @@ -208,7 +210,7 @@ def next(self) -> SupvisorsStates: # check duplicated processes if self.context.conflicting(): return SupvisorsStates.CONCILIATION - # a redeploy mark has been set due to a new alive Supvisors instance + # a redeployment mark has been set due to a new alive Supvisors instance # back to DEPLOYMENT state to repair what may have failed before if self.supvisors.fsm.redeploy_mark: return SupvisorsStates.DEPLOYMENT @@ -218,7 +220,7 @@ class MasterConciliationState(AbstractState): """ In the CONCILIATION state, Supvisors conciliates the conflicts. """ def enter(self) -> None: - """ When entering in the CONCILIATION state, conciliate automatically the conflicts. """ + """ When entering the CONCILIATION state, conciliate automatically the conflicts. 
""" conciliate_conflicts(self.supvisors, self.supvisors.options.conciliation_strategy, self.context.conflicts()) @@ -248,7 +250,7 @@ class MasterRestartingState(AbstractState): """ In the RESTARTING state, Supvisors stops all applications before triggering a full restart. """ def enter(self) -> None: - """ When entering in the RESTARTING state, stop all applications. + """ When entering the RESTARTING state, stop all applications. :return: None """ @@ -272,7 +274,7 @@ def next(self) -> SupvisorsStates: def exit(self) -> None: """ When leaving the RESTARTING state, request the full restart. """ self.supvisors.zmq.pusher.send_restart(self.local_identifier) - # other instances will shutdown on reception of SHUTDOWN state + # other instances will shut down on reception of SHUTDOWN state # due to Supvisors design, the state publication will be fired before the send_shutdown @@ -302,7 +304,7 @@ def next(self): def exit(self): """ When leaving the SHUTTING_DOWN state, request the Supervisor shutdown. """ self.supvisors.zmq.pusher.send_shutdown(self.local_identifier) - # other instances will shutdown on reception of SHUTDOWN state + # other instances will shut down on reception of SHUTDOWN state # due to Supvisors design, the state publication will be fired before the send_shutdown @@ -328,7 +330,7 @@ class SlaveRestartingState(AbstractState): """ In the RESTARTING state, Supvisors stops all applications before triggering a full restart. """ def enter(self) -> None: - """ When entering in the RESTARTING state, abort all pending tasks applications. + """ When entering the RESTARTING state, abort all pending tasks applications. 
        :return: None
        """
@@ -552,7 +554,7 @@ def on_authorization(self, identifier: str, authorized: bool, master_identifier:
         if master_identifier:
             if not self.context.master_identifier:
                 # local Supvisors doesn't know about a master yet but remote Supvisors does
-                # typically happens when the local Supervisor has just been started whereas a Supvisors group
+                # was already operating, so accept remote perception
                 self.logger.warn(f'FiniteStateMachine.on_authorization: accept Master={master_identifier}'
                                  f' declared by Supvisors={identifier}')
diff --git a/supvisors/supvisorsmapper.py b/supvisors/supvisorsmapper.py
index f33b7929..a089d57f 100644
--- a/supvisors/supvisorsmapper.py
+++ b/supvisors/supvisorsmapper.py
@@ -40,7 +40,7 @@ class SupvisorsInstanceId(object):
     - event_port: the port number used to publish all Supvisors events.
     """

-    PATTERN = re.compile(r'^(<(?P<identifier>[\w\-]+)>)?(?P<host>[\w\-\.]+)(:(?P<http_port>\d{4,5})?'
+    PATTERN = re.compile(r'^(<(?P<identifier>[\w\-]+)>)?(?P<host>[\w\-.]+)(:(?P<http_port>\d{4,5})?'
                          r':(?P<internal_port>\d{4,5})?)?$')

     def __init__(self, item: str, supvisors: Any):
@@ -123,11 +123,16 @@ class SupvisorsMapper(object):
     - logger: the reference to the common logger ;
     - _instances: the list of Supvisors instances declared in the supvisors section of the Supervisor
       configuration file ;
+    - _core_identifiers: the list of Supvisors core identifiers declared in the supvisors section of the Supervisor
+      configuration file ;
     - local_node_references: the list of known aliases of the current node, i.e. the host name and the IPv4 addresses ;
     - local_identifier: the local Supvisors identifier.
     """

+    # annotation types
+    InstanceMap = Dict[str, SupvisorsInstanceId]
+
     def __init__(self, supvisors: Any):
         """ Initialization of the attributes.
@@ -137,7 +142,8 @@ def __init__(self, supvisors: Any):
         self.supvisors = supvisors
         self.logger: Logger = supvisors.logger
         # init attributes
-        self._instances: Dict[str, SupvisorsInstanceId] = OrderedDict()
+        self._instances: SupvisorsMapper.InstanceMap = OrderedDict()
+        self._core_identifiers: NameList = []
         self.local_node_references = [gethostname(), *self.ipv4()]
         self.logger.debug(f'SupvisorsMapper: local_node_references={self.local_node_references}')
         self.local_identifier = None
@@ -151,33 +157,45 @@ def local_instance(self) -> SupvisorsInstanceId:
         return self._instances[self.local_identifier]

     @property
-    def instances(self) -> NameList:
+    def instances(self) -> InstanceMap:
         """ Property getter for the _instances attribute.

         :return: the list of Supvisors instances configured in Supvisors
         """
         return self._instances

-    def configure(self, supvisors_list: NameList) -> None:
+    @property
+    def core_identifiers(self) -> NameList:
+        """ Property getter for the _core_identifiers attribute.
+
+        :return: the minimum Supvisors identifiers to end the synchronization phase
+        """
+        return self._core_identifiers
+
+    def configure(self, supvisors_list: NameList, core_list: NameList) -> None:
         """ Store the identification of the Supvisors instances declared in the configuration file
         and determine the local Supvisors instance in this list.
        :param supvisors_list: the Supvisors instances declared in the supvisors section of the configuration file
+        :param core_list: the minimum Supvisors identifiers to end the synchronization phase
         :return: None
         """
         # get Supervisor identification from each element
         for item in supvisors_list:
             supvisors_id = SupvisorsInstanceId(item, self.supvisors)
             if supvisors_id.identifier:
-                self.logger.debug(f'SupvisorsMapper.instances: new SupvisorsInstanceId={supvisors_id}')
+                self.logger.debug(f'SupvisorsMapper.configure: new SupvisorsInstanceId={supvisors_id}')
                 self._instances[supvisors_id.identifier] = supvisors_id
             else:
                 message = f'could not parse Supvisors identification from {item}'
                 self.logger.error(f'SupvisorsMapper.instances: {message}')
                 raise ValueError(message)
-        self.logger.info(f'SupvisorsMapper.instances: {list(self._instances.keys())}')
+        self.logger.info(f'SupvisorsMapper.configure: identifiers={list(self._instances.keys())}')
         # get local Supervisor identification from list
         self.find_local_identifier()
+        # check core identifiers
+        self._core_identifiers = self.filter(core_list)
+        self.logger.info(f'SupvisorsMapper.configure: core_identifiers={self._core_identifiers}')

     def find_local_identifier(self):
         """ Find the local Supvisors identification in the list declared in the configuration file.
@@ -224,6 +242,10 @@ def filter(self, identifier_list: NameList) -> NameList:
         """
         # filter unknown Supvisors identifiers
         identifiers = [identifier for identifier in identifier_list if self.valid(identifier)]
+        # log invalid identifiers to warn the user
+        for identifier in identifier_list:
+            if identifier != '#' and identifier not in identifiers:  # no warn for hashtag
+                self.logger.warn(f'SupvisorsMapper.filter: identifier={identifier} invalid')
         # remove duplicates keeping the same ordering
         return list(OrderedDict.fromkeys(identifiers))
diff --git a/supvisors/test/etc/my_movies.xml b/supvisors/test/etc/my_movies.xml
index 0297f06e..f9b5b0ee 100644
--- a/supvisors/test/etc/my_movies.xml
+++ b/supvisors/test/etc/my_movies.xml
@@ -32,7 +32,7 @@
             STOP
-            #,cliche82,cliche83:60000,cliche84
+            distribute_sublist
             1
             true
             0
diff --git a/supvisors/test/etc/supervisord.conf b/supvisors/test/etc/supervisord.conf
index 4c4a3a4a..9f9b0726 100644
--- a/supvisors/test/etc/supervisord.conf
+++ b/supvisors/test/etc/supervisord.conf
@@ -7,8 +7,6 @@ port=:60000
 logfile=./log/supervisord.log
 loglevel=info
 pidfile=/tmp/supervisord.pid
-nodaemon=false
-identifier=cliche81

 [rpcinterface:supervisor]
 supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
@@ -21,13 +19,13 @@ files = common/*/*.ini %(host_node_name)s/*.ini %(host_node_name)s/*/*.ini

 [rpcinterface:supvisors]
 supervisor.rpcinterface_factory = supvisors.plugin:make_supvisors_rpcinterface
-supvisors_list = cliche81,cliche82,cliche83:60000:60001,cliche84
-rules_files = ./etc/my_movies*.xml
+supvisors_list = cliche81,192.168.1.49,cliche83:60000:60001,cliche84
+rules_files = etc/my_movies*.xml
 auto_fence = false
 internal_port = 60001
 event_port = 60002
 synchro_timeout = 20
-force_synchro_if = cliche81,192.168.1.49
+core_identifiers = cliche81,cliche82
 starting_strategy = CONFIG
 conciliation_strategy = USER
 stats_enabled = true
diff --git a/supvisors/test/scripts/check_running_strategy.py
b/supvisors/test/scripts/check_running_strategy.py
index 9ff633a7..ab2943da 100644
--- a/supvisors/test/scripts/check_running_strategy.py
+++ b/supvisors/test/scripts/check_running_strategy.py
@@ -107,7 +107,7 @@ def test_restart_process(self):
         # focus only on web_browser
         expected_events = [{'group': 'web_movies', 'name': 'web_browser', 'state': 10},
                            {'group': 'web_movies', 'name': 'web_browser', 'state': 20}]
-        received_events = self.evloop.wait_until_events(self.evloop.event_queue, expected_events, 15)
+        received_events = self.evloop.wait_until_events(self.evloop.event_queue, expected_events, 20)
         self.assertEqual(2, len(received_events))
         self.assertEqual([], expected_events)
         # STARTING / RUNNING events are expected for web_movies application
diff --git a/supvisors/test/scripts/running_identifiers.py b/supvisors/test/scripts/running_identifiers.py
index 28a6d39a..5001101e 100644
--- a/supvisors/test/scripts/running_identifiers.py
+++ b/supvisors/test/scripts/running_identifiers.py
@@ -59,7 +59,7 @@ def setUp(self):
         supervisor_url = SupervisorServerUrl(os.environ.copy())
         self.proxies = {}
         for identifier in self.running_identifiers:
-            supervisor_url.update_url(identifier)
+            supervisor_url.update_url(identifier.split(':')[0])
             self.proxies[identifier] = getRPCInterface(supervisor_url.env)
         # create the thread of event subscriber
         self.zcontext = zmq.Context.instance()
diff --git a/supvisors/test/scripts/start_all.sh b/supvisors/test/scripts/start_all.sh
index 1df1119d..1b59ff66 100755
--- a/supvisors/test/scripts/start_all.sh
+++ b/supvisors/test/scripts/start_all.sh
@@ -8,11 +8,9 @@ TEST_DIR=`readlink -e $SCRIPTS_DIR/..`
 for host in ${@:-cliche81 cliche82 cliche83}
 do
   echo "start Supervisor on host" $host
-  ping -c 1 $host 2>&1 >/dev/null && ssh -fX $host "cd $TEST_DIR ; rm -rf log/*
+  ping -c 1 $host >/dev/null 2>&1 && ssh $host "cd $TEST_DIR ; rm -rf log/*
     export DISPLAY=:0
-    export IDENTIFIER=$host
-    sed -i 's/identifier=.*$/identifier='\$IDENTIFIER'/'
etc/supervisord.conf - supervisord" + supervisord -i $host" done cd $TEST_DIR diff --git a/supvisors/test/use_cases/gathering/etc/scen1_supvisors_rules.xml b/supvisors/test/use_cases/gathering/etc/scen1_supvisors_rules.xml new file mode 120000 index 00000000..6fbacb5c --- /dev/null +++ b/supvisors/test/use_cases/gathering/etc/scen1_supvisors_rules.xml @@ -0,0 +1 @@ +../../scenario_1/etc/supvisors_rules.xml \ No newline at end of file diff --git a/supvisors/test/use_cases/gathering/etc/scen2_supvisors_rules.xml b/supvisors/test/use_cases/gathering/etc/scen2_supvisors_rules.xml new file mode 120000 index 00000000..3b88bab0 --- /dev/null +++ b/supvisors/test/use_cases/gathering/etc/scen2_supvisors_rules.xml @@ -0,0 +1 @@ +../../scenario_2/etc/supvisors_rules.xml \ No newline at end of file diff --git a/supvisors/test/use_cases/gathering/etc/scen3_supvisors_rules.xml b/supvisors/test/use_cases/gathering/etc/scen3_supvisors_rules.xml new file mode 120000 index 00000000..6b04c9e6 --- /dev/null +++ b/supvisors/test/use_cases/gathering/etc/scen3_supvisors_rules.xml @@ -0,0 +1 @@ +../../scenario_3/etc/supvisors_rules.xml \ No newline at end of file diff --git a/supvisors/test/use_cases/gathering/etc/scenario_1 b/supvisors/test/use_cases/gathering/etc/scenario_1 new file mode 120000 index 00000000..b08a386e --- /dev/null +++ b/supvisors/test/use_cases/gathering/etc/scenario_1 @@ -0,0 +1 @@ +../../scenario_1/etc \ No newline at end of file diff --git a/supvisors/test/use_cases/gathering/etc/scenario_1/cliche81/group_cliche81.ini b/supvisors/test/use_cases/gathering/etc/scenario_1/cliche81/group_cliche81.ini deleted file mode 100644 index c7ca53d9..00000000 --- a/supvisors/test/use_cases/gathering/etc/scenario_1/cliche81/group_cliche81.ini +++ /dev/null @@ -1,2 +0,0 @@ -[group:scen1] -programs=scen1_wait_nfs_mount_1,scen1_hci,scen1_config_manager,scen1_data_processing,scen1_external_interface,scen1_data_recorder diff --git 
a/supvisors/test/use_cases/gathering/etc/scenario_1/cliche81/programs_cliche81.ini b/supvisors/test/use_cases/gathering/etc/scenario_1/cliche81/programs_cliche81.ini deleted file mode 100644 index b032ac9b..00000000 --- a/supvisors/test/use_cases/gathering/etc/scenario_1/cliche81/programs_cliche81.ini +++ /dev/null @@ -1,44 +0,0 @@ -[program:scen1_hci] -command=bin/scen1/hci.sh -autostart=false -killasgroup=true -stopasgroup=true -redirect_stderr=true -stdout_logfile=log/%(program_name)s_%(ENV_CUR_DATE)s_%(ENV_CUR_TIME)s.log -stdout_logfile_backups=5 - -[program:scen1_config_manager] -command=bin/scen1/config_manager.sh -autostart=false -killasgroup=true -stopasgroup=true -redirect_stderr=true -stdout_logfile=log/%(program_name)s_%(ENV_CUR_DATE)s_%(ENV_CUR_TIME)s.log -stdout_logfile_backups=5 - -[program:scen1_data_processing] -command=bin/scen1/data_processing.sh -autostart=false -killasgroup=true -stopasgroup=true -redirect_stderr=true -stdout_logfile=log/%(program_name)s_%(ENV_CUR_DATE)s_%(ENV_CUR_TIME)s.log -stdout_logfile_backups=5 - -[program:scen1_external_interface] -command=bin/scen1/external_interface.sh -autostart=false -killasgroup=true -stopasgroup=true -redirect_stderr=true -stdout_logfile=log/%(program_name)s_%(ENV_CUR_DATE)s_%(ENV_CUR_TIME)s.log -stdout_logfile_backups=5 - -[program:scen1_data_recorder] -command=bin/scen1/data_recorder.sh -autostart=false -killasgroup=true -stopasgroup=true -redirect_stderr=true -stdout_logfile=log/%(program_name)s_%(ENV_CUR_DATE)s_%(ENV_CUR_TIME)s.log -stdout_logfile_backups=5 diff --git a/supvisors/test/use_cases/gathering/etc/scenario_1/cliche81/wait_nfs_mount.ini b/supvisors/test/use_cases/gathering/etc/scenario_1/cliche81/wait_nfs_mount.ini deleted file mode 100644 index 1852e758..00000000 --- a/supvisors/test/use_cases/gathering/etc/scenario_1/cliche81/wait_nfs_mount.ini +++ /dev/null @@ -1,8 +0,0 @@ -[program:scen1_wait_nfs_mount_1] -command=bin/scen1/wait_nfs_mount.sh -autostart=false -killasgroup=true 
-stopasgroup=true -redirect_stderr=true -stdout_logfile=log/%(program_name)s_%(ENV_CUR_DATE)s_%(ENV_CUR_TIME)s.log -stdout_logfile_backups=5 diff --git a/supvisors/test/use_cases/gathering/etc/scenario_1/cliche82/group_cliche82.ini b/supvisors/test/use_cases/gathering/etc/scenario_1/cliche82/group_cliche82.ini deleted file mode 100644 index 9e71b290..00000000 --- a/supvisors/test/use_cases/gathering/etc/scenario_1/cliche82/group_cliche82.ini +++ /dev/null @@ -1,2 +0,0 @@ -[group:scen1] -programs=scen1_wait_nfs_mount_2,scen1_sensor_acquisition_1,scen1_sensor_processing_1 diff --git a/supvisors/test/use_cases/gathering/etc/scenario_1/cliche82/programs_cliche82.ini b/supvisors/test/use_cases/gathering/etc/scenario_1/cliche82/programs_cliche82.ini deleted file mode 100644 index 7e10631a..00000000 --- a/supvisors/test/use_cases/gathering/etc/scenario_1/cliche82/programs_cliche82.ini +++ /dev/null @@ -1,18 +0,0 @@ -[program:scen1_sensor_acquisition_1] -command=bin/scen1/sensor_acquisition.sh -autostart=false -killasgroup=true -stopasgroup=true -redirect_stderr=true -stdout_logfile=log/%(program_name)s_%(ENV_CUR_DATE)s_%(ENV_CUR_TIME)s.log -stdout_logfile_backups=5 - -[program:scen1_sensor_processing_1] -command=bin/scen1/sensor_processing.sh -autostart=false -killasgroup=true -stopasgroup=true -redirect_stderr=true -stdout_logfile=log/%(program_name)s_%(ENV_CUR_DATE)s_%(ENV_CUR_TIME)s.log -stdout_logfile_backups=5 - diff --git a/supvisors/test/use_cases/gathering/etc/scenario_1/cliche82/wait_nfs_mount.ini b/supvisors/test/use_cases/gathering/etc/scenario_1/cliche82/wait_nfs_mount.ini deleted file mode 100644 index 1cb661d3..00000000 --- a/supvisors/test/use_cases/gathering/etc/scenario_1/cliche82/wait_nfs_mount.ini +++ /dev/null @@ -1,8 +0,0 @@ -[program:scen1_wait_nfs_mount_2] -command=bin/scen1/wait_nfs_mount.sh -autostart=false -killasgroup=true -stopasgroup=true -redirect_stderr=true -stdout_logfile=log/%(program_name)s_%(ENV_CUR_DATE)s_%(ENV_CUR_TIME)s.log 
-stdout_logfile_backups=5 diff --git a/supvisors/test/use_cases/gathering/etc/scenario_1/cliche83/group_cliche83.ini b/supvisors/test/use_cases/gathering/etc/scenario_1/cliche83/group_cliche83.ini deleted file mode 100644 index 85d6755d..00000000 --- a/supvisors/test/use_cases/gathering/etc/scenario_1/cliche83/group_cliche83.ini +++ /dev/null @@ -1,2 +0,0 @@ -[group:scen1] -programs=scen1_wait_nfs_mount_3,scen1_sensor_acquisition_2,scen1_sensor_processing_2 diff --git a/supvisors/test/use_cases/gathering/etc/scenario_1/cliche83/programs_cliche83.ini b/supvisors/test/use_cases/gathering/etc/scenario_1/cliche83/programs_cliche83.ini deleted file mode 100644 index 523d7157..00000000 --- a/supvisors/test/use_cases/gathering/etc/scenario_1/cliche83/programs_cliche83.ini +++ /dev/null @@ -1,17 +0,0 @@ -[program:scen1_sensor_acquisition_2] -command=bin/scen1/sensor_acquisition.sh -autostart=false -killasgroup=true -stopasgroup=true -redirect_stderr=true -stdout_logfile=log/%(program_name)s_%(ENV_CUR_DATE)s_%(ENV_CUR_TIME)s.log -stdout_logfile_backups=5 - -[program:scen1_sensor_processing_2] -command=bin/scen1/sensor_processing.sh -autostart=false -killasgroup=true -stopasgroup=true -redirect_stderr=true -stdout_logfile=log/%(program_name)s_%(ENV_CUR_DATE)s_%(ENV_CUR_TIME)s.log -stdout_logfile_backups=5 diff --git a/supvisors/test/use_cases/gathering/etc/scenario_1/cliche83/wait_nfs_mount.ini b/supvisors/test/use_cases/gathering/etc/scenario_1/cliche83/wait_nfs_mount.ini deleted file mode 100644 index cf85cba2..00000000 --- a/supvisors/test/use_cases/gathering/etc/scenario_1/cliche83/wait_nfs_mount.ini +++ /dev/null @@ -1,8 +0,0 @@ -[program:scen1_wait_nfs_mount_3] -command=bin/scen1/wait_nfs_mount.sh -autostart=false -killasgroup=true -stopasgroup=true -redirect_stderr=true -stdout_logfile=log/%(program_name)s_%(ENV_CUR_DATE)s_%(ENV_CUR_TIME)s.log -stdout_logfile_backups=5 diff --git a/supvisors/test/use_cases/gathering/etc/scenario_1/localhost/group_localhost.ini 
b/supvisors/test/use_cases/gathering/etc/scenario_1/localhost/group_localhost.ini deleted file mode 100644 index 2c8a07aa..00000000 --- a/supvisors/test/use_cases/gathering/etc/scenario_1/localhost/group_localhost.ini +++ /dev/null @@ -1,2 +0,0 @@ -[group:scen1] -programs=scen1_hci,scen1_config_manager,scen1_data_processing,scen1_external_interface,scen1_data_recorder,scen1_sensor_acquisition_1,scen1_sensor_processing_1,scen1_sensor_acquisition_2,scen1_sensor_processing_2 diff --git a/supvisors/test/use_cases/gathering/etc/scenario_2 b/supvisors/test/use_cases/gathering/etc/scenario_2 new file mode 120000 index 00000000..212b2ee5 --- /dev/null +++ b/supvisors/test/use_cases/gathering/etc/scenario_2 @@ -0,0 +1 @@ +../../scenario_2/etc \ No newline at end of file diff --git a/supvisors/test/use_cases/gathering/etc/scenario_2/console/group_scen2_hci_01.ini b/supvisors/test/use_cases/gathering/etc/scenario_2/console/group_scen2_hci_01.ini deleted file mode 100644 index b79720f2..00000000 --- a/supvisors/test/use_cases/gathering/etc/scenario_2/console/group_scen2_hci_01.ini +++ /dev/null @@ -1,3 +0,0 @@ -[group:scen2_hci_01] -programs = scen2_chart_view,scen2_sensor_control,scen2_sensor_view,scen2_check_internal_data_bus,scen2_internal_data_bus - diff --git a/supvisors/test/use_cases/gathering/etc/scenario_2/console/group_scen2_hci_02.ini b/supvisors/test/use_cases/gathering/etc/scenario_2/console/group_scen2_hci_02.ini deleted file mode 100644 index 8b35abc0..00000000 --- a/supvisors/test/use_cases/gathering/etc/scenario_2/console/group_scen2_hci_02.ini +++ /dev/null @@ -1,3 +0,0 @@ -[group:scen2_hci_02] -programs = scen2_chart_view,scen2_sensor_control,scen2_sensor_view,scen2_check_internal_data_bus,scen2_internal_data_bus - diff --git a/supvisors/test/use_cases/gathering/etc/scenario_2/console/group_scen2_hci_03.ini b/supvisors/test/use_cases/gathering/etc/scenario_2/console/group_scen2_hci_03.ini deleted file mode 100644 index 332cd4a4..00000000 --- 
a/supvisors/test/use_cases/gathering/etc/scenario_2/console/group_scen2_hci_03.ini +++ /dev/null @@ -1,3 +0,0 @@ -[group:scen2_hci_03] -programs = scen2_chart_view,scen2_sensor_control,scen2_sensor_view,scen2_check_internal_data_bus,scen2_internal_data_bus - diff --git a/supvisors/test/use_cases/gathering/etc/scenario_2/console/programs_console.ini b/supvisors/test/use_cases/gathering/etc/scenario_2/console/programs_console.ini deleted file mode 100644 index 488e606e..00000000 --- a/supvisors/test/use_cases/gathering/etc/scenario_2/console/programs_console.ini +++ /dev/null @@ -1,47 +0,0 @@ -[program:scen2_chart_view] -command = ./bin/scen2/chart_view.sh -autostart = false -killasgroup = true -stopasgroup = true -redirect_stderr = true -stdout_logfile = log/%(group_name)s_%(program_name)s.log -stdout_logfile_backups = 5 - -[program:scen2_sensor_control] -command = ./bin/scen2/sensor_control.sh -autostart = false -killasgroup = true -stopasgroup = true -redirect_stderr = true -stdout_logfile = log/%(group_name)s_%(program_name)s.log -stdout_logfile_backups = 5 - -[program:scen2_sensor_view] -command = ./bin/scen2/sensor_view.sh -autostart = false -killasgroup = true -stopasgroup = true -redirect_stderr = true -stdout_logfile = log/%(group_name)s_%(program_name)s.log -stdout_logfile_backups = 5 - -[program:scen2_check_internal_data_bus] -command = ./bin/scen2/check_internal_data_bus.sh -autostart = false -startsecs = 0 -killasgroup = true -stopasgroup = true -redirect_stderr = true -stdout_logfile = log/%(group_name)s_%(program_name)s.log -stdout_logfile_backups = 5 - -[program:scen2_internal_data_bus] -command = ./bin/scen2/internal_data_bus.sh -autostart = false -startsecs = 10 -killasgroup = true -stopasgroup = true -redirect_stderr = true -stdout_logfile = log/%(group_name)s_%(program_name)s.log -stdout_logfile_backups = 5 - diff --git a/supvisors/test/use_cases/gathering/etc/scenario_2/server/group_scen2_srv_01.ini 
b/supvisors/test/use_cases/gathering/etc/scenario_2/server/group_scen2_srv_01.ini deleted file mode 100644 index 8776d329..00000000 --- a/supvisors/test/use_cases/gathering/etc/scenario_2/server/group_scen2_srv_01.ini +++ /dev/null @@ -1,3 +0,0 @@ -[group:scen2_srv_01] -programs = scen2_config_manager,scen2_common_bus_interface,scen2_check_common_data_bus,scen2_check_internal_data_bus,scen2_internal_data_bus,scen2_data_processing,scen2_sensor_acquisition - diff --git a/supvisors/test/use_cases/gathering/etc/scenario_2/server/group_scen2_srv_02.ini b/supvisors/test/use_cases/gathering/etc/scenario_2/server/group_scen2_srv_02.ini deleted file mode 100644 index 73141a14..00000000 --- a/supvisors/test/use_cases/gathering/etc/scenario_2/server/group_scen2_srv_02.ini +++ /dev/null @@ -1,3 +0,0 @@ -[group:scen2_srv_02] -programs = scen2_config_manager,scen2_common_bus_interface,scen2_check_common_data_bus,scen2_check_internal_data_bus,scen2_internal_data_bus,scen2_data_processing,scen2_sensor_acquisition - diff --git a/supvisors/test/use_cases/gathering/etc/scenario_2/server/group_scen2_srv_03.ini b/supvisors/test/use_cases/gathering/etc/scenario_2/server/group_scen2_srv_03.ini deleted file mode 100644 index f12ffea2..00000000 --- a/supvisors/test/use_cases/gathering/etc/scenario_2/server/group_scen2_srv_03.ini +++ /dev/null @@ -1,3 +0,0 @@ -[group:scen2_srv_03] -programs = scen2_config_manager,scen2_common_bus_interface,scen2_check_common_data_bus,scen2_check_internal_data_bus,scen2_internal_data_bus,scen2_data_processing,scen2_sensor_acquisition - diff --git a/supvisors/test/use_cases/gathering/etc/scenario_2/server/programs_server.ini b/supvisors/test/use_cases/gathering/etc/scenario_2/server/programs_server.ini deleted file mode 100644 index 6de878d5..00000000 --- a/supvisors/test/use_cases/gathering/etc/scenario_2/server/programs_server.ini +++ /dev/null @@ -1,66 +0,0 @@ -[program:scen2_config_manager] -command = ./bin/scen2/config_manager.sh -autostart = false 
-killasgroup = true -stopasgroup = true -redirect_stderr = true -stdout_logfile = log/%(group_name)s_%(program_name)s.log -stdout_logfile_backups = 5 - -[program:scen2_check_common_data_bus] -command = ./bin/scen2/check_common_data_bus.sh -autostart = false -startsecs = 0 -killasgroup = true -stopasgroup = true -redirect_stderr = true -stdout_logfile = log/%(group_name)s_%(program_name)s.log -stdout_logfile_backups = 5 - -[program:scen2_common_bus_interface] -command = ./bin/scen2/common_bus_interface.sh -autostart = false -killasgroup = true -stopasgroup = true -redirect_stderr = true -stdout_logfile = log/%(group_name)s_%(program_name)s.log -stdout_logfile_backups = 5 - -[program:scen2_check_internal_data_bus] -command = ./bin/scen2/check_internal_data_bus.sh -autostart = false -startsecs = 0 -killasgroup = true -stopasgroup = true -redirect_stderr = true -stdout_logfile = log/%(group_name)s_%(program_name)s.log -stdout_logfile_backups = 5 - -[program:scen2_internal_data_bus] -command = ./bin/scen2/internal_data_bus.sh -autostart = false -startsecs = 10 -killasgroup = true -stopasgroup = true -redirect_stderr = true -stdout_logfile = log/%(group_name)s_%(program_name)s.log -stdout_logfile_backups = 5 - -[program:scen2_data_processing] -command = ./bin/scen2/data_processing.sh -autostart = false -killasgroup = true -stopasgroup = true -redirect_stderr = true -stdout_logfile = log/%(group_name)s_%(program_name)s.log -stdout_logfile_backups = 5 - -[program:scen2_sensor_acquisition] -command = ./bin/scen2/sensor_acquisition.sh -autostart = false -killasgroup = true -stopasgroup = true -redirect_stderr = true -stdout_logfile = log/%(group_name)s_%(program_name)s.log -stdout_logfile_backups = 5 - diff --git a/supvisors/test/use_cases/gathering/etc/scenario_3 b/supvisors/test/use_cases/gathering/etc/scenario_3 new file mode 120000 index 00000000..339177b6 --- /dev/null +++ b/supvisors/test/use_cases/gathering/etc/scenario_3 @@ -0,0 +1 @@ +../../scenario_3/etc \ No 
newline at end of file diff --git a/supvisors/test/use_cases/gathering/etc/scenario_3/common/group_services.ini b/supvisors/test/use_cases/gathering/etc/scenario_3/common/group_services.ini deleted file mode 100644 index 6a51a2bc..00000000 --- a/supvisors/test/use_cases/gathering/etc/scenario_3/common/group_services.ini +++ /dev/null @@ -1,19 +0,0 @@ -[program:common_data_bus] -command=zenity --info --text=%(program_name)s -startsecs=10 -killasgroup=true -stopasgroup=true -redirect_stderr=true -stdout_syslog=True -stdout_logfile=log/%(program_name)s_%(host_node_name)s.log -stdout_logfile_backups=5 - -[program:scen3_internal_data_bus] -command=zenity --info --text=%(program_name)s -startsecs=10 -killasgroup=true -stopasgroup=true -redirect_stderr=true -stdout_syslog=True -stdout_logfile=log/%(program_name)s_%(host_node_name)s.log -stdout_logfile_backups=5 diff --git a/supvisors/test/use_cases/gathering/etc/scenario_3/console/cliche83/group_scen3_hci_01.ini b/supvisors/test/use_cases/gathering/etc/scenario_3/console/cliche83/group_scen3_hci_01.ini deleted file mode 100644 index 9bbef5dc..00000000 --- a/supvisors/test/use_cases/gathering/etc/scenario_3/console/cliche83/group_scen3_hci_01.ini +++ /dev/null @@ -1,3 +0,0 @@ -[group:scen3_hci_01] -programs = scen3_chart_view,scen3_item_control,scen3_check_internal_data_bus - diff --git a/supvisors/test/use_cases/gathering/etc/scenario_3/console/cliche84/group_scen3_hci_02.ini b/supvisors/test/use_cases/gathering/etc/scenario_3/console/cliche84/group_scen3_hci_02.ini deleted file mode 100644 index 8064469e..00000000 --- a/supvisors/test/use_cases/gathering/etc/scenario_3/console/cliche84/group_scen3_hci_02.ini +++ /dev/null @@ -1,3 +0,0 @@ -[group:scen3_hci_02] -programs = scen3_chart_view,scen3_item_control,scen3_check_internal_data_bus - diff --git a/supvisors/test/use_cases/gathering/etc/scenario_3/console/cliche86/group_scen3_hci_03.ini 
b/supvisors/test/use_cases/gathering/etc/scenario_3/console/cliche86/group_scen3_hci_03.ini deleted file mode 100644 index 38adaa04..00000000 --- a/supvisors/test/use_cases/gathering/etc/scenario_3/console/cliche86/group_scen3_hci_03.ini +++ /dev/null @@ -1,3 +0,0 @@ -[group:scen3_hci_03] -programs = scen3_chart_view,scen3_item_control,scen3_check_internal_data_bus - diff --git a/supvisors/test/use_cases/gathering/etc/scenario_3/console/cliche87/group_scen3_hci_04.ini b/supvisors/test/use_cases/gathering/etc/scenario_3/console/cliche87/group_scen3_hci_04.ini deleted file mode 100644 index e8ff1e8b..00000000 --- a/supvisors/test/use_cases/gathering/etc/scenario_3/console/cliche87/group_scen3_hci_04.ini +++ /dev/null @@ -1,3 +0,0 @@ -[group:scen3_hci_04] -programs = scen3_chart_view,scen3_item_control,scen3_check_internal_data_bus - diff --git a/supvisors/test/use_cases/gathering/etc/scenario_3/console/cliche88/group_scen3_hci_05.ini b/supvisors/test/use_cases/gathering/etc/scenario_3/console/cliche88/group_scen3_hci_05.ini deleted file mode 100644 index 4f4b7bdf..00000000 --- a/supvisors/test/use_cases/gathering/etc/scenario_3/console/cliche88/group_scen3_hci_05.ini +++ /dev/null @@ -1,3 +0,0 @@ -[group:scen3_hci_05] -programs = scen3_chart_view,scen3_item_control,scen3_check_internal_data_bus - diff --git a/supvisors/test/use_cases/gathering/etc/scenario_3/console/programs_console.ini b/supvisors/test/use_cases/gathering/etc/scenario_3/console/programs_console.ini deleted file mode 100644 index c576c66d..00000000 --- a/supvisors/test/use_cases/gathering/etc/scenario_3/console/programs_console.ini +++ /dev/null @@ -1,28 +0,0 @@ -[program:scen3_chart_view] -command = ./bin/scen3/chart_view.sh -autostart = false -killasgroup = true -stopasgroup = true -redirect_stderr = true -stdout_logfile = log/%(program_name)s.log -stdout_logfile_backups = 5 - -[program:scen3_item_control] -command = ./bin/scen3/item_control.sh -autostart = false -killasgroup = true -stopasgroup = 
true -redirect_stderr = true -stdout_logfile = log/%(program_name)s.log -stdout_logfile_backups = 5 - -[program:scen3_check_internal_data_bus] -command = ./bin/scen3/check_internal_data_bus.sh -autostart = false -startsecs = 0 -killasgroup = true -stopasgroup = true -redirect_stderr = true -stdout_logfile = log/%(program_name)s.log -stdout_logfile_backups = 5 - diff --git a/supvisors/test/use_cases/gathering/etc/scenario_3/server/group_server.ini b/supvisors/test/use_cases/gathering/etc/scenario_3/server/group_server.ini deleted file mode 100644 index 8dbe54e9..00000000 --- a/supvisors/test/use_cases/gathering/etc/scenario_3/server/group_server.ini +++ /dev/null @@ -1,59 +0,0 @@ -[group:scen3_srv] -programs=scen3_item_manager,scen3_track_manager,scen3_system_health,scen3_common_bus_interface,scen3_check_common_data_bus,scen3_check_internal_data_bus - -[program:scen3_item_manager] -command=./bin/scen3/item_manager.sh -autostart=false -killasgroup=true -stopasgroup=true -redirect_stderr=true -stdout_logfile=log/%(program_name)s.log -stdout_logfile_backups=5 - -[program:scen3_track_manager] -command=./bin/scen3/track_manager.sh -autostart=false -killasgroup=true -stopasgroup=true -redirect_stderr=true -stdout_logfile=log/%(program_name)s.log -stdout_logfile_backups=5 - -[program:scen3_system_health] -command=./bin/scen3/system_health.sh -autostart=false -killasgroup=true -stopasgroup=true -redirect_stderr=true -stdout_logfile=log/%(program_name)s.log -stdout_logfile_backups=5 - -[program:scen3_check_common_data_bus] -command=./bin/scen3/check_common_data_bus.sh -autostart=false -startsecs=0 -killasgroup=true -stopasgroup=true -redirect_stderr=true -stdout_logfile=log/%(program_name)s.log -stdout_logfile_backups=5 - -[program:scen3_common_bus_interface] -command=./bin/scen3/common_bus_interface.sh -autostart=false -killasgroup=true -stopasgroup=true -redirect_stderr=true -stdout_logfile=log/%(program_name)s.log -stdout_logfile_backups=5 - 
-[program:scen3_check_internal_data_bus]
-command=./bin/scen3/check_internal_data_bus.sh
-autostart=false
-startsecs=0
-killasgroup=true
-stopasgroup=true
-redirect_stderr=true
-stdout_logfile=log/%(program_name)s.log
-stdout_logfile_backups=5
-
diff --git a/supvisors/test/use_cases/gathering/etc/supervisord_console.conf b/supvisors/test/use_cases/gathering/etc/supervisord_console.conf
index 77493759..caf09930 100644
--- a/supvisors/test/use_cases/gathering/etc/supervisord_console.conf
+++ b/supvisors/test/use_cases/gathering/etc/supervisord_console.conf
@@ -1,28 +1,24 @@
 [inet_http_server]
-port=:61000
+port=:62000

 [supervisord]
-logfile=log/supervisord.log
-logfile_backups=10
-loglevel=info
-pidfile=/tmp/supervisord.pid
-nodaemon=false
-umask=002
+logfile=log/supervisord_console.log
+pidfile=/tmp/supervisord_console.pid

 [rpcinterface:supervisor]
 supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

 [supervisorctl]
-serverurl=http://localhost:61000
+serverurl=http://localhost:62000

 [include]
-files = common/*.ini console/*.ini
-files = scen*1/%(host_node_name)s/* scen*2/console/* scen*3/common/* scen*3/console/* scen*3/console/%(host_node_name)s/*
+files = scen*1/%(host_node_name)s/* scen*2/console/* scen*3/common/* scen*3/console/* scen*3/console/%(ENV_IDENTIFIER)s/*

 [rpcinterface:supvisors]
 supervisor.rpcinterface_factory = supvisors.plugin:make_supvisors_rpcinterface
-address_list = cliche81,cliche82,cliche83,cliche84,cliche85,cliche86,cliche87,cliche88
-rules_files = etc/supvisors_rules.xml
+supvisors_list = <server_1>cliche81:61000:,<server_2>cliche82:61000:,<server_3>cliche83:61000:,<console_1>cliche81:62000:,<console_2>cliche82:62000:,<console_3>cliche83:62000:,<console_4>cliche84:62000:,<console_5>cliche85:62000:
+rules_files = etc/scen1_supvisors_rules.xml etc/scen2_supvisors_rules.xml etc/scen3_supvisors_rules.xml
+core_identifiers = server_1,server_2,server_3

 [ctlplugin:supvisors]
 supervisor.ctl_factory = supvisors.supvisorsctl:make_supvisors_controller_plugin
diff --git
a/supvisors/test/use_cases/gathering/etc/supervisord_console_localhost.conf b/supvisors/test/use_cases/gathering/etc/supervisord_console_localhost.conf new file mode 100644 index 00000000..2e4fa277 --- /dev/null +++ b/supvisors/test/use_cases/gathering/etc/supervisord_console_localhost.conf @@ -0,0 +1,24 @@ +[inet_http_server] +port=:62000 + +[supervisord] +logfile=log/supervisord_console.log +pidfile=/tmp/supervisord_console.pid + +[rpcinterface:supervisor] +supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface + +[supervisorctl] +serverurl=http://localhost:62000 + +[include] +files = scen*1/%(host_node_name)s/* scen*2/console/prog* scen*2/console/group*01* scen*3/common/* scen*3/console/* scen*3/console/console_1/* + +[rpcinterface:supvisors] +supervisor.rpcinterface_factory = supvisors.plugin:make_supvisors_rpcinterface +supvisors_list = cliche81:61000:,cliche81:62000: +rules_files = etc/scen1_supvisors_rules.xml etc/scen2_supvisors_rules.xml etc/scen3_supvisors_rules.xml +core_identifiers = server_1 + +[ctlplugin:supvisors] +supervisor.ctl_factory = supvisors.supvisorsctl:make_supvisors_controller_plugin diff --git a/supvisors/test/use_cases/gathering/etc/supervisord_server.conf b/supvisors/test/use_cases/gathering/etc/supervisord_server.conf index 43c3957a..e370ddf2 100644 --- a/supvisors/test/use_cases/gathering/etc/supervisord_server.conf +++ b/supvisors/test/use_cases/gathering/etc/supervisord_server.conf @@ -2,12 +2,8 @@ port=:61000 [supervisord] -logfile=log/supervisord.log -logfile_backups=10 -loglevel=info -pidfile=/tmp/supervisord.pid -nodaemon=false -umask=002 +logfile=log/supervisord_server.log +pidfile=/tmp/supervisord_server.pid [rpcinterface:supervisor] supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface @@ -20,8 +16,9 @@ files = scen*1/%(host_node_name)s/* scen*2/server/* scen*3/common/* scen*3/serve [rpcinterface:supvisors] supervisor.rpcinterface_factory = 
supvisors.plugin:make_supvisors_rpcinterface -address_list = cliche81,cliche82,cliche83,cliche84,cliche85,cliche86,cliche87,cliche88 -rules_files = etc/supvisors_rules.xml +supvisors_list = cliche81:61000:,cliche82:61000:,cliche83:61000:,cliche81:62000:,cliche82:62000:,cliche83:62000:,cliche84:62000:,cliche85:62000: +rules_files = etc/scen1_supvisors_rules.xml etc/scen2_supvisors_rules.xml etc/scen3_supvisors_rules.xml +core_identifiers = server_1,server_2,server_3 [ctlplugin:supvisors] supervisor.ctl_factory = supvisors.supvisorsctl:make_supvisors_controller_plugin diff --git a/supvisors/test/use_cases/gathering/etc/supervisord_localhost.conf b/supvisors/test/use_cases/gathering/etc/supervisord_server_localhost.conf similarity index 50% rename from supvisors/test/use_cases/gathering/etc/supervisord_localhost.conf rename to supvisors/test/use_cases/gathering/etc/supervisord_server_localhost.conf index 7cbca4c9..5d74650e 100644 --- a/supvisors/test/use_cases/gathering/etc/supervisord_localhost.conf +++ b/supvisors/test/use_cases/gathering/etc/supervisord_server_localhost.conf @@ -2,12 +2,8 @@ port=:61000 [supervisord] -logfile=log/supervisord.log -logfile_backups=10 -loglevel=info -pidfile=/tmp/supervisord.pid -nodaemon=false -umask=002 +logfile=log/supervisord_server.log +pidfile=/tmp/supervisord_server.pid [rpcinterface:supervisor] supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface @@ -16,11 +12,13 @@ supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface serverurl=http://localhost:61000 [include] -files = scen*1/*/prog* scen*1/local*/* scen*2/*/prog* scen*2/*/group*01* scen*3/*/* scen*3/console/cliche83/* +files = scen*1/*/prog* scen*1/localhost/* scen*2/server/prog* scen*2/server/group*01* scen*3/common/* scen*3/server/* [rpcinterface:supvisors] -supervisor.rpcinterface_factory = supvisors.plugin:make_supvisors_rpcinterface -rules_files = etc/supvisors_localhost_rules.xml +supervisor.rpcinterface_factory = 
supvisors.plugin:make_supvisors_rpcinterface +supvisors_list = cliche81:61000:,cliche81:62000: +rules_files = etc/scen1_supvisors_rules.xml etc/scen2_supvisors_rules.xml etc/scen3_supvisors_rules.xml +core_identifiers = server_1 [ctlplugin:supvisors] supervisor.ctl_factory = supvisors.supvisorsctl:make_supvisors_controller_plugin diff --git a/supvisors/test/use_cases/gathering/etc/supvisors_rules.xml b/supvisors/test/use_cases/gathering/etc/supvisors_rules.xml deleted file mode 100644 index d63c84e6..00000000 --- a/supvisors/test/use_cases/gathering/etc/supvisors_rules.xml +++ /dev/null @@ -1,138 +0,0 @@ - - - - cliche81,cliche82,cliche85 - cliche83,cliche84,cliche86,cliche87,cliche88 - - - - 2 - true - 2 - - - - servers - 3 - true - 4 - - - servers - 2 - true - true - - - servers - 1 - true - 2 - - - - - 1 - CONTINUE - - - model_scenario_1 - - - model_scenario_1 - 1 - true - - - - - - - false - servers - 1 - LESS_LOADED - STOP - - - model_services - 4 - - - check_data_bus - 3 - - - model_services - - - check_data_bus - - - data_bus - RESTART_APPLICATION - - - - - - false - consoles - LOCAL - CONTINUE - - - 3 - 8 - - - check_data_bus - - - data_bus - STOP_APPLICATION - - - - - - - - 1 - LESS_LOADED - CONTINUE - RESTART_PROCESS - - - model_services - 3 - - - check_data_bus - 2 - - - model_services - - - check_data_bus - - - - - - false - #,consoles - 2 - CONTINUE - - - 2 - 5 - - - check_data_bus - - - - diff --git a/supvisors/test/use_cases/gathering/start.sh b/supvisors/test/use_cases/gathering/start.sh index 0f30fb0f..356742f8 100755 --- a/supvisors/test/use_cases/gathering/start.sh +++ b/supvisors/test/use_cases/gathering/start.sh @@ -7,18 +7,24 @@ test_dir=$(dirname "$(readlink -f "$0")") export CUR_DATE=`date +'%y%m%d'` export CUR_TIME=`date +'%H%M%S'` -# start supervisor on all servers -for i in cliche81 cliche82 -do - echo "start Supvisors on" $i - ssh $i "export DISPLAY=:0 ; cd $test_dir ; rm -rf log ; mkdir log ; ./configure.sh ; export CUR_DATE=$CUR_DATE ; 
export CUR_TIME=$CUR_TIME ; supervisord -c etc/supervisord_server.conf" -done +# set default hosts if not provided in command line +HOSTS=${@:-cliche81 cliche82 cliche83} -# start supervisor on all consoles -for i in cliche83 +# clear logs / start server + console on each host +for host in $HOSTS do - echo "start Supvisors on" $i - ssh $i "export DISPLAY=:0 ; cd $test_dir ; rm -rf log ; mkdir log ; ./configure.sh ; export CUR_DATE=$CUR_DATE ; export CUR_TIME=$CUR_TIME ; supervisord -c etc/supervisord_console.conf" + x=`echo "$host" | tail -c 2` + ping -c 1 $host >/dev/null 2>&1 && ssh $host "cd $test_dir + rm -rf log ; mkdir log + export DISPLAY=:0 + export CUR_DATE=$CUR_DATE + export CUR_TIME=$CUR_TIME + echo \"start Supvisors on $host as server_$x\" + export IDENTIFIER=server_$x + supervisord -c etc/supervisord_server.conf -i \$IDENTIFIER + echo \"start Supvisors on $host as console_$x\" + export IDENTIFIER=console_$x + supervisord -c etc/supervisord_console.conf -i \$IDENTIFIER" done # start firefox to get the Web UI diff --git a/supvisors/test/use_cases/gathering/start_localhost.sh b/supvisors/test/use_cases/gathering/start_localhost.sh index cbecd317..86f9a6f9 100755 --- a/supvisors/test/use_cases/gathering/start_localhost.sh +++ b/supvisors/test/use_cases/gathering/start_localhost.sh @@ -2,22 +2,26 @@ # go to script folder test_dir=$(dirname "$(readlink -f "$0")") -cd $test_dir - -# clear log folder -rm -rf log -mkdir log - -# configure number of applications -./configure.sh # environment variables for log file names export CUR_DATE=`date +'%y%m%d'` export CUR_TIME=`date +'%H%M%S'` +export DISPLAY=:0 + +# use the local host name +HOST=`hostname` + +# clear logs / start server + console on the local host +cd $test_dir +rm -rf log ; mkdir log + +echo "start Supvisors on $HOST as server_1" +export IDENTIFIER=server_1 +supervisord -c etc/supervisord_server_localhost.conf -i $IDENTIFIER + +echo "start Supvisors on $HOST as console_1" +export 
IDENTIFIER=console_1 +supervisord -c etc/supervisord_console_localhost.conf -i $IDENTIFIER # start firefox to get the Web UI firefox http://localhost:61000 & - -# start non-daemonized supervisor -echo "start Supvisors on" `hostname` -supervisord -c etc/supervisord_localhost.conf -n diff --git a/supvisors/test/use_cases/scenario_1/etc/supvisors_rules.xml b/supvisors/test/use_cases/scenario_1/etc/supvisors_rules.xml index 9abcaf4b..d4e2b3e1 100644 --- a/supvisors/test/use_cases/scenario_1/etc/supvisors_rules.xml +++ b/supvisors/test/use_cases/scenario_1/etc/supvisors_rules.xml @@ -4,6 +4,7 @@ 2 true + 2 diff --git a/supvisors/test/use_cases/scenario_2/etc/supervisord_console.conf b/supvisors/test/use_cases/scenario_2/etc/supervisord_console.conf index 76c9e98e..ce0774fa 100644 --- a/supvisors/test/use_cases/scenario_2/etc/supervisord_console.conf +++ b/supvisors/test/use_cases/scenario_2/etc/supervisord_console.conf @@ -1,27 +1,24 @@ [inet_http_server] -port=:61000 +port=:62000 [supervisord] -logfile=log/supervisord.log -logfile_backups=10 -loglevel=info -pidfile=/tmp/supervisord.pid -nodaemon=false -umask=002 +logfile=log/supervisord_console.log +pidfile=/tmp/supervisord_console.pid [rpcinterface:supervisor] supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface [supervisorctl] -serverurl=http://localhost:61000 +serverurl=http://localhost:62000 [include] files = common/*.ini console/*.ini [rpcinterface:supvisors] supervisor.rpcinterface_factory = supvisors.plugin:make_supvisors_rpcinterface -address_list = cliche81,cliche82,cliche83,cliche84,cliche85,cliche86,cliche87,cliche88 +supvisors_list = cliche81:61000:,cliche82:61000:,cliche83:61000:,cliche81:62000:,cliche82:62000:,cliche83:62000:,cliche84:62000:,cliche85:62000: rules_files = etc/supvisors_rules.xml +core_identifiers = server_1,server_2,server_3 [ctlplugin:supvisors] supervisor.ctl_factory = supvisors.supvisorsctl:make_supvisors_controller_plugin diff --git 
a/supvisors/test/use_cases/scenario_2/etc/supervisord_localhost.conf b/supvisors/test/use_cases/scenario_2/etc/supervisord_localhost.conf deleted file mode 100644 index c2d7f68a..00000000 --- a/supvisors/test/use_cases/scenario_2/etc/supervisord_localhost.conf +++ /dev/null @@ -1,26 +0,0 @@ -[inet_http_server] -port=:61000 - -[supervisord] -logfile=log/supervisord.log -logfile_backups=10 -loglevel=info -pidfile=/tmp/supervisord.pid -nodaemon=false -umask=002 - -[rpcinterface:supervisor] -supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface - -[supervisorctl] -serverurl=http://localhost:61000 - -[include] -files = */programs*.ini server/group_scen2_srv_01.ini console/group_scen2_hci_01.ini - -[rpcinterface:supvisors] -supervisor.rpcinterface_factory = supvisors.plugin:make_supvisors_rpcinterface -rules_files = etc/supvisors_localhost_rules.xml - -[ctlplugin:supvisors] -supervisor.ctl_factory = supvisors.supvisorsctl:make_supvisors_controller_plugin diff --git a/supvisors/test/use_cases/scenario_2/etc/supervisord_server.conf b/supvisors/test/use_cases/scenario_2/etc/supervisord_server.conf index 35c26abf..1933f907 100644 --- a/supvisors/test/use_cases/scenario_2/etc/supervisord_server.conf +++ b/supvisors/test/use_cases/scenario_2/etc/supervisord_server.conf @@ -2,12 +2,8 @@ port=:61000 [supervisord] -logfile=log/supervisord.log -logfile_backups=10 -loglevel=info -pidfile=/tmp/supervisord.pid -nodaemon=false -umask=002 +logfile=log/supervisord_server.log +pidfile=/tmp/supervisord_server.pid [rpcinterface:supervisor] supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface @@ -20,8 +16,9 @@ files = common/*.ini server/*.ini [rpcinterface:supvisors] supervisor.rpcinterface_factory = supvisors.plugin:make_supvisors_rpcinterface -address_list = cliche81,cliche82,cliche83,cliche84,cliche85,cliche86,cliche87,cliche88 +supvisors_list = 
cliche81:61000:,cliche82:61000:,cliche83:61000:,cliche81:62000:,cliche82:62000:,cliche83:62000:,cliche84:62000:,cliche85:62000: rules_files = etc/supvisors_rules.xml +core_identifiers = server_1,server_2,server_3 [ctlplugin:supvisors] supervisor.ctl_factory = supvisors.supvisorsctl:make_supvisors_controller_plugin diff --git a/supvisors/test/use_cases/scenario_2/etc/supvisors_localhost_rules.xml b/supvisors/test/use_cases/scenario_2/etc/supvisors_localhost_rules.xml deleted file mode 100644 index 67a20799..00000000 --- a/supvisors/test/use_cases/scenario_2/etc/supvisors_localhost_rules.xml +++ /dev/null @@ -1,66 +0,0 @@ - - - - - 3 - true - 10 - - - 2 - true - true - - - 1 - true - 2 - - - - - - false - 1 - LESS_LOADED - STOP - - model_services - 4 - - - check_data_bus - 3 - - - model_services - - - check_data_bus - - - data_bus - RESTART_APPLICATION - - - - - - false - 2 - LOCAL - CONTINUE - - 3 - 8 - - - check_data_bus - - - data_bus - STOP_APPLICATION - - - - diff --git a/supvisors/test/use_cases/scenario_2/etc/supvisors_rules.xml b/supvisors/test/use_cases/scenario_2/etc/supvisors_rules.xml index 3d196da4..96139525 100644 --- a/supvisors/test/use_cases/scenario_2/etc/supvisors_rules.xml +++ b/supvisors/test/use_cases/scenario_2/etc/supvisors_rules.xml @@ -1,8 +1,8 @@ - cliche81,cliche82,cliche85 - cliche83,cliche84,cliche86,cliche87,cliche88 + server_1,server_2,server_3 + console_1,console_2,console_3,console_4,console_5 @@ -25,7 +25,7 @@ false - servers + servers 1 LESS_LOADED STOP @@ -53,7 +53,7 @@ false - consoles + consoles LOCAL CONTINUE diff --git a/supvisors/test/use_cases/scenario_2/scenario_2.rst b/supvisors/test/use_cases/scenario_2/scenario_2.rst index 2ca77307..c50a52e5 100644 --- a/supvisors/test/use_cases/scenario_2/scenario_2.rst +++ b/supvisors/test/use_cases/scenario_2/scenario_2.rst @@ -164,7 +164,7 @@ The initial |Supervisor| configuration is as follows: * The ``etc`` folder is the target destination for the configurations files of all 
applications to be supervised. In this example, it just contains a definition of the common data bus (refer to |Req 17 abbr|) that will be - auto-started on all nodes. + auto-started on all |Supvisors| instances. The ``etc`` folder contains the |Supervisor| configuration files that will be used when starting :program:`supervisord`. @@ -431,7 +431,7 @@ over the X instances of the :program:`Scenario 2` application, as required in |R |Req 2 abbr| is just about declaring the ``distributed`` element to ``false``. It tells |Supvisors| that all the -programs of the application have to be started on the same node. +programs of the application have to be started on the same |Supvisors| instance. .. code-block:: xml @@ -446,25 +446,25 @@ programs of the application have to be started on the same node. -So far, all applications can be started on any node. Let's compel :program:`scen2_hci` to consoles and +So far, all applications can be started on any |Supvisors| instance. Let's compel :program:`scen2_hci` to consoles and :program:`scen2_srv` to servers, which satisfies |Req 10 abbr| and contributes to some console-related requirements. -For better readability, node aliases are introduced. +For better readability, instance aliases are introduced. .. 
code-block:: xml - cliche81,cliche82,cliche85 - cliche83,cliche84,cliche86,cliche87,cliche88 + server_1,server_2,server_3 + console_1,console_2,console_3,console_4,console_5 false - servers + servers false - consoles + consoles @@ -495,8 +495,8 @@ operational as a standalone application, even if it's not connected to other pos - cliche81,cliche82,cliche85 - cliche83,cliche84,cliche86,cliche87,cliche88 + server_1,server_2,server_3 + console_1,console_2,console_3,console_4,console_5 3 @@ -514,7 +514,7 @@ operational as a standalone application, even if it's not connected to other pos false - servers + servers 1 @@ -538,7 +538,7 @@ operational as a standalone application, even if it's not connected to other pos false - consoles + consoles 3 @@ -563,9 +563,9 @@ The ``starting_strategy`` element of the :program:`scen2_srv` application is set |Req 13 abbr|. Before |Supvisors| starts an application or a program, it relies on the ``expected_loading`` set just before to: - * evaluate the current load on all nodes (due to processes already running), - * choose the node having the lowest load and that can accept the additional load required by the program - or application to start. + * evaluate the current load on all |Supvisors| instances (due to processes already running), + * choose the |Supvisors| instance having the lowest load and that can accept the additional load required + by the program or application to start. If none found, the application or the program is not started, which satisfies |Req 5 abbr|. 
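The ``LESS_LOADED`` selection described above (evaluate the current load on every |Supvisors| instance, then pick the least loaded one that can absorb the ``expected_loading`` of the program or application to start) can be sketched as follows. This is an illustrative sketch only, not the actual |Supvisors| implementation; the function name and data shapes are hypothetical.

```python
# Illustrative sketch of the LESS_LOADED starting strategy described above.
# 'current_loads' maps an instance name to its current load in percent;
# 'expected_loading' is the extra load the new process would add.
def choose_less_loaded(current_loads, expected_loading):
    """Return the least loaded instance that can absorb the extra load,
    or None if no instance can accept it (the process is then not started)."""
    eligible = [(load, name) for name, load in current_loads.items()
                if load + expected_loading <= 100]
    return min(eligible)[1] if eligible else None

loads = {'server_1': 40, 'server_2': 25, 'server_3': 90}
print(choose_less_loaded(loads, 20))  # 'server_2' has the lowest load
print(choose_less_loaded(loads, 80))  # None: no instance can take 80 more
```

Returning `None` when no instance fits mirrors the behavior required by |Req 5 abbr|: the application or program is simply not started.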
@@ -594,8 +594,8 @@ responsibility to merge the status of :program:`scen2_srv_N` and :program:`scen2 - - cliche81,cliche82,cliche85 - cliche83,cliche84,cliche86,cliche87,cliche88 + server_1,server_2,server_3 + console_1,console_2,console_3,console_4,console_5 @@ -618,7 +618,7 @@ responsibility to merge the status of :program:`scen2_srv_N` and :program:`scen2 false - servers + servers 1 LESS_LOADED STOP @@ -646,7 +646,7 @@ responsibility to merge the status of :program:`scen2_srv_N` and :program:`scen2 false - consoles + consoles LOCAL CONTINUE @@ -689,16 +689,16 @@ The operational status of :program:`Scenario 2` required by the |Req 3 abbr| is * the :ref:`extended_status` of the extended :program:`supervisorctl` or :program:`supvisorsctl` (example below), * the :ref:`event_interface`. -For the examples, the following context applies: +For the example, the following context applies: - * only 3 nodes among the 8 defined are running: 2 servers (``cliche81`` and ``cliche82``) and one console - (``cliche83``) - clearly due to limited testing resources ; + * due to limited resources, only 3 nodes are available (``cliche81``, ``cliche82`` and ``cliche83``); each node hosts + 2 |Supvisors| instances, one server and one console, leaving 2 silent consoles ; * :program:`common_data_bus` is *Unmanaged* so |Supvisors| always considers this 'application' as ``STOPPED`` (the process status is yet ``RUNNING``) ; - * :program:`scen2_srv_01` and :program:`scen2_srv_03` are running on the server ``cliche81`` ; - * :program:`scen2_srv_02` is running on the server ``cliche82`` ; - * :program:`scen2_hci_02` has been started on the console ``cliche83`` ; - * an attempt to start :program:`scen2_hci_03` on the server ``cliche81`` has been rejected. 
+ * :program:`scen2_srv_01`, :program:`scen2_srv_02` and :program:`scen2_srv_03` are running on ``server_1``, + ``server_2``, ``server_3``, respectively hosted by the nodes ``cliche81``, ``cliche82``, ``cliche83`` ; + * :program:`scen2_hci_02` has been started on ``console_3`` ; + * an attempt to start :program:`scen2_hci_03` on the server ``cliche81`` has been rejected (only allowed on a console). >>> from supervisor.childutils import getRPCInterface >>> proxy = getRPCInterface({'SUPERVISOR_SERVER_URL': 'http://localhost:61000'}) @@ -734,7 +734,8 @@ methods are available: - using the URL otherwise, * the start button |start| at the top right of the :ref:`dashboard_application` of the |Supvisors| Web UI, - **assuming that the user has navigated to this page using the relevant node** (check the url if necessary). + **assuming that the user has navigated to this page using the relevant |Supvisors| instance** (check the url + if necessary). >>> from supervisor.childutils import getRPCInterface @@ -777,9 +778,9 @@ To stop a :program:`scen2_hci` (|Req 27 abbr|), the following methods are availa * the :ref:`xml_rpc` (example below - **whatever the target**), * the :ref:`application_control` of the extended :program:`supervisorctl` or :program:`supvisorsctl` **from any - node where |Supvisors| is running** (example below), + |Supvisors| instance** (example below), * the stop button |stop| at the top right of the :ref:`dashboard_application` of the |Supvisors| Web UI, - **whatever the node hosting this page**. + **whatever the |Supvisors| instance displaying this page**. Indeed, as |Supvisors| knows where the application is running, it is able to drive the application stop from anywhere. @@ -808,9 +809,4 @@ Example The full example is available in `Supvisors Use Cases - Scenario 2 `_. -An additional configuration for a single node and with automatic start of a HCI is also provided: - - * etc/supervisord_localhost.conf - * etc/supvisors_localhost_rules.xml - .. 
include:: common.rst diff --git a/supvisors/test/use_cases/scenario_2/start.sh b/supvisors/test/use_cases/scenario_2/start.sh index 9a9a9f40..63cd51c0 100755 --- a/supvisors/test/use_cases/scenario_2/start.sh +++ b/supvisors/test/use_cases/scenario_2/start.sh @@ -3,22 +3,28 @@ # go to script folder test_dir=$(dirname "$(readlink -f "$0")") +# set default hosts if not provided in command line +HOSTS=${@:-cliche81 cliche82 cliche83} + # configure 3 applications SRV_CONFIG_CMD="supvisors_breed -d etc -t template_etc -p server/*.ini -b scen2_srv=3 -x -v" HCI_CONFIG_CMD="supvisors_breed -d etc -t template_etc -p console/*ini -b scen2_hci=3 -x -v" -# start supervisor on all servers -for i in cliche81 cliche82 -do - echo "start Supvisors on" $i - ssh $i "export DISPLAY=:0 ; cd $test_dir ; rm -rf log ; mkdir log ; $SRV_CONFIG_CMD ; supervisord -c etc/supervisord_server.conf" -done - -# start supervisor on all consoles -for i in cliche83 +# clear logs / configure / start server + console on each host +for host in $HOSTS do - echo "start Supvisors on" $i - ssh $i "export DISPLAY=:0 ; cd $test_dir ; rm -rf log ; mkdir log ; $HCI_CONFIG_CMD ; supervisord -c etc/supervisord_console.conf" + x=`echo "$host" | tail -c 2` + ping -c 1 $host >/dev/null 2>&1 && ssh $host "cd $test_dir + rm -rf log ; mkdir log + $SRV_CONFIG_CMD + $HCI_CONFIG_CMD + export DISPLAY=:0 + echo \"start Supvisors on $host as server_$x\" + export IDENTIFIER=server_$x + supervisord -c etc/supervisord_server.conf -i \$IDENTIFIER + echo \"start Supvisors on $host as console_$x\" + export IDENTIFIER=console_$x + supervisord -c etc/supervisord_console.conf -i \$IDENTIFIER" done # start firefox to get the Web UI diff --git a/supvisors/test/use_cases/scenario_2/start_localhost.sh b/supvisors/test/use_cases/scenario_2/start_localhost.sh deleted file mode 100755 index 7f6a5e9e..00000000 --- a/supvisors/test/use_cases/scenario_2/start_localhost.sh +++ /dev/null @@ -1,19 +0,0 @@ -#!/bin/bash - -# go to script folder 
-test_dir=$(dirname "$(readlink -f "$0")") -cd $test_dir - -# clear log folder -rm -rf log -mkdir log - -# configure 3 applications -supvisors_breed -d etc -t template_etc -b scen2_srv=3 scen2_hci=3 -x -v - -# start firefox to get the Web UI -firefox http://localhost:61000 & - -# start non-daemonized supervisor -echo "start Supvisors on" `hostname` -supervisord -c etc/supervisord_localhost.conf -n diff --git a/supvisors/test/use_cases/scenario_3/etc/supervisord_console.conf b/supvisors/test/use_cases/scenario_3/etc/supervisord_console.conf index 6a0f4d36..f3e21ec6 100644 --- a/supvisors/test/use_cases/scenario_3/etc/supervisord_console.conf +++ b/supvisors/test/use_cases/scenario_3/etc/supervisord_console.conf @@ -4,8 +4,6 @@ port=:62000 [supervisord] logfile=log/supervisord_console.log pidfile=/tmp/supervisord_console.pid -umask=002 -identifier=console_1 [rpcinterface:supervisor] supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface @@ -20,6 +18,7 @@ files = common/*.ini console/*.ini console/%(ENV_IDENTIFIER)s/*.ini supervisor.rpcinterface_factory = supvisors.plugin:make_supvisors_rpcinterface supvisors_list = cliche81:61000:,cliche82:61000:,cliche83:61000:,cliche81:62000:,cliche82:62000:,cliche83:62000:,cliche84:62000:,cliche85:62000: rules_files = etc/supvisors_rules.xml +core_identifiers = server_1,server_2,server_3 [ctlplugin:supvisors] supervisor.ctl_factory = supvisors.supvisorsctl:make_supvisors_controller_plugin diff --git a/supvisors/test/use_cases/scenario_3/etc/supervisord_server.conf b/supvisors/test/use_cases/scenario_3/etc/supervisord_server.conf index d34692e0..1933f907 100644 --- a/supvisors/test/use_cases/scenario_3/etc/supervisord_server.conf +++ b/supvisors/test/use_cases/scenario_3/etc/supervisord_server.conf @@ -4,9 +4,6 @@ port=:61000 [supervisord] logfile=log/supervisord_server.log pidfile=/tmp/supervisord_server.pid -nodaemon=false -umask=002 -identifier=server_1 [rpcinterface:supervisor] 
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface @@ -21,6 +18,7 @@ files = common/*.ini server/*.ini supervisor.rpcinterface_factory = supvisors.plugin:make_supvisors_rpcinterface supvisors_list = cliche81:61000:,cliche82:61000:,cliche83:61000:,cliche81:62000:,cliche82:62000:,cliche83:62000:,cliche84:62000:,cliche85:62000: rules_files = etc/supvisors_rules.xml +core_identifiers = server_1,server_2,server_3 [ctlplugin:supvisors] supervisor.ctl_factory = supvisors.supvisorsctl:make_supvisors_controller_plugin diff --git a/supvisors/test/use_cases/scenario_3/scenario_3.rst b/supvisors/test/use_cases/scenario_3/scenario_3.rst index 5ee15351..a0ddd75d 100644 --- a/supvisors/test/use_cases/scenario_3/scenario_3.rst +++ b/supvisors/test/use_cases/scenario_3/scenario_3.rst @@ -114,7 +114,8 @@ The initial |Supervisor| configuration is as follows: * The ``etc`` folder is the target destination for the configuration files of all applications to be supervised. It initially contains: - - a definition of the common data bus (refer to |Req 14 abbr|) that will be auto-started on all nodes. + - a definition of the common data bus (refer to |Req 14 abbr|) that will be auto-started on all |Supvisors| + instances. - the configuration of the :program:`scen3_srv` group and programs. - the |Supervisor| configuration files that will be used when starting :program:`supervisord`: @@ -197,7 +198,11 @@ files from the |Supervisor| configuration file. to group all the applications of the different use cases into a unique |Supvisors| configuration. Adding ``scen3`` at this point is just to avoid overwriting program definitions. -Knowing the host names of the consoles, an additional script is used to sort the files generated. + +XXX difference scenario 2 / impact + + +Based on the expected names of the consoles, an additional script is used to sort the files generated. The resulting file tree is as follows. .. 
code-block:: bash @@ -219,15 +224,15 @@ The resulting file tree is as follows. │ ├── common │ │ └── group_services.ini │ ├── console - │ │ ├── cliche83 + │ │ ├── console_1 │ │ │ └── group_scen3_hci_01.ini - │ │ ├── cliche84 + │ │ ├── console_2 │ │ │ └── group_scen3_hci_02.ini - │ │ ├── cliche86 + │ │ ├── console_3 │ │ │ └── group_scen3_hci_03.ini - │ │ ├── cliche87 + │ │ ├── console_4 │ │ │ └── group_scen3_hci_04.ini - │ │ ├── cliche88 + │ │ ├── console_5 │ │ │ └── group_scen3_hci_05.ini │ │ └── programs_console.ini │ ├── server @@ -279,8 +284,8 @@ As the logic of the starting sequence of :program:`Scenario 3` very similar to t won't be much detail about that in the present section. Please refer to the other use case if needed. The main difference is that :program:`scen3_internal_data_bus` has been removed. As a reminder, the consequence of -|Req 12 abbr| and |Req 20 abbr| is that this program must run in all nodes, so it has been moved to the services file -and configured as auto-started. +|Req 12 abbr| and |Req 20 abbr| is that this program must run in all |Supvisors| instances, so it has been moved +to the services file and configured as auto-started. Both applications :program:`scen3_srv` and :program:`scen3_hci` have their ``start_sequence`` set and strictly positive so they will be automatically started, as required by |Req 1 abbr|. Please just note that :program:`scen3_hci` has a @@ -303,15 +308,16 @@ elements. The value ``#,consoles`` used here needs some further explanation. When using hashtags in ``addresses``, applications and programs cannot be started anywhere until |Supvisors| solves the 'equation'. As defined in :ref:`patterns_hashtags`, an association will be made between the Nth application :program:`scen3_hci_N` and the Nth element of the ``consoles`` list. In the example, :program:`scen3_hci_01` will be -mapped with node ``cliche83`` once resolved. +mapped with |Supvisors| instance ``console_1`` once resolved. 
This will result in having exactly one :program:`scen3_hci` application per console, which satisfies |Req 20 abbr|. .. note:: - In :program:`scen3_hci`, the program ``scen3_check_internal_data_bus`` references a model that uses server nodes - in its ``addresses`` option. It doesn't matter in the present case because, as told before, the ``addresses`` - option of the non-distributed application supersedes the ``addresses`` eventually set in its programs. + In :program:`scen3_hci`, the program ``scen3_check_internal_data_bus`` references a model that uses server + |Supvisors| instances in its ``identifiers`` option. It doesn't matter in the present case because, as stated before, + the ``identifiers`` option of the non-distributed application supersedes the ``identifiers`` possibly set in its + programs. Let's now focus on the strategies and options used at application level. @@ -323,9 +329,10 @@ To satisfy |Req 13 abbr|, the ``running_failure_strategy`` option of :program:`s ``CONTINUE`` is then used, as required in |Req 21 abbr|. Anyway, as the Nth application is only known by the |Supervisor| of the Nth console, it is just impossible to start this application elsewhere. -Finally, in order to satisfy |Req 5 abbr| and to have a load-balancing over the server nodes (refer to |Req 11 abbr|), -an arbitrary ``expected_loading`` has been set on programs. It is expected that relevant figures are used for a real -application. The ``starting_strategy`` option of :program:`scen3_srv` has been set to ``LESS_LOADED``. +Finally, in order to satisfy |Req 5 abbr| and to have load-balancing over the server |Supvisors| instances (refer to +|Req 11 abbr|), an arbitrary ``expected_loading`` has been set on programs. It is expected that relevant figures are +used for a real application. +The ``starting_strategy`` option of :program:`scen3_srv` has been set to ``LESS_LOADED``. Here follows the resulting rules file. .. 
- cliche81,cliche82,cliche85 - cliche83,cliche84,cliche86,cliche87,cliche88 + server_1,server_2,server_3 + console_1,console_2,console_3,console_4,console_5 @@ -406,12 +413,23 @@ The operational status of :program:`Scenario 3` required by the |Req 2 abbr| is For the examples, the following context applies: - * only 3 nodes among the 8 defined are running: 2 servers (``cliche81`` and ``cliche82``) and one console - (``cliche83``) - clearly due to limited testing resources ; + * due to limited resources, only 3 nodes are available (``cliche81``, ``cliche82`` and ``cliche83``); each node hosts + 2 |Supvisors| instances, one server and one console, leaving 2 silent consoles ; * :program:`common_data_bus` and :program:`scen3_internal_data_bus` are *Unmanaged* so |Supvisors| always considers these 'applications' as ``STOPPED`` ; - * :program:`scen3_srv` is distributed over the servers ``cliche81`` and ``cliche82`` ; - * :program:`scen3_hci_01` has been started on the console ``cliche83``. + * :program:`scen3_srv` is distributed over the 3 servers ; + * :program:`scen3_hci_01`, :program:`scen3_hci_02` and :program:`scen3_hci_03` have been started on + ``console_1``, ``console_2`` and ``console_3`` respectively. + +The |Supervisor| configuration of the consoles has been changed to include the files related to the |Supervisor| +identifier ``console_X`` rather than those related to ``host_node_name``. As there is no automatic expansion related +to the |Supervisor| identifier so far, an environment variable is used. + +.. 
code-block:: ini + + # include section in supervisord_console.conf + [include] + files = common/*.ini console/*.ini console/%(ENV_IDENTIFIER)s/*.ini >>> from supervisor.childutils import getRPCInterface >>> proxy = getRPCInterface({'SUPERVISOR_SERVER_URL': 'http://localhost:61000'}) @@ -419,7 +437,9 @@ For the examples, the following context applies: [{'application_name': 'common_data_bus', 'statecode': 0, 'statename': 'STOPPED', 'major_failure': False, 'minor_failure': False}, {'application_name': 'scen3_internal_data_bus', 'statecode': 0, 'statename': 'STOPPED', 'major_failure': False, 'minor_failure': False}, {'application_name': 'scen3_srv', 'statecode': 2, 'statename': 'RUNNING', 'major_failure': False, 'minor_failure': False}, -{'application_name': 'scen3_hci_01', 'statecode': 2, 'statename': 'RUNNING', 'major_failure': False, 'minor_failure': False}] +{'application_name': 'scen3_hci_01', 'statecode': 2, 'statename': 'RUNNING', 'major_failure': False, 'minor_failure': False}, +{'application_name': 'scen3_hci_02', 'statecode': 2, 'statename': 'RUNNING', 'major_failure': False, 'minor_failure': False}, +{'application_name': 'scen3_hci_03', 'statecode': 2, 'statename': 'RUNNING', 'major_failure': False, 'minor_failure': False}] .. code-block:: bash @@ -429,12 +449,14 @@ For the examples, the following context applies: scen3_internal_data_bus STOPPED False False scen3_srv RUNNING False False scen3_hci_01 RUNNING False False + scen3_hci_02 RUNNING False False + scen3_hci_03 RUNNING False False .. note:: - It could be felt strange to see only one of the 5 :program:`scen3_hci` applications. It has to be remembered that + It could be felt strange to see only 3 of the 5 :program:`scen3_hci` applications. It has to be remembered that the overall configuration has been built so that each console would include the configurations files related to the - only :program:`scen3_hci` meant to run on it. 
In the example, 4 consoles are ``SILENT`` so their programs are + only :program:`scen3_hci` meant to run on it. In the example, 2 consoles are ``SILENT`` so their programs are unknown so far. .. image:: images/supvisors_scenario_3.png @@ -450,9 +472,4 @@ Example The full example is available in `Supvisors Use Cases - Scenario 3 `_. -An additional configuration for a single node and with automatic start of a HCI is also provided: - - * etc/supervisord_localhost.conf - * etc/supvisors_localhost_rules.xml - .. include:: common.rst diff --git a/supvisors/test/use_cases/scenario_3/start.sh b/supvisors/test/use_cases/scenario_3/start.sh index 15199007..99ed4afc 100755 --- a/supvisors/test/use_cases/scenario_3/start.sh +++ b/supvisors/test/use_cases/scenario_3/start.sh @@ -3,34 +3,23 @@ # go to script folder test_dir=$(dirname "$(readlink -f "$0")") -HOSTS="cliche81 cliche82 cliche83" +# set default hosts if not provided on the command line +HOSTS=${@:-cliche81 cliche82 cliche83} -# configure / clear logs -for i in $HOSTS +# clear logs / configure / start server + console on each host +for host in $HOSTS do - ssh $i "cd $test_dir ; rm -rf log ; mkdir log ; ./configure.sh" -done - -# start supervisor on all servers -for i in $HOSTS -do - x=`echo "$i" | tail -c 2` - echo "start Supvisors on $i as server_$x" - ssh $i "export DISPLAY=:0 ; export IDENTIFIER=server_$x - cd $test_dir - sed -i 's/identifier=.*$/identifier='\$IDENTIFIER'/' etc/supervisord_server.conf - supervisord -c etc/supervisord_server.conf -i \$IDENTIFIER" -done - -# start supervisor on all consoles -for i in $HOSTS -do - x=`echo "$i" | tail -c 2` - echo "start Supvisors on $i as console_$x" - ssh $i "export DISPLAY=:0 ; export IDENTIFIER=console_$x - cd $test_dir - sed -i 's/identifier=.*$/identifier='\$IDENTIFIER'/' etc/supervisord_console.conf - supervisord -c etc/supervisord_console.conf -i \$IDENTIFIER" + x=`echo "$host" | tail -c 2` + ping -c 1 $host >/dev/null 2>&1 && ssh $host "cd $test_dir + rm -rf log ; 
mkdir log + ./configure.sh + export DISPLAY=:0 + echo \"start Supvisors on $host as server_$x\" + export IDENTIFIER=server_$x + supervisord -c etc/supervisord_server.conf -i \$IDENTIFIER + echo \"start Supvisors on $host as console_$x\" + export IDENTIFIER=console_$x + supervisord -c etc/supervisord_console.conf -i \$IDENTIFIER" done # start firefox to get the Web UI diff --git a/supvisors/test/use_cases/scenario_3/start_localhost.sh b/supvisors/test/use_cases/scenario_3/start_localhost.sh deleted file mode 100755 index 11c4778a..00000000 --- a/supvisors/test/use_cases/scenario_3/start_localhost.sh +++ /dev/null @@ -1,27 +0,0 @@ -#!/bin/bash - -# go to script folder -test_dir=$(dirname "$(readlink -f "$0")") -cd $test_dir - -# clear log folder -rm -rf log -mkdir log - -# configure N HCI (N is number of consoles) -./configure.sh - -# start firefox to get the Web UI -firefox http://localhost:61000 & - -# Due to Supervisor issue#1483, it is impossible to assign the identifier using the command line -# identifier is not part of the possible expansions -sed -i 's/identifier=.*$/identifier=console_1/' etc/supervisord_console.conf -export IDENTIFIER=console_1 -echo "start Supvisors $IDENTIFIER on" `hostname` -supervisord -c etc/supervisord_console.conf -i $IDENTIFIER - -sed -i 's/identifier=.*$/identifier=server_1/' etc/supervisord_server.conf -export IDENTIFIER=server_1 -echo "start Supvisors $IDENTIFIER on" `hostname` -supervisord -c etc/supervisord_server.conf -i $IDENTIFIER -n diff --git a/supvisors/tests/base.py b/supvisors/tests/base.py index 36cb45ab..2ef52282 100644 --- a/supvisors/tests/base.py +++ b/supvisors/tests/base.py @@ -75,7 +75,7 @@ def __init__(self): self.supvisors_mapper = SupvisorsMapper(self) host_name = gethostname() identifiers = ['127.0.0.1', '10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.4', '10.0.0.5', host_name] - self.supvisors_mapper.configure(identifiers) + self.supvisors_mapper.configure(identifiers, []) self.supvisors_mapper.local_identifier 
= '127.0.0.1' # remove gethostname for the tests self.supvisors_mapper.instances.pop(host_name, None) diff --git a/supvisors/tests/test_commander.py b/supvisors/tests/test_commander.py index f56be613..1475ac35 100644 --- a/supvisors/tests/test_commander.py +++ b/supvisors/tests/test_commander.py @@ -91,16 +91,16 @@ def test_start_command_timed_out(): command.identifiers = ['10.0.0.1'] for state in [ProcessStates.BACKOFF, ProcessStates.STARTING]: process.info_map['10.0.0.1']['state'] = state - assert not command.timed_out(107) - assert command.timed_out(108) + assert not command.timed_out(105) + assert command.timed_out(106) # check call with nodes_names set and process state RUNNING on the node process.info_map['10.0.0.1']['state'] = ProcessStates.RUNNING assert not command.timed_out(1000) # check call with nodes_names set and process state in STOPPED_STATES or STOPPING on the node for state in [ProcessStates.STOPPING] + list(STOPPED_STATES): process.info_map['10.0.0.1']['state'] = state - assert not command.timed_out(105) - assert command.timed_out(106) + assert not command.timed_out(103) + assert command.timed_out(104) # ProcessStopCommand part @@ -134,14 +134,14 @@ def test_stop_command_timed_out(): command.identifiers = ['10.0.0.1', '10.0.0.2'] process.info_map['10.0.0.1']['state'] = ProcessStates.STOPPING process.info_map['10.0.0.2']['state'] = ProcessStates.STOPPING - assert not command.timed_out(115) - assert command.timed_out(116) + assert not command.timed_out(113) + assert command.timed_out(114) # check call with nodes_names set and process state in any other state on the node for state in _process_states_by_code.keys(): if state != ProcessStates.STOPPING: process.info_map['10.0.0.2']['state'] = state - assert not command.timed_out(105) - assert command.timed_out(106) + assert not command.timed_out(103) + assert command.timed_out(104) # ApplicationJobs part diff --git a/supvisors/tests/test_context.py b/supvisors/tests/test_context.py index 
ffce2e37..17c8305a 100644 --- a/supvisors/tests/test_context.py +++ b/supvisors/tests/test_context.py @@ -145,7 +145,7 @@ def test_instances_by_states(context): def test_running_core_identifiers(supvisors): """ Test if the core instances are in a RUNNING state. """ - supvisors.options.force_synchro_if = ['10.0.0.1', '10.0.0.4'] + supvisors.supvisors_mapper._core_identifiers = ['10.0.0.1', '10.0.0.4'] context = Context(supvisors) # test initial states assert sorted(context.unknown_identifiers()) == sorted(context.supvisors.supvisors_mapper.instances.keys()) diff --git a/supvisors/tests/test_options.py b/supvisors/tests/test_options.py index e7663bb0..ec9d347f 100644 --- a/supvisors/tests/test_options.py +++ b/supvisors/tests/test_options.py @@ -67,7 +67,7 @@ def test_options_creation(opt): assert opt.event_port == 0 assert not opt.auto_fence assert opt.synchro_timeout == 15 - assert opt.force_synchro_if == set() + assert opt.core_identifiers == set() assert opt.conciliation_strategy == ConciliationStrategies.USER assert opt.starting_strategy == StartingStrategies.CONFIG assert opt.stats_enabled @@ -88,7 +88,7 @@ def test_filled_options_creation(filled_opt): assert filled_opt.event_port == 60002 assert filled_opt.auto_fence assert filled_opt.synchro_timeout == 20 - assert filled_opt.force_synchro_if == {'cliche01', 'cliche03'} + assert filled_opt.core_identifiers == {'cliche01', 'cliche03'} assert filled_opt.conciliation_strategy == ConciliationStrategies.SENICIDE assert filled_opt.starting_strategy == StartingStrategies.MOST_LOADED assert not filled_opt.stats_enabled @@ -104,7 +104,7 @@ def test_filled_options_creation(filled_opt): def test_str(opt): """ Test the string output. 
""" assert str(opt) == (f'supvisors_list=[\'{gethostname()}\'] rules_files=None internal_port=0 event_port=0' - ' auto_fence=False synchro_timeout=15 force_synchro_if=set() conciliation_strategy=USER' + ' auto_fence=False synchro_timeout=15 core_identifiers=set() conciliation_strategy=USER' ' starting_strategy=CONFIG stats_enabled=True stats_periods=[10] stats_histo=200' f' stats_irix_mode=False logfile={Automatic} logfile_maxbytes={50 * 1024 * 1024}' ' logfile_backups=10 loglevel=20') diff --git a/supvisors/tests/test_statemachine.py b/supvisors/tests/test_statemachine.py index 8eb020bc..87fb3f44 100644 --- a/supvisors/tests/test_statemachine.py +++ b/supvisors/tests/test_statemachine.py @@ -105,7 +105,7 @@ def test_initialization_state(supvisors_ctx): result = state.next() assert result == SupvisorsStates.DEPLOYMENT # test case where end of synchro is forced based on core instances running - supvisors_ctx.options.force_synchro_if = {'10.0.0.2', '10.0.0.4'} + supvisors_ctx.supvisors_mapper._core_identifiers = {'10.0.0.2', '10.0.0.4'} nodes['10.0.0.3']._state = SupvisorsInstanceStates.UNKNOWN nodes['10.0.0.4']._state = SupvisorsInstanceStates.RUNNING # SYNCHRO_TIMEOUT_MIN not passed yet @@ -131,14 +131,14 @@ def test_initialization_state(supvisors_ctx): # test when master_identifier is not set and no core instances # check master is the lowest string among running node names supvisors_ctx.context.master_identifier = None - supvisors_ctx.options.force_synchro_if = {} + supvisors_ctx.supvisors_mapper._core_identifiers = {} state.exit() assert supvisors_ctx.context.running_identifiers() == ['127.0.0.1', '10.0.0.2', '10.0.0.4'] assert supvisors_ctx.context.master_identifier == '10.0.0.2' - # test when master_identifier is not set and forced instances are used + # test when master_identifier is not set and core instances are used # check master is the lowest string among the intersection between running node names and forced instances 
supvisors_ctx.context.master_identifier = None - supvisors_ctx.options.force_synchro_if = {'10.0.0.3', '10.0.0.4'} + supvisors_ctx.supvisors_mapper._core_identifiers = {'10.0.0.3', '10.0.0.4'} state.exit() assert supvisors_ctx.context.running_identifiers() == ['127.0.0.1', '10.0.0.2', '10.0.0.4'] assert supvisors_ctx.context.master_identifier == '10.0.0.4' diff --git a/supvisors/tests/test_supvisorsmapper.py b/supvisors/tests/test_supvisorsmapper.py index 63aa32e8..cfb1d59c 100644 --- a/supvisors/tests/test_supvisorsmapper.py +++ b/supvisors/tests/test_supvisorsmapper.py @@ -135,21 +135,23 @@ def test_mapper_configure(mocker, mapper): mocked_find = mocker.patch.object(mapper, 'find_local_identifier') # configure mapper with elements items = ['127.0.0.1', '10.0.0.5:7777:', '10.0.0.4:15000:8888'] - mapper.configure(items) + core_items = ['127.0.0.1', 'supervisor_05', '10.0.0.4:15000', 'supervisor_06', '10.0.0.4'] + mapper.configure(items, core_items) assert list(mapper.instances.keys()) == ['127.0.0.1', 'supervisor_05', '10.0.0.4:15000'] + assert mapper.core_identifiers == ['127.0.0.1', 'supervisor_05', '10.0.0.4:15000'] assert mocked_find.called mocked_find.reset_mock() # configure mapper with one invalid element items = ['127.0.0.1', '', '10.0.0.5', '10.0.0.4:15000'] with pytest.raises(ValueError): - mapper.configure(items) + mapper.configure(items, core_items) assert not mocked_find.called def test_find_local_identifier_identifier(mapper): """ Test the SupvisorsMapper.find_local_identifier method when Supervisor identifier is among the instances. """ items = ['127.0.0.1', '10.0.0.5:7777:'] - mapper.configure(items) + mapper.configure(items, []) assert mapper.local_identifier == 'supervisor' @@ -157,7 +159,7 @@ def test_find_local_identifier_host_name(mapper): """ Test the SupvisorsMapper.find_local_identifier method when one instance matches the host name. 
""" hostname = socket.gethostname() items = ['127.0.0.1', f'{hostname}:60000:7777'] - mapper.configure(items) + mapper.configure(items, []) assert mapper.local_identifier == f'{hostname}:60000' @@ -166,7 +168,7 @@ def test_find_local_identifier_ip_address(mapper): hostname = socket.gethostname() mapper.local_node_references = [hostname, '10.0.0.1'] items = ['127.0.0.1', '10.0.0.1:60000:7777'] - mapper.configure(items) + mapper.configure(items, []) assert mapper.local_identifier == 'host' @@ -177,7 +179,7 @@ def test_find_local_identifier_multiple(mapper): mapper.local_node_references = [hostname, '10.0.0.1'] items = ['10.0.0.1', f'{hostname}:60000:7777'] with pytest.raises(ValueError): - mapper.configure(items) + mapper.configure(items, []) def test_find_local_identifier_none(mapper): @@ -186,7 +188,7 @@ def test_find_local_identifier_none(mapper): mapper.local_node_references = ['10.0.0.1'] items = ['10.0.0.2', 'cliche:60000:7777'] with pytest.raises(ValueError): - mapper.configure(items) + mapper.configure(items, []) def test_valid(mapper): @@ -194,7 +196,7 @@ def test_valid(mapper): # add context hostname = socket.gethostname() items = ['127.0.0.1', '10.0.0.1:2222:', f'{hostname}:60000:7777'] - mapper.configure(items) + mapper.configure(items, []) # test calls assert mapper.valid('127.0.0.1') assert mapper.valid('host') @@ -209,7 +211,7 @@ def test_filter(mapper): # add context hostname = socket.gethostname() items = ['127.0.0.1', '10.0.0.1:2222:', f'{hostname}:60000:7777'] - mapper.configure(items) + mapper.configure(items, []) # test with a bunch of identifiers identifier_list = ['127.0.0.1', 'host', f'{hostname}:60000', hostname, 'host', 'supervisor', '10.0.0.1'] assert mapper.filter(identifier_list) == ['127.0.0.1', 'host', f'{hostname}:60000'] diff --git a/supvisors/ui/application.html b/supvisors/ui/application.html index d985d7c1..014b5776 100644 --- a/supvisors/ui/application.html +++ b/supvisors/ui/application.html @@ -41,7 +41,7 @@

Applications

Contact: Julien Le Cl&eacute;ach

- +
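The `SupvisorsMapper` tests earlier in this patch configure instances from entries such as `'10.0.0.5:7777:'` and `'10.0.0.4:15000:8888'`, following the `host_name:http_port:internal_port` format that this release introduces for the `supvisors_list` option (the simple `host_name` form remains supported). As a rough illustration only - this `parse_supvisors_entry` helper is hypothetical and not part of Supvisors - such an entry could be split as follows:

```python
def parse_supvisors_entry(entry):
    """Split 'host_name[:http_port[:internal_port]]' into its parts.

    Empty or missing ports are returned as None.
    """
    parts = entry.split(':')
    host = parts[0]
    http_port = int(parts[1]) if len(parts) > 1 and parts[1] else None
    internal_port = int(parts[2]) if len(parts) > 2 and parts[2] else None
    return host, http_port, internal_port


# entries taken from the test items above
print(parse_supvisors_entry('127.0.0.1'))            # ('127.0.0.1', None, None)
print(parse_supvisors_entry('10.0.0.5:7777:'))       # ('10.0.0.5', 7777, None)
print(parse_supvisors_entry('10.0.0.4:15000:8888'))  # ('10.0.0.4', 15000, 8888)
```

The trailing-colon form (`'10.0.0.5:7777:'`) shows why empty fields must map to None: the http_port is given while the internal_port is left to its default.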
diff --git a/supvisors/ui/css/ui_style.css b/supvisors/ui/css/ui_style.css index 040c43dc..0d726606 100644 --- a/supvisors/ui/css/ui_style.css +++ b/supvisors/ui/css/ui_style.css @@ -481,8 +481,19 @@ th { text-align: center; position: sticky; top: 0; + z-index: 20; } +/* TO BE TESTED +tr:nth-child(even) td[scope=row] { + background-image: linear-gradient(180deg, var(--vlight2-color), var(--light2-color), var(--vlight2-color)); +} + +tr:nth-child(odd) td[scope=row] { + background-image: linear-gradient(180deg, var(--light1-color), var(--light2-color), var(--light1-color)); +} +*/ + .brightened { background-image: linear-gradient(180deg, var(--vlight2-color), var(--light2-color), var(--vlight2-color)); } diff --git a/supvisors/ui/host_instance.html b/supvisors/ui/host_instance.html index 52f4939e..9eee6841 100644 --- a/supvisors/ui/host_instance.html +++ b/supvisors/ui/host_instance.html @@ -41,7 +41,7 @@

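The `[include]` section shown earlier in the Scenario 3 documentation relies on Supervisor's `ENV_`-prefixed string expansions to select the per-console configuration folder. A minimal sketch of how `%(ENV_IDENTIFIER)s` resolves against the environment (illustrative only; Supervisor performs this expansion internally when it reads the configuration file):

```python
import os

# start.sh exports IDENTIFIER before launching supervisord
os.environ['IDENTIFIER'] = 'console_1'

# Supervisor exposes every environment variable to its configuration
# files through an ENV_-prefixed expansion, e.g. %(ENV_IDENTIFIER)s
expansions = {'ENV_' + key: value for key, value in os.environ.items()}

# the 'files' entry of the [include] section in supervisord_console.conf
files_entry = 'common/*.ini console/*.ini console/%(ENV_IDENTIFIER)s/*.ini'
print(files_entry % expansions)
# common/*.ini console/*.ini console/console_1/*.ini
```

With `IDENTIFIER=console_2` exported instead, the same `files` entry would pull the `console/console_2/*.ini` files, which is how one configuration file serves every console.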
diff --git a/supvisors/ui/proc_instance.html b/supvisors/ui/proc_instance.html index c54a421f..1246cbae 100644 --- a/supvisors/ui/proc_instance.html +++ b/supvisors/ui/proc_instance.html @@ -41,7 +41,7 @@

diff --git a/tox.ini b/tox.ini index 032937fc..f83778b2 100644 --- a/tox.ini +++ b/tox.ini @@ -1,6 +1,6 @@ [tox] envlist = - cover,py36,py37,py38,docs + cover,py36,py37,py38,py39,docs [testenv] commands =