serving doc

formlio · Aug 19, 2022 · e3812d0
1 parent 2c6d3b3

Showing 5 changed files with 132 additions and 155 deletions.
diff --git a/docs/application.rst b/docs/application.rst
@@ -62,17 +62,19 @@ serving - e.g. *Project B*), on the other hand an application can possibly span
         end
 
 It makes sense to manage an application (descriptor) in the scope of some particular project if
-they constitute a 1:1 relationship (perhaps the most typical scenario). More complex applications
-might need to be maintained separately though.
+they form a 1:1 relationship (perhaps the most typical scenario). More complex applications might
+need to be maintained separately though.
 
-ForML :ref:`platform <platform>` maintains :ref:`published applications <application-publishing>`
+ForML :ref:`platform <platform>` persists :ref:`published applications <application-publishing>`
 within a special :ref:`application inventory <inventory>` where they are picked from at runtime
 by the :ref:`serving engine <serving>`.
 
-Process Control
----------------
+.. _application-dispatch:
 
-Applications play a key role in the :ref:`serving process <serving>` taking control over the
+Request Dispatching
+-------------------
+
+Applications play a key role in the :ref:`serving process <serving-process>` taking control over the
 following steps:
 
 .. md-mermaid::

diff --git a/docs/platform.rst b/docs/platform.rst
@@ -188,15 +188,20 @@ Command Group            Related Chapters
 ``$ forml application``  :ref:`Application Management <inventory-management>`
 
                          :ref:`Application Publishing <application-publishing>`
+
+                         :ref:`Serving Control <serving-gateway>`
 ``$ forml model``        :ref:`Model Management <registry-management>`
 
                          :ref:`Production Lifecycle Management <lifecycle-production>`
 ``$ forml project``      :ref:`Development Lifecycle Management <lifecycle-development>`
 =======================  ===============================================================
 
 
+Common Runtime Features
+-----------------------
+
 Core Exceptions
----------------
+^^^^^^^^^^^^^^^
 
 Following is the list of core ForML exceptions emitted at runtime:
 
@@ -209,3 +214,9 @@ Following is the list of core ForML exceptions emitted at runtime:
    :show-inheritance:
 .. autoclass:: forml.FailedError
    :show-inheritance:
+
+
+Runtime Performance Metric
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. autoclass:: forml.runtime.Stats
diff --git a/docs/serving.rst b/docs/serving.rst
@@ -18,6 +18,21 @@
 Serving Engine
 ==============
 
+In addition to the basic :ref:`CLI-driven <platform-cli>` project-level batch-mode :ref:`execution
+mechanism <platform-execution>`, ForML allows operating the encompassing :ref:`applications
+<application>` within an interactive loop performing the *apply* action of the :ref:`production
+lifecycle <lifecycle-production>` - essentially providing *online predictions* a.k.a. *ML
+inference* based on the underlying models.
+
+.. _serving-process:
+
+Process Control
+---------------
+
+The core component driving the serving loop is the *Engine*. To facilitate the end-to-end
+prediction serving, it interacts with all the different :ref:`platform <platform>` sub-systems as
+shown in the following sequence diagram:
+
 .. md-mermaid::
 
     sequenceDiagram
@@ -50,38 +65,93 @@ Serving Engine
         Engine --) Client: Response
 
 
-a.k.a. inference
-
-can serve multiple :ref:`application` as long as they are published into the :ref:`inventory
-<inventory>`.
-
-one of the :ref:`execution mechanisms <platform-execution>`
-
-apply action of the :ref:`production lifecycle <lifecycle-production>`
-
-CLI ``forml application serve``
-
-Project = generic ML solution (how to solve - returns prediction outcomes (e.g. probabilities))
-Application = Model (project/generation) selection, data adaptation (how to deliver - returns
-a domain response (e.g. recommended products))
-Gateway = Transport Protocol (how to integrate)
-
-
-.. autoclass:: forml.runtime.Stats
+This diagram illustrates the following steps:
+
+#. Receiving a request containing the query payload and the target :ref:`application <application>`
+   reference.
+#. Upon the very first request for any given application, the engine fetches the particular
+   :ref:`application descriptor <application-implementation>` from the configured :ref:`inventory
+   <inventory>`. The descriptor remains cached for every follow-up request of that application.
+#. The engine uses the descriptor of the selected application to :ref:`dispatch the request
+   <application-dispatch>` by:
+
+   #. :ref:`Interpreting <application-interpret>` the query payload.
+   #. :ref:`Selecting <application-select>` a particular :ref:`model generation
+      <registry-assets>` to serve the given request (depending on the model-selection strategy
+      used by that application, this step might involve interaction with the :ref:`model registry
+      <registry>`).
+
+#. Unless already running, the engine spawns a dedicated :ref:`runner <runner>` which loads the
+   selected :ref:`model artifacts <registry-artifacts>` providing an isolated environment not
+   colliding with (dependencies of) other models also served by the same engine.
+#. The runner might involve the configured :ref:`feed system <feed>` to augment the provided
+   data points using a feature store.
+#. With the complete feature-set matching the project-defined :ref:`schema <project-source>`,
+   the runner executes the :ref:`pipeline <project-pipeline>` in the :ref:`apply-mode
+   <workflow-mode>` obtaining the prediction outcomes.
+#. Finally, the engine again uses the application descriptor to :ref:`produce
+   the response <application-interpret>` which is then returned to the original caller.
+
+.. note::
+    An engine can serve any :ref:`application <application>` available in its linked
+    :ref:`inventory <inventory>` in a multiplexed fashion. Since the released :ref:`project
+    packages <registry-package>` contain all the :ref:`declared dependencies <project-setup>`,
+    the engine itself remains generic. To avoid collisions between dependencies of different
+    models, the engine separates each one in an isolated context.
 
 
 .. _serving-gateway:
 
 Frontend Gateway
 ----------------
 
-Protocol
+While the engine is full-featured in terms of the end-to-end application serving, it can only be
+engaged using its raw Python API. That's suitable for products natively embedding the engine as
+an integrated component, but for a truly decoupled client-server architecture this needs an extra
+layer providing some sort of a transport protocol.
+
+For this purpose, ForML comes with a concept of *serving frontend gateways*. They also follow the
+:ref:`provider pattern <provider>`, allowing for a number of different interchangeable
+:ref:`implementations <serving-providers>` pluggable at launch time.
+
+Frontend gateways represent the outermost layer in the logical hierarchy of the ForML architecture:
+
+================================  =======================  =================  ====================
+Layer                             Objective/Task           Problem question   Product/Instance
+================================  =======================  =================  ====================
+:ref:`Project <project>`          ML solution              How to solve?      Prediction outcomes
+                                                                              (e.g. probabilities)
+:ref:`Application <application>`  Domain interpretation,   How to utilize?    Domain response
+                                  model selection                             (e.g. recommended
+                                                                              products)
+:ref:`Engine <serving>`           Serving control          How to operate?    Interactive
+                                                                              processing loop
+:ref:`Gateway <serving-gateway>`  Client-server transport  How to integrate?  ML service API
+================================  =======================  =================  ====================
+
+API
+^^^
 
 .. autoclass:: forml.runtime.Gateway
+   :members: run
+
+Service Management
+^^^^^^^^^^^^^^^^^^
+
+The gateway service can be managed using the :ref:`CLI <platform-cli>` as follows (see the
+integrated help for full synopsis):
+
+==========================  =============================
+Use case                    Command
+==========================  =============================
+Launch the gateway service  ``$ forml application serve``
+==========================  =============================
+
 
+.. _serving-providers:
 
 Gateway Providers
------------------
+^^^^^^^^^^^^^^^^^
 
 Gateway :ref:`providers <provider>` can be configured within the runtime :ref:`platform setup
 <platform>` using the ``[GATEWAY.*]`` sections.
@@ -93,124 +163,3 @@ The available implementations are:
    :nosignatures:
 
    forml.provider.gateway.rest.Gateway
-
-
-
-
-
-
-
-
-
-
-
-In addition to the basic :ref:`CLI driven <platform-cli>` isolated batch mode, ML projects implemented on ForML can be
-embedded into a dynamic serving layer and operated in an autonomous *full-cycle* fashion. This layer continuously serves
-the following functions:
-
-* incremental training
-* tuning
-* ongoing performance reporting
-* dynamic rollout strategies
-
-
-Feedback Loop
--------------
-
-In order to autonomously provide the full-cycle serving capabilities for a supervised ML project, there needs to be
-a programmatically reachable event-outcome feedback loop defined as an external reconciliation path providing
-a knowledge of the true outcome for every event the system is predicting for.
-
-Implementation of this feedback loop (the reconciliation logic) is in scope of the particular business application and
-its data architecture to which ForML simply plugs into using its :ref:`feed system <feed>`.
-
-The key attribute of this feedback loop is its *latency* which determines the turnaround time for all the serving
-functionality like performance monitoring, incremental training etc.
-
-
-.. _serving-components:
-
-Components
-----------
-
-The serving capabilities are provided through a number of additional :ref:`platform components
-<platform>` as explained in the following sections.
-
-.. image:: _static/images/serving-components.png
-
-Online Agent
-''''''''''''
-
-Online agent is the most apparent serving component responsible for answering the event queries with actual
-predictions. In scope of this process it needs to go through set of essential steps (some of them are part of
-agent bootstrapping or periodical cache refreshing while others are synchronous with each query):
-
-1. Fetching the serving manifest from the *project roster*.
-2. Selecting a particular model generation using the dynamic *rollout strategy* as defined in the serving manifest.
-3. Loading the selected model generation from the :ref:`model registry <registry>`.
-4. Fetching all missing input features for augmenting the particular request according to the project
-   :ref:`input DSL <concept-dsl>`.
-5. Running the prediction pipeline and responding with the result.
-6. Submitting query metadata to the *query logbus*.
-
-The rollout workflow employed by the agent is a powerful concept allowing to select particular model/generation
-dynamically based on the project-defined function of any available parameters (mainly the performance metrics). This
-allows to implement strategies like *canary deployment*, *multi-armed bandits*, *A/B testing*, *cold-start* or
-*fallback* models etc.
-
-The serving agent is expected to be embedded into a particular application layer (ie web/rest service) to provide the
-actual frontend facade.
-
-Project Roster
-''''''''''''''
-
-This is a tiny storage service used by the serving layer to pickup list of active projects and their serving
-manifests. It gets updated as part of project deployment promotion and continuously watched by the online/offline
-agents to determine things like the model generation selection.
-
-Query LogBus
-''''''''''''
-
-Standard publisher-subscriber software bus for distributing the serving queries metadata to allow for further (offline)
-processing like the performance reporting or general debugging. The typical attributes sent to the query logbus per each
-event are:
-
-* timestamp
-* query ID
-* project + version
-* query fields
-* obtained augmentation features
-* prediction result
-* latency
-
-PerfDB
-''''''
-
-Another storage service for aggregating the performance metric as time series derived from both the metadata pushed via
-*query logbus* as well as the main *feedback loop* and produced by the *offline agent* processing.
-
-The PerfDB is a crucial source of information not only for any sorts of operational monitoring/reporting but
-especially for the dynamic model generation selection performed by the online agent according to the rollout strategy
-when serving the actual event queries.
-
-The typical available metrics are:
-
-* per project:
-
-  * per model generation:
-
-    * serving latency (gauge)
-    * number of requests (counter)
-    * loss function value
-    * auxiliary project-defined metrics
-
-  * loss function value
-
-Offline Agent
-'''''''''''''
-
-Offline agent is the backend service responsible for doing all the heavy processing of:
-
-* (incremental) *training* and *tuning* of new model generations (pushed to the :ref:`model
-  registry <registry>`)
-* *evaluating* project performance (pushed to the `PerfDB`_)
diff --git a/docs/tutorials/titanic/pipeline.rst b/docs/tutorials/titanic/pipeline.rst
@@ -119,5 +119,5 @@ framework.
   :start-at: import
 
 
-:ref:`Component-wise <project-structure>`, this makes are project complete, allowing us to
+:ref:`Component-wise <project-structure>`, this makes our project complete, allowing us to
 further :doc:`progress its lifecycle <lifecycle>`.
diff --git a/forml/runtime/_service/__init__.py b/forml/runtime/_service/__init__.py
@@ -74,13 +74,21 @@ async def apply(self, application: str, request: layout.Request) -> layout.Respo
 
 
 class Gateway(provider.Service, default=setup.Gateway.default, path=setup.Gateway.path):
-    """Top-level serving gateway abstraction."""
+    """Top-level serving gateway abstraction.
+
+    Args:
+        inventory: Inventory of applications to be served.
+        registry: Model registry of project artifacts to be served.
+        feeds: Feeds to be used for potential feature augmentation.
+        processes: Process pool size for each model sandbox.
+        loop: Explicit event loop instance.
+    """
 
     def __init__(
         self,
-        inventory: typing.Optional[asset.Inventory] = None,
-        registry: typing.Optional[asset.Registry] = None,
-        feeds: typing.Optional[io.Importer] = None,
+        inventory: typing.Optional['asset.Inventory'] = None,
+        registry: typing.Optional['asset.Registry'] = None,
+        feeds: typing.Optional['io.Importer'] = None,
         processes: typing.Optional[int] = None,
         loop: typing.Optional['asyncio.AbstractEventLoop'] = None,
         **_,
@@ -103,10 +111,17 @@ def __exit__(self, exc_type, exc_val, exc_tb):
     @abc.abstractmethod
     def run(
         self,
-        apply: typing.Callable[[str, layout.Request], typing.Awaitable[layout.Response]],
+        apply: typing.Callable[[str, 'layout.Request'], typing.Awaitable['layout.Response']],
         stats: typing.Callable[[], typing.Awaitable['runtime.Stats']],
     ) -> None:
-        """Serving loop."""
+        """Serving loop implementation.
+
+        Args:
+            apply: Prediction request handler provided by the engine.
+                   The handler expects two parameters - the target *application name* and the
+                   *prediction request*.
+            stats: Stats producer callback provided by the engine.
+        """
 
     def main(self) -> None:
         """Frontend main method."""