serving doc
antonymayi committed Aug 19, 2022
1 parent 2c6d3b3 commit e3812d0
Showing 5 changed files with 132 additions and 155 deletions.
14 changes: 8 additions & 6 deletions docs/application.rst
@@ -62,17 +62,19 @@ serving - e.g. *Project B*), on the other hand an application can possibly span
end

It makes sense to manage an application (descriptor) in the scope of some particular project if
they constitute a 1:1 relationship (perhaps the most typical scenario). More complex applications
might need to be maintained separately though.
they form a 1:1 relationship (perhaps the most typical scenario). More complex applications might
need to be maintained separately though.

ForML :ref:`platform <platform>` maintains :ref:`published applications <application-publishing>`
ForML :ref:`platform <platform>` persists :ref:`published applications <application-publishing>`
within a special :ref:`application inventory <inventory>` where they are picked from at runtime
by the :ref:`serving engine <serving>`.

Process Control
---------------
.. _application-dispatch:

Applications play a key role in the :ref:`serving process <serving>` taking control over the
Request Dispatching
-------------------

Applications play a key role in the :ref:`serving process <serving-process>` taking control over the
following steps:

.. md-mermaid::
13 changes: 12 additions & 1 deletion docs/platform.rst
@@ -188,15 +188,20 @@ Command Group Related Chapters
``$ forml application`` :ref:`Application Management <inventory-management>`

:ref:`Application Publishing <application-publishing>`

:ref:`Serving Control <serving-gateway>`
``$ forml model`` :ref:`Model Management <registry-management>`

:ref:`Production Lifecycle Management <lifecycle-production>`
``$ forml project`` :ref:`Development Lifecycle Management <lifecycle-development>`
======================= ===============================================================


Common Runtime Features
-----------------------

Core Exceptions
---------------
^^^^^^^^^^^^^^^

Following is the list of core ForML exceptions emitted at runtime:

@@ -209,3 +214,9 @@ Following is the list of core ForML exceptions emitted at runtime:
:show-inheritance:
.. autoclass:: forml.FailedError
:show-inheritance:


Runtime Performance Metric
^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autoclass:: forml.runtime.Stats
231 changes: 90 additions & 141 deletions docs/serving.rst
@@ -18,6 +18,21 @@
Serving Engine
==============

In addition to the basic :ref:`CLI-driven <platform-cli>` project-level batch-mode :ref:`execution
mechanism <platform-execution>`, ForML allows operating the encompassing :ref:`applications
<application>` within an interactive loop performing the *apply* action of the :ref:`production
lifecycle <lifecycle-production>` - essentially providing *online predictions* a.k.a. *ML
inference* based on the underlying models.

.. _serving-process:

Process Control
---------------

The core component driving the serving loop is the *Engine*. To facilitate the end-to-end
prediction serving, it interacts with all the different :ref:`platform <platform>` sub-systems as
shown in the following sequence diagram:

.. md-mermaid::

sequenceDiagram
@@ -50,38 +65,93 @@ Serving Engine
Engine --) Client: Response


a.k.a. inference

can serve multiple :ref:`application` as long as they are published into the :ref:`inventory
<inventory>`.

one of the :ref:`execution mechanisms <platform-execution>`

apply action of the :ref:`production lifecycle <lifecycle-production>`

CLI ``forml application serve``

Project = generic ML solution (how to solve - returns prediction outcomes (e.g. probabilities))
Application = Model (project/generation) selection, data adaptation (how to deliver - returns
a domain response (e.g. recommended products))
Gateway = Transport Protocol (how to integrate)


.. autoclass:: forml.runtime.Stats
This diagram illustrates the following steps:

#. Receiving a request containing the query payload and the target :ref:`application <application>`
reference.
#. Upon the very first request for any given application, the engine fetches the particular
:ref:`application descriptor <application-implementation>` from the configured :ref:`inventory
<inventory>`. The descriptor remains cached for every follow-up request of that application.
#. The engine uses the descriptor of the selected application to :ref:`dispatch the request
<application-dispatch>` by:

#. :ref:`Interpreting <application-interpret>` the query payload.
#. :ref:`Selecting <application-select>` a particular :ref:`model generation
<registry-assets>` to serve the given request (depending on the model-selection strategy
used by that application, this step might involve interaction with the :ref:`model registry
<registry>`).

#. Unless already running, the engine spawns a dedicated :ref:`runner <runner>` which loads the
selected :ref:`model artifacts <registry-artifacts>` providing an isolated environment not
colliding with (dependencies of) other models also served by the same engine.
#. The runner might involve the configured :ref:`feed system <feed>` to augment the provided
data points using a feature store.
#. With the complete feature-set matching the project-defined :ref:`schema <project-source>`,
the runner executes the :ref:`pipeline <project-pipeline>` in the :ref:`apply-mode
<workflow-mode>` obtaining the prediction outcomes.
#. Finally, the engine again uses the application descriptor to :ref:`produce
the response <application-interpret>` which is then returned to the original caller.
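
The steps above can be sketched as a toy serving loop. This is only an illustration under stated
assumptions: ``Descriptor``, ``Engine`` and all of their methods are hypothetical stand-ins written
for this example, not the actual ForML classes, and the "pipeline" is reduced to a plain ``sum``.

```python
import asyncio
import dataclasses
import typing


@dataclasses.dataclass
class Descriptor:
    """Hypothetical stand-in for an application descriptor (not the ForML API)."""

    name: str

    def interpret(self, payload: bytes) -> typing.List[float]:
        # Decode the raw query payload into feature values.
        return [float(value) for value in payload.decode().split(',')]

    def select(self) -> str:
        # Pick a model generation; a real strategy might consult the model registry.
        return f'{self.name}-gen1'

    def respond(self, outcome: float) -> bytes:
        # Turn the prediction outcome into a domain-specific response.
        return f'{self.name}:{outcome:.1f}'.encode()


class Engine:
    """Toy engine (hypothetical) mirroring the listed steps."""

    def __init__(self, inventory: typing.Dict[str, Descriptor]) -> None:
        self._inventory = inventory
        self._cache: typing.Dict[str, Descriptor] = {}
        self._runners: typing.Dict[str, typing.Callable[[typing.List[float]], float]] = {}
        self.fetches = 0  # number of inventory lookups, for demonstration only

    def _descriptor(self, application: str) -> Descriptor:
        # Fetch the descriptor on the very first request, then serve it from the cache.
        if application not in self._cache:
            self.fetches += 1
            self._cache[application] = self._inventory[application]
        return self._cache[application]

    def _runner(self, generation: str) -> typing.Callable[[typing.List[float]], float]:
        # Spawn a runner for the selected generation unless one is already running.
        if generation not in self._runners:
            self._runners[generation] = sum  # stand-in for the apply-mode pipeline
        return self._runners[generation]

    async def apply(self, application: str, payload: bytes) -> bytes:
        descriptor = self._descriptor(application)
        features = descriptor.interpret(payload)
        generation = descriptor.select()
        outcome = self._runner(generation)(features)
        return descriptor.respond(outcome)


engine = Engine({'demo': Descriptor('demo')})
print(asyncio.run(engine.apply('demo', b'1,2,3.5')))
```

The sketch leaves out the feed-based feature augmentation and the per-model isolation, both of
which the real engine is described as handling.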

.. note::
An engine can serve any :ref:`application <application>` available in its linked
:ref:`inventory <inventory>` in a multiplexed fashion. Since the released :ref:`project
packages <registry-package>` contain all the :ref:`declared dependencies <project-setup>`,
the engine itself remains generic. To avoid collisions between dependencies of different
models, the engine separates each one in an isolated context.


.. _serving-gateway:

Frontend Gateway
----------------

Protocol
While the engine is full-featured in terms of the end-to-end application serving, it can only be
engaged using its raw Python API. That's suitable for products natively embedding the engine as
an integrated component, but for a truly decoupled client-server architecture this needs an extra
layer providing some sort of a transport protocol.

For this purpose, ForML comes with a concept of *serving frontend gateways*. They also follow the
:ref:`provider pattern <provider>`, which allows a number of interchangeable
:ref:`implementations <serving-providers>` to be plugged in at launch time.

Frontend gateways represent the outermost layer in the logical hierarchy of the ForML architecture:

================================ ======================= ================= ====================
Layer Objective/Task Problem question Product/Instance
================================ ======================= ================= ====================
:ref:`Project <project>` ML solution How to solve? Prediction outcomes
(e.g. probabilities)
:ref:`Application <application>` Domain interpretation, How to utilize? Domain response
model selection (e.g. recommended
products)
:ref:`Engine <serving>` Serving control How to operate? Interactive
processing loop
:ref:`Gateway <serving-gateway>` Client-server transport How to integrate? ML service API
================================ ======================= ================= ====================

API
^^^

.. autoclass:: forml.runtime.Gateway
:members: run
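
To illustrate the contract of the ``run`` method, the following is a minimal sketch assuming only
the two callbacks from its signature: an ``apply`` handler taking the application name and the
request, and a ``stats`` producer. The ``ReplayGateway`` class, the ``make_handlers`` helper and the
``bytes``/``dict`` stand-ins for the request, response and stats types are hypothetical and do not
subclass ``forml.runtime.Gateway``. The sketch also drives its own event loop via ``asyncio.run``
rather than accepting an explicit ``loop`` instance.

```python
import asyncio
import typing

# Hypothetical stand-ins for the request, response and stats types.
Request = bytes
Response = bytes
Stats = typing.Dict[str, int]


class ReplayGateway:
    """Toy gateway (not a ForML class) replaying a fixed list of queries."""

    def __init__(self, queries: typing.Iterable[typing.Tuple[str, Request]]) -> None:
        self._queries = list(queries)
        self.responses: typing.List[Response] = []
        self.stats: typing.Optional[Stats] = None

    def run(
        self,
        apply: typing.Callable[[str, Request], typing.Awaitable[Response]],
        stats: typing.Callable[[], typing.Awaitable[Stats]],
    ) -> None:
        async def serve() -> None:
            # A real gateway would accept queries over a transport protocol instead.
            for application, request in self._queries:
                self.responses.append(await apply(application, request))
            self.stats = await stats()

        asyncio.run(serve())


def make_handlers():
    """Build fake engine callbacks for the demonstration."""
    counter = {'requests': 0}

    async def apply(application: str, request: Request) -> Response:
        counter['requests'] += 1
        return application.encode() + b':' + request.upper()

    async def stats() -> Stats:
        return dict(counter)

    return apply, stats


gateway = ReplayGateway([('demo', b'ping'), ('demo', b'pong')])
gateway.run(*make_handlers())
print(gateway.responses, gateway.stats)
```

Keeping the gateway limited to these two callbacks is what lets the transport layer stay
interchangeable while the engine retains control over the actual serving.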

Service Management
^^^^^^^^^^^^^^^^^^

The gateway service can be managed using the :ref:`CLI <platform-cli>` as follows (see the
integrated help for full synopsis):

========================== =============================
Use case Command
========================== =============================
Launch the gateway service ``$ forml application serve``
========================== =============================


.. _serving-providers:

Gateway Providers
-----------------
^^^^^^^^^^^^^^^^^

Gateway :ref:`providers <provider>` can be configured within the runtime :ref:`platform setup
<platform>` using the ``[GATEWAY.*]`` sections.
@@ -93,124 +163,3 @@ The available implementations are:
:nosignatures:

forml.provider.gateway.rest.Gateway











In addition to the basic :ref:`CLI driven <platform-cli>` isolated batch mode, ML projects implemented on ForML can be
embedded into a dynamic serving layer and operated in an autonomous *full-cycle* fashion. This layer continuously serves
the following functions:

* incremental training
* tuning
* ongoing performance reporting
* dynamic rollout strategies


Feedback Loop
-------------

In order to autonomously provide the full-cycle serving capabilities for a supervised ML project, there needs to be
a programmatically reachable event-outcome feedback loop defined as an external reconciliation path providing
a knowledge of the true outcome for every event the system is predicting for.

Implementation of this feedback loop (the reconciliation logic) is in scope of the particular business application and
its data architecture to which ForML simply plugs into using its :ref:`feed system <feed>`.

The key attribute of this feedback loop is its *latency* which determines the turnaround time for all the serving
functionality like performance monitoring, incremental training etc.


.. _serving-components:

Components
----------

The serving capabilities are provided through a number of additional :ref:`platform components
<platform>` as explained in the following sections.

.. image:: _static/images/serving-components.png

Online Agent
''''''''''''

Online agent is the most apparent serving component responsible for answering the event queries with actual
predictions. In scope of this process it needs to go through set of essential steps (some of them are part of
agent bootstrapping or periodical cache refreshing while others are synchronous with each query):

1. Fetching the serving manifest from the *project roster*.
2. Selecting a particular model generation using the dynamic *rollout strategy* as defined in the serving manifest.
3. Loading the selected model generation from the :ref:`model registry <registry>`.
4. Fetching all missing input features for augmenting the particular request according to the project
:ref:`input DSL <concept-dsl>`.
5. Running the prediction pipeline and responding with the result.
6. Submitting query metadata to the *query logbus*.

The rollout workflow employed by the agent is a powerful concept allowing to select particular model/generation
dynamically based on the project-defined function of any available parameters (mainly the performance metrics). This
allows to implement strategies like *canary deployment*, *multi-armed bandits*, *A/B testing*, *cold-start* or
*fallback* models etc.

The serving agent is expected to be embedded into a particular application layer (ie web/rest service) to provide the
actual frontend facade.

Project Roster
''''''''''''''

This is a tiny storage service used by the serving layer to pickup list of active projects and their serving
manifests. It gets updated as part of project deployment promotion and continuously watched by the online/offline
agents to determine things like the model generation selection.

Query LogBus
''''''''''''

Standard publisher-subscriber software bus for distributing the serving queries metadata to allow for further (offline)
processing like the performance reporting or general debugging. The typical attributes sent to the query logbus per each
event are:

* timestamp
* query ID
* project + version
* query fields
* obtained augmentation features
* prediction result
* latency

PerfDB
''''''

Another storage service for aggregating the performance metric as time series derived from both the metadata pushed via
*query logbus* as well as the main *feedback loop* and produced by the *offline agent* processing.

The PerfDB is a crucial source of information not only for any sorts of operational monitoring/reporting but
especially for the dynamic model generation selection performed by the online agent according to the rollout strategy
when serving the actual event queries.

The typical available metrics are:

* per project:

* per model generation:

* serving latency (gauge)
* number of requests (counter)
* loss function value
* auxiliary project-defined metrics

* loss function value

Offline Agent
'''''''''''''

Offline agent is the backend service responsible for doing all the heavy processing of:

* (incremental) *training* and *tuning* of new model generations (pushed to the :ref:`model
registry <registry>`)
* *evaluating* project performance (pushed to the `PerfDB`_)
2 changes: 1 addition & 1 deletion docs/tutorials/titanic/pipeline.rst
@@ -119,5 +119,5 @@ framework.
:start-at: import


:ref:`Component-wise <project-structure>`, this makes are project complete, allowing us to
:ref:`Component-wise <project-structure>`, this makes our project complete, allowing us to
further :doc:`progress its lifecycle <lifecycle>`.
27 changes: 21 additions & 6 deletions forml/runtime/_service/__init__.py
@@ -74,13 +74,21 @@ async def apply(self, application: str, request: layout.Request) -> layout.Respo


class Gateway(provider.Service, default=setup.Gateway.default, path=setup.Gateway.path):
"""Top-level serving gateway abstraction."""
"""Top-level serving gateway abstraction.
Args:
inventory: Inventory of applications to be served.
registry: Model registry of project artifacts to be served.
feeds: Feeds to be used for potential feature augmentation.
processes: Process pool size for each model sandbox.
loop: Explicit event loop instance.
"""

def __init__(
self,
inventory: typing.Optional[asset.Inventory] = None,
registry: typing.Optional[asset.Registry] = None,
feeds: typing.Optional[io.Importer] = None,
inventory: typing.Optional['asset.Inventory'] = None,
registry: typing.Optional['asset.Registry'] = None,
feeds: typing.Optional['io.Importer'] = None,
processes: typing.Optional[int] = None,
loop: typing.Optional['asyncio.AbstractEventLoop'] = None,
**_,
@@ -103,10 +111,17 @@ def __exit__(self, exc_type, exc_val, exc_tb):
@abc.abstractmethod
def run(
self,
apply: typing.Callable[[str, layout.Request], typing.Awaitable[layout.Response]],
apply: typing.Callable[[str, 'layout.Request'], typing.Awaitable['layout.Response']],
stats: typing.Callable[[], typing.Awaitable['runtime.Stats']],
) -> None:
"""Serving loop."""
"""Serving loop implementation.
Args:
apply: Prediction request handler provided by the engine.
The handler expects two parameters - the target *application name* and the
*prediction request*.
stats: Stats producer callback provided by the engine.
"""

def main(self) -> None:
"""Frontend main method."""
