Deployment
==========
The two most common ways to deploy data pipelines are batch and online.
Ploomber supports both deployment options.

Batch means obtaining new observations (usually on a schedule), making
predictions and saving them for later use. For example, you may develop a
Machine Learning pipeline that runs every morning, predicts the probability
of user churn and stores such probabilities in a database table.

Alternatively, you may deploy a pipeline as an online service and expose your
model as a REST API; users request predictions at any time by sending input
data.

Pipeline composition
====================

Before diving into deployment details, let's introduce the concept of
pipeline composition.

The only difference between a Machine Learning training pipeline and its serving
counterpart is what happens at the beginning and at the end.

At **training** time, we obtain historical data, generate features and train a
model:

.. (diagram: training pipeline)

At **serving** time, we obtain new data, generate features and make
predictions using a trained model:


.. (diagram: serving pipeline)

When the feature engineering process does not match (i.e., training and
serving compute features differently), predictions become unreliable.
This is one of the most common problems when deploying ML models. To fix it,
Ploomber allows you to compose pipelines: **write your
feature generation once and re-use it to compose your training and serving
pipelines**; this ensures that the feature engineering code matches exactly.


Batch processing
================

Ploomber pipelines can be deployed for batch processing. Check out our package
`Soopervisor <https://soopervisor.readthedocs.io/en/stable/index.html>`_, which
allows you to export to
`Kubernetes <https://soopervisor.readthedocs.io/en/stable/kubernetes.html>`_
(via `Argo workflows <https://argoproj.github.io/>`_) and
`Airflow <https://soopervisor.readthedocs.io/en/stable/airflow.html>`_. It's
also possible to run Ploomber projects using `cron
<https://soopervisor.readthedocs.io/en/stable/scheduling.html#cron>`_.

Composing batch pipelines
*************************
To compose a batch pipeline, use the ``import_tasks_from`` directive in
your ``pipeline.yaml`` file.

For example, define your feature generation tasks in a ``features.yaml`` file
(the snippet below is a sketch; task and product names are illustrative):


.. code-block:: yaml
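    :class: text-editor

    # a sketch: one task per feature; function names match the diagram
    # in the first section, products are illustrative
    - source: features.a_feature
      product: output/a_feature.parquet

    - source: features.another_feature
      product: output/another_feature.parquet

Then, import those tasks from your training ``pipeline.yaml`` using the
``import_tasks_from`` key (another sketch; the ``get`` and ``fit`` tasks and
their products are illustrative):

.. code-block:: yaml
    :class: text-editor

    meta:
        import_tasks_from: features.yaml

    tasks:
        # obtain historical data (feature tasks reference it as upstream)
        - source: tasks.get
          product: output/get.parquet

        # train a model using the generated features
        - source: tasks.fit
          product: output/model.pickle

For a complete project showing how to use ``import_tasks_from`` to create
training and serving pipelines, check out the examples in the Ploomber
repository.
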
Online service (API)
====================

To encapsulate all your pipeline's logic for online predictions, use
:py:mod:`ploomber.OnlineDAG`. Once implemented, you can generate predictions
like this (a sketch, assuming a subclass named ``MyOnlineDAG`` and a single
input task named ``get``; ``predict`` takes one keyword argument per input
task):

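.. code-block:: py
    :class: text-editor

    import pandas as pd

    # hypothetical module that defines the OnlineDAG subclass (see below)
    from my_project import MyOnlineDAG

    # new observations to predict on (illustrative)
    input_data = pd.DataFrame({'age': [20, 30], 'visits': [1, 5]})

    dag = MyOnlineDAG()

    # pass a value for each "input task"; here, the task named "get"
    prediction = dag.predict(get=input_data)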

The only requirement is that your feature generation code be entirely made of
Python function tasks (i.e., :py:mod:`ploomber.tasks.PythonCallable`) with a
configured :ref:`serializer-and-unserializer`.
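
For instance, the relevant settings in ``pipeline.yaml`` may look like this
(a sketch; the dotted paths are illustrative and should point to your own
functions):

.. code-block:: yaml
    :class: text-editor

    meta:
        import_tasks_from: features.yaml

        # functions that save/load the output of each PythonCallable task
        # (illustrative dotted paths)
        serializer: my_project.io.serialize
        unserializer: my_project.io.unserialize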

The next section explains the implementation details.


Composing online pipelines
**************************

To create an online DAG, list your feature tasks in a ``features.yaml`` and
use ``import_tasks_from`` in your training pipeline (``pipeline.yaml``).
Subclass :py:mod:`ploomber.OnlineDAG` to create a serving pipeline.

``OnlineDAG`` will take your tasks from ``features.yaml`` and create
new "input tasks" based on ``upstream`` references in your feature tasks.

For example, if ``features.yaml`` has tasks ``a_feature`` and
``another_feature`` (see the diagram in the first section), and both obtain
their inputs from a task named ``get``, the source code may look like this:

.. code-block:: py
    :class: text-editor
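
    # a sketch: each feature function receives its inputs via ``upstream``
    # (the feature logic itself is illustrative)
    def a_feature(upstream):
        # ``get`` is the input task that supplies raw data
        df_get = upstream['get']
        df_a_feature = df_get + 1  # illustrative transformation
        return df_a_feature


    def another_feature(upstream):
        df_get = upstream['get']
        df_another_feature = df_get - 1  # illustrative transformation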
        return df_another_feature

Since ``features.yaml`` does not contain a task named ``get``, ``OnlineDAG``
automatically identifies it as an "input task". Finally, you must provide a
"terminal task", which is the last task in your online pipeline:

.. (diagram: online pipeline with input tasks, feature tasks, and a terminal task)
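
A sketch of the subclass (paths, model loading, and method bodies are
illustrative; check :py:mod:`ploomber.OnlineDAG` for the exact interface):

.. code-block:: py
    :class: text-editor

    import pickle
    from pathlib import Path

    import pandas as pd
    from ploomber import OnlineDAG


    class MyOnlineDAG(OnlineDAG):
        # location of the features.yaml that declares the feature tasks
        @staticmethod
        def get_partial():
            return Path('features.yaml')

        # extra keyword arguments to pass to the terminal task
        @staticmethod
        def terminal_params():
            model = pickle.loads(Path('model.pickle').read_bytes())
            return dict(model=model)

        # terminal task: the last task in the online pipeline
        @staticmethod
        def terminal_task(upstream, model):
            features = pd.concat([upstream['a_feature'],
                                  upstream['another_feature']], axis=1)
            return model.predict(features)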
