diff --git a/doc/user-guide/deployment.rst b/doc/user-guide/deployment.rst
index f3db75736..b30810138 100644
--- a/doc/user-guide/deployment.rst
+++ b/doc/user-guide/deployment.rst
@@ -4,20 +4,19 @@ Deployment
 The two most common ways to deploy data pipelines are batch and online.
 Ploomber supports both deployment options.
 
-Batch implies running your pipeline (usually on a schedule), generate results
-and make them available for future consumption. For example, you may develop a
+Batch means obtaining new observations (usually on a schedule),
+making predictions and saving them for later use. For example, you may develop a
 Machine Learning pipeline that runs every morning, predicts the probability
-of user churn and stores the probabilities in a database table.
-probabilities can then later be used to guide decision-making.
+of user churn and stores such probabilities in a database table.
 
-Alternatively, you may deploy a pipeline as an online service. This time,
-instead of storing predictions for later consumption, you expose your model
-as a REST API and users can make requests and get predictions on demand.
+Alternatively, you may deploy a pipeline as an online service and expose your
+model as a REST API; users request predictions at any time by sending input
+data.
 
 Pipeline composition
 ====================
 
-Before diving into the deployment details, let's introduce the concept of
+Before diving into deployment details, let's introduce the concept of
 pipeline composition.
 
 The only difference between a Machine Learning training pipeline and its serving
@@ -42,7 +41,7 @@ model:
 
 
 At **serving** time, we obtain new data, generate features and make
-predictions using a previously trained model:
+predictions using a trained model:
 
 .. raw:: html
 
@@ -63,7 +62,7 @@ When the feature engineering process does not match,
 This is one of the most common problems when deploying ML models. To fix
 it, Ploomber allows you to compose pipelines: **write your
 feature generation once and re-use it to compose your training and serving
-pipelines**; ensuring that the feature engineering code matches exactly.
+pipelines**; this ensures that the feature engineering code matches exactly.
 
 
 Batch processing
@@ -74,7 +73,7 @@ processing.
 Check out our package `Soopervisor
 <https://soopervisor.readthedocs.io/en/latest/>`_, which allows you to export
 to `Kubernetes <https://kubernetes.io/>`_
-(via Argo workflows) and
+(via `Argo workflows <https://argoproj.github.io/argo-workflows/>`_) and
 `Airflow <https://airflow.apache.org/>`_.
 It's also possible to run Ploomber projects using `cron
 <https://en.wikipedia.org/wiki/Cron>`_ or
@@ -86,7 +85,7 @@ Composing batch pipelines
 To compose a batch pipeline, use the ``import_tasks_from`` directive in
 your ``pipeline.yaml`` file.
 
-For example, define all your feature generation tasks in a ``features.yaml`` file:
+For example, define your feature generation tasks in a ``features.yaml`` file:
 
 .. code-block:: yaml
 
@@ -164,7 +163,7 @@ showing how to use ``import_tasks_from`` to create a training
 Online service (API)
 ====================
 
-To encapsulate all your pipeline's logic to generate online predictions, use
+To encapsulate all your pipeline's logic for online predictions, use
 :py:mod:`ploomber.OnlineDAG`. Once implemented, you can generate
 predictons like this:
 
@@ -184,22 +183,20 @@ The only requisite is that your feature generation code should be entirely made
 of Python functions (i.e., :py:mod:`ploomber.tasks.PythonCallable`) tasks
 with configured :ref:`serializer-and-unserializer`.
 
-The next section explains the implementation details.
-
 
 Composing online pipelines
 **************************
 
 To create an online DAG, list your feature tasks in a ``features.yaml`` and
-use ``import_tasks_from`` in your training pipeline ``pipeline.yaml``. To
-create the serving pipeline, you have to create a subclass of
-:py:mod:`ploomber.OnlineDAG`.
+use ``import_tasks_from`` in your training pipeline (``pipeline.yaml``).
+Subclass :py:mod:`ploomber.OnlineDAG` to create a serving pipeline.
+
+``OnlineDAG`` will take your tasks from ``features.yaml`` and create
+new "input tasks" based on ``upstream`` references in your feature tasks.
 
-``OnlineDAG`` will take your ``features.yaml`` and create "input tasks" based
-on ``upstream`` references in yout feature tasks. For example, if your pipeline
-has features ``a_feature`` and ``another_feature`` (just like the pipeline
-described in the first section), and both obtain their inputs from a task
-named ``get``, the code will look like this:
+For example, if ``features.yaml`` has tasks ``a_feature`` and
+``another_feature`` (see the diagram in the first section), and both obtain
+their inputs from a task named ``get``, the source code may look like this:
 
 .. code-block:: py
    :class: text-editor
@@ -218,8 +215,8 @@ named ``get``, the code will look like this:
        return df_another_feature
 
 Since ``features.yaml`` does not contain a task named ``get``, ``OnlineDAG``
-automatically identifies is as an input. Finally, you must provide a
-"terminal task", which will be the last task in your online pipeline:
+automatically identifies it as an "input task". Finally, you must provide a
+"terminal task", which is the last task in your online pipeline:
 
 .. raw:: html
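
For reference alongside the hunks above, here is a minimal sketch of the ``OnlineDAG`` subclass and terminal task that the "Composing online pipelines" hunk describes. The ``get_partial``, ``terminal_params``, and ``terminal_task`` hooks follow Ploomber's documented ``OnlineDAG`` interface, but the concrete file names (``features.yaml``, ``model.pickle``), the pickle-based model loading, and the feature concatenation are illustrative assumptions rather than part of this diff:

.. code-block:: py
   :class: text-editor

   import pickle
   from pathlib import Path

   import pandas as pd
   from ploomber import OnlineDAG


   class MyOnlineDAG(OnlineDAG):

       # pipeline whose tasks are imported; tasks referenced as upstream but
       # not defined there (e.g., "get") become input tasks
       @staticmethod
       def get_partial():
           return Path('features.yaml')

       # extra objects passed to terminal_task, e.g., a previously trained model
       # (assumed here to be stored as model.pickle)
       @staticmethod
       def terminal_params():
           model = pickle.loads(Path('model.pickle').read_bytes())
           return dict(model=model)

       # terminal task: receives the generated features and returns a prediction
       @staticmethod
       def terminal_task(upstream, model):
           features = pd.concat([upstream['a_feature'],
                                 upstream['another_feature']], axis=1)
           return model.predict(features)

Keeping feature generation in ``features.yaml`` and importing it from both the training ``pipeline.yaml`` and the ``OnlineDAG`` subclass is what guarantees that training and serving share the same feature code.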
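The prediction snippet referenced under "Online service (API)" sits outside the hunks shown here. As a sketch, assuming the hypothetical ``MyOnlineDAG`` class above lives in a ``my_project`` module and that ``predict`` accepts one keyword argument per input task, usage might look like this:

.. code-block:: py
   :class: text-editor

   import pandas as pd

   from my_project import MyOnlineDAG

   dag = MyOnlineDAG()

   # one keyword argument per input task ("get" in this example); the DAG
   # computes the features in memory and feeds them to the terminal task
   prediction = dag.predict(get=pd.DataFrame({'some_column': [1, 2, 3]}))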