Deployment
==========
The two most common ways to deploy data pipelines are batch and online.
Ploomber supports both deployment options.

Batch means obtaining new observations (usually on a schedule), making
predictions and saving them for later use. For example, you may develop a
Machine Learning pipeline that runs every morning, predicts the probability
of user churn and stores such probabilities in a database table.

Alternatively, you may deploy a pipeline as an online service and expose your
model as a REST API; users request predictions at any time by sending input
data.

Pipeline composition
====================

Before diving into deployment details, let's introduce the concept of
pipeline composition.

The only difference between a Machine Learning training pipeline and its serving
counterpart is what happens at the beginning and at the end.

At **training** time, we obtain historical data, generate features and train a
model:

.. (diagram: training pipeline)

At **serving** time, we obtain new data, generate features and make
predictions using a trained model:


.. (diagram: serving pipeline)

When the feature engineering process does not match (i.e., training and
serving compute features differently), predictions become unreliable.
This is one of the most common problems when deploying ML models. To fix it,
Ploomber allows you to compose pipelines: **write your
feature generation once and re-use it to compose your training and serving
pipelines**; this ensures that the feature engineering code matches exactly.


Batch processing
================

Ploomber pipelines can be deployed for batch processing. Check out our package
`Soopervisor <https://soopervisor.readthedocs.io/en/stable/index.html>`_, which
allows you to export to
`Kubernetes <https://soopervisor.readthedocs.io/en/stable/kubernetes.html>`_
(via `Argo workflows <https://argoproj.github.io/>`_) and
`Airflow <https://soopervisor.readthedocs.io/en/stable/airflow.html>`_. It's
also possible to run Ploomber projects using `cron
<https://soopervisor.readthedocs.io/en/stable/scheduling.html#cron>`_.

Composing batch pipelines
*************************
To compose a batch pipeline, use the ``import_tasks_from`` directive in
your ``pipeline.yaml`` file.

For example, define your feature generation tasks in a ``features.yaml`` file
(the snippet below is a sketch; task and product names are illustrative):


.. code-block:: yaml
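    :class: text-editor

    # a sketch: one task per feature; function names match the diagram
    # in the first section, products are illustrative
    - source: features.a_feature
      product: output/a_feature.parquet

    - source: features.another_feature
      product: output/another_feature.parquet

Then, import those tasks from your training ``pipeline.yaml`` using the
``import_tasks_from`` key (another sketch; the ``get`` and ``fit`` tasks and
their products are illustrative):

.. code-block:: yaml
    :class: text-editor

    meta:
        import_tasks_from: features.yaml

    tasks:
        # obtain historical data (feature tasks reference it as upstream)
        - source: tasks.get
          product: output/get.parquet

        # train a model using the generated features
        - source: tasks.fit
          product: output/model.pickle

For a complete project showing how to use ``import_tasks_from`` to create
training and serving pipelines, check out the examples in the Ploomber
repository.
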
Online service (API)
====================

To encapsulate all your pipeline's logic for online predictions, use
:py:mod:`ploomber.OnlineDAG`. Once implemented, you can generate predictions
like this (a sketch, assuming a subclass named ``MyOnlineDAG`` and a single
input task named ``get``; ``predict`` takes one keyword argument per input
task):

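.. code-block:: py
    :class: text-editor

    import pandas as pd

    # hypothetical module that defines the OnlineDAG subclass (see below)
    from my_project import MyOnlineDAG

    # new observations to predict on (illustrative)
    input_data = pd.DataFrame({'age': [20, 30], 'visits': [1, 5]})

    dag = MyOnlineDAG()

    # pass a value for each "input task"; here, the task named "get"
    prediction = dag.predict(get=input_data)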

The only requirement is that your feature generation code be entirely made of
Python function tasks (i.e., :py:mod:`ploomber.tasks.PythonCallable`) with a
configured :ref:`serializer-and-unserializer`.
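
For instance, the relevant settings in ``pipeline.yaml`` may look like this
(a sketch; the dotted paths are illustrative and should point to your own
functions):

.. code-block:: yaml
    :class: text-editor

    meta:
        import_tasks_from: features.yaml

        # functions that save/load the output of each PythonCallable task
        # (illustrative dotted paths)
        serializer: my_project.io.serialize
        unserializer: my_project.io.unserialize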

The next section explains the implementation details.


Composing online pipelines
**************************

To create an online DAG, list your feature tasks in a ``features.yaml`` and
use ``import_tasks_from`` in your training pipeline (``pipeline.yaml``).
Subclass :py:mod:`ploomber.OnlineDAG` to create a serving pipeline.

``OnlineDAG`` will take your tasks from ``features.yaml`` and create
new "input tasks" based on ``upstream`` references in your feature tasks.

For example, if ``features.yaml`` has tasks ``a_feature`` and
``another_feature`` (see the diagram in the first section), and both obtain
their inputs from a task named ``get``, the source code may look like this:

.. code-block:: py
    :class: text-editor
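
    # a sketch: each feature function receives its inputs via ``upstream``
    # (the feature logic itself is illustrative)
    def a_feature(upstream):
        # ``get`` is the input task that supplies raw data
        df_get = upstream['get']
        df_a_feature = df_get + 1  # illustrative transformation
        return df_a_feature


    def another_feature(upstream):
        df_get = upstream['get']
        df_another_feature = df_get - 1  # illustrative transformation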
        return df_another_feature

Since ``features.yaml`` does not contain a task named ``get``, ``OnlineDAG``
automatically identifies it as an "input task". Finally, you must provide a
"terminal task", which is the last task in your online pipeline:

.. (diagram: online pipeline with input tasks, feature tasks, and a terminal task)
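
A sketch of the subclass (paths, model loading, and method bodies are
illustrative; check :py:mod:`ploomber.OnlineDAG` for the exact interface):

.. code-block:: py
    :class: text-editor

    import pickle
    from pathlib import Path

    import pandas as pd
    from ploomber import OnlineDAG


    class MyOnlineDAG(OnlineDAG):
        # location of the features.yaml that declares the feature tasks
        @staticmethod
        def get_partial():
            return Path('features.yaml')

        # extra keyword arguments to pass to the terminal task
        @staticmethod
        def terminal_params():
            model = pickle.loads(Path('model.pickle').read_bytes())
            return dict(model=model)

        # terminal task: the last task in the online pipeline
        @staticmethod
        def terminal_task(upstream, model):
            features = pd.concat([upstream['a_feature'],
                                  upstream['another_feature']], axis=1)
            return model.predict(features)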
