Merge pull request #5

* Added intro to MyAutoML to docs
* Updated process flow diagrams
* Updated process flow diagrams
* Unhide ToC in user guide and reference
* Add missing model.png
* Updated README.md to include a link to the latest documentation.
* Added labels to modelling diagram
myautoml · Oct 27, 2020 · 9bbf160
1 parent a4626a5
commit 9bbf160
Showing 15 changed files with 320 additions and 80 deletions.
diff --git a/README.md b/README.md
@@ -12,21 +12,22 @@ MyAutoML is a project that aims to help data scientists become more efficient, b
 - A programming framework to automate as much as possible of the repetitive
   work a data scientist is likely to encounter.
 
-### Getting started
+### Installation
 
 Install MyAutoML using `pip`:
 ```shell script
 pip install myautoml
 ```
 
-Import the Python package:
-```python
-import myautoml as maml
-print(maml.__version__)
-```
+### Documentation
+
+Documentation is under development. The latest version can be found at
+https://myautoml.readthedocs.io/en/latest/.
+
+
+### Example scripts
 
-Further documentation is under development.
-For now, have a look at the example `scripts`.
+We provide example scripts in the folder `scripts`.
 
 
 ### Cookiecutters

diff --git a/docs/source/getting_started/environment.rst b/docs/source/getting_started/environment.rst
@@ -0,0 +1,33 @@
+.. _environment:
+
+===========
+Environment
+===========
+
+To get the most out of MyAutoML you will need to install and set up several components in your environment for MyAutoML
+to work with. Please have a look at the :ref:`ml_process` to see where these components fit in.
+
+
+.. _model-registry-mlflow:
+
+Model Registry: MLflow
+~~~~~~~~~~~~~~~~~~~~~~
+
+As a model registry we work with `MLflow <https://mlflow.org>`__. MLflow has two separate modules that help us keep
+a good record of our models:
+
+- MLflow Tracking
+- MLflow Model Registry
+
+In the :ref:`ml_process`, when we refer to a Model Registry, we mean both of the MLflow components above: every
+trained model is tracked in the MLflow Tracking Server. Additionally, some models will be registered under a registered
+model name in the MLflow Model Registry. In the prediction process, a model is loaded from the MLflow Model Registry.
+
+Please refer to the `installation instructions <https://mlflow.org/docs/latest/quickstart.html#installing-mlflow>`__ and
+`MLflow Tracking Servers <https://mlflow.org/docs/latest/tracking.html#mlflow-tracking-servers>`__ to get started.
+In order to use the MLflow Model Registry, you will need to set up an MLflow Tracking Server with a database Backend
+Store, such as SQLite or PostgreSQL.
+
+.. toctree::
+    :maxdepth: 2
+    :hidden:
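
As a rough illustration of the Backend Store requirement described in `environment.rst` above, the sketch below assembles an `mlflow server` command line backed by SQLite. It is not part of this commit: the helper name `mlflow_server_command`, the file name `mlflow.db`, the artifact folder `./mlartifacts`, and the host and port are all assumptions chosen for the example. Only the command-line flags come from the MLflow CLI.

```python
import shlex
from pathlib import Path


def mlflow_server_command(db_path="mlflow.db", artifact_root="./mlartifacts",
                          host="127.0.0.1", port=5000):
    """Build the argument list for an MLflow Tracking Server with a SQLite Backend Store.

    Hypothetical helper for illustration; all default values are assumptions.
    """
    # A database-backed store is what makes the MLflow Model Registry available.
    backend_store_uri = f"sqlite:///{Path(db_path).as_posix()}"
    return [
        "mlflow", "server",
        "--backend-store-uri", backend_store_uri,
        "--default-artifact-root", str(artifact_root),
        "--host", host,
        "--port", str(port),
    ]


command = mlflow_server_command()
# Print the command so it can be copied into a shell.
print(shlex.join(command))
```

The list can also be passed to `subprocess.run` to start the server from Python. Training and prediction scripts would then point at it, for instance by setting the `MLFLOW_TRACKING_URI` environment variable to `http://127.0.0.1:5000`.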
diff --git a/docs/source/getting_started/index.rst b/docs/source/getting_started/index.rst
@@ -4,123 +4,128 @@
 Getting started
 ===============
 
-Installation
-------------
+Intro to MyAutoML
+-----------------
 
-.. raw:: html
+There are a lot of great open source tools for data scientists to use. Basically all data scientists know
+`pandas <https://pandas.pydata.org>`__,
+`numpy <https://numpy.org>`__ and
+`scikit-learn <https://scikit-learn.org>`__.
+Many will be familiar with tools such as
+`MLflow <https://mlflow.org>`__ and
+`Hyperopt <http://hyperopt.github.io/hyperopt>`__.
+However, while all these packages provide great functionality, we still have to tie them all together when we want to
+build a functioning data science product. Many of us will know the feeling of doing the same
+things over and over again.
 
-    <div class="container">
-        <div class="row">
-            <div class="col-lg-6 col-md-6 col-sm-12 col-xs-12 d-flex install-block">
-                <div class="card install-card shadow w-100">
-                <div class="card-header">
-                    Install using pip
-                </div>
-                <div class="card-body">
-                    <p class="card-text">
+MyAutoML aims to fill this gap: **allowing data scientists to focus on what makes their individual projects unique.**
 
-MyAutoML can be installed via pip from `PyPI <https://pypi.org/project/myautoml>`__.
+MyAutoML focuses on scikit-learn type data science projects of classification and regression:
 
-.. raw:: html
+.. figure:: ../images/modelling.png
+    :width: 65%
+    :align: center
 
-                    </p>
-                </div>
-                <div class="card-footer text-muted">
+What makes your project unique? Indeed, the data. The overarching process and the algorithms tend to be mostly the
+same for every project. Scikit-learn and other open source packages provide the algorithms. MyAutoML aims to cover
+the process tying everything together, so you as a data scientist can focus on what's most important:
 
-.. code-block:: bash
+- translating your business problem into an analytics problem,
+- preparing the target variable and features you need,
+- generating business value.
 
-   pip install myautoml
 
-.. raw:: html
+What to expect
+--------------
 
-                </div>
-                </div>
-            </div>
-            <div class="col-12 d-flex install-block">
-                <div class="card install-card shadow w-100">
-                <div class="card-header">
-                    In-depth instructions?
-                </div>
-                <div class="card-body">
-                    <p class="card-text">Installing a specific version?
-                      Installing from source?
-                      Check the advanced installation page.</p>
+Perhaps it is easier to start off with what not to expect. MyAutoML is not a port of AutoML as offered by the Amazons,
+Googles and Microsofts of this world to your local environment. Perhaps one day in the future we may go in that
+direction, but at least for now we offer you tools (functions, classes and template scripts) to automate most of the
+repetitive work you do for your projects.
 
-.. container:: custom-button
+To get the most out of MyAutoML you will need a basic infrastructure setup, built upon open source software, such as
+MLflow and Hyperopt. Please have a look at the :ref:`environment` page for more information.
 
-    :ref:`Learn more <install>`
 
-.. raw:: html
 
-                </div>
-                </div>
-            </div>
-        </div>
-    </div>
 
-.. _gentle_intro:
 
-Intro to MyAutoML
------------------
+Quick questions
+---------------
 
 .. raw:: html
 
     <div class="container">
     <div id="accordion" class="shadow tutorial-accordion">
 
         <div class="card tutorial-card">
-            <div class="card-header collapsed card-link" data-toggle="collapse" data-target="#collapseOne">
+            <div class="card-header collapsed card-link" data-toggle="collapse" data-target="#collapse_1">
                 <div class="d-flex flex-row tutorial-card-header-1">
                     <div class="d-flex flex-row tutorial-card-header-2">
                         <button class="btn btn-dark btn-sm"></button>
-                        How to get started?
+                        How to install?
                     </div>
                     <span class="badge gs-badge-link">
 
-:ref:`Installation...<install>`
+:ref:`install`
 
 .. raw:: html
 
                     </span>
                 </div>
             </div>
-            <div id="collapseOne" class="collapse" data-parent="#accordion">
+            <div id="collapse_1" class="collapse" data-parent="#accordion">
                 <div class="card-body">
 
-Installing from PyPI is easy.
+The simplest way to install MyAutoML is from `PyPI <https://pypi.org/project/myautoml>`_ via pip:
+
+.. code-block:: bash
+
+    pip install myautoml
 
 .. raw:: html
 
-                    <div class="d-flex flex-row">
-                        <span class="badge gs-badge-link">
+                </div>
+            </div>
+        </div>
+        <div class="card tutorial-card">
+            <div class="card-header collapsed card-link" data-toggle="collapse" data-target="#collapse_2">
+                <div class="d-flex flex-row tutorial-card-header-1">
+                    <div class="d-flex flex-row tutorial-card-header-2">
+                        <button class="btn btn-dark btn-sm"></button>
+                        Glossary
+                    </div>
+                    <span class="badge gs-badge-link">
 
-:ref:`Installation...<install>`
+:ref:`glossary`
 
 .. raw:: html
 
-                        </span>
-                    </div>
+                    </span>
                 </div>
             </div>
-        </div>
+            <div id="collapse_2" class="collapse" data-parent="#accordion">
+                <div class="card-body">
 
-    </div>
-    </div>
+In the User Guide we have included a :ref:`glossary`.
 
+.. raw:: html
 
+                </div>
+            </div>
+        </div>
 
-Tutorials
----------
+    </div>
+    </div>
 
-For a quick overview of MyAutoML functionality, see ....
 
 
-.. If you update this toctree, also update the manual toctree in the
-   main index.rst.template
+.. If you update this toctree, also update the manual toctree in the main index.rst.template
 
 .. toctree::
     :maxdepth: 2
     :hidden:
 
     install
-    overview
+    ml_process
+    environment
diff --git a/docs/source/getting_started/install.rst b/docs/source/getting_started/install.rst
@@ -12,7 +12,10 @@ Installing from PyPI
 MyAutoML can be installed via pip from
 `PyPI <https://pypi.org/project/myautoml>`__.
 
-::
+.. code-block:: bash
 
     pip install myautoml
 
+.. toctree::
+    :maxdepth: 2
+    :hidden:
diff --git a/docs/source/getting_started/ml_process.rst b/docs/source/getting_started/ml_process.rst
@@ -0,0 +1,125 @@
+.. _ml_process:
+
+========================
+Machine Learning Process
+========================
+
+The two main processes that we aim to cover with MyAutoML are the training and predicting processes. They are two
+separate processes, one for training a model and one for making predictions using a trained model. Each process is
+executed by running a Python script, e.g. :code:`train.py` and :code:`predict.py`. This can be as simple or as complex
+as you like: you can run the scripts manually (you can even run the code from a Jupyter notebook), or as an automated
+script in a Docker container on a Kubernetes platform scheduled by Airflow.
+
+
+Training
+--------
+
+The purpose of the training process is to start with some data, process it with a certain algorithm and produce a
+model that captures the interesting patterns in the training data.
+
+.. figure:: ../images/training-process.png
+    :width: 100%
+    :align: center
+
+
+Predicting
+----------
+
+The goal of the prediction process is to use a (trained) model and apply it to some new data to make predictions.
+A prediction script can make predictions for a batch of items, or it can spawn an API for real-time, on-demand
+predictions.
+
+.. figure:: ../images/prediction-process.png
+    :width: 100%
+    :align: center
+
+
+Calibrating
+-----------
+
+In some classification use cases we need to `calibrate <https://scikit-learn.org/stable/modules/calibration.html>`_
+the output of our models to actual probabilities, rather than generic scores. While sometimes this can be done
+directly in the training process, in other cases it is more pragmatic to train a model first, and perform the
+calibration separately using the following process:
+
+.. figure:: ../images/calibrating-process.png
+    :width: 100%
+    :align: center
+
+
+Further reading
+---------------
+
+.. raw:: html
+
+    <div class="container">
+    <div id="accordion" class="shadow tutorial-accordion">
+
+        <div class="card tutorial-card">
+            <div class="card-header collapsed card-link" data-toggle="collapse" data-target="#collapse_1">
+                <div class="d-flex flex-row tutorial-card-header-1">
+                    <div class="d-flex flex-row tutorial-card-header-2">
+                        <button class="btn btn-dark btn-sm"></button>
+                        Training a model
+                    </div>
+                </div>
+            </div>
+            <div id="collapse_1" class="collapse" data-parent="#accordion">
+                <div class="card-body">
+
+`Wikipedia: Training, validation, and test sets
+<https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets/>`_
+
+`Machine Learning Mastery: How to Use ROC Curves and Precision-Recall Curves for Classification in Python
+<https://machinelearningmastery.com/roc-curves-and-precision-recall-curves-for-classification-in-python/>`_
+
+.. raw:: html
+
+                </div>
+            </div>
+        </div>
+        <div class="card tutorial-card">
+            <div class="card-header collapsed card-link" data-toggle="collapse" data-target="#collapse_2">
+                <div class="d-flex flex-row tutorial-card-header-1">
+                    <div class="d-flex flex-row tutorial-card-header-2">
+                        <button class="btn btn-dark btn-sm"></button>
+                        Making predictions
+                    </div>
+                </div>
+            </div>
+            <div id="collapse_2" class="collapse" data-parent="#accordion">
+                <div class="card-body">
+
+.. raw:: html
+
+                </div>
+            </div>
+        </div>
+        <div class="card tutorial-card">
+            <div class="card-header collapsed card-link" data-toggle="collapse" data-target="#collapse_3">
+                <div class="d-flex flex-row tutorial-card-header-1">
+                    <div class="d-flex flex-row tutorial-card-header-2">
+                        <button class="btn btn-dark btn-sm"></button>
+                        Model calibration
+                    </div>
+                </div>
+            </div>
+            <div id="collapse_3" class="collapse" data-parent="#accordion">
+                <div class="card-body">
+
+`Machine Learning Mastery: How and When to Use a Calibrated Classification Model with scikit-learn
+<https://machinelearningmastery.com/calibrated-classification-model-in-scikit-learn/>`_
+
+.. raw:: html
+
+                </div>
+            </div>
+        </div>
+
+    </div>
+    </div>
+
+
+.. toctree::
+    :maxdepth: 2
+    :hidden:
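
The three processes described in `ml_process.rst` above (training, calibrating, predicting) can be sketched as three independent functions that only share stored artefacts. This is a minimal sketch using plain scikit-learn, not MyAutoML's own API. It makes several assumptions for illustration: the function names are hypothetical, the model is pickled to a temporary folder instead of being logged to an MLflow Model Registry, and calibration is done by fitting a logistic regression on the raw scores of an already trained model (Platt scaling), which is one possible way to calibrate separately from training.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.model_selection import train_test_split


def train(X, y, model_path):
    """Training process (stand-in for train.py): fit a model and store it."""
    model = RidgeClassifier().fit(X, y)
    Path(model_path).write_bytes(pickle.dumps(model))


def calibrate(X_cal, y_cal, model_path, calibrator_path):
    """Calibrating process: map the stored model's raw scores to probabilities."""
    model = pickle.loads(Path(model_path).read_bytes())
    scores = model.decision_function(X_cal).reshape(-1, 1)
    # Platt scaling: a one-feature logistic regression on held-out scores.
    calibrator = LogisticRegression().fit(scores, y_cal)
    Path(calibrator_path).write_bytes(pickle.dumps(calibrator))


def predict(X_new, model_path, calibrator_path):
    """Predicting process (stand-in for predict.py): load artefacts, score new data."""
    model = pickle.loads(Path(model_path).read_bytes())
    calibrator = pickle.loads(Path(calibrator_path).read_bytes())
    scores = model.decision_function(X_new).reshape(-1, 1)
    return calibrator.predict_proba(scores)[:, 1]


# Synthetic data, split into training, calibration and "new" batches.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_cal, X_new, y_cal, y_new = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.pkl"
    calibrator_path = Path(tmp) / "calibrator.pkl"
    train(X_train, y_train, model_path)
    calibrate(X_cal, y_cal, model_path, calibrator_path)
    probabilities = predict(X_new, model_path, calibrator_path)

print(probabilities[:5])
```

Keeping each step behind its own function mirrors the idea of separate scripts: each one could be run manually, from a notebook, or on a schedule, as long as the stored model is reachable.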