Merge pull request #5
* Added intro to MyAutoML to docs

* Updated process flow diagrams

* Updated process flow diagrams

* Unhide ToC in user guide and reference

* Add missing model.png

* Updated README.md to include a link to the latest documentation.

* Added labels to modelling diagram
erikjandevries committed Oct 27, 2020
1 parent a4626a5 commit 9bbf160
Showing 15 changed files with 320 additions and 80 deletions.
17 changes: 9 additions & 8 deletions README.md
@@ -12,21 +12,22 @@ MyAutoML is a project that aims to help data scientists become more efficient, b
- A programming framework to automate as much as possible of the repetitive
work a data scientist is likely to encounter.

### Getting started
### Installation

Install MyAutoML using `pip`:
```shell script
pip install myautoml
```

Import the Python package:
```python
import myautoml as maml
print(maml.__version__)
```
### Documentation

Documentation is under development. The latest version can be found at
https://myautoml.readthedocs.io/en/latest/.


### Example scripts

Further documentation is under development.
For now, have a look at the example `scripts`.
We provide example scripts in the folder `scripts`.


### Cookiecutters
33 changes: 33 additions & 0 deletions docs/source/getting_started/environment.rst
@@ -0,0 +1,33 @@
.. _environment:

===========
Environment
===========

To get the most out of MyAutoML, you will need to install and set up several components in your environment for
MyAutoML to work with. Have a look at the :ref:`ml_process` to see where these components fit in.


.. _model-registry-mlflow:

Model Registry: MLflow
~~~~~~~~~~~~~~~~~~~~~~

We use `MLflow <https://mlflow.org>`__ as our model registry. MLflow has two separate modules that help us keep
a good record of our models:

- MLflow Tracking
- MLflow Model Registry

In the :ref:`ml_process`, when we refer to a Model Registry, we mean both of these MLflow components: every
trained model is tracked in the MLflow Tracking Server. Additionally, some models are registered under a registered
model name in the MLflow Model Registry. In the prediction process, a model is loaded from the MLflow Model Registry.

Refer to the `installation instructions <https://mlflow.org/docs/latest/quickstart.html#installing-mlflow>`__ and
`MLflow Tracking Servers <https://mlflow.org/docs/latest/tracking.html#mlflow-tracking-servers>`__ to get started.
To use the MLflow Model Registry, you will need to set up an MLflow Tracking Server with a database-backed backend
store, such as SQLite or PostgreSQL.

.. toctree::
:maxdepth: 2
:hidden:
129 changes: 67 additions & 62 deletions docs/source/getting_started/index.rst
@@ -4,123 +4,128 @@
Getting started
===============

Installation
------------
Intro to MyAutoML
-----------------

.. raw:: html
There are many great open source tools for data scientists to use. Nearly all data scientists know
`pandas <https://pandas.pydata.org>`__,
`numpy <https://numpy.org>`__ and
`scikit-learn <https://scikit-learn.org>`__.
Many will be familiar with tools such as
`MLflow <https://mlflow.org>`__ and
`Hyperopt <http://hyperopt.github.io/hyperopt>`__.
However, while all these packages provide great functionality, we still have to tie them together when we want to
build a functioning data science product. Many of us will be familiar with the feeling that we are doing the same
things over and over again.

<div class="container">
<div class="row">
<div class="col-lg-6 col-md-6 col-sm-12 col-xs-12 d-flex install-block">
<div class="card install-card shadow w-100">
<div class="card-header">
Install using pip
</div>
<div class="card-body">
<p class="card-text">
MyAutoML aims to fill this gap: **allowing data scientists to focus on what makes their individual projects unique.**

MyAutoML can be installed via pip from `PyPI <https://pypi.org/project/myautoml>`__.
MyAutoML focuses on scikit-learn-style data science projects for classification and regression:

.. raw:: html
.. figure:: ../images/modelling.png
   :width: 65%
   :align: center

</p>
</div>
<div class="card-footer text-muted">
What makes your project unique? The data. The overarching process and the algorithms tend to be mostly the
same for every project. Scikit-learn and other open source packages provide the algorithms. MyAutoML aims to cover
the process that ties everything together, so you as a data scientist can focus on what's most important:

.. code-block:: bash
- translating your business problem into an analytics problem,
- preparing the target variable and features you need,
- generating business value.

pip install myautoml

.. raw:: html
What to expect
--------------

</div>
</div>
</div>
<div class="col-12 d-flex install-block">
<div class="card install-card shadow w-100">
<div class="card-header">
In-depth instructions?
</div>
<div class="card-body">
<p class="card-text">Installing a specific version?
Installing from source?
Check the advanced installation page.</p>
Perhaps it is easier to start with what not to expect. MyAutoML is not a port to your local environment of the
AutoML services offered by the Amazons, Googles and Microsofts of this world. We may go in that direction one day,
but for now we offer you tools (functions, classes and template scripts) to automate most of the repetitive work
you do for your projects.

.. container:: custom-button
To get the most out of MyAutoML, you will need a basic infrastructure built on open source software such as
MLflow and Hyperopt. Have a look at the :ref:`environment` page for more information.

:ref:`Learn more <install>`

.. raw:: html

</div>
</div>
</div>
</div>
</div>

.. _gentle_intro:

Intro to MyAutoML
-----------------
Quick questions
---------------

.. raw:: html

<div class="container">
<div id="accordion" class="shadow tutorial-accordion">

<div class="card tutorial-card">
<div class="card-header collapsed card-link" data-toggle="collapse" data-target="#collapseOne">
<div class="card-header collapsed card-link" data-toggle="collapse" data-target="#collapse_1">
<div class="d-flex flex-row tutorial-card-header-1">
<div class="d-flex flex-row tutorial-card-header-2">
<button class="btn btn-dark btn-sm"></button>
How to get started?
How to install?
</div>
<span class="badge gs-badge-link">

:ref:`Installation...<install>`
:ref:`install`

.. raw:: html

</span>
</div>
</div>
<div id="collapseOne" class="collapse" data-parent="#accordion">
<div id="collapse_1" class="collapse" data-parent="#accordion">
<div class="card-body">

Installing from PyPI is easy.
The simplest way to install MyAutoML is from `PyPI <https://pypi.org/project/myautoml>`_ via pip:

.. code-block:: bash

    pip install myautoml

.. raw:: html

<div class="d-flex flex-row">
<span class="badge gs-badge-link">
</div>
</div>
</div>
<div class="card tutorial-card">
<div class="card-header collapsed card-link" data-toggle="collapse" data-target="#collapse_2">
<div class="d-flex flex-row tutorial-card-header-1">
<div class="d-flex flex-row tutorial-card-header-2">
<button class="btn btn-dark btn-sm"></button>
Glossary
</div>
<span class="badge gs-badge-link">

:ref:`Installation...<install>`
:ref:`glossary`

.. raw:: html

</span>
</div>
</span>
</div>
</div>
</div>
<div id="collapse_2" class="collapse" data-parent="#accordion">
<div class="card-body">

</div>
</div>
In the User Guide we have included a :ref:`glossary`.

.. raw:: html

</div>
</div>
</div>

Tutorials
---------
</div>
</div>

For a quick overview of MyAutoML functionality, see ....


.. If you update this toctree, also update the manual toctree in the
main index.rst.template
.. If you update this toctree, also update the manual toctree in the main index.rst.template
.. toctree::
:maxdepth: 2
:hidden:

install
overview
ml_process
environment
5 changes: 4 additions & 1 deletion docs/source/getting_started/install.rst
@@ -12,7 +12,10 @@ Installing from PyPI
MyAutoML can be installed via pip from
`PyPI <https://pypi.org/project/myautoml>`__.

::
.. code-block:: bash

    pip install myautoml

.. toctree::
:maxdepth: 2
:hidden:
125 changes: 125 additions & 0 deletions docs/source/getting_started/ml_process.rst
@@ -0,0 +1,125 @@
.. _ml_process:

========================
Machine Learning Process
========================

The two main processes that we aim to cover with MyAutoML are training and predicting. They are separate
processes: one for training a model and one for making predictions using a trained model. Each process is
executed by running a Python script, e.g. :code:`train.py` or :code:`predict.py`. The setup can be as simple or as
complex as you like: you can run the scripts manually (you can even run the code from a Jupyter notebook), or run
them automatically in a Docker container on a Kubernetes platform, scheduled by Airflow.


Training
--------

The purpose of the training process is to start with some data, process it with a certain algorithm and produce a
model that captures the interesting patterns in the training data.

.. figure:: ../images/training-process.png
   :width: 100%
   :align: center
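
A minimal sketch of what a training script along these lines might do, written with plain scikit-learn rather than
MyAutoML's own helpers. The dataset, the pipeline steps and the output file name ``model.joblib`` are assumptions
chosen for illustration.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Start with some data (a built-in example dataset stands in for yours).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# 2. Process it with an algorithm: scaling followed by logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# 3. Evaluate on held-out data before accepting the model.
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Test ROC AUC: {test_auc:.3f}")

# 4. Persist the trained model so a separate prediction script can load it.
model_path = Path(tempfile.mkdtemp()) / "model.joblib"  # hypothetical location
joblib.dump(model, model_path)
```

Keeping the preprocessing and the estimator in one pipeline means the prediction script only has to load a single
object and can apply it to raw features.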


Predicting
----------

The goal of the prediction process is to use a (trained) model and apply it to some new data to make predictions.
A prediction script can make predictions for a batch of items, or it can spawn an API for real-time, on-demand
predictions.

.. figure:: ../images/prediction-process.png
   :width: 100%
   :align: center
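
The batch variant of the prediction process can be sketched as follows. The function name ``predict_batch`` is
hypothetical, and the model is trained inline only to keep the example self-contained; in practice it would be
loaded from a model registry or from a file written by the training script.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def predict_batch(model, features: np.ndarray) -> np.ndarray:
    """Return the positive-class probability for every row in a batch."""
    return model.predict_proba(features)[:, 1]


# Stand-in for loading a trained model (assumption: trained here for brevity).
X, y = load_breast_cancer(return_X_y=True)
trained_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
trained_model.fit(X[:400], y[:400])

# New items the model has not seen, scored in one batch.
new_items = X[400:]
scores = predict_batch(trained_model, new_items)
print(scores[:5].round(3))
```

For real-time, on-demand predictions the same function could be called from an API request handler with a batch of
one item.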


Calibrating
-----------

In some classification use cases we need to `calibrate <https://scikit-learn.org/stable/modules/calibration.html>`_
the output of our models to actual probabilities, rather than generic scores. While sometimes this can be done
directly in the training process, in other cases it is more pragmatic to train a model first, and perform the
calibration separately using the following process:

.. figure:: ../images/calibrating-process.png
   :width: 100%
   :align: center
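
One way to calibrate separately from training is sketched below, under stated assumptions: the model is a
``LinearSVC`` (which outputs generic scores), and the calibrator is a Platt-style sigmoid fitted with a
one-dimensional logistic regression on a held-out calibration set. The data is synthetic, and this is an
illustration of the idea rather than MyAutoML's implementation.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic data, split into training, calibration and test sets.
X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.5, random_state=0
)
X_cal, X_test, y_cal, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0
)

# Step 1: train the model. LinearSVC outputs scores, not probabilities.
model = LinearSVC(max_iter=10000).fit(X_train, y_train)

# Step 2: calibrate separately on data the model has not seen, mapping
# each score to a probability with a one-dimensional logistic regression.
cal_scores = model.decision_function(X_cal).reshape(-1, 1)
calibrator = LogisticRegression().fit(cal_scores, y_cal)

# Step 3: at prediction time, chain the model and the calibrator.
test_scores = model.decision_function(X_test).reshape(-1, 1)
probabilities = calibrator.predict_proba(test_scores)[:, 1]
brier = brier_score_loss(y_test, probabilities)
print(f"Brier score on test data: {brier:.3f}")
```

The calibration set is kept apart from the training set so that the calibrator is fitted on scores the model
produces for unseen data.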


Further reading
---------------

.. raw:: html

<div class="container">
<div id="accordion" class="shadow tutorial-accordion">

<div class="card tutorial-card">
<div class="card-header collapsed card-link" data-toggle="collapse" data-target="#collapse_1">
<div class="d-flex flex-row tutorial-card-header-1">
<div class="d-flex flex-row tutorial-card-header-2">
<button class="btn btn-dark btn-sm"></button>
Training a model
</div>
</div>
</div>
<div id="collapse_1" class="collapse" data-parent="#accordion">
<div class="card-body">

`Wikipedia: Training, validation, and test sets
<https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets/>`_

`Machine Learning Mastery: How to Use ROC Curves and Precision-Recall Curves for Classification in Python
<https://machinelearningmastery.com/roc-curves-and-precision-recall-curves-for-classification-in-python/>`_

.. raw:: html

</div>
</div>
</div>
<div class="card tutorial-card">
<div class="card-header collapsed card-link" data-toggle="collapse" data-target="#collapse_2">
<div class="d-flex flex-row tutorial-card-header-1">
<div class="d-flex flex-row tutorial-card-header-2">
<button class="btn btn-dark btn-sm"></button>
Making predictions
</div>
</div>
</div>
<div id="collapse_2" class="collapse" data-parent="#accordion">
<div class="card-body">

.. raw:: html

</div>
</div>
</div>
<div class="card tutorial-card">
<div class="card-header collapsed card-link" data-toggle="collapse" data-target="#collapse_3">
<div class="d-flex flex-row tutorial-card-header-1">
<div class="d-flex flex-row tutorial-card-header-2">
<button class="btn btn-dark btn-sm"></button>
Model calibration
</div>
</div>
</div>
<div id="collapse_3" class="collapse" data-parent="#accordion">
<div class="card-body">

`Machine Learning Mastery: How and When to Use a Calibrated Classification Model with scikit-learn
<https://machinelearningmastery.com/calibrated-classification-model-in-scikit-learn/>`_

.. raw:: html

</div>
</div>
</div>

</div>
</div>


.. toctree::
:maxdepth: 2
:hidden:
