# Reproducible, Portable, and Distributable ML Solutions in Python

This content is also [available as slides](0-index.slides.html) (best opened directly from your filesystem, outside of Jupyter).

## Agenda

| Block                   | Chapters                                                                    |
|-------------------------|-----------------------------------------------------------------------------|
| 1. Introduction         | Motivation, Context, Definitions                                            |
|                         | Domain Model Properties                                                     |
|                         | ML Lifecycle Patterns                                                       |
|                         | Tooling Comparison                                                          |
| 2. ForML Tutorial       | [Data Abstraction](2-tutorial/1-data-abstraction.ipynb)                     |
|                         | [Task Dependency Management](2-tutorial/2-task-dependency-management.ipynb) |
|                         | [Evaluation](2-tutorial/3-evaluation.ipynb)                                 |
|                         | [Project Management](2-tutorial/4-project-management.ipynb)                 |
| 3. Avazu CTR Solution   | [Setup & Exploration](3-solution/1-setup-and-exploration.ipynb)             |
|                         | [Formal Base Model](3-solution/2-formal-base-model.ipynb)                   |
|                         | [Pipeline Enhancements](3-solution/3-pipeline-enhancements.ipynb)           |
|                         | [Release & Deployment](3-solution/4-release-and-deployment.ipynb)           |
|                         | [Production Lifecycle Iterations](3-solution/5-lifecycle-iterations.ipynb)  |

## Setup

1. Clone the [workshop repository](https://github.com/formlio/mlprague23):

```shell
$ git clone git@github.com:formlio/mlprague23.git
$ cd mlprague23
```
2. [Install Docker Engine](https://docs.docker.com/engine/install/) along with the [Docker Compose plugin](https://docs.docker.com/compose/install/) (it should already be part of any recent Docker Engine version).
3. Spin up the workspace container from within the `mlprague23` project root directory (this will need to bind ports `8888`, `8000` and `4040` on your machine):
```shell
$ docker compose up -d
```
4. Load the workspace notebook interface at [http://127.0.0.1:8888/lab](http://127.0.0.1:8888/lab) using your browser.
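
If the container fails to start in step 3, a port conflict is one likely cause. The following is a minimal sketch, not part of the workshop repository, for checking *before* step 3 whether the three ports are still free on your machine. The helper names `port_is_free` and `busy_ports` are hypothetical and introduced only for this illustration; the port numbers are the ones listed in step 3.

```python
import socket

# Ports the workspace container binds according to step 3 of the setup.
WORKSHOP_PORTS = (8888, 8000, 4040)


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when a listener accepted the connection,
        # i.e. the port is already taken by some other process.
        return sock.connect_ex((host, port)) != 0


def busy_ports(ports=WORKSHOP_PORTS, host: str = "127.0.0.1") -> list:
    """Return the subset of the given ports that are already in use."""
    return [port for port in ports if not port_is_free(port, host)]


if __name__ == "__main__":
    taken = busy_ports()
    if taken:
        print("Ports already in use:", ", ".join(map(str, taken)))
    else:
        print("All workshop ports look free.")
```

Once the container is running, the same check is expected to report all three ports as busy, since the container itself is then listening on them.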

## Opening Remarks
* To demonstrate all the core principles, we are going to use the open-source tool [ForML](https://github.com/formlio/forml).
    * It's a development framework and MLOps platform for the lifecycle management of data science projects.
    * **Give it a Star on [GitHub](https://github.com/formlio/forml)!**
* For practical reasons, we choose to use a couple of traditional tools (Pandas, Scikit-learn) - they are by no means essential to any of the demonstrated principles.
* Participants are encouraged to follow along hands-on and engage in the practical exercises.
* We are using the JupyterLab environment, which works great for interactive tutorials (and more), even though it doesn't shine in terms of reproducibility (see the [famous talk at JupyterCon 2018 by Joel Grus](https://conferences.oreilly.com/jupyter/jup-ny/public/schedule/detail/68282.html)).
* Alternatively, feel free to follow the content in the form of [slides](0-index.slides.html) (though they sometimes overflow with bulky output content).