
*Note:* You can run this from your computer (Jupyter or terminal), or use one of the
hosted options:

[![binder-logo](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ploomber/binder-env/main?urlpath=git-pull%3Frepo%3Dhttps%253A%252F%252Fgithub.com%252Fploomber%252Fprojects%26urlpath%3Dlab%252Ftree%252Fprojects%252Fml-advanced%252FREADME.ipynb%26branch%3Dmaster)

[![deepnote-logo](https://deepnote.com/buttons/launch-in-deepnote-small.svg)](https://deepnote.com/launch?template=deepnote&url=https://github.com/ploomber/projects/blob/master/ml-advanced/README.ipynb)


# ML advanced

This example shows how to build a Machine Learning pipeline using the Python API,
how to package your project so you can install it with `pip install .`, how to
test it with `pytest`, and how to create an array of experiments that tries
several models and runs them in parallel.

## Setup

Make sure you are in the `ml-advanced` folder:

~~~bash
conda env create --file environment.yml

conda activate ml-advanced

# install the pipeline as a package in editable mode so that changes to the
# source code take effect without reinstalling
pip install --editable .
~~~
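
To confirm that the editable install worked, one option is to check that Python can locate the package. The sketch below assumes the package is importable as `ml_advanced` (the name used by the entry point in this README); `is_installed` is a hypothetical helper, not part of the project.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can locate a top-level package by name."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "ml_advanced" is assumed to be the import name of this project
    print("ml_advanced importable:", is_installed("ml_advanced"))
```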

## Executing pipeline

~~~bash
ploomber build --entry-point ml_advanced.pipeline.make
~~~

| name | Ran? | Elapsed (s) | Percentage |
|---|---|---|---|
| get | True | 0.022596 | 0.0485445 |
| features | True | 0.092352 | 0.198406 |
| join | True | 0.049762 | 0.106907 |
| fit-ml-advanced--models--svc | True | 14.2806 | 30.6801 |
| fit-ml-advanced--models--logistic-reg | True | 14.2874 | 30.6947 |
| fit-sklearn--ensemble--RandomForestClassifier | True | 17.8142 | 38.2714 |
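
The `Percentage` column appears to be each task's elapsed time divided by the sum of all elapsed times. The sketch below recomputes it from the values in the build summary; this is an inference from the numbers shown, not a statement about Ploomber's internals. Because the three `fit-*` tasks run in parallel, the sum of elapsed times can exceed the wall-clock time of the build.

```python
# Elapsed seconds per task, copied from the build summary above
elapsed = {
    "get": 0.022596,
    "features": 0.092352,
    "join": 0.049762,
    "fit-ml-advanced--models--svc": 14.2806,
    "fit-ml-advanced--models--logistic-reg": 14.2874,
    "fit-sklearn--ensemble--RandomForestClassifier": 17.8142,
}


def percentages(elapsed_by_task):
    """Return each task's share of the summed elapsed time, in percent."""
    total = sum(elapsed_by_task.values())
    return {
        name: 100 * seconds / total
        for name, seconds in elapsed_by_task.items()
    }


shares = percentages(elapsed)
for name, share in shares.items():
    print(f"{name:<48}{share:8.4f}")
```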


~~~
100%|██████████| 6/6 [00:00<00:00, 6457.74it/s]
INFO:root:Added get to the pool...
INFO:ploomber.executors.parallel:Finished 0 out of 6 tasks
INFO:root:Added features to the pool...
INFO:ploomber.executors.parallel:Finished 1 out of 6 tasks
INFO:root:Added join to the pool...
INFO:ploomber.executors.parallel:Finished 2 out of 6 tasks
INFO:root:Added fit-ml-advanced--models--svc to the pool...
INFO:root:Added fit-ml-advanced--models--logistic-reg to the pool...
INFO:root:Added fit-sklearn--ensemble--RandomForestClassifier to the pool...
Executing:   0%|          | 0/15 [00:00<?, ?cell/s]
INFO:ploomber.executors.parallel:Finished 3 out of 6 tasks
~~~
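
The `--entry-point` value used in this section is a dotted path: a module (`ml_advanced.pipeline`) followed by the name of a function in it (`make`). As a rough illustration of how such a path can be resolved to a callable, here is a sketch using `importlib`; `load_dotted_path` is a hypothetical helper and not necessarily how Ploomber implements it.

```python
import importlib


def load_dotted_path(dotted_path):
    """Import 'package.module.attr' and return the attribute."""
    module_name, _, attr_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected 'module.attribute', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Demonstrated with a standard-library function so the snippet runs anywhere;
# with the project installed, "ml_advanced.pipeline.make" should resolve the
# same way
join = load_dotted_path("os.path.join")
print(join("output", "data.parquet"))
```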

## Deploy to Airflow

Work in progress.

## Testing

~~~bash
# incremental (will only run the tasks that have changed)
pytest

# complete (force execution of all tasks)
pytest --force

# to start a debugging session on exceptions
pytest --pdb

# to start a debugging session at the start of every test
pytest --trace
~~~
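
`--force` is not a built-in pytest flag, so the project has to register it, typically in a `conftest.py`. The following is a minimal sketch of how such an option could be declared and read; the help text, the `should_force` helper, and the way the flag is consumed are assumptions, not the project's actual code.

```python
# conftest.py (sketch)


def pytest_addoption(parser):
    """pytest hook: register a custom --force command-line flag."""
    parser.addoption(
        "--force",
        action="store_true",
        default=False,
        help="Run all pipeline tasks, even if they are up to date",
    )


def should_force(config):
    """Hypothetical helper: read the flag from a pytest config object."""
    return bool(config.getoption("--force"))
```

A test could then request pytest's built-in `pytestconfig` fixture and pass `should_force(pytestconfig)` to whatever code builds the pipeline.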

## Interacting with the pipeline

In a Python session (make sure `ml-advanced/env.yaml` is in the current working
directory):


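~~~python
from ml_advanced.pipeline import make

# create the pipeline (DAG) object from the factory function
dag = make()
# show a summary table with the status of each task
dag.status()
~~~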

| name | Last run | Outdated? | Product | Doc (short) | Location |
|---|---|---|---|---|---|
| get | 21 seconds ago (Dec 30, 20 at 14:24) | False | `output/data.parquet` | Get data | `/Users/Edu/dev/projects-ploomber/ml-advanced/src/ml_advanced/tasks.py:6` |
| features | 21 seconds ago (Dec 30, 20 at 14:24) | False | `output/features.parquet` | Generate new features from existing columns | `/Users/Edu/dev/projects-ploomber/ml-advanced/src/ml_advanced/tasks.py:24` |
| join | 21 seconds ago (Dec 30, 20 at 14:24) | False | `output/join.parquet` | Join raw data with generated features | `/Users/Edu/dev/projects-ploomber/ml-advanced/src/ml_advanced/tasks.py:33` |
| fit-ml-advanced--models--svc | 6 seconds ago (Dec 30, 20 at 14:24) | False | `{'nb': File(output/fit-ml_advanced.models.svc.ipynb), 'model': File(output/model-ml_advanced.models.svc.joblib)}` | Script to train a model | `/Users/Edu/dev/projects-ploomber/ml-advanced/src/ml_advanced/templates/fit.py` |
| fit-ml-advanced--models--logistic-reg | 6 seconds ago (Dec 30, 20 at 14:24) | False | `{'nb': File(output/fit-ml_advanced.models.logistic_reg.ipynb), 'model': File(output/model-ml_advanced.models.logistic_reg.joblib)}` | Script to train a model | `/Users/Edu/dev/projects-ploomber/ml-advanced/src/ml_advanced/templates/fit.py` |
| fit-sklearn--ensemble--RandomForestClassifier | 3 seconds ago (Dec 30, 20 at 14:24) | False | `{'nb': File(output/fit-sklearn.ensemble.RandomForestClassifier.ipynb), 'model': File(output/model-sklearn.ensemble.RandomForestClassifier.joblib)}` | Script to train a model | `/Users/Edu/dev/projects-ploomber/ml-advanced/src/ml_advanced/templates/fit.py` |
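
The task and product names in the status table suggest a naming convention: each model is identified by a dotted path (for example `sklearn.ensemble.RandomForestClassifier`), the task name replaces `.` with `--` and `_` with `-`, and the products keep the dotted path as-is. The sketch below reproduces the names shown in the table under that assumption; `task_name` and `products` are hypothetical helpers, and the real logic in `ml_advanced.pipeline` may differ.

```python
def task_name(model_path):
    """Build a task name such as 'fit-ml-advanced--models--svc'."""
    return "fit-" + model_path.replace(".", "--").replace("_", "-")


def products(model_path, out_dir="output"):
    """Return the notebook and model paths for one experiment."""
    return {
        "nb": f"{out_dir}/fit-{model_path}.ipynb",
        "model": f"{out_dir}/model-{model_path}.joblib",
    }


# Dotted paths inferred from the task names in the status table
models = [
    "ml_advanced.models.svc",
    "ml_advanced.models.logistic_reg",
    "sklearn.ensemble.RandomForestClassifier",
]

for path in models:
    print(task_name(path), products(path))
```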
