
*Note:* You can run this from your computer (Jupyter or terminal), or use one of the
hosted options:
[![binder-logo](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ploomber/binder-env/main?urlpath=git-pull%3Frepo%3Dhttps%253A%252F%252Fgithub.com%252Fploomber%252Fprojects%26urlpath%3Dlab%252Ftree%252Fprojects%252Fml-advanced%252FREADME.ipynb%26branch%3Dmaster)
[![deepnote-logo](https://deepnote.com/buttons/launch-in-deepnote-small.svg)](https://deepnote.com/launch?template=deepnote&url=https://github.com/ploomber/projects/blob/master/ml-advanced/README.ipynb)


# ML advanced

This example shows how to build a Machine Learning pipeline with the Python API,
package your project so you can install it with `pip install .`, test it with
`pytest`, and create an array of experiments that trains several models in
parallel.

## Setup

Make sure you are in the `ml-advanced` folder:

~~~bash
conda env create --file environment.yml

conda activate ml-advanced

# install the pipeline as a package in editable mode, so changes to the source
# code take effect without reinstalling
pip install --editable .
~~~

## Executing the pipeline

~~~bash
ploomber build --entry-point ml_advanced.pipeline.make
~~~

Output:

~~~
name                                           Ran?      Elapsed (s)    Percentage
---------------------------------------------  ------  -------------  ------------
get                                            True         0.022316     0.0345759
features                                       True         0.632311     0.979687
join                                           True         0.054247     0.084049
fit-ml-advanced--models--svc                   True        19.2526      29.8295
fit-ml-advanced--models--logistic-reg          True        19.2419      29.813
fit-sklearn--ensemble--RandomForestClassifier  True        25.3387      39.2592
~~~
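The `Percentage` column appears to be each task's share of the *summed* elapsed time of all six tasks (about 64.5 s), not of wall-clock time; because the three `fit-*` tasks run in parallel, the build itself should take closer to the longest of them than to that sum. The snippet below is a small check of that reading, recomputing the column from the `Elapsed (s)` values copied from the table; it is an illustration, not Ploomber's own code.

```python
# Elapsed times (seconds) copied from the `ploomber build` summary table
elapsed = {
    "get": 0.022316,
    "features": 0.632311,
    "join": 0.054247,
    "fit-ml-advanced--models--svc": 19.2526,
    "fit-ml-advanced--models--logistic-reg": 19.2419,
    "fit-sklearn--ensemble--RandomForestClassifier": 25.3387,
}

# Sum of per-task times; parallel tasks overlap, so this exceeds wall-clock time
total = sum(elapsed.values())

# Each task's percentage of the summed task time
percentage = {name: 100 * secs / total for name, secs in elapsed.items()}

for name, pct in percentage.items():
    print(f"{name:<47}{pct:>10.4f}")
```

The recomputed values match the table to the printed precision, which supports reading `Percentage` as a share of total task time.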

## Deploy to Airflow

WIP

## Testing

~~~bash
# incremental (will only run the tasks that have changed)
pytest

# complete (force execution of all tasks)
pytest --force

# to start a debugging session on exceptions
pytest --pdb

# to start a debugging session at the start of every test
pytest --trace
~~~
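`--force` is not a built-in `pytest` flag, so it is presumably defined by this project's test configuration. The difference between the incremental and the forced run can be pictured with the toy model below. It is a simplified sketch, assuming a task must run when its source code differs from the last run or when one of its upstream tasks runs; Ploomber's actual outdated-task logic also considers product metadata and is not reproduced here. `Task`, `fingerprint`, and `outdated_tasks` are hypothetical names, and the dependencies between `get`, `features`, and `join` are assumed for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical stand-in for a pipeline task."""
    name: str
    source: str                                   # the task's code
    upstream: list = field(default_factory=list)  # names of upstream tasks


def fingerprint(source: str) -> str:
    """Hash of a task's source code."""
    return hashlib.sha256(source.encode()).hexdigest()


def outdated_tasks(tasks, last_run, force=False):
    """Return the names of tasks to execute, in dependency order.

    tasks: list of Task, already topologically sorted
    last_run: dict mapping task name -> fingerprint saved after the last run
    force: run every task (the behavior `pytest --force` describes)
    """
    to_run = []
    for task in tasks:
        changed = last_run.get(task.name) != fingerprint(task.source)
        upstream_runs = any(up in to_run for up in task.upstream)
        if force or changed or upstream_runs:
            to_run.append(task.name)
    return to_run


# Toy pipeline (assumed dependencies)
tasks = [
    Task("get", "def get(): ..."),
    Task("features", "def features(): ...", upstream=["get"]),
    Task("join", "def join(): ...", upstream=["get", "features"]),
]
last_run = {t.name: fingerprint(t.source) for t in tasks}

# Edit only `features`: it reruns, and so does `join`, which depends on it
tasks[1].source = "def features(): return 1"
print(outdated_tasks(tasks, last_run))
print(outdated_tasks(tasks, last_run, force=True))
```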

## Interacting with the pipeline

Start a Python session from the `ml-advanced` folder, so that `env.yaml` is in
the current working directory:


~~~python
from ml_advanced.pipeline import make

dag = make()
dag.status()
~~~

| name | Last run | Outdated? | Product | Doc (short) | Location |
|------|----------|-----------|---------|-------------|----------|
| get | 29 seconds ago (Dec 09, 20 at 16:36) | False | `output/data.parquet` | Get data | `/Users/Edu/dev/projects-ploomber/ml-advanced/src/ml_advanced/tasks.py:6` |
| features | 29 seconds ago (Dec 09, 20 at 16:36) | False | `output/features.parquet` | Generate new features from existing columns | `/Users/Edu/dev/projects-ploomber/ml-advanced/src/ml_advanced/tasks.py:24` |
| join | 29 seconds ago (Dec 09, 20 at 16:36) | False | `output/join.parquet` | Join raw data with generated features | `/Users/Edu/dev/projects-ploomber/ml-advanced/src/ml_advanced/tasks.py:33` |
| fit-ml-advanced--models--svc | 9 seconds ago (Dec 09, 20 at 16:37) | False | `{'nb': File(output/fit-ml_advanced.models.svc.ipynb), 'model': File(output/model-ml_advanced.models.svc.joblib)}` | Script to train a model | `/Users/Edu/dev/projects-ploomber/ml-advanced/src/ml_advanced/templates/fit.py` |
| fit-ml-advanced--models--logistic-reg | 9 seconds ago (Dec 09, 20 at 16:37) | False | `{'nb': File(output/fit-ml_advanced.models.logistic_reg.ipynb), 'model': File(output/model-ml_advanced.models.logistic_reg.joblib)}` | Script to train a model | `/Users/Edu/dev/projects-ploomber/ml-advanced/src/ml_advanced/templates/fit.py` |
| fit-sklearn--ensemble--RandomForestClassifier | 3 seconds ago (Dec 09, 20 at 16:37) | False | `{'nb': File(output/fit-sklearn.ensemble.RandomForestClassifier.ipynb), 'model': File(output/model-sklearn.ensemble.RandomForestClassifier.joblib)}` | Script to train a model | `/Users/Edu/dev/projects-ploomber/ml-advanced/src/ml_advanced/templates/fit.py` |
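Each model in the experiment array seems to be identified by a dotted path (for example `sklearn.ensemble.RandomForestClassifier`), and the status table shows a consistent pattern linking that path to the task name and its two products (the executed notebook and the serialized model). The helper below reproduces the pattern as observed in the table; it is reverse-engineered from the output, so `experiment_spec` is a hypothetical name and the project's own code may build these values differently.

```python
def experiment_spec(model_path: str) -> dict:
    """Derive a task name and products from a model's dotted path.

    Pattern inferred from the status table: in the task name, underscores
    become hyphens and dots become '--'; product file names keep the
    dotted path unchanged.
    """
    task_name = "fit-" + model_path.replace("_", "-").replace(".", "--")
    products = {
        "nb": f"output/fit-{model_path}.ipynb",
        "model": f"output/model-{model_path}.joblib",
    }
    return {"name": task_name, "product": products}


# The three models that appear in the status table
models = [
    "ml_advanced.models.svc",
    "ml_advanced.models.logistic_reg",
    "sklearn.ensemble.RandomForestClassifier",
]

for spec in map(experiment_spec, models):
    print(spec["name"], spec["product"])
```

Adding another model to the experiment array would then mean appending one more dotted path to a list like `models`.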
