# Intermediate ML project

This example shows how to build an ML pipeline with integration testing (using
the `on_finish` key). When a pipeline takes a long time to run end-to-end, it
is a good idea to test it with a sample of the data. See `pipeline.yaml` and
`env.yaml` for how this parametrization is set up, and the `get` function in
`tasks.py` for how it affects the task.
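
To make the idea concrete, here is a minimal sketch of how a boolean `sample` parameter could change what a `get` task writes. It is not the project's actual `tasks.py`: the `make_rows` helper, the synthetic data, and the `n_rows` and `sample_size` arguments are assumptions made for illustration, and the CSV output format is chosen only to keep the example dependency-free.

```python
import csv
import random
from pathlib import Path


def make_rows(n_rows, seed=0):
    """Generate synthetic (x, y) rows; a stand-in for the real data source."""
    rng = random.Random(seed)
    return [(i, rng.random()) for i in range(n_rows)]


def get(product, sample, n_rows=1000, sample_size=100):
    """Hypothetical `get` task: write all rows, or a subset when `sample` is true."""
    rows = make_rows(n_rows)

    if sample:
        # Keep a reproducible random subset so downstream tasks run quickly.
        rows = random.Random(0).sample(rows, min(sample_size, len(rows)))

    path = Path(str(product))
    path.parent.mkdir(parents=True, exist_ok=True)

    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["x", "y"])
        writer.writerows(rows)

    # Returning the row count is only for convenience in this sketch.
    return len(rows)
```

Because the downstream tasks only read the product that `get` writes, the rest of the pipeline can stay unchanged whether it runs on the full data or on the sample.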

## Setup

~~~bash
# same instructions as the other version
git clone https://github.com/ploomber/projects
cd ml-basic

conda env create --file environment.yml
conda activate ml-basic
~~~

## Execute the pipeline

~~~bash
ploomber build
~~~

~~~
name      Ran?      Elapsed (s)    Percentage
--------  ------  -------------  ------------
get       False               0             0
features  False               0             0
join      False               0             0
fit.py    False               0             0
~~~




## Integration testing with a sample

To see available parameters:

~~~bash
ploomber build --help
~~~

~~~
usage: ploomber [-h] [--log LOG] [--entry-point ENTRY_POINT] [--force]
                [--partially PARTIALLY] [--env--sample ENV__SAMPLE]

Build pipeline

optional arguments:
  -h, --help            show this help message and exit
  --log LOG, -l LOG     Enables logging to stdout at the specified level
  --entry-point ENTRY_POINT, -e ENTRY_POINT
                        Entry point(DAG), defaults to pipeline.yaml. Replaced
                        if there is an ENTRY_POINT env variable defined
  --force, -f           Force execution by ignoring status
  --partially PARTIALLY, -p PARTIALLY
                        Build a pipeline partially until certain task
  --env--sample ENV__SAMPLE
                        Default: False
~~~
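
The `--env--sample` flag in the help text looks like an `env` prefix joined to the `sample` key. As a rough illustration only, the sketch below uses `argparse` to turn the keys of a dictionary into flags of that shape. It is not Ploomber's implementation: `build_parser`, `parse_bool`, and the `env` dictionary are hypothetical names, and the way the string `true` is converted to a boolean is an assumption.

```python
import argparse


def build_parser(env):
    """Create one `--env--<key>` flag per key in a (hypothetical) env mapping."""
    parser = argparse.ArgumentParser(prog="example", description="Build pipeline")
    for key, default in env.items():
        # argparse derives the destination `env__<key>` from the flag name.
        parser.add_argument(f"--env--{key}", default=default,
                            help=f"Default: {default}")
    return parser


def parse_bool(value):
    """Interpret command-line strings such as 'true' or 'False' as booleans."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in {"true", "1", "yes"}


# Simulate passing `--env--sample true` on the command line.
args = build_parser({"sample": False}).parse_args(["--env--sample", "true"])
sample = parse_bool(args.env__sample)
```

Values passed on the command line arrive as strings, so some conversion step like `parse_bool` is needed before a flag can be used in a condition such as `if sample:`.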


Run with a sample:

~~~bash
ploomber build --env--sample true
~~~

~~~
name      Ran?      Elapsed (s)    Percentage
--------  ------  -------------  ------------
get       False               0             0
features  False               0             0
join      False               0             0
fit.py    False               0             0
~~~


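
The integration tests mentioned in the introduction are attached to tasks through the `on_finish` key. The sketch below shows what such a check might look like: a function that receives the task's `product` and raises an error if the data looks wrong. The function name `no_missing_values` and the assumption that the product is a CSV file are hypothetical; the project's real checks may differ.

```python
import csv


def no_missing_values(product):
    """Hypothetical `on_finish` hook: fail if any cell in a CSV product is empty."""
    with open(str(product), newline="") as f:
        reader = csv.DictReader(f)
        for row in reader:
            # DictReader fills absent trailing fields with None.
            missing = [name for name, value in row.items()
                       if value in (None, "")]
            assert not missing, (
                f"line {reader.line_num}: missing values in {missing}"
            )
```

Running the pipeline with a sample exercises checks like this one quickly, so data problems can surface before a full run.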
