
## Overview

The functionality below combines the [`NumerFrame`](https://crowdcent.github.io/numerblox/numerframe.html#numerframe), `PreProcessor`, `Model` and `PostProcessor` objects to propagate
data, generate predictions and postprocess them in a single call.

Specifically, this section introduces two objects:
1. [`ModelPipeline`](https://crowdcent.github.io/numerblox/modelpipeline.html#modelpipeline): Run all preprocessing, models and postprocessing that you define and return a [`NumerFrame`](https://crowdcent.github.io/numerblox/numerframe.html#numerframe).
2. [`ModelPipelineCollection`](https://crowdcent.github.io/numerblox/modelpipeline.html#modelpipelinecollection): Manage and run multiple [`ModelPipeline`](https://crowdcent.github.io/numerblox/modelpipeline.html#modelpipeline) objects.

In [None]:
#| include: false
from nbdev.showdoc import *

## 1. ModelPipeline

[`ModelPipeline`](https://crowdcent.github.io/numerblox/modelpipeline.html#modelpipeline) handles all preprocessing, model prediction and postprocessing. It returns a [`NumerFrame`](https://crowdcent.github.io/numerblox/numerframe.html#numerframe) with the preprocessed data, metadata and postprocessed prediction columns.

In [1]:
#| echo: false
#| output: asis
show_doc(ModelPipeline)

---

[source](https://github.com/crowdcent/numerblox/blob/master/numerblox/model_pipeline.py#L21){target="_blank" style="float:right; font-size:smaller"}

### ModelPipeline

>      ModelPipeline (models:List[numerblox.model.BaseModel],
>                     preprocessors:List[numerblox.preprocessing.BaseProcessor]=[],
>                     postprocessors:List[numerblox.preprocessing.BaseProcessor]=[],
>                     copy_first=True, standardize=True,
>                     pipeline_name:str=None)

Execute all preprocessing, prediction and postprocessing for a given setup.

:param models: Initialized numerai-blocks Models (objects inheriting from BaseModel).

:param preprocessors: List of initialized Preprocessors. 

:param postprocessors: List of initialized Postprocessors. 

:param copy_first: Whether to copy the NumerFrame as a first preprocessing step. Highly recommended, because it prevents the pipeline from unexpectedly modifying the original dataset.

:param pipeline_name: Unique name for pipeline. Only used for display purposes.
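
To make the order of operations and the role of `copy_first` concrete, here is a minimal sketch in plain Python. It is not the numerblox implementation: `SketchPipeline`, `constant_model` and the dict-based `Frame` type are hypothetical stand-ins, and a real `NumerFrame` is assumed to behave analogously with respect to in-place mutation.

```python
import copy
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in: a "frame" is a dict of column name -> list of values.
Frame = Dict[str, List[float]]
Step = Callable[[Frame], Frame]


class SketchPipeline:
    """Illustrative only; not the numerblox ModelPipeline implementation."""

    def __init__(self, models: List[Step],
                 preprocessors: Optional[List[Step]] = None,
                 postprocessors: Optional[List[Step]] = None,
                 copy_first: bool = True,
                 pipeline_name: str = "sketch"):
        self.models = models
        self.preprocessors = preprocessors or []
        self.postprocessors = postprocessors or []
        self.copy_first = copy_first
        self.pipeline_name = pipeline_name

    def __call__(self, frame: Frame) -> Frame:
        # Copy first so later steps cannot mutate the caller's data.
        if self.copy_first:
            frame = copy.deepcopy(frame)
        # Apply every stage in order: preprocess, predict, postprocess.
        for step in self.preprocessors + self.models + self.postprocessors:
            frame = step(frame)
        return frame


def constant_model(frame: Frame) -> Frame:
    # Adds a prediction column in place, like a model writing its output.
    frame["prediction_const"] = [0.5] * len(frame["target"])
    return frame


original = {"target": [0.5, 0.25]}
safe = SketchPipeline(models=[constant_model])(original)

unsafe_input = {"target": [0.5, 0.25]}
unsafe = SketchPipeline(models=[constant_model], copy_first=False)(unsafe_input)
```

With `copy_first=True`, `original` is left without a prediction column. With `copy_first=False`, the step writes directly into `unsafe_input`, which is the kind of side effect the parameter is meant to prevent.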

Example using several preprocessors, dummy models and postprocessors:

In [None]:
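# Import paths assume the numerblox module layout.
from numerblox.numerframe import NumerFrame, create_numerframe
from numerblox.preprocessing import GroupStatsPreProcessor, FeatureSelectionPreProcessor
from numerblox.model import ConstantModel, RandomModel
from numerblox.postprocessing import MeanEnsembler, FeatureNeutralizer
from numerblox.model_pipeline import ModelPipeline, ModelPipelineCollection

model_names = ["test_0.5", "test_0.8"]

dataf = create_numerframe("test_assets/mini_numerai_version_1_data.csv")
preprocessors = [GroupStatsPreProcessor(), FeatureSelectionPreProcessor(feature_cols=['feature_intelligence_mean', 'feature_intelligence_std'])]
models = [ConstantModel(constant=0.5, model_name=model_names[0]), ConstantModel(constant=0.8, model_name=model_names[1])]
postprocessors = [MeanEnsembler(cols=[f"prediction_{name}" for name in model_names], final_col_name='prediction_ensembled'),
                  FeatureNeutralizer(feature_names=['feature_intelligence_mean', 'feature_intelligence_std'],
                                     pred_name='prediction_ensembled', proportion=0.8)]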

In [None]:
test_pipeline = ModelPipeline(preprocessors=preprocessors, models=models,
                              postprocessors=postprocessors, pipeline_name="test_pipeline",
                              standardize=False)
processed_dataf = test_pipeline(dataf)

In [None]:
assert processed_dataf.meta == dataf.meta
assert isinstance(processed_dataf, NumerFrame)
processed_dataf.head(2)

| | feature_intelligence_mean | feature_intelligence_std | target | id | era | data_type | prediction_test_0.5 | prediction_test_0.8 | prediction_ensembled | prediction_ensembled_neutralized_0.8 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.333333 | 0.246183 | 0.5 | n000315175b67977 | era1 | train | 0.5 | 0.8 | 0.65 | 0.0 |
| 1 | 0.208333 | 0.234359 | 0.25 | n0014af834a96cdd | era1 | train | 0.5 | 0.8 | 0.65 | 0.36088 |
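
The `prediction_ensembled` value of 0.65 in the output above is consistent with a row-wise mean of the two constant predictions (0.5 and 0.8). The arithmetic can be sketched in plain Python as follows. This only illustrates the averaging step and is not the `MeanEnsembler` implementation; the neutralized column is not reproduced here.

```python
from statistics import mean

# Two constant predictions per row, mirroring the dummy models above.
predictions = {
    "prediction_test_0.5": [0.5, 0.5],
    "prediction_test_0.8": [0.8, 0.8],
}

# Row-wise mean across the prediction columns.
ensembled = [mean(row) for row in zip(*predictions.values())]
print(ensembled)  # each entry is approximately 0.65
```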


## 2. ModelPipelineCollection

[`ModelPipelineCollection`](https://crowdcent.github.io/numerblox/modelpipeline.html#modelpipelinecollection) manages and runs multiple [`ModelPipeline`](https://crowdcent.github.io/numerblox/modelpipeline.html#modelpipeline) objects. It takes a list of initialized [`ModelPipeline`](https://crowdcent.github.io/numerblox/modelpipeline.html#modelpipeline) objects as input.

In [2]:
#| echo: false
#| output: asis
show_doc(ModelPipelineCollection)

---

[source](https://github.com/crowdcent/numerblox/blob/master/numerblox/model_pipeline.py#L90){target="_blank" style="float:right; font-size:smaller"}

### ModelPipelineCollection

>      ModelPipelineCollection (pipelines:List[__main__.ModelPipeline])

Execute multiple initialized ModelPipelines in a sequence.

:param pipelines: List of initialized ModelPipelines.

We introduce a second pipeline with no preprocessing or postprocessing; it contains only a [`RandomModel`](https://crowdcent.github.io/numerblox/model.html#randommodel).

In [None]:
test_pipeline2 = ModelPipeline(models=[RandomModel()], pipeline_name="test_pipeline2")

We process two [`ModelPipeline`](https://crowdcent.github.io/numerblox/modelpipeline.html#modelpipeline)s with different characteristics on the same data.

In [None]:
collection = ModelPipelineCollection([test_pipeline, test_pipeline2])
assert collection.get_pipeline("test_pipeline2").pipeline_name == 'test_pipeline2'

In [None]:
result_datasets = collection(dataf=dataf)

The [`ModelPipelineCollection`](https://crowdcent.github.io/numerblox/modelpipeline.html#modelpipelinecollection) returns a dictionary mapping pipeline names to [`NumerFrame`](https://crowdcent.github.io/numerblox/numerframe.html#numerframe) objects, each retaining its metadata and added prediction columns. In this example, the first pipeline includes a feature selection step, so its [`NumerFrame`](https://crowdcent.github.io/numerblox/numerframe.html#numerframe) keeps only the selected feature columns. The second pipeline does no preprocessing, so its [`NumerFrame`](https://crowdcent.github.io/numerblox/numerframe.html#numerframe) retains all feature columns.
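
The shape of this return value can be illustrated with a minimal sketch in plain Python. `NamedPipeline`, `SketchCollection`, `keep_one_feature` and `add_prediction` are hypothetical stand-ins, not numerblox classes; the sketch only assumes that each pipeline has a `pipeline_name` and that results are keyed by that name.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in: a "frame" is a dict of column name -> list of values.
Frame = Dict[str, List[float]]


class NamedPipeline:
    """Hypothetical stand-in for a pipeline: a name plus a callable."""

    def __init__(self, pipeline_name: str, fn: Callable[[Frame], Frame]):
        self.pipeline_name = pipeline_name
        self.fn = fn

    def __call__(self, frame: Frame) -> Frame:
        # Work on a shallow copy so pipelines do not see each other's columns.
        return self.fn(dict(frame))


class SketchCollection:
    """Illustrative only; not the numerblox ModelPipelineCollection."""

    def __init__(self, pipelines: List[NamedPipeline]):
        self.pipelines = {p.pipeline_name: p for p in pipelines}

    def get_pipeline(self, name: str) -> NamedPipeline:
        return self.pipelines[name]

    def __call__(self, frame: Frame) -> Dict[str, Frame]:
        # Run each pipeline on the same input; key results by pipeline name.
        return {name: p(frame) for name, p in self.pipelines.items()}


def keep_one_feature(frame: Frame) -> Frame:
    # Mimics a feature selection step by dropping one column.
    return {k: v for k, v in frame.items() if k != "feature_b"}


def add_prediction(frame: Frame) -> Frame:
    # Mimics a model that adds a prediction column and keeps all features.
    frame["prediction"] = [0.5] * len(frame["feature_a"])
    return frame


data = {"feature_a": [0.0, 0.25], "feature_b": [1.0, 0.75]}
collection = SketchCollection([
    NamedPipeline("selective", keep_one_feature),
    NamedPipeline("plain", add_prediction),
])
results = collection(data)
```

Here `results["selective"]` has lost `feature_b`, while `results["plain"]` keeps every input column and gains a prediction column, mirroring the difference between the two pipelines described above.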

In [None]:
result_datasets.keys()

dict_keys(['test_pipeline', 'test_pipeline2'])

In [None]:
result_datasets['test_pipeline'].head(2)

| | feature_intelligence_mean | feature_intelligence_std | target | id | era | data_type | prediction_test_0.5 | prediction_test_0.8 | prediction_ensembled | prediction_ensembled_neutralized_0.8 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.333333 | 0.246183 | 0.5 | n000315175b67977 | era1 | train | 0.5 | 0.8 | 0.65 | 0.0 |
| 1 | 0.208333 | 0.234359 | 0.25 | n0014af834a96cdd | era1 | train | 0.5 | 0.8 | 0.65 | 0.36088 |


In [None]:
result_datasets['test_pipeline2'].head(2)

| | id | era | data_type | feature_intelligence1 | feature_intelligence2 | feature_intelligence3 | feature_intelligence4 | feature_intelligence5 | feature_intelligence6 | feature_intelligence7 | ... | feature_wisdom39 | feature_wisdom40 | feature_wisdom41 | feature_wisdom42 | feature_wisdom43 | feature_wisdom44 | feature_wisdom45 | feature_wisdom46 | target | prediction_random |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | n000315175b67977 | era1 | train | 0.0 | 0.5 | 0.25 | 0.0 | 0.5 | 0.25 | 0.25 | ... | 1.0 | 0.75 | 0.5 | 0.75 | 0.5 | 1.0 | 0.5 | 0.75 | 0.5 | 0.5 |
| 1 | n0014af834a96cdd | era1 | train | 0.0 | 0.0 | 0.0 | 0.25 | 0.5 | 0.0 | 0.0 | ... | 1.0 | 0.0 | 0.0 | 0.75 | 0.25 | 0.0 | 0.25 | 1.0 | 0.25 | 0.1 |

