# A primer on building PipeGraph's custom blocks

## Wrappers for Scikit-Learn standard objects
Consider the following common Scikit-Learn objects:

```python
import sklearn
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import DBSCAN

classifier = GaussianNB()
scaler = MinMaxScaler()
dbscanner = DBSCAN()
```

And let's load some data to run the examples:

```python
from sklearn.datasets import load_iris
iris = load_iris()
X, y = iris.data, iris.target
```

Now, let's fit each of the sklearn objects defined above and obtain its output with the corresponding method (```predict```, ```transform```, or ```fit_predict```):

```python
classifier.fit(X, y)
scaler.fit(X);
dbscanner.fit(X, y)
```

```
DBSCAN(algorithm='auto', eps=0.5, leaf_size=30, metric='euclidean',
    metric_params=None, min_samples=5, n_jobs=1, p=None)
```

```python
classifier.predict(X)
```

```
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
```

```python
classifier.predict_proba(X)
```

```
array([[  1.00000000e+000,   1.38496103e-018,   7.25489025e-026],
       [  1.00000000e+000,   1.48206242e-017,   2.29743996e-025],
       [  1.00000000e+000,   1.07780639e-018,   2.35065917e-026],
       [  1.00000000e+000,   1.43871443e-017,   2.89954283e-025],
       [  1.00000000e+000,   4.65192224e-019,   2.95961100e-026],
       [  1.00000000e+000,   1.52598944e-014,   1.79883402e-021],
       [  1.00000000e+000,   1.13555084e-017,   2.79240943e-025],
       [  1.00000000e+000,   6.57615274e-018,   2.79021029e-025],
       [  1.00000000e+000,   9.12219356e-018,   1.16607332e-025],
       [  1.00000000e+000,   3.20344249e-018,   1.12989524e-025],
       [  1.00000000e+000,   4.48944985e-018,   5.19388089e-025],
       [  1.00000000e+000,   1.65734172e-017,   7.24605453e-025],
       [  1.00000000e+000,   1.19023891e-018,   3.06690017e-026],
       [  1.00000000e+000,   7.39520546e-020,   1.77972179e-027],
       [  1.00000000e+000,   2.58242749e-019,   8.73399972e-026],
       ...
```

```python
classifier.predict_log_proba(X)
```

```
array([[  0.00000000e+00,  -4.11208597e+01,  -5.78855367e+01],
       [  0.00000000e+00,  -3.87505119e+01,  -5.67328319e+01],
       [  0.00000000e+00,  -4.13716038e+01,  -5.90125166e+01],
       [  0.00000000e+00,  -3.87801966e+01,  -5.65000742e+01],
       [  0.00000000e+00,  -4.22118362e+01,  -5.87821546e+01],
       [ -1.50990331e-14,  -3.18135483e+01,  -4.77671483e+01],
       [  0.00000000e+00,  -3.90168287e+01,  -5.65377225e+01],
       [  0.00000000e+00,  -3.95630818e+01,  -5.65385104e+01],
       [  0.00000000e+00,  -3.92358214e+01,  -5.74109854e+01],
       [  0.00000000e+00,  -4.02823057e+01,  -5.74425024e+01],
       [  0.00000000e+00,  -3.99448015e+01,  -5.59171461e+01],
       [  0.00000000e+00,  -3.86387316e+01,  -5.55841702e+01],
       [  0.00000000e+00,  -4.12723776e+01,  -5.87465451e+01],
       [  0.00000000e+00,  -4.40508700e+01,  -6.15933405e+01],
       [  0.00000000e+00,  -4.28003869e+01,  -5.76999890e+01],
       ...
```

```python
scaler.transform(X)
```

```
array([[ 0.22222222,  0.625     ,  0.06779661,  0.04166667],
       [ 0.16666667,  0.41666667,  0.06779661,  0.04166667],
       [ 0.11111111,  0.5       ,  0.05084746,  0.04166667],
       [ 0.08333333,  0.45833333,  0.08474576,  0.04166667],
       [ 0.19444444,  0.66666667,  0.06779661,  0.04166667],
       [ 0.30555556,  0.79166667,  0.11864407,  0.125     ],
       [ 0.08333333,  0.58333333,  0.06779661,  0.08333333],
       [ 0.19444444,  0.58333333,  0.08474576,  0.04166667],
       [ 0.02777778,  0.375     ,  0.06779661,  0.04166667],
       [ 0.16666667,  0.45833333,  0.08474576,  0.        ],
       [ 0.30555556,  0.70833333,  0.08474576,  0.04166667],
       [ 0.13888889,  0.58333333,  0.10169492,  0.04166667],
       [ 0.13888889,  0.41666667,  0.06779661,  0.        ],
       [ 0.        ,  0.41666667,  0.01694915,  0.        ],
       [ 0.41666667,  0.83333333,  0.03389831,  0.04166667],
       [ 0.38888889,  1.        ,  0.08474576,  0.125     ],
       ...
```

```python
dbscanner.fit_predict(X)
```

```
array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0, -1,  0,  0,  0,  0,  0,  0,  0,  0,  1,
        1,  1,  1,  1,  1,  1, -1,  1,  1, -1,  1,  1,  1,  1,  1,  1,  1,
       -1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1, -1,  1,  1,  1,  1,  1, -1,  1,  1,  1,  1, -1,  1,  1,  1,
        1,  1,  1, -1, -1,  1, -1, -1,  1,  1,  1,  1,  1,  1,  1, -1, -1,
        1,  1,  1, -1,  1,  1,  1,  1,  1,  1,  1,  1, -1,  1,  1, -1, -1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1], dtype=int64)
```

As can be seen, a different method has to be called to obtain each object's output. To offer a homogeneous interface, PipeGraph provides a collection of adapters, all of which derive from the ```AdapterForSkLearnLikeAdaptee``` base class. This class adapts Scikit-Learn objects to a common interface based on the ```fit``` and ```predict``` methods, regardless of whether the adapted object provides a ```transform```, ```fit_predict```, or ```predict``` interface.

As the following code fragment shows, the ```fit``` and ```predict``` methods accept an arbitrary number of positional and keyword arguments. These must be consistent with what the adaptee expects, but no hard constraints are imposed on the adapter's interface.
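```python
from sklearn.base import BaseEstimator

class AdapterForSkLearnLikeAdaptee(BaseEstimator):
    # Any positional and keyword arguments are accepted; they must match
    # what the wrapped adaptee expects. Method bodies are elided here.
    def fit(self, *pargs, **kwargs):
        ...

    def predict(self, *pargs, **kwargs):
        ...
```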
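The skeleton above leaves the method bodies out. Purely as an illustration, here is a minimal sketch of how such a base class could forward arbitrary arguments to its adaptee. The names `SketchAdapterBase`, `_adaptee`, and `ToyAdaptee` are hypothetical and not part of PipeGraph, and the sketch assumes that `fit` simply delegates to the adaptee's own `fit` while each concrete subclass decides what `predict` does.

```python
class SketchAdapterBase:
    """Hypothetical stand-in for an adapter base class (not PipeGraph's code)."""

    def __init__(self, adaptee):
        self._adaptee = adaptee

    def fit(self, *pargs, **kwargs):
        # Forward every positional and keyword argument unchanged;
        # the adaptee decides which ones it accepts.
        self._adaptee.fit(*pargs, **kwargs)
        return self

    def predict(self, *pargs, **kwargs):
        # Each concrete adapter decides which adaptee method(s) to call.
        raise NotImplementedError


class ToyAdaptee:
    """Hypothetical adaptee that records the arguments passed to fit."""

    def fit(self, X, y=None, sample_weight=None):
        self.seen = {"X": X, "y": y, "sample_weight": sample_weight}
        return self


adapter = SketchAdapterBase(ToyAdaptee())
adapter.fit([[1.0], [2.0]], y=[0, 1], sample_weight=[1.0, 2.0])
print(adapter._adaptee.seen["sample_weight"])  # [1.0, 2.0]
```

Because the adapter never names the parameters itself, an adaptee-specific keyword such as `sample_weight` reaches the adaptee without the adapter having to know about it.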

Scikit-Learn objects that follow the ```predict``` protocol can be wrapped in the ```AdapterForFitPredictAdaptee``` class:

```python
from pipegraph.adapters import AdapterForFitPredictAdaptee

wrapped_classifier = AdapterForFitPredictAdaptee(classifier)
y_pred = wrapped_classifier.predict(X=X)
y_pred
```

```
{'predict': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]),
 'predict_log_proba': array([[  0.00000000e+00,  -4.11208597e+01,  -5.78855367e+01],
        [  0.00000000e+00,  -3.87505119e+01,  -5.67328319e+01],
        [  0.00000000e+00,  -4.13716038e+01,  -5.90125166e+01],
        [  0.00000000e+00,  -3.87801966e+01,  -5.65000742e+01],
        [  0.00000000e+00,  -4.22118362e+01,  -5.87821546e+01],
        [ -1.50990331e-14,  -3.18135483e+01,  -4.77671483e+01],
        [  0.00000000e+00,  -3.90168287e+01,  -5.65377225e+01],
        ...
```

As you can see, the wrapper returns a dictionary containing the outputs of ```predict```, ```predict_proba```, and ```predict_log_proba```, for whichever of these methods the adaptee provides:

```python
list(y_pred.keys())
```

```
['predict', 'predict_proba', 'predict_log_proba']
```
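The collection step described above can be pictured with the following minimal sketch. It assumes the adapter simply probes the adaptee for each method with `hasattr` and stores every result under the method's name; `collect_predict_outputs` and `ToyClassifier` are hypothetical names used only for illustration, not PipeGraph code.

```python
def collect_predict_outputs(adaptee, *pargs, **kwargs):
    """Hypothetical helper: call every available predict-like method."""
    outputs = {}
    for method_name in ("predict", "predict_proba", "predict_log_proba"):
        # Skip methods the adaptee does not provide.
        if hasattr(adaptee, method_name):
            outputs[method_name] = getattr(adaptee, method_name)(*pargs, **kwargs)
    return outputs


class ToyClassifier:
    """Hypothetical classifier exposing predict and predict_proba only."""

    def predict(self, X):
        return [0 if row[0] < 0.5 else 1 for row in X]

    def predict_proba(self, X):
        return [[1.0, 0.0] if row[0] < 0.5 else [0.0, 1.0] for row in X]


result = collect_predict_outputs(ToyClassifier(), [[0.2], [0.9]])
print(sorted(result))     # ['predict', 'predict_proba']
print(result["predict"])  # [0, 1]
```

Since `ToyClassifier` has no `predict_log_proba`, that key is simply absent, whereas `GaussianNB` above provides all three methods and therefore yields all three keys.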

Scikit-Learn objects that follow the ```transform``` protocol can be wrapped in the ```AdapterForFitTransformAdaptee``` class:

```python
from pipegraph.adapters import AdapterForFitTransformAdaptee

wrapped_scaler = AdapterForFitTransformAdaptee(scaler)
y_pred = wrapped_scaler.predict(X)
y_pred
```

```
{'predict': array([[ 0.22222222,  0.625     ,  0.06779661,  0.04166667],
        [ 0.16666667,  0.41666667,  0.06779661,  0.04166667],
        [ 0.11111111,  0.5       ,  0.05084746,  0.04166667],
        [ 0.08333333,  0.45833333,  0.08474576,  0.04166667],
        [ 0.19444444,  0.66666667,  0.06779661,  0.04166667],
        [ 0.30555556,  0.79166667,  0.11864407,  0.125     ],
        [ 0.08333333,  0.58333333,  0.06779661,  0.08333333],
        [ 0.19444444,  0.58333333,  0.08474576,  0.04166667],
        [ 0.02777778,  0.375     ,  0.06779661,  0.04166667],
        [ 0.16666667,  0.45833333,  0.08474576,  0.        ],
        [ 0.30555556,  0.70833333,  0.08474576,  0.04166667],
        [ 0.13888889,  0.58333333,  0.10169492,  0.04166667],
        [ 0.13888889,  0.41666667,  0.06779661,  0.        ],
        [ 0.        ,  0.41666667,  0.01694915,  0.        ],
        [ 0.41666667,  0.83333333,  0.03389831,  0.04166667],
        [ 0.38888889,  1.        ,  0.08474576,  0.125     ],
        ...
```

The adapter for transformers does not need to collect the output of several methods, only the value returned by the adaptee's ```transform``` method, which, for homogeneity, is returned in a dictionary under the key 'predict':

```python
list(y_pred.keys())
```

```
['predict']
```
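A minimal sketch of this behaviour follows, assuming the adapter's `fit` delegates to the adaptee and its `predict` calls `transform`. `SketchTransformAdapter` and `ToyMinMaxScaler` are hypothetical names; the toy scaler handles a single column of plain lists so that the example needs no external packages.

```python
class SketchTransformAdapter:
    """Hypothetical adapter for objects following the transform protocol."""

    def __init__(self, adaptee):
        self._adaptee = adaptee

    def fit(self, *pargs, **kwargs):
        self._adaptee.fit(*pargs, **kwargs)
        return self

    def predict(self, *pargs, **kwargs):
        # The single transform output goes under the 'predict' key.
        return {"predict": self._adaptee.transform(*pargs, **kwargs)}


class ToyMinMaxScaler:
    """Hypothetical one-column min-max scaler (assumes max > min)."""

    def fit(self, X):
        values = [row[0] for row in X]
        self.min_, self.max_ = min(values), max(values)
        return self

    def transform(self, X):
        span = self.max_ - self.min_
        return [[(row[0] - self.min_) / span] for row in X]


wrapped = SketchTransformAdapter(ToyMinMaxScaler()).fit([[2.0], [4.0], [6.0]])
print(wrapped.predict([[2.0], [5.0]]))  # {'predict': [[0.0], [0.75]]}
```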

Scikit-Learn objects that follow the ```fit_predict``` protocol can be wrapped in the ```AdapterForAtomicFitPredictAdaptee``` class:

```python
from pipegraph.adapters import AdapterForAtomicFitPredictAdaptee

wrapped_dbscanner = AdapterForAtomicFitPredictAdaptee(dbscanner)
y_pred = wrapped_dbscanner.predict(X=X)
y_pred
```

```
{'predict': array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         0,  0,  0,  0,  0,  0,  0, -1,  0,  0,  0,  0,  0,  0,  0,  0,  1,
         1,  1,  1,  1,  1,  1, -1,  1,  1, -1,  1,  1,  1,  1,  1,  1,  1,
        -1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
         1,  1, -1,  1,  1,  1,  1,  1, -1,  1,  1,  1,  1, -1,  1,  1,  1,
         1,  1,  1, -1, -1,  1, -1, -1,  1,  1,  1,  1,  1,  1,  1, -1, -1,
         1,  1,  1, -1,  1,  1,  1,  1,  1,  1,  1,  1, -1,  1,  1, -1, -1,
         1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1], dtype=int64)}
```

Again, this adapter returns a dictionary holding the result of calling ```fit_predict``` under the key 'predict'.
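Because ```fit_predict``` fits and predicts in a single call, one plausible design, sketched below under that assumption, is a ```fit``` that does nothing and a ```predict``` that delegates to ```fit_predict```. Whether PipeGraph's own adapter treats ```fit``` this way is not stated here; `SketchAtomicFitPredictAdapter` and `ToyThresholdClusterer` are hypothetical names for illustration only.

```python
class SketchAtomicFitPredictAdapter:
    """Hypothetical adapter for objects that only offer fit_predict."""

    def __init__(self, adaptee):
        self._adaptee = adaptee

    def fit(self, *pargs, **kwargs):
        # Assumption: fitting is deferred, since fit_predict does both steps.
        return self

    def predict(self, *pargs, **kwargs):
        return {"predict": self._adaptee.fit_predict(*pargs, **kwargs)}


class ToyThresholdClusterer:
    """Hypothetical clusterer: label 1 above the data mean, else 0."""

    def fit_predict(self, X):
        mean = sum(X) / len(X)
        return [1 if x > mean else 0 for x in X]


wrapped = SketchAtomicFitPredictAdapter(ToyThresholdClusterer())
print(wrapped.fit([1, 2, 9, 10]).predict([1, 2, 9, 10]))  # {'predict': [0, 0, 1, 1]}
```

With this design every call to `predict` refits the adaptee on the data it receives, which matches how `DBSCAN.fit_predict` behaves on its own.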

## Special wrappers

Besides the three families of objects provided by Scikit-Learn, it is sometimes convenient to build custom objects whose ```predict``` method returns multiple outputs. In this case a dictionary can be used as well, with the output names as keys. To support this kind of output, PipeGraph provides the ```AdapterForCustomFitPredictWithDictionaryOutputAdaptee``` class:

```python
from pipegraph.standard_blocks import Demultiplexer
from pipegraph.adapters import AdapterForCustomFitPredictWithDictionaryOutputAdaptee

demultiplexer = Demultiplexer()
wrapped_demultiplexer = AdapterForCustomFitPredictWithDictionaryOutputAdaptee(demultiplexer)
output = wrapped_demultiplexer.predict(X=X, selection=y)
output
```

```
{'X_0':       0    1    2    3
 0   5.1  3.5  1.4  0.2
 1   4.9  3.0  1.4  0.2
 2   4.7  3.2  1.3  0.2
 3   4.6  3.1  1.5  0.2
 4   5.0  3.6  1.4  0.2
 5   5.4  3.9  1.7  0.4
 6   4.6  3.4  1.4  0.3
 7   5.0  3.4  1.5  0.2
 8   4.4  2.9  1.4  0.2
 9   4.9  3.1  1.5  0.1
 10  5.4  3.7  1.5  0.2
 11  4.8  3.4  1.6  0.2
 12  4.8  3.0  1.4  0.1
 13  4.3  3.0  1.1  0.1
 14  5.8  4.0  1.2  0.2
 15  5.7  4.4  1.5  0.4
 16  5.4  3.9  1.3  0.4
 17  5.1  3.5  1.4  0.3
 18  5.7  3.8  1.7  0.3
 19  5.1  3.8  1.5  0.3
 20  5.4  3.4  1.7  0.2
 21  5.1  3.7  1.5  0.4
 22  4.6  3.6  1.0  0.2
 23  5.1  3.3  1.7  0.5
 24  4.8  3.4  1.9  0.2
 25  5.0  3.0  1.6  0.2
 26  5.0  3.4  1.6  0.4
 27  5.2  3.5  1.5  0.2
 28  5.2  3.4  1.4  0.2
 29  4.7  3.2  1.6  0.2
 30  4.8  3.1  1.6  0.2
 31  5.4  3.4  1.5  0.4
 32  5.2  4.1  1.5  0.1
 33  5.5  4.2  1.4  0.2
 34  4.9  3.1  1.5  0.1
 35  5.0  3.2  1.2  0.2
 36  5.5  3.5  1.3  0.2
 37  4.9  3.1  1.5  0.1
 38  4.4  3.0  1.3  0.2
 39  5.1  3.4  1.5  0.2
 ...
```

```python
list(output.keys())
```

```
['X_0', 'X_1', 'X_2']
```

As can be seen, this adapter's ```predict``` method returns the adaptee's dictionary of outputs, keeping its original keys.
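In other words, this adapter passes the adaptee's dictionary through unchanged. The sketch below illustrates the idea with a hypothetical `ToyDemultiplexer` that, judging from the keys shown above, splits the rows of `X` into one group per `selection` value and names each group `X_<value>`. It works on plain lists rather than DataFrames, and `SketchDictOutputAdapter` and `ToyDemultiplexer` are illustrative names, not PipeGraph's implementation.

```python
class SketchDictOutputAdapter:
    """Hypothetical adapter that returns the adaptee's dictionary as-is."""

    def __init__(self, adaptee):
        self._adaptee = adaptee

    def fit(self, *pargs, **kwargs):
        self._adaptee.fit(*pargs, **kwargs)
        return self

    def predict(self, *pargs, **kwargs):
        # The adaptee already returns {output_name: value}; keep its keys.
        return self._adaptee.predict(*pargs, **kwargs)


class ToyDemultiplexer:
    """Hypothetical block: route each row of X to the group named by selection."""

    def fit(self, *pargs, **kwargs):
        return self

    def predict(self, X, selection):
        groups = {}
        for row, label in zip(X, selection):
            groups.setdefault(f"X_{label}", []).append(row)
        return groups


wrapped = SketchDictOutputAdapter(ToyDemultiplexer())
result = wrapped.predict(X=[[5.1], [7.0], [6.3], [4.9]], selection=[0, 1, 2, 0])
print(list(result))   # ['X_0', 'X_1', 'X_2']
print(result["X_0"])  # [[5.1], [4.9]]
```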

## Wrapping your custom blocks

PipeGraph uses the ```wrap_adaptee_in_process(adaptee, strategy_class=None)``` function to wrap the objects passed in its constructor's ```steps``` parameter, according to these rules:
- If the ```strategy_class``` parameter is passed, that class is used as the adapter.
- Else, if the adaptee's class is a key of the ```pipegraph.base.strategies_for_custom_adaptees``` dictionary, the corresponding adapter class stored there is used.
- Else, if the adaptee has a ```predict``` method, the ```AdapterForFitPredictAdaptee``` class is used.
- Else, if the adaptee has a ```transform``` method, the ```AdapterForFitTransformAdaptee``` class is used.
- Else, if the adaptee has a ```fit_predict``` method, the ```AdapterForAtomicFitPredictAdaptee``` class is used.
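The rules above amount to a priority-ordered lookup. The following self-contained sketch restates that decision logic only; it is not PipeGraph's `wrap_adaptee_in_process`. Adapter classes are represented by their names as strings, `choose_adapter_name` and its `custom_strategies` argument are hypothetical stand-ins (the latter for `pipegraph.base.strategies_for_custom_adaptees`), and raising `ValueError` when no rule matches is an assumption, since the text does not say what happens in that case.

```python
def choose_adapter_name(adaptee, strategy_class=None, custom_strategies=None):
    """Hypothetical restatement of the wrapping rules, returning a class name."""
    custom_strategies = custom_strategies or {}
    # 1. An explicitly requested adapter always wins.
    if strategy_class is not None:
        return strategy_class
    # 2. Then a per-class registration, looked up by the adaptee's class.
    if type(adaptee) in custom_strategies:
        return custom_strategies[type(adaptee)]
    # 3-5. Otherwise fall back on duck typing, in this order.
    if hasattr(adaptee, "predict"):
        return "AdapterForFitPredictAdaptee"
    if hasattr(adaptee, "transform"):
        return "AdapterForFitTransformAdaptee"
    if hasattr(adaptee, "fit_predict"):
        return "AdapterForAtomicFitPredictAdaptee"
    # Assumption: the source does not say what happens when nothing matches.
    raise ValueError(f"No adapter rule matches {type(adaptee).__name__}")


class Classifier:
    def predict(self, X):
        ...

class Scaler:
    def transform(self, X):
        ...

class Clusterer:
    def fit_predict(self, X):
        ...

class Splitter:
    def predict(self, X, selection):
        ...


registry = {Splitter: "AdapterForCustomFitPredictWithDictionaryOutputAdaptee"}
for obj in (Classifier(), Scaler(), Clusterer(), Splitter()):
    print(type(obj).__name__, "->", choose_adapter_name(obj, custom_strategies=registry))
```

The order matters: an object exposing both `predict` and `fit_predict` (scikit-learn's `KMeans`, for example) is matched by the `predict` rule first, while `DBSCAN`, which has no `predict`, falls through to the `fit_predict` rule.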

```python
from pipegraph.base import wrap_adaptee_in_process

wrapped_scaler = wrap_adaptee_in_process(scaler)
wrapped_scaler.predict(X)
```

```
{'predict': array([[ 0.22222222,  0.625     ,  0.06779661,  0.04166667],
        [ 0.16666667,  0.41666667,  0.06779661,  0.04166667],
        [ 0.11111111,  0.5       ,  0.05084746,  0.04166667],
        [ 0.08333333,  0.45833333,  0.08474576,  0.04166667],
        [ 0.19444444,  0.66666667,  0.06779661,  0.04166667],
        [ 0.30555556,  0.79166667,  0.11864407,  0.125     ],
        [ 0.08333333,  0.58333333,  0.06779661,  0.08333333],
        [ 0.19444444,  0.58333333,  0.08474576,  0.04166667],
        [ 0.02777778,  0.375     ,  0.06779661,  0.04166667],
        [ 0.16666667,  0.45833333,  0.08474576,  0.        ],
        [ 0.30555556,  0.70833333,  0.08474576,  0.04166667],
        [ 0.13888889,  0.58333333,  0.10169492,  0.04166667],
        [ 0.13888889,  0.41666667,  0.06779661,  0.        ],
        [ 0.        ,  0.41666667,  0.01694915,  0.        ],
        [ 0.41666667,  0.83333333,  0.03389831,  0.04166667],
        [ 0.38888889,  1.        ,  0.08474576,  0.125     ],
        ...
```

```python
wrapped_demultiplexer = wrap_adaptee_in_process(demultiplexer)
wrapped_demultiplexer.predict(X=X, selection=y)
```

```
{'X_0':       0    1    2    3
 0   5.1  3.5  1.4  0.2
 1   4.9  3.0  1.4  0.2
 2   4.7  3.2  1.3  0.2
 3   4.6  3.1  1.5  0.2
 4   5.0  3.6  1.4  0.2
 5   5.4  3.9  1.7  0.4
 6   4.6  3.4  1.4  0.3
 7   5.0  3.4  1.5  0.2
 8   4.4  2.9  1.4  0.2
 9   4.9  3.1  1.5  0.1
 10  5.4  3.7  1.5  0.2
 11  4.8  3.4  1.6  0.2
 12  4.8  3.0  1.4  0.1
 13  4.3  3.0  1.1  0.1
 14  5.8  4.0  1.2  0.2
 15  5.7  4.4  1.5  0.4
 16  5.4  3.9  1.3  0.4
 17  5.1  3.5  1.4  0.3
 18  5.7  3.8  1.7  0.3
 19  5.1  3.8  1.5  0.3
 20  5.4  3.4  1.7  0.2
 21  5.1  3.7  1.5  0.4
 22  4.6  3.6  1.0  0.2
 23  5.1  3.3  1.7  0.5
 24  4.8  3.4  1.9  0.2
 25  5.0  3.0  1.6  0.2
 26  5.0  3.4  1.6  0.4
 27  5.2  3.5  1.5  0.2
 28  5.2  3.4  1.4  0.2
 29  4.7  3.2  1.6  0.2
 30  4.8  3.1  1.6  0.2
 31  5.4  3.4  1.5  0.4
 32  5.2  4.1  1.5  0.1
 33  5.5  4.2  1.4  0.2
 34  4.9  3.1  1.5  0.1
 35  5.0  3.2  1.2  0.2
 36  5.5  3.5  1.3  0.2
 37  4.9  3.1  1.5  0.1
 38  4.4  3.0  1.3  0.2
 39  5.1  3.4  1.5  0.2
 ...
```

Users implementing their own custom blocks may find it useful to pass their own adapter class to ```wrap_adaptee_in_process``` through the ```strategy_class``` parameter, as in:

```python
wrapped_demultiplexer = wrap_adaptee_in_process(adaptee=demultiplexer,
                                                strategy_class=AdapterForCustomFitPredictWithDictionaryOutputAdaptee)
wrapped_demultiplexer.predict(X=X, selection=y)
```

```
{'X_0':       0    1    2    3
 0   5.1  3.5  1.4  0.2
 1   4.9  3.0  1.4  0.2
 2   4.7  3.2  1.3  0.2
 3   4.6  3.1  1.5  0.2
 4   5.0  3.6  1.4  0.2
 5   5.4  3.9  1.7  0.4
 6   4.6  3.4  1.4  0.3
 7   5.0  3.4  1.5  0.2
 8   4.4  2.9  1.4  0.2
 9   4.9  3.1  1.5  0.1
 10  5.4  3.7  1.5  0.2
 11  4.8  3.4  1.6  0.2
 12  4.8  3.0  1.4  0.1
 13  4.3  3.0  1.1  0.1
 14  5.8  4.0  1.2  0.2
 15  5.7  4.4  1.5  0.4
 16  5.4  3.9  1.3  0.4
 17  5.1  3.5  1.4  0.3
 18  5.7  3.8  1.7  0.3
 19  5.1  3.8  1.5  0.3
 20  5.4  3.4  1.7  0.2
 21  5.1  3.7  1.5  0.4
 22  4.6  3.6  1.0  0.2
 23  5.1  3.3  1.7  0.5
 24  4.8  3.4  1.9  0.2
 25  5.0  3.0  1.6  0.2
 26  5.0  3.4  1.6  0.4
 27  5.2  3.5  1.5  0.2
 28  5.2  3.4  1.4  0.2
 29  4.7  3.2  1.6  0.2
 30  4.8  3.1  1.6  0.2
 31  5.4  3.4  1.5  0.4
 32  5.2  4.1  1.5  0.1
 33  5.5  4.2  1.4  0.2
 34  4.9  3.1  1.5  0.1
 35  5.0  3.2  1.2  0.2
 36  5.5  3.5  1.3  0.2
 37  4.9  3.1  1.5  0.1
 38  4.4  3.0  1.3  0.2
 39  5.1  3.4  1.5  0.2
 ...
```

For custom blocks built by users, wrapping them with ```wrap_adaptee_in_process``` as described above and passing the already wrapped object to the ```steps``` parameter of PipeGraph's constructor avoids the need to modify the ```pipegraph.base.strategies_for_custom_adaptees``` dictionary.