## Custom methods in `SmartCorrelatedSelection`

In this tutorial we show how to pass a custom method to `SmartCorrelatedSelection`, using the association measure [Distance Correlation](https://m-clark.github.io/docs/CorrelationComparison.pdf) from the Python package [dcor](https://dcor.readthedocs.io/en/latest/index.html). Install `dcor` before starting the tutorial:

```
!pip install dcor
```
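
To see what kind of callable `method` accepts, the sketch below implements the sample distance correlation in plain NumPy and passes it to `pandas.DataFrame.corr`. Feature-engine's documentation cites that method as the reference for a callable `method`: two 1D arrays in, one float out. The function `distance_correlation_sketch` is our own illustrative helper, not part of `dcor`. It follows the standard V-statistic definition, and we assume, without having checked every `dcor` version, that it agrees with the default estimator of `dcor.distance_correlation`. Unlike Pearson's coefficient, the population distance correlation is zero only when the variables are independent, so it also picks up non-monotonic dependence such as y = x².

```python
import numpy as np
import pandas as pd


def _double_centered_distances(v: np.ndarray) -> np.ndarray:
    # pairwise absolute differences |v_i - v_j| for a 1D sample
    d = np.abs(v[:, None] - v[None, :])
    # subtract column and row means, then add back the grand mean
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation_sketch(x: np.ndarray, y: np.ndarray) -> float:
    """Illustrative sample distance correlation (V-statistic) of two 1D arrays."""
    a = _double_centered_distances(np.asarray(x, dtype=float))
    b = _double_centered_distances(np.asarray(y, dtype=float))
    dcov2_xy = (a * b).mean()  # squared distance covariance
    dvar2_x = (a * a).mean()   # squared distance variance of x
    dvar2_y = (b * b).mean()   # squared distance variance of y
    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom == 0:
        return 0.0  # convention when one variable is constant
    return float(np.sqrt(max(dcov2_xy, 0.0) / denom))


# pandas calls the callable once per pair of columns, passing two 1D arrays
x = np.linspace(-1, 1, 101)
df = pd.DataFrame({"x": x, "x_affine": 2 * x + 3, "x_squared": x ** 2})
print(df.corr(method=distance_correlation_sketch).round(3))
print(df.corr(method="pearson").round(3))
```

On this symmetric grid the Pearson coefficient between `x` and `x_squared` is essentially zero, while the distance correlation is clearly positive. Both measures give 1 for the affine copy `x_affine`.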

```python
import pandas as pd
import dcor
import warnings

from sklearn.datasets import make_classification
from feature_engine.selection import SmartCorrelatedSelection

warnings.filterwarnings('ignore')
```

```python
# simulate 12 numerical features, 6 of which are redundant
# (linear combinations of the informative ones)
X, _ = make_classification(
    n_samples=1000,
    n_features=12,
    n_redundant=6,
    n_clusters_per_class=1,
    weights=[0.50],
    class_sep=2,
    random_state=1
)

colnames = ['var_'+str(i) for i in range(12)]
X = pd.DataFrame(X, columns=colnames)
```

```python
dcor_tr = SmartCorrelatedSelection(
    variables=None,
    method=dcor.distance_correlation,  # custom callable: two 1D arrays -> float
    threshold=0.75,
    missing_values="raise",
    selection_method="variance",  # keep the highest-variance feature of each group
    estimator=None,
)

X_dcor = dcor_tr.fit_transform(X)
X_dcor
```

| | var_0 | var_2 | var_4 | var_6 | var_9 | var_11 |
|---|---|---|---|---|---|---|
| 0 | -0.718421 | 0.477337 | 1.621889 | 2.089741 | 0.074477 | 1.599289 |
| 1 | 0.584286 | 1.490290 | 3.584239 | -0.024631 | 1.788593 | 3.188758 |
| 2 | -1.644619 | 0.891121 | 2.175168 | -1.145170 | -0.796662 | 2.178584 |
| 3 | 1.795776 | 1.568321 | 1.754788 | 0.626374 | 1.247212 | -2.376344 |
| 4 | -0.683522 | -0.120177 | 1.171396 | -0.114110 | 0.322612 | -0.972715 |
| ... | ... | ... | ... | ... | ... | ... |
| 995 | 0.379855 | -0.093361 | 2.608481 | -1.343059 | -1.287548 | 2.507712 |
| 996 | 0.410435 | 0.301589 | 1.140932 | 0.010015 | -1.124970 | -1.315063 |
| 997 | 0.562542 | -0.551323 | 1.407670 | -1.215225 | 1.678760 | 1.551989 |
| 998 | 0.187248 | -1.385539 | 1.288720 | 0.260543 | -1.330030 | 1.071300 |
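
The roles of `threshold` and `selection_method="variance"` can be pictured with the simplified sketch below. Features whose association with an already-kept feature reaches the threshold join its group. Because features are visited in order of decreasing variance, the highest-variance member of each group is the one retained. This is a hypothetical approximation (`greedy_select_by_variance` is our own name), not feature-engine's implementation. The library's grouping order, tie handling and use of absolute values may differ.

```python
import numpy as np
import pandas as pd


def greedy_select_by_variance(X: pd.DataFrame, method, threshold: float) -> list:
    """Hypothetical sketch: return the columns kept after dropping correlated ones."""
    # absolute values so strong negative correlations also count (our assumption)
    assoc = X.corr(method=method).abs()
    # visit features from highest to lowest variance, so the first member of a group is kept
    order = X.var().sort_values(ascending=False).index
    kept, dropped = [], set()
    for feature in order:
        if feature in dropped:
            continue
        kept.append(feature)
        # unassigned features associated with this one at or above the threshold are dropped
        for other in order:
            if other != feature and other not in dropped and other not in kept:
                if assoc.loc[feature, other] >= threshold:
                    dropped.add(other)
    return kept


rng = np.random.default_rng(0)
base = rng.normal(size=500)
X_demo = pd.DataFrame({
    "a": base,
    "b": 3 * base + rng.normal(scale=0.1, size=500),  # near copy of "a" with larger variance
    "c": rng.normal(size=500),                         # independent feature
})
print(greedy_select_by_variance(X_demo, "pearson", threshold=0.75))
```

Here `a` and `b` form one group and `b` survives because of its larger variance, while the independent `c` is kept on its own.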


In the next example, we use the function [sklearn.feature_selection.mutual_info_regression](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.mutual_info_regression.html#sklearn.feature_selection.mutual_info_regression) to calculate the Mutual Information between two numerical variables.

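The callable must take two 1D arrays as input and return a single float. Because `mutual_info_regression` expects a 2D feature matrix and returns an array with one score per feature, we wrap it in a custom function that reshapes the first input and returns the single score.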

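```python
from sklearn.feature_selection import mutual_info_regression

def custom_mi(x, y):
    # mutual_info_regression expects a 2D feature matrix and a 1D target
    x = x.reshape(-1, 1)
    # it returns one score per feature; return the single score as a float
    return float(mutual_info_regression(x, y)[0])
```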
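
The wrapper above is one instance of a general pattern. Many scikit-learn scoring functions take a 2D feature matrix and a 1D target and return one score per column, whereas `method` needs a function of two 1D arrays that returns a float. The hypothetical helper `to_pairwise_callable` below sketches that adaptation. To keep it runnable without scikit-learn, it is demonstrated with a NumPy stand-in scorer (`abs_pearson_scores`, also ours) rather than with `mutual_info_regression`.

```python
import numpy as np
import pandas as pd


def to_pairwise_callable(score_func):
    """Hypothetical helper: adapt f(X_2d, y_1d) -> scores into f(x_1d, y_1d) -> float."""
    def pairwise(x: np.ndarray, y: np.ndarray) -> float:
        # the scorer expects a 2D feature matrix with a single column and a 1D target
        scores = score_func(np.asarray(x).reshape(-1, 1), np.asarray(y).ravel())
        # one feature in, one score out: unwrap it to a plain float
        return float(np.asarray(scores).ravel()[0])
    return pairwise


def abs_pearson_scores(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Stand-in scorer returning one score per column of X, like sklearn's univariate scorers."""
    return np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])


rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(200, 3)), columns=["u", "v", "w"])
df["v"] = -df["u"] + rng.normal(scale=0.1, size=200)  # strongly (negatively) related to "u"
scorer = to_pairwise_callable(abs_pearson_scores)
print(df.corr(method=scorer).round(2))
```

With scikit-learn installed, `to_pairwise_callable(mutual_info_regression)` should behave like `custom_mi`. Because `mutual_info_regression` adds a small amount of random noise to continuous variables, its scores can change between runs. Wrapping it as `functools.partial(mutual_info_regression, random_state=0)` before adapting it is one way to make them reproducible.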

```python
# same settings as before, but with the custom mutual-information callable
mi_tr = SmartCorrelatedSelection(
    variables=None,
    method=custom_mi,
    threshold=0.75,
    missing_values="raise",
    selection_method="variance",
    estimator=None,
)

X_mi = mi_tr.fit_transform(X)
X_mi
```

| | var_0 | var_2 | var_4 | var_6 | var_8 | var_9 | var_11 |
|---|---|---|---|---|---|---|---|
| 0 | -0.718421 | 0.477337 | 1.621889 | 2.089741 | 2.616778 | 0.074477 | 1.599289 |
| 1 | 0.584286 | 1.490290 | 3.584239 | -0.024631 | 5.518534 | 1.788593 | 3.188758 |
| 2 | -1.644619 | 0.891121 | 2.175168 | -1.145170 | 3.535246 | -0.796662 | 2.178584 |
| 3 | 1.795776 | 1.568321 | 1.754788 | 0.626374 | -0.310298 | 1.247212 | -2.376344 |
| 4 | -0.683522 | -0.120177 | 1.171396 | -0.114110 | 0.262247 | 0.322612 | -0.972715 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 995 | 0.379855 | -0.093361 | 2.608481 | -1.343059 | 4.159278 | -1.287548 | 2.507712 |
| 996 | 0.410435 | 0.301589 | 1.140932 | 0.010015 | -0.025811 | -1.124970 | -1.315063 |
| 997 | 0.562542 | -0.551323 | 1.407670 | -1.215225 | 2.396559 | 1.678760 | 1.551989 |
| 998 | 0.187248 | -1.385539 | 1.288720 | 0.260543 | 1.926655 | -1.330030 | 1.071300 |
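
`mutual_info_regression` returns estimates in nats, which are non-negative but not capped at 1. A `threshold` of 0.75 therefore does not mean the same thing here as it does for a correlation coefficient. One way to build intuition uses the closed-form relation between mutual information and Pearson's ρ, I = -½ ln(1 − ρ²). This relation holds only when the two variables are jointly Gaussian, which is an assumption the tutorial's data does not guarantee. The two helpers below are our own illustrative functions.

```python
import math


def gaussian_mi_from_rho(rho: float) -> float:
    """Mutual information (nats) of a bivariate Gaussian with Pearson correlation rho (assumes |rho| < 1)."""
    return -0.5 * math.log(1.0 - rho ** 2)


def rho_from_gaussian_mi(mi: float) -> float:
    """Inverse of gaussian_mi_from_rho: the |rho| that yields the given MI in nats."""
    return math.sqrt(1.0 - math.exp(-2.0 * mi))


print(round(rho_from_gaussian_mi(0.75), 4))  # -> 0.8814, |rho| matching the 0.75 MI threshold
print(round(gaussian_mi_from_rho(0.75), 4))  # -> 0.4133, MI of a Gaussian pair with rho = 0.75
```

Under the Gaussian assumption, an MI threshold of 0.75 corresponds to |ρ| ≈ 0.88. That is a stricter cut than a correlation threshold of 0.75, so the two transformers above are not comparing features on the same scale.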
