# [Custom Estimators](https://scikit-learn.org/stable/developers/develop.html)

## get_params and set_params

All scikit-learn estimators have `get_params` and `set_params` methods. `get_params` takes no positional arguments and 
returns a dict of the `__init__` parameters of the estimator, together with their values.

It must accept one keyword argument, `deep`, a boolean that determines whether the method should also 
return the parameters of sub-estimators (for most estimators, this can be ignored). The default value for `deep` should be `True`.

For instance, consider the estimator `MyEstimator` defined in the first cell below, which wraps a `LogisticRegression` as its `subestimator`.

The `deep` parameter controls whether the parameters of the subestimator are reported. With `deep=True`, the output includes them under the prefix `subestimator__`; with `deep=False`, only the top-level parameters are listed.


In [3]:
from sklearn.base import BaseEstimator
from sklearn.linear_model import LogisticRegression
class MyEstimator(BaseEstimator):
    def __init__(self, subestimator=None, my_extra_param="random"):
        self.subestimator = subestimator
        self.my_extra_param = my_extra_param
        
my_estimator = MyEstimator(subestimator=LogisticRegression())
for param, value in my_estimator.get_params(deep=True).items():  # returns a dictionary
    print(f"{param} -> {value}")
    

my_extra_param -> random
subestimator__C -> 1.0
subestimator__class_weight -> None
subestimator__dual -> False
subestimator__fit_intercept -> True
subestimator__intercept_scaling -> 1
subestimator__l1_ratio -> None
subestimator__max_iter -> 100
subestimator__multi_class -> auto
subestimator__n_jobs -> None
subestimator__penalty -> l2
subestimator__random_state -> None
subestimator__solver -> lbfgs
subestimator__tol -> 0.0001
subestimator__verbose -> 0
subestimator__warm_start -> False
subestimator -> LogisticRegression()


***

In [4]:
for param, value in my_estimator.get_params(deep=False).items():  # returns a dictionary
    print(f"{param} -> {value}")

my_extra_param -> random
subestimator -> LogisticRegression()


The easiest way to implement these methods, and to get a sensible `__repr__` method, is to inherit 
from `sklearn.base.BaseEstimator`. If you do not want your code to depend on `scikit-learn`, the interface 
can be implemented by hand:

    def get_params(self, deep=True):
        # suppose this estimator has parameters "alpha" and "recursive"
        return {"alpha": self.alpha, "recursive": self.recursive}
    
    def set_params(self, **parameters):
        for parameter, value in parameters.items():
            setattr(self, parameter, value)
        return self
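
The fragment above can be fleshed out into a complete class. The following is a minimal sketch, not scikit-learn's own implementation: the class name `PlainEstimator` is hypothetical, and the check that rejects unknown parameter names is an added assumption that only loosely mirrors what `BaseEstimator.set_params` does.

```python
class PlainEstimator:
    """Hypothetical estimator exposing get_params/set_params without scikit-learn."""

    def __init__(self, alpha=1.0, recursive=True):
        # Only store the constructor arguments, unchanged
        self.alpha = alpha
        self.recursive = recursive

    def get_params(self, deep=True):
        # There are no sub-estimators, so `deep` is accepted but has no effect
        return {"alpha": self.alpha, "recursive": self.recursive}

    def set_params(self, **parameters):
        valid = self.get_params(deep=False)
        for parameter, value in parameters.items():
            # Assumption: reject names that are not constructor parameters
            if parameter not in valid:
                raise ValueError(f"Invalid parameter {parameter!r}")
            setattr(self, parameter, value)
        return self


est = PlainEstimator(alpha=0.1)
est.set_params(alpha=10.0, recursive=False)
print(est.get_params())  # {'alpha': 10.0, 'recursive': False}
```

Returning `self` from `set_params` allows calls to be chained, for example `PlainEstimator().set_params(alpha=2.0).get_params()`.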
        
        

## Cloning 

For use with the `model_selection` module, an estimator must support the `base.clone` function, which replicates an estimator. 
This can be done by providing a `get_params` method. If `get_params` is present, then `clone(estimator)` will be an instance 
of `type(estimator)` on which `set_params` has been called with clones of the result of `estimator.get_params()`.

Objects that do not provide this method will be deep-copied (using the Python standard function `copy.deepcopy`) 
if `safe=False` is passed to `clone`. With the default `safe=True`, `clone` raises a `TypeError` for such objects instead.
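
As a short sketch of this behaviour, assuming scikit-learn is installed: the clone carries the same constructor parameters as the original but none of its fitted state. The toy data and the value `C=0.5` are illustrative only.

```python
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

# Fit the original so that it carries learned attributes such as coef_
original = LogisticRegression(C=0.5)
original.fit([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])

# clone() builds a new, unfitted estimator with the same parameters
duplicate = clone(original)

print(duplicate is original)                             # False
print(duplicate.get_params() == original.get_params())   # True
print(hasattr(duplicate, "coef_"))                       # False
```

This is why `model_selection` utilities can refit an estimator on each split without one fit leaking into the next.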

# Pipeline

## Creating a custom estimator

In [30]:
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.utils.estimator_checks import check_estimator
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted
from sklearn.utils.multiclass import unique_labels
from sklearn.metrics import euclidean_distances
class TemplateClassifier(BaseEstimator, ClassifierMixin):

    def __init__(self, demo_param='demo'):
        self.demo_param = demo_param

    def fit(self, X, y):

        # Check that X and y have correct shape
        X, y = check_X_y(X, y)
        # Store the classes seen during fit
        self.classes_ = unique_labels(y)

        self.X_ = X
        self.y_ = y
        # Return the classifier
        return self

    def predict(self, X):

        # Check that fit has been called
        check_is_fitted(self)

        # Input validation
        X = check_array(X)

        closest = np.argmin(euclidean_distances(X, self.X_), axis=1)
        return self.y_[closest]

    def fit_transform(self, X, y=None):
        """Return X unchanged so the classifier can sit mid-pipeline.

        Nothing is fitted here. The method exists only because `Pipeline`
        calls `fit_transform` on every step except the last one.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training data, where n_samples is the number of samples
            and n_features is the number of features.

        y : Ignored

        Returns
        -------
        X : array-like of shape (n_samples, n_features)
            The input data, unchanged.
        """
        return X

    def transform(self, X):
        """Return X unchanged (identity transform).

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            New data, where n_samples is the number of samples
            and n_features is the number of features.

        Returns
        -------
        X : array-like of shape (n_samples, n_features)
            The input data, unchanged.
        """
        return X
        
        
CLFEX = TemplateClassifier()
# check_estimator(CLFEX)  # pass the instance itself, not wrapped in another estimator

## Creating a pipeline

In [32]:
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.decomposition import PCA
# TemplateClassifier acts as a pass-through step here via its identity transform
estimators = [('clfex', TemplateClassifier()), ('reduce_dim', PCA()), ('clf', SVC())]
pipe = Pipeline(estimators)
pipe

Pipeline(steps=[('clfex', TemplateClassifier()), ('reduce_dim', PCA()),
                ('clf', SVC())])


In [33]:
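pipe.get_params()
# SVC needs at least two classes, so fit on toy data with two distinct labels
X_toy = np.array([[0., 0., 0.], [0., 1., 1.], [1., 0., 1.], [1., 1., 0.]])
pipe.fit(X_toy, np.array([0, 0, 1, 1]))
# pipe.set_params(clf__C=10)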
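
The commented-out `set_params` call relies on the `<step name>__<parameter>` naming that `get_params(deep=True)` exposes. A minimal sketch, assuming scikit-learn is installed and using a two-step pipeline for brevity (the chosen values are illustrative):

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

pipe = Pipeline([("reduce_dim", PCA()), ("clf", SVC())])

# Nested parameters are addressed as <step name>__<parameter name>
pipe.set_params(clf__C=10, reduce_dim__n_components=2)

print(pipe.named_steps["clf"].C)                    # 10
print(pipe.get_params()["reduce_dim__n_components"])  # 2
```

The same names can be used as keys in a `GridSearchCV` parameter grid, which is the usual reason to set parameters this way.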