# Customized Estimators

You can [roll your own](https://scikit-learn.org/stable/developers/develop.html) `estimator`, `regressor`, `classifier` or `transformer`. The templates below are adapted from [the GitHub repository](https://github.com/scikit-learn-contrib/project-template/blob/master/skltemplate/_template.py) and work with scikit-learn v0.24.1. Note that the GitHub repository for the [project-template](https://github.com/scikit-learn-contrib/project-template) is independent of, and not synchronized with, the [scikit-learn](https://github.com/scikit-learn/scikit-learn/tree/0.24.1) repository. The template project also has its own [official documentation](https://sklearn-template.readthedocs.io/en/latest/index.html).

You should, though it is not strictly required, inherit from `BaseEstimator` and use the appropriate `mixin`. After you write your estimator, pass an instance to the `check_estimator()` function to test whether it follows the scikit-learn conventions.
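
To make the role of `check_estimator()` concrete, here is a minimal sketch of what happens when an estimator breaks the conventions. `ForgetfulEstimator` is a hypothetical class written only for this illustration; its `fit()` deliberately forgets to return `self`. The exact exception type and message depend on the installed scikit-learn version, so the sketch catches a generic `Exception`.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.utils.estimator_checks import check_estimator
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted


class ForgetfulEstimator(BaseEstimator):
    """Hypothetical estimator whose fit() breaks the contract on purpose."""

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        self.n_features_in_ = X.shape[1]
        # Deliberate bug: fit() should end with `return self`.

    def predict(self, X):
        check_is_fitted(self, 'n_features_in_')
        X = check_array(X)
        return np.ones(X.shape[0], dtype=np.int64)


try:
    check_estimator(ForgetfulEstimator())
    check_failed = False
except Exception as exc:  # the exception type varies between versions
    check_failed = True
    print(f'check_estimator rejected the estimator: {type(exc).__name__}')
```

When an estimator is valid, `check_estimator()` finishes without raising, which is why the calls in the cells below produce no output.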

## Basic estimator

Here is a barebones, dummy estimator. You need to implement two methods with the following signatures.

- `fit(self, X, y, **kwargs)`
- `predict(self, X)`

Inside `fit()`, the first thing to do is check whether `y` is `None`. Next, validate the inputs with the `check_X_y()` function and set the attributes `is_fitted_` and `n_features_in_`; all of these are needed to pass the checks. At the end of `fit()`, `self` must always be returned.

The `predict()` method must return a prediction for every row. Before making any predictions, it must call `check_is_fitted()` and `check_array()`.

In [1]:
from sklearn.utils.estimator_checks import check_estimator
from sklearn.base import BaseEstimator, RegressorMixin, ClassifierMixin, TransformerMixin
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted
from sklearn.utils.multiclass import unique_labels
from sklearn.metrics import euclidean_distances
import numpy as np

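class SpecialEstimator(BaseEstimator):
    def __init__(self):
        pass

    def fit(self, X, y, **kwargs):
        # Reject a missing target before any other validation.
        if y is None:
            raise ValueError('requires y to be passed, but the target y is None')

        # Validate the inputs; X comes back as a 2-D array and y as a 1-D array.
        X, y = check_X_y(X, y)
        # Attributes with a trailing underscore mark the estimator as fitted.
        self.is_fitted_ = True
        self.n_features_in_ = X.shape[1]

        return self

    def predict(self, X):
        # Raise NotFittedError if fit() has not been called yet.
        check_is_fitted(self, 'is_fitted_')
        X = check_array(X)
        # Dummy prediction: one constant value per input row.
        return np.ones(X.shape[0], dtype=np.int64)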

check_estimator(SpecialEstimator())
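
The `fit()` and `predict()` contract described above can be tried out with a short usage sketch. `OnesEstimator` is a hypothetical stand-in that follows the same pattern as the template, redefined here so the snippet runs on its own, and the toy data is made up for illustration.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted


class OnesEstimator(BaseEstimator):
    """Hypothetical stand-in that follows the same pattern as the template."""

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        self.n_features_in_ = X.shape[1]
        self.is_fitted_ = True
        return self

    def predict(self, X):
        check_is_fitted(self, 'is_fitted_')
        X = check_array(X)
        return np.ones(X.shape[0], dtype=np.int64)


X = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
y = np.array([0, 1, 0])

# Predicting before fitting is rejected by check_is_fitted().
try:
    OnesEstimator().predict(X)
    raised_not_fitted = False
except NotFittedError:
    raised_not_fitted = True

# Because fit() returns self, fitting and predicting can be chained.
estimator = OnesEstimator()
predictions = estimator.fit(X, y).predict(X)
print(predictions.shape)  # one prediction per input row
```

Returning `self` from `fit()` is what makes the chained call `estimator.fit(X, y).predict(X)` possible.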

## Basic regressor

If your estimator is a regressor, use `RegressorMixin`. The `fit()` and `predict()` implementations follow the same pattern as before. The new piece is the `_more_tags()` method, which overrides or supplies additional `tags`. As of v0.24.1, the documentation states that tags are experimental and subject to change. But [what are these tags](https://scikit-learn.org/stable/developers/develop.html#estimator-tags)? They are essentially hints about the capabilities of the estimator. The `poor_score` tag indicates whether the regressor fails (`True`) or does not fail (`False`, the default) to provide a *reasonable* test-set score. Here, we implement `_more_tags()` to override that value to `True`, because a regressor that always predicts the same value cannot score well; without the override, `check_estimator()` complains about the low score.

In [2]:
class SpecialRegressor(RegressorMixin, BaseEstimator):
    def __init__(self):
        pass
    
    def fit(self, X, y, **kwargs):
        if y is None:
            raise ValueError('requires y to be passed, but the target y is None')
            
        X, y = check_X_y(X, y)
        self.is_fitted_ = True
        self.n_features_in_ = X.shape[1]
        
        return self
    
    def predict(self, X):
        check_is_fitted(self, 'is_fitted_')
        X = check_array(X)
        return np.ones(X.shape[0], dtype=np.int64)
    
    def _more_tags(self):
        # This dummy always predicts 1, so declare that its score is expected to be poor.
        return {
            'poor_score': True
        }
    
check_estimator(SpecialRegressor())
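
To see why `SpecialRegressor` needs the `poor_score` tag, it helps to look at the score a constant prediction earns. `RegressorMixin.score()` returns the coefficient of determination R², which is never positive for a constant prediction. The sketch below assumes synthetic targets, made up for illustration, and calls `r2_score` directly.

```python
import numpy as np
from sklearn.metrics import r2_score

# Synthetic targets, made up for this illustration.
rng = np.random.default_rng(0)
y_true = rng.normal(loc=5.0, scale=2.0, size=100)

# The dummy regressor predicts 1 for every row.
y_pred = np.ones_like(y_true)

score = r2_score(y_true, y_pred)
print(score < 0)  # True: a constant far from the target mean scores below zero
```

A constant prediction can reach at most R² = 0, and only when the constant equals the mean of the targets.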

## Basic classifier

Classifiers should use `ClassifierMixin` and follow the same `fit()` and `predict()` contracts. One caveat is that `fit()` must also store the class labels seen during training in `classes_`. Be careful with the `predict()` method, as it should return only label values that are consistent with the class values seen during `fit()`.

In [3]:
class SpecialClassifier(ClassifierMixin, BaseEstimator):
    def __init__(self):
        pass
    
    def fit(self, X, y, **kwargs):
        if y is None:
            raise ValueError('requires y to be passed, but the target y is None')
        
        X, y = check_X_y(X, y)
        
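        self.n_features_in_ = X.shape[1]
        # Sorted array of the distinct labels found in y.
        self.classes_ = unique_labels(y)
        self.is_fitted_ = True

        # Keep the training data; predict() looks up the nearest training row.
        self.X_ = X
        self.y_ = y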
        
        return self
    
    def predict(self, X):
        check_is_fitted(self, ['is_fitted_', 'X_', 'y_'])
        X = check_array(X)
        
        # Index of the nearest training row (Euclidean distance) for each input row.
        closest = np.argmin(euclidean_distances(X, self.X_), axis=1)
        return self.y_[closest]
    
check_estimator(SpecialClassifier())
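
The two ideas inside `SpecialClassifier` can be tried in isolation. `unique_labels()` collects the sorted, distinct class labels, and the nearest-neighbor lookup used in `predict()` can only return labels that were present in the training data. The toy data below is made up for illustration.

```python
import numpy as np
from sklearn.metrics import euclidean_distances
from sklearn.utils.multiclass import unique_labels

# Toy training data, made up for this illustration.
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
y_train = np.array(['cat', 'cat', 'dog'])

# What fit() stores in classes_: the sorted, distinct labels.
classes = unique_labels(y_train)
print(classes)  # ['cat' 'dog']

# What predict() does: copy the label of the nearest training row.
X_new = np.array([[0.2, 0.1], [4.0, 4.5]])
closest = np.argmin(euclidean_distances(X_new, X_train), axis=1)
y_pred = y_train[closest]
print(y_pred)  # ['cat' 'dog']

# Every predicted label was seen during training.
print(np.isin(y_pred, classes).all())  # True
```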

## Basic transformer

Transformers should use `TransformerMixin` and implement two methods.

- `fit(self, X, y=None)`
- `transform(self, X)`

The validation calls and the attributes set inside `fit()` and `transform()` below are all required to pass the checks.

In [4]:
class SpecialTransformer(TransformerMixin, BaseEstimator):
    def __init__(self):
        pass
    
    def fit(self, X, y=None):
        X = check_array(X, accept_sparse=False)
        
        self.n_features_in_ = X.shape[1]
        self.n_features_ = X.shape[1]
        self.is_fitted_ = True
                
        return self
    
    def transform(self, X):
        check_is_fitted(self, ['is_fitted_'])
        
        X = check_array(X, accept_sparse=False)
        
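        # Reject inputs whose column count differs from the data seen in fit().
        if X.shape[1] != self.n_features_:
            raise ValueError('Shape of input is different from what was seen in `fit`')

        # Element-wise square root; negative inputs produce NaN.
        return np.sqrt(X)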
    
check_estimator(SpecialTransformer())
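
A short usage sketch shows what the mixin adds and how the feature-count check behaves. `SqrtTransformer` is a hypothetical stand-in that mirrors the template above, redefined here so the snippet runs on its own; the input values are made up for illustration.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.utils.validation import check_array, check_is_fitted


class SqrtTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in that mirrors the template above."""

    def fit(self, X, y=None):
        X = check_array(X)
        self.n_features_in_ = X.shape[1]
        return self

    def transform(self, X):
        check_is_fitted(self, 'n_features_in_')
        X = check_array(X)
        if X.shape[1] != self.n_features_in_:
            raise ValueError('Shape of input is different from what was seen in `fit`')
        return np.sqrt(X)


X = np.array([[1.0, 4.0], [9.0, 16.0]])

# fit_transform() is inherited from TransformerMixin.
transformer = SqrtTransformer()
X_sqrt = transformer.fit_transform(X)
print(X_sqrt)

# An input with a different number of columns is rejected.
try:
    transformer.transform(np.array([[1.0, 4.0, 9.0]]))
    rejected = False
except ValueError:
    rejected = True
```

Only `fit()` and `transform()` are written by hand; `fit_transform()` comes from `TransformerMixin`.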