<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Introduction</a></span></li><li><span><a href="#Imports" data-toc-modified-id="Imports-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Imports</a></span></li><li><span><a href="#Create-a-CV-score-harness" data-toc-modified-id="Create-a-CV-score-harness-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Create a CV score harness</a></span></li><li><span><a href="#Averaging-the-models" data-toc-modified-id="Averaging-the-models-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Averaging the models</a></span></li><li><span><a href="#Blending-the-models" data-toc-modified-id="Blending-the-models-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Blending the models</a></span></li><li><span><a href="#References" data-toc-modified-id="References-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>References</a></span></li></ul></div>

# Introduction
<hr style="border:2px solid black"> </hr>

<div class="alert alert-warning">
<font color=black>

**What?** How to write a scikit-learn-compatible class with `fit` and `predict` methods that averages or blends the predictions of several regression models.

</font>
</div>
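
As a minimal sketch of what "a class with fit and predict methods" means in scikit-learn terms, the toy estimator below always predicts the training-target mean. `MeanRegressor`, `X_demo`, and `y_demo` are hypothetical names introduced only for this illustration; the point is that any object following this contract can be passed to the cross-validation utilities.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.model_selection import cross_val_score


class MeanRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical toy estimator: always predicts the training-target mean."""

    def fit(self, X, y):
        # Attributes learned from data get a trailing underscore by convention
        self.mean_ = float(np.mean(y))
        # Returning self allows chaining: est.fit(X, y).predict(X)
        return self

    def predict(self, X):
        # One prediction per row of X
        return np.full(len(X), self.mean_)


# Synthetic data, used only to exercise the class
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = rng.normal(size=50)

toy = MeanRegressor().fit(X_demo, y_demo)
toy_pred = toy.predict(X_demo[:5])

# Because the class follows the fit/predict contract, it can be cross-validated
toy_scores = cross_val_score(
    MeanRegressor(), X_demo, y_demo, cv=5, scoring="neg_mean_squared_error")
print(toy_scores)
```

The ensemble classes later in this notebook follow the same pattern, except that `fit` trains several cloned models and `predict` combines their outputs.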

# Imports
<hr style="border:2px solid black"> </hr>

In [None]:
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin, RegressorMixin, clone
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold, LeaveOneOut, ShuffleSplit
from sklearn.model_selection import cross_val_score, cross_validate
from sklearn.metrics import mean_squared_error

# Create a CV score harness
<hr style="border:2px solid black"> </hr>

<div class="alert alert-info">
<font color=black>

- **k-Fold CV** is the gold standard for estimating how a machine learning algorithm performs on unseen data, typically with k set to 3, 5, or 10. It gives an estimate with less variance than a single train/test split. The dataset is split into k parts (folds); each fold is held out once as the test set while the model is trained on the remaining k-1 folds. After running cross validation you end up with k different performance scores that you can summarize using a mean and a standard deviation.

- **Train/test split** is good for speed when using a slow algorithm and produces performance estimates with lower bias when using large datasets. The original dataset is split into two parts: train the algorithm on the first part, make predictions on the second part, and evaluate the predictions against the expected results. CONS: this technique can have a high variance, meaning that differences between the training and test datasets can result in meaningful differences in the estimate of accuracy.

- **Leave-one-out**: You can configure cross validation so that the size of the fold is 1 (k is set to the number of observations in your dataset). This variation of cross validation is called leave-one-out cross validation (LOOCV). The result is a large number of performance measures that can be summarized in an effort to give a more reasonable estimate of the accuracy of your model on unseen data. CONS: it can be computationally more expensive than k-fold cross validation, and the per-fold scores typically show MORE variance than the k-fold results described above, because each score is computed on a single observation.

- **Repeated random splits**: Another variation on k-fold cross validation is to create a random split of the data like the train/test split described above, but repeat the process of splitting and evaluating the algorithm multiple times, like cross validation. This has the speed of a train/test split and the reduction in variance of the estimated performance of k-fold cross validation. CONS: repetitions may include much of the same data in the train or the test split from run to run, introducing redundancy into the evaluation.

- **GroupKFold**: GroupKFold is a variation of k-fold which ensures that the same group is not represented in both the testing and training sets. For example, if the data is obtained from different subjects with several samples per subject, and if the model is flexible enough to learn from highly person-specific features, it could fail to generalize to new subjects. GroupKFold makes it possible to detect this kind of overfitting.

- **The bottom line?** If in doubt, use 10-Fold CV.
  
</font>
</div>
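
The strategies listed above map onto splitter classes in `sklearn.model_selection`. The sketch below is only an illustration on synthetic data: `X_demo`, `y_demo`, and `groups_demo` (4 hypothetical subjects with 3 samples each) are assumptions made for this example, and it simply counts the splits each strategy produces and checks that GroupKFold keeps each subject on one side of every split.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, KFold, LeaveOneOut, ShuffleSplit

# Synthetic data: 12 samples, 4 hypothetical subjects with 3 samples each
X_demo = np.arange(24).reshape(12, 2)
y_demo = np.arange(12)
groups_demo = np.repeat([0, 1, 2, 3], 3)

splitters = {
    "k-fold": KFold(n_splits=4, shuffle=True, random_state=42),
    "repeated random splits": ShuffleSplit(
        n_splits=4, test_size=0.25, random_state=42),
    "leave-one-out": LeaveOneOut(),
    "group k-fold": GroupKFold(n_splits=4),
}

n_splits_by_name = {}
for name, splitter in splitters.items():
    # Only GroupKFold needs the group labels
    groups = groups_demo if name == "group k-fold" else None
    folds = list(splitter.split(X_demo, y_demo, groups=groups))
    n_splits_by_name[name] = len(folds)
    print(f"{name}: {len(folds)} splits, "
          f"{len(folds[0][1])} test sample(s) in the first split")

# With GroupKFold, no subject appears on both sides of any split
group_overlap = [
    set(groups_demo[train_idx]) & set(groups_demo[test_idx])
    for train_idx, test_idx in splitters["group k-fold"].split(
        X_demo, y_demo, groups=groups_demo)
]
```

Leave-one-out produces one split per observation, which is where its extra computational cost comes from.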

<div class="alert alert-info">
<font color=black>

| Methods                           | Type/variation of | Speed                        | Variance                   |
| --------------------------------- | ----------------- | ---------------------------- | -------------------------- |
| Train/test split                  | bare minimum      | Fastest                      | higher than k-fold         |
| k-Fold                            | k-Fold            | Slower than train/test split | less than train/test split |
| Leave-one-out                     | k-Fold            | Slower than k-Fold           | higher than k-Fold         |
| Repeated Random Test-Train Splits | Train/test split  | Slower than train/test split | less than k-Fold           |
| Group k-Fold                      | k-Fold            | ?                            | ?                          |


</font>
</div>

In [None]:
def get_cv_scores(model, train_set, train_target, name, n_splits=10, state=42, test_size=0.25,
                  type_="k-fold", verbose=False):
    """Get the CV scores of the model.

    To list the available scorer names, call
    sklearn.metrics.get_scorer_names()

    The loocv branch does not report R2: with a single-sample test fold
    R2 is undefined, so the scorer returns NaN.

    Parameters
    ----------
    model : object
        Estimator implementing fit and predict

    train_set : pandas dataframe
        Matrix containing instances and features

    train_target : pandas dataframe
        Target values

    name : string
        Name used in the column labels of the returned pandas dataframe

    n_splits : int, default=10
        Number of splits (ignored when type_="loocv")

    state : int, default=42
        Seed for the pseudo-random number generator (ignored when
        type_="loocv")

    test_size : float, default=0.25
        Fraction of the data, between 0 and 1, held out as the test set
        (used only when type_="repeated_tt")

    type_ : string, default="k-fold"
        Type of cross-validation used: "k-fold", "repeated_tt" or "loocv"

    verbose : bool, default=False
        If True, print the per-fold results; otherwise print only the
        type of CV selected.

    Returns
    -------
    table : pandas dataframe
        Table containing the mean and std for each metric.

    split_strategy : cross-validation splitter
        Splitter used in the call; its split method can be used to access
        the train/test indices of each fold.
    """

    print("Type of CV selected", type_)

    # k-Fold
    if type_ == "k-fold":
        scoring = ["neg_mean_absolute_error", "neg_mean_squared_error",
                   "neg_root_mean_squared_error", "r2"]
        metrics_acronyms = ["MAE", "MSE", "RMSE", "R2"]
        split_strategy = KFold(
            n_splits=n_splits, random_state=state, shuffle=True)
        result = cross_validate(model, train_set, train_target,
                                scoring=scoring, cv=split_strategy, n_jobs=-1, return_train_score=True)
    # Repeated train-test
    elif type_ == "repeated_tt":
        scoring = ["neg_mean_absolute_error", "neg_mean_squared_error",
                   "neg_root_mean_squared_error", "r2"]
        metrics_acronyms = ["MAE", "MSE", "RMSE", "R2"]
        split_strategy = ShuffleSplit(
            n_splits=n_splits, test_size=test_size, random_state=state)
        result = cross_validate(model, train_set, train_target,
                                scoring=scoring, cv=split_strategy, n_jobs=-1, return_train_score=True)

    # Leave-one-out (no R2: it is undefined on a single-sample test fold)
    elif type_ == "loocv":
        scoring = ["neg_mean_absolute_error", "neg_mean_squared_error",
                   "neg_root_mean_squared_error"]
        metrics_acronyms = ["MAE", "MSE", "RMSE"]
        split_strategy = LeaveOneOut()
        result = cross_validate(model, train_set, train_target,
                                scoring=scoring, cv=split_strategy, n_jobs=-1, return_train_score=True)
    else:
        raise ValueError("Unknown type_: %r" % type_)

    mean, std = [], []
    for metric in scoring:
        # "neg_*" scorers return negated errors: flip the sign back.
        # R2 is left untouched so that a negative R2 stays negative.
        sign = -1 if metric.startswith("neg_") else 1
        scores = sign * result["test_" + metric]
        if verbose:
            print("***********")
            print("Scoring: ", metric)
            print("Folds:", list(scores))
            print("Mean: %.6f" % scores.mean())
            print("Standard deviation: %.6f" % scores.std())
        mean.append(scores.mean())
        std.append(scores.std())

    table = pd.DataFrame()
    table["metrics"] = metrics_acronyms
    table["mean_CV_"+type_+"_"+name] = mean
    table["std_CV_"+type_+"_"+name] = std

    return table, split_strategy
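
The harness reports error metrics as positive numbers, whereas `cross_validate` returns "neg_" scores so that greater is always better. The sketch below shows the raw return value on synthetic data; `X_demo`, `y_demo`, and the choice of `LinearRegression` are assumptions made only for this illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_validate

# Synthetic regression problem, used only to exercise cross_validate
X_demo, y_demo = make_regression(
    n_samples=100, n_features=5, noise=10.0, random_state=0)

cv = KFold(n_splits=5, shuffle=True, random_state=42)
result = cross_validate(
    LinearRegression(), X_demo, y_demo,
    scoring=["neg_mean_absolute_error", "r2"],
    cv=cv, return_train_score=True)

# One array per metric and per side (train/test), with one entry per fold
print(sorted(result.keys()))

# Flipping the sign of a "neg_" score recovers the usual error value
mae_per_fold = -result["test_neg_mean_absolute_error"]
print("MAE: %.3f +/- %.3f" % (mae_per_fold.mean(), mae_per_fold.std()))
```

R2 has no "neg_" prefix and can legitimately be negative, so it is best read with its sign intact.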

# Averaging the models
<hr style="border:2px solid black"> </hr>

In [None]:
class AveragingModels(BaseEstimator, RegressorMixin, TransformerMixin):
    def __init__(self, models):
        self.models = models

    # we define clones of the original models to fit the data in
    def fit(self, X, y):
        self.models_ = [clone(x) for x in self.models]

        # Train cloned base models
        for model in self.models_:
            model.fit(X, y)

        return self

    # Now we do the predictions for cloned models and average them
    def predict(self, X):
        predictions = np.column_stack([
            model.predict(X) for model in self.models_
        ])
        return np.mean(predictions, axis=1)

In [None]:
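# model_1 ... model_4 (unfitted regressors), X and y are assumed to be
# defined elsewhere; they are not created in this notebook
average = AveragingModels(models=(model_1, model_2, model_3, model_4))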

cv_scores_average, _ = get_cv_scores(
    average, X, y, name="average", n_splits=10, state=42, test_size=0.25, type_="k-fold", verbose=False)
cv_scores_average
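
For comparison, scikit-learn ships `VotingRegressor`, which also averages the predictions of several fitted regressors. The sketch below is not the `AveragingModels` class above but a swapped-in built-in, run on synthetic data with three hypothetical base models, and it checks that the result matches a manual column-stack-and-mean.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Synthetic data and hypothetical stand-ins for the base models
X_demo, y_demo = make_regression(
    n_samples=200, n_features=8, noise=5.0, random_state=0)
base_models = [
    ("linear", LinearRegression()),
    ("ridge", Ridge(alpha=1.0)),
    ("lasso", Lasso(alpha=0.1)),
]

# fit() clones and trains every base model
voter = VotingRegressor(estimators=base_models).fit(X_demo, y_demo)

# Manual average: stack the fitted sub-models' predictions column-wise,
# then take the row-wise mean
manual_average = np.column_stack(
    [est.predict(X_demo) for est in voter.estimators_]
).mean(axis=1)

same_as_manual = np.allclose(voter.predict(X_demo), manual_average)
print("VotingRegressor matches the manual average:", same_as_manual)
```

Writing the class by hand, as this notebook does, remains useful for seeing how `fit`, `predict`, and `clone` fit together.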

# Blending the models
<hr style="border:2px solid black"> </hr>

<div class="alert alert-info">
<font color=black>

- Blending differs from averaging in that a weight can be specified for each model.
- If the weights are all equal, this is equivalent to using `AveragingModels`. `numpy.average` divides by the sum of the weights, so they do not have to sum to 1.

</font>
</div>
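
The relationship between a weighted blend and a plain average can be checked on a few numbers. In this sketch `predictions` is a hypothetical matrix with one row per sample and one column per model, which is the layout `np.column_stack` produces in the classes of this notebook.

```python
import numpy as np

# Hypothetical predictions: 3 samples (rows) from 4 models (columns)
predictions = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 2.0, 2.0, 2.0],
    [0.0, 1.0, 0.0, 1.0],
])

plain_mean = predictions.mean(axis=1)

# Equal weights reproduce the plain mean
equal_weights = np.average(
    predictions, axis=1, weights=[0.25, 0.25, 0.25, 0.25])

# Weights need not sum to 1: np.average divides by their sum
unnormalised = np.average(predictions, axis=1, weights=[2, 2, 2, 2])

# Unequal weights pull the blend towards the favoured model (the last one)
skewed = np.average(predictions, axis=1, weights=[0.1, 0.1, 0.1, 0.7])

print(plain_mean, equal_weights, unnormalised, skewed)
```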

In [None]:
class BlendingModels(BaseEstimator, RegressorMixin, TransformerMixin):
    def __init__(self, models, weights):
        self.models = models
        self.weights = weights

    # we define clones of the original models to fit the data in
    def fit(self, X, y):
        self.models_ = [clone(x) for x in self.models]

        # Train cloned base models
        for model in self.models_:
            model.fit(X, y)
        return self

    # Now we do the predictions for cloned models and take their weighted average
    def predict(self, X):
        predictions = np.column_stack([
            model.predict(X) for model in self.models_
        ])
        return np.average(predictions, axis=1, weights=self.weights)

In [None]:
blend = BlendingModels(
    models=(model_1, model_2, model_3, model_4),
    weights=[0.25, 0.25, 0.25, 0.25])

In [None]:
cv_scores_blend, _ = get_cv_scores(
    blend, X, y, name="blend", n_splits=10, state=42, test_size=0.25, type_="k-fold", verbose=False)
cv_scores_blend

# References
<hr style="border:2px solid black"> </hr>

<div class="alert alert-warning">
<font color=black>

- [Stacked regressions: top 4% on leaderboard (Kaggle notebook)](https://www.kaggle.com/serigne/stacked-regressions-top-4-on-leaderboard#Modelling)
- [`numpy.average`](https://numpy.org/doc/stable/reference/generated/numpy.average.html)

</font>
</div>