# Exercise 3: Hyperparameter Optimization

## What you will learn:
* Access and preprocess data
* Evaluation metrics for regression
* Gradient Boosting Regression
* Determine feature importance from Gradient Boosting Regression (ensemble methods in general)
* Configure training and test data
* Evaluation curves for hyperparameter optimization
* Grid search and random search for hyperparameter optimization
* Processing Pipelines


**Task 1:** In this notebook *Gradient Boosting Regression* is demonstrated using the example of *rental bike prediction*. The corresponding dataset has already been described in notebook [05Optimisation.ipynb](../Lecture/05Optimisation.ipynb). Read this dataset into a pandas dataframe and display its head.

**Task 2:** Calculate descriptive statistics of the generated dataframe.

**Task 3:** Columns 1 (season) to 11 (windspeed) are used as features. The target variable is the last column, i.e. the total count of rental bikes per day. Prepare training and test data with a split ratio of $0.7/0.3$.
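
The split described in Task 3 can be sketched as follows. This is only an illustration: the dataframe `df` below is a random stand-in whose column layout (13 columns, features in columns 1 to 11, target in the last column) is an assumption, not the actual bike dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the bike dataframe (random values, assumed layout):
# column 0 is unused, columns 1..11 are features, the last column is the target.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100, 13)), columns=[f"col{i}" for i in range(13)])

X = df.iloc[:, 1:12].values   # columns 1 to 11 (inclusive) as features
y = df.iloc[:, -1].values     # last column as target

# 70 % of the rows for training, 30 % for testing;
# random_state makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

print(X_train.shape, X_test.shape)
```

Note that `iloc[:, 1:12]` excludes the upper bound, so it selects exactly the columns with positions 1 to 11.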

**Task 4:** Use the training partition to train a `GradientBoostingRegressor`.
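
A minimal sketch of training a `GradientBoostingRegressor` is shown below. The data is synthetic (an assumption made only for illustration): the target depends on the first feature alone, so that feature should receive the highest importance. The hyperparameter values are arbitrary examples, not recommended settings.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic training data (assumption): only feature 0 influences the target
rng = np.random.default_rng(0)
X_train = rng.random((200, 3))
y_train = 10 * X_train[:, 0] + rng.normal(scale=0.1, size=200)

# Fit an ensemble of 100 shallow regression trees
gbr = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, random_state=0)
gbr.fit(X_train, y_train)

# One importance value per feature; the values sum to 1
print(gbr.feature_importances_)
```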

## Evaluation metrics for Regression
Regression models can be scored by a variety of metrics. The [regression scores in scikit-learn](http://scikit-learn.org/stable/modules/model_evaluation.html) include:

* mean absolute error (MAE)
* mean squared error (MSE)
* median absolute error (MEDE)
* coefficient of determination ($R^2$) 

If $y_i$ is the predicted value for the $i$-th element and $r_i$ is its true value, then these metrics are defined as follows:
$$
\begin{array}{lcl}
 MAE & = &   \frac{1}{N}\sum\limits_{i=1}^N |y_i-r_i| \\
 MSE & = &   \frac{1}{N}\sum\limits_{i=1}^N (y_i-r_i)^2  \\
 MEDE & = &  median\left( \; |y_i-r_i|, \; \forall \; i \; \in [1,..,N]\right) \\
\end{array}
$$

$$
R^2  =  1- \frac{SS_e}{SS_r}, \quad \mbox{ with } SS_e=\sum_{i=1}^N(r_i-y_i)^2, \quad  SS_r=\sum_{i=1}^N(r_i-\overline{r})^2 \quad \mbox { and } \quad \overline{r}=\frac{1}{N} \sum_{i=1}^N r_i
$$
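
As a small worked example of the definitions above, the following sketch computes the four metrics by hand with NumPy and compares them to the corresponding scikit-learn functions. The four true values `r` and predictions `y` are made up for illustration.

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             median_absolute_error, r2_score)

r = np.array([3.0, 5.0, 2.0, 7.0])   # true values (made up)
y = np.array([2.0, 5.0, 4.0, 8.0])   # predicted values (made up)

# Metrics computed directly from the formulas
mae = np.mean(np.abs(y - r))                 # (1 + 0 + 2 + 1) / 4 = 1.0
mse = np.mean((y - r) ** 2)                  # (1 + 0 + 4 + 1) / 4 = 1.5
mede = np.median(np.abs(y - r))              # median of 0, 1, 1, 2 = 1.0
ss_e = np.sum((r - y) ** 2)                  # 6.0
ss_r = np.sum((r - r.mean()) ** 2)           # 14.75
r2 = 1 - ss_e / ss_r                         # approx. 0.593

# The scikit-learn functions expect the true values as first argument
print(mae, mean_absolute_error(r, y))
print(mse, mean_squared_error(r, y))
print(mede, median_absolute_error(r, y))
print(r2, r2_score(r, y))
```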

Another frequently used regression metric is the **Root Mean Squared Logarithmic Error (RMSLE)**, which is calculated as follows:

$$
RMSLE = \sqrt{\frac{1}{N} \sum\limits_{i=1}^N(\ln(r_i)-\ln(y_i))^2}
$$

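The RMSLE can be computed by applying the `mean_squared_error` function to the log-transformed values and taking the square root. (scikit-learn also provides `mean_squared_log_error`, which uses $\ln(1+x)$ instead of $\ln(x)$ and therefore also accepts zeros.) The RMSLE is well suited to cases where the error (i.e. the difference between $y_i$ and $r_i$) grows with the values of $r_i$, because it penalizes relative rather than absolute deviations: large errors at high values of $r_i$ are weighted less.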
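
The following sketch implements the RMSLE formula above via `mean_squared_error` on log-transformed values. The helper name `rmsle` is hypothetical, and the function assumes strictly positive inputs because it uses $\ln(x)$ as in the formula. The example values illustrate that the same absolute error counts less at a high true value, and that the same relative error gives the same RMSLE.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

def rmsle(r, y):
    """RMSLE of predictions y for true values r (hypothetical helper).

    Assumes all values are strictly positive, since ln(x) is used.
    """
    r = np.asarray(r, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sqrt(mean_squared_error(np.log(r), np.log(y)))

# Same absolute error of 10, at a low and at a high true value
low = rmsle([20], [30])
high = rmsle([1000], [1010])
print(low, high)          # the error at the high value is much smaller

# Same relative error (factor 2) gives the same RMSLE, namely ln(2)
print(rmsle([10], [20]), rmsle([1000], [2000]))
```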


In [1]:
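import numpy as np
from sklearn.metrics import (explained_variance_score, mean_absolute_error,
                             mean_squared_error, median_absolute_error, r2_score)

def determineRegressionMetrics(y_test, y_pred, title=""):
    """Print common regression metrics for true values y_test and predictions y_pred."""
    mse = mean_squared_error(y_test, y_pred)
    mae = mean_absolute_error(y_test, y_pred)
    # Optional RMSLE with ln(1 + x); only valid for non-negative values:
    # rmsle = np.sqrt(mean_squared_error(np.log(y_test + 1), np.log(y_pred + 1)))
    r2 = r2_score(y_test, y_pred)
    med = median_absolute_error(y_test, y_pred)
    evs = explained_variance_score(y_test, y_pred)
    print(title)
    print("Mean absolute error =", round(mae, 2))
    print("Mean squared error =", round(mse, 2))
    print("Median absolute error =", round(med, 2))
    print("R2 score =", round(r2, 2))
    # print("Root Mean Squared Logarithmic Error =", round(rmsle, 2))
    print("Explained variance score =", round(evs, 2))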

**Task 5:** Calculate the learned model's prediction on the test-partition. Use the provided function `determineRegressionMetrics` for calculating all the defined performance metrics. 

**Task 6:** Apply the provided function `plot_feature_importances` for visualizing the feature importances of the learned model. These feature importances can be obtained from the `feature_importances_`-attribute of the learned `GradientBoostingRegressor`-model.

In [34]:
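import numpy as np
import matplotlib.pyplot as plt

def plot_feature_importances(feature_importances, title, feature_names, std=None):
    """Plot feature importances as bars, sorted in descending order."""
    # Convert to arrays so that indexing with an index array works
    feature_importances = np.asarray(feature_importances, dtype=float)
    feature_names = np.asarray(feature_names)

    # Normalize the importance values to percent of the largest importance
    feature_importances = 100.0 * (feature_importances / feature_importances.max())
    if std is None:
        std = np.zeros(len(feature_importances))

    # Sort the indices by descending importance
    index_sorted = np.flipud(np.argsort(feature_importances))

    # Arrange the X ticks
    pos = np.arange(index_sorted.shape[0]) + 0.5

    # Plot the bar graph
    plt.figure(figsize=(16, 10))
    plt.bar(pos, feature_importances[index_sorted], align='center', alpha=0.5,
            yerr=100 * np.asarray(std)[index_sorted])
    plt.xticks(pos, feature_names[index_sorted])
    plt.ylabel('Relative Importance')
    plt.title(title)
    plt.show()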

**Task 7:** Visualize the true and the predicted output values for a range of samples within the validation data.

**Task 8:** For the following `GradientBoosting`-hyperparameter ranges, determine the `validation_curve` (scikit-learn function) and plot the curve using the `plot_evaluation_curve()`-function, which is defined in module `utilsJM`.

In [38]:
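import numpy as np

estimator_range = np.arange(20, 330, 50)   # numbers of boosting stages
lr_range = np.arange(0.02, 0.2, 0.02)      # learning rates
# Loss names as of scikit-learn 1.0; older versions called the first two 'ls' and 'lad'
loss = ['squared_error', 'absolute_error', 'huber', 'quantile']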
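
How `validation_curve` is called for a single hyperparameter can be sketched as follows. The data comes from `make_regression` and the range of estimator numbers is shortened; both are assumptions made to keep the example fast. Plotting is omitted because the signature of `plot_evaluation_curve()` from `utilsJM` is not shown here, so the mean scores are printed instead.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import validation_curve

# Synthetic regression data (assumption, stands in for the bike data)
X, y = make_regression(n_samples=150, n_features=5, noise=10.0, random_state=0)

# Shortened range of estimator numbers: 20, 70, 120
short_estimator_range = np.arange(20, 130, 50)

# For each value of n_estimators, run 3-fold cross-validation and
# collect the R2 score on the training folds and on the validation fold
train_scores, val_scores = validation_curve(
    GradientBoostingRegressor(random_state=0), X, y,
    param_name="n_estimators", param_range=short_estimator_range,
    cv=3, scoring="r2")

# Both arrays have shape (number of parameter values, number of folds)
for n, tr, va in zip(short_estimator_range,
                     train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n_estimators={n}: train R2={tr:.3f}, validation R2={va:.3f}")
```

The same call works for the learning rate by setting `param_name="learning_rate"` and passing the corresponding range.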

**Task 9:** Define a hyperparameter grid by combining the hyperparameter ranges `estimator_range` and `lr_range`. The hyperparameter grid is implemented as a dictionary whose keys are the names of the hyperparameters and whose values are the corresponding parameter ranges. Apply *scikit-learn's* `GridSearchCV` class to find the best combination of estimator number and learning rate. Repeat this experiment by replacing `GridSearchCV` with `RandomizedSearchCV`. What do you observe?
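
The mechanics of both search strategies can be sketched as follows. The data is synthetic and the grid is deliberately small (2 × 3 combinations); both are assumptions made to keep the run short and are not the ranges defined for this exercise.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic regression data (assumption, stands in for the bike data)
X, y = make_regression(n_samples=120, n_features=5, noise=10.0, random_state=0)

# Keys are hyperparameter names, values are the candidate values
param_grid = {
    "n_estimators": [20, 70],
    "learning_rate": [0.05, 0.1, 0.2],
}

# Grid search evaluates all 2 * 3 = 6 combinations
grid = GridSearchCV(GradientBoostingRegressor(random_state=0),
                    param_grid, cv=3, scoring="r2")
grid.fit(X, y)

# Randomized search evaluates only n_iter randomly drawn combinations
rand = RandomizedSearchCV(GradientBoostingRegressor(random_state=0),
                          param_grid, n_iter=3, cv=3, scoring="r2",
                          random_state=0)
rand.fit(X, y)

print("Grid search:  ", grid.best_params_, round(grid.best_score_, 3))
print("Random search:", rand.best_params_, round(rand.best_score_, 3))
```

Here the randomized search fits only half as many candidates as the grid search. Since it draws its candidates from the same grid, its best score cannot exceed that of the exhaustive search in this setup.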