# Parameter Tuning of Regression Algorithms

Regression algorithms are often parameterized. These parameters control the behaviour and performance of the model. Models can have many parameters, and finding the best combination of them is known as $\textbf{Hyperparameter optimization}$. Hyperparameter optimization uses a search strategy to find a robust set of parameters for a model on a given problem.

The scikit-learn library is one option for building regression models. Its regression models come with default parameter settings, but these settings can be changed to find the parameter combination that gives the best performance. This notebook demonstrates hyperparameter optimization with scikit-learn on several regression models. The following linear and non-linear models are used:

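1. Linear models
    - Ridge regression
        * parameter:
            * alpha: Regularization strength (larger values mean stronger regularization). It must be a non-negative float. The default value is 1.0.
    - Lasso regression
        * parameter:
            * alpha: Regularization strength (larger values mean stronger regularization). It must be a non-negative float. The default value is 1.0.
    - Elastic net regression
        * parameter:
            * alpha: Regularization strength (larger values mean stronger regularization). It must be a non-negative float. The default value is 1.0. The mix between the L1 and L2 penalties, l1_ratio, is left at its default of 0.5.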

2. Non-linear models
    - Classification and Regression Trees (CART): decision tree regressor
        * parameters:
            * max_depth: The maximum depth of the tree. A deeper tree makes more splits and captures more detail of the data, but is also more prone to overfitting. The default value is None (nodes are expanded until all leaves are pure or contain fewer than min_samples_split samples).
            * min_samples_split: The minimum number of samples required to split an internal node. It can be given as an integer count (at least 2) or as a float fraction of the training samples. Increasing it constrains the tree, because a node must hold more samples before it can be split. In this notebook it is varied over 10, 20 and 40 samples. The default value is 2.
            * min_samples_leaf: The minimum number of samples required at a leaf node. It is similar to min_samples_split, but applies to the leaves, the terminal nodes at the bottom of the tree. The default value is 1.
            * max_leaf_nodes: The maximum number of leaf nodes in the tree. The default value is None (unlimited).
    - Support vector regressor
        * parameters:
            * kernel: The kernel type used by the algorithm. scikit-learn supports "linear", "poly", "rbf", "sigmoid" and "precomputed"; this notebook compares "linear" and "rbf". The default is "rbf".
            * C: Penalty parameter C of the error term; the strength of the regularization is inversely proportional to C. The default value is 1.0.
            * gamma: Kernel coefficient for "rbf" (also used by "poly" and "sigmoid"). In older scikit-learn versions the effective default was 1/n, where n is the number of features in the data set; since version 0.22 the default is "scale", i.e. 1/(n * X.var()).
    - K nearest neighbor regressor
        * parameters:
            * n_neighbors: Number of neighbors to use. The default value is 5.
            * metric: Distance metric to use. The default value is "minkowski", which with the default p=2 is the Euclidean distance.
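To make the role of alpha concrete, the sketch below fits Ridge and Lasso with a few alpha values on synthetic data from make_regression (a stand-in, not the Moneyball data) and reports the Euclidean norm of the Ridge coefficients and the number of Lasso coefficients that are exactly zero. The alpha values and data settings are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic stand-in data: 10 features, only 3 of which influence the target
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)


def coefficient_summary(alphas, X, y):
    """Return (alpha, L2 norm of Ridge coefficients, number of zero Lasso coefficients)."""
    rows = []
    for alpha in alphas:
        ridge = Ridge(alpha=alpha).fit(X, y)
        lasso = Lasso(alpha=alpha, max_iter=10000).fit(X, y)
        rows.append((alpha, np.linalg.norm(ridge.coef_), int(np.sum(lasso.coef_ == 0))))
    return rows


summary = coefficient_summary([0.01, 1.0, 100.0], X, y)
for alpha, ridge_norm, lasso_zeros in summary:
    # larger alpha: Ridge shrinks all coefficients, Lasso sets more of them exactly to zero
    print(f"alpha={alpha:g}: Ridge coef norm={ridge_norm:.2f}, Lasso zero coefs={lasso_zeros}")
```

Ridge only shrinks coefficients towards zero, whereas the L1 penalty of Lasso can remove features entirely; Elastic net sits between the two.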
                
scikit-learn provides the grid search method GridSearchCV(), available in sklearn.model_selection, for finding the optimal parameter combination for these models. It performs an exhaustive search over the parameter values specified in a grid: for every combination of values in the grid, the model is trained and evaluated with k-fold cross validation. The combination with the best mean cross-validated score is selected as the final choice of parameters, and by default (refit=True) the model is then retrained with these parameters on the whole training set.
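A minimal sketch of what GridSearchCV() does, assuming a Ridge model, a small alpha grid and synthetic data: ParameterGrid enumerates the combinations, cross_val_score runs k-fold cross validation for each one, and the combination with the highest mean score is kept. The last lines check that GridSearchCV() picks the same parameters.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold, ParameterGrid, cross_val_score

# Synthetic stand-in data (the notebook itself uses the Moneyball data)
X, y = make_regression(n_samples=150, n_features=5, noise=10.0, random_state=1)
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0]}
cv = KFold(n_splits=5)  # unshuffled 5-fold split, what cv=5 means for a regressor

# Manual grid search: one cross-validation run per parameter combination
best_params, best_score = None, -np.inf
for params in ParameterGrid(param_grid):
    # the default score of a regressor is R^2; keep the combination with the highest mean
    mean_score = cross_val_score(Ridge(**params), X, y, cv=cv).mean()
    if mean_score > best_score:
        best_params, best_score = params, mean_score

# GridSearchCV performs the same loop, then refits the winner on all of X and y
grid = GridSearchCV(Ridge(), param_grid=param_grid, cv=cv).fit(X, y)
print("manual:", best_params, "GridSearchCV:", grid.best_params_)
```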

The data set used for demonstration is Moneyball, which can be downloaded from https://www.kaggle.com/wduckett/moneyball-mlb-stats-19622012/data. The data has been gathered from baseball-reference.com. The version of the file loaded in this notebook contains the following columns:

1. RS: runs scored
2. RA: runs allowed
3. OBP: On Base Percentage
4. SLG: Slugging Percentage
5. BA: Batting Average
6. Playoffs: whether the team reached the playoffs (1) or not (0)
7. RD: run difference, i.e. RS - RA
8. W: wins in that season

Columns 1-7 are used as predictor variables to predict the target W (wins in the season).

The steps of hyperparameter optimization for regression models are demonstrated below. Note that the aim of this tutorial is not to find the best regression model for this predictive task, but to learn how to tune the parameters of a model during the training phase so as to obtain a robust model for making predictions.

# 1. Import libraries

In [None]:
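# (Google Colab only) upload moneyball.csv from the local machine; skip this cell when running locally
from google.colab import files
uploaded = files.upload()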

In [None]:
# importing Pandas for data manipulation
import pandas as pd

# importing linear and non linear regression models
from sklearn import linear_model  
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# import GridSearchCV and train_test_split (used to build the training and test sets)
from sklearn.model_selection import GridSearchCV, train_test_split

# 2. Load data set

In [None]:
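# adjust the path to where moneyball.csv is stored (after a Colab upload it is simply "moneyball.csv")
dataset = pd.read_csv("Data sets/moneyball.csv")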
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 902 entries, 0 to 901
Data columns (total 8 columns):
RS          902 non-null int64
RA          902 non-null int64
OBP         902 non-null float64
SLG         902 non-null float64
BA          902 non-null float64
Playoffs    902 non-null int64
RD          902 non-null int64
W           902 non-null int64
dtypes: float64(3), int64(5)
memory usage: 56.5 KB


# 3. Building training and test sets

GridSearchCV() uses the training set to find the best combination of parameters for the data set. Once the parameters are tuned, the model can be evaluated on the held-out test set.
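How test_size translates into row counts can be checked with placeholder arrays of the same length as the Moneyball file (902 rows). The arrays below are dummies, not the real data; only their shapes matter.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same number of rows as the Moneyball file (902)
X = np.arange(902 * 7).reshape(902, 7)
y = np.arange(902)

sizes = {}
for test_size in (0.8, 0.2):
    # test_size is the fraction of rows held out for testing (rounded up)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=test_size, random_state=10)
    sizes[test_size] = (len(X_tr), len(X_te))
    print(f"test_size={test_size}: {len(X_tr)} training rows, {len(X_te)} test rows")
# test_size=0.8: 180 training rows, 722 test rows
# test_size=0.2: 721 training rows, 181 test rows
```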

In [None]:
# My_data contains the predictor columns: the first seven columns (RS to RD)
My_data = dataset.iloc[:,0:7] 

# My_data_target contains the target W (wins), the eighth column of the data set
My_data_target = dataset.iloc[:,7]

# note: test_size=0.8 holds out 80% of the rows for testing, leaving only about 20%
# (180 of the 902 rows) for training; a value such as 0.2 is more common
X_train, X_test, Y_train, Y_test = train_test_split(My_data, My_data_target, test_size=0.8, random_state=10)

print(X_train.head())
print(Y_test.head())

      RS   RA    OBP    SLG     BA  Playoffs   RD
896  817  680  0.337  0.426  0.267         1  137
542  600  750  0.315  0.363  0.244         0 -150
255  684  622  0.322  0.400  0.257         1   62
412  632  655  0.317  0.361  0.247         0  -23
328  594  583  0.310  0.351  0.247         0   11
437    89
131    84
633    82
195    68
230    94
Name: W, dtype: int64


# 4. Tuning Linear models 



In the following cells, the Ridge, Lasso and Elastic net regression models are tuned on the parameter alpha.

A list of candidate alpha values is defined first and then passed to GridSearchCV() as the parameter grid, which searches it for the best alpha value.

In [None]:
# specifying candidate alpha values
alphas = [0.2, 0.4, 0.5, 0.6, 0.8, 1.0, 0.01, 0.001]
# the grid maps each parameter name to its list of candidate values
Dict_alpha = dict(alpha=alphas)
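The grid above mixes several values between 0.2 and 1.0 with only two small values, 0.01 and 0.001. Because regularization strength acts multiplicatively, a common alternative is to space candidates evenly on a log scale, and to extend the range if the best alpha lands at its edge. The range below (1e-4 to 1e2) is an illustrative assumption, not taken from the notebook.

```python
import numpy as np

# 7 candidate alphas evenly spaced on a log scale: 1e-4, 1e-3, ..., 1e2
alphas_log = np.logspace(-4, 2, num=7)
Dict_alpha_log = dict(alpha=alphas_log)
print(alphas_log)
```

Passing param_grid=Dict_alpha_log to GridSearchCV() in place of Dict_alpha would search this wider range.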

## 4.1 Tuning Ridge regression model

In [None]:
# Creating instance of Ridge model
Model_Ridge = linear_model.Ridge()
# passing the model, the candidate alpha values and the number of cross-validation folds to GridSearchCV
Grid_Ridge = GridSearchCV(estimator = Model_Ridge, param_grid=Dict_alpha, cv=10 )
# fitting the model on the training set
Grid_Ridge.fit(X_train, Y_train)
# printing the best alpha value from the given list for the data set in consideration
print("The best alpha value for the Ridge model = ", Grid_Ridge.best_estimator_.alpha)

The best alpha value for the Ridge model =  0.01
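Besides best_estimator_, a fitted GridSearchCV object exposes best_params_, best_score_ and cv_results_, and because refit=True by default it can score a held-out test set directly. The sketch below assumes a synthetic data set split 80/20 into training and test rows; with the notebook's data, X_train, X_test, Y_train and Y_test would take their place.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data split 80/20 into training and test rows
X, y = make_regression(n_samples=300, n_features=6, noise=15.0, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=10)

grid = GridSearchCV(Ridge(), param_grid={"alpha": [0.001, 0.01, 0.1, 1.0]}, cv=10)
grid.fit(X_tr, y_tr)

# one row per candidate alpha with its mean cross-validated R^2 and its rank
results = pd.DataFrame(grid.cv_results_)[["param_alpha", "mean_test_score", "rank_test_score"]]
print(results)
print("best alpha:", grid.best_params_["alpha"], "| mean CV R^2:", round(grid.best_score_, 3))

# refit=True (default): the best model was retrained on all training rows,
# so grid.score / grid.predict apply it directly to the unseen test rows
test_r2 = grid.score(X_te, y_te)
test_mae = mean_absolute_error(y_te, grid.predict(X_te))
print("test R^2:", round(test_r2, 3), "| test MAE:", round(test_mae, 2))
```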


## 4.2 Tuning Lasso regression model

In [None]:
# Creating instance of Lasso model
Model_Lasso = linear_model.Lasso(max_iter=10000)
# passing the model, the candidate alpha values and the number of cross-validation folds to GridSearchCV
Grid_Lasso = GridSearchCV(estimator = Model_Lasso, param_grid=Dict_alpha, cv=10 )
# fitting the model on the training set
Grid_Lasso.fit(X_train, Y_train)
# printing the best alpha value from the given list for the data set in consideration
print("The best alpha value for the Lasso model = ", Grid_Lasso.best_estimator_.alpha)

The best alpha value for the Lasso model =  0.01


## 4.3 Tuning Elastic net regression model

In [None]:
# Creating instance of Elastic net model
Model_ElasticNet = linear_model.ElasticNet(max_iter=10000)
# passing the model, the candidate alpha values and the number of cross-validation folds to GridSearchCV
Grid_ElasticNet = GridSearchCV(estimator = Model_ElasticNet, param_grid=Dict_alpha, cv=10 )
# fitting the model on the training set
Grid_ElasticNet.fit(X_train, Y_train)
# printing the best alpha value from the given list for the data set in consideration
print("The best alpha value for the ElasticNet model = ", Grid_ElasticNet.best_estimator_.alpha)

The best alpha value for the ElasticNet model =  0.001
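Only alpha is tuned above, so ElasticNet keeps its default l1_ratio of 0.5, the mix between the L1 (Lasso) and L2 (Ridge) penalties. A sketch of tuning both parameters together, assuming synthetic data and illustrative candidate values:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data for illustration
X, y = make_regression(n_samples=200, n_features=8, n_informative=4,
                       noise=10.0, random_state=3)

# l1_ratio=1.0 is a pure L1 (Lasso) penalty; smaller values add more L2 (Ridge) penalty
param_grid_enet = {
    "alpha": [0.001, 0.01, 0.1, 1.0],
    "l1_ratio": [0.1, 0.5, 0.9, 1.0],
}
grid_enet = GridSearchCV(ElasticNet(max_iter=10000), param_grid=param_grid_enet, cv=10)
grid_enet.fit(X, y)  # 4 x 4 = 16 candidates, each cross-validated on 10 folds
print("best alpha:", grid_enet.best_params_["alpha"],
      "| best l1_ratio:", grid_enet.best_params_["l1_ratio"])
```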


# 5. Tuning Non-Linear Regression models

In the following cells, the CART, K nearest neighbor and Support vector regression models are tuned on the parameters described in the introduction of this tutorial.

## 5.1 Parameter setting for Support Vector Regressor

The model can be tuned on parameters such as the kernel type, C and gamma. The code below first sets the candidate parameter values, and GridSearchCV() then searches for the best combination of parameters for the data.

In [None]:
# setting parameters: each dict is a separate sub-grid, so gamma is only tried with the rbf kernel
param_grid_SVR = [
  {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
  {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
 ]
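Because param_grid_SVR is a list of two dicts, GridSearchCV() explores each dict as a separate sub-grid instead of crossing every key with every other. ParameterGrid, which GridSearchCV() uses to enumerate its candidates, shows the resulting 4 + 4 × 2 = 12 combinations; the grid is repeated here so the snippet runs on its own.

```python
from sklearn.model_selection import ParameterGrid

# Same structure as param_grid_SVR above
param_grid_SVR = [
    {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
    {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
]
candidates = list(ParameterGrid(param_grid_SVR))
print(len(candidates))  # 12: 4 linear + 8 rbf combinations
for params in candidates[:5]:
    print(params)
```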

## 5.2 Tuning Support Vector Regressor

In [None]:
# Creating instance of SVR model
Model_SVR = SVR()
# Using GridSearchCV to pass the model,values to parameters for tuning
# and cross validation folds
Grid_SVR = GridSearchCV(estimator = Model_SVR, param_grid=param_grid_SVR, cv=5)
# fitting the model on the training set
Grid_SVR.fit(X_train, Y_train)
# printing the best parameter values from the given list for the data set in consideration
print("The best parameter choice of SVR model", Grid_SVR.best_estimator_)

The best parameter choice of SVR model SVR(C=1, cache_size=200, coef0=0.0, degree=3, epsilon=0.1,
    gamma='auto_deprecated', kernel='linear', max_iter=-1, shrinking=True,
    tol=0.001, verbose=False)
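SVR (like K nearest neighbours) works on dot products and distances between samples, so features on very different scales, such as RS and RA (hundreds) next to OBP and BA (below 1), can dominate the fit. The notebook tunes SVR on unscaled features; a common practice is to place a StandardScaler in a Pipeline so the scaler is re-fitted on each training fold. A sketch assuming synthetic data with one column deliberately inflated:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data; column 0 is inflated to mimic features on very different scales
X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=4)
X[:, 0] *= 1000

# The scaler is part of the pipeline, so it is fitted on each training fold only
pipe = Pipeline([("scale", StandardScaler()), ("svr", SVR())])

# Parameters of a pipeline step are addressed as "<step name>__<parameter>"
param_grid_pipe = [
    {"svr__C": [1, 10, 100, 1000], "svr__kernel": ["linear"]},
    {"svr__C": [1, 10, 100, 1000], "svr__gamma": [0.001, 0.0001], "svr__kernel": ["rbf"]},
]
grid_pipe = GridSearchCV(pipe, param_grid=param_grid_pipe, cv=5)
grid_pipe.fit(X, y)
print(grid_pipe.best_params_)
```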


## 5.3 Parameter setting for CART

The model can be tuned on parameters such as the maximum depth of the tree, the maximum number of leaf nodes and more. The code below first sets the candidate parameter values, and GridSearchCV() then searches for the best combination of parameters for the data.

In [None]:
# setting parameters
param_grid_DTR = {"min_samples_split": [10, 20, 40],
              "max_depth": [2, 6, 8],
              "min_samples_leaf": [20, 40, 100],
              "max_leaf_nodes": [5, 20, 100],
              }
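Grid search is exhaustive, so its cost grows with the product of the list lengths. For param_grid_DTR that is 3 × 3 × 3 × 3 = 81 candidates, each fitted once per cross-validation fold; the sketch counts them with ParameterGrid (the grid is repeated so the snippet is self-contained).

```python
from sklearn.model_selection import ParameterGrid

# Same grid as param_grid_DTR above
param_grid_DTR = {"min_samples_split": [10, 20, 40],
                  "max_depth": [2, 6, 8],
                  "min_samples_leaf": [20, 40, 100],
                  "max_leaf_nodes": [5, 20, 100]}

n_candidates = len(ParameterGrid(param_grid_DTR))  # 3 * 3 * 3 * 3 = 81
n_folds = 5                                         # cv=5, as used for Grid_DTR
n_fits = n_candidates * n_folds                     # plus one final refit of the best candidate
print(n_candidates, "candidates x", n_folds, "folds =", n_fits, "fits")
```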

## 5.4 Tuning CART

In [None]:
# Creating instance of CART model
Model_DTR = DecisionTreeRegressor()
# Using GridSearchCV to pass the model,values to parameters for tuning
# and cross validation folds
Grid_DTR = GridSearchCV(estimator = Model_DTR, param_grid=param_grid_DTR, cv=5)
# fitting the model on the training set
Grid_DTR.fit(X_train, Y_train)
# printing the best parameters from the given list for the data set in consideration
print("The best parameter choice of Decision tree Regressor model:", Grid_DTR.best_estimator_)

## 5.5 Parameter setting for K nearest Neighbour Regressor

The model can be tuned on parameters such as the number of neighbours k (n_neighbors). The code below first sets the candidate values, and GridSearchCV() then searches for the best value for the data.

In [None]:
# setting parameters
param_grid_KNN ={"n_neighbors": [5,7,9,11,13,15,17,19,21]}
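The introduction lists metric as a second K nearest neighbor parameter, but only n_neighbors is tuned here. With the default metric "minkowski", the power parameter p switches between Manhattan (p=1) and Euclidean (p=2) distance, so it can be added to the grid. A sketch assuming synthetic data:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data for illustration
X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=5)

# With the default metric="minkowski", p=1 is Manhattan and p=2 is Euclidean distance
param_grid_KNN_p = {"n_neighbors": [5, 7, 9, 11, 13, 15, 17, 19, 21], "p": [1, 2]}
grid_knn = GridSearchCV(KNeighborsRegressor(), param_grid=param_grid_KNN_p, cv=5)
grid_knn.fit(X, y)  # 9 x 2 = 18 candidates
print(grid_knn.best_params_)
```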

## 5.6 Tuning K Nearest Neighbour Regressor

In [None]:
# Creating instance of KNN regressor model
Model_KNN = KNeighborsRegressor()
# Using GridSearchCV to pass the model,values to parameters for tuning
# and cross validation folds
Grid_KNN = GridSearchCV(estimator = Model_KNN, param_grid=param_grid_KNN, cv=5)
# fitting the model on the training set
Grid_KNN.fit(X_train, Y_train)
# printing the best parameters from the given list for the data set in consideration
print("The best parameter choice of KNN Regressor model:", Grid_KNN.best_estimator_)

The best parameter choice of KNN Regressor model: KNeighborsRegressor(algorithm='auto', leaf_size=30, metric='minkowski',
                    metric_params=None, n_jobs=None, n_neighbors=7, p=2,
                    weights='uniform')
