# GridSearchCV to find the best parameters

This method fits a set of models with different hyperparameters and finds the best-performing values.

A search is defined by:
 - an estimator (model type: regressor or classifier)
 - a parameter space (set of candidate values for the hyperparameters)
 - a method for searching
 - a cross-validation scheme
 - a score function
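
As a minimal sketch of how these pieces map onto the `GridSearchCV` call, the example below uses synthetic data from `make_regression` and a `Ridge` estimator. Both are assumptions chosen only so the snippet runs on its own; they are not the paintings data used later.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic data, used only so the sketch is self-contained
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

search = GridSearchCV(
    estimator=Ridge(),                          # the estimator
    param_grid={"alpha": [0.01, 0.1, 1, 10]},   # the parameter space
    cv=5,                                       # the cross-validation scheme
    scoring="r2",                               # the score function
)
# The search method is the exhaustive grid: every combination is fitted and scored
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

After fitting, `best_params_` holds the winning combination and `best_score_` its mean cross-validated score.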
    

In [1]:
from sklearn.model_selection import GridSearchCV

Other options: 

* [RandomizedSearchCV](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html#sklearn.model_selection.RandomizedSearchCV): Randomized search on hyperparameters.
* [ParameterSampler](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.ParameterSampler.html#sklearn.model_selection.ParameterSampler): Generator on parameters sampled from given distributions.
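
A short sketch of both alternatives, assuming synthetic data and a uniform distribution for `alpha` (the distribution, its bounds and the `Lasso` estimator are illustrative choices, not taken from this notebook):

```python
from scipy.stats import uniform
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import ParameterSampler, RandomizedSearchCV

# Synthetic data, used only so the sketch is self-contained
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# alpha is drawn uniformly from [0.001, 10.001]; loc keeps it away from 0
distributions = {"alpha": uniform(loc=0.001, scale=10)}

# RandomizedSearchCV takes param_distributions instead of param_grid
random_search = RandomizedSearchCV(
    Lasso(), param_distributions=distributions,
    n_iter=10, cv=5, scoring="r2", random_state=0,
)
random_search.fit(X, y)
print(random_search.best_params_)

# ParameterSampler only generates the candidate settings, without fitting anything
samples = list(ParameterSampler(distributions, n_iter=3, random_state=0))
print(samples)
```

Unlike the grid, the number of fitted candidates is controlled by `n_iter` rather than by the size of the parameter space.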


***

Let's use a clean dataset containing the features of a set of paintings and their prices.

In [2]:

import pandas as pd

dataset = pd.read_csv("scrap_docs/paintings/multivariate__train_paintings.csv")


## Create target

In [3]:

# We want to predict the price of the painting based on its features

continuous_target = dataset['list_price']
continuous_target.describe()


count    24225.000000
mean      2113.428145
std       3340.112620
min          0.000000
25%        500.000000
50%       1000.000000
75%       2500.000000
max      48000.000000
Name: list_price, dtype: float64

In [4]:

## tier_3 is a binary price tier previously constructed for the business problem
## The tier boundary is 1,111 USD: paintings priced up to 1,111 USD versus more expensive ones
## We want to predict which side of that boundary a painting falls on, based on its features

classification_target = dataset['tier_3']
classification_target.value_counts()


0    12754
1    11471
Name: tier_3, dtype: int64

In [5]:

dataset[['tier_3','list_price']].groupby('tier_3').describe()


Summary of list_price by tier_3:

tier_3,count,mean,std,min,25%,50%,75%,max
0,12754.0,528.701149,277.938226,0.0,300.0,500.0,750.0,1111.0
1,11471.0,3875.402525,4192.658681,1120.0,1650.0,2500.0,4295.0,48000.0
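
The summary above suggests that the tier splits prices at 1,111 USD. A binary target of this kind could be derived as sketched below; the threshold is read off the table, and the prices are hypothetical values, so this is not necessarily how `tier_3` was actually built.

```python
import pandas as pd

# Hypothetical prices; the 1,111 USD threshold is an assumption read off the summary
prices = pd.Series([0, 500, 1111, 1120, 2500, 48000], name="list_price")

# 1 when the price is above the threshold, 0 otherwise
tier = (prices > 1111).astype(int)
print(tier.tolist())
```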


## Select features

The features cover artist information and painting attributes. They were selected beforehand based on the business problem.

In [6]:
    
list_of_features = ['width','height','developed_country','subject_Animal','subject_Geometric',
                    'subject_Landscape','num_following','num_followed','grad_ed','artist_missing']

data = dataset[list_of_features]


In [7]:
data.head(5)

Unnamed: 0,width,height,developed_country,subject_Animal,subject_Geometric,subject_Landscape,num_following,num_followed,grad_ed,artist_missing
0,15.7,15.7,1,0,0,0,1,0,0,0
1,39.4,39.4,1,0,0,0,2,2,0,0
2,39.4,39.4,1,0,0,0,2,2,0,0
3,31.5,39.4,1,0,1,0,759,35,0,0
4,40.0,30.0,0,0,0,0,1,0,0,0


## Normalization

What is normalization, and why do we need it? Whenever we work with data, we need to consider the scale of the features. Some features might take values from 1 to 1000, while others range from 0 to 1. Many machine learning methods compare observations across dimensions (for example through distances or regularization penalties), so the dimensions need to be comparable.

There are many approaches to this re-scaling, the most common being:

- _Normalization_: we rescale each sample (row) so that it has unit norm (`preprocessing.Normalizer`)
- _Standardization_: we rescale each feature to zero mean and unit variance, as for a standard Gaussian (`preprocessing.StandardScaler`)
- _Scaling to a range_: we rescale each feature based on its minimum and maximum value (`preprocessing.MinMaxScaler`)
- _Robust scaling_: we rescale each feature using its median and interquartile range, which is less sensitive to outliers (`preprocessing.RobustScaler`)


(sklearn has built-in classes to help us re-scale our data; see below.)
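
To make the differences concrete, here is a small sketch that applies four `sklearn.preprocessing` scalers to a toy matrix. The matrix is made up for illustration; its first column has a large range and an outlier, the second lies in [0, 1].

```python
import numpy as np
from sklearn import preprocessing

# Toy matrix (hypothetical values): wide-range column with an outlier, and a [0, 1] column
X = np.array([[1.0, 0.1],
              [2.0, 0.5],
              [3.0, 0.9],
              [1000.0, 0.3]])

scalers = {
    "normalizer": preprocessing.Normalizer(),      # unit norm per row
    "standard": preprocessing.StandardScaler(),    # zero mean, unit variance per column
    "minmax": preprocessing.MinMaxScaler(),        # [0, 1] range per column
    "robust": preprocessing.RobustScaler(),        # median and IQR per column
}

scaled = {}
for name, scaler in scalers.items():
    scaled[name] = scaler.fit_transform(X)
    print(name)
    print(np.round(scaled[name], 2))
```

Note that `Normalizer` works row by row, while the other three work column by column.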

**Let's take a look at the data before and after re-scaling.**

Before re-scaling:

In [8]:

data.head(3)


Unnamed: 0,width,height,developed_country,subject_Animal,subject_Geometric,subject_Landscape,num_following,num_followed,grad_ed,artist_missing
0,15.7,15.7,1,0,0,0,1,0,0,0
1,39.4,39.4,1,0,0,0,2,2,0,0
2,39.4,39.4,1,0,0,0,2,2,0,0


In [9]:

summary = data.describe()
for column in summary: summary[column] = summary[column].apply(lambda x: round(x,1))
summary


Unnamed: 0,width,height,developed_country,subject_Animal,subject_Geometric,subject_Landscape,num_following,num_followed,grad_ed,artist_missing
count,24225.0,24225.0,24225.0,24225.0,24225.0,24225.0,24225.0,24225.0,24225.0,24225.0
mean,27.4,27.1,0.7,0.1,0.0,0.1,44.6,52.8,0.1,0.0
std,24.0,25.3,0.5,0.2,0.1,0.3,129.0,127.2,0.3,0.1
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,15.7,15.7,0.0,0.0,0.0,0.0,1.0,2.0,0.0,0.0
50%,23.6,23.8,1.0,0.0,0.0,0.0,6.0,9.0,0.0,0.0
75%,35.8,36.0,1.0,0.0,0.0,0.0,32.0,41.0,0.0,0.0
max,1535.4,2300.0,1.0,1.0,1.0,1.0,3844.0,1465.0,1.0,1.0


Now let's use the sklearn [preprocessing](http://scikit-learn.org/stable/modules/preprocessing.html) module and see what the data looks like.

In [10]:

# Select continuous variables
unique_values_per_column = {i:len(data[i].unique()) for i in data.describe()}   # get unique values
unique_values_per_column

# Select all non-binary
continuous_predictors = []
for k in unique_values_per_column:
    if unique_values_per_column[k]!=2: continuous_predictors.append(k)

# We should confirm that none of these columns is categorical (multi-label)
# If we find any, we should remove them from this list and create dummy variables instead
continuous_predictors
    

['height', 'width', 'num_following', 'num_followed']

**Standardization fitted on the training data**

In [11]:

from sklearn import preprocessing 

data_scaled = data.copy()

# Scaler
std_scale = preprocessing.StandardScaler().fit( data_scaled[continuous_predictors] )

# Transform data (scaled)
data_continuous_std = std_scale.transform( data_scaled[continuous_predictors] )
data_continuous_std = pd.DataFrame( data_continuous_std, columns=data_scaled[continuous_predictors].columns )

# Complete dataset
for i in data_continuous_std:  
    data_scaled[i] = data_continuous_std[i]
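
An alternative worth considering is to put the scaler inside a `Pipeline`, so that during cross-validation it is refitted on the training part of each fold only. The sketch below assumes a hypothetical toy frame with two continuous columns and one binary column; the column names mirror the paintings data, but the values and the target are randomly generated.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy frame standing in for the paintings data
rng = np.random.RandomState(0)
toy = pd.DataFrame({
    "width": rng.uniform(5, 100, size=200),
    "height": rng.uniform(5, 100, size=200),
    "grad_ed": rng.randint(0, 2, size=200),
})
toy_target = 20 * toy["width"] + 10 * toy["height"] + rng.normal(0, 50, size=200)

# Standardize only the continuous columns; binary columns pass through unchanged
scale_continuous = ColumnTransformer(
    [("std", StandardScaler(), ["width", "height"])],
    remainder="passthrough",
)
pipeline = Pipeline([("scale", scale_continuous), ("model", Ridge())])

# Pipeline parameters are addressed as <step name>__<parameter name>
pipeline_search = GridSearchCV(
    pipeline, param_grid={"model__alpha": [0.1, 1, 10]}, cv=5, scoring="r2",
)
pipeline_search.fit(toy, toy_target)
print(pipeline_search.best_params_)
```

With this layout the unscaled frame is passed to `fit`, and the scaling statistics never see the validation fold.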


In [12]:

data_scaled.head(3)


Unnamed: 0,width,height,developed_country,subject_Animal,subject_Geometric,subject_Landscape,num_following,num_followed,grad_ed,artist_missing
0,-0.485574,-0.451106,1,0,0,0,-0.33812,-0.414939,0,0
1,0.500637,0.484355,1,0,0,0,-0.330366,-0.399213,0,0
2,0.500637,0.484355,1,0,0,0,-0.330366,-0.399213,0,0


In [13]:

summary = data_scaled.describe()
for column in summary: summary[column] = summary[column].apply(lambda x: round(x,1))
summary


Unnamed: 0,width,height,developed_country,subject_Animal,subject_Geometric,subject_Landscape,num_following,num_followed,grad_ed,artist_missing
count,24225.0,24225.0,24225.0,24225.0,24225.0,24225.0,24225.0,24225.0,24225.0,24225.0
mean,-0.0,0.0,0.7,0.1,0.0,0.1,0.0,0.0,0.1,0.0
std,1.0,1.0,0.5,0.2,0.1,0.3,1.0,1.0,0.3,0.1
min,-1.1,-1.1,0.0,0.0,0.0,0.0,-0.3,-0.4,0.0,0.0
25%,-0.5,-0.5,0.0,0.0,0.0,0.0,-0.3,-0.4,0.0,0.0
50%,-0.2,-0.1,1.0,0.0,0.0,0.0,-0.3,-0.3,0.0,0.0
75%,0.4,0.4,1.0,0.0,0.0,0.0,-0.1,-0.1,0.0,0.0
max,62.8,89.7,1.0,1.0,1.0,1.0,29.5,11.1,1.0,1.0


## Run continuous models using GridSearch

**Continuous variable: List price**

Models from: [Sklearn documentation](http://scikit-learn.org/stable/modules/classes.html#module-sklearn.linear_model)

In [14]:

from sklearn import linear_model
import numpy as np

models_dictionary = {
    'lasso': linear_model.Lasso(),
    'ridgeRegression': linear_model.Ridge(),
    'elasticNet': linear_model.ElasticNet(),
    'bayesianRegression': linear_model.BayesianRidge(),
    'linearRegression': linear_model.LinearRegression(),
    'sgdRegressor': linear_model.SGDRegressor()
    }

parameters_dictionary = {
    'lasso': { 'alpha': [10,1,0.5,0.1,0.01,0.001,0.0001] },
    'ridgeRegression': { 'alpha': [10,1,0.5,0.1,0.01,0.001,0.0001] },
    'elasticNet': { 'alpha': [10,1,0.5,0.1,0.01,0.001,0.0001] },
    'bayesianRegression': { 'n_iter': [100, 200, 300] },
    'linearRegression': { 'fit_intercept': [True,False] },
    'sgdRegressor': {'penalty': ['l1','l2'], 'alpha': [10,1,0.5,0.1,0.01,0.001,0.0001] }
    }


# With RandomizedSearchCV you can instead sample alpha from a distribution, for example a uniform one:
# param_distributions = {'alpha': sp_rand()}, where sp_rand is e.g. scipy.stats.uniform

grid_dictionary = {}
for m in models_dictionary:
    print(m)
    # Fit one grid search per model, scored by cross-validated R^2
    grid_dictionary[m] = GridSearchCV(models_dictionary[m], param_grid=parameters_dictionary[m],
                                      scoring='r2', cv=5)
    grid_dictionary[m].fit(data_scaled, continuous_target)


ridgeRegression
linearRegression
bayesianRegression
sgdRegressor
elasticNet
lasso


In [15]:

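for m in grid_dictionary:
    print("\nMODEL {} , score= {}".format(m, grid_dictionary[m].best_score_))
    print("----> Best parameter selected {}".format(grid_dictionary[m].best_estimator_))

# cv_results_ has the same keys layout for every search; here we list those of the last model
print("\n\nOther results {}".format(sorted(grid_dictionary[m].cv_results_.keys())))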



MODEL ridgeRegression , score= 0.0518977682103
----> Best parameter selected Ridge(alpha=10, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

MODEL linearRegression , score= 0.0516814986941
----> Best parameter selected LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

MODEL bayesianRegression , score= 0.0524122278944
----> Best parameter selected BayesianRidge(alpha_1=1e-06, alpha_2=1e-06, compute_score=False, copy_X=True,
       fit_intercept=True, lambda_1=1e-06, lambda_2=1e-06, n_iter=100,
       normalize=False, tol=0.001, verbose=False)

MODEL sgdRegressor , score= 0.120342065922
----> Best parameter selected SGDRegressor(alpha=1, average=False, epsilon=0.1, eta0=0.01,
       fit_intercept=True, l1_ratio=0.15, learning_rate='invscaling',
       loss='squared_loss', n_iter=5, penalty='l2', power_t=0.25,
       random_state=None, shuffle=True, verbose=0, warm_start=False)

MODEL elasticNet
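
Beyond the best score, `cv_results_` holds one entry per parameter combination. A minimal sketch of turning it into a ranked table, assuming synthetic data and a small `Ridge` grid rather than the models fitted above:

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic data, used only so the sketch is self-contained
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
search = GridSearchCV(Ridge(), param_grid={"alpha": [0.01, 1, 100]}, scoring="r2", cv=5)
search.fit(X, y)

# cv_results_ is a dict of arrays, so it converts directly to a DataFrame
results = pd.DataFrame(search.cv_results_)
columns = ["param_alpha", "mean_test_score", "std_test_score", "rank_test_score"]
ranking = results[columns].sort_values("rank_test_score")
print(ranking)
```

The first row of `ranking` corresponds to `best_params_` and `best_score_`.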

## Run classification models using GridSearch

**Binary variable: tier_3**

Models from: [Sklearn documentation](http://scikit-learn.org/stable/modules/classes.html#module-sklearn.linear_model)

In [17]:

from sklearn import linear_model, neural_network, svm, tree
import numpy as np

models_dictionary = {
    'logistic': linear_model.LogisticRegression(),
    'perceptron': linear_model.Perceptron(),
    'sgdClassifier': linear_model.SGDClassifier(),
    'mlpClassifier': neural_network.MLPClassifier(),
    'decisionTree': tree.DecisionTreeClassifier(),
    'supportVector': svm.SVC()
    }

parameters_dictionary = {
    'logistic': {'penalty':['l1','l2'], 'C':[100,10,0.1,0.01,0.001]},
    'perceptron': {'penalty':['l1','l2'], 'alpha':[10,0.1,0.01]},
    'sgdClassifier': {'penalty':['l1','l2'], 'alpha':[100,10,0.1,0.01,0.001],
                      'loss':['hinge', 'log', 'modified_huber']},
    'mlpClassifier': {'activation':['logistic','tanh','relu'],
                     'alpha':[100,10,0.1,0.01]},
    'supportVector': {'C':[10,0.1,0.01]},
    'decisionTree': {'max_depth':[10,20,30], 
                     # Integer division: ints are read as sample counts, floats as fractions
                     'min_samples_split':[len(classification_target)//10,len(classification_target)//100],
                     'min_samples_leaf':[len(classification_target)//10,len(classification_target)//100]}
    }


# With RandomizedSearchCV you can instead sample alpha from a distribution, for example a uniform one:
# param_distributions = {'alpha': sp_rand()}, where sp_rand is e.g. scipy.stats.uniform

grid_dictionary = {}
for m in models_dictionary:
    print(m)
    # Fit one grid search per model, scored by cross-validated ROC AUC
    grid_dictionary[m] = GridSearchCV(models_dictionary[m], param_grid=parameters_dictionary[m],
                                      scoring='roc_auc', cv=3)
    grid_dictionary[m].fit(data_scaled, classification_target)
    

supportVector
mlpClassifier
decisionTree
sgdClassifier
logistic
perceptron
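
The decision tree grid deserves a closer look, because `min_samples_split` and `min_samples_leaf` treat integers as sample counts and floats as fractions of the data. The sketch below shows the pattern on synthetic data from `make_classification` (an assumption made so the snippet is self-contained), using integer division to keep the values as counts.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data, used only so the sketch is self-contained
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

n = len(y)
tree_grid = {
    "max_depth": [5, 10],
    # // keeps the values integer, so they are read as counts (100 and 10 samples here)
    "min_samples_split": [n // 10, n // 100],
    "min_samples_leaf": [n // 10, n // 100],
}
tree_search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid=tree_grid, scoring="roc_auc", cv=3,
)
tree_search.fit(X, y)
print(tree_search.best_params_, tree_search.best_score_)
```

The grid has 2 x 2 x 2 = 8 combinations, each fitted on 3 folds.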


In [18]:

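for m in grid_dictionary:
    print("\nMODEL {} , score= {}".format(m, grid_dictionary[m].best_score_))
    print("----> Best parameter selected {}".format(grid_dictionary[m].best_estimator_))

# cv_results_ has the same keys layout for every search; here we list those of the last model
print("\n\nOther results {}".format(sorted(grid_dictionary[m].cv_results_.keys())))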



MODEL supportVector , score= 0.843667174574
----> Best parameter selected SVC(C=0.1, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

MODEL mlpClassifier , score= 0.849745554651
----> Best parameter selected MLPClassifier(activation='relu', alpha=0.01, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)

MODEL decisionTree , score= 0.839948668011
----> Best parameter selected DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=10,
            max_features=None, max_leaf_nodes=