## Hyperparameter Tuning

Load required packages

In [1]:
import numpy as np, pandas as pd, matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV

Load the dataset and obtain the response and predictors (the train/test split is left commented out)

In [2]:
from sklearn.datasets import load_breast_cancer, load_diabetes
cancer = load_breast_cancer()
print(cancer.DESCR)

X1 = pd.DataFrame(cancer.data, columns=cancer.feature_names)
y1 = cancer.target

# X1_train,X1_test,y1_train,y1_test = train_test_split(X1,y1,test_size=0.2,random_state=0)

.. _breast_cancer_dataset:

Breast cancer Wisconsin (diagnostic) dataset
--------------------------------------------

**Data Set Characteristics:**

:Number of Instances: 569

:Number of Attributes: 30 numeric, predictive attributes and the class

:Attribute Information:
    - radius (mean of distances from center to points on the perimeter)
    - texture (standard deviation of gray-scale values)
    - perimeter
    - area
    - smoothness (local variation in radius lengths)
    - compactness (perimeter^2 / area - 1.0)
    - concavity (severity of concave portions of the contour)
    - concave points (number of concave portions of the contour)
    - symmetry
    - fractal dimension ("coastline approximation" - 1)

    The mean, standard error, and "worst" or largest (mean of the three
    worst/largest values) of these features were computed for each image,
    resulting in 30 features.  For instance, field 0 is Mean Radius, field
    10 is Radius SE, field 20 is Worst Radius.

#### The breast cancer data is suited to a classification problem

In [3]:
from sklearn import svm
from sklearn.ensemble import RandomForestClassifier

model_params = {
    'svm': {
        'model': svm.SVC(gamma='auto'),
        'params' : {
            'C': np.arange(1,26),
            'kernel': ['rbf','linear']
        }
    },
    'random_forest': {
        'model': RandomForestClassifier(),
        'params' : {'n_estimators': np.arange(1, 26)}
    },
}
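
Before launching a search, it can help to count how many candidates a grid implies. The sketch below does this for the SVM grid with `ParameterGrid` from `sklearn.model_selection`. It is not part of the original notebook, and the name `svm_grid` is introduced here only for illustration.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Same SVM grid as in model_params above (svm_grid is an illustrative name).
svm_grid = {'C': np.arange(1, 26), 'kernel': ['rbf', 'linear']}

n_candidates = len(ParameterGrid(svm_grid))  # 25 values of C x 2 kernels
n_fits = n_candidates * 5                    # one cross-validation fit per candidate per fold (cv=5)
print(n_candidates, n_fits)                  # 50 250
```

An exhaustive grid search over this grid therefore runs 250 cross-validation fits for the SVM alone, which is the cost that random search trades away by sampling only a few candidates.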

Grid search

In [4]:
scores = []
for model_name, mp in model_params.items():
    # Exhaustive search: every parameter combination is scored with 5-fold cross-validation
    clf = GridSearchCV(mp['model'], mp['params'], cv=5, return_train_score=False)
    clf.fit(X1, y1)
    scores.append({
        'model': model_name,
        'best_score': clf.best_score_,
        'best_params': clf.best_params_
    })
df = pd.DataFrame(scores,columns=['model','best_score','best_params'])
print(df)

           model  best_score                    best_params
0            svm    0.956094  {'C': 25, 'kernel': 'linear'}
1  random_forest    0.963127           {'n_estimators': 21}
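
The `best_score_` values above are mean cross-validated accuracies computed on the full dataset, because the train/test split was left commented out. One possible way to also get an estimate on rows the search never saw is sketched below. The reduced grid and `random_state=0` are assumptions made here to keep the example short and repeatable; they are not from the original run.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

cancer = load_breast_cancer()
X1 = pd.DataFrame(cancer.data, columns=cancer.feature_names)
y1 = cancer.target

# Hold out 20% of the rows; the search only ever sees the training part.
X1_train, X1_test, y1_train, y1_test = train_test_split(
    X1, y1, test_size=0.2, random_state=0)

# Reduced grid (assumption, for speed); random_state fixed for repeatability.
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      {'n_estimators': [5, 15, 25]}, cv=5)
search.fit(X1_train, y1_train)

cv_accuracy = search.best_score_                # mean CV accuracy on the training part
test_accuracy = search.score(X1_test, y1_test)  # accuracy of the refit best model on held-out rows
print(search.best_params_, cv_accuracy, test_accuracy)
```

Comparing `cv_accuracy` with `test_accuracy` gives a rough check on whether the tuned parameters generalize beyond the folds used to select them.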


Random search

In [5]:
scores = []
for model_name, mp in model_params.items():
    # n_iter=2: only two randomly drawn candidates are evaluated per model
    clf = RandomizedSearchCV(mp['model'], mp['params'],
                             cv=5, return_train_score=False, n_iter=2)
    clf.fit(X1, y1)
    scores.append({
        'model': model_name,
        'best_score': clf.best_score_,
        'best_params': clf.best_params_
    })
df = pd.DataFrame(scores,columns=['model','best_score','best_params'])
print(df)

           model  best_score                    best_params
0            svm    0.950815  {'kernel': 'linear', 'C': 11}
1  random_forest    0.964835           {'n_estimators': 12}
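
With `n_iter=2`, each random search above evaluates only two of the available candidates, so the reported best parameters can change from run to run. The sketch below uses `ParameterSampler` to show what such a draw looks like. Here `random_state=0` is an assumption added for repeatability (it was not set in the original cells), and `svm_grid` is an illustrative name.

```python
import numpy as np
from sklearn.model_selection import ParameterSampler

# Same SVM grid as in model_params (svm_grid is an illustrative name).
svm_grid = {'C': np.arange(1, 26), 'kernel': ['rbf', 'linear']}

# Draw two candidates, mirroring n_iter=2 in RandomizedSearchCV.
# All values are lists/arrays (no distributions), so sampling is without replacement.
samples = list(ParameterSampler(svm_grid, n_iter=2, random_state=0))
for params in samples:
    print(params)
```

Passing the same `random_state` to `RandomizedSearchCV` makes its candidate draws repeatable in the same way.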


Load the dataset and obtain the response and predictors (the train/test split is left commented out)

In [6]:
diabetes = load_diabetes()
print(diabetes.DESCR)

X2 = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
y2 = diabetes.target

# X2_train,X2_test,y2_train,y2_test = train_test_split(X2,y2,test_size=0.2,random_state=0)

.. _diabetes_dataset:

Diabetes dataset
----------------

Ten baseline variables, age, sex, body mass index, average blood
pressure, and six blood serum measurements were obtained for each of n =
442 diabetes patients, as well as the response of interest, a
quantitative measure of disease progression one year after baseline.

**Data Set Characteristics:**

:Number of Instances: 442

:Number of Attributes: First 10 columns are numeric predictive values

:Target: Column 11 is a quantitative measure of disease progression one year after baseline

:Attribute Information:
    - age     age in years
    - sex
    - bmi     body mass index
    - bp      average blood pressure
    - s1      tc, total serum cholesterol
    - s2      ldl, low-density lipoproteins
    - s3      hdl, high-density lipoproteins
    - s4      tch, total cholesterol / HDL
    - s5      ltg, possibly log of serum triglycerides level
    - s6      glu, blood sugar level

Note: Each of these 10 feature variables have bee

### The diabetes data is suited to a regression problem

In [7]:
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor

model_params = {
    'svr': {
        'model': SVR(gamma='auto'),
        'params' : {
            'C': np.arange(1,26),
            'kernel': ['rbf','linear']
        }
    },
    'random_forest': {
        'model': RandomForestRegressor(),
        'params' : {'n_estimators': np.arange(1, 26)}
    },
}

Grid search

In [22]:
scores = []
for model_name, mp in model_params.items():
    # No scoring argument is given, so best_score_ is the mean cross-validated R^2
    clf = GridSearchCV(mp['model'], mp['params'], cv=5, return_train_score=False)
    clf.fit(X2, y2)
    scores.append({
        'model': model_name,
        'best_score': clf.best_score_,
        'best_params': clf.best_params_
    })
df = pd.DataFrame(scores,columns=['model','best_score','best_params'])
print(df)

           model  best_score                    best_params
0            svr    0.292371  {'C': 25, 'kernel': 'linear'}
1  random_forest    0.407903           {'n_estimators': 23}
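
The `best_score_` values for the regressors are R^2 scores: no `scoring` argument was passed, and R^2 is what a scikit-learn regressor's `score` method returns. If an error in the units of the target is preferred, a scorer can be named explicitly, as in the sketch below. The two-value grid is an assumption made here to keep the run short.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

X2, y2 = load_diabetes(return_X_y=True)

# scikit-learn scorers follow "higher is better", so MAE is reported negated.
search = GridSearchCV(SVR(gamma='auto'),
                      {'C': [1, 25], 'kernel': ['linear']},
                      cv=5, scoring='neg_mean_absolute_error')
search.fit(X2, y2)

best_mae = -search.best_score_  # flip the sign back to a plain mean absolute error
print(search.best_params_, best_mae)
```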


Random search

In [21]:
scores = []
for model_name, mp in model_params.items():
    # n_iter=2: only two randomly drawn candidates are evaluated per model
    clf = RandomizedSearchCV(mp['model'], mp['params'],
                             cv=5, return_train_score=False, n_iter=2)
    clf.fit(X2, y2)
    scores.append({
        'model': model_name,
        'best_score': clf.best_score_,
        'best_params': clf.best_params_
    })
df = pd.DataFrame(scores,columns=['model','best_score','best_params'])
print(df)

           model  best_score                    best_params
0            svr    0.260210  {'kernel': 'linear', 'C': 21}
1  random_forest    0.357232            {'n_estimators': 8}
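
`RandomizedSearchCV` also accepts distributions (any object with an `rvs` method) in place of lists, so a parameter such as `C` does not have to be limited to the integers used above. As a hedged sketch, `C` below is drawn from `scipy.stats.loguniform` over the same range of 1 to 25. The choice of distribution, `n_iter=5`, and `random_state=0` are assumptions for illustration and were not part of the original runs.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_diabetes
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVR

X2, y2 = load_diabetes(return_X_y=True)

param_distributions = {
    'C': loguniform(1, 25),       # continuous, log-uniform on [1, 25]
    'kernel': ['rbf', 'linear'],  # lists are sampled uniformly
}
search = RandomizedSearchCV(SVR(gamma='auto'), param_distributions,
                            n_iter=5, cv=5, random_state=0)
search.fit(X2, y2)
print(search.best_params_, search.best_score_)
```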
