### Hyperparameter tuning with GridSearchCV
Like the alpha parameter of lasso and ridge regression, logistic regression also has a regularization parameter: __C__.

C is the inverse of the regularization strength. A large C means weaker regularization and can lead to an overfit model, while a small C means stronger regularization and can lead to an underfit model.

The hyperparameter space for C has been set up for you. Your job is to use GridSearchCV and logistic regression to find the optimal C in this hyperparameter space. The feature array is available as X and the target variable array is available as y.
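To make the effect of this parameter concrete, here is a minimal sketch that fits logistic regression at three values of scikit-learn's `C` argument and compares the size of the learned coefficients. The synthetic data from `make_classification`, the names `X_demo`, `y_demo` and `coef_norms`, and the chosen values are assumptions for illustration only; they are not the diabetes data used in this exercise.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data, assumed purely for illustration (not the diabetes dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

coef_norms = {}
for C in (0.001, 1.0, 1000.0):
    # Smaller C = stronger L2 penalty on the coefficients
    model = LogisticRegression(C=C, max_iter=1000).fit(X_demo, y_demo)
    coef_norms[C] = float(np.linalg.norm(model.coef_))
    print(f"C={C:g}: coefficient norm = {coef_norms[C]:.3f}")
```

With the default L2 penalty, the coefficient norm should be smallest for the smallest value, which is the shrinkage that can cause underfitting when pushed too far.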

Note that the data is not split into training and test sets here, because the focus is on setting up the hyperparameter grid and performing grid-search cross-validation. In practice, you should hold out a portion of the data for evaluation purposes.
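For reference, holding out an evaluation set could look roughly like the sketch below: the grid search only sees the training portion, and the held-out portion is scored once at the end. The synthetic data, the 70/30 split, the smaller grid, and all variable names are assumptions for illustration, not part of this exercise.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (an assumption; the exercise uses the diabetes data)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=42)

# Hold out 30% of the rows; the grid search never sees them
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

# Tune C with 5-fold cross-validation on the training portion only
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    {'C': np.logspace(-3, 3, 7)},
    cv=5,
)
search.fit(X_train, y_train)

# Score the refit best model on the held-out rows
holdout_accuracy = search.score(X_test, y_test)
print("Best C:", search.best_params_['C'])
print("Hold-out accuracy:", holdout_accuracy)
```

Because `GridSearchCV` refits the best estimator on the full training portion by default, `search.score` evaluates that refit model on data that played no role in choosing C.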

In [14]:
# Import necessary modules
import pandas as pd
import numpy as np

from sklearn.linear_model import LogisticRegression

#from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV

In [15]:
column_names = ['pregnancies', 'glucose', 'diastolic', 'triceps', 'insulin', 'bmi',
       'dpf', 'age', 'diabetes']

df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data',
                 names = column_names)

In [16]:
# Note the use of .drop() to drop the target variable 'diabetes' from the feature array X, as well as the use of the
# .values attribute to ensure X and y are NumPy arrays. Without .values, X and y are a DataFrame and a Series
# respectively; the scikit-learn API also accepts them in that form, as long as they have the right shape.

# Build the feature array X and the target array y
X, y = df.drop('diabetes', axis=1).values, df['diabetes'].values
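The point made in the comments above, that scikit-learn accepts both NumPy arrays and pandas objects, can be checked with a small sketch. The DataFrame `demo_df`, its column names, and the synthetic data behind it are hypothetical and exist only for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical toy DataFrame with a 'target' column (not the diabetes data)
features, labels = make_classification(n_samples=100, n_features=4, random_state=1)
demo_df = pd.DataFrame(features, columns=['f1', 'f2', 'f3', 'f4'])
demo_df['target'] = labels

# pandas objects: a DataFrame of features and a Series of labels
X_frame, y_series = demo_df.drop('target', axis=1), demo_df['target']

# NumPy arrays via .values, as in the cell above
X_array, y_array = X_frame.values, y_series.values
print(type(X_frame).__name__, type(X_array).__name__)

# Fit one model on each form and compare the predictions
pred_from_frame = LogisticRegression(max_iter=1000).fit(X_frame, y_series).predict(X_frame)
pred_from_array = LogisticRegression(max_iter=1000).fit(X_array, y_array).predict(X_array)
same_predictions = np.array_equal(pred_from_frame, pred_from_array)
print("Same predictions:", same_predictions)
```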

In [17]:
# Set up the hyperparameter grid
c_space = np.logspace(-5, 8, 15)
param_grid = {'C': c_space}

# Instantiate a logistic regression classifier: logreg
logreg = LogisticRegression()

# Instantiate the GridSearchCV object: logreg_cv
logreg_cv = GridSearchCV(logreg, param_grid, cv=5)

# Fit it to the data
logreg_cv.fit(X, y)

# Print the tuned parameters and score
print("Tuned Logistic Regression Parameters: {}".format(logreg_cv.best_params_)) 
print("Best score is {}".format(logreg_cv.best_score_))


Tuned Logistic Regression Parameters: {'C': 268.26957952797272}
Best score is 0.7708333333333334
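The tuned value looks oddly specific because the search can only return one of the 15 candidates produced by `np.logspace(-5, 8, 15)`, which are evenly spaced in the exponent between 1e-5 and 1e8. The short sketch below lists the candidates and locates the one closest to the value printed above; the names `reported_best_c` and `best_index` are introduced here for illustration only.

```python
import numpy as np

# Same grid as in the cell above: 15 values from 1e-5 to 1e8, log-spaced
c_space = np.logspace(-5, 8, 15)
for i, c in enumerate(c_space):
    print(f"{i:2d}: C = {c:.6g}")

# Value copied from the printed output above
reported_best_c = 268.26957952797272

# Find the grid point closest to the reported best C
best_index = int(np.argmin(np.abs(c_space - reported_best_c)))
print("Closest grid index:", best_index)
print("Matches a grid point:", bool(np.isclose(c_space[best_index], reported_best_c)))
```

Since neighbouring candidates differ by almost a factor of ten, a finer grid around the selected value would be a reasonable next step if a more precise C were needed.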
