<a href="https://colab.research.google.com/github/marcelounb/Deep_Learning_with_python_JasonBrownlee/blob/master/09_2_Kfold_SciPy_Pima_Indians_Onset_of_Diabetes_Dataset.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 9.2 Evaluate Models with Cross Validation
 patient medical record data for Pima Indians and whether they had an onset of diabetes within ﬁve years. It is a binary classiﬁcation problem (onset of diabetes as 1 or not as 0). The input variables that describe each patient are numerical and have varying scales. Below lists the eight attributes for the dataset:
1. Number of times pregnant.
2. Plasma glucose concentration a 2 hours in an oral glucose tolerance test.
3. Diastolic blood pressure (mm Hg).
4. Triceps skin fold thickness (mm).
5. 2-Hour serum insulin (mu U/ml).
6. Body mass index.
7. Diabetes pedigree function.
8. Age (years).
9. Class, onset of diabetes within ﬁve years.

Given that all attributes are numerical makes it easy to use directly with neural networks that expect numerical inputs and output values, and ideal for our ﬁrst neural network in Keras. This dataset will also be used for a number of additional lessons coming up in this book, so keep it handy. below is a sample of the dataset showing the ﬁrst 5 rows of the 768 instances:

6,148,72,35,0,33.6,0.627,50,1 

1,85,66,29,0,26.6,0.351,31,0 

8,183,64,0,0,23.3,0.672,32,1 

1,89,66,23,94,28.1,0.167,21,0 

0,137,40,35,168,43.1,2.288,33,1


# Loading Data

In [1]:
from keras.models import Sequential 
from keras.layers import Dense 
import numpy as np
# fix random seed for reproducibility 
seed = 7 
np.random.seed(seed)

Using TensorFlow backend.


In [0]:
# load pima indians dataset 
dataset = np.loadtxt("pima-indians-diabetes.csv", delimiter=",") 
# split into input (X) and output (Y) variables 
X = dataset[:,0:8] 
Y = dataset[:,8]

In [3]:
X.shape

(768, 8)

In [4]:
Y.shape

(768,)

In [0]:
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import cross_val_score

In [0]:
# Function to create model, required for KerasClassifier
def create_model():
  # create model 
  model = Sequential() 
  model.add(Dense(12, input_dim=8, kernel_initializer= 'uniform' , activation= 'relu' )) 
  model.add(Dense(8, kernel_initializer= 'uniform' , activation= 'relu' )) 
  model.add(Dense(1, kernel_initializer= 'uniform' , activation= 'sigmoid' ))
  # Compile model 
  model.compile(loss= 'binary_crossentropy' , optimizer= 'adam' , metrics=[ 'accuracy' ])
  return model

In [0]:
# create model 
model = KerasClassifier(build_fn=create_model, nb_epoch=150, batch_size=10, verbose=0) 

In [8]:
# evaluate using 10-fold cross validation 
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed) 
results = cross_val_score(model, X, Y, cv=kfold) 
print(results.mean())


0.6510594606399536


Running the example displays the skill of the model for each epoch. A total of 10 models are created and evaluated and the ﬁnal average accuracy is displayed.

You can see that when the Keras model is wrapped that estimating model accuracy can be greatly streamlined, compared to the manual enumeration of cross validation folds performed in the previous lesson.
