

**Pima Indians Onset of Diabetes**
- This is a binary classification problem: the class is 1 for onset of diabetes and 0 otherwise.
- The dataset has eight input attributes followed by the class label:
  - Number of times pregnant.
  - Plasma glucose concentration at 2 hours in an oral glucose tolerance test.
  - Diastolic blood pressure (mm Hg).
  - Triceps skin fold thickness (mm).
  - 2-Hour serum insulin (mu U/ml).
  - Body mass index.
  - Diabetes pedigree function.
  - Age (years).
  - Class: onset of diabetes within five years (1) or not (0).
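
As a minimal sketch of how one record in this layout separates into inputs and label, the snippet below splits a nine-value row into its eight features and the class. The short column names and the example row are hypothetical, chosen only for illustration; they do not come from the dataset file.

```python
# Column order follows the attribute list above; the short names are
# illustrative labels, not identifiers taken from the dataset file.
COLUMNS = [
    "pregnancies", "glucose", "blood_pressure", "skin_thickness",
    "insulin", "bmi", "pedigree", "age", "outcome",
]

def split_row(row):
    """Split one 9-value record into (8 input features, integer class label)."""
    if len(row) != len(COLUMNS):
        raise ValueError("expected %d values, got %d" % (len(COLUMNS), len(row)))
    return row[:8], int(row[8])

# Hypothetical record, made up for illustration only.
example_row = [2.0, 120.0, 70.0, 25.0, 80.0, 28.5, 0.45, 33.0, 0.0]
features, label = split_row(example_row)
print(dict(zip(COLUMNS[:8], features)), label)
```

The same slicing appears in the notebook as `dataset[:,0:8]` for the inputs and `dataset[:,8]` for the class.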

In [1]:
'''
Machine learning algorithms that rely on a stochastic process (e.g. random numbers)
should have the random number generator initialized with a fixed seed value,
so that running the same code again gives the same result.
'''
from keras.models import Sequential
from keras.layers import Dense
import numpy as np
# fix random seed for reproducibility
np.random.seed(7)
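
The effect of a fixed seed can be sketched without any numerical library. The snippet below uses Python's standard `random` module as a stand-in for NumPy's generator (an assumption made to keep the example dependency-free); the helper name `draw` is hypothetical. Note that seeding NumPy alone may not make Keras training fully reproducible, since the backend can use its own random number generator.

```python
import random

def draw(seed, n=3):
    """Return n pseudo-random floats from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

first = draw(7)
second = draw(7)
other = draw(8)

print(first == second)  # same seed: the two sequences are compared
print(first == other)   # different seed: the two sequences are compared
```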

In [2]:
# Load data
url = ('https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv')
dataset = np.loadtxt(url, delimiter=',')
# split into input and output variables
X = dataset[:,0:8]
Y = dataset[:,8]

In [None]:
# Define Model
# First, make sure the input layer expects the right number of inputs.
# Set this on the first layer with the input_dim argument: 8 for the 8 input variables.
# Heuristics exist for choosing the number and size of layers, but the best network structure is usually found
# through trial-and-error experimentation.
# In general, the network must be large enough to capture the structure of the problem. This example uses a
# fully connected network structure with three layers.
# Fully connected layers are defined using the Dense class.

In [4]:
# Create model
model = Sequential()
model.add(Dense(12,input_dim=8,activation='relu'))
model.add(Dense(8,activation='relu'))
model.add(Dense(1,activation='sigmoid'))
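
To get a feel for the size of this 8-12-8-1 network, the sketch below counts its trainable parameters by hand, assuming each Dense layer has one weight per input-unit pair plus one bias per unit. The helper name `dense_params` is hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Input width followed by the width of each Dense layer in the model above.
layer_sizes = [8, 12, 8, 1]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer, total)  # [108, 104, 9] 221
```

Calling `model.summary()` on the Keras model should report per-layer counts that can be compared against these numbers.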

In [7]:
# Compile Model
# Compiling the model uses the efficient numerical libraries of the backend (such as Theano or TensorFlow).
# Training a network means finding the set of weights that makes the best predictions for this problem.
# We must specify the loss function used to evaluate a set of weights, the optimizer used to search through different weights
# for the network, and any optional metrics we would like to collect and report during training.
# Here we use logarithmic loss, which Keras defines for binary classification problems as binary_crossentropy.
# We also use the adam gradient descent algorithm, simply because it is an efficient default.

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
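
What the logarithmic loss measures can be sketched in plain Python. This is an illustrative re-implementation of the formula, not the Keras code; the clipping constant `eps` and the toy labels and predictions are assumptions.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

labels = [1, 0, 1, 0]                # toy labels
confident = [0.9, 0.1, 0.8, 0.2]     # mostly right, high confidence
unsure = [0.5, 0.5, 0.5, 0.5]        # no information
wrong = [0.1, 0.9, 0.2, 0.8]         # confidently wrong

loss_confident = binary_crossentropy(labels, confident)
loss_unsure = binary_crossentropy(labels, unsure)
loss_wrong = binary_crossentropy(labels, wrong)
print(loss_confident, loss_unsure, loss_wrong)
```

The loss grows as predicted probabilities move away from the true labels, which is what the optimizer tries to reduce during training.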

In [None]:
# Fit the model
# Training runs for a fixed number of passes over the dataset, set with the epochs argument.
# The batch_size argument sets the number of instances evaluated before each weight update in the network.
model.fit(X, Y, epochs=150, batch_size=10)
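
The arithmetic behind `epochs=150` and `batch_size=10` can be worked out directly. The sketch assumes the dataset has 768 rows (the commonly cited size of this dataset) and that the final, smaller batch of each epoch is still used for an update.

```python
import math

n_samples = 768   # assumed number of rows in the dataset
batch_size = 10
epochs = 150

# Batches (and therefore weight updates) in one pass over the data.
updates_per_epoch = math.ceil(n_samples / batch_size)
# Size of the final, partial batch in each epoch.
last_batch = n_samples - (updates_per_epoch - 1) * batch_size
total_updates = updates_per_epoch * epochs

print(updates_per_epoch, last_batch, total_updates)  # 77 8 11550
```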

In [None]:
# Evaluate Model
# We can evaluate the performance of the network on the same dataset it was trained on. This only shows how well we have
# modeled that dataset (i.e. train accuracy); it says nothing about how well the algorithm might perform on new data.

scores = model.evaluate(X, Y)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
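
The accuracy metric reported above can be sketched by hand: threshold each sigmoid output and count the matches. The 0.5 threshold and the toy labels and probabilities are assumptions for illustration; the helper name `accuracy` is hypothetical.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    predictions = [1 if p > threshold else 0 for p in y_prob]
    correct = sum(1 for y, y_hat in zip(y_true, predictions) if y == y_hat)
    return correct / len(y_true)

labels = [1, 0, 1, 1, 0]             # toy labels
probs = [0.8, 0.3, 0.4, 0.9, 0.6]    # toy sigmoid outputs
acc = accuracy(labels, probs)

print("accuracy: %.2f%%" % (acc * 100))  # accuracy: 60.00%
```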

In [None]:
# Use an Automatic Verification Dataset
from keras.models import Sequential
from keras.layers import Dense
import numpy

numpy.random.seed(7)
url = ('https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv')
dataset = numpy.loadtxt(url, delimiter=',')
X = dataset[:,0:8]
Y = dataset[:,8]

model = Sequential()
model.add(Dense(12,input_dim=8, activation='relu'))
model.add(Dense(8,activation='relu'))
model.add(Dense(1,activation = 'sigmoid'))

model.compile(loss='binary_crossentropy', optimizer = 'adam', metrics=['accuracy'])
model.fit(X, Y, validation_split=0.33, epochs=150, batch_size=10)
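
According to the Keras documentation, `validation_split` holds out the last fraction of the provided samples, taken before any shuffling. The sketch below mimics that slicing on indices only; the exact rounding rule and the 768-row dataset size are assumptions, and `tail_split` is a hypothetical helper name.

```python
import math

def tail_split(n_samples, validation_split):
    """Return (train indices, validation indices), validation taken from the end."""
    split_at = int(math.floor(n_samples * (1.0 - validation_split)))
    return list(range(split_at)), list(range(split_at, n_samples))

train_idx, val_idx = tail_split(768, 0.33)
print(len(train_idx), len(val_idx))  # 514 254
print(val_idx[0], val_idx[-1])       # 514 767
```

Because the held-out rows come from the end of the file, any ordering in the data carries over into the validation set.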

In [None]:
# Use a Manual Verification Dataset
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split
import numpy

numpy.random.seed(7)
url = ('https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv')
dataset = numpy.loadtxt(url, delimiter=',')
X = dataset[:,0:8]
Y = dataset[:,8]

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=7)

model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8,activation='relu'))
model.add(Dense(1,activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=150, batch_size=10)

In [None]:
# Manual k-fold Cross-Validation
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import StratifiedKFold
import numpy

numpy.random.seed(7)
url = ('https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv')
dataset = numpy.loadtxt(url, delimiter=',')
X = dataset[:,0:8]
Y = dataset[:,8]
# define 10-fold cross validation test harness
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
cvscores = []
for train, test in kfold.split(X, Y):
  # create a fresh model for every fold so that no weights carry over between folds
  model = Sequential()
  model.add(Dense(12, input_dim=8, activation='relu'))
  model.add(Dense(8, activation='relu'))
  model.add(Dense(1, activation='sigmoid'))
  model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
  model.fit(X[train], Y[train], epochs=150, batch_size=10, verbose=0)

  #evaluate the model
  scores = model.evaluate(X[test], Y[test], verbose=0)
  print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
  cvscores.append(scores[1]*100)

print("%.2f%% (+/- %.2f%%)" % (numpy.mean(cvscores), numpy.std(cvscores)))
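
The idea behind stratified folds can be sketched in plain Python: deal the indices of each class out across the folds so that every fold keeps roughly the overall class ratio. This is a simplified illustration, not scikit-learn's algorithm (it does no shuffling), and the helper names and toy labels are hypothetical.

```python
from collections import defaultdict

def stratified_folds(labels, n_splits):
    """Assign each index to a fold, dealing every class out round-robin."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_splits)]
    position = 0
    for label in sorted(by_class):
        for index in by_class[label]:
            folds[position % n_splits].append(index)
            position += 1
    return folds

def splits(labels, n_splits):
    """Yield (train indices, test indices), using each fold once as the test set."""
    folds = stratified_folds(labels, n_splits)
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield sorted(train), sorted(test)

toy_labels = [0] * 12 + [1] * 6   # toy labels with a 2:1 class ratio
all_splits = list(splits(toy_labels, 3))
for train, test in all_splits:
    positives = sum(toy_labels[i] for i in test)
    print(len(train), len(test), positives)  # 12 6 2 for each fold
```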

In [None]:
# Evaluate a Neural Network using scikit-learn
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import cross_val_score
import numpy

numpy.random.seed(7)

# Function to create model, required for KerasClassifier
def create_model():
  model = Sequential()
  model.add(Dense(12, input_dim=8, activation='relu'))
  model.add(Dense(8, activation='relu'))
  model.add(Dense(1, activation='sigmoid'))
  model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
  return model

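# define the data source in this cell so that it runs on its own
url = ('https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv')
dataset = numpy.loadtxt(url, delimiter=',')
X = dataset[:,0:8]
Y = dataset[:,8]

# create model
model = KerasClassifier(build_fn=create_model, epochs=150, batch_size=10, verbose=0)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)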
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

In [14]:
# MLP for Pima Indians Dataset with grid search via sklearn
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV
import numpy

# Function to create model, required for KerasClassifier
def create_model(optimizer='rmsprop', init='glorot_uniform'):
  model = Sequential()
  model.add(Dense(12, input_dim=8, kernel_initializer=init, activation='relu'))
  model.add(Dense(8, kernel_initializer=init, activation='relu'))
  model.add(Dense(1,kernel_initializer=init, activation='sigmoid'))
  model.compile(loss='binary_crossentropy', optimizer = optimizer, metrics=['accuracy'])
  return model

seed = 7
numpy.random.seed(seed)
url = ('https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv')
dataset = numpy.loadtxt(url, delimiter=',')
X = dataset[:,0:8]
Y = dataset[:,8]

model = KerasClassifier(build_fn=create_model, verbose=0)
# grid search epochs, batch size and optimizer
optimizers = ['rmsprop', 'adam']
inits = ['glorot_uniform', 'normal', 'uniform']
epochs = [50, 100, 150]
batches = [5, 10, 20]
param_grid = dict(optimizer = optimizers, epochs=epochs, batch_size=batches, init=inits)
grid = GridSearchCV(estimator = model, param_grid=param_grid)
grid_result = grid.fit(X, Y)
# summarize results
print('Best: %f using %s' % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
  print('%f (%f) with: %r' % (mean, stdev, param))
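
The cost of this grid search follows from the size of the grid. The sketch below enumerates the same parameter grid with `itertools.product`; the number of cross-validation folds is an assumption, since `GridSearchCV` is called without a `cv` argument above and the default depends on the installed scikit-learn version. The helper name `total_fits` is hypothetical.

```python
from itertools import product

param_grid = {
    "optimizer": ["rmsprop", "adam"],
    "init": ["glorot_uniform", "normal", "uniform"],
    "epochs": [50, 100, 150],
    "batch_size": [5, 10, 20],
}

# Enumerate every combination, iterating over parameter names in sorted order.
names = sorted(param_grid)
combinations = [
    dict(zip(names, values))
    for values in product(*(param_grid[name] for name in names))
]

def total_fits(n_combinations, n_folds):
    """Each combination is trained once per cross-validation fold."""
    return n_combinations * n_folds

print(len(combinations))                 # 54
print(combinations[0])
print(total_fits(len(combinations), 5))  # 270, assuming 5 folds
```

With up to 150 epochs per fit, the search trains several hundred networks, so expect it to take a while.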


Best: 0.760513 using {'batch_size': 5, 'epochs': 150, 'init': 'uniform', 'optimizer': 'adam'}
0.696690 (0.039341) with: {'batch_size': 5, 'epochs': 50, 'init': 'glorot_uniform', 'optimizer': 'rmsprop'}
0.724056 (0.052683) with: {'batch_size': 5, 'epochs': 50, 'init': 'glorot_uniform', 'optimizer': 'adam'}
0.712240 (0.027752) with: {'batch_size': 5, 'epochs': 50, 'init': 'normal', 'optimizer': 'rmsprop'}
0.710975 (0.051658) with: {'batch_size': 5, 'epochs': 50, 'init': 'normal', 'optimizer': 'adam'}
0.704499 (0.030901) with: {'batch_size': 5, 'epochs': 50, 'init': 'uniform', 'optimizer': 'rmsprop'}
0.679696 (0.028185) with: {'batch_size': 5, 'epochs': 50, 'init': 'uniform', 'optimizer': 'adam'}
0.688787 (0.015359) with: {'batch_size': 5, 'epochs': 100, 'init': 'glorot_uniform', 'optimizer': 'rmsprop'}
0.746202 (0.040712) with: {'batch_size': 5, 'epochs': 100, 'init': 'glorot_uniform', 'optimizer': 'adam'}
0.739615 (0.035654) with: {'batch_size': 5, 'epochs': 100, 'init': 'normal', 'opti