# Predict the onset of diabetes using the Pima Indians dataset

- Source 1: https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
- Source 2: https://machinelearningmastery.com/5-step-life-cycle-neural-network-models-keras/
- Dataset: https://archive.ics.uci.edu/ml/datasets/diabetes

# 1. Setup

In [1]:
! conda install keras -y

In [2]:
! conda install tensorflow -y

In [3]:
# imports
from numpy import loadtxt
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense

In [4]:
# data source
# url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv'
url = 'pima-indians-diabetes.data.csv'

In [5]:
# load the dataset
dataset = loadtxt(url, delimiter=',')

In [6]:
# without numpy
df = pd.read_csv(url, header=None)
df.columns = ['pregnancies', 'plasma glucose', 'blood pressure', 
              'skin fold', 'insulin', 'BMI', 'pedigree', 'age', 'onset']
df.shape

(768, 9)

In [7]:
# what the dataset looks like
df.head()

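| | pregnancies | plasma glucose | blood pressure | skin fold | insulin | BMI | pedigree | age | onset |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |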


In [8]:
# split into input (X) and output (y) variables
X = dataset[:,0:8]
y = dataset[:,8]

In [9]:
# split into 67% for train and 33% for test
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
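
As a quick sanity check on the split sizes: to my understanding, when `test_size` is a float, scikit-learn rounds the test count up and gives the remaining rows to training. The sketch below reproduces that arithmetic for the 768 rows loaded above. The helper name `split_sizes` is hypothetical and is not part of scikit-learn.

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test count is rounded up and training gets the rest.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test

n_train, n_test = split_sizes(768, 0.33)
print(n_train, n_test)  # 514 254
```

The 254 test rows agree with the total support shown in the classification report in section 6.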

# 2. Define Keras Model

In [10]:
# keras has two APIs: Sequential and Functional.
model = Sequential()



In [11]:
# first hidden layer: 12 units; input_dim=8 declares the 8 input features
model.add(Dense(12, input_dim=8, activation='relu'))

In [12]:
# second hidden layer: 16 units
model.add(Dense(16, activation='relu'))

In [13]:
# output layer
model.add(Dense(1, activation='sigmoid'))
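
To get a feel for the size of this network, the sketch below counts its trainable parameters by hand. Each `Dense` layer holds one weight per input-unit pair plus one bias per unit. The helper `dense_params` is a hypothetical name used only for illustration; the totals should agree with what `model.summary()` reports.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# 8 input features -> Dense(12) -> Dense(16) -> Dense(1)
layer_sizes = [8, 12, 16, 1]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(per_layer, sum(per_layer))  # [108, 208, 17] 333
```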

# 3. Compile Keras Model

In [14]:
# compile the keras model
model.compile(loss='binary_crossentropy', 
              optimizer='adam', 
              metrics=['accuracy']
             )
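
For intuition about the `binary_crossentropy` loss, here is a minimal pure-Python sketch of the formula, averaged over samples. It is not the Keras implementation; the function name and the clipping constant `eps` are assumptions chosen for the illustration.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all samples."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# Confident and correct predictions give a small loss ...
confident = binary_crossentropy([1, 0], [0.9, 0.1])
# ... while confident and wrong predictions are penalised heavily.
wrong = binary_crossentropy([1, 0], [0.1, 0.9])
print(round(confident, 4), round(wrong, 4))  # 0.1054 2.3026
```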

# 4. Fit the model

In [15]:
# fit the keras model on the dataset

model.fit(X_train, 
          y_train, 
          validation_data=(X_test,y_test), 
          epochs=150, 
          batch_size=10)



Epoch 1/150
Epoch 2/150
...
Epoch 78

<tensorflow.python.keras.callbacks.History at 0x7f9d11c02490>
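
To put `epochs=150` and `batch_size=10` in perspective, the sketch below estimates how many weight updates the call to `fit` performs. It assumes 514 training rows (768 rows minus the 254 held out for testing, a figure inferred from the split rather than printed in the notebook) and that the final, smaller batch of each epoch is still used.

```python
import math

n_train = 514     # assumed: 768 rows minus the 254 test rows
batch_size = 10
epochs = 150

# One weight update per batch; the last batch of an epoch may be smaller.
steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, total_updates)  # 52 7800
```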

# 5. Make Predictions


In [33]:
# make probability predictions with the model
y_probs = model.predict(X_test)
# round predictions 
rounded = [round(x[0]) for x in y_probs]
rounded[:10]

[1, 0, 0, 0, 0, 1, 0, 0, 1, 1]

In [34]:
# draw 10 predicted probabilities at random (with replacement)
import numpy as np
probs = y_probs.reshape(-1)
np.random.choice(probs, size=10)

array([0.22760728, 0.37395865, 0.35994267, 0.7683891 , 0.5619863 ,
       0.25141233, 0.6586236 , 0.60772157, 0.37480146, 0.5865148 ],
      dtype=float32)

In [35]:
# make class predictions with the model
y_preds = (model.predict(X_test) > 0.5).astype(int)
print(y_preds[:10].tolist())
print(y_test[:10])

[[1], [0], [0], [0], [0], [1], [0], [0], [1], [1]]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
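
The step from probabilities to class labels can be sketched without the model. The example below applies the same 0.5 threshold to the ten probabilities sampled above; the helper `to_class` and the alternative threshold of 0.6 are illustrative assumptions, not part of the notebook.

```python
# The ten probabilities sampled in the cell above
sampled_probs = [0.22760728, 0.37395865, 0.35994267, 0.7683891, 0.5619863,
                 0.25141233, 0.6586236, 0.60772157, 0.37480146, 0.5865148]

def to_class(prob, threshold=0.5):
    """Label a probability as 1 (onset) if it exceeds the threshold, else 0."""
    return int(prob > threshold)

labels = [to_class(p) for p in sampled_probs]
print(labels)  # [0, 0, 0, 1, 1, 0, 1, 1, 0, 1]

# A stricter threshold turns borderline positives into negatives.
strict = [to_class(p, threshold=0.6) for p in sampled_probs]
print(strict)  # [0, 0, 0, 1, 0, 0, 1, 1, 0, 0]
```

Raising the threshold trades recall for precision on the positive class, which may matter when false alarms are costly.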


# 6. Evaluate

In [36]:
# evaluate the keras model
_, accuracy = model.evaluate(X_test, y_test)
print('Accuracy: %.2f' % (accuracy*100))

Accuracy: 73.62


#### How to read the classification report
- Accuracy: The percentage of all predictions that were correct.
- Precision: Of the observations predicted to belong to a class, the percentage that actually belong to it.
- Recall: Of the observations that actually belong to a class, the percentage that the model predicted correctly.
- F1 Score: The harmonic mean of precision and recall, 2 * (precision * recall) / (precision + recall).
- Support: The number of occurrences of each class in y_test (i.e., how many observations belonged to each class in the test dataset).
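
The definitions above can be sketched in a few lines of plain Python. The labels below are a made-up toy example, not taken from the diabetes data, and `binary_report` is a hypothetical helper that scores the positive class (1) only.

```python
def binary_report(y_true, y_pred):
    """Accuracy, precision, recall, F1 and support for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "support": tp + fn,  # number of actual positives
    }

# Toy labels: 3 actual positives, 5 actual negatives
y_true_toy = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred_toy = [1, 1, 0, 1, 0, 0, 0, 0]
report = binary_report(y_true_toy, y_pred_toy)
print(report)
```

Here two of the three actual positives are found and two of the three positive predictions are correct, so precision, recall, and F1 all equal 2/3, while accuracy is 6/8.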

In [37]:
from sklearn import metrics
# Evaluate the model
print(metrics.classification_report(y_test, y_preds))

              precision    recall  f1-score   support

         0.0       0.82      0.77      0.79       168
         1.0       0.60      0.67      0.63        86

    accuracy                           0.74       254
   macro avg       0.71      0.72      0.71       254
weighted avg       0.75      0.74      0.74       254
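
The `macro avg` and `weighted avg` rows can be approximately reproduced from the per-class rows. The sketch below uses the rounded values printed in the report, so it is only a check on how the averages are formed; scikit-learn works from unrounded values, and small differences are possible in general.

```python
# Per-class values copied from the report above (already rounded)
per_class = {
    "0.0": {"precision": 0.82, "recall": 0.77, "f1": 0.79, "support": 168},
    "1.0": {"precision": 0.60, "recall": 0.67, "f1": 0.63, "support": 86},
}

def macro_avg(metric):
    """Unweighted mean over classes."""
    values = [row[metric] for row in per_class.values()]
    return sum(values) / len(values)

def weighted_avg(metric):
    """Mean over classes, weighted by each class's support."""
    total = sum(row["support"] for row in per_class.values())
    return sum(row[metric] * row["support"] for row in per_class.values()) / total

for metric in ("precision", "recall", "f1"):
    print(metric, round(macro_avg(metric), 2), round(weighted_avg(metric), 2))
# precision 0.71 0.75
# recall 0.72 0.74
# f1 0.71 0.74
```

Because class 0 has roughly twice the support of class 1, the weighted averages sit closer to the class 0 scores than the macro averages do.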



# 7. Save the model

In [20]:
# save the architecture, weights, and optimizer state in HDF5 format
model.save("diabetes-model-1.h5")

In [21]:
# load model
from keras.models import load_model
model2 = load_model("diabetes-model-1.h5")