# **MLP - Pima Diabetes**

**Context**

This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective of the dataset is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.

**Content**

The dataset consists of several medical predictor variables and one target variable, Outcome. The predictor variables include the number of pregnancies the patient has had, their BMI, insulin level, age, and so on.
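To make the layout concrete, here is a minimal sketch of how a table with eight predictors and one target splits into inputs and labels. The column names are an assumption based on the commonly distributed CSV release of this dataset (check the header of your own file), and the two rows are made up purely for illustration.

```python
import numpy as np

# Column names as commonly published with this dataset (an assumption;
# verify against the header row of your own CSV).
columns = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
           "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]

# Two made-up rows, not taken from the real dataset.
rows = np.array([
    [2, 120, 70, 25, 90, 28.5, 0.45, 33, 0],
    [5, 160, 80, 30, 150, 34.1, 0.62, 51, 1],
])

X_demo = rows[:, :8]  # the eight predictor columns
y_demo = rows[:, 8]   # the Outcome column (0 or 1)

print(X_demo.shape, y_demo.shape)
```

The same slicing pattern applies to the full array once the CSV is loaded.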

**Acknowledgements**

Smith, J.W., Everhart, J.E., Dickson, W.C., Knowler, W.C., & Johannes, R.S. (1988). Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Symposium on Computer Applications and Medical Care (pp. 261–265). IEEE Computer Society Press.

**Inspiration**

Can you build a machine learning model that accurately predicts whether the patients in the dataset have diabetes?
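"Accurately" is easiest to judge on rows the model did not see during training. The sketch below shows one possible way to hold out a test set and score predictions. The helpers `train_test_split` and `accuracy` are hypothetical names introduced here, the 80/20 split is an assumption, and the data is a random stand-in rather than the real dataset.

```python
import torch

def train_test_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle rows and split them into train and test tensors (hypothetical helper)."""
    generator = torch.Generator().manual_seed(seed)
    perm = torch.randperm(len(X), generator=generator)
    n_test = int(len(X) * test_fraction)
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

def accuracy(probabilities, targets, threshold=0.5):
    """Fraction of thresholded predictions that match the 0/1 targets."""
    predictions = (probabilities >= threshold).float()
    return (predictions == targets).float().mean().item()

# Random stand-in data: 100 rows, 8 predictors, one 0/1 target column.
torch.manual_seed(0)
X_demo = torch.rand(100, 8)
y_demo = (torch.rand(100, 1) > 0.65).float()

X_train, X_test, y_train, y_test = train_test_split(X_demo, y_demo)
print(X_train.shape, X_test.shape)
```

With a split like this, a model would be fitted on `X_train` and `y_train` only, and `accuracy(model(X_test), y_test)` would give an estimate of how well it generalizes.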

In [1]:
# Google drive mount
from google.colab import drive
drive.mount('/content/drive')

# Set my working directory
import os
os.chdir("/content/drive/MyDrive/Deep Learning/02 MLP")
os.getcwd()

Mounted at /content/drive


'/content/drive/MyDrive/Deep Learning/02 MLP'

In [3]:
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

# load the dataset, split into input (X) and output (y) variables
dataset = np.loadtxt('../dataset/dataset_diabetes.csv', skiprows=1, delimiter=',')
X = dataset[:,0:8]
y = dataset[:,8]

X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)

# define the model
model = nn.Sequential(
    nn.Linear(8, 12),
    nn.ReLU(),
    nn.Linear(12, 8),
    nn.ReLU(),
    nn.Linear(8, 1),
    nn.Sigmoid()
)
print(model)

# train the model
loss_fn   = nn.BCELoss()  # binary cross entropy
optimizer = optim.Adam(model.parameters(), lr=0.001)

n_epochs = 100
batch_size = 10

for epoch in range(n_epochs):
    for i in range(0, len(X), batch_size):
        Xbatch = X[i:i+batch_size]
        y_pred = model(Xbatch)
        ybatch = y[i:i+batch_size]
        loss = loss_fn(y_pred, ybatch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f'Finished epoch {epoch}, latest loss {loss}')

# compute accuracy on the training data (no_grad is optional; it skips gradient tracking)
with torch.no_grad():
    y_pred = model(X)
accuracy = (y_pred.round() == y).float().mean()
print(f"Accuracy {accuracy}")

Sequential(
  (0): Linear(in_features=8, out_features=12, bias=True)
  (1): ReLU()
  (2): Linear(in_features=12, out_features=8, bias=True)
  (3): ReLU()
  (4): Linear(in_features=8, out_features=1, bias=True)
  (5): Sigmoid()
)
Finished epoch 0, latest loss 0.6705176830291748
Finished epoch 1, latest loss 0.624144434928894
Finished epoch 2, latest loss 0.6597209572792053
Finished epoch 3, latest loss 0.5886387228965759
Finished epoch 4, latest loss 0.5743430256843567
Finished epoch 5, latest loss 0.5605312585830688
Finished epoch 6, latest loss 0.543185830116272
Finished epoch 7, latest loss 0.5287259221076965
Finished epoch 8, latest loss 0.5269332528114319
Finished epoch 9, latest loss 0.5021027326583862
Finished epoch 10, latest loss 0.4929580092430115
Finished epoch 11, latest loss 0.4841318130493164
Finished epoch 12, latest loss 0.46976861357688904
Finished epoch 13, latest loss 0.45779111981391907
Finished epoch 14, latest loss 0.44852250814437866
Finished epoch 15, latest loss