# S2 - L2 - E2 - Diabetes prediction


This excersise consist in designing and building a feed-forward neural network to predict diabetes diseasse using 'Pima indians diabetes database'. 'Outcome' will be the target column. 0 means no diseasse , 1 diabetes

## 1. Set up environment

In [26]:
import pandas as pd
import numpy as np
import matplotlib as plt
import torch
import torch.optim as optim
import torch.nn as nn
from  sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, f1_score, classification_report


## 2 Load and explore dataset

In [27]:
df = pd.read_csv('diabetes.csv')
df.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


Search for NaNs and inconsistencies in the dataset

In [28]:

print("\nInformación del dataset:")
df.info()


Información del dataset:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Pregnancies               768 non-null    int64  
 1   Glucose                   768 non-null    int64  
 2   BloodPressure             768 non-null    int64  
 3   SkinThickness             768 non-null    int64  
 4   Insulin                   768 non-null    int64  
 5   BMI                       768 non-null    float64
 6   DiabetesPedigreeFunction  768 non-null    float64
 7   Age                       768 non-null    int64  
 8   Outcome                   768 non-null    int64  
dtypes: float64(2), int64(7)
memory usage: 54.1 KB


Cleaning and data preprocessing

In [29]:
print('Valores nulos por columna: \n')
print(df.isnull().sum())

Valores nulos por columna: 

Pregnancies                 0
Glucose                     0
BloodPressure               0
SkinThickness               0
Insulin                     0
BMI                         0
DiabetesPedigreeFunction    0
Age                         0
Outcome                     0
dtype: int64



Replace non sense zeros with column average

In [30]:
cols_with_zeros = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]
for col in cols_with_zeros:
    df[col] = df[col].replace(0,df[col].mean())

## 3. Set up dataset for classifcation Neural Network

Separe features and target columns, transform to values

In [31]:
X = df.drop(columns='Outcome').values
y = df['Outcome'].values

Split in train and test sets, then scale them and transform to tensors

In [32]:
#split dataset

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.25,random_state=42, stratify=y)

#scale vars to speed up  convergence
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.fit_transform(X_test)

#transform to tensors

X_train_t = torch.tensor(X_train, dtype = torch.float32)
X_test_t = torch.tensor(X_test, dtype = torch.float32)
y_train_t = torch.tensor(y_train, dtype = torch.float32)
y_test_t = torch.tensor(y_test, dtype = torch.float32)

## 4. Build feed-forward

In [33]:
class DiabetesNN(nn.Module):
    def __init__(self):

        super(DiabetesNN,self).__init__()
        self.fc1 = nn.Linear(8,64)
        self.fc2 = nn.Linear(64,32)
        self.out = nn.Linear(32,1)

        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self,x):
        
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.sigmoid(self.out(x))
        return x
    
model = DiabetesNN()
print(model)


                

        

DiabetesNN(
  (fc1): Linear(in_features=8, out_features=64, bias=True)
  (fc2): Linear(in_features=64, out_features=32, bias=True)
  (out): Linear(in_features=32, out_features=1, bias=True)
  (relu): ReLU()
  (sigmoid): Sigmoid()
)


Defining Loss function (binary cross entropy) and optimizer (Adam)

In [35]:
criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(),lr=0.001)

## 5. Training model

In [39]:
y_train_t = y_train_t.view(-1, 1)
y_test_t = y_test_t.view(-1, 1)

In [40]:
epochs = 200

for epoch in range(epochs):

    y_pred = model(X_train_t)

    loss = criterion(y_pred, y_train_t)

    optimizer.zero_grad()

    loss.backward()

    optimizer.step()

    if(epochs % 10 ) == 0:
        print(f"Epoch [{epoch+1}/{epochs}] - Loss: {loss.item():4f}")

Epoch [1/200] - Loss: 0.669334
Epoch [2/200] - Loss: 0.662339
Epoch [3/200] - Loss: 0.655481
Epoch [4/200] - Loss: 0.648715
Epoch [5/200] - Loss: 0.642039
Epoch [6/200] - Loss: 0.635448
Epoch [7/200] - Loss: 0.628936
Epoch [8/200] - Loss: 0.622505
Epoch [9/200] - Loss: 0.616159
Epoch [10/200] - Loss: 0.609896
Epoch [11/200] - Loss: 0.603686
Epoch [12/200] - Loss: 0.597531
Epoch [13/200] - Loss: 0.591457
Epoch [14/200] - Loss: 0.585466
Epoch [15/200] - Loss: 0.579561
Epoch [16/200] - Loss: 0.573727
Epoch [17/200] - Loss: 0.567981
Epoch [18/200] - Loss: 0.562330
Epoch [19/200] - Loss: 0.556798
Epoch [20/200] - Loss: 0.551374
Epoch [21/200] - Loss: 0.546077
Epoch [22/200] - Loss: 0.540889
Epoch [23/200] - Loss: 0.535832
Epoch [24/200] - Loss: 0.530921
Epoch [25/200] - Loss: 0.526171
Epoch [26/200] - Loss: 0.521580
Epoch [27/200] - Loss: 0.517139
Epoch [28/200] - Loss: 0.512861
Epoch [29/200] - Loss: 0.508745
Epoch [30/200] - Loss: 0.504786
Epoch [31/200] - Loss: 0.500992
Epoch [32/200] - 

## 6. Model performance evaluation

In [44]:
with torch.no_grad():
    y_pred_test = model(X_test_t)
    y_pred_labels = (y_pred_test >= 0.5).float()

accuracy = accuracy_score(y_test_t,y_pred_labels)
f1 = f1_score(y_test_t,y_pred_labels)
report = classification_report(y_test_t, y_pred_labels, digits=4)

print("\n Resultados del modelo:")
print(f"Accuracy: {accuracy:.4f}")
print(f"F1 Score: {f1:.4f}")
print("\nClassification Report:\n", report)



 Resultados del modelo:
Accuracy: 0.7188
F1 Score: 0.5846

Classification Report:
               precision    recall  f1-score   support

         0.0     0.7752    0.8000    0.7874       125
         1.0     0.6032    0.5672    0.5846        67

    accuracy                         0.7188       192
   macro avg     0.6892    0.6836    0.6860       192
weighted avg     0.7152    0.7188    0.7166       192

