## Develop an Artificial Neural Network (ANN) in PyTorch to predict diabetes based on patient health data.

# Dataset
## The dataset consists exclusively of female patients of Pima Indian heritage, all aged 21 years or older.

### Pregnancies: Number of times pregnant
### Glucose: Plasma glucose concentration at 2 hours in an oral glucose tolerance test
### BloodPressure: Diastolic blood pressure (mm Hg)
### SkinThickness: Triceps skin fold thickness (mm)
### Insulin: 2-Hour serum insulin (mu U/ml)
### BMI: Body mass index (weight in kg/(height in m)^2)
### DiabetesPedigreeFunction: Diabetes pedigree function
### Age: Age (years)
### Outcome: Class variable (0 or 1)

### Exercise 1 - Classification: Training and optimization for ANN

In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
#importing torch 
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
diabetes_df = pd.read_csv('diabetes.csv')

In [3]:
diabetes_df.head()

,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [4]:
diabetes_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Pregnancies               768 non-null    int64  
 1   Glucose                   768 non-null    int64  
 2   BloodPressure             768 non-null    int64  
 3   SkinThickness             768 non-null    int64  
 4   Insulin                   768 non-null    int64  
 5   BMI                       768 non-null    float64
 6   DiabetesPedigreeFunction  768 non-null    float64
 7   Age                       768 non-null    int64  
 8   Outcome                   768 non-null    int64  
dtypes: float64(2), int64(7)
memory usage: 54.1 KB


In [5]:
diabetes_df.isnull().sum()

Pregnancies                 0
Glucose                     0
BloodPressure               0
SkinThickness               0
Insulin                     0
BMI                         0
DiabetesPedigreeFunction    0
Age                         0
Outcome                     0
dtype: int64

In [6]:
X = diabetes_df.drop(['Outcome'],axis=1).to_numpy()

In [7]:
y = diabetes_df['Outcome'].to_numpy()

### Scaling Data for better results

In [8]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

### Splitting Data

In [9]:
from sklearn.model_selection import train_test_split

In [10]:
X_train,X_test,y_train,y_test = train_test_split(X, y, test_size=0.2,random_state=24)

In [11]:
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
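
The scaler is fitted on the training split only and then reused on the test split, so no test-set statistics leak into preprocessing. The sketch below shows what this amounts to for a single feature column. It assumes the usual z-score with the population standard deviation, which is what StandardScaler uses by default. The values are made up for illustration and are not taken from diabetes.csv.

```python
from statistics import fmean, pstdev

# Hypothetical glucose readings, not taken from diabetes.csv
train_col = [85.0, 148.0, 183.0, 89.0, 137.0]
test_col = [110.0, 160.0]

# "fit": learn mean and population standard deviation from the training column only
mu = fmean(train_col)
sigma = pstdev(train_col)

# "transform": apply the same training statistics to both splits
train_scaled = [(v - mu) / sigma for v in train_col]
test_scaled = [(v - mu) / sigma for v in test_col]

# The training column ends up with mean ~0 and standard deviation ~1;
# the test column generally does not, because it reuses the training statistics
print(fmean(train_scaled), pstdev(train_scaled))
print(fmean(test_scaled))
```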

# TASK: Create a Custom Dataset Class for PyTorch
Objective:
The task is to create a custom dataset class, CustomDataset, by inheriting from PyTorch's Dataset class. This class is designed to handle input features and corresponding labels for a machine learning model.

Requirements:

1. Initialize the class with two inputs: features (input data) and labels (target values).
2. Convert both features and labels to PyTorch tensors with appropriate data types (float32 for features and long for labels).
3. Implement the __len__ method to return the number of samples in the dataset.
4. Implement the __getitem__ method to retrieve a single sample (feature and label pair) by its index.

Expected Behavior:

1. When the dataset object is created, it should store the input data and labels as tensors.
2. Calling the len() function on the dataset object should return the total number of samples.
3. Accessing a sample using an index (like dataset[0]) should return a tuple containing the feature and label at that index.

In [12]:
# create CustomDataset Class

class CustomDataset(Dataset):

    def __init__(self, features, labels):
        # store features as float32 and labels as long (class indices)
        self.features = torch.tensor(features, dtype=torch.float32)
        self.labels = torch.tensor(labels, dtype=torch.long)

    def __len__(self):
        # number of samples in the dataset
        return len(self.features)

    def __getitem__(self, index):
        # one (feature, label) pair
        return self.features[index], self.labels[index]


# TASK: Create a train_dataset object using the CustomDataset class with training features (X_train) and labels (y_train)

In [14]:
# create train_dataset object


train_dataset = CustomDataset(X_train, y_train)



In [16]:
len(train_dataset)

614

In [17]:
train_dataset[0]

(tensor([-1.1417,  0.7481,  0.0421, -1.3205, -0.6989,  0.7522, -0.4059, -0.4443]),
 tensor(1))

# TASK: Create a test_dataset object using the CustomDataset class with test features (X_test) and labels (y_test)

In [18]:
# create test_dataset object


test_dataset = CustomDataset(X_test, y_test)



In [20]:
len(test_dataset)

154

# TASK: Create DataLoaders for Batch Processing

In [21]:
# create train and test loader


train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)



In [23]:
len(train_loader)

20

In [24]:
len(test_loader)

5
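
The dataset and loader lengths reported above can be reproduced with a little arithmetic. This is a sketch, assuming the test fraction is rounded up and the remaining rows go to training, which is consistent with the 614 and 154 samples observed here. It also assumes the loaders keep the final partial batch, which is the DataLoader default.

```python
import math

n_samples = 768      # rows reported by diabetes_df.info()
test_size = 0.2
batch_size = 32

# Assumption: the test fraction is rounded up, the rest goes to training
n_test = math.ceil(n_samples * test_size)
n_train = n_samples - n_test

# Without drop_last, a final partial batch still counts as a batch
train_batches = math.ceil(n_train / batch_size)
test_batches = math.ceil(n_test / batch_size)

# Size of that final partial batch
last_train_batch = n_train - (train_batches - 1) * batch_size
last_test_batch = n_test - (test_batches - 1) * batch_size

print(n_train, n_test)                    # 614 154
print(train_batches, test_batches)        # 20 5
print(last_train_batch, last_test_batch)  # 6 26
```

The last training batch therefore holds only 6 samples, so its loss is noisier than that of the full batches.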

# TASK: Build an ANN model using PyTorch

### Let us build an ANN on our own using PyTorch
Let's divide the task into the following steps:
1. Define a class named ANN_Model as a subclass of nn.Module, the base class for all neural network modules.
2. Define the __init__ function, which takes the number of input features and the number of output features (2 in the case of binary classification). The hidden layer sizes are fixed inside the class.
3. Inside the forward function, apply the ReLU activation function to the outputs of the two hidden fully connected layers.
4. Return the output of the final layer, which completes forward propagation.

## Three fully connected (linear) layers:
### First layer that takes input_dim inputs and outputs 25 features.
### Second layer that takes 25 inputs and outputs 20 features.
### Output layer that takes 20 features and outputs output_dim features.

## In the forward method, apply the following transformations to the input data:
### Pass the input through First layer and apply ReLU activation.
### Pass the result through Second layer and apply ReLU activation.
### Pass the result through the output layer out.

In [25]:
class ANN_Model(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        # three fully connected layers: input_dim -> 25 -> 20 -> output_dim
        self.f_connected1 = nn.Linear(input_dim, 25)
        self.f_connected2 = nn.Linear(25, 20)
        self.out = nn.Linear(20, output_dim)

    def forward(self, x):
        # ReLU after each hidden layer
        out = F.relu(self.f_connected1(x))
        out = F.relu(self.f_connected2(out))
        # the output layer returns raw logits (no activation)
        out = self.out(out)
        return out
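
To sanity-check the layer sizes, a random batch can be pushed through an equivalent stack of layers. This sketch uses nn.Sequential instead of the ANN_Model class itself, and the batch of 4 random rows is made up.

```python
import torch
import torch.nn as nn

# Same layer sizes as ANN_Model, written as a Sequential stack for brevity
sketch = nn.Sequential(
    nn.Linear(8, 25), nn.ReLU(),
    nn.Linear(25, 20), nn.ReLU(),
    nn.Linear(20, 2),   # no activation: the output is raw logits
)

dummy_batch = torch.randn(4, 8)   # 4 hypothetical patients, 8 features each
logits = sketch(dummy_batch)
print(logits.shape)               # torch.Size([4, 2])
```

Each row of the output holds one logit per class, which is the shape nn.CrossEntropyLoss expects.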

### Set a seed so that weight initialization is reproducible

In [27]:
torch.manual_seed(20)

<torch._C.Generator at 0x7f5aead69bf0>

In [28]:
input_dim = 8
output_dim = 2
model = ANN_Model(input_dim, output_dim)

### Checking model parameters

In [29]:
model.parameters

<bound method Module.parameters of ANN_Model(
  (f_connected1): Linear(in_features=8, out_features=25, bias=True)
  (f_connected2): Linear(in_features=25, out_features=20, bias=True)
  (out): Linear(in_features=20, out_features=2, bias=True)
)>
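
Note that model.parameters without parentheses only displays the bound method together with the module summary. The number of trainable parameters implied by that summary can be worked out by hand. This is a sketch based only on the layer sizes printed above.

```python
# (in_features, out_features) for the three Linear layers shown above
layer_shapes = [(8, 25), (25, 20), (20, 2)]

# Each Linear layer holds in*out weights plus out biases
per_layer = [n_in * n_out + n_out for n_in, n_out in layer_shapes]
total_params = sum(per_layer)

print(per_layer)      # [225, 520, 42]
print(total_params)   # 787
```

With the model in scope, sum(p.numel() for p in model.parameters()) should give the same total.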

### Define loss as Cross Entropy loss

In [30]:
loss_function= nn.CrossEntropyLoss()
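
nn.CrossEntropyLoss expects raw logits and integer class labels, which is why the model has no softmax layer and the labels are stored as long. A small sketch with made-up logits shows that the loss equals the mean negative log-probability of the true class.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Made-up logits for 3 samples and 2 classes, with integer class labels
logits = torch.tensor([[2.0, 0.5], [0.1, 1.2], [-0.3, 0.4]])
targets = torch.tensor([0, 1, 0])

loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(logits, targets)

# The same value computed by hand: mean negative log-probability of the true class
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(3), targets].mean()

print(torch.allclose(loss, manual))   # True
```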

# TASK: Create an Adam optimizer

## Create an Adam optimizer for a PyTorch model with a specified learning rate (lr=0.01) 

In [31]:


optimizer = torch.optim.Adam(model.parameters(),lr=0.01)



In [32]:
# optimizer= torch.optim.SGD(model.parameters(),lr=0.01)

In [33]:
# optimizer= torch.optim.SGD(model.parameters(),lr=0.01, momentum=0.9)

### Let us run our model...

For each epoch, we compute the loss on every batch, backpropagate it with PyTorch's .backward() function, and let the optimizer update the weights. The average loss of each epoch is appended to final_losses.

# TASK: Train the ANN with Gradient Descent Optimization

In [36]:
epochs = 10
final_losses = []
for epoch in range(epochs):
    total_epoch_loss = 0
    for batch_features, batch_labels in train_loader:

        # forward pass: compute logits for the batch
        y_pred = model(batch_features)
        # calculate the loss
        loss = loss_function(y_pred, batch_labels)

        # clear gradients left over from the previous batch
        optimizer.zero_grad()
        # backward propagation: calculate gradients
        loss.backward()
        # update the weights
        optimizer.step()

        total_epoch_loss = total_epoch_loss + loss.item()

    # average loss over all batches in this epoch
    avg_loss = total_epoch_loss / len(train_loader)
    final_losses.append(avg_loss)
    print(f'Epoch: {epoch + 1} , Loss: {avg_loss}')

Epoch: 1 , Loss: 0.5782179325819016
Epoch: 2 , Loss: 0.45898606479167936
Epoch: 3 , Loss: 0.44652786254882815
Epoch: 4 , Loss: 0.42196218073368075
Epoch: 5 , Loss: 0.418367475271225
Epoch: 6 , Loss: 0.43789630830287934
Epoch: 7 , Loss: 0.4293580144643784
Epoch: 8 , Loss: 0.4362384557723999
Epoch: 9 , Loss: 0.40420584082603456
Epoch: 10 , Loss: 0.4086001768708229
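
The call to optimizer.zero_grad() in the loop matters because PyTorch adds new gradients to whatever is already stored in .grad. A minimal sketch with a single made-up scalar parameter illustrates the accumulation.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

# First backward pass: d(3w)/dw = 3
(3 * w).backward()
first = w.grad.item()

# Second backward pass without clearing: the new gradient is added to the old one
(3 * w).backward()
accumulated = w.grad.item()

# Clearing the gradient before the next pass gives the fresh value again
w.grad.zero_()
(3 * w).backward()
after_reset = w.grad.item()

print(first, accumulated, after_reset)   # 3.0 6.0 3.0
```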




In [38]:
# set model to eval mode
model.eval()

ANN_Model(
  (f_connected1): Linear(in_features=8, out_features=25, bias=True)
  (f_connected2): Linear(in_features=25, out_features=20, bias=True)
  (out): Linear(in_features=20, out_features=2, bias=True)
)

### Creating Predictions

Pass the test data through the model and take the class with the larger logit as the prediction. Counting the predictions that match the true labels gives the accuracy.


In [39]:
# evaluation code
total = 0
correct = 0

#predictions in data
with torch.no_grad():

    for batch_features, batch_labels in test_loader:

        outputs = model(batch_features)
        # print(outputs)
        max_logit, predicted = torch.max(outputs, dim=1)
        # print(max_logit)
        # print(predicted)
        total = total + batch_labels.shape[0]

        correct = correct + (predicted == batch_labels).sum().item()

print(correct/total)
        

0.7467532467532467
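
The printed accuracy of about 0.7468 is consistent with 115 of the 154 test samples being classified correctly. The sketch below walks through the prediction step on made-up logits (not taken from the test set). It shows what torch.max returns and that applying softmax first would not change the predicted class.

```python
import torch

# Made-up logits for 4 samples and their true labels (not from the test set)
outputs = torch.tensor([[1.2, -0.3], [0.1, 0.9], [-0.5, 0.2], [2.0, 1.5]])
labels = torch.tensor([0, 1, 0, 0])

# torch.max along dim=1 returns (largest logit, its column index) per row
max_logit, predicted = torch.max(outputs, dim=1)

# argmax yields the same indices; softmax does not change which class wins
same_as_argmax = torch.equal(predicted, outputs.argmax(dim=1))
probs = torch.softmax(outputs, dim=1)
same_after_softmax = torch.equal(predicted, probs.argmax(dim=1))

accuracy = (predicted == labels).sum().item() / labels.shape[0]
print(predicted.tolist(), accuracy)   # [0, 1, 1, 0] 0.75
```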
