In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

import configparser
config = configparser.ConfigParser()
config.read('config.ini')

import torch
from torch import nn

print(torch.__version__)
device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"
print(f"Using {device} device")

import wandb
wandb.login()




1.12.1
Using mps device


[34m[1mwandb[0m: Currently logged in as: [33mmhrnciar[0m ([33mnsiete-hrnciar-katkovcin[0m). Use [1m`wandb login --relogin`[0m to force relogin


True

## Data loading and train-test split

First, we load the cleaned data and create two arrays - labels (y) and predictors (X) - which are then split into train (80%) and test (20%) sets. To make the split reproducible, we also fix the random state seed.

In [2]:
df = pd.read_csv('Data/cleaned.csv', index_col=0)
df

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,0.352941,0.670968,0.489796,0.304348,0.186899,0.314928,0.234415,0.483333,1
1,0.058824,0.264516,0.428571,0.239130,0.106370,0.171779,0.116567,0.166667,0
2,0.470588,0.896774,0.408163,0.271739,0.186899,0.104294,0.253629,0.183333,1
3,0.058824,0.290323,0.428571,0.173913,0.096154,0.202454,0.038002,0.000000,0
4,0.000000,0.600000,0.163265,0.304348,0.185096,0.509202,0.943638,0.200000,1
...,...,...,...,...,...,...,...,...,...
763,0.588235,0.367742,0.530612,0.445652,0.199519,0.300613,0.039710,0.700000,0
764,0.117647,0.503226,0.469388,0.217391,0.106370,0.380368,0.111870,0.100000,0
765,0.294118,0.496774,0.489796,0.173913,0.117788,0.163599,0.071307,0.150000,0
766,0.058824,0.529032,0.367347,0.271739,0.186899,0.243354,0.115713,0.433333,1


In [3]:
X, y = df.drop('Outcome', axis=1).values, df.Outcome.values

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, shuffle=True, random_state=42)

X_train = torch.FloatTensor(X_train)
X_test = torch.FloatTensor(X_test)
y_train = torch.FloatTensor(y_train).unsqueeze(1)
y_test = torch.FloatTensor(y_test).unsqueeze(1)
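
The effect of the fixed seed can be checked with a minimal sketch on synthetic data. The arrays `X_demo` and `y_demo` and the helper `split` are hypothetical stand-ins, not part of the notebook. The `stratify` argument is an optional extra that the cell above does not use; it keeps the class ratio roughly equal in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 100 rows, 8 features, binary labels
rng = np.random.default_rng(0)
X_demo = rng.random((100, 8))
y_demo = rng.integers(0, 2, size=100)


def split(seed):
    # Same call shape as in the notebook, plus stratify (an assumption, see above)
    return train_test_split(X_demo, y_demo, train_size=0.8, shuffle=True,
                            random_state=seed, stratify=y_demo)


X_tr_a, X_te_a, y_tr_a, y_te_a = split(42)
X_tr_b, X_te_b, y_tr_b, y_te_b = split(42)

# Two calls with the same seed should give identical partitions
same_split = np.array_equal(X_tr_a, X_tr_b) and np.array_equal(y_te_a, y_te_b)
print("identical split:", same_split)
print("train/test shapes:", X_tr_a.shape, X_te_a.shape)
print("positive share (all, train, test):", y_demo.mean(), y_tr_a.mean(), y_te_a.mean())
```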

## Neural network

Our neural network consists of an input layer (8 neurons), one hidden layer (4 neurons) and an output layer (1 neuron); in the code these are three `nn.Linear` layers (8→8, 8→4 and 4→1). All layers except the output use the ReLU activation function. The output layer uses Sigmoid, because we need the output to lie between 0 and 1. Because we are doing binary classification, we used binary cross entropy loss (BCELoss), with Adam as the optimizer.
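
For reference, the binary cross entropy that `BCELoss` computes with its default mean reduction can be written as follows. The symbols are notation introduced here, not taken from the notebook: $N$ is the number of samples, $y_i \in \{0, 1\}$ is the true label and $\hat{y}_i$ is the Sigmoid output for sample $i$.

$$
\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log\left(1 - \hat{y}_i\right) \right]
$$

A confident wrong prediction (for example $\hat{y}_i$ close to 0 when $y_i = 1$) makes the corresponding logarithm very negative, so it is penalized much more heavily than an uncertain one.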

Additionally, a learning rate scheduler is used to vary the learning rate during training - higher in the early part of training and gradually smaller as we approach the minimum.

### Parameters

- epochs = 500
- base learning rate = 0.001
- max learning rate = 0.01
- annealing strategy = linear
- steps per epoch = 1
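
To see what such a schedule looks like, here is a minimal sketch that only records the learning rate and trains nothing. The names `demo_model`, `demo_optimizer` and `demo_scheduler` are hypothetical. One assumption is worth flagging: `OneCycleLR` starts at `max_lr / div_factor` (default 25) and overrides the `lr` given to the optimizer, so `div_factor=10` is set here to make the schedule start at 0.001. With the default, it would start at 0.0004 instead.

```python
import torch

# Hypothetical stand-in model; only its parameters matter for the schedule
demo_model = torch.nn.Linear(8, 1)
demo_optimizer = torch.optim.Adam(demo_model.parameters(), lr=0.001)

# Assumption: div_factor=10 so the starting rate is max_lr / 10 = 0.001
demo_scheduler = torch.optim.lr_scheduler.OneCycleLR(
    demo_optimizer,
    max_lr=0.01,
    epochs=500,
    steps_per_epoch=1,
    anneal_strategy="linear",
    div_factor=10,
    cycle_momentum=False,
)

# Record the learning rate used in each of the 500 epochs
lrs = [demo_optimizer.param_groups[0]["lr"]]
for _ in range(499):
    demo_optimizer.step()   # no gradients here; called only to keep the usual step order
    demo_scheduler.step()
    lrs.append(demo_optimizer.param_groups[0]["lr"])

peak_step = lrs.index(max(lrs))
print(f"start={lrs[0]:.4f}, peak={max(lrs):.4f} at step {peak_step}, end={lrs[-1]:.2e}")
```

With the default `pct_start=0.3`, the rate rises for roughly the first 30% of the steps and then decays towards a value far below the starting rate.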

### Experiments

Link to wandb report: https://api.wandb.ai/links/nsiete-hrnciar-katkovcin/0haxn726

1. **Bigger net** `(onecycle-cos-deeper)` - input layer (8 neurons), two hidden layers (16 and 8 neurons) and an output layer (1 neuron). ReLU activation on all layers except the output, which uses Sigmoid; BCELoss loss function, Adam optimizer, and OneCycle scheduler with _linear_ annealing strategy. This net overfit, as indicated by the validation loss diverging from the training loss - training loss decreases while validation loss increases (and training accuracy increases while validation accuracy decreases).

2. **Cyclic scheduler** `(cyclic-scheduler-triangular2)` - input layer (8 neurons), one hidden layer (4 neurons) and an output layer (1 neuron). Activations, loss and optimizer are the same as before; the scheduler was changed to a Cyclic scheduler with the triangular2 strategy. This net didn't overfit, mostly because it is smaller, but it also reached a lower accuracy of 82% and a higher loss.

3. **No scheduler with learning rate 0.001** `(no-scheduler-0.001)` - the same net without a scheduler yielded a higher loss and a lower accuracy of 77%. This may be because the small learning rate makes each step small and training much slower: the net could have ended up in a local minimum, or it didn't have enough time (epochs) to converge to a good enough result.

4. **No scheduler with learning rate 0.01** `(no-scheduler-0.001)` - parameters were the same as in the previous experiment, but with a higher learning rate. In this case the net converged to a better result, with 86% validation accuracy. However, it overfit, as indicated by the diverging losses.

5. **OneCycle scheduler with _cos_ annealing strategy** `(onecycle-cos-1)` - same net with OneCycle scheduler with _cos_ annealing strategy. This net ended up with 84% validation accuracy, but as before, it started to overfit. In the report, we can see that the learning rate was increased by the scheduler until the maximum and then gradually decreased until the end of the training.

6. **OneCycle scheduler with _linear_ annealing strategy** `(onecycle-linear-1)` - same net with OneCycle scheduler with _linear_ annealing strategy. This net yielded slightly better results than with the _cos_ strategy, with 85% accuracy. It also didn't overfit, but ended up with a higher loss than in the previous experiment.
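
The overfitting signal used in the experiments above (validation loss rising while training loss keeps falling) was read off the wandb plots. As a rough sketch, it could also be detected programmatically. The helper `first_divergence_epoch` and its `patience` parameter are hypothetical and not part of the notebook; the loss values are made up for illustration.

```python
def first_divergence_epoch(train_losses, val_losses, patience=3):
    """Return the 1-based epoch where validation loss starts a run of
    `patience` consecutive increases while training loss keeps falling.
    Returns None if no such run exists."""
    streak = 0
    for epoch in range(1, len(val_losses)):
        val_up = val_losses[epoch] > val_losses[epoch - 1]
        train_down = train_losses[epoch] < train_losses[epoch - 1]
        streak = streak + 1 if (val_up and train_down) else 0
        if streak >= patience:
            # `epoch` is the 0-based index of the last rise in the run
            return epoch - patience + 2
    return None


# Illustrative values only: validation loss bottoms out at epoch 3
demo_train = [0.70, 0.60, 0.50, 0.40, 0.30, 0.20]
demo_val = [0.70, 0.60, 0.55, 0.56, 0.58, 0.61]

diverged_at = first_divergence_epoch(demo_train, demo_val)
no_divergence = first_divergence_epoch(demo_train, [0.7, 0.6, 0.5, 0.45, 0.42, 0.40])
print(diverged_at, no_divergence)
```

Requiring several consecutive increases avoids flagging a single noisy epoch, at the cost of reacting a few epochs late.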

In [4]:
class NeuralNetwork(nn.Module):
    def __init__(self, input_features=8, hidden1=8, hidden2=4, out_features=1):
        super().__init__()
        self.f_connected1 = nn.Linear(input_features, hidden1)
        self.f_connected2 = nn.Linear(hidden1, hidden2)
        self.out = nn.Linear(hidden2, out_features)

        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self,x):
        x = self.relu(self.f_connected1(x))
        x = self.relu(self.f_connected2(x))
        x = self.sigmoid(self.out(x))
        
        return x


model = NeuralNetwork()
print(model)

loss_fn = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=config['torch'].getfloat('start_lr'))

scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, 
                                                max_lr=config['torch'].getfloat('max_lr'),
                                                #base_lr=config['torch'].getfloat('start_lr'),
                                                epochs=config['default'].getint('epochs'),
                                                steps_per_epoch=1,
                                                anneal_strategy=config['torch']['strategy'],
                                                cycle_momentum=False)

run = wandb.init(project="basic-nn-torch", id="onecycle-linear-1")
wandb.config.update(config)
wandb.watch(model)

NeuralNetwork(
  (f_connected1): Linear(in_features=8, out_features=8, bias=True)
  (f_connected2): Linear(in_features=8, out_features=4, bias=True)
  (out): Linear(in_features=4, out_features=1, bias=True)
  (relu): ReLU()
  (sigmoid): Sigmoid()
)


[]

During training, the loss and accuracy are calculated and the learning rate is adjusted by the scheduler. During evaluation on the test set, the loss, accuracy and F1-score are calculated, and all metrics are logged to wandb.
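
The two F1 variants logged in the training cell differ only in how the per-class scores are combined. A small sketch on made-up labels (`y_true_demo` and `y_pred_demo` are illustrative, not taken from the dataset) shows that the macro score is the unweighted mean of the per-class scores.

```python
import numpy as np
from sklearn.metrics import f1_score

# Made-up labels: 5 negatives and 3 positives, with two mistakes
y_true_demo = np.array([0, 0, 0, 0, 1, 1, 1, 0])
y_pred_demo = np.array([0, 0, 1, 0, 1, 0, 1, 0])

# average=None returns one F1 score per class (index 0 -> class 0, index 1 -> class 1)
per_class = f1_score(y_true_demo, y_pred_demo, average=None)

# average='macro' is the unweighted mean of the per-class scores
macro = f1_score(y_true_demo, y_pred_demo, average="macro")

print("per class:", per_class)
print("macro:", macro)
```

Because the macro average weights both classes equally, it drops noticeably when the minority class is predicted poorly, even if overall accuracy stays high.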

In [5]:
train_losses, val_losses = [], []

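for i in range(1, config['default'].getint('epochs') + 1):
    # Forward pass on the whole training set (full-batch training)
    y_pred = model(X_train)
    train_loss = loss_fn(y_pred, y_train)
    # Store a plain float so the autograd graph of each epoch is not kept alive
    train_losses.append(train_loss.item())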

    y_pred = (y_pred > 0.5).int()
    train_accuracy = accuracy_score(y_train.squeeze(1).int(), y_pred)

    optimizer.zero_grad()
    train_loss.backward()
    optimizer.step()

    wandb.log({'learning_rate': optimizer.param_groups[0]['lr']})
    scheduler.step()
    
    with torch.no_grad():
        y_pred = model(X_test)
        val_loss = loss_fn(y_pred, y_test)
        val_losses.append(val_loss.item())

        y_pred = (y_pred > 0.5).int()
        wandb.log({'training_loss': train_loss, 'validation_loss': val_loss}, commit=False)

        f1_none = f1_score(y_test.squeeze(1).int(), y_pred, average=None)
        f1_none = {'f1_none/' + str(e): v for e,v in enumerate(f1_none)}
        wandb.log(f1_none, commit=False)

        f1_macro = f1_score(y_test.squeeze(1).int(), y_pred, average='macro')
        wandb.log({'f1_macro': f1_macro}, commit=False)
        
        val_accuracy = accuracy_score(y_test.squeeze(1).int(), y_pred)
        wandb.log({'train_accuracy': train_accuracy, 'val_accuracy': val_accuracy})

    if i % 10 == 0:
        print(f'Epoch {i}')
        print('-' * 40)
        print(f'Training loss: {train_loss}, validation loss: {val_loss}', end='\n\n')

Epoch 10
----------------------------------------
Training loss: 0.7444421648979187, validation loss: 0.739622950553894

Epoch 20
----------------------------------------
Training loss: 0.7351696491241455, validation loss: 0.7300244569778442

Epoch 30
----------------------------------------
Training loss: 0.7225698232650757, validation loss: 0.7175487875938416

Epoch 40
----------------------------------------
Training loss: 0.7075045704841614, validation loss: 0.7030715942382812

Epoch 50
----------------------------------------
Training loss: 0.6906334757804871, validation loss: 0.6877319812774658

Epoch 60
----------------------------------------
Training loss: 0.6759305596351624, validation loss: 0.6757914423942566

Epoch 70
----------------------------------------
Training loss: 0.6669202446937561, validation loss: 0.6699316501617432

Epoch 80
----------------------------------------
Training loss: 0.6590375304222107, validation loss: 0.6629076600074768

Epoch 90
----------------

In [6]:
torch.save(model.state_dict(), "models/model.pth")
wandb.save('runs/pima_run_2023-03-29')
wandb.finish()

Run history:
f1_macro,▁▁▁▂▃▃▃▃▃▆▇▇▇▇▇▇▇▇▇▇█▇██████████████████
f1_none/0,▁▁▁▂▇▇▇▇▇▇▇▇▇▇▇▇▇███████████████████████
f1_none/1,▅▅▅▅▁▁▁▁▁▅▇▇▇▇▇▇▇▇▇▇████████████████████
learning_rate,▁▂▂▃▄▄▅▅▆▇▇████▇▇▇▆▆▆▆▅▅▅▅▄▄▄▄▃▃▃▃▂▂▂▁▁▁
train_accuracy,▁▁▁▁▅▅▅▅▅▆▆▇▇▇▇▇▇▇▇▇▇▇▇▇████████████████
training_loss,███▇▇▇▇▆▆▅▄▃▃▃▃▃▃▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
val_accuracy,▁▁▁▁▅▅▅▅▅▆▇▇▇▇▇▇▇▇▇▇▇▇█▇▇███████████████
validation_loss,███▇▇▇▆▆▅▄▃▃▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

Run summary:
f1_macro,0.86073
f1_none/0,0.89691
f1_none/1,0.82456
learning_rate,0.0
train_accuracy,0.87622
training_loss,0.313
val_accuracy,0.87013
validation_loss,0.40686


At the end of training, we can calculate the final test accuracy - the last run reached about 87%. This is not the best achievable accuracy, but it is quite good for a net of this size. It could probably be increased by making the net bigger, but that also increases the risk of overfitting, so we would need to add regularization or another method of overfitting prevention.

In [10]:
predictions=[]

with torch.no_grad():
    for data in X_test:
        y_pred = model(data)
        predictions.append((y_pred > 0.5).int().item())

score = accuracy_score(y_test, predictions)
print(f'Validation accuracy: {score * 100:.3f}%')

Validation accuracy: 87.013%
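
The regularization mentioned above was not implemented in this notebook. One possible shape for it, as a sketch only, combines dropout between the layers with L2 weight decay in the optimizer. The class `RegularizedNetwork` and the values `p_drop=0.2` and `weight_decay=1e-4` are illustrative assumptions and have not been tuned or evaluated on this data.

```python
import torch
from torch import nn


class RegularizedNetwork(nn.Module):
    """Hypothetical variant of the notebook's net with dropout after each ReLU."""

    def __init__(self, input_features=8, hidden1=8, hidden2=4, out_features=1, p_drop=0.2):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_features, hidden1), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden1, hidden2), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden2, out_features), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.layers(x)


reg_model = RegularizedNetwork()

# weight_decay adds an L2 penalty on the parameters; 1e-4 is an illustrative value
reg_optimizer = torch.optim.Adam(reg_model.parameters(), lr=0.001, weight_decay=1e-4)

# Dropout is only active in training mode, so switch to eval mode for prediction
reg_model.eval()
with torch.no_grad():
    probs = reg_model(torch.rand(5, 8))
print(probs.shape)
```

With dropout in the model, the training loop would need `reg_model.train()` before each training step and `reg_model.eval()` before each evaluation, which the current loop does not do because the original net has no mode-dependent layers.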
