The training dataset and the overall structure of the code comes from the following website: https://machinelearningmastery.com/develop-your-first-neural-network-with-pytorch-step-by-step/.

Here, I try to find the most effective neural network model for predicting whether a Pima Indian patient has diabetes. I restrict the model to use mini-batch gradient descent and two hidden layers. Various values for the hyperparameters are tested to find the best set of hyperparameters for this dataset. Around $70\%$ of the dataset will be used for training, and the remaining $30\%$ will be used for testing the accuracy of the trained model.

In [1]:
import numpy as np
import numpy.random as rand
import torch
import torch.nn as nn
import torch.optim as optim
import torch.optim.lr_scheduler as lr_scheduler

In [2]:
dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
rand.shuffle(dataset)
train = dataset[:int(len(dataset)*.7)]
test = dataset[int(len(dataset)*.7):]
X = torch.tensor(dataset[:, :8], dtype=torch.float32)
X_train = torch.tensor(train[:, :8], dtype=torch.float32)
X_test = torch.tensor(test[:, :8], dtype=torch.float32)
y = torch.tensor(dataset[:, 8], dtype=torch.float32).reshape(-1, 1)
y_train = torch.tensor(train[:, 8], dtype=torch.float32).reshape(-1, 1)
y_test = torch.tensor(test[:, 8], dtype=torch.float32).reshape(-1, 1)

In [10]:
n_hyper = 30

dropout_input = rand.rand(n_hyper) * .2
dropout_hidden = rand.rand(n_hyper) * .3 + .2
n_neurons_1 = (rand.rand(n_hyper) * 14 + 2).astype(np.int_)
n_neurons_2 = (rand.rand(n_hyper) * 14 + 2).astype(np.int_)
lr = 10 ** (rand.rand(n_hyper) * (-2) - 1)
momentum = rand.rand(n_hyper) * .15 + .85
weight_decay = 10 ** (-rand.rand(n_hyper) - 1)
lr_factor = rand.rand(n_hyper) * .09 + .01


n_epochs = 200
batch_size = 10

highest_accuracy = 0

for i in range(n_hyper):
    model = nn.Sequential(
        nn.Dropout(dropout_input[i]),
        nn.Linear(8, n_neurons_1[i]),
        nn.Dropout(dropout_hidden[i]),
        nn.ReLU(),
        nn.Linear(n_neurons_1[i], n_neurons_2[i]),
        nn.Dropout(dropout_hidden[i]),
        nn.ReLU(),
        nn.Linear(n_neurons_2[i], 1),
        nn.Sigmoid()
    )

    loss_fn = nn.BCELoss()
    optimizer = optim.SGD(
        model.parameters(),
        lr=lr[i],
        momentum=momentum[i],
        weight_decay=weight_decay[i]
    )
    scheduler = lr_scheduler.LambdaLR(optimizer, lambda e : 1 / (1 + lr_factor[i] * e))

    model.train()
    for epoch in range(n_epochs):
        for j in range(0, len(X_train), batch_size):
            X_batch = X_train[j:j+batch_size]
            y_pred = model(X_batch)
            y_batch = y_train[j:j+batch_size]
            optimizer.zero_grad()
            loss = loss_fn(y_pred, y_batch)
            loss.backward()
            optimizer.step()
            scheduler.step()

    model.eval()
    y_pred = model(X_test)
    accuracy = (y_pred.round() == y_test).float().mean()
    if highest_accuracy < accuracy:
        highest_accuracy = accuracy
        best_model = model
        best_index = i
    print(f'Trial {i+1},\tAccuracy: {accuracy}')

print(f'Best model: Trial {best_index+1}')
y_pred = model(X)
accuracy = (y_pred.round() == y).float().mean()
print(f'Accuracy over the entire dataset: {accuracy}')
print(f'Dropout rate for the input layer: {dropout_input[best_index]}')
print(f'Dropout rate for the hidden layers: {dropout_hidden[best_index]}')
print(f'Size of the first hidden layer: {n_neurons_1[best_index]}')
print(f'Size of the second hidden layer: {n_neurons_2[best_index]}')
print(f'Learning rate: {lr[best_index]}')
print(f'Momentum: {momentum[best_index]}')
print(f'Weight decay: {weight_decay[best_index]}')
print(f'Learning rate decay factor: {lr_factor[best_index]}')

Trial 1,	Accuracy: 0.6536796689033508
Trial 2,	Accuracy: 0.6536796689033508
Trial 3,	Accuracy: 0.6536796689033508
Trial 4,	Accuracy: 0.6536796689033508
Trial 5,	Accuracy: 0.34632036089897156
Trial 6,	Accuracy: 0.6536796689033508
Trial 7,	Accuracy: 0.6536796689033508
Trial 8,	Accuracy: 0.6536796689033508
Trial 9,	Accuracy: 0.6536796689033508
Trial 10,	Accuracy: 0.6536796689033508
Trial 11,	Accuracy: 0.6536796689033508
Trial 12,	Accuracy: 0.6536796689033508
Trial 13,	Accuracy: 0.6536796689033508
Trial 14,	Accuracy: 0.6536796689033508
Trial 15,	Accuracy: 0.6536796689033508
Trial 16,	Accuracy: 0.6536796689033508
Trial 17,	Accuracy: 0.6536796689033508
Trial 18,	Accuracy: 0.6536796689033508
Trial 19,	Accuracy: 0.6536796689033508
Trial 20,	Accuracy: 0.6536796689033508
Trial 21,	Accuracy: 0.6536796689033508
Trial 22,	Accuracy: 0.6666666865348816
Trial 23,	Accuracy: 0.649350643157959
Trial 24,	Accuracy: 0.6536796689033508
Trial 25,	Accuracy: 0.6536796689033508
Trial 26,	Accuracy: 0.653679668903

It seems that the accuracy does not vary very much between different values for the hyperparameters. My models do not seem to be able to exceed the accuracy level of $0.67$.