## HealthHabitat - Hospital Multiclass Classification Problem

The recent COVID-19 pandemic has raised alarms over one of the most overlooked areas to focus on: healthcare management. While healthcare management offers many use cases for data science, patient length of stay (LOS) is one critical parameter to observe and predict if one wants to improve the efficiency of healthcare management in a hospital.


This parameter helps hospitals identify patients at high LOS risk (patients who will stay longer) at the time of admission. Once identified, patients with high LOS risk can have their treatment plan optimized to minimize LOS and lower the chance of staff/visitor infection. Also, prior knowledge of LOS can aid in logistics such as room and bed allocation planning.

Suppose you have been hired as a Data Scientist at HealthHabitat, a not-for-profit organization dedicated to managing the functioning of hospitals professionally and optimally.
The task is to accurately predict the length of stay for each patient on a case-by-case basis so that hospitals can use this information for optimal resource allocation and better functioning.

PREDICT: The length of stay is divided into 11 different classes ranging from 0-10 days to more than 100 days.


So let's go ahead and import our dataset first and take a look at the columns present in it. Note that I will not be running the code in real time, owing to the processing time and the size of the dataset.

### Import the needed packages

In [24]:
import pandas as pd
import numpy as np

### Load the dataset

In [25]:

# Load the dataset
df = pd.read_csv('hospital_stay_data.csv')


### Check for null values in our dataset and verify the type of each column

In [26]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 318438 entries, 0 to 318437
Data columns (total 18 columns):
 #   Column                             Non-Null Count   Dtype  
---  ------                             --------------   -----  
 0   case_id                            318438 non-null  int64  
 1   Hospital_code                      318438 non-null  int64  
 2   Hospital_type_code                 318438 non-null  object 
 3   City_Code_Hospital                 318438 non-null  int64  
 4   Hospital_region_code               318438 non-null  object 
 5   Available Extra Rooms in Hospital  318438 non-null  int64  
 6   Department                         318438 non-null  object 
 7   Ward_Type                          318438 non-null  object 
 8   Ward_Facility_Code                 318438 non-null  object 
 9   Bed Grade                          318325 non-null  float64
 10  patientid                          318438 non-null  int64  
 11  City_Code_Patient                  3139

### Data Preprocessing

Let's quickly move on to data preprocessing. Just like we did earlier, let's first look at the columns with null values.

In [27]:
# Find columns with NaN values
columns_with_nans = df.columns[df.isna().any()].tolist()

# Print the names of columns with NaNs
print("Columns with NaNs:", columns_with_nans)

Columns with NaNs: ['Bed Grade', 'City_Code_Patient']


#### Drop rows with missing values from these columns

There are very few missing values compared to the size of the data. Only two columns, Bed Grade and City_Code_Patient, have null values, so let's drop the rows where either of them is missing. This removes 4,645 of the 318,438 rows (about 1.5%).

In [28]:
df = df.dropna()
df.count()

case_id                              313793
Hospital_code                        313793
Hospital_type_code                   313793
City_Code_Hospital                   313793
Hospital_region_code                 313793
Available Extra Rooms in Hospital    313793
Department                           313793
Ward_Type                            313793
Ward_Facility_Code                   313793
Bed Grade                            313793
patientid                            313793
City_Code_Patient                    313793
Type of Admission                    313793
Severity of Illness                  313793
Visitors with Patient                313793
Age                                  313793
Admission_Deposit                    313793
Stay                                 313793
dtype: int64

Next, let's check the data distribution for each of the 11 classes to decide on the most suitable evaluation metric for this problem statement.

In [29]:
# The column "Stay" is what we have to predict. Check the distribution of the 'Stay' column
class_distribution = df['Stay'].value_counts(normalize=True) * 100

print("Class Distribution (%):\n", class_distribution)

Class Distribution (%):
 Stay
21-30                 27.507306
11-20                 24.568744
31-40                 17.308225
51-60                 10.982718
0-10                   7.409343
41-50                  3.677902
71-80                  3.217408
More than 100 Days     2.086726
81-90                  1.517242
91-100                 0.864583
61-70                  0.859802
Name: proportion, dtype: float64


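The data is spread across all 11 classes, although unevenly: the three largest classes together cover about 69% of the rows. We will go ahead with accuracy as our evaluation metric for this problem, keeping in mind that always predicting the most frequent class (21-30) would already score about 27.5%.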
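To put the accuracy numbers that follow into context, here is a small sketch that computes the majority-class baseline directly from the percentages printed above. The dictionary is simply a hand-copied version of that output; the variable names are illustrative and not part of the original notebook.

```python
# Class shares (in %) copied by hand from the value_counts output above.
class_share = {
    "21-30": 27.507306,
    "11-20": 24.568744,
    "31-40": 17.308225,
    "51-60": 10.982718,
    "0-10": 7.409343,
    "41-50": 3.677902,
    "71-80": 3.217408,
    "More than 100 Days": 2.086726,
    "81-90": 1.517242,
    "91-100": 0.864583,
    "61-70": 0.859802,
}

# A model that always predicts the most frequent class scores exactly that class's share.
majority_class = max(class_share, key=class_share.get)
baseline_accuracy = class_share[majority_class]

# Combined share of the three largest classes, as a rough measure of imbalance.
top3_share = sum(sorted(class_share.values(), reverse=True)[:3])

print(f"Majority class: {majority_class}")
print(f"Accuracy of always predicting it: {baseline_accuracy:.2f}%")
print(f"Share of the three largest classes: {top3_share:.2f}%")
```

Any trained model should be judged against this baseline rather than against 0%.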

Next, let's identify the categorical and numerical columns and proceed to transform them. Here we're going to use a standard scaler for the numerical columns and a one-hot encoder for the categorical columns, since we have a large number of rows in this dataset. We then split the data into train and test sets and fit the transformations on the training split only.

#### Identify the categorical and numerical columns

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline

# Encode categorical variables and normalize numerical variables
categorical_columns = ['Hospital_code', 'Hospital_type_code', 'City_Code_Hospital', 'Hospital_region_code', 'Department', 'Ward_Type', 'Ward_Facility_Code', 'Bed Grade', 'City_Code_Patient','Type of Admission', 'Severity of Illness', 'Age']
numerical_columns = ['Available Extra Rooms in Hospital', 'Visitors with Patient', 'Admission_Deposit']

# Encoding and Normalizing
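preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numerical_columns),
        # handle_unknown='ignore' encodes categories unseen during fit as all zeros instead of raising an error
        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_columns)
    ])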

# Preparing target
label_encoder = LabelEncoder()
df['Stay'] = label_encoder.fit_transform(df['Stay'])

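# PyTorch imports used by the cells that follow
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import Adam
from torch.utils.data import DataLoader, TensorDataset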

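# Splitting the dataset
# patientid is an identifier and Stay is the target; case_id stays in X but is not listed in
# either column group, so the ColumnTransformer drops it (remainder='drop' is the default)
X = df.drop(['patientid','Stay'], axis=1)
y = df['Stay']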
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Apply preprocessing: fit the preprocessor on the training split only, then apply the same fitted transformation to the test split
X_train = preprocessor.fit_transform(X_train)
X_test = preprocessor.transform(X_test)
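The fit-on-train, transform-on-test pattern is easier to see on a tiny example. The sketch below uses two made-up frames (`toy_train`, `toy_test`) with invented values, not rows from the hospital dataset. It also sets `handle_unknown="ignore"`, an optional scikit-learn setting, to show what happens to a category that only appears in the test split. The `to_dense` helper is hypothetical and only there because a `ColumnTransformer` may return either a sparse or a dense matrix depending on how sparse the result is.

```python
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up rows for illustration only.
toy_train = pd.DataFrame({
    "Admission_Deposit": [3000.0, 4500.0, 5200.0, 4100.0],
    "Department": ["gynecology", "radiotherapy", "gynecology", "anesthesia"],
})
toy_test = pd.DataFrame({
    "Admission_Deposit": [4800.0, 3900.0],
    "Department": ["gynecology", "surgery"],  # "surgery" never appears in toy_train
})

toy_preprocessor = ColumnTransformer(transformers=[
    ("num", StandardScaler(), ["Admission_Deposit"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Department"]),
])


def to_dense(matrix):
    """Return a dense NumPy array whether the input is sparse or already dense."""
    return matrix.toarray() if sparse.issparse(matrix) else np.asarray(matrix)


# Statistics (mean, std, category list) are learned from the training rows only.
train_encoded = to_dense(toy_preprocessor.fit_transform(toy_train))
# The test rows are transformed with those same statistics.
test_encoded = to_dense(toy_preprocessor.transform(toy_test))

print(train_encoded.shape)   # 1 scaled column + 3 one-hot columns
print(test_encoded[1, 1:])   # unseen "surgery" becomes an all-zero one-hot block
```

Fitting on the training split alone keeps information about the test rows out of the scaling statistics and the category list.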

With that we've completed the data preprocessing. Next, let's convert the data into tensors and wrap them in a TensorDataset and DataLoaders. Let's also use a batch_size of 64.

In [None]:
# Convert to PyTorch tensors
X_train_tensor = torch.tensor(X_train.toarray().astype(np.float32))
y_train_tensor = torch.tensor(y_train.values.astype(np.int64))
X_test_tensor = torch.tensor(X_test.toarray().astype(np.float32))
y_test_tensor = torch.tensor(y_test.values.astype(np.int64))

# Prepare DataLoader
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
test_dataset = TensorDataset(X_test_tensor, y_test_tensor)

batch_size = 64
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)
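As a quick illustration of what a `DataLoader` yields, the sketch below batches a made-up tensor of 10 rows with a batch size of 4. The `toy_*` names and values are invented for the example; only `TensorDataset` and `DataLoader` come from the cell above.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 10 made-up samples with 3 features each, and alternating 0/1 labels.
toy_features = torch.arange(30, dtype=torch.float32).reshape(10, 3)
toy_labels = torch.arange(10, dtype=torch.int64) % 2

toy_dataset = TensorDataset(toy_features, toy_labels)
toy_loader = DataLoader(toy_dataset, batch_size=4, shuffle=False)

batch_sizes = []
for inputs, targets in toy_loader:
    # Each iteration yields one (features, labels) pair of matching length.
    batch_sizes.append(inputs.shape[0])
    print(inputs.shape, targets.shape)

print(batch_sizes)  # the last batch is smaller because 10 is not a multiple of 4
```

The same thing happens with the real loaders: unless `drop_last=True` is passed, the final batch of an epoch can hold fewer than 64 rows.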

Let's now define our neural network. Here I've defined the network with a ReLU activation function after each of the two hidden layers. Notice that there is no activation function in the last layer: it outputs raw scores (logits), and the softmax activation for this multiclass classification problem is applied to that output.

### Define our neural network class

In [32]:
class HospitalStayNet(nn.Module):
    def __init__(self, num_features, num_classes):
        super(HospitalStayNet, self).__init__()
        self.fc1 = nn.Linear(num_features, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, num_classes)
        
    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


### Instantiate our neural network

Let's set basic parameters like number of features and classes, and instantiate our model.

In [33]:
# Determine the number of features from the column count of the dataset
num_features = X_train.shape[1]
num_classes = 11  # As mentioned, there are 11 classes
# Use GPU, if it is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using device: {device}')
model = HospitalStayNet(num_features, num_classes).to(device)


Using device: cpu


Next, let's define the function that calculates our trained model's accuracy. In this code, the function sets the model to evaluation mode, iterates over batches of input data, computes the model's outputs, identifies the class with the highest prediction score, and compares it with the true labels to tally the correct predictions, returning the percentage accuracy.

### Define the function that calculates our trained model's accuracy

In [34]:
import torch
import torch.nn.functional as F
import torch.nn as nn
from torch.optim import Adam
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)


def calculate_accuracy(loader):
    model.eval()  # Set the model to evaluation mode. 

    correct = 0  # the number of correct predictions.
    total = 0  # the total number of predictions.

    with torch.no_grad():  # Disable the gradient calculation to save memory and speed up the process since gradients are not needed for evaluation.
        # Iterate over the data loader, which provides batches of inputs and their corresponding targets.
        for inputs, targets in loader:  
            inputs, targets = inputs.to(device), targets.to(device)  

            outputs = model(inputs)  # Compute the model's outputs for the given inputs.

            # Find the predicted class with the highest score for each input. 
            # The `torch.max` function returns both the maximum values and their indices (the predicted classes)
            _, predicted = torch.max(outputs.data, 1)  
            # targets.size(0) gives the number of targets in the batch.
            total += targets.size(0)  
            # Calculate the number of correct predictions in the batch by comparing predicted with targets, summing the true predictions, and adding this sum to the correct counter.
            correct += (predicted == targets).sum().item()  

    return 100 * correct / total  
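The core of this function is the `torch.max(..., 1)` call. A minimal sketch on hand-written scores (the `toy_*` tensors are made up) shows how the predicted classes and the accuracy come out of it:

```python
import torch

# Made-up scores for 4 samples and 3 classes.
toy_logits = torch.tensor([
    [2.0, 1.0, 0.0],
    [0.0, 3.0, 1.0],
    [1.0, 0.0, 5.0],
    [4.0, 0.0, 1.0],
])
toy_targets = torch.tensor([0, 1, 2, 1])

# torch.max along dim 1 returns (max values, indices); the indices are the predicted classes.
_, toy_predicted = torch.max(toy_logits, 1)

toy_correct = (toy_predicted == toy_targets).sum().item()
toy_accuracy = 100 * toy_correct / toy_targets.size(0)

print(toy_predicted.tolist())  # [0, 1, 2, 0]
print(toy_accuracy)            # 75.0
```

The last sample is predicted as class 0 while its label is 1, so 3 of the 4 predictions are correct.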


#### Define the loss function and the optimizer to use

Note that the softmax activation for our multiclass classification problem is applied automatically inside PyTorch's CrossEntropyLoss (as a log-softmax), which is why the network itself outputs raw scores.

In [35]:
# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = Adam(model.parameters(), lr=0.001)
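To make the point about softmax concrete, the following sketch checks on random, made-up logits that `nn.CrossEntropyLoss` gives the same value as applying `log_softmax` followed by the negative log-likelihood loss by hand. The tensors are illustrative and unrelated to the hospital data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
toy_logits = torch.randn(5, 11)               # 5 samples, 11 classes, raw scores
toy_targets = torch.tensor([0, 3, 10, 7, 3])  # made-up class indices

# Loss computed directly from raw scores.
loss_from_logits = nn.CrossEntropyLoss()(toy_logits, toy_targets)

# The same loss built by hand: log-softmax, then negative log-likelihood.
log_probs = F.log_softmax(toy_logits, dim=1)
loss_manual = F.nll_loss(log_probs, toy_targets)

# Softmax turns each row of scores into probabilities without changing the top class.
probs = F.softmax(toy_logits, dim=1)

print(loss_from_logits.item(), loss_manual.item())
print(probs.sum(dim=1))
```

Because softmax preserves the ordering of the scores, taking the argmax of the raw outputs (as the accuracy function does) gives the same prediction as taking the argmax of the probabilities.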

We've set up our model successfully. So let's go ahead and train our model.

### Train the model

For training our model, I've set the number of epochs to 20 to once again reduce the processing time. I've also added early stopping to this model with max_patience set to 3. Note that I've also added a line of code that saves the model with the best test accuracy. We will use a different file name for the best model of each architecture that we try. Take a look at the code.

In [37]:

# Training loop
num_epochs = 20
best_accuracy = float(0.01)
patience = 0
max_patience = 3  # Maximum epochs to wait for improvement

for epoch in range(num_epochs):
    model.train()       # Set the model to training mode. 
    # Iterate over the data loader, which provides batches of inputs and their corresponding targets.
    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad() # initialize the gradients for this batch of data
        outputs = model(inputs)
        loss = criterion(outputs, targets) # calculate the losses
        loss.backward()  # compute the gradients based on the loss values
        optimizer.step() # update the weights and biases based on the loss values
    
    train_accuracy = calculate_accuracy(train_loader)
    test_accuracy = calculate_accuracy(test_loader)
    if test_accuracy > best_accuracy:
        best_accuracy = test_accuracy
        patience = 0
        torch.save(model, 'best_model_simple.pt')
    else:
        patience += 1

    if patience >= max_patience:
        print(f'Early stopped at {epoch+1}')
        break  # Stop training
    
    print(f'Epoch {epoch+1}, Loss: {loss.item():.4f}, Train Accuracy: {train_accuracy:.2f}%, Test Accuracy: {test_accuracy:.2f}%')


Epoch 1, Loss: 1.6718, Train Accuracy: 44.10%, Test Accuracy: 42.81%
Epoch 2, Loss: 1.4220, Train Accuracy: 44.05%, Test Accuracy: 42.61%
Epoch 3, Loss: 1.4790, Train Accuracy: 44.09%, Test Accuracy: 42.81%
Epoch 4, Loss: 1.3952, Train Accuracy: 44.27%, Test Accuracy: 42.78%
Epoch 5, Loss: 1.3607, Train Accuracy: 44.29%, Test Accuracy: 42.56%
Early stopped at 6


Here, we can observe that even though we had set the number of epochs to 20, we got the best performance of 42.81% on the test set in the 3rd epoch. Since the performance did not improve over the next 3 epochs, training stopped at the 6th epoch. Also note that the printed loss is the loss of the last batch in each epoch, not an epoch average, which is why it jumps around. Feel free to increase the number of epochs and the max_patience to see if you get better results.
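The early-stopping rule can be isolated from the training code and replayed on a list of accuracies. The sketch below is a simplified stand-in for the loop above: `run_early_stopping` is a hypothetical helper, and the accuracy values are made up rather than taken from the runs in this notebook.

```python
def run_early_stopping(test_accuracies, max_patience=3):
    """Replay the patience rule and return (last_epoch, best_epoch, best_accuracy)."""
    best_accuracy = 0.0
    best_epoch = None
    patience = 0
    for epoch, accuracy in enumerate(test_accuracies, start=1):
        if accuracy > best_accuracy:
            # A strict improvement resets the counter (this is where the model would be saved).
            best_accuracy, best_epoch, patience = accuracy, epoch, 0
        else:
            patience += 1
        if patience >= max_patience:
            return epoch, best_epoch, best_accuracy
    # Patience never ran out: training used every epoch in the list.
    return len(test_accuracies), best_epoch, best_accuracy


# Made-up per-epoch test accuracies, for illustration only.
made_up_accuracies = [42.0, 42.5, 42.4, 42.3, 42.1, 43.0]
stopped_at, best_epoch, best_accuracy = run_early_stopping(made_up_accuracies)
print(stopped_at, best_epoch, best_accuracy)  # 5 2 42.5
```

In this made-up run, training stops at epoch 5 and never sees the better epoch 6, which is the trade-off a small max_patience makes.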

Next, let's try to improve our model's performance by increasing the number of layers and nodes.

### Define a newer model with more layers and nodes

In [38]:
class HospitalStayNet256(nn.Module):
    def __init__(self, num_features, num_classes):
        super(HospitalStayNet256, self).__init__()
        self.fc1 = nn.Linear(num_features, 128)
        self.fc2 = nn.Linear(128, 256)
        self.fc3 = nn.Linear(256, 128)
        self.fc4 = nn.Linear(128, 64)
        self.fc5 = nn.Linear(64, num_classes)
        
    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.relu(self.fc4(x))
        x = self.fc5(x)
        return x
model = HospitalStayNet256(num_features, num_classes).to(device)
optimizer = Adam(model.parameters(), lr=0.001)

### Train the new model

Once again, for this larger model I've gone with the same setup as earlier: 20 epochs, a max patience of 3, and saving the best-performing model.

In [39]:

# Training loop
num_epochs = 20
best_accuracy = float(0.01)
patience = 0
max_patience = 3  # Maximum epochs to wait for improvement

for epoch in range(num_epochs):
    model.train()       # Set the model to training mode. 
    # Iterate over the data loader, which provides batches of inputs and their corresponding targets.
    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad() # initialize the gradients for this batch of data
        outputs = model(inputs)
        loss = criterion(outputs, targets) # calculate the losses
        loss.backward()  # compute the gradients based on the loss values
        optimizer.step() # update the weights and biases based on the loss values
    
    train_accuracy = calculate_accuracy(train_loader)
    test_accuracy = calculate_accuracy(test_loader)
    if test_accuracy > best_accuracy:
        best_accuracy = test_accuracy
        patience = 0
        torch.save(model, 'best_model_big.pt')
    else:
        patience += 1

    if patience >= max_patience:
        print(f'Early stopped at {epoch+1}')
        break  # Stop training
    
    print(f'Epoch {epoch+1}, Loss: {loss.item():.4f}, Train Accuracy: {train_accuracy:.2f}%, Test Accuracy: {test_accuracy:.2f}%')


Epoch 1, Loss: 1.3710, Train Accuracy: 42.51%, Test Accuracy: 42.80%
Epoch 2, Loss: 1.4949, Train Accuracy: 42.64%, Test Accuracy: 42.78%
Epoch 3, Loss: 1.3632, Train Accuracy: 42.66%, Test Accuracy: 42.62%
Epoch 4, Loss: 1.5679, Train Accuracy: 42.94%, Test Accuracy: 42.89%
Epoch 5, Loss: 1.1875, Train Accuracy: 43.36%, Test Accuracy: 42.91%
Epoch 6, Loss: 1.2564, Train Accuracy: 43.50%, Test Accuracy: 43.00%
Epoch 7, Loss: 1.2125, Train Accuracy: 43.88%, Test Accuracy: 43.10%
Epoch 8, Loss: 1.3437, Train Accuracy: 44.07%, Test Accuracy: 43.11%
Epoch 9, Loss: 1.6099, Train Accuracy: 44.10%, Test Accuracy: 42.76%
Epoch 10, Loss: 1.6654, Train Accuracy: 44.42%, Test Accuracy: 42.86%
Early stopped at 11


Note that the more complex model achieves a slightly better test accuracy: it improved from 42.81% to 43.11%, reached in the 8th epoch. However, by the 10th epoch the training accuracy (44.42%) has pulled ahead of the test accuracy (42.86%), so the model appears to be slightly overfitting.

So let's try different techniques such as dropout and batch normalization to see if the performance can be improved further.

Let's take a look at the model performance with dropout.

### Define another model with dropout layers

In [40]:
class HospitalStayNetWithDropout(nn.Module):
    def __init__(self, num_features, num_classes):
        super(HospitalStayNetWithDropout, self).__init__()
        self.fc1 = nn.Linear(num_features, 128)
        self.dropout1 = nn.Dropout(0.5)  # Dropout layer with 50% probability
        self.fc2 = nn.Linear(128, 64)
        #self.dropout2 = nn.Dropout(0.5)  # Another Dropout layer
        self.fc3 = nn.Linear(64, num_classes)
        
    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.dropout1(x)
        x = F.relu(self.fc2(x))
        # x = self.dropout2(x)
        x = self.fc3(x)
        return x

model = HospitalStayNetWithDropout(num_features, num_classes).to(device)
optimizer = Adam(model.parameters(), lr=0.001)
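Dropout behaves differently in training and evaluation mode, which is why the `model.train()` and `model.eval()` calls in the training loop matter. A minimal sketch on a made-up vector of ones:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(0.5)
ones = torch.ones(1000)

# Training mode: each element is zeroed with probability 0.5,
# and the survivors are scaled by 1 / (1 - 0.5) = 2 to keep the expected value unchanged.
dropout.train()
train_out = dropout(ones)

# Evaluation mode: dropout is a no-op.
dropout.eval()
eval_out = dropout(ones)

print(train_out.unique())                # only the values 0 and 2 appear
print((train_out == 0).float().mean())   # roughly half of the elements are dropped
print(torch.equal(eval_out, ones))       # True
```

Since the accuracy function switches the model to evaluation mode, both the train and test accuracies reported below are measured with dropout turned off.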

#### Train the model with the dropout layers

In [41]:

# Training loop
num_epochs = 20
best_accuracy = float(0.01)
patience = 0
max_patience = 3  # Maximum epochs to wait for improvement

for epoch in range(num_epochs):
    model.train()       # Set the model to training mode. 
    # Iterate over the data loader, which provides batches of inputs and their corresponding targets.
    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad() # initialize the gradients for this batch of data
        outputs = model(inputs)
        loss = criterion(outputs, targets) # calculate the losses
        loss.backward()  # compute the gradients based on the loss values
        optimizer.step() # update the weights and biases based on the loss values
    
    train_accuracy = calculate_accuracy(train_loader)
    test_accuracy = calculate_accuracy(test_loader)
    if test_accuracy > best_accuracy:
        best_accuracy = test_accuracy
        patience = 0
        torch.save(model, 'best_model_dropout.pt')
    else:
        patience += 1

    if patience >= max_patience:
        print(f'Early stopped at {epoch+1}')
        break  # Stop training
    
    print(f'Epoch {epoch+1}, Loss: {loss.item():.4f}, Train Accuracy: {train_accuracy:.2f}%, Test Accuracy: {test_accuracy:.2f}%')


Epoch 1, Loss: 2.0530, Train Accuracy: 41.28%, Test Accuracy: 41.62%
Epoch 2, Loss: 1.7034, Train Accuracy: 42.16%, Test Accuracy: 42.47%
Epoch 3, Loss: 1.5578, Train Accuracy: 42.42%, Test Accuracy: 42.58%
Epoch 4, Loss: 1.5634, Train Accuracy: 42.50%, Test Accuracy: 42.64%
Epoch 5, Loss: 1.7391, Train Accuracy: 42.63%, Test Accuracy: 42.83%
Epoch 6, Loss: 1.7118, Train Accuracy: 42.61%, Test Accuracy: 42.73%
Epoch 7, Loss: 1.5481, Train Accuracy: 42.88%, Test Accuracy: 43.04%
Epoch 8, Loss: 1.3582, Train Accuracy: 42.76%, Test Accuracy: 42.95%
Epoch 9, Loss: 1.5448, Train Accuracy: 42.81%, Test Accuracy: 42.88%
Early stopped at 10


Here, we've got our best performance in the 7th epoch. Note that although the model with dropout layers has a slightly lower test accuracy (43.04%) than the earlier model (43.11%), the training and test scores are very close now.

Let's quickly try batch normalization too and analyze the model's performance.

### Define another model with Batch Normalization

In [42]:
class HospitalStayNetWithBN(nn.Module):
    def __init__(self, num_features, num_classes):
        super(HospitalStayNetWithBN, self).__init__()
        self.fc1 = nn.Linear(num_features, 128)
        self.bn1 = nn.BatchNorm1d(128)
        self.fc2 = nn.Linear(128, 64)
        self.bn2 = nn.BatchNorm1d(64)
        self.fc3 = nn.Linear(64, num_classes)
        
    def forward(self, x):
        x = F.relu(self.bn1(self.fc1(x)))
        x = F.relu(self.bn2(self.fc2(x)))
        x = self.fc3(x)
        return x


model = HospitalStayNetWithBN(num_features, num_classes).to(device)
optimizer = Adam(model.parameters(), lr=0.001)
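For intuition about what `nn.BatchNorm1d` does to the activations, the sketch below pushes a made-up batch with a large mean and spread through a standalone layer in training mode and checks the per-feature statistics of the output. The tensor and names are illustrative only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)

# Made-up batch: 32 samples, 4 features, shifted and scaled away from N(0, 1).
batch = torch.randn(32, 4) * 5 + 3

# In training mode each feature is normalized with the statistics of the current batch.
bn.train()
normalized = bn(batch)

col_mean = normalized.mean(dim=0)
col_var = normalized.var(dim=0, unbiased=False)

print(col_mean)  # close to 0 for every feature
print(col_var)   # close to 1 for every feature
```

In evaluation mode the layer uses running averages collected during training instead of the statistics of the batch it is given.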

#### Train the model with batch normalization

In [43]:

# Training loop
num_epochs = 20
best_accuracy = float(0.01)
patience = 0
max_patience = 3  # Maximum epochs to wait for improvement

for epoch in range(num_epochs):
    model.train()       # Set the model to training mode. 
    # Iterate over the data loader, which provides batches of inputs and their corresponding targets.
    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad() # initialize the gradients for this batch of data
        outputs = model(inputs)
        loss = criterion(outputs, targets) # calculate the losses
        loss.backward()  # compute the gradients based on the loss values
        optimizer.step() # update the weights and biases based on the loss values
    
    train_accuracy = calculate_accuracy(train_loader)
    test_accuracy = calculate_accuracy(test_loader)
    if test_accuracy > best_accuracy:
        best_accuracy = test_accuracy
        patience = 0
        torch.save(model, 'best_model_BN.pt')
    else:
        patience += 1

    if patience >= max_patience:
        print(f'Early stopped at {epoch+1}')
        break  # Stop training
    
    print(f'Epoch {epoch+1}, Loss: {loss.item():.4f}, Train Accuracy: {train_accuracy:.2f}%, Test Accuracy: {test_accuracy:.2f}%')


Epoch 1, Loss: 1.5455, Train Accuracy: 42.21%, Test Accuracy: 42.35%
Epoch 2, Loss: 1.7564, Train Accuracy: 42.61%, Test Accuracy: 42.47%
Epoch 3, Loss: 1.5042, Train Accuracy: 42.93%, Test Accuracy: 42.91%
Epoch 4, Loss: 1.2331, Train Accuracy: 43.09%, Test Accuracy: 42.87%
Epoch 5, Loss: 1.6523, Train Accuracy: 43.07%, Test Accuracy: 42.69%
Early stopped at 6


The model with batch normalization peaked at 42.91% test accuracy, so it has not surpassed the dropout model we created earlier (43.04%).

Next, let's try tuning the model's hyperparameters using the Optuna library.

## Hyperparameter optimization with Optuna

Let's create a framework model with 2 hidden layers. You can try tuning a larger network as well.

In [46]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class HospitalStayNetOptuna(nn.Module):
    def __init__(self, num_features, num_classes, fc1_units=128, fc2_units=64):
        super(HospitalStayNetOptuna, self).__init__()
        self.fc1 = nn.Linear(num_features, fc1_units)
        self.fc2 = nn.Linear(fc1_units, fc2_units)
        self.fc3 = nn.Linear(fc2_units, num_classes)
        
    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


Let's slightly modify the calculate_accuracy function to take different models. 

In [47]:
def calculate_accuracy_optuna(loader, model):
    model.eval()  # Set the model to evaluation mode. 

    correct = 0  # the number of correct predictions.
    total = 0  # the total number of predictions.

    with torch.no_grad():  # Disable the gradient calculation to save memory and speed up the process since gradients are not needed for evaluation.
        # Iterate over the data loader, which provides batches of inputs and their corresponding targets.
        for inputs, targets in loader:  
            inputs, targets = inputs.to(device), targets.to(device)  

            outputs = model(inputs)  # Compute the model's outputs for the given inputs.

            # Find the predicted class with the highest score for each input. 
            # The `torch.max` function returns both the maximum values and their indices (the predicted classes)
            _, predicted = torch.max(outputs.data, 1)  
            # targets.size(0) gives the number of targets in the batch.
            total += targets.size(0)  
            # Calculate the number of correct predictions in the batch by comparing predicted with targets, summing the true predictions, and adding this sum to the correct counter.
            correct += (predicted == targets).sum().item()  

    return 100 * correct / total  

Here we are going to tune the learning rate, number of hidden units in each of the two layers and the batch size. These parameters will be tuned to optimize for the objective of finding the model with the best test accuracy.

Owing to the processing time, I've chosen to go with only 10 epochs and 10 trials (n_trials) for hyperparameter tuning. Feel free to increase these values, but note that this will significantly increase the processing time.
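Before the tuning code, a short aside on why the learning rate is searched on a log scale. The sketch below draws values that are uniform in the exponent between 1e-5 and 1e-1. It only illustrates the idea of a log-uniform range using the standard library; it is not Optuna's sampler, and `sample_log_uniform` is a hypothetical helper.

```python
import math
import random

random.seed(0)
low, high = 1e-5, 1e-1


def sample_log_uniform(low, high):
    """Draw a value whose base-10 exponent is uniform between log10(low) and log10(high)."""
    exponent = random.uniform(math.log10(low), math.log10(high))
    return 10 ** exponent


samples = [sample_log_uniform(low, high) for _ in range(10_000)]

# 1e-3 is the midpoint of the exponent range, so about half the draws fall below it.
share_below_1e3 = sum(s < 1e-3 for s in samples) / len(samples)
print(f"Share of samples below 1e-3: {share_below_1e3:.2f}")
```

With a plain uniform range over the same interval, only about 1% of the draws would land below 1e-3, so small learning rates would hardly be explored.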

In [48]:
import optuna

num_epochs = 10

# Define the loss function 
criterion = nn.CrossEntropyLoss()

def objective(trial):
    # Hyperparameters to be optimized
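    # Sample the learning rate on a log scale (suggest_loguniform is deprecated in recent Optuna versions)
    lr = trial.suggest_float('lr', 1e-5, 1e-1, log=True)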
    fc1_units = trial.suggest_categorical('fc1_units', [64, 128, 256])
    fc2_units = trial.suggest_categorical('fc2_units', [32, 64, 128])
    batch_size = trial.suggest_categorical('batch_size', [64, 128, 256])
    
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)
    
    # Model setup
    model = HospitalStayNetOptuna(num_features, num_classes, fc1_units, fc2_units).to(device)
    optimizer = Adam(model.parameters(), lr=lr)

    # Initialize the variable to store best accuracy for this set of hyper parameters
    best_accuracy = 0

    # Training loop
    for epoch in range(num_epochs):
        model.train()
        for inputs, targets in train_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            loss.backward()
            optimizer.step()
    
        # Evaluation
        model.eval()
        # We use the latest version of the model here
        test_accuracy = calculate_accuracy_optuna(test_loader, model) 
        if test_accuracy > best_accuracy:
            best_accuracy = test_accuracy
        
        # print epoch number and test accuracy for this epoch
        print_threshold = 10
        if ((epoch+1) % print_threshold == 0):
            print(f'Epoch {epoch+1}/{num_epochs}, Test Accuracy: {best_accuracy:.2f}%')


    return best_accuracy

# Callback that Optuna runs after each trial; it prints the best trial so far, not necessarily the one that just finished
def print_trial_callback(study, trial):
    trial = study.best_trial
    print(f"Finished trial #{trial.number} with value: {trial.value:.2f}%")
    print(f"Best parameters: {trial.params}")


study = optuna.create_study(direction='maximize')
# Adjust the number of trials here
study.optimize(objective, n_trials=10, callbacks=[print_trial_callback])  

print('Number of finished trials:', len(study.trials))
print('Best trial:', study.best_trial.params)


[I 2024-03-08 16:54:34,688] A new study created in memory with name: no-name-dffa987c-2545-4d4b-8cf4-b9c056e8f505
[I 2024-03-08 16:55:24,196] Trial 0 finished with value: 42.11666852562979 and parameters: {'lr': 6.105955332454353e-05, 'optimizer': 'Adam', 'fc1_units': 64, 'fc2_units': 32, 'batch_size': 64}. Best is trial 0 with value: 42.11666852562979.


Epoch 10/10, Test Accuracy: 42.12%
Finished trial #0 with value: 42.12%
Best parameters: {'lr': 6.105955332454353e-05, 'optimizer': 'Adam', 'fc1_units': 64, 'fc2_units': 32, 'batch_size': 64}


[I 2024-03-08 16:56:00,542] Trial 1 finished with value: 42.53891872082092 and parameters: {'lr': 0.010122234735374186, 'optimizer': 'Adam', 'fc1_units': 64, 'fc2_units': 32, 'batch_size': 128}. Best is trial 1 with value: 42.53891872082092.


Epoch 10/10, Test Accuracy: 42.54%
Finished trial #1 with value: 42.54%
Best parameters: {'lr': 0.010122234735374186, 'optimizer': 'Adam', 'fc1_units': 64, 'fc2_units': 32, 'batch_size': 128}


[I 2024-03-08 16:56:56,445] Trial 2 finished with value: 40.34799789671601 and parameters: {'lr': 1.1390336685532758e-05, 'optimizer': 'Adam', 'fc1_units': 64, 'fc2_units': 128, 'batch_size': 64}. Best is trial 1 with value: 42.53891872082092.


Epoch 10/10, Test Accuracy: 40.35%
Finished trial #1 with value: 42.54%
Best parameters: {'lr': 0.010122234735374186, 'optimizer': 'Adam', 'fc1_units': 64, 'fc2_units': 32, 'batch_size': 128}


[I 2024-03-08 16:57:32,377] Trial 3 finished with value: 42.71259898978632 and parameters: {'lr': 0.005690683999391386, 'optimizer': 'Adam', 'fc1_units': 128, 'fc2_units': 32, 'batch_size': 256}. Best is trial 3 with value: 42.71259898978632.


Epoch 10/10, Test Accuracy: 42.71%
Finished trial #3 with value: 42.71%
Best parameters: {'lr': 0.005690683999391386, 'optimizer': 'Adam', 'fc1_units': 128, 'fc2_units': 32, 'batch_size': 256}


[I 2024-03-08 16:58:52,614] Trial 4 finished with value: 42.81935658630635 and parameters: {'lr': 7.689204209467373e-05, 'optimizer': 'Adam', 'fc1_units': 256, 'fc2_units': 128, 'batch_size': 64}. Best is trial 4 with value: 42.81935658630635.


Epoch 10/10, Test Accuracy: 42.82%
Finished trial #4 with value: 42.82%
Best parameters: {'lr': 7.689204209467373e-05, 'optimizer': 'Adam', 'fc1_units': 256, 'fc2_units': 128, 'batch_size': 64}


[I 2024-03-08 17:00:03,447] Trial 5 finished with value: 43.147596360681334 and parameters: {'lr': 0.000969062169314886, 'optimizer': 'Adam', 'fc1_units': 256, 'fc2_units': 64, 'batch_size': 64}. Best is trial 5 with value: 43.147596360681334.


Epoch 10/10, Test Accuracy: 43.15%
Finished trial #5 with value: 43.15%
Best parameters: {'lr': 0.000969062169314886, 'optimizer': 'Adam', 'fc1_units': 256, 'fc2_units': 64, 'batch_size': 64}


[I 2024-03-08 17:00:57,794] Trial 6 finished with value: 42.43056772733791 and parameters: {'lr': 4.921966523053483e-05, 'optimizer': 'Adam', 'fc1_units': 64, 'fc2_units': 128, 'batch_size': 64}. Best is trial 5 with value: 43.147596360681334.


Epoch 10/10, Test Accuracy: 42.43%
Finished trial #5 with value: 43.15%
Best parameters: {'lr': 0.000969062169314886, 'optimizer': 'Adam', 'fc1_units': 256, 'fc2_units': 64, 'batch_size': 64}


[I 2024-03-08 17:01:31,553] Trial 7 finished with value: 42.8129829984544 and parameters: {'lr': 0.008749229872277697, 'optimizer': 'Adam', 'fc1_units': 64, 'fc2_units': 64, 'batch_size': 256}. Best is trial 5 with value: 43.147596360681334.


Epoch 10/10, Test Accuracy: 42.81%
Finished trial #5 with value: 43.15%
Best parameters: {'lr': 0.000969062169314886, 'optimizer': 'Adam', 'fc1_units': 256, 'fc2_units': 64, 'batch_size': 64}


[I 2024-03-08 17:02:03,447] Trial 8 finished with value: 40.54717251708918 and parameters: {'lr': 3.8566474658189045e-05, 'optimizer': 'Adam', 'fc1_units': 64, 'fc2_units': 32, 'batch_size': 256}. Best is trial 5 with value: 43.147596360681334.


Epoch 10/10, Test Accuracy: 40.55%
Finished trial #5 with value: 43.15%
Best parameters: {'lr': 0.000969062169314886, 'optimizer': 'Adam', 'fc1_units': 256, 'fc2_units': 64, 'batch_size': 64}


[I 2024-03-08 17:02:52,543] Trial 9 finished with value: 42.891059449640686 and parameters: {'lr': 0.00561722291258225, 'optimizer': 'Adam', 'fc1_units': 256, 'fc2_units': 64, 'batch_size': 256}. Best is trial 5 with value: 43.147596360681334.


Epoch 10/10, Test Accuracy: 42.89%
Finished trial #5 with value: 43.15%
Best parameters: {'lr': 0.000969062169314886, 'optimizer': 'Adam', 'fc1_units': 256, 'fc2_units': 64, 'batch_size': 64}
Number of finished trials: 10
Best trial: {'lr': 0.000969062169314886, 'optimizer': 'Adam', 'fc1_units': 256, 'fc2_units': 64, 'batch_size': 64}


After hyperparameter tuning, our model performance has improved slightly and stands at 43.15%. The best parameters came from trial #5 (the sixth trial, since numbering starts at 0). The best learning rate for our model is about 0.00097. The optimal number of neurons is 256 in the first hidden layer and 64 in the second. Lastly, the optimal batch size after hyperparameter tuning is 64.

Feel free to go ahead and build the model with the best parameters.
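As a starting point, here is a minimal sketch of how the tuned configuration could be instantiated. The `best_params` values are copied from the study output above. `TunedHospitalStayNet` mirrors the two-hidden-layer architecture used for tuning, and `placeholder_num_features` is a made-up stand-in for the real feature count (`X_train.shape[1]` in this notebook), so that the snippet runs on its own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import Adam

# Best parameters reported by the study above.
best_params = {"lr": 0.000969062169314886, "fc1_units": 256, "fc2_units": 64, "batch_size": 64}


class TunedHospitalStayNet(nn.Module):
    """Two hidden layers with ReLU; the last layer returns raw scores (logits)."""

    def __init__(self, num_features, num_classes, fc1_units, fc2_units):
        super().__init__()
        self.fc1 = nn.Linear(num_features, fc1_units)
        self.fc2 = nn.Linear(fc1_units, fc2_units)
        self.fc3 = nn.Linear(fc2_units, num_classes)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)


placeholder_num_features = 20  # assumption for this sketch; use X_train.shape[1] with the real data
num_classes = 11

tuned_model = TunedHospitalStayNet(
    placeholder_num_features, num_classes, best_params["fc1_units"], best_params["fc2_units"]
)
tuned_optimizer = Adam(tuned_model.parameters(), lr=best_params["lr"])

# Sanity check on a random batch of the tuned batch size.
dummy_batch = torch.randn(best_params["batch_size"], placeholder_num_features)
dummy_logits = tuned_model(dummy_batch)
print(dummy_logits.shape)  # torch.Size([64, 11])
```

From here, the model can be trained with the same loop as before, using DataLoaders built with the tuned batch size.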
