## Milestone 2: Neural Network Baseline and Hyperparameter Optimization

LIS 640 - Introduction to Applied Deep Learning

Due 3/7/25

## **Overview**
In Milestone 1 you have:
1. **Defined a deep learning problem** where AI can make a meaningful impact.
2. **Identified three datasets** that fit your topic and justified their relevance.
3. **Explored and visualized** the datasets to understand their structure.
4. **Implemented a PyTorch Dataset class** to prepare data for deep learning.

In Milestone 2 we will take the next step and implement a neural network baseline based on what we have learned in class! For this milestone, please use one of the datasets you picked in the last milestone. If you pick a new one, make sure to do Steps 2 - 4 again. 


## **Step 1: Define Your Deep Learning Problem**


The first step is to be clear about what you want your model to predict. Is your goal a classification or a regression task? What are the input features and what are your prediction targets y? Make sure that you have a sensible choice of features and a sensible choice of prediction targets y in your DataLoader.

**Write down one paragraph of justification for how you set up your DataLoader below. If it makes sense to change the DataLoader from Milestone 1, describe what you changed and why:**


My problem remains the same: distinguish English text produced by native Japanese speakers from English text produced by native English speakers, using BERT embeddings and an MLP.

However, the original data was insufficient in quantity: each text had 200-400 words, and there were only 60 texts from native English speakers and 60 from native Japanese speakers. Therefore, to make the dataset large enough for training, I took two steps. First, I split each text equally into 10 parts and stored the parts in a dataframe, labeling English natives as 0 and Japanese natives as 1. Second, I fine-tuned GPT-2 to generate new text. The generated dataset contains 4000 samples in total: 2000 for English natives (labeled 0) and 2000 for Japanese natives (labeled 1). Finally, I combined the split original dataset with the generated dataset to create a dataset of 5200 data points: 2600 for English natives (labeled 0) and 2600 for Japanese natives (labeled 1).
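
The first step, splitting each text into 10 roughly equal parts, can be sketched as follows. This is a minimal illustration that assumes splitting on whitespace; the helper name `split_into_parts` is hypothetical and is not taken from the original preprocessing code.

```python
def split_into_parts(text, n_parts=10):
    """Split a text into n_parts word chunks of (almost) equal size."""
    words = text.split()
    base, extra = divmod(len(words), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        # the first `extra` chunks receive one additional word
        end = start + base + (1 if i < extra else 0)
        if end > start:  # skip empty chunks for very short texts
            parts.append(" ".join(words[start:end]))
        start = end
    return parts


# hypothetical 25-word text standing in for one essay
sample = " ".join(f"w{i}" for i in range(25))
chunks = split_into_parts(sample)
print(len(chunks), [len(c.split()) for c in chunks])
```

Each chunk would then become one row of the dataframe, carrying the label of the text it came from.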

The dataframes (original, generated, and combined) are all on GitHub: https://github.com/kuangzil/Data-for-Native-language-Identification/tree/Milestone2

The fine tuning code is shown as follows:

In [59]:
from transformers import GPT2LMHeadModel, GPT2Tokenizer, Trainer, TrainingArguments
import torch

import pandas as pd
from sklearn.model_selection import train_test_split




# We already have the combined dataset with the content and labels 0=English, 1=Japanese
df = pd.read_csv(r"D:\Applied Deep Learning\data\combined_DF_new\combined.csv")

# add a prefix to the content based on the label (0=English, 1=Japanese)
df['conditioned_text'] = df.apply(
    lambda x: "[JPN] " + x['text'] if x['label'] == 1 else "[ENG] " + x['text'],
    axis=1#axis=1 means apply the function to each row
)

# split the dataset
train_df, _ = train_test_split(df, test_size=0.1)#test size is 10% of the dataset


# loading the pre-trained GPT-2 model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2', pad_token='<|endoftext|>')# pad token is used to pad the input sequences to the same length
model = GPT2LMHeadModel.from_pretrained('gpt2')# GPT2LMHeadModel is the GPT-2 model with a language modeling head

# using the conditioned text as input
train_texts = train_df['conditioned_text'].tolist()#converts the column to a list
train_encodings = tokenizer(train_texts, truncation=True, padding=True, max_length=512, return_tensors="pt")
# truncation=True: truncate the input sequences to the maximum length the model can accept
# padding=True: pad the input sequences to the same length
# max_length=512: maximum length of the input sequences


# self-defined dataset class to use the encodings as input
class ConditionalDataset(torch.utils.data.Dataset):
    def __init__(self, encodings):
        self.encodings = encodings
        
    def __len__(self):
        return self.encodings.input_ids.shape[0]
    
    def __getitem__(self, idx):
        return {
            'input_ids': self.encodings.input_ids[idx],
            
            'labels': self.encodings.input_ids[idx]  # LLM will still use the input_ids as labels
        }

train_dataset = ConditionalDataset(train_encodings)

# training arguments
training_args = TrainingArguments(
    output_dir="./gpt2-conditional",
    num_train_epochs=3,#number of epochs to train the model
    per_device_train_batch_size=4,#batch size for training
    logging_steps=100,#number of steps to print the logs
    save_steps=500,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
)

# fine-tuning
trainer.train()
model.save_pretrained("./gpt2-conditional")#save the fine-tuned model

def generate_samples(model,tokenizer, prefix, num_samples=2000, batch_size=10):#num_samples is the number of samples to generate, batch_size is the number of samples to generate in each batch
    generated = []
    model.eval()
    # move the model to GPU
    model=model.to('cuda')
    while len(generated) < num_samples:
        inputs = tokenizer.encode(prefix, return_tensors='pt').to('cuda')
        
        outputs = model.generate(
            inputs,
            max_length=100,#maximum length of the generated sequences
            num_return_sequences=batch_size,
            temperature=0.7,#temperature is used to control the randomness of the generated samples
            top_p=0.9,#top_p is used to control the diversity of the generated samples
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )
        
        for output in outputs:
            text = tokenizer.decode(output, skip_special_tokens=True)
            clean_text = text.replace(prefix, "").strip()
            if len(clean_text) > 10:  # filtering out the ones that are too short
                generated.append(clean_text)
                
            if len(generated) >= num_samples:
                break
                
    return generated[:num_samples]

# Generate Japanese samples
japanese_samples = generate_samples(model,tokenizer, "[JPN]", num_samples=2000)

# Generate English samples
english_samples = generate_samples(model,tokenizer, "[ENG]", num_samples=2000)

# setting up DataFrame and save
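result_df = pd.DataFrame({
    'text': japanese_samples + english_samples,
    # labels must follow the sample order: Japanese (1) first, then English (0)
    'label': [1] * len(japanese_samples) + [0] * len(english_samples)
})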

result_df.to_csv(r"D:\Applied Deep Learning\data\combined_DF_new\generated_samples.csv", index=False)


original = pd.read_csv(r"D:\Applied Deep Learning\data\combined_DF_new\combined.csv")
combined_df = pd.concat([original, result_df], ignore_index=True)
combined_df.to_csv(r"D:\Applied Deep Learning\data\combined_DF_new\combined_new_df.csv", index=False)

 13%|█▎        | 100/759 [00:31<03:26,  3.19it/s]

{'loss': 3.0713, 'grad_norm': 6.13089656829834, 'learning_rate': 1.7364953886693017e-05, 'epoch': 0.4}


 26%|██▋       | 200/759 [01:05<03:10,  2.93it/s]

{'loss': 2.7528, 'grad_norm': 6.864502906799316, 'learning_rate': 1.4729907773386036e-05, 'epoch': 0.79}


 40%|███▉      | 300/759 [01:40<02:42,  2.83it/s]

{'loss': 2.6765, 'grad_norm': 7.461439609527588, 'learning_rate': 1.2094861660079052e-05, 'epoch': 1.19}


 53%|█████▎    | 400/759 [02:18<02:10,  2.75it/s]

{'loss': 2.5776, 'grad_norm': 6.685812473297119, 'learning_rate': 9.45981554677207e-06, 'epoch': 1.58}


 66%|██████▌   | 500/759 [02:55<01:40,  2.57it/s]

{'loss': 2.5771, 'grad_norm': 5.956602096557617, 'learning_rate': 6.824769433465086e-06, 'epoch': 1.98}


 79%|███████▉  | 600/759 [03:35<00:59,  2.66it/s]

{'loss': 2.5333, 'grad_norm': 6.608233451843262, 'learning_rate': 4.1897233201581036e-06, 'epoch': 2.37}


 92%|█████████▏| 700/759 [04:19<00:33,  1.74it/s]

{'loss': 2.544, 'grad_norm': 6.947583198547363, 'learning_rate': 1.5546772068511201e-06, 'epoch': 2.77}


100%|██████████| 759/759 [04:56<00:00,  2.56it/s]


{'train_runtime': 296.2855, 'train_samples_per_second': 10.237, 'train_steps_per_second': 2.562, 'train_loss': 2.6610089931563428, 'epoch': 3.0}


Because the data quantity was insufficient, I fine-tuned GPT-2 for text generation. First, I set aside 10% of the original dataset as a validation set and used the remaining 90% as the training set. I intended to tell GPT-2 which parts of the input sequences to attend to and which to ignore; note that the dataset class above passes only input_ids (reused as the labels) and not the tokenizer's attention mask. Then I trained the model and used it for text generation. The original dataset contains 1230 samples: 615 from Japanese natives and 615 from English natives. Using the fine-tuned model, I generated 4000 more, 2000 for each class. Finally, I combined the generated dataset and the original dataset with their labels: 0 for English natives and 1 for Japanese natives.
The following cell shows how I combined the datasets.

In [60]:
original = pd.read_csv(r"D:\Applied Deep Learning\data\combined_DF_new\combined.csv")
combined_df = pd.concat([original, result_df], ignore_index=True)
combined_df.to_csv(r"D:\Applied Deep Learning\data\combined_DF_new\combined_new_df.csv", index=False)
print(combined_df.head())

                                                text  label
0  file00201.txt 2001-01-16 male 20 Japan 1m stud...      1
1  file00202.txt 2001-01-17 female 19 Japan 1m st...      1
2  file00203.txt 2001-02-02 male 27 Japan 3 neigh...      1
3  file00204.txt 2001-02-02 female 31 Japan 1_12m...      1
4  file00205.txt 2001-02-02 female Japan 5 restau...      1
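
One refinement to the fine-tuning dataset class shown earlier: because the labels are a copy of input_ids, padded positions also contribute to the language-modeling loss unless they are masked. A common approach is to set the label to -100 wherever the attention mask is 0, since -100 is the default `ignore_index` of PyTorch's `CrossEntropyLoss`. The sketch below illustrates this on plain Python lists; the function name and the toy token ids are assumptions for illustration and are not part of the original code.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_padding_labels(input_ids, attention_mask, ignore_index=IGNORE_INDEX):
    """Copy input_ids into labels, replacing padded positions with ignore_index."""
    return [
        [tok if keep == 1 else ignore_index for tok, keep in zip(ids, mask)]
        for ids, mask in zip(input_ids, attention_mask)
    ]


# toy batch: 50256 is GPT-2's <|endoftext|> id, used here as the pad token;
# the other ids are arbitrary placeholders
input_ids = [[58, 41, 13137, 60, 50256, 50256], [58, 26808, 60, 314, 588, 340]]
attention_mask = [[1, 1, 1, 1, 0, 0], [1, 1, 1, 1, 1, 1]]
labels = mask_padding_labels(input_ids, attention_mask)
print(labels[0])
```

With tensors, the same effect can be obtained with `labels = input_ids.clone()` followed by `labels[attention_mask == 0] = -100`, and the attention mask would be returned from `__getitem__` alongside input_ids.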


## **Step 2: Train a Neural Network in PyTorch**

We learned in class how to implement and train a feed forward neural network in pytorch. You can find reference implementations [here](https://github.com/mariru/Intro2ADL/blob/main/Week5/Week5_Lab_Example.ipynb) and [here](https://www.kaggle.com/code/girlboss/mmlm2025-pytorch-lb-0-00000). Tip: Try to implement the neural network by yourself from scratch before looking at the reference.


In [61]:
# imports
import os
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import umap.umap_ as umap
from torch.utils.data import Dataset
from transformers import BertTokenizer, DistilBertModel

# define dataloaders: make sure to have a train, validation and a test loader
print(torch.cuda.is_available())
print(torch.cuda.device_count())
print(torch.cuda.get_device_name())

# load the BERT tokenizer and the pre-trained DistilBERT encoder, and move the encoder to the GPU
device = torch.device("cuda")
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = DistilBertModel.from_pretrained('distilbert-base-uncased').to(device)
# "model" is the DistilBERT encoder used only to produce embeddings;
# do not pass data to it directly in the training loop (the classifier is a separate network)

class CustomDataset(Dataset):
    def __init__(self, dataframe):
        self.dataset = pd.read_csv(dataframe)
        self.texts = self.dataset[['text']].astype(str)
        # Convert labels to long
        self.labels = self.dataset['label']#map({"Native English": 0, "Native Japanese": 1}).astype(int) this is no longer needed because I already labeled the df

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idex):
        text = self.texts.iloc[idex].values[0]
        label = torch.tensor(self.labels.iloc[idex], dtype=torch.long).to(device)  # Make sure it's long
        tokenized_text = tokenizer(text, truncation=True, padding=True, max_length=512, return_tensors="pt")
        tokenized_text = {key: val.to(device) for key, val in tokenized_text.items() if key != 'token_type_ids'}
        
        with torch.no_grad():
            output = model(**tokenized_text)
        
        embedding = output.last_hidden_state[:, 0, :].squeeze(0)
        embedding = embedding.float()  # Ensure embedding is float
        
        return embedding, label
    

# Example usage
dataset = CustomDataset(r"D:\Applied Deep Learning\data\combined_DF_new\combined_new_df.csv")
embedding, label = dataset[0]
print(embedding, label)
# Create the DataLoaders
# NOTE: both loaders wrap the same full dataset, so test_loader is not a held-out set
train_loader = torch.utils.data.DataLoader(dataset, batch_size=4, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset, batch_size=4, shuffle=False)
# a batch size that is too large causes a memory error on the GPU,
# so the batch size is kept at 4 (8 also works)



True
1
NVIDIA GeForce RTX 4070 Laptop GPU
tensor([-1.8382e-01, -1.9284e-02,  2.4736e-01, -1.4338e-01, -1.3177e-01,
        -5.4348e-02,  1.5631e-01,  3.3115e-01, -2.8438e-01, -3.9994e-01,
         4.8445e-02, -6.1485e-02,  1.1324e-01,  1.7800e-01, -5.6604e-02,
         1.5154e-01,  2.2107e-01,  3.2030e-01,  9.4237e-02,  1.3604e-01,
         1.5238e-01, -2.5460e-01,  4.4725e-01,  1.7563e-02, -5.0431e-02,
        -8.8167e-02, -7.7577e-02,  1.8742e-01,  3.1459e-02, -8.7420e-02,
        -1.3092e-01,  2.4539e-01, -4.1070e-02, -1.9942e-01,  7.3461e-02,
        -2.7940e-01,  6.7753e-02, -1.4762e-01,  1.1093e-01,  1.4603e-01,
        -8.7253e-02, -3.2385e-02, -2.7490e-01, -3.2436e-01, -1.2498e-02,
        -2.6295e-01, -3.2339e+00, -2.8927e-02,  1.5797e-02, -2.5329e-01,
         4.7867e-01, -1.0998e-01, -9.2680e-02,  2.0057e-01,  2.7781e-01,
         3.7869e-01, -2.3559e-01,  3.4312e-01, -1.5666e-01, -9.3389e-02,
         6.3809e-01, -7.8705e-02, -4.3797e-01,  9.1733e-02, -3.3742e-03,
        -

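In this cell, I used the bert-base-uncased tokenizer and the distilbert-base-uncased model to tokenize each text and convert it into an embedding. I created a Dataset class that loads the dataframe and, for each item, tokenizes the text and returns the embedding of the first ([CLS]) token together with its label. Then I wrapped the dataset in train_loader and test_loader. Note that both loaders currently wrap the same full dataset, so the test metrics reported below are measured on data the model also trains on rather than on a held-out split.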
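
To obtain separate train, validation, and test sets, the dataset indices can be shuffled once and partitioned. The sketch below is a minimal illustration using only the standard library; the 80/10/10 proportions, the seed, and the function name are assumptions, not choices made in the original notebook.

```python
import random


def split_indices(n_items, val_frac=0.1, test_frac=0.1, seed=42):
    """Return disjoint (train, val, test) index lists covering range(n_items)."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    n_val = int(n_items * val_frac)
    n_test = int(n_items * test_frac)
    val_idx = indices[:n_val]
    test_idx = indices[n_val:n_val + n_test]
    train_idx = indices[n_val + n_test:]
    return train_idx, val_idx, test_idx


# 5124 is the dataset size reported in the training logs
train_idx, val_idx, test_idx = split_indices(5124)
print(len(train_idx), len(val_idx), len(test_idx))
```

Each index list can then be wrapped with `torch.utils.data.Subset(dataset, idx)` and passed to its own DataLoader, so that early stopping uses the validation loader and the test loader is used only once at the end.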

In [84]:
# UMAP demonstration: reduce the embeddings to 30 dimensions
import matplotlib.pyplot as plt
import umap.umap_ as umap  # make sure I am using umap-learn
import numpy as np
# Extract embeddings and labels
def get_embeddings_and_labels(loader):
    embeddings = []
    labels = []
    for batch_embeddings, batch_labels in loader:
        embeddings.append(batch_embeddings.cpu().numpy())  # transfer the embeddings to the CPU and to numpy
        labels.append(batch_labels.cpu().numpy())
    embeddings = np.vstack(embeddings)
    labels = np.hstack(labels)
    return embeddings, labels

# Extract embeddings and labels from the train and test loaders
train_embeddings, train_labels = get_embeddings_and_labels(train_loader)
test_embeddings, test_labels = get_embeddings_and_labels(test_loader)

# reduce the embeddings to 30 dimensions
umap_model = umap.UMAP(n_components=30, random_state=42)

train_embeddings_3d = umap_model.fit_transform(train_embeddings)
test_embeddings_3d = umap_model.transform(test_embeddings)  # use the same model to transform the test embeddings




## Dimension Reduction!
Because the 768-dimensional embeddings greatly increase computation time, I reduced the embeddings to 30 dimensions for testing, which keeps somewhat more information than a very low-dimensional projection while making computation much faster. I then stored the new training and test sets in new DataLoaders. I first reduced the embeddings to 3 dimensions, but soon found that too little information was retained for training on that dataset to be meaningful.
The Dataset class for the 30-dimensional embeddings is in the following cell.

In [85]:
# defining a new class of data set that stores the embeddings and labels 
from torch.utils.data import DataLoader
class EmbeddingDataset(torch.utils.data.Dataset):
    def __init__(self, embeddings, labels):
        self.embeddings = embeddings
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return torch.tensor(self.embeddings[idx], dtype=torch.float32), torch.tensor(self.labels[idx])

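# setting up the data loaders for training and testing
# (the "3d" and "20d" in the variable names are left over from earlier attempts; the embeddings are 30-dimensional)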
train_embedding_dataset = EmbeddingDataset(train_embeddings_3d, train_labels)
test_embedding_dataset = EmbeddingDataset(test_embeddings_3d, test_labels)

train_loader_20d = DataLoader(train_embedding_dataset, batch_size=32, shuffle=True)
test_loader_20d = DataLoader(test_embedding_dataset, batch_size=32, shuffle=False)

In [94]:

# define the model
class MyNeuralNetwork(nn.Module):
    def __init__(self,hidden_size=185
                 ,dropout_rate=0.3267140666269757
                 ,learning_rate=0.00025937129423859296
                 ):
        super(MyNeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(768
                      , hidden_size),# input layer: 768 features, the size of the BERT embedding (use 30 for the 30-d embeddings)
            nn.BatchNorm1d(hidden_size),# adding batch normalization
            nn.Dropout(dropout_rate),# dropout with the configured dropout_rate
            nn.ReLU(),
            nn.Linear(hidden_size, 2)
        )
        self.learning_rate=learning_rate

# the input layer takes 768 features, which is the size of the BERT embedding
# the hidden layer has hidden_size neurons (185 by default)
# the output layer has 2 neurons, one per class
# the hidden layer uses the ReLU activation because a non-linear activation is needed there;
# the output layer returns raw logits, and CrossEntropyLoss applies softmax internally
# for this two-class classification problem
        
    def forward(self, x):
        x = self.flatten(x)  # Flatten input if necessary
        x = self.linear_relu_stack(x)
        return x  # Output logits (CrossEntropyLoss will apply softmax internally)
    
    def fit(self, train_loader,val_loader, epochs,learning_rate=0.01,patience=5):
        self.to(device)
        self.train()
        criterion = nn.CrossEntropyLoss()
        optimizer = optim.Adam(self.parameters(), lr=learning_rate)# here I used the Adam optimizer because it is better for NLP
        best_val_loss= float("inf")#initialize the best validation loss to infinity
        patience_counter=0#initialize the patience counter to 0
        for epoch in range(epochs):
            self.train()  # evaluate() switches to eval mode, so re-enable training mode every epoch
            for i, (data, labels) in enumerate(train_loader):
                data, labels = data.to(device), labels.to(device)
                optimizer.zero_grad()
                outputs = self(data)
                loss = criterion(outputs, labels)#calculate the loss
                loss.backward()#backpropagation
                optimizer.step()
                if i % 100 == 0:
                    print(f"Epoch: {epoch}, Loss: {loss.item()}")
            val_loss=self.evaluate(val_loader, criterion)#evaluate the model on the validation set
            if val_loss<best_val_loss:
                best_val_loss=val_loss#update the best validation loss

                patience_counter=0
            else:
                patience_counter+=1
            if patience_counter>patience:
                print("Early stopping")
                break
        return best_val_loss#return the best validation loss
    # in this function the input may already be a tensor, so as_tensor avoids an unnecessary copy
    def predict(self, x):
        self.eval()  # disable dropout and use the BatchNorm running statistics
        with torch.no_grad():
            x = torch.as_tensor(x, dtype=torch.float32).to(device)
            outputs = self(x)
            y_pred = torch.argmax(outputs, dim=1)
        return np.where(y_pred.cpu().numpy() == 1, "Native Japanese", "Native English")
      
      
    def evaluate(self, val_loader,criterion):
        self.eval()
        total_loss = 0
        with torch.no_grad():
            for data, labels in val_loader:
                data, labels = data.to(device), labels.to(device)
                outputs = self(data)
                loss = criterion(outputs, labels)
                total_loss += loss.item()
        return total_loss / len(val_loader)
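

The forward method above returns raw logits because CrossEntropyLoss combines log-softmax with the negative log-likelihood. The arithmetic for a single example can be checked by hand; the sketch below uses only the standard library, and the logit values are made up for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability of the target class, computed from raw logits."""
    return -math.log(softmax(logits)[target])


logits = [0.3, 1.2]  # hypothetical outputs for (Native English, Native Japanese)
probs = softmax(logits)
print(probs, cross_entropy(logits, target=1))
```

A loss near log(2) ≈ 0.693 therefore corresponds to a two-class model that is guessing, which is a useful reference point when reading the training logs.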
        
    


In [95]:
#training the model 
def train_loop(dataloader, model, loss_fn, optimizer, device):
    """
    Runs one full training epoch on the given model using the provided dataloader.

    Parameters:
        dataloader (torch.utils.data.DataLoader): DataLoader providing batches of training data (inputs and labels).
        model (torch.nn.Module): The PyTorch model to be trained.
        loss_fn (function): The loss function used to compute the error between predictions and true labels.
        optimizer (torch.optim.Optimizer): The optimizer used to update the model parameters.
        device (torch.device): The device to run computations on (e.g., "cuda" or "cpu").

    Returns:
        None
    """
    size = len(dataloader.dataset)
    model.train()  # Set the model to training mode
    total_loss = 0

    for batch, (X, y) in enumerate(dataloader):
        # Move data to the specified device
        X, y = X.to(device), y.to(device,dtype=torch.long)
        y=y.long()
        X=X.float()
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()  # Clear gradients
        loss.backward()        # Compute gradients
        optimizer.step()       # Update weights

        total_loss += loss.item()

        if batch % 100 == 0:
            current = batch * len(X)
            print(f"loss: {loss.item():>7f}  [{current:>5d}/{size:>5d}]")

    avg_loss = total_loss / len(dataloader)
    print(f"Average Training Loss: {avg_loss:.4f}")


def test_loop(dataloader, model, loss_fn, device):
    """
    Evaluates the model's performance on a test dataset.

    Parameters:
        dataloader (torch.utils.data.DataLoader): DataLoader providing batches of test data (inputs and labels).
        model (torch.nn.Module): The PyTorch model to be evaluated.
        loss_fn (function): The loss function used to compute the error between predictions and true labels.
        device (torch.device): The device to run computations on (e.g., "cuda" or "cpu").

    Returns:
        None
    """
    model.eval()  # Set the model to evaluation mode
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    with torch.no_grad():  # Disable gradient calculations
        for X, y in dataloader:
            X, y = X.to(device), y.to(device,dtype=torch.long) # Move data to the specified device
            X=X.float()
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
            

    avg_loss = test_loss / num_batches
    accuracy = correct / size
    print(f"Test Error: \n Accuracy: {(100 * accuracy):>0.1f}%, Avg loss: {avg_loss:>8f} \n")

batch_size = 4
# a small batch size (4) is used to avoid running out of GPU memory
# Instantiate custom model
my_model = MyNeuralNetwork().to(device)

# Define loss and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(my_model.parameters(), lr=0.00019466370761380482)

# Training loop
epochs = 100
for epoch in range(epochs):
    print(f"Epoch {epoch+1}\n-------------------------------")
    train_loop(train_loader, my_model, loss_fn, optimizer, device)
    test_loop(test_loader, my_model, loss_fn, device)


Epoch 1
-------------------------------
loss: 0.964580  [    0/ 5124]
loss: 0.566610  [  400/ 5124]
loss: 0.894764  [  800/ 5124]
loss: 0.741076  [ 1200/ 5124]
loss: 0.510969  [ 1600/ 5124]
loss: 0.402552  [ 2000/ 5124]
loss: 0.684575  [ 2400/ 5124]
loss: 0.250913  [ 2800/ 5124]
loss: 0.698820  [ 3200/ 5124]
loss: 0.480867  [ 3600/ 5124]
loss: 0.509612  [ 4000/ 5124]
loss: 0.483319  [ 4400/ 5124]
loss: 0.742746  [ 4800/ 5124]
Average Training Loss: 0.5555
Test Error: 
 Accuracy: 79.9%, Avg loss: 0.456661 

Epoch 2
-------------------------------
loss: 0.509154  [    0/ 5124]
loss: 0.479380  [  400/ 5124]
loss: 0.626528  [  800/ 5124]
loss: 0.397814  [ 1200/ 5124]
loss: 0.283879  [ 1600/ 5124]
loss: 0.442565  [ 2000/ 5124]
loss: 0.603289  [ 2400/ 5124]
loss: 0.541206  [ 2800/ 5124]
loss: 0.175751  [ 3200/ 5124]
loss: 0.303028  [ 3600/ 5124]
loss: 0.488987  [ 4000/ 5124]
loss: 0.562255  [ 4400/ 5124]
loss: 0.382352  [ 4800/ 5124]
Average Training Loss: 0.5030
Test Error: 
 Accuracy: 80.6

## Initial result report

In this step, I built my MLP and first trained it with an input size of 768 (because I converted the data into BERT embeddings). To handle the 30-dimensional embeddings, I changed the input size of the first Linear layer to 30.
The first 30-d embedding training run used hidden_size=203, dropout_rate=0.1, learning_rate=0.0001.
The results over 100 epochs are shown below:

Epoch 1
-------------------------------
loss: 0.584721  [    0/ 5124]
loss: 0.519337  [ 3200/ 5124]
Average Training Loss: 0.5618
Test Error: 
 Accuracy: 74.0%, Avg loss: 0.511170 

Epoch 2
-------------------------------
loss: 0.497146  [    0/ 5124]
loss: 0.446626  [ 3200/ 5124]
Average Training Loss: 0.5025
Test Error: 
 Accuracy: 74.5%, Avg loss: 0.481287 

Epoch 3
-------------------------------
loss: 0.492960  [    0/ 5124]
loss: 0.443114  [ 3200/ 5124]
Average Training Loss: 0.4762
Test Error: 
 Accuracy: 75.3%, Avg loss: 0.466697 

Epoch 4
...
Average Training Loss: 0.4308
Test Error: 
 Accuracy: 78.5%, Avg loss: 0.425507 


Over 100 epochs, accuracy improved from 74.0% to 78.5%, and the average loss decreased from 0.51 to 0.43.


These results are for the 30-d data and differ from the outcome of training on the 768-d embeddings. Under the same circumstances, the 768-d BERT embeddings gave an accuracy of 79% --> 86.4% and a loss of 0.472 --> 0.373. This suggests that the 768-dimensional embeddings contain noticeably more information; however, that run took 240 minutes to train, while the 30-d run took 27 seconds. Fewer dimensions cost less in computational resources but lose more information.
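
Part of the runtime gap may come from how the embeddings are produced rather than from their dimensionality alone: the CustomDataset above runs the encoder inside `__getitem__`, so every epoch re-encodes every text, whereas the 30-d loaders read precomputed arrays. A possible remedy is to encode each text once and train the MLP on the cached tensors. The sketch below assumes PyTorch is installed; `precompute_embeddings` and `toy_encoder` are hypothetical names, and `toy_encoder` is only a stand-in for the DistilBERT forward pass.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()  # no gradients are needed for a frozen encoder
def precompute_embeddings(encode_fn, texts, labels, batch_size=32):
    """Encode all texts once and return a TensorDataset of (embedding, label)."""
    chunks = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        chunks.append(encode_fn(batch).detach().cpu())
    features = torch.cat(chunks).float()
    targets = torch.as_tensor(labels, dtype=torch.long)
    return TensorDataset(features, targets)


def toy_encoder(batch):
    """Hypothetical stand-in for DistilBERT: maps each text to a 2-d vector."""
    return torch.tensor([[float(len(t)), float(t.count(" "))] for t in batch])


cached = precompute_embeddings(toy_encoder, ["a b", "hello", "x y z"], [1, 0, 1], batch_size=2)
loader = DataLoader(cached, batch_size=2, shuffle=True)
print(len(cached), cached[0])
```

With the real encoder, `encode_fn` would tokenize a batch of texts and return the first-token embeddings; the resulting dataset can be reused across epochs and across hyperparameter trials.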


## **Step 2 continued: Try Stuff**

Use your code above to try different architectures. Make sure to use early stopping! Try adding Dropout and BatchNorm, try different learning rates. How do they affect training and validation performance? 

 **Summarize your observations in a paragraph below:**


When I set the dropout rate to 0.2, training was slightly faster than with a dropout rate of 0.1, but the loss tended to increase in some epochs. With a dropout rate of 0.1 and batch norm set to 100, it took 30 minutes to train 30 epochs; with a dropout rate of 0.2, the training time was 28 minutes. The improvement in accuracy also differed: with a dropout rate of 0.1, accuracy went from 79% --> 86.4%, while with a dropout rate of 0.2 it went from 80% --> 83.3%. The change in loss differed as well: with a dropout rate of 0.1, the loss went from 0.51 to 0.44, while with a dropout rate of 0.2 it went from 0.472 to 0.373. My interpretation is that a lower dropout rate lets the loss fall further because less information is lost. I set up early stopping with a patience of 5, but training still ran through the last epoch.

Therefore, I conclude that a lower dropout rate (0.1, for example) is the better choice for my MLP training, because less information is likely to be dropped and thus more can be learned. On the contrary, a higher dropout rate (0.2) causes more information to be lost and inhibits the learning process.

Then I tried different learning rates. With a learning rate of 0.01, learning was faster than with a learning rate of 0.001. However, under the same conditions (10 epochs, dropout = 0.1, same batch norm), the higher learning rate gave a less accurate result: a learning rate of 0.01 gave 86.4% accuracy, while a learning rate of 0.001 gave 87.5% accuracy.
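
On the early stopping observation above: a patience-based rule stops only after the validation loss has failed to improve for a set number of consecutive epochs, so a run whose validation loss keeps improving slightly will reach the final epoch. The sketch below is a minimal, framework-free illustration; the class name, the `min_delta` parameter, and the loss values are assumptions for illustration.

```python
class EarlyStopper:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta  # minimum decrease that counts as an improvement
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# made-up validation losses: improvement stalls after the third epoch
stopper = EarlyStopper(patience=3)
losses = [0.51, 0.47, 0.45, 0.46, 0.45, 0.47, 0.44]
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(stopped_at, stopper.best_loss)
```

For the rule to be meaningful, the loss passed to `step` should come from a validation set that the model does not train on.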


## **Step 3: Hyperparameter Optimization with Optuna**

As you can see, hyperparameter optimization can be tedious. In class we used [optuna](https://optuna.org/#code_examples) to automate the process. Your next task is to wrap your code from Step 2 into an objective which you can then optimize with optuna. Under the [code examples](https://optuna.org/#code_examples) there is a tab *PyTorch* which should be helpful as it provides a minimal example on how to wrap PyTorch code inside an objective.

**Important: Make sure the model is evaluated on a validation set, not the training data!!**
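
To make the idea of an objective concrete: a hyperparameter search repeatedly samples a configuration, scores it on validation data, and keeps the best one. The sketch below is a plain random search using only the standard library; it is a simplified stand-in for Optuna's sampler, not Optuna's algorithm, and `toy_objective` is a made-up function used in place of real training. The learning rate is sampled on a log scale so that each order of magnitude is equally likely.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Sample so that log(value) is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def random_search(objective, n_trials=50, seed=0):
    """Return (best_params, best_value) after n_trials random configurations."""
    rng = random.Random(seed)
    best_params, best_value = None, float("inf")
    for _ in range(n_trials):
        params = {
            "hidden_size": rng.randint(50, 300),
            "dropout": rng.uniform(0.01, 0.4),
            "lr": sample_log_uniform(rng, 1e-5, 1e-2),
        }
        value = objective(params)  # in practice: train, then return the validation loss
        if value < best_value:
            best_params, best_value = params, value
    return best_params, best_value


def toy_objective(params):
    """Made-up score that is lowest near lr = 1e-3 and dropout = 0.1."""
    return abs(math.log10(params["lr"]) + 3) + abs(params["dropout"] - 0.1)


best_params, best_value = random_search(toy_objective)
print(best_params, best_value)
```

The search ranges mirror the ones used in the Optuna objective in this step; Optuna additionally uses the results of earlier trials to choose later ones.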


In [None]:
import optuna

def objective(trial):
    # Setting up the hyperparameters in the objective function
    hidden_size = trial.suggest_int("hidden_size", 50, 300)  # Hidden size between 50 and 300
    dropout_rate = trial.suggest_float("dropout", 0.01, 0.4)  # Dropout between 0.01 and 0.4
    # Learning rate between 1e-5 and 1e-2, sampled on a log scale
    # (suggest_float with log=True replaces the deprecated suggest_loguniform)
    learning_rate = trial.suggest_float("lr", 1e-5, 1e-2, log=True)

    # Setting up the model with the suggested hyperparameters
    # (the first Linear layer of MyNeuralNetwork must take 30 inputs for the 30-d loaders)
    model = MyNeuralNetwork(hidden_size, dropout_rate, learning_rate).to(device)

    # Training the model; test_loader_20d stands in for a validation loader here
    model.fit(train_loader_20d, val_loader=test_loader_20d, epochs=100, learning_rate=learning_rate)

    # Evaluate the model on test_loader_20d; the criterion created inside fit() is local
    # to that method, so evaluate() is given its own nn.CrossEntropyLoss()
    val_loss = model.evaluate(test_loader_20d, criterion=nn.CrossEntropyLoss())

    return val_loss  # Minimize the validation loss

# Create and run the Optuna optimization study
study = optuna.create_study(direction="minimize")  # Minimize the validation loss
study.optimize(objective, n_trials=50)  # Number of trials for optimization

# Print the best hyperparameters found by Optuna
print(study.best_params)





[I 2025-03-03 15:16:46,019] A new study created in memory with name: no-name-96810c6a-9540-4917-8093-c41bb9691905


Epoch: 0, Loss: 0.6610797643661499
Epoch: 0, Loss: 0.359302818775177
Epoch: 1, Loss: 0.3919389247894287
Epoch: 1, Loss: 0.4813230335712433
Epoch: 2, Loss: 0.3904144763946533
Epoch: 2, Loss: 0.23281002044677734
Epoch: 3, Loss: 0.3842809200286865
Epoch: 3, Loss: 0.4154621958732605
Epoch: 4, Loss: 0.36802324652671814
Epoch: 4, Loss: 0.30847716331481934
Epoch: 5, Loss: 0.589328944683075
Epoch: 5, Loss: 0.27581489086151123
Epoch: 6, Loss: 0.5492057800292969
Epoch: 6, Loss: 0.4563789367675781
Epoch: 7, Loss: 0.4560481905937195
Epoch: 7, Loss: 0.4762313663959503
Epoch: 8, Loss: 0.36396971344947815
Epoch: 8, Loss: 0.37339210510253906
Epoch: 9, Loss: 0.40023931860923767
Epoch: 9, Loss: 0.5566477179527283
Epoch: 10, Loss: 0.465040385723114
Epoch: 10, Loss: 0.3177785277366638
Epoch: 11, Loss: 0.38588738441467285
Epoch: 11, Loss: 0.3154771625995636
Epoch: 12, Loss: 0.43869513273239136
Epoch: 12, Loss: 0.4519065022468567
Epoch: 13, Loss: 0.46726712584495544
Epoch: 13, Loss: 0.5844473838806152
Epoch

[I 2025-03-03 15:17:03,108] Trial 0 finished with value: 0.41486448658809577 and parameters: {'hidden_size': 276, 'dropout': 0.32243513364407383, 'lr': 0.00010612609591523706}. Best is trial 0 with value: 0.41486448658809577.


Early stopping
Epoch: 0, Loss: 0.7873765230178833
Epoch: 0, Loss: 0.3893071711063385
Epoch: 1, Loss: 0.4720616340637207
Epoch: 1, Loss: 0.4370037615299225
Epoch: 2, Loss: 0.4287753105163574
Epoch: 2, Loss: 0.33902209997177124
Epoch: 3, Loss: 0.25430965423583984
Epoch: 3, Loss: 0.3418639004230499
Epoch: 4, Loss: 0.5559511184692383
Epoch: 4, Loss: 0.4378761649131775
Epoch: 5, Loss: 0.45901980996131897
Epoch: 5, Loss: 0.4202910363674164
Epoch: 6, Loss: 0.4501282274723053
Epoch: 6, Loss: 0.3754114806652069
Epoch: 7, Loss: 0.46161001920700073
Epoch: 7, Loss: 0.4323728084564209
Epoch: 8, Loss: 0.46950238943099976
Epoch: 8, Loss: 0.3815102279186249
Epoch: 9, Loss: 0.5094742178916931
Epoch: 9, Loss: 0.37362152338027954
Epoch: 10, Loss: 0.4727054834365845
Epoch: 10, Loss: 0.5619612336158752
Early stopping


[I 2025-03-03 15:17:06,078] Trial 1 finished with value: 0.42835562489926815 and parameters: {'hidden_size': 79, 'dropout': 0.3451948693871122, 'lr': 0.008500581192678849}. Best is trial 0 with value: 0.41486448658809577.


Epoch: 0, Loss: 0.6433467864990234
Epoch: 0, Loss: 0.5729934573173523
Epoch: 1, Loss: 0.5291832089424133
Epoch: 1, Loss: 0.47834378480911255
Epoch: 2, Loss: 0.3652757704257965
Epoch: 2, Loss: 0.6150302290916443
Epoch: 3, Loss: 0.4373672306537628
Epoch: 3, Loss: 0.4923796057701111
Epoch: 4, Loss: 0.2466956228017807
Epoch: 4, Loss: 0.34649384021759033
Epoch: 5, Loss: 0.2832242548465729
Epoch: 5, Loss: 0.3837044835090637
Epoch: 6, Loss: 0.40010660886764526
Epoch: 6, Loss: 0.5498492121696472
Epoch: 7, Loss: 0.469841867685318
Epoch: 7, Loss: 0.37640300393104553
Epoch: 8, Loss: 0.4417385458946228
Epoch: 8, Loss: 0.3398676812648773
Epoch: 9, Loss: 0.5727018117904663
Epoch: 9, Loss: 0.47158825397491455
Epoch: 10, Loss: 0.4600713551044464
Epoch: 10, Loss: 0.5080996751785278
Epoch: 11, Loss: 0.4219558537006378
Epoch: 11, Loss: 0.4118492007255554
Epoch: 12, Loss: 0.3231121897697449
Epoch: 12, Loss: 0.4543934166431427
Epoch: 13, Loss: 0.38539689779281616
Epoch: 13, Loss: 0.26545077562332153

[I 2025-03-03 15:17:20,141] Trial 2 finished with value: 0.42247116812128827 and parameters: {'hidden_size': 216, 'dropout': 0.3031781303938849, 'lr': 2.8907368681019724e-05}. Best is trial 0 with value: 0.41486448658809577.


Early stopping
Epoch: 0, Loss: 0.6483162045478821
Epoch: 0, Loss: 0.5228817462921143
Epoch: 1, Loss: 0.5839260816574097
Epoch: 1, Loss: 0.4297340214252472
Epoch: 2, Loss: 0.41195178031921387
Epoch: 2, Loss: 0.3473753035068512
Epoch: 3, Loss: 0.4846174120903015
Epoch: 3, Loss: 0.4251614511013031
Epoch: 4, Loss: 0.36572766304016113
Epoch: 4, Loss: 0.4534202218055725
Epoch: 5, Loss: 0.29655036330223083
Epoch: 5, Loss: 0.4139351546764374
Epoch: 6, Loss: 0.4279824197292328
Epoch: 6, Loss: 0.4848511815071106
Epoch: 7, Loss: 0.4137151539325714
Epoch: 7, Loss: 0.4215328097343445
Epoch: 8, Loss: 0.5295987129211426
Epoch: 8, Loss: 0.3201828598976135
Epoch: 9, Loss: 0.4394865036010742
Epoch: 9, Loss: 0.5376125574111938
Epoch: 10, Loss: 0.41306471824645996
Epoch: 10, Loss: 0.4973316192626953
Epoch: 11, Loss: 0.38854366540908813
Epoch: 11, Loss: 0.3204967677593231
Epoch: 12, Loss: 0.4624144434928894
Epoch: 12, Loss: 0.38626012206077576
Epoch: 13, Loss: 0.416127473115921
Epoch: 13, Loss: 0.402282714

[I 2025-03-03 15:17:24,454] Trial 3 finished with value: 0.42912354520435286 and parameters: {'hidden_size': 244, 'dropout': 0.02722354189216162, 'lr': 0.0032767732254658736}. Best is trial 0 with value: 0.41486448658809577.


Epoch: 0, Loss: 0.7046705484390259
Epoch: 0, Loss: 0.38033294677734375
Epoch: 1, Loss: 0.3254362642765045
Epoch: 1, Loss: 0.5360750555992126
Epoch: 2, Loss: 0.36039987206459045
Epoch: 2, Loss: 0.4685310125350952
Epoch: 3, Loss: 0.327156662940979
Epoch: 3, Loss: 0.42355847358703613
Epoch: 4, Loss: 0.5116345286369324
Epoch: 4, Loss: 0.3695792257785797
Epoch: 5, Loss: 0.4358178377151489
Epoch: 5, Loss: 0.5143343210220337
Epoch: 6, Loss: 0.4505024254322052
Epoch: 6, Loss: 0.5024691820144653
Epoch: 7, Loss: 0.3116553723812103
Epoch: 7, Loss: 0.4786997437477112
Epoch: 8, Loss: 0.4937686026096344
Epoch: 8, Loss: 0.4049917757511139
Epoch: 9, Loss: 0.29544341564178467
Epoch: 9, Loss: 0.5307707786560059
Epoch: 10, Loss: 0.2894406020641327
Epoch: 10, Loss: 0.43763595819473267
Epoch: 11, Loss: 0.3721563518047333
Epoch: 11, Loss: 0.4146885275840759
Epoch: 12, Loss: 0.37271687388420105
Epoch: 12, Loss: 0.482521653175354
Epoch: 13, Loss: 0.37661999464035034
Epoch: 13, Loss: 0.34285619854927063

[I 2025-03-03 15:17:33,477] Trial 4 finished with value: 0.4185571836261368 and parameters: {'hidden_size': 280, 'dropout': 0.23753126236814082, 'lr': 0.0002065541188907276}. Best is trial 0 with value: 0.41486448658809577.


Early stopping
Epoch: 0, Loss: 0.9639874696731567
Epoch: 0, Loss: 0.6848726272583008
Epoch: 1, Loss: 0.4060063660144806
Epoch: 1, Loss: 0.4940294027328491
Epoch: 2, Loss: 0.3850947916507721
Epoch: 2, Loss: 0.47048091888427734
Epoch: 3, Loss: 0.3995610177516937
Epoch: 3, Loss: 0.5083147287368774
Epoch: 4, Loss: 0.3476550281047821
Epoch: 4, Loss: 0.4739144444465637
Epoch: 5, Loss: 0.4010412395000458
Epoch: 5, Loss: 0.35744374990463257
Epoch: 6, Loss: 0.33869868516921997
Epoch: 6, Loss: 0.3735336661338806
Epoch: 7, Loss: 0.3155152499675751
Epoch: 7, Loss: 0.5070491433143616
Epoch: 8, Loss: 0.5141992568969727
Epoch: 8, Loss: 0.3800053894519806
Epoch: 9, Loss: 0.41521382331848145
Epoch: 9, Loss: 0.5556289553642273
Epoch: 10, Loss: 0.5769093632698059
Epoch: 10, Loss: 0.581944465637207
Epoch: 11, Loss: 0.43508225679397583
Epoch: 11, Loss: 0.4869530498981476
Epoch: 12, Loss: 0.5863324999809265
Epoch: 12, Loss: 0.2743832468986511
Epoch: 13, Loss: 0.49231693148612976
Epoch: 13, Loss: 0.433917582

[I 2025-03-03 15:17:40,192] Trial 5 finished with value: 0.4226763029322491 and parameters: {'hidden_size': 273, 'dropout': 0.3894075709482305, 'lr': 0.00035976090834079764}. Best is trial 0 with value: 0.41486448658809577.


Early stopping
Epoch: 0, Loss: 0.8899600505828857
Epoch: 0, Loss: 0.48138144612312317
Epoch: 1, Loss: 0.4835922420024872
Epoch: 1, Loss: 0.43936553597450256
Epoch: 2, Loss: 0.6205148100852966
Epoch: 2, Loss: 0.39827850461006165
Epoch: 3, Loss: 0.3913358449935913
Epoch: 3, Loss: 0.4341740310192108
Epoch: 4, Loss: 0.25634512305259705
Epoch: 4, Loss: 0.4734777510166168
Epoch: 5, Loss: 0.47679388523101807
Epoch: 5, Loss: 0.6991392970085144
Epoch: 6, Loss: 0.43504783511161804
Epoch: 6, Loss: 0.3362903892993927
Early stopping


[I 2025-03-03 15:17:42,143] Trial 6 finished with value: 0.4288132814285548 and parameters: {'hidden_size': 232, 'dropout': 0.08963430979297632, 'lr': 0.0020660001496150397}. Best is trial 0 with value: 0.41486448658809577.


Epoch: 0, Loss: 0.7919474840164185
Epoch: 0, Loss: 0.45493096113204956
Epoch: 1, Loss: 0.4282335937023163
Epoch: 1, Loss: 0.5750783681869507
Epoch: 2, Loss: 0.4590735137462616
Epoch: 2, Loss: 0.4501896798610687
Epoch: 3, Loss: 0.39051002264022827
Epoch: 3, Loss: 0.4097910225391388
Epoch: 4, Loss: 0.3000686764717102
Epoch: 4, Loss: 0.37978073954582214
Epoch: 5, Loss: 0.41562727093696594
Epoch: 5, Loss: 0.3415859341621399
Epoch: 6, Loss: 0.41267332434654236
Epoch: 6, Loss: 0.35324394702911377


[I 2025-03-03 15:17:44,096] Trial 7 finished with value: 0.43175713639266744 and parameters: {'hidden_size': 127, 'dropout': 0.07619745997492564, 'lr': 0.0024839339929338114}. Best is trial 0 with value: 0.41486448658809577.


Early stopping
Epoch: 0, Loss: 0.7771283984184265
Epoch: 0, Loss: 0.7320212721824646
Epoch: 1, Loss: 0.6200347542762756
Epoch: 1, Loss: 0.5355891585350037
Epoch: 2, Loss: 0.5786080956459045
Epoch: 2, Loss: 0.5199492573738098
Epoch: 3, Loss: 0.35497209429740906
Epoch: 3, Loss: 0.32549887895584106
Epoch: 4, Loss: 0.2767884433269501
Epoch: 4, Loss: 0.39492568373680115
Epoch: 5, Loss: 0.5188320875167847
Epoch: 5, Loss: 0.4147818684577942
Epoch: 6, Loss: 0.45281505584716797
Epoch: 6, Loss: 0.4035623371601105
Epoch: 7, Loss: 0.36716145277023315
Epoch: 7, Loss: 0.5523009896278381
Epoch: 8, Loss: 0.3125231862068176
Epoch: 8, Loss: 0.37042516469955444
Epoch: 9, Loss: 0.47180959582328796
Epoch: 9, Loss: 0.32543566823005676
Epoch: 10, Loss: 0.39589521288871765
Epoch: 10, Loss: 0.45778602361679077
Epoch: 11, Loss: 0.4043230414390564
Epoch: 11, Loss: 0.5270802974700928
Epoch: 12, Loss: 0.44816604256629944
Epoch: 12, Loss: 0.43912002444267273
Epoch: 13, Loss: 0.5107749104499817
Epoch: 13, Loss: 0.29

[I 2025-03-03 15:18:10,730] Trial 8 finished with value: 0.4225041931668609 and parameters: {'hidden_size': 225, 'dropout': 0.21401860080089574, 'lr': 1.0760417239776477e-05}. Best is trial 0 with value: 0.41486448658809577.


Epoch: 0, Loss: 0.5365692973136902
Epoch: 0, Loss: 0.4873289167881012
Epoch: 1, Loss: 0.40874215960502625
Epoch: 1, Loss: 0.3081992566585541
Epoch: 2, Loss: 0.4716368019580841
Epoch: 2, Loss: 0.20308806002140045
Epoch: 3, Loss: 0.42617639899253845
Epoch: 3, Loss: 0.38605886697769165
Epoch: 4, Loss: 0.3586467504501343
Epoch: 4, Loss: 0.39913827180862427
Epoch: 5, Loss: 0.39245977997779846
Epoch: 5, Loss: 0.5048075914382935
Epoch: 6, Loss: 0.3786649703979492
Epoch: 6, Loss: 0.4066137373447418


[I 2025-03-03 15:18:12,682] Trial 9 finished with value: 0.4277834020126958 and parameters: {'hidden_size': 166, 'dropout': 0.2277040599028521, 'lr': 0.0027199565531543695}. Best is trial 0 with value: 0.41486448658809577.


Early stopping
Epoch: 0, Loss: 0.7395080924034119
Epoch: 0, Loss: 0.5559110641479492
Epoch: 1, Loss: 0.6205818057060242
Epoch: 1, Loss: 0.5356339812278748
Epoch: 2, Loss: 0.5461162328720093
Epoch: 2, Loss: 0.5175850987434387
Epoch: 3, Loss: 0.46954095363616943
Epoch: 3, Loss: 0.42248237133026123
Epoch: 4, Loss: 0.37047305703163147
Epoch: 4, Loss: 0.40590766072273254
Epoch: 5, Loss: 0.3037531077861786
Epoch: 5, Loss: 0.32385462522506714
Epoch: 6, Loss: 0.2861565053462982
Epoch: 6, Loss: 0.439493328332901
Epoch: 7, Loss: 0.44626203179359436
Epoch: 7, Loss: 0.4393170475959778
Epoch: 8, Loss: 0.38163626194000244
Epoch: 8, Loss: 0.28124281764030457
Epoch: 9, Loss: 0.3638584017753601
Epoch: 9, Loss: 0.3609023690223694
Epoch: 10, Loss: 0.3651049733161926
Epoch: 10, Loss: 0.32449933886528015
Epoch: 11, Loss: 0.31639713048934937
Epoch: 11, Loss: 0.4749342203140259
Epoch: 12, Loss: 0.42023515701293945
Epoch: 12, Loss: 0.3410148620605469
Epoch: 13, Loss: 0.3069429099559784
Epoch: 13, Loss: 0.3850

[I 2025-03-03 15:18:29,390] Trial 10 finished with value: 0.4133858236452414 and parameters: {'hidden_size': 300, 'dropout': 0.2967865657055977, 'lr': 0.00014369557899949069}. Best is trial 10 with value: 0.4133858236452414.


Epoch: 0, Loss: 1.004586100578308
Epoch: 0, Loss: 0.42078542709350586
Epoch: 1, Loss: 0.4542127847671509
Epoch: 1, Loss: 0.4524120092391968
Epoch: 2, Loss: 0.3745706379413605
Epoch: 2, Loss: 0.41589680314064026
Epoch: 3, Loss: 0.38115304708480835
Epoch: 3, Loss: 0.3823171555995941
Epoch: 4, Loss: 0.35770782828330994
Epoch: 4, Loss: 0.38616943359375
Epoch: 5, Loss: 0.4078293442726135
Epoch: 5, Loss: 0.5789992213249207
Epoch: 6, Loss: 0.23404334485530853
Epoch: 6, Loss: 0.44808611273765564
Epoch: 7, Loss: 0.3701275587081909
Epoch: 7, Loss: 0.3648543357849121
Epoch: 8, Loss: 0.3626590371131897
Epoch: 8, Loss: 0.2811112105846405
Epoch: 9, Loss: 0.3297313451766968
Epoch: 9, Loss: 0.3975798487663269


[I 2025-03-03 15:18:32,216] Trial 11 finished with value: 0.4247560945819624 and parameters: {'hidden_size': 298, 'dropout': 0.28274653860042326, 'lr': 0.00011751463112875217}. Best is trial 10 with value: 0.4133858236452414.


Early stopping
Epoch: 0, Loss: 0.8036384582519531
Epoch: 0, Loss: 0.4594501852989197
Epoch: 1, Loss: 0.5593197345733643
Epoch: 1, Loss: 0.28362226486206055
Epoch: 2, Loss: 0.3400469422340393
Epoch: 2, Loss: 0.48734167218208313
Epoch: 3, Loss: 0.3207254111766815
Epoch: 3, Loss: 0.4796919822692871
Epoch: 4, Loss: 0.40167859196662903
Epoch: 4, Loss: 0.5429617762565613
Epoch: 5, Loss: 0.42417216300964355
Epoch: 5, Loss: 0.6379126906394958
Epoch: 6, Loss: 0.47054678201675415
Epoch: 6, Loss: 0.38319119811058044
Epoch: 7, Loss: 0.4014938771724701
Epoch: 7, Loss: 0.302167683839798
Epoch: 8, Loss: 0.30896204710006714
Epoch: 8, Loss: 0.3964707851409912
Epoch: 9, Loss: 0.3888130486011505
Epoch: 9, Loss: 0.45942333340644836
Epoch: 10, Loss: 0.33320510387420654
Epoch: 10, Loss: 0.4733026325702667
Epoch: 11, Loss: 0.4474271237850189
Epoch: 11, Loss: 0.3468940854072571
Epoch: 12, Loss: 0.44823896884918213
Epoch: 12, Loss: 0.4502667784690857
Epoch: 13, Loss: 0.5377280116081238
Epoch: 13, Loss: 0.76130

[I 2025-03-03 15:18:53,967] Trial 12 finished with value: 0.41534855802722787 and parameters: {'hidden_size': 180, 'dropout': 0.39626789920793365, 'lr': 7.010576176035169e-05}. Best is trial 10 with value: 0.4133858236452414.


Epoch: 0, Loss: 0.6661069393157959
Epoch: 0, Loss: 0.47565510869026184
Epoch: 1, Loss: 0.4368438720703125
Epoch: 1, Loss: 0.33039557933807373
Epoch: 2, Loss: 0.49433231353759766
Epoch: 2, Loss: 0.5172950625419617
Epoch: 3, Loss: 0.41988593339920044
Epoch: 3, Loss: 0.7242680788040161
Epoch: 4, Loss: 0.4836803078651428
Epoch: 4, Loss: 0.49080049991607666
Epoch: 5, Loss: 0.33690476417541504
Epoch: 5, Loss: 0.24353189766407013
Epoch: 6, Loss: 0.30004823207855225
Epoch: 6, Loss: 0.3157042860984802
Epoch: 7, Loss: 0.4565933644771576
Epoch: 7, Loss: 0.42397618293762207
Epoch: 8, Loss: 0.4555257260799408
Epoch: 8, Loss: 0.3449230492115021
Epoch: 9, Loss: 0.5253820419311523
Epoch: 9, Loss: 0.4792752265930176
Epoch: 10, Loss: 0.32220718264579773
Epoch: 10, Loss: 0.4086326062679291
Epoch: 11, Loss: 0.3858264684677124
Epoch: 11, Loss: 0.5343315601348877
Epoch: 12, Loss: 0.401411235332489
Epoch: 12, Loss: 0.3316209614276886
Epoch: 13, Loss: 0.36230796575546265
Epoch: 13, Loss: 0.36699140071868896

[I 2025-03-03 15:18:59,241] Trial 13 finished with value: 0.4238980970119838 and parameters: {'hidden_size': 263, 'dropout': 0.15834712227931433, 'lr': 0.0005498785972102752}. Best is trial 10 with value: 0.4133858236452414.


Early stopping
Epoch: 0, Loss: 0.62332683801651
Epoch: 0, Loss: 0.43177056312561035
Epoch: 1, Loss: 0.2918297052383423
Epoch: 1, Loss: 0.45235565304756165
Epoch: 2, Loss: 0.47511613368988037
Epoch: 2, Loss: 0.3874140977859497
Epoch: 3, Loss: 0.503330409526825
Epoch: 3, Loss: 0.317443311214447
Epoch: 4, Loss: 0.35227638483047485
Epoch: 4, Loss: 0.3041534423828125
Epoch: 5, Loss: 0.5568457245826721
Epoch: 5, Loss: 0.5207281112670898
Epoch: 6, Loss: 0.5454759001731873
Epoch: 6, Loss: 0.5044186115264893
Epoch: 7, Loss: 0.35122308135032654
Epoch: 7, Loss: 0.40714094042778015
Epoch: 8, Loss: 0.44723865389823914
Epoch: 8, Loss: 0.4941204786300659
Epoch: 9, Loss: 0.4171912968158722
Epoch: 9, Loss: 0.5914026498794556
Epoch: 10, Loss: 0.5114032626152039
Epoch: 10, Loss: 0.40606531500816345
Epoch: 11, Loss: 0.42446401715278625
Epoch: 11, Loss: 0.41051024198532104
Epoch: 12, Loss: 0.4267750680446625
Epoch: 12, Loss: 0.4042692184448242
Epoch: 13, Loss: 0.4675194323062897
Epoch: 13, Loss: 0.44125479

[I 2025-03-03 15:19:15,975] Trial 14 finished with value: 0.4181390768524421 and parameters: {'hidden_size': 297, 'dropout': 0.31074849320931847, 'lr': 6.645702073206894e-05}. Best is trial 10 with value: 0.4133858236452414.


Epoch: 0, Loss: 0.523016095161438
Epoch: 0, Loss: 0.3666870594024658
Epoch: 1, Loss: 0.4307761788368225
Epoch: 1, Loss: 0.4885629117488861
Epoch: 2, Loss: 0.41319262981414795
Epoch: 2, Loss: 0.4827874004840851
Epoch: 3, Loss: 0.3862878680229187
Epoch: 3, Loss: 0.43648457527160645
Epoch: 4, Loss: 0.4394247829914093
Epoch: 4, Loss: 0.3239324688911438
Epoch: 5, Loss: 0.35483241081237793
Epoch: 5, Loss: 0.5887032151222229
Epoch: 6, Loss: 0.2612144351005554
Epoch: 6, Loss: 0.4795770049095154


[I 2025-03-03 15:19:17,990] Trial 15 finished with value: 0.42653757952709004 and parameters: {'hidden_size': 53, 'dropout': 0.16247875908470438, 'lr': 0.000835291979747401}. Best is trial 10 with value: 0.4133858236452414.


Early stopping
Epoch: 0, Loss: 0.7512915134429932
Epoch: 0, Loss: 0.6663556098937988
Epoch: 1, Loss: 0.63459312915802
Epoch: 1, Loss: 0.4164237082004547
Epoch: 2, Loss: 0.4374808073043823
Epoch: 2, Loss: 0.4268292784690857
Epoch: 3, Loss: 0.46152064204216003
Epoch: 3, Loss: 0.35178038477897644
Epoch: 4, Loss: 0.46853747963905334
Epoch: 4, Loss: 0.36249175667762756
Epoch: 5, Loss: 0.4455724358558655
Epoch: 5, Loss: 0.3785977363586426
Epoch: 6, Loss: 0.41844993829727173
Epoch: 6, Loss: 0.36574333906173706
Epoch: 7, Loss: 0.37235110998153687
Epoch: 7, Loss: 0.43003028631210327
Epoch: 8, Loss: 0.34285852313041687
Epoch: 8, Loss: 0.4014495313167572
Epoch: 9, Loss: 0.297600120306015
Epoch: 9, Loss: 0.30571961402893066
Epoch: 10, Loss: 0.4077049791812897
Epoch: 10, Loss: 0.49549028277397156
Epoch: 11, Loss: 0.5398162007331848
Epoch: 11, Loss: 0.41734376549720764
Epoch: 12, Loss: 0.3961869776248932
Epoch: 12, Loss: 0.3813062906265259
Epoch: 13, Loss: 0.5153713822364807
Epoch: 13, Loss: 0.46095

[I 2025-03-03 15:19:45,299] Trial 16 finished with value: 0.42277834058872293 and parameters: {'hidden_size': 197, 'dropout': 0.2784502409362497, 'lr': 1.5513357751988314e-05}. Best is trial 10 with value: 0.4133858236452414.


Epoch: 0, Loss: 0.6792207360267639
Epoch: 0, Loss: 0.6011373400688171
Epoch: 1, Loss: 0.4136902987957001
Epoch: 1, Loss: 0.5689564943313599
Epoch: 2, Loss: 0.18878744542598724
Epoch: 2, Loss: 0.40084517002105713
Epoch: 3, Loss: 0.3485562801361084
Epoch: 3, Loss: 0.3081146478652954
Epoch: 4, Loss: 0.49107611179351807
Epoch: 4, Loss: 0.4516052007675171
Epoch: 5, Loss: 0.48697879910469055
Epoch: 5, Loss: 0.4095098376274109
Epoch: 6, Loss: 0.5391970872879028
Epoch: 6, Loss: 0.5200082063674927
Epoch: 7, Loss: 0.4110456705093384
Epoch: 7, Loss: 0.4194624722003937
Epoch: 8, Loss: 0.43597620725631714
Epoch: 8, Loss: 0.4118558168411255
Epoch: 9, Loss: 0.4409986734390259
Epoch: 9, Loss: 0.554490327835083
Epoch: 10, Loss: 0.2841767966747284
Epoch: 10, Loss: 0.3356788456439972
Epoch: 11, Loss: 0.3717643618583679
Epoch: 11, Loss: 0.45510366559028625
Epoch: 12, Loss: 0.36554190516471863
Epoch: 12, Loss: 0.3702445924282074
Epoch: 13, Loss: 0.49730223417282104
Epoch: 13, Loss: 0.5303084254264832

[I 2025-03-03 15:20:09,833] Trial 17 finished with value: 0.41850195018724995 and parameters: {'hidden_size': 255, 'dropout': 0.35601213054255554, 'lr': 3.9754743505003223e-05}. Best is trial 10 with value: 0.4133858236452414.


Early stopping
Epoch: 0, Loss: 1.0394350290298462
Epoch: 0, Loss: 0.4604789614677429
Epoch: 1, Loss: 0.5222411155700684
Epoch: 1, Loss: 0.5063695907592773
Epoch: 2, Loss: 0.33699920773506165
Epoch: 2, Loss: 0.3402629494667053
Epoch: 3, Loss: 0.44299671053886414
Epoch: 3, Loss: 0.4745674729347229
Epoch: 4, Loss: 0.5447362661361694
Epoch: 4, Loss: 0.39515700936317444
Epoch: 5, Loss: 0.4409855306148529
Epoch: 5, Loss: 0.3875155746936798
Epoch: 6, Loss: 0.5399338006973267
Epoch: 6, Loss: 0.3617687523365021
Epoch: 7, Loss: 0.4323369264602661
Epoch: 7, Loss: 0.38015416264533997
Epoch: 8, Loss: 0.4102749526500702
Epoch: 8, Loss: 0.412166565656662
Epoch: 9, Loss: 0.45723938941955566
Epoch: 9, Loss: 0.4470269978046417
Epoch: 10, Loss: 0.4350980818271637
Epoch: 10, Loss: 0.3483828902244568
Epoch: 11, Loss: 0.5232032537460327
Epoch: 11, Loss: 0.43492838740348816
Epoch: 12, Loss: 0.5161138772964478
Epoch: 12, Loss: 0.3723021447658539
Epoch: 13, Loss: 0.44534051418304443
Epoch: 13, Loss: 0.38655176

[I 2025-03-03 15:20:25,438] Trial 18 finished with value: 0.4196419779005929 and parameters: {'hidden_size': 130, 'dropout': 0.34410806697581475, 'lr': 0.00016962200115256908}. Best is trial 10 with value: 0.4133858236452414.


Early stopping
Epoch: 0, Loss: 0.6768295168876648
Epoch: 0, Loss: 0.508387565612793
Epoch: 1, Loss: 0.43392670154571533
Epoch: 1, Loss: 0.4583619236946106
Epoch: 2, Loss: 0.3269774615764618
Epoch: 2, Loss: 0.3946947753429413
Epoch: 3, Loss: 0.4364117980003357
Epoch: 3, Loss: 0.4797314405441284
Epoch: 4, Loss: 0.4203738570213318
Epoch: 4, Loss: 0.5323182344436646
Epoch: 5, Loss: 0.5503084659576416
Epoch: 5, Loss: 0.5226340889930725
Epoch: 6, Loss: 0.5286792516708374
Epoch: 6, Loss: 0.32459861040115356
Early stopping


[I 2025-03-03 15:20:27,608] Trial 19 finished with value: 0.43414893620849954 and parameters: {'hidden_size': 147, 'dropout': 0.2579004953934265, 'lr': 0.0009916611999456172}. Best is trial 10 with value: 0.4133858236452414.


Epoch: 0, Loss: 0.8292909264564514
Epoch: 0, Loss: 0.429922491312027
Epoch: 1, Loss: 0.4339311420917511
Epoch: 1, Loss: 0.32926514744758606
Epoch: 2, Loss: 0.46540623903274536
Epoch: 2, Loss: 0.45221492648124695
Epoch: 3, Loss: 0.5414474606513977
Epoch: 3, Loss: 0.41236671805381775
Epoch: 4, Loss: 0.4271396994590759
Epoch: 4, Loss: 0.32143741846084595
Epoch: 5, Loss: 0.4310992956161499
Epoch: 5, Loss: 0.4188913106918335
Epoch: 6, Loss: 0.4112606942653656
Epoch: 6, Loss: 0.43109646439552307
Epoch: 7, Loss: 0.3224967122077942
Epoch: 7, Loss: 0.3936411738395691
Epoch: 8, Loss: 0.33388498425483704
Epoch: 8, Loss: 0.49881288409233093
Epoch: 9, Loss: 0.48101845383644104
Epoch: 9, Loss: 0.5091035962104797
Epoch: 10, Loss: 0.4087529182434082
Epoch: 10, Loss: 0.3012672960758209
Epoch: 11, Loss: 0.4051114022731781
Epoch: 11, Loss: 0.3836413323879242
Epoch: 12, Loss: 0.4463704526424408
Epoch: 12, Loss: 0.43438881635665894
Epoch: 13, Loss: 0.5776848196983337
Epoch: 13, Loss: 0.36110347509384155

[I 2025-03-03 15:20:37,057] Trial 20 finished with value: 0.41826087930268663 and parameters: {'hidden_size': 299, 'dropout': 0.19360571433233306, 'lr': 0.00011199542250545026}. Best is trial 10 with value: 0.4133858236452414.


Early stopping
Epoch: 0, Loss: 0.8109049797058105
Epoch: 0, Loss: 0.6168668866157532
Epoch: 1, Loss: 0.3748748302459717
Epoch: 1, Loss: 0.46074461936950684
Epoch: 2, Loss: 0.44627949595451355
Epoch: 2, Loss: 0.4653921127319336
Epoch: 3, Loss: 0.3499094247817993
Epoch: 3, Loss: 0.4329628348350525
Epoch: 4, Loss: 0.4212019741535187
Epoch: 4, Loss: 0.4537171721458435
Epoch: 5, Loss: 0.4209514558315277
Epoch: 5, Loss: 0.3739659786224365
Epoch: 6, Loss: 0.2852447032928467
Epoch: 6, Loss: 0.3295736014842987
Epoch: 7, Loss: 0.4518469274044037
Epoch: 7, Loss: 0.327513724565506
Epoch: 8, Loss: 0.430951863527298
Epoch: 8, Loss: 0.33729439973831177
Epoch: 9, Loss: 0.2970103919506073
Epoch: 9, Loss: 0.5061845779418945
Epoch: 10, Loss: 0.41651979088783264
Epoch: 10, Loss: 0.41207563877105713
Epoch: 11, Loss: 0.39881864190101624
Epoch: 11, Loss: 0.41287124156951904
Epoch: 12, Loss: 0.4964655041694641
Epoch: 12, Loss: 0.5202757120132446
Epoch: 13, Loss: 0.413693368434906
Epoch: 13, Loss: 0.4364745914

[I 2025-03-03 15:21:04,124] Trial 21 finished with value: 0.41575329899440827 and parameters: {'hidden_size': 198, 'dropout': 0.39267410933713087, 'lr': 6.374683825134805e-05}. Best is trial 10 with value: 0.4133858236452414.


Epoch: 0, Loss: 0.8032262325286865
Epoch: 0, Loss: 0.3141425848007202
Epoch: 1, Loss: 0.37729305028915405
Epoch: 1, Loss: 0.39470282196998596
Epoch: 2, Loss: 0.34186044335365295
Epoch: 2, Loss: 0.4157853126525879
Epoch: 3, Loss: 0.3610039949417114
Epoch: 3, Loss: 0.6648315787315369
Epoch: 4, Loss: 0.40842992067337036
Epoch: 4, Loss: 0.5006450414657593
Epoch: 5, Loss: 0.5058852434158325
Epoch: 5, Loss: 0.3078000247478485
Epoch: 6, Loss: 0.42534199357032776
Epoch: 6, Loss: 0.4482544958591461
Epoch: 7, Loss: 0.24620138108730316
Epoch: 7, Loss: 0.4118584394454956
Epoch: 8, Loss: 0.4184364676475525
Epoch: 8, Loss: 0.2751278579235077
Epoch: 9, Loss: 0.7334849238395691
Epoch: 9, Loss: 0.3377397954463959
Epoch: 10, Loss: 0.48360028862953186
Epoch: 10, Loss: 0.6481369733810425
Epoch: 11, Loss: 0.415185809135437
Epoch: 11, Loss: 0.37854310870170593
Epoch: 12, Loss: 0.4779294431209564
Epoch: 12, Loss: 0.3714950382709503
Epoch: 13, Loss: 0.4046710729598999
Epoch: 13, Loss: 0.4457545578479767

[I 2025-03-03 15:21:24,049] Trial 22 finished with value: 0.41142106168145626 and parameters: {'hidden_size': 185, 'dropout': 0.3267140666269757, 'lr': 0.00025937129423859296}. Best is trial 22 with value: 0.41142106168145626.


Early stopping
Epoch: 0, Loss: 0.6184343099594116
Epoch: 0, Loss: 0.43344083428382874
Epoch: 1, Loss: 0.3750566840171814
Epoch: 1, Loss: 0.41125452518463135
Epoch: 2, Loss: 0.4656153619289398
Epoch: 2, Loss: 0.4806395173072815
Epoch: 3, Loss: 0.402471661567688
Epoch: 3, Loss: 0.29434001445770264
Epoch: 4, Loss: 0.532741904258728
Epoch: 4, Loss: 0.3713465631008148
Epoch: 5, Loss: 0.3573569059371948
Epoch: 5, Loss: 0.4249607026576996
Epoch: 6, Loss: 0.35871145129203796
Epoch: 6, Loss: 0.40845078229904175
Epoch: 7, Loss: 0.28790926933288574
Epoch: 7, Loss: 0.514184832572937
Epoch: 8, Loss: 0.4576171636581421
Epoch: 8, Loss: 0.4103585183620453
Epoch: 9, Loss: 0.5668452978134155
Epoch: 9, Loss: 0.344501793384552
Epoch: 10, Loss: 0.46131715178489685
Epoch: 10, Loss: 0.49834614992141724
Epoch: 11, Loss: 0.3748658299446106
Epoch: 11, Loss: 0.38405895233154297
Epoch: 12, Loss: 0.4230014681816101
Epoch: 12, Loss: 0.39076921343803406
Epoch: 13, Loss: 0.4086211323738098
Epoch: 13, Loss: 0.35944795

[I 2025-03-03 15:21:46,249] Trial 23 finished with value: 0.417357018402552 and parameters: {'hidden_size': 96, 'dropout': 0.3254157799250297, 'lr': 0.0003086937136482746}. Best is trial 22 with value: 0.41142106168145626.


Epoch: 0, Loss: 0.8564077019691467
Epoch: 0, Loss: 0.323379248380661
Epoch: 1, Loss: 0.5548016428947449
Epoch: 1, Loss: 0.5820204019546509
Epoch: 2, Loss: 0.549239456653595
Epoch: 2, Loss: 0.5641719102859497
Epoch: 3, Loss: 0.5791448950767517
Epoch: 3, Loss: 0.5066545009613037
Epoch: 4, Loss: 0.4450621008872986
Epoch: 4, Loss: 0.2919240891933441
Epoch: 5, Loss: 0.36084654927253723
Epoch: 5, Loss: 0.4436222314834595
Epoch: 6, Loss: 0.476350337266922
Epoch: 6, Loss: 0.48477649688720703
Epoch: 7, Loss: 0.19874954223632812
Epoch: 7, Loss: 0.41563859581947327
Epoch: 8, Loss: 0.2670953869819641
Epoch: 8, Loss: 0.5392099618911743
Epoch: 9, Loss: 0.311025470495224
Epoch: 9, Loss: 0.343886137008667
Epoch: 10, Loss: 0.36433443427085876
Epoch: 10, Loss: 0.45127129554748535
Epoch: 11, Loss: 0.3764741122722626
Epoch: 11, Loss: 0.511155366897583
Epoch: 12, Loss: 0.4609072506427765
Epoch: 12, Loss: 0.39142903685569763
Epoch: 13, Loss: 0.568294107913971
Epoch: 13, Loss: 0.49619144201278687
Epoch: 14, 

[I 2025-03-03 15:22:02,907] Trial 24 finished with value: 0.41294921435754145 and parameters: {'hidden_size': 277, 'dropout': 0.27697295830138785, 'lr': 0.000207125195692809}. Best is trial 22 with value: 0.41142106168145626.


Early stopping
Epoch: 0, Loss: 0.7757750749588013
Epoch: 0, Loss: 0.38768941164016724
Epoch: 1, Loss: 0.4543401896953583
Epoch: 1, Loss: 0.31761810183525085
Epoch: 2, Loss: 0.6771405935287476
Epoch: 2, Loss: 0.4091896712779999
Epoch: 3, Loss: 0.43493571877479553
Epoch: 3, Loss: 0.3314876854419708
Epoch: 4, Loss: 0.44715195894241333
Epoch: 4, Loss: 0.3209998607635498
Epoch: 5, Loss: 0.4653337597846985
Epoch: 5, Loss: 0.4066220819950104
Epoch: 6, Loss: 0.3484858274459839
Epoch: 6, Loss: 0.38941508531570435
Epoch: 7, Loss: 0.48485222458839417
Epoch: 7, Loss: 0.45695385336875916
Epoch: 8, Loss: 0.27122926712036133
Epoch: 8, Loss: 0.5170010924339294
Epoch: 9, Loss: 0.31885790824890137
Epoch: 9, Loss: 0.41508740186691284
Epoch: 10, Loss: 0.5154159069061279
Epoch: 10, Loss: 0.4006732702255249
Epoch: 11, Loss: 0.44473037123680115
Epoch: 11, Loss: 0.3807721734046936
Epoch: 12, Loss: 0.3713129460811615
Epoch: 12, Loss: 0.38012877106666565
Epoch: 13, Loss: 0.361144095659256
Epoch: 13, Loss: 0.405

[I 2025-03-03 15:22:14,274] Trial 25 finished with value: 0.41580875423268315 and parameters: {'hidden_size': 242, 'dropout': 0.27019300482923075, 'lr': 0.0003288789376053345}. Best is trial 22 with value: 0.41142106168145626.


Epoch: 0, Loss: 0.7891616225242615
Epoch: 0, Loss: 0.4553638696670532
Epoch: 1, Loss: 0.5105327367782593
Epoch: 1, Loss: 0.4248303174972534
Epoch: 2, Loss: 0.4320656657218933
Epoch: 2, Loss: 0.42627236247062683
Epoch: 3, Loss: 0.39269211888313293
Epoch: 3, Loss: 0.37754884362220764
Epoch: 4, Loss: 0.4886242151260376
Epoch: 4, Loss: 0.3955436050891876
Epoch: 5, Loss: 0.46965351700782776
Epoch: 5, Loss: 0.5213238000869751
Epoch: 6, Loss: 0.6011773943901062
Epoch: 6, Loss: 0.5202828049659729
Epoch: 7, Loss: 0.7041504383087158
Epoch: 7, Loss: 0.46777766942977905
Epoch: 8, Loss: 0.46657028794288635
Epoch: 8, Loss: 0.41758033633232117
Epoch: 9, Loss: 0.3835514783859253
Epoch: 9, Loss: 0.3528725802898407
Epoch: 10, Loss: 0.47046980261802673
Epoch: 10, Loss: 0.4200840890407562
Epoch: 11, Loss: 0.314089834690094
Epoch: 11, Loss: 0.41895362734794617
Epoch: 12, Loss: 0.4863075315952301
Epoch: 12, Loss: 0.43841153383255005
Epoch: 13, Loss: 0.29905855655670166
Epoch: 13, Loss: 0.4761866629123688

[I 2025-03-03 15:22:23,758] Trial 26 finished with value: 0.421739813808051 and parameters: {'hidden_size': 207, 'dropout': 0.18240190684785895, 'lr': 0.0005799965274538032}. Best is trial 22 with value: 0.41142106168145626.


Early stopping
Epoch: 0, Loss: 0.7094021439552307
Epoch: 0, Loss: 0.39647161960601807
Epoch: 1, Loss: 0.33249014616012573
Epoch: 1, Loss: 0.5133110284805298
Epoch: 2, Loss: 0.31269174814224243
Epoch: 2, Loss: 0.5483527183532715
Epoch: 3, Loss: 0.39785823225975037
Epoch: 3, Loss: 0.44867047667503357
Epoch: 4, Loss: 0.3556528091430664
Epoch: 4, Loss: 0.4407114088535309
Epoch: 5, Loss: 0.42343437671661377
Epoch: 5, Loss: 0.5328407287597656
Epoch: 6, Loss: 0.4526453912258148
Epoch: 6, Loss: 0.5399359464645386
Epoch: 7, Loss: 0.33388209342956543
Epoch: 7, Loss: 0.29166823625564575
Epoch: 8, Loss: 0.4685247838497162
Epoch: 8, Loss: 0.48673006892204285
Epoch: 9, Loss: 0.36749327182769775
Epoch: 9, Loss: 0.32909780740737915
Epoch: 10, Loss: 0.37360304594039917
Epoch: 10, Loss: 0.393698513507843
Epoch: 11, Loss: 0.515442967414856
Epoch: 11, Loss: 0.6043148040771484
Epoch: 12, Loss: 0.347233384847641
Epoch: 12, Loss: 0.40766268968582153
Epoch: 13, Loss: 0.2703312337398529
Epoch: 13, Loss: 0.4232

[I 2025-03-03 15:22:37,549] Trial 27 finished with value: 0.4153703557072867 and parameters: {'hidden_size': 255, 'dropout': 0.258371978715597, 'lr': 0.00024652876440864487}. Best is trial 22 with value: 0.41142106168145626.


Epoch: 0, Loss: 0.5908716917037964
Epoch: 0, Loss: 0.4792781472206116
Epoch: 1, Loss: 0.4519689977169037
Epoch: 1, Loss: 0.5169774889945984
Epoch: 2, Loss: 0.31644517183303833
Epoch: 2, Loss: 0.3673352301120758
Epoch: 3, Loss: 0.4875529110431671
Epoch: 3, Loss: 0.38875797390937805
Epoch: 4, Loss: 0.4324813783168793
Epoch: 4, Loss: 0.25127995014190674
Epoch: 5, Loss: 0.4014703333377838
Epoch: 5, Loss: 0.3681872487068176
Epoch: 6, Loss: 0.4506581425666809
Epoch: 6, Loss: 0.22295495867729187


[I 2025-03-03 15:22:39,858] Trial 28 finished with value: 0.4274504472275976 and parameters: {'hidden_size': 176, 'dropout': 0.36262648074631254, 'lr': 0.0009989258161975552}. Best is trial 22 with value: 0.41142106168145626.


Early stopping
Epoch: 0, Loss: 0.7760007381439209
Epoch: 0, Loss: 0.31794360280036926
Epoch: 1, Loss: 0.3847731351852417
Epoch: 1, Loss: 0.45324623584747314
Epoch: 2, Loss: 0.46276602149009705
Epoch: 2, Loss: 0.6870188117027283
Epoch: 3, Loss: 0.3894668221473694
Epoch: 3, Loss: 0.5218105912208557
Epoch: 4, Loss: 0.3336523473262787
Epoch: 4, Loss: 0.3483591675758362
Epoch: 5, Loss: 0.42292311787605286
Epoch: 5, Loss: 0.39175042510032654
Epoch: 6, Loss: 0.3942975103855133
Epoch: 6, Loss: 0.41417884826660156
Epoch: 7, Loss: 0.5461790561676025
Epoch: 7, Loss: 0.45670977234840393
Epoch: 8, Loss: 0.353970468044281
Epoch: 8, Loss: 0.3724155128002167
Epoch: 9, Loss: 0.334440678358078
Epoch: 9, Loss: 0.4336477220058441
Epoch: 10, Loss: 0.47341489791870117
Epoch: 10, Loss: 0.3267141282558441
Epoch: 11, Loss: 0.49451881647109985
Epoch: 11, Loss: 0.42180153727531433
Epoch: 12, Loss: 0.4231661260128021
Epoch: 12, Loss: 0.40945619344711304
Epoch: 13, Loss: 0.3496643900871277
Epoch: 13, Loss: 0.58806

[I 2025-03-03 15:22:57,191] Trial 29 finished with value: 0.41320806200853233 and parameters: {'hidden_size': 279, 'dropout': 0.3018807167173934, 'lr': 0.0001463742902104402}. Best is trial 22 with value: 0.41142106168145626.


Epoch: 0, Loss: 0.8779733777046204
Epoch: 0, Loss: 0.6515771746635437
Epoch: 1, Loss: 0.5502623319625854
Epoch: 1, Loss: 0.4888053238391876
Epoch: 2, Loss: 0.5410681962966919
Epoch: 2, Loss: 0.4580555856227875
Epoch: 3, Loss: 0.28020793199539185
Epoch: 3, Loss: 0.429210364818573
Epoch: 4, Loss: 0.36923739314079285
Epoch: 4, Loss: 0.4518321454524994
Epoch: 5, Loss: 0.38265758752822876
Epoch: 5, Loss: 0.42077627778053284
Epoch: 6, Loss: 0.3825010061264038
Epoch: 6, Loss: 0.3646394908428192
Epoch: 7, Loss: 0.5837793946266174
Epoch: 7, Loss: 0.47966885566711426
Epoch: 8, Loss: 0.43935251235961914
Epoch: 8, Loss: 0.25368523597717285
Epoch: 9, Loss: 0.35094740986824036
Epoch: 9, Loss: 0.5224201083183289
Epoch: 10, Loss: 0.3251098096370697
Epoch: 10, Loss: 0.39729970693588257
Epoch: 11, Loss: 0.4033587574958801
Epoch: 11, Loss: 0.3535175323486328
Epoch: 12, Loss: 0.2819468080997467
Epoch: 12, Loss: 0.4045778512954712
Epoch: 13, Loss: 0.42831724882125854
Epoch: 13, Loss: 0.5653668642044067
Epo

[I 2025-03-03 15:23:27,741] Trial 30 finished with value: 0.41751373934657865 and parameters: {'hidden_size': 279, 'dropout': 0.31980017265344407, 'lr': 2.680671941084907e-05}. Best is trial 22 with value: 0.41142106168145626.


Epoch: 0, Loss: 0.5824875831604004
Epoch: 0, Loss: 0.41156506538391113
Epoch: 1, Loss: 0.4322410225868225
Epoch: 1, Loss: 0.43664219975471497
Epoch: 2, Loss: 0.4502539038658142
Epoch: 2, Loss: 0.3507157564163208
Epoch: 3, Loss: 0.416007399559021
Epoch: 3, Loss: 0.6163187623023987
Epoch: 4, Loss: 0.34860655665397644
Epoch: 4, Loss: 0.4426819682121277
Epoch: 5, Loss: 0.42771241068840027
Epoch: 5, Loss: 0.5884087681770325
Epoch: 6, Loss: 0.5320281982421875
Epoch: 6, Loss: 0.4020153284072876
Epoch: 7, Loss: 0.36096709966659546
Epoch: 7, Loss: 0.41225171089172363
Epoch: 8, Loss: 0.4125945568084717
Epoch: 8, Loss: 0.2859940826892853
Epoch: 9, Loss: 0.6063390374183655
Epoch: 9, Loss: 0.43960970640182495
Epoch: 10, Loss: 0.4149962365627289
Epoch: 10, Loss: 0.3449767231941223
Epoch: 11, Loss: 0.4064171314239502
Epoch: 11, Loss: 0.23045411705970764
Epoch: 12, Loss: 0.4783358871936798
Epoch: 12, Loss: 0.5267965793609619
Epoch: 13, Loss: 0.4172229468822479
Epoch: 13, Loss: 0.4915138781070709
Epoch

[I 2025-03-03 15:23:36,574] Trial 31 finished with value: 0.42498758603003084 and parameters: {'hidden_size': 283, 'dropout': 0.2912908708883592, 'lr': 0.00015306469593488707}. Best is trial 22 with value: 0.41142106168145626.


Early stopping
Epoch: 0, Loss: 0.9704750776290894
Epoch: 0, Loss: 0.5289576649665833
Epoch: 1, Loss: 0.3843547999858856
Epoch: 1, Loss: 0.33924174308776855
Epoch: 2, Loss: 0.4723791182041168
Epoch: 2, Loss: 0.35394614934921265
Epoch: 3, Loss: 0.4151081144809723
Epoch: 3, Loss: 0.34049099683761597
Epoch: 4, Loss: 0.4180057942867279
Epoch: 4, Loss: 0.3731224834918976
Epoch: 5, Loss: 0.3662867844104767
Epoch: 5, Loss: 0.428676962852478
Epoch: 6, Loss: 0.43910831212997437
Epoch: 6, Loss: 0.4086076617240906
Epoch: 7, Loss: 0.387251615524292
Epoch: 7, Loss: 0.39606666564941406
Epoch: 8, Loss: 0.5474719405174255
Epoch: 8, Loss: 0.4544270634651184
Epoch: 9, Loss: 0.32489195466041565
Epoch: 9, Loss: 0.43787339329719543
Epoch: 10, Loss: 0.6383377313613892
Epoch: 10, Loss: 0.5123156905174255
Epoch: 11, Loss: 0.4597218632698059
Epoch: 11, Loss: 0.34896552562713623
Epoch: 12, Loss: 0.565575361251831
Epoch: 12, Loss: 0.3821967840194702
Epoch: 13, Loss: 0.4240351617336273
Epoch: 13, Loss: 0.519017159

[I 2025-03-03 15:23:43,251] Trial 32 finished with value: 0.42142934612467053 and parameters: {'hidden_size': 268, 'dropout': 0.33448522931561553, 'lr': 8.3340163850344e-05}. Best is trial 22 with value: 0.41142106168145626.


Early stopping
Epoch: 0, Loss: 0.6068981289863586
Epoch: 0, Loss: 0.6029860973358154
Epoch: 1, Loss: 0.4222269654273987
Epoch: 1, Loss: 0.3478381931781769
Epoch: 2, Loss: 0.36102795600891113
Epoch: 2, Loss: 0.398763507604599
Epoch: 3, Loss: 0.44378089904785156
Epoch: 3, Loss: 0.6858465075492859
Epoch: 4, Loss: 0.5188701748847961
Epoch: 4, Loss: 0.43932420015335083
Epoch: 5, Loss: 0.31981998682022095
Epoch: 5, Loss: 0.560104489326477
Epoch: 6, Loss: 0.52043616771698
Epoch: 6, Loss: 0.534106969833374
Epoch: 7, Loss: 0.27597472071647644
Epoch: 7, Loss: 0.5308474898338318
Epoch: 8, Loss: 0.3546425998210907
Epoch: 8, Loss: 0.3650049567222595
Epoch: 9, Loss: 0.43364569544792175
Epoch: 9, Loss: 0.5851512551307678
Epoch: 10, Loss: 0.4368531107902527
Epoch: 10, Loss: 0.45079469680786133
Epoch: 11, Loss: 0.42657554149627686
Epoch: 11, Loss: 0.3143395483493805
Epoch: 12, Loss: 0.4450947940349579
Epoch: 12, Loss: 0.5078508853912354
Epoch: 13, Loss: 0.3993369936943054
Epoch: 13, Loss: 0.32772690057

[I 2025-03-03 15:24:05,376] Trial 33 finished with value: 0.4214380207060842 and parameters: {'hidden_size': 253, 'dropout': 0.3011139534284636, 'lr': 4.1334260078029867e-05}. Best is trial 22 with value: 0.41142106168145626.


Early stopping
Epoch: 0, Loss: 0.8888320326805115
Epoch: 0, Loss: 0.3671099543571472
Epoch: 1, Loss: 0.5335007905960083
Epoch: 1, Loss: 0.4650208353996277
Epoch: 2, Loss: 0.4089147448539734
Epoch: 2, Loss: 0.3894609212875366
Epoch: 3, Loss: 0.3609093427658081
Epoch: 3, Loss: 0.493629515171051
Epoch: 4, Loss: 0.3795165419578552
Epoch: 4, Loss: 0.5599185824394226
Epoch: 5, Loss: 0.6296118497848511
Epoch: 5, Loss: 0.3552209436893463
Epoch: 6, Loss: 0.360736221075058
Epoch: 6, Loss: 0.5067839026451111
Epoch: 7, Loss: 0.5319949388504028
Epoch: 7, Loss: 0.5913386940956116


[I 2025-03-03 15:24:08,426] Trial 34 finished with value: 0.4284508244961685 and parameters: {'hidden_size': 285, 'dropout': 0.3655856909907644, 'lr': 0.00043715265980480693}. Best is trial 22 with value: 0.41142106168145626.


Early stopping
Epoch: 0, Loss: 0.657315731048584
Epoch: 0, Loss: 0.4059622585773468
Epoch: 1, Loss: 0.5505164265632629
Epoch: 1, Loss: 0.41224274039268494
Epoch: 2, Loss: 0.398725301027298
Epoch: 2, Loss: 0.468932181596756
Epoch: 3, Loss: 0.40197649598121643
Epoch: 3, Loss: 0.32573068141937256
Epoch: 4, Loss: 0.325283020734787
Epoch: 4, Loss: 0.4901019334793091
Epoch: 5, Loss: 0.26720112562179565
Epoch: 5, Loss: 0.2802318334579468
Epoch: 6, Loss: 0.26619741320610046
Epoch: 6, Loss: 0.2961514890193939
Epoch: 7, Loss: 0.44596704840660095
Epoch: 7, Loss: 0.4719405770301819
Epoch: 8, Loss: 0.4361151158809662
Epoch: 8, Loss: 0.5468736886978149
Epoch: 9, Loss: 0.46109336614608765
Epoch: 9, Loss: 0.5028877854347229
Epoch: 10, Loss: 0.411708265542984
Epoch: 10, Loss: 0.45795634388923645
Epoch: 11, Loss: 0.509690523147583
Epoch: 11, Loss: 0.4167811870574951
Epoch: 12, Loss: 0.8455369472503662
Epoch: 12, Loss: 0.3209781050682068
Epoch: 13, Loss: 0.46091803908348083
Epoch: 13, Loss: 0.39404892921

[I 2025-03-03 15:24:27,624] Trial 35 finished with value: 0.4123738813286312 and parameters: {'hidden_size': 229, 'dropout': 0.24820005644198162, 'lr': 0.00019308355255944528}. Best is trial 22 with value: 0.41142106168145626.


Early stopping
Epoch: 0, Loss: 0.6271116137504578
Epoch: 0, Loss: 0.6409357786178589
Epoch: 1, Loss: 0.3928734064102173
Epoch: 1, Loss: 0.3768504559993744
Epoch: 2, Loss: 0.45734965801239014
Epoch: 2, Loss: 0.5158106088638306
Epoch: 3, Loss: 0.5074344873428345
Epoch: 3, Loss: 0.37841182947158813
Epoch: 4, Loss: 0.43740901350975037
Epoch: 4, Loss: 0.3810364902019501
Epoch: 5, Loss: 0.41631898283958435
Epoch: 5, Loss: 0.420406311750412
Epoch: 6, Loss: 0.34629857540130615
Epoch: 6, Loss: 0.4390786290168762
Epoch: 7, Loss: 0.49049562215805054
Epoch: 7, Loss: 0.33984649181365967
Epoch: 8, Loss: 0.39226770401000977
Epoch: 8, Loss: 0.5250398516654968
Epoch: 9, Loss: 0.3678412437438965
Epoch: 9, Loss: 0.40770915150642395
Epoch: 10, Loss: 0.4908808767795563
Epoch: 10, Loss: 0.35359299182891846
Epoch: 11, Loss: 0.4591810405254364
Epoch: 11, Loss: 0.38074418902397156
Epoch: 12, Loss: 0.4670509099960327
Epoch: 12, Loss: 0.46184828877449036
Epoch: 13, Loss: 0.45918259024620056
Epoch: 13, Loss: 0.41

[I 2025-03-03 15:24:41,866] Trial 36 finished with value: 0.4154822923682676 and parameters: {'hidden_size': 218, 'dropout': 0.21754167860426038, 'lr': 0.0002628190116349351}. Best is trial 22 with value: 0.41142106168145626.


Epoch: 0, Loss: 0.8276278376579285
Epoch: 0, Loss: 0.36665114760398865
Epoch: 1, Loss: 0.5138198733329773
Epoch: 1, Loss: 0.39679861068725586
Epoch: 2, Loss: 0.3111347556114197
Epoch: 2, Loss: 0.4654383659362793
Epoch: 3, Loss: 0.36017370223999023
Epoch: 3, Loss: 0.3048732876777649
Epoch: 4, Loss: 0.5395935773849487
Epoch: 4, Loss: 0.411927729845047
Epoch: 5, Loss: 0.42585161328315735
Epoch: 5, Loss: 0.3824366331100464
Epoch: 6, Loss: 0.47433987259864807
Epoch: 6, Loss: 0.32592490315437317
Epoch: 7, Loss: 0.306641161441803
Epoch: 7, Loss: 0.5907979011535645
Epoch: 8, Loss: 0.30703821778297424
Epoch: 8, Loss: 0.3750962018966675


[I 2025-03-03 15:24:44,466] Trial 37 finished with value: 0.42800659611416253 and parameters: {'hidden_size': 236, 'dropout': 0.23610586201351305, 'lr': 0.00021275519219199397}. Best is trial 22 with value: 0.41142106168145626.


Early stopping
Epoch: 0, Loss: 0.7141053676605225
Epoch: 0, Loss: 0.4957612156867981
Epoch: 1, Loss: 0.4314635992050171
Epoch: 1, Loss: 0.5551658868789673
Epoch: 2, Loss: 0.6270381212234497
Epoch: 2, Loss: 0.44469523429870605
Epoch: 3, Loss: 0.43629902601242065
Epoch: 3, Loss: 0.39032265543937683
Epoch: 4, Loss: 0.4595312476158142
Epoch: 4, Loss: 0.2855978012084961
Epoch: 5, Loss: 0.4704415202140808
Epoch: 5, Loss: 0.4212471544742584
Epoch: 6, Loss: 0.3737679719924927
Epoch: 6, Loss: 0.5362594723701477
Epoch: 7, Loss: 0.4220590889453888
Epoch: 7, Loss: 0.3754488229751587
Epoch: 8, Loss: 0.32281750440597534
Epoch: 8, Loss: 0.46989357471466064
Epoch: 9, Loss: 0.39740967750549316
Epoch: 9, Loss: 0.5063669681549072
Epoch: 10, Loss: 0.4090690314769745
Epoch: 10, Loss: 0.36381348967552185
Epoch: 11, Loss: 0.42503082752227783
Epoch: 11, Loss: 0.4793691337108612
Early stopping


[I 2025-03-03 15:24:47,934] Trial 38 finished with value: 0.42841428608535237 and parameters: {'hidden_size': 163, 'dropout': 0.24911859524221588, 'lr': 0.005730889320311669}. Best is trial 22 with value: 0.41142106168145626.


Epoch: 0, Loss: 0.7427371144294739
Epoch: 0, Loss: 0.4601782560348511
Epoch: 1, Loss: 0.5404643416404724
Epoch: 1, Loss: 0.35595518350601196
Epoch: 2, Loss: 0.360617995262146
Epoch: 2, Loss: 0.5195513963699341
Epoch: 3, Loss: 0.314423531293869
Epoch: 3, Loss: 0.4744381010532379
Epoch: 4, Loss: 0.3766460418701172
Epoch: 4, Loss: 0.3674538731575012
Epoch: 5, Loss: 0.3848278522491455
Epoch: 5, Loss: 0.4173975884914398
Epoch: 6, Loss: 0.39916157722473145
Epoch: 6, Loss: 0.5139745473861694
Epoch: 7, Loss: 0.40171173214912415
Epoch: 7, Loss: 0.5207732915878296
Epoch: 8, Loss: 0.34524980187416077
Epoch: 8, Loss: 0.5365312695503235
Epoch: 9, Loss: 0.5657285451889038
Epoch: 9, Loss: 0.4551917612552643
Epoch: 10, Loss: 0.44549137353897095
Epoch: 10, Loss: 0.33632177114486694
Epoch: 11, Loss: 0.5059067606925964
Epoch: 11, Loss: 0.5733465552330017
Epoch: 12, Loss: 0.4928952157497406
Epoch: 12, Loss: 0.38123294711112976
Epoch: 13, Loss: 0.31228360533714294
Epoch: 13, Loss: 0.3317677676677704
Epoch:

[I 2025-03-03 15:24:55,486] Trial 39 finished with value: 0.42549444641339484 and parameters: {'hidden_size': 192, 'dropout': 0.271295452352129, 'lr': 0.0004933016044528566}. Best is trial 22 with value: 0.41142106168145626.


Early stopping
Epoch: 0, Loss: 0.634751558303833
Epoch: 0, Loss: 0.3227721154689789
Epoch: 1, Loss: 0.4407879710197449
Epoch: 1, Loss: 0.5172328948974609
Epoch: 2, Loss: 0.32084736227989197
Epoch: 2, Loss: 0.4188827574253082
Epoch: 3, Loss: 0.3641853332519531
Epoch: 3, Loss: 0.3641831874847412
Epoch: 4, Loss: 0.4623240828514099
Epoch: 4, Loss: 0.3600706160068512
Epoch: 5, Loss: 0.46830326318740845
Epoch: 5, Loss: 0.5238387584686279
Epoch: 6, Loss: 0.4760659635066986
Epoch: 6, Loss: 0.5075551271438599
Early stopping


[I 2025-03-03 15:24:57,524] Trial 40 finished with value: 0.42975827616564233 and parameters: {'hidden_size': 225, 'dropout': 0.11019959894648403, 'lr': 0.0015390722131768853}. Best is trial 22 with value: 0.41142106168145626.


Epoch: 0, Loss: 0.6118379235267639
Epoch: 0, Loss: 0.4216712713241577
Epoch: 1, Loss: 0.4170805513858795
Epoch: 1, Loss: 0.27953779697418213
Epoch: 2, Loss: 0.6240262389183044
Epoch: 2, Loss: 0.40868091583251953
Epoch: 3, Loss: 0.38991931080818176
Epoch: 3, Loss: 0.4931084215641022
Epoch: 4, Loss: 0.4261399805545807
Epoch: 4, Loss: 0.34847888350486755
Epoch: 5, Loss: 0.6485743522644043
Epoch: 5, Loss: 0.43297043442726135
Epoch: 6, Loss: 0.6097636222839355
Epoch: 6, Loss: 0.3415853679180145
Epoch: 7, Loss: 0.6520900130271912
Epoch: 7, Loss: 0.23939470946788788
Epoch: 8, Loss: 0.37529098987579346
Epoch: 8, Loss: 0.2842743694782257
Epoch: 9, Loss: 0.4399546980857849
Epoch: 9, Loss: 0.5561841726303101
Epoch: 10, Loss: 0.4632228910923004
Epoch: 10, Loss: 0.4233096241950989
Epoch: 11, Loss: 0.5984839797019958
Epoch: 11, Loss: 0.3824460208415985
Epoch: 12, Loss: 0.4020361006259918
Epoch: 12, Loss: 0.42441028356552124
Epoch: 13, Loss: 0.45722249150276184
Epoch: 13, Loss: 0.3763953745365143
Epo

[I 2025-03-03 15:25:11,192] Trial 41 finished with value: 0.41561344361673136 and parameters: {'hidden_size': 268, 'dropout': 0.30040903470044555, 'lr': 0.0001472421663238304}. Best is trial 22 with value: 0.41142106168145626.


Early stopping
Epoch: 0, Loss: 0.5109156370162964
Epoch: 0, Loss: 0.5238698124885559
Epoch: 1, Loss: 0.38962340354919434
Epoch: 1, Loss: 0.33665260672569275
Epoch: 2, Loss: 0.4903319180011749
Epoch: 2, Loss: 0.6856141090393066
Epoch: 3, Loss: 0.34391894936561584
Epoch: 3, Loss: 0.5052558183670044
Epoch: 4, Loss: 0.4181535542011261
Epoch: 4, Loss: 0.3615402281284332
Epoch: 5, Loss: 0.4864533841609955
Epoch: 5, Loss: 0.3511711657047272
Epoch: 6, Loss: 0.5145512819290161
Epoch: 6, Loss: 0.3171069025993347
Epoch: 7, Loss: 0.4176945090293884
Epoch: 7, Loss: 0.33383142948150635
Epoch: 8, Loss: 0.40452808141708374
Epoch: 8, Loss: 0.5639935731887817
Epoch: 9, Loss: 0.3505898416042328
Epoch: 9, Loss: 0.36464324593544006
Epoch: 10, Loss: 0.3470747172832489
Epoch: 10, Loss: 0.30307310819625854
Epoch: 11, Loss: 0.4355284869670868
Epoch: 11, Loss: 0.3703177273273468
Epoch: 12, Loss: 0.5315213203430176
Epoch: 12, Loss: 0.4392206072807312
Epoch: 13, Loss: 0.5120853185653687
Epoch: 13, Loss: 0.3836974

[I 2025-03-03 15:25:18,958] Trial 42 finished with value: 0.42567106522619724 and parameters: {'hidden_size': 288, 'dropout': 0.3125448536942983, 'lr': 0.00018977057605471984}. Best is trial 22 with value: 0.41142106168145626.


Epoch: 0, Loss: 0.697641909122467
Epoch: 0, Loss: 0.33642393350601196
Epoch: 1, Loss: 0.3781593441963196
Epoch: 1, Loss: 0.5088818669319153
Epoch: 2, Loss: 0.5438861846923828
Epoch: 2, Loss: 0.43686643242836
Epoch: 3, Loss: 0.5145899653434753
Epoch: 3, Loss: 0.49094462394714355
Epoch: 4, Loss: 0.4389941692352295
Epoch: 4, Loss: 0.41192054748535156
Epoch: 5, Loss: 0.34666872024536133
Epoch: 5, Loss: 0.494228333234787
Epoch: 6, Loss: 0.453975647687912
Epoch: 6, Loss: 0.32378891110420227
Epoch: 7, Loss: 0.43389034271240234
Epoch: 7, Loss: 0.3954554498195648
Epoch: 8, Loss: 0.48562756180763245
Epoch: 8, Loss: 0.3917124271392822
Epoch: 9, Loss: 0.3621903657913208
Epoch: 9, Loss: 0.44615796208381653
Epoch: 10, Loss: 0.4429987072944641
Epoch: 10, Loss: 0.42968782782554626
Epoch: 11, Loss: 0.5549191236495972
Epoch: 11, Loss: 0.4128831923007965
Epoch: 12, Loss: 0.4325103163719177
Epoch: 12, Loss: 0.4415058195590973
Epoch: 13, Loss: 0.4006352722644806
Epoch: 13, Loss: 0.3918338716030121
Epoch: 1

[I 2025-03-03 15:25:34,216] Trial 43 finished with value: 0.4143256602450187 and parameters: {'hidden_size': 249, 'dropout': 0.2876129763415412, 'lr': 0.0001008451139799531}. Best is trial 22 with value: 0.41142106168145626.


Early stopping
Epoch: 0, Loss: 0.6892909407615662
Epoch: 0, Loss: 0.3904135823249817
Epoch: 1, Loss: 0.2867647111415863
Epoch: 1, Loss: 0.4111950397491455
Epoch: 2, Loss: 0.47075021266937256
Epoch: 2, Loss: 0.533437192440033
Epoch: 3, Loss: 0.5152962803840637
Epoch: 3, Loss: 0.2706857919692993
Epoch: 4, Loss: 0.3667164742946625
Epoch: 4, Loss: 0.386931449174881
Epoch: 5, Loss: 0.5360320210456848
Epoch: 5, Loss: 0.4282771944999695
Epoch: 6, Loss: 0.48412710428237915
Epoch: 6, Loss: 0.4513421356678009
Epoch: 7, Loss: 0.4568648934364319
Epoch: 7, Loss: 0.4088757336139679
Epoch: 8, Loss: 0.3626546263694763
Epoch: 8, Loss: 0.4328317940235138
Epoch: 9, Loss: 0.5055121779441833
Epoch: 9, Loss: 0.37551558017730713
Epoch: 10, Loss: 0.37418538331985474
Epoch: 10, Loss: 0.30289387702941895
Epoch: 11, Loss: 0.454725056886673
Epoch: 11, Loss: 0.2578202486038208
Epoch: 12, Loss: 0.44622382521629333
Epoch: 12, Loss: 0.474077045917511
Epoch: 13, Loss: 0.5689993500709534
Epoch: 13, Loss: 0.366330891847

[I 2025-03-03 15:25:48,983] Trial 44 finished with value: 0.41780284623852615 and parameters: {'hidden_size': 272, 'dropout': 0.24611251583625868, 'lr': 0.0001208461346633042}. Best is trial 22 with value: 0.41142106168145626.


Early stopping
Epoch: 0, Loss: 0.6959660053253174
Epoch: 0, Loss: 0.4023086130619049
Epoch: 1, Loss: 0.48259320855140686
Epoch: 1, Loss: 0.43012863397598267
Epoch: 2, Loss: 0.3677346408367157
Epoch: 2, Loss: 0.34111490845680237
Epoch: 3, Loss: 0.40128156542778015
Epoch: 3, Loss: 0.46011969447135925
Epoch: 4, Loss: 0.39526164531707764
Epoch: 4, Loss: 0.5537604689598083
Epoch: 5, Loss: 0.40444427728652954
Epoch: 5, Loss: 0.4126347005367279
Epoch: 6, Loss: 0.37426435947418213
Epoch: 6, Loss: 0.3811018466949463
Epoch: 7, Loss: 0.44038066267967224
Epoch: 7, Loss: 0.4390643239021301
Epoch: 8, Loss: 0.5319239497184753
Epoch: 8, Loss: 0.25974905490875244
Epoch: 9, Loss: 0.30812016129493713
Epoch: 9, Loss: 0.35903337597846985
Epoch: 10, Loss: 0.48262226581573486
Epoch: 10, Loss: 0.3586776554584503
Epoch: 11, Loss: 0.32945647835731506
Epoch: 11, Loss: 0.4662114083766937
Epoch: 12, Loss: 0.4899764060974121
Epoch: 12, Loss: 0.1870235651731491
Epoch: 13, Loss: 0.45667311549186707
Epoch: 13, Loss: 0

[I 2025-03-03 15:26:02,409] Trial 45 finished with value: 0.42199275536196573 and parameters: {'hidden_size': 210, 'dropout': 0.3766255441081515, 'lr': 0.00037894369027433993}. Best is trial 22 with value: 0.41142106168145626.


Epoch: 0, Loss: 0.6253012418746948
Epoch: 0, Loss: 0.5313881635665894
Epoch: 1, Loss: 0.4615197777748108
Epoch: 1, Loss: 0.4166426658630371
Epoch: 2, Loss: 0.42827826738357544
Epoch: 2, Loss: 0.43044331669807434
Epoch: 3, Loss: 0.4847976863384247
Epoch: 3, Loss: 0.36723148822784424
Epoch: 4, Loss: 0.48902952671051025
Epoch: 4, Loss: 0.47706758975982666
Epoch: 5, Loss: 0.38766390085220337
Epoch: 5, Loss: 0.34464263916015625
Epoch: 6, Loss: 0.48567965626716614
Epoch: 6, Loss: 0.43597909808158875
Epoch: 7, Loss: 0.32210659980773926
Epoch: 7, Loss: 0.37382662296295166
Epoch: 8, Loss: 0.40376579761505127
Epoch: 8, Loss: 0.4271926283836365
Epoch: 9, Loss: 0.3593493103981018
Epoch: 9, Loss: 0.348323255777359
Epoch: 10, Loss: 0.40576696395874023
Epoch: 10, Loss: 0.40598544478416443
Epoch: 11, Loss: 0.4413261413574219
Epoch: 11, Loss: 0.42349857091903687
Epoch: 12, Loss: 0.48339930176734924
Epoch: 12, Loss: 0.29987582564353943
Epoch: 13, Loss: 0.29881617426872253
Epoch: 13, Loss: 0.297235161066

[I 2025-03-03 15:26:09,215] Trial 46 finished with value: 0.4321959930101861 and parameters: {'hidden_size': 292, 'dropout': 0.3287838839318462, 'lr': 0.0006667473067940841}. Best is trial 22 with value: 0.41142106168145626.


Early stopping
Epoch: 0, Loss: 0.696659505367279
Epoch: 0, Loss: 0.5397957563400269
Epoch: 1, Loss: 0.41091325879096985
Epoch: 1, Loss: 0.40210825204849243
Epoch: 2, Loss: 0.46452072262763977
Epoch: 2, Loss: 0.3655879497528076
Epoch: 3, Loss: 0.32143041491508484
Epoch: 3, Loss: 0.3086499273777008
Epoch: 4, Loss: 0.2737557888031006
Epoch: 4, Loss: 0.4580695927143097
Epoch: 5, Loss: 0.45522958040237427
Epoch: 5, Loss: 0.5660635828971863
Epoch: 6, Loss: 0.4563301205635071
Epoch: 6, Loss: 0.24002724885940552
Epoch: 7, Loss: 0.44636884331703186
Epoch: 7, Loss: 0.4061492681503296
Epoch: 8, Loss: 0.6256473660469055
Epoch: 8, Loss: 0.41181156039237976
Epoch: 9, Loss: 0.3780532777309418
Epoch: 9, Loss: 0.47379055619239807
Epoch: 10, Loss: 0.5798885226249695
Epoch: 10, Loss: 0.29359662532806396
Epoch: 11, Loss: 0.5153504610061646
Epoch: 11, Loss: 0.38268545269966125
Epoch: 12, Loss: 0.5596291422843933
Epoch: 12, Loss: 0.39060744643211365
Epoch: 13, Loss: 0.5201485753059387
Epoch: 13, Loss: 0.426

[I 2025-03-03 15:26:26,966] Trial 47 finished with value: 0.4187603226417433 and parameters: {'hidden_size': 233, 'dropout': 0.3455410195238464, 'lr': 5.361887511710794e-05}. Best is trial 22 with value: 0.41142106168145626.


Epoch: 0, Loss: 0.819193959236145
Epoch: 0, Loss: 0.37868186831474304
Epoch: 1, Loss: 0.2224918007850647
Epoch: 1, Loss: 0.5634194016456604
Epoch: 2, Loss: 0.448588490486145
Epoch: 2, Loss: 0.41938695311546326
Epoch: 3, Loss: 0.540790855884552
Epoch: 3, Loss: 0.3782906234264374
Epoch: 4, Loss: 0.45985284447669983
Epoch: 4, Loss: 0.4827806055545807
Epoch: 5, Loss: 0.4091682434082031
Epoch: 5, Loss: 0.43858450651168823
Epoch: 6, Loss: 0.38509616255760193
Epoch: 6, Loss: 0.5257836580276489
Epoch: 7, Loss: 0.33883848786354065
Epoch: 7, Loss: 0.3446692228317261
Epoch: 8, Loss: 0.36768192052841187
Epoch: 8, Loss: 0.3743918240070343
Epoch: 9, Loss: 0.5035917162895203
Epoch: 9, Loss: 0.36440473794937134
Epoch: 10, Loss: 0.3789946138858795
Epoch: 10, Loss: 0.3985324501991272
Epoch: 11, Loss: 0.42013466358184814
Epoch: 11, Loss: 0.4733745753765106
Epoch: 12, Loss: 0.413896381855011
Epoch: 12, Loss: 0.484529584646225
Epoch: 13, Loss: 0.5250592231750488
Epoch: 13, Loss: 0.5311992168426514
Epoch: 1

[I 2025-03-03 15:26:41,322] Trial 48 finished with value: 0.41597702891583216 and parameters: {'hidden_size': 262, 'dropout': 0.21974287493850617, 'lr': 0.0002444047659671073}. Best is trial 22 with value: 0.41142106168145626.


Early stopping
Epoch: 0, Loss: 0.7474607229232788
Epoch: 0, Loss: 0.4360671937465668
Epoch: 1, Loss: 0.3226762115955353
Epoch: 1, Loss: 0.3721334934234619
Epoch: 2, Loss: 0.3660534620285034
Epoch: 2, Loss: 0.4229695498943329
Epoch: 3, Loss: 0.5197665095329285
Epoch: 3, Loss: 0.3630763590335846
Epoch: 4, Loss: 0.2883955240249634
Epoch: 4, Loss: 0.4346677362918854
Epoch: 5, Loss: 0.49669063091278076
Epoch: 5, Loss: 0.2768736481666565
Epoch: 6, Loss: 0.3587530255317688
Epoch: 6, Loss: 0.5562158226966858
Epoch: 7, Loss: 0.41625258326530457
Epoch: 7, Loss: 0.44215330481529236
Epoch: 8, Loss: 0.38214632868766785
Epoch: 8, Loss: 0.4937567114830017
Epoch: 9, Loss: 0.4484466314315796
Epoch: 9, Loss: 0.3577127754688263
Epoch: 10, Loss: 0.4205235242843628
Epoch: 10, Loss: 0.4780934453010559
Epoch: 11, Loss: 0.31808555126190186
Epoch: 11, Loss: 0.40855133533477783
Epoch: 12, Loss: 0.40249010920524597
Epoch: 12, Loss: 0.42081716656684875
Epoch: 13, Loss: 0.27167925238609314
Epoch: 13, Loss: 0.58412

[I 2025-03-03 15:26:54,399] Trial 49 finished with value: 0.41497972464732685 and parameters: {'hidden_size': 276, 'dropout': 0.26739005008931604, 'lr': 0.00014392726926503157}. Best is trial 22 with value: 0.41142106168145626.


{'hidden_size': 185, 'dropout': 0.3267140666269757, 'lr': 0.00025937129423859296}


Optuna reports the following result:

[I 2025-03-03 15:26:54,399] Trial 49 finished with value: 0.41497972464732685 and parameters: {'hidden_size': 276, 'dropout': 0.26739005008931604, 'lr': 0.00014392726926503157}. Best is trial 22 with value: 0.41142106168145626.

{'hidden_size': 185, 'dropout': 0.3267140666269757, 'lr': 0.00025937129423859296}

Therefore, I will use the hyperparameters 'hidden_size': 185, 'dropout': 0.3267140666269757, 'lr': 0.00025937129423859296 for the optimized training, both for the 30-d embedding and for the 768-d BERT embedding.
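As a cross-check on the log above, the per-trial results can be recovered from Optuna's log lines with a short parser. This is a minimal sketch using only the standard library. The helper name `parse_trials` is hypothetical, the two sample lines are copied from the log above, and the regular expression assumes the default Optuna message format seen in this notebook.

```python
import ast
import re

# Matches the default Optuna message seen in the log above (assumed format).
TRIAL_RE = re.compile(
    r"Trial (?P<number>\d+) finished with value: (?P<value>[-+0-9.eE]+) "
    r"and parameters: (?P<params>\{.*?\})\."
)

def parse_trials(log_text):
    """Return a list of (trial_number, value, params) tuples found in log_text."""
    trials = []
    for match in TRIAL_RE.finditer(log_text):
        params = ast.literal_eval(match.group("params"))  # dict literal -> dict
        trials.append(
            (int(match.group("number")), float(match.group("value")), params)
        )
    return trials

# Two lines copied from the log above; a full run would pass the whole log.
sample_log = """
[I 2025-03-03 15:24:27,624] Trial 35 finished with value: 0.4123738813286312 and parameters: {'hidden_size': 229, 'dropout': 0.24820005644198162, 'lr': 0.00019308355255944528}. Best is trial 22 with value: 0.41142106168145626.
[I 2025-03-03 15:26:54,399] Trial 49 finished with value: 0.41497972464732685 and parameters: {'hidden_size': 276, 'dropout': 0.26739005008931604, 'lr': 0.00014392726926503157}. Best is trial 22 with value: 0.41142106168145626.
"""

trials = parse_trials(sample_log)
# Lowest validation loss among the parsed trials only.
best_number, best_value, best_params = min(trials, key=lambda t: t[1])
print(best_number, best_value, best_params)
```

Because only two lines are parsed here, the minimum is the better of those two trials, not the overall best trial of the study. Running the parser on the complete log would be needed to compare against the best trial that Optuna reports.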


## **Step 3 continued: Insights**

Did you find the hyperparameter search helpful? Does it help to increase the number of trials in the optimization? Note that so far we have used the simplest version of optuna which has many nice features. Can you discover more useful features by browsing the optuna website? (Hint: try pruning)

The hyperparameter search was very helpful because it identifies the best-performing hyperparameters among the configurations it samples from the search ranges I listed. It saves a lot of time: there are many hyperparameters to consider, and Optuna searches over all of them in a single study.

Increasing the number of trials would also help, since more trials mean a higher chance of finding a better configuration. For example, in one test run the best result came from the 99th trial.
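The effect of the trial budget can be illustrated with a toy random search. This is only a sketch under made-up assumptions: `toy_loss` is a hypothetical stand-in for a validation loss and is not the model from this milestone, and plain random sampling is used instead of Optuna's sampler.

```python
import math
import random

def toy_loss(lr):
    """Hypothetical validation loss with a minimum near lr = 1e-3 (illustration only)."""
    return 0.4 + 0.05 * (math.log10(lr) + 3.0) ** 2

def running_best(n_trials, seed=0):
    """Sample lr log-uniformly in [1e-5, 1e-1] and track the best loss so far."""
    rng = random.Random(seed)
    best = float("inf")
    history = []
    for _ in range(n_trials):
        lr = 10 ** rng.uniform(-5, -1)
        best = min(best, toy_loss(lr))
        history.append(best)
    return history

history = running_best(100)
for n in (10, 50, 100):
    print(f"best loss after {n:3d} trials: {history[n - 1]:.4f}")
```

The best-so-far value can only stay the same or improve as trials are added, which is why a larger budget never hurts the result itself. The improvement per extra trial usually shrinks, though, so the extra compute time has to be weighed against the expected gain.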


`module = optunahub.load_module(package="samplers/auto_sampler")`

The "samplers" category looks very interesting.
AutoSampler is a package that is loaded from OptunaHub, an extension of Optuna. It simplifies and automates the choice of sampling method in hyperparameter optimization by automatically selecting a suitable sampler for the study.




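Pruning is another very interesting feature.
This is the code:

    import optuna
    from optuna.pruners import MedianPruner

    # Minimize the validation loss and prune unpromising trials early.
    study = optuna.create_study(direction="minimize", pruner=MedianPruner())
    study.optimize(objective, n_trials=100)

Pruning works like early stopping at the level of whole trials: it ends a trial in advance if its intermediate performance is not as good as expected. With `MedianPruner`, a trial is stopped when its best intermediate result is worse than the median of the earlier trials at the same step.
This saves computational cost, because unpromising configurations are not trained to completion. For pruning to take effect, the objective has to report intermediate values with `trial.report` and check `trial.should_prune`.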
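The median rule behind `MedianPruner` can be sketched in plain Python. This is a simplified illustration, not Optuna's implementation: the function name `should_prune_median` and the loss histories are hypothetical, and options of the real pruner such as startup trials and warm-up steps are ignored here.

```python
from statistics import median

def should_prune_median(completed_histories, current_history):
    """Simplified median rule for a loss that is being minimized.

    completed_histories: per-epoch validation losses of finished trials.
    current_history: per-epoch validation losses of the running trial so far.
    Returns True when the running trial's best loss so far is worse than the
    median loss that finished trials had at the same epoch.
    """
    if not current_history:
        return False  # nothing has been reported yet
    step = len(current_history) - 1
    # Only compare against trials that actually reached this epoch.
    peers = [h[step] for h in completed_histories if len(h) > step]
    if not peers:
        return False  # nothing to compare against yet
    return min(current_history) > median(peers)

# Hypothetical validation losses, for illustration only.
finished = [
    [0.60, 0.48, 0.43],
    [0.58, 0.47, 0.42],
    [0.65, 0.55, 0.50],
]
behind = should_prune_median(finished, [0.70, 0.62])  # worse than its peers
ahead = should_prune_median(finished, [0.59, 0.45])   # better than the median
print(behind, ahead)
```

The comparison is made against other trials at the same epoch, so a trial is not pruned just because its own loss fluctuates from one epoch to the next.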


## **Step 4: Final Training**

Now that you have found a good hyperparameter setting the validation set is no longer needed. The last step is to combine the training and validation set into a combined training set and retrain the model under the best parameter setting found. Report your final loss on your test data.

Final test with 30-d embedding:

Average Training Loss: 0.4365

Test Error: 

 Accuracy: 78.5%, Avg loss: 0.425930

The accuracy did not improve, likely because the dimensionality reduction discards some information that could be used for classification.

Final training with 768d bert embedding:

Epoch 1
-------------------------------
loss: 0.964580  [    0/ 5124]
loss: 0.566610  [  400/ 5124]
loss: 0.894764  [  800/ 5124]
loss: 0.741076  [ 1200/ 5124]
loss: 0.510969  [ 1600/ 5124]
loss: 0.402552  [ 2000/ 5124]
loss: 0.684575  [ 2400/ 5124]
loss: 0.250913  [ 2800/ 5124]
loss: 0.698820  [ 3200/ 5124]
loss: 0.480867  [ 3600/ 5124]
loss: 0.509612  [ 4000/ 5124]
loss: 0.483319  [ 4400/ 5124]
loss: 0.742746  [ 4800/ 5124]
Average Training Loss: 0.5555

Test Error: 
 Accuracy: 79.9%, Avg loss: 0.456661 

Epoch 2
-------------------------------
loss: 0.509154  [    0/ 5124]
loss: 0.479380  [  400/ 5124]
loss: 0.626528  [  800/ 5124]
loss: 0.397814  [ 1200/ 5124]
...

Average Training Loss: 0.4304

Test Error:

 Accuracy: 84.6%, Avg loss: 0.318552
  
After 100 epochs of training with the hyperparameters selected by Optuna, the average training loss dropped from 0.5555 to 0.4304. The gain in accuracy and the drop in loss are sharper with the BERT embeddings: each input is a 768-dimensional tensor, so it carries more information for the model to learn from and fit.

On the test set, the accuracy increased from 79.9% to 84.6%.

The average test loss dropped from 0.457 to 0.318. This is a clear improvement.
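The changes reported above can be put in relative terms with a few lines of arithmetic. This is a small sketch; the helper `relative_change` is hypothetical, and the numbers are copied from the first and last values shown in the training output above.

```python
def relative_change(before, after):
    """Relative change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before

# Values copied from the training output above (first vs. last reported).
train_loss = (0.5555, 0.4304)
test_loss = (0.456661, 0.318552)
test_acc = (79.9, 84.6)  # percent

train_change = relative_change(*train_loss)
test_change = relative_change(*test_loss)
acc_gain = test_acc[1] - test_acc[0]  # percentage points, not percent

print(f"train loss change:  {train_change:+.1%}")
print(f"test loss change:   {test_change:+.1%}")
print(f"test accuracy gain: {acc_gain:+.1f} percentage points")
```

Reporting the accuracy gain in percentage points and the loss change as a relative reduction keeps the two kinds of improvement from being confused with each other.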

Because the input is currently a generic text embedding, there is still room for improvement. According to Hirakawa, M. (2024), Null subjects and long-distance anaphors revisited: What the acquisition of Japanese vs. Chinese contributes to generative approaches to SLA, Japanese native speakers use pronouns very differently from English native speakers when speaking English. A possible way to improve the accuracy would therefore be to fine-tune BERT so that it pays more attention to the pronoun patterns in both the Japanese-native and the English-native data, capturing and focusing on the differences in pronoun usage.

Example: 
English Natives: 
I bought some apples in the store today.

Japanese Natives:
Some apples bought in the store today.



## **Final Submission**
Upload your submission for Milestone 2 to Canvas. 
Happy Deep Learning! 🚀