## **Install transformers dataset**
```
!pip install transformers datasets accelerate ray[tune] optuna -U
```
This command installs the Python libraries needed to fine-tune NLP models and search for good hyperparameters. The leading exclamation mark tells the notebook to run the line as a shell command. transformers provides pre-trained models such as BERT and RoBERTa for tasks like text classification and sentiment analysis, while datasets loads and manages NLP datasets. accelerate handles device placement, multi-device training, and mixed precision. ray[tune] and optuna are both hyperparameter optimization tools; only optuna is used as the search backend in this notebook. Finally, the -U flag upgrades the listed packages to their latest available versions.
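Because `-U` pulls whatever is newest at install time, it can help to record which versions were actually picked up. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is an illustrative assumption and not part of any of the libraries above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names taken from the pip command above ("ray" covers ray[tune]).
for name, ver in installed_versions(
    ["transformers", "datasets", "accelerate", "ray", "optuna"]
).items():
    print(f"{name}: {ver or 'not installed'}")
```

Saving this output next to the results makes it easier to rerun the notebook later with the same package versions.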

In [None]:
!pip install transformers datasets accelerate ray[tune] optuna -U

Collecting datasets
  Downloading datasets-4.4.1-py3-none-any.whl.metadata (19 kB)
Collecting optuna
  Downloading optuna-4.5.0-py3-none-any.whl.metadata (17 kB)
Collecting ray[tune]
  Downloading ray-2.51.1-cp312-cp312-manylinux2014_x86_64.whl.metadata (21 kB)
Collecting pyarrow>=21.0.0 (from datasets)
  Downloading pyarrow-22.0.0-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (3.2 kB)
Collecting click!=8.3.0,>=7.0 (from ray[tune])
  Downloading click-8.2.1-py3-none-any.whl.metadata (2.5 kB)
Collecting tensorboardX>=1.9 (from ray[tune])
  Downloading tensorboardx-2.6.4-py3-none-any.whl.metadata (6.2 kB)
Collecting colorlog (from optuna)
  Downloading colorlog-6.10.1-py3-none-any.whl.metadata (11 kB)
Downloading datasets-4.4.1-py3-none-any.whl (511 kB)
Downloading optuna-4.5.0-py3-none-any.whl (400 kB)

## **Setup and Installation**

```
import torch
import os
import numpy as np
import pandas as pd
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
    set_seed
)
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
import optuna

# Set a consistent seed for reproducibility across runs
set_seed(42)

# Ensure GPU is available
if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    device = torch.device("cpu")
    print("Using CPU. For faster training, consider enabling a GPU runtime.")
```
This part of the code sets up the environment for training a text classification model. It imports torch for deep learning, os for querying the CPU count, numpy and pandas for data handling, datasets for wrapping data as Hugging Face Dataset objects, transformers for pre-trained models and the training loop, sklearn for data splitting and evaluation metrics, and optuna for hyperparameter optimization. set_seed(42) seeds the random number generators so that runs are repeatable; some GPU operations are nondeterministic, so results can still differ slightly between runs. The code then checks whether a GPU is available: if so, it selects the GPU to speed up computation, otherwise it falls back to the CPU, which is slower but still works.
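To illustrate what seeding buys, here is a minimal standard-library sketch. `set_seed` from transformers seeds Python's `random`, NumPy, and PyTorch together; the example below uses only `random` so that it runs without those dependencies, and the helper `draw` is hypothetical.

```python
import random


def draw(seed, k=5):
    """Return k pseudo-random integers from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.randint(0, 99) for _ in range(k)]


# The same seed reproduces the same sequence on every call.
print(draw(42) == draw(42))
# A different seed gives an independent sequence.
print(draw(42), draw(43))
```

The same idea applies to the train/evaluation split and the subsampling in the next section, which pass `random_state=42` for the same reason.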

-----
## **Data Preparation**
```
try:
    df = pd.read_csv("Mental-Health-Twitter.csv")
    print("Dataset loaded successfully.")
except FileNotFoundError:
    print("Error: 'Mental-Health-Twitter.csv' not found. Please upload it to your Colab environment.")
    exit()

df = df.dropna(subset=['post_text'])
df = df[df['post_text'].str.strip() != '']
df = df.rename(columns={"label": "labels"})

train_df, eval_df = train_test_split(df, test_size=0.1, stratify=df['labels'], random_state=42)
train_df = train_df.sample(n=10000, random_state=42)
eval_df = eval_df.sample(n=2000, random_state=42)

print(f"Using {len(train_df)} training samples and {len(eval_df)} evaluation samples.")
print(f"Train label distribution:\n{train_df['labels'].value_counts(normalize=True)}")
print(f"Eval label distribution:\n{eval_df['labels'].value_counts(normalize=True)}")

train_dataset = Dataset.from_pandas(train_df[['post_text', 'labels']])
eval_dataset = Dataset.from_pandas(eval_df[['post_text', 'labels']])

MODEL_NAME = "margotwagner/roberta-psychotherapy-eval"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize_function(examples):
    return tokenizer(examples["post_text"], truncation=True, padding=True, max_length=128)

tokenized_train = train_dataset.map(tokenize_function, batched=True, remove_columns=["post_text", "__index_level_0__"])
tokenized_eval = eval_dataset.map(tokenize_function, batched=True, remove_columns=["post_text", "__index_level_0__"])

tokenized_train.set_format("torch", columns=['input_ids', 'attention_mask', 'labels'])
tokenized_eval.set_format("torch", columns=['input_ids', 'attention_mask', 'labels'])
```
This code loads and prepares the dataset for training a text classification model. It first tries to read a CSV file named Mental-Health-Twitter.csv and stops with an error message if the file is not found. Rows whose post_text is missing or blank are removed, and the label column is renamed to labels, the column name the Trainer expects. The data is then split into training (90%) and evaluation (10%) sets, stratified by label, and 10,000 training samples and 2,000 evaluation samples are randomly drawn to reduce computation time. Because this subsampling is a plain random draw, the class proportions are preserved only approximately, so the label distribution of both sets is printed as a check. Next, the training and evaluation data are converted into Hugging Face Dataset objects, and the tokenizer of the pre-trained RoBERTa checkpoint margotwagner/roberta-psychotherapy-eval is loaded; the model itself is loaded later. The tokenize_function converts each text into token IDs and an attention mask, truncating sequences to at most 128 tokens and padding shorter ones to the longest sequence in each batch. Finally, the tokenized datasets are formatted as PyTorch tensors, keeping only the columns needed for training: input_ids, attention_mask, and labels.
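The effect of `truncation=True, padding=True, max_length=128` can be sketched on toy data. The function below is a simplified illustration, not the tokenizer's implementation: it works on lists of token IDs that are assumed to be already tokenized, uses `pad_id=1` as an assumption (the usual RoBERTa padding ID), and truncates by plain slicing, whereas a real tokenizer also keeps the closing special token.

```python
def pad_and_truncate(batch, max_length, pad_id=1):
    """Truncate each sequence to max_length, then pad to the longest in the batch."""
    truncated = [ids[:max_length] for ids in batch]
    target = max(len(ids) for ids in truncated)  # longest sequence in this batch
    input_ids, attention_mask = [], []
    for ids in truncated:
        n_pad = target - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        # 1 marks real tokens, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


toy_batch = [[0, 11, 12, 2], [0, 21, 22, 23, 24, 25, 2]]
out = pad_and_truncate(toy_batch, max_length=6)
print(out["input_ids"])
print(out["attention_mask"])
```

Note that the padded length is the longest sequence in the batch, capped at `max_length`, rather than always `max_length`.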

-----
## **Model and Metrics**
```
def model_init():
    return AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2).to(device)

def compute_metrics(p):
    preds = np.argmax(p.predictions, axis=1)
    acc = accuracy_score(p.label_ids, preds)
    f1 = f1_score(p.label_ids, preds, average="binary")
    precision = precision_score(p.label_ids, preds, average="binary")
    recall = recall_score(p.label_ids, preds, average="binary")
    return {"accuracy": acc, "f1": f1, "precision": precision, "recall": recall}

# --- 3b. HYPERPARAMETER SPACE INCLUDING GRADIENT ACCUMULATION ---
def tune_hp(trial):
    learning_rate = trial.suggest_categorical("learning_rate", [2e-5, 3e-5])
    per_device_train_batch_size = trial.suggest_categorical("per_device_train_batch_size", [8, 16])
    weight_decay = trial.suggest_categorical("weight_decay", [0.01, 0.03])
    num_train_epochs = trial.suggest_categorical("num_train_epochs", [2, 3])
    gradient_accumulation_steps = trial.suggest_categorical("gradient_accumulation_steps", [1, 2, 4])
    return {
        "learning_rate": learning_rate,
        "per_device_train_batch_size": per_device_train_batch_size,
        "weight_decay": weight_decay,
        "num_train_epochs": num_train_epochs,
        "gradient_accumulation_steps": gradient_accumulation_steps
    }
```
This code defines the model, the evaluation metrics, and the hyperparameter search space. The model_init() function loads the pre-trained RoBERTa checkpoint for sequence classification with two output labels (depression or not) and moves it to the selected device (GPU or CPU); passing a function rather than a model lets the Trainer create a fresh model for every trial. The compute_metrics() function takes the class with the highest logit as the prediction and compares it with the true labels to compute accuracy, F1-score, precision, and recall, with label 1 treated as the positive class. The tune_hp() function defines the search space: two or three candidate values each for learning rate, per-device batch size, weight decay, number of training epochs, and gradient accumulation steps. Each trial draws one combination from this space and trains with it to find the settings that yield the best performance.
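As a worked example of how the four metrics relate, the sketch below computes them from confusion-matrix counts. The counts are not printed by the notebook; they are inferred, as an assumption, from the best trial's final logged row (accuracy 0.9105, precision 0.907646, recall 0.914) together with an evaluation set of 2,000 samples split evenly between the two labels. The function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute binary classification metrics with label 1 as the positive class."""
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "f1": f1, "precision": precision, "recall": recall}


# Inferred counts (assumption): 1,000 positives and 1,000 negatives in the evaluation set.
m = binary_metrics(tp=914, fp=93, fn=86, tn=907)
for name, value in m.items():
    print(f"{name}: {value:.6f}")
```

F1 is the harmonic mean of precision and recall, so it drops quickly when either one is low, which makes it a stricter optimization target than accuracy.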

-----
## **Training Arguments**
```
training_args = TrainingArguments(
    output_dir="./grid_search_results_mental_health",
    eval_strategy="epoch",
    save_strategy="epoch",
    save_total_limit=1,
    load_best_model_at_end=True,
    metric_for_best_model="f1",
    fp16=torch.cuda.is_available(),
    report_to="none",
    num_train_epochs=3,
    warmup_steps=100,
    logging_dir="./logs",
    logging_steps=200,
    dataloader_num_workers=os.cpu_count() // 2 if os.cpu_count() else 0,
    adam_epsilon=1e-7,
)

trainer = Trainer(
    model_init=model_init,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
)

def optuna_hp_objective(metrics):
    return metrics["eval_f1"]
```
This code sets up the training configuration and the trainer. TrainingArguments specifies how the model is trained and evaluated. Checkpoints and logs are written to the designated folders, and the model is evaluated and saved at the end of each epoch. With save_total_limit=1 and load_best_model_at_end=True, the checkpoint with the best F1-score is retained (alongside the most recent one if they differ) and reloaded when training finishes. Mixed-precision training (fp16) is enabled if a GPU is available to speed up computation. Other settings include the number of training epochs, the number of warmup steps for the learning-rate schedule, the logging frequency, the number of CPU workers for loading data, and the Adam optimizer's epsilon value. During the search, the values proposed by tune_hp() override the corresponding fields, such as num_train_epochs, for each trial. The Trainer class from Hugging Face is then initialized with model_init, the training arguments, the tokenized datasets, the metrics function, and the tokenizer. Finally, optuna_hp_objective() returns the evaluation F1-score, which is the value the hyperparameter search maximizes.
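The interaction between batch size, gradient accumulation, and `logging_steps=200` can be estimated with a little arithmetic. The sketch below assumes a single device and 10,000 training samples; the helper name is hypothetical, and the exact rounding the Trainer applies may differ between transformers versions.

```python
import math


def optimizer_steps_per_epoch(n_samples, per_device_batch_size, grad_accum_steps):
    """Approximate number of optimizer updates per epoch on one device."""
    batches = math.ceil(n_samples / per_device_batch_size)
    return math.ceil(batches / grad_accum_steps)


LOGGING_STEPS = 200  # same value as logging_steps in TrainingArguments
for batch_size, accum in [(8, 1), (16, 1), (16, 4)]:
    steps = optimizer_steps_per_epoch(10_000, batch_size, accum)
    effective = batch_size * accum  # samples contributing to each update
    logged = steps >= LOGGING_STEPS
    print(f"batch={batch_size} accum={accum} effective_batch={effective} "
          f"steps_per_epoch={steps} loss_logged_in_epoch_1={logged}")
```

When an epoch has fewer than 200 optimizer steps, no training loss has been logged by the first evaluation, which is consistent with the `No log` entries in the recorded output for the trial with batch size 16 and 4 accumulation steps. In such runs the fixed 100 warmup steps also take up a large share of training.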

-----
## **Execution of Hyperparameter Search**
```
print("\n--- Starting Hyperparameter Search (using Optuna backend) ---")
print(f"Optimizing for '{training_args.metric_for_best_model}' score...")

NUM_TRIALS = 20
print(f"Number of trials to run: {NUM_TRIALS}")

best_trial = trainer.hyperparameter_search(
    backend="optuna",
    hp_space=tune_hp,
    direction="maximize",
    n_trials=NUM_TRIALS,
    compute_objective=optuna_hp_objective,
)

print("\n--- Hyperparameter Search Complete ---")
print("\nBEST HYPERPARAMETERS FOUND:")

if best_trial:
    print(best_trial)
    best_hps = best_trial.hyperparameters
    print("\nBest Hyperparameters:")
    for key, value in best_hps.items():
        print(f"  {key}: {value}")
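    # BestRun has no `metrics` attribute (accessing it raises AttributeError);
    # the best score is stored in `objective`.
    print(f"\nBest objective (eval F1): {best_trial.objective}")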
else:
    print("Search failed or no best trial found.")

print("\n--- Final Step: Train a model with the best hyperparameters ---")
if best_trial:
    final_training_args = TrainingArguments(
        output_dir="./final_model_mental_health",
        eval_strategy="epoch",
        save_strategy="epoch",
        save_total_limit=1,
        load_best_model_at_end=True,
        metric_for_best_model="f1",
        fp16=torch.cuda.is_available(),
        report_to="none",
        num_train_epochs=best_hps["num_train_epochs"],
        per_device_train_batch_size=best_hps["per_device_train_batch_size"],
        learning_rate=best_hps["learning_rate"],
        weight_decay=best_hps["weight_decay"],
        gradient_accumulation_steps=best_hps["gradient_accumulation_steps"],   
        warmup_steps=100,
        logging_dir="./final_logs",
        logging_steps=200,
        dataloader_num_workers=os.cpu_count() // 2 if os.cpu_count() else 0,
        adam_epsilon=1e-7,
    )

    final_trainer = Trainer(
        model_init=model_init,
        args=final_training_args,
        train_dataset=tokenized_train,
        eval_dataset=tokenized_eval,
        compute_metrics=compute_metrics,
        tokenizer=tokenizer,
    )

    print("\nTraining final model with best hyperparameters...")
    final_trainer.train()

    print("\nFinal model training complete. Best model saved to './final_model_mental_health'.")
    metrics = final_trainer.evaluate()
    print(f"Evaluation metrics of the final model: {metrics}")
else:
    print("No best hyperparameters found, skipping final model training.")
```
This code runs the hyperparameter search with the Optuna backend and then trains the final model with the best parameters found. It first announces the start of the search and states that the F1-score is the metric being optimized. This is a sampled search rather than an exhaustive grid search: 20 trials are run, and each trial trains with one combination of hyperparameters that Optuna draws from the space defined in tune_hp(). Optuna may also stop unpromising trials early, which appear as pruned in the output. The trainer.hyperparameter_search() method runs the trials and returns the best run, whose hyperparameters and score are printed. The code then builds new TrainingArguments from these values (learning rate, batch size, number of epochs, weight decay, and gradient accumulation steps), initializes a new Trainer, and trains the final model on the tokenized training set. After training, the model is evaluated on the evaluation set and its metrics are displayed. Checkpoints of the final model are written to ./final_model_mental_health for future use.
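To put the trial budget in perspective, the following sketch enumerates the full grid implied by the candidate values in tune_hp() using only the standard library. It is an illustration of the size of the space, not how Optuna chooses trials.

```python
from itertools import product

# Candidate values copied from tune_hp().
search_space = {
    "learning_rate": [2e-5, 3e-5],
    "per_device_train_batch_size": [8, 16],
    "weight_decay": [0.01, 0.03],
    "num_train_epochs": [2, 3],
    "gradient_accumulation_steps": [1, 2, 4],
}

# Every combination of one value per hyperparameter.
grid = [dict(zip(search_space, values)) for values in product(*search_space.values())]

NUM_TRIALS = 20
print(f"Grid size: {len(grid)} combinations")
print(f"Trial budget covers at most {NUM_TRIALS / len(grid):.0%} of the grid")
```

Since trials are sampled rather than enumerated, the sampler can also revisit a combination it has already tried; in the recorded output, trials 16 to 19 repeat the configuration of trial 7, so the number of distinct combinations explored is smaller than 20.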

In [None]:
# 1. SETUP AND INSTALLATION
# Run this command first in your Colab notebook:
# !pip install transformers datasets accelerate ray[tune] optuna pandas -U

import torch
import os
import numpy as np
import pandas as pd
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
    set_seed
)
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
import optuna

# Set a consistent seed for reproducibility across runs
set_seed(42)

# Ensure GPU is available
if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    device = torch.device("cpu")
    print("Using CPU. For faster training, consider enabling a GPU runtime.")

# --- 2. DATA PREPARATION ---
try:
    df = pd.read_csv("Mental-Health-Twitter.csv")
    print("Dataset loaded successfully.")
except FileNotFoundError:
    print("Error: 'Mental-Health-Twitter.csv' not found. Please upload it to your Colab environment.")
    exit()

df = df.dropna(subset=['post_text'])
df = df[df['post_text'].str.strip() != '']
df = df.rename(columns={"label": "labels"})

train_df, eval_df = train_test_split(df, test_size=0.1, stratify=df['labels'], random_state=42)
train_df = train_df.sample(n=10000, random_state=42)
eval_df = eval_df.sample(n=2000, random_state=42)

print(f"Using {len(train_df)} training samples and {len(eval_df)} evaluation samples.")
print(f"Train label distribution:\n{train_df['labels'].value_counts(normalize=True)}")
print(f"Eval label distribution:\n{eval_df['labels'].value_counts(normalize=True)}")

train_dataset = Dataset.from_pandas(train_df[['post_text', 'labels']])
eval_dataset = Dataset.from_pandas(eval_df[['post_text', 'labels']])

MODEL_NAME = "margotwagner/roberta-psychotherapy-eval"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize_function(examples):
    return tokenizer(examples["post_text"], truncation=True, padding=True, max_length=128)

tokenized_train = train_dataset.map(tokenize_function, batched=True, remove_columns=["post_text", "__index_level_0__"])
tokenized_eval = eval_dataset.map(tokenize_function, batched=True, remove_columns=["post_text", "__index_level_0__"])

tokenized_train.set_format("torch", columns=['input_ids', 'attention_mask', 'labels'])
tokenized_eval.set_format("torch", columns=['input_ids', 'attention_mask', 'labels'])

# --- 3. MODEL AND METRICS ---
def model_init():
    return AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2).to(device)

def compute_metrics(p):
    preds = np.argmax(p.predictions, axis=1)
    acc = accuracy_score(p.label_ids, preds)
    f1 = f1_score(p.label_ids, preds, average="binary")
    precision = precision_score(p.label_ids, preds, average="binary")
    recall = recall_score(p.label_ids, preds, average="binary")
    return {"accuracy": acc, "f1": f1, "precision": precision, "recall": recall}

# --- 3b. HYPERPARAMETER SPACE INCLUDING GRADIENT ACCUMULATION ---
def tune_hp(trial):
    learning_rate = trial.suggest_categorical("learning_rate", [2e-5, 3e-5])
    per_device_train_batch_size = trial.suggest_categorical("per_device_train_batch_size", [8, 16])
    weight_decay = trial.suggest_categorical("weight_decay", [0.01, 0.03])
    num_train_epochs = trial.suggest_categorical("num_train_epochs", [2, 3])
    gradient_accumulation_steps = trial.suggest_categorical("gradient_accumulation_steps", [1, 2, 4])
    return {
        "learning_rate": learning_rate,
        "per_device_train_batch_size": per_device_train_batch_size,
        "weight_decay": weight_decay,
        "num_train_epochs": num_train_epochs,
        "gradient_accumulation_steps": gradient_accumulation_steps
    }

# --- 4. TRAINING ARGUMENTS ---
training_args = TrainingArguments(
    output_dir="./grid_search_results_mental_health",
    eval_strategy="epoch",
    save_strategy="epoch",
    save_total_limit=1,
    load_best_model_at_end=True,
    metric_for_best_model="f1",
    fp16=torch.cuda.is_available(),
    report_to="none",
    num_train_epochs=3,
    warmup_steps=100,
    logging_dir="./logs",
    logging_steps=200,
    dataloader_num_workers=os.cpu_count() // 2 if os.cpu_count() else 0,
    adam_epsilon=1e-7,
)

trainer = Trainer(
    model_init=model_init,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
)

def optuna_hp_objective(metrics):
    return metrics["eval_f1"]

# --- 5. EXECUTION OF HYPERPARAMETER SEARCH ---
print("\n--- Starting Hyperparameter Search (using Optuna backend) ---")
print(f"Optimizing for '{training_args.metric_for_best_model}' score...")

NUM_TRIALS = 20
print(f"Number of trials to run: {NUM_TRIALS}")

best_trial = trainer.hyperparameter_search(
    backend="optuna",
    hp_space=tune_hp,
    direction="maximize",
    n_trials=NUM_TRIALS,
    compute_objective=optuna_hp_objective,
)

print("\n--- Hyperparameter Search Complete ---")
print("\nBEST HYPERPARAMETERS FOUND:")

if best_trial:
    print(best_trial)
    best_hps = best_trial.hyperparameters
    print("\nBest Hyperparameters:")
    for key, value in best_hps.items():
        print(f"  {key}: {value}")
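    # BestRun has no `metrics` attribute (accessing it raises AttributeError);
    # the best score is stored in `objective`.
    print(f"\nBest objective (eval F1): {best_trial.objective}")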
else:
    print("Search failed or no best trial found.")

print("\n--- Final Step: Train a model with the best hyperparameters ---")
if best_trial:
    final_training_args = TrainingArguments(
        output_dir="./final_model_mental_health",
        eval_strategy="epoch",
        save_strategy="epoch",
        save_total_limit=1,
        load_best_model_at_end=True,
        metric_for_best_model="f1",
        fp16=torch.cuda.is_available(),
        report_to="none",
        num_train_epochs=best_hps["num_train_epochs"],
        per_device_train_batch_size=best_hps["per_device_train_batch_size"],
        learning_rate=best_hps["learning_rate"],
        weight_decay=best_hps["weight_decay"],
        gradient_accumulation_steps=best_hps["gradient_accumulation_steps"],
        warmup_steps=100,
        logging_dir="./final_logs",
        logging_steps=200,
        dataloader_num_workers=os.cpu_count() // 2 if os.cpu_count() else 0,
        adam_epsilon=1e-7,
    )

    final_trainer = Trainer(
        model_init=model_init,
        args=final_training_args,
        train_dataset=tokenized_train,
        eval_dataset=tokenized_eval,
        compute_metrics=compute_metrics,
        tokenizer=tokenizer,
    )

    print("\nTraining final model with best hyperparameters...")
    final_trainer.train()

    print("\nFinal model training complete. Best model saved to './final_model_mental_health'.")
    metrics = final_trainer.evaluate()
    print(f"Evaluation metrics of the final model: {metrics}")
else:
    print("No best hyperparameters found, skipping final model training.")


Using GPU: Tesla T4
Dataset loaded successfully.
Using 10000 training samples and 2000 evaluation samples.
Train label distribution:
labels
1    0.5004
0    0.4996
Name: proportion, dtype: float64
Eval label distribution:
labels
1    0.5
0    0.5
Name: proportion, dtype: float64


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


[I 2025-11-08 09:27:45,170] A new study created in memory with name: no-name-8ac254d7-9a6f-4634-93ad-fa613a0eceb2



--- Starting Hyperparameter Search (using Optuna backend) ---
Optimizing for 'f1' score...
Number of trials to run: 20


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.3286,0.294511,0.8945,0.898606,0.86494,0.935
2,0.2401,0.436722,0.9005,0.902594,0.883988,0.922
3,0.2018,0.46618,0.8985,0.898955,0.894945,0.903


[I 2025-11-08 09:34:33,357] Trial 0 finished with value: 0.8989547038327527 and parameters: {'learning_rate': 2e-05, 'per_device_train_batch_size': 8, 'weight_decay': 0.01, 'num_train_epochs': 3, 'gradient_accumulation_steps': 1}. Best is trial 0 with value: 0.8989547038327527.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.344195,0.8465,0.846115,0.848241,0.844
2,0.603400,0.322092,0.8645,0.87366,0.818341,0.937
3,0.259400,0.270034,0.8875,0.887781,0.885572,0.89


[I 2025-11-08 09:39:01,679] Trial 1 finished with value: 0.8877805486284289 and parameters: {'learning_rate': 2e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.03, 'num_train_epochs': 3, 'gradient_accumulation_steps': 4}. Best is trial 0 with value: 0.8989547038327527.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.3271,0.275122,0.883,0.882648,0.885312,0.88
2,0.2139,0.29434,0.886,0.885081,0.892276,0.878


[I 2025-11-08 09:42:59,875] Trial 2 finished with value: 0.8850806451612904 and parameters: {'learning_rate': 2e-05, 'per_device_train_batch_size': 8, 'weight_decay': 0.01, 'num_train_epochs': 2, 'gradient_accumulation_steps': 2}. Best is trial 0 with value: 0.8989547038327527.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.3298,0.276888,0.8785,0.88129,0.861509,0.902
2,0.2179,0.323656,0.893,0.896318,0.869361,0.925
3,0.1582,0.358256,0.9,0.900794,0.893701,0.908


[I 2025-11-08 09:49:06,338] Trial 3 finished with value: 0.9007936507936508 and parameters: {'learning_rate': 2e-05, 'per_device_train_batch_size': 8, 'weight_decay': 0.03, 'num_train_epochs': 3, 'gradient_accumulation_steps': 2}. Best is trial 3 with value: 0.9007936507936508.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.6619,0.311163,0.852,0.843552,0.894619,0.798
2,0.2493,0.27077,0.879,0.880079,0.872299,0.888


[I 2025-11-08 09:53:19,122] Trial 4 finished with value: 0.8800792864222002 and parameters: {'learning_rate': 2e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.01, 'num_train_epochs': 2, 'gradient_accumulation_steps': 2}. Best is trial 3 with value: 0.9007936507936508.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.350986,0.8465,0.847794,0.840708,0.855
2,0.603700,0.289114,0.8725,0.871407,0.878942,0.864


[I 2025-11-08 09:56:43,762] Trial 5 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.6585,0.327029,0.843,0.831364,0.897912,0.774


[I 2025-11-08 09:57:54,738] Trial 6 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.3171,0.261968,0.883,0.888889,0.846293,0.936
2,0.206,0.323081,0.8935,0.899195,0.853549,0.95
3,0.1398,0.34566,0.9105,0.910812,0.907646,0.914


[I 2025-11-08 10:04:55,851] Trial 7 finished with value: 0.9108121574489287 and parameters: {'learning_rate': 3e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.01, 'num_train_epochs': 3, 'gradient_accumulation_steps': 1}. Best is trial 7 with value: 0.9108121574489287.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.3097,0.278825,0.879,0.878024,0.885163,0.871


[I 2025-11-08 10:06:16,768] Trial 8 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.6195,0.303157,0.8625,0.854574,0.906846,0.808
2,0.216,0.268782,0.891,0.890782,0.89257,0.889


[I 2025-11-08 10:11:01,468] Trial 9 finished with value: 0.8907815631262525 and parameters: {'learning_rate': 3e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.03, 'num_train_epochs': 2, 'gradient_accumulation_steps': 2}. Best is trial 7 with value: 0.9108121574489287.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.3689,0.377796,0.8765,0.88517,0.827107,0.952


[I 2025-11-08 10:12:53,261] Trial 10 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.3251,0.310584,0.8765,0.882213,0.843209,0.925


[I 2025-11-08 10:14:45,377] Trial 11 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.3197,0.276305,0.879,0.884872,0.84392,0.93
2,0.2058,0.342022,0.894,0.898175,0.86414,0.935
3,0.1564,0.415188,0.902,0.902778,0.895669,0.91


[I 2025-11-08 10:22:12,659] Trial 12 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.3689,0.377796,0.8765,0.88517,0.827107,0.952


[I 2025-11-08 10:24:04,934] Trial 13 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.344155,0.843,0.840609,0.853608,0.828


[I 2025-11-08 10:25:12,831] Trial 14 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.3197,0.276305,0.879,0.884872,0.84392,0.93
2,0.2058,0.342022,0.894,0.898175,0.86414,0.935
3,0.1564,0.415188,0.902,0.902778,0.895669,0.91


[I 2025-11-08 10:31:46,166] Trial 15 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.3171,0.261968,0.883,0.888889,0.846293,0.936
2,0.206,0.323081,0.8935,0.899195,0.853549,0.95
3,0.1398,0.34566,0.9105,0.910812,0.907646,0.914


[I 2025-11-08 10:38:30,718] Trial 16 finished with value: 0.9108121574489287 and parameters: {'learning_rate': 3e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.01, 'num_train_epochs': 3, 'gradient_accumulation_steps': 1}. Best is trial 7 with value: 0.9108121574489287.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.3171,0.261968,0.883,0.888889,0.846293,0.936
2,0.206,0.323081,0.8935,0.899195,0.853549,0.95
3,0.1398,0.34566,0.9105,0.910812,0.907646,0.914


[I 2025-11-08 10:45:38,308] Trial 17 finished with value: 0.9108121574489287 and parameters: {'learning_rate': 3e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.01, 'num_train_epochs': 3, 'gradient_accumulation_steps': 1}. Best is trial 7 with value: 0.9108121574489287.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.3171,0.261968,0.883,0.888889,0.846293,0.936
2,0.206,0.323081,0.8935,0.899195,0.853549,0.95
3,0.1398,0.34566,0.9105,0.910812,0.907646,0.914


[I 2025-11-08 10:51:56,270] Trial 18 finished with value: 0.9108121574489287 and parameters: {'learning_rate': 3e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.01, 'num_train_epochs': 3, 'gradient_accumulation_steps': 1}. Best is trial 7 with value: 0.9108121574489287.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.3171,0.261968,0.883,0.888889,0.846293,0.936
2,0.206,0.323081,0.8935,0.899195,0.853549,0.95
3,0.1398,0.34566,0.9105,0.910812,0.907646,0.914


[I 2025-11-08 11:00:04,283] Trial 19 finished with value: 0.9108121574489287 and parameters: {'learning_rate': 3e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.01, 'num_train_epochs': 3, 'gradient_accumulation_steps': 1}. Best is trial 7 with value: 0.9108121574489287.



--- Hyperparameter Search Complete ---

BEST HYPERPARAMETERS FOUND:
BestRun(run_id='7', objective=0.9108121574489287, hyperparameters={'learning_rate': 3e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.01, 'num_train_epochs': 3, 'gradient_accumulation_steps': 1}, run_summary=None)

Best Hyperparameters:
  learning_rate: 3e-05
  per_device_train_batch_size: 16
  weight_decay: 0.01
  num_train_epochs: 3
  gradient_accumulation_steps: 1


AttributeError: 'BestRun' object has no attribute 'metrics'
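The AttributeError above arises because the object returned by `trainer.hyperparameter_search()` carries only the fields shown in the printed `BestRun(...)` line: `run_id`, `objective`, `hyperparameters`, and `run_summary`. The sketch below uses a hypothetical stand-in class, `BestRunSketch`, that mirrors those fields so the example runs without transformers; it is not the library's own class.

```python
from typing import Any, NamedTuple, Optional


class BestRunSketch(NamedTuple):
    """Hypothetical stand-in with the same fields as the printed BestRun."""
    run_id: str
    objective: float
    hyperparameters: dict
    run_summary: Optional[Any] = None


# Values copied from the recorded output above.
best = BestRunSketch(
    run_id="7",
    objective=0.9108121574489287,
    hyperparameters={
        "learning_rate": 3e-05,
        "per_device_train_batch_size": 16,
        "weight_decay": 0.01,
        "num_train_epochs": 3,
        "gradient_accumulation_steps": 1,
    },
)

print(hasattr(best, "metrics"))  # there is no such field
print(f"Best eval F1: {best.objective:.4f}")
```

Reading `objective` instead of `metrics` lets the script continue to the final training step, which rebuilds TrainingArguments from `hyperparameters`.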