# **INSTALLATION**

This cell installs the Python libraries used for natural language processing (NLP), model training, and hyperparameter tuning: Transformers for pre-trained models, Datasets for data handling, Accelerate for device placement and mixed-precision training, and Ray Tune and Optuna for automated hyperparameter optimization. The -U flag upgrades any of these packages that are already installed. The result is an environment ready for fine-tuning and evaluating transformer-based models such as RoBERTa.

In [None]:
!pip install transformers datasets accelerate "ray[tune]" optuna -U

Collecting datasets
  Downloading datasets-4.4.1-py3-none-any.whl.metadata (19 kB)
Collecting optuna
  Downloading optuna-4.5.0-py3-none-any.whl.metadata (17 kB)
Collecting ray[tune]
  Downloading ray-2.51.1-cp312-cp312-manylinux2014_x86_64.whl.metadata (21 kB)
Collecting pyarrow>=21.0.0 (from datasets)
  Downloading pyarrow-22.0.0-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (3.2 kB)
Collecting click!=8.3.0,>=7.0 (from ray[tune])
  Downloading click-8.2.1-py3-none-any.whl.metadata (2.5 kB)
Collecting tensorboardX>=1.9 (from ray[tune])
  Downloading tensorboardx-2.6.4-py3-none-any.whl.metadata (6.2 kB)
Collecting colorlog (from optuna)
  Downloading colorlog-6.10.1-py3-none-any.whl.metadata (11 kB)
Downloading datasets-4.4.1-py3-none-any.whl (511 kB)
Downloading optuna-4.5.0-py3-none-any.whl (400 kB)

# **IMPORT**

This section sets up the working environment for fine-tuning the RoBERTa model on Google Colab. It imports the required libraries (the install command from the previous section is repeated as a comment): Transformers for model handling, Datasets for data processing, PyTorch for computation, pandas and NumPy for data manipulation, and scikit-learn for data splitting and evaluation metrics. A fixed random seed is set for reproducibility across runs. The code also checks for GPU availability, since a GPU trains the model much faster than a CPU.

In [None]:
# 1. SETUP AND INSTALLATION
# Run this command first in your Colab notebook:
# !pip install transformers datasets accelerate ray[tune] optuna pandas -U

import torch
import os
import numpy as np
import pandas as pd
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification, # Use AutoModel for RoBERTa
    AutoTokenizer,                     # Use AutoTokenizer for RoBERTa
    TrainingArguments,
    Trainer,
    set_seed
)
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Set a consistent seed for reproducibility across runs
set_seed(42)

# Ensure GPU is available
if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    device = torch.device("cpu")
    print("Using CPU. For faster training, consider enabling a GPU runtime.")
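Because the install command uses -U, the package versions can change between sessions. The following is a minimal sketch, using only the standard library, of how the installed versions could be recorded at the start of a run. The helper name `installed_versions` and the `REQUIRED` list are illustrative and not part of the original notebook.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names taken from the pip command used in this notebook.
REQUIRED = ["transformers", "datasets", "accelerate", "ray", "optuna"]

for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Logging these values next to the results makes it easier to tell whether a change in metrics comes from the hyperparameters or from a library upgrade.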

# **DATA PREPARATION**

This code prepares the Mental-Health-Twitter dataset for training a RoBERTa-based classifier. Its purpose is to clean, split, and tokenize the data so that it is compatible with the Hugging Face Trainer. The input is a CSV file containing tweets and their depression labels; rows with missing or empty text are removed. The data is then split into training and evaluation sets, and both sets are subsampled to limit GPU memory use and speed up experimentation. Hugging Face Dataset objects are created from the pandas DataFrames, and a tokenizer converts the text into token IDs and attention masks, truncated to at most 128 tokens. The output is a pair of tokenized, PyTorch-ready datasets with the columns input_ids, attention_mask, and labels, ready for training and evaluation.

In [None]:
# --- 2. DATA PREPARATION (Using your Mental-Health-Twitter.csv) ---

# Upload 'Mental-Health-Twitter.csv' to your Colab environment
# Example: from google.colab import files
#          files.upload() # Then select your file

# Load your dataset
try:
    df = pd.read_csv("Mental-Health-Twitter.csv")
    print("Dataset loaded successfully.")
except FileNotFoundError:
    print("Error: 'Mental-Health-Twitter.csv' not found. Please upload it to your Colab environment.")
    raise  # Stop the cell here; exit() can shut down the whole notebook kernel

# Filter out rows where 'post_text' is NaN or empty
df = df.dropna(subset=['post_text'])
df = df[df['post_text'].str.strip() != '']

# Map labels if necessary (ensure 0 and 1 are the actual labels or adjust)
# Assuming 'label' column already contains 0 for 'no depression' and 1 for 'depression'
# If your labels are different (e.g., 'negative', 'positive'), you'll need to map them:
# df['label'] = df['label'].map({'no_depression_text': 0, 'depression_text': 1})

# Rename 'label' to 'labels' for Hugging Face Trainer compatibility
df = df.rename(columns={"label": "labels"})

# Split data into training and validation sets (10% held out, stratified by label),
# then subsample to keep each trial fast: 10,000 rows for training and 2,000 for
# evaluation, enough to get a decent signal.
# You can adjust these numbers based on initial run times.
train_df, eval_df = train_test_split(df, test_size=0.1, stratify=df['labels'], random_state=42) # 10% for evaluation
train_df = train_df.sample(n=10000, random_state=42) # Limit to 10k training samples
eval_df = eval_df.sample(n=2000, random_state=42)   # Limit to 2k evaluation samples

print(f"Using {len(train_df)} training samples and {len(eval_df)} evaluation samples.")
print(f"Train label distribution:\n{train_df['labels'].value_counts(normalize=True)}")
print(f"Eval label distribution:\n{eval_df['labels'].value_counts(normalize=True)}")

# Convert pandas DataFrames to Hugging Face Dataset objects
train_dataset = Dataset.from_pandas(train_df[['post_text', 'labels']])
eval_dataset = Dataset.from_pandas(eval_df[['post_text', 'labels']])

# Initialize Tokenizer for your specific model
MODEL_NAME = "margotwagner/roberta-psychotherapy-eval"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize_function(examples):
    # Ensure 'post_text' is correctly accessed and adjust max_length if needed
    return tokenizer(examples["post_text"], truncation=True, padding=True, max_length=128)

# Apply tokenization
tokenized_train = train_dataset.map(tokenize_function, batched=True, remove_columns=["post_text", "__index_level_0__"])
tokenized_eval = eval_dataset.map(tokenize_function, batched=True, remove_columns=["post_text", "__index_level_0__"])

# Set format to PyTorch tensors
tokenized_train.set_format("torch", columns=['input_ids', 'attention_mask', 'labels'])
tokenized_eval.set_format("torch", columns=['input_ids', 'attention_mask', 'labels'])
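In the cell above, the two `.sample(n=...)` calls draw rows without regard to label, so the stratification from train_test_split is only approximately preserved after subsampling (the run log reports 0.5004 / 0.4996 for the training set). The following is a minimal sketch, in plain Python, of sampling per label so that the class shares are kept. The function `stratified_sample` and the toy rows are hypothetical; in pandas, a per-group draw such as `df.groupby("labels").sample(...)` serves a similar purpose.

```python
import random
from collections import defaultdict


def stratified_sample(rows, label_key, n_total, seed=42):
    """Draw about n_total rows while keeping each label's share of the data."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    sampled = []
    for label, group in sorted(by_label.items()):
        # Allocate rows to this label in proportion to its frequency.
        k = round(n_total * len(group) / len(rows))
        sampled.extend(rng.sample(group, min(k, len(group))))
    rng.shuffle(sampled)
    return sampled


# Toy data (made up for illustration): 60 rows with label 0, 40 rows with label 1.
toy = [{"post_text": f"tweet {i}", "labels": 0 if i < 60 else 1} for i in range(100)]
subset = stratified_sample(toy, "labels", n_total=10)
counts = {label: sum(r["labels"] == label for r in subset) for label in (0, 1)}
print(counts)  # {0: 6, 1: 4}
```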

# **MODEL, METRICS, AND HYPERPARAMETER DEFINITION**

This code sets up the RoBERTa model, the evaluation metrics, and the hyperparameter search space. The model_init function returns a freshly initialized model for every trial, so each trial trains independently and the configurations are compared fairly. The compute_metrics function calculates accuracy, F1 score, precision, and recall from the model predictions; these are the metrics used to select the best configuration. The tune_hp function defines a small, discrete search space over learning rate, batch size, weight decay, and number of training epochs, kept small so that many trials fit within a limited time budget. For each trial it asks Optuna for one value per hyperparameter and returns them as a dictionary; Optuna's sampler, not an exhaustive grid, decides which combinations are tried.
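To make the four metrics concrete, here is a small sketch that computes them from confusion-matrix counts instead of calling scikit-learn. The function name `binary_metrics` is illustrative. The counts are an assumption: they were reconstructed from the epoch-3 row of Trial 4 in the search log further down, given that the evaluation set holds 1,000 positive and 1,000 negative samples.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "f1": f1, "precision": precision, "recall": recall}


# Reconstructed counts (assumption, see the lead-in): 911 true positives,
# 84 false positives, 89 false negatives, 916 true negatives.
m = binary_metrics(tp=911, fp=84, fn=89, tn=916)
print({k: round(v, 6) for k, v in m.items()})
# {'accuracy': 0.9135, 'f1': 0.913283, 'precision': 0.915578, 'recall': 0.911}
```

F1 is the harmonic mean of precision and recall, so it drops when either one is low; that is why it is a stricter selection criterion than accuracy alone.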

In [None]:
# --- 3. MODEL, METRICS, AND HYPERPARAMETER DEFINITION ---

# Function to initialize a fresh model for each search trial
def model_init():
    # Model must be re-initialized for every trial to ensure independence
    return AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2).to(device)

def compute_metrics(p):
    preds = np.argmax(p.predictions, axis=1)
    acc = accuracy_score(p.label_ids, preds)
    f1 = f1_score(p.label_ids, preds, average="binary") # 'binary' for 0/1 labels
    precision = precision_score(p.label_ids, preds, average="binary")
    recall = recall_score(p.label_ids, preds, average="binary")
    return {"accuracy": acc, "f1": f1, "precision": precision, "recall": recall}

# --- HYPERPARAMETER SEARCH SPACE DEFINITION ---
def tune_hp(trial):
    """
    Define the hyperparameter space to be explored.
    The space is kept small so that more than 10 trials fit within the time limit.
    """
    # Reduced search space for quicker convergence and more trials
    learning_rate = trial.suggest_categorical("learning_rate", [1e-5, 2e-5, 3e-5]) # 3 options
    per_device_train_batch_size = trial.suggest_categorical("per_device_train_batch_size", [4, 16, 32]) # 3 options
    weight_decay = trial.suggest_float("weight_decay", 0.01, 0.03, step=0.02) # 2 options: 0.01, 0.03
    num_train_epochs = trial.suggest_categorical("num_train_epochs", [1, 2, 3]) # 3 options

    # Size of this space: 3 * 3 * 2 * 3 = 54 combinations.
    # Optuna samples from the space, so the number of trials run is set separately via n_trials.

    return {
        "learning_rate": learning_rate,
        "per_device_train_batch_size": per_device_train_batch_size,
        "weight_decay": weight_decay,
        "num_train_epochs": num_train_epochs,
    }
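The size of the space defined by tune_hp can be checked by enumerating it. The sketch below mirrors the values from the function above; the variable names are illustrative, and the two weight-decay values follow from the range 0.01 to 0.03 with a step of 0.02.

```python
from itertools import product

# Values mirrored from tune_hp in this notebook.
learning_rates = [1e-5, 2e-5, 3e-5]
batch_sizes = [4, 16, 32]
weight_decays = [0.01, 0.03]  # suggest_float(0.01, 0.03, step=0.02) allows these two
epoch_counts = [1, 2, 3]

# Every distinct configuration the sampler could propose.
grid = list(product(learning_rates, batch_sizes, weight_decays, epoch_counts))
print(len(grid))  # 54
```

A trial budget smaller than 54 therefore cannot cover every combination, even if no configuration were repeated.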

# **TRAINING ARGUMENTS**

This code configures the training process and sets the arguments that stay fixed across all hyperparameter trials, so that runs are consistent while the tuned hyperparameters vary. The TrainingArguments specify the output directory, per-epoch evaluation and checkpointing, F1 as the metric for choosing the best checkpoint, mixed precision when a GPU is available, logging settings, and the number of data-loading worker processes. The Trainer object takes the model initialization function, datasets, tokenizer, and metric computation function and manages training and evaluation. The optuna_hp_objective function receives the evaluation metrics dictionary of a trial and returns its F1 score (eval_f1), which is the single value Optuna maximizes.
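Two of the fixed arguments, warmup_steps=100 and logging_steps=500, interact with the tuned batch size because the number of optimizer steps per epoch depends on it. The sketch below works through that arithmetic for the 10,000 training samples used here. It assumes a single device, no gradient accumulation, and that the last partial batch is kept; the helper `steps_per_epoch` is illustrative.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Optimizer steps in one epoch (single device, no gradient accumulation)."""
    return math.ceil(num_samples / batch_size)


TRAIN_SAMPLES = 10_000
LOGGING_STEPS = 500
WARMUP_STEPS = 100

for batch_size in (4, 16, 32):
    steps = steps_per_epoch(TRAIN_SAMPLES, batch_size)
    # A training loss is first logged once LOGGING_STEPS steps have elapsed.
    logged = steps >= LOGGING_STEPS
    print(
        f"batch={batch_size:>2}  steps/epoch={steps:>4}  "
        f"warmup share of epoch 1={WARMUP_STEPS / steps:.0%}  "
        f"loss logged in epoch 1={logged}"
    )
```

With a batch size of 32 an epoch has only 313 steps, fewer than logging_steps, which is consistent with the "No log" entries for the first epoch of the batch-size-32 trials in the search log. At that batch size the warmup also covers roughly a third of the first epoch.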

In [None]:
# --- 4. TRAINING ARGUMENTS (Fixed for all runs) ---
# Most arguments are fixed, only the chosen HPs vary per run.
training_args = TrainingArguments(
    output_dir="./grid_search_results_mental_health",
    # Evaluation settings (fixed)
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1", # Optimize for F1-Score, good for imbalanced classes
    fp16=torch.cuda.is_available(), # Enable mixed precision for T4 GPU
    report_to="none", # Don't report to any external service
    # Fixed parameters (will be overridden by tune_hp where applicable)
    num_train_epochs=3, # Placeholder, will be suggested by tune_hp
    warmup_steps=100, # Reduced warmup steps for smaller datasets/epochs
    logging_dir="./logs",
    logging_steps=500,
    dataloader_num_workers=os.cpu_count() // 2 if os.cpu_count() else 0, # Use half CPU cores for data loading
)

# Initialize the Trainer
trainer = Trainer(
    model_init=model_init, # We pass the function, not the object, for fresh initialization
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    compute_metrics=compute_metrics,
    processing_class=tokenizer, # 'tokenizer=' is deprecated in recent transformers releases
)

# --- Define the objective function for Optuna ---
def optuna_hp_objective(metrics):
    """
    Optuna objective function that returns the F1 score for maximization.
    `metrics` is the dictionary returned by trainer.evaluate().
    """
    # The keys in the metrics dictionary will be prefixed with 'eval_' during evaluation
    # e.g., 'eval_loss', 'eval_accuracy', 'eval_f1', 'eval_precision', 'eval_recall'
    return metrics["eval_f1"]

# **EXECUTION OF GRID SEARCH**

This code runs the hyperparameter search and then trains the final RoBERTa model with the best configuration found. Despite the section title, this is not an exhaustive grid search: trainer.hyperparameter_search uses the Optuna backend, whose default sampler (TPE) draws one configuration from the space defined by tune_hp for each of n_trials trials, may repeat a configuration, and may prune unpromising trials early. The objective is the evaluation F1 score, which balances precision and recall. The inputs are the tokenized training and evaluation datasets, the search space, and the training arguments; the search returns the best trial and its hyperparameters. The code then builds new TrainingArguments and a new Trainer with those hyperparameters and trains a final model. The output is the trained model and its evaluation metrics on the validation set.

In [None]:
# --- 5. EXECUTION OF GRID SEARCH ---
# We use Optuna backend for efficient searching. The 'hp_space' provides the search definition.
print("\n--- Starting Hyperparameter Search (using Optuna backend) ---")
print(f"Optimizing for '{training_args.metric_for_best_model}' score...")

# Trial budget for the search. tune_hp spans 3 x 3 x 2 x 3 = 54 combinations, and
# Optuna's default sampler draws one configuration per trial instead of enumerating
# them, so this is the number of trials to run, not a count of combinations covered.
total_trials = 36
print(f"Total experiment combinations: {total_trials}")

best_trial = trainer.hyperparameter_search(
    backend="optuna",
    hp_space=tune_hp,
    direction="maximize", # Maximize the F1 score
    n_trials=total_trials, # Number of trials to run
    compute_objective=optuna_hp_objective, # Use eval_f1 as the objective instead of the default
)

print("\n--- Hyperparameter Search Complete ---")
print("\nBEST HYPERPARAMETERS FOUND:")

# Extract and print the best configuration
if best_trial:
    print(best_trial)
    best_hps = best_trial.hyperparameters
    print("\nBest Hyperparameters:")
    for key, value in best_hps.items():
        print(f"  {key}: {value}")
    print(f"\nBest objective (eval F1 on evaluation set): {best_trial.objective}")
else:
    print("Search failed or no best trial found.")

print("\n--- Final Step: Train a model with the best hyperparameters ---")
if best_trial:
    # Re-initialize TrainingArguments with the best hyperparameters for the final training run
    final_training_args = TrainingArguments(
        output_dir="./final_model_mental_health",
        eval_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
        metric_for_best_model="f1",
        fp16=torch.cuda.is_available(),
        report_to="none",
        # Use the best hyperparameters found
        num_train_epochs=best_hps["num_train_epochs"],
        per_device_train_batch_size=best_hps["per_device_train_batch_size"],
        learning_rate=best_hps["learning_rate"],
        weight_decay=best_hps["weight_decay"],
        warmup_steps=100, # Can be adjusted based on number of epochs
        logging_dir="./final_logs",
        logging_steps=500,
        dataloader_num_workers=os.cpu_count() // 2 if os.cpu_count() else 0,
    )

    # Re-initialize the Trainer with the best HPs
    final_trainer = Trainer(
        model_init=model_init, # Re-initialize the model
        args=final_training_args,
        train_dataset=tokenized_train,
        eval_dataset=tokenized_eval,
        compute_metrics=compute_metrics,
        processing_class=tokenizer, # 'tokenizer=' is deprecated in recent transformers releases
    )

    print("\nTraining final model with best hyperparameters...")
    final_trainer.train()

    print("\nFinal model training complete. Best model saved to './final_model_mental_health'.")
    metrics = final_trainer.evaluate()
    print(f"Evaluation metrics of the final model: {metrics}")
else:
    print("No best hyperparameters found, skipping final model training.")

Using GPU: Tesla T4
Dataset loaded successfully.
Using 10000 training samples and 2000 evaluation samples.
Train label distribution:
labels
1    0.5004
0    0.4996
Name: proportion, dtype: float64
Eval label distribution:
labels
1    0.5
0    0.5
Name: proportion, dtype: float64



[I 2025-11-08 08:51:43,849] A new study created in memory with name: no-name-05ecf44e-5b09-4231-b5e2-adf754f6df58



--- Starting Hyperparameter Search (using Optuna backend) ---
Optimizing for 'f1' score...
Total experiment combinations: 36



Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4788,0.272291,0.8875,0.890617,0.866604,0.916
2,0.2354,0.285368,0.899,0.898288,0.904665,0.892


[I 2025-11-08 08:54:48,254] Trial 0 finished with value: 0.8982880161127895 and parameters: {'learning_rate': 3e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.03, 'num_train_epochs': 2}. Best is trial 0 with value: 0.8982880161127895.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4704,0.493263,0.884,0.889313,0.850365,0.932
2,0.328,0.523241,0.899,0.903442,0.865385,0.945
3,0.1829,0.536176,0.907,0.907921,0.89902,0.917


[I 2025-11-08 09:04:40,253] Trial 1 finished with value: 0.907920792079208 and parameters: {'learning_rate': 2e-05, 'per_device_train_batch_size': 4, 'weight_decay': 0.01, 'num_train_epochs': 3}. Best is trial 1 with value: 0.907920792079208.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4818,0.283387,0.887,0.887113,0.886228,0.888
2,0.2361,0.288584,0.9025,0.901664,0.909461,0.894


[I 2025-11-08 09:07:47,963] Trial 2 finished with value: 0.9016641452344932 and parameters: {'learning_rate': 3e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.01, 'num_train_epochs': 2}. Best is trial 1 with value: 0.907920792079208.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4704,0.493263,0.884,0.889313,0.850365,0.932
2,0.328,0.523241,0.899,0.903442,0.865385,0.945
3,0.1829,0.536176,0.907,0.907921,0.89902,0.917


[I 2025-11-08 09:17:45,064] Trial 3 finished with value: 0.907920792079208 and parameters: {'learning_rate': 2e-05, 'per_device_train_batch_size': 4, 'weight_decay': 0.01, 'num_train_epochs': 3}. Best is trial 1 with value: 0.907920792079208.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4814,0.257043,0.887,0.891346,0.858333,0.927
2,0.2403,0.321294,0.9015,0.907119,0.858162,0.962
3,0.1632,0.344993,0.9135,0.913283,0.915578,0.911


[I 2025-11-08 09:22:21,898] Trial 4 finished with value: 0.9132832080200501 and parameters: {'learning_rate': 3e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.03, 'num_train_epochs': 3}. Best is trial 4 with value: 0.9132832080200501.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4018,0.45272,0.886,0.889642,0.862101,0.919


[I 2025-11-08 09:25:50,800] Trial 5 finished with value: 0.8896418199419167 and parameters: {'learning_rate': 3e-05, 'per_device_train_batch_size': 4, 'weight_decay': 0.03, 'num_train_epochs': 1}. Best is trial 4 with value: 0.9132832080200501.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.329921,0.8415,0.826681,0.911942,0.756
2,0.446200,0.270808,0.8825,0.880771,0.893924,0.868


[I 2025-11-08 09:28:26,118] Trial 6 finished with value: 0.8807711821410451 and parameters: {'learning_rate': 2e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.03, 'num_train_epochs': 2}. Best is trial 4 with value: 0.9132832080200501.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.5808,0.361056,0.841,0.839069,0.849385,0.829


[I 2025-11-08 09:29:46,452] Trial 7 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.329921,0.8415,0.826681,0.911942,0.756
2,0.446200,0.270808,0.8825,0.880771,0.893924,0.868


[I 2025-11-08 09:32:14,756] Trial 8 finished with value: 0.8807711821410451 and parameters: {'learning_rate': 2e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.03, 'num_train_epochs': 2}. Best is trial 4 with value: 0.9132832080200501.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.5696,0.332945,0.849,0.840381,0.891256,0.795


[I 2025-11-08 09:33:34,926] Trial 9 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4838,0.25362,0.8905,0.895665,0.855323,0.94
2,0.2412,0.298444,0.907,0.909357,0.886882,0.933
3,0.1628,0.346061,0.9105,0.909733,0.917599,0.902


[I 2025-11-08 09:37:58,320] Trial 10 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4704,0.493263,0.884,0.889313,0.850365,0.932
2,0.328,0.523241,0.899,0.903442,0.865385,0.945
3,0.1829,0.536176,0.907,0.907921,0.89902,0.917


[I 2025-11-08 09:50:44,108] Trial 11 finished with value: 0.907920792079208 and parameters: {'learning_rate': 2e-05, 'per_device_train_batch_size': 4, 'weight_decay': 0.01, 'num_train_epochs': 3}. Best is trial 4 with value: 0.9132832080200501.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4642,0.480782,0.8915,0.896814,0.854941,0.943
2,0.3444,0.447276,0.906,0.907206,0.895712,0.919
3,0.1668,0.588669,0.9,0.900398,0.896825,0.904


[I 2025-11-08 10:02:35,847] Trial 12 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4704,0.493263,0.884,0.889313,0.850365,0.932
2,0.328,0.523241,0.899,0.903442,0.865385,0.945
3,0.1829,0.536176,0.907,0.907921,0.89902,0.917


[I 2025-11-08 10:15:12,112] Trial 13 finished with value: 0.907920792079208 and parameters: {'learning_rate': 2e-05, 'per_device_train_batch_size': 4, 'weight_decay': 0.01, 'num_train_epochs': 3}. Best is trial 4 with value: 0.9132832080200501.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.375717,0.838,0.828933,0.878076,0.785
2,0.515800,0.295271,0.8725,0.87457,0.8606,0.889


[I 2025-11-08 10:18:14,112] Trial 14 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4838,0.25362,0.8905,0.895665,0.855323,0.94
2,0.2412,0.298444,0.907,0.909357,0.886882,0.933
3,0.1628,0.346061,0.9105,0.909733,0.917599,0.902


[I 2025-11-08 10:24:15,034] Trial 15 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4062,0.423863,0.8895,0.892353,0.869896,0.916


[I 2025-11-08 10:29:27,945] Trial 16 finished with value: 0.8923526546517292 and parameters: {'learning_rate': 2e-05, 'per_device_train_batch_size': 4, 'weight_decay': 0.01, 'num_train_epochs': 1}. Best is trial 4 with value: 0.9132832080200501.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.5057,0.270499,0.883,0.885406,0.867562,0.904


[I 2025-11-08 10:30:48,239] Trial 17 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4285,0.546304,0.888,0.892204,0.859926,0.927
2,0.3173,0.475499,0.902,0.903638,0.888781,0.919
3,0.1523,0.549459,0.9055,0.904401,0.915046,0.894


[I 2025-11-08 10:41:32,379] Trial 18 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.446338,0.7915,0.784719,0.811099,0.76


[I 2025-11-08 10:42:37,124] Trial 19 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4814,0.257043,0.887,0.891346,0.858333,0.927
2,0.2403,0.321294,0.9015,0.907119,0.858162,0.962
3,0.1632,0.344993,0.9135,0.913283,0.915578,0.911


[I 2025-11-08 10:49:22,983] Trial 20 finished with value: 0.9132832080200501 and parameters: {'learning_rate': 3e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.03, 'num_train_epochs': 3}. Best is trial 4 with value: 0.9132832080200501.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4814,0.257043,0.887,0.891346,0.858333,0.927
2,0.2403,0.321294,0.9015,0.907119,0.858162,0.962
3,0.1632,0.344993,0.9135,0.913283,0.915578,0.911


[I 2025-11-08 10:55:32,744] Trial 21 finished with value: 0.9132832080200501 and parameters: {'learning_rate': 3e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.03, 'num_train_epochs': 3}. Best is trial 4 with value: 0.9132832080200501.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4814,0.257043,0.887,0.891346,0.858333,0.927
2,0.2403,0.321294,0.9015,0.907119,0.858162,0.962
3,0.1632,0.344993,0.9135,0.913283,0.915578,0.911


[I 2025-11-08 11:01:00,272] Trial 22 finished with value: 0.9132832080200501 and parameters: {'learning_rate': 3e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.03, 'num_train_epochs': 3}. Best is trial 4 with value: 0.9132832080200501.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4814,0.257043,0.887,0.891346,0.858333,0.927
2,0.2403,0.321294,0.9015,0.907119,0.858162,0.962
3,0.1632,0.344993,0.9135,0.913283,0.915578,0.911


[I 2025-11-08 11:07:27,284] Trial 23 finished with value: 0.9132832080200501 and parameters: {'learning_rate': 3e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.03, 'num_train_epochs': 3}. Best is trial 4 with value: 0.9132832080200501.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4814,0.257043,0.887,0.891346,0.858333,0.927
2,0.2403,0.321294,0.9015,0.907119,0.858162,0.962
3,0.1632,0.344993,0.9135,0.913283,0.915578,0.911


[I 2025-11-08 11:14:01,381] Trial 24 finished with value: 0.9132832080200501 and parameters: {'learning_rate': 3e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.03, 'num_train_epochs': 3}. Best is trial 4 with value: 0.9132832080200501.
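The log up to this point shows the sampler revisiting configurations instead of walking through a grid. The following sketch tallies the parameters of the finished trials, copied from the log lines above (pruned trials report no parameters and are left out). The dictionary name `finished` and the tuple layout are illustrative.

```python
from collections import Counter

# (learning_rate, batch_size, weight_decay, epochs) per finished trial,
# keyed by trial number, as reported in the log through Trial 24.
finished = {
    0: (3e-5, 16, 0.03, 2),
    1: (2e-5, 4, 0.01, 3),
    2: (3e-5, 16, 0.01, 2),
    3: (2e-5, 4, 0.01, 3),
    4: (3e-5, 16, 0.03, 3),
    5: (3e-5, 4, 0.03, 1),
    6: (2e-5, 32, 0.03, 2),
    8: (2e-5, 32, 0.03, 2),
    11: (2e-5, 4, 0.01, 3),
    13: (2e-5, 4, 0.01, 3),
    16: (2e-5, 4, 0.01, 1),
    20: (3e-5, 16, 0.03, 3),
    21: (3e-5, 16, 0.03, 3),
    22: (3e-5, 16, 0.03, 3),
    23: (3e-5, 16, 0.03, 3),
    24: (3e-5, 16, 0.03, 3),
}

counts = Counter(finished.values())
print(f"{len(finished)} finished trials, {len(counts)} distinct configurations")
for config, n in counts.most_common():
    print(n, config)
```

Of the 16 finished trials, only 7 are distinct configurations, and the current best one (learning rate 3e-05, batch size 16, weight decay 0.03, 3 epochs) has been run six times with identical results. Repeated trials add run time without adding information, which is worth keeping in mind when reading the trial budget as coverage of the search space.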


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4814,0.257043,0.887,0.891346,0.858333,0.927
2,0.2403,0.321294,0.9015,0.907119,0.858162,0.962
