This command installs or upgrades the Python libraries used in the rest of the notebook. The leading exclamation mark (!) tells Jupyter or Colab to run the line as a shell command. pip install fetches each package: transformers provides pretrained models and training utilities from Hugging Face, datasets handles dataset loading and preprocessing, accelerate supports mixed-precision, multi-GPU, and distributed training, ray[tune] installs Ray with its Tune extras for scalable hyperparameter tuning, and optuna is an alternative hyperparameter optimization framework, which the training script below uses as its search backend. The -U (--upgrade) flag upgrades the listed packages to the newest available versions that are compatible with the environment.
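Once the install cell has finished, it can be useful to confirm which versions were actually resolved, since -U may pull in newer releases than the ones a tutorial was written against. The sketch below uses only the standard library; the helper name `installed_versions` is hypothetical and not part of the notebook.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names from the pip command ("ray[tune]" installs the "ray" distribution).
PACKAGES = ["transformers", "datasets", "accelerate", "ray", "optuna"]

def installed_versions(names):
    """Return {distribution name: version string, or None if it is not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```

Recording these versions next to the results makes it easier to reproduce a run later.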

In [2]:
!pip install transformers datasets accelerate ray[tune] optuna -U

Collecting datasets
  Downloading datasets-4.4.1-py3-none-any.whl.metadata (19 kB)
Collecting optuna
  Downloading optuna-4.5.0-py3-none-any.whl.metadata (17 kB)
Collecting ray[tune]
  Downloading ray-2.51.1-cp312-cp312-manylinux2014_x86_64.whl.metadata (21 kB)
Collecting pyarrow>=21.0.0 (from datasets)
  Downloading pyarrow-22.0.0-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (3.2 kB)
Collecting click!=8.3.0,>=7.0 (from ray[tune])
  Downloading click-8.2.1-py3-none-any.whl.metadata (2.5 kB)
Collecting tensorboardX>=1.9 (from ray[tune])
  Downloading tensorboardx-2.6.4-py3-none-any.whl.metadata (6.2 kB)
Collecting colorlog (from optuna)
  Downloading colorlog-6.10.1-py3-none-any.whl.metadata (11 kB)
Downloading datasets-4.4.1-py3-none-any.whl (511 kB)
Downloading optuna-4.5.0-py3-none-any.whl (400 kB)

**Function Description**

This code establishes a full fine-tuning and hyperparameter optimization workflow for a binary text classification model on a mental-health-related Twitter dataset. It loads, cleans, and tokenizes the data, prepares the model using a RoBERTa-based architecture, and evaluates multiple hyperparameter combinations through Optuna’s integration with the Hugging Face Trainer. The pipeline identifies the best configuration for parameters such as learning rate, batch size, weight decay, and number of epochs, before retraining the model using the optimal settings to produce the final evaluation metrics.

**Syntax Explanation**

The script uses Hugging Face’s transformers and datasets libraries for tokenization, model initialization, and training, and scikit-learn for the evaluation metrics. AutoModelForSequenceClassification and AutoTokenizer load the architecture and vocabulary that match the specified pretrained checkpoint. The data is converted from a pandas DataFrame to a Hugging Face Dataset so that it can be tokenized in batches. The Trainer API takes its fixed settings from TrainingArguments, and its hyperparameter_search method, called with backend="optuna", lets Optuna sample configurations from the space defined in tune_hp. The compute_metrics function calculates accuracy, precision, recall, and F1-score; F1 is the objective that the search maximizes.
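To make the four metrics concrete, the following dependency-free sketch computes them from confusion-matrix counts, which is what the scikit-learn calls in compute_metrics amount to in the binary case. The counts are not printed by the notebook; they are inferred here from the logged Trial 0 metrics, assuming the 2,000-row evaluation set contains 1,000 positive examples (as the 50/50 label distribution in the output suggests). The function name `binary_metrics` is illustrative.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 for the positive class (label 1)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "f1": f1, "precision": precision, "recall": recall}

# Counts inferred from Trial 0, epoch 3 (an assumption, not notebook output).
m = binary_metrics(tp=903, fp=93, fn=97, tn=907)
print({name: round(value, 6) for name, value in m.items()})
```

Rounded to six decimals, these values agree with the Trial 0, epoch 3 row of the training log (accuracy 0.905, F1 0.90481, precision 0.906627, recall 0.903).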

**Inputs**

The code takes as input the Mental-Health-Twitter.csv dataset, which contains a post_text column with the tweet content and a label column with the class (0 for no depression, 1 for depression). Rows with a null or empty post_text are removed, the label column is renamed to labels, and the data is split 90:10 into training and evaluation sets with stratification on the label. The script then subsamples 10,000 training rows and 2,000 evaluation rows to keep each trial short. The RoBERTa tokenizer that ships with the pretrained model margotwagner/roberta-psychotherapy-eval converts the text into token IDs and attention masks.
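The tokenizer's output can be pictured with a toy batch. The sketch below mimics truncation=True, padding=True, and max_length on lists of token IDs: sequences longer than max_length are cut, the rest are padded to the longest sequence in the batch, and the attention mask marks real tokens with 1 and padding with 0. The token IDs are made up, pad_id=1 is assumed to match RoBERTa's usual padding ID, and the real tokenizer also handles special tokens, which this sketch ignores.

```python
def pad_and_truncate(batch, max_length=128, pad_id=1):
    """Truncate each ID list to max_length, then pad to the longest list in the batch."""
    truncated = [ids[:max_length] for ids in batch]
    target = max(len(ids) for ids in truncated)
    input_ids = [ids + [pad_id] * (target - len(ids)) for ids in truncated]
    attention_mask = [[1] * len(ids) + [0] * (target - len(ids)) for ids in truncated]
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Toy batch with made-up IDs and a small max_length so the effect is visible.
toy = pad_and_truncate([[0, 713, 16, 2], [0, 100, 657, 42, 9, 11, 2]], max_length=6)
print(toy["input_ids"])
print(toy["attention_mask"])
```

The model ignores positions where the attention mask is 0, so padding changes the tensor shape without changing what the model attends to.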

**Outputs**

The script prints the device in use (GPU or CPU), the dataset sizes and label distributions, and progress updates during the Optuna hyperparameter search. Once the search completes, it prints the best trial, the selected hyperparameters, and the corresponding evaluation score. The model is then retrained with those hyperparameters, with checkpoints written under ./final_model_mental_health, and its accuracy, precision, recall, and F1-score on the evaluation set are displayed.

**Code Flow**

The script begins by importing its dependencies and setting a random seed so that runs are reproducible. It then checks whether a GPU is available and selects the device accordingly. The dataset is loaded, cleaned, and split into training and validation subsets with stratified sampling, which preserves the label distribution in both subsets. Each subset is converted into a Hugging Face Dataset and tokenized with the RoBERTa tokenizer, with padding enabled and truncation to a maximum sequence length of 128 tokens.

Three helper functions follow. model_init returns a fresh model instance so that every hyperparameter trial starts from the same pretrained weights. compute_metrics evaluates predictions with accuracy, F1-score, precision, and recall. tune_hp defines the Optuna search space: three learning rates, two batch sizes, three weight-decay values, and two epoch counts, for 36 possible combinations.

A Trainer instance then combines model_init, the tokenized datasets, the training arguments, and compute_metrics. The hyperparameter search runs trials drawn from this space and maximizes the evaluation F1-score. Once the best configuration is identified, the script creates a new Trainer with those settings, retrains the model on the training set, and reports the final evaluation metrics.
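The size of the search space can be checked by enumerating it. This sketch lists every combination that tune_hp can return, using itertools.product; the values are copied from the script, while the variable names are illustrative. It only counts combinations and does not run Optuna, which samples from this space rather than walking through it in order.

```python
from itertools import product

learning_rates = [2e-5, 3e-5, 4e-5]
batch_sizes = [16, 32]
# suggest_float("weight_decay", 0.01, 0.05, step=0.02) can yield these three values.
weight_decays = [round(0.01 + 0.02 * i, 2) for i in range(3)]
epoch_counts = [2, 3]

combinations = [
    {
        "learning_rate": lr,
        "per_device_train_batch_size": bs,
        "weight_decay": wd,
        "num_train_epochs": ep,
    }
    for lr, bs, wd, ep in product(learning_rates, batch_sizes, weight_decays, epoch_counts)
]
print(len(combinations))
print(combinations[0])
```

If an exhaustive sweep is the goal, Optuna provides a GridSampler that visits each point of a given grid; whether a sampler can be passed through trainer.hyperparameter_search depends on the installed transformers version, so that should be checked before relying on it.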

**Comments and Observations**

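The code is well-structured and modular, and it follows sound reproducibility and evaluation practice. It integrates Optuna for hyperparameter tuning and uses a stratified split to preserve the label balance. Reporting F1-score, precision, and recall alongside accuracy keeps the evaluation meaningful even if the classes were imbalanced. However, the search did not reach all 36 intended trials. The log shows 27 trials concluding (16 finished and 11 pruned) before Trial 27 failed with a PyTorch file-write error while saving a checkpoint. That error points to disk storage rather than GPU memory, and it is consistent with the disk filling up, because save_strategy="epoch" writes a checkpoint at every epoch of every trial and no save_total_limit is set. The log also shows that Optuna's sampler does not enumerate the grid: the configuration with learning rate 4e-05, batch size 32, weight decay 0.03, and 3 epochs was re-run in 11 of the finished trials, each time with the same F1 of about 0.907, so far fewer than 27 distinct combinations were evaluated and the best combination in the space may not have been tried. Mixed-precision (fp16) training is appropriately used to improve GPU efficiency, though it should be monitored for numerical stability. Pruning is already active, as the pruned trials in the log show. The code could further benefit from limiting or disabling checkpoint saving during the search, from a sampler that avoids repeated configurations, and from early stopping to reduce runtime.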
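One way to keep disk usage bounded during a long search is to delete checkpoint folders that are no longer needed. The sketch below is a standard-library helper, not part of the original script: it removes every directory named checkpoint-* under a root folder and returns the number of bytes freed. The name `remove_checkpoints` is hypothetical, and the checkpoint-* naming is assumed to match what the Trainer writes under its output_dir.

```python
import shutil
from pathlib import Path

def remove_checkpoints(root):
    """Delete all 'checkpoint-*' directories under root; return the bytes freed."""
    freed = 0
    # Materialize the list first so deletions do not disturb the directory walk.
    for ckpt in list(Path(root).rglob("checkpoint-*")):
        if not ckpt.is_dir():
            continue  # Not a directory, or already removed with a parent checkpoint
        freed += sum(f.stat().st_size for f in ckpt.rglob("*") if f.is_file())
        shutil.rmtree(ckpt)
    return freed

# Example call (path taken from the script's output_dir):
# freed = remove_checkpoints("./grid_search_results_mental_health")
# print(f"Freed {freed / 1e6:.1f} MB")
```

Within TrainingArguments, save_total_limit serves a similar purpose by capping how many checkpoints are kept per run.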

In [1]:
# 1. SETUP AND INSTALLATION
# Run this command first in your Colab notebook:
# !pip install transformers datasets accelerate ray[tune] optuna pandas -U

import torch
import os
import numpy as np
import pandas as pd
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification, # Use AutoModel for RoBERTa
    AutoTokenizer,                     # Use AutoTokenizer for RoBERTa
    TrainingArguments,
    Trainer,
    set_seed
)
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Set a consistent seed for reproducibility across runs
set_seed(42)

# Ensure GPU is available
if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    device = torch.device("cpu")
    print("Using CPU. For faster training, consider enabling a GPU runtime.")

# --- 2. DATA PREPARATION (Using your Mental-Health-Twitter.csv) ---

# Upload 'Mental-Health-Twitter.csv' to your Colab environment

# Load your dataset
try:
    df = pd.read_csv("Mental-Health-Twitter.csv")
    print("Dataset loaded successfully.")
except FileNotFoundError:
    print("Error: 'Mental-Health-Twitter.csv' not found. Please upload it to your Colab environment.")
    raise  # Stop the cell here; exit() is unreliable inside a notebook kernel

# Filter out rows where 'post_text' is NaN or empty
df = df.dropna(subset=['post_text'])
df = df[df['post_text'].str.strip() != '']

# Rename 'label' to 'labels' for Hugging Face Trainer compatibility
df = df.rename(columns={"label": "labels"})

# Split data into training and validation sets (stratified 90:10),
# then subsample to keep each trial fast: 10,000 rows for training, 2,000 for evaluation.
# Adjust these numbers based on initial run times.
train_df, eval_df = train_test_split(df, test_size=0.1, stratify=df['labels'], random_state=42) # 10% for evaluation
train_df = train_df.sample(n=10000, random_state=42) # Random (not stratified) subsample of 10k training rows
eval_df = eval_df.sample(n=2000, random_state=42)   # Raises ValueError if fewer than 2k rows are available

print(f"Using {len(train_df)} training samples and {len(eval_df)} evaluation samples.")
print(f"Train label distribution:\n{train_df['labels'].value_counts(normalize=True)}")
print(f"Eval label distribution:\n{eval_df['labels'].value_counts(normalize=True)}")

# Convert pandas DataFrames to Hugging Face Dataset objects
train_dataset = Dataset.from_pandas(train_df[['post_text', 'labels']])
eval_dataset = Dataset.from_pandas(eval_df[['post_text', 'labels']])

# Initialize Tokenizer for your specific model
MODEL_NAME = "margotwagner/roberta-psychotherapy-eval"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize_function(examples):
    # Ensure 'post_text' is correctly accessed and adjust max_length if needed
    return tokenizer(examples["post_text"], truncation=True, padding=True, max_length=128)

# Apply tokenization
tokenized_train = train_dataset.map(tokenize_function, batched=True, remove_columns=["post_text", "__index_level_0__"])
tokenized_eval = eval_dataset.map(tokenize_function, batched=True, remove_columns=["post_text", "__index_level_0__"])

# Set format to PyTorch tensors
tokenized_train.set_format("torch", columns=['input_ids', 'attention_mask', 'labels'])
tokenized_eval.set_format("torch", columns=['input_ids', 'attention_mask', 'labels'])


# --- 3. MODEL, METRICS, AND HYPERPARAMETER DEFINITION ---

# Function to initialize a fresh model for each hyperparameter trial
def model_init():
    # The model must be re-initialized for every trial so that trials stay independent
    return AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2).to(device)

def compute_metrics(p):
    preds = np.argmax(p.predictions, axis=1)
    acc = accuracy_score(p.label_ids, preds)
    f1 = f1_score(p.label_ids, preds, average="binary") # 'binary' for 0/1 labels
    precision = precision_score(p.label_ids, preds, average="binary")
    recall = recall_score(p.label_ids, preds, average="binary")
    return {"accuracy": acc, "f1": f1, "precision": precision, "recall": recall}

# --- HYPERPARAMETER SEARCH SPACE DEFINITION ---
def tune_hp(trial):
    """
    Define the hyperparameter space that Optuna samples from.
    Each parameter is limited to a few discrete values to keep trials quick.
    """
    learning_rate = trial.suggest_categorical("learning_rate", [2e-5, 3e-5, 4e-5]) # 3 options
    per_device_train_batch_size = trial.suggest_categorical("per_device_train_batch_size", [16, 32]) # 2 options
    weight_decay = trial.suggest_float("weight_decay", 0.01, 0.05, step=0.02) # 3 options: 0.01, 0.03, 0.05
    num_train_epochs = trial.suggest_categorical("num_train_epochs", [2, 3]) # 2 options

    # The space holds 3 * 2 * 3 * 2 = 36 distinct combinations.
    # Optuna samples from it rather than enumerating it, so running 36 trials
    # does not guarantee that every combination is tried exactly once.

    return {
        "learning_rate": learning_rate,
        "per_device_train_batch_size": per_device_train_batch_size,
        "weight_decay": weight_decay,
        "num_train_epochs": num_train_epochs,
    }


# --- 4. TRAINING ARGUMENTS (Fixed for all runs) ---
# Most arguments are fixed, only the chosen HPs vary per run.
training_args = TrainingArguments(
    output_dir="./grid_search_results_mental_health",
    # Evaluation settings (fixed)
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1", # Optimize for F1-Score, good for imbalanced classes
    fp16=torch.cuda.is_available(), # Enable mixed precision for T4 GPU
    report_to="none", # Don't report to any external service
    # Fixed parameters (will be overridden by tune_hp where applicable)
    num_train_epochs=3, # Placeholder, will be suggested by tune_hp
    warmup_steps=100, # Reduced warmup steps for smaller datasets/epochs
    logging_dir="./logs",
    logging_steps=500,
    dataloader_num_workers=os.cpu_count() // 2 if os.cpu_count() else 0, # Use half CPU cores for data loading
)

# Initialize the Trainer
trainer = Trainer(
    model_init=model_init, # We pass the function, not the object, for fresh initialization
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
)

# --- Define the objective function for Optuna ---
def optuna_hp_objective(metrics):
    """
    Optuna objective function that returns the F1 score for maximization.
    `metrics` is the dictionary returned by trainer.evaluate().
    """
    # The keys in the metrics dictionary will be prefixed with 'eval_' during evaluation
    # e.g., 'eval_loss', 'eval_accuracy', 'eval_f1', 'eval_precision', 'eval_recall'
    return metrics["eval_f1"]


# --- 5. EXECUTION OF HYPERPARAMETER SEARCH ---
# The Optuna backend drives the search; 'hp_space' supplies the search-space definition.
print("\n--- Starting Hyperparameter Search (using Optuna backend) ---")
print(f"Optimizing for '{training_args.metric_for_best_model}' score...")

# Count the combinations in the search space (these lists must mirror tune_hp)
num_lr = len([2e-5, 3e-5, 4e-5])
num_batch = len([16, 32])
num_wd = len(np.arange(0.01, 0.051, 0.02)) # Includes 0.01, 0.03, 0.05
num_epochs = len([2, 3])
total_trials = num_lr * num_batch * num_wd * num_epochs
print(f"Total experiment combinations: {total_trials}")

best_trial = trainer.hyperparameter_search(
    backend="optuna",
    hp_space=tune_hp,
    direction="maximize", # Maximize the F1 score
    n_trials=total_trials, # One trial per combination in count; the sampler may still repeat configurations
    compute_objective=optuna_hp_objective, # Use eval_f1 as the objective instead of the default
)

print("\n--- Hyperparameter Search Complete ---")
print("\nBEST HYPERPARAMETERS FOUND:")

# Extract and print the best configuration
if best_trial:
    print(best_trial)
    best_hps = best_trial.hyperparameters
    print("\nBest Hyperparameters:")
    for key, value in best_hps.items():
        print(f"  {key}: {value}")
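    # BestRun exposes run_id, objective, and hyperparameters; it has no 'metrics' attribute
    print(f"\nBest objective (eval F1 on evaluation set): {best_trial.objective}")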
else:
    print("Search failed or no best trial found.")

print("\n--- Final Step: Train a model with the best hyperparameters ---")
if best_trial:
    # Re-initialize TrainingArguments with the best hyperparameters for the final training run
    final_training_args = TrainingArguments(
        output_dir="./final_model_mental_health",
        eval_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
        metric_for_best_model="f1",
        fp16=torch.cuda.is_available(),
        report_to="none",
        # Use the best hyperparameters found
        num_train_epochs=best_hps["num_train_epochs"],
        per_device_train_batch_size=best_hps["per_device_train_batch_size"],
        learning_rate=best_hps["learning_rate"],
        weight_decay=best_hps["weight_decay"],
        warmup_steps=100, # Can be adjusted based on number of epochs
        logging_dir="./final_logs",
        logging_steps=500,
        dataloader_num_workers=os.cpu_count() // 2 if os.cpu_count() else 0,
    )

    # Re-initialize the Trainer with the best HPs
    final_trainer = Trainer(
        model_init=model_init, # Re-initialize the model
        args=final_training_args,
        train_dataset=tokenized_train,
        eval_dataset=tokenized_eval,
        compute_metrics=compute_metrics,
        tokenizer=tokenizer,
    )

    print("\nTraining final model with best hyperparameters...")
    final_trainer.train()

    # Write the final model to the top level of the output directory
    final_trainer.save_model("./final_model_mental_health")

    print("\nFinal model training complete. Best model saved to './final_model_mental_health'.")
    metrics = final_trainer.evaluate()
    print(f"Evaluation metrics of the final model: {metrics}")
else:
    print("No best hyperparameters found, skipping final model training.")

Using GPU: Tesla T4
Dataset loaded successfully.
Using 10000 training samples and 2000 evaluation samples.
Train label distribution:
labels
1    0.5004
0    0.4996
Name: proportion, dtype: float64
Eval label distribution:
labels
1    0.5
0    0.5
Name: proportion, dtype: float64


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


[I 2025-11-08 04:00:43,839] A new study created in memory with name: no-name-6fe533e7-77b6-46ee-b28e-bd8283ce3a86



--- Starting Hyperparameter Search (using Optuna backend) ---
Optimizing for 'f1' score...
Total experiment combinations: 36



Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.475,0.264445,0.8845,0.889315,0.853726,0.928
2,0.2435,0.375522,0.8905,0.89752,0.843448,0.959
3,0.1654,0.389608,0.905,0.90481,0.906627,0.903


[I 2025-11-08 04:06:04,803] Trial 0 finished with value: 0.9048096192384769 and parameters: {'learning_rate': 4e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.01, 'num_train_epochs': 3}. Best is trial 0 with value: 0.9048096192384769.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4739,0.260426,0.8835,0.885504,0.870531,0.901
2,0.238,0.275209,0.9005,0.900151,0.903323,0.897


[I 2025-11-08 04:09:03,605] Trial 1 finished with value: 0.9001505268439538 and parameters: {'learning_rate': 4e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.03, 'num_train_epochs': 2}. Best is trial 0 with value: 0.9048096192384769.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.353865,0.8475,0.830273,0.93601,0.746
2,0.392900,0.257951,0.898,0.900488,0.879048,0.923
3,0.392900,0.295388,0.902,0.90111,0.909369,0.893


[I 2025-11-08 04:12:41,436] Trial 2 finished with value: 0.9011099899091827 and parameters: {'learning_rate': 4e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.01, 'num_train_epochs': 3}. Best is trial 0 with value: 0.9048096192384769.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.355536,0.8315,0.803727,0.962343,0.69
2,0.390800,0.266867,0.901,0.901688,0.895464,0.908
3,0.390800,0.294253,0.906,0.905146,0.913442,0.897


[I 2025-11-08 04:16:38,291] Trial 3 finished with value: 0.905146316851665 and parameters: {'learning_rate': 4e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.05, 'num_train_epochs': 3}. Best is trial 3 with value: 0.905146316851665.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.5045,0.265061,0.8835,0.885278,0.871969,0.899
2,0.2474,0.27671,0.8935,0.892152,0.90359,0.881


[I 2025-11-08 04:19:49,441] Trial 4 finished with value: 0.8921518987341772 and parameters: {'learning_rate': 2e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.05, 'num_train_epochs': 2}. Best is trial 3 with value: 0.905146316851665.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.5062,0.276766,0.878,0.879803,0.86699,0.893


[I 2025-11-08 04:21:09,388] Trial 5 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4803,0.261164,0.8905,0.893014,0.87297,0.914
2,0.2389,0.281481,0.898,0.897177,0.904472,0.89


[I 2025-11-08 04:24:06,342] Trial 6 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4803,0.261164,0.8905,0.893014,0.87297,0.914
2,0.2389,0.281481,0.898,0.897177,0.904472,0.89


[I 2025-11-08 04:26:58,073] Trial 7 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.356036,0.8535,0.839276,0.929526,0.765
2,0.386200,0.275615,0.8965,0.89519,0.906667,0.884


[I 2025-11-08 04:29:28,792] Trial 8 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.328873,0.8405,0.825205,0.912727,0.753
2,0.444500,0.271412,0.884,0.881994,0.897516,0.867


[I 2025-11-08 04:31:59,480] Trial 9 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.312762,0.8655,0.856686,0.916762,0.804
2,0.390300,0.245041,0.9035,0.905439,0.887608,0.924
3,0.390300,0.291496,0.9075,0.906988,0.912032,0.902


[I 2025-11-08 04:36:02,542] Trial 10 finished with value: 0.9069884364002011 and parameters: {'learning_rate': 4e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.03, 'num_train_epochs': 3}. Best is trial 10 with value: 0.9069884364002011.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.312762,0.8655,0.856686,0.916762,0.804
2,0.390300,0.245041,0.9035,0.905439,0.887608,0.924
3,0.390300,0.291496,0.9075,0.906988,0.912032,0.902


[I 2025-11-08 04:40:05,725] Trial 11 finished with value: 0.9069884364002011 and parameters: {'learning_rate': 4e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.03, 'num_train_epochs': 3}. Best is trial 10 with value: 0.9069884364002011.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.312762,0.8655,0.856686,0.916762,0.804
2,0.390300,0.245041,0.9035,0.905439,0.887608,0.924
3,0.390300,0.291496,0.9075,0.906988,0.912032,0.902


[I 2025-11-08 04:44:09,931] Trial 12 finished with value: 0.9069884364002011 and parameters: {'learning_rate': 4e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.03, 'num_train_epochs': 3}. Best is trial 10 with value: 0.9069884364002011.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.312762,0.8655,0.856686,0.916762,0.804
2,0.390300,0.245041,0.9035,0.905439,0.887608,0.924
3,0.390300,0.291496,0.9075,0.906988,0.912032,0.902


[I 2025-11-08 04:48:20,868] Trial 13 finished with value: 0.9069884364002011 and parameters: {'learning_rate': 4e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.03, 'num_train_epochs': 3}. Best is trial 10 with value: 0.9069884364002011.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.353865,0.8475,0.830273,0.93601,0.746


[I 2025-11-08 04:49:25,889] Trial 14 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.386615,0.821,0.788166,0.965217,0.666


[I 2025-11-08 04:50:29,709] Trial 15 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.312762,0.8655,0.856686,0.916762,0.804
2,0.390300,0.245041,0.9035,0.905439,0.887608,0.924
3,0.390300,0.291496,0.9075,0.906988,0.912032,0.902


[I 2025-11-08 04:54:32,818] Trial 16 finished with value: 0.9069884364002011 and parameters: {'learning_rate': 4e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.03, 'num_train_epochs': 3}. Best is trial 10 with value: 0.9069884364002011.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.353865,0.8475,0.830273,0.93601,0.746


[I 2025-11-08 04:55:37,429] Trial 17 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.386615,0.821,0.788166,0.965217,0.666


[I 2025-11-08 04:56:41,679] Trial 18 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.336774,0.8335,0.813862,0.922687,0.728


[I 2025-11-08 04:57:45,434] Trial 19 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.312762,0.8655,0.856686,0.916762,0.804
2,0.390300,0.245041,0.9035,0.905439,0.887608,0.924
3,0.390300,0.291496,0.9075,0.906988,0.912032,0.902


[I 2025-11-08 05:01:37,020] Trial 20 finished with value: 0.9069884364002011 and parameters: {'learning_rate': 4e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.03, 'num_train_epochs': 3}. Best is trial 10 with value: 0.9069884364002011.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.312762,0.8655,0.856686,0.916762,0.804
2,0.390300,0.245041,0.9035,0.905439,0.887608,0.924
3,0.390300,0.291496,0.9075,0.906988,0.912032,0.902


[I 2025-11-08 05:05:40,609] Trial 21 finished with value: 0.9069884364002011 and parameters: {'learning_rate': 4e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.03, 'num_train_epochs': 3}. Best is trial 10 with value: 0.9069884364002011.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.312762,0.8655,0.856686,0.916762,0.804
2,0.390300,0.245041,0.9035,0.905439,0.887608,0.924
3,0.390300,0.291496,0.9075,0.906988,0.912032,0.902


[I 2025-11-08 05:09:49,117] Trial 22 finished with value: 0.9069884364002011 and parameters: {'learning_rate': 4e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.03, 'num_train_epochs': 3}. Best is trial 10 with value: 0.9069884364002011.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.312762,0.8655,0.856686,0.916762,0.804
2,0.390300,0.245041,0.9035,0.905439,0.887608,0.924
3,0.390300,0.291496,0.9075,0.906988,0.912032,0.902


[I 2025-11-08 05:14:08,620] Trial 23 finished with value: 0.9069884364002011 and parameters: {'learning_rate': 4e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.03, 'num_train_epochs': 3}. Best is trial 10 with value: 0.9069884364002011.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.312762,0.8655,0.856686,0.916762,0.804
2,0.390300,0.245041,0.9035,0.905439,0.887608,0.924
3,0.390300,0.291496,0.9075,0.906988,0.912032,0.902


[I 2025-11-08 05:18:06,170] Trial 24 finished with value: 0.9069884364002011 and parameters: {'learning_rate': 4e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.03, 'num_train_epochs': 3}. Best is trial 10 with value: 0.9069884364002011.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.312762,0.8655,0.856686,0.916762,0.804
2,0.390300,0.245041,0.9035,0.905439,0.887608,0.924
3,0.390300,0.291496,0.9075,0.906988,0.912032,0.902


[I 2025-11-08 05:22:36,087] Trial 25 finished with value: 0.9069884364002011 and parameters: {'learning_rate': 4e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.03, 'num_train_epochs': 3}. Best is trial 10 with value: 0.9069884364002011.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.353865,0.8475,0.830273,0.93601,0.746


[I 2025-11-08 05:23:43,866] Trial 26 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.312762,0.8655,0.856686,0.916762,0.804
2,0.390300,0.245041,0.9035,0.905439,0.887608,0.924


[W 2025-11-08 05:26:25,438] Trial 27 failed with parameters: {'learning_rate': 4e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.03, 'num_train_epochs': 3} because of the following error: RuntimeError('[enforce fail at inline_container.cc:664] . unexpected pos 55168 vs 55060').
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/torch/serialization.py", line 967, in save
    _save(
  File "/usr/local/lib/python3.12/dist-packages/torch/serialization.py", line 1268, in _save
    zip_file.write_record(name, storage, num_bytes)
RuntimeError: [enforce fail at inline_container.cc:858] . PytorchStreamWriter failed writing file data/1: file write failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/optuna/study/_optimize.py", line 201, in _run_trial
    value_or_values = func(trial)
                      ^^^^^^^^^^^
  File "/usr/local/lib/python3.1

RuntimeError: [enforce fail at inline_container.cc:664] . unexpected pos 55168 vs 55060