This command installs and updates several key Python libraries used for natural language processing and model optimization. The exclamation mark ! allows running shell commands directly from environments like Jupyter Notebook. The pip install command fetches and installs packages: transformers provides pre-trained models and training utilities from Hugging Face, datasets handles efficient dataset loading and preprocessing, accelerate optimizes multi-GPU and distributed training, ray[tune] enables scalable hyperparameter tuning, and optuna offers an alternative, efficient framework for automated hyperparameter optimization. The -U flag ensures all packages are upgraded to their latest compatible versions.

In [1]:
!pip install transformers datasets accelerate ray[tune] optuna -U

Collecting datasets
  Downloading datasets-4.4.1-py3-none-any.whl.metadata (19 kB)
Collecting optuna
  Downloading optuna-4.5.0-py3-none-any.whl.metadata (17 kB)
Collecting ray[tune]
  Downloading ray-2.51.1-cp312-cp312-manylinux2014_x86_64.whl.metadata (21 kB)
Collecting pyarrow>=21.0.0 (from datasets)
  Downloading pyarrow-22.0.0-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (3.2 kB)
Collecting click!=8.3.0,>=7.0 (from ray[tune])
  Downloading click-8.2.1-py3-none-any.whl.metadata (2.5 kB)
Collecting tensorboardX>=1.9 (from ray[tune])
  Downloading tensorboardx-2.6.4-py3-none-any.whl.metadata (6.2 kB)
Collecting colorlog (from optuna)
  Downloading colorlog-6.10.1-py3-none-any.whl.metadata (11 kB)
Downloading datasets-4.4.1-py3-none-any.whl (511 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m511.6/511.6 kB[0m [31m21.4 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading optuna-4.5.0-py3-none-any.whl (400 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [

**Function Description**

This script performs random hyperparameter search and fine-tuning for a binary RoBERTa-based text classifier on a mental-health Twitter dataset, with explicit adjustments to operate within a 5 GB GPU memory constraint. It prepares and tokenizes the dataset, defines a model initialization routine and evaluation metrics, samples hyperparameters using Optuna’s suggestion methods, executes a defined number of random trials via the Hugging Face Trainer, and then retrains and evaluates a final model using the best hyperparameters discovered by the random search.

**Syntax Explanation**

The code uses Hugging Face transformers and datasets with AutoTokenizer and AutoModelForSequenceClassification for model-agnostic loading, Optuna for randomized hyperparameter suggestions, and scikit-learn for metric computation. The dataset is converted from pandas to a Hugging Face Dataset and tokenized with truncation, padding, and a maximum sequence length of 128 tokens. The Trainer is instantiated with model_init so each trial starts from fresh pretrained weights, and trainer.hyperparameter_search is invoked with hp_space set to the random_hp_space function, which uses suggest_loguniform, suggest_categorical, suggest_float, and suggest_int to sample learning rate, batch size, gradient accumulation steps, weight decay, and number of epochs.

**Inputs**

The primary input is the CSV file Mental-Health-Twitter.csv containing post_text and label columns, where label is renamed to labels for Trainer compatibility. The script removes missing or empty post_text entries, performs a stratified train validation split, and samples the training set to 5,000 rows and the evaluation set to 1,000 rows to reduce memory usage. The pretrained model identifier margotwagner/roberta-psychotherapy-eval is used to initialize the tokenizer and model weights.

**Outputs**

The program prints device selection, dataset sizes, and label distributions, then reports the number of random trials to run and per-trial evaluation results logged through Trainer and Optuna. After the search completes, it prints the best trial and its hyperparameters, retrains a final model with those hyperparameters, evaluates that model, and saves the final checkpoint to ./final_model_mental_health_5gb_random. The evaluation metrics printed include loss, accuracy, F1-score, precision, and recall.

**Code Flow**

The script begins by importing libraries, setting a reproducible seed, and selecting the compute device. It loads and cleans the CSV dataset, renames the label column to labels, performs a stratified train validation split, and reduces the dataset size to 5,000 training and 1,000 evaluation samples. The DataFrames are converted to Hugging Face Dataset objects and tokenized with truncation, padding, and max_length=128, then formatted as PyTorch tensors. A model_init function is defined to instantiate a fresh AutoModelForSequenceClassification for each trial, and compute_metrics is defined to compute accuracy, F1, precision, and recall from predictions. The random_hp_space function defines the random search space tailored for a 5 GB GPU, sampling learning rate from a log-uniform range, selecting per-device batch sizes from conservative options, sampling gradient accumulation steps to permit effective larger batch sizes without increased VRAM, sampling weight decay from a float range, and sampling the number of epochs as an integer. TrainingArguments set shared training behaviors and the Trainer is created with model_init, datasets, tokenizer, and metrics. trainer.hyperparameter_search runs NUM_RANDOM_TRIALS random trials using Optuna and maximizes eval_f1 via the provided objective function. If a best trial is found, the script reconstructs TrainingArguments with the chosen hyperparameters, reinitializes a Trainer, trains the final model, evaluates it, and saves the checkpoint.

**Comments and Observations**

The script is well suited to constrained GPU environments because it reduces dataset size, limits per-device batch sizes, and introduces gradient accumulation steps to simulate larger effective batch sizes without exceeding memory limits. Using suggest_loguniform for learning rate is appropriate because it allows sampling across orders of magnitude, and model_init ensures trial independence so that each hyperparameter configuration is evaluated from the same starting point. Optimizing for F1 makes sense for binary labels that may be imbalanced, and keeping max_length at 128 is a reasonable choice for tweet-length inputs but should be validated against token length distribution. It is recommended to verify the existence of the __index_level_0__ column before removing it to avoid errors, to consider enabling Optuna pruning to stop low-performing trials early, and to validate that labels contains exactly 0 and 1 so that metric calculations remain meaningful.

In [2]:
# 1. SETUP AND INSTALLATION
# Run this command first in your Colab notebook:
# !pip install transformers datasets accelerate ray[tune] optuna pandas -U

import torch
import os
import numpy as np
import pandas as pd
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
    set_seed
)
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
import optuna # Import optuna to use its suggestion methods for random search

# Set a consistent seed for reproducibility across runs
set_seed(42)

# Ensure GPU is available
if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    device = torch.device("cpu")
    print("Using CPU. For faster training, consider enabling a GPU runtime.")

# --- 2. DATA PREPARATION (Using your Mental-Health-Twitter.csv) ---

# Load your dataset
try:
    df = pd.read_csv("Mental-Health-Twitter.csv")
    print("Dataset loaded successfully.")
except FileNotFoundError:
    print("Error: 'Mental-Health-Twitter.csv' not found. Please upload it to your Colab environment.")
    exit()

# Filter out rows where 'post_text' is NaN or empty
df = df.dropna(subset=['post_text'])
df = df[df['post_text'].str.strip() != '']

# Rename 'label' to 'labels' for Hugging Face Trainer compatibility
df = df.rename(columns={"label": "labels"})

# Split data into training and validation sets
# IMPORTANT: Keep dataset size manageable for 5GB GPU.
# 10k train / 2k eval is still quite a lot for a 5GB GPU and RoBERTa-base.
# Let's reduce it further to make sure even batch_size=16 is stable.
train_df, eval_df = train_test_split(df, test_size=0.1, stratify=df['labels'], random_state=42)
train_df = train_df.sample(n=5000, random_state=42) # REDUCED TO 5K TRAINING SAMPLES
eval_df = eval_df.sample(n=1000, random_state=42)   # REDUCED TO 1K EVALUATION SAMPLES

print(f"Using {len(train_df)} training samples and {len(eval_df)} evaluation samples.")
print(f"Train label distribution:\n{train_df['labels'].value_counts(normalize=True)}")
print(f"Eval label distribution:\n{eval_df['labels'].value_counts(normalize=True)}")

# Convert pandas DataFrames to Hugging Face Dataset objects
train_dataset = Dataset.from_pandas(train_df[['post_text', 'labels']])
eval_dataset = Dataset.from_pandas(eval_df[['post_text', 'labels']])

# Initialize Tokenizer for your specific model
MODEL_NAME = "margotwagner/roberta-psychotherapy-eval"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize_function(examples):
    # Max length is already 128, which is good. Don't go higher.
    return tokenizer(examples["post_text"], truncation=True, padding=True, max_length=128)

# Apply tokenization
tokenized_train = train_dataset.map(tokenize_function, batched=True, remove_columns=["post_text", "__index_level_0__"])
tokenized_eval = eval_dataset.map(tokenize_function, batched=True, remove_columns=["post_text", "__index_level_0__"])

# Set format to PyTorch tensors
tokenized_train.set_format("torch", columns=['input_ids', 'attention_mask', 'labels'])
tokenized_eval.set_format("torch", columns=['input_ids', 'attention_mask', 'labels'])


# --- 3. MODEL, METRICS, AND HYPERPARAMETER DEFINITION ---

# Function to initialize a fresh model for each grid search run
def model_init():
    return AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2).to(device)

def compute_metrics(p):
    preds = np.argmax(p.predictions, axis=1)
    acc = accuracy_score(p.label_ids, preds)
    f1 = f1_score(p.label_ids, preds, average="binary")
    precision = precision_score(p.label_ids, preds, average="binary")
    recall = recall_score(p.label_ids, preds, average="binary")
    return {"accuracy": acc, "f1": f1, "precision": precision, "recall": recall}

# --- HYPERPARAMETER SEARCH SPACE FOR RANDOM SEARCH (5GB GPU CONSCIOUS) ---
def random_hp_space(trial):
    """
    This function defines the hyperparameter search space for Random Search,
    optimized for a 5GB GPU memory limit.
    """
    # 1. Learning Rate (log-uniform distribution is common for learning rates)
    learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range

    # 2. Batch Size (CRITICAL FOR 5GB GPU)
    # We must be very conservative here. Batch size 32 is likely too large.
    # We will mainly stick to 8 or 16, and consider gradient_accumulation_steps.
    per_device_train_batch_size = trial.suggest_categorical("per_device_train_batch_size", [8, 16])

    # 3. Gradient Accumulation Steps (compensates for small batch sizes)
    # This effectively makes the 'effective' batch size larger without increasing VRAM.
    # effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
    gradient_accumulation_steps = trial.suggest_categorical("gradient_accumulation_steps", [1, 2, 4]) # Try accumulating gradients

    # 4. Weight Decay (uniform float distribution)
    weight_decay = trial.suggest_float("weight_decay", 0.0, 0.1, step=0.01)

    # 5. Number of Training Epochs (keep low for faster trials)
    num_train_epochs = trial.suggest_int("num_train_epochs", 2, 3) # Max 3 epochs

    return {
        "learning_rate": learning_rate,
        "per_device_train_batch_size": per_device_train_batch_size,
        "gradient_accumulation_steps": gradient_accumulation_steps, # New HP
        "weight_decay": weight_decay,
        "num_train_epochs": num_train_epochs,
    }


# --- 4. TRAINING ARGUMENTS (Fixed for all runs) ---
training_args = TrainingArguments(
    output_dir="./random_search_results_mental_health_5gb", # New output directory
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",
    fp16=torch.cuda.is_available(), # Enable mixed precision for T4 GPU
    report_to="none",
    num_train_epochs=3, # Placeholder, will be suggested by random_hp_space
    warmup_steps=100,
    logging_dir="./logs_random_5gb", # New logging directory
    logging_steps=500,
    dataloader_num_workers=os.cpu_count() // 2 if os.cpu_count() else 0,
    # gradient_accumulation_steps will be passed directly from random_hp_space
)

# Initialize the Trainer
trainer = Trainer(
    model_init=model_init,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
)

# --- Define the objective function for Optuna ---
def optuna_hp_objective(metrics):
    """
    Optuna objective function that returns the F1 score for maximization.
    `metrics` is the dictionary returned by trainer.evaluate().
    """
    return metrics["eval_f1"]


# --- 5. EXECUTION OF RANDOM SEARCH ---
print("\n--- Starting Random Search (using Optuna backend) ---")
print(f"Optimizing for '{training_args.metric_for_best_model}' score...")

NUM_RANDOM_TRIALS = 30 # Aim for 30 trials to explore more thoroughly given the reduced epoch count

print(f"Number of random trials to run: {NUM_RANDOM_TRIALS}")
print("NOTE: Batch sizes are kept low and gradient accumulation is used to manage 5GB GPU memory.")
print("      Dataset size has also been reduced for quicker iteration.")

best_trial = trainer.hyperparameter_search(
    backend="optuna",
    hp_space=random_hp_space, # Use the new random_hp_space function
    direction="maximize",
    n_trials=NUM_RANDOM_TRIALS,
    compute_objective=optuna_hp_objective,
)

print("\n--- Random Search Complete ---")
print("\nBEST HYPERPARAMETERS FOUND:")

# Extract and print the best configuration
if best_trial:
    print(best_trial)
    best_hps = best_trial.hyperparameters
    print("\nBest Hyperparameters:")
    for key, value in best_hps.items():
        print(f"  {key}: {value}")
    print(f"\nBest Metrics (on evaluation set): {best_trial.metrics}")
else:
    print("Search failed or no best trial found.")

print("\n--- Final Step: Train a model with the best hyperparameters ---")
if best_trial:
    final_training_args = TrainingArguments(
        output_dir="./final_model_mental_health_5gb_random", # New output directory
        eval_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
        metric_for_best_model="f1",
        fp16=torch.cuda.is_available(),
        report_to="none",
        num_train_epochs=best_hps["num_train_epochs"],
        per_device_train_batch_size=best_hps["per_device_train_batch_size"],
        gradient_accumulation_steps=best_hps["gradient_accumulation_steps"], # Apply best G.A.S.
        learning_rate=best_hps["learning_rate"],
        weight_decay=best_hps["weight_decay"],
        warmup_steps=100,
        logging_dir="./final_logs_random_5gb", # New logging directory
        logging_steps=500,
        dataloader_num_workers=os.cpu_count() // 2 if os.cpu_count() else 0,
    )

    final_trainer = Trainer(
        model_init=model_init,
        args=final_training_args,
        train_dataset=tokenized_train,
        eval_dataset=tokenized_eval,
        compute_metrics=compute_metrics,
        tokenizer=tokenizer,
    )

    print("\nTraining final model with best hyperparameters from Random Search (5GB GPU config)...")
    final_trainer.train()

    print("\nFinal model training complete. Best model saved to './final_model_mental_health_5gb_random'.")
    metrics = final_trainer.evaluate()
    print(f"Evaluation metrics of the final model: {metrics}")
else:
    print("No best hyperparameters found, skipping final model training.")

Using GPU: Tesla T4
Dataset loaded successfully.
Using 5000 training samples and 1000 evaluation samples.
Train label distribution:
labels
1    0.5
0    0.5
Name: proportion, dtype: float64
Eval label distribution:
labels
0    0.507
1    0.493
Name: proportion, dtype: float64


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/394 [00:00<?, ?B/s]

vocab.json: 0.00B [00:00, ?B/s]

merges.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/280 [00:00<?, ?B/s]

Map:   0%|          | 0/5000 [00:00<?, ? examples/s]

Map:   0%|          | 0/1000 [00:00<?, ? examples/s]

  trainer = Trainer(


config.json:   0%|          | 0.00/886 [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/499M [00:00<?, ?B/s]

[I 2025-11-08 05:40:41,484] A new study created in memory with name: no-name-56889edb-2c9f-4f0d-b245-eb69a87ca40a



--- Starting Random Search (using Optuna backend) ---
Optimizing for 'f1' score...
Number of random trials to run: 30
NOTE: Batch sizes are kept low and gradient accumulation is used to manage 5GB GPU memory.
      Dataset size has also been reduced for quicker iteration.


  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range


model.safetensors:   0%|          | 0.00/499M [00:00<?, ?B/s]

Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.369737,0.842,0.848659,0.803993,0.89858
2,0.506800,0.319464,0.876,0.875752,0.865347,0.88641
3,0.506800,0.347308,0.879,0.877157,0.878049,0.876268


[I 2025-11-08 05:44:04,078] Trial 0 finished with value: 0.8771573604060914 and parameters: {'learning_rate': 1.8413192623997537e-05, 'per_device_train_batch_size': 8, 'gradient_accumulation_steps': 2, 'weight_decay': 0.05, 'num_train_epochs': 3}. Best is trial 0 with value: 0.8771573604060914.
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.5406,0.333594,0.838,0.854054,0.768233,0.96146
2,0.3338,0.473187,0.874,0.877907,0.840445,0.918864
3,0.2314,0.548604,0.889,0.888442,0.880478,0.896552


[I 2025-11-08 05:47:30,573] Trial 1 finished with value: 0.8884422110552764 and parameters: {'learning_rate': 3.948263170850873e-05, 'per_device_train_batch_size': 8, 'gradient_accumulation_steps': 1, 'weight_decay': 0.05, 'num_train_epochs': 3}. Best is trial 1 with value: 0.8884422110552764.
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.53479,0.715,0.751092,0.659509,0.872211
2,No log,0.372036,0.845,0.838036,0.864224,0.813387
3,No log,0.355703,0.854,0.85102,0.856263,0.845842


[I 2025-11-08 05:50:17,623] Trial 2 finished with value: 0.8510204081632653 and parameters: {'learning_rate': 1.2541545597472021e-05, 'per_device_train_batch_size': 8, 'gradient_accumulation_steps': 4, 'weight_decay': 0.03, 'num_train_epochs': 3}. Best is trial 1 with value: 0.8884422110552764.
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.442323,0.785,0.796209,0.747331,0.851927
2,0.563000,0.347993,0.855,0.85189,0.858025,0.845842
3,0.563000,0.345712,0.866,0.864097,0.864097,0.864097


[I 2025-11-08 05:53:14,813] Trial 3 finished with value: 0.8640973630831643 and parameters: {'learning_rate': 1.0641484830166137e-05, 'per_device_train_batch_size': 8, 'gradient_accumulation_steps': 2, 'weight_decay': 0.03, 'num_train_epochs': 3}. Best is trial 1 with value: 0.8884422110552764.
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.5595,0.318346,0.865,0.868035,0.837736,0.900609
2,0.3115,0.411721,0.88,0.878788,0.875252,0.882353


[I 2025-11-08 05:55:18,800] Trial 4 finished with value: 0.8787878787878788 and parameters: {'learning_rate': 2.9779069150641384e-05, 'per_device_train_batch_size': 8, 'gradient_accumulation_steps': 1, 'weight_decay': 0.03, 'num_train_epochs': 2}. Best is trial 1 with value: 0.8884422110552764.
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.366222,0.838,0.834016,0.84265,0.825558
2,No log,0.29847,0.879,0.875642,0.8875,0.864097
3,No log,0.311949,0.882,0.880081,0.881874,0.878296


[I 2025-11-08 06:00:20,764] Trial 5 finished with value: 0.8800813008130082 and parameters: {'learning_rate': 2.821703213663777e-05, 'per_device_train_batch_size': 8, 'gradient_accumulation_steps': 4, 'weight_decay': 0.04, 'num_train_epochs': 3}. Best is trial 1 with value: 0.8884422110552764.
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.5478,0.380837,0.853,0.85879,0.815693,0.906694


[I 2025-11-08 06:01:16,562] Trial 6 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.6266,0.393741,0.841,0.845481,0.811567,0.882353


[I 2025-11-08 06:02:12,334] Trial 7 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.38934,0.833,0.830112,0.832653,0.827586
2,No log,0.325721,0.863,0.860347,0.864754,0.855984


[I 2025-11-08 06:05:34,212] Trial 8 finished with value: 0.8603465851172273 and parameters: {'learning_rate': 2.4250730135498475e-05, 'per_device_train_batch_size': 8, 'gradient_accumulation_steps': 4, 'weight_decay': 0.1, 'num_train_epochs': 2}. Best is trial 1 with value: 0.8884422110552764.
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.313352,0.866,0.870656,0.830571,0.914807
2,0.461900,0.345976,0.864,0.869981,0.822785,0.922921
3,0.461900,0.389588,0.883,0.881218,0.882114,0.880325


[I 2025-11-08 06:10:51,562] Trial 9 finished with value: 0.8812182741116751 and parameters: {'learning_rate': 3.093096589324922e-05, 'per_device_train_batch_size': 8, 'gradient_accumulation_steps': 2, 'weight_decay': 0.02, 'num_train_epochs': 3}. Best is trial 1 with value: 0.8884422110552764.
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.322611,0.856,0.862857,0.813285,0.918864
2,0.456400,0.315092,0.876,0.878669,0.848771,0.910751
3,0.456400,0.44001,0.889,0.888218,0.882,0.894523


[I 2025-11-08 06:14:59,349] Trial 10 finished with value: 0.8882175226586103 and parameters: {'learning_rate': 4.422467653982048e-05, 'per_device_train_batch_size': 16, 'gradient_accumulation_steps': 1, 'weight_decay': 0.08, 'num_train_epochs': 3}. Best is trial 1 with value: 0.8884422110552764.
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.373076,0.844,0.854478,0.791019,0.929006


[I 2025-11-08 06:15:39,907] Trial 11 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.346961,0.841,0.852093,0.786942,0.929006


[I 2025-11-08 06:16:20,617] Trial 12 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.360725,0.851,0.861909,0.793515,0.943205
2,0.470400,0.296263,0.88,0.879032,0.873747,0.884381
3,0.470400,0.420023,0.89,0.88978,0.879208,0.900609


[I 2025-11-08 06:20:20,917] Trial 13 finished with value: 0.8897795591182365 and parameters: {'learning_rate': 3.688694822915927e-05, 'per_device_train_batch_size': 16, 'gradient_accumulation_steps': 1, 'weight_decay': 0.07, 'num_train_epochs': 3}. Best is trial 13 with value: 0.8897795591182365.
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.346551,0.861,0.86544,0.827778,0.906694
2,0.465900,0.304497,0.879,0.880079,0.860465,0.900609
3,0.465900,0.47491,0.871,0.871897,0.854086,0.890467


[I 2025-11-08 06:24:56,115] Trial 14 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.400768,0.838,0.839286,0.821359,0.858012


[I 2025-11-08 06:25:35,683] Trial 15 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.348331,0.851,0.856039,0.817343,0.89858


[I 2025-11-08 06:26:15,551] Trial 16 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.37963,0.847,0.847761,0.832031,0.864097


[I 2025-11-08 06:26:56,024] Trial 17 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.400228,0.818,0.820158,0.799615,0.841785


[I 2025-11-08 06:27:31,831] Trial 18 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.558752,0.709,0.755257,0.645115,0.910751
2,No log,0.328176,0.871,0.867147,0.880753,0.853955
3,No log,0.341609,0.865,0.859521,0.882479,0.837728


[I 2025-11-08 06:31:04,825] Trial 19 finished with value: 0.8595213319458896 and parameters: {'learning_rate': 3.384946391294378e-05, 'per_device_train_batch_size': 16, 'gradient_accumulation_steps': 4, 'weight_decay': 0.06, 'num_train_epochs': 3}. Best is trial 13 with value: 0.8897795591182365.
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.357397,0.849,0.860315,0.790816,0.943205


[I 2025-11-08 06:31:45,449] Trial 20 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.404344,0.81,0.834206,0.732006,0.969574


[I 2025-11-08 06:32:25,353] Trial 21 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.37107,0.829,0.846637,0.758842,0.957404


[I 2025-11-08 06:33:05,731] Trial 22 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.371237,0.853,0.857143,0.822761,0.894523


[I 2025-11-08 06:33:45,573] Trial 23 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.362912,0.849,0.855502,0.809783,0.906694


[I 2025-11-08 06:34:25,552] Trial 24 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.346218,0.862,0.865234,0.834275,0.89858
2,0.472200,0.297707,0.878,0.878,0.865878,0.890467
3,0.472200,0.442957,0.883,0.882883,0.871542,0.894523


[I 2025-11-08 06:38:37,145] Trial 25 finished with value: 0.8828828828828829 and parameters: {'learning_rate': 3.2432522321584426e-05, 'per_device_train_batch_size': 16, 'gradient_accumulation_steps': 1, 'weight_decay': 0.05, 'num_train_epochs': 3}. Best is trial 13 with value: 0.8897795591182365.
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.5533,0.360598,0.84,0.855072,0.772504,0.957404


[I 2025-11-08 06:39:32,698] Trial 26 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.380272,0.821,0.842012,0.745313,0.967546


[I 2025-11-08 06:40:12,317] Trial 27 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.336141,0.861,0.869238,0.810526,0.93712
2,0.453500,0.339208,0.868,0.874763,0.821747,0.935091


[I 2025-11-08 06:42:40,616] Trial 28 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.424501,0.805,0.805583,0.792157,0.819473


[I 2025-11-08 06:43:23,432] Trial 29 pruned. 



--- Random Search Complete ---

BEST HYPERPARAMETERS FOUND:
BestRun(run_id='13', objective=0.8897795591182365, hyperparameters={'learning_rate': 3.688694822915927e-05, 'per_device_train_batch_size': 16, 'gradient_accumulation_steps': 1, 'weight_decay': 0.07, 'num_train_epochs': 3}, run_summary=None)

Best Hyperparameters:
  learning_rate: 3.688694822915927e-05
  per_device_train_batch_size: 16
  gradient_accumulation_steps: 1
  weight_decay: 0.07
  num_train_epochs: 3


AttributeError: 'BestRun' object has no attribute 'metrics'