This command installs and updates several key Python libraries used for natural language processing and model optimization. The exclamation mark ! allows running shell commands directly from environments like Jupyter Notebook. The pip install command fetches and installs packages: transformers provides pre-trained models and training utilities from Hugging Face, datasets handles efficient dataset loading and preprocessing, accelerate optimizes multi-GPU and distributed training, ray[tune] enables scalable hyperparameter tuning, and optuna offers an alternative, efficient framework for automated hyperparameter optimization. The -U flag ensures all packages are upgraded to their latest compatible versions.

In [1]:
!pip install transformers datasets accelerate ray[tune] optuna -U

Collecting datasets
  Downloading datasets-4.4.1-py3-none-any.whl.metadata (19 kB)
Collecting optuna
  Downloading optuna-4.5.0-py3-none-any.whl.metadata (17 kB)
Collecting ray[tune]
  Downloading ray-2.51.1-cp312-cp312-manylinux2014_x86_64.whl.metadata (21 kB)
Collecting pyarrow>=21.0.0 (from datasets)
  Downloading pyarrow-22.0.0-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (3.2 kB)
Collecting click!=8.3.0,>=7.0 (from ray[tune])
  Downloading click-8.2.1-py3-none-any.whl.metadata (2.5 kB)
Collecting tensorboardX>=1.9 (from ray[tune])
  Downloading tensorboardx-2.6.4-py3-none-any.whl.metadata (6.2 kB)
Collecting colorlog (from optuna)
  Downloading colorlog-6.10.1-py3-none-any.whl.metadata (11 kB)
Downloading datasets-4.4.1-py3-none-any.whl (511 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m511.6/511.6 kB[0m [31m20.0 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading optuna-4.5.0-py3-none-any.whl (400 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [

**Function Description**

This script conducts random hyperparameter search and fine-tuning for a binary RoBERTa-based text classifier on a mental-health Twitter dataset. It prepares and tokenizes the data, defines a model initialization routine and evaluation metrics, uses Optuna via the Hugging Face Trainer to sample hyperparameters from continuous and categorical distributions, and then retrains a final model using the best hyperparameters discovered by the random search.

**Syntax Explanation**

The code uses Hugging Face transformers and datasets together with scikit-learn metrics and Optuna for hyperparameter sampling. The AutoTokenizer and AutoModelForSequenceClassification classes are used to load model-specific tokenization and weights from the chosen pretrained checkpoint. The dataset is converted from pandas to a Hugging Face Dataset and tokenized with truncation, padding, and max_length=128. The Trainer is configured with model_init so every trial starts from a fresh pretrained model, compute_metrics computes accuracy, F1, precision, and recall using numpy and scikit-learn, and trainer.hyperparameter_search runs a specified number of random trials where Optuna suggests values using functions such as suggest_loguniform, suggest_categorical, suggest_float, and suggest_int.

**Inputs**

The primary input is the CSV file Mental-Health-Twitter.csv, which must contain post_text and label columns; the script renames the label column to labels for Trainer compatibility. The data is cleaned to remove missing or empty post_text entries, then split stratified by label into training and evaluation sets, with the training set sampled to 10,000 rows and the evaluation set sampled to 2,000 rows. The pretrained model identifier margotwagner/roberta-psychotherapy-eval is supplied to initialize tokenizer and model weights.

**Outputs**

The script prints device information, dataset sizes, label distributions, and the number of random trials to run. During the search, per-trial evaluation results are logged through Trainer and Optuna. After the search completes, the best trial and its hyperparameters are printed, a final model is trained with those hyperparameters, and the final evaluation metrics are shown. The final model checkpoint is saved under ./final_model_mental_health_random and the evaluation metrics include loss, accuracy, F1-score, precision, and recall.

**Code Flow**

The script begins by importing libraries, setting a reproducible seed, and selecting the compute device. It loads the CSV dataset, removes missing or empty post_text, renames the label column to labels, and performs a stratified train validation split, followed by sampling to 10,000 training and 2,000 evaluation examples. The DataFrames are converted to Hugging Face Dataset objects and tokenized with truncation, padding, and a max length of 128 tokens, then formatted as PyTorch tensors. A model_init function is defined to instantiate a fresh AutoModelForSequenceClassification for each trial, and compute_metrics is defined to compute accuracy, F1, precision, and recall using predicted class indices. The random_hp_space function defines the random search space using Optuna suggestion methods, sampling learning rate from a log uniform range, selecting batch sizes from a categorical set, sampling weight decay from a float range, and sampling number of epochs as an integer. TrainingArguments sets common training behaviors, and the Trainer is created with model_init, datasets, tokenizer, and metric function. trainer.hyperparameter_search runs NUM_RANDOM_TRIALS random trials with the Optuna backend and maximizes eval_f1 via the provided objective function. If a best trial is found, the script reconstitutes training arguments with the chosen hyperparameters, reinitializes a Trainer, trains the final model, evaluates it, and saves the final checkpoint.

**Comments and Observations**

The use of Optuna for random search allows efficient exploration of continuous and discrete hyperparameter spaces, and suggest_loguniform for learning rate is appropriate because learning rates often span orders of magnitude. Setting model_init ensures trial independence, and optimizing for F1 is suitable for binary labels that may be imbalanced. The chosen sample sizes and max_length=128 are well matched to short social posts, but confirm token length distribution to avoid unnecessary truncation. The code assumes the pandas index column has the name __index_level_0__ when removing columns; verify this to prevent errors. Consider adding Optuna pruning callbacks or using early stopping to abort low-performing trials early and save compute. If GPU memory becomes limiting, reduce per-device batch size, enable gradient accumulation to simulate larger batches, or lower max_length. Finally, validate that labels contains only 0 and 1 and inspect class balance so metrics are interpretable and the final model is reliable.

In [2]:
# 1. SETUP AND INSTALLATION
# Run this command first in your Colab notebook:
# !pip install transformers datasets accelerate ray[tune] optuna pandas -U

import torch
import os
import numpy as np
import pandas as pd
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
    set_seed
)
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
import optuna # Import optuna to use its suggestion methods for random search

# Set a consistent seed for reproducibility across runs
set_seed(42)

# Ensure GPU is available
if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    device = torch.device("cpu")
    print("Using CPU. For faster training, consider enabling a GPU runtime.")

# --- 2. DATA PREPARATION (Using your Mental-Health-Twitter.csv) ---

# Load your dataset
try:
    df = pd.read_csv("Mental-Health-Twitter.csv")
    print("Dataset loaded successfully.")
except FileNotFoundError:
    print("Error: 'Mental-Health-Twitter.csv' not found. Please upload it to your Colab environment.")
    exit()

# Filter out rows where 'post_text' is NaN or empty
df = df.dropna(subset=['post_text'])
df = df[df['post_text'].str.strip() != '']

# Rename 'label' to 'labels' for Hugging Face Trainer compatibility
df = df.rename(columns={"label": "labels"})

# Split data into training and validation sets
train_df, eval_df = train_test_split(df, test_size=0.1, stratify=df['labels'], random_state=42)
train_df = train_df.sample(n=10000, random_state=42) # Limit to 10k training samples
eval_df = eval_df.sample(n=2000, random_state=42)   # Limit to 2k evaluation samples

print(f"Using {len(train_df)} training samples and {len(eval_df)} evaluation samples.")
print(f"Train label distribution:\n{train_df['labels'].value_counts(normalize=True)}")
print(f"Eval label distribution:\n{eval_df['labels'].value_counts(normalize=True)}")

# Convert pandas DataFrames to Hugging Face Dataset objects
train_dataset = Dataset.from_pandas(train_df[['post_text', 'labels']])
eval_dataset = Dataset.from_pandas(eval_df[['post_text', 'labels']])

# Initialize Tokenizer for your specific model
MODEL_NAME = "margotwagner/roberta-psychotherapy-eval"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize_function(examples):
    return tokenizer(examples["post_text"], truncation=True, padding=True, max_length=128)

# Apply tokenization
tokenized_train = train_dataset.map(tokenize_function, batched=True, remove_columns=["post_text", "__index_level_0__"])
tokenized_eval = eval_dataset.map(tokenize_function, batched=True, remove_columns=["post_text", "__index_level_0__"])

# Set format to PyTorch tensors
tokenized_train.set_format("torch", columns=['input_ids', 'attention_mask', 'labels'])
tokenized_eval.set_format("torch", columns=['input_ids', 'attention_mask', 'labels'])


# --- 3. MODEL, METRICS, AND HYPERPARAMETER DEFINITION ---

# Function to initialize a fresh model for each grid search run
def model_init():
    return AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2).to(device)

def compute_metrics(p):
    preds = np.argmax(p.predictions, axis=1)
    acc = accuracy_score(p.label_ids, preds)
    f1 = f1_score(p.label_ids, preds, average="binary")
    precision = precision_score(p.label_ids, preds, average="binary")
    recall = recall_score(p.label_ids, preds, average="binary")
    return {"accuracy": acc, "f1": f1, "precision": precision, "recall": recall}

# --- HYPERPARAMETER SEARCH SPACE FOR RANDOM SEARCH ---
def random_hp_space(trial):
    """
    This function defines the hyperparameter search space for Random Search.
    Optuna's suggestion methods (like suggest_loguniform, suggest_float) are
    inherently suitable for random sampling from a distribution.
    """
    # 1. Learning Rate (log-uniform distribution is common for learning rates)
    learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range than grid search

    # 2. Batch Size (categorical, as before)
    per_device_train_batch_size = trial.suggest_categorical("per_device_train_batch_size", [16, 32])

    # 3. Weight Decay (uniform float distribution)
    weight_decay = trial.suggest_float("weight_decay", 0.0, 0.1, step=0.01) # Broader range

    # 4. Number of Training Epochs (integer uniform distribution)
    num_train_epochs = trial.suggest_int("num_train_epochs", 2, 4) # Allow 2, 3, or 4 epochs

    # --- EXPANSION SUPPORT: Add more parameters here if needed ---
    # Example: gradient_accumulation_steps = trial.suggest_categorical("gradient_accumulation_steps", [1, 2, 4])

    return {
        "learning_rate": learning_rate,
        "per_device_train_batch_size": per_device_train_batch_size,
        "weight_decay": weight_decay,
        "num_train_epochs": num_train_epochs,
    }


# --- 4. TRAINING ARGUMENTS (Fixed for all runs) ---
training_args = TrainingArguments(
    output_dir="./random_search_results_mental_health",
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",
    fp16=torch.cuda.is_available(),
    report_to="none",
    num_train_epochs=3, # Placeholder, will be suggested by random_hp_space
    warmup_steps=100,
    logging_dir="./logs_random",
    logging_steps=500,
    dataloader_num_workers=os.cpu_count() // 2 if os.cpu_count() else 0,
)

# Initialize the Trainer
trainer = Trainer(
    model_init=model_init,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
)

# --- Define the objective function for Optuna ---
def optuna_hp_objective(metrics):
    """
    Optuna objective function that returns the F1 score for maximization.
    `metrics` is the dictionary returned by trainer.evaluate().
    """
    return metrics["eval_f1"]


# --- 5. EXECUTION OF RANDOM SEARCH ---
print("\n--- Starting Random Search (using Optuna backend) ---")
print(f"Optimizing for '{training_args.metric_for_best_model}' score...")

# For random search, we don't calculate total_trials from a grid.
# Instead, we define 'n_trials' (the number of random samples to try).
# Let's aim for 20-30 trials to have a good chance of finding a strong configuration.
# You can adjust this based on your remaining time and observed trial duration.
NUM_RANDOM_TRIALS = 25 # Increased to 25 to ensure more than 10 experiments

print(f"Number of random trials to run: {NUM_RANDOM_TRIALS}")

best_trial = trainer.hyperparameter_search(
    backend="optuna",
    hp_space=random_hp_space, # <--- Use the new random_hp_space function
    direction="maximize",
    n_trials=NUM_RANDOM_TRIALS, # <--- Specify the number of random trials
    compute_objective=optuna_hp_objective,
)

print("\n--- Random Search Complete ---")
print("\nBEST HYPERPARAMETERS FOUND:")

# Extract and print the best configuration
if best_trial:
    print(best_trial)
    best_hps = best_trial.hyperparameters
    print("\nBest Hyperparameters:")
    for key, value in best_hps.items():
        print(f"  {key}: {value}")
    print(f"\nBest Metrics (on evaluation set): {best_trial.metrics}")
else:
    print("Search failed or no best trial found.")

print("\n--- Final Step: Train a model with the best hyperparameters ---")
if best_trial:
    final_training_args = TrainingArguments(
        output_dir="./final_model_mental_health_random", # New output directory
        eval_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
        metric_for_best_model="f1",
        fp16=torch.cuda.is_available(),
        report_to="none",
        num_train_epochs=best_hps["num_train_epochs"],
        per_device_train_batch_size=best_hps["per_device_train_batch_size"],
        learning_rate=best_hps["learning_rate"],
        weight_decay=best_hps["weight_decay"],
        warmup_steps=100,
        logging_dir="./final_logs_random", # New logging directory
        logging_steps=500,
        dataloader_num_workers=os.cpu_count() // 2 if os.cpu_count() else 0,
    )

    final_trainer = Trainer(
        model_init=model_init,
        args=final_training_args,
        train_dataset=tokenized_train,
        eval_dataset=tokenized_eval,
        compute_metrics=compute_metrics,
        tokenizer=tokenizer,
    )

    print("\nTraining final model with best hyperparameters from Random Search...")
    final_trainer.train()

    print("\nFinal model training complete. Best model saved to './final_model_mental_health_random'.")
    metrics = final_trainer.evaluate()
    print(f"Evaluation metrics of the final model: {metrics}")
else:
    print("No best hyperparameters found, skipping final model training.")

Using GPU: Tesla T4
Dataset loaded successfully.
Using 10000 training samples and 2000 evaluation samples.
Train label distribution:
labels
1    0.5004
0    0.4996
Name: proportion, dtype: float64
Eval label distribution:
labels
1    0.5
0    0.5
Name: proportion, dtype: float64


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/394 [00:00<?, ?B/s]

vocab.json: 0.00B [00:00, ?B/s]

merges.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/280 [00:00<?, ?B/s]

Map:   0%|          | 0/10000 [00:00<?, ? examples/s]

Map:   0%|          | 0/2000 [00:00<?, ? examples/s]

  trainer = Trainer(


config.json:   0%|          | 0.00/886 [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/499M [00:00<?, ?B/s]

[I 2025-11-08 06:11:44,145] A new study created in memory with name: no-name-de637ebb-4101-4974-a201-6ab7bda08e05



--- Starting Random Search (using Optuna backend) ---
Optimizing for 'f1' score...
Number of random trials to run: 25


  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range than grid search


model.safetensors:   0%|          | 0.00/499M [00:00<?, ?B/s]

Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.5453,0.300274,0.8615,0.860734,0.865521,0.856
2,0.2752,0.31375,0.882,0.886429,0.85436,0.921
3,0.2182,0.309616,0.893,0.895406,0.875717,0.916
4,0.146,0.340498,0.8955,0.895028,0.899092,0.891


[I 2025-11-08 06:17:59,337] Trial 0 finished with value: 0.8950276243093923 and parameters: {'learning_rate': 1.2306019678954347e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.02, 'num_train_epochs': 4}. Best is trial 0 with value: 0.8950276243093923.
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range than grid search


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.35584,0.831,0.806636,0.942513,0.705
2,0.439300,0.27715,0.885,0.888674,0.861163,0.918
3,0.439300,0.269861,0.8885,0.886514,0.902591,0.871


[I 2025-11-08 06:21:52,110] Trial 1 finished with value: 0.8865139949109415 and parameters: {'learning_rate': 2.085755051153791e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.08, 'num_train_epochs': 3}. Best is trial 0 with value: 0.8950276243093923.
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range than grid search


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.359297,0.8375,0.825737,0.890173,0.77
2,0.498300,0.287234,0.881,0.882178,0.873529,0.891
3,0.498300,0.278854,0.8875,0.886991,0.891019,0.883


[I 2025-11-08 06:25:39,859] Trial 2 finished with value: 0.886991461577097 and parameters: {'learning_rate': 1.1507702448193873e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.07, 'num_train_epochs': 3}. Best is trial 0 with value: 0.8950276243093923.
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range than grid search


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.364996,0.8275,0.80161,0.943166,0.697
2,0.406500,0.275732,0.888,0.886294,0.9,0.873


[I 2025-11-08 06:28:31,445] Trial 3 finished with value: 0.8862944162436548 and parameters: {'learning_rate': 3.148619670335291e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.04, 'num_train_epochs': 2}. Best is trial 0 with value: 0.8950276243093923.
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range than grid search


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4724,0.267382,0.88,0.88743,0.835689,0.946
2,0.2447,0.357468,0.901,0.903977,0.877589,0.932
3,0.1783,0.352292,0.9155,0.91744,0.896848,0.939
4,0.0707,0.494911,0.914,0.914086,0.913174,0.915


[I 2025-11-08 06:34:45,441] Trial 4 finished with value: 0.9140859140859141 and parameters: {'learning_rate': 3.6803536470922976e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.01, 'num_train_epochs': 4}. Best is trial 4 with value: 0.9140859140859141.
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range than grid search


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.32822,0.844,0.832078,0.900932,0.773
2,0.459500,0.275184,0.8805,0.879476,0.88708,0.872


[I 2025-11-08 06:37:13,772] Trial 5 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range than grid search


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4971,0.285631,0.878,0.881783,0.855263,0.91
2,0.2535,0.327059,0.8975,0.903165,0.855864,0.956
3,0.1798,0.348883,0.913,0.915287,0.891841,0.94
4,0.0996,0.458483,0.9115,0.911721,0.909453,0.914


[I 2025-11-08 06:43:31,983] Trial 6 finished with value: 0.9117206982543641 and parameters: {'learning_rate': 2.188915039854356e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.07, 'num_train_epochs': 4}. Best is trial 4 with value: 0.9140859140859141.
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range than grid search


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4829,0.252847,0.891,0.892505,0.88035,0.905
2,0.2337,0.283644,0.8995,0.898741,0.905584,0.892


[I 2025-11-08 06:46:34,814] Trial 7 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range than grid search


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4822,0.268342,0.887,0.892586,0.850543,0.939
2,0.2489,0.329318,0.8985,0.903195,0.863263,0.947
3,0.1723,0.340859,0.911,0.911178,0.909363,0.913


[I 2025-11-08 06:51:12,304] Trial 8 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range than grid search


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4846,0.26465,0.888,0.892617,0.857274,0.931
2,0.2367,0.311695,0.8975,0.902613,0.859729,0.95


[I 2025-11-08 06:54:12,024] Trial 9 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range than grid search


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4818,0.295238,0.8715,0.880853,0.821089,0.95


[I 2025-11-08 06:55:33,225] Trial 10 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range than grid search


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.476,0.315018,0.872,0.881481,0.82069,0.952


[I 2025-11-08 06:56:54,811] Trial 11 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range than grid search


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.5136,0.269296,0.8785,0.88,0.869268,0.891


[I 2025-11-08 06:58:16,541] Trial 12 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range than grid search


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4768,0.282819,0.877,0.884724,0.832451,0.944
2,0.2515,0.303051,0.907,0.909445,0.886148,0.934
3,0.1673,0.38291,0.914,0.916505,0.890566,0.944
4,0.0659,0.481456,0.9175,0.917376,0.918756,0.916


[I 2025-11-08 07:07:04,989] Trial 13 finished with value: 0.9173760640961443 and parameters: {'learning_rate': 3.727957465978101e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.05, 'num_train_epochs': 4}. Best is trial 13 with value: 0.9173760640961443.
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range than grid search


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4771,0.27511,0.8905,0.895065,0.859246,0.934
2,0.2478,0.315965,0.897,0.903468,0.850088,0.964


[I 2025-11-08 07:10:42,523] Trial 14 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range than grid search


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.477,0.258333,0.8905,0.895265,0.857929,0.936
2,0.2449,0.339422,0.9045,0.908393,0.872811,0.947
3,0.1683,0.355902,0.913,0.91326,0.910537,0.916


[I 2025-11-08 07:16:45,348] Trial 15 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range than grid search


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4766,0.256636,0.887,0.88834,0.87793,0.899
2,0.2556,0.317628,0.901,0.903883,0.878302,0.931
3,0.17,0.332944,0.9095,0.911664,0.890372,0.934


[I 2025-11-08 07:23:36,067] Trial 16 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range than grid search


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4792,0.273928,0.877,0.886006,0.825561,0.956
2,0.2544,0.330211,0.8965,0.900337,0.868152,0.935


[I 2025-11-08 07:27:04,853] Trial 17 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range than grid search


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.348641,0.839,0.818079,0.94026,0.724
2,0.425200,0.281543,0.8895,0.894712,0.854413,0.939
3,0.425200,0.268321,0.9055,0.905547,0.905095,0.906
4,0.165700,0.317772,0.906,0.906375,0.902778,0.91


[I 2025-11-08 07:34:19,696] Trial 18 finished with value: 0.9063745019920318 and parameters: {'learning_rate': 2.4728767675598835e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.05, 'num_train_epochs': 4}. Best is trial 13 with value: 0.9173760640961443.
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range than grid search


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4706,0.309201,0.8765,0.885489,0.825411,0.955
2,0.244,0.292771,0.9025,0.904924,0.882969,0.928
3,0.1634,0.355429,0.9055,0.904594,0.913354,0.896


[I 2025-11-08 07:40:44,768] Trial 19 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range than grid search


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4792,0.258464,0.8855,0.891623,0.846361,0.942
2,0.2393,0.309675,0.9075,0.911441,0.874197,0.952
3,0.1628,0.373921,0.905,0.904234,0.911585,0.897


[I 2025-11-08 07:45:36,686] Trial 20 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range than grid search


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4987,0.281101,0.8785,0.885862,0.835252,0.943
2,0.249,0.326917,0.8935,0.900141,0.847308,0.96


[I 2025-11-08 07:49:01,259] Trial 21 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range than grid search


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.5173,0.276021,0.8765,0.880387,0.853521,0.909


[I 2025-11-08 07:50:22,791] Trial 22 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range than grid search


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.5362,0.292488,0.868,0.867735,0.869478,0.866


[I 2025-11-08 07:51:43,964] Trial 23 pruned. 
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5) # Broader range than grid search


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4922,0.27213,0.8815,0.88826,0.840321,0.942
2,0.2466,0.319797,0.9025,0.904924,0.882969,0.928
3,0.1746,0.334233,0.911,0.912745,0.895192,0.931


[I 2025-11-08 07:57:41,152] Trial 24 pruned. 



--- Random Search Complete ---

BEST HYPERPARAMETERS FOUND:
BestRun(run_id='13', objective=0.9173760640961443, hyperparameters={'learning_rate': 3.727957465978101e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.05, 'num_train_epochs': 4}, run_summary=None)

Best Hyperparameters:
  learning_rate: 3.727957465978101e-05
  per_device_train_batch_size: 16
  weight_decay: 0.05
  num_train_epochs: 4


AttributeError: 'BestRun' object has no attribute 'metrics'