## **Install Transformers**
This command installs the Python libraries needed to fine-tune NLP models and tune their hyperparameters. The leading exclamation mark runs it as a shell command from a notebook such as Google Colab. transformers provides pre-trained models such as BERT and RoBERTa for tasks like text classification and sentiment analysis, while datasets helps with loading and managing NLP datasets. accelerate supports training across multiple devices and with mixed precision. ray[tune] and optuna are hyperparameter optimization tools; this notebook uses only the Optuna backend. Finally, the -U flag upgrades the listed packages to their latest available versions.


!pip install transformers datasets accelerate ray[tune] optuna -U



## **Setup and Installation**
```
import torch
import os
import numpy as np
import pandas as pd
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
    set_seed
)
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
import optuna # Import optuna to use its suggestion methods for random search

# Set a consistent seed for reproducibility across runs
set_seed(42)

# Ensure GPU is available
if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    device = torch.device("cpu")
    print("Using CPU. For faster training, consider enabling a GPU runtime.")
```
This code sets up the environment for training a text classification model. It first imports the necessary libraries: torch for deep learning, os for querying the CPU count, numpy and pandas for data handling, datasets for managing NLP datasets, transformers for working with pre-trained models, sklearn for data splitting and evaluation metrics, and optuna for hyperparameter optimization. The set_seed(42) call seeds the Python, NumPy, and PyTorch random number generators so that runs are as reproducible as possible, although some GPU operations can still introduce small run-to-run differences. The code then checks whether a GPU is available for faster training; if not, it falls back to the CPU, which is slower but still works.

-----
## **Data Preparation**
```
try:
    df = pd.read_csv("Mental-Health-Twitter.csv")
    print("Dataset loaded successfully.")
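except FileNotFoundError as err:
    # Re-raise with a clear message instead of calling exit(), which can terminate the notebook kernel
    raise FileNotFoundError(
        "'Mental-Health-Twitter.csv' not found. Please upload it to your Colab environment."
    ) from err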

# Filter out rows where 'post_text' is NaN or empty
df = df.dropna(subset=['post_text'])
df = df[df['post_text'].str.strip() != '']

# Rename 'label' to 'labels' for Hugging Face Trainer compatibility
df = df.rename(columns={"label": "labels"})

# Split data into training and validation sets
train_df, eval_df = train_test_split(df, test_size=0.1, stratify=df['labels'], random_state=42)
train_df = train_df.sample(n=5000, random_state=42)
eval_df = eval_df.sample(n=1000, random_state=42)

print(f"Using {len(train_df)} training samples and {len(eval_df)} evaluation samples.")
print(f"Train label distribution:\n{train_df['labels'].value_counts(normalize=True)}")
print(f"Eval label distribution:\n{eval_df['labels'].value_counts(normalize=True)}")

# Convert pandas DataFrames to Hugging Face Dataset objects
train_dataset = Dataset.from_pandas(train_df[['post_text', 'labels']])
eval_dataset = Dataset.from_pandas(eval_df[['post_text', 'labels']])

# Initialize Tokenizer for your specific model
MODEL_NAME = "margotwagner/roberta-psychotherapy-eval"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize_function(examples):
    return tokenizer(examples["post_text"], truncation=True, padding=True, max_length=128)

# Apply tokenization
tokenized_train = train_dataset.map(tokenize_function, batched=True, remove_columns=["post_text", "__index_level_0__"])
tokenized_eval = eval_dataset.map(tokenize_function, batched=True, remove_columns=["post_text", "__index_level_0__"])

tokenized_train.set_format("torch", columns=['input_ids', 'attention_mask', 'labels'])
tokenized_eval.set_format("torch", columns=['input_ids', 'attention_mask', 'labels'])
```
This code loads and prepares the dataset for model training. It first attempts to read the CSV file Mental-Health-Twitter.csv and stops with an error message if the file is not found. Rows with missing or empty text in the post_text column are removed, and the target column is renamed to labels, the column name the Hugging Face Trainer expects. The dataset is split into training (90%) and evaluation (10%) sets, stratified by label, and random subsets of 5,000 training samples and 1,000 evaluation samples are drawn to reduce computation time. The label distribution of both subsets is printed so that class balance can be checked.
Next, the training and evaluation data are converted into Hugging Face Dataset objects. The tokenizer that belongs to the pre-trained RoBERTa checkpoint is loaded, and tokenize_function converts text into token IDs and attention masks. Sequences are truncated to at most 128 tokens; because padding=True is used, each batch is padded to its longest sequence rather than always to 128. Finally, the tokenized datasets are formatted for PyTorch, keeping only the columns needed for model training: input_ids, attention_mask, and labels.
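
To make the truncation and padding behaviour concrete, the sketch below mimics it on plain Python lists. It is a toy illustration rather than the tokenizer's implementation: the helper name pad_and_truncate, the pad ID of 1, and the example token IDs are assumptions chosen for the demo, and unlike a real tokenizer it does not preserve the end-of-sequence token when truncating.

```python
def pad_and_truncate(batch, max_length, pad_id=1):
    """Toy version of truncation=True, padding=True on lists of token IDs."""
    # Truncate every sequence to at most max_length tokens.
    truncated = [seq[:max_length] for seq in batch]
    # padding=True pads to the longest sequence in the batch, not to max_length.
    target = max(len(seq) for seq in truncated)
    input_ids = [seq + [pad_id] * (target - len(seq)) for seq in truncated]
    # The attention mask is 1 for real tokens and 0 for padding.
    attention_mask = [[1] * len(seq) + [0] * (target - len(seq)) for seq in truncated]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Hypothetical token IDs: one short and one long sequence.
example = pad_and_truncate([[0, 11, 12, 2], [0, 21, 22, 23, 24, 25, 2]], max_length=5)
print(example["input_ids"])
print(example["attention_mask"])
```

The long sequence is cut to five tokens and the short one receives a single pad token, so both rows end up with the same length.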

-----
## **Model, Metrics, and Hyperparameters**
```
def model_init():
    return AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2).to(device)

def compute_metrics(p):
    preds = np.argmax(p.predictions, axis=1)
    acc = accuracy_score(p.label_ids, preds)
    f1 = f1_score(p.label_ids, preds, average="binary")
    precision = precision_score(p.label_ids, preds, average="binary")
    recall = recall_score(p.label_ids, preds, average="binary")
    return {"accuracy": acc, "f1": f1, "precision": precision, "recall": recall}

def random_hp_space(trial):
    # suggest_float(..., log=True) replaces the deprecated suggest_loguniform
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 5e-5, log=True)
    per_device_train_batch_size = trial.suggest_categorical("per_device_train_batch_size", [8, 16])
    gradient_accumulation_steps = trial.suggest_categorical("gradient_accumulation_steps", [1, 2, 4])
    weight_decay = trial.suggest_float("weight_decay", 0.0, 0.1, step=0.01)
    num_train_epochs = trial.suggest_int("num_train_epochs", 2, 3)
    return {
        "learning_rate": learning_rate,
        "per_device_train_batch_size": per_device_train_batch_size,
        "gradient_accumulation_steps": gradient_accumulation_steps,
        "weight_decay": weight_decay,
        "num_train_epochs": num_train_epochs,
    }
```
This code defines the model, the evaluation metrics, and the hyperparameter search space. The model_init() function loads a pre-trained RoBERTa model for sequence classification with two output labels and moves it to the selected device (GPU or CPU). The compute_metrics() function takes the class with the highest logit as the prediction and compares it with the true labels to compute accuracy, F1-score, precision, and recall; with average="binary", label 1 is treated as the positive class. The random_hp_space() function defines the search space: a log-uniform learning rate between 1e-5 and 5e-5, a per-device batch size of 8 or 16, 1, 2, or 4 gradient accumulation steps, a weight decay between 0.0 and 0.1 in steps of 0.01, and 2 or 3 training epochs. Combinations drawn from this space are tried during the search to find the one with the best evaluation score.
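
As a worked example of what these metrics measure, the sketch below computes them by hand from true and predicted labels, treating label 1 as the positive class. The function name binary_metrics and the toy label lists are assumptions made for illustration; the notebook itself relies on the sklearn functions.

```python
def binary_metrics(labels, preds):
    """Accuracy, precision, recall and F1 with label 1 as the positive class."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # true positives
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false positives
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # false negatives
    accuracy = sum(1 for y, p in pairs if y == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "f1": f1, "precision": precision, "recall": recall}


# Toy data: 3 true positives, 2 false positives, 1 false negative.
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_preds = [1, 1, 0, 0, 1, 1, 1, 0]
toy_metrics = binary_metrics(toy_labels, toy_preds)
print(toy_metrics)
```

Here precision is 3/5 and recall is 3/4, so F1 works out to 2/3, which sits between the two and closer to the lower value.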

-----
## **Training Arguments**
```
training_args = TrainingArguments(
    output_dir="./random_search_results_mental_health_5gb",
    eval_strategy="epoch",
    save_strategy="epoch",
    save_total_limit=1,
    load_best_model_at_end=True,
    metric_for_best_model="f1",
    fp16=torch.cuda.is_available(),
    report_to="none",
    num_train_epochs=3,
    warmup_steps=100,
    logging_dir="./logs_random_5gb",
    logging_steps=200,
    dataloader_num_workers=os.cpu_count() // 2 if os.cpu_count() else 0,
    adam_epsilon=1e-7,
)

trainer = Trainer(
    model_init=model_init,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
)

def optuna_hp_objective(metrics):
    return metrics["eval_f1"]
```
This code sets up the training configuration and the trainer. TrainingArguments defines how the model is trained and evaluated. Checkpoints and logs are written to the designated folders, and the model is evaluated and saved at the end of each epoch. With save_total_limit=1 and load_best_model_at_end=True, the checkpoint with the best F1-score is always retained (alongside the most recent one if they differ) and reloaded when training finishes. Mixed-precision training (fp16) is enabled if a GPU is available. Other settings include the number of training epochs, warmup steps, logging frequency, number of CPU workers for data loading, and the Adam optimizer's epsilon value. During the search, the values suggested for each trial, such as num_train_epochs, override the fixed ones given here. Because logging_steps=200, an epoch that ends before 200 optimizer steps have been taken reports "No log" for its training loss. The Trainer class from Hugging Face is then initialized with model_init (so that every trial starts from a fresh copy of the pre-trained model), the training arguments, the tokenized datasets, the metrics function, and the tokenizer. Finally, the optuna_hp_objective() function returns the evaluation F1-score, which is the value the search maximizes.
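
The interaction between batch size, gradient accumulation, and logging_steps can be sketched with a little arithmetic. The helper below is an approximation under stated assumptions: a single GPU, 5,000 training samples, and floor rounding of partial accumulation groups (the exact rounding may differ between transformers versions). The function name optimizer_steps_per_epoch is hypothetical.

```python
import math


def optimizer_steps_per_epoch(num_samples, per_device_batch_size, grad_accum_steps):
    """Approximate number of optimizer updates in one epoch on a single device."""
    # Number of mini-batches the dataloader yields per epoch.
    batches = math.ceil(num_samples / per_device_batch_size)
    # One optimizer update happens every grad_accum_steps mini-batches.
    return max(batches // grad_accum_steps, 1)


LOGGING_STEPS = 200  # same value as logging_steps in TrainingArguments

for batch_size, accum in [(16, 1), (16, 2), (16, 4), (8, 2)]:
    steps = optimizer_steps_per_epoch(5000, batch_size, accum)
    effective_batch = batch_size * accum
    logged_in_first_epoch = steps >= LOGGING_STEPS
    print(f"batch={batch_size} accum={accum} effective_batch={effective_batch} "
          f"steps/epoch={steps} loss logged in epoch 1: {logged_in_first_epoch}")
```

Under these assumptions, a batch size of 16 with 4 accumulation steps gives only 78 updates per epoch, so a two-epoch trial never reaches step 200, which is one plausible reason some trials show "No log" for the training loss.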

-----
## **Execution of Random Search**
```
print("\n--- Starting Random Search (using Optuna backend) ---")
print(f"Optimizing for '{training_args.metric_for_best_model}' score...")

NUM_RANDOM_TRIALS = 30

print(f"Number of random trials to run: {NUM_RANDOM_TRIALS}")
print("NOTE: Batch sizes are kept low and gradient accumulation is used to manage 5GB GPU memory.")
print("      Dataset size has also been reduced for quicker iteration.")

best_trial = trainer.hyperparameter_search(
    backend="optuna",
    hp_space=random_hp_space,
    direction="maximize",
    n_trials=NUM_RANDOM_TRIALS,
    compute_objective=optuna_hp_objective,
)

print("\n--- Random Search Complete ---")
print("\nBEST HYPERPARAMETERS FOUND:")

if best_trial:
    print(best_trial)
    best_hps = best_trial.hyperparameters
    print("\nBest Hyperparameters:")
    for key, value in best_hps.items():
        print(f"  {key}: {value}")
    # BestRun has no `metrics` attribute; the optimized score is stored in `objective`
    print(f"\nBest objective (eval F1 on the evaluation set): {best_trial.objective}")
else:
    print("Search failed or no best trial found.")

print("\n--- Final Step: Train a model with the best hyperparameters ---")
if best_trial:
    final_training_args = TrainingArguments(
        output_dir="./final_model_mental_health_5gb_random",
        eval_strategy="epoch",
        save_strategy="epoch",
        save_total_limit=1,
        load_best_model_at_end=True,
        metric_for_best_model="f1",
        fp16=torch.cuda.is_available(),
        report_to="none",
        num_train_epochs=best_hps["num_train_epochs"],
        per_device_train_batch_size=best_hps["per_device_train_batch_size"],
        gradient_accumulation_steps=best_hps["gradient_accumulation_steps"],
        learning_rate=best_hps["learning_rate"],
        weight_decay=best_hps["weight_decay"],
        warmup_steps=100,
        logging_dir="./final_logs_random_5gb",
        logging_steps=200,
        dataloader_num_workers=os.cpu_count() // 2 if os.cpu_count() else 0,
        adam_epsilon=1e-7,
    )

    final_trainer = Trainer(
        model_init=model_init,
        args=final_training_args,
        train_dataset=tokenized_train,
        eval_dataset=tokenized_eval,
        compute_metrics=compute_metrics,
        tokenizer=tokenizer,
    )

    print("\nTraining final model with best hyperparameters from Random Search (5GB GPU config)...")
    final_trainer.train()

    print("\nFinal model training complete. Best model saved to './final_model_mental_health_5gb_random'.")
    metrics = final_trainer.evaluate()
    print(f"Evaluation metrics of the final model: {metrics}")
else:
    print("No best hyperparameters found, skipping final model training.")
```
This code runs the hyperparameter search with Optuna and then trains a final model with the best parameters found. It begins by announcing the start of the search and that the F1-score is the metric being optimized. Up to 30 trials are run, with small batch sizes and gradient accumulation used to fit within limited GPU memory, and a reduced dataset for quicker experimentation. The trainer.hyperparameter_search() method draws each trial's settings from random_hp_space(). Although the notebook calls this a random search, no sampler is passed, so Optuna's defaults apply: the TPE sampler, which samples randomly only for its first few startup trials, and the median pruner, which can stop unpromising trials early. Passing sampler=optuna.samplers.RandomSampler() as an extra keyword argument, which hyperparameter_search() forwards to optuna.create_study(), should give purely random sampling. The search returns a BestRun object that exposes run_id, objective, hyperparameters, and run_summary; it has no metrics attribute, so the best score has to be read from best_trial.objective. Once the best parameters are identified, new TrainingArguments are created from them, including learning rate, batch size, number of epochs, weight decay, and gradient accumulation steps. A new Trainer is then initialized, and the final model is trained on the tokenized training dataset. After training, the model is evaluated on the validation set, and its performance metrics are displayed. The checkpoints of the trained model are saved in the folder ./final_model_mental_health_5gb_random for future use.
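
For intuition about what a purely random search over this space involves, the sketch below samples configurations with Python's random module and keeps the best one. It is a minimal illustration, not the Optuna or Trainer implementation: sample_config, toy_objective, and random_search are hypothetical names, and toy_objective is a made-up stand-in for the evaluation F1-score, since a real trial would fine-tune the model.

```python
import math
import random


def sample_config(rng):
    """Draw one configuration from the same ranges as random_hp_space()."""
    # Log-uniform: sample uniformly in log space, then exponentiate.
    learning_rate = math.exp(rng.uniform(math.log(1e-5), math.log(5e-5)))
    return {
        "learning_rate": learning_rate,
        "per_device_train_batch_size": rng.choice([8, 16]),
        "gradient_accumulation_steps": rng.choice([1, 2, 4]),
        "weight_decay": round(rng.randint(0, 10) * 0.01, 2),  # 0.0 to 0.1, step 0.01
        "num_train_epochs": rng.randint(2, 3),  # inclusive on both ends
    }


def toy_objective(config):
    """Hypothetical score that peaks at a learning rate of 4e-5 (not a real F1)."""
    return 1.0 - abs(math.log10(config["learning_rate"]) - math.log10(4e-5))


def random_search(n_trials, seed=42):
    """Evaluate n_trials independent random configurations and return the best."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        score = toy_objective(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


best_config, best_score = random_search(n_trials=30)
print(best_config, best_score)
```

Each trial is drawn independently of the previous ones, which is the property that distinguishes random search from samplers that adapt to earlier results.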

Using GPU: Tesla T4
Dataset loaded successfully.
Using 5000 training samples and 1000 evaluation samples.
Train label distribution:
labels
1    0.5
0    0.5
Name: proportion, dtype: float64
Eval label distribution:
labels
0    0.507
1    0.493
Name: proportion, dtype: float64





[I 2025-11-08 07:24:23,125] A new study created in memory with name: no-name-b76bc8a7-f4a1-45e5-a821-1d491897d507



--- Starting Random Search (using Optuna backend) ---
Optimizing for 'f1' score...
Number of random trials to run: 30
NOTE: Batch sizes are kept low and gradient accumulation is used to manage 5GB GPU memory.
      Dataset size has also been reduced for quicker iteration.




Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.6836,0.377026,0.845,0.852802,0.801786,0.910751
2,0.2722,0.312289,0.876,0.876984,0.858252,0.896552
3,0.1842,0.412614,0.878,0.875764,0.879346,0.872211


[I 2025-11-08 07:27:39,456] Trial 0 finished with value: 0.8757637474541752 and parameters: {'learning_rate': 2.7816626707377186e-05, 'per_device_train_batch_size': 16, 'gradient_accumulation_steps': 1, 'weight_decay': 0.08, 'num_train_epochs': 3}. Best is trial 0 with value: 0.8757637474541752.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.430534,0.797,0.800393,0.776718,0.825558
2,0.658300,0.351442,0.844,0.842424,0.839034,0.845842


[I 2025-11-08 07:29:25,499] Trial 1 finished with value: 0.8424242424242424 and parameters: {'learning_rate': 1.761994091092203e-05, 'per_device_train_batch_size': 16, 'gradient_accumulation_steps': 2, 'weight_decay': 0.02, 'num_train_epochs': 2}. Best is trial 0 with value: 0.8757637474541752.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.383179,0.826,0.819876,0.837209,0.803245
2,0.600500,0.317965,0.87,0.868421,0.866667,0.870183
3,0.248400,0.349758,0.869,0.865088,0.878661,0.851927


[I 2025-11-08 07:32:13,054] Trial 2 finished with value: 0.8650875386199794 and parameters: {'learning_rate': 2.7956175151685996e-05, 'per_device_train_batch_size': 16, 'gradient_accumulation_steps': 2, 'weight_decay': 0.02, 'num_train_epochs': 3}. Best is trial 0 with value: 0.8757637474541752.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.6372,0.362321,0.85,0.857685,0.805704,0.916836
2,0.2333,0.339617,0.885,0.885344,0.870588,0.900609


[I 2025-11-08 07:34:22,208] Trial 3 finished with value: 0.8853439680957128 and parameters: {'learning_rate': 4.8433758508879447e-05, 'per_device_train_batch_size': 8, 'gradient_accumulation_steps': 2, 'weight_decay': 0.05, 'num_train_epochs': 2}. Best is trial 3 with value: 0.8853439680957128.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.522566,0.728,0.761404,0.670788,0.880325
2,0.696300,0.358313,0.85,0.842437,0.873638,0.813387
3,0.336500,0.343034,0.86,0.857143,0.862423,0.851927


[I 2025-11-08 07:37:53,773] Trial 4 finished with value: 0.8571428571428571 and parameters: {'learning_rate': 1.3790187469123311e-05, 'per_device_train_batch_size': 8, 'gradient_accumulation_steps': 4, 'weight_decay': 0.0, 'num_train_epochs': 3}. Best is trial 3 with value: 0.8853439680957128.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.4494,0.398197,0.833,0.837707,0.804104,0.874239
2,0.3018,0.380505,0.868,0.868,0.856016,0.880325


[I 2025-11-08 07:40:19,020] Trial 5 finished with value: 0.868 and parameters: {'learning_rate': 1.0574936411008278e-05, 'per_device_train_batch_size': 8, 'gradient_accumulation_steps': 1, 'weight_decay': 0.03, 'num_train_epochs': 2}. Best is trial 3 with value: 0.8853439680957128.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.6824,0.34004,0.864,0.870229,0.821622,0.924949
2,0.2658,0.318804,0.879,0.878636,0.869048,0.888438


[I 2025-11-08 07:42:09,685] Trial 6 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.579832,0.67,0.705357,0.629984,0.801217
2,No log,0.405258,0.823,0.815817,0.837607,0.795132


[I 2025-11-08 07:44:00,490] Trial 7 finished with value: 0.8158168574401665 and parameters: {'learning_rate': 2.1009203822668113e-05, 'per_device_train_batch_size': 16, 'gradient_accumulation_steps': 4, 'weight_decay': 0.09, 'num_train_epochs': 2}. Best is trial 3 with value: 0.8853439680957128.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.6797,0.348934,0.86,0.868668,0.808028,0.939148
2,0.2719,0.301063,0.878,0.880626,0.850662,0.912779


[I 2025-11-08 07:45:46,249] Trial 8 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.7557,0.428982,0.801,0.803941,0.781609,0.827586


[I 2025-11-08 07:46:25,861] Trial 9 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.6453,0.336095,0.855,0.864359,0.802083,0.93712
2,0.2364,0.331735,0.883,0.882412,0.874502,0.890467


[I 2025-11-08 07:48:55,778] Trial 10 finished with value: 0.8824120603015075 and parameters: {'learning_rate': 4.925436812600616e-05, 'per_device_train_batch_size': 8, 'gradient_accumulation_steps': 2, 'weight_decay': 0.06, 'num_train_epochs': 2}. Best is trial 3 with value: 0.8853439680957128.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.6418,0.35676,0.841,0.853186,0.783051,0.93712


[I 2025-11-08 07:49:41,959] Trial 11 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.6423,0.358239,0.858,0.86803,0.801029,0.947262
2,0.2333,0.33893,0.875,0.873096,0.873984,0.872211


[I 2025-11-08 07:51:27,308] Trial 12 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.6531,0.328456,0.867,0.871992,0.82967,0.918864
2,0.2355,0.35054,0.874,0.874502,0.8591,0.890467


[I 2025-11-08 07:53:12,839] Trial 13 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.6536,0.328966,0.862,0.869318,0.815275,0.931034
2,0.2264,0.331729,0.885,0.884885,0.873518,0.896552


[I 2025-11-08 07:55:23,663] Trial 14 finished with value: 0.8848848848848849 and parameters: {'learning_rate': 4.004676450455011e-05, 'per_device_train_batch_size': 8, 'gradient_accumulation_steps': 2, 'weight_decay': 0.1, 'num_train_epochs': 2}. Best is trial 3 with value: 0.8853439680957128.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.346396,0.854,0.856582,0.830476,0.884381
2,0.581900,0.318229,0.869,0.86781,0.863454,0.872211


[I 2025-11-08 07:57:14,612] Trial 15 finished with value: 0.8678102926337034 and parameters: {'learning_rate': 3.540428540284111e-05, 'per_device_train_batch_size': 8, 'gradient_accumulation_steps': 4, 'weight_decay': 0.1, 'num_train_epochs': 2}. Best is trial 3 with value: 0.8853439680957128.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.6521,0.34511,0.865,0.870813,0.824275,0.922921
2,0.2402,0.328393,0.882,0.882704,0.865497,0.900609


[I 2025-11-08 07:59:15,992] Trial 16 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.6462,0.332615,0.853,0.860133,0.810036,0.916836


[I 2025-11-08 08:00:01,603] Trial 17 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.35521,0.854,0.852823,0.847695,0.858012
2,0.589800,0.313524,0.877,0.875883,0.871486,0.880325


[I 2025-11-08 08:02:01,457] Trial 18 finished with value: 0.875882946518668 and parameters: {'learning_rate': 3.1610546169084315e-05, 'per_device_train_batch_size': 8, 'gradient_accumulation_steps': 4, 'weight_decay': 0.08, 'num_train_epochs': 2}. Best is trial 3 with value: 0.8853439680957128.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.3782,0.340545,0.868,0.872093,0.834879,0.912779
2,0.2572,0.393874,0.876,0.875502,0.866799,0.884381


[I 2025-11-08 08:04:35,342] Trial 19 finished with value: 0.8755020080321285 and parameters: {'learning_rate': 2.103684040410503e-05, 'per_device_train_batch_size': 8, 'gradient_accumulation_steps': 1, 'weight_decay': 0.05, 'num_train_epochs': 2}. Best is trial 3 with value: 0.8853439680957128.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.6487,0.315539,0.868,0.871345,0.838649,0.906694
2,0.2435,0.306521,0.88,0.883268,0.848598,0.920892


[I 2025-11-08 08:06:38,687] Trial 20 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.6389,0.330846,0.852,0.860902,0.802102,0.929006


[I 2025-11-08 08:07:24,354] Trial 21 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.6355,0.331824,0.855,0.864359,0.802083,0.93712
2,0.2319,0.34009,0.877,0.877123,0.864173,0.890467


[I 2025-11-08 08:09:11,079] Trial 22 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.6645,0.360457,0.852,0.863469,0.791878,0.94929
2,0.25,0.317249,0.885,0.884422,0.876494,0.892495


[I 2025-11-08 08:11:26,796] Trial 23 finished with value: 0.8844221105527639 and parameters: {'learning_rate': 3.286933441000633e-05, 'per_device_train_batch_size': 8, 'gradient_accumulation_steps': 2, 'weight_decay': 0.07, 'num_train_epochs': 2}. Best is trial 3 with value: 0.8853439680957128.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.662,0.367684,0.851,0.860094,0.800699,0.929006


[I 2025-11-08 08:12:12,461] Trial 24 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.6901,0.356483,0.849,0.856327,0.806452,0.912779


[I 2025-11-08 08:12:58,008] Trial 25 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.39728,0.829,0.82243,0.842553,0.803245
2,0.585000,0.330532,0.867,0.864424,0.868852,0.860041


[I 2025-11-08 08:14:35,111] Trial 26 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.6432,0.331706,0.862,0.868069,0.820976,0.920892
2,0.2379,0.30911,0.886,0.886,0.873767,0.89858


[I 2025-11-08 08:16:59,743] Trial 27 finished with value: 0.886 and parameters: {'learning_rate': 4.282224740784025e-05, 'per_device_train_batch_size': 8, 'gradient_accumulation_steps': 2, 'weight_decay': 0.07, 'num_train_epochs': 2}. Best is trial 27 with value: 0.886.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,No log,0.34234,0.856,0.851852,0.864301,0.839757
2,0.561300,0.314425,0.881,0.880642,0.871032,0.890467


[I 2025-11-08 08:18:53,670] Trial 28 finished with value: 0.880641925777332 and parameters: {'learning_rate': 4.24946942200687e-05, 'per_device_train_batch_size': 8, 'gradient_accumulation_steps': 4, 'weight_decay': 0.05, 'num_train_epochs': 2}. Best is trial 27 with value: 0.886.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.7124,0.38272,0.838,0.841797,0.811676,0.874239


[I 2025-11-08 08:19:32,487] Trial 29 pruned. 



--- Random Search Complete ---

BEST HYPERPARAMETERS FOUND:
BestRun(run_id='27', objective=0.886, hyperparameters={'learning_rate': 4.282224740784025e-05, 'per_device_train_batch_size': 8, 'gradient_accumulation_steps': 2, 'weight_decay': 0.07, 'num_train_epochs': 2}, run_summary=None)

Best Hyperparameters:
  learning_rate: 4.282224740784025e-05
  per_device_train_batch_size: 8
  gradient_accumulation_steps: 2
  weight_decay: 0.07
  num_train_epochs: 2


AttributeError: 'BestRun' object has no attribute 'metrics'