
<center><br><font size=6>Final Project</font><br>
<font size=5>Advanced Topics in Deep Learning</font><br>
<b><font size=4>Part B</font></b>
<br><font size=4>Training Models like Exercise 5</font><br><br>
Authors: Ido Rappaport & Eran Tascesme
</font></center>

**Submission Details:**
<font size=2>
<br>Ido Rappaport, ID: 322891623
<br>Eran Tascesme, ID: 205708720 </font>


**Import libraries**

❗Note the package versions; they are listed in requirements.txt❗
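
A quick way to compare the running environment against requirements.txt is to print the installed versions. The following is a minimal sketch using only the standard library; the helper name `installed_versions` and the package list are illustrative assumptions, not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: version string}, or 'not installed' when a package is missing."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


# Illustrative subset of the packages imported below (an assumption; adjust to requirements.txt)
checked = installed_versions(["torch", "transformers", "datasets", "optuna", "wandb"])
for name, ver in checked.items():
    print(f"{name}: {ver}")
```

Any mismatch between this printout and requirements.txt is worth resolving before training, since argument names in `transformers` change between releases.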

In [None]:
# Standard libraries
import os
import re
import string
import random
import warnings
from collections import Counter
import gc

# Data handling and visualization
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

# NLP libraries
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
import nlpaug.augmenter.word as naw
import nlpaug.augmenter.sentence as nas
from gensim import corpora, models
from urllib.parse import urlparse

# Machine learning and deep learning
import torch
from torch.utils.data import DataLoader  # Dataset comes from Hugging Face `datasets` below
from torch import nn, optim
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import (
    precision_score,
    recall_score,
    f1_score,
    accuracy_score,
    classification_report,
    confusion_matrix,
    ConfusionMatrixDisplay
)

# Hugging Face Transformers
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer,
    EarlyStoppingCallback,
    set_seed,
    TrainerCallback,
    TrainerState,
    TrainerControl,
    AutoConfig,
    DataCollatorWithPadding,
    RobertaForSequenceClassification,
    MarianMTModel,
    MarianTokenizer
)
from datasets import Dataset, DatasetDict, load_dataset
from transformers.modeling_outputs import SequenceClassifierOutput
import evaluate
from dataclasses import dataclass

# Other libraries
import optuna
import wandb
from tqdm import tqdm

# Filter warnings
warnings.filterwarnings('ignore')

# Download NLTK resources
try:
    nltk.data.find('tokenizers/punkt')
except LookupError:
    nltk.download('punkt')
try:
    nltk.data.find('corpora/stopwords')
except LookupError:
    nltk.download('stopwords')

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device

In [None]:
from huggingface_hub import login
login()

**Load CSV Files**

Based on the results of the training modeled on Exercise 4, we concluded that we can train solely on the clean, truncated dataset after augmentation. This approach also saves time and compute resources.

In [None]:
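# Load CSV files
drive_path = "data/"
train_df = pd.read_csv(drive_path + "train_balanced.csv", encoding="ISO-8859-1")
eval_df = pd.read_csv(drive_path + "val_clean.csv", encoding="ISO-8859-1")

# Replace missing texts with empty strings so the tokenizer always receives str
for df in (train_df, eval_df):
    df["text"] = df["text"].fillna("").astype(str)

# train_with_optuna_wandb calls .map(..., batched=True), which requires Hugging Face Datasets
train_dataset = Dataset.from_pandas(train_df)
eval_dataset = Dataset.from_pandas(eval_df)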

**Training Classes and Methods**

The function `train_with_optuna_wandb` is designed for training a Hugging Face `transformers` model using hyperparameter optimization with Optuna and experiment tracking with Weights & Biases (W&B). It performs the following steps:

*   Sets up W&B for tracking the Optuna trials and the final best model run.
*   Initializes the tokenizer and prepares the datasets.
*   Defines the model initialization, metric computation, and objective function for Optuna.
*   Configures base training arguments for the hyperparameter search.
*   Implements a custom callback to log metrics per epoch during Optuna trials to W&B.
*   Defines the hyperparameter search space for Optuna.
*   Runs the Optuna hyperparameter search to find the best combination of hyperparameters.
*   Prints the details of the best trial found by Optuna.
*   Logs a summary table of all Optuna trials to W&B.
*   Performs a final training run with the best hyperparameters found by Optuna, with W&B logging enabled.
*   Saves the trained model with the best hyperparameters.

This function provides a **general framework** for hyperparameter tuning and experiment tracking for sequence classification tasks using Hugging Face models, Optuna, and W&B.
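
The search loop described above can be pictured in miniature: sample a configuration, score it, keep the best one. The sketch below is not the notebook's implementation. Plain random search stands in for Optuna's sampler, and `toy_objective` is a hypothetical stand-in for validation accuracy. The ranges mirror the search space used in the function below, including log-scale sampling for the learning rate.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_hyperparameters(rng):
    """Sample one configuration from the same ranges as the first search."""
    return {
        "learning_rate": sample_log_uniform(rng, 1e-6, 1e-4),
        "per_device_train_batch_size": rng.choice([64, 128]),
        "weight_decay": rng.uniform(1e-4, 0.3),
    }


def toy_objective(params):
    """Hypothetical score in place of eval accuracy; it peaks at lr = 1e-5."""
    return 1.0 - abs(math.log10(params["learning_rate"]) + 5.0) / 10.0


def random_search(n_trials, seed=42):
    """Score n_trials random configurations and return the best (score, params)."""
    rng = random.Random(seed)
    scored = []
    for _ in range(n_trials):
        params = sample_hyperparameters(rng)
        scored.append((toy_objective(params), params))
    return max(scored, key=lambda item: item[0])


best_score, best_params = random_search(n_trials=5)
print(best_score, best_params)
```

Sampling the learning rate on a log scale gives each order of magnitude between 1e-6 and 1e-4 the same chance of being explored, which a uniform draw on the raw interval would not.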

In [None]:
def train_with_optuna_wandb(
    project_name, model_name, train_dataset, eval_dataset,
    num_labels=5, n_trials=5, num_train_epochs=5
):
    # Set seed for reproducibility
    set_seed(42)

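    # Set W&B environment
    os.environ["WANDB_PROJECT"] = project_name
    # Keep W&B enabled: WANDB_MODE="disabled" would turn the wandb.init() run below into a no-op.
    # Trainer auto-logging during the trials is already switched off via report_to=[].
    os.environ["WANDB_MODE"] = "online"

    # Start single W&B run to track all trials
    wandb_run = wandb.init(project=project_name, name="optuna_search_all_trials", reinit=True)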

    # Define custom metrics for step tracking
    wandb.define_metric("epoch")
    wandb.define_metric("eval_accuracy", step_metric="epoch")
    wandb.define_metric("train_accuracy", step_metric="epoch")

    # W&B table for final summary
    trials_table = wandb.Table(columns=[
        "trial", "learning_rate", "batch_size", "weight_decay", "eval_accuracy", "train_accuracy"
    ])

    # Tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    def tokenize_function(example):
        return tokenizer(example["text"], padding="max_length", truncation=True)

    tokenized_train = train_dataset.map(tokenize_function, batched=True, batch_size=64)
    tokenized_eval = eval_dataset.map(tokenize_function, batched=True, batch_size=64)

    # Accuracy metric
    metric = evaluate.load("accuracy")

    def model_init():
        return AutoModelForSequenceClassification.from_pretrained(
            model_name, num_labels=num_labels, ignore_mismatched_sizes=True
        )

    def compute_metrics(eval_pred):
        predictions = eval_pred.predictions.argmax(axis=-1)
        labels = eval_pred.label_ids
        return metric.compute(predictions=predictions, references=labels)

    def compute_objective(metrics):
        return metrics["eval_accuracy"]

    # Base training args (for Optuna search)
    base_training_args = TrainingArguments(
        output_dir=f"{project_name}/temp_run",
        eval_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
        logging_strategy="epoch",
        num_train_epochs=num_train_epochs,
        report_to=[],  # Disable W&B logging during search
        logging_dir=f"{project_name}/logs",
    )

    # Callback for logging per epoch
    class WandbOptunaCallback(TrainerCallback):
        def on_epoch_end(self, args, state, control, **kwargs):
            train_metrics = trainer.evaluate(eval_dataset=tokenized_train, metric_key_prefix="train")
            eval_metrics = trainer.evaluate(eval_dataset=tokenized_eval, metric_key_prefix="eval")

            train_acc = train_metrics.get("train_accuracy", None)
            eval_acc = eval_metrics.get("eval_accuracy", None)

            # Log per epoch with trial info
            wandb.log({
                "eval_accuracy": eval_acc,
                "train_accuracy": train_acc,
                "epoch": state.epoch,
                "trial": state.trial_name,
            })

            # Add final metrics to summary table
            # (at on_epoch_end, state.epoch equals the number of completed epochs)
            if round(state.epoch) >= num_train_epochs:
                trials_table.add_data(
                    state.trial_name,
                    state.trial_params.get("learning_rate"),
                    state.trial_params.get("per_device_train_batch_size"),
                    state.trial_params.get("weight_decay"),
                    eval_acc,
                    train_acc
                )

    # Trainer for Optuna trials
    trainer = Trainer(
        model_init=model_init,
        args=base_training_args,
        train_dataset=tokenized_train,
        eval_dataset=tokenized_eval,
        tokenizer=tokenizer,
        compute_metrics=compute_metrics,
        callbacks=[WandbOptunaCallback()]
    )

    def optuna_hp_space(trial):
        return {
            "learning_rate": trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True),
            "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [64, 128]),
            "weight_decay": trial.suggest_float("weight_decay", 1e-4, 0.3),
        }

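    # Run hyperparameter search (the SQLite storage file needs its parent folder to exist)
    os.makedirs(project_name, exist_ok=True)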
    best_run = trainer.hyperparameter_search(
        direction="maximize",
        backend="optuna",
        hp_space=optuna_hp_space,
        n_trials=n_trials,
        compute_objective=compute_objective,
        study_name="transformers_optuna_study",
        storage=f"sqlite:///{project_name}/optuna_trials.db",
        load_if_exists=True
    )

    print("Best trial:", best_run)

    # Log summary table
    wandb.log({"optuna_trials": trials_table})

    # Finish main W&B run
    wandb.finish()

    # Re-enable W&B for final training run
    os.environ["WANDB_MODE"] = "online"

    # Final training args (W&B enabled)
    final_training_args = TrainingArguments(
        output_dir=f"{project_name}/best_model_run",
        eval_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
        logging_strategy="epoch",
        num_train_epochs=num_train_epochs,
        learning_rate=best_run.hyperparameters["learning_rate"],
        per_device_train_batch_size=best_run.hyperparameters["per_device_train_batch_size"],
        weight_decay=best_run.hyperparameters["weight_decay"],
        report_to=["wandb"],
        logging_dir=f"{project_name}/logs",
        run_name="final_best_model"
    )

    # Final model trainer
    trainer = Trainer(
        model_init=model_init,
        args=final_training_args,
        train_dataset=tokenized_train,
        eval_dataset=tokenized_eval,
        tokenizer=tokenizer,
        compute_metrics=compute_metrics
    )

    trainer.train()

    # Save best model
    best_model_path = f"{project_name}/best_model"
    trainer.save_model(best_model_path)
    print(f"Best model saved to {best_model_path}")

    wandb.finish()
    return best_model_path, best_run


**First Model**

twitter-roberta-base-sentiment-latest

The function above saves the best model automatically.

In [None]:
best_model_path, best_roberta_run = train_with_optuna_wandb(
    project_name="roberta_sentiment_cutted_data_exc5",
    model_name="cardiffnlp/twitter-roberta-base-sentiment-latest",
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    num_labels=5,
    n_trials=5,
    num_train_epochs=6
)

**Second Model**

distilbert-base-uncased-finetuned-sst-2-english

The function above saves the best model automatically.

In [None]:
best_model_distil_path, best_distil_run = train_with_optuna_wandb(
    project_name="distilbert_sentiment_5_cutted_data_exc5",
    model_name="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    num_labels=5,
    n_trials=5,
    num_train_epochs=6
)

**Improving the selected models**

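To improve on these results, we enlarge the hyperparameter search space (adding gradient accumulation steps, the optimizer, and the learning-rate scheduler) and raise the number of Optuna trials from 5 to 12.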
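
One of the search dimensions added in the function defined later in this section is `gradient_accumulation_steps`. Together with the per-device batch size, it determines how many samples contribute to each optimizer step. The sketch below enumerates the combinations that the search can pick; `num_devices = 1` is an assumption about the hardware, and `effective_batch_size` is an illustrative helper, not part of the notebook.

```python
from itertools import product

per_device_batch_sizes = [64, 128]  # values searched in this notebook
accumulation_steps = [1, 2]         # values searched in this notebook
num_devices = 1                     # assumption: single-GPU training


def effective_batch_size(per_device, accumulation, devices=1):
    """Number of samples that contribute to one optimizer step."""
    return per_device * accumulation * devices


combos = {
    (batch, accum): effective_batch_size(batch, accum, num_devices)
    for batch, accum in product(per_device_batch_sizes, accumulation_steps)
}
for (batch, accum), effective in combos.items():
    print(f"batch={batch}, accumulation={accum} -> effective batch size {effective}")
```

Two of the four combinations give the same effective batch size of 128, so the search can reach that size with either lower memory use (64 with 2 accumulation steps) or fewer forward passes per update (128 with 1 step).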

In [None]:
# --- Load CSV files from your Drive ---
drive_path = "data/"

train_df = pd.read_csv(drive_path + "train_balanced.csv", encoding="ISO-8859-1")
eval_df = pd.read_csv(drive_path + "val_clean.csv", encoding="ISO-8859-1")
test_df = pd.read_csv(drive_path + "test_clean.csv", encoding="ISO-8859-1")

for df in [train_df, eval_df, test_df]:
    df['text'] = df['text'].fillna('').astype(str)

# For consistency, rename the label column to 'labels'
train_df = train_df.rename(columns={'label': 'labels'})
eval_df = eval_df.rename(columns={'label': 'labels'})
test_df = test_df.rename(columns={'label': 'labels'})


# Convert pandas DataFrames to Hugging Face Datasets
train_dataset = Dataset.from_pandas(train_df)
eval_dataset = Dataset.from_pandas(eval_df)
test_dataset = Dataset.from_pandas(test_df)


In [None]:
def run_hyperparameter_search_and_train(
    project_name, model_name, train_dataset, eval_dataset, test_dataset,
    num_labels=5, n_trials=12, num_train_epochs=5
):
    # 1. Set W&B Project Environment Variable
    os.environ["WANDB_PROJECT"] = project_name

    # 2. Tokenizer and Data Preparation
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    def tokenize_function(examples):
        return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=256)

    tokenized_train = train_dataset.map(tokenize_function, batched=True)
    tokenized_eval = eval_dataset.map(tokenize_function, batched=True)
    tokenized_test = test_dataset.map(tokenize_function, batched=True)

    # 3. Model Initializer (for fresh model in each trial)
    def model_init():
        return AutoModelForSequenceClassification.from_pretrained(
            model_name,
            num_labels=num_labels,
            ignore_mismatched_sizes=True   # Useful for re-initializing head
        )

    # 4. Metrics Computation
    accuracy_metric = evaluate.load("accuracy")
    def compute_metrics(eval_pred):
        predictions, labels = eval_pred
        predictions = np.argmax(predictions, axis=1)
        return accuracy_metric.compute(predictions=predictions, references=labels)

    # 5. Define the Optuna Objective Function
    def objective(trial):
        # A. Suggest hyperparameters
        hp = {
            "learning_rate": trial.suggest_float("learning_rate", 1e-6, 5e-5, log=True),
            "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [64, 128]),
            "gradient_accumulation_steps": trial.suggest_categorical("gradient_accumulation_steps", [1, 2]),
            "weight_decay": trial.suggest_float("weight_decay", 0.01, 0.3),
            "optim": trial.suggest_categorical("optim", ["adamw_torch", "adafactor"]),
            "lr_scheduler_type": trial.suggest_categorical("lr_scheduler_type", ["linear", "cosine"]),
        }

        # B. Define Training Arguments for this specific trial
        # Each trial will be a new run in W&B
        trial_run_name = f"trial-{trial.number}"
        output_dir = f"./results/{trial_run_name}"

        training_args = TrainingArguments(
            output_dir=output_dir,
            run_name=trial_run_name,
            # Core training parameters
            num_train_epochs=num_train_epochs,
            per_device_train_batch_size=hp["per_device_train_batch_size"],
            per_device_eval_batch_size=64,
            gradient_accumulation_steps=hp["gradient_accumulation_steps"],
            learning_rate=hp["learning_rate"],
            weight_decay=hp["weight_decay"],
            optim=hp["optim"],
            lr_scheduler_type=hp["lr_scheduler_type"],
            fp16=torch.cuda.is_available(),  # Enable mixed precision on GPU (a torch.device never equals the string "cuda")
            # Evaluation and logging
            eval_strategy="epoch",
            save_strategy="epoch",
            logging_strategy="epoch",
            load_best_model_at_end=True,
            metric_for_best_model="accuracy",
            report_to="wandb",
            # Efficiency
            save_total_limit=1, # Only keep the best checkpoint
            push_to_hub=False,
        )

        # C. Initialize Trainer
        trainer = Trainer(
            model_init=model_init,
            args=training_args,
            train_dataset=tokenized_train,
            eval_dataset=tokenized_eval,
            tokenizer=tokenizer,
            compute_metrics=compute_metrics,
            callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
        )

        # D. Train and return metric for Optuna
        trainer.train()
        eval_metrics = trainer.evaluate()

        # E. Close this trial's W&B run (so the next trial starts a new one) and free memory
        wandb.finish()
        del trainer
        gc.collect()
        torch.cuda.empty_cache()

        return eval_metrics["eval_accuracy"]

    # 6. Run Hyperparameter Search
    study = optuna.create_study(direction="maximize", study_name="sentiment-analysis-optimization")
    study.optimize(objective, n_trials=n_trials)

    best_hyperparameters = study.best_trial.params
    print("🏆 Best Hyperparameters Found 🏆")
    print(best_hyperparameters)

    # 7. Train the Final Model with Best Hyperparameters
    print("🚀 Training final model with best hyperparameters...")
    final_training_args = TrainingArguments(
        output_dir="./results/best-model",
        run_name="final-best-model-run",
        # Use best hyperparameters
        **best_hyperparameters,
        # Other fixed settings
        num_train_epochs=num_train_epochs,
        per_device_eval_batch_size=64,
        fp16=torch.cuda.is_available(),  # Mixed precision on GPU only
        # Evaluation and logging
        eval_strategy="epoch",
        save_strategy="epoch",
        logging_strategy="epoch",
        load_best_model_at_end=True,
        metric_for_best_model="accuracy",
        report_to="wandb",
        save_total_limit=1,
    )

    final_trainer = Trainer(
        model=model_init(), # Re-initialize the model
        args=final_training_args,
        train_dataset=tokenized_train,
        eval_dataset=tokenized_eval,
        compute_metrics=compute_metrics,
        callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
    )

    final_trainer.train()

    # 8. Evaluate the Best Model on the Test Set
    print("\n🧪 Evaluating the final best model on the test dataset...")
    test_results = final_trainer.evaluate(eval_dataset=tokenized_test)

    print("✅ Final Test Results ✅")
    print(f"Test Accuracy: {test_results['eval_accuracy']:.4f}")
    print(f"Test Loss: {test_results['eval_loss']:.4f}")

    # Log test results to the final W&B run
    wandb.log({"test_accuracy": test_results["eval_accuracy"], "test_loss": test_results["eval_loss"]})

    # End the final W&B run
    wandb.finish()


    # 9. Save the Final Model
    best_model_path = f"{project_name}/best_model"
    final_trainer.save_model(best_model_path)

    # Define the path and filename for the weights
    weights_path = f"final_models/{project_name}.pt"
    weights_dir = os.path.dirname(weights_path)

    # Create the directory if it doesn't exist
    os.makedirs(weights_dir, exist_ok=True)

    # Get the state dictionary from the trained model
    model_weights = final_trainer.model.state_dict()

    # Save the state dictionary to the specified .pt file
    torch.save(model_weights, weights_path)

    print(f"Best model saved to {best_model_path}")

    return best_model_path, best_hyperparameters

**First Model**

twitter-roberta-base-sentiment-latest

The function above saves the best model automatically.

In [None]:
PROJECT_NAME = "roberta_sentiment_exc5_improved"
MODEL_NAME = "cardiffnlp/twitter-roberta-base-sentiment-latest"
N_TRIALS = 12  # Number of Optuna trials to run
N_EPOCHS = 5  # Number of epochs for each training run

# --- Run the experiment ---
best_model_path, best_params = run_hyperparameter_search_and_train(
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    test_dataset=test_dataset,
    num_labels=5,
    n_trials=N_TRIALS,
    num_train_epochs=N_EPOCHS
)


print(f"Best hyperparameters: {best_params}")

**Second Model**

distilbert-base-uncased-finetuned-sst-2-english

The function above saves the best model automatically.

In [None]:
# --- Configuration ---
PROJECT_NAME = "distilbert_exc5_improved"
MODEL_NAME = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
N_TRIALS = 12 # Number of Optuna trials to run
N_EPOCHS = 5 # Number of epochs for each training run

# --- Run the experiment ---
best_model_path, best_params = run_hyperparameter_search_and_train(
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    test_dataset=test_dataset,
    num_labels=5,
    n_trials=N_TRIALS,
    num_train_epochs=N_EPOCHS
)

print(f"Best hyperparameters: {best_params}")

<center><h1>END</h1></center>
