# Emotional Analysis using Hugging Face Ecosystem
##Set Environment
In this notebook, we have to install following additional libraries (compared to previous notebooks) from Huggingface to enhance our workflow: transformers, datasets, evaluate, and accelearte. In addition, we are also installing wandb.

* The transformers library provides Trainer class that we will use to manage Training process.
* The datasets library simplifies the process of accessing and manipulating a wide array of datasets.
* The evaluate library offers a suite of standardized metrics and methods for robust and consistent model evaluation.
* We will not use accelerate library directly. However , we need to install it as transformer librray usses it in the background.
* Finally wandb library provide tools for efficient experiment tracking.

# Setting up the Environment



In [1]:
import sys
# If in Colab, then import the drive module from google.colab
if 'google.colab' in str(get_ipython()):
  from google.colab import drive
  # Mount the Google Drive to access files stored there
  drive.mount('/content/drive')

  # !pip install torchtext -qq
  # # Install the torchinfo library quietly
  !pip install torchinfo -qq
  # # !pip install torchtext --upgrade -qq
  !pip install torchmetrics -qq
  # !pip install torchinfo -qq
  !pip install fast_ml -qq
  !pip install joblib -qq
  # !pip install sklearn -qq
  # !pip install pandas -qq
  # !pip install numpy -qq
  !pip install scikit-multilearn -qq
  !pip install transformers evaluate wandb accelerate -U -qq
  !pip install pytorch-ignite -qq -U
  !pip install optuna -qq

  basepath = '/content/drive/MyDrive/Colab_Notebooks/BUAN_6342_Applied_Natural_Language_Processing'
  sys.path.append('/content/drive/MyDrive/Colab_Notebooks/BUAN_6342_Applied_Natural_Language_Processing/0_Custom_files')
else:
  basepath = '/Users/harikrishnadev/Library/CloudStorage/GoogleDrive-harikrish0607@gmail.com/My Drive/Colab_Notebooks/BUAN_6342_Applied_Natural_Language_Processing/'
  sys.path.append('/Users/harikrishnadev/Library/CloudStorage/GoogleDrive-harikrish0607@gmail.com/My Drive/Colab_Notebooks/BUAN_6342_Applied_Natural_Language_Processing/0_Custom_files')

Mounted at /content/drive
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m841.5/841.5 kB[0m [31m11.8 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m42.1/42.1 kB[0m [31m1.8 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m89.4/89.4 kB[0m [31m3.3 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m8.8/8.8 MB[0m [31m27.6 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m84.1/84.1 kB[0m [31m12.7 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.2/2.2 MB[0m [31m55.0 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m297.4/297.4 kB[0m [31m34.0 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m510.5/510.5 kB[0m [31m46.6 MB/s[0m eta [36m0:00:00[0m
[2K  

## *Load Libraries*

In [2]:
# standard data science librraies for data handling and v isualization
from pathlib import Path
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.metrics import multilabel_confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

# New libraries introduced in this notebook
import evaluate
import torch
from datasets import load_dataset, DatasetDict, ClassLabel, Dataset
from datasets import load_metric
from transformers import Pipeline
from transformers import TrainingArguments, Trainer
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification
from transformers import AutoConfig
from transformers import pipeline
from pprint import pprint

import wandb

import os

In [3]:
# Set the base folder path using the Path class for better path handling
base_folder = Path(basepath)

# Define the data folder path by appending the relative path to the base folder
# This is where the data files will be stored
data_folder = base_folder / '0_Data_Folder'

# Define the model folder path for saving trained models
# This path points to a specific folder designated for NLP models related to the IMDb dataset
model_folder = data_folder

custom_functions = base_folder / '0_Custom_files'

# **Logging into Kaggle**
    


In [4]:
if 'google.colab' in str(get_ipython()):
    !chmod 600 /content/drive/MyDrive/Colab_Notebooks/BUAN_6382_Applied_DeepLearning/Data/.kaggle/kaggle.json
    !ls -la /content/drive/MyDrive/Colab_Notebooks/BUAN_6382_Applied_DeepLearning/Data/.kaggle
else:
    !chmod 600 '/Users/harikrishnadev/Library/CloudStorage/GoogleDrive-harikrish0607@gmail.com/My Drive/Colab_Notebooks/BUAN_6382_Applied_DeepLearning/Data/.kaggle/kaggle.json'
    ! ls -la '/Users/harikrishnadev/Library/CloudStorage/GoogleDrive-harikrish0607@gmail.com/My Drive/Colab_Notebooks/BUAN_6382_Applied_DeepLearning/Data/.kaggle'

total 1
-rw------- 1 root root 70 Nov 27 02:27 kaggle.json


In [5]:
if 'google.colab' in str(get_ipython()):
    os.environ['KAGGLE_CONFIG_DIR']='/content/drive/MyDrive/Colab_Notebooks/BUAN_6382_Applied_DeepLearning/Data/.kaggle'
else:
    os.environ['KAGGLE_CONFIG_DIR']='/Users/harikrishnadev/Library/CloudStorage/GoogleDrive-harikrish0607@gmail.com/My Drive/Colab_Notebooks/BUAN_6382_Applied_DeepLearning/Data/.kaggle'

# **Logging into Wandb**

In [6]:
if 'google.colab' in str(get_ipython()):
    from google.colab import userdata
    wandb.login(key=userdata.get('wandb'))
else:
    !wandb login

[34m[1mwandb[0m: W&B API key is configured. Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


### Free memory command

In [7]:
import gc
def free_memory():
    """
    Attempts to free up memory by deleting variables and running Python's garbage collector.
    """
    gc.collect()
    for device_id in range(torch.cuda.device_count()):
        torch.cuda.set_device(device_id)
        torch.cuda.empty_cache()
    gc.collect()

# **Loading Dataset**

In [8]:
! kaggle competitions download -c emotion-detection-spring2014

Downloading emotion-detection-spring2014.zip to /content
  0% 0.00/609k [00:00<?, ?B/s]
100% 609k/609k [00:00<00:00, 159MB/s]


In [9]:
! unzip emotion-detection-spring2014.zip

Archive:  emotion-detection-spring2014.zip
  inflating: sample_submission.csv   
  inflating: test.csv                
  inflating: train.csv               


In [10]:
import pandas as pd
train_dataset = pd.read_csv('train.csv', usecols=lambda column: column != 'ID')

In [11]:
train_dataset.head()

Unnamed: 0,Tweet,anger,anticipation,disgust,fear,joy,love,optimism,pessimism,sadness,surprise,trust
0,“Worry is a down payment on a problem you may ...,0,1,0,0,0,0,1,0,0,0,1
1,Whatever you decide to do make sure it makes y...,0,0,0,0,1,1,1,0,0,0,0
2,@Max_Kellerman it also helps that the majorit...,1,0,1,0,1,0,1,0,0,0,0
3,Accept the challenges so that you can literall...,0,0,0,0,1,0,1,0,0,0,0
4,My roommate: it's okay that we can't spell bec...,1,0,1,0,0,0,0,0,0,0,0


In [12]:
train_dataset.columns

Index(['Tweet', 'anger', 'anticipation', 'disgust', 'fear', 'joy', 'love',
       'optimism', 'pessimism', 'sadness', 'surprise', 'trust'],
      dtype='object')

In [13]:
label_columns = ['anger', 'anticipation', 'disgust', 'fear', 'joy', 'love', 'optimism', 'pessimism', 'sadness', 'surprise', 'trust']

In [14]:
len(label_columns)

11

In [15]:
train_dataset[label_columns] = train_dataset[label_columns].astype(bool)

In [16]:
trainset = Dataset.from_pandas(train_dataset)

In [17]:
trainset.features

{'Tweet': Value(dtype='string', id=None),
 'anger': Value(dtype='bool', id=None),
 'anticipation': Value(dtype='bool', id=None),
 'disgust': Value(dtype='bool', id=None),
 'fear': Value(dtype='bool', id=None),
 'joy': Value(dtype='bool', id=None),
 'love': Value(dtype='bool', id=None),
 'optimism': Value(dtype='bool', id=None),
 'pessimism': Value(dtype='bool', id=None),
 'sadness': Value(dtype='bool', id=None),
 'surprise': Value(dtype='bool', id=None),
 'trust': Value(dtype='bool', id=None)}

# **Accessing and Manuplating Splits**

In [18]:
trainset = trainset.train_test_split(test_size=0.3)

In [19]:
trainset

DatasetDict({
    train: Dataset({
        features: ['Tweet', 'anger', 'anticipation', 'disgust', 'fear', 'joy', 'love', 'optimism', 'pessimism', 'sadness', 'surprise', 'trust'],
        num_rows: 5406
    })
    test: Dataset({
        features: ['Tweet', 'anger', 'anticipation', 'disgust', 'fear', 'joy', 'love', 'optimism', 'pessimism', 'sadness', 'surprise', 'trust'],
        num_rows: 2318
    })
})

In [None]:
# !huggingface-cli login

In [None]:
# trainset.push_to_hub("harikrishnad1997/tweetemo")

In [20]:
from google.colab import userdata
os.environ["HF_TOKEN"] = userdata.get('HF_TOKEN')

In [None]:
# trainset = load_dataset("harikrishnad1997/tweetemo")

In [21]:
trainset

DatasetDict({
    train: Dataset({
        features: ['Tweet', 'anger', 'anticipation', 'disgust', 'fear', 'joy', 'love', 'optimism', 'pessimism', 'sadness', 'surprise', 'trust'],
        num_rows: 5406
    })
    test: Dataset({
        features: ['Tweet', 'anger', 'anticipation', 'disgust', 'fear', 'joy', 'love', 'optimism', 'pessimism', 'sadness', 'surprise', 'trust'],
        num_rows: 2318
    })
})

# Custom `MultiLabelClassifier` class built to run multiple models without repeating mutiple lines of code

The **MultiLabelClassifier** is a class designed for training and evaluating multi-label text classification models using the Hugging Face Transformers library. It supports fine-tuning pre-trained models for multi-label classification tasks and provides methods for prediction and hyperparameter optimization.

* `model_name` (str): The pre-trained model name from Hugging Face Transformers.
* `labels` (list of str): The list of labels for classification.
* `batch_size` (int): Batch size for training (default is 8).
* `learning_rate` (float): Learning rate for training (default is 2e-5).
* `num_epochs` (int): Number of epochs for training (default is 5).
* `metric_name` (str): The name of the evaluation metric (default is "f1").
* `threshold` (float): Threshold for binary classification (default is 0.5).



```python
# Initialize the classifier
classifier = MultiLabelClassifier(
    model_name="distilbert-base-uncased",
    labels=["positive", "negative"],
    batch_size=8,
    learning_rate=2e-5,
    num_epochs=10,
    metric_name="f1",
    threshold=0.5
)

# Train the classifier
classifier.train(train_dataset, valid_dataset)

# Optimize threshold
best_threshold = classifier.optimize_threshold(valid_dataset)

# Make predictions
predictions = classifier.predict(["This is a positive sentence", "This is a negative sentence"], threshold = best_threshold)

```


Here's a detailed explanation of the different components of the class:

1. **__init__** method:
   - Initializes the classifier by taking in various parameters such as the pre-trained model name, the list of labels, batch size, learning rate, number of epochs, evaluation metric, and the classification threshold.
   - It sets up the device (either 'cuda' if a GPU is available or 'cpu'), creates the tokenizer and the pre-trained model for multi-label classification.
   - The model is loaded onto the specified device.

2. **preprocess_data** method:
   - This method takes in a dictionary of examples and preprocesses the data for the model.
   - It tokenizes the input text and encodes it using the tokenizer.
   - It then creates a label matrix where each row corresponds to the binary labels for a given input text.
   - The preprocessed data, including the input IDs and the label matrix, is returned.

3. **multi_label_metrics** method:
   - This method computes the multi-label classification metrics, including F1 score (micro-averaged), ROC-AUC score, and accuracy.
   - It takes in the model predictions and the ground truth labels, and applies a threshold to convert the probabilities to binary predictions.
   - The computed metrics are returned as a dictionary.

4. **compute_metrics** method:
   - This method is used as the `compute_metrics` function for the Trainer in the Transformers library.
   - It calls the `multi_label_metrics` method to compute the evaluation metrics for the model.

5. **train** method:
   - This method is responsible for training the model.
   - It sets up the `TrainingArguments` object, which specifies the training configuration, such as the learning rate, batch size, number of epochs, and various logging and checkpointing options.
   - It preprocesses the training and validation datasets using the `preprocess_data` method and sets the data format to PyTorch tensors.
   - It creates a `Trainer` object and calls the `train` method to train the model.
   - After training, it evaluates the model on the validation dataset and logs the results to Weights & Biases.

6. **predict** method:
   - This method generates predictions for a list of input texts.
   - It preprocesses the input texts using the `preprocess_data` method and makes predictions using the model.
   - It applies the classification threshold to convert the probabilities to binary predictions and returns the predicted labels and the binary predictions.

7. **objective** method:
   - This method is used for hyperparameter optimization using Optuna.
   - It takes in a trial object and the validation dataset, and computes the negative F1 score as the objective function.
   - It applies the threshold (which is a hyperparameter to be optimized) to the model predictions and computes the multi-label metrics.
   - The negative F1 score is returned as the objective value.

8. **optimize_threshold** method:
   - This method uses Optuna to optimize the classification threshold.
   - It creates an Optuna study, optimizes the objective function (the `objective` method), and sets the best threshold value found during the optimization process.
   - The best threshold value is returned.


**Notes:**
* The train_dataset and valid_dataset should be compatible with the Hugging Face Dataset class.
* The labels should match the labels present in the datasets.
* Model fine-tuning and prediction methods require GPU if available for faster computation.

In [22]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score, accuracy_score
import torch
from transformers import EvalPrediction
import optuna
from datetime import date
from sklearn.metrics import multilabel_confusion_matrix

class MultiLabelClassifier:
    def __init__(self, model_name, labels, batch_size=8, learning_rate=2e-5, num_epochs=5, metric_name="f1", threshold=0.5):
        """
        Initializes the MultiLabelClassifier.

        Args:
        - model_name (str): The pre-trained model name.
        - labels (list of str): The list of labels for classification.
        - batch_size (int): Batch size for training.
        - learning_rate (float): Learning rate for training.
        - num_epochs (int): Number of epochs for training.
        - metric_name (str): The name of the evaluation metric.
        - threshold (float): Threshold for binary classification.

        Returns:
        - None
        """
        self.model_name = model_name
        self.labels = labels
        self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
        self.batch_size = batch_size
        self.learning_rate = learning_rate
        self.num_epochs = num_epochs
        self.metric_name = metric_name
        self.threshold = threshold
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_name, problem_type="multi_label_classification", num_labels=len(labels), id2label={str(i): label for i, label in enumerate(labels)}, label2id={label: i for i, label in enumerate(labels)})
        self.id2label = {str(i): label for i, label in enumerate(labels)}
        self.label2id = {label: i for i, label in enumerate(labels)}
        self.model.to(self.device)

    def preprocess_data(self, examples):
        """
        Preprocesses the input data.

        Args:
        - examples (dict): Dictionary containing input data.

        Returns:
        - dict: Preprocessed input data.
        """
        text = examples["Tweet"]
        encoding = self.tokenizer(text, padding="max_length", truncation=True, max_length=128)
        labels_batch = {k: examples[k] for k in examples.keys() if k in self.labels}
        labels_matrix = np.zeros((len(text), len(self.labels)))
        for idx, label in enumerate(self.labels):
            labels_matrix[:, idx] = labels_batch[label]
        encoding["labels"] = labels_matrix.tolist()
        return encoding

    def multi_label_metrics(self, predictions, labels, threshold=None):
        """
        Computes multi-label classification metrics.

        Args:
        - predictions (torch.Tensor): Model predictions.
        - labels (np.ndarray): Ground truth labels.
        - threshold (float): Threshold for binary classification.

        Returns:
        - dict: Dictionary containing computed metrics.
        """
        if threshold is None:
            threshold = self.threshold
        sigmoid = torch.nn.Sigmoid()
        probs = sigmoid(torch.Tensor(predictions))
        y_pred = np.zeros(probs.shape)
        y_pred[np.where(probs >= threshold)] = 1
        y_true = labels
        f1_micro_average = f1_score(y_true=y_true, y_pred=y_pred, average='micro')
        roc_auc = roc_auc_score(y_true, y_pred, average='micro')
        accuracy = accuracy_score(y_true, y_pred)
        metrics = {'f1': f1_micro_average, 'roc_auc': roc_auc, 'accuracy': accuracy}
        return metrics

    def multilabel_confusion_matrix(self, predictions, labels, threshold=None):
        """
        Computes multilabel confusion matrix.

        Args:
        - predictions (torch.Tensor): Model predictions.
        - labels (np.ndarray): Ground truth labels.
        - threshold (float): Threshold for binary classification.

        Returns:
        - np.ndarray: Multilabel confusion matrix.
        """
        if threshold is None:
            threshold = self.threshold
        sigmoid = torch.nn.Sigmoid()
        probs = sigmoid(torch.Tensor(predictions))
        y_pred = np.zeros(probs.shape)
        y_pred[np.where(probs >= threshold)] = 1
        y_true = labels
        return multilabel_confusion_matrix(y_true, y_pred)

    def compute_metrics(self, p: EvalPrediction):
        """
        Computes evaluation metrics.

        Args:
        - p (EvalPrediction): Evaluation predictions.

        Returns:
        - dict: Dictionary containing computed metrics.
        """
        preds = p.predictions[0] if isinstance(p.predictions, tuple) else p.predictions
        result = self.multi_label_metrics(predictions=preds, labels=p.label_ids)
        return result

    def train(self, train_dataset, valid_dataset):
        """
        Trains the model.

        Args:
        - train_dataset (Dataset): Training dataset.
        - valid_dataset (Dataset): Validation dataset.

        Returns:
        - None
        """
        args = TrainingArguments(
            f"{self.model_name}-finetuned",
            # evaluation_strategy="epoch",
            # save_strategy="epoch",
            learning_rate=self.learning_rate,
            per_device_train_batch_size=self.batch_size,
            per_device_eval_batch_size=self.batch_size,
            num_train_epochs=self.num_epochs,
            weight_decay=0.01,
            load_best_model_at_end=True,
            metric_for_best_model="f1",  # Use F1 score as the metric to determine the best model
            optim='adamw_torch',  # Optimizer
            # output_dir=str(model_folder),  # Directory to save model checkpoints
            evaluation_strategy='steps',  # Evaluate model at specified step intervals
            eval_steps=50,  # Perform evaluation every 50 training steps
            save_strategy="steps",  # Save model checkpoint at specified step intervals
            save_steps=1000,  # Save model checkpoint every 1000 training steps
            save_total_limit=2,  # Retain only the best and the most recent model checkpoints
            greater_is_better=True,  # A model is 'better' if its F1 score is higher
            logging_strategy='steps',  # Log metrics and results to Weights & Biases platform
            logging_steps=50,  # Log metrics and results every 50 steps
            report_to='wandb',  # Log metrics and results to Weights & Biases platform
            run_name=f"emotion_tweet_{self.model_name}_{date.today().strftime('%Y-%m-%d')}",  # Experiment name for Weights & Biases
            fp16=True  # Use mixed precision training (FP16)
            )

        train_dataset = train_dataset.map(self.preprocess_data, batched=True, remove_columns=train_dataset.column_names)
        valid_dataset = valid_dataset.map(self.preprocess_data, batched=True, remove_columns=valid_dataset.column_names)

        train_dataset.set_format("torch")
        valid_dataset.set_format("torch")

        trainer = Trainer(
            self.model,
            args,
            train_dataset=train_dataset,
            eval_dataset=valid_dataset,
            tokenizer=self.tokenizer,
            compute_metrics=self.compute_metrics,
        )

        trainer.train()
        eval_results = trainer.evaluate()
        print(f"Evaluation results: {eval_results}")

        # Log evaluation results to Weights & Biases platform
        wandb.log({"eval_accuracy": eval_results["eval_accuracy"], "eval_loss": eval_results["eval_loss"], "eval_f1": eval_results["eval_f1"]})

        # # Compute and plot confusion matrix
        # preds = trainer.predict(valid_dataset)
        # y_labels = valid_dataset[self.labels]
        # confusion_matrix = self.multilabel_confusion_matrix(preds, y_labels)
        # plt.figure(figsize=(10, 7))
        # sns.heatmap(confusion_matrix, annot=True, cmap="Blues")
        # plt.xlabel("Predicted Labels")
        # plt.ylabel("True Labels")
        # plt.title("Multilabel Confusion Matrix")
        # plt.show()

        # # Log confusion matrix to Weights & Biases platform
        # wandb.log({"confusion_matrix": wandb.Image(plt)})

    def predict(self, texts, threshold=0.5):
        """
        Generates predictions for a list of texts.

        Args:
        - texts (list of str): List of input texts.
        - threshold (float): Threshold for binary classification.

        Returns:
        - dict: Dictionary containing predicted labels for each input text.
        """
        if threshold is None:
            threshold = self.threshold

        # Preprocess input texts
        encoding = self.tokenizer(texts, padding="max_length", truncation=True, max_length=128, return_tensors="pt").to(self.device)

        # Make predictions
        with torch.no_grad():
            output = self.model(**encoding)

        # Convert logits to probabilities
        sigmoid = torch.nn.Sigmoid()
        probs = sigmoid(output.logits)

        # Apply threshold for binary classification
        threshold_tensor = torch.tensor([threshold], device=self.device)
        binary_preds = (probs >= threshold_tensor).int()

        # Convert binary predictions to label names
        label_preds = []
        for pred in binary_preds:
            label_pred = [self.id2label[str(i)] for i, val in enumerate(pred) if val == 1]
            label_preds.append(label_pred)

        return label_preds, binary_preds.cpu().numpy()

    def objective(self, trial, valid_dataset):
        """
        Objective function for hyperparameter optimization.

        Args:
        - trial (Trial): Optuna trial object.
        - valid_dataset (Dataset): Validation dataset.

        Returns:
        - float: Computed metric value.
        """
        threshold = trial.suggest_float("threshold", 0.1, 0.9)
        valid_dataset = valid_dataset.map(self.preprocess_data, batched=True)
        valid_dataset.set_format("torch")

        # Get the correct labels from the dataset
        labels = np.array([valid_dataset[column] for column in self.labels]).T

        # Make predictions
        with torch.no_grad():
            logits = self.model(valid_dataset["input_ids"].to(torch.device("cuda")))['logits']
            predictions = torch.sigmoid(logits).cpu().numpy()

            # Apply threshold for binary classification
            binary_preds = (predictions >= threshold).astype(int)

            # Compute metrics
            f1_micro_average = f1_score(y_true=labels, y_pred=binary_preds, average='micro')
            roc_auc = roc_auc_score(labels, predictions, average='micro')
            accuracy = accuracy_score(labels, binary_preds)

            result = {'f1': f1_micro_average, 'roc_auc': roc_auc, 'accuracy': accuracy}
            return -result["f1"]

    def optimize_threshold(self, valid_dataset):
        """
        Optimizes the threshold for binary classification.

        Args:
        - valid_dataset (Dataset): Validation dataset.

        Returns:
        - float: Best threshold value.
        """
        study = optuna.create_study(direction="maximize")
        study.optimize(lambda trial: self.objective(trial, valid_dataset), n_trials=10)
        self.threshold = study.best_params["threshold"]
        return study.best_params["threshold"]

In [23]:
os.environ["WANDB_PROJECT"] = "nlp_course_spring_2024-emotion-analysis-hf-trainer-hw6"  # name your W&B project
os.environ["WANDB_LOG_MODEL"] = "checkpoint"  # log the model during training

# Distill BERT
## Training the model

In [24]:
classifier = MultiLabelClassifier(
    model_name="distilbert-base-uncased",
    labels=label_columns,
    batch_size=8,
    learning_rate=2e-5,
    num_epochs=5,
    metric_name="f1",
    threshold=0.5
)

tokenizer_config.json:   0%|          | 0.00/28.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/483 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/268M [00:00<?, ?B/s]

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [25]:
classifier.train(trainset['train'], trainset['test'])

Map:   0%|          | 0/5406 [00:00<?, ? examples/s]

Map:   0%|          | 0/2318 [00:00<?, ? examples/s]

dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)


[34m[1mwandb[0m: Currently logged in as: [33mharikrish0607[0m ([33mharikrishnad[0m). Use [1m`wandb login --relogin`[0m to force relogin


Step,Training Loss,Validation Loss,F1,Roc Auc,Accuracy
50,0.5494,0.481741,0.0,0.5,0.028041
100,0.4773,0.466425,0.0,0.5,0.028041
150,0.4549,0.42686,0.395623,0.622513,0.127696
200,0.4164,0.393947,0.548931,0.698297,0.193701
250,0.3913,0.383503,0.536867,0.689889,0.188525
300,0.3944,0.369174,0.602433,0.729545,0.207506
350,0.3608,0.363649,0.593188,0.726965,0.20233
400,0.3689,0.358875,0.576599,0.711124,0.208369
450,0.3582,0.343166,0.612673,0.732441,0.225626
500,0.348,0.33942,0.623702,0.741423,0.226488


[34m[1mwandb[0m: Adding directory to artifact (./distilbert-base-uncased-finetuned/checkpoint-1000)... Done. 2.4s
[34m[1mwandb[0m: Adding directory to artifact (./distilbert-base-uncased-finetuned/checkpoint-2000)... Done. 4.5s
[34m[1mwandb[0m: Adding directory to artifact (./distilbert-base-uncased-finetuned/checkpoint-3000)... Done. 2.3s
dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)


Evaluation results: {'eval_loss': 0.3134077489376068, 'eval_f1': 0.6772892295280356, 'eval_roc_auc': 0.7799478253379637, 'eval_accuracy': 0.270060396893874, 'eval_runtime': 2.7437, 'eval_samples_per_second': 844.839, 'eval_steps_per_second': 105.696, 'epoch': 5.0}


## Finding the optimal threshold

In [26]:
best_threshold = classifier.optimize_threshold(trainset['test'])
print(f"Best threshold: {best_threshold}")

[I 2024-04-14 22:00:43,867] A new study created in memory with name: no-name-58259223-76b3-4d94-92c7-8257f09833d2


Map:   0%|          | 0/2318 [00:00<?, ? examples/s]

We strongly recommend passing in an `attention_mask` since your input_ids may be padded. See https://huggingface.co/docs/transformers/troubleshooting#incorrect-output-when-padding-tokens-arent-masked.
[I 2024-04-14 22:00:48,117] Trial 0 finished with value: -0.6418547895057962 and parameters: {'threshold': 0.23803957242491178}. Best is trial 0 with value: -0.6418547895057962.


Map:   0%|          | 0/2318 [00:00<?, ? examples/s]

[I 2024-04-14 22:00:52,198] Trial 1 finished with value: -0.38785786217810386 and parameters: {'threshold': 0.771695821084748}. Best is trial 1 with value: -0.38785786217810386.


Map:   0%|          | 0/2318 [00:00<?, ? examples/s]

[I 2024-04-14 22:00:56,266] Trial 2 finished with value: -0.46877512731171256 and parameters: {'threshold': 0.6808094677936205}. Best is trial 1 with value: -0.38785786217810386.


Map:   0%|          | 0/2318 [00:00<?, ? examples/s]

[I 2024-04-14 22:01:00,330] Trial 3 finished with value: -0.63266291230893 and parameters: {'threshold': 0.19825293194528834}. Best is trial 1 with value: -0.38785786217810386.


Map:   0%|          | 0/2318 [00:00<?, ? examples/s]

[I 2024-04-14 22:01:04,592] Trial 4 finished with value: -0.4867710938528367 and parameters: {'threshold': 0.6515830142080279}. Best is trial 1 with value: -0.38785786217810386.


Map:   0%|          | 0/2318 [00:00<?, ? examples/s]

[I 2024-04-14 22:01:08,612] Trial 5 finished with value: -0.2478468899521531 and parameters: {'threshold': 0.8812427053509152}. Best is trial 5 with value: -0.2478468899521531.


Map:   0%|          | 0/2318 [00:00<?, ? examples/s]

[I 2024-04-14 22:01:12,658] Trial 6 finished with value: -0.3334835511491663 and parameters: {'threshold': 0.8223707243057812}. Best is trial 5 with value: -0.2478468899521531.


Map:   0%|          | 0/2318 [00:00<?, ? examples/s]

[I 2024-04-14 22:01:16,753] Trial 7 finished with value: -0.6363256784968685 and parameters: {'threshold': 0.3542068687342066}. Best is trial 5 with value: -0.2478468899521531.


Map:   0%|          | 0/2318 [00:00<?, ? examples/s]

[I 2024-04-14 22:01:20,806] Trial 8 finished with value: -0.5961134809398063 and parameters: {'threshold': 0.12795976025261027}. Best is trial 5 with value: -0.2478468899521531.


Map:   0%|          | 0/2318 [00:00<?, ? examples/s]

[I 2024-04-14 22:01:25,089] Trial 9 finished with value: -0.6237215692260724 and parameters: {'threshold': 0.17462725794205386}. Best is trial 5 with value: -0.2478468899521531.


Best threshold: 0.8812427053509152


In [27]:
best_threshold

0.8812427053509152

In [28]:
wandb.finish()

VBox(children=(Label(value='2558.421 MB of 2558.421 MB uploaded (1.813 MB deduped)\r'), FloatProgress(value=1.…

0,1
eval/accuracy,▁▁▆▆▆▇▆▇██▇█████████████████████████████
eval/f1,▁▁▇▇▇▇▇█████████████████████████████████
eval/loss,█▇▄▃▃▂▃▂▂▂▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
eval/roc_auc,▁▁▆▇▇▇▇▇▇▇▇▇████████████████████████████
eval/runtime,▁▃▁▄▃▃▂▂▁▂▁▂█▂▁▂▁▁▂▁▁▂▁▂▂▁▂▁▂▁▂▂▂▂▂▆▅▁▂▁
eval/samples_per_second,▇▆█▅▅▆▆▆█▇█▇▁▇█▇█▇▇▇█▇█▇▇▇▇█▇█▇▇▇▇▇▂▄█▇▇
eval/steps_per_second,▇▆█▅▅▆▆▆█▇█▇▁▇█▇█▇▇▇█▇█▇▇▇▇█▇█▇▇▇▇▇▂▄█▇▇
eval_accuracy,▁
eval_f1,▁
eval_loss,▁

0,1
eval/accuracy,0.27006
eval/f1,0.67729
eval/loss,0.31341
eval/roc_auc,0.77995
eval/runtime,2.7437
eval/samples_per_second,844.839
eval/steps_per_second,105.696
eval_accuracy,0.27006
eval_f1,0.67729
eval_loss,0.31341


## Prediction on Submission file

In [29]:
test = pd.read_csv('test.csv')
test.head()

Unnamed: 0,ID,Tweet,anger,anticipation,disgust,fear,joy,love,optimism,pessimism,sadness,surprise,trust
0,2018-01559,@Adnan__786__ @AsYouNotWish Dont worry Indian ...,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE
1,2018-03739,"Academy of Sciences, eschews the normally sobe...",NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE
2,2018-00385,I blew that opportunity -__- #mad,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE
3,2018-03001,This time in 2 weeks I will be 30... 😥,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE
4,2018-01988,#Deppression is real. Partners w/ #depressed p...,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE


In [30]:
testset = Dataset.from_dict({
    'Tweet': test['Tweet']})

In [31]:
testset

Dataset({
    features: ['Tweet'],
    num_rows: 3259
})

In [32]:
outputs, outputs_array = classifier.predict(testset['Tweet'], threshold = best_threshold)

In [33]:
outputs[:10]

[['fear'],
 ['disgust'],
 ['anger', 'disgust'],
 [],
 ['sadness'],
 ['fear'],
 [],
 ['joy'],
 ['joy', 'optimism'],
 ['sadness']]

In [34]:
outputs_array[:10]

array([[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]], dtype=int32)

In [35]:
test[label_columns] = outputs_array

In [None]:
# submission = pd.read_csv('sample_submission.csv')
# submission.head()

Unnamed: 0,ID,anger,anticipation,disgust,fear,joy,love,optimism,pessimism,sadness,surprise,trust
0,2018-01559,0,0,0,0,0,0,0,0,0,0,0
1,2018-03739,0,0,0,0,0,0,0,0,0,0,0
2,2018-00385,0,0,0,0,0,0,0,0,0,0,0
3,2018-03001,0,0,0,0,0,0,0,0,0,0,0
4,2018-01988,0,0,0,0,0,0,0,0,0,0,0


In [36]:
submission = test.drop(columns = ['Tweet'])
submission.head()

Unnamed: 0,ID,anger,anticipation,disgust,fear,joy,love,optimism,pessimism,sadness,surprise,trust
0,2018-01559,0,0,0,1,0,0,0,0,0,0,0
1,2018-03739,0,0,1,0,0,0,0,0,0,0,0
2,2018-00385,1,0,1,0,0,0,0,0,0,0,0
3,2018-03001,0,0,0,0,0,0,0,0,0,0,0
4,2018-01988,0,0,0,0,0,0,0,0,1,0,0


In [37]:
submission.to_csv(model_folder/f'{classifier.model_name}_{date.today()}.csv', index = False)

## Submission

In [38]:
from kaggle import api
comp = 'emotion-detection-spring2014'
api.competition_submit(model_folder/f'{classifier.model_name}_{date.today()}.csv', f'{classifier.model_name}_{date.today()}', comp)



100%|██████████| 105k/105k [00:01<00:00, 61.6kB/s]


Successfully submitted to Emotion Detection Spring2024

# albert-base-v2
## Training

In [39]:
free_memory()
classifier = MultiLabelClassifier(
    model_name="albert-base-v2",
    labels=label_columns,
    batch_size=8,
    learning_rate=2e-5,
    num_epochs=5,
    metric_name="f1",
    threshold=0.5
)

tokenizer_config.json:   0%|          | 0.00/25.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/684 [00:00<?, ?B/s]

spiece.model:   0%|          | 0.00/760k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.31M [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/47.4M [00:00<?, ?B/s]

Some weights of AlbertForSequenceClassification were not initialized from the model checkpoint at albert-base-v2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [40]:
classifier.train(trainset['train'], trainset['test'])

Map:   0%|          | 0/5406 [00:00<?, ? examples/s]

Map:   0%|          | 0/2318 [00:00<?, ? examples/s]

dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)


Step,Training Loss,Validation Loss,F1,Roc Auc,Accuracy
50,0.5191,0.470024,0.040665,0.508625,0.03365
100,0.4517,0.421826,0.456837,0.649874,0.137619
150,0.4275,0.414763,0.374812,0.613446,0.105263
200,0.4105,0.404732,0.459133,0.650922,0.11648
250,0.3984,0.410735,0.466832,0.655196,0.141933
300,0.4146,0.405263,0.505514,0.676163,0.1717
350,0.3851,0.389034,0.542024,0.697264,0.186799
400,0.3863,0.3824,0.527514,0.685456,0.183779
450,0.3795,0.381093,0.578533,0.721682,0.20233
500,0.378,0.371052,0.547765,0.697524,0.181622


[34m[1mwandb[0m: Adding directory to artifact (./albert-base-v2-finetuned/checkpoint-1000)... Done. 0.3s
[34m[1mwandb[0m: Adding directory to artifact (./albert-base-v2-finetuned/checkpoint-2000)... Done. 0.3s
[34m[1mwandb[0m: Adding directory to artifact (./albert-base-v2-finetuned/checkpoint-3000)... Done. 0.3s
dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)


Evaluation results: {'eval_loss': 0.3195108473300934, 'eval_f1': 0.6590885816692267, 'eval_roc_auc': 0.7674135891698186, 'eval_accuracy': 0.24892148403796377, 'eval_runtime': 4.6564, 'eval_samples_per_second': 497.814, 'eval_steps_per_second': 62.28, 'epoch': 5.0}


In [44]:
free_memory()

In [45]:
best_threshold = classifier.optimize_threshold(trainset['test'])
print(f"Best threshold: {best_threshold}")

[I 2024-04-14 22:13:59,123] A new study created in memory with name: no-name-500b2df6-6add-426b-a9b0-8a3c82bd2dc0


Map:   0%|          | 0/2318 [00:00<?, ? examples/s]

[W 2024-04-14 22:14:01,589] Trial 0 failed with parameters: {'threshold': 0.7990143604956941} because of the following error: OutOfMemoryError('CUDA out of memory. Tried to allocate 1.70 GiB. GPU 0 has a total capacity of 15.77 GiB of which 708.38 MiB is free. Process 26106 has 15.08 GiB memory in use. Of the allocated memory 12.77 GiB is allocated by PyTorch, and 1.93 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)').
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/optuna/study/_optimize.py", line 196, in _run_trial
    value_or_values = func(trial)
  File "<ipython-input-22-e67155a72669>", line 275, in <lambda>
    study.optimize(lambda trial: self.objective(trial, valid_dataset), n_trials=10)
  File "<ipython-inpu

OutOfMemoryError: CUDA out of memory. Tried to allocate 1.70 GiB. GPU 0 has a total capacity of 15.77 GiB of which 708.38 MiB is free. Process 26106 has 15.08 GiB memory in use. Of the allocated memory 12.77 GiB is allocated by PyTorch, and 1.93 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

In [42]:
outputs, outputs_array = classifier.predict(testset['Tweet'])

OutOfMemoryError: CUDA out of memory. Tried to allocate 1.20 GiB. GPU 0 has a total capacity of 15.77 GiB of which 472.38 MiB is free. Process 26106 has 15.31 GiB memory in use. Of the allocated memory 12.86 GiB is allocated by PyTorch, and 2.07 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

In [43]:
submission[label_columns] = outputs_array

In [None]:
submission.to_csv(model_folder/f'{classifier.model_name}_{date.today()}.csv', index = False)

In [None]:
from kaggle import api
comp = 'emotion-detection-spring2014'
api.competition_submit(model_folder/f'{classifier.model_name}_{date.today()}.csv', f'{classifier.model_name}_{date.today()}', comp)

100%|██████████| 105k/105k [00:02<00:00, 51.6kB/s]


ApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json', 'Date': 'Fri, 12 Apr 2024 21:43:28 GMT', 'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Origin': '*', 'Set-Cookie': 'ka_sessionid=41779532300b918800bcf624b5bc4c68; max-age=2626560; path=/, GCLB=COfl8LLA_ejzyAEQAw; path=/; HttpOnly', 'Vary': 'Accept-Encoding', 'Turbolinks-Location': 'https://www.kaggle.com/api/v1/competitions/submissions/submit/emotion-detection-spring2014', 'X-Kaggle-MillisecondsElapsed': '69', 'X-Kaggle-RequestId': '217fd2f5ca67f9f24755e2049c91ec48', 'X-Kaggle-ApiVersion': '1.6.11', 'X-Kaggle-HubVersion': '0.2.2', 'X-Frame-Options': 'SAMEORIGIN', 'Strict-Transport-Security': 'max-age=63072000; includeSubDomains; preload', 'Content-Security-Policy': "object-src 'none'; script-src 'nonce-NUl0yz/DbyDAsy2ZE/YXvQ==' 'report-sample' 'unsafe-inline' 'unsafe-eval' 'strict-dynamic' https: http:; base-uri 'none'; report-uri https://csp.withgoogle.com/csp/kaggle/20201130; frame-src 'self' https://www.kaggleusercontent.com https://www.youtube.com/embed/ https://polygraph-cool.github.io https://www.google.com/recaptcha/ https://www.docdroid.com https://www.docdroid.net https://kaggle-static.storage.googleapis.com https://kkb-production.jupyter-proxy.kaggle.net https://kkb-production.firebaseapp.com https://kaggle-metastore.firebaseapp.com https://apis.google.com https://content-sheets.googleapis.com/ https://accounts.google.com/ https://storage.googleapis.com https://docs.google.com https://drive.google.com https://calendar.google.com/;", 'X-Content-Type-Options': 'nosniff', 'Referrer-Policy': 'strict-origin-when-cross-origin', 'Via': '1.1 google', 'Alt-Svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000', 'Transfer-Encoding': 'chunked'})
HTTP response body: {"code":400,"message":"Submission not allowed:  Your team has used its daily Submission allowance (5) today, please try again tomorrow UTC (2.3 hours from now)."}


In [46]:
wandb.finish()

VBox(children=(Label(value='457.797 MB of 457.797 MB uploaded (5.800 MB deduped)\r'), FloatProgress(value=1.0,…

0,1
eval/accuracy,▁▄▄▅▆▆▆▇▆▇▇▇▇██▇█▇████████████████▇█████
eval/f1,▁▆▆▆▇▇▇▇▇▇▇▇████████████████████████████
eval/loss,█▆▅▅▄▄▃▃▃▂▂▂▂▂▁▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
eval/roc_auc,▁▅▅▅▆▇▆▇▆▇▇▇▇▇▇██▇██████████████████████
eval/runtime,▅█▁▂▃▄▂▃▃▃▃▄▅▅▆▄▆▆▁▂▅▃▂▃▃▃▂▃▂▃▂▃▅▄▂▄▄▃▄▅
eval/samples_per_second,▄▁█▇▆▅▇▆▆▅▆▅▄▄▃▅▃▃█▇▄▆▇▆▆▆▇▆▇▆▇▆▃▅▆▅▅▆▅▄
eval/steps_per_second,▄▁█▇▆▅▇▆▆▅▆▅▄▄▃▅▃▃█▇▄▆▇▆▆▆▇▆▇▆▇▆▃▅▆▅▅▆▅▄
eval_accuracy,▁
eval_f1,▁
eval_loss,▁

0,1
eval/accuracy,0.24892
eval/f1,0.65909
eval/loss,0.31951
eval/roc_auc,0.76741
eval/runtime,4.6564
eval/samples_per_second,497.814
eval/steps_per_second,62.28
eval_accuracy,0.24892
eval_f1,0.65909
eval_loss,0.31951


# Flan T5

In [47]:
free_memory()
classifier = MultiLabelClassifier(
    model_name="t5-base",
    labels=label_columns,
    batch_size=8,
    learning_rate=2e-5,
    num_epochs=5,
    metric_name="f1",
    threshold=0.5
)

config.json:   0%|          | 0.00/1.21k [00:00<?, ?B/s]

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.39M [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/892M [00:00<?, ?B/s]

Some weights of T5ForSequenceClassification were not initialized from the model checkpoint at t5-base and are newly initialized: ['classification_head.dense.bias', 'classification_head.dense.weight', 'classification_head.out_proj.bias', 'classification_head.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [48]:
classifier.train(trainset['train'], trainset['test'])

Map:   0%|          | 0/5406 [00:00<?, ? examples/s]

Map:   0%|          | 0/2318 [00:00<?, ? examples/s]

dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)


Step,Training Loss,Validation Loss,F1,Roc Auc,Accuracy
50,0.6454,0.558758,0.319025,0.577212,0.016825
100,0.5191,0.47924,0.005828,0.501214,0.028473
150,0.4797,0.4693,0.000366,0.500067,0.02761
200,0.4746,0.468069,0.000366,0.500091,0.028041
250,0.4643,0.468057,0.0,0.5,0.028041
300,0.48,0.466183,0.0,0.5,0.028041
350,0.4724,0.463679,0.001096,0.500175,0.027179
400,0.4727,0.461014,0.000732,0.500183,0.028041
450,0.4683,0.456741,0.014869,0.503527,0.028473
500,0.4589,0.450004,0.055811,0.513482,0.031061


[34m[1mwandb[0m: Adding directory to artifact (./t5-base-finetuned/checkpoint-1000)... Done. 12.7s
[34m[1mwandb[0m: Adding directory to artifact (./t5-base-finetuned/checkpoint-2000)... Done. 13.1s
[34m[1mwandb[0m: Adding directory to artifact (./t5-base-finetuned/checkpoint-3000)... Done. 13.6s
There were missing keys in the checkpoint model loaded: ['transformer.encoder.embed_tokens.weight', 'transformer.decoder.embed_tokens.weight'].
dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)


Evaluation results: {'eval_loss': 0.32597923278808594, 'eval_f1': 0.6451948051948052, 'eval_roc_auc': 0.7577489594136737, 'eval_accuracy': 0.23252804141501293, 'eval_runtime': 13.898, 'eval_samples_per_second': 166.787, 'eval_steps_per_second': 20.866, 'epoch': 5.0}


In [49]:
outputs, outputs_array = classifier.predict(testset['Tweet'])

OutOfMemoryError: CUDA out of memory. Tried to allocate 2.39 GiB. GPU 0 has a total capacity of 15.77 GiB of which 542.38 MiB is free. Process 26106 has 15.24 GiB memory in use. Of the allocated memory 11.49 GiB is allocated by PyTorch, and 3.37 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

In [50]:
wandb.finish()

VBox(children=(Label(value='8539.205 MB of 8539.205 MB uploaded (6.187 MB deduped)\r'), FloatProgress(value=1.…

0,1
eval/accuracy,▁▁▁▁▁▁▂▅▆▇▇▇▇▇▇▇▇▇▇▇▇▇██████████████████
eval/f1,▄▁▁▁▁▁▂▆▆▇▇▇▇▇▇▇█▇██████████████████████
eval/loss,█▆▅▅▅▅▄▄▃▃▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
eval/roc_auc,▃▁▁▁▁▁▂▅▆▆▆▇▇▇▇▇▇▇▇▇▇▇▇█████████████████
eval/runtime,▁▁▁▁▂▁▂▁▁▂▂▂▅▂▁▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▂▂▂█▂▂▂▇
eval/samples_per_second,███▇▇▇▇██▇▇▇▄▇█▇█▇▇▇▇▇▇▇▇▇▇▇▇▇██▇▇▇▁▇▇▇▂
eval/steps_per_second,███▇▇▇▇██▇▇▇▄▇█▇█▇▇▇▇▇▇▇▇▇▇▇▇▇██▇▇▇▁▇▇▇▂
eval_accuracy,▁
eval_f1,▁
eval_loss,▁

0,1
eval/accuracy,0.23253
eval/f1,0.64519
eval/loss,0.32598
eval/roc_auc,0.75775
eval/runtime,13.898
eval/samples_per_second,166.787
eval/steps_per_second,20.866
eval_accuracy,0.23253
eval_f1,0.64519
eval_loss,0.32598
