# Emotional Analysis using Hugging Face Ecosystem
##Set Environment
In this notebook, we have to install following additional libraries (compared to previous notebooks) from Huggingface to enhance our workflow: transformers, datasets, evaluate, and accelearte. In addition, we are also installing wandb.

* The transformers library provides Trainer class that we will use to manage Training process.
* The datasets library simplifies the process of accessing and manipulating a wide array of datasets.
* The evaluate library offers a suite of standardized metrics and methods for robust and consistent model evaluation.
* We will not use accelerate library directly. However , we need to install it as transformer librray usses it in the background.
* Finally wandb library provide tools for efficient experiment tracking.

# Setting up the Environment



In [1]:
import sys
# If in Colab, then import the drive module from google.colab
if 'google.colab' in str(get_ipython()):
  from google.colab import drive
  # Mount the Google Drive to access files stored there
  drive.mount('/content/drive')

  # !pip install torchtext -qq
  # # Install the torchinfo library quietly
  !pip install torchinfo -qq
  # # !pip install torchtext --upgrade -qq
  !pip install torchmetrics -qq
  # !pip install torchinfo -qq
  !pip install fast_ml -qq
  !pip install joblib -qq
  # !pip install sklearn -qq
  # !pip install pandas -qq
  # !pip install numpy -qq
  !pip install scikit-multilearn -qq
  !pip install transformers evaluate wandb accelerate -U -qq
  !pip install pytorch-ignite -qq -U
  !pip install optuna -qq

  basepath = '/content/drive/MyDrive/Colab_Notebooks/BUAN_6342_Applied_Natural_Language_Processing'
  sys.path.append('/content/drive/MyDrive/Colab_Notebooks/BUAN_6342_Applied_Natural_Language_Processing/0_Custom_files')
else:
  basepath = '/Users/harikrishnadev/Library/CloudStorage/GoogleDrive-harikrish0607@gmail.com/My Drive/Colab_Notebooks/BUAN_6342_Applied_Natural_Language_Processing/'
  sys.path.append('/Users/harikrishnadev/Library/CloudStorage/GoogleDrive-harikrish0607@gmail.com/My Drive/Colab_Notebooks/BUAN_6342_Applied_Natural_Language_Processing/0_Custom_files')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## *Load Libraries*

In [2]:
# standard data science librraies for data handling and v isualization
from pathlib import Path
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.metrics import multilabel_confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

# New libraries introduced in this notebook
import evaluate
import torch
from datasets import load_dataset, DatasetDict, ClassLabel, Dataset
from datasets import load_metric
from transformers import Pipeline
from transformers import TrainingArguments, Trainer
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification
from transformers import AutoConfig
from transformers import pipeline
from pprint import pprint

import wandb

import os

In [3]:
# Set the base folder path using the Path class for better path handling
base_folder = Path(basepath)

# Define the data folder path by appending the relative path to the base folder
# This is where the data files will be stored
data_folder = base_folder / '0_Data_Folder'

# Define the model folder path for saving trained models
# This path points to a specific folder designated for NLP models related to the IMDb dataset
model_folder = data_folder

custom_functions = base_folder / '0_Custom_files'

# **Logging into Kaggle**
    


In [4]:
if 'google.colab' in str(get_ipython()):
    !chmod 600 /content/drive/MyDrive/Colab_Notebooks/BUAN_6382_Applied_DeepLearning/Data/.kaggle/kaggle.json
    !ls -la /content/drive/MyDrive/Colab_Notebooks/BUAN_6382_Applied_DeepLearning/Data/.kaggle
else:
    !chmod 600 '/Users/harikrishnadev/Library/CloudStorage/GoogleDrive-harikrish0607@gmail.com/My Drive/Colab_Notebooks/BUAN_6382_Applied_DeepLearning/Data/.kaggle/kaggle.json'
    ! ls -la '/Users/harikrishnadev/Library/CloudStorage/GoogleDrive-harikrish0607@gmail.com/My Drive/Colab_Notebooks/BUAN_6382_Applied_DeepLearning/Data/.kaggle'

total 1
-rw------- 1 root root 70 Nov 27 02:27 kaggle.json


In [5]:
if 'google.colab' in str(get_ipython()):
    os.environ['KAGGLE_CONFIG_DIR']='/content/drive/MyDrive/Colab_Notebooks/BUAN_6382_Applied_DeepLearning/Data/.kaggle'
else:
    os.environ['KAGGLE_CONFIG_DIR']='/Users/harikrishnadev/Library/CloudStorage/GoogleDrive-harikrish0607@gmail.com/My Drive/Colab_Notebooks/BUAN_6382_Applied_DeepLearning/Data/.kaggle'

# **Logging into Wandb**

In [6]:
if 'google.colab' in str(get_ipython()):
    from google.colab import userdata
    wandb.login(key=userdata.get('wandb'))
else:
    !wandb login

[34m[1mwandb[0m: Currently logged in as: [33mharikrish0607[0m ([33mharikrishnad[0m). Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


### Free memory command

In [7]:
import gc
def free_memory():
    """
    Attempts to free up memory by deleting variables and running Python's garbage collector.
    """
    gc.collect()
    for device_id in range(torch.cuda.device_count()):
        torch.cuda.set_device(device_id)
        torch.cuda.empty_cache()
    gc.collect()

# **Loading Dataset**

In [8]:
! kaggle competitions download -c emotion-detection-spring2014

emotion-detection-spring2014.zip: Skipping, found more recently modified local copy (use --force to force download)


In [9]:
! unzip emotion-detection-spring2014.zip

Archive:  emotion-detection-spring2014.zip
replace sample_submission.csv? [y]es, [n]o, [A]ll, [N]one, [r]ename: 

In [10]:
import pandas as pd
train_dataset = pd.read_csv('train.csv', usecols=lambda column: column != 'ID')

In [11]:
train_dataset.head()

Unnamed: 0,Tweet,anger,anticipation,disgust,fear,joy,love,optimism,pessimism,sadness,surprise,trust
0,“Worry is a down payment on a problem you may ...,0,1,0,0,0,0,1,0,0,0,1
1,Whatever you decide to do make sure it makes y...,0,0,0,0,1,1,1,0,0,0,0
2,@Max_Kellerman it also helps that the majorit...,1,0,1,0,1,0,1,0,0,0,0
3,Accept the challenges so that you can literall...,0,0,0,0,1,0,1,0,0,0,0
4,My roommate: it's okay that we can't spell bec...,1,0,1,0,0,0,0,0,0,0,0


In [12]:

train_dataset.columns

Index(['Tweet', 'anger', 'anticipation', 'disgust', 'fear', 'joy', 'love',
       'optimism', 'pessimism', 'sadness', 'surprise', 'trust'],
      dtype='object')

In [13]:
label_columns = ['anger', 'anticipation', 'disgust', 'fear', 'joy', 'love', 'optimism', 'pessimism', 'sadness', 'surprise', 'trust']

In [14]:
len(label_columns)

11

In [15]:
train_dataset[label_columns] = train_dataset[label_columns].astype(bool)

In [16]:
trainset = Dataset.from_pandas(train_dataset)

In [17]:
trainset.features

{'Tweet': Value(dtype='string', id=None),
 'anger': Value(dtype='bool', id=None),
 'anticipation': Value(dtype='bool', id=None),
 'disgust': Value(dtype='bool', id=None),
 'fear': Value(dtype='bool', id=None),
 'joy': Value(dtype='bool', id=None),
 'love': Value(dtype='bool', id=None),
 'optimism': Value(dtype='bool', id=None),
 'pessimism': Value(dtype='bool', id=None),
 'sadness': Value(dtype='bool', id=None),
 'surprise': Value(dtype='bool', id=None),
 'trust': Value(dtype='bool', id=None)}

# **Accessing and Manuplating Splits**

In [18]:
trainset = trainset.train_test_split(test_size=0.3)

In [19]:
trainset

DatasetDict({
    train: Dataset({
        features: ['Tweet', 'anger', 'anticipation', 'disgust', 'fear', 'joy', 'love', 'optimism', 'pessimism', 'sadness', 'surprise', 'trust'],
        num_rows: 5406
    })
    test: Dataset({
        features: ['Tweet', 'anger', 'anticipation', 'disgust', 'fear', 'joy', 'love', 'optimism', 'pessimism', 'sadness', 'surprise', 'trust'],
        num_rows: 2318
    })
})

In [20]:
# !huggingface-cli login

In [21]:
# trainset.push_to_hub("harikrishnad1997/tweetemo")

In [22]:
from google.colab import userdata
os.environ["HF_TOKEN"] = userdata.get('HF_TOKEN')

In [23]:
trainset = load_dataset("harikrishnad1997/tweetemo")

In [24]:
trainset

DatasetDict({
    train: Dataset({
        features: ['Tweet', 'anger', 'anticipation', 'disgust', 'fear', 'joy', 'love', 'optimism', 'pessimism', 'sadness', 'surprise', 'trust'],
        num_rows: 5406
    })
    test: Dataset({
        features: ['Tweet', 'anger', 'anticipation', 'disgust', 'fear', 'joy', 'love', 'optimism', 'pessimism', 'sadness', 'surprise', 'trust'],
        num_rows: 2318
    })
})

# Custom `MultiLabelClassifier` class built to run multiple models without repeating mutiple lines of code

The **MultiLabelClassifier** is a class designed for training and evaluating multi-label text classification models using the Hugging Face Transformers library. It supports fine-tuning pre-trained models for multi-label classification tasks and provides methods for prediction and hyperparameter optimization.

* `model_name` (str): The pre-trained model name from Hugging Face Transformers.
* `labels` (list of str): The list of labels for classification.
* `batch_size` (int): Batch size for training (default is 8).
* `learning_rate` (float): Learning rate for training (default is 2e-5).
* `num_epochs` (int): Number of epochs for training (default is 5).
* `metric_name` (str): The name of the evaluation metric (default is "f1").
* `threshold` (float): Threshold for binary classification (default is 0.5).



```python
# Initialize the classifier
classifier = MultiLabelClassifier(
    model_name="distilbert-base-uncased",
    labels=["positive", "negative"],
    batch_size=8,
    learning_rate=2e-5,
    num_epochs=10,
    metric_name="f1",
    threshold=0.5
)

# Train the classifier
classifier.train(train_dataset, valid_dataset)

# Optimize threshold
best_threshold = classifier.optimize_threshold(valid_dataset)

# Make predictions
predictions = classifier.predict(["This is a positive sentence", "This is a negative sentence"], threshold = best_threshold)

```


Here's a detailed explanation of the different components of the class:

1. **__init__** method:
   - Initializes the classifier by taking in various parameters such as the pre-trained model name, the list of labels, batch size, learning rate, number of epochs, evaluation metric, and the classification threshold.
   - It sets up the device (either 'cuda' if a GPU is available or 'cpu'), creates the tokenizer and the pre-trained model for multi-label classification.
   - The model is loaded onto the specified device.

2. **preprocess_data** method:
   - This method takes in a dictionary of examples and preprocesses the data for the model.
   - It tokenizes the input text and encodes it using the tokenizer.
   - It then creates a label matrix where each row corresponds to the binary labels for a given input text.
   - The preprocessed data, including the input IDs and the label matrix, is returned.

3. **multi_label_metrics** method:
   - This method computes the multi-label classification metrics, including F1 score (micro-averaged), ROC-AUC score, and accuracy.
   - It takes in the model predictions and the ground truth labels, and applies a threshold to convert the probabilities to binary predictions.
   - The computed metrics are returned as a dictionary.

4. **compute_metrics** method:
   - This method is used as the `compute_metrics` function for the Trainer in the Transformers library.
   - It calls the `multi_label_metrics` method to compute the evaluation metrics for the model.

5. **train** method:
   - This method is responsible for training the model.
   - It sets up the `TrainingArguments` object, which specifies the training configuration, such as the learning rate, batch size, number of epochs, and various logging and checkpointing options.
   - It preprocesses the training and validation datasets using the `preprocess_data` method and sets the data format to PyTorch tensors.
   - It creates a `Trainer` object and calls the `train` method to train the model.
   - After training, it evaluates the model on the validation dataset and logs the results to Weights & Biases.

6. **predict** method:
   - This method generates predictions for a list of input texts.
   - It preprocesses the input texts using the `preprocess_data` method and makes predictions using the model.
   - It applies the classification threshold to convert the probabilities to binary predictions and returns the predicted labels and the binary predictions.

7. **objective** method:
   - This method is used for hyperparameter optimization using Optuna.
   - It takes in a trial object and the validation dataset, and computes the negative F1 score as the objective function.
   - It applies the threshold (which is a hyperparameter to be optimized) to the model predictions and computes the multi-label metrics.
   - The negative F1 score is returned as the objective value.

8. **optimize_threshold** method:
   - This method uses Optuna to optimize the classification threshold.
   - It creates an Optuna study, optimizes the objective function (the `objective` method), and sets the best threshold value found during the optimization process.
   - The best threshold value is returned.


**Notes:**
* The train_dataset and valid_dataset should be compatible with the Hugging Face Dataset class.
* The labels should match the labels present in the datasets.
* Model fine-tuning and prediction methods require GPU if available for faster computation.

In [25]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score, accuracy_score
import torch
from transformers import EvalPrediction
import optuna
from datetime import date
from sklearn.metrics import multilabel_confusion_matrix

class MultiLabelClassifier:
    def __init__(self, model_name, labels, batch_size=8, learning_rate=2e-5, num_epochs=5, metric_name="f1", threshold=0.5):
        """
        Initializes the MultiLabelClassifier.

        Args:
        - model_name (str): The pre-trained model name.
        - labels (list of str): The list of labels for classification.
        - batch_size (int): Batch size for training.
        - learning_rate (float): Learning rate for training.
        - num_epochs (int): Number of epochs for training.
        - metric_name (str): The name of the evaluation metric.
        - threshold (float): Threshold for binary classification.

        Returns:
        - None
        """
        self.model_name = model_name
        self.labels = labels
        self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
        self.batch_size = batch_size
        self.learning_rate = learning_rate
        self.num_epochs = num_epochs
        self.metric_name = metric_name
        self.threshold = threshold
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_name, problem_type="multi_label_classification", num_labels=len(labels), id2label={str(i): label for i, label in enumerate(labels)}, label2id={label: i for i, label in enumerate(labels)})
        self.id2label = {str(i): label for i, label in enumerate(labels)}
        self.label2id = {label: i for i, label in enumerate(labels)}
        self.model.to(self.device)

    def preprocess_data(self, examples):
        """
        Preprocesses the input data.

        Args:
        - examples (dict): Dictionary containing input data.

        Returns:
        - dict: Preprocessed input data.
        """
        text = examples["Tweet"]
        encoding = self.tokenizer(text, padding="max_length", truncation=True, max_length=128)
        labels_batch = {k: examples[k] for k in examples.keys() if k in self.labels}
        labels_matrix = np.zeros((len(text), len(self.labels)))
        for idx, label in enumerate(self.labels):
            labels_matrix[:, idx] = labels_batch[label]
        encoding["labels"] = labels_matrix.tolist()
        return encoding

    def multi_label_metrics(self, predictions, labels, threshold=None):
        """
        Computes multi-label classification metrics.

        Args:
        - predictions (torch.Tensor): Model predictions.
        - labels (np.ndarray): Ground truth labels.
        - threshold (float): Threshold for binary classification.

        Returns:
        - dict: Dictionary containing computed metrics.
        """
        if threshold is None:
            threshold = self.threshold
        sigmoid = torch.nn.Sigmoid()
        probs = sigmoid(torch.Tensor(predictions))
        y_pred = np.zeros(probs.shape)
        y_pred[np.where(probs >= threshold)] = 1
        y_true = labels
        f1_micro_average = f1_score(y_true=y_true, y_pred=y_pred, average='micro')
        roc_auc = roc_auc_score(y_true, y_pred, average='micro')
        accuracy = accuracy_score(y_true, y_pred)
        metrics = {'f1': f1_micro_average, 'roc_auc': roc_auc, 'accuracy': accuracy}
        return metrics

    def multilabel_confusion_matrix(self, predictions, labels, threshold=None):
        """
        Computes multilabel confusion matrix.

        Args:
        - predictions (torch.Tensor): Model predictions.
        - labels (np.ndarray): Ground truth labels.
        - threshold (float): Threshold for binary classification.

        Returns:
        - np.ndarray: Multilabel confusion matrix.
        """
        if threshold is None:
            threshold = self.threshold
        sigmoid = torch.nn.Sigmoid()
        probs = sigmoid(torch.Tensor(predictions))
        y_pred = np.zeros(probs.shape)
        y_pred[np.where(probs >= threshold)] = 1
        y_true = labels
        return multilabel_confusion_matrix(y_true, y_pred)

    def compute_metrics(self, p: EvalPrediction):
        """
        Computes evaluation metrics.

        Args:
        - p (EvalPrediction): Evaluation predictions.

        Returns:
        - dict: Dictionary containing computed metrics.
        """
        preds = p.predictions[0] if isinstance(p.predictions, tuple) else p.predictions
        result = self.multi_label_metrics(predictions=preds, labels=p.label_ids)
        return result

    def train(self, train_dataset, valid_dataset, push_to_huggingface=True):
        """
        Trains the model.

        Args:
        - train_dataset (Dataset): Training dataset.
        - valid_dataset (Dataset): Validation dataset.

        Returns:
        - None
        """
        args = TrainingArguments(
            f"{self.model_name}-finetuned",
            # evaluation_strategy="epoch",
            # save_strategy="epoch",
            learning_rate=self.learning_rate,
            per_device_train_batch_size=self.batch_size,
            per_device_eval_batch_size=self.batch_size,
            num_train_epochs=self.num_epochs,
            weight_decay=0.01,
            load_best_model_at_end=True,
            metric_for_best_model="f1",  # Use F1 score as the metric to determine the best model
            optim='adamw_torch',  # Optimizer
            # output_dir=str(model_folder),  # Directory to save model checkpoints
            evaluation_strategy='steps',  # Evaluate model at specified step intervals
            eval_steps=50,  # Perform evaluation every 50 training steps
            save_strategy="steps",  # Save model checkpoint at specified step intervals
            save_steps=1000,  # Save model checkpoint every 1000 training steps
            save_total_limit=2,  # Retain only the best and the most recent model checkpoints
            greater_is_better=True,  # A model is 'better' if its F1 score is higher
            logging_strategy='steps',  # Log metrics and results to Weights & Biases platform
            logging_steps=50,  # Log metrics and results every 50 steps
            report_to='wandb',  # Log metrics and results to Weights & Biases platform
            gradient_accumulation_steps=10,  # Accumulate gradients for every 1 step
            gradient_checkpointing=True,  # Enable gradient checkpointing
            run_name=f"emotion_tweet_{self.model_name}_{date.today().strftime('%Y-%m-%d_%H-%M-%S')}",  # Experiment name for Weights & Biases
            fp16=True  # Use mixed precision training (FP16)
            )

        train_dataset = train_dataset.map(self.preprocess_data, batched=True, remove_columns=train_dataset.column_names)
        valid_dataset = valid_dataset.map(self.preprocess_data, batched=True, remove_columns=valid_dataset.column_names)

        train_dataset.set_format("torch")
        valid_dataset.set_format("torch")

        trainer = Trainer(
            self.model,
            args,
            train_dataset=train_dataset,
            eval_dataset=valid_dataset,
            tokenizer=self.tokenizer,
            compute_metrics=self.compute_metrics,
        )

        trainer.train()
        eval_results = trainer.evaluate()
        print(f"Evaluation results: {eval_results}")

        # Pushing model to Huggingface
        if push_to_huggingface:
          model_name_hf = f"emotion_tweet_{self.model_name}_{date.today().strftime('%Y-%m-%d')}"
          self.model.push_to_hub(model_name_hf)
          print(f"Model pushed to Huggingface: harikrishnad1997/{model_name_hf}")


        # Log evaluation results to Weights & Biases platform
        wandb.log({"eval_accuracy": eval_results["eval_accuracy"], "eval_loss": eval_results["eval_loss"], "eval_f1": eval_results["eval_f1"]})

        # # Compute and plot confusion matrix
        # preds = trainer.predict(valid_dataset)
        # y_labels = valid_dataset[self.labels]
        # confusion_matrix = self.multilabel_confusion_matrix(preds, y_labels)
        # plt.figure(figsize=(10, 7))
        # sns.heatmap(confusion_matrix, annot=True, cmap="Blues")
        # plt.xlabel("Predicted Labels")
        # plt.ylabel("True Labels")
        # plt.title("Multilabel Confusion Matrix")
        # plt.show()

        # # Log confusion matrix to Weights & Biases platform
        # wandb.log({"confusion_matrix": wandb.Image(plt)})

    def predict(self, texts, threshold=0.5, load_from_huggingface=False):
        """
        Generates predictions for a list of texts.

        Args:
        - texts (list of str): List of input texts.
        - threshold (float): Threshold for binary classification.

        Returns:
        - dict: Dictionary containing predicted labels for each input text.
        """
        if threshold is None:
            threshold = self.threshold

        # Load the model from Hugging Face if specified
        if load_from_huggingface:
          self.model = AutoModelForSequenceClassification.from_pretrained(load_from_huggingface)
          # self.tokenizer = AutoTokenizer.from_pretrained(load_from_huggingface)
          self.model.to("cpu")
        else:
          # Use the model from training
          self.model.to("cpu")


        # Preprocess input texts
        encoding = self.tokenizer(texts, padding="max_length", truncation=True, max_length=128, return_tensors="pt").to("cpu")

        # Make predictions
        with torch.no_grad():
            output = self.model(**encoding)

        # Convert logits to probabilities
        sigmoid = torch.nn.Sigmoid()
        probs = sigmoid(output.logits)

        # Apply threshold for binary classification
        threshold_tensor = torch.tensor([threshold], device="cpu")
        binary_preds = (probs >= threshold_tensor).int()

        # Convert binary predictions to label names
        label_preds = []
        for pred in binary_preds:
            label_pred = [self.id2label[str(i)] for i, val in enumerate(pred) if val == 1]
            label_preds.append(label_pred)

        return label_preds, binary_preds.cpu().numpy()

    def objective(self, trial, valid_dataset):
        """
        Objective function for hyperparameter optimization.

        Args:
        - trial (Trial): Optuna trial object.
        - valid_dataset (Dataset): Validation dataset.

        Returns:
        - float: Computed metric value.
        """
        threshold = trial.suggest_float("threshold", 0.1, 0.9)
        valid_dataset = valid_dataset.map(self.preprocess_data, batched=True)
        valid_dataset.set_format("torch")

        # Get the correct labels from the dataset
        labels = np.array([valid_dataset[column] for column in self.labels]).T

        # Model
        self.model.to("cpu")

        # Make predictions
        with torch.no_grad():
            logits = self.model(valid_dataset["input_ids"].to(torch.device("cpu")))['logits']
            predictions = torch.sigmoid(logits).cpu().numpy()

            # Apply threshold for binary classification
            binary_preds = (predictions >= threshold).astype(int)

            # Compute metrics
            f1_micro_average = f1_score(y_true=labels, y_pred=binary_preds, average='micro')
            roc_auc = roc_auc_score(labels, predictions, average='micro')
            accuracy = accuracy_score(labels, binary_preds)

            result = {'f1': f1_micro_average, 'roc_auc': roc_auc, 'accuracy': accuracy}
            return -result["f1"]

    def optimize_threshold(self, valid_dataset):
        """
        Optimizes the threshold for binary classification.

        Args:
        - valid_dataset (Dataset): Validation dataset.

        Returns:
        - float: Best threshold value.
        """
        study = optuna.create_study(direction="maximize")
        study.optimize(lambda trial: self.objective(trial, valid_dataset), n_trials=10)
        self.threshold = study.best_params["threshold"]
        return study.best_params["threshold"]

In [26]:
os.environ["WANDB_PROJECT"] = "nlp_course_spring_2024-emotion-analysis-hf-trainer-hw6"  # name your W&B project
os.environ["WANDB_LOG_MODEL"] = "checkpoint"  # log the model during training

# Distill BERT
## Training the model

In [None]:
classifier = MultiLabelClassifier(
    model_name="distilbert-base-uncased",
    labels=label_columns,
    batch_size=8,
    learning_rate=2e-5,
    num_epochs=5,
    metric_name="f1",
    threshold=0.5
)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
classifier.train(trainset['train'], trainset['test'], push_to_huggingface=True)

Map:   0%|          | 0/5406 [00:00<?, ? examples/s]

Map:   0%|          | 0/2318 [00:00<?, ? examples/s]

dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)


[34m[1mwandb[0m: Currently logged in as: [33mharikrish0607[0m ([33mharikrishnad[0m). Use [1m`wandb login --relogin`[0m to force relogin


Step,Training Loss,Validation Loss,F1,Roc Auc,Accuracy
50,0.5541,0.484929,0.0,0.5,0.022865
100,0.4912,0.467407,0.0,0.5,0.022865
150,0.4596,0.437073,0.30868,0.588643,0.105263
200,0.4226,0.4142,0.4866,0.665019,0.163503
250,0.4056,0.392215,0.557127,0.704776,0.190682
300,0.3895,0.3731,0.564522,0.704798,0.200173
350,0.3625,0.364138,0.568786,0.707529,0.200173
400,0.3548,0.373807,0.574513,0.712337,0.214409
450,0.3544,0.348271,0.617738,0.738837,0.235548
500,0.3609,0.353648,0.620933,0.743608,0.22994


[34m[1mwandb[0m: Adding directory to artifact (./distilbert-base-uncased-finetuned/checkpoint-1000)... Done. 6.5s
[34m[1mwandb[0m: Adding directory to artifact (./distilbert-base-uncased-finetuned/checkpoint-2000)... Done. 2.5s
[34m[1mwandb[0m: Adding directory to artifact (./distilbert-base-uncased-finetuned/checkpoint-3000)... Done. 2.3s
dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)


Evaluation results: {'eval_loss': 0.3178046643733978, 'eval_f1': 0.6674698795180722, 'eval_roc_auc': 0.7742836032282873, 'eval_accuracy': 0.2553925798101812, 'eval_runtime': 3.368, 'eval_samples_per_second': 688.251, 'eval_steps_per_second': 86.106, 'epoch': 5.0}


model.safetensors:   0%|          | 0.00/268M [00:00<?, ?B/s]

Model pushed to Huggingface: emotion_tweet_distilbert-base-uncased_2024-04-15


## Finding the optimal threshold

In [None]:
best_threshold = classifier.optimize_threshold(trainset['test'])
print(f"Best threshold: {best_threshold}")

[I 2024-04-15 00:39:25,420] A new study created in memory with name: no-name-0cbf9d20-0d68-4de6-b7e9-cd41bce81768


Map:   0%|          | 0/2318 [00:00<?, ? examples/s]

We strongly recommend passing in an `attention_mask` since your input_ids may be padded. See https://huggingface.co/docs/transformers/troubleshooting#incorrect-output-when-padding-tokens-arent-masked.
[I 2024-04-15 00:39:31,435] Trial 0 finished with value: -0.19591836734693877 and parameters: {'threshold': 0.899836097167448}. Best is trial 0 with value: -0.19591836734693877.


Map:   0%|          | 0/2318 [00:00<?, ? examples/s]

[I 2024-04-15 00:39:37,581] Trial 1 finished with value: -0.46679174484052544 and parameters: {'threshold': 0.600169879757231}. Best is trial 0 with value: -0.19591836734693877.


Map:   0%|          | 0/2318 [00:00<?, ? examples/s]

[I 2024-04-15 00:39:43,654] Trial 2 finished with value: -0.587245349867139 and parameters: {'threshold': 0.18247047654261828}. Best is trial 0 with value: -0.19591836734693877.


Map:   0%|          | 0/2318 [00:00<?, ? examples/s]

[I 2024-04-15 00:39:49,744] Trial 3 finished with value: -0.42064653452175066 and parameters: {'threshold': 0.6786410238931102}. Best is trial 0 with value: -0.19591836734693877.


Map:   0%|          | 0/2318 [00:00<?, ? examples/s]

[I 2024-04-15 00:39:55,976] Trial 4 finished with value: -0.5757073844030366 and parameters: {'threshold': 0.15956080848961315}. Best is trial 0 with value: -0.19591836734693877.


Map:   0%|          | 0/2318 [00:00<?, ? examples/s]

[I 2024-04-15 00:40:02,394] Trial 5 finished with value: -0.585867463526103 and parameters: {'threshold': 0.2707741829167005}. Best is trial 0 with value: -0.19591836734693877.


Map:   0%|          | 0/2318 [00:00<?, ? examples/s]

[I 2024-04-15 00:40:08,648] Trial 6 finished with value: -0.5201109570041609 and parameters: {'threshold': 0.49891072043944973}. Best is trial 0 with value: -0.19591836734693877.


Map:   0%|          | 0/2318 [00:00<?, ? examples/s]

[I 2024-04-15 00:40:15,194] Trial 7 finished with value: -0.560654892355292 and parameters: {'threshold': 0.137711033657284}. Best is trial 0 with value: -0.19591836734693877.


Map:   0%|          | 0/2318 [00:00<?, ? examples/s]

[I 2024-04-15 00:40:21,157] Trial 8 finished with value: -0.5809510331163317 and parameters: {'threshold': 0.16718984374130336}. Best is trial 0 with value: -0.19591836734693877.


Map:   0%|          | 0/2318 [00:00<?, ? examples/s]

[I 2024-04-15 00:40:27,584] Trial 9 finished with value: -0.3759504364967614 and parameters: {'threshold': 0.7431786801176511}. Best is trial 0 with value: -0.19591836734693877.


Best threshold: 0.899836097167448


In [None]:
best_threshold

0.899836097167448

In [None]:
wandb.finish()

VBox(children=(Label(value='2558.420 MB of 2558.445 MB uploaded (1.813 MB deduped)\r'), FloatProgress(value=0.…

0,1
eval/accuracy,▁▁▅▆▆▇▇▇▇███████████████████████████████
eval/f1,▁▁▆▇▇▇█▇▇███████████████████████████████
eval/loss,█▇▅▃▃▂▂▂▂▂▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
eval/roc_auc,▁▁▅▆▆▇▇▇▇▇▇█▇███████████████████████████
eval/runtime,▂▁▁▂▃▄▅▅█▆▆▅▅▅▆▇▆▅▅▅▅▅▅▅▆▇▅▅▅▅▅▅▅▅▅▆▅▅▆▇
eval/samples_per_second,▇█▇▇▆▅▄▄▁▃▃▄▄▄▃▂▃▄▄▄▄▃▄▃▃▂▄▄▄▃▄▄▄▄▄▃▃▃▃▂
eval/steps_per_second,▇█▇▇▆▅▄▄▁▃▃▄▄▄▃▂▃▄▄▄▄▃▄▃▃▂▄▄▄▃▄▄▄▄▄▃▃▃▃▂
eval_accuracy,▁
eval_f1,▁
eval_loss,▁

0,1
eval/accuracy,0.25539
eval/f1,0.66747
eval/loss,0.3178
eval/roc_auc,0.77428
eval/runtime,3.368
eval/samples_per_second,688.251
eval/steps_per_second,86.106
eval_accuracy,0.25539
eval_f1,0.66747
eval_loss,0.3178


## Prediction on Submission file

In [27]:
test = pd.read_csv('test.csv')
test.head()

Unnamed: 0,ID,Tweet,anger,anticipation,disgust,fear,joy,love,optimism,pessimism,sadness,surprise,trust
0,2018-01559,@Adnan__786__ @AsYouNotWish Dont worry Indian ...,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE
1,2018-03739,"Academy of Sciences, eschews the normally sobe...",NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE
2,2018-00385,I blew that opportunity -__- #mad,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE
3,2018-03001,This time in 2 weeks I will be 30... 😥,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE
4,2018-01988,#Deppression is real. Partners w/ #depressed p...,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE


In [28]:
testset = Dataset.from_dict({
    'Tweet': test['Tweet']})

In [29]:
testset

Dataset({
    features: ['Tweet'],
    num_rows: 3259
})

In [None]:
outputs, outputs_array = classifier.predict(testset['Tweet'], threshold = best_threshold,load_from_huggingface='harikrishnad1997/emotion_tweet_distilbert-base-uncased_2024-04-15')

In [None]:
outputs[:10]

[['fear'],
 ['disgust'],
 ['anger', 'disgust'],
 [],
 ['sadness'],
 ['fear'],
 [],
 ['joy'],
 ['joy', 'optimism'],
 ['sadness']]

In [None]:
outputs_array[:10]

array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]], dtype=int32)

In [None]:
test[label_columns] = outputs_array

In [None]:
# submission = pd.read_csv('sample_submission.csv')
# submission.head()

Unnamed: 0,ID,anger,anticipation,disgust,fear,joy,love,optimism,pessimism,sadness,surprise,trust
0,2018-01559,0,0,0,0,0,0,0,0,0,0,0
1,2018-03739,0,0,0,0,0,0,0,0,0,0,0
2,2018-00385,0,0,0,0,0,0,0,0,0,0,0
3,2018-03001,0,0,0,0,0,0,0,0,0,0,0
4,2018-01988,0,0,0,0,0,0,0,0,0,0,0


In [None]:
submission = test.drop(columns = ['Tweet'])
submission.head()

Unnamed: 0,ID,anger,anticipation,disgust,fear,joy,love,optimism,pessimism,sadness,surprise,trust
0,2018-01559,0,0,0,0,0,0,0,0,0,0,0
1,2018-03739,0,0,1,0,0,0,0,0,0,0,0
2,2018-00385,1,0,1,0,0,0,0,0,0,0,0
3,2018-03001,0,0,0,0,1,0,0,0,0,0,0
4,2018-01988,0,0,0,0,0,0,0,0,1,0,0


In [None]:
submission.to_csv(model_folder/f'{classifier.model_name}_{date.today()}.csv', index = False)

## Submission

In [None]:
from kaggle import api
comp = 'emotion-detection-spring2014'
api.competition_submit(model_folder/f'{classifier.model_name}_{date.today()}.csv', f'{classifier.model_name}_{date.today()}', comp)



100%|██████████| 105k/105k [00:01<00:00, 61.6kB/s]


Successfully submitted to Emotion Detection Spring2024

# albert-base-v2
## Training

In [None]:
free_memory()
classifier = MultiLabelClassifier(
    model_name="albert-base-v2",
    labels=label_columns,
    batch_size=8,
    learning_rate=2e-5,
    num_epochs=5,
    metric_name="f1",
    threshold=0.5
)

tokenizer_config.json:   0%|          | 0.00/25.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/684 [00:00<?, ?B/s]

spiece.model:   0%|          | 0.00/760k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.31M [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/47.4M [00:00<?, ?B/s]

Some weights of AlbertForSequenceClassification were not initialized from the model checkpoint at albert-base-v2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
classifier.train(trainset['train'], trainset['test'],push_to_huggingface=True)

Map:   0%|          | 0/5406 [00:00<?, ? examples/s]

Map:   0%|          | 0/2318 [00:00<?, ? examples/s]

dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)


Step,Training Loss,Validation Loss,F1,Roc Auc,Accuracy
50,0.517,0.47092,0.0,0.499975,0.022865
100,0.4815,0.450299,0.15425,0.539354,0.03667
150,0.4349,0.421976,0.425537,0.634821,0.128991
200,0.4207,0.418621,0.473031,0.659074,0.141501
250,0.4055,0.396566,0.53604,0.694253,0.184642
300,0.4052,0.413835,0.456921,0.650213,0.142364
350,0.3856,0.423896,0.475928,0.66108,0.165229
400,0.3794,0.400368,0.482043,0.662262,0.151424
450,0.3773,0.371143,0.57435,0.713994,0.201898
500,0.3893,0.371793,0.583534,0.722158,0.206644


[34m[1mwandb[0m: Adding directory to artifact (./albert-base-v2-finetuned/checkpoint-1000)... Done. 0.3s
[34m[1mwandb[0m: Adding directory to artifact (./albert-base-v2-finetuned/checkpoint-2000)... Done. 0.3s
[34m[1mwandb[0m: Adding directory to artifact (./albert-base-v2-finetuned/checkpoint-3000)... Done. 0.3s
dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)


Evaluation results: {'eval_loss': 0.3197968304157257, 'eval_f1': 0.6594739005343198, 'eval_roc_auc': 0.7666215797009197, 'eval_accuracy': 0.24935289042277825, 'eval_runtime': 8.9335, 'eval_samples_per_second': 259.473, 'eval_steps_per_second': 32.462, 'epoch': 5.0}


model.safetensors:   0%|          | 0.00/46.8M [00:00<?, ?B/s]

Model pushed to Huggingface: emotion_tweet_albert-base-v2_2024-04-15


In [None]:
free_memory()

In [None]:
# best_threshold = classifier.optimize_threshold(trainset['test'])
# print(f"Best threshold: {best_threshold}")

In [None]:
outputs, outputs_array = classifier.predict(testset['Tweet'],load_from_huggingface='harikrishnad1997/emotion_tweet_albert-base-v2_2024-04-15')

config.json:   0%|          | 0.00/1.34k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/46.8M [00:00<?, ?B/s]

In [None]:
submission[label_columns] = outputs_array

In [None]:
submission.to_csv(model_folder/f'{classifier.model_name}_{date.today()}.csv', index = False)

In [None]:
from kaggle import api
comp = 'emotion-detection-spring2014'
api.competition_submit(model_folder/f'{classifier.model_name}_{date.today()}.csv', f'{classifier.model_name}_{date.today()}', comp)



100%|██████████| 105k/105k [00:02<00:00, 42.7kB/s]


Successfully submitted to Emotion Detection Spring2024

In [None]:
wandb.finish()

VBox(children=(Label(value='457.797 MB of 457.797 MB uploaded (5.800 MB deduped)\r'), FloatProgress(value=1.0,…

0,1
eval/accuracy,▁▁▅▅▅▆▇▆▆▇▇▇▇▇█▇▇▇██████▇███████████████
eval/f1,▁▃▆▆▆▇▇▇▇▇██▇███████████████████████████
eval/loss,█▇▆▅▆▃▃▃▄▃▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
eval/roc_auc,▁▂▅▅▅▇▆▆▆▇▇▇▇▇█▇▇███████████████████████
eval/runtime,█▆▂▃▂▂▁▂▁▁▁▁▂▄▁▂▂▁▂▁▁▁▁▁▁▁▂▂▂▄▄▂▃▃▄▃▃▄▅▄
eval/samples_per_second,▁▃▇▆▇▇█▇████▇▅█▇▇█▇███████▇▇▇▅▅▇▆▆▅▆▆▅▄▅
eval/steps_per_second,▁▃▇▆▇▇█▇████▇▅█▇▇█▇███████▇▇▇▅▅▇▆▆▅▆▆▅▄▅
eval_accuracy,▁
eval_f1,▁
eval_loss,▁

0,1
eval/accuracy,0.24935
eval/f1,0.65947
eval/loss,0.3198
eval/roc_auc,0.76662
eval/runtime,8.9335
eval/samples_per_second,259.473
eval/steps_per_second,32.462
eval_accuracy,0.24935
eval_f1,0.65947
eval_loss,0.3198


# Flan T5 base

In [None]:
free_memory()
classifier = MultiLabelClassifier(
    model_name="t5-base",
    labels=label_columns,
    batch_size=8,
    learning_rate=2e-5,
    num_epochs=5,
    metric_name="f1",
    threshold=0.5
)

config.json:   0%|          | 0.00/1.21k [00:00<?, ?B/s]

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.39M [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/892M [00:00<?, ?B/s]

Some weights of T5ForSequenceClassification were not initialized from the model checkpoint at t5-base and are newly initialized: ['classification_head.dense.bias', 'classification_head.dense.weight', 'classification_head.out_proj.bias', 'classification_head.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
classifier.train(trainset['train'], trainset['test'])

Map:   0%|          | 0/5406 [00:00<?, ? examples/s]

Map:   0%|          | 0/2318 [00:00<?, ? examples/s]

dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)


Step,Training Loss,Validation Loss,F1,Roc Auc,Accuracy
50,0.6508,0.558692,0.299278,0.569599,0.01855
100,0.5272,0.477653,0.047198,0.508827,0.022433
150,0.4808,0.469784,0.001456,0.500165,0.02157
200,0.4795,0.467826,0.002545,0.500388,0.02157
250,0.479,0.466403,0.0,0.49995,0.022433
300,0.4774,0.46546,0.000364,0.500041,0.022433
350,0.4564,0.464198,0.000729,0.500157,0.022865
400,0.464,0.46223,0.002185,0.500497,0.023296
450,0.4689,0.456367,0.004003,0.500978,0.022865
500,0.4605,0.450905,0.042946,0.510283,0.028473


[34m[1mwandb[0m: Adding directory to artifact (./t5-base-finetuned/checkpoint-1000)... Done. 13.6s
[34m[1mwandb[0m: Adding directory to artifact (./t5-base-finetuned/checkpoint-2000)... Done. 15.2s
[34m[1mwandb[0m: Adding directory to artifact (./t5-base-finetuned/checkpoint-3000)... Done. 11.1s
There were missing keys in the checkpoint model loaded: ['transformer.encoder.embed_tokens.weight', 'transformer.decoder.embed_tokens.weight'].
dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)


Evaluation results: {'eval_loss': 0.33038589358329773, 'eval_f1': 0.6327754973160721, 'eval_roc_auc': 0.7488127020057515, 'eval_accuracy': 0.2385677308024159, 'eval_runtime': 19.0087, 'eval_samples_per_second': 121.944, 'eval_steps_per_second': 15.256, 'epoch': 5.0}


model.safetensors:   0%|          | 0.00/894M [00:00<?, ?B/s]

Model pushed to Huggingface: emotion_tweet_t5-base_2024-04-15


In [None]:
# best_threshold = classifier.optimize_threshold(trainset['test'])
# print(f"Best threshold: {best_threshold}")

In [None]:
outputs, outputs_array = classifier.predict(testset['Tweet'],load_from_huggingface='harikrishnad1997/emotion_tweet_t5-base_2024-04-15')

config.json:   0%|          | 0.00/2.00k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/894M [00:00<?, ?B/s]

In [None]:
test[label_columns] = outputs_array
# submission = test.drop(columns = ['Tweet'])
submission.to_csv(model_folder/f'{classifier.model_name}_{date.today()}.csv', index = False)

In [None]:
from kaggle import api
comp = 'emotion-detection-spring2014'
api.competition_submit(model_folder/f'{classifier.model_name}_{date.today()}.csv', f'{classifier.model_name}_{date.today()}', comp)

In [None]:
wandb.finish()

VBox(children=(Label(value='8539.205 MB of 8539.205 MB uploaded (6.187 MB deduped)\r'), FloatProgress(value=1.…

0,1
eval/accuracy,▁▁▁▁▁▁▂▅▆▇▇▇▇▇▇▇▇▇▇▇▇▇██████████████████
eval/f1,▄▁▁▁▁▁▂▆▆▇▇▇▇▇▇▇█▇██████████████████████
eval/loss,█▆▅▅▅▅▄▄▃▃▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
eval/roc_auc,▃▁▁▁▁▁▂▅▆▆▆▇▇▇▇▇▇▇▇▇▇▇▇█████████████████
eval/runtime,▁▁▁▁▂▁▂▁▁▂▂▂▅▂▁▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▂▂▂█▂▂▂▇
eval/samples_per_second,███▇▇▇▇██▇▇▇▄▇█▇█▇▇▇▇▇▇▇▇▇▇▇▇▇██▇▇▇▁▇▇▇▂
eval/steps_per_second,███▇▇▇▇██▇▇▇▄▇█▇█▇▇▇▇▇▇▇▇▇▇▇▇▇██▇▇▇▁▇▇▇▂
eval_accuracy,▁
eval_f1,▁
eval_loss,▁

0,1
eval/accuracy,0.23253
eval/f1,0.64519
eval/loss,0.32598
eval/roc_auc,0.75775
eval/runtime,13.898
eval/samples_per_second,166.787
eval/steps_per_second,20.866
eval_accuracy,0.23253
eval_f1,0.64519
eval_loss,0.32598


# [Link to Weights & Biases Report](https://api.wandb.ai/links/harikrishnad/gkemgt95):  https://api.wandb.ai/links/harikrishnad/gkemgt95

In [30]:
free_memory()
classifier = MultiLabelClassifier(
    model_name="google/flan-t5-base",
    labels=label_columns,
    batch_size=2,
    learning_rate=2e-5,
    num_epochs=5,
    metric_name="f1",
    threshold=0.5
)

Some weights of T5ForSequenceClassification were not initialized from the model checkpoint at google/flan-t5-base and are newly initialized: ['classification_head.dense.bias', 'classification_head.dense.weight', 'classification_head.out_proj.bias', 'classification_head.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [31]:
classifier.train(trainset['train'], trainset['test'])

Map:   0%|          | 0/5406 [00:00<?, ? examples/s]

Map:   0%|          | 0/2318 [00:00<?, ? examples/s]



Step,Training Loss,Validation Loss,F1,Roc Auc,Accuracy
50,0.0,,0.0,0.5,0.022865
100,0.0,,0.0,0.5,0.022865
150,0.0,,0.0,0.5,0.022865
200,0.0,,0.0,0.5,0.022865
250,0.0,,0.0,0.5,0.022865
300,0.0,,0.0,0.5,0.022865
350,0.0,,0.0,0.5,0.022865
400,0.0,,0.0,0.5,0.022865
450,0.0,,0.0,0.5,0.022865
500,0.0,,0.0,0.5,0.022865




ValueError: Artifact name may only contain alphanumeric characters, dashes, underscores, and dots. Invalid name: checkpoint-emotion_tweet_google/flan-t5-base_2024-04-18_00-00-00

In [32]:
wandb.finish()

VBox(children=(Label(value='0.001 MB of 0.019 MB uploaded\r'), FloatProgress(value=0.06053043648463496, max=1.…

0,1
eval/accuracy,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
eval/f1,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
eval/roc_auc,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
eval/runtime,█▃▆▅▄▄▃▅▆▅▁▂▄▃▃▁▁▄▄▅
eval/samples_per_second,▁▆▃▄▅▅▆▄▃▄▇▇▅▆▆██▅▅▄
eval/steps_per_second,▁▆▃▄▅▅▆▄▃▄▇▇▅▆▆██▅▅▄
train/epoch,▁▁▁▁▂▂▂▂▂▂▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▇▇▇▇▇▇████
train/global_step,▁▁▁▁▂▂▂▂▂▂▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▇▇▇▇▇▇████
train/learning_rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
train/loss,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
eval/accuracy,0.02286
eval/f1,0.0
eval/loss,
eval/roc_auc,0.5
eval/runtime,66.742
eval/samples_per_second,34.731
eval/steps_per_second,17.365
train/epoch,3.69959
train/global_step,1000.0
train/grad_norm,


## Notes about Flan t5 base


*   With the Gradient accumlation and small batch size, I was still getting NaN values in predictions.

