<a href="https://colab.research.google.com/github/hudsonmendes/cm3070-fp/blob/objective_3/dev/mlops.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Load Code & Data

In this section, we load the code and the data locally so that we can use the `hlm12erc` package and run the ML pipelines as they were designed.

The code files are copied from the following folder into the root directory of the present runtime:
> `/content/drive/MyDrive/Code/github/universityoflondon/cm3070-fp/*`

The following .zip file contains the compressed data, which we decompress into the `./data` folder:
> `/content/drive/MyDrive/Datasets/meld-transformed.zip`
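`unzip -j` ("junk paths") extracts every archive member directly into the target folder and discards any directory structure stored inside the archive. Where the `unzip` binary is not available, the same flattening can be sketched with Python's standard `zipfile` module. The function name `extract_flat` is hypothetical and is not part of `hlm12erc`.

```python
import pathlib
import shutil
import zipfile


def extract_flat(archive: str, dest: str) -> list:
    """Extract every file in `archive` into `dest`, dropping the directory
    components of member names (similar in spirit to `unzip -j`).
    Returns the list of written file names."""
    dest_dir = pathlib.Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries carry no data, skip them
            # keep only the basename, e.g. "meld/d-0-seq-0.wav" -> "d-0-seq-0.wav"
            name = pathlib.PurePosixPath(info.filename).name
            # stream the member to disk; a later member with the same basename overwrites an earlier one
            with zf.open(info) as src, open(dest_dir / name, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(name)
    return written
```

For example, `extract_flat("/content/drive/MyDrive/Datasets/meld-transformed.zip", "./data/")` would mirror the `!unzip -j` call in the cell below. Unlike `unzip`, which asks before overwriting an existing file by default, this sketch overwrites silently.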

In [None]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [None]:
import os
if not os.path.exists("./pyproject.toml"):
  !rm -rf ./src
  !rm -rf ./tests
  !rm -rf ./configs
  !cp -R /content/drive/MyDrive/Code/github/universityoflondon/cm3070-fp_20230820_0640/* .
  print("Source Code: overwritten!")

if not os.path.exists("./data/"):
  !unzip -j "/content/drive/MyDrive/Datasets/meld-transformed.zip" -d "./data/"
  print("Data: overwritten!")


In [None]:
!rm -rf ./data/*.csv
!unzip -j "/content/drive/MyDrive/Datasets/meld-transformerd-csvs.zip" -d "./data/"

Archive:  /content/drive/MyDrive/Datasets/meld-transformerd-csvs.zip
  inflating: ./data/sample.csv       
  inflating: ./data/test.csv         
  inflating: ./data/valid.csv        
  inflating: ./data/train.csv        


# Environment

In this section, we lay the technical groundwork: we structure our dependencies and initialise the system configurations and paths used throughout the rest of the notebook.

At the heart of this setup is the `pyproject.toml` file, which lists our project's dependencies and allows our custom-built **`hlm12erc`** library to be installed. Installing it with pip's `-e` (editable) option means that changes to the library's source code take effect without reinstalling it.

To keep each environment lean, we define separate sets of optional dependencies for each task, including `etl`, `eda`, `modelling`, `training`, and `serving`, so that packages are only installed where they are needed. We also set the log level and enable IPython's `autoreload` extension, so that edits to the library are picked up without restarting the kernel and log output is available for debugging.

## Dependencies

In [None]:
!cat ./pyproject.toml

[build-system]
requires = ["setuptools", "wheel"]

[project]
name = "hlm12erc"
version = "0.0.1"
authors = [{ name = "Hudson Mendes", email = "hlm12@student.london.ac.uk" }]
description = "Final Project from University of London"
readme = "README.md"
license = { file = "LICENSE" }
urls = { homepage = "https://github.com/hudsonmendes/cm3070-fp" }
keywords = ["university-of-london"]
dependencies = [
    "torch >= 2.0.1",
    "torchtext >= 0.15.2",
    "torchvision >= 0.15.2",
    "transformers >= 4.30.2",
    "Pillow >= 10.0.0",
    "scikit-learn >= 1.3.0",
]

[project.optional-dependencies]
dev = ["pre-commit>=3.3.3", "black[jupyter]>=23.7.0", "isort>=5.12.0"]
test = ["pytest>=7.4.0"]
etl = [
    "kaggle>=1.5.13",
    "tqdm>=4.65.0",
    "pandas>=2.0.1",
    "google-cloud-storage>=2.10.0",
    "moviepy>=1.0.3",
]
eda = [
    "gensim",
    "tensorflow",
    "tensorflow-hub",
    "torch",
    "transformers",
    "librosa",
    "umap-learn",
    "matplotlib",
    "wordcloud",
    "pyLDAvis

In [None]:
%pip install -e '.[training]'

Obtaining file:///content
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Collecting transformers>=4.30.2 (from hlm12erc==0.0.1)
  Downloading transformers-4.31.0-py3-none-any.whl (7.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.4/7.4 MB 57.2 MB/s eta 0:00:00
Collecting Pillow>=10.0.0 (from hlm12erc==0.0.1)
  Downloading Pillow-10.0.0-cp310-cp310-manylinux_2_28_x86_64.whl (3.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.4/3.4 MB 95.0 MB/s eta 0:00:00
Collecting scikit-learn>=1.3.0 (from hlm12erc==0.0.1)
  Downloading scikit_learn-1.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (10.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.8/10.8 

**Attention:** the first time you run this notebook in a runtime, you must restart the kernel at this point, because the dependencies installed above bring in newer versions of libraries such as `pandas`.

In [None]:
import IPython
IPython.Application.instance().kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

In [1]:
%load_ext autoreload
%autoreload 2

## Logging

In [2]:
import logging
logging.basicConfig(level=logging.INFO)

In [3]:
import warnings
warnings.filterwarnings("ignore")

## Paths & Locations

In [4]:
import pathlib

# the repository files were copied into the current working directory
# (the runtime root, e.g. /content on Colab), so paths are relative to the repository root
dir_home = pathlib.Path("./")
dir_data = dir_home / "data"
dir_target = dir_home / "target"
dir_configs = dir_home / "configs"

## GPUs for Training

For this MLOps Pipeline, we use GPUs to accelerate Machine Learning Training.

TPUs were tried, but the constraints of the TPU architecture made them infeasible within the project's timeframe. [The main blocker was reported](https://discuss.pytorch.org/t/error-when-attempting-to-access-xla-tensor-shape/186214) to the PyTorch XLA team before the decision to pivot to GPUs was made.

In [5]:
# In order to try training using TPUs, uncomment the code below
# import torch_xla.core.xla_model as xm
# device = xm.xla_device()

# The following code sets the `device` to one of the GPUs
import torch
device = torch.device("cpu")
if torch.cuda.is_available():
  device = torch.device("cuda:0")
device

device(type='cuda', index=0)

# Defining the Problem

Emotion Recognition in Conversations (ERC) refers to the process of recognising and analysing emotions in interactive dialogues. It presents a unique set of challenges given the fact that the same words or phrases could convey different emotions depending on the context and flow of the conversation. This task is further complicated when applied in non-dyadic settings, where multiple participants engage in a dialogue. This makes ERC a complex problem within the realm of machine learning and artificial intelligence, where context modelling and emotional shifts among interlocutors are difficult to address accurately.

Despite its complexity, ERC has attracted significant interest owing to its broad applications in opinion mining over social media threads, chat histories, and other online platforms. The ability to accurately discern emotions in conversations matters to many industries, so advances in this field can have a substantial impact. However, given the sparsity of the solution space and the high variability in model architectures, ERC remains a largely unexplored area with many potential paths for future research and experimentation.

## Multi-Party Setting Challenge

Multi-party conversations present an inherent set of challenges when it comes to emotion recognition (ERC). In a dialogue involving multiple participants, the context, conversation flow, and emotional shifts become considerably more intricate to decipher. The utterances in multi-party dialogues can express a wide range of emotions based on the context, making the task of accurate emotion recognition more arduous. This problem of contextual modelling and accounting for emotion shifts among multiple interlocutors remains a significant challenge in ERC. Additionally, the complex dynamics of multi-party conversations and the interdependence of individual emotional states further complicate the task.

The multi-modality of ERC data also poses another layer of challenge in multi-party settings. As emotions can only be detected through human actions such as textual utterances, visual gestures, and acoustic signals in the absence of physiological indications, the need for effectively dealing with multi-modal data becomes crucial. While some models focus on exploring this multi-modality, others resort to using a single modality, usually textual, thereby ignoring valuable insights that could be gleaned from other modalities. Consequently, the architectural variations in the existing models, coupled with the largely unexplored solution space, underscore the daunting challenges of emotion recognition in multi-party settings.

## Multi-modality Challenge

The multifaceted challenge of Emotion Recognition in Conversations (ERC) is magnified by the multi-modal nature of the data involved. ERC data typically consists of multiple modalities, such as textual utterances, visual cues, and acoustic signals. Accurately detecting emotions from these varied sources is complex, as they can individually or collectively contribute to the overall emotional context. This complexity is amplified by the high-dimensionality of the data, particularly in video and audio modalities. This high-dimensional data is both difficult to investigate and expensive to compute, posing significant challenges in data handling, processing, and analysis.

Furthermore, the computation and training of multi-modal models are exceptionally resource-intensive. Each modality may require distinct computational approaches and algorithms for processing and analyzing the data, increasing the overall computational load. Training such models also necessitates substantial computational power and time, often leading to increased costs and resource allocation. These factors, combined with the diverse architectures proposed to model ERC, contribute to the intricate and computationally demanding nature of emotion recognition. Despite the high costs and complexity, the vast potential applications of accurate ERC underscore the importance of ongoing research in this challenging, yet highly rewarding, field of machine learning.

# Assembling the Dataset

This notebook works on the **Loaded** data (already transformed through the ETL process), which was unzipped in the first section of the notebook.

The ETL process used here is described in `dev/modelling.ipynb` as follows:
> This section of the document focuses on assembling the final dataset that will be used for training and evaluating machine learning models. This involves performing ETL (Extract, Transform, Load) operations on the raw MELD data[1, 2] to prepare it for modelling. ETL is a process used to extract data from various sources, transform it into a format that is suitable for analysis, and load it into a target database or data warehouse. In this section, the raw data is extracted from various sources, transformed into a format that can be used for modelling, and loaded into a Pandas DataFrame.
>
> Here the final dataset is assembled by combining the preprocessed text, audio, and visual features for each example. The `hlm12erc` library is used to load the preprocessed features for each example and combine them into a single DataFrame. This library was created specifically for this project to simplify the notebook code by abstracting the ETL complexity into a simple, well-tested library that could be reused and scheduled if needed. The library was designed with full unit-test coverage to ensure that the data is loaded and combined correctly. The resulting DataFrame contains the preprocessed features for each example, as well as the corresponding label, which will be used for training and evaluating the machine learning models.

In [6]:
from hlm12erc.training import MeldDataset

ds_sample = MeldDataset(dir_data / "sample.csv")
ds_train  = MeldDataset(dir_data / "train.csv")
ds_valid  = MeldDataset(dir_data / "valid.csv")
ds_test   = MeldDataset(dir_data / "test.csv")

In [7]:
ds_train.df

Unnamed: 0.1,Unnamed: 0,dialogue,sequence,speaker,x_text,x_visual,x_audio,label
0,0,0,0,Chandler,also I was the point person on my companys tr...,d-0-seq-0.png,d-0-seq-0.wav,neutral
1,1,0,1,The Interviewer,You mustve had your hands full.,d-0-seq-1.png,d-0-seq-1.wav,neutral
2,2,0,2,Chandler,That I did. That I did.,d-0-seq-2.png,d-0-seq-2.wav,neutral
3,3,0,3,The Interviewer,So lets talk a little bit about your duties.,d-0-seq-3.png,d-0-seq-3.wav,neutral
4,4,0,4,Chandler,My duties? All right.,d-0-seq-4.png,d-0-seq-4.wav,surprise
...,...,...,...,...,...,...,...,...
9984,9984,1038,13,Chandler,You or me?,d-1038-seq-13.png,d-1038-seq-13.wav,neutral
9985,9985,1038,14,Ross,"I got it. Uh, Joey, women don't have Adam's ap...",d-1038-seq-14.png,d-1038-seq-14.wav,neutral
9986,9986,1038,15,Joey,"You guys are messing with me, right?",d-1038-seq-15.png,d-1038-seq-15.wav,surprise
9987,9987,1038,16,All,Yeah.,d-1038-seq-16.png,d-1038-seq-16.wav,neutral


In [8]:
import io
import html
import base64
from IPython.display import display, HTML
from PIL import Image

# sample up to 3 utterances per emotion label
df_sample = ds_train.df.groupby(["label"], group_keys=False).apply(lambda x: x.sample(min(len(x), 3)))
df_sample = df_sample.sort_values(["label"])

table_rows = []
for _, row in df_sample.iterrows():
    # escape free text so characters such as "<" or "&" do not break the HTML table
    speaker_cell = f'<td>{html.escape(str(row["speaker"]))}</td>'
    text_cell = f'<td>{html.escape(str(row["x_text"]))}</td>'
    image_path = dir_data / row["x_visual"]
    with Image.open(image_path) as img:
        # keep a horizontal band covering the middle 20% of the frame's height
        width, height = img.size
        crop_top = height // 2 - height // 10
        crop_bottom = height // 2 + height // 10
        # convert to RGB because JPEG cannot store an alpha channel
        img_cropped = img.crop((0, crop_top, width, crop_bottom)).convert("RGB")
        buffer = io.BytesIO()
        img_cropped.save(buffer, format="JPEG")
        image_data = base64.b64encode(buffer.getvalue()).decode()
    image_cell = f'<td><img src="data:image/jpeg;base64,{image_data}" width="100"></td>'
    # <audio> is not a void element, so it needs an explicit closing tag
    audio_src = html.escape((dir_data / row["x_audio"]).as_posix())
    audio_cell = f'<td><audio controls src="{audio_src}"></audio></td>'
    label_cell = f'<td>{row["label"]}</td>'
    table_rows.append(f"<tr>{speaker_cell}{text_cell}{image_cell}{audio_cell}{label_cell}</tr>")

table_html = (
    "<table><tr><th>Speaker</th><th>Text</th><th>Image</th><th>Audio</th><th>Emotion</th></tr>"
    + "".join(table_rows)
    + "</table>"
)
display(HTML(table_html))

Speaker,Text,Image,Audio,Emotion
Phoebe,Fine! Then you tell Roger because he was really looking forward to this!,,,anger
Chandler,I just walked in the bathroom and saw Kathy naked! It was like torture!,,,anger
Phoebe,You said I was boring--Ohh!,,,anger
Monica,"Nobody wants to do it? All right, Ill do it myself.",,,disgust
Monica,"But I stand by my review, I know food and that wasnt it.",,,disgust
Phoebe and Rachel,Oh no.,,,disgust
Monica,I cant live like this! What are we gonna do? What are we gonna do?,,,fear
Monica,"Uh sorry, wrong number.",,,fear
Ross,"No Phoebe, dont look! You dont want to see whats under there!!",,,fear
Rachel,We won. We won!,,,joy


# Choosing the Metric of Success

The chosen metric of success for the **HLM12ERC** project, as opposed to what was originally outlined in the design document, is **`F1 (Weighted)`**, as computed by the `scikit-learn` library's `f1_score` with `average="weighted"`[3].

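The reason for this choice is that, as shown by the **Exploratory Data Analysis**, the **MELD Dataset** is considerably class-imbalanced. Always predicting the `neutral` class would yield roughly $48\%$ accuracy, which could be mistaken for genuine predictive power because it is far above the accuracy of uniform random guessing over the seven emotion classes ($1/7 \approx 14\%$). Weighted F1 averages the per-class F1 scores, weighting each by its class support, so a classifier that ignores every minority class is penalised for their zero F1 scores.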
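To make the argument concrete, the sketch below computes accuracy and weighted F1 for a degenerate classifier that always predicts `neutral`, using the per-class test-split supports that appear in the classification reports of this notebook (anger 345, disgust 68, fear 50, joy 402, neutral 1256, sadness 208, surprise 281). It is a from-scratch re-implementation of the standard definitions for illustration only; the project itself relies on scikit-learn.

```python
from typing import List


def per_class_f1(y_true: List[str], y_pred: List[str], label: str) -> float:
    """F1 of one class treated as the positive class (0.0 when undefined)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def weighted_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Per-class F1 scores averaged with weights equal to each class's share of y_true."""
    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    return sum(per_class_f1(y_true, y_pred, lbl) * y_true.count(lbl) / n for lbl in labels)


# per-class supports of the MELD test split, as reported in this notebook
support = {"anger": 345, "disgust": 68, "fear": 50, "joy": 402,
           "neutral": 1256, "sadness": 208, "surprise": 281}
y_true = [label for label, count in support.items() for _ in range(count)]
y_pred = ["neutral"] * len(y_true)  # the "always neutral" classifier

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.2f}  weighted F1={weighted_f1(y_true, y_pred):.2f}")
# accuracy=0.48  weighted F1=0.31
```

The same 0.48 accuracy and 0.31 weighted F1 appear in the test-set classification reports of the baseline models, which collapsed to predicting `neutral` for every utterance.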

This choice aligns with the principles described by Francois Chollet in "Choosing the Metric of Success"[4], where the selection of a suitable metric is vital for guiding the optimisation of machine learning models and for a fair comparison of different methods.

In [9]:
import inspect
from hlm12erc.training import ERCEvaluator
from IPython.display import Code

class_code = inspect.getsource(ERCEvaluator)
Code(class_code, language="python")

In [10]:
import inspect
from hlm12erc.training import ERCMetricCalculator
from IPython.display import Code

class_code = inspect.getsource(ERCMetricCalculator)
Code(class_code, language="python")

# Deciding on the Evaluation Protocol

The evaluation protocol for the **HLM12ERC** project adopts the **`Hold Out Test-set`** approach, a decision influenced by the structure of the MELD Dataset and the constraints imposed by its size and multimodal nature. This dataset comes pre-divided into three splits: `train`, `dev`, and `test`, which directly supports the implementation of the Hold Out approach.

This protocol is further justified by the impracticality of methods such as K-Fold Cross Validation, given the significant computational demands of the MELD dataset (K-fold validation would multiply the number of training runs by K). In addition, the Hold Out Test-set approach serves as the evaluation standard for both individual components (Objectives 1 to 6) and the final model (Objective 7), ensuring consistent assessment throughout the development process.


In [11]:
import pandas as pd

df_stats = pd.DataFrame.from_dict(
    dict(train=len(ds_train),
         valid=len(ds_valid),
         test=len(ds_test)),
    orient="index",
    columns=["count"])

df_stats["percentage"] = df_stats.apply(
    lambda row: f'{round(100. * row["count"] / df_stats["count"].sum())}%',
    axis=1)

df_stats

Unnamed: 0,count,percentage
train,9989,73%
valid,1109,8%
test,2610,19%


# Preparing your Data

The data preparation process in this code involves three main stages: (a) ETL (Extract, Transform, Load), (b) DataSet Loading, and (c) Data Collation for model training.

* **ETL:** This stage simplifies the dataset by extracting the data from a Kaggle source, transforming it into a First Normal Form (1NF) CSV table, and loading it into a destination folder or a Google Cloud Storage bucket, which makes the data easier for the training process to consume. The logic lives in the `hlm12erc.etl` module, is orchestrated by the `hlm12erc.etl.ETL` class, and can also be run from the command line.

* **DataSet Loading:** This stage wraps the data using PyTorch Data Utility Classes to shape it appropriately for consumption by the model trainer. This step ensures that the data is organized and can be efficiently fed into the training process. The dataset class is defined at `hlm12erc.training.MeldDataset`.

* **Data Collation:** The Data Collator is responsible for creating batches of data suitable for model training and evaluation. It takes a list of `MeldRecord` instances and collates the data into a dictionary with keys such as `x_text`, `x_visual`, `x_audio`, and `y_true`. The collation converts the text, visual, and audio data into appropriate formats and encodes the labels using `ERCLabelEncoder`, making the data ready for consumption by the PyTorch model's `forward` method during both training and inference. The collator class is defined at `hlm12erc.training.ERCDataCollator`, and can be observed below.
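The collation step can be illustrated with a heavily simplified sketch that turns a list of records into one dictionary of batched fields and one-hot encodes the labels. The `Record` dataclass, the `one_hot` and `collate` functions, and the `LABELS` ordering are hypothetical stand-ins, not the actual `MeldRecord`, `ERCDataCollator` or `ERCLabelEncoder` APIs; in particular, this sketch keeps file names where a real collator would load images and audio into tensors.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical class order; the real ERCLabelEncoder may order classes differently.
LABELS = ["anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise"]


@dataclass
class Record:
    text: str
    visual: str  # image file name, e.g. "d-0-seq-0.png"
    audio: str   # audio file name, e.g. "d-0-seq-0.wav"
    label: str


def one_hot(label: str) -> List[float]:
    """Encode a label as a one-hot vector over LABELS."""
    vec = [0.0] * len(LABELS)
    vec[LABELS.index(label)] = 1.0
    return vec


def collate(batch: List[Record]) -> Dict[str, list]:
    """Group per-record fields into batched lists keyed like the model inputs."""
    return {
        "x_text": [r.text for r in batch],
        "x_visual": [r.visual for r in batch],  # a real collator would decode these into tensors
        "x_audio": [r.audio for r in batch],    # likewise for the waveforms
        "y_true": [one_hot(r.label) for r in batch],
    }
```

A function with this shape can be passed to PyTorch's `DataLoader` through its `collate_fn` argument, which is the usual hook for custom batching.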

In [12]:
import inspect
from hlm12erc.training import MeldDataset
from IPython.display import Code

class_code = inspect.getsource(MeldDataset)
Code(class_code, language="python")

In [13]:
import inspect
from hlm12erc.training import ERCDataCollator
from IPython.display import Code

class_code = inspect.getsource(ERCDataCollator)
Code(class_code, language="python")

# Model Selection

This section takes us from a baseline model to a final model through a rigorous set of experiments set out by the **Project Design** document as objectives.

We initially attempt to establish a baseline model with some basic approaches to the representation of each modality of the data, which serves as a starting point and upon which we shall iterate.

Then, for each experiment, we follow a two-step process to evaluate whether it produces a better model than those previously devised. First, a "Scaling up" phase aims to develop a model capable of overfitting the data, giving insight into its learning capacity and identifying areas for improvement. Second, a "Regularizing" phase addresses overfitting through hyperparameter tuning. This iterative optimization is intended to lead to a final model that meets the success criteria and performs best on the MELD Test Split.
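Within a single experiment, one simple selection rule is to keep the epoch whose validation F1 (weighted) is highest. The sketch below applies that rule to the per-epoch validation F1 values reported for the Dice-loss run in Objective 1.3. Whether `ERCTrainer` actually keeps the best or the last checkpoint is not shown in this notebook, so the helper `best_epoch` is a hypothetical illustration only.

```python
from typing import Sequence, Tuple


def best_epoch(val_f1: Sequence[float]) -> Tuple[int, float]:
    """Return the 1-based epoch with the highest validation score
    (the earliest such epoch on ties) together with that score."""
    best_idx = max(range(len(val_f1)), key=lambda i: val_f1[i])
    return best_idx + 1, val_f1[best_idx]


# validation F1 (weighted) per epoch, copied from the Dice-loss training table
dice_val_f1 = [0.252297, 0.252297, 0.252297, 0.257744, 0.254839,
               0.251791, 0.260275, 0.275754, 0.290209, 0.287532]
epoch, score = best_epoch(dice_val_f1)
print(epoch, score)  # 9 0.290209
```

Selecting on the validation split, rather than on the test split, keeps the test split a true hold-out for the final comparison.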

## Monitoring (Weights & Biases)

In [7]:
%env WANDB_NOTEBOOK_NAME=dev/mlops.ipynb
%env WANDB_PROJECT=hlm12erc_v2

env: WANDB_NOTEBOOK_NAME=dev/mlops.ipynb
env: WANDB_PROJECT=hlm12erc_v2


In [8]:
import wandb
wandb.login()

[34m[1mwandb[0m: Currently logged in as: [33mhudsonmendes[0m. Use [1m`wandb login --relogin`[0m to force relogin


True

## Training & Evaluation Helpers

In [9]:
from typing import Tuple
from hlm12erc.modelling import ERCConfigLoader, ERCModel
from hlm12erc.training import ERCTrainer, MeldDataset

def train_model(
    config_name: str,
    datasets: Tuple[MeldDataset, MeldDataset],
    n_epochs: int,
    batch_size: int,
  ) -> ERCModel:
  # load the experiment configuration, e.g. configs/baseline.yml
  model_config = ERCConfigLoader(dir_configs / f"{config_name}.yml").load()
  model_trainer = ERCTrainer(model_config)
  # `train` returns a pair; only the trained model (second element) is needed here
  _, model_instance = model_trainer.train(
      data=datasets,
      n_epochs=n_epochs,
      batch_size=batch_size,
      save_to=dir_target,
      device=device)
  return model_instance

In [10]:
from hlm12erc.modelling import ERCModel
from hlm12erc.training import ERCEvaluator, MeldDataset

def evaluate_model(
    model_instance: ERCModel,
    dataset: MeldDataset,
    batch_size: int,
  ):
  model_evaluator = ERCEvaluator(model_instance)
  model_evaluator.evaluate(
      dataset=dataset,
      batch_size=batch_size,
      device=device)

## Objective 1: Baseline Model

### 1.1. Baseline Model

In [None]:
model_baseline = train_model("baseline", datasets=(ds_train, ds_valid), n_epochs=10, batch_size=32)
evaluate_model(model_baseline, dataset=ds_test, batch_size=32)

Using cache found in /root/.cache/torch/hub/pytorch_vision_v0.6.0


Epoch,Training Loss,Validation Loss,Acc,F1 Weighted
1,1.898222,1.891855,0.423805,0.252297
2,1.739663,1.743486,0.423805,0.252297
3,1.73745,1.741986,0.423805,0.252297
4,1.73712,1.741776,0.423805,0.252297
5,1.737018,1.741713,0.423805,0.252297
6,1.73697,1.741685,0.423805,0.252297
7,1.736943,1.741669,0.423805,0.252297
8,1.736929,1.741661,0.423805,0.252297
9,1.736921,1.741656,0.423805,0.252297
10,1.736918,1.741655,0.423805,0.252297


Run history,Trend
eval/acc,▁▁▁▁▁▁▁▁▁▁
eval/f1_weighted,▁▁▁▁▁▁▁▁▁▁
eval/loss,█▁▁▁▁▁▁▁▁▁
eval/runtime,██▂▁▁▁▁▂▂▅
eval/samples_per_second,▁▁▇████▇▇▄
eval/steps_per_second,▁▁▇████▇▇▄
train/acc,▂▁▅▄▄▇▇▅▅▆▅▅▅▆▆▅▆▆▅▆▇▆▅▆▆▇▇▆▅▆▆▇▆▆▅▅▆▄█▅
train/epoch,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
train/f1_weighted,▁▁▃▃▃▆▆▄▄▅▄▄▄▅▅▄▅▅▄▅▆▅▄▅▅▆▆▅▃▅▅▆▅▅▄▄▅▃█▃
train/global_step,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███

Run summary,Value
eval/acc,0.42381
eval/f1_weighted,0.2523
eval/loss,1.74165
eval/runtime,49.0266
eval/samples_per_second,22.62
eval/steps_per_second,0.714
train/acc,0.42857
train/epoch,10.0
train/f1_weighted,0.25714
train/global_step,3130.0



              precision    recall  f1-score   support

       anger       0.00      0.00      0.00       345
     disgust       0.00      0.00      0.00        68
        fear       0.00      0.00      0.00        50
         joy       0.00      0.00      0.00       402
     neutral       0.48      1.00      0.65      1256
     sadness       0.00      0.00      0.00       208
    surprise       0.00      0.00      0.00       281

    accuracy                           0.48      2610
   macro avg       0.07      0.14      0.09      2610
weighted avg       0.23      0.48      0.31      2610



### 1.2. Baseline (Each Modality)

In [None]:
model_baseline_t = train_model("baseline-t", datasets=(ds_train, ds_valid), n_epochs=10, batch_size=32)
evaluate_model(model_baseline_t, dataset=ds_test, batch_size=32)

Epoch,Training Loss,Validation Loss,Acc,F1 Weighted
1,1.945176,1.944353,0.019838,0.000772
2,1.766149,1.762022,0.423805,0.252297
3,1.741229,1.744317,0.423805,0.252297
4,1.738746,1.742753,0.423805,0.252297
5,1.73796,1.742271,0.423805,0.252297
6,1.73761,1.74206,0.423805,0.252297
7,1.737424,1.74195,0.423805,0.252297
8,1.737321,1.741889,0.423805,0.252297
9,1.737267,1.741857,0.423805,0.252297
10,1.737249,1.741847,0.423805,0.252297


Run history,Trend
eval/acc,▁█████████
eval/f1_weighted,▁█████████
eval/loss,█▂▁▁▁▁▁▁▁▁
eval/runtime,▁▅▃▂▃▂▁█▁▄
eval/samples_per_second,█▄▆▇▅▇█▁█▅
eval/steps_per_second,█▄▆▇▅▇█▁█▅
train/acc,▂▁▁▂▂▇▇▅▅▆▅▅▅▆▆▅▆▆▅▆▇▆▅▆▆▇▇▆▅▆▆▇▆▆▅▅▆▄█▅
train/epoch,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
train/f1_weighted,▁▁▁▁▂▆▆▄▄▅▄▄▄▅▅▄▅▅▄▅▆▅▄▅▅▆▆▅▃▅▅▆▅▅▄▄▅▃█▃
train/global_step,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███

Run summary,Value
eval/acc,0.42381
eval/f1_weighted,0.2523
eval/loss,1.74185
eval/runtime,42.0304
eval/samples_per_second,26.386
eval/steps_per_second,0.833
train/acc,0.42857
train/epoch,10.0
train/f1_weighted,0.25714
train/global_step,3130.0



              precision    recall  f1-score   support

       anger       0.00      0.00      0.00       345
     disgust       0.00      0.00      0.00        68
        fear       0.00      0.00      0.00        50
         joy       0.00      0.00      0.00       402
     neutral       0.48      1.00      0.65      1256
     sadness       0.00      0.00      0.00       208
    surprise       0.00      0.00      0.00       281

    accuracy                           0.48      2610
   macro avg       0.07      0.14      0.09      2610
weighted avg       0.23      0.48      0.31      2610



In [None]:
model_baseline_a = train_model("baseline-a", datasets=(ds_train, ds_valid), n_epochs=10, batch_size=32)
evaluate_model(model_baseline_a, dataset=ds_test, batch_size=32)

Epoch,Training Loss,Validation Loss,Acc,F1 Weighted
1,1.922579,1.914453,0.423805,0.252297
2,1.748713,1.748289,0.423805,0.252297
3,1.743376,1.743485,0.423805,0.252297
4,1.79279,1.755706,0.423805,0.252297
5,1.83791,1.772151,0.423805,0.252297
6,1.875091,1.793243,0.355275,0.261296
7,1.877402,1.795857,0.348963,0.260353
8,1.881457,1.799832,0.34716,0.261393
9,1.886291,1.802208,0.342651,0.259261
10,1.887488,1.802712,0.342651,0.259521


Run history,Trend
eval/acc,█████▂▂▁▁▁
eval/f1_weighted,▁▁▁▁▁█▇█▆▇
eval/loss,█▁▁▂▂▃▃▃▃▃
eval/runtime,▁▆▄▃▇▄█▄▆▄
eval/samples_per_second,█▃▅▆▂▅▁▅▃▅
eval/steps_per_second,█▃▅▆▂▅▁▅▃▅
train/acc,▁▃▃▃▃▆▆▄▄▅▄▄▄▅▆▄▅▆▄▅▆▆▅▆▆▇█▆▅▇▆▆▇▇▅▃▆▆█▁
train/epoch,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
train/f1_weighted,▁▃▂▁▁▄▅▃▃▃▃▃▂▃▄▃▃▄▃▃▅▅▄▅▅▆█▅▃▆▆▆▆▆▄▂▅▄█▁
train/global_step,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███

Run summary,Value
eval/acc,0.34265
eval/f1_weighted,0.25952
eval/loss,1.80271
eval/runtime,31.8435
eval/samples_per_second,34.827
eval/steps_per_second,1.099
train/acc,0.2381
train/epoch,10.0
train/f1_weighted,0.17857
train/global_step,3130.0



              precision    recall  f1-score   support

       anger       0.00      0.00      0.00       345
     disgust       0.00      0.00      0.00        68
        fear       0.00      0.00      0.00        50
         joy       0.00      0.00      0.00       402
     neutral       0.48      1.00      0.65      1256
     sadness       0.00      0.00      0.00       208
    surprise       0.00      0.00      0.00       281

    accuracy                           0.48      2610
   macro avg       0.07      0.14      0.09      2610
weighted avg       0.23      0.48      0.31      2610



In [None]:
model_baseline_v = train_model("baseline-v", datasets=(ds_train, ds_valid), n_epochs=10, batch_size=32)
evaluate_model(model_baseline_v, dataset=ds_test, batch_size=32)

Using cache found in /root/.cache/torch/hub/pytorch_vision_v0.6.0



Epoch,Training Loss,Validation Loss,Acc,F1 Weighted
1,1.893219,1.879474,0.423805,0.252297
2,1.740137,1.743571,0.423805,0.252297
3,1.73785,1.742188,0.423805,0.252297
4,1.737371,1.74191,0.423805,0.252297
5,1.73719,1.741807,0.423805,0.252297
6,1.737096,1.741754,0.423805,0.252297
7,1.737043,1.741724,0.423805,0.252297
8,1.737014,1.741708,0.423805,0.252297
9,1.736998,1.741699,0.423805,0.252297
10,1.736992,1.741695,0.423805,0.252297


Run history,Trend
eval/acc,▁▁▁▁▁▁▁▁▁▁
eval/f1_weighted,▁▁▁▁▁▁▁▁▁▁
eval/loss,█▁▁▁▁▁▁▁▁▁
eval/runtime,▁▁▃█▂▃▁▁▁▃
eval/samples_per_second,██▆▁▇▆▇██▆
eval/steps_per_second,██▆▁▇▆▇██▆
train/acc,▃▄▂▂▂▆▆▃▃▅▃▃▃▅▅▃▅▅▃▄▆▅▃▅▅▆▆▅▂▄▅▆▅▅▃▃▅▁█▂
train/epoch,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
train/f1_weighted,▃▄▂▁▁▅▆▃▃▄▃▃▂▄▅▃▄▅▃▄▆▄▃▅▄▆▅▅▂▄▅▆▄▄▃▂▄▁█▂
train/global_step,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███

Run summary,Value
eval/acc,0.42381
eval/f1_weighted,0.2523
eval/loss,1.7417
eval/runtime,36.2219
eval/samples_per_second,30.617
eval/steps_per_second,0.966
train/acc,0.42857
train/epoch,10.0
train/f1_weighted,0.25714
train/global_step,3130.0



              precision    recall  f1-score   support

       anger       0.00      0.00      0.00       345
     disgust       0.00      0.00      0.00        68
        fear       0.00      0.00      0.00        50
         joy       0.00      0.00      0.00       402
     neutral       0.48      1.00      0.65      1256
     sadness       0.00      0.00      0.00       208
    surprise       0.00      0.00      0.00       281

    accuracy                           0.48      2610
   macro avg       0.07      0.14      0.09      2610
weighted avg       0.23      0.48      0.31      2610



### 1.3. Dice Loss
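For reference, a common multi-class formulation of the soft Dice loss averages, over classes $c$, the term $1 - \frac{2\sum_i p_{ic} y_{ic} + \epsilon}{\sum_i p_{ic} + \sum_i y_{ic} + \epsilon}$, where $p_{ic}$ are predicted probabilities and $y_{ic}$ one-hot targets. Because every class contributes equally to the mean, minority emotions weigh as much as `neutral`. The pure-Python sketch below implements that generic formulation; the exact variant configured in `losses-dice` (smoothing constant, aggregation, any class weighting) is not visible in this notebook and may differ.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_dice_loss(logits: List[List[float]], targets: List[int], eps: float = 1e-6) -> float:
    """Mean over classes of (1 - soft Dice coefficient) between softmax
    probabilities and one-hot targets, computed over the whole batch."""
    probs = [softmax(row) for row in logits]
    n_classes = len(logits[0])
    losses = []
    for c in range(n_classes):
        p = [row[c] for row in probs]                  # predicted probability of class c
        y = [1.0 if t == c else 0.0 for t in targets]  # one-hot target for class c
        intersection = sum(pi * yi for pi, yi in zip(p, y))
        dice = (2 * intersection + eps) / (sum(p) + sum(y) + eps)
        losses.append(1.0 - dice)
    return sum(losses) / n_classes
```

Because of the smoothing term, a class that is absent from both the targets and the predictions of a batch scores close to a perfect Dice coefficient; real implementations differ in how they treat such classes.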

In [None]:
model_losses_dice = train_model("losses-dice", datasets=(ds_train, ds_valid), n_epochs=10, batch_size=32)
evaluate_model(model_losses_dice, dataset=ds_test, batch_size=32)

.vector_cache/glove.6B.zip: 862MB [02:40, 5.37MB/s]                           
100%|█████████▉| 399999/400000 [00:12<00:00, 32621.25it/s]
Downloading: "https://github.com/pytorch/vision/zipball/v0.6.0" to /root/.cache/torch/hub/v0.6.0.zip
Downloading: "https://download.pytorch.org/models/resnet50-0676ba61.pth" to /root/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth
100%|██████████| 97.8M/97.8M [00:00<00:00, 384MB/s]
[34m[1mwandb[0m: Currently logged in as: [33mhudsonmendes[0m. Use [1m`wandb login --relogin`[0m to force relogin


Epoch,Training Loss,Validation Loss,Acc,F1 Weighted
1,0.907771,0.869604,0.423805,0.252297
2,0.899907,0.859837,0.423805,0.252297
3,0.90177,0.857211,0.423805,0.252297
4,0.896549,0.853789,0.334536,0.257744
5,0.891821,0.852219,0.324617,0.254839
6,0.89726,0.8507,0.319206,0.251791
7,0.894778,0.851924,0.319206,0.260275
8,0.895022,0.847296,0.321912,0.275754
9,0.89978,0.84792,0.32642,0.290209
10,0.900979,0.848246,0.322813,0.287532


Run history: W&B sparklines for the eval/* and train/* metrics listed in the summary below (not reproduced).

Metric,Value
eval/acc,0.32281
eval/f1_weighted,0.28753
eval/loss,0.84825
eval/runtime,50.4048
eval/samples_per_second,22.002
eval/steps_per_second,0.694
train/acc,0.28571
train/epoch,10.0
train/f1_weighted,0.31214
train/global_step,3130.0



              precision    recall  f1-score   support

       anger       0.14      0.02      0.04       345
     disgust       0.00      0.00      0.00        68
        fear       0.00      0.00      0.00        50
         joy       0.15      0.34      0.21       402
     neutral       0.48      0.51      0.49      1256
     sadness       0.09      0.12      0.10       208
    surprise       0.00      0.00      0.00       281

    accuracy                           0.31      2610
   macro avg       0.12      0.14      0.12      2610
weighted avg       0.28      0.31      0.28      2610



In [None]:
model_losses_dice_lr5e2 = train_model("losses-dice-lr-5e-2", datasets=(ds_train, ds_valid), n_epochs=10, batch_size=32)
evaluate_model(model_losses_dice_lr5e2, dataset=ds_test, batch_size=32)

Using cache found in /root/.cache/torch/hub/pytorch_vision_v0.6.0


Epoch,Training Loss,Validation Loss,Acc,F1 Weighted
1,0.914286,0.914953,0.423805,0.252297
2,0.898266,0.866536,0.423805,0.252297
3,0.901423,0.858686,0.423805,0.252297
4,0.904974,0.859673,0.423805,0.252297
5,0.899916,0.859182,0.423805,0.252297
6,0.899484,0.858373,0.423805,0.252297
7,0.901502,0.858808,0.423805,0.252297
8,0.899597,0.858658,0.423805,0.252297
9,0.8999,0.858653,0.423805,0.252297
10,0.899512,0.858747,0.423805,0.252297


Run history: W&B sparklines for the eval/* and train/* metrics listed in the summary below (not reproduced).

Metric,Value
eval/acc,0.42381
eval/f1_weighted,0.2523
eval/loss,0.85875
eval/runtime,45.7345
eval/samples_per_second,24.249
eval/steps_per_second,0.765
train/acc,0.42857
train/epoch,10.0
train/f1_weighted,0.25714
train/global_step,3130.0



              precision    recall  f1-score   support

       anger       0.00      0.00      0.00       345
     disgust       0.00      0.00      0.00        68
        fear       0.00      0.00      0.00        50
         joy       0.00      0.00      0.00       402
     neutral       0.48      1.00      0.65      1256
     sadness       0.00      0.00      0.00       208
    surprise       0.00      0.00      0.00       281

    accuracy                           0.48      2610
   macro avg       0.07      0.14      0.09      2610
weighted avg       0.23      0.48      0.31      2610



In [None]:
model_losses_dice_lr5e3 = train_model("losses-dice-lr-5e-3", datasets=(ds_train, ds_valid), n_epochs=10, batch_size=32)
evaluate_model(model_losses_dice_lr5e3, dataset=ds_test, batch_size=32)

Using cache found in /root/.cache/torch/hub/pytorch_vision_v0.6.0


Epoch,Training Loss,Validation Loss,Acc,F1 Weighted
1,0.895654,0.848844,0.431019,0.283883
2,0.869833,0.812318,0.391344,0.338778
3,0.860379,0.805328,0.313796,0.321113
4,0.803581,0.795761,0.361587,0.351057
5,0.907463,0.79343,0.338142,0.343711
6,0.891712,0.808186,0.320108,0.322855
7,0.908527,0.793909,0.382326,0.361325
8,0.896331,0.796532,0.371506,0.356104
9,0.915591,0.800621,0.350766,0.340796
10,0.894924,0.798144,0.354373,0.345926


Run history: W&B sparklines for the eval/* and train/* metrics listed in the summary below (not reproduced).

Metric,Value
eval/acc,0.35437
eval/f1_weighted,0.34593
eval/loss,0.79814
eval/runtime,50.6787
eval/samples_per_second,21.883
eval/steps_per_second,0.691
train/acc,0.2381
train/epoch,10.0
train/f1_weighted,0.2381
train/global_step,3130.0



              precision    recall  f1-score   support

       anger       0.28      0.29      0.28       345
     disgust       0.01      0.03      0.02        68
        fear       0.00      0.00      0.00        50
         joy       0.38      0.29      0.33       402
     neutral       0.63      0.57      0.60      1256
     sadness       0.15      0.14      0.15       208
    surprise       0.15      0.21      0.17       281

    accuracy                           0.39      2610
   macro avg       0.23      0.22      0.22      2610
weighted avg       0.43      0.39      0.41      2610



In [None]:
model_losses_dice_lr5e4 = train_model("losses-dice-lr-5e-4", datasets=(ds_train, ds_valid), n_epochs=10, batch_size=32)
evaluate_model(model_losses_dice_lr5e4, dataset=ds_test, batch_size=32)

Using cache found in /root/.cache/torch/hub/pytorch_vision_v0.6.0


Epoch,Training Loss,Validation Loss,Acc,F1 Weighted
1,0.903366,0.858573,0.423805,0.252297
2,0.904166,0.855257,0.423805,0.252297
3,0.885047,0.837146,0.342651,0.291639
4,0.853514,0.833489,0.328224,0.288774
5,0.865139,0.835175,0.323715,0.298045
6,0.861287,0.835701,0.328224,0.308171
7,0.818554,0.832527,0.32101,0.303333
8,0.811861,0.833913,0.337241,0.310931
9,0.884989,0.833147,0.323715,0.307777
10,0.884433,0.832508,0.32101,0.304703


Run history: W&B sparklines for the eval/* and train/* metrics listed in the summary below (not reproduced).

Metric,Value
eval/acc,0.32101
eval/f1_weighted,0.3047
eval/loss,0.83251
eval/runtime,49.0375
eval/samples_per_second,22.615
eval/steps_per_second,0.714
train/acc,0.33333
train/epoch,10.0
train/f1_weighted,0.34394
train/global_step,3130.0



              precision    recall  f1-score   support

       anger       0.19      0.11      0.14       345
     disgust       0.00      0.00      0.00        68
        fear       0.00      0.00      0.00        50
         joy       0.16      0.24      0.19       402
     neutral       0.51      0.50      0.50      1256
     sadness       0.11      0.12      0.12       208
    surprise       0.11      0.14      0.13       281

    accuracy                           0.32      2610
   macro avg       0.16      0.16      0.15      2610
weighted avg       0.32      0.32      0.31      2610



### 1.4. Focal Loss
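Focal loss rescales cross-entropy by (1 - p_t)^gamma, where p_t is the predicted probability of the true class, so confidently correct examples contribute little and training concentrates on hard, often minority-class, examples (Lin et al., 2017). The NumPy sketch below is a mean-reduced version with an optional per-class weight vector `alpha`; the names and the default `gamma=2.0` are assumptions, not the `losses-focal*` configuration. Without `alpha`, the modulating factor is at most 1, so this form never exceeds the corresponding cross-entropy, and with `gamma=0` it reduces to it exactly.

```python
from typing import Optional

import numpy as np


def focal_loss(logits: np.ndarray, labels: np.ndarray, gamma: float = 2.0,
               alpha: Optional[np.ndarray] = None) -> float:
    """Mean multi-class focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    logits: (batch, n_classes) unnormalised scores
    labels: (batch,) integer class ids
    alpha:  optional (n_classes,) per-class weights (assumed parameterisation)
    """
    shifted = logits - logits.max(axis=1, keepdims=True)        # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    log_pt = log_probs[np.arange(len(labels)), labels]          # log p of the true class
    pt = np.exp(log_pt)
    loss = -((1.0 - pt) ** gamma) * log_pt                      # easy examples (pt -> 1) shrink
    if alpha is not None:
        loss = loss * alpha[labels]                             # optional class weighting
    return float(loss.mean())
```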

In [None]:
model_losses_focal = train_model("losses-focal", datasets=(ds_train, ds_valid), n_epochs=10, batch_size=32)
evaluate_model(model_losses_focal, dataset=ds_test, batch_size=32)

Using cache found in /root/.cache/torch/hub/pytorch_vision_v0.6.0


Epoch,Training Loss,Validation Loss,Acc,F1 Weighted
1,13.525152,10.970795,0.036069,0.002511
2,13.081808,10.978519,0.032462,0.019288
3,13.068168,11.005193,0.06853,0.043196
4,13.069429,10.956643,0.090171,0.050272
5,15.478132,11.254326,0.130748,0.093717
6,16.493437,12.360599,0.199279,0.193326
7,19.638823,13.328895,0.254283,0.249974
8,22.993275,14.339983,0.274121,0.269881
9,22.43235,15.226344,0.290352,0.281061
10,23.251173,15.613028,0.291253,0.277788


Run history: W&B sparklines for the eval/* and train/* metrics listed in the summary below (not reproduced).

Metric,Value
eval/acc,0.29125
eval/f1_weighted,0.27779
eval/loss,15.61303
eval/runtime,51.1879
eval/samples_per_second,21.665
eval/steps_per_second,0.684
train/acc,0.2381
train/epoch,10.0
train/f1_weighted,0.22556
train/global_step,3130.0



              precision    recall  f1-score   support

       anger       0.14      0.37      0.21       345
     disgust       0.03      0.32      0.05        68
        fear       0.02      0.12      0.03        50
         joy       0.00      0.00      0.00       402
     neutral       0.67      0.00      0.01      1256
     sadness       0.00      0.00      0.00       208
    surprise       0.13      0.23      0.16       281

    accuracy                           0.09      2610
   macro avg       0.14      0.15      0.06      2610
weighted avg       0.35      0.09      0.05      2610



In [None]:
model_losses_focal_lr5e2 = train_model("losses-focal-lr-5e-2", datasets=(ds_train, ds_valid), n_epochs=10, batch_size=32)
evaluate_model(model_losses_focal_lr5e2, dataset=ds_test, batch_size=32)

Using cache found in /root/.cache/torch/hub/pytorch_vision_v0.6.0


Epoch,Training Loss,Validation Loss,Acc,F1 Weighted
1,10.036918,10.794436,0.115419,0.039537
2,12.344419,11.122893,0.354373,0.269154
3,12.005418,10.955873,0.146979,0.037669
4,14.193848,11.086184,0.146979,0.037669
5,12.224723,11.207065,0.146979,0.037669
6,12.545444,10.982155,0.137962,0.033452
7,13.056494,11.016578,0.146979,0.037669
8,12.176677,10.974984,0.146979,0.037669
9,12.576733,10.929255,0.146979,0.037669
10,12.401653,10.970663,0.146979,0.037669


Run history: W&B sparklines for the eval/* and train/* metrics listed in the summary below (not reproduced).

Metric,Value
eval/acc,0.14698
eval/f1_weighted,0.03767
eval/loss,10.97066
eval/runtime,45.2133
eval/samples_per_second,24.528
eval/steps_per_second,0.774
train/acc,0.0
train/epoch,10.0
train/f1_weighted,0.0
train/global_step,3130.0



              precision    recall  f1-score   support

       anger       0.19      0.11      0.14       345
     disgust       0.00      0.00      0.00        68
        fear       0.00      0.00      0.00        50
         joy       0.00      0.00      0.00       402
     neutral       0.00      0.00      0.00      1256
     sadness       0.08      0.96      0.15       208
    surprise       0.00      0.00      0.00       281

    accuracy                           0.09      2610
   macro avg       0.04      0.15      0.04      2610
weighted avg       0.03      0.09      0.03      2610



In [None]:
model_losses_focal_lr5e3 = train_model("losses-focal-lr-5e-3", datasets=(ds_train, ds_valid), n_epochs=10, batch_size=32)
evaluate_model(model_losses_focal_lr5e3, dataset=ds_test, batch_size=32)

Using cache found in /root/.cache/torch/hub/pytorch_vision_v0.6.0


Epoch,Training Loss,Validation Loss,Acc,F1 Weighted
1,12.540477,10.924006,0.137962,0.033452
2,11.058533,10.987504,0.400361,0.285724
3,9.085086,11.10813,0.321912,0.321184
4,9.226305,12.391071,0.211903,0.233521
5,12.429557,14.101876,0.238052,0.261125
6,13.514296,16.277239,0.262399,0.280787
7,16.98991,20.08153,0.244364,0.264189
8,17.584007,23.941301,0.293057,0.295491
9,22.858047,25.801476,0.292155,0.29513
10,28.011106,28.222897,0.291253,0.294373


Run history: W&B sparklines for the eval/* and train/* metrics listed in the summary below (not reproduced).

Metric,Value
eval/acc,0.29125
eval/f1_weighted,0.29437
eval/loss,28.2229
eval/runtime,50.0261
eval/samples_per_second,22.168
eval/steps_per_second,0.7
train/acc,0.38095
train/epoch,10.0
train/f1_weighted,0.40366
train/global_step,3130.0



              precision    recall  f1-score   support

       anger       0.13      1.00      0.23       345
     disgust       0.00      0.00      0.00        68
        fear       0.00      0.00      0.00        50
         joy       0.00      0.00      0.00       402
     neutral       0.00      0.00      0.00      1256
     sadness       0.00      0.00      0.00       208
    surprise       0.00      0.00      0.00       281

    accuracy                           0.13      2610
   macro avg       0.02      0.14      0.03      2610
weighted avg       0.02      0.13      0.03      2610



In [None]:
model_losses_focal_lr5e4 = train_model("losses-focal-lr-5e-4", datasets=(ds_train, ds_valid), n_epochs=10, batch_size=32)
evaluate_model(model_losses_focal_lr5e4, dataset=ds_test, batch_size=32)

Using cache found in /root/.cache/torch/hub/pytorch_vision_v0.6.0


Epoch,Training Loss,Validation Loss,Acc,F1 Weighted
1,12.51335,10.962363,0.409378,0.266146
2,12.074313,10.955706,0.206492,0.193561
3,14.780609,11.444904,0.108206,0.080339
4,20.531313,13.715443,0.113616,0.086558
5,31.502148,17.775818,0.132552,0.125733
6,41.478516,22.350309,0.151488,0.159065
7,48.224628,26.420303,0.1578,0.147039
8,53.34853,28.94718,0.166817,0.171301
9,57.254246,31.384884,0.186655,0.190551
10,58.109959,32.381241,0.191163,0.195109


Run history: W&B sparklines for the eval/* and train/* metrics listed in the summary below (not reproduced).

Metric,Value
eval/acc,0.19116
eval/f1_weighted,0.19511
eval/loss,32.38124
eval/runtime,49.6665
eval/samples_per_second,22.329
eval/steps_per_second,0.705
train/acc,0.09524
train/epoch,10.0
train/f1_weighted,0.15584
train/global_step,3130.0



              precision    recall  f1-score   support

       anger       0.00      0.00      0.00       345
     disgust       0.03      0.29      0.05        68
        fear       0.00      0.00      0.00        50
         joy       0.00      0.00      0.00       402
     neutral       0.48      0.34      0.40      1256
     sadness       0.06      0.33      0.11       208
    surprise       0.30      0.01      0.02       281

    accuracy                           0.20      2610
   macro avg       0.13      0.14      0.08      2610
weighted avg       0.27      0.20      0.20      2610



## Objective 2: Advanced Text Embeddings
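GPT-2 produces one hidden vector per token, so an utterance-level text embedding needs a pooling step before it can serve as a fixed-size feature. Two common choices are sketched below in NumPy on precomputed hidden states: masked mean pooling, and taking the last non-padding token, which suits a left-to-right model where only the final position has attended to the whole utterance. Which strategy `hlm12erc` uses for the `adv-text-gpt2` configs is not shown in this notebook, and both function names are hypothetical. Note also that the GPT-2 tokenizer defines no padding token by default, so batching requires assigning one (commonly the end-of-text token).

```python
import numpy as np


def masked_mean_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average the token vectors of each sequence, ignoring padding positions.

    hidden_states:  (batch, seq_len, hidden) token representations
    attention_mask: (batch, seq_len) with 1 for real tokens and 0 for padding
    """
    mask = attention_mask[..., None].astype(hidden_states.dtype)  # (batch, seq_len, 1)
    summed = (hidden_states * mask).sum(axis=1)                    # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                 # avoid division by zero
    return summed / counts


def last_token_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Select the vector of the last real token, assuming right padding."""
    last_idx = attention_mask.sum(axis=1).astype(int) - 1         # index of final real token
    return hidden_states[np.arange(hidden_states.shape[0]), last_idx]
```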

In [None]:
# running on A100, batch_size can be set to 16
model_adv_text_gpt2 = train_model("adv-text-gpt2", datasets=(ds_train, ds_valid), n_epochs=10, batch_size=16)
evaluate_model(model_adv_text_gpt2, dataset=ds_test, batch_size=16)

Downloading (…)lve/main/config.json: 665 B
Downloading model.safetensors: 548M
Downloading (…)olve/main/vocab.json: 1.04M
Downloading (…)olve/main/merges.txt: 456k
Downloading: "https://github.com/pytorch/vision/zipball/v0.6.0" to /root/.cache/torch/hub/v0.6.0.zip
Downloading: "https://download.pytorch.org/models/resnet50-0676ba61.pth" to /root/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth
wandb: Currently logged in as: hudsonmendes. Use `wandb login --relogin` to force relogin


Epoch,Training Loss,Validation Loss,Acc,F1 Weighted
1,0.963317,0.873705,0.423805,0.252297
2,0.973982,0.846044,0.423805,0.252297
3,0.987432,0.812639,0.469793,0.345979
4,0.993528,0.794214,0.481515,0.361684
5,0.994609,0.781965,0.479711,0.393339
6,0.994075,0.778364,0.477908,0.411886
7,0.995444,0.771343,0.479711,0.422166
8,0.996437,0.769856,0.482417,0.414574
9,0.99744,0.769495,0.465284,0.416395
10,0.997624,0.770378,0.469793,0.416517


Run history: W&B sparklines for the eval/* and train/* metrics listed in the summary below (not reproduced).

Metric,Value
eval/acc,0.46979
eval/f1_weighted,0.41652
eval/loss,0.77038
eval/runtime,41.7368
eval/samples_per_second,26.571
eval/steps_per_second,1.677
train/acc,0.0
train/epoch,10.0
train/f1_weighted,0.0
train/global_step,6250.0



              precision    recall  f1-score   support

       anger       0.26      0.20      0.22       345
     disgust       0.00      0.00      0.00        68
        fear       0.00      0.00      0.00        50
         joy       0.28      0.36      0.31       402
     neutral       0.69      0.84      0.76      1256
     sadness       0.23      0.16      0.19       208
    surprise       0.16      0.09      0.11       281

    accuracy                           0.51      2610
   macro avg       0.23      0.24      0.23      2610
weighted avg       0.44      0.51      0.47      2610
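Halving `batch_size` doubles the number of optimizer updates for the same 10 epochs, which shows up in the run summaries of this notebook: `train/global_step` is 3130 at batch size 32, 6250 at 16 and 12490 at 8. The sketch below reproduces those counts under stated assumptions: one optimizer step per batch, no gradient accumulation, the final partial batch kept, and a training split of 9,989 utterances (the size of the MELD training set; the notebook does not print it). The helper name is hypothetical.

```python
import math


def optimizer_steps(n_examples: int, batch_size: int, n_epochs: int) -> int:
    """Optimizer updates for a run, assuming one step per batch and no gradient
    accumulation; the last, partial batch is kept, hence the ceiling."""
    return n_epochs * math.ceil(n_examples / batch_size)


N_TRAIN = 9989  # assumption: size of the MELD training split
for bs in (32, 16, 8):
    print(bs, optimizer_steps(N_TRAIN, bs, n_epochs=10))
# prints 32 3130, then 16 6250, then 8 12490
```

Because the learning-rate schedule and the number of updates both depend on the step count, runs at different batch sizes are not directly comparable at the same nominal learning rate.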



In [None]:
# running on A100, batch_size can be set to 16
model_adv_text_gpt2_lr5e3 = train_model("adv-text-gpt2-lr-5e-3", datasets=(ds_train, ds_valid), n_epochs=10, batch_size=16)
evaluate_model(model_adv_text_gpt2_lr5e3, dataset=ds_test, batch_size=16)

Using cache found in /root/.cache/torch/hub/pytorch_vision_v0.6.0


Epoch,Training Loss,Validation Loss,Acc,F1 Weighted
1,0.978964,0.859137,0.203787,0.1454
2,0.991325,0.859539,0.423805,0.252297
3,0.986247,0.856318,0.29486,0.235583
4,0.984167,0.855439,0.292155,0.234005
5,0.972435,0.854947,0.320108,0.249979
6,0.978623,0.856013,0.301172,0.239279
7,0.971509,0.857864,0.295762,0.235572
8,0.972503,0.858303,0.242561,0.195129
9,0.979785,0.85351,0.306583,0.2444
10,0.979132,0.85399,0.301172,0.240692


Run history: W&B sparklines for the eval/* and train/* metrics listed in the summary below (not reproduced).

Metric,Value
eval/acc,0.30117
eval/f1_weighted,0.24069
eval/loss,0.85399
eval/runtime,41.1488
eval/samples_per_second,26.951
eval/steps_per_second,1.701
train/acc,0.0
train/epoch,10.0
train/f1_weighted,0.0
train/global_step,6250.0



              precision    recall  f1-score   support

       anger       0.00      0.00      0.00       345
     disgust       0.00      0.00      0.00        68
        fear       0.00      0.00      0.00        50
         joy       0.16      0.50      0.25       402
     neutral       0.48      0.53      0.51      1256
     sadness       0.00      0.00      0.00       208
    surprise       0.00      0.00      0.00       281

    accuracy                           0.33      2610
   macro avg       0.09      0.15      0.11      2610
weighted avg       0.26      0.33      0.28      2610



In [None]:
# running on A100, batch_size can be set to 16
model_adv_text_gpt2_lr5e4 = train_model("adv-text-gpt2-lr-5e-4", datasets=(ds_train, ds_valid), n_epochs=10, batch_size=16)
evaluate_model(model_adv_text_gpt2_lr5e4, dataset=ds_test, batch_size=16)

Using cache found in /root/.cache/torch/hub/pytorch_vision_v0.6.0


Epoch,Training Loss,Validation Loss,Acc,F1 Weighted
1,0.977357,0.837689,0.336339,0.274961
2,0.99347,0.823821,0.403968,0.314758
3,0.971005,0.835796,0.342651,0.266444
4,0.99882,0.817836,0.336339,0.309633
5,0.999824,0.826047,0.314698,0.292484
6,0.999968,0.8135,0.33183,0.308966
7,0.999919,0.818006,0.346258,0.314854
8,0.999994,0.818691,0.339946,0.308834
9,0.999989,0.815757,0.351668,0.317071
10,0.999994,0.816777,0.345356,0.311924


Run history: W&B sparklines for the eval/* and train/* metrics listed in the summary below (not reproduced).

Metric,Value
eval/acc,0.34536
eval/f1_weighted,0.31192
eval/loss,0.81678
eval/runtime,41.1834
eval/samples_per_second,26.928
eval/steps_per_second,1.7
train/acc,0.0
train/epoch,10.0
train/f1_weighted,0.0
train/global_step,6250.0



              precision    recall  f1-score   support

       anger       0.21      0.21      0.21       345
     disgust       0.00      0.00      0.00        68
        fear       0.00      0.00      0.00        50
         joy       0.23      0.57      0.33       402
     neutral       0.64      0.50      0.57      1256
     sadness       0.09      0.13      0.11       208
    surprise       0.00      0.00      0.00       281

    accuracy                           0.37      2610
   macro avg       0.17      0.20      0.17      2610
weighted avg       0.38      0.37      0.36      2610



## Objective 3: Advanced Audio Embeddings
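wav2vec 2.0 models consume the raw waveform rather than a spectrogram. Hugging Face's `Wav2Vec2FeatureExtractor` expects mono audio at the checkpoint's sampling rate (16 kHz for the common `facebook/wav2vec2-base` family) and, when `do_normalize` is enabled, rescales each clip to zero mean and unit variance. The sketch below approximates that preprocessing with the standard-library `wave` module and NumPy, assuming the `.wav` files in `./data` are 16-bit PCM; it does not resample, so callers should check the returned rate. The helper names are hypothetical and this is not the `adv-audio-wav2vec2` pipeline itself.

```python
import wave

import numpy as np


def load_pcm16_wav(path: str) -> tuple:
    """Read a 16-bit PCM WAV file as mono float32 samples in [-1, 1)."""
    with wave.open(path, "rb") as f:
        if f.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = f.getnchannels()
        rate = f.getframerate()
        frames = f.readframes(f.getnframes())
    samples = np.frombuffer(frames, dtype="<i2").astype(np.float32) / 32768.0
    if n_channels > 1:
        # frames are interleaved per channel; average them into one mono track
        samples = samples.reshape(-1, n_channels).mean(axis=1)
    return samples, rate


def normalize_for_wav2vec2(samples: np.ndarray, target_len: int) -> np.ndarray:
    """Zero-mean / unit-variance normalisation, then right-pad with zeros or truncate."""
    x = (samples - samples.mean()) / np.sqrt(samples.var() + 1e-7)
    if len(x) >= target_len:
        return x[:target_len]
    return np.pad(x, (0, target_len - len(x)))
```

Fixing `target_len` keeps batches rectangular; with `batch_size=8` in the cells below, clip length is one of the main drivers of GPU memory.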

In [11]:
# running on A100, batch_size can be set to 8
model_adv_audio_wav2vec2 = train_model("adv-audio-wav2vec2", datasets=(ds_train, ds_valid), n_epochs=10, batch_size=8)
evaluate_model(model_adv_audio_wav2vec2, dataset=ds_test, batch_size=8)

Using cache found in /root/.cache/torch/hub/pytorch_vision_v0.6.0


Epoch,Training Loss,Validation Loss,Acc,F1 Weighted
1,0.986939,0.811979,0.481515,0.357821
2,0.998092,0.804129,0.460775,0.354995
3,0.99862,0.801053,0.468891,0.359176
4,0.997519,0.798805,0.474301,0.362626
5,0.992639,0.793285,0.483318,0.36592
6,0.990361,0.778179,0.494139,0.41456
7,0.992227,0.764909,0.520289,0.442336
8,0.985092,0.756298,0.505861,0.448793
9,0.99221,0.756387,0.505861,0.460268
10,0.991561,0.755599,0.504058,0.461688


Run history: W&B sparklines for the eval/* and train/* metrics listed in the summary below (not reproduced).

Metric,Value
eval/acc,0.50406
eval/f1_weighted,0.46169
eval/loss,0.7556
eval/runtime,49.2931
eval/samples_per_second,22.498
eval/steps_per_second,2.82
train/acc,0.0
train/epoch,10.0
train/f1_weighted,0.0
train/global_step,12490.0



              precision    recall  f1-score   support

       anger       0.23      0.12      0.16       345
     disgust       0.00      0.00      0.00        68
        fear       0.00      0.00      0.00        50
         joy       0.34      0.55      0.42       402
     neutral       0.70      0.81      0.75      1256
     sadness       0.29      0.17      0.22       208
    surprise       0.46      0.32      0.38       281

    accuracy                           0.54      2610
   macro avg       0.29      0.28      0.28      2610
weighted avg       0.49      0.54      0.51      2610
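In the wav2vec2 report above, weighted F1 (0.51) is nearly double macro F1 (0.28). Macro averaging gives each of the seven emotions equal weight, while weighted averaging multiplies each class's F1 by its support, so `neutral` (1,256 of 2,610 test utterances, F1 0.75) dominates and the zero scores for `disgust` and `fear` barely register. The sketch below recomputes both aggregates from the rounded per-class values in that report; the helper name is hypothetical.

```python
def macro_and_weighted_f1(report: dict) -> tuple:
    """report maps label -> (f1, support); returns (macro F1, support-weighted F1)."""
    total = sum(support for _, support in report.values())
    macro = sum(f1 for f1, _ in report.values()) / len(report)
    weighted = sum(f1 * support for f1, support in report.values()) / total
    return macro, weighted


# Per-class F1 and support from the wav2vec2 test report above (rounded to 2 d.p.).
WAV2VEC2_REPORT = {
    "anger": (0.16, 345), "disgust": (0.00, 68), "fear": (0.00, 50),
    "joy": (0.42, 402), "neutral": (0.75, 1256), "sadness": (0.22, 208),
    "surprise": (0.38, 281),
}
macro, weighted = macro_and_weighted_f1(WAV2VEC2_REPORT)
```

Since the training logs track `F1 Weighted`, gains on minority emotions show up there only weakly; macro F1 is the more sensitive signal for them.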



In [12]:
# running on A100, batch_size can be set to 8
model_adv_audio_wav2vec2_lr5e4 = train_model("adv-audio-wav2vec2-lr-5e-4", datasets=(ds_train, ds_valid), n_epochs=10, batch_size=8)
evaluate_model(model_adv_audio_wav2vec2_lr5e4, dataset=ds_test, batch_size=8)

Using cache found in /root/.cache/torch/hub/pytorch_vision_v0.6.0


Epoch,Training Loss,Validation Loss,Acc,F1 Weighted
1,0.990441,0.843777,0.285843,0.235464
2,0.993049,0.806876,0.454463,0.345378
3,0.996557,0.807087,0.424707,0.331171
4,0.997711,0.796679,0.440938,0.340849
5,0.996734,0.793928,0.428314,0.353627
6,0.99823,0.796976,0.454463,0.365684
7,0.99884,0.7957,0.459874,0.372253
8,0.99593,0.780484,0.437331,0.392227
9,0.991771,0.78215,0.449053,0.389932
10,0.964646,0.77986,0.446348,0.392468


Run history: W&B sparklines for the eval/* and train/* metrics listed in the summary below (not reproduced).

Metric,Value
eval/acc,0.44635
eval/f1_weighted,0.39247
eval/loss,0.77986
eval/runtime,49.0326
eval/samples_per_second,22.618
eval/steps_per_second,2.835
train/acc,0.2
train/epoch,10.0
train/f1_weighted,0.33333
train/global_step,12490.0



              precision    recall  f1-score   support

       anger       0.00      0.00      0.00       345
     disgust       0.00      0.00      0.00        68
        fear       0.00      0.00      0.00        50
         joy       0.30      0.61      0.40       402
     neutral       0.67      0.73      0.70      1256
     sadness       0.14      0.14      0.14       208
    surprise       0.21      0.16      0.18       281

    accuracy                           0.47      2610
   macro avg       0.19      0.23      0.20      2610
weighted avg       0.40      0.47      0.43      2610



## Objective 4: Advanced Visual Embeddings

## Objective 5: Advanced Feature Fusion



## Objective 6: Contrastive Learning