# CM3070, Final Project

```
University of London
BSc Computer Science
CM3070, Final Project
Hudson Leonardo MENDES
hlm12@student.london.ac.uk
```


## 1. Introduction


### 1.1. Domain-specific Area


### 1.2. Dataset

**Multimodal EmotionLines Dataset (MELD)** [2, 3]

> Multimodal EmotionLines Dataset (MELD) has been created by enhancing and extending EmotionLines dataset. MELD contains the same dialogue instances available in EmotionLines, but it also encompasses audio and visual modality along with text. MELD has more than 1400 dialogues and 13000 utterances from Friends TV series. Multiple speakers participated in the dialogues. Each utterance in a dialogue has been labeled by any of these seven emotions -- Anger, Disgust, Sadness, Joy, Neutral, Surprise and Fear. MELD also has sentiment (positive, negative and neutral) annotation for each utterance.
> (Hakim, 2021)
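
The seven emotion classes and three sentiment classes listed in the description can be turned into integer targets for a classifier. The following is a minimal sketch: the names `EMOTIONS`, `EMOTION_TO_INDEX` and `encode_emotion` are hypothetical helpers that are not part of the project code, and the index order is an arbitrary choice made for illustration.

```python
# Emotion and sentiment label sets, as listed in the MELD description above.
EMOTIONS = ["anger", "disgust", "sadness", "joy", "neutral", "surprise", "fear"]
SENTIMENTS = ["positive", "negative", "neutral"]

# Hypothetical lookup tables; the ordering is an assumption, not a MELD convention.
EMOTION_TO_INDEX = {name: index for index, name in enumerate(EMOTIONS)}
INDEX_TO_EMOTION = {index: name for name, index in EMOTION_TO_INDEX.items()}


def encode_emotion(label: str) -> int:
    """Map an emotion name (case-insensitive) to its integer class index."""
    return EMOTION_TO_INDEX[label.strip().lower()]


def decode_emotion(index: int) -> str:
    """Map an integer class index back to its emotion name."""
    return INDEX_TO_EMOTION[index]
```

Keeping both directions of the mapping in one place makes it harder for training and inference code to disagree on which index stands for which emotion.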


### 1.3. Objective


## 2. Environment


### 2.1. Dependencies


In [1]:
!cat ../setup.cfg

[metadata]
name = hlm12erc
version = attr: hlm12erc.VERSION
author = Hudson Mendes
author_email = hlm12@student.london.ac.uk
description = Final Project from University of London
long_description = file: README.md, LICENSE
keywords = university-of-london
license = copyright

[options]
zip_safe = False
include_package_data = True
packages = find:
package_dir =
    =src
python_requires = >=3.10
install_requires =
    torch>=2.0.1
    transformers>=4.30.2

[options.package_data]

[options.extras_require]
# development
dev =
    pre-commit>=3.3.3
    black[jupyter]>=23.7.0
    isort>=5.12.0

test =
    pytest>=7.4.0

# mlops
etl =
    kaggle>=1.5.13
    tqdm>=4.65.0
    pandas>=2.0.1
    google-cloud-storage>=2.10.0
    moviepy>=1.0.3
    Pillow>=10.0.0

modeling =
    # no dependencies yet

training =
    # no dependencies yet

serving =
    fire>=0.5.0

all =
    %(dev)s
    %(test)s
    %(etl)s
    %(modeling)s
    %(training)s
    %(serving)s

[options.packages.find]
include =
    ./sr

In [2]:
%load_ext autoreload
%autoreload 2

### Imports

In [6]:
import torch
import torchvision

from PIL import Image

import pandas as pd

### Logging

In [3]:
import logging

logging.basicConfig(level=logging.INFO)


### Paths & Locations

In [2]:
import pathlib

dir_data = pathlib.Path("../data")

## 3. Exploratory Data Analysis


### 3.1. Data Extraction, Transformation & Loading


In [5]:
%%capture
%pip install -e '..[etl]'

In [None]:
from hlm12erc.etl import ETL, KaggleDataset

ETL(
    dataset=KaggleDataset(
        owner="zaber666",
        name="meld-dataset",
        subdir="MELD-RAW/MELD.Raw",
    )
).into(
    uri_or_folderpath=dir_data,
    force=False,
)

In [None]:
!ls {str(dir_data)}

In [3]:
df_train = pd.read_csv(dir_data / "train.csv")
df_train

| Unnamed: 0.1 | Unnamed: 0 | dialogue | sequence | x_text | x_visual | x_audio | label |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 0 | 0 | also I was the point person on my companys tr... | d-0-seq-0.png | d-0-seq-0.wav | neutral |
| 1 | 1 | 0 | 1 | You mustve had your hands full. | d-0-seq-1.png | d-0-seq-1.wav | neutral |
| 2 | 2 | 0 | 2 | That I did. That I did. | d-0-seq-2.png | d-0-seq-2.wav | neutral |
| 3 | 3 | 0 | 3 | So lets talk a little bit about your duties. | d-0-seq-3.png | d-0-seq-3.wav | neutral |
| 4 | 4 | 0 | 4 | My duties? All right. | d-0-seq-4.png | d-0-seq-4.wav | surprise |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 9984 | 9984 | 1038 | 13 | You or me? | d-1038-seq-13.png | d-1038-seq-13.wav | neutral |
| 9985 | 9985 | 1038 | 14 | I got it. Uh, Joey, women don't have Adam's ap... | d-1038-seq-14.png | d-1038-seq-14.wav | neutral |
| 9986 | 9986 | 1038 | 15 | You guys are messing with me, right? | d-1038-seq-15.png | d-1038-seq-15.wav | surprise |
| 9987 | 9987 | 1038 | 16 | Yeah. | d-1038-seq-16.png | d-1038-seq-16.wav | neutral |
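
A quick check on the `label` and `dialogue` columns shows how balanced the classes are and how long the dialogues run. The sketch below assumes only the column names visible in the output above. `df_sample` is a stand-in built from the first five rows shown, so its numbers say nothing about the full training set; replacing `df_sample` with `df_train` would run the same checks on the real data.

```python
import pandas as pd

# Stand-in for df_train: only the first five rows displayed above.
df_sample = pd.DataFrame(
    {
        "dialogue": [0, 0, 0, 0, 0],
        "sequence": [0, 1, 2, 3, 4],
        "label": ["neutral", "neutral", "neutral", "neutral", "surprise"],
    }
)

# Share of each emotion label, largest first.
label_share = df_sample["label"].value_counts(normalize=True)

# Number of utterances in each dialogue.
utterances_per_dialogue = df_sample.groupby("dialogue")["sequence"].count()

print(label_share)
print(utterances_per_dialogue)
```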


### 3.2. Statistical Analysis


#### 3.2.1. Measures of Spread


In [9]:
from typing import Tuple
from tqdm import tqdm


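def calculate_measures_of_spread(filenames: pd.Series) -> Tuple[torch.Tensor, torch.Tensor]:
    """
    Calculate the per-channel mean and standard deviation of the images in the training set.
    This is required for image normalisation during training & inference.

    Note: the returned standard deviation is the average of the per-image standard
    deviations, which approximates (but does not equal) the standard deviation
    computed over all pixels of the dataset.

    :param filenames: The filenames of the images in the training set.
    :return: The mean and standard deviation of the images in the training set.
    """
    mean = torch.zeros(3)
    std = torch.zeros(3)
    transform = torchvision.transforms.ToTensor()
    for filename in tqdm(filenames, desc="measures of spread"):
        filepath = dir_data / filename
        # force 3 channels so that RGBA or greyscale files do not break the accumulation
        with Image.open(filepath) as image:
            tensor = transform(image.convert("RGB"))
        mean += tensor.mean(dim=(1, 2))
        std += tensor.std(dim=(1, 2))
    # divide by the number of files actually processed, not by a global dataframe
    mean /= len(filenames)
    std /= len(filenames)
    return mean, std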


calculate_measures_of_spread(filenames=df_train.x_visual)

measures of spread: 100%|██████████| 9989/9989 [13:31<00:00, 12.30it/s]


(tensor([0.2706, 0.2010, 0.1914]), tensor([0.1857, 0.1608, 0.1667]))
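
The docstring says these statistics are needed for image normalisation. As a hedged illustration of what that step could look like, the sketch below standardises a `(3, H, W)` tensor channel by channel with the values printed above. The names `MEAN`, `STD` and `normalise` are hypothetical and not taken from the project code. `torchvision.transforms.Normalize(mean, std)` applies the same per-channel formula and would be the usual choice inside a transform pipeline.

```python
import torch

# Per-channel statistics printed by the cell above.
MEAN = torch.tensor([0.2706, 0.2010, 0.1914])
STD = torch.tensor([0.1857, 0.1608, 0.1667])


def normalise(image: torch.Tensor) -> torch.Tensor:
    """Standardise a (3, H, W) image tensor with values in [0, 1], channel by channel."""
    # Broadcasting: (3, 1, 1) statistics against a (3, H, W) image.
    return (image - MEAN[:, None, None]) / STD[:, None, None]


# Dummy image in which every pixel equals its channel mean, so the result is all zeros.
image = MEAN[:, None, None].expand(3, 4, 4)
normalised = normalise(image)
```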

#### 3.2.2. Types of Distribution


### 3.3. Data Visualization


#### 3.3.1. Word Cloud & Frequency


#### 3.3.2. Image Grid & Scatter Plot


#### 3.3.3. Audio Feature Plots


## 4. Machine Learning

**Universal Machine-learning Workflow**[1]


### 4.1. Problem Definition


### 4.2. Measure of Success


### 4.3. Evaluation Protocol


### 4.4. Data Preparation


### 4.5. Baseline Model


In [None]:
from hlm12erc.modelling import ERCModel

### 4.6. Overfitting Model


### 4.7. Model Tuning


## 5. Results


## 6. Conclusions


## References

[1] Chollet, François. Deep Learning with Python. Manning, 2017.

[2] Poria, Soujanya, et al. ‘MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations’. ArXiv [Cs.CL], 2019, http://arxiv.org/abs/1810.02508. arXiv.

[3] Chen, Sheng-Yeh, et al. ‘EmotionLines: An Emotion Corpus of Multi-Party Conversations’. ArXiv [Cs.CL], 2018, http://arxiv.org/abs/1802.08379. arXiv.

[4] Su, Lin, et al. ‘GEM: A General Evaluation Benchmark for Multimodal Tasks’. ArXiv [Cs.CL], 2021, http://arxiv.org/abs/2106.09889. arXiv.
