# DialogueGCNModel Documentation

This document provides an overview of the `DialogueGCNModel` and its components, explaining each class and how they contribute to the model's functionality. The model is designed for emotion recognition in conversation using deep learning techniques like RNNs, GCNs, and attention mechanisms.

## Table of Contents
1. [Overview](#overview)
2. [Mathematical Background](#mathematical-background)
3. [Classes and Components](#classes-and-components)
    - [MaskedNLLLoss](#maskednllloss)
    - [SimpleAttention](#simpleattention)
    - [MatchingAttention](#matchingattention)
    - [DialogueRNNCell](#dialoguernncell)
    - [DialogueRNN](#dialoguernn)
    - [MaskedEdgeAttention](#maskededgeattention)
    - [GraphNetwork](#graphnetwork)
    - [DialogueGCNModel](#dialoguegcnmodel)
4. [Usage](#usage)
5. [References](#references)

## Overview

The `DialogueGCNModel` combines Graph Neural Networks (GNNs), Recurrent Neural Networks (RNNs), and attention mechanisms to model dialogues for emotion recognition. It processes conversational data and predicts the speaker's emotional state at each step of the conversation.

---

## Mathematical Background

### 1. **Recurrent Neural Networks (RNNs)**

RNNs are used to model sequences by maintaining a hidden state at each time step that captures information from previous inputs.

Mathematically, at each time step \( t \), the hidden state \( h_t \) is updated as follows:

$$
h_t = f(W_{xh} x_t + W_{hh} h_{t-1} + b_h)
$$

Where:
- \( h_t \) is the hidden state at time \( t \),
- \( x_t \) is the input at time \( t \),
- \( W_{xh} \) and \( W_{hh} \) are the input-to-hidden and hidden-to-hidden weight matrices,
- \( b_h \) is the bias vector, and
- \( f \) is the activation function (usually tanh or ReLU).

RNNs are widely used for processing sequential data like dialogue because they capture temporal dependencies in a sequence.
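The update rule above can be sketched in a few lines of plain Python. This is a minimal illustration with `tanh` as the activation; the function names `rnn_step` and `run_rnn` and the toy weights are assumptions made for this example and are not part of the model's code.

```python
import math

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One vanilla RNN update: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    h_t = []
    for i in range(len(b_h)):
        pre = b_h[i]
        pre += sum(w * x for w, x in zip(W_xh[i], x_t))     # input contribution
        pre += sum(w * h for w, h in zip(W_hh[i], h_prev))  # recurrent contribution
        h_t.append(math.tanh(pre))
    return h_t

def run_rnn(inputs, h0, W_xh, W_hh, b_h):
    """Apply rnn_step over a sequence and return the hidden state at every step."""
    states, h = [], h0
    for x_t in inputs:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
        states.append(h)
    return states

# Toy setup (illustrative values only): 2 input features, 2 hidden units.
W_xh = [[0.5, -0.3], [0.1, 0.8]]
W_hh = [[0.2, 0.0], [0.0, 0.2]]
b_h = [0.0, 0.0]
states = run_rnn([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], W_xh, W_hh, b_h)
print(states)
```

Each hidden state depends on the previous one, which is how information from earlier inputs reaches later steps.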

Official Link: [RNNs Overview](https://colah.github.io/posts/2015-08-Understanding-LSTMs/)

### 2. **Graph Convolutional Networks (GCNs)**

GCNs apply convolutions to graph-structured data. The graph structure lets the model capture dependencies between entities, such as the utterances of a dialogue.

The graph convolution operation can be defined as:

$$
h' = \sigma( \hat{A} X W )
$$

Where:
- \( \hat{A} \) is the normalized adjacency matrix with added self-connections,
- \( X \) is the feature matrix of nodes,
- \( W \) is the learnable weight matrix, and
- \( \sigma \) is a nonlinear activation function (e.g., ReLU).

GCNs allow the model to use information from neighboring nodes, which is essential for dialogue context modeling.
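As a rough sketch of the layer above, the following dependency-free Python computes \( \sigma(\hat{A} X W) \) with ReLU as \( \sigma \). It assumes the symmetric normalization \( \hat{A} = \tilde{D}^{-1/2} (A + I) \tilde{D}^{-1/2} \) from the GCN paper linked below; the helper names and the three-node toy graph are illustrative only.

```python
import math

def normalize_adjacency(A):
    """Return D^{-1/2} (A + I) D^{-1/2}: symmetric normalization with self-loops."""
    n = len(A)
    A_tilde = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_tilde]
    return [[A_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(P, Q):
    """Plain matrix product of two nested lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def gcn_layer(A, X, W):
    """One graph convolution: ReLU(A_hat X W)."""
    A_hat = normalize_adjacency(A)
    Z = matmul(matmul(A_hat, X), W)
    return [[max(0.0, z) for z in row] for row in Z]

# Three nodes connected in a chain: 0 - 1 - 2 (toy values).
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, -1.0], [0.5, 0.5]]
H = gcn_layer(A, X, W)
print(H)
```

Each output row mixes a node's own features with those of its direct neighbours, weighted by the normalized adjacency.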

Official Link: [GCN Paper](https://arxiv.org/abs/1609.02907)

### 3. **Attention Mechanisms**

Attention mechanisms allow models to focus on different parts of the input sequence at each step, which is crucial for tasks like translation and dialogue modeling.

The basic attention mechanism computes a weighted sum of input vectors based on learned attention scores:

$$
\text{Attention}(Q, K, V) = \text{softmax}( \frac{Q K^T}{\sqrt{d_k}} ) V
$$

Where:
- \( Q \) is the query,
- \( K \) is the key,
- \( V \) is the value, and
- \( d_k \) is the dimension of the key vectors.

Attention helps the model selectively focus on relevant tokens in the dialogue.
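The formula can be illustrated with a small, dependency-free sketch of scaled dot-product attention. The function names and toy vectors are assumptions for this example; a real implementation would use batched tensor operations.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Return (outputs, weights) for softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors, one output column at a time.
        outputs.append([sum(w_j * v[c] for w_j, v in zip(w, V)) for c in range(len(V[0]))])
    return outputs, weights

Q = [[1.0, 0.0]]               # one query
K = [[1.0, 0.0], [0.0, 1.0]]   # two keys
V = [[10.0, 0.0], [0.0, 10.0]] # two values
out, attn = scaled_dot_product_attention(Q, K, V)
print(attn, out)
```

The query matches the first key more closely, so the first value receives the larger weight.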

Official Link: [Attention is All You Need](https://arxiv.org/abs/1706.03762)

---

## Classes and Components

### MaskedNLLLoss

`MaskedNLLLoss` is a custom loss function that computes the negative log-likelihood loss while applying a mask so that certain positions in the input sequence (for example, padding) do not contribute to the loss.

#### Mathematical Formulation:
The masked negative log-likelihood loss is calculated as:

$$
\mathcal{L}_{NLL} = - \sum_{i=1}^{N} \mathbb{I}_{mask_i} \log p(y_i | x)
$$

Where:
- \( \mathbb{I}_{mask_i} \) is the mask indicator (1 if position \( i \) counts toward the loss, 0 if it is ignored),
- \( p(y_i | x) \) is the probability of the true label \( y_i \) given input \( x \),
- \( N \) is the number of tokens.
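A minimal sketch of this loss in plain Python follows. It takes per-position log-probabilities, so masked positions simply contribute zero. The `normalize` flag, which divides by the number of unmasked positions, is an assumption added for illustration and is not part of the formula above.

```python
import math

def masked_nll_loss(log_probs, targets, mask, normalize=False):
    """Sum of -log p(y_i | x) over positions where mask[i] == 1.

    log_probs: list of per-class log-probabilities for each position
    targets:   true class index for each position
    mask:      1 to keep a position, 0 to ignore it
    normalize: if True, divide by the number of kept positions (assumption)
    """
    total = 0.0
    for lp, y, m in zip(log_probs, targets, mask):
        total += -m * lp[y]
    if normalize:
        kept = sum(mask)
        return total / kept if kept > 0 else 0.0
    return total

log_probs = [
    [math.log(0.7), math.log(0.3)],
    [math.log(0.2), math.log(0.8)],
    [math.log(0.5), math.log(0.5)],  # padded position, ignored below
]
targets = [0, 1, 0]
mask = [1, 1, 0]
loss_sum = masked_nll_loss(log_probs, targets, mask)
loss_mean = masked_nll_loss(log_probs, targets, mask, normalize=True)
print(loss_sum, loss_mean)
```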

### SimpleAttention

`SimpleAttention` applies a basic attention mechanism to the input sequence.

#### Mathematical Formulation:
For a sequence of inputs \( X \), the attention score is computed as:

$$
\text{Attention}(X) = \text{softmax}(W X)
$$

Where \( W \) is a learnable weight matrix.
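One way to read this formula is sketched below, assuming \( W \) reduces each input vector to a single score and the softmax runs over the sequence. The final weighted-sum pooling step and the name `simple_attention` are assumptions for this example, not a description of the actual class.

```python
import math

def simple_attention(X, w):
    """Score each vector in X with w, softmax over the sequence, then pool.

    X: list of input vectors (one per sequence position)
    w: weight vector producing one scalar score per position
    Returns (pooled_vector, attention_weights).
    """
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in X]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]
    pooled = [sum(a * x[c] for a, x in zip(alpha, X)) for c in range(len(X[0]))]
    return pooled, alpha

X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pooled, alpha = simple_attention(X, [1.0, 0.0])
print(alpha, pooled)
```

With a zero weight vector every position gets the same weight, so the pooled vector reduces to the mean of the inputs.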

### MatchingAttention

`MatchingAttention` is a more complex attention mechanism that computes attention between two sequences using different attention types.

#### Mathematical Formulation:
The attention weight for the \( j \)-th key is computed from the similarity between the query \( Q \) and the key vectors \( K_1, \dots, K_N \):

$$
\alpha_j = \frac{\exp(Q K_j^T)}{\sum_{i=1}^{N} \exp(Q K_i^T)}
$$

### DialogueRNNCell

`DialogueRNNCell` defines the core RNN cell used for processing each step in the dialogue. It incorporates multiple GRU cells and attention mechanisms.

#### Mathematical Formulation:
The GRU (Gated Recurrent Unit) update rule is given by:

$$
z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)
$$
$$
r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)
$$
$$
\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \circ h_{t-1}) + b_h)
$$
$$
h_t = (1 - z_t) \circ h_{t-1} + z_t \circ \tilde{h}_t
$$

Where \( z_t \) is the update gate, \( r_t \) is the reset gate, \( \tilde{h}_t \) is the candidate hidden state, \( \sigma \) is the logistic sigmoid, and \( \circ \) denotes element-wise multiplication.
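The four equations translate directly into code. The sketch below is a dependency-free illustration of a single GRU step using the same gate convention as above; the parameter dictionary layout and the function names are assumptions for this example.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, x, U, h, b):
    """Compute W x + U h + b for nested-list matrices and list vectors."""
    return [
        sum(w * xi for w, xi in zip(W[i], x)) + sum(u * hi for u, hi in zip(U[i], h)) + b[i]
        for i in range(len(b))
    ]

def gru_step(x_t, h_prev, p):
    """One GRU update; p maps names such as 'W_z', 'U_z', 'b_z' to parameters."""
    z = [sigmoid(a) for a in affine(p["W_z"], x_t, p["U_z"], h_prev, p["b_z"])]  # update gate
    r = [sigmoid(a) for a in affine(p["W_r"], x_t, p["U_r"], h_prev, p["b_r"])]  # reset gate
    rh = [ri * hi for ri, hi in zip(r, h_prev)]
    h_tilde = [math.tanh(a) for a in affine(p["W_h"], x_t, p["U_h"], rh, p["b_h"])]
    # h_t = (1 - z_t) * h_{t-1} + z_t * h_tilde_t, element-wise
    return [(1.0 - zi) * hi + zi * hti for zi, hi, hti in zip(z, h_prev, h_tilde)]

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

# Toy parameters: 2 input features, 2 hidden units, all weights zero except W_h.
params = {
    "W_z": zeros(2, 2), "U_z": zeros(2, 2), "b_z": [0.0, 0.0],
    "W_r": zeros(2, 2), "U_r": zeros(2, 2), "b_r": [0.0, 0.0],
    "W_h": [[1.0, 0.0], [0.0, 1.0]], "U_h": zeros(2, 2), "b_h": [0.0, 0.0],
}
h_t = gru_step([0.5, -0.5], [1.0, -2.0], params)
print(h_t)
```

With zero gate weights both gates equal 0.5, so the new state is an even blend of the previous state and the candidate state.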

### DialogueRNN

`DialogueRNN` uses `DialogueRNNCell` to process an entire dialogue sequence, updating its state one step at a time.

### MaskedEdgeAttention

`MaskedEdgeAttention` applies an attention mechanism to the edges of the dialogue graph, restricting attention to the specific relations that are allowed between nodes.

#### Mathematical Formulation:
The edge attention mechanism is defined as:

$$
\alpha_{ij} = \text{softmax}(W \cdot (h_i || h_j))
$$

Where \( h_i \) and \( h_j \) are the node representations, \( || \) denotes concatenation, and \( W \) is a learnable weight matrix.
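The sketch below shows one possible reading of this formula in plain Python. It assumes that \( W \) maps each concatenated pair to a scalar score, that the softmax is taken over the neighbours of node \( i \), and that the mask keeps only neighbours inside a window of `window_past` earlier and `window_future` later nodes (the parameter names used in the Usage section). These are assumptions for illustration and may differ from the actual `MaskedEdgeAttention` implementation.

```python
import math

def masked_edge_attention(H, w, window_past, window_future):
    """Return an n x n matrix alpha where alpha[i][j] is the weight of edge i -> j.

    H: list of node vectors, in dialogue order
    w: weight vector of length 2 * d applied to the concatenation (h_i || h_j)
    Entries outside the window stay exactly 0.0 (masked out).
    """
    n = len(H)
    alpha = [[0.0] * n for _ in range(n)]
    for i in range(n):
        lo = max(0, i - window_past)
        hi = min(n - 1, i + window_future)
        neighbours = range(lo, hi + 1)
        # H[i] + H[j] is list concatenation, i.e. (h_i || h_j).
        scores = [sum(wk * v for wk, v in zip(w, H[i] + H[j])) for j in neighbours]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        for j, e in zip(neighbours, exps):
            alpha[i][j] = e / total
    return alpha

H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
alpha = masked_edge_attention(H, [0.1, 0.2, 0.3, 0.4], window_past=1, window_future=1)
for row in alpha:
    print([round(a, 3) for a in row])
```

Each row sums to 1 over the nodes inside the window, and nodes outside the window receive no attention.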

### GraphNetwork

`GraphNetwork` combines GCNs and attention mechanisms to process the dialogue graph.

#### Mathematical Formulation:
The graph convolution operation is applied to the nodes and edges as follows:

$$
h' = \sigma( \hat{A} X W)
$$

Where \( \hat{A} \) is the normalized adjacency matrix, \( X \) is the node feature matrix, \( W \) is the learnable weight matrix, and \( \sigma \) is a nonlinear activation function.

---

## DialogueGCNModel

`DialogueGCNModel` combines the various components (RNN, GCN, attention) for emotion recognition in conversations.

### Mathematical Formulation:

Given input sequence \( X \), edge information \( A \), and attention weights \( \alpha \), the output of the model is computed as:

$$
y = \text{softmax}(W_{out} h_{\text{final}})
$$

Where \( h_{\text{final}} \) is the final hidden state after processing through the graph and attention layers, and \( W_{out} \) is the weight matrix of the output layer.

## Usage

To use the `DialogueGCNModel`, you will need to instantiate it and provide the required inputs, such as features, masks, and edge data.

```python
# Example usage
model = DialogueGCNModel(base_model='DialogRNN', D_m=256, D_g=128, D_p=64, D_e=32, D_h=64, D_a=100, 
                         graph_hidden_size=128, n_speakers=2, max_seq_len=50, window_past=5, window_future=5)
output = model(features, edge_index, edge_norm, edge_type, seq_lengths, umask, nodal_attn=True, avec=False) 
