# Mathematical Concepts Explained:

## 1. Data Preprocessing and Loading:
### DataLoader & Sampler:

The dataset is split into training, validation, and test sets. The training and validation subsets are drawn with `SubsetRandomSampler`, and the `valid` argument sets the fraction held out for validation (the default `valid=0.1` holds out 10%).

DataLoader handles batching and shuffling of the data.

**Mathematics:** No heavy mathematical computation here. This part simply arranges data for feeding into models.
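As a rough illustration of the split described above, the sketch below builds two disjoint `SubsetRandomSampler` objects over one shuffled index range and passes them to `DataLoader`. The helper name `get_train_valid_sampler`, the toy `TensorDataset`, and the batch size are assumptions made for this example, not taken from the original code.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.sampler import SubsetRandomSampler


def get_train_valid_sampler(num_samples, valid=0.1):
    """Return disjoint train/validation samplers (hypothetical helper)."""
    indices = torch.randperm(num_samples).tolist()   # shuffled sample indices
    split = int(valid * num_samples)                 # number of validation samples
    train_sampler = SubsetRandomSampler(indices[split:])
    valid_sampler = SubsetRandomSampler(indices[:split])
    return train_sampler, valid_sampler


# Toy stand-in for the real dataset: 100 feature vectors with 6 toy classes.
dataset = TensorDataset(torch.randn(100, 8), torch.randint(0, 6, (100,)))

train_sampler, valid_sampler = get_train_valid_sampler(len(dataset), valid=0.1)
train_loader = DataLoader(dataset, batch_size=16, sampler=train_sampler)
valid_loader = DataLoader(dataset, batch_size=16, sampler=valid_sampler)

print(len(train_sampler), len(valid_sampler))
```

Because both loaders share the same dataset object and differ only in their sampler, no data is copied; each sampler simply restricts which indices its loader may draw.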

**Documentation:** [PyTorch DataLoader](https://pytorch.org/docs/stable/data.html)

## 2. Neural Networks and Model Architectures:
### Recurrent Models (RNN, GRU, LSTM): 
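These models are designed for sequence processing. LSTM and GRU are gated variants of the basic RNN: their gates control how much of the hidden state is kept or overwritten at each time step, which mitigates the vanishing gradients that basic RNNs suffer from on long sequences.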
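To make the difference between the two gated layers concrete, here is a minimal sketch that runs the same batch of sequences through `nn.GRU` and `nn.LSTM`. The tensor sizes and the choice of bidirectional layers are assumptions for illustration only; the original configuration may differ.

```python
import torch
import torch.nn as nn

batch, seq_len, feat_dim, hidden = 4, 10, 32, 16   # illustrative sizes
x = torch.randn(batch, seq_len, feat_dim)

gru = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)

gru_out, gru_h = gru(x)                 # a GRU carries a single hidden state
lstm_out, (lstm_h, lstm_c) = lstm(x)    # an LSTM also carries a cell state

print(gru_out.shape)    # per-step outputs: (batch, seq_len, 2 * hidden)
print(lstm_h.shape)     # final hidden states: (2 directions, batch, hidden)
```

The per-step outputs have the same shape for both layers, so either one can be swapped in as the sequence encoder without changing the layers that follow.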

### Graph Neural Networks (GNN): 
The DialogueGCNModel adds graph-based learning to the base network: graph convolutional layers model the relationships between utterances in a conversation (e.g., between an utterance and its past and future neighbors).

**Mathematics:**

- **RNN/GRU/LSTM:** These models are trained with backpropagation through time (BPTT): the network is unrolled across the time steps and gradients are propagated backward through every step, so the gradient at one step depends on the steps after it.
- **GNN:** Graph Convolutional Networks (GCN) extend neural networks to graph-structured data. Each node's representation is updated by aggregating the representations of its neighbors.
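The neighbor-based update can be sketched with the generic GCN propagation rule \(H' = \mathrm{ReLU}(D^{-1/2}(A + I)D^{-1/2} H W)\). This is the textbook rule, shown here only as an assumption about the general idea; the graph layers in the actual model may differ (for example, by using relation-specific weights). The function name `gcn_layer` and the three-node toy graph are hypothetical.

```python
import torch


def gcn_layer(H, A, W):
    """One generic graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + torch.eye(A.size(0))         # add self-loops
    deg = A_hat.sum(dim=1)                   # node degrees, self-loop included
    D_inv_sqrt = torch.diag(deg.pow(-0.5))   # symmetric normalization
    return torch.relu(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)


# Three utterances connected in a chain: 0 - 1 - 2 (undirected toy graph).
A = torch.tensor([[0.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])
H = torch.eye(3)   # one-hot node features
W = torch.eye(3)   # identity weights, so only the neighbor mixing is visible

H_next = gcn_layer(H, A, W)
print(H_next)
```

With identity features and weights, row `i` of `H_next` shows how much each node contributes to node `i` after one step: nodes 0 and 2 are not adjacent, so their mutual entry stays zero.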

**Documentation:**

- [RNN, LSTM, GRU in PyTorch](https://pytorch.org/docs/stable/nn.html#recurrent-layers)
- [Graph Convolution Networks (GCN)](https://pytorch-geometric.readthedocs.io/en/latest/)

## 3. Loss Function:
### Cross-Entropy Loss (NLLLoss):

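The loss function used here is the Negative Log Likelihood Loss (`NLLLoss`), commonly used for multi-class classification. `NLLLoss` expects log-probabilities as input (for example, the output of a log-softmax layer); combined with log-softmax it is equivalent to cross-entropy loss.

For each sample it takes the negative logarithm of the probability the model assigns to the correct class, so the loss is close to zero for confident correct predictions and grows without bound as that probability approaches zero.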

### Masked Loss: 
For sequential data (such as text or speech), the MaskedNLLLoss is used to ignore padding tokens when calculating the loss.
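A masked loss of this kind can be sketched as follows: compute the per-position loss, zero out the padded positions, and average over the real ones. The function `masked_nll_loss` is a hypothetical helper written for this example and is not necessarily how `MaskedNLLLoss` is implemented in the original code.

```python
import torch
import torch.nn.functional as F


def masked_nll_loss(log_probs, target, mask):
    """Mean NLL over positions where mask == 1 (hypothetical helper).

    log_probs: (N, C) log-probabilities
    target:    (N,)   class ids
    mask:      (N,)   1.0 for real positions, 0.0 for padding
    """
    per_item = F.nll_loss(log_probs, target, reduction="none")  # shape (N,)
    return (per_item * mask).sum() / mask.sum()


torch.manual_seed(0)
log_probs = F.log_softmax(torch.randn(5, 3), dim=1)
target = torch.tensor([0, 2, 1, 0, 0])
mask = torch.tensor([1.0, 1.0, 1.0, 0.0, 0.0])   # last two positions are padding

loss = masked_nll_loss(log_probs, target, mask)
print(loss.item())
```

Dividing by `mask.sum()` rather than by `N` keeps the loss comparable across batches that contain different amounts of padding.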

**Mathematics:**

**NLLLoss:**
$$
\text{Loss}(x, y) = -\log(p_y)
$$
where \(p_y\) is the probability of the correct class \(y\) predicted by the model.
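The formula can be checked numerically. The sketch below, with made-up logits for a three-class problem, compares `nn.NLLLoss` applied to log-softmax outputs against a direct evaluation of \(-\log(p_y)\).

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0]])   # made-up raw scores for 3 classes
target = torch.tensor([0])                  # correct class y = 0

log_probs = F.log_softmax(logits, dim=1)    # NLLLoss expects log-probabilities
loss = nn.NLLLoss()(log_probs, target)

p_y = torch.softmax(logits, dim=1)[0, 0].item()   # probability of the correct class
manual = -math.log(p_y)                            # the formula: -log(p_y)

print(loss.item(), manual)
```

The two printed values should agree up to floating-point error, and `F.cross_entropy(logits, target)` should give the same number, since it applies log-softmax and NLL in one call.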

**Documentation:** [PyTorch NLLLoss](https://pytorch.org/docs/stable/nn.html#nllloss)

## 4. Optimization:
### Adam Optimizer:

The model's parameters are updated using the Adam optimizer, which is a variant of gradient descent that adapts the learning rate for each parameter based on first and second moments of the gradients.

**Mathematics:**

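Adam keeps exponential moving averages of the gradient \(g_t\) and of its square, corrects their initialization bias, and then updates the parameters \(\theta\):
$$
m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2
$$
$$
\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}
$$
$$
\theta_{t+1} = \theta_t - \eta \cdot \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
$$
where:

- \(m_t\) is the first moment estimate (moving average of the gradients),
- \(v_t\) is the second moment estimate (moving average of the squared gradients, i.e. the uncentered variance),
- \(\hat{m}_t\) and \(\hat{v}_t\) are the bias-corrected estimates,
- \(\beta_1\) and \(\beta_2\) are the decay rates of the two moving averages,
- \(\eta\) is the learning rate, and
- \(\epsilon\) is a small constant to avoid division by zero.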
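For readers who prefer code to symbols, here is a minimal scalar sketch of the standard Adam update with bias correction: the bias-corrected first moment is divided by the square root of the bias-corrected second moment. The function `adam_step` is a hypothetical helper; its default hyperparameters mirror the defaults of `torch.optim.Adam`, and the quadratic objective is made up for the example.

```python
import math


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad         # first moment: average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2    # second moment: average of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)               # bias-corrected second moment
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# Minimize f(theta) = theta^2, whose gradient is 2 * theta.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)

print(theta)
```

On the very first step the ratio of the corrected moments is approximately the sign of the gradient, so the parameter moves by about one learning rate regardless of the gradient's magnitude.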

**Documentation:** [Adam Optimizer in PyTorch](https://pytorch.org/docs/stable/optim.html#torch.optim.Adam)

## 5. Evaluation Metrics:
### Accuracy, F1-Score: 
Accuracy is the fraction of predictions that match the true label. The F1-score is the harmonic mean of precision and recall, so it is high only when both are high.

### Confusion Matrix: 
Used to assess how well the model classifies the different emotion labels.
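A confusion matrix is simple enough to build by hand. The sketch below uses made-up labels for three classes and follows the scikit-learn convention of true labels on the rows and predicted labels on the columns; the function is a plain-Python stand-in, not the call used in the original code.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count (true, predicted) pairs: rows are true labels, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


# Made-up labels for three emotion classes (0, 1, 2).
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred, num_classes=3)
for row in cm:
    print(row)
```

The diagonal holds the correct predictions; each off-diagonal entry shows which class a given true label was mistaken for.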

### Weighted F1-Score: 
The per-class F1-scores are averaged with weights proportional to each class's number of true samples (its support), so frequent classes contribute more to the final score than rare ones.

**Mathematics:**

- **Accuracy:**
$$
\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}}
$$
- **F1-Score:**
$$
\text{F1-score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
$$
where:

- **Precision** = \(\frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}\)
- **Recall** = \(\frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}\)
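These definitions can be worked through on a small example. The sketch below computes accuracy, per-class F1, and a support-weighted F1 in plain Python on made-up labels. The helper names are hypothetical; in practice the scikit-learn functions linked under Documentation would normally be used instead.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, cls):
    """F1-score for one class, treating that class as the positive label."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred):
    """Average the per-class F1-scores, weighted by each class's support."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(n / total * per_class_f1(y_true, y_pred, c) for c, n in support.items())


# Made-up labels for three classes.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)
print([per_class_f1(y_true, y_pred, c) for c in (0, 1, 2)])
print(weighted_f1(y_true, y_pred))
```

Class 0 has twice the support of the other two classes, so its F1-score counts twice as much in the weighted average.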

**Documentation:**

- [Classification Report in scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html)
- [F1-Score Calculation](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html)

## 6. Model Checkpointing:
After each epoch, the model's parameters are saved using `torch.save`, so the state from the best epoch can be reloaded later for evaluation or deployment.

**Mathematics:** Saving a model involves no mathematical operations, but it matters for recovering from interrupted runs and for reusing a trained model.
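One common way to keep the best epoch is to overwrite a single checkpoint file whenever the validation score improves. The sketch below shows that variant under stated assumptions: the `nn.Linear` stand-in model, the temporary file path, the dictionary keys, and the validation scores are all made up, and the original script may save on every epoch instead.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 3)   # stand-in for the real model
ckpt_path = os.path.join(tempfile.mkdtemp(), "best_model.pt")   # hypothetical path

best_f1 = -1.0
val_f1_per_epoch = [0.52, 0.61, 0.58]   # made-up validation scores
for epoch, val_f1 in enumerate(val_f1_per_epoch):
    # ... one epoch of training would run here ...
    if val_f1 > best_f1:
        best_f1 = val_f1
        # Save only the parameters (state_dict), not the whole model object.
        torch.save({"epoch": epoch, "model_state": model.state_dict()}, ckpt_path)

# Later: rebuild the architecture and load the saved parameters into it.
checkpoint = torch.load(ckpt_path)
restored = nn.Linear(4, 3)
restored.load_state_dict(checkpoint["model_state"])
print(checkpoint["epoch"], best_f1)
```

Saving the `state_dict` rather than the pickled model object keeps the checkpoint independent of the exact class definition on disk, which is the approach the linked PyTorch tutorial recommends.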

**Documentation:** [Saving and Loading Models in PyTorch](https://pytorch.org/tutorials/beginner/saving_loading_models.html)

## Step-by-Step Breakdown:
### Data Preparation:
The dataset is loaded, split, and batched using PyTorch’s DataLoader and SubsetRandomSampler.

### Model Selection:
The base model (LSTM, GRU, or GCN) is selected based on the argument passed. Each model uses specific layers, such as LSTM cells, GRU cells, or graph convolutions.

### Training:
The model is trained over multiple epochs. During each epoch, gradients are calculated and parameters are updated using the Adam optimizer.
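The training step can be sketched as a loop of forward pass, loss, backward pass, and optimizer update. Everything concrete below is an assumption for illustration: the toy data, the small feed-forward model standing in for the real architecture, the learning rate, the epoch count, and the use of one full batch per epoch instead of a `DataLoader`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(64, 8)       # toy features
y = (X[:, 0] > 0).long()     # toy binary labels

# Stand-in model; it ends in LogSoftmax because NLLLoss expects log-probabilities.
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 2),
    nn.LogSoftmax(dim=1),
)
loss_fn = nn.NLLLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

losses = []
for epoch in range(20):
    optimizer.zero_grad()            # clear gradients from the previous step
    loss = loss_fn(model(X), y)      # forward pass and loss
    loss.backward()                  # backpropagate to compute gradients
    optimizer.step()                 # Adam parameter update
    losses.append(loss.item())

print(losses[0], losses[-1])
```

Calling `optimizer.zero_grad()` before each backward pass matters because PyTorch accumulates gradients across calls to `backward()` by default.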

### Evaluation:
After training, the model is evaluated on the validation and test sets, and metrics such as accuracy, F1-score, and the confusion matrix are calculated.

### Logging:
TensorBoard is used to visualize training and evaluation metrics.

### Model Checkpointing:
After each epoch, the model is saved so that training progress is not lost if the run is interrupted.

## Official Documentation:
- [PyTorch Overview](https://pytorch.org/docs/stable/index.html)
- [Adam Optimizer](https://pytorch.org/docs/stable/optim.html#torch.optim.Adam)
- [NLLLoss](https://pytorch.org/docs/stable/nn.html#nllloss)
- [DataLoader](https://pytorch.org/docs/stable/data.html)
- [Graph Neural Networks (PyTorch Geometric)](https://pytorch-geometric.readthedocs.io/en/latest/)
- [Scikit-learn Metrics](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics)

This explanation covers the key mathematical operations and how they are implemented in the provided code. For more details on each part, refer to the links above for the official documentation of each method or concept.
