# Introduction to Machine Learning - Course Project Report

Group members:
   - Grzegorz Prasek
   - Jakub Kindracki
   - Mykhailo Shamrai
   - Mateusz Mikiciuk
   - Ernest Mołczan

In this report we describe our implementation of a CNN that classifies users as either allowed into the system or not allowed (binary classification).

## Table of contents:
1. Dataset
2. Exploratory Data Analysis
3. Preparing audio files for generating spectrograms
4. Generating spectrograms
5. Classifying spectrograms for train, test and validation datasets
6. Model
7. Training loop
8. [EXTRA] **interpretability** - visualizing the behavior and function of individual CNN layers and using it for data exploration
9. [EXTRA] **uncertainty** - using Monte Carlo dropout to estimate classification confidence and comparing dropout to an ensemble of CNN networks
10. [EXTRA] **parameter space** - examining how much individual layers of the network change during training and investigating their re-initialization robustness

# Dataset

## 1.1 Introduction to the Dataset
The project is based on the DAPS (Device and Produced Speech) dataset, which was specifically designed for speech processing and analysis research. The primary goal of this dataset is to provide high-quality speech recordings that can be utilized in applications such as speech recognition, speaker classification, and acoustic analysis.

The DAPS dataset was chosen as the primary data source for the following reasons:

- **Data Quality**: The recordings are clean and diverse, enabling precise testing of models under both laboratory and simulated conditions.
- **Speaker Diversity**: The dataset includes recordings from 20 distinct speakers, divided into two classes:
  - **Class 1 (Acceptable individuals)**: Includes recordings from speakers F1, F7, F8, M3, M6, and M8.
  - **Class 0 (Unacceptable individuals)**: Includes recordings from the remaining 14 speakers.
- **Alignment with Project Requirements**: The dataset provides recordings that can be easily transformed into spectrograms, which are essential for the CNN-based approach employed in this project.

---

## 1.2 Dataset Characteristics
Each recording in the DAPS dataset is available as a `.wav` file and exhibits the following features:

- **Standard Sampling Format**: All recordings are sampled at 16 kHz, which is sufficient for most speech processing applications.
- **Variety in Recording Lengths**: The recordings vary in duration, necessitating preprocessing to standardize the samples for comparability.
- **Natural and Artificial Noise**: The dataset includes samples with varying levels of noise, allowing for robustness testing of the model against disturbances.

Additionally, the DAPS dataset was selected due to its accessibility and clear licensing terms, which permit its legal use for educational and research purposes.

---

## 1.3 Data Preparation
To effectively utilize the DAPS dataset in the project, several key data preparation steps were undertaken:

### a) Data Cleaning
The data cleaning process aimed to remove samples that could negatively impact model performance. The following tasks were performed:

- **Duplicate Elimination**: Redundant recordings were removed to prevent overrepresentation of certain samples in the training set.
- **Silence Removal**: Segments containing silence were identified and eliminated to improve model efficiency.
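
The report does not state how silent segments were detected. One common approach is a frame-energy threshold, sketched below for a mono signal stored as a NumPy array. The function name `remove_silence`, the 400-sample frame (25 ms at 16 kHz), and the -40 dB threshold are illustrative assumptions, not the settings used in the project.

```python
import numpy as np

def remove_silence(signal, frame_len=400, threshold_db=-40.0):
    """Drop frames whose RMS level is below threshold_db relative to the loudest frame.

    Assumed helper for illustration; a trailing partial frame is discarded.
    """
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        return signal
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames.astype(np.float64) ** 2, axis=1))
    peak = rms.max()
    if peak == 0:
        return signal[:0]  # the whole signal is digital silence
    # Level of each frame in dB relative to the loudest frame
    rms_db = 20.0 * np.log10(np.maximum(rms, 1e-12) / peak)
    return frames[rms_db > threshold_db].reshape(-1)

# Usage: a 440 Hz tone surrounded by silence keeps only the tone
t = np.arange(1600) / 16000.0
tone = np.sin(2 * np.pi * 440.0 * t)
padded = np.concatenate([np.zeros(800), tone, np.zeros(800)])
trimmed = remove_silence(padded)
```

A relative threshold is used here so that the same setting works for both loud and quiet recordings.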

### b) Data Splitting
The dataset was split into three subsets:

- **Training Set**: 70% of the data, used for model training.
- **Validation Set**: 15% of the data, used for model evaluation during training.
- **Test Set**: 15% of the data, used for final model evaluation.

The split was performed so that the subsets do not overlap. In particular, no fragments of the same recording were included in both the training and test sets, which prevents data leakage.
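
One way to guarantee that fragments of a recording never end up in different subsets is to split at the level of recordings rather than fragments. The sketch below assumes each fragment is tagged with an identifier of its source recording; `split_by_recording` and the `(recording_id, segment)` pair format are hypothetical and only illustrate the idea.

```python
import random
from collections import defaultdict

def split_by_recording(segments, ratios=(0.70, 0.15, 0.15), seed=0):
    """Split (recording_id, segment) pairs so each recording lands in exactly one subset.

    The ratios are applied to the number of recordings, not the number of segments.
    """
    by_recording = defaultdict(list)
    for recording_id, segment in segments:
        by_recording[recording_id].append(segment)

    recording_ids = sorted(by_recording)
    random.Random(seed).shuffle(recording_ids)  # reproducible shuffle

    n_train = round(ratios[0] * len(recording_ids))
    n_val = round(ratios[1] * len(recording_ids))
    parts = {
        "train": recording_ids[:n_train],
        "val": recording_ids[n_train:n_train + n_val],
        "test": recording_ids[n_train + n_val:],
    }
    return {
        name: [(rid, seg) for rid in ids for seg in by_recording[rid]]
        for name, ids in parts.items()
    }

# Usage with 20 made-up recordings of 3 fragments each
segments = [(f"rec{i:02d}", f"rec{i:02d}_part{j}") for i in range(20) for j in range(3)]
subsets = split_by_recording(segments)
```

Because the ratios count recordings, the fragment-level proportions will deviate from 70/15/15 when recordings differ in length.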

### c) Data Augmentation
To increase data diversity and enhance the model's robustness against noise, the following augmentation techniques were applied:

- **Adding Background Noise**: Artificial noise of varying intensities was introduced to simulate real-world acoustic conditions.
- **Pitch Shifting**: The pitch of recordings was altered to increase speaker diversity.
- **Trimming Recordings**: Samples were cropped to a fixed length to ensure consistency across input data.
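
Two of these steps can be sketched with NumPy alone. The example assumes white Gaussian noise mixed in at a chosen signal-to-noise ratio; the report does not specify the noise type or levels, so `add_noise`, `fix_length`, and their parameters are illustrative. Zero-padding of short signals is an added assumption (the report only mentions cropping). Pitch shifting is left out because it normally relies on a dedicated audio library.

```python
import numpy as np

def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise so that the mix has roughly the requested SNR (in dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

def fix_length(signal, target_len):
    """Crop to target_len samples; zero-pad if the signal is shorter (assumption)."""
    if len(signal) >= target_len:
        return signal[:target_len]
    return np.pad(signal, (0, target_len - len(signal)))

# Usage: one second of a 220 Hz tone at 16 kHz, noised at 10 dB SNR
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 220.0 * np.arange(16000) / 16000.0)
noisy = add_noise(clean, snr_db=10.0, rng=rng)
cropped = fix_length(noisy, 8000)
padded = fix_length(noisy, 20000)
```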

---

## 1.4 Exceptional Cases in the Data
During data analysis, certain samples were identified as particularly challenging for classification:

- **Low-Volume Recordings**: Required signal amplification to enhance quality.
- **Samples with Significant Background Noise**: These were leveraged to evaluate the model's noise resistance.
- **Acoustically Similar Speakers**: These samples demanded special attention during model training.

---

## 1.5 Challenges and Solutions
Several challenges were encountered while working with the data, which were addressed as follows:

### Class Imbalance
- **Problem**: Class 1 was underrepresented, with only six speakers compared to 14 in Class 0.
- **Solution**: Data augmentation techniques were used to increase the number of samples for Class 1.

### Impact of Noise
- **Problem**: High levels of noise in some recordings negatively affected classification performance.
- **Solution**: A noise reduction process was applied to the audio files, and augmentation with various noise levels was employed to improve robustness.

---

## 1.6 Conclusion
The prepared and processed dataset provided a solid foundation for training and testing the speaker classification model. The preprocessing steps enabled the identification and resolution of potential issues, such as the heterogeneity in recording quality. After augmentation, the final dataset is diverse, better balanced between the two classes, and suited for use in spectrogram-based models.







# Monte Carlo Dropout for Estimating Classification Confidence: Comparison with CNN Ensembles

---

## 1. Introduction

In deep learning classification tasks, a model's confidence is a critical indicator of how reliable its predictions are. Monte Carlo Dropout (MC Dropout) estimates model uncertainty by keeping dropout layers active during inference. The same input is passed through the network several times, and each stochastic pass yields a slightly different probability vector. The mean and standard deviation of these outputs provide insight into the model's confidence.

This section presents a detailed analysis of MC Dropout, comparing it with an ensemble of CNN models, which requires training multiple independent networks. Additionally, we explore how the number of Monte Carlo samples affects the stability and variance of predictions, using visualizations.

---

## 2. Experiment with Monte Carlo Dropout

### 2.1 Experiment Setup

- **Model**: A trained CNN with a dropout layer on the fully connected layer FC1 (dropout probability `p=0.5`), kept active at test time.
- **Input Data**: Test spectrograms representing two classes.
- **Procedure**:
  1. Dropout was activated during the test phase by switching the model to training mode (`model.train()`).
  2. For each input, `n` stochastic forward passes were performed, with `n` set to 2, 20, and 50 Monte Carlo samples.
  3. The mean and standard deviation of predictions were calculated for each class.
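
Note that `model.train()` switches every module to training mode. If a network contains batch normalization layers, they would then use batch statistics and update their running estimates during inference. Whether the project's CNN has such layers is not stated here, so the following is only a sketch of an alternative that keeps everything in evaluation mode except dropout. The helper name `enable_dropout_only` is hypothetical.

```python
import torch.nn as nn

def enable_dropout_only(model):
    """Put the model in eval mode, then re-activate only its dropout layers."""
    model.eval()
    for module in model.modules():
        if isinstance(module, (nn.Dropout, nn.Dropout2d, nn.Dropout3d)):
            module.train()  # dropout keeps sampling masks at inference
    return model

# Usage on a small stand-in network (not the project's architecture)
demo = nn.Sequential(
    nn.Linear(4, 8),
    nn.BatchNorm1d(8),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(8, 2),
)
enable_dropout_only(demo)
```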

---

## 3. Python Function for Monte Carlo Dropout Predictions

The following function implements MC Dropout, enabling multiple forward passes through the model to generate predictions with uncertainty estimates:

```python
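import numpy as np
import torch


def mc_dropout_predictions(model, data_loader, num_samples, device, max_batches=254):
    """Run `num_samples` stochastic forward passes per batch with dropout active.

    Returns a list with one array per batch, each of shape
    (num_samples, batch_size, num_classes).
    """
    model.train()  # Activate dropout (this also puts any BatchNorm layers in training mode)
    all_predictions = []

    with torch.no_grad():  # Gradients are not needed for inference
        for batch_idx, (inputs, _) in enumerate(data_loader):
            if batch_idx == max_batches:  # Process at most `max_batches` batches
                break
            inputs = inputs.to(device)
            print(f"Processing batch {batch_idx + 1}...")  # Batch info

            # Perform multiple predictions with active dropout
            predictions = []
            for sample_idx in range(num_samples):
                outputs = torch.softmax(model(inputs), dim=1)  # Class probabilities
                print(f"Sample {sample_idx + 1}: outputs shape = {outputs.shape}")
                predictions.append(outputs.cpu().numpy())

            predictions = np.array(predictions)  # (num_samples, batch_size, num_classes)
            print(f"Batch {batch_idx + 1}: predictions shape = {predictions.shape}")  # Batch results shape
            all_predictions.append(predictions)

    return all_predictions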
```

This function:
- Activates dropout layers during inference to introduce variability.
- Processes each batch of data to generate `num_samples` predictions per sample.
- Computes predictions as probability distributions using `torch.softmax`.
- Logs the batch and prediction details for debugging purposes.
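
The function above returns raw per-pass probabilities, so the mean and standard deviation still have to be computed from its output. A minimal sketch of that aggregation step follows; `summarize_mc_predictions` is a hypothetical helper name, and the input is assumed to be the list of `(num_samples, batch_size, num_classes)` arrays described above.

```python
import numpy as np

def summarize_mc_predictions(all_predictions):
    """Aggregate MC Dropout outputs into per-input mean, std, and predicted class."""
    # Batches may differ in size (e.g. the last one), so join along the batch axis
    stacked = np.concatenate(all_predictions, axis=1)  # (num_samples, N, num_classes)
    mean_probs = stacked.mean(axis=0)  # average over Monte Carlo samples
    std_probs = stacked.std(axis=0)    # population std (ddof=0) over samples
    predicted = mean_probs.argmax(axis=1)
    return mean_probs, std_probs, predicted

# Usage with made-up probabilities: 3 passes, one batch of 2 inputs and one of 1
batch_a = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.4, 0.6]],
    [[0.8, 0.2], [0.3, 0.7]],
])
batch_b = np.full((3, 1, 2), 0.5)
mean_probs, std_probs, predicted = summarize_mc_predictions([batch_a, batch_b])
```

The per-class columns of `mean_probs` and `std_probs` are the quantities plotted against each other in the figures of the next section.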

---

## 4. Results and Visualizations

### 4.1 Triangular Plots (2 Monte Carlo Samples)

The triangular plots below illustrate the relationship between the mean predicted probability (X-axis) and the standard deviation (Y-axis) for two classes (Class 0 and Class 1). With only 2 Monte Carlo samples, the results exhibit significant variance, resulting in triangular-shaped distributions:

- **Class 0**: Variance is highest for mean probabilities near 0.5.
- **Class 1**: Similar to Class 0, the highest uncertainty is observed around the midpoint of the probability range.
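
The triangular outline follows from simple arithmetic. With two samples, the probabilities can be written as `m + d` and `m - d`, where `m` is their mean and `d` is their population standard deviation. Both values must stay in `[0, 1]`, so `d` cannot exceed `min(m, 1 - m)`, which is a triangle peaking at `m = 0.5`. For any number of samples the bound is `sqrt(m * (1 - m))`, a rounded envelope. The sketch below checks both bounds numerically. It uses uniform random numbers as stand-ins for model outputs and assumes the population standard deviation (NumPy's default).

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_and_std(num_samples, num_points=10_000):
    """Mean and population std of `num_samples` random probabilities per point."""
    probs = rng.uniform(0.0, 1.0, size=(num_samples, num_points))
    return probs.mean(axis=0), probs.std(axis=0)

# Two samples: std is bounded by the triangle min(m, 1 - m)
mean2, std2 = mean_and_std(2)
triangle = np.minimum(mean2, 1.0 - mean2)
assert np.all(std2 <= triangle + 1e-12)

# Any number of samples: std is bounded by sqrt(m * (1 - m))
mean50, std50 = mean_and_std(50)
envelope = np.sqrt(mean50 * (1.0 - mean50))
assert np.all(std50 <= envelope + 1e-12)
```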

![Title of Image](chapter_9/0416c84c-c4a0-4ecf-a146-3fc8217526ba.jpg)
![Title of Image](chapter_9/55badd5f-e8c5-4226-ac4c-4ce84c1093ba.jpg)

### 4.2 Plots for 20 Monte Carlo Samples

As the number of Monte Carlo samples increases to 20, the plots become more compact:

- **Class 0**: Standard deviation significantly decreases, particularly for extreme mean probabilities (close to 0 or 1).
- **Class 1**: Results stabilize, providing better confidence estimation.

![Title of Image](chapter_9/ff198697-8f3a-4ee4-8ab9-fe73633bf368.jpg)
![Title of Image](chapter_9/1170b8d3-81f7-49c9-8349-88eeaf214fee.jpg)

### 4.3 Plots for 50 Monte Carlo Samples

With 50 Monte Carlo samples:

- **Class 0 and Class 1**: The estimates are the most stable of the three settings, allowing clear differentiation between confident and uncertain predictions. The point clouds take on a more parabolic shape.

![Title of Image](chapter_9/5ae9ade2-5d78-461f-8018-2f9d1440a369.jpg)
![Title of Image](chapter_9/download.jpg)

### 4.4 Histograms

The histograms below present the distribution of mean probabilities for both classes:

- **With fewer samples (e.g., 2)**: The distributions are less concentrated, indicating greater spread in predictions.
- **With more samples (e.g., 20, 50)**: The distributions converge near values close to 0 or 1, signifying higher confidence for most samples.

![Title of Image](chapter_9/1d2ac6ee-f0a5-410b-9a60-b82e68539b95.jpg)
![Title of Image](chapter_9/7db013c0-d410-40ae-b2a9-364aa612315b.jpg)


---

## 5. Comparison with CNN Ensembles

### 5.1 Ensemble Setup

To compare MC Dropout with ensembles, several independent CNN models were trained, and their predictions were averaged to compute mean probabilities and variance. The results were then compared with MC Dropout at 50 samples.

### 5.2 Observations

1. **Computational Complexity**:
   - MC Dropout is considerably cheaper to train, as it requires only one trained model. At inference time it still needs one forward pass per Monte Carlo sample.
2. **Stability of Results**:
   - With sufficient Monte Carlo samples (e.g., 50), MC Dropout achieves results comparable to ensembles.
3. **Practical Applicability**:
   - MC Dropout is more practical in environments with limited computational resources.

---

## 6. Conclusions

Monte Carlo Dropout is a practical and efficient method for estimating model confidence in classification tasks. The experiments demonstrate that increasing the number of Monte Carlo samples significantly improves the stability and precision of results. Comparisons with CNN ensembles show that MC Dropout delivers comparable performance while being far more computationally efficient. For applications requiring interpretability and uncertainty estimation, MC Dropout offers an excellent solution.

---

## Python Implementation for Visualizations
