# Introduction to Machine Learning - Course Project Report

Group members:
   - Grzegorz Prasek
   - Jakub Kindracki
   - Mykhailo Shamrai
   - Mateusz Mikiciuk
   - Ernest Mołczan

In this report, we describe our implementation of a CNN that classifies users as either allowed or not allowed to access the system (binary classification).

## Table of contents:
1. Dataset
2. Exploratory Data Analysis
3. Preparing audio files for generating spectrograms
4. Generating spectrograms
5. Classifying spectrograms for train, test and validation datasets
6. Model
7. Training loop
8. [EXTRA] **interpretability** - visualizing the behavior and function of individual CNN layers and using it for data exploration.
9. [EXTRA] **uncertainty** - using Monte Carlo dropout to estimate classification confidence. Comparing dropout to an ensemble of CNN networks.
10. [EXTRA] **parameter space** - examining how much individual layers of the network change during training. Investigating their re-initialization robustness.

# Dataset

## 1.1 Introduction to the Dataset
The project is based on the DAPS (Device and Produced Speech) dataset, which was specifically designed for speech processing and analysis research. The primary goal of this dataset is to provide high-quality speech recordings that can be utilized in applications such as speech recognition, speaker classification, and acoustic analysis.

The DAPS dataset was chosen as the primary data source because of the following characteristics:

- **Data Quality**: The recordings are clean and diverse, enabling precise testing of models under both laboratory and simulated conditions.
- **Speaker Diversity**: The dataset includes recordings from 20 distinct speakers, divided into two classes:
  - **Class 1 (Acceptable individuals)**: Includes recordings from speakers F1, F7, F8, M3, M6, and M8.
  - **Class 0 (Unacceptable individuals)**: Includes recordings from the remaining 14 speakers.
- **Alignment with Project Requirements**: The dataset provides recordings that can be easily transformed into spectrograms, which are essential for the CNN-based approach employed in this project.

---

## 1.2 Dataset Characteristics
Each recording in the DAPS dataset is available as a `.wav` file and exhibits the following features:

- **Standard Sampling Format**: All recordings are sampled at 16 kHz, which is sufficient for most speech processing applications.
- **Variety in Recording Lengths**: The recordings vary in duration, necessitating preprocessing to standardize the samples for comparability.
- **Natural and Artificial Noise**: The dataset includes samples with varying levels of noise, allowing for robustness testing of the model against disturbances.

Additionally, the DAPS dataset was selected due to its accessibility and clear licensing terms, which permit its legal use for educational and research purposes.

---

## 1.3 Data Preparation
To effectively utilize the DAPS dataset in the project, several key data preparation steps were undertaken:

### a) Data Cleaning
The data cleaning process aimed to remove samples that could negatively impact model performance. The following tasks were performed:

- **Duplicate Elimination**: Redundant recordings were removed to prevent overrepresentation of certain samples in the training set.
- **Silence Removal**: Segments containing silence were identified and eliminated to improve model efficiency.
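
The report does not state how silent segments were detected. As one possible approach, the sketch below drops fixed-size frames whose RMS energy falls far below the loudest frame. The function name `remove_silence`, the 400-sample frame (25 ms at 16 kHz), and the -40 dB threshold are assumptions chosen for illustration.

```python
import numpy as np

def remove_silence(signal, frame_len=400, threshold_db=-40.0):
    """Drop frames whose RMS level is below threshold_db relative to the loudest frame.

    Assumed parameters (not from the report): 400-sample frames, -40 dB threshold.
    Samples that do not fill a whole frame at the end are discarded.
    """
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        return signal  # too short to analyse, return unchanged
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    peak = rms.max()
    if peak == 0.0:
        return signal[:0]  # the whole recording is silent
    # Level of each frame in dB relative to the loudest frame.
    rms_db = 20.0 * np.log10(np.maximum(rms, 1e-12) / peak)
    keep = rms_db > threshold_db
    return frames[keep].reshape(-1)
```

A relative threshold is used here so that quiet recordings are not removed entirely; an absolute threshold would be an equally valid choice.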

### b) Data Splitting
The dataset was split into three subsets:

- **Training Set**: 70% of the data, used for model training.
- **Validation Set**: 15% of the data, used for model evaluation during training.
- **Test Set**: 15% of the data, used for final model evaluation.

The split was performed to ensure no overlap between subsets, preventing data leakage, i.e., no fragments of the same recording were included in both the training and test sets.
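
One way to guarantee this property is to split at the level of whole recordings rather than individual fragments. The sketch below assumes each fragment is stored as a `(recording_id, fragment_path)` pair; that layout and the function name `split_by_recording` are hypothetical. Note that the 70/15/15 proportions then apply to recordings, so the fragment counts per subset are only approximately in that ratio.

```python
import random
from collections import defaultdict

def split_by_recording(fragments, train=0.70, val=0.15, seed=0):
    """Split (recording_id, fragment_path) pairs so each recording lands in one subset only."""
    by_recording = defaultdict(list)
    for recording_id, fragment in fragments:
        by_recording[recording_id].append(fragment)

    # Shuffle recording IDs deterministically, then cut the list into three parts.
    recording_ids = sorted(by_recording)
    random.Random(seed).shuffle(recording_ids)
    n_train = round(train * len(recording_ids))
    n_val = round(val * len(recording_ids))
    groups = {
        "train": recording_ids[:n_train],
        "val": recording_ids[n_train:n_train + n_val],
        "test": recording_ids[n_train + n_val:],
    }
    # Expand each group of recordings back into its fragments.
    return {
        name: [(rid, frag) for rid in ids for frag in by_recording[rid]]
        for name, ids in groups.items()
    }
```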

### c) Data Augmentation
To increase data diversity and enhance the model's robustness against noise, the following augmentation techniques were applied:

- **Adding Background Noise**: Artificial noise of varying intensities was introduced to simulate real-world acoustic conditions.
- **Pitch Shifting**: The pitch of recordings was altered to increase speaker diversity.
- **Trimming Recordings**: Samples were cropped to a fixed length to ensure consistency across input data.
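
A minimal sketch of two of these steps, noise injection and length normalization, is given below. It assumes white Gaussian noise scaled to a target signal-to-noise ratio and zero-padding for clips shorter than the target length; neither detail is specified in the report, and the function names are illustrative. Pitch shifting is left out because it depends on the audio library used.

```python
import numpy as np

def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise so that the result has roughly the requested SNR in dB."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

def fix_length(signal, target_len):
    """Crop to target_len samples; zero-pad at the end if the clip is shorter (assumption)."""
    if len(signal) >= target_len:
        return signal[:target_len]
    return np.pad(signal, (0, target_len - len(signal)))

# Example: a 1-second synthetic tone at 16 kHz, degraded to 10 dB SNR and cut to 0.5 s.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 220.0 * np.arange(16000) / 16000.0)
augmented = fix_length(add_noise(clean, snr_db=10.0, rng=rng), target_len=8000)
```

Varying `snr_db` per sample gives the "varying intensities" mentioned above; lower values mean stronger noise.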

---

## 1.4 Exceptional Cases in the Data
During data analysis, certain samples were identified as particularly challenging for classification:

- **Low-Volume Recordings**: Required signal amplification to enhance quality.
- **Samples with Significant Background Noise**: These were leveraged to evaluate the model's noise resistance.
- **Acoustically Similar Speakers**: These samples demanded special attention during model training.

---

## 1.5 Challenges and Solutions
Several challenges were encountered while working with the data, which were addressed as follows:

### Class Imbalance
- **Problem**: Class 1 was underrepresented, with only six speakers compared to 14 in Class 0.
- **Solution**: Data augmentation techniques were used to increase the number of samples for Class 1.

### Impact of Noise
- **Problem**: High levels of noise in some recordings negatively affected classification performance.
- **Solution**: A noise reduction process was applied to the audio files, and augmentation with various noise levels was employed to improve robustness.

---

## 1.6 Conclusion
The prepared and processed dataset provided a solid foundation for training and testing the speaker classification model. The preprocessing steps enabled the identification and resolution of potential issues, such as the heterogeneity in recording quality. The final dataset is diverse, well-balanced, and optimized for use in spectrogram-based models.







# Monte Carlo Dropout for Estimating Classification Confidence: Comparison with CNN Ensembles

## 1. Introduction
Monte Carlo Dropout (MC Dropout) is a technique for estimating the confidence of predictions in classification tasks. Whereas standard inference produces a single deterministic prediction per input, MC Dropout keeps the dropout layers active during inference, so that repeated forward passes yield a distribution of predictions from which uncertainty can be estimated.

In this chapter, we detail the application of MC Dropout to evaluate the confidence of a CNN-based classifier. Furthermore, we compare the results with those obtained from an ensemble of CNN networks, a well-known method for uncertainty estimation. Visualizations are provided to illustrate the effect of the number of Monte Carlo samples on classification confidence and variance.

---

## 2. Monte Carlo Dropout

### a) How It Works
Monte Carlo Dropout enables the use of dropout layers during inference by keeping the model in training mode. This introduces randomness into the network’s predictions and allows multiple predictions for the same input. The process involves:
1. **Activating Dropout During Inference**: Unlike standard evaluation (`model.eval()`), the model remains in training mode (`model.train()`), ensuring random neuron deactivation.
2. **Performing Multiple Predictions**: For each input sample, `n` predictions (Monte Carlo samples) are generated.
3. **Calculating Mean and Variance**: The mean prediction provides the final class probability, while the variance indicates uncertainty.
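
The three steps above can be sketched in PyTorch as follows. The helper names `enable_dropout` and `mc_dropout_predict` and the toy model are illustrative assumptions, not the code used in the project. Instead of calling `model.train()` on the whole network, this sketch switches only the dropout layers to training mode, so that layers such as batch normalization keep their inference behavior; for a network whose only mode-dependent layers are dropout, both approaches behave the same.

```python
import torch
import torch.nn as nn

def enable_dropout(model):
    """Put the model in eval mode, then switch only its dropout layers back to training mode."""
    model.eval()
    for module in model.modules():
        if isinstance(module, (nn.Dropout, nn.Dropout2d)):
            module.train()

@torch.no_grad()
def mc_dropout_predict(model, inputs, n_samples=10):
    """Return the per-class mean and standard deviation over n_samples stochastic passes."""
    enable_dropout(model)
    probs = torch.stack(
        [torch.softmax(model(inputs), dim=1) for _ in range(n_samples)]
    )  # shape: (n_samples, batch, classes)
    return probs.mean(dim=0), probs.std(dim=0, unbiased=False)

# Hypothetical stand-in for the report's CNN: any module containing a dropout layer works.
toy_model = nn.Sequential(
    nn.Flatten(), nn.Linear(64, 32), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(32, 2)
)
mean, std = mc_dropout_predict(toy_model, torch.randn(4, 1, 8, 8), n_samples=20)
```

The population standard deviation (`unbiased=False`) is used so that `n_samples=1` returns zero spread rather than an undefined value.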

### b) Advantages of MC Dropout
- **Computational Efficiency**: Uses a single trained model, avoiding the need to maintain multiple networks as in ensembles.
- **Simple Implementation**: Only requires activating dropout layers during inference and performing multiple passes.
- **Uncertainty Quantification**: Provides insight into model confidence through variance analysis.

---

## 3. Experiment Setup

### a) Preparation
1. **Model**: A pre-trained CNN with a dropout layer (`p=0.5`, i.e. a 50% dropout probability) applied at the fully connected layer FC1.
2. **Data**: A test set consisting of 15% of the DAPS dataset, converted into spectrograms.
3. **Monte Carlo Sampling**: Predictions were made for several numbers of Monte Carlo samples (`n`): 5, 10, and 50, as well as the full range from 1 to 30.

### b) Procedure
1. For each test sample:
   - Perform `n` predictions using MC Dropout.
   - Compute:
     - **Mean Probabilities**: Average class probabilities across samples.
     - **Standard Deviation**: Spread of the predicted probabilities for each class across samples.
2. Visualize results for:
   - Specific values of `n` (5, 10, 50).
   - Full range (`n=1` to `n=30`), with averages computed across batches and Monte Carlo samples.
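
The two quantities computed in this procedure can be written explicitly. The notation is introduced here for illustration and is not taken from the project code: $p_t(c \mid x)$ denotes the softmax probability of class $c$ for input $x$ on the $t$-th stochastic forward pass, and the population form of the standard deviation (dividing by $n$) is assumed.

$$
\hat{p}(c \mid x) = \frac{1}{n}\sum_{t=1}^{n} p_t(c \mid x),
\qquad
\hat{\sigma}(c \mid x) = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\bigl(p_t(c \mid x) - \hat{p}(c \mid x)\bigr)^{2}}
$$

Because the passes are independent draws, the standard error of the averaged probability $\hat{p}(c \mid x)$ scales as $\hat{\sigma}(c \mid x)/\sqrt{n}$, which is why the mean is expected to fluctuate less as `n` grows.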

### c) Results
- **Stabilization of Mean Probabilities**: As `n` increases, mean probabilities become more stable, reflecting reduced randomness.
- **Reduction in Variance**: Larger `n` leads to lower uncertainty, improving confidence in predictions.

---

## 4. Visualization

### a) Plots for Specific Monte Carlo Sample Sizes (5, 10, 50)
For each `n`, the following is plotted:
- **X-axis**: Batch indices from the test set.
- **Y-axis**: Mean probabilities for each class with error bars representing standard deviation.

### b) Plot for the Full Range of Monte Carlo Samples (1–30)
For all `n` values, a plot is generated to show:
- Average probabilities and variance across all classes and batches.
- **X-axis**: Number of Monte Carlo samples (`n`).
- **Y-axis**: Averaged probabilities and variance.

---

## 5. Comparison with CNN Ensembles

### a) Ensemble Setup
To compare MC Dropout with ensembles, 10 CNNs with independent weight initializations were trained. For each test input:
- Ensemble predictions were averaged to calculate mean probabilities and variance.
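
A minimal sketch of this aggregation step, assuming the softmax outputs of all ensemble members have already been collected into one array of shape `(n_members, n_inputs, n_classes)`. The function name `ensemble_statistics` and the numbers in the example are made up for illustration and are not results from the project.

```python
import numpy as np

def ensemble_statistics(member_probs):
    """Mean and population variance of class probabilities across ensemble members.

    member_probs: array-like of shape (n_members, n_inputs, n_classes).
    """
    member_probs = np.asarray(member_probs, dtype=float)
    mean = member_probs.mean(axis=0)
    variance = member_probs.var(axis=0)
    return mean, variance

# Made-up outputs of 3 members for 2 inputs and 2 classes.
example_probs = [
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.8, 0.2], [0.6, 0.4]],
    [[0.7, 0.3], [0.5, 0.5]],
]
ens_mean, ens_var = ensemble_statistics(example_probs)
```

The same function can be applied to MC Dropout outputs stacked along the first axis, which makes the two methods directly comparable.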

### b) Observations
1. **Computational Complexity**: MC Dropout requires only a single trained model, making it significantly cheaper to train and store than an ensemble of 10 networks, although each prediction still costs `n` forward passes.
2. **Stability**: With sufficient Monte Carlo samples (e.g., `n=50`), MC Dropout provides results comparable to ensembles.
3. **Applicability**: MC Dropout is practical for scenarios with limited computational resources.

---

## 6. Conclusion
Monte Carlo Dropout is a practical and efficient method for estimating classification confidence in CNNs. By leveraging randomness in dropout layers, it provides robust confidence estimates while remaining computationally efficient. In comparison with ensembles, MC Dropout achieves similar performance with significantly lower resource requirements, making it suitable for real-world applications where computational constraints exist.

---

### Python Implementation for Visualization
