# Neurailogic Research Center

<div style="font-family: Arial, sans-serif; margin: 20px;">
    <h1 style="text-align: center; color: #333;">EEG Processing: An Entry to the World of Brain Waves</h1>
    <h2 style="text-align: center; color: #555;">Homework 5: Functional Connectivity-Based Classification of Parkinson's Disease</h2>
    <p style="text-align: center; color: #666; font-size: 16px;">
        Author: Mohammadreza Shahsavari<br>
        Contact: <a href="mailto:mohamadrezashahsavary@gmail.com" style="color: #0066cc; text-decoration: none;">mohamadrezashahsavary@gmail.com</a>
    </p>
</div>







## Overview

In this assignment, you will classify Parkinson's Disease (PD) and Healthy Control (HC) subjects using EEG data. Instead of using EEG band power as features, you will calculate functional connectivity metrics from the EEG data and use them as the features for classification.

## Objectives

- **Download and prepare EEG data.**
- **Calculate functional connectivity metrics as features.**
- **Prepare the dataset with these features and corresponding labels.**
- **Train an SVM classifier using these features.**
- **Evaluate the classifier’s performance.**


## Step-by-Step Guide
### Download and Extract the Dataset

Start by downloading the provided EEG dataset for the assignment. Running the cell below will automatically download and unzip the dataset. Once completed, you'll find the preprocessed EEG signals organized into two folders: 'PD' (Parkinson's Disease) and 'HC' (Healthy Control).


In [1]:
# Download the EEG data archive
import gdown
import os
import zipfile

eeg_arch_url = 'https://drive.google.com/uc?id=1rynIhpIAQc4OzX4nCrrnxOXMsdLt7xWD'

eeg_arch_file_name = 'EEGs.zip'  # Assuming the file is now a ZIP file

gdown.download(eeg_arch_url, eeg_arch_file_name, quiet=False)

eeg_arch_path = os.path.join('/content', eeg_arch_file_name)

# Unzip the file
with zipfile.ZipFile(eeg_arch_path, 'r') as zip_ref:
    zip_ref.extractall()

!ls -l /content

Downloading...
From (original): https://drive.google.com/uc?id=1rynIhpIAQc4OzX4nCrrnxOXMsdLt7xWD
From (redirected): https://drive.google.com/uc?id=1rynIhpIAQc4OzX4nCrrnxOXMsdLt7xWD&confirm=t&uuid=554463aa-e92b-4abc-b0cf-a376bd18fa6e
To: /content/EEGs.zip
100%|██████████| 162M/162M [00:02<00:00, 77.1MB/s]


total 157820
-rw-r--r-- 1 root root 161595831 Sep 20 16:45  EEGs.zip
drwxr-xr-x 4 root root      4096 Sep 20 16:45 'Preprocessed Uc San Diego Dataset'
drwxr-xr-x 1 root root      4096 Sep 19 13:25  sample_data


## Step 1: Load EEG Signals from .mat Files

In this step, we load the EEG signals from the .mat files stored in the "PD" (Parkinson's Disease) and "HC" (Healthy Control) folders. We use the scipy.io module to load each .mat file, extract the EEG data, and store them in lists for further processing.

In [8]:
import os
from scipy.io import loadmat

# Define the paths
base_path = "/content/Preprocessed Uc San Diego Dataset"

# Each class is read from the folder that matches its name
pd_file_path = os.path.join(base_path, 'PD')
hc_file_path = os.path.join(base_path, 'HC')

pd_signals = []
for pd_file in os.listdir(pd_file_path):
  pd_full_path = os.path.join(pd_file_path, pd_file)
  pd_signals.append(loadmat(pd_full_path)['time'][:32, :])


hc_signals = []
for hc_file in os.listdir(hc_file_path):
  hc_full_path = os.path.join(hc_file_path, hc_file)
  hc_signals.append(loadmat(hc_full_path)['time'][:32, :])


print(len(pd_signals), len(hc_signals))

500 500
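Before trusting the `'time'` key and the `[:32, :]` slice used above, it can help to look at what a `.mat` file actually contains. The sketch below is only an illustration: it writes a hypothetical stand-in file (`demo_subject.mat`, 40 rows by 1000 samples, both made up) so that it runs anywhere, then lists the stored variables and applies the same slice.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Hypothetical stand-in file; the real recordings may have a different shape.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo_subject.mat")
savemat(demo_path, {"time": np.random.randn(40, 1000)})

mat = loadmat(demo_path)

# loadmat returns a dict; keys starting with '__' are file metadata.
data_keys = [key for key in mat if not key.startswith("__")]
print("Variables in file:", data_keys)

# Keep the first 32 rows, as in the loading cell above.
eeg = mat["time"][:32, :]
print("EEG shape (channels x samples):", eeg.shape)
```

Pointing `demo_path` at one of the real files in the 'PD' or 'HC' folder gives the same kind of summary for the actual data.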


## Step 2: Calculate Functional Connectivity

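In this step, you will calculate the functional connectivity for each EEG recording. The cell below uses the Pearson correlation between every pair of the 32 channels (`np.corrcoef`) as the connectivity measure and flattens each 32 × 32 matrix into a 1024-element feature vector. These connectivity measures will serve as the features for your machine learning classifier.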

In [19]:
import numpy as np

pd_correlations = []
for eeg in pd_signals:
  correlation = np.corrcoef(eeg)
  pd_correlations.append(correlation.flatten())

hc_correlations = []
for eeg in hc_signals:
  correlation = np.corrcoef(eeg)
  hc_correlations.append(correlation.flatten())

pd_correlations = np.array(pd_correlations)
hc_correlations = np.array(hc_correlations)




In [22]:
print(pd_correlations.shape)
print(hc_correlations.shape)

(500, 1024)
(500, 1024)
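A correlation matrix is symmetric with ones on its diagonal, so the 1024 flattened values contain each channel pair twice plus 32 constant entries. One possible refinement, shown here as a sketch and not as part of the assignment, keeps only the upper triangle. The helper name `upper_triangle_features` and the random demo signal are illustrative assumptions.

```python
import numpy as np


def upper_triangle_features(eeg):
    """Return the unique pairwise correlations of a (channels x samples) array."""
    corr = np.corrcoef(eeg)
    # k=1 skips the diagonal, which is always 1.
    rows, cols = np.triu_indices(corr.shape[0], k=1)
    return corr[rows, cols]


# Random stand-in for one 32-channel recording.
rng = np.random.default_rng(0)
demo_eeg = rng.standard_normal((32, 1000))

features = upper_triangle_features(demo_eeg)
print(features.shape)  # 32 * 31 / 2 = 496 unique channel pairs
```

This drops the feature count from 1024 to 496 without discarding any information.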


## Step 3: Prepare Labels and Combine Data

In this step, prepare labels for the features extracted from the EEG signals. Features from the "PD" folder are labeled 1 (Parkinson's Disease), and those from the "HC" folder are labeled 0 (Healthy Control). Then combine the features and labels into a single dataset for training and testing.

In [23]:
pd_labels = np.ones(pd_correlations.shape[0])
hc_labels = np.zeros(hc_correlations.shape[0])




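The combining part of this step could look roughly like the sketch below. It uses random placeholder matrices with the shapes printed in Step 2 so that it runs on its own. The names `pd_features`, `hc_features`, `X`, and `y` are assumptions for illustration; in the notebook you would use `pd_correlations` and `hc_correlations` in their place.

```python
import numpy as np

# Placeholder feature matrices with the shapes reported in Step 2.
rng = np.random.default_rng(0)
pd_features = rng.standard_normal((500, 1024))
hc_features = rng.standard_normal((500, 1024))

pd_labels = np.ones(pd_features.shape[0])    # 1 = Parkinson's Disease
hc_labels = np.zeros(hc_features.shape[0])   # 0 = Healthy Control

# Stack features row-wise and concatenate labels in the same order.
X = np.vstack([pd_features, hc_features])
y = np.concatenate([pd_labels, hc_labels])

print(X.shape, y.shape)
```

Keeping the stacking order identical for features and labels is what guarantees that row `i` of `X` still matches entry `i` of `y`.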

## Step 4: Train SVM Classifier

Train an SVM classifier on the extracted features.
First split the dataset into training and test sets, and fit the SVM on the training data only.
Then predict the labels for the test set and evaluate the classifier's performance using accuracy and a detailed classification report.
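A minimal sketch of this step with scikit-learn follows. Several choices here are assumptions, not requirements of the assignment: the 80/20 stratified split, the feature scaling, the RBF kernel with `C=1.0`, and the synthetic two-class data that stands in for the real connectivity features.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data: two Gaussian clouds with different means.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.5, 1.0, size=(100, 20)),
               rng.normal(-0.5, 1.0, size=(100, 20))])
y = np.concatenate([np.ones(100), np.zeros(100)])

# Hold out 20% of the samples, keeping the class ratio in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Scaling is fitted on the training data only, inside the pipeline.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print("Accuracy:", acc)
print(classification_report(y_test, y_pred))
```

Wrapping the scaler and the SVM in one pipeline keeps test-set statistics out of training, which is easy to get wrong when scaling is done by hand.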

# Step 5: Evaluate the Classifier

In this step, you will evaluate the performance of your classifier using various metrics. These metrics provide insights into how well the model distinguishes between Parkinson's Disease and Healthy Control based on the EEG signals. The key metrics to evaluate are **accuracy**, **precision**, **recall**, and **F1-score**. Each of these metrics captures a different aspect of classifier performance.

## 1. Accuracy

- **Definition**: Accuracy measures the overall correctness of the classifier. It represents the proportion of correctly classified instances (both Parkinson's and Healthy Control) out of the total instances.

- **Formula**:
  $$
  \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
  $$
  Where:
  - **TP**: True Positives (Parkinson’s correctly identified)
  - **TN**: True Negatives (Healthy Controls correctly identified)
  - **FP**: False Positives (Healthy Controls incorrectly classified as Parkinson's)
  - **FN**: False Negatives (Parkinson's incorrectly classified as Healthy Controls)

- **Interpretation**: A high accuracy means that the model is generally performing well, but it can be misleading if the dataset is imbalanced (e.g., if there are more healthy subjects than Parkinson's patients).
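To see why accuracy can mislead on imbalanced data, consider a made-up test set of 90 Healthy Control and 10 Parkinson's subjects, and a trivial classifier that labels everyone as Healthy Control. The counts below are hypothetical and chosen only to illustrate the point.

```python
# Hypothetical confusion counts for an "always Healthy Control" classifier
# on 90 Healthy Control and 10 Parkinson's subjects.
tp, tn, fp, fn = 0, 90, 0, 10

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)

print("Accuracy:", accuracy)  # high, despite the classifier being useless
print("Recall:", recall)      # no Parkinson's subject is detected
```

The accuracy looks respectable even though not a single Parkinson's subject is found, which is why the remaining metrics are reported alongside it.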

## 2. Precision

- **Definition**: Precision measures the proportion of true positive predictions out of all the positive predictions made by the model. It tells you what fraction of the Parkinson’s predictions were actually correct.

- **Formula**:
  $$
  \text{Precision} = \frac{TP}{TP + FP}
  $$

- **Interpretation**: High precision means that when the classifier predicts a subject as having Parkinson's, it is usually correct. This metric is especially important when false positives (incorrectly labeling healthy people as Parkinson's patients) are costly or problematic.

## 3. Recall (Sensitivity)

- **Definition**: Recall, or sensitivity, measures the proportion of actual positive instances (Parkinson’s patients) that the model correctly identifies. It is a measure of how well the model captures the actual Parkinson's cases.

- **Formula**:
  $$
  \text{Recall} = \frac{TP}{TP + FN}
  $$

- **Interpretation**: High recall means the classifier detects most Parkinson's patients. Pushing recall higher usually means flagging more subjects as positive, which can also increase false positives. Recall matters most when missing a Parkinson's patient is more costly than misclassifying a healthy person.
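The trade-off between precision and recall can be illustrated with a decision threshold on classifier scores. The labels, scores, and thresholds below are invented for the example, and `precision_recall` is a hypothetical helper, not a library function.

```python
# Made-up ground truth (1 = Parkinson's) and classifier scores.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.1]


def precision_recall(threshold):
    """Precision and recall when scores >= threshold are predicted positive."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and t == 1 for p, t in zip(preds, labels))
    fp = sum(p == 1 and t == 0 for p, t in zip(preds, labels))
    fn = sum(p == 0 and t == 1 for p, t in zip(preds, labels))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


strict = precision_recall(0.75)   # few positives: precise but misses patients
lenient = precision_recall(0.35)  # many positives: finds all patients, more false alarms
print("strict :", strict)
print("lenient:", lenient)
```

Lowering the threshold raises recall from 0.5 to 1.0 here, while precision falls from 1.0 to about 0.67.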

## 4. F1-Score

- **Definition**: The F1-score is the harmonic mean of precision and recall. It balances the two metrics and is particularly useful when you need to account for both false positives and false negatives.

- **Formula**:
  $$
  \text{F1-Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
  $$

- **Interpretation**: A high F1-score indicates a good balance between precision and recall, which matters when false positives and false negatives are equally costly.
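A small worked example shows how the harmonic mean penalizes an imbalance between the two metrics. The precision and recall values are arbitrary, and `f1_score_from` is a hypothetical helper written for this illustration.

```python
def f1_score_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Arbitrary example: perfect precision but very poor recall.
precision, recall = 1.0, 0.1

f1 = f1_score_from(precision, recall)
arithmetic_mean = (precision + recall) / 2

print("F1-score:       ", round(f1, 3))
print("Arithmetic mean:", round(arithmetic_mean, 3))
```

The F1-score stays close to the weaker of the two values (about 0.18), whereas a plain average (0.55) would hide the poor recall.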


## Calculating Metrics Using Code

In the provided code, the metrics are automatically calculated using built-in functions from **scikit-learn** after predicting on the test set. Here’s how the different metrics are computed:

### Accuracy:
The `accuracy_score(y_test, y_pred)` function computes the accuracy by comparing the true labels (`y_test`) with the predicted labels (`y_pred`). It calculates the proportion of correctly classified instances.

```python
from sklearn.metrics import accuracy_score
print("Accuracy:", accuracy_score(y_test, y_pred))
```

### Precision, Recall, and F1-Score:

`classification_report(y_test, y_pred)` generates a detailed report that includes precision, recall, and F1-score for each class (Parkinson’s and Healthy Control). It also provides the support, which is the number of true instances of each class in the test set.

```python
from sklearn.metrics import classification_report
print("Classification Report:\n", classification_report(y_test, y_pred))
```
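To connect the report back to the TP, TN, FP, and FN counts used in the formulas, scikit-learn's `confusion_matrix` can be unpacked directly. The sketch below uses a small invented pair of label lists in place of real predictions. The class names passed to `target_names` are an assumption that relies on the label convention 0 = Healthy Control, 1 = Parkinson's.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Invented labels for illustration (1 = Parkinson's, 0 = Healthy Control).
y_test = [1, 1, 1, 0, 0, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0]

# With labels [0, 1], the matrix is [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)

# target_names follows the sorted label order: 0 first, then 1.
report = classification_report(
    y_test, y_pred, target_names=["Healthy Control", "Parkinson's"])
print(report)
```

Plugging the four printed counts into the formulas above reproduces the per-class numbers in the report.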
