# 04_LGBM_Feature_Discovery
This notebook provides an exploratory visual analysis of the feature discovery step. That step generates features from the raw data under several settings for epoch size and Welch window size (used for the power spectral density calculation). The combined features are then used to derive overall feature importance, epoch size importance, and Welch window size importance for the EEG features and for the heart rate features derived from ECG.
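
To make the two settings concrete, here is a minimal sketch of how one band-power feature could be computed for several epoch size and Welch window combinations. It is not the project's feature code: the function `band_power_per_epoch`, the synthetic signal, the band limits, and the feature-name pattern are all assumptions made for illustration. Only `scipy.signal.welch` is a real library call.

```python
import numpy as np
from scipy.signal import welch


def band_power_per_epoch(signal, fs, epoch_sec, welch_sec, band):
    """Mean PSD inside `band` (Hz) for each non-overlapping epoch (hypothetical helper)."""
    samples_per_epoch = int(epoch_sec * fs)
    nperseg = int(welch_sec * fs)  # Welch window length in samples
    n_epochs = len(signal) // samples_per_epoch
    low, high = band
    powers = []
    for i in range(n_epochs):
        epoch = signal[i * samples_per_epoch:(i + 1) * samples_per_epoch]
        freqs, psd = welch(epoch, fs=fs, nperseg=min(nperseg, len(epoch)))
        mask = (freqs >= low) & (freqs < high)
        powers.append(psd[mask].mean())
    return np.array(powers)


# Synthetic stand-in for an EEG channel: a 10 Hz sine plus a little noise.
fs = 100
t = np.arange(60 * fs) / fs
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)

# Assumed (epoch size, Welch window size) pairs in seconds, and assumed bands in Hz.
settings = [(10, 2), (30, 4)]
bands = {"delta": (0.5, 4), "alpha": (8, 13)}

features = {}
for epoch_sec, welch_sec in settings:
    for name, band in bands.items():
        key = f"{name}_epoch{epoch_sec}_welch{welch_sec}"
        features[key] = band_power_per_epoch(signal, fs, epoch_sec, welch_sec, band)
```

Each setting yields its own copy of every feature, which is why importance can later be aggregated per epoch size or per Welch window size.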

### Prerequisites:
- `make model_extended` - this will run the following if you have not already run them:
    - `make download`
    - `make features`
    - `make features_extended`

# Table of Contents


## [Feature Importance Plots](#feature_importance_plots)
## [Extended Model Evaluation](#extended_model_evaluation)

In [1]:
import os
import sys
import pytz
import numpy as np
import pandas as pd
import re
import umap
import plotly.express as px
import warnings
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import KFold
from lightgbm import LGBMClassifier


sys.path.insert(0, '..') 
import src.models.build_extended_model_LGBM as emodel

In [2]:
%load_ext autoreload
%autoreload 2

<a id='feature_importance_plots'></a>
## Feature Importance Plots
Note: all feature importances are on the same scale, so individual values can be compared across feature groups. For example, the most important heart rate feature, with an importance value of 100, may still matter less than an EEG feature with a value of 200 that ranks below other EEG features. However, some of these charts show sums over multiple features, and those sums cannot necessarily be compared with features in other plots.
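
The difference between a single feature's importance and a summed importance can be sketched as follows. The feature names, the importance values, and the `epoch_of` helper are hypothetical and only illustrate the aggregation; they are not taken from the project's data.

```python
import re
import pandas as pd

# Hypothetical per-feature importances, for illustration only.
importances = pd.Series({
    "EEG_delta_epoch10_welch2": 120,
    "EEG_delta_epoch30_welch4": 80,
    "EEG_alpha_epoch10_welch2": 60,
    "HR_mean_epoch10": 100,
    "HR_mean_epoch30": 40,
})


def epoch_of(name):
    """Extract the epoch size encoded in a (hypothetical) feature name."""
    match = re.search(r"epoch(\d+)", name)
    return int(match.group(1)) if match else None


# Sum the importances of all EEG features that share an epoch size.
eeg = importances[importances.index.str.startswith("EEG_")]
eeg_epoch_importance = eeg.groupby(eeg.index.map(epoch_of)).sum()
print(eeg_epoch_importance)
```

In this toy data the 10-second epoch total is 180 because it adds two features, so it should not be read against the single heart rate feature valued at 100.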

#### Read features

In [3]:
features_df = emodel.load_from_csvs(
    '../data/processed/features/test12_Wednesday_07_features_with_labels.csv',
    '../data/interim/feature_discovery/EEG/Wednesday_feature_discovery_EEG.csv',
    '../data/interim/feature_discovery/ECG/Wednesday_feature_discovery_ECG.csv'
)

Basic Features Time Start and End		2019-10-25 08:21:02-07:00	2019-10-29 00:39:36-07:00
EEG Features Time Start and End		2019-10-25 08:21:02-07:00	2019-10-29 00:39:36-07:00
Heart Rate Features Time Start and End	2019-10-25 08:21:02-07:00	2019-10-29 00:39:36-07:00


### EEG Feature Importances

---
EEG Epoch Importance

---

![EEG_Epoch_Importance](../reports/figures/feature_discovery/EEG_Epoch_Importance.png "EEG_Epoch_Importance")

---
EEG Welch Importance

---

![EEG_Welch_Importance](../reports/figures/feature_discovery/EEG_Welch_Importance.png "EEG_Welch_Importance")

---
EEG Frequency Range Importance

---

![EEG_Frequency_Range_Importance](../reports/figures/feature_discovery/EEG_Frequency_Range_Importance.png "EEG_Frequency_Range_Importance")

---
EEG Other Feature Importance

---

![EEG_Other_Feature_Importance](../reports/figures/feature_discovery/EEG_Other_Feature_Importance.png "EEG_Other_Feature_Importance")

### Heart Rate (from ECG) Feature Importances

---
Heart Rate Epoch Importance

---

![Heart_Rate_Epoch_Importance](../reports/figures/feature_discovery/Heart_Rate_Epoch_Importance.png "Heart_Rate_Epoch_Importance")

---
Heart Rate Welch Feature Importance

---

![Heart_Rate_Welch_Importance](../reports/figures/feature_discovery/Heart_Rate_Welch_Importance.png "Heart_Rate_Welch_Importance")

---
Heart Rate Feature Importance

---

![Heart_Rate_Feature_Importance](../reports/figures/feature_discovery/Heart_Rate_Feature_Importance.png "Heart_Rate_Feature_Importance")

### Other Feature Importance (movement, pressure)

---
Other Features' Importance

---

![Other_Feature_Importance](../reports/figures/feature_discovery/Other_Feature_Importance.png "Other_Feature_Importance")

<a id='extended_model_evaluation'></a>
## Extended Model Evaluation

In [4]:
extended_model_conf_matr = pd.read_csv('../models/lightgbm_model_extended_confusion_matrix.csv', index_col=0)

### Confusion matrix

In [5]:
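combined_conf_matr = extended_model_conf_matr.copy()
combined_conf_matr.index.name = 'Label'
# Drop the last 7 characters of each row label so that rows sharing a label match
combined_conf_matr.index = combined_conf_matr.index.str.slice(0, -7)
# Sum the rows that now share a label, keeping the original label order
combined_conf_matr = combined_conf_matr.groupby('Label', sort=False).sum()
combined_conf_matr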

| Label | Predicted_Active Waking | Predicted_Quiet Waking | Predicted_Drowsiness | Predicted_SWS | Predicted_REM | Predicted_Unscorable |
|---|---:|---:|---:|---:|---:|---:|
| True_Active Waking | 125675 | 6674 | 581 | 1363 | 138 | 0 |
| True_Quiet Waking | 11314 | 24681 | 3277 | 1219 | 1715 | 0 |
| True_Drowsiness | 2003 | 5319 | 14028 | 964 | 18 | 0 |
| True_SWS | 4631 | 2851 | 1126 | 48403 | 914 | 0 |
| True_REM | 1098 | 4358 | 50 | 1386 | 24452 | 0 |
| True_Unscorable | 6287 | 138 | 29 | 0 | 0 | 0 |


### Overall accuracy, class accuracy, weighted accuracy
Weighted accuracy is the unweighted mean of the per-class accuracies, so every class counts equally regardless of how many samples it contains. The `Unscorable` class is excluded from this mean.
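
Written as formulas, with $C_{ij}$ the number of samples of true class $i$ predicted as class $j$, $N$ the total number of samples, and $K$ the number of classes included in the mean, the quantities computed in this section are:

$$
\text{overall accuracy} = \frac{\sum_i C_{ii}}{N}, \qquad
\text{acc}_i = \frac{C_{ii}}{\sum_j C_{ij}}, \qquad
\text{weighted accuracy} = \frac{1}{K}\sum_{i=1}^{K} \text{acc}_i
$$

Defined this way, $\text{acc}_i$ is the recall of class $i$, and the weighted accuracy is what is often called balanced accuracy or macro-averaged recall. With the five scored classes ($K = 5$) and the per-class values printed in this section, the mean works out to:

$$
\frac{0.934866 + 0.584775 + 0.628157 + 0.835615 + 0.780117}{5} \approx 0.7527
$$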

In [6]:
overall_accuracy = np.sum(np.diag(combined_conf_matr)) / np.sum(combined_conf_matr.sum(axis=1))
print('Overall accuracy: ', np.round(100*overall_accuracy, 2), '%', sep='')

Overall accuracy: 80.5%


In [7]:
class_accuracies = np.diag(combined_conf_matr) / combined_conf_matr.sum(axis=1)
class_accuracies.index = class_accuracies.index.str.replace('True_', '')
class_accuracies.index.name = 'Sleep State'
class_accuracies.name = 'Class_Accuracies'
print('Per-class accuracy:\n')
print(class_accuracies)

Per-class accuracy:

Sleep State
Active Waking    0.934866
Quiet Waking     0.584775
Drowsiness       0.628157
SWS              0.835615
REM              0.780117
Unscorable       0.000000
Name: Class_Accuracies, dtype: float64


In [8]:
print('Weighted accuracy: ', np.round(100*np.mean(class_accuracies.drop('Unscorable')), 2), '%', sep='')

Weighted accuracy: 75.27%
