# Rumination prediction with cesium

Features are calculated separately for each frequency band of the CWT.
Different strategies for applying ICA and PCA were compared:

- ICA + PCA on all kinds of features (mean, std, etc.) -> does not improve the model's effectiveness

- ICA + PCA on the two best kinds of features (std + abs_diff) -> improves the model's effectiveness considerably. Second-best method:

    - 9 * (3 from 2*14) = 27 components
    
            MAPE: 17.9079258661464
            MAE: 0.519917817723914
            MSE: 0.478030015754372
            R^2: 0.376125673763408

    - 9 * (2 from 2*14) = 18 components

            MAPE: 16.8443619132606
            MAE: 0.502078488977514
            MSE: 0.454436875307531
            R^2: 0.406916950702055



- ICA + PCA separately on std features and abs_diffs features -> ok, but not the best way
    
    - 6*(5+5) = 60 components

            MAPE: 24.018846090150365
            MAE: 0.6884866303220416
            MSE: 0.7185622829832321
            R^2: 0.062208343867806604

    - 6*(3+3) = 36 components

            MAPE: 21.50178345656211
            MAE: 0.6201825754057195
            MSE: 0.6069731329548679
            R^2: 0.2078427255904629

    - 6*(2+2) = 24 components

            MAPE: 19.438263116024583
            MAE: 0.5637319078555214
            MSE: 0.5362710537170331
            R^2: 0.3001156176565023

    - 6*(1+1) = 12 components

            MAPE: 19.85447763226303
            MAE: 0.5760456075230935
            MSE: 0.585240921194229
            R^2: 0.23620531480654705

    - 5*(3+3) = 30 components

            MAPE: 20.536346943303492
            MAE: 0.5910318538690427
            MSE: 0.5747574611935253
            R^2: 0.24988722039619093

    - 5*(2+2) = 20 components

            MAPE: 19.35858512090129
            MAE: 0.5635286480139711
            MSE: 0.5433040170537524
            R^2: 0.2909369361542158
            
    - 5*(1+1) = 10 components
    
            MAPE: 20.250558073766026
            MAE: 0.5886032629783092
            MSE: 0.5852655385369406
            R^2: 0.23617318684890465

    - 4*(4+4) = 32 components

            MAPE: 22.031474920258777
            MAE: 0.6380830691061724
            MSE: 0.6277453555688914
            R^2: 0.1807330128932182
            
   
- PCA on flattened ICA channels and PCA separately on std features and abs_diffs features -> more research needed
    
    - 30 from 6*(5+5) = 30 components
    
            MAPE: 21.44571192549426
            MAE: 0.6261229886064391
            MSE: 0.6193306483997083
            R^2: 0.19171500061917268
       
    - 30 from 6*(4+4) = 30 components
    
            MAPE: 21.24921683786726
            MAE: 0.615903392719023
            MSE: 0.608353978391557
            R^2: 0.20604059185814527
    
    - 30 from 6*(3+3) = 30 components
    
            MAPE: 21.11155762577906
            MAE: 0.6154703465611149
            MSE: 0.604724357788718
            R^2: 0.21077758960611548
    
    - 30 from 5*(5+5) = 30 components
    
            MAPE: 21.67828659174979
            MAE: 0.6323799032109819
            MSE: 0.6303037217274262
            R^2: 0.17739410338791517
    
    - 30 from 5*(4+4) = 30 components
    
            MAPE: 21.317971705648553
            MAE: 0.615749388609156
            MSE: 0.6219822296997861
            R^2: 0.1882544365488662
    
    - 30 from 5*(3+3) = 30 components
    
            MAPE: 21.220223056606272
            MAE: 0.6138419950553302
            MSE: 0.6099786827888294
            R^2: 0.20392019914685844
    
    - 30 from 4*(4+4) = 30 components
    
            MAPE: 22.469724109237735
            MAE: 0.6509080551097475
            MSE: 0.6534929953956176
            R^2: 0.14712991074547543
   
   
- PCA on flattened ICA channels and (std + abs_diff) feature sets -> **the best results**, for example:

    - PCA: 18; ICA: 9

            MAPE: 16.1051630051677
            MAE: 0.481721094094576
            MSE: 0.412577754295216
            R^2: 0.461547057719921

    - PCA: 18; ICA: 18

            MAPE: 11.6348912814115
            MAE: 0.35025761618236
            MSE: 0.237247454583717
            R^2: 0.690369660896321
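The four metrics reported above can be computed from true and predicted rumination scores as in the sketch below. It assumes that MAPE is expressed in percent (as the reported values suggest) and that no true score is zero; `regression_report` is a hypothetical helper, not a function from this notebook.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred):
    """Return MAPE (in percent), MAE, MSE and R^2 (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # MAPE in percent; assumes y_true contains no zeros
    mape = 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))
    return {
        "MAPE": mape,
        "MAE": mean_absolute_error(y_true, y_pred),
        "MSE": mean_squared_error(y_true, y_pred),
        "R^2": r2_score(y_true, y_pred),
    }


report = regression_report([2.0, 3.0, 4.0], [2.5, 3.0, 3.5])
for name, value in report.items():
    print(f"{name}: {value}")
# MAPE = 12.5, MAE = 1/3, MSE = 1/6, R^2 = 0.75
```

Note that scikit-learn's `mean_absolute_percentage_error` returns a fraction rather than a percentage, which is why MAPE is computed directly here.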



### Imports

In [None]:
%load_ext lab_black
import os
import pickle
from time import time
import pywt
import mne
import scipy
import numpy as np
import plotly.graph_objects as go
import plotly.express as px
import pandas as pd
import cesium.featurize
from plotly.subplots import make_subplots
from ipywidgets import Dropdown, FloatRangeSlider, IntSlider, FloatSlider, interact
from sklearn.decomposition import FastICA
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.decomposition import PCA


from utils import *

### Loading data

Load the EEG data and the rumination questionnaire data. By default `create_df_data` loads all information from the given file, but a subset can be selected by passing a list of the desired column labels from the CSV file.

In [None]:
tmin, tmax = -0.1, 0.6
signal_frequency = 256
ERROR = 0
CORRECT = 1

In [None]:
df_name = "go_nogo_df"
pickled_data_filename = "../data/" + df_name + ".pkl"
info_filename = "../data/Demographic_Questionnaires_Behavioral_Results_N=163.csv"

# Check if data is already loaded
if os.path.isfile(pickled_data_filename):
    print("Pickled file found. Loading pickled data...")
    epochs_df = pd.read_pickle(pickled_data_filename)
    print("Done")
else:
    print("Pickled file not found. Loading data...")
    epochs_df = create_df_data(info_filename=info_filename)
    epochs_df.name = df_name
    # save loaded data into a pickle file
    epochs_df.to_pickle("../data/" + epochs_df.name + ".pkl")
    print("Done. Pickle file created")

The data is now loaded into a DataFrame in which each epoch is a single record.

Sort participants by their number of errors in descending order. This way the participants with the most error epochs, i.e. the best ones for this analysis, come first.

In [None]:
# add columns with the number of error and correct responses per participant
grouped_df = epochs_df.groupby("id")
epochs_df["error_sum"] = grouped_df[["marker"]].transform(
    lambda x: (x.values == ERROR).sum()
)
epochs_df["correct_sum"] = grouped_df[["marker"]].transform(
    lambda x: (x.values == CORRECT).sum()
)

# mergesort for stable sorting
epochs_df = epochs_df.sort_values("error_sum", ascending=False, kind="mergesort")
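The toy example below illustrates what the cell above does: `transform` broadcasts each participant's error count back onto all of that participant's rows, and the stable `mergesort` keeps participants with equal counts in their original order, so each participant's epochs stay contiguous. The participant ids and markers are made up for illustration.

```python
import pandas as pd

ERROR, CORRECT = 0, 1

# toy epochs table: participant "a" has 1 error, "b" has 2 errors, "c" has 1 error
toy_df = pd.DataFrame(
    {
        "id": ["a", "a", "b", "b", "b", "c", "c"],
        "marker": [ERROR, CORRECT, ERROR, ERROR, CORRECT, CORRECT, ERROR],
    }
)

grouped = toy_df.groupby("id")
# the scalar count per group is broadcast to every row of that group
toy_df["error_sum"] = grouped["marker"].transform(lambda x: (x.values == ERROR).sum())

# stable sort: ties ("a" and "c") keep their original relative order
sorted_df = toy_df.sort_values("error_sum", ascending=False, kind="mergesort")
print(sorted_df["id"].tolist())  # ['b', 'b', 'b', 'a', 'a', 'c', 'c']
```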

## Training and predictions

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Lasso
from sklearn.linear_model import ElasticNet
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import FunctionTransformer
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import ParameterGrid
from sklearn.svm import SVR
from tempfile import mkdtemp


from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score


from sklearn.model_selection import train_test_split
import scipy.stats

- Compute ICA (`ica__n_components` = N) and then the CWT of each ICA channel.
- For each frequency band of the CWT, compute the features given in the `feature_dict` parameter (e.g. std or mean).
- Then compute PCA on the flattened features of all ICA channels (`pca__n_components` = M).
- The final feature vector therefore has M components, reduced from `N * len(feature_dict) * frequencies` features.
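The shape bookkeeping of these steps can be checked on random data. In this sketch the CWT output is simulated with random numbers of shape (ica_n_components, epochs, frequencies, timepoints) instead of being computed, and the sizes (3 components, 40 epochs, 14 frequencies, 180 timepoints, 5 PCA components) are arbitrary assumptions. The per-band features are computed with plain NumPy rather than cesium, so the column order may differ from the notebook's.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
ica_n_components, n_epochs, n_freqs, n_timepoints = 3, 40, 14, 180
outer_components = 5

# stand-in for the CWT of every ICA channel: (channels, epochs, frequencies, timepoints)
cwt_per_channel = rng.standard_normal((ica_n_components, n_epochs, n_freqs, n_timepoints))

# per-band features, mirroring std_signal and abs_diffs_signal
std_feat = cwt_per_channel.std(axis=-1)
abs_diffs_feat = np.abs(np.diff(cwt_per_channel, axis=-1)).sum(axis=-1)
features = np.concatenate([std_feat, abs_diffs_feat], axis=-1)  # (channels, epochs, 2 * freqs)

# (channels, epochs, features) -> (epochs, channels * features)
flat = np.stack(features, axis=1).reshape(n_epochs, -1)
print(flat.shape)  # (40, 84) = 40 epochs x (3 * 2 * 14)

# outer PCA reduces the flattened vector to outer_components values per epoch
reduced = PCA(n_components=outer_components, random_state=5).fit_transform(flat)
print(reduced.shape)  # (40, 5)
```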

#### Subset of standard features for EEG analysis provided by Guo et al. (2012)

In [None]:
def std_signal(t, m, e):
    return np.std(m)


def abs_diffs_signal(t, m, e):
    return np.sum(np.abs(np.diff(m)))

In [None]:
guo_features = {
    "std": std_signal,
    "abs_diffs": abs_diffs_signal,
}
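A quick check of what the two features return on a short toy signal: `std_signal` is the population standard deviation (`np.std` uses `ddof=0` by default) and `abs_diffs_signal` is the total absolute change between consecutive samples. The functions follow the `(t, m, e)` signature used above (times, measurements, errors), and only `m` is used.

```python
import numpy as np


def std_signal(t, m, e):
    return np.std(m)


def abs_diffs_signal(t, m, e):
    return np.sum(np.abs(np.diff(m)))


guo_features = {"std": std_signal, "abs_diffs": abs_diffs_signal}

# times and errors are unused by these features, so None is passed for both
m = np.array([1.0, 2.0, 4.0, 7.0])
values = {name: func(None, m, None) for name, func in guo_features.items()}
print(values)
# std = sqrt(5.25) ~ 2.291, abs_diffs = 1 + 2 + 3 = 6.0
```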

### Regression grid search

**Warning! This takes a lot of time! One run with `ica__n_components` = 35 takes about 20 min.**

This is a pipeline that allows the vectorization parameters to be tuned. The `base_steps` list contains all vectorization steps, including data standardization, followed by the regressor.

In the `rate_regression` function, which uses `GridSearchCV`, the cross-validation splitting strategy can be specified (default `cv=5`).
The results of the cross-validated search are stored in **grid_search.cv_results_** and the chosen model in **grid_search.best_estimator_**.
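A minimal, self-contained sketch of such a search is shown below. It is not the notebook's `rate_regression`: it uses synthetic data from `make_regression` and a reduced pipeline (PCA, scaler, SVR) in place of the full EEG vectorization, and the parameter values are arbitrary. It shows how a splitter object can be passed as `cv` and where the results end up.

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# synthetic stand-in for the vectorized EEG features and rumination scores
X_demo, y_demo = make_regression(n_samples=120, n_features=20, noise=5.0, random_state=0)

pipe = Pipeline(
    steps=[
        ("pca", PCA(random_state=5)),
        ("scaler", StandardScaler()),
        ("svr", SVR()),
    ]
)
param_grid = {"pca__n_components": [2, 5, 10], "svr__kernel": ["linear", "rbf"]}

# cv accepts an int (number of folds) or a splitter object such as KFold
grid_search = GridSearchCV(pipe, param_grid, cv=KFold(n_splits=5), scoring="r2")
grid_search.fit(X_demo, y_demo)

print(grid_search.best_params_)
print(grid_search.cv_results_["mean_test_score"])  # one score per parameter combination
best_model = grid_search.best_estimator_  # refitted on all of X_demo
```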

Custom data-transformation steps, each wrapped in a `FunctionTransformer`:

In [None]:
def IcaPreprocessingTransformer():
    def transform(X):
        timepoints_per_channel = np.concatenate(X, axis=1)
        return timepoints_per_channel.T

    return FunctionTransformer(func=transform)


def CwtVectorizer(timepoints_count, mwt="morl", cwt_density=2):
    def transform(X):
        X_ica_transposed = X.T
        ica_n_components = X.shape[1]

        epochs_count = int(X_ica_transposed.shape[1] / timepoints_count)
        data_per_channel = X_ica_transposed.reshape(
            ica_n_components, epochs_count, timepoints_count
        )

        cwt_per_channel = []
        for data in data_per_channel:
            data_cwt = np.array([cwt(epoch, mwt, cwt_density) for epoch in data])
            cwt_per_channel.append(data_cwt)
        cwt_per_channel = np.array(cwt_per_channel)
        return cwt_per_channel

    return FunctionTransformer(func=transform)


def CwtFeatureVectorizer(feature_dict):
    def transform(X):
        vectorized_data = []

        for data_cwt in X:
            # cesium functions
            feature_set_cwt = cesium.featurize.featurize_time_series(
                times=None,
                values=data_cwt,
                errors=None,
                features_to_use=list(feature_dict.keys()),
                custom_functions=feature_dict,
            )
            features_per_epoch = feature_set_cwt.to_numpy()
            vectorized_data.append(features_per_epoch)
        vectorized_data = np.array(vectorized_data)
        return vectorized_data

    return FunctionTransformer(func=transform)


# reshape data from (channels x epochs x features) to (epochs x channels x features)
# and then flatten it to (epochs x channels*features)
def PostprocessingTransformer():
    def transform(X):
        vectorized_data = np.stack(X, axis=1)
        epochs_per_channel_feature = vectorized_data.reshape(
            vectorized_data.shape[0], -1
        )
        return epochs_per_channel_feature

    return FunctionTransformer(func=transform)
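The reshapes in these transformers must undo each other exactly, otherwise epochs and channels get mixed up. The sketch below checks this on a small random array of shape (epochs, channels, timepoints); the sizes are arbitrary assumptions. ICA sits between the first two steps in the pipeline, but it transforms each row (timepoint) independently, so it does not change the row order checked here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_epochs, n_channels, n_timepoints = 4, 3, 5
X = rng.standard_normal((n_epochs, n_channels, n_timepoints))

# IcaPreprocessingTransformer: join epochs along time -> (epochs * timepoints, channels)
samples_by_channel = np.concatenate(X, axis=1).T

# CwtVectorizer (before the CWT): split the time axis back into epochs per channel
per_channel = samples_by_channel.T.reshape(n_channels, n_epochs, n_timepoints)

# every channel now holds its own epochs in the original order
assert np.array_equal(per_channel, X.transpose(1, 0, 2))

# PostprocessingTransformer: (channels, epochs, features) -> (epochs, channels * features)
features = rng.standard_normal((n_channels, n_epochs, 2))
flat = np.stack(features, axis=1).reshape(n_epochs, -1)
assert np.array_equal(flat[:, :2], features[0])  # first channel's features come first

print(samples_by_channel.shape, per_channel.shape, flat.shape)  # (20, 3) (3, 4, 5) (4, 6)
```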

#### Prediction with SVR-rbf

In [None]:
X = np.array(epochs_df[epochs_df["marker"] == ERROR]["epoch"].to_list())
y = np.array(epochs_df[epochs_df["marker"] == ERROR]["Rumination Full Scale"].to_list())

In [None]:
base_steps = [
    ("ica_preprocessing", IcaPreprocessingTransformer()),
    ("ica", FastICA(random_state=5)),
    ("cwt", CwtVectorizer(timepoints_count=X.shape[-1])),
    ("cwt_feature", CwtFeatureVectorizer(feature_dict=guo_features)),
    ("postprocessing", PostprocessingTransformer()),
    ("pca", PCA(random_state=5)),
    ("scaler", StandardScaler()),
    ("svr", SVR()),
]

regressor_params = dict(
    ica__n_components=[1, 2, 3, 4],
    pca__n_components=[1, 2, 3, 4],
    # svr__C=np.arange(1, 2, 1),
    # svr__gamma=[0.1],
    # svr__epsilon=[0.1],
    svr__kernel=["linear"],
)

In [None]:
%%time
pipeline = Pipeline(steps=base_steps)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, shuffle=False
)

# X_test, y_test = X_train, y_train

for params in ParameterGrid(regressor_params):
    pipeline.set_params(**params)
    pipeline.fit(X_train, y_train)
    
    y_pred = pipeline.predict(X_test)
    r2 = r2_score(y_test, y_pred)
    print(params, r2)