In [6]:
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import os

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import StackingClassifier, RandomForestClassifier

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

from numpy.random import seed

import tensorflow as tf

#These lines are required in order to use GPU for neural networks

os.environ['TF_XLA_FLAGS'] = '--tf_xla_enable_xla_devices'

gpus= tf.config.experimental.list_physical_devices('GPU')

if gpus:
    try:
    # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
              tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)

1 Physical GPUs, 1 Logical GPUs


## Section 3: Using a Cascading Classifier

This notebook uses a Cascading Classifier that will take a summation of the previously engineered features as input. A Cascading Classifier is an ensemble method made of a series of different classifiers. This version will use a Random Forest and a Deep Neural Network with 5 hidden layers. This notebook will also use a linear regression in an attempt to linearly combine the features together, both as its own classifier and as the first layer to the Cascading Classifier.

The analysis done here compares the results with the MTEX-CNN results in [the next notebook](../code/04b_mtex_cnn_classification.ipynb).

This notebook is split into several parts

1. [Loading and combining the engineered features together](#part-3-1-loading-and-combining-the-engineered-features-together)
2. [Preparing for the cascading classifier](#part-3-2-preparing-for-the-cascading-classifier)
 1. [Finding the best parameters for logistic regression](#part-3-2-1-finding-the-best-parameters-for-logistic-regression)
 2. [Creating the neural network](#part-3-2-2-creating-the-neural-network)
3. [Creating the cascading classifier](#part-3-3-creating-the-cascading-classifier)
 1. [Cascading classifier without logistic regression](#part-3-3-1-cascading-classifier-without-logistic-regression)
 2. [Cascading classifier with logistic regression](#part-3-3-2-cascading-classifier-with-logistic-regression)
4. [Analysis](#part-3-4-analysis) 

Set random seeds for reproducibility

In [10]:
seed(42)
tf.random.set_seed(42)

<a id=part-3-1-loading-and-combining-the-engineered-features-together></a>

### Part 3.1: Loading and combining the engineered features together

Load all feature dataframes

In [3]:
folder = '../data/features_df_folder/'

Meta data dataframe that contains the label

In [4]:
meta_data_df = pd.read_csv('../data/data_analysis_files/meta_data_df.csv')

Kurtosis and Skew

In [4]:
kurtosis_df = pd.read_csv(folder + 'kurtosis_df.csv')

In [5]:
skew_df = pd.read_csv(folder + 'skew_df.csv')

Maximum, minimum, and average amplitudes

In [6]:
max_peak_df = pd.read_csv(folder + 'max_peak_df.csv')

In [7]:
min_peak_df = pd.read_csv(folder + 'min_peak_df.csv')

In [8]:
avg_peak_df = pd.read_csv(folder + 'avg_peak_df.csv')

Short-time fourier transform and Fast-fourier transform

In [9]:
stft_df = pd.read_csv(folder + 'stft_df.csv')

In [10]:
fft_df = pd.read_csv(folder + 'fft_df.csv')

Rename the columns...

In [11]:
kurtosis_df.columns = [['kurtosis_0', 'kurtosis_1', 'kurtosis_2', 'kurtosis_3', 'kurtosis_4', 'kurtosis_5', 'kurtosis_6',
                       'kurtosis_7', 'kurtosis_8', 'kurtosis_9', 'kurtosis_10', 'kurtosis_11']]

In [12]:
skew_df.columns = [['skew_0', 'skew_1', 'skew_2', 'skew_3', 'skew_4', 'skew_5', 'skew_6',
                       'skew_7', 'skew_8', 'skew_9', 'skew_10', 'skew_11']]

In [13]:
max_peak_df.columns = [['max_peak_0', 'max_peak_1', 'max_peak_2', 'max_peak_3', 'max_peak_4', 'max_peak_5', 'max_peak_6',
                       'max_peak_7', 'max_peak_8', 'max_peak_9', 'max_peak_10', 'max_peak_11']]

In [14]:
min_peak_df.columns = [['min_peak_0', 'min_peak_1', 'min_peak_2', 'min_peak_3', 'min_peak_4', 'min_peak_5', 'min_peak_6',
                       'min_peak_7', 'min_peak_8', 'min_peak_9', 'min_peak_10', 'min_peak_11']]

In [15]:
avg_peak_df.columns = [['avg_peak_0', 'avg_peak_1', 'avg_peak_2', 'avg_peak_3', 'avg_peak_4', 'avg_peak_5', 'avg_peak_6',
                       'avg_peak_7', 'avg_peak_8', 'avg_peak_9', 'avg_peak_10', 'avg_peak_11']]

In [16]:
stft_df.columns = [['stft_0', 'stft_1', 'stft_2', 'stft_3', 'stft_4', 'stft_5', 'stft_6',
                       'stft_7', 'stft_8', 'stft_9', 'stft_10', 'stft_11']]

In [17]:
fft_df.columns = [['fft_0', 'fft_1', 'fft_2', 'fft_3', 'fft_4', 'fft_5', 'fft_6',
                       'fft_7', 'fft_8', 'fft_9', 'fft_10', 'fft_11']]

...and then combine them to one big dataframe for train test split

In [18]:
combined_features_df = pd.concat([kurtosis_df, skew_df, max_peak_df, min_peak_df, avg_peak_df, stft_df, fft_df], axis=1)

In [19]:
combined_features_df.shape

(21837, 84)

<a id=part-3-2-preparing-for-the-cascading-classifier></a>

### Part 3.2: Preparing for the cascading classifier

The paper I referred to mentioned linearly combining the features together. However, they did not mention how. They fed the linearly combined features into a cascading classifier. A cascading classifier is an ensemble method that stacks different classifiers on top of each other. In the paper, they run a Random Forest before feeding the results into a Neural Network. In my case, I'll run a Logistic Regression before feeding it into a Random Forest and Neural Network.

The Random Forest classifier will use default settings as the paper describes that they should be able to obtain good results on the default settings.

The Neural Network (Multilayer Perception) will use 5 hidden layers.

<a id=part-3-2-1-finding-the-best-parameters-for-logistic-regression></a>

#### Part 3.2.1: Finding the best parameters for Logistic Regression

In [20]:
#Prepare X and y for train_test_split. X is simply the combined_features, while y is the labels from meta_data_df
X = combined_features_df
y = meta_data_df['diagnostic_superclass']

In [21]:
#Split into training and testing sets. Stratify on y because of imbalanced classes. Use random_state for repeatable
#experiments
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state = 42)

In [22]:
#Instantiate Standard Scaler to use on training and test data
ss = StandardScaler()
X_train_ss = ss.fit_transform(X_train)
X_test_ss = ss.transform(X_test)

In [23]:
#Instantiate Logistic Regression for GridSearch
logreg = LogisticRegression()

In [24]:
param_grid = {'penalty' : ['l2'],
              'multi_class' : ['multinomial'],
             'C' : [0.001, 0.01, 0.05, 0.1, 1],
             'solver' : ['lbfgs'],
             'max_iter':[500]}

In [25]:
#Instantiate GridSearch
gridsearch = GridSearchCV(estimator = logreg, param_grid = param_grid, cv=10, verbose = 0)

In [26]:
logreg_model = gridsearch.fit(X_train_ss, y_train)

In [27]:
logreg_model.best_estimator_

LogisticRegression(C=1, max_iter=500, multi_class='multinomial')

In [28]:
logreg_model.score(X_test_ss, y_test)

0.6227106227106227

In [29]:
logreg_model.predict_proba(X_test_ss)[0:10]

array([[3.29391275e-02, 1.90905651e-02, 5.64953803e-02, 7.19346242e-01,
        1.47735910e-03, 1.70651326e-01],
       [1.24314081e-01, 1.48129369e-02, 2.91923904e-01, 3.62686373e-01,
        4.08375756e-03, 2.02178947e-01],
       [1.01103884e-01, 1.54512348e-02, 3.68655419e-02, 6.65470325e-01,
        1.74882464e-02, 1.63620768e-01],
       [7.04009463e-02, 1.05172791e-02, 6.23890640e-02, 7.11535071e-01,
        3.31887530e-03, 1.41838764e-01],
       [9.55005952e-01, 1.71127294e-03, 5.72460245e-03, 4.91948928e-04,
        6.53709695e-03, 3.05291269e-02],
       [1.28183333e-01, 9.26651177e-02, 2.16842998e-02, 8.66041504e-02,
        1.85443354e-03, 6.69008666e-01],
       [1.31247912e-01, 4.08154413e-03, 9.35354973e-02, 6.90402575e-01,
        4.65561316e-03, 7.60768586e-02],
       [4.87060112e-01, 7.07481214e-02, 1.65999889e-02, 1.38194317e-07,
        1.63543714e-03, 4.23956202e-01],
       [8.63381595e-03, 6.52110205e-02, 2.85781667e-02, 6.28218874e-01,
        4.16016580e-03, 

Train a Logistic Regression model on the best parameters so that we can get the coefficients and intercept

In [30]:
final_model = LogisticRegression(multi_class='multinomial', max_iter=500, C=0.001, solver='lbfgs', penalty='l2')

In [31]:
final_model.fit(X_train_ss, y_train)

LogisticRegression(C=0.001, max_iter=500, multi_class='multinomial')

In [32]:
final_model.score(X_test_ss, y_test)

0.6049450549450549

<a id=part-3-2-2-creating-the-neural-network></a>

#### Part 3.2.2: Creating the Neural Network

In [33]:
nn_model = Sequential()

#Define initial input layer
nn_model.add(Dense(256, input_shape=(21837,84), activation='relu'))

#Define hidden layers
nn_model.add(Dense(128, activation='relu'))
nn_model.add(Dense(64, activation='relu'))
nn_model.add(Dense(32, activation='relu'))
nn_model.add(Dense(16, activation='relu'))
nn_model.add(Dense(8, activation='relu'))

#Final output layer
nn_model.add(Dense(6, activation='softmax'))

#Compile the neural network for multiclass classification
nn_model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [34]:
#This step is necessary to fit the model into StackingClassifier
nn_clf = KerasClassifier(lambda: nn_model, epochs=50, batch_size= 256)
nn_clf._estimator_type = "classifier"

<a id=part-3-3-creating-the-cascading-classifier></a>

### Part 3.3: Creating the Cascading Classifier

<a id=part-3-3-1-cascading-classifier-without-logistic-regression></a>

#### Part 3.3.1: Cascading Classifier without Logistic Regression

Let's do a cascading classifier now

In [35]:
estimators = [('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
             ('nn', nn_clf)]

In [36]:
clf = StackingClassifier(estimators = estimators, cv=10)

In [37]:
clf.fit(X_train_ss, y_train)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/5



Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50


Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50


Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
Epoch 1/50
Ep

Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/5

StackingClassifier(cv=10,
                   estimators=[('rf', RandomForestClassifier(random_state=42)),
                               ('nn',
                                <tensorflow.python.keras.wrappers.scikit_learn.KerasClassifier object at 0x00000225B8FE6790>)])

In [38]:
clf.score(X_test_ss, y_test)



0.6076923076923076

<a id=part-3-3-2-cascading-classifier-with-logistic-regression></a>

#### Part 3.3.2: Cascading Classifier with Logistic Regression

In [47]:
estimators_lr = [('lr', LogisticRegression(multi_class='multinomial', max_iter=500, C=0.001, solver='lbfgs', penalty='l2')),
              ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
             ('nn', nn_clf)]

In [48]:
clf_with_lr = StackingClassifier(estimators = estimators, cv=10)

In [49]:
clf_with_lr.fit(X_train_ss, y_train)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/5



Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/

Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50


Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/5

Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


StackingClassifier(cv=10,
                   estimators=[('rf', RandomForestClassifier(random_state=42)),
                               ('nn',
                                <tensorflow.python.keras.wrappers.scikit_learn.KerasClassifier object at 0x00000225B8FE6790>)])

In [50]:
clf_with_lr.score(X_test_ss, y_test)



0.6065934065934065

<a id=part-3-4-analysis></a>

### Part 3.4: Analysis

#### Comments on both models

Logistic Regression gave an accuracy score of 60.49%, while the Cascading Classifier gave a test accuracy score of 60.66%. Both are less than that of the raw signal being fed to the [MTEX-CNN](./04b_mtex_cnn_classification.ipynb), which gives around 69.72% validation accuracy. There is considerable overfitting in the Cascading Classifiers, with the training accuracy being around 99.97%.

#### Improving the Cascading Classifer score

The Cascading Classifier score could probability improved through a proper understanding of the linear combination procedure used in the [paper this notebook refers to](https://www.sciencedirect.com/science/article/abs/pii/S0169260719305747). Here, the features were simply added up and standardised. We attempted to do Logistic Regression to combine the features together, but using Logistic Regression as the first classifier in the Cascading Classifer actually returned a worse score. 

Another change that could be done is feeding the data to the Random Forest and Neural Network separately, before feeding the Random Forest probabilities that could not be correctly classified into the Neural Network. We were limited in our understanding of how to implement the Cascading Classifier properly.

#### What the Logistic Regression score tells us

The fact that the scores between Logistic Regression and the other, more complicated models are similar means that the models overfit to the training data and that there are too many features. This notebook aimed to use as much of the information of the multilead ECG as possible, and did not do dimensionality reduction such as Principle Component Analysis (PCA). However, in reality, it's possible that one or two channels are more important than others. Channel II, which is feature 1 in our case, seems to be one of the main channels from which doctors get their information from.