# Electrocardiogram (ECG) classification using three different algorithms

The objective is to classify (in a supervised way) the INCART 12-lead Arrhythmia Database, developed by the St. Petersburg Institute of Cardiological Technics, using three different models: $k$-Nearest Neighbors classifier, Random Forest algorithm and an Artificial Neural Network.

The complete database is a comprehensive collection of ECG recordings aimed at supporting research in arrhythmia detection and analysis. This database consists of 75 annotated recordings derived from 32 Holter monitor records. Each recording is 30 minutes long and includes data from 12 standard ECG leads, sampled at 257 Hz. This dataset is publicly available through PhysioNet, a repository for medical research data managed by the MIT Laboratory for Computational Physiology, and is widely used for developing and testing algorithms for ECG analysis.

The complete dataset contains 10 kinds of diagnoses beyond the recordings classified as normal. But in this paper only the normal and the irregular heartbeat are considered. For this reason the class labeled as 'N' (normal beat) is mapped into 1 and all the other labels into 0.

For reducing the computations, only a subset of the entire dataset will be taken in accont: 10 recordings for each patient will be listed into a Pandas dataframe. The chosen attributes contain the measures of ECG peaks and intervals.

# Prepare the data

In [1]:
import pandas as pd
import numpy as np
import random
import math
from collections import Counter
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.metrics import confusion_matrix

In [2]:
df = pd.read_csv('incart_subset.csv')
print(df.shape)
df

(750, 13)


Unnamed: 0,record,type,0_pre-RR,0_post-RR,0_pPeak,0_tPeak,0_rPeak,0_sPeak,0_qPeak,0_qrs_interval,0_pq_interval,0_qt_interval,0_st_interval
0,I01,N,163,165,0.069610,-0.083281,0.614133,-0.392761,0.047159,15,2,27,10
1,I01,N,165,166,-0.097030,0.597254,-0.078704,-0.078704,-0.137781,3,5,14,6
2,I01,N,166,102,0.109399,0.680528,-0.010649,-0.010649,-0.720620,6,25,35,4
3,I01,VEB,102,231,0.176376,0.256431,-0.101098,-0.707525,-0.101098,4,3,14,7
4,I01,N,231,165,0.585577,0.607461,-0.083499,-0.083499,-0.167858,3,34,43,6
...,...,...,...,...,...,...,...,...,...,...,...,...,...
745,I75,N,210,205,-0.152757,0.766852,-0.177058,-0.177058,-0.862051,3,9,16,4
746,I75,N,205,206,-0.165687,0.778149,-0.488176,-0.488176,-0.879209,2,9,16,5
747,I75,N,206,207,-0.164147,0.809764,-0.636810,-0.636810,-0.930783,2,9,16,5
748,I75,N,207,148,-0.149094,0.867021,-0.419573,-0.419573,-0.885266,3,9,16,4


The target column is called 'type'. It will be separated from the feature columns and mapped into {0,1}.

In [3]:
numeric_columns = [col for col in df.columns if '0_' in col]
X = df[numeric_columns]
y = df['type']
print(f'The shape of X is {X.shape} and the shape of y is {y.shape}')

The shape of X is (750, 11) and the shape of y is (750,)


In [4]:
class_labels = y.unique()
print(f'The class labels are {class_labels}')

The class labels are ['N' 'VEB' 'SVEB']


In [5]:
y = y.map({'N': 1, 'VEB': 0, 'SVEB':0})
class_labels = y.unique()
print(f'The class labels have become {class_labels}')

The class labels have become [1 0]


Now, we must split the dataset in training and test sets. The training set is used to fit the data, while the test set is used to evaluate the model.

Usually, when some medical data are studied, it is recommended that if one patient recording belongs to the test set, the other recordings of the same patient cannot be stored in the training set (and viceversa). It's for this reason that we do not choose randomly the rows of the dataset, but we choose randomply the 20% of the 75 patients to build the test set.

In [6]:
nr_test_patients = int(75 * 0.2)
print(f'{nr_test_patients} patients are in the test set')

random_records = random.sample(list(df['record'].unique()), k=nr_test_patients)
print(f'The patients randomly chosen are {random_records}')

test_indexes = []
for r in random_records:
    test_indexes  = test_indexes + list(df[df['record'] == r].index)
print(f'The length of the test dataset is {len(test_indexes)}')

15 patients are in the test set
The patients randomly chosen are ['I35', 'I59', 'I70', 'I25', 'I41', 'I30', 'I11', 'I22', 'I65', 'I36', 'I29', 'I31', 'I32', 'I14', 'I61']
The length of the test dataset is 150


In [7]:
X_test = X.loc[test_indexes]
y_test = y[test_indexes]

train_indexes = list(set(X.index) - set(X_test.index))
X_train  = X.loc[train_indexes]
y_train  = y[train_indexes]

print(f'shape of X_test is {X_test.shape}, shape of y_test is {y_test.shape}')
print(f'shape of X_train is {X_train.shape}, shape of y_train is {y_train.shape}')

shape of X_test is (150, 11), shape of y_test is (150,)
shape of X_train is (600, 11), shape of y_train is (600,)


Since the following algorithms are implemented supposing that inputs are NumPy array, then we transform training and test sets. NumPy arrays are very suitable for a big amount of numerical computations. They are optimized for performance, and so they are significantly fast.

In [8]:
X_test = X_test.values
y_test = y_test.values

X_train  = X_train.values
y_train  = y_train.values

print(f'The type of X_test is {type(X_test)}\nThe type of y_test is {type(y_test)}\nThe type of X_train is {type(X_train)}\nThe type of y_train is {type(y_train)}')

The type of X_test is <class 'numpy.ndarray'>
The type of y_test is <class 'numpy.ndarray'>
The type of X_train is <class 'numpy.ndarray'>
The type of y_train is <class 'numpy.ndarray'>


# Accuracy function

For understanding if the classifier predicts well the labels, we can sum the number of time in which class label predicted is equal to the true value. It's for this reason that the accuracy function is implemented.

In [9]:
def accuracy(y_test, y_pred):
    sum_correct = sum([pred==test for pred,test in zip(y_pred,y_test)])
    acc = sum_correct/len(y_test)
    nr_errors = y_test.shape[0] - sum_correct
    print(f'The number of errors is {nr_errors} out of {y_test.shape[0]}')
    print(f'The model has an accuracy of {round(acc * 100, 2)} %')

# $k$-Nearest Neighbors (kNN)

In [10]:
class KNN:
    '''
    Class which implements the k nearest neighbors classifier.
    '''
    def __init__(self, k=3):
        self.k = k
        self.X = None
        self.y = None

    def fit(self, X, y):
        '''
        Function used to train a decision tree classifier model.

        :param X: np.array, features
        :param y: np.array or list, target
        :return: None
        '''
        self.X = X
        self.y = y

    def predict(self, X_test):
        y_pred = []
        for x in X_test:
            # Calculate the distances between x and all training points
            distances = np.linalg.norm(self.X - x, axis=1)
            # Sort training point indices by distance
            sorted_indices = np.argsort(distances)
            # Get the k nearest neighbors
            k_indices = sorted_indices[:self.k]
            # Get the matching labels
            k_labels = self.y[k_indices]
            # Predict the most common label
            unique_labels, counts = np.unique(k_labels, return_counts=True)
            most_common_label = unique_labels[np.argmax(counts)]
            y_pred.append(most_common_label)
        return np.array(y_pred)

In [11]:
knn = KNN(k=3)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
accuracy(y_test, y_pred)

The number of errors is 10 out of 150
The model has an accuracy of 93.33 %


# Random Forest algorithm

First of all a decision tree must be implemented. Then we can build the Radom Forest algorithm with a bootstrap sampling.

In [12]:
class Node:
    '''
    Helper class which implements a single tree node.
    '''
    def __init__(self, feature=None, threshold=None, data_left=None, data_right=None, gain=None, value=None):
        self.feature = feature
        self.threshold = threshold
        self.data_left = data_left
        self.data_right = data_right
        self.gain = gain
        self.value = value

In [13]:
class DecisionTree:
    '''
    Class which implements a decision tree classifier algorithm.
    '''
    def __init__(self, min_samples_split=2, max_depth=5):
        self.min_samples_split = min_samples_split
        self.max_depth = max_depth
        self.root = None

    @ staticmethod # Static methods do not receive any implicit first argument. This means they do not have access to the instance (self) or the class
    def _gini_index(s):
        '''
        Helper function, calculates gini index from an array of integer values.

        :param s: list
        :return: float, gini index value
        '''
        # Convert to integers to avoid runtime errors
        # bincount returns the count of values in each bin from 0 to the largest value in the array
        counts = np.bincount(np.array(s, dtype=np.int64))
        # Probabilities of each class label
        percentages = counts / len(s)

        # Caclulate entropy
        gini = 1
        for pct in percentages:
            if pct > 0:
                gini -= pct ** 2
        return gini


    def _information_gain(self, parent, left_child, right_child):
        '''
        Helper function, calculates information gain from a parent and two child nodes.

        :param parent: list, the parent node
        :param left_child: list, left child of a parent
        :param right_child: list, right child of a parent
        :return: float, information gain
        '''
        # Compute n_k /n, for k = 1,2 where n_1 is the number of samples in left child, n_2 is the number of samples in right child, n is the number of samples in parent node
        num_left = len(left_child) / len(parent)
        num_right = len(right_child) / len(parent)

        # One-liner which implements the previously discussed formula
        return self._gini_index(parent) - (num_left * self._gini_index(left_child) + num_right * self._gini_index(right_child))


    def _best_split(self, X, y):
        '''
        Helper function, calculates the best split for given features and target

        :param X: np.array, features
        :param y: np.array or list, target
        :return: dict
        '''
        best_split = None
        best_info_gain = -1
        n_rows, n_cols = X.shape

        # For every dataset feature
        for f_idx in range(n_cols):
            X_curr = X[:, f_idx]
            # For every unique value of that feature
            for threshold in np.unique(X_curr):
                # Construct a dataset and split it to the left and right parts
                # Left part includes records lower or equal to the threshold
                # Right part includes records higher than the threshold
                df = np.concatenate((X, y.reshape(1, -1).T), axis=1)
                df_left = np.array([row for row in df if row[f_idx] <= threshold])
                df_right = np.array([row for row in df if row[f_idx] > threshold])

                # Do the calculation only if there's data in both subsets
                if len(df_left) > 0 and len(df_right) > 0:
                    # Obtain the value of the target variable for subsets
                    y = df[:, -1]
                    y_left = df_left[:, -1]
                    y_right = df_right[:, -1]

                    # Caclulate the information gain and save the split parameters
                    # if the current split is better then the previous best
                    gain = self._information_gain(y, y_left, y_right)
                    if gain > best_info_gain:
                        best_split = {
                            'feature_index': f_idx,
                            'threshold': threshold,
                            'df_left': df_left,
                            'df_right': df_right,
                            'gain': gain
                        }
                        best_info_gain = gain
        return best_split

    def _build(self, X, y, depth=0):
        '''
        Helper recursive function, used to build a decision tree from the input data.

        :param X: np.array, features
        :param y: np.array or list, target
        :param depth: current depth of a tree, used as a stopping criteria
        :return: Node
        '''
        n_rows, n_cols = X.shape

        # Check to see if a node should be leaf node
        if n_rows >= self.min_samples_split and depth <= self.max_depth:
            # Get the best split
            best = self._best_split(X, y)
            # If the split isn't pure
            if best and best['gain'] > 0:
                # Build a tree on the left
                left = self._build(
                    X=best['df_left'][:, :-1],
                    y=best['df_left'][:, -1],
                    depth=depth + 1
                )
                right = self._build(
                    X=best['df_right'][:, :-1],
                    y=best['df_right'][:, -1],
                    depth=depth + 1
                )
                return Node(
                    feature=best['feature_index'],
                    threshold=best['threshold'],
                    data_left=left,
                    data_right=right,
                    gain=best['gain']
                )
        # Leaf node value is the most common target value
        return Node(
            value=Counter(y).most_common(1)[0][0]
        )

    def fit(self, X, y):
        '''
        Function used to train a decision tree classifier model.

        :param X: np.array, features
        :param y: np.array or list, target
        :return: None
        '''
        # Call a recursive function to build the tree
        self.root = self._build(X, y)

    def _predict(self, x, tree):
        '''
        Helper recursive function, used to predict a single instance (tree traversal).

        :param x: single observation
        :param tree: built tree
        :return: float, predicted class
        '''
        # Leaf node
        if tree.value is not None:
            return tree.value

        feature_value = x[tree.feature]

        # Go to the left
        if feature_value <= tree.threshold:
            return self._predict(x=x, tree=tree.data_left)

        # Go to the right
        if feature_value > tree.threshold:
            return self._predict(x=x, tree=tree.data_right)

    def predict(self, X):
        '''
        Function used to classify new instances.

        :param X: np.array, features
        :return: np.array, predicted classes
        '''
        # Call the _predict() function for every observation
        return [self._predict(x, self.root) for x in X]

In [14]:
dt_model = DecisionTree()
dt_model.fit(X_train, y_train)
y_pred = dt_model.predict(X_test)

accuracy(y_test, y_pred)

The number of errors is 13 out of 150
The model has an accuracy of 91.33 %


In [15]:
class RandomForest:
    '''
    Class which implements random forest algorithm.
    '''
    def __init__(self, n_estimators=5, min_samples_split=2, max_depth=5, n_features=5):
        self.n_estimators = n_estimators
        self.min_samples_split = min_samples_split
        self.max_depth = max_depth
        self.n_features = n_features
        self.trees = []

    def _bootstrap_sample(self, X, y):
        '''
        Helper function, used to generate bootstrap samples.

        :param X: np.array, features
        :param y: np.array or list, target
        :return: two np.array, features and target sampled by bootstrap method
        '''
        n_samples = X.shape[0]
        # Choose randomly (with repetition) the indexes of examples from the original dataset
        indices = np.random.choice(n_samples, n_samples, replace=True)
        return X[indices], y[indices]

    def fit(self, X, y):
        '''
        Function used to train a random forest.

        :param X: np.array, features
        :param y: np.array or list, target
        :return: None
        '''
        for _ in range(self.n_estimators):
            tree = DecisionTree(min_samples_split=self.min_samples_split, max_depth=self.max_depth)
            X_sample, y_sample = self._bootstrap_sample(X, y)
            # Choose randomly n_features
            random_features = [0,1] + random.sample(range(2,X.shape[1]), self.n_features)
            print(f'features of the {_+1} decision tree are {random_features}')
            # Fit the decision tree on the dataset with a reduced number of columns
            tree.fit(X_sample[:,random_features], y_sample)
            self.trees.append(tree)

    def predict(self, X):
        '''
        Function used to classify new instances.

        :param X: np.array, features
        :return: np.array, predicted classes
        '''
        tree_preds = np.array([tree.predict(X) for tree in self.trees])
        tree_preds = np.swapaxes(tree_preds, 0, 1)
        y_pred = [Counter(tree_pred).most_common(1)[0][0] for tree_pred in tree_preds]
        return y_pred

Because of the randomicity in the choice of features subset, we run the algorithm more the one time to visualize possible changes.

In [16]:
col_names = X.columns
for i, feature in enumerate(col_names):
    print(f'The index {i} is related to the name column {feature}')


The index 0 is related to the name column 0_pre-RR
The index 1 is related to the name column 0_post-RR
The index 2 is related to the name column 0_pPeak
The index 3 is related to the name column 0_tPeak
The index 4 is related to the name column 0_rPeak
The index 5 is related to the name column 0_sPeak
The index 6 is related to the name column 0_qPeak
The index 7 is related to the name column 0_qrs_interval
The index 8 is related to the name column 0_pq_interval
The index 9 is related to the name column 0_qt_interval
The index 10 is related to the name column 0_st_interval


In [17]:
for i in range(3):
    rf_model = RandomForest(n_estimators=3, n_features=2, max_depth=4)
    rf_model.fit(X_train, y_train)
    y_pred = rf_model.predict(X_test)
    accuracy(y_test, y_pred)


features of the 1 decision tree are [0, 1, 7, 5]
features of the 2 decision tree are [0, 1, 3, 10]
features of the 3 decision tree are [0, 1, 6, 3]
The number of errors is 15 out of 150
The model has an accuracy of 90.0 %
features of the 1 decision tree are [0, 1, 8, 5]
features of the 2 decision tree are [0, 1, 10, 3]
features of the 3 decision tree are [0, 1, 5, 4]
The number of errors is 13 out of 150
The model has an accuracy of 91.33 %
features of the 1 decision tree are [0, 1, 6, 5]
features of the 2 decision tree are [0, 1, 7, 2]
features of the 3 decision tree are [0, 1, 2, 3]
The number of errors is 3 out of 150
The model has an accuracy of 98.0 %


After many trials, it was observed that to fit data mantaining the first and the second columns, the accuracy is never under 70%. The other columns are chosen randomly.

# Artificial Neural Network

## Useful functions

In [18]:
def relu(x):
        '''
        Helper function, used to compute ReLU.

        :param x: float or np.array or list
        :return: float in (0,+inf)
        '''
        if (type(x) != float) & (type(x) != int):
            return np.array([max(0,z) for z in x])
        return max(0,x)


def relu_derivative(x):
    '''
    Helper function, used to compute derivative of ReLU.

    :param x: float or np.array or list
    :return: float in (0,+inf)
    '''
    if (type(x) != float) & (type(x) != int):
        der = []
        for z in x:
            if z > 0:
                der.append(1)
            else:
                der.append(0)
    else:
        if x > 0:
            der = 1
        else:
            der = 0
    return der


def sigmoid(x):
    '''
    Helper function, used to compute sigmoid.

    :param x: float
    :return: float in (0,1)
    '''
    return 1 / (1 + np.exp(-x))



def sigmoid_derivative(x):
    '''
    Helper function, used to compute sigmoid derivative.

    :param x: float
    :return: float
    '''
    return sigmoid(x) * (1 - sigmoid(x))



def cross_entropy(y_true, y_pred):
    '''
    Helper function, used to compute cross entropy.

    :param y_true: bool, target
    :param y_pred: float, probability in (0,1)
    :return: float
    '''
    if y_true == y_pred:
        return 0
    # To avoid predictions 0 or 1
    y_pred = np.clip(y_pred, 1e-15, 1 - 1e-15)
    return - (y_true * np.log(y_pred)) - ((1-y_true) * np.log(1-y_pred))


def cross_entropy_derivative(y_true, y_pred):
    '''
    Helper function, used to compute derivative of cross entropy.

    :param y_true: bool, target
    :param y_pred: float, probability in (0,1)
    :return: float
    '''
    y_pred = np.clip(y_pred, 1e-15, 1 - 1e-15)
    return (y_pred -y_true) / (y_pred * (1-y_pred))


## Algorithm

In [19]:
class NeuralNetwork:
    '''
    Class which implements an artificial neural network.
    '''
    def __init__(self, input_size, hidden_sizes, output_size, learning_rate=0.01):
        '''
        Initialization of weights and biases.

        :param input_size: int, number of input layer nodes
        :param hidden_sizes: list, number of nodes for each hidden layer
        :param output_size: int, number of out layer nodes
        :param learning_rate: float, default learning rate is 0.01
        :return: None
        '''
        self.input_size = input_size
        self.hidden_sizes = hidden_sizes
        self.output_size = output_size
        self.learning_rate = learning_rate
        self.Z = []
        self.A = []
        self.predictions = []

        if len(self.hidden_sizes) == 1:
            # Initialization of weights
            np.random.seed(42)
            self.weights_input_hidden = np.random.uniform(-1, 1, (self.input_size, self.hidden_sizes[0]))
            self.weights_hidden_output = np.random.uniform(-1, 1, (self.hidden_sizes[0], self.output_size))
            # Initialization of biases
            self.bias_hidden = np.zeros(self.hidden_sizes[0])
            self.bias_output = np.zeros(self.output_size)

        else:
            # Initialization of weights
            np.random.seed(42)
            self.weights_input_hidden = np.random.uniform(-1, 1, (self.input_size, self.hidden_sizes[0]))
            self.weights_hidden_hidden = [np.random.uniform(-1, 1, (self.hidden_sizes[i], self.hidden_sizes[i+1])) for i in range(len(self.hidden_sizes)-1)]
            if self.output_size == 1:
                self.weights_hidden_output = np.random.uniform(-1, 1, self.hidden_sizes[-1])
            else:
                self.weights_hidden_output = np.random.uniform(-1, 1, (self.hidden_sizes[-1], self.output_size))
            # Initialization of biases
            self.bias_hidden = []
            for hid in self.hidden_sizes:
                self.bias_hidden.append(np.zeros(hid))
            self.bias_output = np.zeros(self.output_size)





    def _forward_pass(self, X):
        '''
        Helper function, used to do the forward pass.

        :param X: np.array, features
        :return: np.array, predicted probabilities
        '''
        # Initialization of output of hidden layes (z) and activations (a)
        self.Z = []
        self.A = []
        # Initialization of predictions
        self.predictions = []

        for i,x in enumerate(X):
            z = []
            a = []

            if len(self.hidden_sizes) == 1:
                # Calculation of hidden and output layers outputs
                z.append(np.dot(x, self.weights_input_hidden) + self.bias_hidden)
                a.append(relu(z[-1]))

                z.append(np.dot(a[-1], self.weights_hidden_output) + self.bias_output)
                a.append(sigmoid(z[-1]))


            else:
                z.append(np.dot(x, self.weights_input_hidden) + self.bias_hidden[0])
                a.append(relu(z[-1]))

                for layer in range(len(self.hidden_sizes)-1):
                    z.append(np.dot(a[-1], self.weights_hidden_hidden[layer]) + self.bias_hidden[layer+1])
                    a.append(relu(z[-1]))

                z.append(np.dot(a[-1], self.weights_hidden_output) + self.bias_output)
                a.append(sigmoid(z[-1]))

            self.Z.append(z)
            self.A.append(a)
            self.predictions.append(a[-1])
        return np.array(self.predictions)



    def _delta(self, y):
        '''
        Helper function, useed for computing the partial derivative of loss wrt the activation

        '''
        self.delta = []
        for index,y_true in enumerate(y):
            # L is the sum of hidden and output layers
            L = len(self.hidden_sizes) + 1

            # Initialization of delta
            d = [np.zeros(self.hidden_sizes[0])]
            if len(self.hidden_sizes) > 1:
                for s in (self.hidden_sizes[1:]):
                    d.append(np.zeros(s))
            d.append(np.zeros(self.output_size))

            # delta output [L-1]
            d[L-1] = (cross_entropy_derivative(y_true, self.A[index][L-1][0]))

            # delta hidden [L-2]
            d[L-2] = d[L-1] * sigmoid_derivative(self.Z[index][L-1][0]) * self.weights_hidden_output


            if L >= 3:
                # delta hidden [L-3,...,0]:
                for l in range(L-3, -1, -1):
                    for j in range(len(self.weights_hidden_hidden[l])):
                        d[l][j] = np.sum(list(d[l+1] * np.array(relu_derivative(self.Z[index][l+1])) * self.weights_hidden_hidden[l][j]))

            self.delta.append(d)


        return self.delta




    def _backward_pass(self, X, y):
        '''
        Helper function, used to perform the backward pass.

        :param X: np.array, features
        :param y: np.array, target
        :return: np.array, predicted probabilities
        '''
        sum_der_w = [np.zeros((self.input_size, self.hidden_sizes[0]))]
        if len(self.hidden_sizes) > 1:
            for s in range(len(self.hidden_sizes)-1):
                sum_der_w.append(np.zeros((self.hidden_sizes[s],self.hidden_sizes[s+1])))
        sum_der_w.append(np.zeros((self.hidden_sizes[-1], self.output_size)))
        sum_der_w = np.array(sum_der_w, dtype=object)

        sum_der_b = []
        for s in (self.hidden_sizes):
            sum_der_b.append(np.zeros(s))
        sum_der_b.append(np.zeros(self.output_size))
        sum_der_b = np.array(sum_der_b, dtype=object)

        for n,x in enumerate(X):
            L = len(self.hidden_sizes)

            # Partial derivatives of a wrt to z
            der_a_z = []
            for l in range(len(self.hidden_sizes)):
                der_a_z.append(np.array(relu_derivative(self.Z[n][l])))
            der_a_z.append(sigmoid_derivative(self.Z[n][L]))

            # Partial derivative of C wrt w
            der_C_w = []
            der_C_w.append(np.array([self.delta[n][0].flatten() * der_a_z[0] * xj for xj in x]))

            if L > 1:
                for l in range(1,L):
                    der_C_w.append(np.array([self.delta[n][l].flatten() * der_a_z[l] * aj  for aj in self.A[n][l-1]]))

            der_C_w.append(np.array([self.delta[n][L].flatten() * der_a_z[L] * aj  for aj in self.A[n][-2]]))
            der_C_w = np.array(der_C_w, dtype=object)

            sum_der_w = sum_der_w + der_C_w

            # Partial derivatives of C wrt b
            der_C_b = []


            for l in range(0,L):
                der_C_b.append(self.delta[n][l] * der_a_z[l])

            der_C_b.append(self.delta[n][L] * der_a_z[L] )
            der_C_b = np.array(der_C_b, dtype=object)


            # Addition of derivatives
            sum_der_b = sum_der_b + der_C_b




        # Update of weights
        self.weights_input_hidden -= self.learning_rate * sum_der_w[0]
        for hid in range(len(self.hidden_sizes)-1):
            self.weights_hidden_hidden[hid] -= self.learning_rate * sum_der_w[hid+1]
        if len(self.hidden_sizes) > 1:
          self.weights_hidden_output -= self.learning_rate * sum_der_w[-1].flatten()
        else:
          self.weights_hidden_output -= self.learning_rate * sum_der_w[-1]


        # Update of bias
        for hid in range(len(self.hidden_sizes)-1):
            self.bias_hidden[hid] -= self.learning_rate * sum_der_b[hid]
        self.bias_output -= self.learning_rate * sum_der_b[-1]



    def fit(self, X, y, epochs=1000):
        for epoch in range(epochs):
            self._forward_pass(X)
            self._delta(y)
            self._backward_pass(X, y)

            # Calculation of cross entropy
            loss = np.zeros(X.shape[0])
            for i,y_true,y_pred in zip(range(X.shape[0]), y, self.predictions):
                loss[i] = cross_entropy(y_true, y_pred)
            print(f'Epoch {epoch}, Loss: {np.mean(loss)}')

    def predict(self, X):
        output = self._forward_pass(X)
        y_pred = []
        for o in output:
            if o > 0.5:
                y_pred.append(1)
            else:
                y_pred.append(0)

        return y_pred


In [20]:
nn_model = NeuralNetwork(input_size=11, hidden_sizes=[2,3,4], output_size=1, learning_rate=0.001)
nn_model.fit(X_train, y_train, epochs=15)
y_pred = nn_model.predict(X_test)
accuracy(y_test, y_pred)

  loss[i] = cross_entropy(y_true, y_pred)


Epoch 0, Loss: 28.4903144980968
Epoch 1, Loss: 0.6142097063932765
Epoch 2, Loss: 0.5550810312717986
Epoch 3, Loss: 0.5118500806425518
Epoch 4, Loss: 0.47983780947032934
Epoch 5, Loss: 0.45579795019715497
Epoch 6, Loss: 0.4374915596018487
Epoch 7, Loss: 0.42336654317100275
Epoch 8, Loss: 0.41233528722177243
Epoch 9, Loss: 0.40362538562929334
Epoch 10, Loss: 0.3966804007752502
Epoch 11, Loss: 0.39109373691203475
Epoch 12, Loss: 0.3865642005603818
Epoch 13, Loss: 0.38286578509906294
Epoch 14, Loss: 0.37982685613823153
The number of errors is 21 out of 150
The model has an accuracy of 86.0 %
