# Detecting electrode inversion in an ECG


The ECG is a time series that measures the electrical activity of the heart. It is the main tool for diagnosing heart disease. Recording an ECG is simple: 3 electrodes are placed at the ends of the limbs, and 6 on the anterior chest. This generates **12 time series**, called leads, each corresponding to a difference in potential between a pair of electrodes.

The position of the electrodes is essential to interpreting the ECG correctly. Inverting electrodes compromises interpretation, either because the leads do not explore the expected area (errors in the measurement of hypertrophy indices, in the analysis of the ST segment), or because they generate false abnormalities (fake Q waves, errors in the heart's axis...).

Inversion errors are frequent (5% of ECGs), and only experts (cardiologists) manage to detect them. Yet most ECGs are not interpreted by experts: only 30% are, the rest being interpreted by nurses or general practitioners. An algorithm for automatic detection of electrode inversion is therefore paramount to the correct interpretation of ECGs and would improve the quality of diagnosis.

The aim of this project is to detect electrode inversion in an ECG. The dataset at your disposal contains ECGs from a cardiology center. **An ECG will be labeled as correctly recorded (0) or as inverted (1).**
The goal is to perform **binary classification** on these ECGs.

![image.png](attachment:image.png)

## Inversions

Inversions do not necessarily correspond to the inversion of only 2 leads:
* The precordial leads (V1, ..., V6) can be reversed: V1 becomes V6, V2 becomes V5...
* 2 electrodes can be exchanged, which modifies several of the leads ML1, ML2, ML3, AVF, AVR, AVL. For instance, if the right-arm and left-arm electrodes are swapped, then ML1 becomes -ML1, ML2 and ML3 are swapped, AVL and AVR are swapped, and AVF remains the same. More details here: https://litfl.com/ecg-limb-lead-reversal-ecg-library/
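
The two kinds of inversion listed above can be simulated on a single ECG array. The sketch below is only an illustration: the column order in `LEADS` is a hypothetical assumption (the lead order of the dataset is not stated here), and both function names are made up for this example.

```python
import numpy as np

# Hypothetical column order: the actual lead order of the dataset is not stated.
LEADS = ["ML1", "ML2", "ML3", "AVR", "AVL", "AVF",
         "V1", "V2", "V3", "V4", "V5", "V6"]
IDX = {name: i for i, name in enumerate(LEADS)}


def swap_arm_electrodes(ecg):
    """Simulate a right-arm/left-arm electrode swap on an array (n_samples, 12)."""
    out = ecg.copy()
    out[:, IDX["ML1"]] = -ecg[:, IDX["ML1"]]   # ML1 becomes -ML1
    out[:, IDX["ML2"]] = ecg[:, IDX["ML3"]]    # ML2 and ML3 are swapped
    out[:, IDX["ML3"]] = ecg[:, IDX["ML2"]]
    out[:, IDX["AVR"]] = ecg[:, IDX["AVL"]]    # AVR and AVL are swapped
    out[:, IDX["AVL"]] = ecg[:, IDX["AVR"]]
    return out                                 # AVF and V1..V6 are unchanged


def reverse_precordial_leads(ecg):
    """Simulate a precordial reversal: V1 becomes V6, V2 becomes V5, and so on."""
    out = ecg.copy()
    first, last = IDX["V1"], IDX["V6"]
    out[:, first:last + 1] = ecg[:, first:last + 1][:, ::-1]
    return out
```

Applying either function twice returns the original array, which is a quick sanity check that each one is a pure permutation (with one sign flip) of the leads.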


## Data

Data is available at the following link:
https://drive.google.com/file/d/1tdjqbkRNqxfDdNbb6pXiX8DvCaFlMOjd/view?usp=sharing

In the archive, you will find:
* input_training.npy
* output_training.npy
* input_test.npy

The training data contains 1400 ECGs and their labels. For each ECG, the data consists of **10 seconds** of recording for **12 leads**, each sampled at **250Hz**.

The testing data contains 2630 ECGs. At the end of the homework, you will submit your predictions for them as a numpy array of shape (2630,).

Each input file therefore contains the ECG signal in the form **(n_ecgs, n_samples=2500, n_leads=12)**.
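
The number of samples follows from the recording length and the sampling rate: 10 s × 250 Hz = 2500. A minimal shape check, run here on a synthetic array rather than on the real files, could look like this. The function name `check_ecg_batch` is hypothetical.

```python
import numpy as np

SAMPLING_RATE_HZ = 250
DURATION_S = 10
N_LEADS = 12
N_SAMPLES = SAMPLING_RATE_HZ * DURATION_S  # 10 s at 250 Hz


def check_ecg_batch(batch):
    """Return the number of ECGs if batch is (n_ecgs, N_SAMPLES, N_LEADS), else raise."""
    if batch.ndim != 3 or batch.shape[1:] != (N_SAMPLES, N_LEADS):
        raise ValueError(
            f"expected (n_ecgs, {N_SAMPLES}, {N_LEADS}), got {batch.shape}"
        )
    return batch.shape[0]


# Synthetic stand-in for a loaded file; the real data is not touched here.
dummy_batch = np.zeros((4, N_SAMPLES, N_LEADS))
n_ecgs = check_ecg_batch(dummy_batch)
```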

## Code

You are free to choose the libraries you use for your implementations.
Use of Keras is preferred for conciseness, but you can use a different DL library if you are unfamiliar with Keras. This will not affect evaluation of the notebook.

## Evaluation

We will use **accuracy** as a metric to evaluate your predictions on the test set.
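
For reference, accuracy is the fraction of ECGs whose predicted label matches the true label. A minimal sketch on toy arrays (the function name and the toy values are illustrative only):

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError(f"shape mismatch: {y_true.shape} vs {y_pred.shape}")
    return float(np.mean(y_true == y_pred))


# Toy example: 4 of the 5 predictions match the labels.
toy_labels = np.array([0, 1, 1, 0, 1])
toy_preds = np.array([0, 1, 0, 0, 1])
toy_accuracy = accuracy(toy_labels, toy_preds)
```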


## Objective

The key objective of this homework is to propose a **deep learning model** relevant to the task that detects lead inversion with good accuracy. A strong notebook should be readable and reproducible, with clean code.

Please send back:
- a Jupyter Notebook explaining your process and commenting your results,
- a .npy file containing the predictions on the test set.

In [None]:
!pip install ecg_plot



In [3]:
!pip install keras-tuner -q


In [12]:
import numpy as np
#import ecg_plot
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from tensorflow import keras
from sklearn.decomposition import PCA
from scipy import signal
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense
from keras_tuner import RandomSearch


## 1. Load and analyze the data

In [5]:
data = np.load('data/input_training.npy')
labels = np.load('data/output_training.npy')


Data preprocessing



In [6]:
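def data_preprocessing(raw_data):
  """Transpose every ECG so the batch has shape (n_ecgs, 2500, 12).

  raw_data is expected as (n_ecgs, 12, 2500), as implied by the per-ECG transpose.
  """
  # Swap the last two axes in one call instead of looping over the ECGs;
  # the float64 copy matches the buffer the loop version filled.
  return np.ascontiguousarray(np.transpose(raw_data, (0, 2, 1)), dtype=np.float64)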


In [7]:
X = data_preprocessing(data)
x_train,x_test,y_train,y_test = train_test_split(X,labels, test_size=0.20, random_state=42)

In [8]:
x_train.shape

(1120, 2500, 12)

## 2. Classification using the raw signal
The goal here is to perform classification directly on the raw data.

### a) Which variant of neural networks would be more adequate for the task? (RNN, CNN, DNN...)



*Your answer: I chose a CNN architecture because the goal is to extract features from each lead. If we treat a lead as an image-like signal, convolutional layers are a natural way to extract features from it.*

### b) Train and evaluate a classifier using the raw signal
Train and evaluate the method of your choice using only the signal from the training set.

We expect:
- a simple architecture relevant for the task
- a model converging without overfitting
- high performance on the testing set

Note: An accuracy of 85% on the testing set is not difficult to reach

**Model building and hyperparameter optimization**

In [10]:
x_train_search,x_val,y_train_seach,y_val =  train_test_split(x_train,y_train, test_size=0.20, random_state=40)

In [9]:
def build_model(hp):
    inputlayer = keras.layers.Input(shape=(2500,12))

    conv1 = keras.layers.Conv1D(filters=hp.Int('conv_1_filter', min_value=8, max_value=200, step=10), kernel_size=hp.Int('conv_1_kernel', min_value=5, max_value=30, step=10), padding='same')(inputlayer)
    conv1 = keras.layers.BatchNormalization()(conv1)
    conv1 = keras.layers.Activation(activation='relu')(conv1)
    conv1 = keras.layers.SpatialDropout1D(0.1)(conv1)

    conv2 = keras.layers.Conv1D(filters=hp.Int('conv_2_filter', min_value=16, max_value=300, step=10), kernel_size=hp.Int('conv_2_kernel', min_value=5, max_value=30, step=5), padding='same')(conv1)
    conv2 = keras.layers.BatchNormalization()(conv2)
    conv2 = keras.layers.Activation('relu')(conv2)
    conv2 = keras.layers.SpatialDropout1D(0.1)(conv2)

    conv3 = keras.layers.Conv1D(filters=hp.Int('conv_3_filter', min_value=12, max_value=600, step=10), kernel_size=hp.Int('conv_3_kernel', min_value=5, max_value=30, step=5),padding='same')(conv2)
    conv3 = keras.layers.BatchNormalization()(conv3)
    conv3 = keras.layers.Activation('relu')(conv3)
    conv3 = keras.layers.Dropout(0.2)(conv3)

    gap_layer = keras.layers.GlobalAveragePooling1D()(conv3)


    output_layer = tf.keras.layers.Dense(units=1,activation='sigmoid', name='output_layer')(gap_layer)

    model = keras.Model(inputs=inputlayer, outputs=output_layer)

    model.compile(loss=tf.keras.losses.BinaryCrossentropy(), optimizer=tf.keras.optimizers.Adam(),
    metrics=[tf.keras.metrics.BinaryAccuracy(name='accuracy', dtype=None, threshold=0.5)])
    return model

In [14]:
tuner = RandomSearch(build_model,
                    objective='val_accuracy',
                    max_trials = 5)
tuner.search(x_train_search,y_train_seach,epochs=100,validation_data=(x_val,y_val))

Trial 5 Complete [00h 05m 24s]
val_accuracy: 0.9464285969734192

Best val_accuracy So Far: 0.9553571343421936
Total elapsed time: 00h 26m 00s


Model training

In [15]:
my_model=tuner.get_best_models(num_models=1)[0]
my_model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 2500, 12)]        0         
                                                                 
 conv1d (Conv1D)             (None, 2500, 68)          12308     
                                                                 
 batch_normalization (Batch  (None, 2500, 68)          272       
 Normalization)                                                  
                                                                 
 activation (Activation)     (None, 2500, 68)          0         
                                                                 
 spatial_dropout1d (Spatial  (None, 2500, 68)          0         
 Dropout1D)                                                      
                                                                 
 conv1d_1 (Conv1D)           (None, 2500, 266)         362026
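
The parameter counts in the summary can be checked by hand. A `Conv1D` layer has `kernel_size * in_channels * filters` weights plus one bias per filter, and a `BatchNormalization` layer stores 4 values per channel. The kernel sizes used below (15 and 20) are not printed in the summary; they are inferred here because they are the values that reproduce the reported counts.

```python
def conv1d_param_count(kernel_size, in_channels, filters):
    """Weights (kernel_size * in_channels * filters) plus one bias per filter."""
    return kernel_size * in_channels * filters + filters


def batchnorm_param_count(channels):
    """gamma, beta, moving mean and moving variance: 4 values per channel."""
    return 4 * channels


# Kernel sizes 15 and 20 are inferred from the counts, not read from the summary.
first_conv = conv1d_param_count(kernel_size=15, in_channels=12, filters=68)
first_bn = batchnorm_param_count(channels=68)
second_conv = conv1d_param_count(kernel_size=20, in_channels=68, filters=266)

print(first_conv, first_bn, second_conv)  # 12308 272 362026
```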

In [16]:
#training
epochs = 100
batch_size = 32
callbacks = [
    keras.callbacks.ModelCheckpoint(
        "best_model.h5", save_best_only=True, monitor="val_loss"
    ),
    keras.callbacks.ReduceLROnPlateau(
        monitor="val_loss", factor=0.5, patience=20, min_lr=0.0001
    ),
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=50, verbose=1),
]
history = my_model.fit(
    x_train,
    y_train,
    batch_size=batch_size,
    epochs=epochs,
    callbacks=callbacks,
    validation_split=0.2,
    verbose=1,
)

Epoch 1/100
...
Epoch 79/100
...
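
To see what the `ReduceLROnPlateau` settings above allow, the sketch below lists the learning rates the callback can step through. It assumes the optimizer starts at 0.001, the Keras default for `Adam()`, and that each step multiplies the rate by `factor` and clamps it at `min_lr`. The function name is hypothetical.

```python
def plateau_lr_values(initial_lr, factor, min_lr):
    """List successive learning rates: multiply by factor, never go below min_lr."""
    values = [initial_lr]
    while values[-1] > min_lr:
        values.append(max(values[-1] * factor, min_lr))
    return values


# Assumption: Adam() starts at its default learning rate of 0.001.
lrs = plateau_lr_values(initial_lr=1e-3, factor=0.5, min_lr=1e-4)
print(lrs)
```

With these settings there are four possible reductions before the floor of 0.0001 is reached, and each one requires 20 consecutive epochs without improvement in `val_loss`.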

Model evaluation

Evaluation on the held-out split (x_test, y_test)

In [32]:
model = keras.models.load_model("best_model.h5")
test_loss, test_acc = model.evaluate(x_test, y_test)
print("Test accuracy", test_acc)

Test accuracy 0.9607142806053162
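
The reported accuracy can be converted back into a number of errors. The sketch assumes the held-out split contains 280 ECGs, which is 20% of the 1400 training ECGs and is consistent with the 1120 ECGs left in `x_train`.

```python
# Assumption: the held-out split holds 280 ECGs (20% of the 1400 training ECGs).
N_HELD_OUT = 280
REPORTED_ACCURACY = 0.9607142806053162  # value printed by model.evaluate

n_correct = round(REPORTED_ACCURACY * N_HELD_OUT)
n_errors = N_HELD_OUT - n_correct
print(f"{n_correct} correct, {n_errors} misclassified out of {N_HELD_OUT}")
```

Under that assumption the model misclassifies 11 of the 280 held-out ECGs.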


### c) What would you explore to improve your results?

*Your answer: To improve my results, I would look for a data pre-processing step that makes electrode inversion easier for the model to detect, and I would also test other architectures based on RNNs or DNNs.*
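
As an illustration of what such a pre-processing step could look like, the sketch below standardizes each lead of each ECG along the time axis. This is only one possible choice and has not been tested in this notebook; the function name and the `eps` parameter are hypothetical.

```python
import numpy as np


def standardize_per_lead(batch, eps=1e-8):
    """Z-score each lead of each ECG along time; batch is (n_ecgs, n_samples, n_leads)."""
    mean = batch.mean(axis=1, keepdims=True)
    std = batch.std(axis=1, keepdims=True)
    # eps avoids a division by zero on a flat (constant) lead.
    return (batch - mean) / (std + eps)


# Synthetic stand-in for a batch of ECGs, with an arbitrary offset and scale.
rng = np.random.default_rng(0)
toy_batch = 3.0 + 5.0 * rng.normal(size=(2, 2500, 12))
toy_standardized = standardize_per_lead(toy_batch)
```

Standardization rescales each lead but keeps its polarity, so a sign flip such as ML1 becoming -ML1 is still visible after this step.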

## 3. Prediction on the test set
Use the output model of section 2 to make predictions on the testing set.

**Save your predictions in a file predictions.npy that you will send along with your notebook.**

The expected format is a binary array of shape (n_ecgs=2630,) where each value corresponds to the prediction on the corresponding ECG of the test set.
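
Before saving, it can help to verify that the array really has the expected format. The sketch below thresholds sigmoid outputs at 0.5 and checks the shape and the values; both function names are hypothetical and the probabilities are toy values, not model outputs.

```python
import numpy as np


def to_binary_predictions(probabilities, threshold=0.5):
    """Turn sigmoid outputs of shape (n,) or (n, 1) into a 0/1 array of shape (n,)."""
    probs = np.asarray(probabilities, dtype=float)
    if probs.ndim == 2 and probs.shape[1] == 1:
        probs = probs[:, 0]
    if probs.ndim != 1:
        raise ValueError(f"expected shape (n,) or (n, 1), got {probs.shape}")
    return (probs > threshold).astype(np.int64)


def check_submission(predictions, expected_n):
    """Raise if predictions is not a binary array of shape (expected_n,)."""
    predictions = np.asarray(predictions)
    if predictions.shape != (expected_n,):
        raise ValueError(f"expected shape ({expected_n},), got {predictions.shape}")
    if not np.isin(predictions, (0, 1)).all():
        raise ValueError("predictions must contain only 0 and 1")


# Toy probabilities standing in for model.predict output of shape (n, 1).
toy_probabilities = np.array([[0.1], [0.5], [0.9]])
toy_predictions = to_binary_predictions(toy_probabilities)
check_submission(toy_predictions, expected_n=3)
```

For the real test set, `expected_n` would be 2630.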

In [31]:
#prediction on raw data

def predict(raw_data):
  # Bring the raw ECGs to the (n_ecgs, 2500, 12) layout expected by the model
  eval_set = data_preprocessing(raw_data)
  # Load the checkpoint saved during training; predict returns probabilities of shape (n_ecgs, 1)
  model = keras.models.load_model("best_model.h5")
  predictions = model.predict(eval_set)
  # Threshold at 0.5: a probability above 0.5 means inverted (1), otherwise correct (0)
  real_predictions = (predictions[:, 0] > 0.5).astype(np.float64)

  return real_predictions


In [29]:
def predict_and_score_on_raw_data(raw_data,raw_data_labels):
  # Bring the raw ECGs to the (n_ecgs, 2500, 12) layout expected by the model
  eval_set = data_preprocessing(raw_data)
  # Load the checkpoint saved during training; predict returns probabilities of shape (n_ecgs, 1)
  model = keras.models.load_model("best_model.h5")
  predictions = model.predict(eval_set)
  # Threshold at 0.5: a probability above 0.5 means inverted (1), otherwise correct (0)
  real_predictions = (predictions[:, 0] > 0.5).astype(np.float64)
  # Score against the labels passed as argument, not the global training labels
  print(f"accuracy:{np.count_nonzero(real_predictions==raw_data_labels)/len(raw_data_labels)}")
  return real_predictions

In [22]:
evaluation_set = np.load('candidate_files/input_test_set.npy')
predictions = predict(evaluation_set)




In [23]:
np.save("predictions.npy", predictions)