# EKG Data Loading and Preprocessing

This notebook implements the mathematical expressions used to load and preprocess EKG/ECG data for training a backpropagation neural network (BPNN).

## Mathematical Formulations

### 1. Signal Normalization
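For a raw EKG sample $x_i$, min-max scaling maps the signal to the range $[0, 1]$, where $x_{min}$ and $x_{max}$ are the minimum and maximum of the signal: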

$$x_{norm} = \frac{x_i - x_{min}}{x_{max} - x_{min}}$$

### 2. Z-Score Normalization
Alternatively, standardize the signal to zero mean and unit standard deviation:

$$x_{std} = \frac{x_i - \mu}{\sigma}$$

where $\mu$ is the mean and $\sigma$ is the standard deviation.

### 3. Moving Average Filter
For noise reduction with window size $w$:

$$x_{filtered}[n] = \frac{1}{w} \sum_{k=0}^{w-1} x[n-k]$$
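
As a quick check of the moving-average formula, the sketch below evaluates it term by term on a five-sample toy signal. The helper name `causal_moving_average` and the choice to treat samples before the start of the signal as zero are assumptions made for illustration; they are not part of the notebook.

```python
import numpy as np

def causal_moving_average(x, w):
    """Evaluate x_filtered[n] = (1/w) * sum_{k=0}^{w-1} x[n-k] directly.

    Assumption: samples before the start of the signal (n - k < 0) count as zero.
    """
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    for n in range(len(x)):
        for k in range(w):
            if n - k >= 0:
                y[n] += x[n - k]
    return y / w

toy = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
smoothed = causal_moving_average(toy, w=3)
print(smoothed)  # the first two outputs are damped by the assumed zero history
```

From the third sample onward each output is the plain mean of the current sample and the two before it.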

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import signal

# Set random seed for reproducibility
np.random.seed(42)

## Data Loading Functions

In [None]:
def load_ekg_data(filepath):
    """
    Load EKG data from CSV file.
    
    Parameters:
    -----------
    filepath : str
        Path to the EKG data file
    
    Returns:
    --------
    data : numpy.ndarray
        Raw EKG data. A CSV file yields a 2-D array of shape
        (n_rows, n_columns); the synthetic fallback is a 1-D signal.
    """
    try:
        data = pd.read_csv(filepath)
        return data.values
    except FileNotFoundError:
        print(f"File {filepath} not found. Generating synthetic data.")
        return generate_synthetic_ekg()

def generate_synthetic_ekg(duration=10, sampling_rate=250):
    """
    Generate synthetic EKG data for testing.
    
    Parameters:
    -----------
    duration : float
        Duration in seconds
    sampling_rate : int
        Samples per second
    
    Returns:
    --------
    signal : numpy.ndarray
        Synthetic EKG signal
    """
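    # Sample at exactly 1/sampling_rate spacing; int() also allows a float duration
    n_samples = int(duration * sampling_rate)
    t = np.arange(n_samples) / sampling_rate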
    # Simulate heart rate at ~70 bpm
    heart_rate = 70 / 60  # Hz
    # Crude periodic stand-in for a heartbeat: fundamental + second harmonic
    # + Gaussian noise (not a realistic QRS morphology)
    signal_data = np.sin(2 * np.pi * heart_rate * t) + \
                  0.3 * np.sin(4 * np.pi * heart_rate * t) + \
                  0.1 * np.random.randn(len(t))
    return signal_data
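
Because `pd.read_csv(...).values` returns a 2-D array with one column per CSV field while the synthetic fallback is 1-D, it can help to pull a single lead out explicitly. The following is a minimal sketch of that step; the column names `time` and `ekg` and the in-memory CSV text are hypothetical stand-ins for a real recording.

```python
import io

import pandas as pd

# Hypothetical CSV content; a real file would be passed by path instead.
csv_text = "time,ekg\n0.000,0.10\n0.004,0.35\n0.008,0.90\n0.012,0.20\n"

frame = pd.read_csv(io.StringIO(csv_text))
all_columns = frame.values                    # shape (4, 2): every column
ekg_1d = frame["ekg"].to_numpy(dtype=float)   # shape (4,): one lead only

print(all_columns.shape, ekg_1d.shape)
```

Selecting the column by name keeps the downstream filters and windowing working on a 1-D array regardless of how many fields the file contains.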

## Preprocessing Functions

### Min-Max Normalization Implementation

In [None]:
def normalize_minmax(data):
    """
    Apply min-max normalization: x_norm = (x - x_min) / (x_max - x_min)
    
    Parameters:
    -----------
    data : numpy.ndarray
        Input signal data
    
    Returns:
    --------
    normalized : numpy.ndarray
        Normalized signal in range [0, 1]
    """
    x_min = np.min(data)
    x_max = np.max(data)
    if x_max - x_min == 0:
        return np.zeros_like(data)
    return (data - x_min) / (x_max - x_min)

def normalize_zscore(data):
    """
    Apply z-score normalization: x_std = (x - μ) / σ
    
    Parameters:
    -----------
    data : numpy.ndarray
        Input signal data
    
    Returns:
    --------
    standardized : numpy.ndarray
        Standardized signal with mean=0, std=1
    """
    mu = np.mean(data)
    sigma = np.std(data)
    if sigma == 0:
        return np.zeros_like(data)
    return (data - mu) / sigma
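
To make the two normalizations concrete, here is a small worked example on a four-sample array. The values are arbitrary and chosen only so the arithmetic is easy to follow by hand.

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])

# Min-max scaling: (x - x_min) / (x_max - x_min)
x_norm = (x - x.min()) / (x.max() - x.min())

# Z-score: (x - mu) / sigma, using the population standard deviation (np.std default)
x_std = (x - x.mean()) / x.std()

print(x_norm)                      # spans 0 to 1
print(x_std.mean(), x_std.std())   # approximately 0 and 1
```

Min-max scaling is sensitive to a single outlier because it depends only on the extremes, whereas the z-score depends on the mean and spread of all samples.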

### Signal Filtering

In [None]:
def apply_moving_average(data, window_size=5):
    """
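    Apply a moving average filter with window size w.

    Note: np.convolve(..., mode='same') centers the window on each sample
    rather than using the causal form x[n-k], and it zero-pads the edges,
    so roughly the first and last w/2 samples are biased toward zero.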
    
    Parameters:
    -----------
    data : numpy.ndarray
        Input signal data
    window_size : int
        Size of the moving average window
    
    Returns:
    --------
    filtered : numpy.ndarray
        Filtered signal
    """
    return np.convolve(data, np.ones(window_size)/window_size, mode='same')

def apply_bandpass_filter(data, lowcut=0.5, highcut=40, fs=250, order=4):
    """
    Apply Butterworth bandpass filter for EKG signal.
    
    Parameters:
    -----------
    data : numpy.ndarray
        Input signal data
    lowcut : float
        Low cutoff frequency (Hz)
    highcut : float
        High cutoff frequency (Hz)
    fs : int
        Sampling frequency (Hz)
    order : int
        Filter order
    
    Returns:
    --------
    filtered : numpy.ndarray
        Filtered signal
    """
    nyquist = 0.5 * fs
    low = lowcut / nyquist
    high = highcut / nyquist
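    # Cutoffs are passed to butter() as fractions of the Nyquist frequency
    b, a = signal.butter(order, [low, high], btype='band')
    # filtfilt runs the filter forward and backward, giving zero phase distortion
    return signal.filtfilt(b, a, data)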
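
One way to sanity-check the bandpass settings is to filter a test signal with one component inside the 0.5 to 40 Hz passband and one outside it. The sketch below assumes a 10 Hz and a 100 Hz sinusoid as test tones and a one-second margin at each edge; these choices are illustrative, not taken from the notebook.

```python
import numpy as np
from scipy import signal

fs = 250                                   # sampling frequency (Hz), the notebook default
t = np.arange(5 * fs) / fs                 # 5 seconds of samples
in_band = np.sin(2 * np.pi * 10 * t)       # 10 Hz: inside the 0.5-40 Hz passband
out_band = np.sin(2 * np.pi * 100 * t)     # 100 Hz: above the passband
mixed = in_band + out_band

nyquist = 0.5 * fs
b, a = signal.butter(4, [0.5 / nyquist, 40 / nyquist], btype='band')
filtered = signal.filtfilt(b, a, mixed)

# Compare against the 10 Hz component, ignoring one second at each edge
core = slice(fs, -fs)
residual_rms = np.sqrt(np.mean((filtered[core] - in_band[core]) ** 2))
print(f"RMS difference from the 10 Hz component: {residual_rms:.4f}")
```

A small residual indicates that the 100 Hz tone was suppressed while the 10 Hz tone passed through largely unchanged.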

## Feature Extraction

### Heart Rate Variability (HRV) Metrics

#### SDNN (Standard Deviation of NN intervals)
$$SDNN = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (RR_i - \overline{RR})^2}$$

#### RMSSD (Root Mean Square of Successive Differences)
$$RMSSD = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N-1} (RR_{i+1} - RR_i)^2}$$

In [None]:
def calculate_sdnn(rr_intervals):
    """
    Calculate SDNN: sqrt((1/(N-1)) * Σ(RR_i - RR_mean)²)
    
    Parameters:
    -----------
    rr_intervals : numpy.ndarray
        R-R intervals in milliseconds
    
    Returns:
    --------
    sdnn : float
        SDNN value in milliseconds
    """
    return np.std(rr_intervals, ddof=1)

def calculate_rmssd(rr_intervals):
    """
    Calculate RMSSD: sqrt((1/(N-1)) * Σ(RR_{i+1} - RR_i)²)
    
    Parameters:
    -----------
    rr_intervals : numpy.ndarray
        R-R intervals in milliseconds
    
    Returns:
    --------
    rmssd : float
        RMSSD value in milliseconds
    """
    successive_diffs = np.diff(rr_intervals)
    return np.sqrt(np.mean(successive_diffs**2))
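
The two HRV formulas can be checked by hand on a short series. The R-R intervals below are made-up illustrative values, not measurements.

```python
import numpy as np

rr = np.array([800.0, 810.0, 790.0, 805.0, 795.0])  # illustrative R-R intervals (ms)

# SDNN: sample standard deviation (divides by N - 1)
sdnn = np.sqrt(np.sum((rr - rr.mean()) ** 2) / (len(rr) - 1))

# RMSSD: root mean square of the N - 1 successive differences
diffs = np.diff(rr)
rmssd = np.sqrt(np.sum(diffs ** 2) / (len(rr) - 1))

print(f"SDNN  = {sdnn:.2f} ms")   # sqrt(250 / 4)
print(f"RMSSD = {rmssd:.2f} ms")  # sqrt(825 / 4)
```

SDNN reflects overall spread around the mean interval, while RMSSD responds only to beat-to-beat changes, so the two can differ substantially on the same series.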

## Data Segmentation for BPNN Input

### Window-based Segmentation
For a signal of length $T$ samples with window size $w$ and stride $s$, the number of complete windows (for $T \ge w$) is:

$$N_{windows} = \left\lfloor \frac{T - w}{s} \right\rfloor + 1$$

In [None]:
def create_windows(data, window_size=250, stride=125):
    """
    Create overlapping windows from signal data.
    N_windows = floor((T - w) / s) + 1
    
    Parameters:
    -----------
    data : numpy.ndarray
        Input signal data
    window_size : int
        Size of each window
    stride : int
        Step size between windows
    
    Returns:
    --------
    windows : numpy.ndarray
        Array of windowed segments, shape (n_windows, window_size)
    """
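    if len(data) < window_size:
        # Signal shorter than one window: return an empty (0, window_size) array
        return np.zeros((0, window_size))
    n_windows = (len(data) - window_size) // stride + 1
    windows = np.zeros((n_windows, window_size))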
    
    for i in range(n_windows):
        start_idx = i * stride
        end_idx = start_idx + window_size
        windows[i] = data[start_idx:end_idx]
    
    return windows
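
As a worked instance of the window-count formula, the sketch below uses the notebook's default sizes on a 10-second signal at 250 Hz and builds the same windows without an explicit loop. Using `sliding_window_view` here is an alternative technique, not the notebook's method, and the ramp signal is a placeholder.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20

T, w, s = 2500, 250, 125
data = np.arange(T, dtype=float)          # placeholder signal: sample index as value

expected = (T - w) // s + 1               # floor((2500 - 250) / 125) + 1
windows = sliding_window_view(data, w)[::s]

print(expected, windows.shape)
# Each window starts s samples after the previous one
print(windows[:3, 0])
```

The result is a read-only view into `data`, so call `.copy()` on it before modifying any window in place.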

## Example Usage

In [None]:
# Generate or load EKG data
ekg_signal = generate_synthetic_ekg(duration=10, sampling_rate=250)

# Apply preprocessing
filtered_signal = apply_bandpass_filter(ekg_signal)
normalized_signal = normalize_minmax(filtered_signal)

# Create windows for BPNN input
windows = create_windows(normalized_signal, window_size=250, stride=125)

print(f"Original signal shape: {ekg_signal.shape}")
print(f"Windowed data shape: {windows.shape}")
print(f"Number of windows: {windows.shape[0]}")
print(f"Window size: {windows.shape[1]}")

# Visualize
plt.figure(figsize=(15, 8))

plt.subplot(3, 1, 1)
plt.plot(ekg_signal[:1000])
plt.title('Raw EKG Signal')
plt.ylabel('Amplitude')

plt.subplot(3, 1, 2)
plt.plot(filtered_signal[:1000])
plt.title('Filtered EKG Signal')
plt.ylabel('Amplitude')

plt.subplot(3, 1, 3)
plt.plot(normalized_signal[:1000])
plt.title('Normalized EKG Signal')
plt.xlabel('Sample')
plt.ylabel('Normalized Amplitude')

plt.tight_layout()
plt.show()