import os
import pandas as pd
from scipy.signal import butter, filtfilt

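def butter_bandpass_filter(data, high_cutoff, sampling_freq):
    # Despite its name, this is a 4th-order Butterworth low-pass filter:
    # it attenuates content above high_cutoff (in Hz).
    nyquist_freq = 0.5 * sampling_freq
    high_cutoff_norm = high_cutoff / nyquist_freq  # cutoff as a fraction of Nyquist
    b, a = butter(N=4, Wn=high_cutoff_norm, btype='lowpass')
    # filtfilt runs the filter forward and backward, so there is no phase shift
    filtered_data = filtfilt(b, a, data)
    return filtered_data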

def apply_moving_average(data, window_size):
    return data.rolling(window=window_size, min_periods=1).mean()

# Define the parameters
sampling_freq = 250  # sampling frequency (Hz) of the IIST BCI dataset
high_cutoff = 50     # low-pass cutoff frequency (Hz)
window_size = 5      # moving average window length (samples)

# Input and output directories
input_dir = "C:/Users/sgt17/Desktop/Marathi-English Vocal/MUMMY_CSV_Files"  # Path to the folder containing input CSV files
output_dir = "C:/Users/sgt17/Desktop/Marathi-English Vocal/MUMMY_A-N_Filtered"  # Path to the folder where filtered CSV files will be saved

# Create the output directory if it doesn't exist
os.makedirs(output_dir, exist_ok=True)

# Process each CSV file in the input directory
for filename in os.listdir(input_dir):
    if filename.endswith('.csv'):
        input_path = os.path.join(input_dir, filename)
        output_path = os.path.join(output_dir, filename)
        
        # Load EEG data from CSV file
        eeg_data = pd.read_csv(input_path)
        
        # Drop the 1st and 10th columns (indices 0 and 9), which are not EEG data
        eeg_data = eeg_data.drop(columns=eeg_data.columns[[0, 9]])
        
        # Apply the low-pass filter to attenuate frequencies above 50 Hz
        for col in eeg_data.columns:
            eeg_data[col] = butter_bandpass_filter(eeg_data[col], high_cutoff, sampling_freq)
        
        # Apply the moving average filter to smooth the signal
        for col in eeg_data.columns:
            eeg_data[col] = apply_moving_average(eeg_data[col], window_size)
        
        # Save the filtered EEG data to a new CSV file
        eeg_data.to_csv(output_path, index=False)


# Explanation

The goal is to remove noise and artifacts from the EEG signals using a Butterworth low-pass filter and a moving average filter.

Here's an explanation of the script:

1. Importing Libraries: The script imports the necessary libraries: `os` for file operations, `pandas` for data manipulation, and the `butter` and `filtfilt` functions from `scipy.signal` for designing and applying the Butterworth filter.

2. butter_bandpass_filter Function: Despite its name, this function implements a fourth-order Butterworth low-pass filter that attenuates frequencies above the specified cutoff (`high_cutoff`). It takes the EEG data, cutoff frequency, and sampling frequency as input, applies the filter with `filtfilt` (forward and backward, so no phase shift is introduced), and returns the filtered data.

3. apply_moving_average Function: This function applies a moving average filter to smooth the EEG signals and suppress residual high-frequency fluctuations. It takes the EEG data and the window size (in samples) as input and returns the smoothed data.

4. Define Parameters: The script defines parameters such as the sampling frequency (`sampling_freq`), high cutoff frequency (`high_cutoff`), and window size for the moving average filter (`window_size`).

5. Input and Output Directories: Paths to the input directory containing the raw CSV files and the output directory where filtered CSV files will be saved are defined.

6. Processing CSV Files: The script iterates through each CSV file in the input directory. For each file:
   - EEG data is loaded from the CSV file.
   - Columns that are not EEG data (e.g., timestamps) are dropped.
   - The Butterworth low-pass filter is applied to each EEG channel to remove high-frequency noise.
   - The moving average filter is applied to each channel to smooth the signal further.
   - The filtered EEG data is saved to a new CSV file in the output directory.

By applying these filters, the script aims to preprocess the EEG data and enhance its quality for further analysis, such as feature extraction and classification.
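To see what the two filtering steps do, here is a minimal sketch that runs them on a synthetic signal instead of a real recording. The test signal (a 10 Hz sine plus an 80 Hz sine) and the helper `amplitude_at` are assumptions made for illustration; only the filter settings (4th order, 50 Hz cutoff, 250 Hz sampling rate, window of 5) come from the script.

```python
import numpy as np
import pandas as pd
from scipy.signal import butter, filtfilt

fs = 250                     # sampling frequency (Hz), as in the script
t = np.arange(4 * fs) / fs   # 4 seconds of samples (1000 points)

# Hypothetical test signal: a 10 Hz component plus 80 Hz "noise"
raw = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 80 * t)

# Same filter design as butter_bandpass_filter: 4th-order low-pass at 50 Hz
b, a = butter(N=4, Wn=50 / (0.5 * fs), btype='lowpass')
lowpassed = filtfilt(b, a, raw)

# Same smoothing as apply_moving_average: trailing window of 5 samples
smoothed = pd.Series(lowpassed).rolling(window=5, min_periods=1).mean().to_numpy()

def amplitude_at(signal, freq_hz):
    """Amplitude of the sinusoidal component nearest freq_hz, from the FFT."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
    k = np.argmin(np.abs(freqs - freq_hz))
    return 2 * np.abs(spectrum[k]) / len(signal)

stages = [("raw", raw), ("low-pass", lowpassed), ("low-pass + moving average", smoothed)]
for name, sig in stages:
    print(f"{name}: 10 Hz = {amplitude_at(sig, 10):.3f}, 80 Hz = {amplitude_at(sig, 80):.3f}")
```

In this sketch the low-pass step leaves the 10 Hz component essentially unchanged and strongly attenuates the 80 Hz component. The moving average then also reduces the 10 Hz amplitude slightly, which is worth keeping in mind when the frequencies of interest lie well below the cutoff.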

# Information on Filters

1) A Butterworth low-pass filter attenuates the higher-frequency components of a signal while passing the lower-frequency components with minimal distortion. Its magnitude response is maximally flat in the passband.
   
2) The moving average filter is a simple technique used to smooth out time-series data by averaging neighboring data points within a specified window.

3) When we apply a filter to EEG data, we apply it directly to the time-domain voltage values. Filtering is equivalent to convolving the time-domain signal with the filter's impulse response, which attenuates certain frequency components. The library handles this internally, so we don't need to explicitly convert the data to the frequency domain.
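The link between time-domain averaging and frequency content can be checked numerically. The sketch below uses random data as a hypothetical stand-in for one EEG channel. It shows that the pandas rolling mean used in the script matches an explicit convolution with a 5-sample averaging kernel, and then evaluates that kernel's gain at a few frequencies with `scipy.signal.freqz`. The variable names and the chosen frequencies are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from scipy.signal import freqz

fs = 250          # sampling frequency (Hz), as in the script
window_size = 5   # moving average window (samples), as in the script

# Impulse response of the moving average: five equal weights that sum to 1
kernel = np.ones(window_size) / window_size

# Hypothetical stand-in for one EEG channel
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)

# Moving average as computed in the script (trailing window)
rolled = pd.Series(x).rolling(window=window_size, min_periods=1).mean().to_numpy()

# The same operation written as an explicit time-domain convolution
convolved = np.convolve(x, kernel, mode='valid')

# Once the window is full (from index window_size - 1 on), the two agree
matches = np.allclose(rolled[window_size - 1:], convolved)
print("rolling mean equals convolution:", matches)

# Gain of the moving average at selected frequencies (Hz)
freqs_hz = np.array([10.0, 25.0, 50.0])
_, response = freqz(kernel, 1, worN=freqs_hz, fs=fs)
gains = np.abs(response)
for f, g in zip(freqs_hz, gains):
    print(f"gain at {f:.0f} Hz: {g:.3f}")
```

With a 5-sample window at 250 Hz, the gain falls to zero at 50 Hz (the sampling frequency divided by the window length). Because the pandas window is trailing rather than centered, the smoothed output also lags the input by about two samples.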