<a href="https://colab.research.google.com/github/sanchita0608/Neuroimaging/blob/main/EEG_processing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## EEG Signal Analysis

This repository contains Python code for performing fundamental analysis on Electroencephalography (EEG) signals. The script processes raw EEG data, applies bandpass filtering to isolate specific brainwave frequencies, visualizes both raw and filtered signals, computes Power Spectral Density (PSD), calculates absolute band power, and derives relative band power.

### Features:

*   **Data Loading and Parsing**: Reads raw EEG data from CSV files, handling channel names and numerical data extraction.
*   **Bandpass Filtering**: Implements a Butterworth bandpass filter to extract common EEG frequency bands: Delta, Theta, Alpha, Beta, and Gamma.
*   **Time-Domain Visualization**: Plots raw and band-filtered EEG signals over time for visual inspection.
*   **Power Spectral Density (PSD)**: Utilizes Welch's method to compute and visualize the PSD of raw and filtered EEG signals, illustrating the power distribution across frequencies.
*   **Absolute Band Power Calculation**: Estimates the power of each frequency band as the mean of the Welch PSD of the corresponding band-filtered signal.
*   **Relative Band Power Calculation**: Determines the percentage contribution of each frequency band to the total EEG power, offering insights into brain state and activity.
*   **Automated Processing**: Designed to iterate through multiple subject files, hemispheres (Left/Right), and electrodes, automating the analysis workflow.
*   **Output**: Saves calculated band power values to Excel files for further statistical analysis.
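
The bandpass filtering feature can be sketched on a synthetic signal. This is a minimal sketch, assuming a 500 Hz sampling rate and a made-up two-tone test signal in place of real EEG. The cutoff check against the Nyquist frequency is an addition for illustration and is not part of the notebook's own function.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_filter(data, lowcut, highcut, fs, order=4):
    """Zero-phase Butterworth bandpass; cutoffs are given in Hz."""
    nyq = 0.5 * fs
    # Illustrative guard (not in the notebook): cutoffs must lie below Nyquist.
    if not 0 < lowcut < highcut < nyq:
        raise ValueError(f"Need 0 < lowcut < highcut < {nyq} Hz")
    b, a = butter(order, [lowcut / nyq, highcut / nyq], btype="band")
    # filtfilt runs the filter forward and backward, so no phase shift is added.
    return filtfilt(b, a, data)

fs = 500                                   # assumed sampling rate in Hz
t = np.arange(0, 10, 1 / fs)               # 10 seconds of samples
alpha_component = np.sin(2 * np.pi * 10 * t)        # 10 Hz tone (alpha range)
beta_component = 0.5 * np.sin(2 * np.pi * 20 * t)   # 20 Hz tone (beta range)
mixed = alpha_component + beta_component

# Keep only the 8-13 Hz content; the 20 Hz tone should be strongly attenuated.
alpha_only = bandpass_filter(mixed, 8, 13, fs)
```

Away from the edges of the recording, `alpha_only` should closely follow `alpha_component`. For narrow bands at low frequencies, SciPy's second-order-sections form (`butter(..., output='sos')` with `sosfiltfilt`) is generally the more numerically robust choice.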

### EEG Bands Defined:

*   **Delta**: 0.5–4 Hz
*   **Theta**: 4–8 Hz
*   **Alpha**: 8–13 Hz
*   **Beta**: 13–30 Hz
*   **Gamma**: 30–100 Hz
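
As an illustration of how these band edges can be turned into band power, the sketch below integrates a Welch PSD between the edges of each band. This is a swapped-in alternative shown for comparison: the notebook itself takes the mean of the PSD of each band-filtered signal. The helper name `band_powers`, the half-open band intervals, and the synthetic test signal are assumptions made for this example.

```python
import numpy as np
from scipy.signal import welch

BANDS = {
    "Delta": (0.5, 4),
    "Theta": (4, 8),
    "Alpha": (8, 13),
    "Beta": (13, 30),
    "Gamma": (30, 100),
}

def band_powers(signal, fs, bands=BANDS):
    """Hypothetical helper: integrate the Welch PSD over each band."""
    freqs, psd = welch(signal, fs=fs, nperseg=int(fs * 2))
    df = freqs[1] - freqs[0]               # frequency resolution in Hz
    powers = {}
    for name, (low, high) in bands.items():
        # Half-open interval [low, high) so shared edges are not counted twice.
        mask = (freqs >= low) & (freqs < high)
        powers[name] = float(np.sum(psd[mask]) * df)
    return powers

fs = 500                                   # assumed sampling rate in Hz
t = np.arange(0, 20, 1 / fs)
rng = np.random.default_rng(0)
# Synthetic signal: a 10 Hz sine (alpha range) plus weak white noise.
signal = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)

powers = band_powers(signal, fs)
dominant_band = max(powers, key=powers.get)
```

With this test signal the alpha band should dominate, and its integrated power should be close to 0.5, the mean-square value of a unit-amplitude sine.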

### Usage:

1.  **Data Organization**: Ensure your EEG data CSV files are organized in a structured directory within your Google Drive, as specified by `base_drive_path`.
    *   `base_drive_path/chbXXchbXX/H/ELECTRODE.csv`
    *   `XX`: Subject ID (e.g., '01', '02')
    *   `H`: Hemisphere ('L' or 'R')
    *   `ELECTRODE`: Electrode type (e.g., 'F7-T7', 'Fp1-F3')

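2.  **Configuration**: Adjust `base_drive_path` and the `subjects` list under the `Configuration` comment in the first processing cell (the cell that follows the Google Drive mount). The `subjects` list is built with a list comprehension, so a different range of subject IDs can be covered by changing the `range` bounds.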

3.  **Execution**: Run the provided Jupyter/Colab notebook cells sequentially to:
    *   Generate file paths for all specified subjects and electrodes.
    *   Process each EEG file, applying filters and generating time-domain and PSD plots.
    *   Calculate and display absolute band powers for all processed files.
    *   Calculate and display relative band powers for all processed files.
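
The directory layout described in step 1 can be checked before any processing starts. The following is a minimal sketch, assuming the same `chbXXchbXX/H/ELECTRODE.csv` layout; the helper name `build_file_index`, the `exists` field, and the temporary demo directory are hypothetical and only serve this example.

```python
from pathlib import Path
import tempfile

# Electrode names per hemisphere, as listed in the notebook configuration.
ELECTRODES = {
    "L": ["F7-T7", "Fp1-F3", "Fp1-F7"],
    "R": ["F8-T8", "Fp2-F4", "Fp2-F8"],
}

def build_file_index(base_path, subjects, electrodes=ELECTRODES):
    """Hypothetical helper: list every expected CSV and whether it exists."""
    records = []
    for s in subjects:
        for hemi, names in electrodes.items():
            for name in names:
                path = Path(base_path) / f"chb{s}chb{s}" / hemi / f"{name}.csv"
                records.append({
                    "path": path,
                    "subject": s,
                    "hemisphere": hemi,
                    "electrode": name,
                    "exists": path.exists(),
                })
    return records

# Demo on a throwaway directory containing a single made-up file.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "chb01chb01" / "L" / "F7-T7.csv"
    demo_file.parent.mkdir(parents=True)
    demo_file.write_text("F7-T7\n1.0\n2.0\n")

    index = build_file_index(tmp, ["01", "02"])
    found = [r for r in index if r["exists"]]
    missing = [r for r in index if not r["exists"]]
```

Listing the missing files up front makes it easier to spot a misnamed folder than reading per-file error messages during the processing loop.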

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
import pandas as pd
import numpy as np
from scipy.signal import butter, filtfilt
import matplotlib.pyplot as plt
import os

# ---------- Configuration ----------
# Update the folder path below to point at your data:
base_drive_path = "/content/drive/My Drive/Seizure"
# General form for subjects 01..n: subjects = [f'{i:02d}' for i in range(1, n+1)]
subjects = [f'{i:02d}' for i in range(1, 25)]  # subjects '01' through '24'
hemisphere = ['L', 'R']
electrodes_L = ['F7-T7', 'Fp1-F3', 'Fp1-F7']
electrodes_R = ['F8-T8', 'Fp2-F4', 'Fp2-F8']

all_eeg_file_paths = []
for s in subjects:
    subject_folder = f'chb{s}chb{s}'
    for h in hemisphere:
        if h == 'L':
            current_electrodes = electrodes_L
        elif h == 'R':
            current_electrodes = electrodes_R
        else:
            # Guard against hemisphere labels other than 'L' or 'R'
            print(f"Warning: Unknown hemisphere '{h}' for subject {s}. Skipping.")
            continue

        hemisphere_folder = h

        for electrode_type in current_electrodes:
            file_name = f'{electrode_type}.csv'

            full_path = os.path.join(base_drive_path, subject_folder, hemisphere_folder, file_name)
            all_eeg_file_paths.append({
                'path': full_path,
                'subject': s,
                'hemisphere': h,
                'electrode': electrode_type
            })

print(f"Generated {len(all_eeg_file_paths)} file paths.")

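# Sampling frequency in Hz. This must match the rate at which the CSV data was
# recorded, because the filter cutoffs and the time axis are both derived from it.
fs = 500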

# --- Bandpass Filter Function ---
def bandpass_filter(data, lowcut, highcut, fs, order=4):
    nyq = 0.5 * fs
    low = lowcut / nyq
    high = highcut / nyq
    b, a = butter(order, [low, high], btype='band')
    return filtfilt(b, a, data)

# --- Define Bands ---
bands = {
    "Delta (0.5–4 Hz)": (0.5, 4),
    "Theta (4–8 Hz)": (4, 8),
    "Alpha (8–13 Hz)": (8, 13),
    "Beta (13–30 Hz)": (13, 30),
    "Gamma (30–100 Hz)": (30, 100)
}

for file_info in all_eeg_file_paths:
    full_path = file_info['path']
    subject_id = file_info['subject']
    hemisphere_val = file_info['hemisphere']
    electrode_val = file_info['electrode']

    print(f"\nProcessing file: {full_path}")

    # To check if file exists before proceeding
    if not os.path.exists(full_path):
        print(f"CRITICAL ERROR: '{full_path}' not found. Please upload the file or check the name.")
        continue # Skip to the next file

    # ---------- Read and Parse Data ----------
    # Read CSV. header=None allows us to inspect the first row manually to check for labels.
    df = pd.read_csv(full_path, header=None)

    print(f"Data Shape: {df.shape}")

    eeg_data = np.array([])
    time = np.array([])
    channel_name = "Unknown"

    if not df.empty and df.shape[0] > 1:
        # Assumption: Data is in the first column (index 0)
        # Row 0 is the Channel Name, Rows 1+ are data points
        channel_name = str(df.iloc[0, 0])

        # Coerce non-numeric entries (e.g. stray text) to NaN instead of raising an error
        raw_values = pd.to_numeric(df.iloc[1:, 0], errors='coerce')

        # Replace NaNs with 0 and reshape to a column vector of shape (n_samples, 1)
        eeg_data = np.nan_to_num(raw_values.values).reshape(-1, 1)

        # Create time vector based on fs
        time = np.arange(len(eeg_data)) / fs
    else:
        print("Error: File is empty or too short.")
        continue # Skip to the next file if data is not valid

    # ---------- Processing and Plotting ----------
    if eeg_data.size == 0:
        print("No valid EEG data could be extracted.")
    else:
        print(f"Processing {len(eeg_data)} samples for channel: {channel_name}")

        # --- Apply Filters ---
        filtered_data = {}
        for band, (low, high) in bands.items():
            filtered_data[band] = bandpass_filter(eeg_data[:, 0], low, high, fs)

        # --- Plotting ---
        plt.figure(figsize=(15, 10))

        # Plot Raw Data
        plt.subplot(len(bands) + 1, 1, 1)
        plt.plot(time, eeg_data[:, 0], color='black', alpha=0.7, lw=1)
        plt.title(f"Raw EEG Signal - Subject {subject_id} {hemisphere_val} {electrode_val}: {channel_name}")
        plt.ylabel("Amplitude")
        plt.grid(True, linestyle='--', alpha=0.5)

        # Plot Frequency Bands
        for i, (band_name, signal) in enumerate(filtered_data.items()):
            plt.subplot(len(bands) + 1, 1, i + 2)
            plt.plot(time, signal, lw=1)
            plt.ylabel(band_name.split()[0])
            plt.grid(True, linestyle='--', alpha=0.5)

        plt.xlabel("Time (seconds)")
        plt.tight_layout()
        plt.show()


In [None]:
import pandas as pd
import numpy as np
from scipy.signal import butter, filtfilt, welch
import matplotlib.pyplot as plt
import os # Import os for file system operations

# all_eeg_file_paths must already exist in the kernel from the previous cell.
# bandpass_filter, bands, and fs are redefined below so that the rest of this
# cell does not depend on the earlier definitions.

def bandpass_filter(data, lowcut, highcut, fs, order=4):
    nyq = 0.5 * fs
    low = lowcut / nyq
    high = highcut / nyq
    b, a = butter(order, [low, high], btype='band')
    return filtfilt(b, a, data)

bands = {
    "Delta (0.5–4 Hz)": (0.5, 4),
    "Theta (4–8 Hz)": (4, 8),
    "Alpha (8–13 Hz)": (8, 13),
    "Beta (13–30 Hz)": (13, 30),
    "Gamma (30–100 Hz)": (30, 100)
}

# Set Sampling Frequency consistent with the previous processing cell
fs = 500  # Sampling frequency in Hz

for file_info in all_eeg_file_paths:
    full_path = file_info['path']
    subject_id = file_info['subject']
    hemisphere_val = file_info['hemisphere']
    electrode_val = file_info['electrode']

    print(f"\nProcessing for: Subject {subject_id}, Hemisphere {hemisphere_val}, Electrode {electrode_val} - File: {full_path}")

    # Check if file exists before proceeding
    if not os.path.exists(full_path):
        print(f"CRITICAL ERROR: '{full_path}' not found. Please ensure the file has been uploaded and saved correctly.")
        continue # Skip to the next file

    # ---------- Read and Parse Data (same parsing as the first processing cell) ----------
    df_raw = pd.read_csv(full_path, header=None)

    print(f"Data Shape: {df_raw.shape}")

    eeg_data = np.array([])
    time = np.array([])
    channel_name = "Unknown"

    if not df_raw.empty and df_raw.shape[0] > 1:
        channel_name = str(df_raw.iloc[0, 0])
        raw_values = pd.to_numeric(df_raw.iloc[1:, 0], errors='coerce')
        eeg_data = np.nan_to_num(raw_values.values) # This will be 1D for single column

        # Create time vector based on fs
        time = np.arange(len(eeg_data)) / fs
    else:
        print("Error: File is empty or too short for processing.")
        continue # Skip to the next file if data is not valid

    if eeg_data.size == 0:
        print("No valid EEG data could be extracted.")
        continue

    print(f"Processing {len(eeg_data)} samples for channel: {channel_name}")

    # ---------- Apply Filtering ----------
    filtered_data = {}
    for band, (low, high) in bands.items():
        filtered_data[band] = bandpass_filter(eeg_data, low, high, fs) # Pass 1D data

    # ---------- Compute Band Power ----------
    # For a single channel, we use electrode_val as the channel identifier
    channels = [electrode_val]
    band_power_data = {}

    for band in bands.keys():
        # Welch PSD of the 1D band-filtered signal, using 2-second segments
        f, Pxx = welch(filtered_data[band], fs=fs, nperseg=fs*2)
        # Mean PSD over all Welch frequency bins; wrapped in a list for the DataFrame
        band_power_data[band] = [np.mean(Pxx)]

    band_power = pd.DataFrame(band_power_data, index=channels)

    # Save to Excel with a unique filename for each subject/electrode
    output_file = f"EEG_Band_Power_Sub{subject_id}_{hemisphere_val}_{electrode_val}.xlsx"
    band_power.to_excel(output_file)
    print(f"Band power values saved to {output_file}")

    # ---------- Time-Domain Plots ----------
    plt.figure(figsize=(12, 10))

    plt.subplot(len(bands) + 1, 1, 1)
    plt.plot(time, eeg_data, color="black") # eeg_data is 1D
    plt.title(f"Raw EEG (Subject: {subject_id} {hemisphere_val} {electrode_val})")
    plt.ylabel("Amplitude")

    for i, (band, _) in enumerate(bands.items(), start=2):
        plt.subplot(len(bands) + 1, 1, i)
        plt.plot(time, filtered_data[band]) # filtered_data[band] is 1D
        plt.title(band)
        plt.ylabel("Amplitude")

    plt.xlabel("Time (seconds)")
    plt.tight_layout()
    plt.show()

    # ---------- PSD Plots ----------
    plt.figure(figsize=(12, 10))

    f, Pxx = welch(eeg_data, fs=fs, nperseg=fs*2) # eeg_data is 1D
    plt.subplot(len(bands) + 1, 1, 1)
    plt.semilogy(f, Pxx, color="black")
    plt.title(f"Raw EEG PSD (Subject: {subject_id} {hemisphere_val} {electrode_val})")
    plt.ylabel("Power")

    for i, (band, _) in enumerate(bands.items(), start=2):
        f, Pxx = welch(filtered_data[band], fs=fs, nperseg=fs*2) # filtered_data[band] is 1D
        plt.subplot(len(bands) + 1, 1, i)
        plt.semilogy(f, Pxx)
        plt.title(f"{band} PSD")
        plt.ylabel("Power")

    plt.xlabel("Frequency (Hz)")
    plt.tight_layout()
    plt.show()

In [None]:
import os  # needed for os.path.exists below
import pandas as pd
import numpy as np
from scipy.signal import butter, filtfilt, welch

# Ensure fs is consistent with previous cells
fs = 500 # Sampling frequency in Hz

# Re-using bandpass_filter function and bands dictionary defined earlier
# (Assuming these are available from previous executed cells)

# You can uncomment and define them here if you want this cell to be fully self-contained:
# def bandpass_filter(data, lowcut, highcut, fs, order=4):
#     nyq = 0.5 * fs
#     low = lowcut / nyq
#     high = highcut / nyq
#     b, a = butter(order, [low, high], btype='band')
#     return filtfilt(b, a, data)

# bands = {
#     "Delta (0.5–4 Hz)": (0.5, 4),
#     "Theta (4–8 Hz)": (4, 8),
#     "Alpha (8–13 Hz)": (8, 13),
#     "Beta (13–30 Hz)": (13, 30),
#     "Gamma (30–100 Hz)": (30, 100)
# }

all_band_powers = [] # List to store band power DataFrames for each file

for file_info in all_eeg_file_paths:
    full_path = file_info['path']
    subject_id = file_info['subject']
    hemisphere_val = file_info['hemisphere']
    electrode_val = file_info['electrode']

    print(f"\nProcessing for Band Power: Subject {subject_id}, Hemisphere {hemisphere_val}, Electrode {electrode_val} - File: {full_path}")

    if not os.path.exists(full_path):
        print(f"CRITICAL ERROR: '{full_path}' not found. Skipping.")
        continue

    # ---------- Read and Parse Data (same parsing as the first processing cell) ----------
    df_raw = pd.read_csv(full_path, header=None)

    eeg_data = np.array([])
    channel_name = "Unknown"

    if not df_raw.empty and df_raw.shape[0] > 1:
        channel_name = str(df_raw.iloc[0, 0])
        raw_values = pd.to_numeric(df_raw.iloc[1:, 0], errors='coerce')
        eeg_data = np.nan_to_num(raw_values.values) # This will be 1D for single column
    else:
        print("Error: File is empty or too short for processing. Skipping.")
        continue

    if eeg_data.size == 0:
        print("No valid EEG data could be extracted. Skipping.")
        continue

    # ---------- Apply Filtering ----------
    filtered_data = {}
    for band, (low, high) in bands.items():
        filtered_data[band] = bandpass_filter(eeg_data, low, high, fs) # Pass 1D data

    # ---------- Compute Band Power ----------
    channels = [f"Sub{subject_id}_{hemisphere_val}_{electrode_val}"] # Unique identifier for each entry
    band_power_data = {}

    for band in bands.keys():
        f, Pxx = welch(filtered_data[band], fs=fs, nperseg=fs*2)
        # Mean PSD over all Welch frequency bins; wrapped in a list for the DataFrame
        band_power_data[band] = [np.mean(Pxx)]

    current_band_power_df = pd.DataFrame(band_power_data, index=channels)
    all_band_powers.append(current_band_power_df)

# Concatenate all individual band power dataframes
if all_band_powers:
    final_band_power_df = pd.concat(all_band_powers)
    print("\n--- Combined Band Power Results ---")
    display(final_band_power_df)
else:
    print("No band power data could be processed.")

In [None]:
# ---------- Compute Relative Band Power ----------

# Sum of absolute band powers per channel
total_power = final_band_power_df.sum(axis=1)

# Relative contribution (%): divide each row by its own total
relative_band_power = final_band_power_df.div(total_power, axis=0) * 100

display(relative_band_power)
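
The relative band power step can be checked by hand on a small table. This is a minimal sketch with made-up absolute power values (they are not results from any recording); the row labels only mimic the `Sub{subject}_{hemisphere}_{electrode}` naming used above.

```python
import pandas as pd

# Made-up absolute band powers for two channels (illustrative values only).
absolute = pd.DataFrame(
    {
        "Delta": [4.0, 1.0],
        "Theta": [2.0, 1.0],
        "Alpha": [2.0, 6.0],
        "Beta": [1.0, 1.0],
        "Gamma": [1.0, 1.0],
    },
    index=["Sub01_L_F7-T7", "Sub01_R_F8-T8"],
)

# Total power per channel (row-wise sum across bands).
total = absolute.sum(axis=1)

# Divide each row by its own total and express the result as a percentage.
relative = absolute.div(total, axis=0) * 100
```

Each row of `relative` sums to 100. In this example the first channel has 40% of its power in Delta, while the second has 60% in Alpha.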