
#PPLX

The Python code below demonstrates the initial steps in processing EEG data from the BrainLat dataset and training a basic machine learning model for disease classification. It serves as a foundation for the more advanced analyses proposed in the project aims.

In [None]:
import numpy as np
import pandas as pd
import mne
import os
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
import tensorflow as tf

# Function to load and preprocess EEG data
def load_and_preprocess_eeg(file_path):
    raw = mne.io.read_raw_edf(file_path, preload=True)
    raw.filter(l_freq=1, h_freq=40)  # Bandpass filter (1-40 Hz)
    raw.notch_filter(freqs=60)  # Notch filter for power line noise (60 Hz assumed; some sites use 50 Hz)
    return raw

# Function to extract features from EEG data
def extract_features(raw):
    data = raw.get_data()
    features = np.concatenate([
        np.mean(data, axis=1),
        np.std(data, axis=1),
        np.max(data, axis=1),
        np.min(data, axis=1)
    ])
    return features

# Load and preprocess data
data_dir = "path/to/brainlat/dataset"
subjects = []
labels = []

for subject_dir in sorted(os.listdir(data_dir)):
    subject_path = os.path.join(data_dir, subject_dir)
    if os.path.isdir(subject_path):
        eeg_file = os.path.join(subject_path, "eeg.edf")
        # Assuming label information is stored in a separate file
        label_file = os.path.join(subject_path, "label.txt")

        # Require both files so that features and labels stay aligned
        if os.path.exists(eeg_file) and os.path.exists(label_file):
            raw = load_and_preprocess_eeg(eeg_file)
            features = extract_features(raw)
            subjects.append(features)

            with open(label_file, "r") as f:
                label = f.read().strip()
            labels.append(label)

X = np.array(subjects)
y = np.array(labels)

# Split data into train and test sets
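# stratify=y keeps class proportions similar in both splits (needs at least two subjects per class)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)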

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train a Random Forest Classifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
rf_classifier.fit(X_train_scaled, y_train)

# Make predictions and evaluate
y_pred = rf_classifier.predict(X_test_scaled)
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

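# Encode the string labels as integers so that Keras can consume them
from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()
label_encoder.fit(y)
y_train_enc = label_encoder.transform(y_train)
y_test_enc = label_encoder.transform(y_test)
num_classes = len(label_encoder.classes_)

# Create a simple neural network model
# A softmax output with one unit per class handles two or more diagnostic groups
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(X_train_scaled.shape[1],)),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(num_classes, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model
history = model.fit(X_train_scaled, y_train_enc, epochs=50, batch_size=32, validation_split=0.2, verbose=1)

# Evaluate the model
test_loss, test_accuracy = model.evaluate(X_test_scaled, y_test_enc)
print(f"Test accuracy: {test_accuracy}")

# Make predictions using the neural network and map them back to the original labels
y_pred_nn = model.predict(X_test_scaled)
y_pred_nn_classes = label_encoder.inverse_transform(np.argmax(y_pred_nn, axis=1))

print(classification_report(y_test, y_pred_nn_classes))
print(confusion_matrix(y_test, y_pred_nn_classes))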

This code demonstrates:

1. Loading and preprocessing EEG data using the MNE library.
2. Extracting basic features from the EEG signals.
3. Training a Random Forest Classifier as a baseline model.
4. Creating and training a simple neural network using TensorFlow/Keras.
5. Evaluating both models using classification reports and confusion matrices.
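
To make step 2 concrete, the sketch below applies the same four per-channel statistics to a synthetic array. The array size (3 channels, 1,000 samples) and the `summary_features` name are assumptions for illustration only. The resulting vector holds four values per channel, ordered as all means, then all standard deviations, maxima, and minima.

```python
import numpy as np

def summary_features(data):
    """Concatenate per-channel mean, std, max, and min of a (channels, samples) array."""
    return np.concatenate([
        np.mean(data, axis=1),
        np.std(data, axis=1),
        np.max(data, axis=1),
        np.min(data, axis=1),
    ])

# Synthetic stand-in for raw.get_data(): 3 channels, 1000 samples (assumed sizes)
rng = np.random.default_rng(0)
data = rng.standard_normal((3, 1000))

features = summary_features(data)
print(features.shape)  # (12,) i.e. 4 statistics x 3 channels
```

Because the vector length depends on the channel count, every recording would need the same number of channels for the features to stack into a single matrix.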

Note that this code assumes a specific structure for the BrainLat dataset and may need to be adjusted based on the actual data organization. Additionally, for the proposed project aims, this code would need to be expanded to include:

1. More advanced feature extraction techniques.
2. Implementation of the CNN-LSTM architecture for EEG analysis.
3. Integration of genetic and biomarker data.
4. Transfer learning from larger datasets.
5. Interpretability methods to identify significant features and biomarkers.

This code serves as a starting point for the more complex analyses outlined in the project aims.
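
As one possible direction for the "more advanced feature extraction" item, the sketch below computes mean spectral power per frequency band from a plain FFT periodogram. The band boundaries, the `band_power_features` name, and the synthetic 10 Hz test signal are assumptions for illustration; they are not taken from the BrainLat dataset or the project aims.

```python
import numpy as np

# Conventional EEG bands in Hz; exact boundaries vary between studies (assumed here)
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def band_power_features(data, sfreq, bands=BANDS):
    """Return mean spectral power per band and channel for a (channels, samples) array."""
    n_samples = data.shape[1]
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / sfreq)
    # Simple periodogram: squared FFT magnitude, normalised by length and sampling rate
    power = np.abs(np.fft.rfft(data, axis=1)) ** 2 / (sfreq * n_samples)
    feats = []
    for low, high in bands.values():
        mask = (freqs >= low) & (freqs < high)
        feats.append(power[:, mask].mean(axis=1))
    # Shape (n_bands * n_channels,), grouped by band
    return np.concatenate(feats)

# Synthetic check: a 10 Hz sine should put most of its power in the alpha band
sfreq = 250.0
t = np.arange(1000) / sfreq
signal = np.sin(2 * np.pi * 10 * t)[np.newaxis, :]
bp = band_power_features(signal, sfreq)
print(dict(zip(BANDS, bp)))
```

A raw periodogram is noisy on real recordings; an averaged estimate such as Welch's method or a multitaper estimate would likely be a better choice in practice.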

#CGT

Below is a simplified code outline in Python, which shows the basic structure for loading EEG data, preprocessing it, and setting up an initial machine learning model pipeline to identify features potentially predictive of neurodegeneration.

In [None]:
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import mne  # for EEG data handling

# Load EEG data (example using MNE)
# Replace 'sample_path' with actual data path
eeg_data_path = 'sample_path/eeg_data.fif'
raw = mne.io.read_raw_fif(eeg_data_path, preload=True)
raw.filter(1., 40.)  # Band-pass filtering to reduce noise

# Preprocess EEG data - Feature Extraction
eeg_data = raw.get_data()
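# Calculate power spectral densities (PSD) as an example feature
# The standalone psd_multitaper() function is no longer available in recent MNE releases;
# Raw.compute_psd() provides the same estimate
spectrum = raw.compute_psd(method='multitaper', fmin=1, fmax=40)
psd, freqs = spectrum.get_data(return_freqs=True)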

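# Define target variables (dummy variables for demonstration)
# In real research, these would be labels associated with disease stages or gene markers
# Note: psd has one row per channel of a single recording, so these are per-channel
# placeholder labels; real samples would be subjects or epochs
labels = np.random.randint(0, 2, size=len(psd))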

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(psd, labels, test_size=0.2, random_state=42)

# Model training - Example Random Forest Classifier
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Prediction and evaluation
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'EEG Biomarker Model Accuracy: {accuracy:.2f}')

# Placeholder for future integration with genetic biomarker data
# genetic_data = pd.read_csv('path/to/genetic_data.csv')  # example


This outline serves as a starting point for exploring EEG biomarkers and can be extended with genetic and protein datasets as they become available. The code demonstrates basic EEG data handling, preprocessing, and an initial classifier setup; because the labels are random placeholders, the reported accuracy carries no meaning until real labels are supplied. More advanced architectures and multi-agent models can build upon this foundation.
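
The outline above yields one PSD row per channel of a single recording. One way to obtain several samples per recording is to cut the signal into fixed-length windows and compute features per window. The sketch below does this with NumPy on synthetic data; the `segment_windows` name, the 2-second window, and the array sizes are assumptions for illustration. MNE's `mne.make_fixed_length_epochs` offers a comparable operation on `Raw` objects.

```python
import numpy as np

def segment_windows(data, sfreq, window_sec=2.0):
    """Split a (channels, samples) array into non-overlapping windows.

    Returns an array of shape (windows, channels, samples_per_window).
    """
    win = int(round(window_sec * sfreq))
    n_windows = data.shape[1] // win
    # Drop trailing samples that do not fill a complete window
    trimmed = data[:, : n_windows * win]
    return trimmed.reshape(data.shape[0], n_windows, win).transpose(1, 0, 2)

# Synthetic stand-in for raw.get_data(): 4 channels, 10.5 s at 100 Hz (assumed values)
rng = np.random.default_rng(0)
data = rng.standard_normal((4, 1050))

windows = segment_windows(data, sfreq=100.0)
# One feature row per window: per-channel variance
X = windows.var(axis=2)
print(windows.shape, X.shape)  # (5, 4, 200) (5, 4)
```

Windows cut from the same recording are not independent, so any train/test split should keep all windows from one subject on the same side.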