In [1]:
import psutil
import platform

print("### Hardware Requirements ###")
print(f"CPU: {platform.processor()}")
print(f"Total RAM: {psutil.virtual_memory().total / 1e9:.2f} GB")

### Hardware Requirements ###
CPU: x86_64
Total RAM: 810.20 GB


In [2]:
import tensorflow as tf
import pandas as pd
import numpy as np
import sklearn
import platform

print("### Software Requirements ###")
print(f"Python Version: {platform.python_version()}")
print(f"TensorFlow Version: {tf.__version__}")
print(f"Pandas Version: {pd.__version__}")
print(f"Numpy Version: {np.__version__}")
print(f"sklearn Version: {sklearn.__version__}")

2025-03-11 13:20:34.362708: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


### Software Requirements ###
Python Version: 3.9.13
TensorFlow Version: 2.12.0
Pandas Version: 1.5.0
Numpy Version: 1.24.0
sklearn Version: 1.1.2


In [3]:
import time
start_time = time.time()

# DeepMECFS Tutorial

This notebook demonstrates how to use the **DeepMECFS** pretrained BioMapAI model on ME/CFS metabolomics data. It shows how to:
1. **Load the trained deep learning model (DeepMECFS)**
2. **Load the secondary model (Y2y_model)** for final label predictions
3. **Align and preprocess** your metabolomics data to match the model’s requirements
4. **Compute metrics** (accuracy, precision, recall, F1, AUC, AUCPR) on your dataset

## Reference
The metabolomics data used in this example:
- *"Plasma metabolomics reveals disrupted response and recovery following maximal exercise in myalgic encephalomyelitis/chronic fatigue syndrome"*, Arnaud Germain et al., JCI Insight. 2022;7(9):e157621. [DOI: [10.1172/jci.insight.157621](https://doi.org/10.1172/jci.insight.157621)]

## Notebook Overview
1. **Environment Setup**: Imports and helper functions
2. **Load Pretrained Model**: Load the DeepMECFS model and the Y2y_model
3. **Load and Preprocess Data**: Demonstration of alignment and scaling
4. **Predict & Evaluate**: Use the pretrained model to produce predictions and compute metrics

> **Note**: This notebook relies on a folder `pretrained_model_DeepMECFS/` containing:
>
> - `DeepMECFS_metabolome/` (the main Keras model)
> - `Y2y_metabolome/` (the final conversion model)
> - `metabolome_feature_metadata.csv` (the list of features the model expects)

Let's get started!

## 1. Environment Setup
We’ll begin by importing the necessary Python libraries and defining a helper function to dynamically import **BioMapAI.py**.

In [4]:
import importlib.util
import os
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    roc_auc_score, average_precision_score
)
import joblib
import tensorflow as tf

# Clear any previous TensorFlow session
tf.keras.backend.clear_session()

def import_module_with_full_path(file_path):
    base_filename = os.path.basename(file_path)
    module_name = os.path.splitext(base_filename)[0]
    module_spec = importlib.util.spec_from_file_location(module_name, file_path)
    imported_module = importlib.util.module_from_spec(module_spec)
    module_spec.loader.exec_module(imported_module)
    return imported_module

print("Environment setup complete.")

Environment setup complete.
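
To see what the helper does in isolation, the sketch below writes a throwaway module to a temporary directory and imports it by path. The file name `toy_module.py` and its `greet` function are made up for illustration; the helper itself is copied from the cell above.

```python
import importlib.util
import os
import tempfile

def import_module_with_full_path(file_path):
    # Derive the module name from the file name, e.g. "BioMapAI.py" -> "BioMapAI"
    base_filename = os.path.basename(file_path)
    module_name = os.path.splitext(base_filename)[0]
    module_spec = importlib.util.spec_from_file_location(module_name, file_path)
    imported_module = importlib.util.module_from_spec(module_spec)
    module_spec.loader.exec_module(imported_module)
    return imported_module

# Hypothetical module written to a temporary directory for the demo
with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy_module.py")
    with open(toy_path, "w") as handle:
        handle.write("def greet():\n    return 'hello'\n")
    toy_module = import_module_with_full_path(toy_path)

print(toy_module.__name__)
print(toy_module.greet())
```

The helper does not register the module in `sys.modules`, so calling it twice on the same path executes the file twice.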


## 2. Load Pretrained Model
Here we load:
- **DeepMECFS_model**: The trained BioMapAI model for ME/CFS metabolomics.
- **Y2y_model**: A secondary model that converts the intermediate outputs (Y) into the final binary outcome (`CFS` vs. `Control`).

We also load `metabolome_feature_metadata.csv` to ensure our dataset columns align properly.

In [5]:
# Import BioMapAI from your local path
BioMapAI = import_module_with_full_path("BioMapAI.py")

# Load pretrained model
DeepMECFS_model = tf.keras.models.load_model("pretrained_model_DeepMECFS/DeepMECFS_metabolome/")
Y2y_model = tf.keras.models.load_model("pretrained_model_DeepMECFS/Y2y_metabolome/")
feature_meta = pd.read_csv("pretrained_model_DeepMECFS/metabolome_feature_metadata.csv", index_col=0)

print("Loaded DeepMECFS_model, Y2y_model, and feature metadata.")

Loaded DeepMECFS_model, Y2y_model, and feature metadata.
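
Before loading, it can help to confirm that the model folder is complete. The following is a minimal sketch, not part of BioMapAI: the helper name `missing_model_entries` is hypothetical, and the entry names are taken from the note at the top of this notebook. The demo runs against a temporary folder rather than the real `pretrained_model_DeepMECFS/`.

```python
import tempfile
from pathlib import Path

# Entries the note at the top of the notebook says the folder must contain
REQUIRED_ENTRIES = [
    "DeepMECFS_metabolome",
    "Y2y_metabolome",
    "metabolome_feature_metadata.csv",
]

def missing_model_entries(model_root):
    """Return the required entries absent from model_root (hypothetical helper)."""
    root = Path(model_root)
    return [name for name in REQUIRED_ENTRIES if not (root / name).exists()]

# Demo on a temporary folder that deliberately lacks the Y2y model
with tempfile.TemporaryDirectory() as tmp_dir:
    (Path(tmp_dir) / "DeepMECFS_metabolome").mkdir()
    (Path(tmp_dir) / "metabolome_feature_metadata.csv").write_text("COMP_ID\n")
    missing = missing_model_entries(tmp_dir)

print(missing)
```

An empty list means every expected entry is present.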


## 3. Load and Preprocess Data
In this example, we load data from an Excel file (`jci.insight.2022.xlsx`). In the sheet, each row is a metabolite and each column is a sample, with sample metadata in the top rows and feature metadata in the leading columns.

### Steps:
1. **Load the Excel file**.
2. **Parse feature metadata** from the first rows/columns.
3. **Transpose** and align data columns to match `feature_meta` from the pretrained model.
4. **Scale** data using `StandardScaler`.
5. **Create** a label vector (`y`) from a `Phenotype` column (mapping `Control` to 0, `CFS` to 1).

> **Tip**: Make sure your data columns match the exact names (or COMP_IDs) the pretrained model expects.


In [6]:
# Load your data
# (Note: You may need 'openpyxl' installed to read Excel files: pip install openpyxl)
data = pd.read_excel("example_data/jci.insight.2022.xlsx", sheet_name='ScaledImpDataZeroDrug&Tobacco')

# The following transformations align the dataset to match the pretrained model's requirements
# 1) Separate feature metadata
data_feature_meta = data.iloc[6:, :15]
data_feature_meta.columns = data_feature_meta.iloc[0]
data_feature_meta.index = data_feature_meta.loc[:, 'COMP ID']
data_feature_meta = data_feature_meta.drop('COMP ID')

# 2) Extract the sample metadata (Phenotype, etc.) from columns 15 onward
meta = data.iloc[:6, 15:].transpose()
meta.columns = meta.iloc[0]
meta = meta.drop('ID')

# 3) The main data matrix starts at row 7, column 16
data_main = data.iloc[7:, 16:]
data_main.index = data_feature_meta.index
data_main.columns = meta.index

# 4) Transpose so rows = samples, columns = metabolites
data_main = data_main.transpose()

# 5) Reindex the feature metadata for easy overlap calculations
feature_meta.index = feature_meta.COMP_ID

print("Data loaded and reformatted.")

Data loaded and reformatted.
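
The slicing above is easier to follow on a miniature sheet. The sketch below mimics the layout with 2 sample-metadata rows and 2 feature-metadata columns instead of 6 and 15; every value, the `toy_*` names, and the `c0`/`c1` column labels are invented for illustration, and the real file may differ in details.

```python
import pandas as pd

# Toy sheet: 2 sample-metadata rows and 2 feature-metadata columns
# (the real sheet uses 6 rows and 15 columns). All values are made up.
toy = pd.DataFrame(
    [
        [None, None, "Phenotype", "CFS", "Control", "CFS"],
        [None, None, "Sex", "F", "M", "F"],
        ["COMP ID", "NAME", None, None, None, None],
        [101, "metabolite_a", None, 0.1, 0.2, 0.3],
        [102, "metabolite_b", None, 1.1, 1.2, 1.3],
    ],
    columns=["c0", "c1", "ID", "S1", "S2", "S3"],
)
n_meta_rows, n_meta_cols = 2, 2

# Feature metadata: its header row sits directly below the sample metadata
toy_feature_meta = toy.iloc[n_meta_rows:, :n_meta_cols].copy()
toy_feature_meta.columns = toy_feature_meta.iloc[0]
toy_feature_meta.index = toy_feature_meta.loc[:, "COMP ID"]
toy_feature_meta = toy_feature_meta.drop("COMP ID")  # drop the header row

# Sample metadata: transpose so each sample becomes a row
toy_meta = toy.iloc[:n_meta_rows, n_meta_cols:].transpose()
toy_meta.columns = toy_meta.iloc[0]
toy_meta = toy_meta.drop("ID")  # drop the row that held the field names

# Measurement matrix: everything below and to the right of the metadata
toy_main = toy.iloc[n_meta_rows + 1:, n_meta_cols + 1:].copy()
toy_main.index = toy_feature_meta.index
toy_main.columns = toy_meta.index
toy_main = toy_main.transpose().astype(float)  # rows = samples, columns = metabolites

print(toy_main)
print(toy_meta["Phenotype"].tolist())
```

After the transpose, `toy_main` has one row per sample (`S1` to `S3`) and one column per compound ID, which is the orientation the model input needs.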


### Check Feature Coverage
Next, we ensure that the features in your dataset overlap with the ones expected by the pretrained model.


In [7]:
overlap_features = set(data_main.columns).intersection(feature_meta.index)
overlap_len = len(overlap_features)

print("Model features:", len(feature_meta.index))
print("Test dataset features:", len(data_main.columns))
print("Overlap:", overlap_len)
print("Feature Coverage:", overlap_len / len(feature_meta.index))

Model features: 730
Test dataset features: 1157
Overlap: 573
Feature Coverage: 0.7849315068493151
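
In the run above, 730 - 573 = 157 model features are absent from the dataset. To see which ones, the set arithmetic can be extended as in the sketch below. The compound IDs are hypothetical; in the notebook the two indexes would be `feature_meta.index` and `data_main.columns`.

```python
import pandas as pd

# Hypothetical COMP_IDs; the real model expects 730 features
model_features = pd.Index([101, 102, 103, 104], name="COMP_ID")
dataset_features = pd.Index([102, 104, 105])

shared = model_features.intersection(dataset_features)
# Expected by the model but absent from the data
missing = model_features.difference(dataset_features)
# Present in the data but not used by the model
unused = dataset_features.difference(model_features)

coverage = len(shared) / len(model_features)
print(f"Coverage: {coverage:.1%}")
print("Missing from dataset:", list(missing))
print("Not used by model:", list(unused))
```

Inspecting `missing` shows which inputs will be zero-filled in the next step.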


### Finalize X (features) and y (labels)
1. **Reindex** the data columns to match `feature_meta` order.
2. **Fill missing features** (if any) with zeros.
3. **Extract** the phenotype column from `meta` and map to 0 or 1.
4. **Scale** the data with `StandardScaler`.

In [8]:
# Align columns
X = data_main.reindex(columns=feature_meta.index, fill_value=0)

# Convert data to float
X = X.astype("float32")

# Extract labels (y)
y = meta.Phenotype.map({'Control': 0, 'CFS': 1})
y = y.astype("float32")

# Scale features
X = StandardScaler().fit_transform(X)

print("Features and labels prepared.")

Features and labels prepared.
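
The effect of alignment and scaling on a missing feature can be checked on a small example. This is a sketch on made-up data, with hypothetical feature names `a` to `d`; it assumes the same `reindex` and `StandardScaler` calls as the cell above.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy data: the model expects features a, b, c but the dataset has a, c, d
toy_data = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0], "c": [10.0, 20.0, 30.0], "d": [5.0, 5.0, 5.0]},
    index=["S1", "S2", "S3"],
)
model_order = ["a", "b", "c"]

# Reorder to the model's columns; "b" is created and filled with 0, "d" is dropped
aligned = toy_data.reindex(columns=model_order, fill_value=0).astype("float32")

# StandardScaler centers each column and divides by its standard deviation;
# a constant column has zero variance and stays at 0 after centering
scaled = StandardScaler().fit_transform(aligned)

print(aligned.columns.tolist())
print(scaled.round(3))
```

Note that `fill_value` applies only to columns introduced by the reindex, not to `NaN` values already present in the data, so those would need separate handling.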


## 4. Predict & Evaluate
Use **BioMapAI.ScoreYModel** to combine your pretrained **DeepMECFS_model** and **Y2y_model**, then generate predictions and compute various metrics.

### `calculate_metrics` Function
We have defined `calculate_metrics` to compute:
- **Accuracy**
- **Precision**
- **Recall**
- **F1-Score**
- **AUC** (Area Under the ROC Curve)
- **AUCPR** (Area Under the Precision-Recall Curve)


In [9]:
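def calculate_metrics(y_true, y_pred):
    """Compute classification metrics from true labels and hard (0/1) predictions."""
    accuracy = accuracy_score(y_true, y_pred)
    precision = precision_score(y_true, y_pred)
    recall = recall_score(y_true, y_pred)
    f1 = f1_score(y_true, y_pred)
    # Note: AUC and AUCPR are computed here from the hard predictions passed in;
    # scoring predicted probabilities instead would give threshold-independent values.
    auc = roc_auc_score(y_true, y_pred)
    aucpr = average_precision_score(y_true, y_pred)
    result = pd.Series([
        accuracy, precision, recall, f1, auc, aucpr
    ], index=['accuracy','precision','recall','f1','auc','aucpr'])
    return result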

print("Helper function for metrics defined.")

Helper function for metrics defined.
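
ROC AUC measures how well scores rank positives above negatives, so its value depends on whether it receives probabilities or thresholded labels. The sketch below illustrates the difference on made-up numbers; it is not the notebook's data.

```python
from sklearn.metrics import roc_auc_score

# Made-up labels and probabilities for illustration
y_true = [0, 0, 1, 1]
y_prob = [0.2, 0.4, 0.45, 0.9]

# Thresholding at 0.5 turns the third sample into a false negative
y_hard = [int(p > 0.5) for p in y_prob]

auc_from_prob = roc_auc_score(y_true, y_prob)  # ranking of the raw scores
auc_from_hard = roc_auc_score(y_true, y_hard)  # ranking after thresholding

print(y_hard)
print(auc_from_prob, auc_from_hard)
```

Here the probabilities rank every positive above every negative (AUC of 1.0), while the thresholded labels lose that ordering and score 0.75.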


### 4.1 Generate Predictions and Metrics
1. **Predict probabilities** (`y_prob`) with `ScoreYModel.predict()`.
2. Convert probabilities to hard predictions (`y_pred`) using a threshold of 0.5.
3. Evaluate using the `calculate_metrics` function.
4. Compute **loss** and **accuracy** from the model’s `.evaluate()`.
5. Optionally, compute a continuous score with `.get_score()`.

In [10]:
# Build the combined model once and reuse it for every call
score_model = BioMapAI.ScoreYModel(DeepMECFS_model, Y2y_model)

# Generate probabilities
y_prob = score_model.predict(X)

# Binarize predictions
y_pred = (y_prob > 0.5).astype(int).flatten()

# Compute metrics
metrics_result = calculate_metrics(y, y_pred)
print("Metrics on your dataset:")
print(metrics_result)

# Evaluate final output (Keras-style loss & accuracy)
loss, accuracy = score_model.evaluate(X, y)
print("\nModel Evaluation:")
print("Loss:", loss)
print("Accuracy:", accuracy)

# (Optional) Get a score instead of probability
score = score_model.get_score(X)
print("\nScore (example):", score[:5])

Metrics on your dataset:
accuracy     0.581683
precision    0.605536
recall       0.760870
f1           0.674374
auc          0.552849
aucpr        0.596873
dtype: float64

Model Evaluation:
Loss: 1.1389771699905396
Accuracy: 0.5841584205627441

Score (example): [[0.69249278 4.         2.         0.90231776 0.34845942 0.71620554
  0.52002591 0.77187139 0.48639807 0.69270623 0.59847623 0.38456982]
 [0.65149671 4.         3.         0.89950168 0.36542782 0.69211692
  0.51083148 0.7390132  0.47947568 0.70352364 0.57209647 0.34919432]
 [0.54803205 4.         0.         0.86284113 0.30350617 0.58225155
  0.41387561 0.64547694 0.45103189 0.66739452 0.49009156 0.28886947]
 [0.09153847 0.         0.         0.71719098 0.29834646 0.25541073
  0.20815381 0.37611681 0.38354927 0.61935139 0.28919482 0.02028154]
 [0.68932599 4.         0.         0.90346587 0.34127256 0.7179293
  0.51207089 0.76941878 0.48757833 0.69147861 0.59413648 0.36833206]]


## Conclusion
You have successfully:
1. Loaded the **DeepMECFS** pretrained model and **Y2y_model**.
2. Aligned, preprocessed, and scaled your metabolomics data.
3. Generated predictions for ME/CFS vs. Control classification.
4. Evaluated using various metrics (accuracy, precision, recall, F1, AUC, AUCPR).

Feel free to adapt the code to your own datasets. Keep in mind:
- You must match the exact features the model expects.
- Scaling and normalization steps should be consistent with the preprocessing the model was trained on.
- Additional hyperparameter tuning is not necessary here since the model is pretrained, but you can still experiment with thresholding or post-processing.
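
As one way to experiment with thresholding, the sketch below scans a few candidate cut-offs and keeps the one with the highest F1. The labels, probabilities, and candidate thresholds are made up; in the notebook, `y` and the flattened `y_prob` would take their place.

```python
import numpy as np
from sklearn.metrics import f1_score

# Made-up labels and probabilities for illustration
y_true = np.array([0, 0, 1, 1, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.6, 0.8])

# Hypothetical candidate thresholds
thresholds = [0.3, 0.5, 0.7]

# F1 at each threshold, using the same "greater than" rule as the notebook
f1_by_threshold = {
    t: f1_score(y_true, (y_prob > t).astype(int)) for t in thresholds
}
best_threshold = max(f1_by_threshold, key=f1_by_threshold.get)

for t, f1 in f1_by_threshold.items():
    print(f"threshold={t:.1f}  f1={f1:.3f}")
print("Best threshold:", best_threshold)
```

A threshold chosen on the same samples used for evaluation will look optimistic, so it is safer to select it on a separate validation split.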

Happy modeling!

In [11]:
end_time = time.time()
print(f"Total execution time: {end_time - start_time:.2f} seconds")

Total execution time: 10.36 seconds
