# Evaluation Script

This script evaluates the performance of generative models across multiple datasets by comparing real and synthetic data.

1. **Cross-Validation Strategy**  
   - A **5-fold cross-validation (5CV)** procedure is implemented.  
   - For each dataset, the script averages performance across the five folds, which reduces the dependence of the reported scores on any single train/test split.  

2. **Train–Test Comparisons**  
   - The evaluation compares the **training data**, **test data**, and the **synthetic samples** generated by the models.  
   - This setup assesses how well the generative models capture the statistical properties of the held-out test data; the train-vs-test comparison serves as a real-data baseline.  

3. **Discrepancy Metrics**  
   - The following metrics are calculated to quantify differences between real and synthetic distributions:  
     - **Wasserstein Distance (WD)**  
     - **Maximum Mean Discrepancy (MMD)**  
     - **Correlation Difference (CD)**  
   - All discrepancy metrics are computed using the **top 50 principal components (PCs)**, reducing noise while retaining key variance in the data.  

4. **Classification**
   
   Setup  
   - Classification models are trained on real + synthetic data.  
   - Test data is used for performance evaluation.  

   Metrics  
   - Standard classification metrics are computed, including:  
     - **Precision** – the proportion of correctly predicted positive samples out of all predicted positives.  
     - **Recall** – the proportion of correctly predicted positive samples out of all actual positives.  
     - **F1-score** – the harmonic mean of precision and recall, providing a balanced measure of performance.  

   Cross-Validation  
   - Results are averaged across **5-fold cross-validation (5CV)** for each dataset to ensure stability and robustness.   


5. **Outputs**  
   - Fold-wise and averaged results are stored for each dataset.  
   - The results provide both **quantitative evidence** of distribution similarity and **classification-based evaluation** for downstream tasks.  

---

In summary, the evaluation script compares **real vs. synthetic distributions** with WD, MMD, and CD metrics (using 50-PC representation) and reports **5CV averaged results per dataset**.
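
As a small worked illustration of the classification metrics defined in item 4, the sketch below computes per-class precision, recall, and F1 by hand and compares them with `sklearn.metrics.precision_recall_fscore_support`. The label vectors and the helper `per_class_prf` are hypothetical and are not taken from any of the datasets.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Toy labels (hypothetical, for illustration only)
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2, 0, 2])


def per_class_prf(y_true, y_pred, label):
    """Precision, recall and F1 for one class, from TP/FP/FN counts."""
    tp = np.sum((y_pred == label) & (y_true == label))
    fp = np.sum((y_pred == label) & (y_true != label))
    fn = np.sum((y_pred != label) & (y_true == label))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# One row per class: [precision, recall, f1]
manual = np.array([per_class_prf(y_true, y_pred, c) for c in np.unique(y_true)])

# Reference values from scikit-learn (per class, no averaging)
prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred, zero_division=0)

# The macro average is the unweighted mean over classes
macro_f1 = manual[:, 2].mean()
print(manual)
print("macro F1:", macro_f1)
```

The macro average weights every class equally, whereas the weighted average in `classification_report` weights each class by its support, so the two can differ noticeably on imbalanced cell-type labels.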


In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pylab import savefig
from scipy.io import arff
import ntpath
import glob
import os
import math
from sklearn import preprocessing

import argparse

import torch

from sklearn import manifold
import string


Pbmc3k

In [None]:
import pickle
with open(f"data/pbmc3k_train.pkl", "rb") as f:
    X_train = pickle.load(f)
with open(f"data/pbmc3k_test.pkl", "rb") as f:
    X_test = pickle.load(f)
with open(f"Rdata/pbmc3k_y_train.pkl", "rb") as f:
    y_train = pickle.load(f)
with open(f"data/pbmc3k_y_test.pkl", "rb") as f:
    y_test = pickle.load(f)

In [None]:
X_train.shape, y_train.shape, X_test.shape, y_test.shape

((2374, 2000), (2374,), (264, 2000), (264,))

In [None]:
unique_values, counts = np.unique(y_train, return_counts=True)
display(dict(zip(unique_values, counts)),np.max(counts))

{np.int64(0): np.int64(640),
 np.int64(1): np.int64(432),
 np.int64(2): np.int64(425),
 np.int64(3): np.int64(309),
 np.int64(4): np.int64(251),
 np.int64(5): np.int64(146),
 np.int64(6): np.int64(129),
 np.int64(7): np.int64(29),
 np.int64(8): np.int64(13)}

np.int64(640)

Pbmc68k

In [None]:
%pip install anndata




In [None]:
import anndata

# Load the h5ad file
adata = anndata.read_h5ad(gdrivePath+"/Revision/data/68kPBMC_preprocessed.h5ad")

# Print basic info
print(adata)


AnnData object with n_obs × n_vars = 68579 × 17789
    obs: 'cluster', 'n_genes', 'n_counts', 'split'
    var: 'n_cells'


In [None]:
adata.obs

index,cluster,n_genes,n_counts,split
AAACATACACCCAA-1,2,498,1216.0,train
AAACATACCCCTCA-1,3,472,1265.0,train
AAACATACCGGAGA-1,1,542,1322.0,train
AAACATACTAACCG-1,6,349,854.0,train
AAACATACTCTTCA-1,0,446,1252.0,train
...,...,...,...,...
TTTGCATGAGCCTA-8,3,484,1314.0,train
TTTGCATGCTAGCA-8,3,614,1690.0,train
TTTGCATGCTGCAA-8,2,529,1230.0,train
TTTGCATGGCTCCT-8,2,286,552.0,train


In [None]:
train_mask = adata.obs['split'] == "train"
test_mask = adata.obs['split'] == "test"

# Select training data
X_train = adata.X[train_mask.values]
y_train = adata.obs['cluster'][train_mask.values]

# Select test data
X_test = adata.X[test_mask.values]
y_test = adata.obs['cluster'][test_mask.values]

In [None]:
X_train.shape , y_train.shape, X_test.shape, y_test.shape

((61588, 17789), (61588,), (6991, 17789), (6991,))

5CV Data

In [None]:
import pickle

all_folds = []

for fold in range(1, 6):
    with open(f"data/5CV/fold_skf_3000_{fold}.pkl", "rb") as f:
        fold_data = pickle.load(f)
        all_folds.append(fold_data)
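
The fold files are used later as dictionaries with the keys `X_train`, `y_train`, `X_val`, and `y_val`. The `skf` in the file name suggests a stratified K-fold split. Assuming that, the sketch below shows how folds with this layout might be built using `StratifiedKFold`. The toy data, `shuffle=True`, and `random_state=42` are assumptions, not settings confirmed by the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Toy stand-in for an expression matrix and its labels (hypothetical)
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 5)))
y = pd.Series(rng.integers(0, 3, size=100))

# Stratification keeps class proportions similar in every fold
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

toy_folds = []
for train_idx, val_idx in skf.split(X, y):
    toy_folds.append({
        'X_train': X.iloc[train_idx],
        'y_train': y.iloc[train_idx],
        'X_val': X.iloc[val_idx],
        'y_val': y.iloc[val_idx],
    })

for i, f in enumerate(toy_folds, start=1):
    print(f"Fold {i}: train={f['X_train'].shape}, val={f['X_val'].shape}")
```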


In [None]:
for i,f in enumerate(all_folds, start=1):
  print(f"Fold {i}:")
  print("X_train shape:", f['X_train'].shape)
  print("y_train shape:", f['y_train'].shape)

Fold 1:
X_train shape: (11337, 3000)
y_train shape: (11337,)
Fold 2:
X_train shape: (11337, 3000)
y_train shape: (11337,)
Fold 3:
X_train shape: (11338, 3000)
y_train shape: (11338,)
Fold 4:
X_train shape: (11338, 3000)
y_train shape: (11338,)
Fold 5:
X_train shape: (11338, 3000)
y_train shape: (11338,)


In [None]:
unique_values, counts = np.unique(y_train, return_counts=True)
display(dict(zip(unique_values, counts)),np.max(counts))

{np.int64(0): np.int64(640),
 np.int64(1): np.int64(432),
 np.int64(2): np.int64(425),
 np.int64(3): np.int64(309),
 np.int64(4): np.int64(251),
 np.int64(5): np.int64(146),
 np.int64(6): np.int64(129),
 np.int64(7): np.int64(29),
 np.int64(8): np.int64(13)}

np.int64(640)

Examples of loading synthetic data

In [None]:
with open(f"ACTIVA/pbmc68k_activa_generated.pkl", "rb") as f:
  gen_data = pickle.load(f)

In [None]:
import pickle
with open(f"results/pbmc68k_MF-FB_generated.pkl", "rb") as f:
  gen_data = pickle.load(f)

In [None]:
with open(f"results/pbmc3k_GC_generated.pkl", "rb") as f:
  gen_data = pickle.load(f)

In [None]:
gen_data.shape

(264, 2000)

Examples of loading 5CV synthetic data

In [None]:
import pickle

gen_dict = []

for fold in range(1, 6):
    with open(f"results/MOE-FB_{fold}.pkl", "rb") as f:
        gen_data = pickle.load(f)
        gen_dict.append(gen_data)


In [None]:
for i,f in enumerate(gen_dict, start=1):
  print(f"Fold {i}:")
  print("X_train shape:", f['X_train_gen'].shape)
  print("y_train shape:", f['y_train_gen'].shape)

Fold 1:
X_train shape: (18405, 3000)
y_train shape: (18405,)
Fold 2:
X_train shape: (18405, 3000)
y_train shape: (18405,)
Fold 3:
X_train shape: (18405, 3000)
y_train shape: (18405,)
Fold 4:
X_train shape: (18416, 3000)
y_train shape: (18416,)
Fold 5:
X_train shape: (18405, 3000)
y_train shape: (18405,)


HCA

In [None]:
import pickle

all_folds = []

for fold in range(1, 6):
    with open(f"Revision/HCA/5CV/fold_skf_{fold}.pkl", "rb") as f:
        fold_data = pickle.load(f)
        all_folds.append(fold_data)


In [None]:
import pickle

gen_dict = []

for fold in range(1, 6):
    with open(f"HCA/5CV/GC_skf_fold{fold}.pkl", "rb") as f:
        gen_data = pickle.load(f)
        gen_dict.append(gen_data)



PCA

In [None]:
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_np_scaled = scaler.fit_transform(X_train)
synthetic_samples_scaled = scaler.transform(gen_data)
X_test_np_scaled = scaler.transform(X_test)

pca = PCA(n_components=50,random_state=42)
X_train_pca = pca.fit_transform(X_train_np_scaled)
synthetic_pca = pca.transform(synthetic_samples_scaled)
X_test_pca = pca.transform(X_test_np_scaled)
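
The introduction states that the 50 principal components retain the key variance. One way to check this for a given dataset is to inspect the cumulative explained variance ratio of the fitted PCA. The sketch below does so on random stand-in data; `X_toy` and `pca_toy` are hypothetical names, and the printed value says nothing about the real datasets.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Random stand-in for a training matrix (hypothetical)
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(300, 200))

# Same preprocessing order as above: standardize, then fit PCA
X_scaled = StandardScaler().fit_transform(X_toy)
pca_toy = PCA(n_components=50, random_state=42).fit(X_scaled)

# Cumulative share of total variance captured by the first k components
cum_var = np.cumsum(pca_toy.explained_variance_ratio_)
print(f"Variance retained by 50 PCs: {cum_var[-1]:.3f}")
```

On the real data the same check would use the already fitted `pca` object, for example `np.cumsum(pca.explained_variance_ratio_)[-1]`.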


PCA - 5CV

In [None]:
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

all_folds_pca = []
gen_dict_pca = []

for i in range(5):
    X_test_np = all_folds[i]['X_val'].values
    X_train_np = all_folds[i]['X_train'].values
    synthetic_samples = gen_dict[i]['X_train_gen']


    # Standardize
    scaler = StandardScaler()
    X_train_np_scaled = scaler.fit_transform(X_train_np)
    synthetic_samples_scaled = scaler.transform(synthetic_samples)
    X_test_np_scaled = scaler.transform(X_test_np)


    # PCA
    pca = PCA(n_components=50,random_state=42)
    X_train_pca = pca.fit_transform(X_train_np_scaled)

    synthetic_pca = pca.transform(synthetic_samples_scaled)

    X_test_pca = pca.transform(X_test_np_scaled)


    syn = {
          'X_train': X_train_pca,
          'X_val': X_test_pca
      }
    all_folds_pca.append(syn)

    gen_dict_pca.append(synthetic_pca)





Wasserstein Distance - WD

In [None]:
from scipy.stats import wasserstein_distance

wd_score = []
for feature_idx in range(X_test_pca.shape[1]):
  wd = wasserstein_distance(X_test_pca[:, feature_idx], synthetic_pca[:, feature_idx])
  wd_score.append(wd)
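
The loop above can be wrapped in a small helper, which also makes it easy to sanity-check the metric. In the sketch below, `mean_wd` is a hypothetical name. The check relies on a known property of the one-dimensional Wasserstein distance: shifting a sample by a constant `c` gives a distance of `c`.

```python
import numpy as np
from scipy.stats import wasserstein_distance


def mean_wd(a, b):
    """Mean of the per-column 1D Wasserstein distances between two matrices."""
    a, b = np.asarray(a), np.asarray(b)
    return float(np.mean([
        wasserstein_distance(a[:, j], b[:, j]) for j in range(a.shape[1])
    ]))


# Toy data (hypothetical): a reference sample and a shifted copy
rng = np.random.default_rng(0)
ref = rng.normal(size=(500, 4))

same = mean_wd(ref, ref)           # identical samples
shifted = mean_wd(ref, ref + 2.0)  # every column shifted by 2
print(same, shifted)
```

Because the distance is computed per component and then averaged, it compares marginal distributions only; dependencies between components are what the correlation-based metric is meant to capture.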


In [None]:
print(np.mean(wd_score)) #train-test pbmc3k

0.45406135463855096


In [None]:
print(np.mean(wd_score)) #ACTIVA pbmc3k Gen - Test 100epc

0.9183059864973818


In [None]:
print(np.mean(wd_score)) #GC pbmc3k

11.724739081941932


In [None]:
print(np.mean(wd_score)) #TVAE pbmc3k 100 epc

1.2454621032798447


In [None]:
print(np.mean(wd_score)) #CTGAN pbmc3k

0.44015200935395404


In [None]:
print(np.mean(wd_score)) #FB-MOE pbmc3k

0.40528297584481093


In [None]:
print(np.mean(wd_score)) #train-test Pbmc68k

0.10384442378825817


In [None]:
print(np.mean(wd_score)) #train-test MAF-FB pbmc68k

0.4355393600600089


In [None]:
print(np.mean(wd_score)) #train-test ACTIVA pbmc68k

0.7913668209566334


5CV- WD

In [None]:
from scipy.stats import wasserstein_distance
import numpy as np

wd_scores = []

for i in range(5):
    X_gen = np.array(gen_dict_pca[i])
    X_val = np.array(all_folds_pca[i]['X_val'])

    fold_wd = []
    for feature_idx in range(X_gen.shape[1]):
        wd = wasserstein_distance(X_gen[:, feature_idx], X_val[:, feature_idx])
        fold_wd.append(wd)

    mean_wd = np.mean(fold_wd)
    wd_scores.append(mean_wd)
    print(f"Fold {i + 1}: WD = {mean_wd:.4f}")

print("All Fold WDs:", wd_scores)
print("Average WD across folds:", np.mean(wd_scores))


HCA-BM10K WD 5CV

In [None]:
#train-test
print("All Fold WDs:", wd_scores)
print("Average WD across folds:", np.mean(wd_scores))

All Fold WDs: [np.float64(0.2249129853936038), np.float64(0.21479028858693908), np.float64(0.23263657902348447), np.float64(0.20710126271462045), np.float64(0.21655879005722437)]
Average WD across folds: 0.21919998115517444
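
Alongside the fold average, the spread across folds can be reported to show how stable a score is. The sketch below uses the five HCA train-vs-test WD values printed just above. Reporting the sample standard deviation (`ddof=1`) is a suggestion here, not something the notebook itself computes.

```python
import numpy as np

# Fold-wise WD values copied from the HCA train-vs-test output above
fold_wd = [0.2249129853936038, 0.21479028858693908, 0.23263657902348447,
           0.20710126271462045, 0.21655879005722437]

mean_wd_folds = np.mean(fold_wd)
std_wd_folds = np.std(fold_wd, ddof=1)  # sample standard deviation across folds
print(f"WD = {mean_wd_folds:.4f} ± {std_wd_folds:.4f}")
```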


In [None]:
#FB-MOE
print("All Fold WDs:", wd_scores)
print("Average WD across folds:", np.mean(wd_scores))

All Fold WDs: [np.float64(0.879814047156967), np.float64(0.8606204568790219), np.float64(0.8658935813268926), np.float64(0.8604577146288744), np.float64(0.8777209105788095)]
Average WD across folds: 0.8689013421141132


In [None]:
#MAF-FB
print("All Fold WDs:", wd_scores)
print("Average WD across folds:", np.mean(wd_scores))

All Fold WDs: [np.float64(0.8806664035509105), np.float64(0.8639304001682138), np.float64(0.8641392583031928), np.float64(0.8638129938514685), np.float64(0.8826448242603517)]
Average WD across folds: 0.8710387760268274


In [None]:
#CTGAN
print("All Fold WDs:", wd_scores)
print("Average WD across folds:", np.mean(wd_scores))

All Fold WDs: [np.float64(0.4869652335823195), np.float64(0.44252429557319917), np.float64(0.4845046965214293), np.float64(0.43982685652160386), np.float64(0.4911104549078049)]
Average WD across folds: 0.46898630742127134


In [None]:
#TVAE
print("All Fold WDs:", wd_scores)
print("Average WD across folds:", np.mean(wd_scores))

All Fold WDs: [np.float64(0.5533219297740434), np.float64(0.5297170114021647), np.float64(0.5217549977450364), np.float64(0.5025263817267053), np.float64(0.5154266781153247)]
Average WD across folds: 0.5245493997526548


In [None]:
#GC
print("All Fold WDs:", wd_scores)
print("Average WD across folds:", np.mean(wd_scores))

All Fold WDs: [np.float64(2.768908765143101), np.float64(2.7970539647434127), np.float64(2.7517902092107547), np.float64(2.6761763706831085), np.float64(2.694375970706221)]
Average WD across folds: 2.73766105609732


HCA 5CV end---

5CV Integrated Pancreatic Dataset WD SKF 3000 Results

In [None]:
print("Train vs Test")
print("All Fold WDs:", wd_scores)
print("Average WD across folds:", np.mean(wd_scores))

Train vs Test
All Fold WDs: [np.float64(0.13790136672277692), np.float64(0.14175517275337504), np.float64(0.14009066613353321), np.float64(0.12970281643836318), np.float64(0.13711298860108376)]
Average WD across folds: 0.13731260212982643


In [None]:
print("TVAE - Gen vs Test")
print("All Fold WDs:", wd_scores)
print("Average WD across folds:", np.mean(wd_scores))

TVAE - Gen vs Test
All Fold WDs: [np.float64(0.7070426308703033), np.float64(0.7240658013048972), np.float64(0.7023546893954266), np.float64(0.6999003745865017), np.float64(0.7021834616828189)]
Average WD across folds: 0.7071093915679896


In [None]:
print("CTGAN - Gen vs Test")
print("All Fold WDs:", wd_scores)
print("Average WD across folds:", np.mean(wd_scores))

CTGAN - Gen vs Test
All Fold WDs: [np.float64(0.6044317108680753), np.float64(0.6223876772039634), np.float64(0.6030753448316529), np.float64(0.5936910245792189), np.float64(0.5902349909433829)]
Average WD across folds: 0.6027641496852587


In [None]:
print("GC - Gen vs Test")
print("All Fold WDs:", wd_scores)
print("Average WD across folds:", np.mean(wd_scores))

GC - Gen vs Test
All Fold WDs: [np.float64(0.7889359736841761), np.float64(0.7564975367080765), np.float64(0.755000657150692), np.float64(0.7615038600538773), np.float64(0.7943546059985441)]
Average WD across folds: 0.7712585267190731


In [None]:
print("FB-paper - Gen vs Test")#MAF-FB
print("All Fold WDs:", wd_scores)
print("Average WD across folds:", np.mean(wd_scores))

FB-paper - Gen vs Test
All Fold WDs: [np.float64(0.6253901150179434), np.float64(0.6337784402089062), np.float64(0.6224293348787358), np.float64(0.6136634970376103), np.float64(0.609206914950513)]
Average WD across folds: 0.6208936604187418


In [None]:
print("FB-MOE-MAF - Gen vs Test")
print("All Fold WDs:", wd_scores)
print("Average WD across folds:", np.mean(wd_scores))

FB-MOE-MAF - Gen vs Test
All Fold WDs: [np.float64(0.5999874581483928), np.float64(0.6064372111209415), np.float64(0.5956222513325993), np.float64(0.5864445015566421), np.float64(0.575417567992129)]
Average WD across folds: 0.5927817980301409


CD

In [None]:
import numpy as np
import pandas as pd


def CD(generated_data, real_data):
  """Mean absolute difference between the Pearson correlation matrices
  of two datasets. The result is symmetric in its two arguments."""
  generated_data = pd.DataFrame(generated_data)
  real_data = pd.DataFrame(real_data)

  # 1. Pearson correlation matrices of the generated and the real data
  corr_generated = generated_data.corr(method='pearson')
  corr_real = real_data.corr(method='pearson')

  # 2. Element-wise absolute difference between the two correlation matrices
  correlation_discrepancy = np.abs(corr_real - corr_generated)

  # 3. Mean over all entries (diagonal included) as a summary metric
  mean_discrepancy = correlation_discrepancy.mean().mean()
  return mean_discrepancy

#CD(X_test_pca,synthetic_pca)
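
Since the diagonal of a correlation matrix is always 1, the diagonal entries of the difference are zero and pull the mean in `CD` toward zero. The sketch below is an alternative that averages over off-diagonal entries only, written with `np.corrcoef`. The function `cd_offdiag` is hypothetical and is not the metric reported in this notebook. For `d` columns with non-constant values, it should equal `CD` multiplied by `d / (d - 1)`.

```python
import numpy as np


def cd_offdiag(generated, real):
    """Mean absolute correlation difference over off-diagonal entries only."""
    corr_gen = np.corrcoef(np.asarray(generated), rowvar=False)
    corr_real = np.corrcoef(np.asarray(real), rowvar=False)
    diff = np.abs(corr_real - corr_gen)
    mask = ~np.eye(diff.shape[0], dtype=bool)  # True everywhere except the diagonal
    return float(diff[mask].mean())


# Toy data (hypothetical): independent columns vs. a linearly mixed copy
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 5))
gen = real @ rng.normal(size=(5, 5))  # mixing introduces correlations

off = cd_offdiag(gen, real)
print("off-diagonal CD:", off)
```

With 50 principal components the two variants differ by a factor of 50/49, so method rankings are unaffected; the distinction matters mainly when comparing values computed with different numbers of components.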

In [None]:
CD(X_test_pca,X_train_pca) #pbmc3k train-test

np.float64(0.061494239366676105)

In [None]:
CD(X_test_pca,synthetic_pca) #pbmc3k gen-test MAF-FB

np.float64(0.08149562227406576)

In [None]:
CD(X_test_pca,synthetic_pca) #ACTIVA pbmc3k 100epc

np.float64(0.3251639726877482)

In [None]:
CD(X_test_pca,synthetic_pca) #GC Pbmc3k

np.float64(0.11055279868000374)

In [None]:
CD(X_test_pca,synthetic_pca) #TVAE Pbmc3k 100epc

np.float64(0.3284345191225019)

In [None]:
CD(X_test_pca,synthetic_pca) #CTGAN Pbmc3k 100epc

np.float64(0.08144118120792941)

In [None]:
CD(X_test_pca,synthetic_pca) #FB-MOE

np.float64(0.07953590103796837)

In [None]:
CD(X_test_pca,X_train_pca) #train Pbmc68k

np.float64(0.15786946481401695)

In [None]:
CD(X_test_pca,synthetic_pca) #train FB-paper Pbmc68k

np.float64(0.1603132214494761)

In [None]:
CD(X_test_pca,synthetic_pca) #train ACTIVA Pbmc68k

np.float64(0.7784679288962104)

5CV CD

In [None]:
import numpy as np

cd_scores = []

for i in range(5):
    X_gen = gen_dict_pca[i]
    X_val = all_folds_pca[i]['X_val']

    cd = CD(X_gen, X_val)
    cd_scores.append(cd)
    print(f"Fold {i + 1}: CD = {cd:.4f}")

print("All Fold CDs:", cd_scores)
print("Average CD across folds:", np.mean(cd_scores))


HCA 5CV

In [None]:
#train-test
print("All Fold CDs:", cd_scores)
print("Average CD across folds:", np.mean(cd_scores))

All Fold CDs: [np.float64(0.028482247565610398), np.float64(0.026583743145999216), np.float64(0.02657675640072331), np.float64(0.02438423379085148), np.float64(0.029881875984246013)]
Average CD across folds: 0.027181771377486084


In [None]:
#FB MOE
print("All Fold CDs:", cd_scores)
print("Average CD across folds:", np.mean(cd_scores))

All Fold CDs: [np.float64(0.0623470408315862), np.float64(0.05531340006516592), np.float64(0.06061106052229991), np.float64(0.058350049436595504), np.float64(0.06069991848780207)]
Average CD across folds: 0.05946429386868992


In [None]:
#MAF-FB
print("All Fold CDs:", cd_scores)
print("Average CD across folds:", np.mean(cd_scores))

All Fold CDs: [np.float64(0.06187730336403202), np.float64(0.05539450386752146), np.float64(0.060540032938767724), np.float64(0.058714669245232554), np.float64(0.06055873836589662)]
Average CD across folds: 0.05941704955629008


In [None]:
#CTGAN
print("All Fold CDs:", cd_scores)
print("Average CD across folds:", np.mean(cd_scores))

All Fold CDs: [np.float64(0.0667193226752828), np.float64(0.05852284017711116), np.float64(0.06493631514885059), np.float64(0.05928739442527361), np.float64(0.06801762866458011)]
Average CD across folds: 0.06349670021821965


In [None]:
#TVAE
print("All Fold CDs:", cd_scores)
print("Average CD across folds:", np.mean(cd_scores))

All Fold CDs: [np.float64(0.07266489317297013), np.float64(0.07357148857872224), np.float64(0.06922118632998725), np.float64(0.06925300742195929), np.float64(0.07270545403235468)]
Average CD across folds: 0.07148320590719873


In [None]:
#GC
print("All Fold CDs:", cd_scores)
print("Average CD across folds:", np.mean(cd_scores))

All Fold CDs: [np.float64(0.3692539871805438), np.float64(0.3657034587639022), np.float64(0.35291208588419315), np.float64(0.323168157690149), np.float64(0.33859609814633773)]
Average CD across folds: 0.34992675753302516


HCA 5CV end----

5CV Integrated Pancreatic Dataset CD SKF 3000 Results

In [None]:
print("Train vs Test")
print("All Fold CDs:", cd_scores)
print("Average CD across folds:", np.mean(cd_scores))

Train vs Test
All Fold CDs: [np.float64(0.01823955776668988), np.float64(0.017944903708676505), np.float64(0.017539240791690787), np.float64(0.016775423333824903), np.float64(0.017781103823022105)]
Average CD across folds: 0.017656045884780834


In [None]:
print("TVAE - Gen vs Test")
print("All Fold CDs:", cd_scores)
print("Average CD across folds:", np.mean(cd_scores))

TVAE - Gen vs Test
All Fold CDs: [np.float64(0.09954079201265438), np.float64(0.10061704833687565), np.float64(0.09203456728151617), np.float64(0.09704612728023154), np.float64(0.09443077797927465)]
Average CD across folds: 0.09673386257811048


In [None]:
print("CTGAN - Gen vs Test")
print("All Fold CDs:", cd_scores)
print("Average CD across folds:", np.mean(cd_scores))

CTGAN - Gen vs Test
All Fold CDs: [np.float64(0.08067770404675777), np.float64(0.0791271582714595), np.float64(0.07225498607408998), np.float64(0.07762901366431477), np.float64(0.07831207939404704)]
Average CD across folds: 0.07760018829013382


In [None]:
print("GC - Gen vs Test")
print("All Fold CDs:", cd_scores)
print("Average CD across folds:", np.mean(cd_scores))

GC - Gen vs Test
All Fold CDs: [np.float64(0.1297417327536517), np.float64(0.12384878307204121), np.float64(0.12383777511699666), np.float64(0.11507406002609523), np.float64(0.12895244597869246)]
Average CD across folds: 0.12429095938949546


In [None]:
print("FB-paper - Gen vs Test")#MAF-FB
print("All Fold CDs:", cd_scores)
print("Average CD across folds:", np.mean(cd_scores))

FB-paper - Gen vs Test
All Fold CDs: [np.float64(0.07065052745976921), np.float64(0.06753403152689309), np.float64(0.06500850422985187), np.float64(0.06813286918343014), np.float64(0.06676765097346206)]
Average CD across folds: 0.06761871667468128


In [None]:
print("FB-MOE-MAF - Gen vs Test")
print("All Fold CDs:", cd_scores)
print("Average CD across folds:", np.mean(cd_scores))

FB-MOE-MAF - Gen vs Test
All Fold CDs: [np.float64(0.07456729344233654), np.float64(0.07110389305816922), np.float64(0.06888548171392131), np.float64(0.0721986823513803), np.float64(0.07080543039991687)]
Average CD across folds: 0.07151215619314485


MMD Metric

In [None]:
mc.compute_scalar_mmd(X_test_pca, synthetic_pca, gammas=None)  # FB-MOE pbmc3k

np.float32(0.030967243)

In [None]:
mc.compute_scalar_mmd(X_test_pca, synthetic_pca, gammas=None)#Activa pbmc3k

np.float32(0.105935246)

In [None]:
mc.compute_scalar_mmd(X_test_pca, synthetic_pca, gammas=None)#FB paper pbmc3k

np.float32(0.022169301)

In [None]:
mc.compute_scalar_mmd(X_test_pca, synthetic_pca, gammas=None)#ctGAN pbmc3k

np.float32(0.060532097)

In [None]:
mc.compute_scalar_mmd(X_test_pca, synthetic_pca, gammas=None)#TVAE pbmc3k

np.float32(0.3259683)

In [None]:
mc.compute_scalar_mmd(X_test_pca, synthetic_pca, gammas=None)#GC pbmc3k

np.float32(0.08436968)

In [None]:
mc.compute_scalar_mmd(X_test_pca, X_train_pca, gammas=None) # train Pbmc68k

np.float32(0.00031343472)

In [None]:
mc.compute_scalar_mmd(X_test_pca, synthetic_pca, gammas=None) #FB- paper Pbmc68k

np.float32(0.02870742)

In [None]:
mc.compute_scalar_mmd(X_test_pca, synthetic_pca, gammas=None) #ACTIVA Pbmc68k

np.float32(0.05582629)
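
The definition of `mc.compute_scalar_mmd` is not shown in this notebook, so its kernel and its handling of `gammas` are unknown here. For orientation only, the sketch below computes a biased (V-statistic) estimate of the squared MMD with a single Gaussian RBF kernel. The function `rbf_mmd2` and the median-heuristic default for `gamma` are assumptions, and the values it returns are not expected to match the numbers above.

```python
import numpy as np
from scipy.spatial.distance import cdist


def rbf_mmd2(x, y, gamma=None):
    """Biased estimate of squared MMD with kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    dxx = cdist(x, x, 'sqeuclidean')
    dyy = cdist(y, y, 'sqeuclidean')
    dxy = cdist(x, y, 'sqeuclidean')
    if gamma is None:
        # Median heuristic for the bandwidth (an assumption of this sketch)
        gamma = 1.0 / np.median(dxy)
    k_xx = np.exp(-gamma * dxx).mean()
    k_yy = np.exp(-gamma * dyy).mean()
    k_xy = np.exp(-gamma * dxy).mean()
    return float(k_xx + k_yy - 2.0 * k_xy)


# Toy data (hypothetical): two samples from one distribution, and a shifted copy
rng = np.random.default_rng(0)
a = rng.normal(size=(200, 10))
b = rng.normal(size=(200, 10))

mmd_same_dist = rbf_mmd2(a, b)      # same distribution, different draws
mmd_shifted = rbf_mmd2(a, a + 3.0)  # clearly different distributions
print(mmd_same_dist, mmd_shifted)
```

The pairwise distance matrices grow quadratically with the number of cells, so for large datasets such as Pbmc68k a subsampled or batched computation may be needed.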

5CV -MMD

In [None]:
mmd_scores = []

for i in range(5):
    X_gen = gen_dict_pca[i]  # for the train-vs-test baseline, use np.array(all_folds_pca[i]['X_train'])
    X_val = all_folds_pca[i]['X_val']

    mmd = compute_mmd(X_val, X_gen,gammas=None)

    mmd_scores.append(mmd)
    print(f"Fold {i + 1}: MMD = {mmd:.8f}")

print("All Fold MMDs:", mmd_scores)
print("Average MMD across folds:", np.mean(mmd_scores))


HCA 5CV

In [None]:
#train-test
print("All Fold MMDs:", mmd_scores)
print("Average MMD across folds:", np.mean(mmd_scores))

All Fold MMDs: [np.float32(0.0010652129), np.float32(0.0010532303), np.float32(0.0011115184), np.float32(0.00093535386), np.float32(0.001071135)]
Average MMD across folds: 0.0010472902


In [None]:
#FB-MOE
print("All Fold MMDs:", mmd_scores)
print("Average MMD across folds:", np.mean(mmd_scores))

All Fold MMDs: [np.float32(0.005394787), np.float32(0.0052615814), np.float32(0.0054551405), np.float32(0.005071365), np.float32(0.00539975)]
Average MMD across folds: 0.0053165248


In [None]:
#FB-paper
print("All Fold MMDs:", mmd_scores)
print("Average MMD across folds:", np.mean(mmd_scores))

All Fold MMDs: [np.float32(0.0053990446), np.float32(0.005284695), np.float32(0.0054820883), np.float32(0.0050776484), np.float32(0.0054083876)]
Average MMD across folds: 0.005330373


In [None]:
#CTGAN
print("All Fold MMDs:", mmd_scores)
print("Average MMD across folds:", np.mean(mmd_scores))

All Fold MMDs: [np.float32(0.0062954873), np.float32(0.006271582), np.float32(0.006400822), np.float32(0.0061136247), np.float32(0.006259912)]
Average MMD across folds: 0.006268285


In [None]:
#TVAE
print("All Fold MMDs:", mmd_scores)
print("Average MMD across folds:", np.mean(mmd_scores))

All Fold MMDs: [np.float32(0.009529873), np.float32(0.009089555), np.float32(0.009528232), np.float32(0.0088353045), np.float32(0.009417857)]
Average MMD across folds: 0.009280165


In [None]:
#GC
print("All Fold MMDs:", mmd_scores)
print("Average MMD across folds:", np.mean(mmd_scores))

All Fold MMDs: [np.float32(0.0062127872), np.float32(0.0060867467), np.float32(0.006387098), np.float32(0.0059040543), np.float32(0.0062383357)]
Average MMD across folds: 0.0061658043


HCA 5CV end----

5CV Integrated Pancreatic Dataset

In [None]:
#Train-Test
print("All Fold MMDs:", mmd_scores)
print("Average MMD across folds:", np.mean(mmd_scores))

All Fold MMDs: [np.float32(0.0004923369), np.float32(0.0005185182), np.float32(0.0005432284), np.float32(0.0004941907), np.float32(0.00052429066)]
Average MMD across folds: 0.000514513


In [None]:
#FB-MOE
print("All Fold MMDs:", mmd_scores)
print("Average MMD across folds:", np.mean(mmd_scores))

All Fold MMDs: [np.float32(0.0035226948), np.float32(0.0036923029), np.float32(0.0036216017), np.float32(0.0036880996), np.float32(0.0035676218)]
Average MMD across folds: 0.0036184643


In [None]:
#FB-Paper
print("All Fold MMDs:", mmd_scores)
print("Average MMD across folds:", np.mean(mmd_scores))

All Fold MMDs: [np.float32(0.0025984494), np.float32(0.002759332), np.float32(0.0027371037), np.float32(0.0027800333), np.float32(0.0026595017)]
Average MMD across folds: 0.002706884


In [None]:
#CTGAN
print("All Fold MMDs:", mmd_scores)
print("Average MMD across folds:", np.mean(mmd_scores))

All Fold MMDs: [np.float32(0.0041958797), np.float32(0.004286135), np.float32(0.004373178), np.float32(0.0044729975), np.float32(0.0042707347)]
Average MMD across folds: 0.0043197847


In [None]:
#TVAE
print("All Fold MMDs:", mmd_scores)
print("Average MMD across folds:", np.mean(mmd_scores))

All Fold MMDs: [np.float32(0.0056204717), np.float32(0.005941011), np.float32(0.0057594837), np.float32(0.0059694224), np.float32(0.005993925)]
Average MMD across folds: 0.0058568628


In [None]:
#GC
print("All Fold MMDs:", mmd_scores)
print("Average MMD across folds:", np.mean(mmd_scores))

All Fold MMDs: [np.float32(0.0023848438), np.float32(0.0024278285), np.float32(0.0024639026), np.float32(0.002449179), np.float32(0.0025096445)]
Average MMD across folds: 0.0024470796


Classification

In [None]:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def classify(X_train, X_test, y_train, y_test):
    """Fit each classifier on the training data and evaluate it on the test data.

    Returns the test predictions, the predicted class probabilities, and a
    DataFrame with the mean test accuracy of each classifier.
    """
    names = ["Random_Forest"]
    classifiers = [
        RandomForestClassifier(random_state=0, criterion='entropy'),  # n_estimators=100 (default)
    ]

    test_score = []
    test_predictions = []
    test_predictions_proba = []

    for name, clf in zip(names, classifiers):
        clf.fit(X_train, y_train)
        test_predictions.append(clf.predict(X_test))
        test_predictions_proba.append(clf.predict_proba(X_test))
        test_score.append(clf.score(X_test, y_test))  # mean accuracy on the test set

    dfs = pd.DataFrame({'Name': names, 'Test Score': test_score})
    return test_predictions, test_predictions_proba, dfs
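
To illustrate the setup described in the introduction (train on real plus synthetic samples, evaluate on held-out real data), the sketch below runs the same kind of random forest on toy data. The "synthetic" samples here are noise-jittered copies of the training data, used only as a hypothetical stand-in for a generative model's output.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Toy 3-class problem (hypothetical stand-in for a labeled expression matrix)
X, y = make_classification(n_samples=400, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
X_real, X_held, y_real, y_held = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Stand-in "synthetic" data: jittered copies of the real training samples
rng = np.random.default_rng(0)
X_fake = X_real + rng.normal(scale=0.1, size=X_real.shape)

# Augmented training set: real + synthetic
X_aug = np.vstack([X_real, X_fake])
y_aug = np.concatenate([y_real, y_real])

clf = RandomForestClassifier(random_state=0, criterion='entropy').fit(X_aug, y_aug)

# Evaluate on held-out real data only
report = classification_report(y_held, clf.predict(X_held),
                               output_dict=True, zero_division=0)
print("accuracy:", report['accuracy'])
print("macro F1:", report['macro avg']['f1-score'])
```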


In [None]:
import seaborn as sns
#predictions, predictions_proba, dfs = classify(val_folds[0]['X_train'], val_folds[0]['X_val'], val_folds[0]['y_train'], val_folds[0]['y_val'])
predictions, predictions_proba, dfs = classify(X_train.values,X_test.values, y_train.values, y_test.values)

cm = sns.light_palette("purple", as_cmap=True)
s = dfs.style.background_gradient(cmap=cm)
print("Results:")
s

Results:


,Name,Test Score
0,Random_Forest,0.879


5CV

In [None]:
from sklearn.metrics import classification_report
import numpy as np
import pandas as pd

file_name = "train"
# Initialize list to store metrics
reports = []
predictions_folds = []
predictions_proba_folds = []
# Loop over each fold
for i in range(5):
    # Run classify function
    predictions, predictions_proba, dfs = classify(
        gen_dict[i]['X_train_gen'],  # real-data baseline: all_folds[i]['X_train'].values
        all_folds[i]['X_val'].values,
        gen_dict[i]['y_train_gen'],  # real-data baseline: all_folds[i]['y_train'].values
        all_folds[i]['y_val'].values
    )

    # Get classification report as dict
    report = classification_report(
        all_folds[i]['y_val'].values,
        predictions[0],
        output_dict=True,
        zero_division=0
    )

    reports.append(report)
    predictions_folds.append(predictions)
    predictions_proba_folds.append(predictions_proba)

# Convert reports to DataFrames and average over folds
# Only include class-level and 'macro avg', 'weighted avg' metrics
metrics = ['precision', 'recall', 'f1-score', 'support']
labels = [label for label in reports[0].keys() if label != 'accuracy']

# Build DataFrame
report_df = pd.DataFrame(
    {
        label: {metric: np.mean([r[label][metric] for r in reports]) for metric in metrics}
        for label in labels
    }
).T

# Add accuracy separately
accuracy = np.mean([r['accuracy'] for r in reports])
report_df.loc['accuracy'] = [accuracy, accuracy, accuracy, np.mean([r['macro avg']['support'] for r in reports])]

with open(f"HCA/5CV/classification/{file_name}_predictions.pkl", "wb") as f:
    pickle.dump(predictions_folds, f)
with open(f"HCA/5CV/classification/{file_name}_predictions_proba.pkl", "wb") as f:
    pickle.dump(predictions_proba_folds, f)
with open(f"HCA/5CV/classification/{file_name}_reports.pkl", "wb") as f:
    pickle.dump(reports, f)

# Display result
print("Mean Classification Report Across Folds:")
print(report_df.round(4))


Mean Classification Report Across Folds:
              precision  recall  f1-score  support
1                0.9405  0.8185    0.8746    119.0
2                0.8912  0.4007    0.5514     99.8
3                0.8485  0.9721    0.9059    229.6
4                0.9032  0.9900    0.9446    301.2
5                0.8382  0.9666    0.8976    167.6
6                0.9665  0.9244    0.9448     68.6
7                0.8119  0.9937    0.8936    414.6
8                0.9655  0.9028    0.9330    117.4
9                0.8999  0.6637    0.7636    223.0
10               0.9609  0.9685    0.9646     50.6
11               0.9517  0.6667    0.7826     94.2
12               1.0000  0.7099    0.8281     22.8
13               0.9900  0.7835    0.8720     26.6
14               0.9805  0.9600    0.9695     20.0
15               0.9483  0.9340    0.9404     30.4
16               1.0000  0.9181    0.9571     14.6
macro avg        0.9311  0.8483    0.8765   2000.0
weighted avg     0.8878  0.8800    0.8719

# HCA 5CV

In [None]:
print("train")
print("Mean Classification Report Across Folds:")
print(report_df.round(4))

train
Mean Classification Report Across Folds:
              precision  recall  f1-score  support
1                0.9405  0.8185    0.8746    119.0
2                0.8912  0.4007    0.5514     99.8
3                0.8485  0.9721    0.9059    229.6
4                0.9032  0.9900    0.9446    301.2
5                0.8382  0.9666    0.8976    167.6
6                0.9665  0.9244    0.9448     68.6
7                0.8119  0.9937    0.8936    414.6
8                0.9655  0.9028    0.9330    117.4
9                0.8999  0.6637    0.7636    223.0
10               0.9609  0.9685    0.9646     50.6
11               0.9517  0.6667    0.7826     94.2
12               1.0000  0.7099    0.8281     22.8
13               0.9900  0.7835    0.8720     26.6
14               0.9805  0.9600    0.9695     20.0
15               0.9483  0.9340    0.9404     30.4
16               1.0000  0.9181    0.9571     14.6
macro avg        0.9311  0.8483    0.8765   2000.0
weighted avg     0.8878  0.8800    

In [None]:
print("FB-MOE")
print("Mean Classification Report Across Folds:")
print(report_df.round(4))

FB-MOE
Mean Classification Report Across Folds:
              precision  recall  f1-score  support
1                0.9519  0.8269    0.8844    119.0
2                0.8991  0.4349    0.5853     99.8
3                0.8659  0.9704    0.9151    229.6
4                0.9302  0.9874    0.9579    301.2
5                0.8848  0.9582    0.9199    167.6
6                0.9670  0.9360    0.9510     68.6
7                0.8158  0.9937    0.8959    414.6
8                0.9632  0.9233    0.9426    117.4
9                0.9082  0.6726    0.7724    223.0
10               0.9502  0.9802    0.9650     50.6
11               0.9437  0.7875    0.8577     94.2
12               1.0000  0.8415    0.9127     22.8
13               0.9416  0.8729    0.9031     26.6
14               0.9810  0.9900    0.9851     20.0
15               0.9414  0.9346    0.9375     30.4
16               0.9492  0.9733    0.9590     14.6
macro avg        0.9308  0.8802    0.8965   2000.0
weighted avg     0.8987  0.8929   

In [None]:
print("FB-paper")  # MAF-FB
print("Mean Classification Report Across Folds:")
print(report_df.round(4))

FB-paper
Mean Classification Report Across Folds:
              precision  recall  f1-score  support
1                0.9569  0.8151    0.8798    119.0
2                0.9049  0.4549    0.6050     99.8
3                0.8547  0.9712    0.9092    229.6
4                0.9257  0.9894    0.9564    301.2
5                0.8819  0.9523    0.9157    167.6
6                0.9556  0.9390    0.9470     68.6
7                0.8174  0.9918    0.8961    414.6
8                0.9615  0.9216    0.9409    117.4
9                0.9114  0.6619    0.7668    223.0
10               0.9545  0.9803    0.9670     50.6
11               0.9315  0.7749    0.8458     94.2
12               1.0000  0.8332    0.9079     22.8
13               0.9615  0.8801    0.9185     26.6
14               0.9810  1.0000    0.9902     20.0
15               0.9461  0.9151    0.9297     30.4
16               0.9375  1.0000    0.9669     14.6
macro avg        0.9301  0.8800    0.8964   2000.0
weighted avg     0.8970  0.8909 

In [None]:
print("CTGAN")
print("Mean Classification Report Across Folds:")
print(report_df.round(4))

CTGAN
Mean Classification Report Across Folds:
              precision  recall  f1-score  support
1                0.9523  0.7950    0.8659    119.0
2                0.8794  0.4269    0.5744     99.8
3                0.8576  0.9730    0.9115    229.6
4                0.9031  0.9874    0.9433    301.2
5                0.8644  0.9642    0.9112    167.6
6                0.9724  0.9273    0.9491     68.6
7                0.8079  0.9937    0.8912    414.6
8                0.9662  0.9182    0.9414    117.4
9                0.9182  0.6538    0.7637    223.0
10               0.9540  0.9764    0.9649     50.6
11               0.9520  0.7263    0.8226     94.2
12               1.0000  0.7538    0.8560     22.8
13               0.9727  0.8054    0.8808     26.6
14               0.9718  0.9600    0.9651     20.0
15               0.9190  0.9475    0.9328     30.4
16               0.9367  0.9733    0.9526     14.6
macro avg        0.9267  0.8614    0.8829   2000.0
weighted avg     0.8912  0.8837    

In [None]:
print("TVAE")
print("Mean Classification Report Across Folds:")
print(report_df.round(4))

TVAE
Mean Classification Report Across Folds:
              precision  recall  f1-score  support
1                0.9373  0.8168    0.8721    119.0
2                0.8760  0.4248    0.5720     99.8
3                0.8404  0.9660    0.8986    229.6
4                0.9012  0.9907    0.9438    301.2
5                0.8673  0.9630    0.9122    167.6
6                0.9636  0.9302    0.9462     68.6
7                0.8037  0.9928    0.8882    414.6
8                0.9657  0.8960    0.9290    117.4
9                0.8955  0.6242    0.7353    223.0
10               0.9574  0.9724    0.9647     50.6
11               0.9529  0.7347    0.8275     94.2
12               1.0000  0.6842    0.8124     22.8
13               0.9913  0.7447    0.8490     26.6
14               0.9805  0.9700    0.9749     20.0
15               0.9402  0.9211    0.9303     30.4
16               0.9867  0.9324    0.9581     14.6
macro avg        0.9287  0.8478    0.8759   2000.0
weighted avg     0.8856  0.8779    0

In [None]:
print("GC")
print("Mean Classification Report Across Folds:")
print(report_df.round(4))

GC
Mean Classification Report Across Folds:
              precision  recall  f1-score  support
1                0.9381  0.8235    0.8764    119.0
2                0.8854  0.3988    0.5491     99.8
3                0.8442  0.9713    0.9032    229.6
4                0.9015  0.9887    0.9430    301.2
5                0.8343  0.9749    0.8989    167.6
6                0.9727  0.9301    0.9507     68.6
7                0.8091  0.9913    0.8909    414.6
8                0.9588  0.9096    0.9334    117.4
9                0.9037  0.6529    0.7578    223.0
10               0.9617  0.9763    0.9687     50.6
11               0.9630  0.6560    0.7788     94.2
12               1.0000  0.6929    0.8183     22.8
13               1.0000  0.7607    0.8621     26.6
14               1.0000  0.9400    0.9684     20.0
15               0.9595  0.9277    0.9433     30.4
16               1.0000  0.9457    0.9714     14.6
macro avg        0.9332  0.8463    0.8759   2000.0
weighted avg     0.8870  0.8786    0.8
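
The six HCA reports are easier to compare side by side. The sketch below copies the `macro avg` rows from the outputs above into one table and ranks the runs by macro F1. It assumes that the `train` run is the baseline trained on real data only, which the notebook does not state explicitly; the column name `f1_gain_vs_train` is made up for this example.

```python
import pandas as pd

# "macro avg" rows copied from the HCA 5CV reports printed above.
macro_avg = {
    "train":    {"precision": 0.9311, "recall": 0.8483, "f1-score": 0.8765},
    "FB-MOE":   {"precision": 0.9308, "recall": 0.8802, "f1-score": 0.8965},
    "FB-paper": {"precision": 0.9301, "recall": 0.8800, "f1-score": 0.8964},
    "CTGAN":    {"precision": 0.9267, "recall": 0.8614, "f1-score": 0.8829},
    "TVAE":     {"precision": 0.9287, "recall": 0.8478, "f1-score": 0.8759},
    "GC":       {"precision": 0.9332, "recall": 0.8463, "f1-score": 0.8759},
}

summary = pd.DataFrame.from_dict(macro_avg, orient="index")

# Change in macro F1 relative to the assumed real-data-only baseline.
baseline_f1 = summary.loc["train", "f1-score"]
summary["f1_gain_vs_train"] = (summary["f1-score"] - baseline_f1).round(4)

# Rank runs from highest to lowest macro F1.
summary = summary.sort_values("f1-score", ascending=False)
print(summary)
```

Read this way, the two FB variants sit about 0.02 macro F1 above the `train` run, mostly through higher macro recall, while macro precision is nearly unchanged across all runs.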

# SKF 3000 Results

In [None]:
print("Original")
print("Mean Classification Report Across Folds:")
print(report_df.round(4))

Original
Mean Classification Report Across Folds:
                    precision  recall  f1-score  support
PSC                    0.8640  0.8691    0.8644     10.6
acinar                 0.9470  0.9582    0.9523    272.4
activated_stellate     0.9483  0.9472    0.9471     56.8
alpha                  0.9580  0.9725    0.9652    974.2
beta                   0.9565  0.9699    0.9631    737.6
delta                  0.9550  0.9378    0.9463    189.8
ductal                 0.9475  0.9447    0.9460    340.0
endothelial            0.9641  0.8996    0.9299     57.8
epsilon                0.0000  0.0000    0.0000      4.2
gamma                  0.8959  0.8326    0.8630     84.8
macrophage             0.9556  0.7273    0.8201     11.0
mast                   0.9600  0.8000    0.8711      5.0
mesenchymal            0.9050  0.9000    0.8994     16.0
pp                     0.9393  0.9081    0.9232     37.0
quiescent_stellate     0.9280  0.8440    0.8829     34.6
schwann                0.0000  0.0000 

In [None]:
print("TVAE")
print("Mean Classification Report Across Folds:")
print(report_df.round(4))

TVAE
Mean Classification Report Across Folds:
                    precision  recall  f1-score  support
PSC                    0.8677  0.8891    0.8758     10.6
acinar                 0.9504  0.9559    0.9529    272.4
activated_stellate     0.9234  0.9613    0.9414     56.8
alpha                  0.9599  0.9729    0.9664    974.2
beta                   0.9552  0.9702    0.9626    737.6
delta                  0.9555  0.9262    0.9406    189.8
ductal                 0.9536  0.9424    0.9479    340.0
endothelial            0.9639  0.8996    0.9301     57.8
epsilon                0.4000  0.1000    0.1600      4.2
gamma                  0.9023  0.8539    0.8770     84.8
macrophage             0.9596  0.7636    0.8419     11.0
mast                   0.9600  0.8400    0.8933      5.0
mesenchymal            0.8829  0.9125    0.8946     16.0
pp                     0.9306  0.9189    0.9240     37.0
quiescent_stellate     0.9261  0.8329    0.8758     34.6
schwann                1.0000  0.5667    0

In [None]:
print("CTGAN")
print("Mean Classification Report Across Folds:")
print(report_df.round(4))

CTGAN
Mean Classification Report Across Folds:
                    precision  recall  f1-score  support
PSC                    0.8397  0.8891    0.8599     10.6
acinar                 0.9531  0.9552    0.9539    272.4
activated_stellate     0.9511  0.9543    0.9524     56.8
alpha                  0.9607  0.9723    0.9664    974.2
beta                   0.9588  0.9696    0.9642    737.6
delta                  0.9547  0.9326    0.9435    189.8
ductal                 0.9548  0.9424    0.9485    340.0
endothelial            0.9647  0.9134    0.9373     57.8
epsilon                0.2000  0.0500    0.0800      4.2
gamma                  0.8981  0.8563    0.8762     84.8
macrophage             0.9618  0.8000    0.8677     11.0
mast                   0.9600  0.9200    0.9378      5.0
mesenchymal            0.8421  0.9125    0.8737     16.0
pp                     0.9204  0.9297    0.9247     37.0
quiescent_stellate     0.9201  0.8845    0.9012     34.6
schwann                1.0000  0.8000    

In [None]:
print("GC")
print("Mean Classification Report Across Folds:")
print(report_df.round(4))

GC
Mean Classification Report Across Folds:
                    precision  recall  f1-score  support
PSC                    0.8677  0.8891    0.8758     10.6
acinar                 0.9526  0.9552    0.9536    272.4
activated_stellate     0.9348  0.9578    0.9461     56.8
alpha                  0.9607  0.9725    0.9665    974.2
beta                   0.9568  0.9710    0.9638    737.6
delta                  0.9565  0.9273    0.9416    189.8
ductal                 0.9520  0.9447    0.9483    340.0
endothelial            0.9649  0.9307    0.9470     57.8
epsilon                0.0000  0.0000    0.0000      4.2
gamma                  0.9045  0.8727    0.8880     84.8
macrophage             0.9600  0.7818    0.8573     11.0
mast                   0.9600  0.8800    0.9156      5.0
mesenchymal            0.8939  0.9125    0.9003     16.0
pp                     0.9407  0.9189    0.9288     37.0
quiescent_stellate     0.9460  0.8615    0.9009     34.6
schwann                0.4000  0.1333    0.2
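
Rows for very rare classes deserve a careful reading. With a mean support of about four cells per fold, a class such as `epsilon` can only produce a handful of distinct per-fold scores, so the fold average moves in large steps. The sketch below uses hypothetical per-fold values (the real per-fold numbers are not shown in this notebook) to illustrate one way the TVAE `epsilon` row (precision 0.4000, recall 0.1000, F1 0.1600) could arise: two folds in which a single cell is correctly recovered, and three folds with no predictions for the class, scored as 0 under the assumption that `zero_division=0` was used.

```python
import numpy as np

# Hypothetical per-fold scores for a rare class with ~4 test cells per fold.
# Folds 1-2: one of four cells predicted correctly, no false positives.
# Folds 3-5: class never predicted, so precision is taken as 0 (assumed
# zero_division=0 behaviour) and recall is 0.
fold_precision = np.array([1.0, 1.0, 0.0, 0.0, 0.0])
fold_recall = np.array([0.25, 0.25, 0.0, 0.0, 0.0])

# Per-fold F1, defined as 0 where precision + recall is 0.
denom = fold_precision + fold_recall
fold_f1 = np.divide(
    2 * fold_precision * fold_recall,
    denom,
    out=np.zeros_like(denom),
    where=denom > 0,
)

mean_precision = fold_precision.mean()
mean_recall = fold_recall.mean()
mean_f1 = fold_f1.mean()
print(round(mean_precision, 4), round(mean_recall, 4), round(mean_f1, 4))
```

Under these assumptions, a single extra correct prediction in one fold shifts the averaged recall by 0.05, so differences between models on `epsilon`, `schwann`, and similar classes should be interpreted with caution.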

---------------------