# ISCB-Africa ASBCB 2025

**Venue:** Lagoon Beach Hotel & Conference Center, Cape Town, South Africa

**Website:** https://www.iscb.org/africa2025/home

**Date of the session:** April 10, 2025 12:00-16:00 SAST

**Instructors/Affiliation:** 
1. Loni Taylor, Meharry Medical College, Nashville, TN, USA.
2. Bishnu Sarker, Meharry Medical College, Nashville, TN, USA.
3. Animesh Acharjee, University of Birmingham, UK.

## **Similarity Network Fusion for Multiomics Data Integration**

### Learning Objectives
- Understand the mathematical foundations of Similarity Network Fusion (SNF).
- Construct similarity networks from multi-omics datasets.
- Fuse networks using SNF and perform spectral clustering.
- Apply the SNFpy package in Python for integrative data analysis.
- Interpret fused similarity graphs and derive biological insights.

### 1. Overview
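Biomedical data such as gene expression, DNA methylation, and mutation profiles are heterogeneous: they differ in scale, dimensionality, and noise.
SNF builds one patient-similarity network per data type and fuses them into a single network, so that similarities supported by several data sources are reinforced.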

**Input**: Multiple omics datasets

**Process**: Similarity network fusion proceeds in three steps:
1. Build a similarity network from each dataset.
2. Fuse the networks through iterative message passing.
3. Obtain a single fused network.

**Output**: Unified patient similarity graph

### 2. Mathematical background

#### Step 1: Similarity Matrix
$$ W_{ij} = \exp\left(-\frac{\|x_i - x_j\|^2}{\mu}\right) $$
Only retain $K$ nearest neighbors.
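The simplified kernel above can be sketched in NumPy as follows. This is an illustration of the formula only, not snfpy's `make_affinity`. The helper name `gaussian_knn_affinity` is hypothetical, and symmetrizing the sparsified matrix by averaging it with its transpose is an assumption added here. Note that $\mu$ must be on the scale of the squared distances; otherwise every off-diagonal entry underflows to zero.

```python
import numpy as np


def gaussian_knn_affinity(X, K=20, mu=0.5):
    """Hypothetical helper: W_ij = exp(-||x_i - x_j||^2 / mu), keeping each row's K nearest neighbours."""
    X = np.asarray(X, dtype=float)
    # Squared Euclidean distances between all pairs of rows (patients)
    sq_norms = (X ** 2).sum(axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    d2 = np.maximum(d2, 0.0)  # clip tiny negatives caused by floating-point error
    W = np.exp(-d2 / mu)
    n = W.shape[0]
    W_knn = np.zeros_like(W)
    for i in range(n):
        # Most similar patients first, excluding patient i itself
        order = [j for j in np.argsort(-W[i]) if j != i]
        neighbours = order[:K]
        W_knn[i, neighbours] = W[i, neighbours]
    np.fill_diagonal(W_knn, 1.0)  # self-similarity exp(0) = 1
    # Average with the transpose so the sparsified matrix is symmetric
    return (W_knn + W_knn.T) / 2.0


# Toy example: 10 patients, 3 features; mu chosen on the scale of the squared distances
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(10, 3))
W = gaussian_knn_affinity(X_toy, K=3, mu=5.0)
```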

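#### Step 2: SNF Iterative Update
$$ P^{(v)}_{t+1} = S^{(v)} \left(\frac{\sum_{k \ne v} P^{(k)}_{t}}{m-1}\right) \left(S^{(v)}\right)^\top, \quad v = 1, \ldots, m $$
Here $m$ is the number of data types, $P^{(v)}_t$ is the row-normalized status matrix of data type $v$ at iteration $t$ (the similarity of each patient to all others), and $S^{(v)}$ is its row-normalized $K$-nearest-neighbor kernel. Each network is updated with the average of the other networks, so the $m$ networks become more similar at every iteration.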

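#### Step 3: Spectral Clustering
$$ L = D - W $$
Here $W$ is the fused similarity matrix and $D$ is the diagonal degree matrix with $D_{ii} = \sum_j W_{ij}$. Compute the eigenvectors of $L$ associated with its $c$ smallest eigenvalues, where $c$ is the desired number of clusters. Stack them as columns and apply k-means to the rows of the resulting matrix.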
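Step 3 can be sketched from scratch as follows. This assumes the unnormalized Laplacian exactly as written above and a basic Lloyd's k-means with random initialization. The helper name `spectral_cluster` is hypothetical. Library routines such as `sklearn.cluster.spectral_clustering`, which this tutorial uses later, may differ in details such as Laplacian normalization and k-means initialization.

```python
import numpy as np


def spectral_cluster(W, n_clusters, n_iter=100, seed=0):
    """Hypothetical helper: unnormalized spectral clustering on a symmetric affinity W."""
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))            # diagonal degree matrix, D_ii = sum_j W_ij
    L = D - W                             # unnormalized graph Laplacian
    _, eigvecs = np.linalg.eigh(L)        # eigenvalues returned in ascending order
    U = eigvecs[:, :n_clusters]           # eigenvectors of the smallest eigenvalues
    # Plain Lloyd's k-means on the rows of U
    rng = np.random.default_rng(seed)
    centers = U[rng.choice(len(U), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        dists = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; an empty cluster keeps its previous center
        new_centers = np.array([
            U[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Toy affinity with two obvious groups of five patients
W = np.full((10, 10), 0.01)
W[:5, :5] = 1.0
W[5:, 5:] = 1.0
labels = spectral_cluster(W, n_clusters=2)
```

The cluster indices themselves are arbitrary: only which patients share a label is meaningful.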


### 3. Similarity Network Fusion using the snfpy Python package

If you have not installed snfpy yet, install it with the following command:

!pip install snfpy

#### Demonstrating the fusion using randomly generated data points. 

Let us generate three data matrices:
1. The first simulates transcriptomics (gene expression).
2. The second simulates epigenomics (methylation).
3. The third simulates genomics (mutations).

In [39]:
import numpy as np
from snf import make_affinity, snf, compute
from sklearn.preprocessing import StandardScaler
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

In [5]:

np.random.seed(0)

In [6]:
# Simulating three types of omics data (e.g., expression, methylation, mutation)
np.random.seed(42)  # for reproducibility

# 100 samples (patients), 500 gene expression features
X1 = np.random.rand(100, 500)  

# 100 samples, 300 methylation features
X2 = np.random.rand(100, 300)

# 100 samples, 200 mutation features
X3 = np.random.rand(100, 200)

#### Standardize each feature to zero mean and unit variance

In [9]:
scaler = StandardScaler()

X1_scaled = scaler.fit_transform(X1)
X2_scaled = scaler.fit_transform(X2)
X3_scaled = scaler.fit_transform(X3)

#### Next, build a similarity network for each omics data type

The scaled exponential similarity kernel, based on the probability density
function of the normal distribution, takes the form:

 
$$\mathbf{W}(i, j) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\rho^2(x_{i},x_{j})}{2\sigma^2}\right)$$

where $\rho(x_{i},x_{j})$ is the Euclidean distance (or another distance metric, as appropriate) between patients $x_{i}$ and $x_{j}$. The value of $\sigma$ is calculated as:

$\sigma = \mu\ \frac{\overline{\rho}(x_{i},N_{i})+ \overline{\rho}(x_{j},N_{j}) + \rho(x_{i},x_{j})}{3} $

where $\overline{\rho}(x_{i},N_{i})$ represents the average value of distances between $x_{i}$ and its neighbors $N_{1..K}$, and $\mu\in(0, 1)\subset\mathbb{R}$.
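The two formulas above translate directly into NumPy. This sketch follows the equations as stated in this section: a normal-density kernel with a locally adaptive $\sigma$ for each pair. It is not snfpy's source code. The name `scaled_exponential_kernel` is hypothetical. Excluding each patient from its own $K$ neighbors is an assumption, and the sketch also assumes there are no duplicate patients.

```python
import numpy as np


def scaled_exponential_kernel(X, K=20, mu=0.5):
    """Hypothetical helper implementing W(i, j) with the adaptive sigma defined above."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances rho(x_i, x_j)
    diff = X[:, None, :] - X[None, :, :]
    rho = np.sqrt((diff ** 2).sum(axis=2))
    # Mean distance from each patient to its K nearest neighbours (column 0 is the self-distance 0)
    mean_knn = np.sort(rho, axis=1)[:, 1:K + 1].mean(axis=1)
    # sigma_ij = mu * (mean_knn_i + mean_knn_j + rho_ij) / 3
    sigma = mu * (mean_knn[:, None] + mean_knn[None, :] + rho) / 3.0
    # Normal probability density evaluated at rho_ij with standard deviation sigma_ij
    return np.exp(-rho ** 2 / (2.0 * sigma ** 2)) / np.sqrt(2.0 * np.pi * sigma ** 2)


rng = np.random.default_rng(1)
X_toy = rng.normal(size=(30, 5))
W = scaled_exponential_kernel(X_toy, K=10, mu=0.5)
```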

In [10]:
aff1 = make_affinity(X1_scaled, K=20, mu=0.5)
aff2 = make_affinity(X2_scaled, K=20, mu=0.5)
aff3 = make_affinity(X3_scaled, K=20, mu=0.5)

#### Fusing the graphs

To fuse the supplied $m$ arrays, each must first be normalized. A traditional normalization of an affinity matrix would suffer from numerical instabilities due to the self-similarity along the diagonal, so a modified normalization is used:


$$\mathbf{P}(i,j) = \begin{cases} \dfrac{\mathbf{W}(i,j)}{2 \sum_{k\neq i} \mathbf{W}(i,k)}, & j \neq i \\ 1/2, & j = i \end{cases}$$

Under the assumption that local similarities are more important than distant ones, a sparser weight matrix is computed within a K-nearest-neighbor (KNN) framework, where $N_i$ is the set of the $K$ nearest neighbors of patient $x_i$:

$$\mathbf{S}(i,j) = \begin{cases} \dfrac{\mathbf{W}(i,j)}{\sum_{k\in N_{i}}\mathbf{W}(i,k)}, & j \in N_{i} \\ 0, & \text{otherwise} \end{cases}$$

The two weight matrices $\mathbf{P}$ and $\mathbf{S}$ thus provide information about a given patient's similarity to all other patients and the `K` most similar patients, respectively.
These $m$ matrices are then iteratively fused. At each iteration, the matrices are made more similar to each other via:

$$\mathbf{P}^{(v)} = \mathbf{S}^{(v)} \times \frac{\sum_{k\neq v}\mathbf{P}^{(k)}}{m-1} \times (\mathbf{S}^{(v)})^{T}, \quad v = 1, 2, \ldots, m$$

After each iteration, the resulting matrices are normalized again with the $\mathbf{P}$ normalization defined above. Fusion stops after `t` iterations, or when the matrices $\mathbf{P}^{(v)}, v = 1, 2, \ldots, m$ converge.

The output fused matrix is full rank and can be subjected to clustering and classification.
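The normalization, the KNN kernel, and the cross-diffusion update described above can be combined into the following NumPy sketch. It follows the equations in this section but is not snfpy's implementation. snfpy's `snf` exposes extra options (for example an `alpha` parameter) and may differ in details, so numerical results will not match exactly. All helper names are hypothetical. Symmetrizing each updated matrix before re-normalizing it, and averaging the $m$ matrices to form the fused output, are assumptions made here.

```python
import numpy as np


def full_normalize(W):
    """P(i, j) = W(i, j) / (2 * sum_{k != i} W(i, k)) off the diagonal, P(i, i) = 1/2."""
    W = np.asarray(W, dtype=float)
    off = W - np.diag(np.diag(W))                  # zero the diagonal
    P = off / (2.0 * off.sum(axis=1, keepdims=True))
    np.fill_diagonal(P, 0.5)
    return P


def knn_kernel(W, K):
    """S(i, j) = W(i, j) / sum_{k in N_i} W(i, k) for the K largest entries of row i, else 0."""
    W = np.asarray(W, dtype=float)
    S = np.zeros_like(W)
    for i in range(W.shape[0]):
        nbrs = np.argsort(-W[i])[:K]               # includes i itself when W(i, i) is the row maximum
        S[i, nbrs] = W[i, nbrs] / W[i, nbrs].sum()
    return S


def fuse(affinities, K=20, t=20):
    """Hypothetical SNF cross-diffusion over m >= 2 affinity matrices."""
    m = len(affinities)
    P = [full_normalize(W) for W in affinities]
    S = [knn_kernel(W, K) for W in affinities]
    for _ in range(t):
        new_P = []
        for v in range(m):
            others = sum(P[k] for k in range(m) if k != v) / (m - 1)
            Pv = S[v] @ others @ S[v].T
            new_P.append(full_normalize((Pv + Pv.T) / 2.0))  # symmetrize, then re-normalize
        P = new_P                                  # all views updated simultaneously
    return sum(P) / m


# Two toy views of the same 20 patients with simple Gaussian affinities
rng = np.random.default_rng(0)
affs = []
for d in (5, 8):
    X = rng.normal(size=(20, d))
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    affs.append(np.exp(-d2 / d2.mean()))
fused = fuse(affs, K=5, t=10)
```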

In [11]:
fused_network = snf([aff1, aff2, aff3], K=20, t=20)

#### Clustering

In [12]:
from sklearn.cluster import spectral_clustering
n_clusters = 3
labels = spectral_clustering(fused_network, n_clusters=n_clusters)

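#### Identifying marker features

For each feature, a one-way ANOVA (`scipy.stats.f_oneway`) tests whether its mean differs across the clusters. Features with p < 0.01 are kept and sorted by p-value.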

In [13]:
from scipy.stats import f_oneway

# X1 is a (samples x genes) NumPy array; labels holds one cluster label per sample
significant_genes = []

for gene in range(X1.shape[1]):
    groups = [X1[labels == i][:,gene] for i in range(n_clusters)]
    fval, pval = f_oneway(*groups)
    if pval < 0.01:
        significant_genes.append((gene, pval))

# Sort by p-value
biomarkers = sorted(significant_genes, key=lambda x: x[1])

In [14]:
biomarkers

[(113, 1.7991410712101845e-05),
 (59, 7.700401334773176e-05),
 (52, 0.0004430119721814418),
 (173, 0.0010312303779545493),
 (166, 0.0015690557346617574),
 (355, 0.002009975693624609),
 (433, 0.0023843306512554585),
 (111, 0.003084321633307054),
 (4, 0.0031980767373908092),
 (322, 0.0038562216784861756),
 (210, 0.003907037647529072),
 (260, 0.005445168303983312),
 (498, 0.005878377402124463),
 (471, 0.006767128831383652),
 (154, 0.006916463562072587),
 (182, 0.00750914953775964),
 (348, 0.009265927435693459)]
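Because `X1` is uniform random noise, every feature in this list is a false positive by construction. With 500 tests at a threshold of 0.01, roughly 5 hits are expected by chance alone. Testing features against clusters that were derived from the same data inflates this further. A common safeguard is to control the false discovery rate. The sketch below implements the Benjamini-Hochberg adjustment in NumPy. The name `bh_adjust` is hypothetical, and applying it here is a suggestion rather than part of the original workflow. SciPy 1.11 and later also provide `scipy.stats.false_discovery_control`.

```python
import numpy as np


def bh_adjust(pvals):
    """Hypothetical helper: Benjamini-Hochberg adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)      # p_(i) * n / i
    # Enforce monotonicity, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(ranked, 1.0)               # map back to the original order
    return q


pvals = np.array([0.01, 0.04, 0.03, 0.20])
q = bh_adjust(pvals)
```

To use this in the loop above, store the p-value of every feature rather than only those below 0.01. Then adjust them all at once and keep the features whose adjusted value falls below the chosen false discovery rate.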

### 4. Case Study: Applying similarity-graph-based fusion to chronic lymphocytic leukemia (CLL) datasets to find biomarkers (e.g., differentially expressed genes)

In [23]:
data_loc = ""  # folder containing the CLL files; an empty string means the current working directory
df_meth = pd.read_csv(data_loc + "CLL_data_Methylation.csv", index_col=0)
df_mrna = pd.read_csv(data_loc + "CLL_data_mRNA.csv", index_col=0)
df_mut = pd.read_csv(data_loc + "CLL_Mutations.txt", sep='\t', index_col=0)

In [24]:
df_mut

Unnamed: 0,H045,H109,H024,H056,H079,H164,H059,H167,H113,H049,...,H178,H166,H174,H177,H259,H175,H179,H050,H180,H229
gain2p25.3,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,,0.0,0.0,,0.0,0.0,,0.0,
gain3q26,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,,0.0,0.0,,0.0,1.0,,0.0,
del6p21.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,,0.0,0.0,,0.0,0.0,,0.0,
del6q21,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,,1.0,0.0,,0.0,0.0
del8p12,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,...,0.0,,0.0,0.0,,0.0,0.0,,0.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
UMODL1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,,,,,,,,,,
VWF,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,,,,,,,,,,
XPO1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,...,,,,,,,,,,
ZC3H18,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,,,,,,,,,,


In [25]:
# rows are features and columns are patients here: drop patients (columns) with any missing value
df_mrna = df_mrna.dropna(axis='columns')
df_meth = df_meth.dropna(axis='columns')
df_mut = df_mut.dropna(axis='columns')

In [28]:
df_meth

Unnamed: 0,H045,H109,H024,H056,H079,H164,H059,H167,H113,H049,...,H106,H176,H136,H178,H166,H174,H177,H259,H175,H179
cg10146935,1.811086,-3.997508,-2.844313,-3.338656,-0.019362,-2.485997,-1.460211,-4.952291,-2.980209,-0.091713,...,-4.946979,-0.838754,-4.684879,-5.077259,-0.625954,-4.918812,-4.727493,-5.193812,-4.437189,-5.060459
cg26837773,-5.172572,1.594870,0.161170,-2.093433,3.748980,0.060530,-3.472232,0.547577,2.440098,2.940767,...,0.199544,0.341156,3.496116,-4.821910,-0.858200,3.214163,2.036858,-0.816088,4.043775,-2.345652
cg17801765,5.411526,5.412693,0.365706,0.373634,5.412010,5.268908,-4.989999,5.337081,0.749546,0.426493,...,0.574758,0.707883,0.438992,4.873615,0.753594,-0.628446,-4.584779,0.547390,4.086683,0.135581
cg13244315,-0.118825,1.043871,-4.219236,-1.592196,1.416418,4.659831,-0.461120,-1.918861,-1.237015,-0.421916,...,-1.405492,1.522866,3.779207,4.069311,3.287555,2.059305,2.244938,-0.210781,1.388141,-3.354897
cg06181703,5.120384,1.279480,0.721100,4.047059,5.237422,1.761247,4.543997,4.939463,4.781683,5.051073,...,4.849147,4.654126,4.362140,4.123456,2.817449,2.874335,-3.451370,-0.025308,4.143205,-3.581970
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
cg07600533,3.901933,2.634941,1.516759,-3.884756,-4.238106,-2.004915,1.596288,4.086832,-0.027165,-4.139885,...,2.725696,4.026755,1.418229,3.823024,0.053674,-0.473871,3.547123,3.433890,3.653984,-3.766334
cg08260245,5.713831,5.107460,5.676245,5.950338,6.040756,0.929815,5.603487,1.237379,1.057925,5.902691,...,4.614008,4.861668,4.913404,-1.077382,0.610930,3.458398,-0.793773,4.660379,4.388997,-0.594636
cg19112186,5.703520,1.326244,5.488636,5.354059,5.584746,-0.539210,5.385392,0.915215,1.802086,5.366336,...,4.456464,4.844164,4.740198,0.421066,0.583761,4.432086,-0.008298,0.851485,3.245582,0.751461
cg10770023,5.166255,0.677912,4.221828,4.934536,5.095111,-0.083568,5.202978,0.109868,4.331600,4.890801,...,4.417512,4.791798,4.991042,0.372515,0.378272,2.637359,0.031687,0.949804,4.690901,0.325863


In [29]:
# transpose so that rows are patients and columns are features
df_mrna = df_mrna.T
df_meth = df_meth.T
df_mut = df_mut.T

In [30]:
df_meth

Unnamed: 0,cg10146935,cg26837773,cg17801765,cg13244315,cg06181703,cg19626656,cg15207968,cg12755103,cg23651812,cg14287724,...,cg07016730,cg25152348,cg08425796,cg05418105,cg22249529,cg07600533,cg08260245,cg19112186,cg10770023,cg00270625
H045,1.811086,-5.172572,5.411526,-0.118825,5.120384,0.145951,-3.436869,-3.844246,2.075422,3.501829,...,3.547843,0.060132,4.442026,2.861301,5.246799,3.901933,5.713831,5.703520,5.166255,4.911655
H109,-3.997508,1.594870,5.412693,1.043871,1.279480,-3.928433,2.989245,0.393004,4.800121,3.159201,...,0.887926,-0.214753,4.561187,3.919911,5.058302,2.634941,5.107460,1.326244,0.677912,5.281115
H024,-2.844313,0.161170,0.365706,-4.219236,0.721100,-3.418859,-3.250385,-2.691305,0.534854,-4.629484,...,-4.486709,0.121749,-2.841373,-3.607177,0.765651,1.516759,5.676245,5.488636,4.221828,5.379716
H056,-3.338656,-2.093433,0.373634,-1.592196,4.047059,0.226601,2.377386,-2.775075,0.419985,0.312388,...,-4.238214,0.137862,-3.964855,-2.270940,-2.631909,-3.884756,5.950338,5.354059,4.934536,5.366823
H079,-0.019362,3.748980,5.412010,1.416418,5.237422,0.324213,-0.647632,-3.098837,5.397188,3.410770,...,2.758021,0.021011,0.673296,3.455230,-3.140733,-4.238106,6.040756,5.584746,5.095111,5.338470
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
H174,-4.918812,3.214163,-0.628446,2.059305,2.874335,0.064801,-4.095393,-3.940087,-2.687416,3.183535,...,1.727112,-5.747195,4.277795,3.489834,0.373859,-0.473871,3.458398,4.432086,2.637359,4.716947
H177,-4.727493,2.036858,-4.584779,2.244938,-3.451370,-2.433687,-3.876421,-4.070582,-0.478506,3.122097,...,-0.564552,-5.658218,4.060304,4.112607,2.383421,3.547123,-0.793773,-0.008298,0.031687,0.354774
H259,-5.193812,-0.816088,0.547390,-0.210781,-0.025308,-2.463599,0.324804,-0.729888,-2.114203,1.561882,...,0.601878,-5.509716,3.837758,3.667093,1.484485,3.433890,4.660379,0.851485,0.949804,1.288995
H175,-4.437189,4.043775,4.086683,1.388141,4.143205,-2.055345,2.718916,-3.591529,-4.462097,3.526029,...,1.288681,-5.302995,4.441621,4.130137,4.599153,3.653984,4.388997,3.245582,4.690901,1.470244


In [31]:
df_mrna

Unnamed: 0,ENSG00000244734,ENSG00000158528,ENSG00000198478,ENSG00000175445,ENSG00000174469,ENSG00000188536,ENSG00000186522,ENSG00000196263,ENSG00000198046,ENSG00000144642,...,ENSG00000136492,ENSG00000143198,ENSG00000161653,ENSG00000203778,ENSG00000177599,ENSG00000111328,ENSG00000165474,ENSG00000164061,ENSG00000166816,ENSG00000165972
H045,4.558644,11.741854,8.921456,12.686458,2.644946,2.644946,11.473792,9.680574,10.323723,11.137333,...,7.220013,10.177649,4.667130,7.052534,6.323287,10.059942,1.528848,5.771337,1.528848,5.256267
H109,2.721512,13.287432,2.721512,10.925985,12.648355,1.528848,10.271483,9.986980,10.231973,1.528848,...,7.947078,8.223803,4.775046,6.279164,5.799820,10.361200,1.528848,5.359180,2.383843,7.360436
H024,9.938456,2.341006,12.381452,1.528848,1.528848,6.664661,3.408744,3.657904,3.657904,2.341006,...,8.967320,10.096502,5.815616,6.369060,6.539086,8.152703,1.528848,6.059008,2.341006,5.715771
H056,13.278004,3.232874,8.106266,1.528848,13.565210,9.580385,3.410471,3.565827,3.232874,2.417160,...,8.532982,10.238564,5.902633,5.728777,5.929667,7.737858,2.417160,5.495087,3.232874,5.789100
H079,6.086874,11.940820,4.889503,13.340588,5.476914,3.862678,9.955379,10.244702,10.567114,7.967849,...,7.760580,8.790501,5.929477,5.833956,6.036300,8.274131,2.270282,5.102382,1.528848,6.316072
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
H070,3.254823,1.528848,1.528848,4.436292,12.418931,1.528848,1.528848,1.528848,1.528848,1.528848,...,8.410700,9.314123,4.956979,6.873084,6.070732,8.067587,1.528848,5.111537,1.528848,7.060619
H255,3.269304,12.427299,2.907226,11.425088,1.528848,2.344369,9.696142,10.988039,10.060879,5.343396,...,6.579472,10.558828,5.043407,6.954905,5.874135,8.751176,1.528848,5.726740,2.667779,5.986365
H135,1.528848,1.528848,1.528848,1.528848,12.815852,1.528848,1.528848,1.528848,1.528848,1.528848,...,7.178759,9.572243,3.491024,6.028750,5.038032,8.350212,1.528848,6.936723,1.528848,5.756981
H247,8.826116,3.121428,8.087886,5.680739,9.970217,5.836520,1.528848,7.679473,2.682078,3.121428,...,7.243387,9.173220,4.317162,6.526577,4.165399,9.158571,1.528848,4.517548,1.528848,6.241231


In [32]:

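# Split every methylation probe into two non-negative columns:
#   probe_p keeps non-negative values, probe_n keeps the magnitude of negative values
cols = df_meth.columns.copy()
columns = {}
for c in cols:
    mask = df_meth[c] < 0
    columns[c + '_p'] = df_meth[c].mask(mask)     # negative entries -> NaN
    columns[c + '_n'] = - df_meth[c].mask(~mask)  # non-negative entries -> NaN, negatives flipped to positive

df_meth = pd.concat(list(columns.values()), keys=list(columns.keys()), axis=1)
df_meth = df_meth.fillna(0)  # NaN placeholders -> 0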
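The loop above can also be written in vectorized form with pandas `clip`. The following is a sketch of that equivalent. The helper name `split_signed` and the toy probe and patient IDs are hypothetical. The column order differs from the loop: all `_p` columns come first, then all `_n` columns. This does not change Euclidean distances between patients.

```python
import pandas as pd


def split_signed(df):
    """Hypothetical helper: split each column into its positive part and negative magnitude."""
    pos = df.clip(lower=0).add_suffix('_p')       # negative values -> 0
    neg = (-df).clip(lower=0).add_suffix('_n')    # |negative values|, positive values -> 0
    return pd.concat([pos, neg], axis=1).fillna(0)


toy = pd.DataFrame({'cg1': [1.5, -2.0], 'cg2': [-0.5, 3.0]}, index=['H1', 'H2'])
split = split_signed(toy)
```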



In [33]:
df_meth

Unnamed: 0,cg10146935_p,cg10146935_n,cg26837773_p,cg26837773_n,cg17801765_p,cg17801765_n,cg13244315_p,cg13244315_n,cg06181703_p,cg06181703_n,...,cg07600533_p,cg07600533_n,cg08260245_p,cg08260245_n,cg19112186_p,cg19112186_n,cg10770023_p,cg10770023_n,cg00270625_p,cg00270625_n
H045,1.811086,0.000000,0.000000,5.172572,5.411526,0.000000,0.000000,0.118825,5.120384,0.000000,...,3.901933,0.000000,5.713831,0.000000,5.703520,0.000000,5.166255,0.0,4.911655,0.0
H109,0.000000,3.997508,1.594870,0.000000,5.412693,0.000000,1.043871,0.000000,1.279480,0.000000,...,2.634941,0.000000,5.107460,0.000000,1.326244,0.000000,0.677912,0.0,5.281115,0.0
H024,0.000000,2.844313,0.161170,0.000000,0.365706,0.000000,0.000000,4.219236,0.721100,0.000000,...,1.516759,0.000000,5.676245,0.000000,5.488636,0.000000,4.221828,0.0,5.379716,0.0
H056,0.000000,3.338656,0.000000,2.093433,0.373634,0.000000,0.000000,1.592196,4.047059,0.000000,...,0.000000,3.884756,5.950338,0.000000,5.354059,0.000000,4.934536,0.0,5.366823,0.0
H079,0.000000,0.019362,3.748980,0.000000,5.412010,0.000000,1.416418,0.000000,5.237422,0.000000,...,0.000000,4.238106,6.040756,0.000000,5.584746,0.000000,5.095111,0.0,5.338470,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
H174,0.000000,4.918812,3.214163,0.000000,0.000000,0.628446,2.059305,0.000000,2.874335,0.000000,...,0.000000,0.473871,3.458398,0.000000,4.432086,0.000000,2.637359,0.0,4.716947,0.0
H177,0.000000,4.727493,2.036858,0.000000,0.000000,4.584779,2.244938,0.000000,0.000000,3.451370,...,3.547123,0.000000,0.000000,0.793773,0.000000,0.008298,0.031687,0.0,0.354774,0.0
H259,0.000000,5.193812,0.000000,0.816088,0.547390,0.000000,0.000000,0.210781,0.000000,0.025308,...,3.433890,0.000000,4.660379,0.000000,0.851485,0.000000,0.949804,0.0,1.288995,0.0
H175,0.000000,4.437189,4.043775,0.000000,4.086683,0.000000,1.388141,0.000000,4.143205,0.000000,...,3.653984,0.000000,4.388997,0.000000,3.245582,0.000000,4.690901,0.0,1.470244,0.0


In [34]:
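# stack mRNA and methylation features as rows; columns are patients
X = pd.concat([df_mrna.T, df_meth.T])
# keep only patients with complete data in both tables
X = X.dropna(axis='columns')
print(X.shape)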

(13496, 135)


In [35]:
X

Unnamed: 0,H045,H109,H024,H056,H079,H164,H059,H167,H113,H049,...,H271,H006,H084,H260,H192,H070,H255,H135,H247,H066
ENSG00000244734,4.558644,2.721512,9.938456,13.278004,6.086874,2.571839,4.938961,1.528848,2.286122,2.504699,...,4.199712,8.607476,10.682876,12.365431,6.731859,3.254823,3.269304,1.528848,8.826116,4.063590
ENSG00000158528,11.741854,13.287432,2.341006,3.232874,11.940820,11.506818,5.483675,2.618869,2.812801,2.504699,...,3.743776,3.948041,2.651553,12.776441,3.071359,1.528848,12.427299,1.528848,3.121428,12.465548
ENSG00000198478,8.921456,2.721512,12.381452,8.106266,4.889503,12.756213,3.593890,4.119490,5.220041,2.884897,...,2.226109,5.306285,9.321213,10.534619,6.091324,1.528848,2.907226,1.528848,8.087886,11.948637
ENSG00000175445,12.686458,10.925985,1.528848,1.528848,13.340588,10.885547,11.194029,11.599981,2.286122,2.884897,...,2.226109,9.034459,9.397879,11.786520,1.528848,4.436292,11.425088,1.528848,5.680739,10.767604
ENSG00000174469,2.644946,12.648355,1.528848,13.565210,5.476914,10.975187,7.944246,2.618869,2.286122,12.940957,...,13.723207,10.394117,12.091816,9.442299,4.948473,12.418931,1.528848,12.815852,9.970217,10.721614
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
cg19112186_n,0.000000,0.000000,0.000000,0.000000,0.000000,0.539210,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
cg10770023_p,5.166255,0.677912,4.221828,4.934536,5.095111,0.000000,5.202978,0.109868,4.331600,4.890801,...,5.267949,4.786754,0.329337,0.243252,4.923801,5.110465,0.553364,5.283331,4.635037,1.159108
cg10770023_n,0.000000,0.000000,0.000000,0.000000,0.000000,0.083568,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
cg00270625_p,4.911655,5.281115,5.379716,5.366823,5.338470,1.084841,4.993814,0.000000,0.000000,5.254260,...,5.450230,5.445228,0.695874,0.834654,5.091327,5.223667,0.932052,5.193239,5.162510,0.000000


In [36]:
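# the first 5000 rows of X are mRNA features; the remaining rows are methylation features
X1 = X.iloc[5000:, :].T  # methylation (patients x features)
X2 = X.iloc[:5000, :].T  # mRNA (patients x genes)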
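Slicing at row 5000 works only because the mRNA table contributes exactly 5000 features and comes first in the concatenation. SNF combines affinity matrices element by element, so row i of every view must refer to the same patient. A somewhat more defensive approach is to keep each view as its own table and align the tables on the patients they share. The sketch below does this. The helper `align_views` and the toy identifiers are hypothetical.

```python
import pandas as pd


def align_views(*views):
    """Hypothetical helper: keep patients (rows) that are complete in every view, in one shared order."""
    cleaned = [v.dropna(axis='index') for v in views]
    shared = cleaned[0].index
    for v in cleaned[1:]:
        shared = shared.intersection(v.index)
    shared = shared.sort_values()
    return [v.loc[shared] for v in cleaned]


mrna = pd.DataFrame({'ENSG_A': [1.0, 2.0, 3.0]}, index=['H1', 'H2', 'H3'])
meth = pd.DataFrame({'cg1_p': [0.5, None, 1.0], 'cg1_n': [0.0, 0.1, 0.0]},
                    index=['H2', 'H3', 'H1'])
X2_toy, X1_toy = align_views(mrna, meth)  # H3 is dropped: its methylation row is incomplete
```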

In [37]:
X1

Unnamed: 0,cg10146935_p,cg10146935_n,cg26837773_p,cg26837773_n,cg17801765_p,cg17801765_n,cg13244315_p,cg13244315_n,cg06181703_p,cg06181703_n,...,cg07600533_p,cg07600533_n,cg08260245_p,cg08260245_n,cg19112186_p,cg19112186_n,cg10770023_p,cg10770023_n,cg00270625_p,cg00270625_n
H045,1.811086,0.000000,0.000000,5.172572,5.411526,0.0,0.000000,0.118825,5.120384,0.0,...,3.901933,0.000000,5.713831,0.000000,5.703520,0.0,5.166255,0.0,4.911655,0.000000
H109,0.000000,3.997508,1.594870,0.000000,5.412693,0.0,1.043871,0.000000,1.279480,0.0,...,2.634941,0.000000,5.107460,0.000000,1.326244,0.0,0.677912,0.0,5.281115,0.000000
H024,0.000000,2.844313,0.161170,0.000000,0.365706,0.0,0.000000,4.219236,0.721100,0.0,...,1.516759,0.000000,5.676245,0.000000,5.488636,0.0,4.221828,0.0,5.379716,0.000000
H056,0.000000,3.338656,0.000000,2.093433,0.373634,0.0,0.000000,1.592196,4.047059,0.0,...,0.000000,3.884756,5.950338,0.000000,5.354059,0.0,4.934536,0.0,5.366823,0.000000
H079,0.000000,0.019362,3.748980,0.000000,5.412010,0.0,1.416418,0.000000,5.237422,0.0,...,0.000000,4.238106,6.040756,0.000000,5.584746,0.0,5.095111,0.0,5.338470,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
H070,0.220273,0.000000,1.568219,0.000000,5.459845,0.0,0.000000,0.602089,2.963107,0.0,...,3.886102,0.000000,5.959290,0.000000,5.608365,0.0,5.110465,0.0,5.223667,0.000000
H255,0.000000,4.025299,0.000000,1.562166,4.287223,0.0,0.000000,1.179398,0.093875,0.0,...,0.000000,3.496187,0.000000,0.268956,0.755417,0.0,0.553364,0.0,0.932052,0.000000
H135,0.000000,2.155755,0.262750,0.000000,4.935740,0.0,0.000000,3.033297,0.202252,0.0,...,0.293211,0.000000,5.527967,0.000000,5.049722,0.0,5.283331,0.0,5.193239,0.000000
H247,0.000000,3.034153,0.000000,0.581535,0.280928,0.0,0.000000,3.841313,2.514960,0.0,...,0.000000,0.938765,4.781603,0.000000,4.948248,0.0,4.635037,0.0,5.162510,0.000000


In [38]:
X2

Unnamed: 0,ENSG00000244734,ENSG00000158528,ENSG00000198478,ENSG00000175445,ENSG00000174469,ENSG00000188536,ENSG00000186522,ENSG00000196263,ENSG00000198046,ENSG00000144642,...,ENSG00000136492,ENSG00000143198,ENSG00000161653,ENSG00000203778,ENSG00000177599,ENSG00000111328,ENSG00000165474,ENSG00000164061,ENSG00000166816,ENSG00000165972
H045,4.558644,11.741854,8.921456,12.686458,2.644946,2.644946,11.473792,9.680574,10.323723,11.137333,...,7.220013,10.177649,4.667130,7.052534,6.323287,10.059942,1.528848,5.771337,1.528848,5.256267
H109,2.721512,13.287432,2.721512,10.925985,12.648355,1.528848,10.271483,9.986980,10.231973,1.528848,...,7.947078,8.223803,4.775046,6.279164,5.799820,10.361200,1.528848,5.359180,2.383843,7.360436
H024,9.938456,2.341006,12.381452,1.528848,1.528848,6.664661,3.408744,3.657904,3.657904,2.341006,...,8.967320,10.096502,5.815616,6.369060,6.539086,8.152703,1.528848,6.059008,2.341006,5.715771
H056,13.278004,3.232874,8.106266,1.528848,13.565210,9.580385,3.410471,3.565827,3.232874,2.417160,...,8.532982,10.238564,5.902633,5.728777,5.929667,7.737858,2.417160,5.495087,3.232874,5.789100
H079,6.086874,11.940820,4.889503,13.340588,5.476914,3.862678,9.955379,10.244702,10.567114,7.967849,...,7.760580,8.790501,5.929477,5.833956,6.036300,8.274131,2.270282,5.102382,1.528848,6.316072
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
H070,3.254823,1.528848,1.528848,4.436292,12.418931,1.528848,1.528848,1.528848,1.528848,1.528848,...,8.410700,9.314123,4.956979,6.873084,6.070732,8.067587,1.528848,5.111537,1.528848,7.060619
H255,3.269304,12.427299,2.907226,11.425088,1.528848,2.344369,9.696142,10.988039,10.060879,5.343396,...,6.579472,10.558828,5.043407,6.954905,5.874135,8.751176,1.528848,5.726740,2.667779,5.986365
H135,1.528848,1.528848,1.528848,1.528848,12.815852,1.528848,1.528848,1.528848,1.528848,1.528848,...,7.178759,9.572243,3.491024,6.028750,5.038032,8.350212,1.528848,6.936723,1.528848,5.756981
H247,8.826116,3.121428,8.087886,5.680739,9.970217,5.836520,1.528848,7.679473,2.682078,3.121428,...,7.243387,9.173220,4.317162,6.526577,4.165399,9.158571,1.528848,4.517548,1.528848,6.241231


In [40]:
graph_meth=make_affinity(X1, metric='euclidean', K=20, mu=0.5)

In [42]:
graph_meth.shape

(135, 135)

In [43]:
graph_mrna=make_affinity(X2, metric='euclidean', K=20, mu=0.5)
graph_mrna.shape

(135, 135)

In [46]:
graph_fusion=snf([graph_meth, graph_mrna], K=20, t=20)

In [47]:
graph_fusion

array([[0.55802961, 0.00746819, 0.00555081, ..., 0.0054432 , 0.00565573,
        0.00782555],
       [0.00746819, 0.55633207, 0.0059919 , ..., 0.00612188, 0.00627854,
        0.0076142 ],
       [0.00555081, 0.0059919 , 0.55619611, ..., 0.00928914, 0.00811511,
        0.00569219],
       ...,
       [0.0054432 , 0.00612188, 0.00928914, ..., 0.55842561, 0.00862933,
        0.00540643],
       [0.00565573, 0.00627854, 0.00811511, ..., 0.00862933, 0.55483685,
        0.0056395 ],
       [0.00782555, 0.0076142 , 0.00569219, ..., 0.00540643, 0.0056395 ,
        0.55462068]])

In [48]:
first, second = compute.get_n_clusters(graph_fusion)
first, second

(2, 4)
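`compute.get_n_clusters` returns the two best estimates of the number of clusters. As we understand its documentation, it relies on the eigengap heuristic. The sketch below shows one common form of that heuristic, assuming the symmetric normalized Laplacian. It sorts the eigenvalues and picks the k that maximizes the gap between the k-th and (k+1)-th smallest. `eigengap_estimate` is a hypothetical name and this is not snfpy's source, so its estimates may differ from the `(2, 4)` obtained above.

```python
import numpy as np


def eigengap_estimate(W, candidates=range(2, 6)):
    """Hypothetical helper: rank candidate cluster counts by the eigengap of the normalized Laplacian."""
    W = np.asarray(W, dtype=float)
    W = (W + W.T) / 2.0                              # ensure symmetry
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals = np.linalg.eigvalsh(L)                  # ascending order
    # Gap after the k smallest eigenvalues: lambda_{k+1} - lambda_k (1-based)
    gaps = {k: eigvals[k] - eigvals[k - 1] for k in candidates}
    ranked = sorted(gaps, key=gaps.get, reverse=True)
    return ranked[0], ranked[1]


# Toy affinity with three groups of six patients
W = np.full((18, 18), 0.01)
for b in range(3):
    W[6 * b:6 * (b + 1), 6 * b:6 * (b + 1)] = 1.0
best, second_best = eigengap_estimate(W)
```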

In [49]:
from sklearn.cluster import spectral_clustering
n_clusters = first
# cluster the fused CLL network (graph_fusion), not the simulated fused_network from Section 3
labels = spectral_clustering(graph_fusion, n_clusters=n_clusters)

In [52]:
X2.columns

Index(['ENSG00000244734', 'ENSG00000158528', 'ENSG00000198478',
       'ENSG00000175445', 'ENSG00000174469', 'ENSG00000188536',
       'ENSG00000186522', 'ENSG00000196263', 'ENSG00000198046',
       'ENSG00000144642',
       ...
       'ENSG00000136492', 'ENSG00000143198', 'ENSG00000161653',
       'ENSG00000203778', 'ENSG00000177599', 'ENSG00000111328',
       'ENSG00000165474', 'ENSG00000164061', 'ENSG00000166816',
       'ENSG00000165972'],
      dtype='object', length=5000)

In [54]:
labels

In [None]:

from scipy.stats import f_oneway

# X2 is the (patients x genes) mRNA DataFrame; labels has one entry per patient (135)
significant_genes = []

for gene in X2.columns:
    groups = [X2.loc[labels == i, gene] for i in range(n_clusters)]
    fval, pval = f_oneway(*groups)
    if pval < 0.01:
        significant_genes.append((gene, pval))

# Sort by p-value
biomarkers = sorted(significant_genes, key=lambda x: x[1])

In [56]:
from scipy.stats import f_oneway

# Same test, using positional (integer) column indices instead of gene IDs
significant_genes = []

for gene in range(X2.shape[1]):
    groups = [X2.iloc[labels == i, gene] for i in range(n_clusters)]
    fval, pval = f_oneway(*groups)
    if pval < 0.01:
        significant_genes.append((gene, pval))

# Sort by p-value
biomarkers = sorted(significant_genes, key=lambda x: x[1])