# NetColoc analysis of rare variants causing childhood dementia

Following the example protocol here: https://github.com/ucsd-ccbb/NetColoc/blob/d97f2597b6bf73f201a34c15dbba63eac014a793/example_notebooks/ASD_NetColoc_analysis.ipynb 

In [3]:
# load required packages
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import networkx as nx
import pandas as pd
import re
import random

from IPython.display import display

import getpass
import ndex2

import json
import cdapsutil

from gprofiler import GProfiler
gp = GProfiler("MyToolName/0.1")

from scipy.stats import hypergeom
from scipy.stats import norm

# render graph text with matplotlib's built-in text engine (LaTeX rendering disabled)
import matplotlib as mpl
mpl.rc('text', usetex = False)
mpl.rc('font', family = 'serif')

from matplotlib import rcParams
rcParams['font.family'] = 'sans-serif'
rcParams['font.sans-serif'] = ['Arial']

sns.set(font_scale=1.4)

sns.set_style('white')

sns.set_style("ticks", {"xtick.major.size": 15, "ytick.major.size": 15})
plt.rcParams['svg.fonttype'] = 'none'

from datetime import datetime
import sys
%matplotlib inline

In [11]:
# verify DDOT was installed
import ddot

from netcoloc import netprop_zscore, netprop, network_colocalization, validation

# 2. Select one gene set of interest. Load gene set from text files into python.

Identify one gene set of interest. Gene sets should come from experimental data (not manual curation) to avoid bias.

Usage Note: gene sets should be < 500 genes (propagation algorithm breaks down if seeded with larger sets). If your gene set is larger, only use the top 500 as seeds to the network propagation.
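
One way to apply the 500-gene cap is sketched below. It assumes each gene comes with a numeric score where larger means stronger evidence. The `gene_scores` dictionary, the placeholder gene names, and the `top_seed_genes` helper are hypothetical and not part of NetColoc.

```python
def top_seed_genes(gene_scores, max_seeds=500):
    """Return up to max_seeds gene names, ranked by descending score.

    gene_scores: dict mapping gene symbol -> score
                 (assumption: higher score = stronger evidence).
    """
    ranked = sorted(gene_scores.items(), key=lambda item: item[1], reverse=True)
    return [gene for gene, _score in ranked[:max_seeds]]


# Toy scores for illustration only (placeholder names, not real results).
example_scores = {"GENE_A": 4.2, "GENE_B": 9.1, "GENE_C": 0.3, "GENE_D": 6.7}
top_two = top_seed_genes(example_scores, max_seeds=2)
print(top_two)
```

The childhood dementia set used here has fewer than 500 genes, so no truncation is needed in this notebook.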

Here, we use the childhood dementia genes.

In [8]:
# set names of geneset 1
# ------ customize this section based on your gene sets and how they should be labeled -------
d1_name='CD'

In [16]:
# load childhood dementia genes from the knowledgebase
D1_df = pd.read_csv('genesets/childhooddementiagenes.csv')
D1_df.index = D1_df['GENE']
print('Number of '+d1_name+' genes:', len(D1_df))
D1_genes = D1_df.index.tolist() # define rare variant genes to seed network propagation
print("First 5 genes:", D1_genes[0:5])

Number of CD genes: 235
First 5 genes: ['SGSH', 'NAGLU', 'HGSNAT', 'GNS', 'MTTP']


# 3. Select gene interaction network to use for the analysis.

Identify the network UUID on NDEx (ndexbio.org) and use it to import the network into this Jupyter notebook. We recommend PCNet as a starting point (used here), but a user may want to switch to “STRING high confidence” on a machine with low memory (< 8GB RAM).

This step takes about 2 minutes.

In [17]:
interactome_uuid='4de852d9-9908-11e9-bcaf-0ac135e8bacf' # for PCNet
# interactome_uuid='275bd84e-3d18-11e8-a935-0ac135e8bacf' # for STRING high confidence
ndex_server='public.ndexbio.org'
ndex_user=None
ndex_password=None
G_int = ndex2.create_nice_cx_from_server(
            ndex_server, 
            username=ndex_user, 
            password=ndex_password, 
            uuid=interactome_uuid
        ).to_networkx()
nodes = list(G_int.nodes)

# remove self edges from network
G_int.remove_edges_from(nx.selfloop_edges(G_int))

# print out the numbers of nodes and edges in the interactome for diagnostic purposes:
print('Number of nodes:', len(G_int.nodes))
print('\nNumber of edges:', len(G_int.edges))

Number of nodes: 18820

Number of edges: 2693109


In [18]:
int_nodes = list(G_int.nodes)

# Identify network colocalized gene network
## 4. Precalculate matrices needed for propagation. This step should take a few minutes (more for larger/denser networks)
A benchmarking analysis demonstrates that the runtime required scales with the number of edges (w’) and the number of nodes (w’’). NetColoc includes functionality for saving and loading these matrices, by setting the XXX parameter, which can be useful if running multiple analyses. The diffusion parameter, which controls the rate of propagation through the network, may be set in this step. In practice, we have found that results are not dependent on the choice of this parameter, and recommend using the default value of 0.5.

Background on network propagation: https://www.nature.com/articles/nrg.2017.38.pdf?origin=ppub
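
To make the role of the diffusion parameter concrete, here is a minimal pure-Python sketch of one common formulation of network propagation (random walk with restart) on a toy graph. It is not NetColoc's internal code: NetColoc precomputes the `w_prime` and `w_double_prime` matrices, whereas this sketch simply iterates to convergence. The `propagate_heat` helper and the four-node path graph are hypothetical, and the sketch assumes each node splits its heat evenly among its neighbors so that total heat is conserved.

```python
def propagate_heat(adjacency, seeds, alpha=0.5, tol=1e-12, max_iter=1000):
    """Iterate F <- alpha * W F + (1 - alpha) * Y until F stops changing.

    adjacency: dict node -> list of neighbors (undirected, every node has >= 1 neighbor).
    seeds: set of seed nodes; the initial heat Y is split evenly among them.
    alpha: diffusion parameter; higher values let heat travel further from the seeds.
    """
    nodes = list(adjacency)
    initial = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    heat = dict(initial)
    for _ in range(max_iter):
        # Restart term: a (1 - alpha) share of the initial heat is re-injected.
        updated = {n: (1 - alpha) * initial[n] for n in nodes}
        # Diffusion term: each node passes alpha of its heat evenly to its neighbors.
        for n in nodes:
            share = alpha * heat[n] / len(adjacency[n])
            for neighbor in adjacency[n]:
                updated[neighbor] += share
        if max(abs(updated[n] - heat[n]) for n in nodes) < tol:
            return updated
        heat = updated
    return heat


# Toy path graph A - B - C - D with a single seed at A.
path_graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
heat = propagate_heat(path_graph, seeds={"A"}, alpha=0.5)
print({node: round(value, 4) for node, value in heat.items()})
```

In this toy example the heat decays with distance from the seed while the total stays at 1, which is the behavior the precomputed matrices encode for the full interactome.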

In [19]:
# pre-calculate matrices used for network propagation. this step takes a few minutes, more for denser interactomes
print('\ncalculating w_prime')
w_prime = netprop.get_normalized_adjacency_matrix(G_int, conserve_heat=True)

print('\ncalculating w_double_prime')
w_double_prime = netprop.get_individual_heats_matrix(w_prime, .5)


calculating w_prime

calculating w_double_prime


# 5. Subset seed genes to those found in the selected network.

Only genes contained in the interaction network will be retained for downstream analysis.
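
Before discarding seeds, it can help to list which ones are missing from the network, for example to spot outdated gene symbols. The sketch below is an optional check, not part of the NetColoc protocol; `split_seeds_by_network` and the placeholder gene names are hypothetical.

```python
def split_seeds_by_network(seed_genes, network_nodes):
    """Return (kept, dropped): seeds present in the network and seeds absent from it."""
    network_nodes = set(network_nodes)
    unique_seeds = set(seed_genes)
    kept = sorted(g for g in unique_seeds if g in network_nodes)
    dropped = sorted(g for g in unique_seeds if g not in network_nodes)
    return kept, dropped


# Toy inputs for illustration only (placeholder names).
toy_seeds = ["GENE_A", "GENE_B", "GENE_X"]
toy_network_nodes = ["GENE_A", "GENE_B", "GENE_C"]
kept, dropped = split_seeds_by_network(toy_seeds, toy_network_nodes)
print("kept:", kept)
print("dropped:", dropped)
```

In this notebook, 196 of the 235 seed genes are found in the interactome, so the same check would report 39 dropped genes.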

In [20]:
# subset seed genes to those found in interactome
print("Number of D1 genes:", len(D1_genes))
D1_genes = list(np.intersect1d(D1_genes,int_nodes))
print("Number of D1 genes in interactome:", len(D1_genes))

Number of D1 genes: 235
Number of D1 genes in interactome: 196


# 6. Compute network proximity scores from seed gene set.

The network proximity scores include a correction for the degree distribution of the input gene sets. The runtime required to compute them increases linearly with the number of nodes in the underlying interaction network and with the size of the input gene list.
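
The idea of a degree-corrected z-score can be sketched as follows: compare each node's observed heat with the heat obtained from random seed sets whose degrees resemble those of the real seeds. The helpers below are a simplified illustration under stated assumptions, not the implementation in `netprop_zscore.calculate_heat_zscores`. In particular, the binning here cuts the degree-sorted node list into fixed-size chunks, sampling is done with replacement, and the population standard deviation is used; all three choices are assumptions.

```python
import random
import statistics


def bin_nodes_by_degree(degree, minimum_bin_size):
    """Group degree-sorted nodes into consecutive bins of at least minimum_bin_size."""
    ordered = sorted(degree, key=lambda n: (degree[n], n))
    bins = [ordered[i:i + minimum_bin_size]
            for i in range(0, len(ordered), minimum_bin_size)]
    # Merge a too-small final bin into the previous one.
    if len(bins) > 1 and len(bins[-1]) < minimum_bin_size:
        last = bins.pop()
        bins[-1].extend(last)
    return bins


def degree_matched_seeds(seeds, bins, rng):
    """Swap each seed for a random node from the same degree bin (with replacement)."""
    node_to_bin = {node: members for members in bins for node in members}
    return [rng.choice(node_to_bin[seed]) for seed in seeds]


def heat_zscore(observed, null_values):
    """z = (observed - mean of null) / population standard deviation of null."""
    return (observed - statistics.mean(null_values)) / statistics.pstdev(null_values)


# Toy degrees for illustration only (placeholder node names).
toy_degree = {"n1": 1, "n2": 1, "n3": 2, "n4": 5, "n5": 6, "n6": 9, "n7": 10}
toy_bins = bin_nodes_by_degree(toy_degree, minimum_bin_size=3)
random_seeds = degree_matched_seeds(["n1", "n7"], toy_bins, random.Random(0))
toy_z = heat_zscore(5.0, [1.0, 3.0])
print(toy_bins, random_seeds, toy_z)
```

In the cell below, `num_reps=1000` sets how many random seed sets are drawn and `minimum_bin_size=100` sets the smallest degree bin.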

In [21]:
# D1 network propagation
print('\nCalculating D1 z-scores: ')
z_D1, Fnew_D1, Fnew_rand_D1 = netprop_zscore.calculate_heat_zscores(w_double_prime, int_nodes, 
                                                                    dict(G_int.degree), 
                                                                    D1_genes, num_reps=1000,
                                                                    minimum_bin_size=100)

z_D1 = pd.DataFrame({'z':z_D1})

z_D1.sort_values('z',ascending=False).head()


Calculating D1 z-scores: 



Unnamed: 0,z
DLAT,23.819753
EIF2B1,23.796715
LRPPRC,23.454761
PDHB,23.177739
DLD,22.877058


# 7. Build proximal subnetwork by taking z> thresh

Select genes with z > threshold (default = 3). These genes are proximal in network space to the seed gene set.
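
As a rough sanity check on the threshold, the sketch below estimates how many nodes would exceed z > 3 by chance if the z-scores were approximately standard normal under the null. That normality assumption is ours and is not stated by the protocol; `normal_upper_tail` is a hypothetical helper, and the node count is taken from the interactome summary printed earlier in this notebook.

```python
import math


def normal_upper_tail(z):
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


n_nodes = 18820   # nodes in the interactome, from the output above
zthresh = 3       # default threshold
p_tail = normal_upper_tail(zthresh)
expected_by_chance = n_nodes * p_tail
print(f"P(Z > {zthresh}) = {p_tail:.5f}")
print(f"Expected nodes above threshold by chance: {expected_by_chance:.1f}")
```

Under this assumption only a few dozen nodes would pass the threshold by chance, which gives a baseline for interpreting the size of the proximal subgraph reported below.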

In [22]:
# ----------- select thresholds for NetColoc -----------------
zthresh=3 # default = 3

# select the genes with z above the threshold and build the proximal subgraph

G_prox = nx.subgraph(G_int,z_D1[z_D1['z']>zthresh].index.tolist()) 
print("Nodes in proximal subgraph:", len(G_prox.nodes()))
print("Edges in proximal subgraph:", len(G_prox.edges()))

Nodes in proximal subgraph: 748
Edges in proximal subgraph: 31926


# Compute systems map from proximal subgraph

9. Convert network colocalization subnetwork to form used in community detection module

In [23]:
# compile dataframe of metadata for overlapping nodes
node_df = pd.DataFrame(index=list(G_prox.nodes))
node_df = node_df.assign(d1_seeds=0, d1_name=d1_name,)
node_df.loc[list(np.intersect1d(D1_genes,node_df.index.tolist())), 'd1_seeds']=1
node_df['z_d1']=z_D1.loc[list(G_prox.nodes)]['z']
node_df['sum_seeds']=node_df['d1_seeds']

node_df = node_df.sort_values('z_d1',ascending=False)
node_df.head(15)

Unnamed: 0,d1_seeds,d1_name,z_d1,sum_seeds
DLAT,1,CD,23.819753,1
EIF2B1,1,CD,23.796715,1
LRPPRC,1,CD,23.454761,1
PDHB,1,CD,23.177739,1
DLD,1,CD,22.877058,1
EIF2B3,1,CD,22.639214,1
PDHA1,1,CD,22.497164,1
EIF2B5,1,CD,21.419588,1
MTR,1,CD,21.417352,1
DPM1,1,CD,21.37633,1


# 10. Run community detection on NetColoc subnetwork (recommend HiDef).
Documentation for CDAPS utils to build multiscale systems map in notebook

https://cdapsutil.readthedocs.io/en/latest/quicktutorial.html#example
https://cdapsutil.readthedocs.io/en/latest/cdapsutil.html#community-detection

In [24]:
print("Nodes in overlap subgraph:", len(G_prox.nodes()))
print("Edges in overlap subgraph:", len(G_prox.edges()))
# Create cx format of overlap subgraph
G_prox_cx = ndex2.create_nice_cx_from_networkx(G_prox)
G_prox_cx.set_name(d1_name+'_NetColoc_subgraph') 
for node_id, node in G_prox_cx.get_nodes():
    data = node_df.loc[node['n']]
    for row, value in data.items():
        # name columns are stored as strings; every other attribute is numeric
        if row == 'd1_name' or row == 'd2_name':
            data_type = 'string'
        else:
            data_type = 'double'
        G_prox_cx.set_node_attribute(node_id, row, value, type=data_type)

cd = cdapsutil.CommunityDetection()

# Run HiDeF on CDAPS REST service
G_hier = cd.run_community_detection(G_prox_cx, algorithm='hidefv1.1beta',arguments={'--maxres':'20'})


Nodes in overlap subgraph: 748
Edges in overlap subgraph: 31926


In [25]:
# Print information about hierarchy
print('Hierarchy name: ' + str(G_hier.get_name()))
print('# nodes: ' + str(len(G_hier.get_nodes())))
print('# edges: ' + str(len(G_hier.get_edges())))

Hierarchy name: hidefv1.1beta_(none)_CD_NetColoc_subgraph
# nodes: 69
# edges: 69


11. Convert the NetColoc hierarchy to networkx format, and write out features of the hierarchy to a pandas dataframe, for easier access in Python.

In [26]:
G_hier = G_hier.to_networkx(mode='default')
G_hier

nodes = G_hier.nodes()

# print the number of nodes and edges in the hierarchy for diagnostic purposes
print('Number of nodes:', len(G_hier.nodes()))

print('\nNumber of edges:', len(G_hier.edges()))

Number of nodes: 69

Number of edges: 69


In [27]:
# add node attributes to dataframe for easier access
hier_df = pd.DataFrame.from_dict(dict(G_hier.nodes(data=True)), orient='index')
hier_df['system_ID']=hier_df.index.tolist()
# some columns are not the right type
hier_df['CD_MemberList_Size']=[int(x) for x in hier_df['CD_MemberList_Size'].tolist()]
hier_df['HiDeF_persistence']=[int(x) for x in hier_df['HiDeF_persistence'].tolist()]
hier_df.head()

Unnamed: 0,CD_MemberList,CD_MemberList_Size,CD_Labeled,CD_MemberList_LogSize,CD_CommunityName,CD_AnnotatedMembers,CD_AnnotatedMembers_Size,CD_AnnotatedMembers_Overlap,CD_AnnotatedMembers_Pvalue,HiDeF_persistence,represents,name,system_ID
0,AIMP1 SUOX ETFB CLPTM1 GRHPR EGLN3 FARSB KCND1...,747,False,9.545,,,0,0.0,0.0,122,C748,C748,0
1,CHKB SLC44A2 SLC44A3 GPD1L SLC44A1 SLC44A5 GPD1,7,False,2.807,,,0,0.0,0.0,26,C768,C768,1
2,M6PR SYNJ1 NLRP3 DNAJC6 SCARB2 DNASE2 INPP4A,7,False,2.807,,,0,0.0,0.0,18,C771,C771,2
3,KIF1A TCP11X2 PLA2G6 TUBA4B TUBB4A TUBG2 TBCD,7,False,2.807,,,0,0.0,0.0,6,C773,C773,3
4,EPM2A GYG1 PPP1R3C NHLRC1 GYG2 AGL,6,False,2.585,,,0,0.0,0.0,56,C775,C775,4


12. Remove systems with no seed genes (OPTIONAL)

In [28]:
hier_df.index=hier_df['name']
hier_df.head()

num_d1_seeds = []
frac_d1_seeds=[]

systems_keep = []
for c in hier_df.index.tolist():
    system_genes = hier_df['CD_MemberList'].loc[c].split(' ')
    d1_temp = list(np.intersect1d(system_genes,D1_genes))

    num_d1_temp = len(d1_temp)
    if (num_d1_temp)>0: # keep the system if it has at least 1 seed gene
        systems_keep.append(c)
        num_d1_seeds.append(num_d1_temp)
        
        frac_d1_seeds.append((num_d1_temp)/float(len(system_genes)))

        
frac_no_seeds = np.subtract(1.0,np.array([frac_d1_seeds]).sum(axis=0))

hier_df = hier_df.loc[systems_keep]
hier_df['num_d1_seeds']=num_d1_seeds
hier_df['frac_d1_seeds']=frac_d1_seeds
hier_df['frac_no_seeds']=frac_no_seeds
print("Number of nodes with seed genes:", len(hier_df))

hier_df.head()
    

Number of nodes with seed genes: 59


name,CD_MemberList,CD_MemberList_Size,CD_Labeled,CD_MemberList_LogSize,CD_CommunityName,CD_AnnotatedMembers,CD_AnnotatedMembers_Size,CD_AnnotatedMembers_Overlap,CD_AnnotatedMembers_Pvalue,HiDeF_persistence,represents,name,system_ID,num_d1_seeds,frac_d1_seeds,frac_no_seeds
C748,AIMP1 SUOX ETFB CLPTM1 GRHPR EGLN3 FARSB KCND1...,747,False,9.545,,,0,0.0,0.0,122,C748,C748,0,196,0.262383,0.737617
C768,CHKB SLC44A2 SLC44A3 GPD1L SLC44A1 SLC44A5 GPD1,7,False,2.807,,,0,0.0,0.0,26,C768,C768,1,1,0.142857,0.857143
C771,M6PR SYNJ1 NLRP3 DNAJC6 SCARB2 DNASE2 INPP4A,7,False,2.807,,,0,0.0,0.0,18,C771,C771,2,3,0.428571,0.571429
C773,KIF1A TCP11X2 PLA2G6 TUBA4B TUBB4A TUBG2 TBCD,7,False,2.807,,,0,0.0,0.0,6,C773,C773,3,4,0.571429,0.428571
C775,EPM2A GYG1 PPP1R3C NHLRC1 GYG2 AGL,6,False,2.585,,,0,0.0,0.0,56,C775,C775,4,2,0.333333,0.666667


In [29]:
# prune G_hier--> only keep systems with at least one seed gene

nkeep=[]
for n in list(G_hier.nodes()):
    if G_hier.nodes(data=True)[n]['name'] in systems_keep:
        nkeep.append(n)
        

G_hier = nx.subgraph(G_hier, nkeep)
print("Number of nodes with seed genes:", len(G_hier.nodes()))
print("Number of edges remaining:", len(G_hier.edges()))

Number of nodes with seed genes: 59
Number of edges remaining: 59


In [30]:
network_colocalization.view_G_hier(G_hier)


14. Annotate systems with gprofiler.
Annotate systems containing more than 2 genes if they are significantly enriched for a Reactome pathway (the cell below queries the REAC source rather than Gene Ontology biological process). Also require that the term contains between 50 and 1000 genes, is enriched with p < 1E-5, and shares at least 3 genes with the system, to increase the stringency of the annotation. Label the system using the term that meets these criteria and has the highest recall. Systems which have no terms meeting these criteria are labeled with their unique system ID.
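
The precision and recall columns in the g:Profiler output can be reproduced from `intersection_size`, `query_size`, and `term_size`. The sketch below shows this and how a label could be chosen, either by recall alone or by the sum of precision and recall. `precision_recall` and `best_term` are hypothetical helpers, and the input numbers are copied from the C748 and C768 outputs of this notebook.

```python
def precision_recall(intersection_size, query_size, term_size):
    """Precision = overlap / query genes; recall = overlap / term genes."""
    return intersection_size / query_size, intersection_size / term_size


def best_term(terms, key="recall"):
    """Return the term name with the highest recall, or precision + recall if key='sum'."""
    scored = []
    for name, intersection_size, query_size, term_size in terms:
        precision, recall = precision_recall(intersection_size, query_size, term_size)
        score = recall if key == "recall" else precision + recall
        scored.append((score, name))
    return max(scored)[1]


# (name, intersection_size, query_size, term_size) from the C768 output.
c768_terms = [
    ("Glycerophospholipid biosynthesis", 7, 7, 128),
    ("Phospholipid metabolism", 7, 7, 210),
    ("Metabolism of lipids", 7, 7, 732),
]

# First row of the C748 output: 26 shared genes, 553 query genes, 54 term genes.
precision, recall = precision_recall(26, 553, 54)
print(round(precision, 6), round(recall, 6))
print(best_term(c768_terms), "|", best_term(c768_terms, key="sum"))
```

For C768 both ranking choices select the same term, but they can disagree for large systems, where precision is low for every term.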

In [31]:
# gprofiler annotation of clusters

system_name_list = []
for p in hier_df.index.tolist():
    focal_genes=hier_df['CD_MemberList'].loc[p].split(' ')
    print(p)
    print(len(focal_genes))
    if len(focal_genes)>2:
        gp_temp = pd.DataFrame(gp.profile(focal_genes,significance_threshold_method='fdr',
                                               sources=['REAC']))
        if len(gp_temp)>0: # make sure data is not empty
            
            # make sure terms are specific, and overlap with at least 3 genes
            gp_temp = gp_temp[(gp_temp['term_size']<1000)&(gp_temp['term_size']>50)]
            gp_temp = gp_temp[gp_temp['intersection_size']>=3]
            
            gp_temp = gp_temp[gp_temp['p_value']<1E-5] # set a stringent pvalue threshold
            
            gp_temp = gp_temp.sort_values('recall',ascending=False)
            
            if len(gp_temp)>0: # label with the top term whenever at least one term passes the filters
                system_name_list.append(gp_temp.head(1)['name'].tolist()[0])
            else:
                system_name_list.append(p)
        else:
            system_name_list.append(p)
            

        display(gp_temp.head())
        
    else:
        system_name_list.append(p)

C748
747


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size
5,Pyruvate metabolism and Citric Acid (TCA) cycle,10842,26,Pyruvate metabolism and Citric Acid (TCA) cycle,REAC:R-HSA-71406,1.162152e-17,[REAC:R-HSA-1428517],0.047016,query_1,553,0.481481,True,REAC,54
12,Peroxisomal protein import,10842,25,Peroxisomal protein import,REAC:R-HSA-9033241,2.1694e-15,[REAC:R-HSA-9609507],0.045208,query_1,553,0.416667,True,REAC,60
9,Sphingolipid metabolism,10842,31,Sphingolipid metabolism,REAC:R-HSA-428157,6.387769000000001e-17,[REAC:R-HSA-556833],0.056058,query_1,553,0.360465,True,REAC,86
1,The citric acid (TCA) cycle and respiratory el...,10842,59,The citric acid (TCA) cycle and respiratory el...,REAC:R-HSA-1428517,3.0499429999999997e-30,[REAC:R-HSA-1430728],0.106691,query_1,553,0.333333,True,REAC,177
6,Metabolism of water-soluble vitamins and cofac...,10842,37,Metabolism of water-soluble vitamins and cofac...,REAC:R-HSA-196849,2.1832990000000003e-17,[REAC:R-HSA-196854],0.066908,query_1,553,0.305785,True,REAC,121


C768
7


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size
0,Glycerophospholipid biosynthesis,10842,7,Glycerophospholipid biosynthesis,REAC:R-HSA-1483206,4.33687e-13,[REAC:R-HSA-1483257],1.0,query_1,7,0.054688,True,REAC,128
4,"Transport of bile salts and organic acids, met...",10842,4,"Transport of bile salts and organic acids, met...",REAC:R-HSA-425366,3.689452e-07,[REAC:R-HSA-425407],0.571429,query_1,7,0.047619,True,REAC,84
1,Phospholipid metabolism,10842,7,Phospholipid metabolism,REAC:R-HSA-1483257,7.409963e-12,[REAC:R-HSA-556833],1.0,query_1,7,0.033333,True,REAC,210
3,Metabolism of lipids,10842,7,Metabolism of lipids,REAC:R-HSA-556833,2.4901e-08,[REAC:R-HSA-1430728],1.0,query_1,7,0.009563,True,REAC,732


C771
7


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size
0,Clathrin-mediated endocytosis,10842,4,Clathrin-mediated endocytosis,REAC:R-HSA-8856828,6e-06,[REAC:R-HSA-199991],0.666667,query_1,6,0.027778,True,REAC,144


C773
7


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size


C775
6


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size
5,Metabolism of carbohydrates,10842,6,Metabolism of carbohydrates,REAC:R-HSA-71387,1.013487e-09,[REAC:R-HSA-1430728],1.0,query_1,6,0.020979,True,REAC,286
6,Diseases of metabolism,10842,5,Diseases of metabolism,REAC:R-HSA-5668914,8.123555e-08,[REAC:R-HSA-1643685],0.833333,query_1,6,0.020408,True,REAC,245


C810
4


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size


C749
475


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size
4,Pyruvate metabolism and Citric Acid (TCA) cycle,10842,26,Pyruvate metabolism and Citric Acid (TCA) cycle,REAC:R-HSA-71406,5.723746000000001e-22,[REAC:R-HSA-1428517],0.068966,query_1,377,0.481481,True,REAC,54
12,Peroxisomal protein import,10842,23,Peroxisomal protein import,REAC:R-HSA-9033241,5.905974e-17,[REAC:R-HSA-9609507],0.061008,query_1,377,0.383333,True,REAC,60
1,The citric acid (TCA) cycle and respiratory el...,10842,59,The citric acid (TCA) cycle and respiratory el...,REAC:R-HSA-1428517,5.038239e-40,[REAC:R-HSA-1430728],0.156499,query_1,377,0.333333,True,REAC,177
3,Protein localization,10842,46,Protein localization,REAC:R-HSA-9609507,5.056234000000001e-28,[REAC:0000000],0.122016,query_1,377,0.291139,True,REAC,158
5,"Respiratory electron transport, ATP synthesis ...",10842,35,"Respiratory electron transport, ATP synthesis ...",REAC:R-HSA-163200,2.553335e-20,[REAC:R-HSA-1428517],0.092838,query_1,377,0.275591,True,REAC,127


C750
139


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size
0,Sphingolipid metabolism,10842,23,Sphingolipid metabolism,REAC:R-HSA-428157,2.0038719999999998e-26,[REAC:R-HSA-556833],0.237113,query_1,97,0.267442,True,REAC,86
20,Heparan sulfate/heparin (HS-GAG) metabolism,10842,7,Heparan sulfate/heparin (HS-GAG) metabolism,REAC:R-HSA-1638091,6.949472e-06,[REAC:R-HSA-1630316],0.072165,query_1,97,0.127273,True,REAC,55
17,Synthesis of substrates in N-glycan biosythesis,10842,8,Synthesis of substrates in N-glycan biosythesis,REAC:R-HSA-446219,1.218802e-06,[REAC:R-HSA-446193],0.082474,query_1,97,0.126984,True,REAC,63
19,Biosynthesis of the N-glycan precursor (dolich...,10842,8,Biosynthesis of the N-glycan precursor (dolich...,REAC:R-HSA-446193,5.950052e-06,[REAC:R-HSA-446203],0.082474,query_1,97,0.102564,True,REAC,78
14,Glycosaminoglycan metabolism,10842,11,Glycosaminoglycan metabolism,REAC:R-HSA-1630316,2.056392e-07,[REAC:R-HSA-71387],0.113402,query_1,97,0.089431,True,REAC,123


C782
6


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size
2,Metabolism of water-soluble vitamins and cofac...,10842,4,Metabolism of water-soluble vitamins and cofac...,REAC:R-HSA-196849,3e-06,[REAC:R-HSA-196854],0.666667,query_1,6,0.033058,True,REAC,121


C813
4


C753
18


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size
0,Sphingolipid metabolism,10842,12,Sphingolipid metabolism,REAC:R-HSA-428157,5.412145e-24,[REAC:R-HSA-556833],0.923077,query_1,13,0.139535,True,REAC,86
2,Metabolism of lipids,10842,12,Metabolism of lipids,REAC:R-HSA-556833,5.029319e-13,[REAC:R-HSA-1430728],0.923077,query_1,13,0.016393,True,REAC,732


C755
13


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size
0,Dual incision in TC-NER,10842,7,Dual incision in TC-NER,REAC:R-HSA-6782135,2.771094e-11,[REAC:R-HSA-6781827],0.583333,query_1,12,0.109375,True,REAC,64
4,Formation of TC-NER Pre-Incision Complex,10842,5,Formation of TC-NER Pre-Incision Complex,REAC:R-HSA-6781823,6.468139e-08,[REAC:R-HSA-6781827],0.416667,query_1,12,0.096154,True,REAC,52
1,Transcription-Coupled Nucleotide Excision Repa...,10842,7,Transcription-Coupled Nucleotide Excision Repa...,REAC:R-HSA-6781827,5.335411e-11,[REAC:R-HSA-5696398],0.583333,query_1,12,0.090909,True,REAC,77
5,Gap-filling DNA repair synthesis and ligation ...,10842,5,Gap-filling DNA repair synthesis and ligation ...,REAC:R-HSA-6782210,1.449112e-07,[REAC:R-HSA-6781827],0.416667,query_1,12,0.079365,True,REAC,63
2,Nucleotide Excision Repair,10842,7,Nucleotide Excision Repair,REAC:R-HSA-5696398,4.35027e-10,[REAC:R-HSA-73894],0.583333,query_1,12,0.06422,True,REAC,109


C787
5


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size


C757
13


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size


C794
4


C763
8


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size


C764
8


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size


C765
8


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size
3,Extension of Telomeres,10842,3,Extension of Telomeres,REAC:R-HSA-180786,5e-06,[REAC:R-HSA-157579],0.6,query_1,5,0.058824,True,REAC,51


C766
7


C770
7


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size


C772
7


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size


C774
7


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size


C776
6


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size


C778
6


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size


C781
6


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size


C786
5


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size


C788
5


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size


C792
4


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size


C796
4


C800
4


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size


C801
4


C802
4


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size


C806
4


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size


C807
4


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size


C808
4


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size


C752
18


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size
1,Peroxisomal protein import,10842,10,Peroxisomal protein import,REAC:R-HSA-9033241,1.265106e-19,[REAC:R-HSA-9609507],0.714286,query_1,14,0.166667,True,REAC,60
3,E3 ubiquitin ligases ubiquitinate target proteins,10842,5,E3 ubiquitin ligases ubiquitinate target proteins,REAC:R-HSA-8866654,3.721897e-08,[REAC:R-HSA-8852135],0.357143,query_1,14,0.086207,True,REAC,58
0,Protein localization,10842,12,Protein localization,REAC:R-HSA-9609507,1.1210319999999998e-19,[REAC:0000000],0.857143,query_1,14,0.075949,True,REAC,158
4,Protein ubiquitination,10842,5,Protein ubiquitination,REAC:R-HSA-8852135,1.352892e-07,[REAC:R-HSA-597592],0.357143,query_1,14,0.064103,True,REAC,78


C756
13


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size
3,Complex I biogenesis,10842,6,Complex I biogenesis,REAC:R-HSA-6799198,1.581007e-11,[REAC:R-HSA-611105],0.6,query_1,10,0.105263,True,REAC,57
0,"Respiratory electron transport, ATP synthesis ...",10842,10,"Respiratory electron transport, ATP synthesis ...",REAC:R-HSA-163200,6.4510309999999995e-19,[REAC:R-HSA-1428517],1.0,query_1,10,0.07874,True,REAC,127
2,Respiratory electron transport,10842,8,Respiratory electron transport,REAC:R-HSA-611105,1.412401e-14,[REAC:R-HSA-163200],0.8,query_1,10,0.07767,True,REAC,103
1,The citric acid (TCA) cycle and respiratory el...,10842,10,The citric acid (TCA) cycle and respiratory el...,REAC:R-HSA-1428517,9.901966e-18,[REAC:R-HSA-1430728],1.0,query_1,10,0.056497,True,REAC,177


C759
11


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size
1,Metabolism of amino acids and derivatives,10842,8,Metabolism of amino acids and derivatives,REAC:R-HSA-71291,2.13639e-11,[REAC:R-HSA-1430728],1.0,query_1,8,0.021622,True,REAC,370


C760
8


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size


C762
8


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size
2,Metabolism of water-soluble vitamins and cofac...,10842,4,Metabolism of water-soluble vitamins and cofac...,REAC:R-HSA-196849,1e-06,[REAC:R-HSA-196854],0.666667,query_1,6,0.033058,True,REAC,121
3,Metabolism of vitamins and cofactors,10842,4,Metabolism of vitamins and cofactors,REAC:R-HSA-196854,4e-06,[REAC:R-HSA-1430728],0.666667,query_1,6,0.021505,True,REAC,186


C767
7


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size
0,Dual incision in TC-NER,10842,6,Dual incision in TC-NER,REAC:R-HSA-6782135,3.028526e-12,[REAC:R-HSA-6781827],1.0,query_1,6,0.09375,True,REAC,64
1,Transcription-Coupled Nucleotide Excision Repa...,10842,6,Transcription-Coupled Nucleotide Excision Repa...,REAC:R-HSA-6781827,4.7886e-12,[REAC:R-HSA-5696398],1.0,query_1,6,0.077922,True,REAC,77
5,Formation of TC-NER Pre-Incision Complex,10842,4,Formation of TC-NER Pre-Incision Complex,REAC:R-HSA-6781823,1.062774e-07,[REAC:R-HSA-6781827],0.666667,query_1,6,0.076923,True,REAC,52
6,Gap-filling DNA repair synthesis and ligation ...,10842,4,Gap-filling DNA repair synthesis and ligation ...,REAC:R-HSA-6782210,2.001059e-07,[REAC:R-HSA-6781827],0.666667,query_1,6,0.063492,True,REAC,63
2,Nucleotide Excision Repair,10842,6,Nucleotide Excision Repair,REAC:R-HSA-5696398,2.726636e-11,[REAC:R-HSA-73894],1.0,query_1,6,0.055046,True,REAC,109


C811
4


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size


C814
4


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size


C751
84


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size
1,Sphingolipid metabolism,10842,17,Sphingolipid metabolism,REAC:R-HSA-428157,5.296356e-20,[REAC:R-HSA-556833],0.261538,query_1,65,0.197674,True,REAC,86
11,Synthesis of substrates in N-glycan biosythesis,10842,8,Synthesis of substrates in N-glycan biosythesis,REAC:R-HSA-446219,5.569221e-08,[REAC:R-HSA-446193],0.123077,query_1,65,0.126984,True,REAC,63
20,Heparan sulfate/heparin (HS-GAG) metabolism,10842,6,Heparan sulfate/heparin (HS-GAG) metabolism,REAC:R-HSA-1638091,8.029995e-06,[REAC:R-HSA-1630316],0.092308,query_1,65,0.109091,True,REAC,55
13,Biosynthesis of the N-glycan precursor (dolich...,10842,8,Biosynthesis of the N-glycan precursor (dolich...,REAC:R-HSA-446193,2.632472e-07,[REAC:R-HSA-446203],0.123077,query_1,65,0.102564,True,REAC,78
9,Glycosaminoglycan metabolism,10842,10,Glycosaminoglycan metabolism,REAC:R-HSA-1630316,4.904911e-08,[REAC:R-HSA-71387],0.153846,query_1,65,0.081301,True,REAC,123


C761
8


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size


C754
16


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size
3,Sphingolipid metabolism,10842,8,Sphingolipid metabolism,REAC:R-HSA-428157,4.335902e-14,[REAC:R-HSA-556833],0.666667,query_1,12,0.093023,True,REAC,86
5,Metabolism of lipids,10842,8,Metabolism of lipids,REAC:R-HSA-556833,7.775659e-07,[REAC:R-HSA-1430728],0.666667,query_1,12,0.010929,True,REAC,732


C815
4


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size


C758
11


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size
0,Sphingolipid metabolism,10842,8,Sphingolipid metabolism,REAC:R-HSA-428157,1.46043e-16,[REAC:R-HSA-556833],1.0,query_1,8,0.093023,True,REAC,86
2,Metabolism of lipids,10842,8,Metabolism of lipids,REAC:R-HSA-556833,1.805045e-09,[REAC:R-HSA-1430728],1.0,query_1,8,0.010929,True,REAC,732


C769
7


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size
3,Complex I biogenesis,10842,3,Complex I biogenesis,REAC:R-HSA-6799198,5.128585e-06,[REAC:R-HSA-611105],0.6,query_1,5,0.052632,True,REAC,57
0,Respiratory electron transport,10842,5,Respiratory electron transport,REAC:R-HSA-611105,1.052783e-09,[REAC:R-HSA-163200],1.0,query_1,5,0.048544,True,REAC,103
1,"Respiratory electron transport, ATP synthesis ...",10842,5,"Respiratory electron transport, ATP synthesis ...",REAC:R-HSA-163200,1.528713e-09,[REAC:R-HSA-1428517],1.0,query_1,5,0.03937,True,REAC,127
2,The citric acid (TCA) cycle and respiratory el...,10842,5,The citric acid (TCA) cycle and respiratory el...,REAC:R-HSA-1428517,5.482049e-09,[REAC:R-HSA-1430728],1.0,query_1,5,0.028249,True,REAC,177


C777
6


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size


C784
5


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size


C779
6


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size


C805
4


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size


C780
6


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size
3,Extension of Telomeres,10842,3,Extension of Telomeres,REAC:R-HSA-180786,5e-06,[REAC:R-HSA-157579],0.6,query_1,5,0.058824,True,REAC,51


C783
5


C785
5


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size


C803
4


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size
1,Metabolism of amino acids and derivatives,10842,4,Metabolism of amino acids and derivatives,REAC:R-HSA-71291,3e-06,[REAC:R-HSA-1430728],1.0,query_1,4,0.010811,True,REAC,370


C799
4


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size


C816
4


Unnamed: 0,description,effective_domain_size,intersection_size,name,native,p_value,parents,precision,query,query_size,recall,significant,source,term_size
