# Summary of the Epilepsy Knowledge Assembly

This notebook was used to generate **Table 2** of the manuscript "A systematic approach for identifying shared mechanisms in epilepsy and its comorbidities" by Hoyt and Domingo-Fernandez *et al.*, 2018. In this example, we'll summarize the NeuroMMSig subgraphs in the Epilepsy Knowledge Assembly.

## Code Provenance

This notebook was run with Python 3 and the development versions of [PyBEL](https://github.com/pybel/pybel) and [PyBEL Tools](https://github.com/pybel/pybel-tools) that were current at the time of writing. The exact versions are printed below.

In [1]:
import os
import sys
import time

import pandas as pd

import pybel
import pybel.utils 
from pybel_tools import selection
import pybel_tools.utils
from pybel_tools.mutation import infer_central_dogma
from pybel_tools.summary import info_json, info_list

In [2]:
print(sys.version)

3.6.3 (default, Oct  9 2017, 09:47:56) 
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)]


In [3]:
print(time.asctime())

Thu Mar 22 15:40:50 2018


In [4]:
print(pybel.utils.get_version())

0.11.2-dev


In [5]:
print(pybel_tools.utils.get_version())

0.5.2-dev


## Data Provenance

This notebook uses the Epilepsy Knowledge Assembly, which is available from the Fraunhofer SCAI Department of Bioinformatics [downloads page](https://www.scai.fraunhofer.de/en/business-research-areas/bioinformatics/downloads.html).

In [6]:
bel_url = 'https://www.scai.fraunhofer.de/content/dam/scai/de/downloads/bioinformatik/epilepsy.bel'

The following local file path is used to cache the compiled graph:

In [7]:
pickle_path = 'epilepsy.gpickle'

In [8]:
%%time

if os.path.exists(pickle_path): # load from pre-compiled
    graph = pybel.from_pickle(pickle_path)
else:
    graph = pybel.from_url(bel_url)
    pybel.to_pickle(graph, pickle_path) # cache for later

CPU times: user 35.3 ms, sys: 10.7 ms, total: 46 ms
Wall time: 44.6 ms
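The cell above follows a generic load-or-build caching pattern: parse the BEL document once, then reload the pickled graph on later runs. A minimal stdlib sketch of the same idea, assuming a hypothetical `load_or_build` helper and using the standard `pickle` module in place of `pybel.from_pickle`/`pybel.to_pickle`:

```python
import os
import pickle


def load_or_build(path, build):
    """Return the object cached at `path`, or call `build()` and cache it.

    `load_or_build` is a hypothetical helper for illustration, not part of PyBEL.
    """
    if os.path.exists(path):
        # Cache hit: skip the expensive step (e.g. downloading and parsing BEL)
        with open(path, 'rb') as file:
            return pickle.load(file)
    obj = build()
    # Cache miss: build once, then store the result for later runs
    with open(path, 'wb') as file:
        pickle.dump(obj, file)
    return obj
```

One consequence of this design is that the cache never notices upstream changes: if the BEL file at `bel_url` is updated, the cached `epilepsy.gpickle` has to be deleted to force a rebuild.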


The BEL document contains a couple of semantic errors, which PyBEL records as compilation warnings. Nobody's perfect!

In [9]:
print(graph)

Epilepsy Knowledge Assembly v1.0.0


## Processing

Ensure all proteins have their corresponding mRNAs, and all RNAs have their corresponding genes.

In [10]:
infer_central_dogma(graph)
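The inference step mirrors the BEL relationships `g(X) transcribedTo r(X)` and `r(X) translatedTo p(X)`. The toy sketch below illustrates the idea on a hypothetical representation (nodes as `(function, name)` tuples, edges as `(source, relation, target)` triples). It is not PyBEL's implementation, which works in place on the graph and may apply additional rules; the toy version returns new sets to keep it easy to test.

```python
def infer_central_dogma_toy(nodes, edges):
    """Add the upstream RNA and gene for every protein (toy illustration).

    Nodes are hypothetical (function, name) tuples, e.g. ('Protein', 'GABRA1');
    edges are (source, relation, target) triples.
    """
    nodes = set(nodes)
    edges = set(edges)
    # Each protein gets its RNA, linked by translatedTo
    for function, name in list(nodes):
        if function == 'Protein':
            rna = ('RNA', name)
            nodes.add(rna)
            edges.add((rna, 'translatedTo', (function, name)))
    # Each RNA (including the ones just added) gets its gene, linked by transcribedTo
    for function, name in list(nodes):
        if function == 'RNA':
            gene = ('Gene', name)
            nodes.add(gene)
            edges.add((gene, 'transcribedTo', (function, name)))
    return nodes, edges
```

For example, a graph containing only `('Protein', 'GABRA1')` gains `('RNA', 'GABRA1')`, `('Gene', 'GABRA1')` and the two linking edges.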

Split the graph into one subgraph per value of the `Subgraph` annotation.

In [11]:
subgraphs = selection.get_subgraphs_by_annotation(graph, 'Subgraph')
len(subgraphs)

33
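Conceptually, this groups edges by the values of one annotation. A minimal sketch of that grouping, assuming a hypothetical `split_by_annotation` helper and edges given as `(source, target, annotations)` triples; the `'Undefined'` fallback mirrors the labeling of unassigned edges described in the summary below, and is an assumption about how such edges are handled rather than documented `get_subgraphs_by_annotation` behavior:

```python
from collections import defaultdict


def split_by_annotation(edges, annotation, sentinel='Undefined'):
    """Group edges into subgraphs keyed by the value(s) of `annotation`.

    Each edge is a hypothetical (source, target, annotations) triple, where
    `annotations` maps an annotation name to a collection of values. In this
    sketch an edge with several values lands in several subgraphs, and an edge
    without the annotation is grouped under `sentinel`.
    """
    subgraphs = defaultdict(list)
    for source, target, annotations in edges:
        values = annotations.get(annotation) or {sentinel}
        for value in values:
            subgraphs[value].append((source, target))
    return dict(subgraphs)
```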

## Summary

In [12]:
def fix_columns(df_):
    """Cast the count columns of a summary DataFrame to int, in place."""
    for c in ['Nodes', 'Edges', 'Citations', 'Components']:
        df_[c] = df_[c].astype(int)

The [`pybel_tools.summary.info_json`](http://pybel-tools.readthedocs.io/en/latest/summary.html#pybel_tools.summary.info_json) function summarizes a graph as a dictionary containing its numbers of nodes, edges, citations, authors, and components, along with its average degree and network density. Edges that were not assigned to any subgraph were labeled "Undefined."
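To make the columns concrete, here is a sketch that computes similar statistics from a plain edge list. The `summarize` helper and its `(source, target, citation)` edge format are hypothetical. The formulas for average degree (edges per node) and density (edges over the n(n-1) possible directed edges) were chosen because they reproduce the values reported in the table below, e.g. 154 edges / 76 nodes ≈ 2.026316 for the adenosine subgraph; components are counted as weakly connected components.

```python
def summarize(nodes, edges):
    """Compute info_json-style statistics for a directed multigraph (sketch).

    `edges` is a list of hypothetical (source, target, citation) triples;
    every edge endpoint must appear in `nodes`.
    """
    nodes = list(nodes)
    n, m = len(nodes), len(edges)

    # Union-find over undirected connectivity yields weakly connected components
    parent = {node: node for node in nodes}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for source, target, _ in edges:
        parent[find(source)] = find(target)

    return {
        'Nodes': n,
        'Edges': m,
        'Citations': len({citation for _, _, citation in edges}),
        'Components': len({find(node) for node in nodes}),
        'Average degree': m / n if n else 0.0,
        'Network density': m / (n * (n - 1)) if n > 1 else 0.0,
    }
```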

In [13]:
data = {
    subgraph_name.capitalize(): info_json(subgraph)
    for subgraph_name, subgraph in subgraphs.items()
}
df = pd.DataFrame(data).T
del df['Authors']
fix_columns(df)

df_total = pd.DataFrame({'Total': info_json(graph)}).T
del df_total['Compilation warnings']
del df_total['Authors']
fix_columns(df_total)

df = pd.concat([df, df_total])
df

| Subgraph | Average degree | Citations | Components | Edges | Network density | Nodes |
|---|---|---|---|---|---|---|
| Adaptive immune system subgraph | 1.0 | 5 | 4 | 12 | 0.090909 | 12 |
| Adenosine signaling subgraph | 2.026316 | 15 | 3 | 154 | 0.027018 | 76 |
| Apoptosis signaling subgraph | 2.20614 | 115 | 5 | 503 | 0.009719 | 228 |
| Brain_derived neurotrophic factor signaling subgraph | 1.92 | 29 | 1 | 144 | 0.025946 | 75 |
| Calcium dependent subgraph | 2.576159 | 73 | 8 | 778 | 0.008559 | 302 |
| Chromatin organization subgraph | 1.25 | 2 | 2 | 10 | 0.178571 | 8 |
| Energy metabolic subgraph | 1.945055 | 24 | 4 | 177 | 0.021612 | 91 |
| Estradiol metabolism | 1.142857 | 1 | 2 | 8 | 0.190476 | 7 |
| G-protein-mediated signaling | 1.820513 | 26 | 5 | 142 | 0.023643 | 78 |
| Gaba subgraph | 2.422053 | 57 | 2 | 637 | 0.009244 | 263 |
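The two derived columns can be recomputed from the node and edge counts alone: average degree as |E| / |V| and network density as |E| / (|V|(|V| - 1)). This is an observation from the reported numbers rather than documented PyBEL behavior; a quick check against a few rows copied from the table:

```python
# (nodes, edges, average degree, network density) copied from the table above
rows = {
    'Adaptive immune system subgraph': (12, 12, 1.0, 0.090909),
    'Adenosine signaling subgraph': (76, 154, 2.026316, 0.027018),
    'Chromatin organization subgraph': (8, 10, 1.25, 0.178571),
    'Gaba subgraph': (263, 637, 2.422053, 0.009244),
}

for name, (n, m, avg_degree, density) in rows.items():
    # Average degree of a directed graph: |E| / |V|
    assert round(m / n, 6) == avg_degree, name
    # Directed network density: |E| / (|V| * (|V| - 1))
    assert round(m / (n * (n - 1)), 6) == density, name
```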


Using pandas, the DataFrame can be exported to CSV or a wide variety of other formats.

In [14]:
path = os.path.join(os.path.expanduser('~'), 'Desktop', 'subgraph_summary.csv')
df[['Nodes', 'Edges', 'Components', 'Citations']].to_csv(path)
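`DataFrame.to_csv` writes the index as the first column. Because this index has no name, its header cell is left empty, which is why reading the file back with `pandas.read_csv` labels that column `Unnamed: 0`. Passing `index_label='Subgraph'` to `to_csv` gives it a proper header. The stdlib sketch below shows the intended file layout with an explicitly named first column; the two rows are copied from the table above and the remaining rows are omitted.

```python
import csv
import io

# Two rows copied from the table above; only the exported columns are kept
rows = [
    {'Subgraph': 'Adaptive immune system subgraph', 'Nodes': 12, 'Edges': 12,
     'Components': 4, 'Citations': 5},
    {'Subgraph': 'Gaba subgraph', 'Nodes': 263, 'Edges': 637,
     'Components': 2, 'Citations': 57},
]

buffer = io.StringIO()
# Name the first column explicitly, like to_csv(..., index_label='Subgraph')
writer = csv.DictWriter(
    buffer,
    fieldnames=['Subgraph', 'Nodes', 'Edges', 'Components', 'Citations'],
    lineterminator='\n',
)
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```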