This notebook contains a code snippet that generates a pandas DataFrame summarizing a given knowledge assembly, split by a given annotation.

In [1]:
import os
import sys
import time

import pandas as pd
import pybel
from pybel.constants import VERSION as PYBEL_VERSION
from pybel_tools import selection
from pybel_tools.mutation import infer_central_dogma
from pybel_tools.summary import info_json, info_list

In [2]:
print(sys.version)

3.6.3 (default, Oct  9 2017, 09:47:56) 
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)]


In [3]:
print(time.asctime())

Thu Jan 11 14:11:38 2018


In [4]:
print(PYBEL_VERSION)

0.10.2-dev


In [5]:
bms_base = os.environ["BMS_BASE"]

In this example, we'll summarize the NeuroMMSig subgraphs in the Epilepsy Knowledge Assembly (Hoyt et al., 2018).

In [6]:
graph = pybel.from_pickle(
    os.path.join(bms_base, "aetionomy", "epilepsy", "epilepsy.gpickle")
)
print(graph)

Epilepsy Knowledge Assembly v2.0.1


In [7]:
infer_central_dogma(graph)

In [8]:
subgraphs = selection.get_subgraphs_by_annotation(graph, "Subgraph")

len(subgraphs)

32
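To illustrate the idea of splitting a graph by an annotation, here is a minimal sketch in plain Python. It is not the `pybel_tools` implementation: the toy edge list, the `group_edges_by_annotation` helper, and the assumption that each edge carries a mapping from annotation names to a set of values are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy edges as (source, target, annotations) triples.
# The node names and the data layout are illustrative assumptions only.
edges = [
    ("A", "B", {"Subgraph": {"GABA subgraph"}}),
    ("B", "C", {"Subgraph": {"GABA subgraph", "Calcium dependent subgraph"}}),
    ("C", "D", {"Subgraph": {"Calcium dependent subgraph"}}),
    ("D", "E", {}),  # an edge without the annotation joins no group
]


def group_edges_by_annotation(edges, annotation):
    """Group edges by each value they carry for the given annotation."""
    groups = defaultdict(list)
    for source, target, annotations in edges:
        for value in annotations.get(annotation, ()):
            groups[value].append((source, target))
    return dict(groups)


groups = group_edges_by_annotation(edges, "Subgraph")
print(len(groups))
```

In this sketch an edge annotated with several values appears in every matching group, so the groups may overlap.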

In [9]:
def fix_columns(df_):
    """Cast the count columns of a summary DataFrame to integers, in place."""
    for c in ["Authors", "Nodes", "Edges", "Citations", "Components"]:
        df_[c] = df_[c].astype(int)
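A likely reason the cast is needed: when a DataFrame is built from a dictionary of dictionaries and then transposed, counts and ratios share a column before the transpose, so pandas does not keep an integer dtype for the counts. The sketch below shows this on hypothetical toy summaries (the names `First`, `Second`, and `toy` are illustrative).

```python
import pandas as pd

# Hypothetical summaries mixing integer counts with a floating point ratio
data = {
    "First": {"Nodes": 12, "Edges": 12, "Network density": 0.090909},
    "Second": {"Nodes": 76, "Edges": 154, "Network density": 0.027018},
}

# Before transposing, each column mixes counts with a ratio,
# so the counts are no longer stored with an integer dtype
toy = pd.DataFrame(data).T
dtype_before = toy["Nodes"].dtype

# Cast the count columns back to integers
for column in ["Nodes", "Edges"]:
    toy[column] = toy[column].astype(int)
dtype_after = toy["Nodes"].dtype

print(dtype_before, dtype_after)
```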

The [`info_json`](http://pybel-tools.readthedocs.io/en/latest/summary.html#pybel_tools.summary.info_json) function returns a dictionary summarizing a graph: its numbers of nodes, edges, citations, authors, and components, along with its average degree and network density.

In [10]:
data = {
    subgraph_name.capitalize(): info_json(subgraph)
    for subgraph_name, subgraph in subgraphs.items()
}
df = pd.DataFrame(data).T
fix_columns(df)

df_total = pd.DataFrame({"Total": info_json(graph)}).T
del df_total["Compilation warnings"]
fix_columns(df_total)

df = pd.concat([df, df_total])
df

Subgraph,Authors,Average degree,Citations,Components,Edges,Network density,Nodes
Adaptive immune system subgraph,0,1.0,5,4,12,0.090909,12
Adenosine signaling subgraph,0,2.026316,15,3,154,0.027018,76
Apoptosis signaling subgraph,0,2.20614,115,5,503,0.009719,228
Brain_derived neurotrophic factor signaling subgraph,0,1.893333,29,1,142,0.025586,75
Calcium dependent subgraph,0,2.625828,73,8,793,0.008724,302
Chromatin organization subgraph,0,1.25,2,2,10,0.178571,8
Energy metabolic subgraph,0,1.945055,24,4,177,0.021612,91
Estradiol metabolism,0,1.142857,1,2,8,0.190476,7
G-protein-mediated signaling,0,1.794872,25,5,140,0.02331,78
Gaba subgraph,0,2.412214,56,2,632,0.009242,262
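The rows shown above are consistent with two simple formulas: average degree as edges divided by nodes, and network density as edges divided by the number of possible directed edges, n(n - 1). The sketch below checks this against two rows of the table. The helper names are hypothetical, and this is an observation about the numbers rather than a statement of how `info_json` computes them.

```python
def average_degree(n_nodes, n_edges):
    """Edges per node."""
    return n_edges / n_nodes


def directed_density(n_nodes, n_edges):
    """Fraction of the n * (n - 1) possible directed edges that are present."""
    return n_edges / (n_nodes * (n_nodes - 1))


# (name, nodes, edges) taken from two rows of the table above
rows = [
    ("Adaptive immune system subgraph", 12, 12),
    ("Adenosine signaling subgraph", 76, 154),
]

results = {
    name: (
        round(average_degree(n_nodes, n_edges), 6),
        round(directed_density(n_nodes, n_edges), 6),
    )
    for name, n_nodes, n_edges in rows
}

for name, (degree, density) in results.items():
    print(name, degree, density)
```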


The DataFrame can be written to CSV, or to a wide variety of other formats, using pandas.

In [11]:
path = os.path.join(os.path.expanduser("~"), "Desktop", "subgraph_summary.csv")
df[["Nodes", "Edges", "Components", "Citations"]].to_csv(path)
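As a sketch of the other formats mentioned above, the same export pattern works with other pandas writers. The `toy` DataFrame here is a hypothetical stand-in for `df`, holding two rows copied from the table. Each writer returns a string when no path is given.

```python
import io

import pandas as pd

# Hypothetical stand-in for the summary DataFrame, using two rows from the table
toy = pd.DataFrame(
    {"Nodes": [12, 76], "Edges": [12, 154]},
    index=["Adaptive immune system subgraph", "Adenosine signaling subgraph"],
)

# With no path argument, each writer returns the serialized text
tsv_text = toy.to_csv(sep="\t")           # tab-separated values
json_text = toy.to_json(orient="index")   # one JSON object per row label
html_text = toy.to_html()                 # an HTML <table>

# Reading the text back, with the first column as the index, recovers the values
round_trip = pd.read_csv(io.StringIO(tsv_text), sep="\t", index_col=0)
print(round_trip)
```

Passing `index_col=0` when reading restores the row labels as the index instead of leaving them in an unnamed first column.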