# Parsing the Causal Biological Network Database

**Author:** [Charles Tapley Hoyt](https://github.com/cthoyt/)

**Estimated Run Time:** 1 minute

This notebook outlines the process of parsing the JSON Graph File format used in the Causal Biological Network (CBN) Database. 

In [1]:
import json
import os
import time

import networkx as nx
import pybel
import pybel_tools
import requests
from pybel.constants import *
from pybel_tools.visualization import to_jupyter

In [2]:
pybel.__version__

'0.7.2-dev'

In [3]:
pybel_tools.__version__

'0.1.17-dev'

In [4]:
time.asctime()

'Thu Aug 10 14:07:57 2017'

## Data Acquisition

Data can be downloaded directly from the `GetJSONGraphFile` endpoint, and are returned in the response as JSON.

In [5]:
res = requests.get(
    "http://causalbionet.com/Networks/GetJSONGraphFile?networkId=hox_2.0_hs"
).json()
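
For orientation, the keys read later in this notebook imply roughly the following shape for the response. This is a minimal sketch on a made-up toy document: the node names, citation, and evidence text are hypothetical and not taken from the CBN data, and `iter_statements` is an illustrative helper, not a PyBEL function. It shows how each edge yields one BEL statement string and how evidences without a citation are skipped.

```python
# Toy document mirroring only the keys this notebook reads; all values are hypothetical.
toy_jgif = {
    "graph": {
        "edges": [
            {
                "source": "p(HGNC:AKT1)",
                "relation": "increases",
                "target": "p(HGNC:MTOR)",
                "metadata": {
                    "evidences": [
                        {
                            "citation": {
                                "type": "PubMed",
                                "name": "Example title",
                                "id": "12345",
                            },
                            "summary_text": "Example evidence text.",
                            "biological_context": {
                                "tissue": "lung",
                                "disease": "",
                                "species_common_name": "Human",
                            },
                        },
                        # An evidence with an empty citation, which gets skipped
                        {"citation": {}, "summary_text": "No citation here."},
                    ]
                },
            }
        ]
    }
}


def iter_statements(jgif):
    """Yield (BEL statement, evidence dict) pairs for every evidence that has a citation."""
    for edge in jgif["graph"]["edges"]:
        # format_map picks out the three keys it needs and ignores the rest
        bel = "{source} {relation} {target}".format_map(edge)
        for evidence in edge["metadata"]["evidences"]:
            if evidence.get("citation"):
                yield bel, evidence


statements = list(iter_statements(toy_jgif))
```
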

## Parsing

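The structure is traversed edge by edge, and the [BelParser](http://pybel.readthedocs.io/en/latest/parser.html#pybel.parser.parse_bel.BelParser) is driven directly: for each evidence, the citation, evidence text, and annotations are set on the parser's control parser before the statement itself is parsed. Normally, during BEL compilation, the usage of this class is hidden from the user.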

In [6]:
graph = pybel.BELGraph()
parser = pybel.parser.BelParser(graph)

In [7]:
def get_citation(evidence):
    return {
        CITATION_NAME: evidence["citation"]["name"],
        CITATION_TYPE: evidence["citation"]["type"],
        CITATION_REFERENCE: evidence["citation"]["id"],
    }

In [8]:
annotation_map = {
    "tissue": "Tissue",
    "disease": "Disease",
    "species_common_name": "Species",
}

In [9]:
species_map = {"human": "9606", "rat": "10116", "mouse": "10090"}

In [10]:
annotation_value_map = {"Species": species_map}
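
The parsing loop in this notebook hard-codes the renaming that these maps describe. As a sketch of how the maps could be applied generically instead, the hypothetical helper `normalize_annotations` (not part of PyBEL) renames keys via `annotation_map` and translates values via `annotation_value_map`, skipping empty entries. The maps are repeated so the block runs on its own.

```python
annotation_map = {
    "tissue": "Tissue",
    "disease": "Disease",
    "species_common_name": "Species",
}
species_map = {"human": "9606", "rat": "10116", "mouse": "10090"}
annotation_value_map = {"Species": species_map}


def normalize_annotations(biological_context):
    """Map a raw biological_context dict to annotation names and normalized values."""
    result = {}
    for raw_key, name in annotation_map.items():
        value = biological_context.get(raw_key)
        if not value:
            # Missing keys and empty strings are both skipped
            continue
        value_map = annotation_value_map.get(name)
        if value_map is not None:
            # Raises KeyError for a species that is not in species_map
            value = value_map[value.lower()]
        result[name] = value
    return result


example = normalize_annotations(
    {"tissue": "lung", "disease": "", "species_common_name": "Human"}
)
```

Here `example` holds only the tissue and the species taxonomy identifier, since the disease entry is empty.
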

In [11]:
for edge in res["graph"]["edges"]:
    # Each edge is one BEL statement; it is parsed once per evidence
    bel = "{source} {relation} {target}".format_map(edge)

    for evidence in edge["metadata"]["evidences"]:
        # Skip evidences that have no citation
        if not evidence.get("citation"):
            continue

        # Reset the parser's context, then set the citation and evidence text
        parser.control_parser.clear()
        parser.control_parser.citation = get_citation(evidence)
        parser.control_parser.evidence = evidence["summary_text"]

        d = {}

        # .get tolerates a missing biological context or missing keys inside it
        annotations = evidence.get("biological_context") or {}

        if annotations.get("tissue"):
            d["Tissue"] = annotations["tissue"]

        if annotations.get("disease"):
            d["Disease"] = annotations["disease"]

        if annotations.get("species_common_name"):
            # Translate the common name to its NCBI taxonomy identifier
            d["Species"] = species_map[annotations["species_common_name"].lower()]

        parser.control_parser.annotations.update(d)
        try:
            parser.parseString(bel)
        except Exception as e:
            # Report the failing statement and keep going
            print(e, bel)
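
The loop above prints each failure as it occurs. If the failures need to be inspected afterwards, one alternative is to collect them, sketched below. `parse_all` and `fake_parse` are hypothetical names for illustration; `fake_parse` merely stands in for `parser.parseString` so the block runs without PyBEL.

```python
def parse_all(statements, parse):
    """Call parse on each statement and return (statement, exception) pairs for failures."""
    failures = []
    for statement in statements:
        try:
            parse(statement)
        except Exception as exc:
            failures.append((statement, exc))
    return failures


def fake_parse(statement):
    # Hypothetical stand-in for parser.parseString: rejects anything that is
    # not of the form "subject relation object"
    if len(statement.split()) < 3:
        raise ValueError("incomplete statement")


failures = parse_all(["p(HGNC:A) increases p(HGNC:B)", "p(HGNC:A)"], fake_parse)
```
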

## Visualization

Finally, the graph is visualized directly in the notebook with `pybel_tools.visualization.to_jupyter`.

In [12]:
to_jupyter(graph)

<IPython.core.display.Javascript object>

In [17]:
pybel.to_database(graph)

## Using PyBEL Functions

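This pipeline is implemented directly in PyBEL as `pybel.from_cbn_jgif`. The cells below load a local JGIF file (located under the directory named by the `BMS_BASE` environment variable), convert it to a BEL graph, and round-trip the result through BEL script.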

In [13]:
with open(
    os.path.join(os.environ["BMS_BASE"], "cbn", "Human-2.0", "Hox-2.0-Hs.jgf")
) as f:
    graph_jgif_dict = json.load(f)

In [14]:
%%time
graph = pybel.from_cbn_jgif(graph_jgif_dict)

CPU times: user 4.73 s, sys: 69.2 ms, total: 4.79 s
Wall time: 4.8 s


In [15]:
bel_lines = pybel.to_bel_lines(graph)

graph_reloaded = pybel.from_lines(bel_lines)

ERROR:pybel.io.line_utils:Missing required document metadata: ContactInfo


In [16]:
to_jupyter(graph_reloaded)

<IPython.core.display.Javascript object>