# Collapsing Orthologous Nodes

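This notebook outlines how orthologous human, mouse, and rat genes can be collapsed to a single representative node in a BEL network produced by `PyBEL`. In the code below, the representative is the Mouse Genome Informatics (MGI) gene.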

In [1]:
import os
from collections import defaultdict, Counter

import pybel
import networkx as nx

pybel.get_version()

'PyBEL Version: 0.2.4-dev'

## Download Mapping

Gene orthology data is downloaded from the OpenBEL Framework as a BEL document and parsed with `PyBEL`.

In [2]:
url = 'http://resources.openbel.org/belframework/20150611/resource/gene-orthology.bel'

In [3]:
path = os.path.expanduser('~/.pybel/gene-orthology.gpickle')

if not os.path.exists(path):
    orthology = pybel.from_url(url)
    pybel.to_pickle(orthology, path)
else:
    orthology = pybel.from_pickle(path)

## Construct Equivalence Classes

In this example, the directed BEL graph is relaxed to an undirected graph, and its connected components are taken as equivalence classes, which reflects the transitivity of orthology. A mapping from each node to the index of its equivalence class is built, followed by a mapping from each equivalence class to the Mouse Genome Informatics (MGI) gene it contains.
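
The idea can be tried on a toy graph before running it on the full orthology network. The following is a minimal sketch, not the notebook's own code: the `toy_*` names are hypothetical, the three edges are made up for illustration, and only the `(function, namespace, name)` node layout is taken from the cells below.

```python
import networkx as nx

# Toy directed orthology graph. Nodes follow the (function, namespace, name)
# tuple layout used in this notebook; the edges are illustrative only.
toy = nx.DiGraph()
toy.add_edge(('Gene', 'HGNC', 'AKT1'), ('Gene', 'MGI', 'Akt1'))
toy.add_edge(('Gene', 'HGNC', 'AKT1'), ('Gene', 'RGD', 'Akt1'))
toy.add_edge(('Gene', 'HGNC', 'TP53'), ('Gene', 'MGI', 'Trp53'))

# Dropping edge direction lets connected components capture transitive
# equivalence: the RGD and MGI genes are linked through the HGNC gene.
toy_components = list(nx.connected_components(toy.to_undirected()))

toy_member2index = {}
toy_index2mgi = {}
for i, component in enumerate(toy_components):
    for function, namespace, name in component:
        toy_member2index[function, namespace, name] = i
        if namespace == 'MGI':
            toy_index2mgi[i] = function, namespace, name

# Map every member of a class to that class's MGI representative.
toy_mapping = {
    node: toy_index2mgi[index]
    for node, index in toy_member2index.items()
    if index in toy_index2mgi
}
```

Note that a class containing more than one MGI node keeps only the last one visited, so the chosen representative is arbitrary in that case.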

In [4]:
orthology_undirected = orthology.to_undirected()

In [5]:
index2component = {}
member2index = {}
index2mgi = {}

for i, component in enumerate(nx.connected_components(orthology_undirected)):
    index2component[i] = component
    for function, namespace, name in component:
        member2index[function, namespace, name] = i
        
        if 'MGI' == namespace:
            index2mgi[i] = function, namespace, name
        
len(index2component), len(member2index), len(index2mgi)

(36748, 109024, 17670)

In [6]:
mapping = {}

for function, namespace, name in orthology_undirected:
    if (function, namespace, name) not in member2index:
        continue
        
    index = member2index[function, namespace, name]
    
    if index not in index2mgi:
        continue
        
    mapping[function, namespace, name] = index2mgi[index]

Counter(s for f, s, n in mapping)

Counter({'HGNC': 17567, 'MGI': 17700, 'RGD': 17471})

## Application to Sample Corpus

In [20]:
g = pybel.get_small_corpus()

In [21]:
for name, data in g.nodes_iter(data=True):
    if data['type'] == 'Gene' and name in mapping:
        g.node[name].update(orthology_undirected.node[mapping[name]])

In [22]:
g = nx.relabel_nodes(g, lambda n: mapping.get(n, n), copy=False)
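
Relabeling two nodes to the same target merges them, which is what collapses the orthologs. Here is a small sketch of that effect on a hypothetical graph: the `demo_*` names and the downstream nodes are assumptions made for illustration, and the default `copy=True` is used so the original graph stays available for comparison.

```python
import networkx as nx

# Hypothetical graph: a human gene and its mouse ortholog each have their
# own downstream edge. Node and edge choices are illustrative only.
demo = nx.DiGraph()
demo.add_edge(('Gene', 'HGNC', 'AKT1'), ('Gene', 'HGNC', 'MTOR'))
demo.add_edge(('Gene', 'MGI', 'Akt1'), ('Gene', 'MGI', 'Gsk3b'))

demo_mapping = {('Gene', 'HGNC', 'AKT1'): ('Gene', 'MGI', 'Akt1')}

# Nodes missing from the mapping keep their label via dict.get(n, n).
demo_collapsed = nx.relabel_nodes(demo, lambda n: demo_mapping.get(n, n))

# The merged MGI node now carries the edges of both original nodes.
demo_targets = sorted(demo_collapsed.successors(('Gene', 'MGI', 'Akt1')))
```

In this sketch the HGNC `MTOR` node is left untouched because it has no entry in `demo_mapping`, mirroring how unmapped nodes survive in the real corpus.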

Summarize the unsuccessful mappings, i.e. the gene nodes whose namespace is still not MGI after relabeling:

In [29]:
Counter(node[1] for node in g.nodes_iter() if g.node[node]['type'] == 'Gene' and node[1] != 'MGI')

Counter({'HGNC': 5, 'PFH': 57, 'PFM': 20, 'PFR': 6, 'SPAC': 1})