# Assigning Pathways

In NeuroMMSig v1.0, pathways were manually assigned to each edge. For v2.0, we would like to automate this process: first with a rule-based system, then with a machine learning system that prioritizes edges for curation, and only then by resorting to manual curation.

## Preamble

### Imports

In [1]:
import getpass
import itertools as itt
import os
import random
import sys
import time
import textwrap
from collections import defaultdict

import bio2bel_hgnc
import bio2bel_mgi
import bio2bel_wikipathways
import hbp_knowledge
import pybel
from bio2bel_hgnc.models import gene_mouse_gene
from IPython.display import Markdown
from pybel.dsl import BaseAbundance, ListAbundance
from pybel_tools.pathway_assigner import PathwayAssigner

### Utilities

In [2]:
def show_doc(f):
    """Render a function's docstring (without its first line) as Markdown."""
    return Markdown(textwrap.dedent(f.__doc__.split('\n', 1)[1]))

### Environment

In [3]:
print(time.asctime())

Tue Aug 20 11:53:51 2019


In [4]:
print(sys.version)

3.7.3 (default, Mar 27 2019, 09:23:39) 
[Clang 10.0.0 (clang-1000.11.45.5)]


In [5]:
print(getpass.getuser())

cthoyt


In [6]:
pybel.get_version()

'0.13.3-dev'

In [7]:
print(hbp_knowledge.VERSION)

0.0.7


In [8]:
print(bio2bel_wikipathways.get_version())

0.2.4-dev


### Data

In [9]:
graph = hbp_knowledge.get_graph()
graph.summarize()

Human Brain Pharmacome Knowledge v0.0.7
Number of Nodes: 6023
Number of Edges: 21625
Number of Citations: 358
Number of Authors: 2012
Network Density: 5.96E-04
Number of Components: 31


## Assigning Pathways

Generate mappings from a given database to HGNC gene identifiers.

In [10]:
wikipathways_manager = bio2bel_wikipathways.Manager()
wikipathways_manager.summarize()

{'pathways': 556, 'proteins': 6613}

In [11]:
managers = [
    wikipathways_manager,
]

assigner = PathwayAssigner(
    graph=graph,
    managers=managers,
    mgi_cache_path='mgi_symbol_to_hgnc_symbol.json',
    rgd_cache_path='rgd_symbol_to_hgnc_symbol.json',
)
assigner

could not find MGI:5592748
could not find MGI:6278009
could not find MGI:1918096


<pybel_tools.pathway_assigner.PathwayAssigner at 0x124de4b70>
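
The internals of `PathwayAssigner` are not shown in this notebook, but the kind of mapping described above can be pictured as an inverted index from HGNC symbols to pathways, with mouse symbols normalized to their human orthologs first. The following is only a minimal sketch on toy data: `build_gene_index`, `lookup`, the pathway names, and the ortholog dictionary are all hypothetical and are not taken from `pybel_tools`.

```python
from collections import defaultdict

# Hypothetical toy data: pathway name -> HGNC gene symbols in that pathway.
pathway_to_genes = {
    'Pathway A': {'MAPT', 'GSK3B'},
    'Pathway B': {'APP', 'GSK3B'},
}

# Hypothetical ortholog cache: mouse (MGI) symbol -> human (HGNC) symbol.
mgi_to_hgnc = {'Mapt': 'MAPT', 'App': 'APP'}


def build_gene_index(pathway_to_genes):
    """Invert pathway -> genes into gene -> pathways for fast lookup."""
    index = defaultdict(set)
    for pathway, genes in pathway_to_genes.items():
        for gene in genes:
            index[gene].add(pathway)
    return dict(index)


def lookup(namespace, name, index):
    """Return the pathways for a gene, mapping mouse symbols to HGNC first."""
    if namespace == 'MGI':
        name = mgi_to_hgnc.get(name)  # None when no ortholog is known
    return index.get(name, set())


gene_index = build_gene_index(pathway_to_genes)
```

With such an index, checking whether a node belongs to a pathway is a single dictionary lookup, and a mouse gene without a known ortholog simply yields an empty set.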

### Assigning Gene-Gene Edges

In [12]:
show_doc(assigner.annotate_gene_gene)


1. Identify if subject and object are both gene nodes. If they are orthologs, try to map them to HGNC.
2. `If` the subject and object in an edge are both in a canonical pathway, then the edge gets assigned to the
   pathway.
3. `Else if` only one of the subject and the object in the edge has been assigned to the pathway:
   1. `If` the edge is an ontological edge, then add it to the pathway
   2. `If` there are other edges in the pathway mentioned in the same article, assign the edge to the pathway
   3. `Else` leave for manual curation
4. `Else if` neither of the nodes is assigned to the pathway, but both nodes are connected to nodes in the
   pathway by directed edges, assign the edge, as well as the incident edges, to the pathway
5. `Else` the nodes don't get assigned to the pathway
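
As a rough illustration of rule 2 only, the sketch below assigns an edge to every pathway that contains both of its endpoints. It works on plain tuples and sets rather than a BEL graph; the function `assign_gene_gene` and the toy pathways are hypothetical, and rules 3 to 5 are deliberately left out.

```python
from collections import defaultdict

# Hypothetical toy data: pathway name -> HGNC gene symbols in that pathway.
pathway_to_genes = {
    'Pathway A': {'MAPT', 'GSK3B', 'CDK5'},
    'Pathway B': {'APP', 'BACE1'},
}

# Hypothetical gene-gene edges as (subject, object) pairs.
edges = [('GSK3B', 'MAPT'), ('APP', 'BACE1'), ('MAPT', 'APP')]


def assign_gene_gene(edges, pathway_to_genes):
    """Assign each edge to the pathways that contain both endpoints (rule 2)."""
    assignments = defaultdict(set)
    for subject, obj in edges:
        for pathway, genes in pathway_to_genes.items():
            if subject in genes and obj in genes:
                assignments[subject, obj].add(pathway)
    return dict(assignments)


gene_gene_assignments = assign_gene_gene(edges, pathway_to_genes)
```

In this toy data the `MAPT`/`APP` edge spans two pathways and is left unassigned; in the full rule set it would fall through to the later rules.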


In [13]:
c = assigner.annotate_gene_gene()
print(f'Made {c} annotations')

Made 11710 annotations


### Assigning Chemical/Biological Process/Disease - HGNC Edges

In [14]:
show_doc(assigner.annotate_gene_other)


1. Identify if the subject or the object is a gene node. If it is an ortholog, try to map it to HGNC.
2. If an entity is related to a gene in a pathway, then that edge gets annotated to the pathway.
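
The rule can be sketched as follows, assuming nodes are represented as `(namespace, name)` pairs and that "gene node" means a node in the HGNC namespace. The function name `annotate_gene_other_sketch` and the toy data are hypothetical. For simplicity, the sketch only handles edges with exactly one gene endpoint.

```python
# Hypothetical toy index: HGNC gene symbol -> pathways containing it.
gene_to_pathways = {
    'MAPT': {'Pathway A'},
    'GSK3B': {'Pathway A', 'Pathway B'},
}

# Hypothetical edges whose nodes are (namespace, name) pairs.
edges = [
    (('CHEBI', 'lithium'), ('HGNC', 'GSK3B')),
    (('HGNC', 'MAPT'), ('MESH', 'Alzheimer Disease')),
    (('CHEBI', 'lithium'), ('MESH', 'Neuroprotection')),
]


def annotate_gene_other_sketch(edges, gene_to_pathways):
    """Annotate edges with exactly one gene endpoint with that gene's pathways."""
    assignments = {}
    for subject, obj in edges:
        genes = [name for namespace, name in (subject, obj) if namespace == 'HGNC']
        if len(genes) != 1:
            continue  # no gene at all, or a gene-gene edge handled elsewhere
        pathways = gene_to_pathways.get(genes[0], set())
        if pathways:
            assignments[subject, obj] = set(pathways)
    return assignments


gene_other_assignments = annotate_gene_other_sketch(edges, gene_to_pathways)
```

Because a gene can belong to several pathways, a single edge can receive several annotations under this rule.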


In [15]:
c = assigner.annotate_gene_other()
print(f'Made {c} annotations')

Made 71134 annotations


### Assigning Tangential Nodes

In [16]:
show_doc(assigner.annotate_by_document)


If an edge has only one node that appears in a pathway, but that pathway has already been mentioned in the
paper, then it gets annotated to that pathway too.
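
A two-pass sketch of this idea is shown below. It assumes that a pathway counts as "mentioned in the paper" when at least one edge from that paper has both endpoints in the pathway; that definition, the function `annotate_by_document_sketch`, and the toy data are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical toy index: gene symbol -> pathways containing it.
gene_to_pathways = {
    'MAPT': {'Pathway A'},
    'GSK3B': {'Pathway A'},
    'APP': {'Pathway B'},
}

# Hypothetical edges as (subject, object, citation) triples.
edges = [
    ('GSK3B', 'MAPT', 'paper-1'),  # both nodes are in Pathway A
    ('MAPT', 'XYZ1', 'paper-1'),   # only MAPT is in Pathway A, same paper
    ('APP', 'XYZ2', 'paper-2'),    # paper-2 never mentions Pathway B otherwise
]


def annotate_by_document_sketch(edges, gene_to_pathways):
    """Extend pathway annotations to one-sided edges from the same paper."""
    # Pass 1: collect the pathways each paper already mentions.
    mentioned = defaultdict(set)
    for subject, obj, citation in edges:
        subject_pathways = gene_to_pathways.get(subject, set())
        object_pathways = gene_to_pathways.get(obj, set())
        mentioned[citation] |= subject_pathways & object_pathways

    # Pass 2: annotate edges where exactly one endpoint is in such a pathway.
    assignments = {}
    for subject, obj, citation in edges:
        subject_pathways = gene_to_pathways.get(subject, set())
        object_pathways = gene_to_pathways.get(obj, set())
        one_sided = subject_pathways ^ object_pathways
        hits = one_sided & mentioned[citation]
        if hits:
            assignments[subject, obj, citation] = hits
    return assignments


document_assignments = annotate_by_document_sketch(edges, gene_to_pathways)
```

Only the `MAPT`/`XYZ1` edge is annotated here, since `paper-1` already contains an edge fully inside Pathway A while `paper-2` has no such anchor for Pathway B.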


In [17]:
c = assigner.annotate_by_document()
print(f'Made {c} annotations')

Made 75530 annotations


### Assigning Complexes

In [18]:
show_doc(assigner.annotate_complexes)


If two or more members of a complex are in a pathway, then the whole complex and all of its partOf
relationships will get assigned to that pathway.
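
The membership threshold can be sketched by counting, for each complex, how many of its members fall into each pathway. The function `annotate_complexes_sketch`, its `minimum` parameter, and the toy complexes are hypothetical; the handling of partOf relationships is omitted.

```python
from collections import Counter

# Hypothetical toy index: gene symbol -> pathways containing it.
gene_to_pathways = {
    'MAPT': {'Pathway A'},
    'GSK3B': {'Pathway A'},
    'APP': {'Pathway B'},
}

# Hypothetical complexes: complex name -> member gene symbols.
complexes = {
    'complex-1': ['MAPT', 'GSK3B', 'XYZ1'],
    'complex-2': ['APP', 'XYZ2'],
}


def annotate_complexes_sketch(complexes, gene_to_pathways, minimum=2):
    """Assign a complex to each pathway containing at least `minimum` members."""
    assignments = {}
    for name, members in complexes.items():
        # Count, per pathway, how many distinct members belong to it.
        counts = Counter(
            pathway
            for member in set(members)
            for pathway in gene_to_pathways.get(member, ())
        )
        pathways = {pathway for pathway, n in counts.items() if n >= minimum}
        if pathways:
            assignments[name] = pathways
    return assignments


complex_assignments = annotate_complexes_sketch(complexes, gene_to_pathways)
```

Here `complex-1` has two members in Pathway A and is assigned to it, while `complex-2` has only one member in Pathway B and is skipped.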


In [19]:
c = assigner.annotate_complexes()
print(f'Made {c} annotations')

Made 9918 annotations


### Investigating What's Left

- Dealing with orthologs
- Reasoning over hierarchical relations (isA, partOf, hasMember)
- Protein complex membership for GO cellular components
- Checking protein families
- Annotation of GO cellular components to pathways
- Reactions - need to enrich with connections to biological processes in GO or annotate based on any enzymes that they interact with.

In [20]:
assigner.summarize()

10216 (47.24%) of 21625 edges were annotated
There are 11409 unannotated edges

Examples of unannotated nodes:

b640ac6b a(CHEBI:Anatabine) decreases a(CHEBI:"amyloid-beta polypeptide 42")
a5e9f48d a(CHEBI:"amyloid-beta polypeptide 42") decreases bp(MESH:Neuroprotection)
83fc97b2 p(HGNC:TTBK2) decreases surf(p(HGNC:GRIK2))
b48dd5e0 act(p(FPLX:PKA)) increases act(complex(GO:"proteasome complex"))
0b2a209d p(HGNCGENEFAMILY:Aminopeptidases) hasMember p(HGNC:ENPEP)
bf5d462e p(HGNC:PPME1) decreases p(HGNC:PPP2CA, pmod(Me, Leu, 309))
94358693 p(HGNC:BAG3) directlyIncreases complex(p(HGNC:BAG3), p(HGNC:DCP1B))
66505084 p(CONSO:"UBB+1") hasVariant p(CONSO:"UBB+1", pmod(Ub))
cfb887ab a(CHEBI:tanespimycin) decreases act(p(FPLX:HSP90))
249ed0c3 complex(p(MGI:"avian reticuloendotheliosis viral (v-rel) oncogene related B"), p(MGI:"nuclear factor of kappa light polypeptide gene enhancer in B cells 1, p105")) increases p(MGI:"chitinase-like 1")
c242a9a0 a(MESH:Prions) increases path(MESH:"Prion Disea
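
The figures in this summary can be cross-checked with a few lines of arithmetic, using only numbers printed earlier in this notebook. Note that the per-step annotation counts sum to far more than the number of edges. This would be consistent with a single edge receiving annotations to several pathways, although the notebook does not state how the counts are computed.

```python
# Numbers copied from the outputs above.
total_edges = 21625
annotated_edges = 10216
step_counts = {
    'annotate_gene_gene': 11710,
    'annotate_gene_other': 71134,
    'annotate_by_document': 75530,
    'annotate_complexes': 9918,
}

unannotated_edges = total_edges - annotated_edges
coverage = annotated_edges / total_edges
total_annotations = sum(step_counts.values())

print(f'{annotated_edges} ({coverage:.2%}) of {total_edges} edges were annotated')
print(f'There are {unannotated_edges} unannotated edges')
print(f'{total_annotations} annotations were made across all steps')
```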

### Print Results

In [21]:
# Print results to files
tsv_path, rst_path = 'assignments.tsv', 'assignments.rst'
assigner.to_file(tsv_path, rst_path)