# Extract PTSD-related Dense Dynamic Data Cloud using Blackboard

## John C. Earls (ISB), Eric Sage (UCSD), Mike Yu (UCSD), Kevin Xin (SRI)

**APIs used**:
* http://isbtranslatorapi.adversary.us/
* http://www.ndexbio.org/
* http://biothings.io/

**Preconditions**:
* The problem of understanding PTSD has been posed to the blackboard.
* A text-based knowledge source has proposed that adenosine is a key driver of the cardiac-related pathologies associated with PTSD, based on this PNAS paper (http://www.pnas.org/content/111/8/3188.full), a study performed on mouse models:

> **Adenosine**, an important signaling molecule, affects an array of cardiovascular activities. Elevating the adenosine level in the heart causes vasodilation and an increase in heart rate (33). Adenosine has been implicated in the initial injury process, because it is a well-known signaling molecule for stress and tissue injury (34–36). From this study, we observed a gradual increase in the adenosine deaminase transcript level (peaking in the T3R1 group). This observation may imply a compensatory process in heart tissue to reduce the adenosine level after the tissue is in the wound-healing stage.

* The blue team recognizes this beacon as a request to contextualize adenosine using Dense Dynamic Data Clouds

**Blackboard steps performed**:
* Create seeds from *adenosine*
* Query the HPWP API for the adenosine-specific subnetwork
* Write the adenosine-specific subnetwork back to the blackboard
* NDEx imports network from blackboard
* Some service calculates enrichment for certain metabolic pathways and returns to blackboard
* Some service leverages the Biothings API to identify drugs that target correlated proteins
* Some service suggests using correlated Clinical Labs as features for PTSD biomarker

**NOTE**: This is real data, and these are real results. Someone could reasonably follow up on this.

In [1]:
import json
import logging
from collections import Counter

import pandas
import requests

logging.basicConfig(level=logging.WARNING,
                    format='%(asctime)s %(levelname)s %(message)s')

# Helper functions for querying the ISB translator API
base_url = 'http://isbtranslatorapi.adversary.us'

def query_isb(endpoint, data=None, base_url=base_url):
    """POST `data` to `endpoint` and return the decoded JSON response."""
    # Strip a leading slash so the URL never contains a double slash
    resp = requests.post('%s/%s' % (base_url, endpoint.lstrip('/')), data=data or {})
    return resp.json()

def get_analytes(kwargs):
    """Fetch all analytes matching `kwargs`, paging through the results."""
    kw_local = kwargs.copy()
    size = 1000
    kw_local['from'] = 0
    kw_local['size'] = size
    meta = []
    res = query_isb('v1/analyte', data=kw_local)
    # Note: this relies on pagination; partitioning the *sig_ids* set
    # instead would greatly speed up the query
    while len(res) > 0:
        logging.debug("Saving records from %i to %i" % (kw_local['from'], kw_local['from'] + size))
        meta += res
        kw_local['from'] += size
        res = query_isb('v1/analyte', data=kw_local)
    return meta
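
The comment in `get_analytes` suggests partitioning the set of ids instead of paging through one large result. A minimal sketch of that idea follows. It assumes the service accepts a comma-separated `ids` parameter, as the later `get_analytes({'ids': ...})` call does; `fetch_in_chunks`, `chunked`, and `fake_fetch` are hypothetical helpers, and `fake_fetch` stands in for a real API request so the example runs offline.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_in_chunks(ids, fetch, size=100):
    """Call fetch(comma_separated_ids) once per chunk and concatenate the results.

    `fetch` is a hypothetical callable standing in for one API request.
    """
    records = []
    for chunk in chunked(ids, size):
        records.extend(fetch(','.join(chunk)))
    return records


def fake_fetch(id_string):
    # Offline stand-in for the service: echoes one record per requested id.
    return [{'_id': i} for i in id_string.split(',')]


ids = ['analyte-%d' % n for n in range(250)]
records = fetch_in_chunks(ids, fake_fetch, size=100)
print(len(records))  # 250
```

With a fixed chunk size the number of requests is known up front, so no extra request is needed to detect the end of the results.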


### The data service identifies a set of metabolites in the DDDC which contain adenosine

In [2]:
clin_vars = get_analytes({'category': "Metabolites"})
# Keep the metabolites whose identifier mentions adenosine
adenosine = [x['_id'] for x in clin_vars if 'adenosine' in x['_id']]
for met in adenosine:
    print(met)

METAB.None.amino-acid.polyamine-metabolism.5-methylthioadenosine-(mta)
METAB.None.nucleotide.purine-metabolism--adenine-containing.adenosine
METAB.None.nucleotide.purine-metabolism--adenine-containing.adenosine-3'-5'-cyclic-monophosphate-(camp)
METAB.None.nucleotide.purine-metabolism--adenine-containing.adenosine-5'-monophosphate-(amp)
METAB.None.nucleotide.purine-metabolism--adenine-containing.n1-methyladenosine
METAB.None.nucleotide.purine-metabolism--adenine-containing.n6-carbamoylthreonyladenosine
METAB.None.nucleotide.purine-metabolism--adenine-containing.n6-methyladenosine
METAB.None.nucleotide.purine-metabolism--adenine-containing.n6-succinyladenosine


### The data service queries the correlation network from the Hundred Person Wellness Project for analytes related to these seeds

In [3]:
def get_correlations(kwargs):
    """Fetch all correlations matching `kwargs`, paging through the results."""
    kw_local = kwargs.copy()
    size = 10000
    kw_local['from'] = 0
    kw_local['size'] = size
    res = query_isb('v1/correlation', data=kw_local)
    correlations = res[:]
    while len(res) > 0:
        logging.debug("Saving records from %i to %i" % (kw_local['from'], kw_local['from'] + size))
        kw_local['from'] += size
        res = query_isb('v1/correlation', data=kw_local)
        correlations += res
    return correlations
# get correlation network based on seed network
acorr = get_correlations({'ids1':','.join(adenosine)})
adf = pandas.DataFrame(acorr)
sig_adf = adf[adf.bh_adjusted_pvalue < .1]
nodes = set(sig_adf._id_1.tolist() + sig_adf._id_2.tolist())
my_nodes = {a['_id']: a for a in get_analytes({'ids':','.join(nodes)})}
num_edges = len(sig_adf)
print("%i edges in HPWP adenosine-seeded network." % (num_edges,))
num_nodes = len(my_nodes)
print("%i total nodes in HPWP Adenosine seeded subnetwork" % (num_nodes,))
for k, v in Counter([n['category'] for n in my_nodes.values()]).items():
    print("%i %s in HPWP Adenosine seeded network" % (v, k))

388 edges in HPWP adenosine-seeded network.
248 total nodes in HPWP Adenosine seeded subnetwork
14 Proteomics in HPWP Adenosine seeded network
196 Metabolites in HPWP Adenosine seeded network
38 Clinical Labs in HPWP Adenosine seeded network
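
The edge filter above keeps correlations with `bh_adjusted_pvalue < .1`. Judging by the field name, these are presumably Benjamini-Hochberg adjusted p-values computed by the service; that is an assumption, not something the API response confirms. As a sketch of what such an adjustment does, with made-up p-values:

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / float(rank))
        adjusted[idx] = running_min
    return adjusted


raw = [0.001, 0.02, 0.03, 0.2, 0.8]  # made-up p-values for illustration
adj = benjamini_hochberg(raw)
# Keep the tests whose adjusted p-value passes the same 0.1 threshold
significant = [p for p, q in zip(raw, adj) if q < 0.1]
print(significant)  # [0.001, 0.02, 0.03]
```

Thresholding the adjusted values at 0.1 controls the expected fraction of false discoveries among the retained edges at roughly 10%.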


### The data service writes the HPWP seeded subnetwork back to the blackboard and is recognized by UCSD's NDEx service as consumable

The NDEx service ingests the correlation network and generates an NDEx network object, which is then further analyzed and visualized.

See: http://www.ndexbio.org/#/newNetwork/88fd2073-35c1-11e7-8f50-0ac135e8bacf

In [4]:
rec = sig_adf.to_dict('records')
# Write the significant edges to disk so that other services can consume them
with open('sig_edges.json', 'w') as fh:
    json.dump(rec, fh)
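
A consumer of the exported edge list needs to turn the flat records back into a graph. The sketch below shows one way to do that, assuming only the `_id_1` and `_id_2` keys used earlier in this notebook. The two records are made up, and a JSON string stands in for the contents of `sig_edges.json` so the example is self-contained; this is not how the NDEx service itself imports networks.

```python
import json
from collections import defaultdict

# Two made-up edge records using the `_id_1` / `_id_2` keys seen above.
records = [
    {'_id_1': 'A', '_id_2': 'B'},
    {'_id_1': 'A', '_id_2': 'C'},
]
payload = json.dumps(records)  # stands in for the contents of sig_edges.json

# Build an undirected adjacency map from the edge list
adjacency = defaultdict(set)
for edge in json.loads(payload):
    adjacency[edge['_id_1']].add(edge['_id_2'])
    adjacency[edge['_id_2']].add(edge['_id_1'])

degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
print(sorted(degree.items()))  # [('A', 2), ('B', 1), ('C', 1)]
```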

### Some data source performs basic metabolic pathway enrichment on the adenosine-seeded HPWP correlation network and returns results to the blackboard

In [5]:
import scipy.stats
super_pathways_condition = []
for k, v in my_nodes.items():
    if 'super_pathway' in v:
        super_pathways_condition.append(v['super_pathway'])
spc = Counter(super_pathways_condition)
super_pathways_background = []
for v in clin_vars:
    super_pathways_background.append(v['super_pathway'])
spb = Counter(super_pathways_background)
df = pandas.DataFrame([spc, spb], index=['Adenosine-seeded HPWP', 'Total HPWP'])
perc = (df.iloc[0]/df.iloc[1])*100
perc.name = "Percent"
df = df.append(perc)
s = df.sum(axis=1)
not_ashpwp_total = s['Total HPWP'] - s['Adenosine-seeded HPWP']
odr = {}
pv = {}
for c in df.columns:
    ct = [[df.loc['Adenosine-seeded HPWP',c],  df.loc['Total HPWP',c]-df.loc['Adenosine-seeded HPWP',c]],
          [s['Adenosine-seeded HPWP'], not_ashpwp_total]
    ]
    res = scipy.stats.fisher_exact(ct)
    odr[c] = res[0]
    pv[c] = res[1]
df = df.append(pandas.Series(odr, name="Fisher OR"))
df = df.append(pandas.Series(pv, name="Fisher p-value"))
df.transpose().sort_values("Fisher p-value")

| | Adenosine-seeded HPWP | Total HPWP | Percent | Fisher OR | Fisher p-value |
|---|---|---|---|---|---|
| nucleotide | 23.0 | 32.0 | 71.875 | 5.358844 | 1e-05 |
| lipid | 54.0 | 252.0 | 21.428571 | 0.571892 | 0.001296 |
| carbohydrate | 12.0 | 18.0 | 66.666667 | 4.193878 | 0.004014 |
| amino-acid | 70.0 | 156.0 | 44.871795 | 1.706811 | 0.004598 |
| xenobiotics | 17.0 | 91.0 | 18.681319 | 0.481729 | 0.009971 |
| cofactors-and-vitamins | 4.0 | 24.0 | 16.666667 | 0.419388 | 0.121557 |
| peptide | 12.0 | 27.0 | 44.444444 | 1.677551 | 0.210427 |
| energy | 4.0 | 7.0 | 57.142857 | 2.795918 | 0.222863 |
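
The enrichment cell above uses the overall seeded and non-seeded totals as the second row of each 2x2 table, so a pathway's own counts appear in both rows. A more conventional layout keeps the four cells disjoint: in the pathway or not, crossed with seeded or not. The sketch below is an illustration of that alternative, not the method used to produce the table above. It implements a two-sided Fisher exact test with only the standard library and applies it to the nucleotide counts (23 of 32) and the column sums of the table (196 and 607). The helper names are hypothetical, and the resulting values will differ from those reported above.

```python
from math import exp, lgamma


def log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact test for the table [[a, b], [c, d]].

    Returns (sample odds ratio, p-value). The p-value sums the probabilities
    of all tables with the same margins that are no more likely than the
    observed one.
    """
    row1, col1, total = a + b, a + c, a + b + c + d

    def log_prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return (log_choose(col1, x) + log_choose(total - col1, row1 - x)
                - log_choose(total, row1))

    lo, hi = max(0, row1 + col1 - total), min(row1, col1)
    observed = log_prob(a)
    p_value = sum(exp(log_prob(x)) for x in range(lo, hi + 1)
                  if log_prob(x) <= observed + 1e-7)
    odds_ratio = float('inf') if b * c == 0 else (a * d) / float(b * c)
    return odds_ratio, min(p_value, 1.0)


in_path_seeded, in_path_total = 23, 32   # nucleotide row of the table above
seeded_total, grand_total = 196, 607     # column sums of the table above

a = in_path_seeded                        # in pathway, seeded
b = in_path_total - in_path_seeded        # in pathway, not seeded
c = seeded_total - in_path_seeded         # other pathways, seeded
d = grand_total - in_path_total - c       # other pathways, not seeded
odds_ratio, p_value = fisher_exact_2x2(a, b, c, d)
print(odds_ratio, p_value)  # odds ratio is 23*402 / (9*173)
```

Because every analyte falls into exactly one cell, the odds ratio compares the pathway against everything outside it rather than against a total that already includes it.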


In [6]:
# Repeat the enrichment analysis at the sub-pathway level
sub_pathways_condition = []
for k, v in my_nodes.items():
    if 'sub_pathway' in v:
        sub_pathways_condition.append(v['sub_pathway'])
spc = Counter(sub_pathways_condition)
sub_pathways_background = []
for v in clin_vars:
    sub_pathways_background.append(v['sub_pathway'])
spb = Counter(sub_pathways_background)
df = pandas.DataFrame([spc, spb], index=['Adenosine-seeded HPWP', 'Total HPWP'])
df = df.fillna(0.0)
perc = (df.iloc[0]/df.iloc[1])*100
perc.name = "Percent"
df = df.append(perc)
s = df.sum(axis=1)
not_ashpwp_total = s['Total HPWP'] - s['Adenosine-seeded HPWP']
odr = {}
pv = {}
for c in df.columns:
    ct = [[df.loc['Adenosine-seeded HPWP',c],  df.loc['Total HPWP',c]-df.loc['Adenosine-seeded HPWP',c]],
          [s['Adenosine-seeded HPWP'], not_ashpwp_total]
    ]
    res = scipy.stats.fisher_exact(ct)
    odr[c] = res[0]
    pv[c] = res[1]
df = df.append(pandas.Series(odr, name="Fisher OR"))
df = df.append(pandas.Series(pv, name="Fisher p-value"))
df.transpose().sort_values('Fisher p-value')

| | Adenosine-seeded HPWP | Total HPWP | Percent | Fisher OR | Fisher p-value |
|---|---|---|---|---|---|
| purine-metabolism,-adenine-containing | 8.0 | 8.0 | 100.000000 | inf | 0.000134 |
| methionine,-cysteine,-sam-and-taurine-metabolism | 10.0 | 15.0 | 66.666667 | 4.193878 | 0.009618 |
| gamma-glutamyl-amino-acid | 9.0 | 13.0 | 69.230769 | 4.718112 | 0.012805 |
| histidine-metabolism | 7.0 | 10.0 | 70.000000 | 4.892857 | 0.017439 |
| aminosugar-metabolism | 3.0 | 3.0 | 100.000000 | inf | 0.034366 |
| purine-metabolism,-guanine-containing | 3.0 | 3.0 | 100.000000 | inf | 0.034366 |
| pyrimidine-metabolism,-orotate-containing | 3.0 | 3.0 | 100.000000 | inf | 0.034366 |
| monoacylglycerol | 0.0 | 10.0 | 0.000000 | 0.000000 | 0.035295 |
| polyamine-metabolism | 4.0 | 5.0 | 80.000000 | 8.387755 | 0.041468 |
| tryptophan-metabolism | 10.0 | 18.0 | 55.555556 | 2.621173 | 0.044732 |
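
Several rows above report an odds ratio of `inf` or 0 because one cell of the 2x2 table is zero, which makes those rows hard to rank by effect size. One common remedy, shown here only as a sketch and not as part of the original analysis, is the Haldane-Anscombe correction: add 0.5 to every cell when any cell is zero. The function name is hypothetical. The example table `[[8, 0], [196, 411]]` follows the layout used by the cell above, with 196 and 411 being the seeded and non-seeded totals implied by the reported odds ratios.

```python
def corrected_odds_ratio(a, b, c, d):
    """Odds ratio for the table [[a, b], [c, d]].

    Adds 0.5 to every cell (Haldane-Anscombe correction) when any cell is
    zero, so the result is always finite and non-zero.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / float(b * c)


# A sub-pathway with 8 of 8 members in the seeded network: finite instead of inf
all_seeded = corrected_odds_ratio(8, 0, 196, 411)
# A table without zero cells is left untouched (10 of 15 members seeded)
unchanged = corrected_odds_ratio(10, 5, 196, 411)
print(all_seeded, unchanged)
```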


### Some data source searches for drugs that target proteins in the adenosine-seeded HPWP correlation network and returns results to the blackboard

In [7]:
# get protein objects
p = [v for v in my_nodes.values() if v['category'] == "Proteomics"]

# Load the biothings_client and biothings_explorer helpers
from biothings_client import get_client
from biothings_explorer import IdListHandler
md = get_client('drug')
ih = IdListHandler()

# Map every protein without a UniProt accession to a gene using the mygene.info API
def query_symbol(symbol):
    """Query mygene.info for a gene symbol and return the decoded JSON response."""
    return requests.get('http://mygene.info/v3/query?q=symbol:%s' % (symbol,)).json()

# Assay abbreviations that need a manual symbol override
manual_symbols = {'PDGF_Subunit_B': 'PDGFB', 'HSP_27': 'HSPB1', 'CD40L': 'CD154',
                  '4E_BP1': 'EIF4EBP1', 'STAMPB': 'STABP'}
symbs = {}
for x in [pp['abbreviation'] for pp in p if 'uniprot' not in pp]:
    # Try the abbreviation as-is, then without underscores, then the manual override
    candidates = [x, x.replace('_', '')]
    if x in manual_symbols:
        candidates.append(manual_symbols[x])
    for candidate in candidates:
        res = query_symbol(candidate)
        if res['total'] > 0:
            symbs[x] = res
            break


for pp in p:
    if 'uniprot' in pp:
        req = requests.get('http://mygene.info/v3/query?q=uniprot:%s' % (pp['uniprot']))
        res = req.json()
        symbs[pp['abbreviation']] = res

entrez_map = {k:v['hits'][0]['entrezgene'] for k,v in symbs.items()}

# now map all to uniprot
uniprot_list = ih.list_handler(input_id_list=map(str,entrez_map.values()), input_type='entrez_gene_id', output_type='uniprot_id')
uniprot_map = {k:u for u, k in zip(uniprot_list, entrez_map.keys())}
uniprot_id_list = uniprot_map.values()
#map u back to prots
uniprot2object = {} 
for a,u in uniprot_map.items():
    uniprot2object[u] = [pp for pp in p if pp['abbreviation'] == a][0]


# Find DrugBank drugs whose targets include each UniProt accession
md = get_client("drug")
prot2drug = []
for u in uniprot_id_list:
    res = md.query('drugbank.targets.uniprot:%s' % (u,), fields='drugbank.name')
    for j in res['hits']:
        prot2drug.append((u, j['drugbank']['name']))

# Note: a drug with several targets keeps only the last protein seen
drug_targets = {}
for u, d in prot2drug:
    drug_targets[d] = uniprot2object[u]
print("Putative drugs and their adenosine-related protein targets")
print("=" * 20)
print(json.dumps(drug_targets, indent=2))

Putative drugs and their adenosine-related protein targets
{
  "Ethanesulfonic Acid": {
    "category": "Proteomics", 
    "subsystem": "CVD", 
    "vendor": "Olink", 
    "name": "EGF", 
    "measurement_technology": "antibody capture", 
    "abbreviation": "EGF", 
    "_id": "PROTE.None.CVD.EGF.None"
  }, 
  "OGX-427": {
    "category": "Proteomics", 
    "subsystem": "Inflammation", 
    "vendor": "Olink", 
    "name": "IL_7", 
    "measurement_technology": "antibody capture", 
    "abbreviation": "IL_7", 
    "_id": "PROTE.None.Inflammation.IL_7.None"
  }, 
  "Alpha-Aminobutyric Acid": {
    "category": "Proteomics", 
    "subsystem": "Inflammation", 
    "vendor": "Olink", 
    "name": "SIRT2", 
    "measurement_technology": "antibody capture", 
    "abbreviation": "SIRT2", 
    "_id": "PROTE.None.Inflammation.SIRT2.None"
  }, 
  "1,4-Dithiothreitol": {
    "category": "Proteomics", 
    "subsystem": "CVD", 
    "vendor": "Olink", 
    "name": "ITGB1BP2", 
    "measurement_technolo

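The loop that fills `drug_targets` stores one protein per drug, so a drug that hits several proteins in the network only shows the last one. A minimal sketch of grouping all targets per drug follows. The `(accession, drug)` pairs are made up and merely shaped like `prot2drug`; `targets_by_drug` is a hypothetical name.

```python
from collections import defaultdict

# Made-up (UniProt accession, drug name) pairs shaped like `prot2drug`.
pairs = [('P00001', 'drug-a'), ('P00002', 'drug-a'), ('P00002', 'drug-b')]

# Collect every distinct target for each drug, preserving first-seen order
targets_by_drug = defaultdict(list)
for accession, drug in pairs:
    if accession not in targets_by_drug[drug]:
        targets_by_drug[drug].append(accession)
targets_by_drug = dict(targets_by_drug)

print(targets_by_drug['drug-a'])  # ['P00001', 'P00002']
```
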
### Note: these are just a few simple examples of how one could leverage the DDDC and the Blackboard approach to better understand PTSD.

We haven't even looked at the clinical labs, which are enriched for cardiac markers and are obvious candidates for study, with the aim of developing inexpensive reference ranges to identify soldiers experiencing PTSD.



In [8]:
print("Clinical Labs")
pandas.DataFrame([v for v in my_nodes.values() if v['category'] == "Clinical Labs"])

Clinical Labs


| | _id | category | hmdb | name | vendor |
|---|---|---|---|---|---|
| 0 | CHEMS.None.Genova.adiponectin | Clinical Labs | | adiponectin | Genova |
| 1 | CHEMS.None.Quest.creatinine_quest | Clinical Labs | | creatinine | Quest |
| 2 | CHEMS.None.Quest.ldl_small_quest | Clinical Labs | | ldl_small | Quest |
| 3 | CHEMS.None.Genova.benzoic_acid | Clinical Labs | HMDB01870 | benzoic_acid | Genova |
| 4 | CHEMS.None.Genova.3_hydroxypropionic_acid | Clinical Labs | HMDB00700 | 3_hydroxypropionic_acid | Genova |
| 5 | CHEMS.None.Quest.ldl_pattern_quest | Clinical Labs | | ldl_pattern | Quest |
| 6 | CHEMS.None.Genova.a_ketoisocaproic_acid | Clinical Labs | HMDB00695 | a_ketoisocaproic_acid | Genova |
| 7 | CHEMS.None.Genova.alanine_plasma | Clinical Labs | HMDB00161 | alanine | Genova |
| 8 | CHEMS.None.Genova.3_hydroxyisovaleric_acid | Clinical Labs | HMDB00754 | 3_hydroxyisovaleric_acid | Genova |
| 9 | CHEMS.None.Quest.ldl_peak_size_quest | Clinical Labs | | ldl_peak_size | Quest |
