# Extract PTSD-related Dense Dynamic Data Cloud using Blackboard

## John C. Earls (ISB)
## Chunhua Weng, Chi Yuan, Barnett Chiu, Sunny Shang, James J. Cimino (Columbia)
## Mike Yu (UCSD)
## Ken Gersing, Rajarshi Guha, Mark Williams (NCATS)


**APIs used**:
* http://isbtranslatorapi.adversary.us/
* http://www.ndexbio.org/
* http://biothings.io/


**Preconditions**:
* The problem of understanding PTSD has been posed to the blackboard.
* Some text-based knowledge source has proposed that Thyroxine is a key driver of cardiac-related pathologies associated with PTSD, based on this PNAS paper, which was performed on mouse models: http://www.pnas.org/content/111/8/3188.full

> **Thyroxine**, an important signaling molecule, affects an array of cardiovascular activities. Elevating the Thyroxine level in the heart causes vasodilation and an increase in heart rate (33). Thyroxine has been implicated in the initial injury process, because it is a well-known signaling molecule for stress and tissue injury (34–36). From this study, we observed a gradual increase in the Thyroxine deaminase transcript level (peaking in the T3R1 group). This observation may imply a compensatory process in heart tissue to reduce the Thyroxine level after the tissue is in the wound-healing stage.

* The blue team recognizes this beacon as a request to contextualize Thyroxine using Dense Dynamic Data Clouds (DDDC).

**Blackboard steps performed**:
* Create seeds from *Thyroxine*
* Query the HPWP (Hundred Person Wellness Project) API for a Thyroxine-specific subnetwork
* Write the Thyroxine-specific subnetwork back to the blackboard
* NDEx imports the network from the blackboard
* Some service calculates enrichment for certain metabolic pathways and returns the results to the blackboard
* Some service suggests using correlated Clinical Labs as features for a PTSD biomarker
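
The steps above follow the general blackboard pattern: independent knowledge sources watch a shared store and contribute when they see something they can act on. The toy sketch below illustrates only that control flow. The `Blackboard` class, the two source functions, and the entry kinds are hypothetical names chosen for illustration, not part of any real Translator or ISB API, and the "subnetwork" it posts is a placeholder.

```python
class Blackboard(object):
    """Shared store that knowledge sources read from and post to."""

    def __init__(self):
        self.entries = []  # (kind, payload) tuples in posting order

    def post(self, kind, payload):
        self.entries.append((kind, payload))

    def latest(self, kind):
        """Return the most recent payload of the given kind, or None."""
        for entry_kind, payload in reversed(self.entries):
            if entry_kind == kind:
                return payload
        return None


def seed_source(board):
    # Hypothetical source: turns a posted hypothesis into seed terms.
    hypothesis = board.latest('hypothesis')
    if hypothesis is not None and board.latest('seeds') is None:
        board.post('seeds', [hypothesis['analyte'].lower()])
        return True
    return False


def subnetwork_source(board):
    # Hypothetical source: stands in for the HPWP correlation query.
    seeds = board.latest('seeds')
    if seeds is not None and board.latest('subnetwork') is None:
        # Placeholder edges; a real source would call the correlation API.
        board.post('subnetwork', [(seed, 'placeholder_neighbor') for seed in seeds])
        return True
    return False


def run(board, sources):
    """Offer the board to every source until none of them contributes."""
    progressed = True
    while progressed:
        progressed = False
        for source in sources:
            if source(board):
                progressed = True


board = Blackboard()
board.post('hypothesis', {'condition': 'PTSD', 'analyte': 'Thyroxine'})
# Registration order does not matter: each source acts only when its input exists.
run(board, [subnetwork_source, seed_source])
print([kind for kind, _ in board.entries])
```

The point of the design is that sources never call each other directly; they only react to what is on the board, which is why the NDEx and enrichment services can be added later without changing the data service.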


In [1]:
import json
import requests
import logging
import pandas
from collections import Counter
logging.basicConfig(level=logging.WARNING,
                    format='%(asctime)s %(levelname)s %(message)s',)

# Helper functions for querying the ISB Translator API
base_url = 'http://isbtranslatorapi.adversary.us'
def query_isb(endpoint, data=None, base_url=base_url):
    # Strip any leading slash so the URL never contains a double slash
    req = requests.post('%s/%s' % (base_url, endpoint.lstrip('/')), data=data)
    return req.json()

def get_analytes(kwargs):
    # Page through v1/analyte until an empty page comes back
    kw_local = kwargs.copy()
    size = 1000
    kw_local['from'] = 0
    kw_local['size'] = size
    res = query_isb('v1/analyte', data=kw_local)
    meta = list(res)
    # Note: this relies on pagination; it would be smarter to partition
    # the *sig_ids* set, which would greatly speed up the query
    while len(res) > 0:
        kw_local['from'] += size
        frm = kw_local['from']
        logging.debug("Fetching records from %i to %i" % (frm, frm + size))
        res = query_isb('v1/analyte', data=kw_local)
        meta += res
    return meta
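
The note in `get_analytes` suggests partitioning the ID set instead of paging through results. A minimal sketch of that idea follows, assuming the endpoint accepts a comma-separated `ids` value (as the later `get_analytes({'ids': ...})` call does). `chunked`, `fetch_by_ids`, and `fake_fetch` are hypothetical helpers, and `fake_fetch` replaces the real API call so the example runs offline.

```python
def chunked(ids, size):
    """Yield successive lists of at most `size` IDs, in sorted order."""
    ordered = sorted(ids)
    for start in range(0, len(ordered), size):
        yield ordered[start:start + size]


def fetch_by_ids(ids, fetch, size=1000):
    """Call `fetch` once per batch of IDs and concatenate the results.

    `fetch` is a hypothetical callable that takes a comma-separated ID
    string and returns a list of records, e.g. a wrapper around query_isb.
    """
    records = []
    for batch in chunked(ids, size):
        records.extend(fetch(','.join(batch)))
    return records


def fake_fetch(id_string):
    # Stand-in for the real API call: echoes one record per requested ID.
    return [{'_id': i} for i in id_string.split(',')]


ids = set('analyte_%04d' % n for n in range(2500))
records = fetch_by_ids(ids, fake_fetch, size=1000)
print(len(records))
```

With explicit batches, each request is bounded and independent, so no trailing empty-page request is needed to detect the end of the data.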


### The data service identifies the set of metabolites in the DDDC whose identifiers contain thyroxine

In [2]:
clin_vars = get_analytes({'category':"Metabolites"})
# Keep the IDs of metabolites whose identifier mentions thyroxine
thyroxine = [x['_id'] for x in clin_vars if 'thyroxine' in x['_id']]
for met in thyroxine:
    print(met)

METAB.None.amino-acid.phenylalanine-and-tyrosine-metabolism.thyroxine


### The data service queries the correlation network from the Hundred Person Wellness Project for analytes related to these seeds

In [3]:
def get_correlations(kwargs):
    # Page through v1/correlation until an empty page comes back
    kw_local = kwargs.copy()
    size = 10000
    kw_local['from'] = 0
    kw_local['size'] = size
    res = query_isb('v1/correlation', data=kw_local)
    correlations = res[:]
    while len(res) > 0:
        frm = kw_local['from']
        logging.debug("Saved records from %i to %i" % (frm, frm + size))
        kw_local['from'] += size
        res = query_isb('v1/correlation', data=kw_local)
        correlations += res
    return correlations
# get correlation network based on seed network
acorr = get_correlations({'ids1':','.join(thyroxine)})
adf = pandas.DataFrame(acorr)
sig_adf = adf[adf.bh_adjusted_pvalue < .1]
nodes = set(sig_adf._id_1.tolist() + sig_adf._id_2.tolist())
my_nodes = {a['_id']: a for a in get_analytes({'ids':','.join(nodes)})}
num_edges = len(sig_adf)
print("%i edges in HPWP in thyroxine seeded network." % (num_edges,))
num_nodes = len(my_nodes)
print("%i total nodes in HPWP thyroxine seeded subnetwork" % (num_nodes,))
for k, v in Counter([v['category'] for v in my_nodes.values()]).items():
    print("%i %s in HPWP thyroxine seeded network" % (v, k))

51 edges in HPWP in thyroxine seeded network.
52 total nodes in HPWP thyroxine seeded subnetwork
46 Metabolites in HPWP thyroxine seeded network
6 Clinical Labs in HPWP thyroxine seeded network
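
The edge filter above keeps correlations with `bh_adjusted_pvalue < .1`. The notebook does not show how the API computed that column; assuming it is the standard Benjamini-Hochberg adjustment, a minimal standard-library sketch looks like this. The function name and the raw p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / float(rank))
        adjusted[idx] = running_min
    return adjusted


raw = [0.001, 0.008, 0.039, 0.041, 0.60]  # made-up p-values
adjusted = benjamini_hochberg(raw)
# Same threshold as the notebook: keep edges with adjusted p < 0.1
kept = [p for p, q in zip(raw, adjusted) if q < 0.1]
print(adjusted)
print(kept)
```

A threshold of 0.1 on the adjusted value means that, on average, roughly one in ten of the retained edges is expected to be a false discovery.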


### The data service writes the HPWP seeded subnetwork back to the blackboard, where UCSD's NDEx service recognizes it as consumable

The NDEx service ingests the correlation network and generates an NDEx network object, which is then further computed on to produce a visualization.

See: http://www.ndexbio.org/#/newNetwork/88fd2073-35c1-11e7-8f50-0ac135e8bacf

In [4]:
rec = sig_adf.to_dict('records')
# Use a context manager so the file is flushed and closed
with open('thyroxine_sig_edges.json', 'w') as fh:
    json.dump(rec, fh)
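
A consumer of this JSON file needs to turn the flat edge records back into a network. The sketch below shows one way to do that with plain dictionaries. It is not the NDEx import logic, which is not shown in this notebook; the two records are made up and only reuse the `_id_1`, `_id_2`, and `coefficient` keys seen in the correlation records.

```python
import json
from collections import defaultdict

# Two made-up records with the same keys as the correlation records.
records_json = json.dumps([
    {'_id_1': 'analyte_a', '_id_2': 'analyte_b', 'coefficient': -0.41},
    {'_id_1': 'analyte_a', '_id_2': 'analyte_c', 'coefficient': 0.29},
])


def build_adjacency(records):
    """Map each node to a dict of neighbor -> correlation coefficient."""
    adjacency = defaultdict(dict)
    for rec in records:
        a, b = rec['_id_1'], rec['_id_2']
        adjacency[a][b] = rec['coefficient']
        adjacency[b][a] = rec['coefficient']  # correlation is symmetric
    return dict(adjacency)


adjacency = build_adjacency(json.loads(records_json))
print(sorted(adjacency))
print(len(adjacency['analyte_a']))
```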

### Some data source performs basic metabolic pathway enrichment on the thyroxine-seeded HPWP correlation network and returns results to the blackboard

In [5]:
import scipy.stats
# Super-pathway counts within the seeded subnetwork
# (nodes without a super_pathway annotation are skipped)
super_pathways_condition = []
for k, v in my_nodes.items():
    if 'super_pathway' in v:
        super_pathways_condition.append(v['super_pathway'])
spc = Counter(super_pathways_condition)
# Background: super-pathway counts across all HPWP metabolites
super_pathways_background = []
for v in clin_vars:
    super_pathways_background.append(v['super_pathway'])
spb = Counter(super_pathways_background)
df = pandas.DataFrame([spc, spb], index=['Thyroxine-seeded HPWP', 'Total HPWP']).fillna(0.0)
perc = (df.iloc[0]/df.iloc[1])*100
perc.name = "Percent"
df = df.append(perc)
s = df.sum(axis=1)
not_ashpwp_total = s['Total HPWP'] - s['Thyroxine-seeded HPWP']
odr = {}
pv = {}
for c in df.columns:
    # Row 1: pathway members inside / outside the seeded subnetwork.
    # Row 2: all annotated nodes inside / outside the seeded subnetwork.
    # Note that row 2 holds marginal totals, so it still includes the
    # pathway's own members rather than only the non-members.
    ct = [[df.loc['Thyroxine-seeded HPWP',c],  df.loc['Total HPWP',c]-df.loc['Thyroxine-seeded HPWP',c]],
          [s['Thyroxine-seeded HPWP'], not_ashpwp_total]
    ]
    res = scipy.stats.fisher_exact(ct)
    odr[c] = res[0]
    pv[c] = res[1]
df = df.append(pandas.Series(odr, name="Fisher OR"))
df = df.append(pandas.Series(pv, name="Fisher p-value"))
df.transpose().sort_values("Fisher p-value")

| super_pathway | Thyroxine-seeded HPWP | Total HPWP | Percent | Fisher OR | Fisher p-value |
|---|---|---|---|---|---|
| lipid | 41.0 | 252.0 | 16.269841 | 2.369771 | 0.000265 |
| amino-acid | 2.0 | 156.0 | 1.282051 | 0.158385 | 0.00248 |
| xenobiotics | 2.0 | 91.0 | 2.197802 | 0.27406 | 0.072539 |
| cofactors-and-vitamins | 0.0 | 24.0 | 0.0 | 0.0 | 0.246258 |
| peptide | 0.0 | 27.0 | 0.0 | 0.0 | 0.249249 |
| carbohydrate | 0.0 | 18.0 | 0.0 | 0.0 | 0.387358 |
| nucleotide | 1.0 | 32.0 | 3.125 | 0.393408 | 0.501928 |
| energy | 0.0 | 7.0 | 0.0 | 0.0 | 1.0 |
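
The enrichment cell fills the second row of its contingency table with overall totals. A more conventional layout uses four mutually exclusive cells (in pathway or not, in the seeded set or not). The sketch below builds that table for the lipid row, using the counts reported above (41 of 46 seeded metabolites, 252 of 607 overall). It uses a hand-rolled two-sided Fisher exact test so that it depends only on the standard library (Python 3.8+ for `math.comb`); `enrichment_table` and `fisher_exact_two_sided` are illustrative names, and the result is not expected to match the table above because the contingency layout differs.

```python
from math import comb


def enrichment_table(hits_in_set, set_size, hits_in_background, background_size):
    """Build a 2x2 table with four mutually exclusive cells."""
    rest_hits = hits_in_background - hits_in_set        # in pathway, not seeded
    set_other = set_size - hits_in_set                  # seeded, not in pathway
    rest_other = (background_size - set_size) - rest_hits
    return [[hits_in_set, rest_hits], [set_other, rest_other]]


def fisher_exact_two_sided(table):
    """Return (odds_ratio, p_value) for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Two-sided p: sum every table no more likely than the observed one.
    p_value = sum(prob(x) for x in range(lo, hi + 1)
                  if prob(x) <= observed * (1 + 1e-7))
    odds_ratio = (a * d) / (b * c) if b * c else float('inf')
    return odds_ratio, min(p_value, 1.0)


# Lipid row: 41 of 46 seeded metabolites, 252 of 607 metabolites overall.
table = enrichment_table(41, 46, 252, 607)
odds_ratio, p_value = fisher_exact_two_sided(table)
print(table)
print(odds_ratio, p_value)
```

With exclusive cells, the odds ratio compares lipid membership among seeded metabolites directly against lipid membership among the remaining metabolites.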


In [8]:
import scipy.stats
# Same enrichment as above, but at the sub-pathway level
sub_pathways_condition = []
for k, v in my_nodes.items():
    if 'sub_pathway' in v:
        sub_pathways_condition.append(v['sub_pathway'])
spc = Counter(sub_pathways_condition)
# Background: sub-pathway counts across all HPWP metabolites
sub_pathways_background = []
for v in clin_vars:
    sub_pathways_background.append(v['sub_pathway'])
spb = Counter(sub_pathways_background)
df = pandas.DataFrame([spc, spb], index=['Thyroxine-seeded HPWP', 'Total HPWP']).fillna(0.0)
perc = (df.iloc[0]/df.iloc[1])*100
perc.name = "Percent"
df = df.append(perc)
s = df.sum(axis=1)
not_ashpwp_total = s['Total HPWP'] - s['Thyroxine-seeded HPWP']
odr = {}
pv = {}
for c in df.columns:
    # Row 1: pathway members inside / outside the seeded subnetwork.
    # Row 2: marginal totals inside / outside the seeded subnetwork.
    ct = [[df.loc['Thyroxine-seeded HPWP',c],  df.loc['Total HPWP',c]-df.loc['Thyroxine-seeded HPWP',c]],
          [s['Thyroxine-seeded HPWP'], not_ashpwp_total]
    ]
    res = scipy.stats.fisher_exact(ct)
    odr[c] = res[0]
    pv[c] = res[1]
df = df.append(pandas.Series(odr, name="Fisher OR"))
df = df.append(pandas.Series(pv, name="Fisher p-value"))
df.transpose().sort_values('Fisher p-value')

| sub_pathway | Thyroxine-seeded HPWP | Total HPWP | Percent | Fisher OR | Fisher p-value |
|---|---|---|---|---|---|
| lysolipid | 16.0 | 40.0 | 40.000000 | 8.130435 | 8.139297e-08 |
| fatty-acid-metabolism(acyl-carnitine) | 9.0 | 14.0 | 64.285714 | 21.952174 | 2.510230e-07 |
| polyunsaturated-fatty-acid-(n3-and-n6) | 3.0 | 11.0 | 27.272727 | 4.573370 | 4.916191e-02 |
| fatty-acid,-monohydroxy | 3.0 | 11.0 | 27.272727 | 4.573370 | 4.916191e-02 |
| phospholipid-metabolism | 2.0 | 6.0 | 33.333333 | 6.097826 | 7.353896e-02 |
| quantitative-total-fatty-acid | 1.0 | 1.0 | 100.000000 | inf | 7.730263e-02 |
| sterol | 1.0 | 2.0 | 50.000000 | 12.195652 | 1.485124e-01 |
| pyrimidine-metabolism,-cytidine-containing | 1.0 | 2.0 | 50.000000 | 12.195652 | 1.485124e-01 |
| steroid | 0.0 | 33.0 | 0.000000 | 0.000000 | 1.592649e-01 |
| sphingolipid-metabolism | 2.0 | 11.0 | 18.181818 | 2.710145 | 2.076860e-01 |


### Some data source searches for drug targets for proteins in the thyroxine-seeded HPWP correlation network and returns results to the blackboard

### Note: these are just a few trivial examples of how one could leverage the DDDC and the Blackboard approach to better understand PTSD.

We haven't even looked at the clinical labs, which are enriched for cardiac markers and are obvious candidates for study, with the goal of developing inexpensive reference ranges to identify soldiers experiencing PTSD.



In [7]:
print("Clinical Labs")
cl = pandas.DataFrame([v for v in my_nodes.values() if v['category'] == "Clinical Labs"])
sig_adf[sig_adf._id_1.isin(cl._id.tolist()) | sig_adf._id_2.isin(cl._id.tolist())]

Clinical Labs


| index | _id_1 | _id_2 | bh_adjusted_pvalue | coefficient | description | pvalue | study | test |
|---|---|---|---|---|---|---|---|---|
| 208 | METAB.None.amino-acid.phenylalanine-and-tyrosi... | CHEMS.None.Genova.ldl_cholesterol | 0.002067 | -0.405385 | mean value, age and sex adjusted | 1.5e-05 | Hundred Person Wellness Project | SPEARMAN |
| 209 | METAB.None.amino-acid.phenylalanine-and-tyrosi... | CHEMS.None.Genova.ldl_particle | 0.087355 | -0.279575 | mean value, age and sex adjusted | 0.003386 | Hundred Person Wellness Project | SPEARMAN |
| 246 | METAB.None.amino-acid.phenylalanine-and-tyrosi... | CHEMS.None.Genova.pyruvic_acid | 0.076353 | 0.285582 | mean value, age and sex adjusted | 0.002736 | Hundred Person Wellness Project | SPEARMAN |
| 265 | METAB.None.amino-acid.phenylalanine-and-tyrosi... | CHEMS.None.Genova.total_cholesterol | 0.000505 | -0.435114 | mean value, age and sex adjusted | 3e-06 | Hundred Person Wellness Project | SPEARMAN |
| 293 | METAB.None.amino-acid.phenylalanine-and-tyrosi... | CHEMS.None.Quest.cholesterol_total_quest | 0.000495 | -0.435632 | mean value, age and sex adjusted | 2e-06 | Hundred Person Wellness Project | SPEARMAN |
| 304 | METAB.None.amino-acid.phenylalanine-and-tyrosi... | CHEMS.None.Quest.ldl_particle_number_quest | 0.056252 | -0.299529 | mean value, age and sex adjusted | 0.001637 | Hundred Person Wellness Project | SPEARMAN |
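
As a first pass at using these clinical labs as candidate features, one simple option is to rank them by the strength of their correlation with thyroxine and note the direction. The sketch below does this with the lab IDs and Spearman coefficients copied from the table above; the ranking criterion itself is an assumption, not something the notebook prescribes.

```python
# Clinical-lab partner and Spearman coefficient, copied from the table above.
edges = [
    ('CHEMS.None.Genova.ldl_cholesterol', -0.405385),
    ('CHEMS.None.Genova.ldl_particle', -0.279575),
    ('CHEMS.None.Genova.pyruvic_acid', 0.285582),
    ('CHEMS.None.Genova.total_cholesterol', -0.435114),
    ('CHEMS.None.Quest.cholesterol_total_quest', -0.435632),
    ('CHEMS.None.Quest.ldl_particle_number_quest', -0.299529),
]

# Strongest association first, regardless of sign.
ranked = sorted(edges, key=lambda edge: abs(edge[1]), reverse=True)
for lab, coefficient in ranked:
    direction = 'negative' if coefficient < 0 else 'positive'
    print('%-45s %+.3f (%s)' % (lab, coefficient, direction))
```

Five of the six labs are cholesterol or LDL measures that correlate negatively with thyroxine, and two of them are the same total-cholesterol measurement from different vendors (Genova and Quest), so they should not be treated as independent features.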
