# Addressing the RAS-mediated Bradykinin Storm Hypothesis in COVID-19 Patients with BioThings Explorer

&emsp;

SLIDES LINK: https://docs.google.com/presentation/d/1cL0Y-2FECPP5rWlJGI_ZWsvBysZVUGU-oDq75VexFrA/edit?usp=sharing

# Table of Contents

## &emsp; 0 Imports
## &emsp; 1 Overview of Background and BTE Approach 
### &emsp; &emsp; 1.1 Article: Summary and Background 
### &emsp; &emsp; 1.2 Overview of BTE Approach 
## &emsp; 2 Determining Related Genes
### &emsp; &emsp; 2.1 Load COVID-19, Hypotension, and Vasodilation Nodes
### &emsp; &emsp; 2.2 COVID -> Genes <- Hypotension
### &emsp; &emsp; 2.3 COVID -> Genes <- Vasodilation 
### &emsp; &emsp; 2.4 Create Gene Nodes
## &emsp; 3 Analyzing and Exploring Gene Results
### &emsp; &emsp; 3.1 Genes -> Pathways
### &emsp; &emsp; 3.2 Genes -> Biological Processes 
### &emsp; &emsp; 3.3 Genes -> Chemical Substances
### &emsp; &emsp; 3.4 Genes -> Anatomical Entities
### &emsp; &emsp; 3.5 Genes -> Chemical Substances <- Vasodilation
## &emsp; 4 Exploring COVID-19 to Hyaluronic Acid Connection
### &emsp; &emsp; 4.1 COVID-19 -> Genes <- Hyaluronic Acid Explain Query
### &emsp; &emsp; 4.2 COVID/HYA Genes -> Pathways
## &emsp; 5 Summary 
### &emsp; &emsp; 5.1 Summary
### &emsp; &emsp; 5.2 Future Directions 

## 0 Imports

In [17]:
# Import pandas and BioThings Explorer modules
import pandas
from biothings_explorer.query.predict import Predict
from biothings_explorer.query.visualize import display_graph
from biothings_explorer.user_query_dispatcher import FindConnection
from biothings_explorer.hint import Hint
import nest_asyncio
nest_asyncio.apply()
%matplotlib inline
import warnings
warnings.filterwarnings("ignore") 
ht = Hint()


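## predict_many: helper that runs FindConnection for every combination of input, output type, and (optional) intermediate node type, then concatenates the result tables.
## This functionality is expected to be incorporated into BTE, after which the helper below will no longer be needed.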
def predict_many(input_object_list, output_type_list, intermediate_node_list = ''):
    df_list = []
    for input_object in input_object_list: 
        if('name' in input_object):
            for output_type in output_type_list: 
                if(len(intermediate_node_list) > 0):
                    for inter in intermediate_node_list:
                        try: 
#                             print("Running: " + input_object['name'] + ' --> intermediate type ' + inter + ' --> output type ' + output_type )
                            fc = FindConnection(input_obj=input_object, output_obj=output_type, intermediate_nodes=[inter])
                            fc.connect(verbose=False)
                            df = fc.display_table_view()
                            rows = df.shape[0]
                            if(rows > 0):
                                df_list.append(df)
                        except:
                            pass
#                             print(input_object['name'] + ' --> intermediate type ' + inter + ' --> output type ' + output_type + ' FAILED')
                else:
                    try:
#                         print("Running: " + input_object['name'] + ' --> output type ' + output_type )
                        fc = FindConnection(input_obj=input_object, output_obj=output_type, intermediate_nodes=None)
                        fc.connect(verbose=False)
                        df = fc.display_table_view()
                        rows = df.shape[0]
                        if(rows > 0):
                            df_list.append(df)
                    except:
                        pass
#                         print(input_object['name'] + ' --> output type ' + output_type + ' FAILED')

    if(len(df_list) > 0):
        return pandas.concat(df_list)
    else:
        return None
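
Because `predict_many` catches every exception and moves on, failed queries leave no trace in the results. The sketch below shows one way to record failures instead of discarding them. It is a standalone illustration: `run_queries` and `fake_query` are hypothetical names, and `fake_query` merely stands in for a BTE call.

```python
def run_queries(inputs, query_fn):
    """Run query_fn on each input; collect results and record failures."""
    results, failures = [], []
    for item in inputs:
        try:
            results.append(query_fn(item))
        except Exception as err:  # record the failure instead of silently ignoring it
            failures.append((item, repr(err)))
    return results, failures


# Hypothetical stand-in for a BTE query that fails for one input.
def fake_query(name):
    if name == "BAD":
        raise ValueError("no connection")
    return name.lower()


results, failures = run_queries(["ACE", "BAD", "REN"], fake_query)
print(results)
print(failures)
```

Inspecting `failures` after a run makes it possible to tell an empty result apart from a query that never completed.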

## 1 Overview of Background and BTE Approach
&emsp;
### 1.1 Article: Summary and Background

Article Reference:

Garvin, Michael R., et al. "A mechanistic model and therapeutic interventions for COVID-19 involving a RAS-mediated bradykinin storm." Elife 9 (2020): e59177.


Article Link:  

https://elifesciences.org/articles/59177



Article Main Points: 

- A RAS pathway imbalance was implicated through gene expression analysis of cells in bronchoalveolar lavage fluid (BALF) from COVID-19 patients

- The RAS pathway imbalance is predicted to be the cause of bradykinin-driven vascular dilation, vascular permeability, and hypotension

- Leaky membranes allow hyaluronic acid (HYA) to permeate into the lungs

- Analyses found that production of HYA was increased and that the enzymes that could degrade it were greatly decreased


### 1.2 Overview of BTE Approach 

As described in the article, vasodilation and hypotension are two distinctive signs of severe COVID-19 that are predicted to lead to leaky membranes in the lung. To assess whether these signs may be linked to the RAS pathway, this notebook looks for genes that are related to both COVID-19 and vasodilation or hypotension, and then analyzes these genes by examining which pathways or processes they may be involved in, which chemical substances they may produce, and which tissues or anatomical entities they are linked to.

Section 4 then briefly explores genes related to both COVID-19 and hyaluronic acid (HYA), a chemical found in the lungs of COVID-19 patients that is highly absorbent and forms a gelatinous substance that blocks oxygen absorption by the lungs.
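
Conceptually, keeping only the genes linked to both the disease and a symptom is a set intersection. The sketch below illustrates that idea with placeholder gene sets. The names `covid_genes`, `symptom_genes`, and `shared_genes`, and all memberships, are assumptions made for illustration, not BTE results.

```python
# Hypothetical gene sets; in the notebook these come from BTE queries.
covid_genes = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
symptom_genes = {
    "vasodilation": {"GENE_B", "GENE_C", "GENE_X"},
    "hypotension": {"GENE_C", "GENE_D", "GENE_Y"},
}


def shared_genes(disease_genes, genes_by_symptom):
    """Return, per symptom, the sorted genes that are also linked to the disease."""
    return {
        symptom: sorted(disease_genes & genes)
        for symptom, genes in genes_by_symptom.items()
    }


shared = shared_genes(covid_genes, symptom_genes)
print(shared)
```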

## 2 Determining Related Genes
&emsp;
### 2.1 Load COVID-19, Hypotension, and Vasodilation Nodes
&emsp;


In [18]:
covid = ht.query('COVID-19')['Disease'][0]
covid

{'MONDO': 'MONDO:0100096',
 'DOID': 'DOID:0080600',
 'name': 'COVID-19',
 'primary': {'identifier': 'MONDO',
  'cls': 'Disease',
  'value': 'MONDO:0100096'},
 'display': 'MONDO(MONDO:0100096) DOID(DOID:0080600) name(COVID-19)',
 'type': 'Disease'}

In [22]:
hypotension = ht.query('hypotension')['PhenotypicFeature'][0]
hypotension

{'UMLS': 'C0003126',
 'HP': 'HP:0000458',
 'MESH': 'D000857',
 'name': 'Anosmia',
 'primary': {'identifier': 'UMLS',
  'cls': 'PhenotypicFeature',
  'value': 'C0003126'},
 'display': 'UMLS(C0003126) HP(HP:0000458) MESH(D000857) name(Anosmia)',
 'type': 'PhenotypicFeature'}

In [23]:
vasodilation = ht.query('vasodilation')['BiologicalProcess'][0]
vasodilation

{'UMLS': 'C0003126',
 'HP': 'HP:0000458',
 'MESH': 'D000857',
 'name': 'Anosmia',
 'primary': {'identifier': 'UMLS',
  'cls': 'PhenotypicFeature',
  'value': 'C0003126'},
 'display': 'UMLS(C0003126) HP(HP:0000458) MESH(D000857) name(Anosmia)',
 'type': 'PhenotypicFeature'}

In [24]:
vasc_perm = ht.query("positive regulation of vascular permeability")['BiologicalProcess'][0]
vasc_perm

{'GO': 'GO:0043117',
 'name': 'positive regulation of vascular permeability',
 'primary': {'identifier': 'GO',
  'cls': 'BiologicalProcess',
  'value': 'GO:0043117'},
 'display': 'GO(GO:0043117) name(positive regulation of vascular permeability)',
 'type': 'BiologicalProcess'}

In [40]:
SARS = ht.query("severe acute respiratory syndrome")["Disease"][0]

dfs = predict_many([SARS],["PhenotypicFeature"])
dfs = dfs[dfs["pred1_source"] =="hpo"]
dfs

sars_symptoms = list(dict.fromkeys(list(dfs["output_name"])))
sars_symptoms

['DECREASED IMMUNE FUNCTION',
 'ABNORMAL TISSUE MASS',
 'COUGH',
 'HYPOXEMIA',
 'BREATHING DIFFICULTIES',
 'ABNORMAL BREATHING',
 'DIABETES MELLITUS',
 'ABNORMALITY OF THE CARDIOVASCULAR SYSTEM',
 'FEVER',
 'RESPIRATORY DISTRESS NECESSITATING MECHANICAL VENTILATION',
 'PHARYNGITIS',
 'MUSCLE ACHE',
 'ACUTE KIDNEY FAILURE',
 'ACUTE INFECTIOUS PNEUMONIA',
 'CHRONIC LUNG DISEASE',
 'HEADACHE']

In [26]:
influenza = ht.query("flu")["Disease"][0]
influenza

{'MONDO': 'MONDO:0018695',
 'DOID': 'DOID:4492',
 'UMLS': 'C0016627',
 'name': 'avian influenza',
 'MESH': 'D005585',
 'ORPHANET': '454836',
 'OMOP': '314979',
 'primary': {'identifier': 'MONDO',
  'cls': 'Disease',
  'value': 'MONDO:0018695'},
 'display': 'MONDO(MONDO:0018695) DOID(DOID:4492) ORPHANET(454836) UMLS(C0016627) MESH(D005585) name(avian influenza) OMOP(314979)',
 'type': 'Disease'}

In [27]:
dfi = predict_many([influenza],["PhenotypicFeature"])
dfi 

inf_symptoms = list(dfi["output_name"])
print(len(inf_symptoms))
print(inf_symptoms)


41
['FEVER', 'MUSCLE ACHE', 'FATIGUE', 'COUGH', 'PHARYNGITIS', 'LOW PLATELET COUNT', 'DECREASED BLOOD LEUKOCYTE NUMBER', 'ABSOLUTE LYMPHOCYTE COUNT DECREASE', 'LUNG INFILTRATES', 'HEADACHE', 'ELEVATED C-REACTIVE PROTEIN LEVEL', 'HYPOXEMIA', 'GROUND-GLASS OPACIFICATION ON PULMONARY HRCT', 'DRY COUGH', 'CONJUNCTIVITIS', 'EMESIS', 'DIARRHEA', 'ABDOMINAL DISCOMFORT', 'PNEUMONIA', 'ABNORMAL BREATHING', 'FLUID AROUND LUNGS', 'RESPIRATORY FAILURE', 'MISCARRIAGE', 'INCREASED LACTATE DEHYDROGENASE LEVEL', 'PRODUCTIVE COUGH', 'CHEST PAIN', 'MENINGITIS', 'CARDIAC FAILURE', 'ACUTE KIDNEY FAILURE', 'BREATHING DIFFICULTIES', 'COLLAPSED LUNG', 'BRAIN INFLAMMATION', 'INCREASED RESPIRATORY RATE OR DEPTH OF BREATHING', 'ABNORMAL LIVER ENZYMES', 'HYPOALBUMINAEMIA', 'BREAKDOWN OF SKELETAL MUSCLE', 'ELEVATED BLOOD CREATINE PHOSPHOKINASE', 'DISSEMINATED INTRAVASCULAR COAGULATION', 'HEPATITIS', 'INFLAMMATION OF SPINAL CORD', 'INFECTION IN BLOOD STREAM']


In [28]:
list(set(sars_symptoms) & set(inf_symptoms))

print(len(sars_symptoms))
print(len(list(set(sars_symptoms) & set(inf_symptoms))))

16
9
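
The raw overlap count can be complemented with a normalized measure such as the Jaccard index (shared items divided by the size of the union). The sketch below is one possible helper. `overlap_stats` and the toy lists are illustrative assumptions and are not part of the original analysis.

```python
def overlap_stats(a, b):
    """Return (number of shared labels, Jaccard index) for two lists of labels."""
    set_a, set_b = set(a), set(b)
    shared = set_a & set_b
    union = set_a | set_b
    jaccard = len(shared) / len(union) if union else 0.0
    return len(shared), jaccard


# Toy lists for illustration only (not the full BTE output).
toy_a = ["FEVER", "COUGH", "HEADACHE", "DIABETES MELLITUS"]
toy_b = ["FEVER", "COUGH", "FATIGUE"]
n_shared, jaccard = overlap_stats(toy_a, toy_b)
print(n_shared, round(jaccard, 2))
```

Applied to the lists above (16 SARS symptoms, 41 influenza symptoms, 9 shared), and assuming neither list contains repeated entries, the index would be 9 / (16 + 41 - 9) = 0.1875.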


In [29]:
ACE = ht.query("ACE")["Gene"][0]
ACE

{'NCBIGene': '1636',
 'name': 'angiotensin I converting enzyme',
 'SYMBOL': 'ACE',
 'UMLS': 'C1413931',
 'HGNC': '2707',
 'UNIPROTKB': 'P12821',
 'ENSEMBL': 'ENSG00000159640',
 'primary': {'identifier': 'NCBIGene', 'cls': 'Gene', 'value': '1636'},
 'display': 'NCBIGene(1636) ENSEMBL(ENSG00000159640) HGNC(2707) UMLS(C1413931) UNIPROTKB(P12821) SYMBOL(ACE)',
 'type': 'Gene'}

In [30]:
df12 = predict_many([ACE2], ["BiologicalProcess","PhenotypicFeature"])
df12

Unnamed: 0,input,input_type,pred1,pred1_source,pred1_api,pred1_pubmed,output_type,output_name,output_id
0,ACE2,Gene,affects,SEMMED,SEMMED Gene API,126916721825885320392165,BiologicalProcess,C0007227,UMLS:C0007227
1,ACE2,Gene,affects,SEMMED,SEMMED Gene API,21656919,BiologicalProcess,C0007586,UMLS:C0007586
2,ACE2,Gene,affects,SEMMED,SEMMED Gene API,15825152,BiologicalProcess,C0007613,UMLS:C0007613
3,ACE2,Gene,disrupts,SEMMED,SEMMED Gene API,27806985,BiologicalProcess,C0007613,UMLS:C0007613
4,ACE2,Gene,affects,SEMMED,SEMMED Gene API,24846945,BiologicalProcess,C0010813,UMLS:C0010813
5,ACE2,Gene,affects,SEMMED,SEMMED Gene API,1831025920692300,BiologicalProcess,C0013081,UMLS:C0013081
6,ACE2,Gene,affects,SEMMED,SEMMED Gene API,23013041,BiologicalProcess,C0017262,UMLS:C0017262
7,ACE2,Gene,affects,SEMMED,SEMMED Gene API,22837003,BiologicalProcess,C0019868,UMLS:C0019868
8,ACE2,Gene,affects,SEMMED,SEMMED Gene API,15255906,BiologicalProcess,C0026559,UMLS:C0026559
9,ACE2,Gene,affects,SEMMED,SEMMED Gene API,1730413180763792301304127589033,BiologicalProcess,C0040649,UMLS:C0040649


### 2.2 COVID -> Genes <- Hypotension 
Use an explain query to determine genes related to both COVID-19 and hypotension.

{'Gene': [],
 'SequenceVariant': [],
 'ChemicalSubstance': [],
 'Disease': [{'MONDO': 'MONDO:0010034',
   'name': 'anosmia for butyl mercaptan',
   'OMIM': '270350',
   'primary': {'identifier': 'MONDO',
    'cls': 'Disease',
    'value': 'MONDO:0010034'},
   'display': 'MONDO(MONDO:0010034) OMIM(270350) name(anosmia for butyl mercaptan)',
   'type': 'Disease'},
  {'MONDO': 'MONDO:0009686',
   'UMLS': 'C1850807',
   'name': 'musk, inability to smell',
   'MESH': 'C564980',
   'OMIM': '254150',
   'primary': {'identifier': 'MONDO',
    'cls': 'Disease',
    'value': 'MONDO:0009686'},
   'display': 'MONDO(MONDO:0009686) OMIM(254150) UMLS(C1850807) MESH(C564980) name(musk, inability to smell)',
   'type': 'Disease'},
  {'MONDO': 'MONDO:0008432',
   'name': 'ketone compounds, ability to smell',
   'OMIM': '182270',
   'primary': {'identifier': 'MONDO',
    'cls': 'Disease',
    'value': 'MONDO:0008432'},
   'display': 'MONDO(MONDO:0008432) OMIM(182270) name(ketone compounds, ability to sme

In [37]:
df = predict_many([covid], [vasodilation, hypotension, vasc_perm], intermediate_node_list = ['Gene'])
df

The second query doesn't return any result. So BTE does not find any connection between your input and output


### 2.3.1 COVID -> Genes -> Genes <- Hypotension/Vasodilation

In [7]:
degree_2_genes = list(dict.fromkeys(list(df["node1_name"])))
# len(degree_2_genes)
# degree_2_genes_whole = list(df0["output_name"])

# remove_i = []
# for i in range(0,len(degree_2_genes)):
#     if(degree_2_genes_whole.count(degree_2_genes[i]) < 2):
#         remove_i.append(i)
#         print(degree_2_genes_whole.count(degree_2_genes[i]))

# print(len(remove_i))
# print(degree_2_genes_whole)

In [8]:
degree_2_inputs = []
for gene in degree_2_genes: 
    try: 
        gene_input = ht.query(gene)["Gene"][0]
        degree_2_inputs.append(gene_input)
    except: 
        print(gene + ' Failed')

In [9]:
degree_2_inputs

[{'NCBIGene': '4151',
  'name': 'myoglobin',
  'SYMBOL': 'MB',
  'UMLS': 'C1417048',
  'HGNC': '6915',
  'UNIPROTKB': 'P02144',
  'ENSEMBL': 'ENSG00000198125',
  'primary': {'identifier': 'NCBIGene', 'cls': 'Gene', 'value': '4151'},
  'display': 'NCBIGene(4151) ENSEMBL(ENSG00000198125) HGNC(6915) UMLS(C1417048) UNIPROTKB(P02144) SYMBOL(MB)',
  'type': 'Gene'},
 {'NCBIGene': '1401',
  'name': 'C-reactive protein',
  'SYMBOL': 'CRP',
  'UMLS': 'C1413716',
  'HGNC': '2367',
  'UNIPROTKB': 'P02741',
  'ENSEMBL': 'ENSG00000132693',
  'primary': {'identifier': 'NCBIGene', 'cls': 'Gene', 'value': '1401'},
  'display': 'NCBIGene(1401) ENSEMBL(ENSG00000132693) HGNC(2367) UMLS(C1413716) UNIPROTKB(P02741) SYMBOL(CRP)',
  'type': 'Gene'},
 {'NCBIGene': '1636',
  'name': 'angiotensin I converting enzyme',
  'SYMBOL': 'ACE',
  'UMLS': 'C1413931',
  'HGNC': '2707',
  'UNIPROTKB': 'P12821',
  'ENSEMBL': 'ENSG00000159640',
  'primary': {'identifier': 'NCBIGene', 'cls': 'Gene', 'value': '1636'},
  'disp

In [10]:
degree_2_symptoms_to_genes = predict_many(degree_2_inputs, [vasodilation, hypotension], ['Gene'])
degree_2_symptoms_to_genes

Unnamed: 0,input,input_type,pred1,pred1_source,pred1_api,pred1_pubmed,node1_type,node1_name,node1_id,pred2,pred2_source,pred2_api,pred2_pubmed,output_type,output_name,output_id
0,MB,Gene,related_to,Translator Text Mining Provider,CORD Gene API,,Gene,7841,HGNC:7841,related_to,Translator Text Mining Provider,CORD Biological Process API,,Gene,POSITIVE REGULATION OF BLOOD VESSEL SIZE,GO:GO:0042311
1,MB,Gene,related_to,Translator Text Mining Provider,CORD Gene API,,Gene,CFAP410,NCBIGene:755,related_to,Translator Text Mining Provider,CORD Biological Process API,,Gene,POSITIVE REGULATION OF BLOOD VESSEL SIZE,GO:GO:0042311
2,MB,Gene,related_to,Translator Text Mining Provider,CORD Gene API,,Gene,CA2,NCBIGene:760,related_to,Translator Text Mining Provider,CORD Biological Process API,,Gene,POSITIVE REGULATION OF BLOOD VESSEL SIZE,GO:GO:0042311
3,MB,Gene,related_to,Translator Text Mining Provider,CORD Gene API,,Gene,CASP14,NCBIGene:23581,related_to,Translator Text Mining Provider,CORD Biological Process API,,Gene,POSITIVE REGULATION OF BLOOD VESSEL SIZE,GO:GO:0042311
4,MB,Gene,related_to,Translator Text Mining Provider,CORD Gene API,,Gene,ISYNA1,NCBIGene:51477,related_to,Translator Text Mining Provider,CORD Biological Process API,,Gene,POSITIVE REGULATION OF BLOOD VESSEL SIZE,GO:GO:0042311
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11,ALB,Gene,physically_interacts_with,SEMMED,SEMMED Gene API,9537435,Gene,TP53,NCBIGene:7157,related_to,,BioLink API,,Gene,ARTERIAL HYPOTENSION,UMLS:C0020649
12,ALB,Gene,negatively_regulated_by,SEMMED,SEMMED Gene API,10542249,Gene,TP53,NCBIGene:7157,related_to,,BioLink API,,Gene,ARTERIAL HYPOTENSION,UMLS:C0020649
13,ALB,Gene,positively_regulates,SEMMED,SEMMED Gene API,10542249,Gene,TP53,NCBIGene:7157,related_to,,BioLink API,,Gene,ARTERIAL HYPOTENSION,UMLS:C0020649
14,ALB,Gene,physically_interacts_with,SEMMED,SEMMED Gene API,28903918,Gene,REN,NCBIGene:5972,related_to,,BioLink API,,Gene,ARTERIAL HYPOTENSION,UMLS:C0020649


In [11]:
d2_gene_list = list(degree_2_symptoms_to_genes["node1_name"])

In [12]:
d = {x:d2_gene_list.count(x) for x in d2_gene_list}
d2_gene_df = pandas.DataFrame.from_dict({k: v for k, v in sorted(d.items(), key=lambda item: item[1], reverse = True)}, orient='index').iloc[0:50]
d2_gene_df.columns = ["count"]
d2_gene_df

Unnamed: 0,count
REN,64
ACE,46
ANG,23
INS,22
AGTR1,22
ALB,16
TH,15
AKT1,13
CA2,11
NOS3,10


In [13]:
d2_genes = list(d2_gene_df[d2_gene_df["count"] >= 5].index)
d2_genes

['REN',
 'ACE',
 'ANG',
 'INS',
 'AGTR1',
 'ALB',
 'TH',
 'AKT1',
 'CA2',
 'NOS3',
 'DPP4',
 'MICE',
 'TP53',
 'AGT',
 'CASP14',
 'FGFR1',
 'PPIB',
 'MB',
 'RLN2',
 'SLC25A5',
 'TG',
 '7841',
 'NMS',
 'PAH',
 'ALDH7A1',
 'GC',
 'LAT2',
 'NR3C2',
 'PIK3CA',
 'CSH1',
 'EDN1',
 'CRP',
 'GCG',
 'CST12P',
 'LMOD1',
 'ROS1',
 'NOS2',
 'MTOR',
 'OLFM1',
 'DHX40',
 'PADI4']
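
The counting and thresholding steps in the cells above can also be written with `collections.Counter`, which avoids calling `list.count` once per element. This is a minimal sketch on a toy list; `top_counts` is a hypothetical helper name, and the limit-then-threshold order mirrors the cells above.

```python
from collections import Counter


def top_counts(names, min_count=1, limit=50):
    """Count occurrences, keep the `limit` most frequent names, then apply a threshold."""
    counts = Counter(names)
    return [(name, n) for name, n in counts.most_common(limit) if n >= min_count]


# Toy input for illustration; in the notebook this would be d2_gene_list.
toy_genes = ["REN", "ACE", "REN", "ANG", "REN", "ACE"]
print(top_counts(toy_genes, min_count=2))
```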

### 2.4 Create Gene Nodes

In [14]:
## create Gene list
genes_related_to_syptoms = list(df["node1_name"]) + d2_genes
genes_related_to_syptoms

['MB',
 'CRP',
 'CRP',
 'ACE',
 'DPP4',
 'TH',
 'REN',
 'AGTR1',
 'ALB',
 'REN',
 'ACE',
 'REN',
 'ACE',
 'ANG',
 'INS',
 'AGTR1',
 'ALB',
 'TH',
 'AKT1',
 'CA2',
 'NOS3',
 'DPP4',
 'MICE',
 'TP53',
 'AGT',
 'CASP14',
 'FGFR1',
 'PPIB',
 'MB',
 'RLN2',
 'SLC25A5',
 'TG',
 '7841',
 'NMS',
 'PAH',
 'ALDH7A1',
 'GC',
 'LAT2',
 'NR3C2',
 'PIK3CA',
 'CSH1',
 'EDN1',
 'CRP',
 'GCG',
 'CST12P',
 'LMOD1',
 'ROS1',
 'NOS2',
 'MTOR',
 'OLFM1',
 'DHX40',
 'PADI4']

In [15]:
# get gene inputs through hint module
gene_inputs = []
for gene in genes_related_to_syptoms: 
    try: 
        gene_input = ht.query(gene)["Gene"][0]
        gene_inputs.append(gene_input)
    except: 
        print(gene + ' Failed')
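
Note that the gene list above contains repeated symbols (for example CRP, REN, and ACE), so each repeated gene is queried, and later counted, more than once. Whether that weighting is intended is not stated. If it is not, the list could be deduplicated first, using the same `dict.fromkeys` idiom found elsewhere in this notebook. `unique_in_order` is a hypothetical helper name.

```python
def unique_in_order(items):
    """Drop repeated entries while keeping first-seen order."""
    return list(dict.fromkeys(items))


# Toy list mirroring the start of the gene list above.
toy = ["MB", "CRP", "CRP", "ACE", "DPP4", "TH", "REN", "AGTR1", "ALB", "REN", "ACE"]
print(unique_in_order(toy))
```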

## 3 Analyzing and Exploring Gene Results

Look at the determined genes (from explain queries above) to analyze what pathways, biological processes, chemical substances, and anatomical entities the genes are related to.

### 3.1 Genes -> Pathways

Look at pathways related to the genes, and then display the top pathway occurrences in the results (and which genes are related to each pathway).


In [16]:
gene_to_pathways = predict_many(gene_inputs, ['Pathway'])

In [17]:
# pathways
gene_to_pathway_results = {}
gene_to_pathway_genes = list(gene_to_pathways["output_name"]) # list of pathway names (one entry per result row)
gene_to_pathway_genes = list(dict.fromkeys(gene_to_pathway_genes))  # remove duplicates

for gene in gene_to_pathway_genes: 
    gene_to_pathway_results[gene] = {
        'pathway_count' : 0,
        "genes_related" : []
    }

for index, row in gene_to_pathways.iterrows():
    gene_to_pathway_results[row['output_name']]['pathway_count'] = gene_to_pathway_results[row['output_name']]['pathway_count'] + 1
    gene_to_pathway_results[row['output_name']]['genes_related'].append(row['input'])
    

gene_to_pathway_results = dict(sorted(gene_to_pathway_results.items(), key = lambda x: x[1]['pathway_count'], reverse = True))

# gene_to_pathway_results
pandas.DataFrame.from_dict(gene_to_pathway_results, orient='index').iloc[0:50]

Unnamed: 0,pathway_count,genes_related
METABOLISM OF PROTEINS,16,"[ACE, DPP4, REN, ALB, REN, ACE, REN, ACE, INS,..."
METABOLISM,14,"[TH, ALB, INS, ALB, TH, AKT1, CA2, NOS3, AGT, ..."
SIGNAL TRANSDUCTION,14,"[AGTR1, INS, AGTR1, AKT1, NOS3, TP53, AGT, FGF..."
IMMUNE SYSTEM,12,"[CRP, CRP, AKT1, NOS3, TP53, FGFR1, LAT2, PIK3..."
PEPTIDE HORMONE METABOLISM,11,"[ACE, DPP4, REN, REN, ACE, REN, ACE, INS, DPP4..."
ACE INHIBITOR PATHWAY,11,"[ACE, REN, AGTR1, REN, ACE, REN, ACE, AGTR1, N..."
DISEASE,10,"[AKT1, FGFR1, RLN2, SLC25A5, MOGS, PAH, PIK3CA..."
SIGNALING BY GPCR,9,"[AGTR1, AGTR1, AKT1, AGT, RLN2, NMS, PIK3CA, E..."
GPCR DOWNSTREAM SIGNALLING,9,"[AGTR1, AGTR1, AKT1, AGT, RLN2, NMS, PIK3CA, E..."
FOCAL ADHESION-PI3K-AKT-MTOR-SIGNALING PATHWAY,8,"[INS, AKT1, NOS3, FGFR1, PIK3CA, CSH1, NOS2, M..."


Result table interpretation: In agreement with the argument made in the article, the top-ranking results "ACE Inhibitor Pathway" and "Metabolism of Angiotensinogen to Angiotensins" are both components of the RAS pathway.
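
The count-and-collect pattern used for pathways recurs for other output types, so it could be factored into one helper. The sketch below works on plain dictionaries so that it runs on its own. `count_outputs` and the toy rows are illustrative assumptions.

```python
from collections import defaultdict


def count_outputs(rows, output_key="output_name", input_key="input"):
    """Group rows by output name; return (name, count, inputs) sorted by count."""
    related = defaultdict(list)
    for row in rows:
        related[row[output_key]].append(row[input_key])
    ranked = sorted(related.items(), key=lambda kv: len(kv[1]), reverse=True)
    return [(name, len(inputs), inputs) for name, inputs in ranked]


# Toy rows for illustration; the real rows come from the BTE result table.
toy_rows = [
    {"input": "ACE", "output_name": "ACE INHIBITOR PATHWAY"},
    {"input": "REN", "output_name": "ACE INHIBITOR PATHWAY"},
    {"input": "INS", "output_name": "METABOLISM"},
]
ranked_rows = count_outputs(toy_rows)
for name, count, inputs in ranked_rows:
    print(name, count, inputs)
```

With a pandas result table, the rows could be obtained with `DataFrame.to_dict("records")` before calling the helper.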

### 3.2 Genes -> Biological Processes

Look at biological processes related to the genes, and then display the top biological process occurrences in the results (and which genes are related to each biological process).


In [18]:
# biological processes
gene_to_bioprocesses = predict_many(gene_inputs, ['BiologicalProcess'])
gene_to_bioprocess_results = {}
gene_to_bioprocess_genes = list(gene_to_bioprocesses["output_name"]) # list of biological process names (one entry per result row)
gene_to_bioprocess_genes = list(dict.fromkeys(gene_to_bioprocess_genes))  # remove duplicates

for gene in gene_to_bioprocess_genes: 
    gene_to_bioprocess_results[gene] = {
        'bioprocess_count' : 0,
        "genes_related" : []
    }

for index, row in gene_to_bioprocesses.iterrows():
    gene_to_bioprocess_results[row['output_name']]['bioprocess_count'] = gene_to_bioprocess_results[row['output_name']]['bioprocess_count'] + 1
    gene_to_bioprocess_results[row['output_name']]['genes_related'].append(row['input'])

In [19]:
## extra step needed to analyze biological processes because many are returned as a UMLS ID instead of a name

gene_to_bioprocess_results = dict(sorted(gene_to_bioprocess_results.items(), key = lambda x: x[1]['bioprocess_count'], reverse = True))
# iterate over a snapshot of the keys so that the dictionary can be modified inside the loop
for counter, key in enumerate(list(gene_to_bioprocess_results.keys())):
    if counter >= 100:  # only resolve names for the 100 most frequent processes
        break
    if(('C0' in key) or ('C1' in key)):
        try:
            name = ht.query(key)['BiologicalProcess'][0]['name']
            gene_to_bioprocess_results[name] = gene_to_bioprocess_results.pop(key)
        except Exception:
            pass  # keep the UMLS ID when no name can be resolved

Cannot connect to host biothings.ncats.io:443 ssl:default [Connect call failed ('52.43.54.84', 443)]

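
Renaming keys while looping over the same dictionary is fragile in Python and can raise a `RuntimeError`. One alternative is to build a new dictionary, which also preserves the sorted order of the entries. In the sketch below, `rename_keys` is a hypothetical helper and `toy_lookup` is a made-up ID-to-name table standing in for the `ht.query` lookups.

```python
def rename_keys(results, lookup):
    """Return a new dict with keys replaced via `lookup`, preserving order."""
    renamed = {}
    for key, value in results.items():
        # fall back to the original key when no name is known
        renamed[lookup.get(key, key)] = value
    return renamed


# Hypothetical ID-to-name table standing in for the ht.query lookups.
toy_lookup = {"C0000001": "EXAMPLE PROCESS"}
toy_results = {
    "C0000001": {"bioprocess_count": 3},
    "GROWTH": {"bioprocess_count": 2},
}
renamed_results = rename_keys(toy_results, toy_lookup)
print(renamed_results)
```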

In [21]:
# gene_to_bioprocess_results
pandas.DataFrame.from_dict(gene_to_bioprocess_results, orient='index').iloc[0:50]

Unnamed: 0,bioprocess_count,genes_related
GENE EXPRESSION,46,"[MB, CRP, CRP, ACE, DPP4, TH, REN, AGTR1, REN,..."
GROWTH,45,"[MB, CRP, CRP, DPP4, TH, REN, AGTR1, ALB, REN,..."
POSITIVE REGULATION OF BLOOD VESSEL SIZE,44,"[MB, CRP, CRP, ACE, DPP4, TH, REN, REN, ACE, R..."
METABOLIC PROCESS,42,"[CRP, CRP, DPP4, TH, REN, AGTR1, ALB, REN, REN..."
BREAKDOWN,37,"[MB, ACE, DPP4, TH, ALB, ACE, ACE, ANG, INS, A..."
SECRETION,36,"[MB, DPP4, TH, REN, ALB, REN, REN, ANG, INS, A..."
ANGIOGENESIS,36,"[MB, DPP4, TH, REN, REN, REN, ANG, ANG, ANG, A..."
PATHOGENESIS,34,"[CRP, CRP, ACE, TH, REN, ALB, REN, ACE, REN, A..."
INFLAMMATION,33,"[CRP, CRP, CRP, CRP, DPP4, TH, REN, AGTR1, REN..."
GO:0016265,32,"[MB, CRP, CRP, TH, ALB, INS, ALB, TH, AKT1, CA..."


Result table interpretation: In agreement with the argument made in the article, the top-ranking results "renin activity," "Angiogenic Process," and "ANGIOTENSIN MATURATION" are components of the RAS pathway and its related biological processes.

### 3.3 Genes -> Chemical Substances

Look at chemical substances related to the genes, and then display the top chemical substance occurrences in the results (and which genes are related to each chemical substance).

In [26]:
# chemical substances
gene_to_chemical_substance = predict_many(gene_inputs, ['ChemicalSubstance'])
gene_to_chemical_substance_results = {}
gene_to_chemical_substance_genes = list(gene_to_chemical_substance["output_name"]) # list of chemical substance names (one entry per result row)
gene_to_chemical_substance_genes = list(dict.fromkeys(gene_to_chemical_substance_genes))  # remove duplicates

for gene in gene_to_chemical_substance_genes: 
    gene_to_chemical_substance_results[gene] = {
        'chemical_substance_count' : 0,
        "genes_related" : []
    }

for index, row in gene_to_chemical_substance.iterrows():
    gene_to_chemical_substance_results[row['output_name']]['chemical_substance_count'] = gene_to_chemical_substance_results[row['output_name']]['chemical_substance_count'] + 1
    gene_to_chemical_substance_results[row['output_name']]['genes_related'].append(row['input'])

In [27]:
gene_to_chemical_substance_results = dict(sorted(gene_to_chemical_substance_results.items(), key = lambda x: x[1]['chemical_substance_count'], reverse = True))
pandas.DataFrame.from_dict(gene_to_chemical_substance_results, orient='index').iloc[0:20]

Unnamed: 0,chemical_substance_count,genes_related
(+)-GLUCOSE,152,"[MB, MB, MB, MB, CRP, CRP, CRP, CRP, ACE, ACE,..."
PHARMACEUTICAL PREPARATIONS,112,"[ACE, ACE, ACE, DPP4, DPP4, DPP4, TH, TH, TH, ..."
(+)-ALDOSTERONE,111,"[CRP, CRP, ACE, ACE, ACE, ACE, ACE, ACE, ACE, ..."
EDRF,107,"[MB, MB, MB, ACE, ACE, ACE, TH, TH, TH, TH, TH..."
MESSENGER RNA,94,"[CRP, CRP, ACE, ACE, TH, TH, TH, TH, TH, REN, ..."
(2-BUTYL-4-CHLORO-1-{[2'-(1H-TETRAZOL-5-YL)BIPHENYL-4-YL]METHYL}-1H-IMIDAZOL-5-YL)METHANOL,88,"[ACE, ACE, ACE, ACE, ACE, TH, REN, REN, REN, R..."
"OXYGEN SPECIES, REACTIVE",85,"[MB, MB, ACE, ACE, TH, TH, TH, REN, REN, REN, ..."
(15S)-PROSTAGLANDIN E2,84,"[CRP, CRP, ACE, DPP4, TH, TH, TH, REN, REN, RE..."
CALCIO,81,"[TH, TH, TH, TH, REN, REN, REN, REN, REN, REN,..."
10% SODIUM CHLORIDE INJECTION,79,"[ACE, ACE, TH, TH, TH, REN, REN, REN, REN, REN..."


(15S)-PROSTAGLANDIN E2: prostaglandin E2 (PGE2, also known as dinoprostone), a vasodilatory prostaglandin.


In agreement with the article, (+)-ALDOSTERONE (the third most frequent result) is a critical component of the RAS pathway, and ANGIOTENSIN CONVERTING ENZYME INHIBITORS act directly on that pathway.

(2-BUTYL-4-CHLORO-1-{[2'-(1H-TETRAZOL-5-YL)BIPHENYL-4-YL]METHYL}-1H-IMIDAZOL-5-YL)METHANOL: 
This is the systematic name of losartan (Cozaar), which belongs to a group of drugs called angiotensin II receptor antagonists. It keeps blood vessels from narrowing, which lowers blood pressure and improves blood flow. Losartan is used to treat high blood pressure (hypertension).

"EDRF interacts with the renin-angiotensin system to control juxtamedullary afferent and efferent arteriolar resistance"




### 3.4 Genes -> Anatomical Entities

Look at anatomical entities related to the genes, and then display the top anatomical entity occurrences in the results (and which genes are related to each anatomical entity).

In [28]:
ints_to_anatomical_entity = predict_many(gene_inputs, ['AnatomicalEntity'])
list(dict.fromkeys(list(ints_to_anatomical_entity["output_name"])))
# anatomical entities
int_to_anatomical_entity_results = {}
int_to_anatomical_entity_ints = list(ints_to_anatomical_entity["output_name"]) # list of anatomical entity names (one entry per result row)
int_to_anatomical_entity_ints = list(dict.fromkeys(int_to_anatomical_entity_ints))  # remove duplicates

for entity in int_to_anatomical_entity_ints:  # loop variable renamed to avoid shadowing the built-in int
    int_to_anatomical_entity_results[entity] = {
        'anatomical_entity_count' : 0,
        "ints_related" : []
    }

for index, row in ints_to_anatomical_entity.iterrows():
    int_to_anatomical_entity_results[row['output_name']]['anatomical_entity_count'] = int_to_anatomical_entity_results[row['output_name']]['anatomical_entity_count'] + 1
    int_to_anatomical_entity_results[row['output_name']]['ints_related'].append(row['input'])
    

int_to_anatomical_entity_results = dict(sorted(int_to_anatomical_entity_results.items(), key = lambda x: x[1]['anatomical_entity_count'], reverse = True))

    
# int_to_anatomical_entity_results
pandas.DataFrame.from_dict(int_to_anatomical_entity_results, orient='index').iloc[0:50]

Unnamed: 0,anatomical_entity_count,ints_related
BLOOD,73,"[MB, CRP, CRP, ACE, DPP4, TH, TH, TH, REN, REN..."
LUNG,59,"[MB, MB, CRP, CRP, CRP, CRP, ACE, DPP4, TH, TH..."
ADIPOSE,56,"[MB, MB, CRP, CRP, CRP, CRP, DPP4, TH, REN, RE..."
BRAIN,56,"[MB, MB, CRP, CRP, CRP, CRP, ACE, DPP4, TH, RE..."
KIDNEY,55,"[MB, MB, CRP, CRP, ACE, DPP4, TH, TH, REN, AGT..."
PORTION OF SKIN,54,"[CRP, CRP, CRP, CRP, ACE, ACE, TH, TH, REN, AG..."
CARDIUM,49,"[MB, CRP, CRP, ACE, DPP4, TH, REN, AGTR1, ALB,..."
PORTION OF TISSUE,48,"[MB, CRP, CRP, ACE, DPP4, TH, REN, AGTR1, ALB,..."
FORELIMB AUTOPOD,46,"[MB, CRP, CRP, DPP4, TH, REN, AGTR1, ALB, REN,..."
BLOOD SERUM,45,"[MB, CRP, CRP, ACE, TH, REN, ALB, REN, ACE, RE..."


In agreement with the article, lung, blood, portion of skin, brain, kidney, and cardium (heart) are all top-ranking results related to the genes. The article states that “...the Bradykinin-Storm is likely to affect major organs that are regulated by angiotensin derivatives. These include altered electrolyte balance from affected kidney and heart tissue, arrhythmia in dysregulated cardiac tissue, neurological disruptions in the brain, myalgia in muscles and severe alterations in oxygen uptake in the lung itself.” It also notes that “Finally, COVID-19 patients also frequently display skin rashes including ‘covid-toe’ that appear to be related to dysfunction of the underlying vasculature.”

Of note: adipose tissue is an interesting top-ranking result. Given that obesity is a risk factor for severe COVID-19 illness, the prevalence of symptom-related genes in adipose tissue may merit further investigation.

### 3.5 Genes -> Chemical Substances <- Vasodilation

Take the gene inputs determined above (related to both COVID-19 and vasodilation/hypotension) and determine which chemical substances related to these genes are also related to vasodilation or vascular permeability.

In [29]:
genes_to_chem_to_vasodilation = predict_many(gene_inputs, [vasodilation,vasc_perm], ['ChemicalSubstance'])

The second query doesn't return any result. So BTE does not find any connection between your input and output

In [30]:
genes_to_chem_to_vasodilation_list = list(genes_to_chem_to_vasodilation["node1_name"])

In [31]:
genes_to_chem_to_vasodilation_list

['CHEBI:18059',
 'CHEBI:50249',
 'CHEBI:23888',
 'CHEBI:37527',
 'CHEBI:26195',
 'CHEBI:50114',
 'CHEBI:24433',
 'CHEBI:25741',
 'CHEBI:36357',
 'CHEBI:35224',
 'CHEBI:24870',
 'CHEBI:35222',
 'CHEBI:22315',
 'CHEBI:26519',
 '(.)NO',
 '(5Z,8Z,11Z,14Z)-N-(2-HYDROXYETHYL)-5,8,11,14-EICOSATETRAENAMIDE',
 '(N-METHYLCARBAMIMIDAMIDO)ACETATE',
 '(5Z,8Z,11Z,14Z)-5,8,11,14-ICOSATETRAENOIC ACID',
 '(O2)(.-)',
 'LEAD',
 'PHOSPHORUS MONOXIDE',
 'CA(0)',
 'CA(0)',
 '2-(3,4-DIHYDROXYPHENYL)ETHYLAMINE',
 'CARBON',
 'CARBON',
 '((AMINO(IMINO)METHYL)(METHYL)AMINO)ACETIC ACID',
 'A-101',
 'A-101',
 'A-101',
 'A-101',
 'PEROXYNITRITE',
 'ALAPAV',
 '(+)-GLUCOSE',
 '(+)-GLUCOSE',
 '(+)-GLUCOSE',
 '(+)-GLUCOSE',
 '3-(2-AMINOETHYL)-1H-INDOL-5-OL',
 '1-((5-(P-NITROPHENYL)FURFURYLIDENE)AMINO)HYDANTOIN',
 'ACQUA',
 '17BETA-HYDROXY-4-ANDROSTEN-3-ONE',
 'CHEBI:32952',
 'CHEBI:25212',
 'CHEBI:47867',
 'CHEBI:47867',
 'CHEBI:24261',
 'CHEBI:24261',
 'CHEBI:50904',
 'CHEBI:50904',
 'CHEBI:27026',
 'CHEBI:27026',
 'C

In [32]:
d = {x:genes_to_chem_to_vasodilation_list.count(x) for x in genes_to_chem_to_vasodilation_list}

In [33]:
## display counts for occurrence of each chemical in results
genes_to_chem_to_vasodilation_df = pandas.DataFrame.from_dict({k: v for k, v in sorted(d.items(), key=lambda item: item[1], reverse = True)}, orient='index').iloc[0:50]
genes_to_chem_to_vasodilation_df.columns = ["count"]
genes_to_chem_to_vasodilation_df

Unnamed: 0,count
(+)-GLUCOSE,152
(+)-ALDOSTERONE,111
(15S)-PROSTAGLANDIN E2,84
"2-(3,4-DIHYDROXYPHENYL)ETHYLAMINE",77
17BETA-HYDROXY-4-ANDROSTEN-3-ONE,74
A-101,68
3-(2-AMINOETHYL)-1H-INDOL-5-OL,68
(-)-NORADRENALINE,68
(CIS-TRANS)-RESVERATROL,59
CHEBI:35222,58


In agreement with the article, (+)-ALDOSTERONE ranks second (behind only (+)-GLUCOSE), and other top results include ANGIOTENSIN and bradykinin (ARG-PRO-PRO-GLY-PHE-SER-PRO-PHE-ARG).

## 4 Exploring COVID-19 to Hyaluronic Acid Connection

Look at genes connected to both COVID-19 and HYA, and hypothesize which pathways may be involved in HYA production.


### 4.1 COVID-19 -> Genes <- Hyaluronic Acid Explain Query

In [None]:
## Get hyaluronic acid node
HYA = ht.query('hyaluronic acid')['ChemicalSubstance'][0]
HYA

In [None]:
fc3 = FindConnection(input_obj=covid, output_obj=HYA, intermediate_nodes=['Gene'])
fc3.connect(verbose=False)
df3 = fc3.display_table_view()
df3

### 4.2 COVID/HYA Genes -> Pathways

In [None]:
## get related genes and turn them into nodes
genes_related_to_HYA = list(df3["node1_name"])
# get gene inputs through hint module
gene_inputs_2 = []
for gene in genes_related_to_HYA: 
    try: 
        gene_input = ht.query(gene)["Gene"][0]
        gene_inputs_2.append(gene_input)
    except: 
        print(gene + ' Failed')

In [None]:
## Query Genes -> Pathways
HYA_gene_to_pathways = predict_many(gene_inputs_2, ['Pathway'])

In [None]:
# Display Pathway Counts and Genes related to each pathway 
gene_to_pathway_results = {}
gene_to_pathway_genes = list(HYA_gene_to_pathways["output_name"]) # list of pathway names (one entry per result row)
gene_to_pathway_genes = list(dict.fromkeys(gene_to_pathway_genes))  # remove duplicates

for gene in gene_to_pathway_genes: 
    gene_to_pathway_results[gene] = {
        'pathway_count' : 0,
        "genes_related" : []
    }

for index, row in HYA_gene_to_pathways.iterrows():  # use the HYA result table defined above
    gene_to_pathway_results[row['output_name']]['pathway_count'] = gene_to_pathway_results[row['output_name']]['pathway_count'] + 1
    gene_to_pathway_results[row['output_name']]['genes_related'].append(row['input'])
    

gene_to_pathway_results = dict(sorted(gene_to_pathway_results.items(), key = lambda x: x[1]['pathway_count'], reverse = True))

# gene_to_pathway_results
pandas.DataFrame.from_dict(gene_to_pathway_results, orient='index').iloc[0:50]

Interestingly, cytokine signaling is the pathway most frequently associated with the genes that are related to both COVID-19 and HYA. This is consistent with a large body of research indicating that elevated cytokine concentrations correlate with severe COVID-19 cases: 


- Cao, Xuetao. "COVID-19: immunopathology and its implications for therapy." Nature reviews immunology 20.5 (2020): 269-270.

- Mangalmurti, Nilam, and Christopher A. Hunter. "Cytokine storms: understanding COVID-19." Immunity (2020).

- Wu, Dandan, and Xuexian O. Yang. "TH17 responses in cytokine storm of COVID-19: An emerging target of JAK2 inhibitor Fedratinib." Journal of Microbiology, Immunology and Infection (2020).



Additionally, past research has indicated a role for cytokines in hyaluronic acid production and degradation: 

- Sampson, Phyllis M., et al. "Cytokine regulation of human lung fibroblast hyaluronan (hyaluronic acid) production. Evidence for cytokine-regulated hyaluronan (hyaluronic acid) degradation and human lung fibroblast-derived hyaluronidase." The Journal of clinical investigation 90.4 (1992): 1492-1503.


## 5 Summary 
### 5.1 Summary


- The RAS pathway and its corresponding proteins, processes, and chemicals were highly implicated by the genes derived from the COVID -> Genes <- Vasodilation / Hypotension query

- Anatomical entities related to the genes were representative of the areas where symptoms in COVID-19 patients often occur

- Cytokine pathways may be relevant to COVID-19 symptoms other than those initially proposed (e.g., hyaluronic acid production)


### 5.2 Future Directions 

- Investigate COVID & Hyaluronic Acid connection further
