# Addressing COVID-19 Patient RAS-mediated Bradykinin Storm Hypothesis with BioThings Explorer

&emsp;

SLIDES LINK: https://docs.google.com/presentation/d/1cL0Y-2FECPP5rWlJGI_ZWsvBysZVUGU-oDq75VexFrA/edit?usp=sharing

# Table of Contents

## &emsp; 0 Imports
## &emsp; 1 Overview of Background and BTE Approach 
### &emsp; &emsp; 1.1 Article: Summary and Background 
### &emsp; &emsp; 1.2 Overview of BTE Approach 
## &emsp; 2 Determining Related Genes
### &emsp; &emsp; 2.1 Load COVID-19, Hypotension, and Vasodilation Nodes
### &emsp; &emsp; 2.2 COVID -> Genes <- Hypotension
### &emsp; &emsp; 2.3 COVID -> Genes <- Vasodilation 
### &emsp; &emsp; 2.4 Create Gene Nodes
## &emsp; 3 Analyzing and Exploring Gene Results
### &emsp; &emsp; 3.1 Genes -> Pathways
### &emsp; &emsp; 3.2 Genes -> Biological Processes 
### &emsp; &emsp; 3.3 Genes -> Chemical Substances
### &emsp; &emsp; 3.4 Genes -> Anatomical Entities
### &emsp; &emsp; 3.5 Genes -> Chemical Substances <- Vasodilation
## &emsp; 4 Exploring COVID-19 to Hyaluronic Acid Connection
### &emsp; &emsp; 4.1 COVID-19 -> Genes <- Hyaluronic Acid Explain Query
### &emsp; &emsp; 4.2 COVID/HYA Genes -> Pathways
## &emsp; 5 Summary 
### &emsp; &emsp; 5.1 Summary
### &emsp; &emsp; 5.2 Future Directions 

## 0 Imports

In [9]:
# Import pandas and BioThings Explorer modules
import pandas
from biothings_explorer.query.predict import Predict
from biothings_explorer.query.visualize import display_graph
from biothings_explorer.user_query_dispatcher import FindConnection
from biothings_explorer.hint import Hint
import nest_asyncio
nest_asyncio.apply()
%matplotlib inline
import warnings
warnings.filterwarnings("ignore") 
ht = Hint()


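## predict_many: runs one FindConnection query per input / output type (and optional intermediate type) and concatenates the results
## This functionality is to be fully incorporated into BTE soon, after which this helper will no longer be needed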
def predict_many(input_object_list, output_type_list, intermediate_node_list = ''):
    df_list = []
    for input_object in input_object_list: 
        if('name' in input_object):
            for output_type in output_type_list: 
                if(len(intermediate_node_list) > 0):
                    for inter in intermediate_node_list:
                        try: 
#                             print("Running: " + input_object['name'] + ' --> intermediate type ' + inter + ' --> output type ' + output_type )
                            fc = FindConnection(input_obj=input_object, output_obj=output_type, intermediate_nodes=[inter])
                            fc.connect(verbose=False)
                            df = fc.display_table_view()
                            rows = df.shape[0]
                            if(rows > 0):
                                df_list.append(df)
                        except Exception:
                            pass
#                             print(input_object['name'] + ' --> intermediate type ' + inter + ' --> output type ' + output_type + ' FAILED')
                else:
                    try:
#                         print("Running: " + input_object['name'] + ' --> output type ' + output_type )
                        fc = FindConnection(input_obj=input_object, output_obj=output_type, intermediate_nodes=None)
                        fc.connect(verbose=False)
                        df = fc.display_table_view()
                        rows = df.shape[0]
                        if(rows > 0):
                            df_list.append(df)
                    except Exception:
                        pass
#                         print(input_object['name'] + ' --> output type ' + output_type + ' FAILED')

    if(len(df_list) > 0):
        return pandas.concat(df_list)
    else:
        return None

## 1 Overview of Background and BTE Approach
&emsp;
### 1.1 Article: Summary and Background

Article Reference:

Garvin, Michael R., et al. "A mechanistic model and therapeutic interventions for COVID-19 involving a RAS-mediated bradykinin storm." Elife 9 (2020): e59177.


Article Link:  

https://elifesciences.org/articles/59177



Article Main Points: 

- A RAS pathway imbalance was implicated through gene expression analysis of cells in bronchoalveolar lavage fluid (BALF) from COVID-19 patients

- The RAS pathway imbalance is predicted to cause bradykinin-driven vascular dilation, vascular permeability, and hypotension

- Leaky membranes allow hyaluronic acid (HYA) to permeate into the lungs

- Analyses found that production of HYA was increased and that the enzymes that could degrade it were greatly decreased


### 1.2 Overview of BTE Approach 

As described in the article, vasodilation and hypotension are two distinctive signs in severe COVID-19 cases that are predicted to lead to leaky membranes in the lung. To examine whether these may be linked to the RAS pathway, the approach here is to find genes that are related to both COVID-19 and vasodilation or hypotension, and then to analyze these genes by looking at the pathways and biological processes they are involved in, the chemical substances they may produce, and the tissues or anatomical entities they are linked to.
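The two-hop pattern behind this approach (COVID -> Genes <- symptom) can be sketched without BTE as an order-preserving intersection of two gene lists. This is only an illustration: the helper name `shared_intermediates` and the toy gene names are hypothetical and are not part of the BioThings Explorer API.

```python
def shared_intermediates(source_genes, target_genes):
    """Return genes linked to both endpoints, in source order, without repeats."""
    target_set = set(target_genes)
    seen = set()
    shared = []
    for gene in source_genes:
        if gene in target_set and gene not in seen:
            seen.add(gene)
            shared.append(gene)
    return shared


# Hypothetical toy data, for illustration only
disease_to_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_B"]
symptom_to_genes = ["GENE_C", "GENE_B", "GENE_D"]

print(shared_intermediates(disease_to_genes, symptom_to_genes))
```

In the notebook itself this step is delegated to `FindConnection` with `intermediate_nodes=['Gene']`, which also returns the predicates and sources for each edge.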

Section 4 then briefly explores genes related to both COVID-19 and hyaluronic acid (HYA), a chemical found in the lungs of COVID-19 patients that is hyper-absorbent and forms a gelatinous substance that blocks oxygen absorption by the lungs.

## 2 Determining Related Genes
&emsp;
### 2.1 Load COVID-19, Hypotension, and Vasodilation Nodes
&emsp;


In [86]:
covid = ht.query('COVID-19')['Disease'][0]
covid

{'MONDO': 'MONDO:0100096',
 'DOID': 'DOID:0080600',
 'name': 'COVID-19',
 'primary': {'identifier': 'MONDO',
  'cls': 'Disease',
  'value': 'MONDO:0100096'},
 'display': 'MONDO(MONDO:0100096) DOID(DOID:0080600) name(COVID-19)',
 'type': 'Disease'}

In [87]:
hypotension = ht.query('hypotension')['PhenotypicFeature'][0]
hypotension

{'UMLS': 'C0020649',
 'HP': 'HP:0002615',
 'MESH': 'D007022',
 'name': 'Hypotension',
 'primary': {'identifier': 'UMLS',
  'cls': 'PhenotypicFeature',
  'value': 'C0020649'},
 'display': 'UMLS(C0020649) HP(HP:0002615) MESH(D007022) name(Hypotension)',
 'type': 'PhenotypicFeature'}

In [88]:
vasodilation = ht.query('vasodilation')['BiologicalProcess'][0]
vasodilation

{'GO': 'GO:0042311',
 'name': 'vasodilation',
 'primary': {'identifier': 'GO',
  'cls': 'BiologicalProcess',
  'value': 'GO:0042311'},
 'display': 'GO(GO:0042311) name(vasodilation)',
 'type': 'BiologicalProcess'}

### 2.2 COVID -> Genes <- Hypotension 
Use an explain query to determine genes related to both COVID-19 and hypotension.

In [4]:
fc = FindConnection(input_obj=covid, output_obj=hypotension, intermediate_nodes=['Gene'])
fc.connect(verbose=False)
df = fc.display_table_view()
df

Unnamed: 0,input,input_type,pred1,pred1_source,pred1_api,pred1_pubmed,node1_type,node1_name,node1_id,pred2,pred2_source,pred2_api,pred2_pubmed,output_type,output_name,output_id
0,2019 NOVEL CORONAVIRUS,Disease,related_to,DISEASE,DISEASES API,,Gene,AGTR1,NCBIGene:185,related_to,,BioLink API,,Gene,ARTERIAL HYPOTENSION,UMLS:C0020649
1,2019 NOVEL CORONAVIRUS,Disease,related_to,DISEASE,DISEASES API,,Gene,ALB,NCBIGene:213,related_to,,BioLink API,,Gene,ARTERIAL HYPOTENSION,UMLS:C0020649
2,2019 NOVEL CORONAVIRUS,Disease,related_to,DISEASE,DISEASES API,,Gene,REN,NCBIGene:5972,related_to,,BioLink API,,Gene,ARTERIAL HYPOTENSION,UMLS:C0020649
3,2019 NOVEL CORONAVIRUS,Disease,related_to,DISEASE,DISEASES API,,Gene,ACE,NCBIGene:1636,related_to,,BioLink API,,Gene,ARTERIAL HYPOTENSION,UMLS:C0020649


In [21]:
df.to_csv("covid_to_hypotension.csv", index = False)

### 2.3 COVID -> Genes <- Vasodilation 
Use an explain query to determine genes related to both COVID-19 and vasodilation.

In [5]:
fc2 = FindConnection(input_obj=covid, output_obj=vasodilation, intermediate_nodes=['Gene'])
fc2.connect(verbose=False)
df2 = fc2.display_table_view()
df2

Unnamed: 0,input,input_type,pred1,pred1_source,pred1_api,pred1_pubmed,node1_type,node1_name,node1_id,pred2,pred2_source,pred2_api,pred2_pubmed,output_type,output_name,output_id
0,2019 NOVEL CORONAVIRUS,Disease,related_to,DISEASE,DISEASES API,,Gene,MB,NCBIGene:4151,related_to,Translator Text Mining Provider,CORD Biological Process API,,Gene,POSITIVE REGULATION OF BLOOD VESSEL SIZE,GO:GO:0042311
1,2019 NOVEL CORONAVIRUS,Disease,related_to,DISEASE,DISEASES API,,Gene,CRP,NCBIGene:1401,related_to,Translator Text Mining Provider,CORD Biological Process API,,Gene,POSITIVE REGULATION OF BLOOD VESSEL SIZE,GO:GO:0042311
2,2019 NOVEL CORONAVIRUS,Disease,related_to,scigraph,Automat CORD19 Scigraph API,,Gene,CRP,NCBIGene:1401,related_to,Translator Text Mining Provider,CORD Biological Process API,,Gene,POSITIVE REGULATION OF BLOOD VESSEL SIZE,GO:GO:0042311
3,2019 NOVEL CORONAVIRUS,Disease,related_to,DISEASE,DISEASES API,,Gene,ACE,NCBIGene:1636,related_to,Translator Text Mining Provider,CORD Biological Process API,,Gene,POSITIVE REGULATION OF BLOOD VESSEL SIZE,GO:GO:0042311
4,2019 NOVEL CORONAVIRUS,Disease,related_to,DISEASE,DISEASES API,,Gene,DPP4,NCBIGene:1803,related_to,Translator Text Mining Provider,CORD Biological Process API,,Gene,POSITIVE REGULATION OF BLOOD VESSEL SIZE,GO:GO:0042311
5,2019 NOVEL CORONAVIRUS,Disease,related_to,scigraph,Automat CORD19 Scigraph API,,Gene,TH,NCBIGene:7054,related_to,Translator Text Mining Provider,CORD Biological Process API,,Gene,POSITIVE REGULATION OF BLOOD VESSEL SIZE,GO:GO:0042311
6,2019 NOVEL CORONAVIRUS,Disease,related_to,DISEASE,DISEASES API,,Gene,REN,NCBIGene:5972,related_to,Translator Text Mining Provider,CORD Biological Process API,,Gene,POSITIVE REGULATION OF BLOOD VESSEL SIZE,GO:GO:0042311


In [22]:
df2.to_csv("covid_to_vasodilation.csv", index = False)

### 2.4 Create Gene Nodes

In [6]:
## create Gene list
genes_related_to_syptoms = list(df["node1_name"]) + list(df2["node1_name"])
genes_related_to_syptoms

['AGTR1', 'ALB', 'REN', 'ACE', 'MB', 'CRP', 'CRP', 'ACE', 'DPP4', 'TH', 'REN']

In [10]:
# get gene inputs through hint module
gene_inputs = []
for gene in genes_related_to_syptoms: 
    try: 
        gene_input = ht.query(gene)["Gene"][0]
        gene_inputs.append(gene_input)
    except: 
        print(gene + ' Failed')
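The combined gene list above contains repeats (for example `CRP`, `ACE`, and `REN`), and each repeat becomes a separate query input, so the occurrence counts in Section 3 count those genes more than once. As an optional sketch, the list could be deduplicated while keeping its order, and the genes returned by both explain queries could be listed. The two input lists are copied from the tables in 2.2 and 2.3; the variable names are hypothetical. Note that the tables in this notebook were produced from the non-deduplicated list, so applying this step would change the counts shown.

```python
# Gene names copied from the result tables in sections 2.2 and 2.3
hypotension_genes = ['AGTR1', 'ALB', 'REN', 'ACE']
vasodilation_genes = ['MB', 'CRP', 'CRP', 'ACE', 'DPP4', 'TH', 'REN']

combined = hypotension_genes + vasodilation_genes

# dict.fromkeys keeps the first occurrence of each gene and preserves order
unique_genes = list(dict.fromkeys(combined))

# Genes returned by both the hypotension and the vasodilation query
in_both = [g for g in unique_genes
           if g in hypotension_genes and g in vasodilation_genes]

print(unique_genes)
print(in_both)
```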

## 3 Analyzing and Exploring Gene Results

Look at the determined genes (from explain queries above) to analyze what pathways, biological processes, chemical substances, and anatomical entities the genes are related to.

### 3.1 Genes -> Pathways

Look at pathways related to the genes, and then display the top pathway occurrences in the results (and which genes are related to each pathway).


In [11]:
gene_to_pathways = predict_many(gene_inputs, ['Pathway'])

In [12]:
# pathways
gene_to_pathway_results = {}
gene_to_pathway_genes = list(gene_to_pathways["output_name"]) # pathway names, one per result row
gene_to_pathway_genes = list(dict.fromkeys(gene_to_pathway_genes))  # remove duplicates, keeping order

for gene in gene_to_pathway_genes: 
    gene_to_pathway_results[gene] = {
        'pathway_count' : 0,
        "genes_related" : []
    }

for index, row in gene_to_pathways.iterrows():
    gene_to_pathway_results[row['output_name']]['pathway_count'] = gene_to_pathway_results[row['output_name']]['pathway_count'] + 1
    gene_to_pathway_results[row['output_name']]['genes_related'].append(row['input'])
    

gene_to_pathway_results = dict(sorted(gene_to_pathway_results.items(), key = lambda x: x[1]['pathway_count'], reverse = True))

# gene_to_pathway_results
pandas.DataFrame.from_dict(gene_to_pathway_results, orient='index').iloc[0:50]

Unnamed: 0,pathway_count,genes_related
METABOLISM OF PROTEINS,6,"[ALB, REN, ACE, ACE, DPP4, REN]"
ACE INHIBITOR PATHWAY,5,"[AGTR1, REN, ACE, ACE, REN]"
PEPTIDE HORMONE METABOLISM,5,"[REN, ACE, ACE, DPP4, REN]"
METABOLISM OF ANGIOTENSINOGEN TO ANGIOTENSINS,4,"[REN, ACE, ACE, REN]"
SELENIUM MICRONUTRIENT NETWORK,3,"[ALB, CRP, CRP]"
VITAMIN B12 METABOLISM,3,"[ALB, CRP, CRP]"
FOLATE METABOLISM,3,"[ALB, CRP, CRP]"
HUMAN COMPLEMENT SYSTEM,3,"[ALB, CRP, CRP]"
VESICLE-MEDIATED TRANSPORT,2,"[AGTR1, ALB]"
METABOLISM,2,"[ALB, TH]"


Result table interpretation: In agreement with the argument made in the article, the top-ranking results "ACE Inhibitor Pathway" and "Metabolism of Angiotensinogen to Angiotensins" are both components of the RAS pathway.
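The tally used in this section (and repeated in 3.2 to 3.4) groups result rows by output name and ranks the outputs by how many rows mention them. A compact sketch of the same idea with `collections.defaultdict` is shown below. The function name `tally_by_output` is hypothetical, and the sample rows are a hand-picked subset based on the table above, not actual query output.

```python
from collections import defaultdict


def tally_by_output(rows):
    """Group (input_name, output_name) rows by output and rank by row count."""
    grouped = defaultdict(list)
    for input_name, output_name in rows:
        grouped[output_name].append(input_name)
    # sorted() is stable, so outputs with equal counts keep first-seen order
    ranked = sorted(grouped.items(), key=lambda item: len(item[1]), reverse=True)
    return [(output, len(inputs), inputs) for output, inputs in ranked]


# Hand-picked sample rows (gene, pathway), for illustration only
rows = [
    ("AGTR1", "ACE INHIBITOR PATHWAY"),
    ("REN", "ACE INHIBITOR PATHWAY"),
    ("ACE", "ACE INHIBITOR PATHWAY"),
    ("REN", "METABOLISM OF ANGIOTENSINOGEN TO ANGIOTENSINS"),
    ("ACE", "METABOLISM OF ANGIOTENSINOGEN TO ANGIOTENSINS"),
    ("ALB", "VESICLE-MEDIATED TRANSPORT"),
]

for output, count, inputs in tally_by_output(rows):
    print(output, count, inputs)
```

With a results DataFrame, the rows could be produced with `zip(df["input"], df["output_name"])`, assuming those column names match the table view used in this notebook.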

### 3.2 Genes -> Biological Processes

Look at biological processes related to the genes, and then display the top biological process occurrences in the results (and which genes are related to each biological process).


In [13]:
# bioprocesss
gene_to_bioprocesses = predict_many(gene_inputs, ['BiologicalProcess'])
gene_to_bioprocess_results = {}
gene_to_bioprocess_genes = list(gene_to_bioprocesses["output_name"]) # biological process names, one per result row
gene_to_bioprocess_genes = list(dict.fromkeys(gene_to_bioprocess_genes))  # remove duplicates, keeping order

for gene in gene_to_bioprocess_genes: 
    gene_to_bioprocess_results[gene] = {
        'bioprocess_count' : 0,
        "genes_related" : []
    }

for index, row in gene_to_bioprocesses.iterrows():
    gene_to_bioprocess_results[row['output_name']]['bioprocess_count'] = gene_to_bioprocess_results[row['output_name']]['bioprocess_count'] + 1
    gene_to_bioprocess_results[row['output_name']]['genes_related'].append(row['input'])

In [89]:
## extra step needed to analyze biological processes because many are returned as a UMLS ID instead of a name

gene_to_bioprocess_results = dict(sorted(gene_to_bioprocess_results.items(), key = lambda x: x[1]['bioprocess_count'], reverse = True))
# Build a new dict instead of mutating the one being iterated over; this also keeps the sorted order
resolved_bioprocess_results = {}
for counter, (key, value) in enumerate(gene_to_bioprocess_results.items()):
    name = key
    # only look up the top 100 entries, and only keys that look like UMLS IDs
    if counter < 100 and (('C0' in key) or ('C1' in key)):
        try:
            name = ht.query(key)['BiologicalProcess'][0]['name']
        except Exception:
            pass  # keep the UMLS ID when no name can be resolved
    resolved_bioprocess_results[name] = value
gene_to_bioprocess_results = resolved_bioprocess_results
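The `'C0' in key` / `'C1' in key` test above is a substring check, so it also matches ordinary names that happen to contain those characters. A stricter check, sketched below, relies on the fact that a UMLS concept identifier (CUI) is the letter `C` followed by seven digits. It assumes the keys are bare CUIs with no prefix such as `UMLS:`; the helper name `looks_like_cui` and the second test string are hypothetical.

```python
import re

# A UMLS CUI is 'C' followed by exactly seven digits, e.g. C0020649
CUI_PATTERN = re.compile(r"C\d{7}")


def looks_like_cui(key):
    """Return True only when the whole key is a bare UMLS CUI."""
    return CUI_PATTERN.fullmatch(key) is not None


print(looks_like_cui("C0020649"))        # the Hypotension CUI from section 2.1
print(looks_like_cui("NPC1 TRANSPORT"))  # hypothetical name that contains 'C1'
```

Using such a check would avoid sending already-readable names back through `ht.query`.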

In [16]:
# gene_to_bioprocess_results
pandas.DataFrame.from_dict(gene_to_bioprocess_results, orient='index').iloc[0:50]

Unnamed: 0,bioprocess_count,genes_related
Growth,25,"[AGTR1, AGTR1, ALB, ALB, ALB, REN, REN, REN, A..."
enzyme activity,25,"[ALB, ALB, ALB, REN, REN, REN, REN, REN, REN, ..."
Apoptosis,22,"[AGTR1, AGTR1, AGTR1, ALB, ALB, ALB, ALB, REN,..."
Up-Regulation (Physiology),21,"[AGTR1, AGTR1, ALB, ALB, REN, REN, REN, ACE, A..."
"Transcription, Genetic",20,"[AGTR1, ALB, ALB, REN, REN, REN, REN, ACE, ACE..."
Signal Transduction,18,"[AGTR1, AGTR1, AGTR1, ALB, ALB, REN, REN, REN,..."
Water consumption,17,"[AGTR1, REN, REN, REN, REN, REN, REN, ACE, ACE..."
Cell Proliferation,17,"[AGTR1, AGTR1, ALB, ALB, ALB, REN, REN, ACE, A..."
renin activity,17,"[AGTR1, REN, REN, REN, REN, REN, REN, ACE, ACE..."
Down-Regulation,17,"[AGTR1, ALB, ALB, REN, REN, ACE, MB, CRP, CRP,..."


Result table interpretation: In agreement with the argument made in the article, the top-ranking results "renin activity," "Angiogenic Process," and "ANGIOTENSIN MATURATIONs" are components of the RAS pathway and its biological processes.

### 3.3 Genes -> Chemical Substances

Look at chemical substances related to the genes, and then display the top chemical substance occurrences in the results (and which genes are related to each chemical substance).

In [17]:
# chemical_substances
gene_to_chemical_substance = predict_many(gene_inputs, ['ChemicalSubstance'])
gene_to_chemical_substance_results = {}
gene_to_chemical_substance_genes = list(gene_to_chemical_substance["output_name"]) # chemical substance names, one per result row
gene_to_chemical_substance_genes = list(dict.fromkeys(gene_to_chemical_substance_genes))  # remove duplicates, keeping order

for gene in gene_to_chemical_substance_genes: 
    gene_to_chemical_substance_results[gene] = {
        'chemical_substance_count' : 0,
        "genes_related" : []
    }

for index, row in gene_to_chemical_substance.iterrows():
    gene_to_chemical_substance_results[row['output_name']]['chemical_substance_count'] = gene_to_chemical_substance_results[row['output_name']]['chemical_substance_count'] + 1
    gene_to_chemical_substance_results[row['output_name']]['genes_related'].append(row['input'])

In [23]:
gene_to_chemical_substance_results = dict(sorted(gene_to_chemical_substance_results.items(), key = lambda x: x[1]['chemical_substance_count'], reverse = True))
pandas.DataFrame.from_dict(gene_to_chemical_substance_results, orient='index').iloc[0:20]

Unnamed: 0,chemical_substance_count,genes_related
(+)-ALDOSTERONE,45,"[AGTR1, AGTR1, AGTR1, AGTR1, AGTR1, REN, REN, ..."
(+)-GLUCOSE,45,"[AGTR1, AGTR1, ALB, ALB, ALB, ALB, ALB, REN, R..."
(2-BUTYL-4-CHLORO-1-{[2'-(1H-TETRAZOL-5-YL)BIPHENYL-4-YL]METHYL}-1H-IMIDAZOL-5-YL)METHANOL,35,"[AGTR1, AGTR1, AGTR1, AGTR1, AGTR1, AGTR1, AGT..."
EDRF,32,"[AGTR1, AGTR1, AGTR1, ALB, REN, REN, REN, REN,..."
"(2S-(1(R*(R*)),2ALPHA,3ABETA,6ABETA))-1-(2-((1-(ETHOXYCARBONYL)-3-PHENYLPROPYL)AMINO)-1-OXOPROPYL)OCTAHYDROCYCLOPENTA(B)PYRROLE-2-CARBOXYLIC ACID",32,"[AGTR1, ALB, REN, REN, REN, ACE, ACE, ACE, ACE..."
PHARMACEUTICAL PREPARATIONS,31,"[AGTR1, ALB, ALB, ALB, ALB, ALB, REN, REN, REN..."
AMIAS,30,"[AGTR1, AGTR1, AGTR1, AGTR1, AGTR1, AGTR1, AGT..."
10% SODIUM CHLORIDE INJECTION,30,"[AGTR1, AGTR1, ALB, ALB, ALB, ALB, ALB, REN, R..."
ALISKIREN,29,"[AGTR1, REN, REN, REN, REN, REN, REN, REN, REN..."
POTASSIUM,28,"[ALB, ALB, REN, REN, REN, REN, REN, REN, REN, ..."


In agreement with the article, (+)-ALDOSTERONE (the top-occurring result) is a critical component of the RAS pathway, as are ANGIOTENSIN CONVERTING ENZYME INHIBITORS.


### 3.4 Genes -> Anatomical Entities

Look at anatomical entities related to the genes, and then display the top anatomical entity occurrences in the results (and which genes are related to each anatomical entity).

In [19]:
ints_to_anatomical_entity = predict_many(gene_inputs, ['AnatomicalEntity'])
list(dict.fromkeys(list(ints_to_anatomical_entity["output_name"])))
# anatomical_entity
int_to_anatomical_entity_results = {}
int_to_anatomical_entity_ints = list(ints_to_anatomical_entity["output_name"]) # anatomical entity names, one per result row
int_to_anatomical_entity_ints = list(dict.fromkeys(int_to_anatomical_entity_ints))  # remove duplicates, keeping order

for entity in int_to_anatomical_entity_ints:  # 'entity' avoids shadowing the built-in int
    int_to_anatomical_entity_results[entity] = {
        'anatomical_entity_count' : 0,
        "ints_related" : []
    }

for index, row in ints_to_anatomical_entity.iterrows():
    int_to_anatomical_entity_results[row['output_name']]['anatomical_entity_count'] = int_to_anatomical_entity_results[row['output_name']]['anatomical_entity_count'] + 1
    int_to_anatomical_entity_results[row['output_name']]['ints_related'].append(row['input'])
    

int_to_anatomical_entity_results = dict(sorted(int_to_anatomical_entity_results.items(), key = lambda x: x[1]['anatomical_entity_count'], reverse = True))

    
# int_to_anatomical_entity_results
pandas.DataFrame.from_dict(int_to_anatomical_entity_results, orient='index').iloc[0:50]

Unnamed: 0,anatomical_entity_count,ints_related
LUNG,17,"[AGTR1, ALB, REN, REN, ACE, MB, MB, CRP, CRP, ..."
BLOOD,16,"[AGTR1, ALB, ALB, REN, REN, ACE, MB, CRP, CRP,..."
PORTION OF SKIN,15,"[AGTR1, ALB, ALB, REN, ACE, ACE, CRP, CRP, CRP..."
BRAIN,14,"[AGTR1, ALB, REN, ACE, MB, MB, CRP, CRP, CRP, ..."
ADIPOSE,14,"[ALB, ALB, REN, REN, MB, MB, CRP, CRP, CRP, CR..."
KIDNEY,13,"[AGTR1, ALB, REN, ACE, MB, MB, CRP, CRP, ACE, ..."
PORTION OF TISSUE,11,"[AGTR1, ALB, REN, ACE, MB, CRP, CRP, ACE, DPP4..."
CARDIUM,11,"[AGTR1, ALB, REN, ACE, MB, CRP, CRP, ACE, DPP4..."
MATERIAL ANATOMICAL ENTITY,10,"[AGTR1, REN, ACE, MB, CRP, CRP, ACE, DPP4, TH,..."
IECUR,10,"[ALB, REN, ACE, MB, CRP, CRP, ACE, DPP4, TH, REN]"


These results agree with the article, which states “...the Bradykinin-Storm is likely to affect major organs that are regulated by angiotensin derivatives. These include altered electrolyte balance from affected kidney and heart tissue, arrhythmia in dysregulated cardiac tissue, neurological disruptions in the brain, myalgia in muscles and severe alterations in oxygen uptake in the lung itself.” and “Finally, COVID-19 patients also frequently display skin rashes including ‘covid-toe’ that appear to be related to dysfunction of the underlying vasculature.” Lung, blood, portion of skin, brain, kidney, and cardium (heart) are all top-ranking anatomical entities related to the genes.

Of note: adipose tissue is an interesting top-ranking result. Obesity is a risk factor for severe COVID-19 illness, so the prevalence of symptom-related genes in adipose tissue may be worth investigating.

### 3.5 Genes -> Chemical Substances <- Vasodilation

Starting from the gene inputs determined above (related to COVID-19 and to vasodilation or hypotension), determine the chemical substances that are related to these genes and are also related to vasodilation.

In [90]:
genes_to_chem_to_vasodilation = predict_many(gene_inputs, [vasodilation], ['ChemicalSubstance'])

In [91]:
genes_to_chem_to_vasodilation_list = list(genes_to_chem_to_vasodilation["node1_name"])

In [102]:
d = {x: genes_to_chem_to_vasodilation_list.count(x) for x in genes_to_chem_to_vasodilation_list}  # count occurrences in the list of chemical names, not the DataFrame

In [107]:
## display counts for occurrence of each chemical in results
genes_to_chem_to_vasodilation_df = pandas.DataFrame.from_dict({k: v for k, v in sorted(d.items(), key=lambda item: item[1], reverse = True)}, orient='index').iloc[0:20]
genes_to_chem_to_vasodilation_df.columns = ["count"]
genes_to_chem_to_vasodilation_df

Unnamed: 0,count
(+)-ALDOSTERONE,45
(+)-GLUCOSE,45
(-)-NORADRENALINE,25
"2-(3,4-DIHYDROXYPHENYL)ETHYLAMINE",25
3-(2-AMINOETHYL)-1H-INDOL-5-OL,24
17BETA-HYDROXY-4-ANDROSTEN-3-ONE,22
(15S)-PROSTAGLANDIN E2,22
(-)-CAPTOPRIL,22
LEAD,15
CHEBI:35222,14


In agreement with the article, (+)-ALDOSTERONE ranks highest, and other top results include ANGIOTENSIN and bradykinin (ARG-PRO-PRO-GLY-PHE-SER-PRO-PHE-ARG).

## 4 Exploring COVID-19 to Hyaluronic Acid Connection

Look at genes connected to both COVID-19 and HYA, and hypothesize what pathways may be involved in HYA production.


### 4.1 COVID-19 -> Genes <- Hyaluronic Acid Explain Query

In [20]:
## Get Hyaluronic Acid node
HYA = ht.query('hyaluronic acid')['ChemicalSubstance'][0]
HYA

{'CHEBI': 'CHEBI:16336',
 'name': 'hyaluronic acid',
 'CAS': '9004-61-9',
 'formula': '(C14H21NO12)n',
 'primary': {'identifier': 'CHEBI',
  'cls': 'ChemicalSubstance',
  'value': 'CHEBI:16336'},
 'display': 'CHEBI(CHEBI:16336) name(hyaluronic acid) CAS(9004-61-9) formula((C14H21NO12)n)',
 'type': 'ChemicalSubstance'}

In [81]:
fc3 = FindConnection(input_obj=covid, output_obj=HYA, intermediate_nodes=['Gene'])
fc3.connect(verbose=False)
df3 = fc3.display_table_view()
df3

Unnamed: 0,input,input_type,pred1,pred1_source,pred1_api,pred1_pubmed,node1_type,node1_name,node1_id,pred2,pred2_source,pred2_api,pred2_pubmed,output_type,output_name,output_id
0,2019 NOVEL CORONAVIRUS,Disease,related_to,DISEASE,DISEASES API,,Gene,ALB,NCBIGene:213,physically_interacts_with,SEMMED,SEMMED Chemical API,7152761,Gene,ACIDE HYALURONIQUE,name:ACIDE HYALURONIQUE
1,2019 NOVEL CORONAVIRUS,Disease,related_to,DISEASE,DISEASES API,,Gene,ALB,NCBIGene:213,related_to,CTD,CTD API,16642209,Gene,ACIDE HYALURONIQUE,name:ACIDE HYALURONIQUE
2,2019 NOVEL CORONAVIRUS,Disease,related_to,DISEASE,DISEASES API,,Gene,IL2RA,NCBIGene:3559,physically_interacts_with,SEMMED,SEMMED Chemical API,12090468,Gene,ACIDE HYALURONIQUE,name:ACIDE HYALURONIQUE
3,2019 NOVEL CORONAVIRUS,Disease,related_to,DISEASE,DISEASES API,,Gene,TNF,NCBIGene:7124,physically_interacts_with,SEMMED,SEMMED Chemical API,"20601239,2171539,23765644,23903893,27837681,85...",Gene,ACIDE HYALURONIQUE,name:ACIDE HYALURONIQUE
4,2019 NOVEL CORONAVIRUS,Disease,related_to,DISEASE,DISEASES API,,Gene,TNF,NCBIGene:7124,positively_regulates,SEMMED,SEMMED Chemical API,11341374240584138514850,Gene,ACIDE HYALURONIQUE,name:ACIDE HYALURONIQUE
5,2019 NOVEL CORONAVIRUS,Disease,related_to,DISEASE,DISEASES API,,Gene,TNF,NCBIGene:7124,negatively_regulated_by,SEMMED,SEMMED Chemical API,1401082,Gene,ACIDE HYALURONIQUE,name:ACIDE HYALURONIQUE
6,2019 NOVEL CORONAVIRUS,Disease,related_to,DISEASE,DISEASES API,,Gene,TNF,NCBIGene:7124,coexists_with,SEMMED,SEMMED Chemical API,106160011766632127027581,Gene,ACIDE HYALURONIQUE,name:ACIDE HYALURONIQUE
7,2019 NOVEL CORONAVIRUS,Disease,related_to,DISEASE,DISEASES API,,Gene,TNF,NCBIGene:7124,produced_by,SEMMED,SEMMED Chemical API,1401082,Gene,ACIDE HYALURONIQUE,name:ACIDE HYALURONIQUE
8,2019 NOVEL CORONAVIRUS,Disease,related_to,DISEASE,DISEASES API,,Gene,TNF,NCBIGene:7124,negatively_regulates,SEMMED,SEMMED Chemical API,20398644,Gene,ACIDE HYALURONIQUE,name:ACIDE HYALURONIQUE
9,2019 NOVEL CORONAVIRUS,Disease,related_to,DISEASE,DISEASES API,,Gene,TNF,NCBIGene:7124,positively_regulated_by,SEMMED,SEMMED Chemical API,1401082,Gene,ACIDE HYALURONIQUE,name:ACIDE HYALURONIQUE
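In the rows shown above, the same gene appears once per predicate, so TNF accounts for most of the table. A small sketch that groups the distinct predicates by gene makes this easier to see. The `(gene, predicate)` pairs are copied from the ten rows shown; the variable names are hypothetical.

```python
from collections import defaultdict

# (node1_name, pred2) pairs copied from the ten result rows shown above
rows = [
    ("ALB", "physically_interacts_with"),
    ("ALB", "related_to"),
    ("IL2RA", "physically_interacts_with"),
    ("TNF", "physically_interacts_with"),
    ("TNF", "positively_regulates"),
    ("TNF", "negatively_regulated_by"),
    ("TNF", "coexists_with"),
    ("TNF", "produced_by"),
    ("TNF", "negatively_regulates"),
    ("TNF", "positively_regulated_by"),
]

predicates_by_gene = defaultdict(list)
for gene, predicate in rows:
    # keep each predicate once per gene, in first-seen order
    if predicate not in predicates_by_gene[gene]:
        predicates_by_gene[gene].append(predicate)

for gene, predicates in predicates_by_gene.items():
    print(gene, len(predicates), predicates)
```

Because a gene list built from `node1_name` repeats a gene once per row, genes with many predicates, such as TNF here, are also counted many times in any tally built from that list.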


### 4.2 COVID/HYA Genes -> Pathways

In [108]:
## get related genes and turn them into nodes
genes_related_to_HYA = list(df3["node1_name"])
# get gene inputs through hint module
gene_inputs_2 = []
for gene in genes_related_to_HYA: 
    try: 
        gene_input = ht.query(gene)["Gene"][0]
        gene_inputs_2.append(gene_input)
    except: 
        print(gene + ' Failed')

In [112]:
## Query Genes -> Pathways
HYA_gene_to_pathways = predict_many(gene_inputs_2, ['Pathway'])

In [113]:
# Display Pathway Counts and Genes related to each pathway 
gene_to_pathway_results = {}
gene_to_pathway_genes = list(HYA_gene_to_pathways["output_name"]) # pathway names, one per result row
gene_to_pathway_genes = list(dict.fromkeys(gene_to_pathway_genes))  # remove duplicates, keeping order

for gene in gene_to_pathway_genes: 
    gene_to_pathway_results[gene] = {
        'pathway_count' : 0,
        "genes_related" : []
    }

for index, row in HYA_gene_to_pathways.iterrows():
    gene_to_pathway_results[row['output_name']]['pathway_count'] = gene_to_pathway_results[row['output_name']]['pathway_count'] + 1
    gene_to_pathway_results[row['output_name']]['genes_related'].append(row['input'])
    

gene_to_pathway_results = dict(sorted(gene_to_pathway_results.items(), key = lambda x: x[1]['pathway_count'], reverse = True))

# gene_to_pathway_results
pandas.DataFrame.from_dict(gene_to_pathway_results, orient='index').iloc[0:50]

Unnamed: 0,pathway_count,genes_related
CYTOKINE SIGNALING IN IMMUNE SYSTEM,21,"[IL2RA, TNF, TNF, TNF, TNF, TNF, TNF, TNF, TNF..."
IMMUNE SYSTEM,21,"[IL2RA, TNF, TNF, TNF, TNF, TNF, TNF, TNF, TNF..."
SIGNALING BY INTERLEUKINS,19,"[IL2RA, TNF, TNF, TNF, TNF, TNF, TNF, TNF, TNF..."
SELENIUM MICRONUTRIENT NETWORK,18,"[ALB, ALB, TNF, TNF, TNF, TNF, TNF, TNF, TNF, ..."
VITAMIN B12 METABOLISM,18,"[ALB, ALB, TNF, TNF, TNF, TNF, TNF, TNF, TNF, ..."
FOLATE METABOLISM,18,"[ALB, ALB, TNF, TNF, TNF, TNF, TNF, TNF, TNF, ..."
REGULATION OF TOLL-LIKE RECEPTOR SIGNALING PATHWAY,18,"[TNF, TNF, TNF, TNF, TNF, TNF, TNF, TNF, TNF, ..."
LTF DANGER SIGNAL RESPONSE PATHWAY,18,"[TNF, TNF, TNF, TNF, TNF, TNF, TNF, TNF, TNF, ..."
TOLL-LIKE RECEPTOR SIGNALING PATHWAY,18,"[TNF, TNF, TNF, TNF, TNF, TNF, TNF, TNF, TNF, ..."
SIGNAL TRANSDUCTION,17,"[IL2RA, TNF, TNF, TNF, TNF, TNF, TNF, TNF, TNF..."


Interestingly, cytokine signaling is the pathway most often related to the genes that are linked to both COVID-19 and HYA. This may be consistent with a large body of research indicating that elevated cytokine concentrations correlate with severe COVID-19 cases:


- Cao, Xuetao. "COVID-19: immunopathology and its implications for therapy." Nature reviews immunology 20.5 (2020): 269-270.

- Mangalmurti, Nilam, and Christopher A. Hunter. "Cytokine storms: understanding COVID-19." Immunity (2020).

- Wu, Dandan, and Xuexian O. Yang. "TH17 responses in cytokine storm of COVID-19: An emerging target of JAK2 inhibitor Fedratinib." Journal of Microbiology, Immunology and Infection (2020).



Additionally, past research has indicated a role for cytokines in hyaluronic acid production and degradation:

- Sampson, Phyllis M., et al. "Cytokine regulation of human lung fibroblast hyaluronan (hyaluronic acid) production. Evidence for cytokine-regulated hyaluronan (hyaluronic acid) degradation and human lung fibroblast-derived hyaluronidase." The Journal of clinical investigation 90.4 (1992): 1492-1503.


## 5 Summary 
### 5.1 Summary


- The RAS pathway and its corresponding proteins, pathways, processes, and chemicals were highly implicated through genes derived from the COVID -> Genes <- Vasodilation / Hypotension queries

- Anatomical entities related to the genes were very representative of areas where symptoms in COVID-19 patients often occur

- Cytokine pathways may be relevant to COVID-19 symptoms other than those initially proposed, such as hyaluronic acid production


### 5.2 Future Directions 

- Investigate COVID & Hyaluronic Acid connection further
