# Addressing the RAS-mediated Bradykinin Storm Hypothesis in COVID-19 Patients with BioThings Explorer

&emsp;

SLIDES LINK: https://docs.google.com/presentation/d/1cL0Y-2FECPP5rWlJGI_ZWsvBysZVUGU-oDq75VexFrA/edit?usp=sharing

# Table of Contents

## &emsp; 0 Imports
## &emsp; 1 Overview of Background and BTE Approach 
### &emsp; &emsp; 1.1 Article: Summary and Background 
### &emsp; &emsp; 1.2 Overview of BTE Approach 
## &emsp; 2 Determining Related Genes
### &emsp; &emsp; 2.1 Load COVID-19, Hypotension, and Vasodilation Nodes
### &emsp; &emsp; 2.2 COVID -> Genes <- Hypotension
### &emsp; &emsp; 2.3 COVID -> Genes <- Vasodilation 
### &emsp; &emsp; 2.4 Create Gene Nodes
## &emsp; 3 Analyzing and Exploring Gene Results
### &emsp; &emsp; 3.1 Genes -> Pathways
### &emsp; &emsp; 3.2 Genes -> Biological Processes 
### &emsp; &emsp; 3.3 Genes -> Chemical Substances
### &emsp; &emsp; 3.4 Genes -> Anatomical Entities
### &emsp; &emsp; 3.5 Genes -> Chemical Substances <- Vasodilation
## &emsp; 4 Exploring COVID-19 to Hyaluronic Acid Connection
### &emsp; &emsp; 4.1 COVID-19 -> Genes <- Hyaluronic Acid Explain Query
### &emsp; &emsp; 4.2 COVID/HYA Genes -> Pathways
## &emsp; 5 Summary 
### &emsp; &emsp; 5.1 Summary
### &emsp; &emsp; 5.2 Future Directions 

## 0 Imports

In [1]:
# Import pandas and the BioThings Explorer (BTE) modules
import pandas
from biothings_explorer.query.predict import Predict
from biothings_explorer.query.visualize import display_graph
from biothings_explorer.user_query_dispatcher import FindConnection
from biothings_explorer.hint import Hint
import nest_asyncio
nest_asyncio.apply()
%matplotlib inline
import warnings
warnings.filterwarnings("ignore") 
ht = Hint()
import math


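## predict_many: helper that runs one query per (input, intermediate type, output) combination.
## This functionality is to be fully incorporated into BTE soon, after which this helper will no longer be needed.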
def predict_many(input_object_list, output_type_list, intermediate_node_list = ''):
    df_list = []
    for input_object in input_object_list: 
        if('name' in input_object):
            for output_type in output_type_list: 
                if(len(intermediate_node_list) > 0):
                    for inter in intermediate_node_list:
                        try: 
#                             print("Running: " + input_object['name'] + ' --> intermediate type ' + inter + ' --> output type ' + output_type )
                            fc = FindConnection(input_obj=input_object, output_obj=output_type, intermediate_nodes=[inter])
                            fc.connect(verbose=False)
                            df = fc.display_table_view()
                            rows = df.shape[0]
                            if(rows > 0):
                                df_list.append(df)
                        except Exception:
                            # Skip failed queries so one bad API call does not abort the batch
                            pass
#                             print(input_object['name'] + ' --> intermediate type ' + inter + ' --> output type ' + output_type + ' FAILED')
                else:
                    try:
#                         print("Running: " + input_object['name'] + ' --> output type ' + output_type )
                        fc = FindConnection(input_obj=input_object, output_obj=output_type, intermediate_nodes=None)
                        fc.connect(verbose=False)
                        df = fc.display_table_view()
                        rows = df.shape[0]
                        if(rows > 0):
                            df_list.append(df)
                    except Exception:
                        # Skip failed queries so one bad API call does not abort the batch
                        pass
#                         print(input_object['name'] + ' --> output type ' + output_type + ' FAILED')

    if(len(df_list) > 0):
        return pandas.concat(df_list)
    else:
        return None
    
    

def create_gene_inputs(gene_list):
    # get gene inputs through hint module
    gene_inputs = []
    for gene in gene_list: 
        try: 
            gene_input = ht.query(gene)["Gene"][0]
            gene_inputs.append(gene_input)
        except Exception:
            # Report symbols the Hint module could not resolve and keep going
            print(gene + ' Failed')
    return gene_inputs

from tqdm.autonotebook import tqdm


## 1 Overview of Background and BTE Approach
&emsp;
### 1.1 Article: Summary and Background

Article Reference:

Garvin, Michael R., et al. "A mechanistic model and therapeutic interventions for COVID-19 involving a RAS-mediated bradykinin storm." Elife 9 (2020): e59177.


Article Link:  

https://elifesciences.org/articles/59177



Article Main Points: 

- An imbalance in the renin-angiotensin system (RAS) pathway is implicated by gene expression analysis of cells in bronchoalveolar lavage fluid (BALF) from COVID-19 patients

- The RAS pathway imbalance is predicted to cause bradykinin-driven vascular dilation, vascular permeability, and hypotension

- Leaky membranes allow hyaluronic acid (HYA) to permeate into the lungs

- Analyses found that production of HYA was increased and that the enzymes that could degrade it were greatly decreased


### 1.2 Overview of BTE Approach 

As described in the article, vasodilation and hypotension are two distinctive signs of severe COVID-19 that are predicted to lead to leaky membranes in the lung. To explore whether these signs may be linked to the RAS pathway, this notebook first finds genes that are related both to COVID-19 and to vasodilation or hypotension. It then analyzes these genes by looking at which pathways or processes they may be involved in, which chemical substances they may produce, and which tissues or anatomical entities they are linked to.

Starting in Section 4, the notebook also briefly explores genes related to both COVID-19 and hyaluronic acid (HYA). HYA is a chemical found in the lungs of COVID-19 patients that is hyper-absorbent and forms a gelatinous substance that blocks oxygen absorption by the lungs.
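
The "COVID -> Genes <- Hypotension" pattern amounts to keeping the genes that appear in two separate one-hop results. A minimal sketch of that idea, using only the standard library, is shown here. The gene labels and the `shared_genes` helper are placeholders for illustration, not BTE output or part of the BTE API.

```python
# Toy stand-ins for two one-hop query results; the labels are placeholders, not BTE output.
covid_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_B"]
symptom_genes = ["GENE_C", "GENE_D", "GENE_B"]


def shared_genes(left, right):
    """Return the genes present in both lists, in first-seen order from `left`."""
    right_set = set(right)
    # dict.fromkeys removes duplicates from `left` while preserving order
    return [gene for gene in dict.fromkeys(left) if gene in right_set]


shared = shared_genes(covid_genes, symptom_genes)
print(shared)
```

In the notebook itself, BTE performs this step when a query is given both an input node and an output node; the sketch only shows the underlying set logic.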

## 2 Determining Related Genes
&emsp;
### 2.1 Load COVID-19, Hypotension, and Vasodilation Nodes
&emsp;


In [2]:
covid = ht.query('COVID-19')['Disease'][0]
covid

{'MONDO': 'MONDO:0100096',
 'DOID': 'DOID:0080600',
 'name': 'COVID-19',
 'primary': {'identifier': 'MONDO',
  'cls': 'Disease',
  'value': 'MONDO:0100096'},
 'display': 'MONDO(MONDO:0100096) DOID(DOID:0080600) name(COVID-19)',
 'type': 'Disease'}

In [3]:
SARS = ht.query("severe acute respiratory syndrome")["Disease"][0]
SARS

{'MONDO': 'MONDO:0005091',
 'DOID': 'DOID:2945',
 'UMLS': 'C1175175',
 'name': 'severe acute respiratory syndrome',
 'MESH': 'D045169',
 'ORPHANET': '140896',
 'primary': {'identifier': 'MONDO',
  'cls': 'Disease',
  'value': 'MONDO:0005091'},
 'display': 'MONDO(MONDO:0005091) DOID(DOID:2945) ORPHANET(140896) UMLS(C1175175) MESH(D045169) name(severe acute respiratory syndrome)',
 'type': 'Disease'}

In [4]:
ACE2 = ht.query("ACE2")["Gene"][0]
ACE2

{'NCBIGene': '59272',
 'name': 'angiotensin I converting enzyme 2',
 'SYMBOL': 'ACE2',
 'UMLS': 'C1422064',
 'HGNC': '13557',
 'UNIPROTKB': 'Q9BYF1',
 'ENSEMBL': 'ENSG00000130234',
 'primary': {'identifier': 'NCBIGene', 'cls': 'Gene', 'value': '59272'},
 'display': 'NCBIGene(59272) ENSEMBL(ENSG00000130234) HGNC(13557) UMLS(C1422064) UNIPROTKB(Q9BYF1) SYMBOL(ACE2)',
 'type': 'Gene'}

In [5]:
fc1 = FindConnection(input_obj=covid, output_obj='Gene', intermediate_nodes=None)
fc1.connect(verbose=False)
df1 = fc1.display_table_view()

In [6]:
df1

Unnamed: 0,input,input_type,pred1,pred1_source,pred1_api,pred1_pubmed,output_type,output_name,output_id
0,2019 NOVEL CORONAVIRUS,Disease,related_to,DISEASE,DISEASES API,,Gene,CD4,NCBIGene:920
1,2019 NOVEL CORONAVIRUS,Disease,related_to,DISEASE,DISEASES API,,Gene,CCL2,NCBIGene:6347
2,2019 NOVEL CORONAVIRUS,Disease,related_to,DISEASE,DISEASES API,,Gene,IL2,NCBIGene:3558
3,2019 NOVEL CORONAVIRUS,Disease,related_to,DISEASE,DISEASES API,,Gene,IFNG,NCBIGene:3458
4,2019 NOVEL CORONAVIRUS,Disease,related_to,DISEASE,DISEASES API,,Gene,SH2D3A,NCBIGene:10045
5,2019 NOVEL CORONAVIRUS,Disease,related_to,DISEASE,DISEASES API,,Gene,CRP,NCBIGene:1401
6,2019 NOVEL CORONAVIRUS,Disease,related_to,scigraph,Automat CORD19 Scigraph API,,Gene,CRP,NCBIGene:1401
7,2019 NOVEL CORONAVIRUS,Disease,related_to,DISEASE,DISEASES API,,Gene,SPECC1,NCBIGene:92521
8,2019 NOVEL CORONAVIRUS,Disease,related_to,DISEASE,DISEASES API,,Gene,IL1B,NCBIGene:3553
9,2019 NOVEL CORONAVIRUS,Disease,related_to,DISEASE,DISEASES API,,Gene,IL7,NCBIGene:3574


In [7]:
query_config = {
    "annotate": ["nodeDegree"],
    "expand": False
}
## way to filter at the gene step for top x number of genes? 
pd2 = Predict(
    input_objs=[covid],
    intermediate_nodes =['Gene'], 
    output_types =['Gene'], 
    config= query_config 
)
pd2.connect(verbose=True)


Your query have 1 input nodes, including COVID-19 .... And BTE will find paths that connect your input nodes to your output types ['Gene']. Paths will contain 1 intermediate nodes.

Intermediate node #1 will have these type constraints: Gene


==== Step #1: Query Path Planning ====

Input Types: Disease
Output Types: Gene
Predicates: None

BTE found 8 APIs based on SmartAPI Meta-KG.

API 1. Automat CORD19 Scibite API (1 API calls)
API 2. Automat CORD19 Scigraph API (1 API calls)
API 3. MGIgene2phenotype API (1 API calls)
API 4. Automat PHAROS API (1 API calls)
API 5. DISEASES API (1 API calls)
API 6. BioLink API (2 API calls)
API 7. Molecular Data Provider API (1 API calls)
API 8. CORD Disease API (1 API calls)


==== Step #2: Query path execution ====

NOTE: API requests are dispatched in parallel, so the list of APIs below is ordered by query time.

API 3.1: https://biothings.ncats.io/mgigene2phenotype/query?fields=_id&size=300 (POST -d q=DOID:0080600&scopes=mgi.associated_with_dise

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/biothings_explorer/call_apis/__init__.py", line 302, in callSingleAPI
    result = tf.transform()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/biothings_explorer/call_apis/api_response_transform/__init__.py", line 49, in transform
    return self.tf.transform()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/biothings_explorer/call_apis/api_response_transform/transformers/base_transformer.py", line 78, in transform
    item = self.wrap(item)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/biothings_explorer/call_apis/api_response_transform/transformers/cord_transformer.py", line 16, in wrap
    for k in item.keys():
RuntimeError: dictionary keys changed during iteration


API call to CORD Gene API with input ['11947', '17697', '8568', '5414', '1706', '9208', '2367', '5438', '10637', '399', '3541', '79', '6008', '2537', '1678', '16884', '5417', '4989', '6025', '6001', '11183', '13523', '2707', '27960', '4113', '6741', '6023', '2524', '30615', '9958', '3009', '5981', '11782', '1116', '9281', '5962', '338', '10618', '11876', '16885', '6018', '336', '10627', '5992', '6915', '11892', '195', '3535', '27954', '19679', '5434', '6898', '13557', '4552'] failed with unknown response
API 2.5: https://api.monarchinitiative.org/api/bioentity/gene/NCBIGene:28/interactions?direct=True&rows=200&unselect_evidence=True
API 2.5 BioLink API: 13 hits
API 4.1: https://biothings.ncats.io/semmedgene/query?fields=affected_by (POST -d q=C1540024,C1413716,C1333916,C0919550,C1335232,C1334124,C0035094,C1412186,C0879590,C1413931,C1416946,C1412332,C1426329,C1413244,C1332714,C0081714,C1332690,C0054871,C1415274,C1334122,C1336641,C1366571,C1332807,C1334085,C1334098,C1419338,C1439284,C182

API 2.22 BioLink API: 199 hits
API 2.23: https://api.monarchinitiative.org/api/bioentity/gene/NCBIGene:3627/interactions?direct=True&rows=200&unselect_evidence=True
API 2.23: https://api.monarchinitiative.org/api/bioentity/gene/NCBIGene:213/interactions?direct=True&rows=200&unselect_evidence=True
API 2.23 BioLink API: 199 hits
API 2.24 BioLink API: 197 hits
API 4.7 SEMMED Gene API: 4459 hits
API 4.8 SEMMED Gene API: 6219 hits
API 2.25: https://api.monarchinitiative.org/api/bioentity/gene/NCBIGene:10332/interactions?direct=True&rows=200&unselect_evidence=True
API 2.25 BioLink API: 15 hits
API 4.9: https://biothings.ncats.io/semmedgene/query?fields=related_to (POST -d q=C1540024,C1413716,C1333916,C0919550,C1335232,C1334124,C0035094,C1412186,C0879590,C1413931,C1416946,C1412332,C1426329,C1413244,C1332714,C0081714,C1332690,C0054871,C1415274,C1334122,C1336641,C1366571,C1332807,C1334085,C1334098,C1419338,C1439284,C1823096,C1539099,C1337092,C1420718,C1334112,C1332036,C1334114,C1335240,C1417035

API 2.64 BioLink API: 0 hits
API 2.65: https://api.monarchinitiative.org/api/bioentity/gene/NCBIGene:10045/interactions?direct=True&rows=200&unselect_evidence=True
API 2.65 BioLink API: 0 hits
API 2.66: https://api.monarchinitiative.org/api/bioentity/gene/NCBIGene:92521/interactions?direct=True&rows=200&unselect_evidence=True
API 2.66 BioLink API: 0 hits
API 2.67: https://api.monarchinitiative.org/api/bioentity/gene/NCBIGene:3558/interactions?direct=True&rows=200&unselect_evidence=True
API 2.67: https://api.monarchinitiative.org/api/bioentity/gene/NCBIGene:3458/interactions?direct=True&rows=200&unselect_evidence=True
API 2.67 BioLink API: 0 hits
API 2.68: https://api.monarchinitiative.org/api/bioentity/gene/NCBIGene:6347/interactions?direct=True&rows=200&unselect_evidence=True
API 2.68 BioLink API: 0 hits
API 2.69 BioLink API: 0 hits
API 2.70: https://api.monarchinitiative.org/api/bioentity/gene/NCBIGene:5972/interactions?direct=True&rows=200&unselect_evidence=True
API 2.70 BioLink API
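
The `RuntimeError: dictionary keys changed during iteration` in the traceback above is the error Python raises when a dictionary's keys are replaced while a loop is iterating over them. The log does not show what the library code does inside that loop, so the following is only a generic sketch of the usual remedy: iterate over a snapshot of the keys. The `rename_keys_safely` function and the `:` to `_` renaming are hypothetical and are not taken from `biothings_explorer`.

```python
def rename_keys_safely(item):
    """Replace ':' with '_' in every key, iterating over a snapshot of the keys."""
    # list(...) copies the keys up front, so mutating `item` inside the loop is safe
    for key in list(item.keys()):
        new_key = key.replace(":", "_")
        if new_key != key:
            item[new_key] = item.pop(key)
    return item


record = {"a:b": 1, "c:d": 2, "plain": 3}
renamed = rename_keys_safely(record)
print(renamed)
```

Because the failure happens inside an API response transformer, the affected API call is skipped and the remaining APIs still return results, as the rest of the log shows.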

In [8]:
df2 = pd2.display_table_view(extra_fields=["nodeDegree"]).sort_values(by="output_degree")
df2 = df2[df2["output_degree"] > 20]
genes_related_to_cov_w_int = list(dict.fromkeys(list(df2["output_label"])))
genes_related_to_cov_w_int = [x for x in genes_related_to_cov_w_int if not x.startswith('UMLS')]
genes_related_to_cov_w_int

['TLR7',
 'CCL20',
 'CD79A',
 'LBR',
 'ITIH4',
 'SP1',
 'IFNB1',
 'PTH',
 'SGSM3',
 'STAT6',
 'PIK3CB',
 'MYD88',
 'PTPN11',
 'CD14',
 'MPO',
 'CD44',
 'SERPINE1',
 'HSPA4',
 'PPARA',
 'CDKN1A',
 'IL9',
 'IRF3',
 'TNFRSF10B',
 'ERCC8',
 'ESR1',
 'SOCS3',
 'TNFSF11',
 'TRBV20OR9-2',
 'CSH1',
 'MAPK3',
 'HSPA9',
 'CD2',
 'TLR3',
 'ICAM1',
 'TAT',
 'PLAT',
 'HLA-E',
 'EGF',
 'TFRC',
 'ACE',
 'ADA',
 'SYK',
 'GORASP1',
 'AGTR1',
 'ERVK-10',
 'PTPRC',
 'IL3',
 'CD69',
 'ISG20',
 'CREB1',
 'SYT1',
 'CCL3',
 'AHSA1',
 'LEP',
 'IGHE',
 'POLDIP2',
 'CD80',
 'PIK3CA',
 'ALB',
 'CXCR4',
 'POMC',
 'GRAP2',
 'ITGAM',
 'FAS',
 'CA2',
 'MMP2',
 'APP',
 'CISH',
 'IL2RA',
 'MMP9',
 'CD28',
 'CCR5',
 'TP53',
 'CAT',
 'EGFR',
 'RELA',
 'ATM',
 'STAT5A',
 'CD40',
 'TLR4',
 'INS',
 'CCL5',
 'CCL2',
 'FOS',
 'ANG',
 'IL10',
 'REN',
 'IL6',
 'STAT3',
 'MAPK8',
 'CRK',
 'CD4',
 'CAMP',
 'IFNA1',
 'IL2',
 'TGFB1',
 'AKT1',
 'VEGFA',
 'MAPK1',
 'TNF']

In [10]:
len(genes_related_to_cov_w_int)

100
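
The filtering in cell `In [8]` combines three steps: keep rows whose output node degree exceeds 20, remove duplicate labels while preserving order, and drop labels that are still raw UMLS identifiers. A small standard-library sketch of the same logic on toy rows follows. The rows, the `filter_labels` helper, and the placeholder labels are assumptions for illustration only.

```python
# Toy rows mimicking the two columns used in the notebook cell; values are placeholders.
rows = [
    {"output_label": "GENE_A", "output_degree": 35},
    {"output_label": "UMLS:PLACEHOLDER", "output_degree": 50},
    {"output_label": "GENE_B", "output_degree": 12},
    {"output_label": "GENE_A", "output_degree": 35},
    {"output_label": "GENE_C", "output_degree": 21},
]


def filter_labels(rows, min_degree=20):
    """Sort by degree, keep rows above the threshold, dedupe, and drop UMLS-only labels."""
    ranked = sorted(rows, key=lambda r: r["output_degree"])
    kept = [r["output_label"] for r in ranked if r["output_degree"] > min_degree]
    unique = list(dict.fromkeys(kept))  # order-preserving de-duplication
    return [label for label in unique if not label.startswith("UMLS")]


labels = filter_labels(rows)
print(labels)
```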

In [11]:
# for x in fc1.fc.G['COVID-19'].values():
#     print(x[0]['info'])

## create Gene list
genes_related_to_covid = list(dict.fromkeys(list(df1["output_name"]) + genes_related_to_cov_w_int))
covid_gene_inputs = create_gene_inputs(genes_related_to_covid)
covid_gene_inputs

TMEM27 Failed


[{'NCBIGene': '920',
  'name': 'CD4 molecule',
  'SYMBOL': 'CD4',
  'UMLS': 'C1332714',
  'HGNC': '1678',
  'UNIPROTKB': 'P01730',
  'ENSEMBL': 'ENSG00000010610',
  'primary': {'identifier': 'NCBIGene', 'cls': 'Gene', 'value': '920'},
  'display': 'NCBIGene(920) ENSEMBL(ENSG00000010610) HGNC(1678) UMLS(C1332714) UNIPROTKB(P01730) SYMBOL(CD4)',
  'type': 'Gene'},
 {'NCBIGene': '6347',
  'name': 'C-C motif chemokine ligand 2',
  'SYMBOL': 'CCL2',
  'UMLS': 'C1337092',
  'HGNC': '10618',
  'UNIPROTKB': 'P13500',
  'ENSEMBL': 'ENSG00000108691',
  'primary': {'identifier': 'NCBIGene', 'cls': 'Gene', 'value': '6347'},
  'display': 'NCBIGene(6347) ENSEMBL(ENSG00000108691) HGNC(10618) UMLS(C1337092) UNIPROTKB(P13500) SYMBOL(CCL2)',
  'type': 'Gene'},
 {'NCBIGene': '3558',
  'name': 'interleukin 2',
  'SYMBOL': 'IL2',
  'UMLS': 'C0879590',
  'HGNC': '6001',
  'UNIPROTKB': 'P60568',
  'ENSEMBL': 'ENSG00000109471',
  'primary': {'identifier': 'NCBIGene', 'cls': 'Gene', 'value': '3558'},
  'displa

In [12]:
genes_related_to_AE = predict_many(covid_gene_inputs,[ACE2],['AnatomicalEntity'])

The first query doesn't return any result. So BTE does not find any connection between your input and output


In [13]:
genes_related_to_AE

Unnamed: 0,input,input_type,pred1,pred1_source,pred1_api,pred1_pubmed,node1_type,node1_name,node1_id,pred2,pred2_source,pred2_api,pred2_pubmed,output_type,output_name,output_id
0,CD4,Gene,related_to,,BioLink API,,AnatomicalEntity,EPITHELIUM OF LACTIFEROUS GLAND,UBERON:UBERON:0003244,related_to,,BioLink API,,AnatomicalEntity,ACE2,NCBIGene:59272
1,CD4,Gene,related_to,Translator Text Mining Provider,CORD Gene API,,AnatomicalEntity,ENDOMETRIUM,UBERON:UBERON:0001295,related_to,,BioLink API,,AnatomicalEntity,ACE2,NCBIGene:59272
2,CD4,Gene,related_to,Translator Text Mining Provider,CORD Gene API,,AnatomicalEntity,LIEN,UBERON:UBERON:0002106,related_to,,BioLink API,,AnatomicalEntity,ACE2,NCBIGene:59272
3,CD4,Gene,related_to,Translator Text Mining Provider,CORD Gene API,,AnatomicalEntity,BRAIN,UBERON:UBERON:0000955,related_to,,BioLink API,,AnatomicalEntity,ACE2,NCBIGene:59272
4,CD4,Gene,related_to,,BioLink API,,AnatomicalEntity,COLUMNAR EPITHELIUM OF THE FALLOPIAN TUBE,UBERON:UBERON:0004804,related_to,,BioLink API,,AnatomicalEntity,ACE2,NCBIGene:59272
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1,MAPK1,Gene,related_to,Translator Text Mining Provider,CORD Gene API,,AnatomicalEntity,ANATOMICAL UNIT,UBERON:UBERON:0000062,related_to,Translator Text Mining Provider,CORD Gene API,,AnatomicalEntity,ACE2,NCBIGene:59272
2,MAPK1,Gene,related_to,Translator Text Mining Provider,CORD Gene API,,AnatomicalEntity,PORTION OF TISSUE,UBERON:UBERON:0000479,related_to,Translator Text Mining Provider,CORD Gene API,,AnatomicalEntity,ACE2,NCBIGene:59272
3,MAPK1,Gene,related_to,Translator Text Mining Provider,CORD Gene API,,AnatomicalEntity,CARDIUM,UBERON:UBERON:0000948,related_to,Translator Text Mining Provider,CORD Gene API,,AnatomicalEntity,ACE2,NCBIGene:59272
4,MAPK1,Gene,related_to,Translator Text Mining Provider,CORD Gene API,,AnatomicalEntity,BLOOD SERUM,UBERON:UBERON:0001977,related_to,Translator Text Mining Provider,CORD Gene API,,AnatomicalEntity,ACE2,NCBIGene:59272


In [14]:
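# Count how many Gene -> AnatomicalEntity -> ACE2 paths each input gene appears in
gene_list_2 = list(genes_related_to_AE["input"])
d = {x: gene_list_2.count(x) for x in gene_list_2}
# Rank genes by path count (descending) and keep the top 50
d_sorted = dict(sorted(d.items(), key=lambda item: item[1], reverse=True))
d2_gene_df = pandas.DataFrame.from_dict(d_sorted, orient='index').iloc[0:50]
d2_gene_df.columns = ["count"]
d2_gene_df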


Unnamed: 0,count
ACE2,31
ACE,17
CRP,16
TNF,16
REN,15
TH,15
CISH,15
IL2RA,13
IL10,13
AGTR1,13
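
The count table above is a frequency ranking of the `input` column. As a hedged alternative to the dictionary comprehension in cell `In [14]`, the same ranking can be sketched with `collections.Counter` from the standard library. The input list below is toy data with placeholder labels, and the threshold of 2 is chosen only to suit the toy data.

```python
from collections import Counter

# Toy stand-in for the "input" column of a two-hop result table; labels are placeholders.
path_inputs = ["GENE_A", "GENE_B", "GENE_A", "GENE_C", "GENE_B", "GENE_A"]

counts = Counter(path_inputs)
# most_common() returns (label, count) pairs sorted by count, highest first
ranked = counts.most_common()
# Keep only labels that meet a minimum count, as the notebook does with count >= 5
selected = [gene for gene, n in ranked if n >= 2]
print(ranked)
print(selected)
```

`Counter` makes a single pass over the list, whereas calling `list.count` once per element rescans the list each time, which matters for large result tables.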


In [15]:
d2_genes = list(d2_gene_df[d2_gene_df["count"] >= 5].index)
d2_genes

['ACE2',
 'ACE',
 'CRP',
 'TNF',
 'REN',
 'TH',
 'CISH',
 'IL2RA',
 'IL10',
 'AGTR1',
 'INS',
 'CD4',
 'POMC',
 'CA2',
 'CD28',
 'CCL2',
 'IFNA1',
 'F3',
 'IL6',
 'PTH',
 'LEP',
 'ALB',
 'GPT',
 'CD2',
 'ISG20',
 'GRAP2',
 'FAS',
 'CCR5',
 'CAT',
 'EGFR',
 'AKT1',
 'AGTR2',
 'MB',
 'CCL20',
 'ICAM1',
 'PIK3CA',
 'CXCL8',
 'IL17A',
 'DPP4',
 'CD44',
 'IL9',
 'ESR1',
 'EGF',
 'CD80',
 'TP53',
 'TLR4',
 'FOS',
 'ANG',
 'F2',
 'SARS2']

In [16]:
v2_gene_inputs = create_gene_inputs(d2_genes)
v2_gene_inputs

[{'NCBIGene': '59272',
  'name': 'angiotensin I converting enzyme 2',
  'SYMBOL': 'ACE2',
  'UMLS': 'C1422064',
  'HGNC': '13557',
  'UNIPROTKB': 'Q9BYF1',
  'ENSEMBL': 'ENSG00000130234',
  'primary': {'identifier': 'NCBIGene', 'cls': 'Gene', 'value': '59272'},
  'display': 'NCBIGene(59272) ENSEMBL(ENSG00000130234) HGNC(13557) UMLS(C1422064) UNIPROTKB(Q9BYF1) SYMBOL(ACE2)',
  'type': 'Gene'},
 {'NCBIGene': '1636',
  'name': 'angiotensin I converting enzyme',
  'SYMBOL': 'ACE',
  'UMLS': 'C1413931',
  'HGNC': '2707',
  'UNIPROTKB': 'P12821',
  'ENSEMBL': 'ENSG00000159640',
  'primary': {'identifier': 'NCBIGene', 'cls': 'Gene', 'value': '1636'},
  'display': 'NCBIGene(1636) ENSEMBL(ENSG00000159640) HGNC(2707) UMLS(C1413931) UNIPROTKB(P12821) SYMBOL(ACE)',
  'type': 'Gene'},
 {'NCBIGene': '1401',
  'name': 'C-reactive protein',
  'SYMBOL': 'CRP',
  'UMLS': 'C1413716',
  'HGNC': '2367',
  'UNIPROTKB': 'P02741',
  'ENSEMBL': 'ENSG00000132693',
  'primary': {'identifier': 'NCBIGene', 'cls': 

In [17]:
genes_related_to_phens = predict_many(v2_gene_inputs,[SARS,covid],['BiologicalProcess','PhenotypicFeature'])
genes_related_to_phens

The first query doesn't return any result. So BTE does not find any connection between your input and output
The second query doesn't return any result. So BTE does not find any connection between your input and output
The first query doesn't return any result. So BTE does not find any connection between your input and output
The second query doesn't return any result. So BTE does not find any connection between your input and output
The second query doesn't return any result. So BTE does not find any connection between your input and output
The first query doesn't return any result. So BTE does not find any connection between your input and output
The second query doesn't return any result. So BTE does not find any connection between your input and output
The first query doesn't return any result. So BTE does not find any connection between your input and output
The second query doesn't return any result. So BTE does not find any connection between your input and output
The second que

The first query doesn't return any result. So BTE does not find any connection between your input and output
The first query doesn't return any result. So BTE does not find any connection between your input and output
The second query doesn't return any result. So BTE does not find any connection between your input and output
The first query doesn't return any result. So BTE does not find any connection between your input and output
The first query doesn't return any result. So BTE does not find any connection between your input and output
The second query doesn't return any result. So BTE does not find any connection between your input and output
The first query doesn't return any result. So BTE does not find any connection between your input and output
The second query doesn't return any result. So BTE does not find any connection between your input and output
The second query doesn't return any result. So BTE does not find any connection between your input and output
The first query

Unnamed: 0,input,input_type,pred1,pred1_source,pred1_api,pred1_pubmed,node1_type,node1_name,node1_id,pred2,pred2_source,pred2_api,pred2_pubmed,output_type,output_name,output_id
0,ACE2,Gene,affects,SEMMED,SEMMED Gene API,15825152,BiologicalProcess,C0007613,UMLS:C0007613,affects,SEMMED,SEMMED Disease API,19364769,BiologicalProcess,ACUTE RESPIRATORY CORONAVIRUS INFECTION,MONDO:MONDO:0005091
1,ACE2,Gene,disrupts,SEMMED,SEMMED Gene API,27806985,BiologicalProcess,C0007613,UMLS:C0007613,affects,SEMMED,SEMMED Disease API,19364769,BiologicalProcess,ACUTE RESPIRATORY CORONAVIRUS INFECTION,MONDO:MONDO:0005091
2,ACE2,Gene,affects,SEMMED,SEMMED Gene API,23013041,BiologicalProcess,C0017262,UMLS:C0017262,affects,SEMMED,SEMMED Disease API,1256595412650527127499043208739,BiologicalProcess,ACUTE RESPIRATORY CORONAVIRUS INFECTION,MONDO:MONDO:0005091
3,ACE2,Gene,affects,SEMMED,SEMMED Gene API,23013041,BiologicalProcess,C0017262,UMLS:C0017262,related_to,SEMMED,SEMMED Disease API,8049444,BiologicalProcess,ACUTE RESPIRATORY CORONAVIRUS INFECTION,MONDO:MONDO:0005091
4,ACE2,Gene,affects,SEMMED,SEMMED Gene API,23013041,BiologicalProcess,C0017262,UMLS:C0017262,affected_by,SEMMED,SEMMED Disease API,16174304,BiologicalProcess,ACUTE RESPIRATORY CORONAVIRUS INFECTION,MONDO:MONDO:0005091
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10,SARS2,Gene,related_to,Translator Text Mining Provider,CORD Gene API,,BiologicalProcess,MEMBRANE EVAGINATION,GO:GO:0006900,related_to,Translator Text Mining Provider,CORD Disease API,,BiologicalProcess,ACUTE RESPIRATORY CORONAVIRUS INFECTION,MONDO:MONDO:0005091
11,SARS2,Gene,related_to,Translator Text Mining Provider,CORD Gene API,,BiologicalProcess,HYPERSENSITIVITY,GO:GO:0002524,related_to,Translator Text Mining Provider,CORD Disease API,,BiologicalProcess,ACUTE RESPIRATORY CORONAVIRUS INFECTION,MONDO:MONDO:0005091
12,SARS2,Gene,related_to,Translator Text Mining Provider,CORD Gene API,,BiologicalProcess,LACTATION,GO:GO:0007595,related_to,Translator Text Mining Provider,CORD Disease API,,BiologicalProcess,ACUTE RESPIRATORY CORONAVIRUS INFECTION,MONDO:MONDO:0005091
0,SARS2,Gene,related_to,,BioLink API,,PhenotypicFeature,DIABETES MELLITUS,UMLS:C0011849,related_to,,BioLink API,,PhenotypicFeature,ACUTE RESPIRATORY CORONAVIRUS INFECTION,MONDO:MONDO:0005091


In [18]:
genes_related_to_phens[genes_related_to_phens["input"] == "ACE"]

Unnamed: 0,input,input_type,pred1,pred1_source,pred1_api,pred1_pubmed,node1_type,node1_name,node1_id,pred2,pred2_source,pred2_api,pred2_pubmed,output_type,output_name,output_id
0,ACE,Gene,affects,SEMMED,SEMMED Gene API,12570788,BiologicalProcess,C0007587,UMLS:C0007587,affects,SEMMED,SEMMED Disease API,17714515,BiologicalProcess,ACUTE RESPIRATORY CORONAVIRUS INFECTION,MONDO:MONDO:0005091
1,ACE,Gene,causes,SEMMED,SEMMED Gene API,27310436,BiologicalProcess,C0007587,UMLS:C0007587,affects,SEMMED,SEMMED Disease API,17714515,BiologicalProcess,ACUTE RESPIRATORY CORONAVIRUS INFECTION,MONDO:MONDO:0005091
2,ACE,Gene,affects,SEMMED,SEMMED Gene API,21349701,BiologicalProcess,C0007595,UMLS:C0007595,affects,SEMMED,SEMMED Disease API,16615058,BiologicalProcess,ACUTE RESPIRATORY CORONAVIRUS INFECTION,MONDO:MONDO:0005091
3,ACE,Gene,affects,SEMMED,SEMMED Gene API,179211619389791,BiologicalProcess,C0015895,UMLS:C0015895,affects,SEMMED,SEMMED Disease API,12409129,BiologicalProcess,ACUTE RESPIRATORY CORONAVIRUS INFECTION,MONDO:MONDO:0005091
4,ACE,Gene,disrupts,SEMMED,SEMMED Gene API,9482924,BiologicalProcess,C0015895,UMLS:C0015895,affects,SEMMED,SEMMED Disease API,12409129,BiologicalProcess,ACUTE RESPIRATORY CORONAVIRUS INFECTION,MONDO:MONDO:0005091
5,ACE,Gene,affects,SEMMED,SEMMED Gene API,179211619389791,BiologicalProcess,C0015895,UMLS:C0015895,related_to,SEMMED,SEMMED Disease API,115567701212620812383464,BiologicalProcess,ACUTE RESPIRATORY CORONAVIRUS INFECTION,MONDO:MONDO:0005091
6,ACE,Gene,disrupts,SEMMED,SEMMED Gene API,9482924,BiologicalProcess,C0015895,UMLS:C0015895,related_to,SEMMED,SEMMED Disease API,115567701212620812383464,BiologicalProcess,ACUTE RESPIRATORY CORONAVIRUS INFECTION,MONDO:MONDO:0005091
7,ACE,Gene,affects,SEMMED,SEMMED Gene API,179211619389791,BiologicalProcess,C0015895,UMLS:C0015895,affected_by,SEMMED,SEMMED Disease API,12409129,BiologicalProcess,ACUTE RESPIRATORY CORONAVIRUS INFECTION,MONDO:MONDO:0005091
8,ACE,Gene,disrupts,SEMMED,SEMMED Gene API,9482924,BiologicalProcess,C0015895,UMLS:C0015895,affected_by,SEMMED,SEMMED Disease API,12409129,BiologicalProcess,ACUTE RESPIRATORY CORONAVIRUS INFECTION,MONDO:MONDO:0005091
9,ACE,Gene,affects,SEMMED,SEMMED Gene API,26724739,BiologicalProcess,C0017262,UMLS:C0017262,affects,SEMMED,SEMMED Disease API,1256595412650527127499043208739,BiologicalProcess,ACUTE RESPIRATORY CORONAVIRUS INFECTION,MONDO:MONDO:0005091


In [19]:
v3_genes = list(dict.fromkeys(list(genes_related_to_phens["input"])))
v3_gene_inputs = create_gene_inputs(v3_genes)
v3_gene_inputs

[{'NCBIGene': '59272',
  'name': 'angiotensin I converting enzyme 2',
  'SYMBOL': 'ACE2',
  'UMLS': 'C1422064',
  'HGNC': '13557',
  'UNIPROTKB': 'Q9BYF1',
  'ENSEMBL': 'ENSG00000130234',
  'primary': {'identifier': 'NCBIGene', 'cls': 'Gene', 'value': '59272'},
  'display': 'NCBIGene(59272) ENSEMBL(ENSG00000130234) HGNC(13557) UMLS(C1422064) UNIPROTKB(Q9BYF1) SYMBOL(ACE2)',
  'type': 'Gene'},
 {'NCBIGene': '1636',
  'name': 'angiotensin I converting enzyme',
  'SYMBOL': 'ACE',
  'UMLS': 'C1413931',
  'HGNC': '2707',
  'UNIPROTKB': 'P12821',
  'ENSEMBL': 'ENSG00000159640',
  'primary': {'identifier': 'NCBIGene', 'cls': 'Gene', 'value': '1636'},
  'display': 'NCBIGene(1636) ENSEMBL(ENSG00000159640) HGNC(2707) UMLS(C1413931) UNIPROTKB(P12821) SYMBOL(ACE)',
  'type': 'Gene'},
 {'NCBIGene': '1401',
  'name': 'C-reactive protein',
  'SYMBOL': 'CRP',
  'UMLS': 'C1413716',
  'HGNC': '2367',
  'UNIPROTKB': 'P02741',
  'ENSEMBL': 'ENSG00000132693',
  'primary': {'identifier': 'NCBIGene', 'cls': 

In [40]:
len(v3_genes)
v3_genes

['ACE2',
 'ACE',
 'CRP',
 'TNF',
 'REN',
 'TH',
 'CISH',
 'IL2RA',
 'IL10',
 'AGTR1',
 'INS',
 'CD4',
 'POMC',
 'CA2',
 'CD28',
 'CCL2',
 'IFNA1',
 'F3',
 'IL6',
 'PTH',
 'LEP',
 'ALB',
 'GPT',
 'CD2',
 'ISG20',
 'GRAP2',
 'FAS',
 'CCR5',
 'CAT',
 'EGFR',
 'AKT1',
 'AGTR2',
 'MB',
 'CCL20',
 'ICAM1',
 'PIK3CA',
 'CXCL8',
 'IL17A',
 'DPP4',
 'CD44',
 'IL9',
 'ESR1',
 'EGF',
 'CD80',
 'TP53',
 'TLR4',
 'FOS',
 'ANG',
 'F2',
 'SARS2']

In [20]:
genes_to_pathways = predict_many(v3_gene_inputs,['Pathway'])
genes_to_pathways

Unnamed: 0,input,input_type,pred1,pred1_source,pred1_api,pred1_pubmed,output_type,output_name,output_id
0,ACE2,Gene,functional_association,CPDB,MyGene.info API,,Pathway,R-HSA-9678108,REACT:R-HSA-9678108
1,ACE2,Gene,functional_association,CPDB,MyGene.info API,,Pathway,R-HSA-9678110,REACT:R-HSA-9678110
2,ACE2,Gene,functional_association,CPDB,MyGene.info API,,Pathway,R-HSA-9679191,REACT:R-HSA-9679191
3,ACE2,Gene,functional_association,CPDB,MyGene.info API,,Pathway,R-HSA-9679506,REACT:R-HSA-9679506
5,ACE2,Gene,functional_association,CPDB,MyGene.info API,,Pathway,R-HSA-9694516,REACT:R-HSA-9694516
...,...,...,...,...,...,...,...,...,...
0,SARS2,Gene,functional_association,CPDB,MyGene.info API,,Pathway,TRNA AMINOACYLATION,REACT:R-HSA-379724
1,SARS2,Gene,functional_association,CPDB,MyGene.info API,,Pathway,MITOCHONDRIAL TRNA AMINOACYLATION,REACT:R-HSA-379726
2,SARS2,Gene,functional_association,CPDB,MyGene.info API,,Pathway,METABOLISM OF PROTEINS,REACT:R-HSA-392499
3,SARS2,Gene,functional_association,CPDB,MyGene.info API,,Pathway,TRANSLATION,REACT:R-HSA-72766


In [21]:
# list(dict.fromkeys(list(genes_to_pathways["output_id"])))
genes_to_pathways = genes_to_pathways[genes_to_pathways["output_id"].str.contains('WIKIPATHWAYS')]

In [22]:
gene_list_3 = list(genes_to_pathways["input"])
gene_to_path_count_dict = {x:gene_list_3.count(x) for x in gene_list_3}
gene_to_path_count_dict

{'ACE2': 1,
 'ACE': 1,
 'CRP': 6,
 'TNF': 53,
 'REN': 1,
 'TH': 8,
 'CISH': 9,
 'IL2RA': 8,
 'IL10': 10,
 'AGTR1': 5,
 'INS': 16,
 'CD4': 5,
 'POMC': 3,
 'CD28': 6,
 'CCL2': 22,
 'IFNA1': 5,
 'F3': 5,
 'IL6': 42,
 'PTH': 4,
 'LEP': 10,
 'ALB': 4,
 'GPT': 2,
 'ISG20': 1,
 'GRAP2': 2,
 'FAS': 18,
 'CCR5': 6,
 'CAT': 6,
 'EGFR': 40,
 'AKT1': 102,
 'AGTR2': 3,
 'MB': 1,
 'CCL20': 5,
 'ICAM1': 15,
 'PIK3CA': 51,
 'CXCL8': 26,
 'IL17A': 5,
 'CD44': 3,
 'IL9': 3,
 'ESR1': 14,
 'EGF': 24,
 'CD80': 9,
 'TP53': 52,
 'TLR4': 19,
 'FOS': 45,
 'F2': 9,
 'SARS2': 1}

In [41]:
# Rank pathways by the input genes they contain
gene_to_pathway_results = {}
gene_to_pathway_genes = list(genes_to_pathways["output_name"])  # pathway names, one per result row
gene_to_pathway_genes = list(dict.fromkeys(gene_to_pathway_genes))  # remove duplicates, keep order

for pathway in gene_to_pathway_genes:
    gene_to_pathway_results[pathway] = {
        'genes_related_count': 0,
        "genes_related": [],
        "weighted_score": 0
    }

# Collect the input genes linked to each pathway
for index, row in genes_to_pathways.iterrows():
    gene_to_pathway_results[row['output_name']]['genes_related'].append(row['input'])

# Score each pathway once: every related gene contributes 1/sqrt(n), where n is the
# number of pathway rows for that gene, so broadly connected genes weigh less
for result in gene_to_pathway_results.values():
    result['genes_related_count'] = len(set(result['genes_related']))
    result['weighted_score'] = sum(1 / math.sqrt(gene_to_path_count_dict[x]) for x in result['genes_related'])
    # alternative weighting: 1 / gene_to_path_count_dict[x]

gene_to_pathway_results = dict(sorted(gene_to_pathway_results.items(), key=lambda x: x[1]['weighted_score'], reverse=True))

# Show the 50 top-scoring pathways
pandas.DataFrame.from_dict(gene_to_pathway_results, orient='index').iloc[0:50]

Unnamed: 0,genes_related_count,genes_related,weighted_score
ACE INHIBITOR PATHWAY,5,"[ACE2, ACE, REN, AGTR1, AGTR2]",4.024564
T-CELL ANTIGEN RECEPTOR (TCR) SIGNALING PATHWAY,10,"[CD4, CD28, IL6, GRAP2, FAS, CCR5, AKT1, IL17A...",3.633472
ALLOGRAFT REJECTION,9,"[TNF, IL2RA, IL10, AGTR1, CD28, FAS, CXCL8, IL...",2.874969
FOLATE METABOLISM,10,"[CRP, TNF, INS, CCL2, IL6, ALB, CAT, ICAM1, TP...",2.801568
INTERLEUKIN-4 AND INTERLEUKIN-13 SIGNALING,11,"[TNF, IL10, POMC, CCL2, IL6, AKT1, ICAM1, CXCL...",2.686732
SELENIUM MICRONUTRIENT NETWORK,9,"[CRP, TNF, INS, CCL2, IL6, ALB, CAT, ICAM1, F2]",2.662893
T-CELL ANTIGEN RECEPTOR (TCR) PATHWAY DURING STAPHYLOCOCCUS AUREUS INFECTION,8,"[TNF, IL10, CD4, CD28, GRAP2, AKT1, PIK3CA, FOS]",2.404271
VITAMIN B12 METABOLISM,8,"[CRP, TNF, INS, CCL2, IL6, ALB, ICAM1, F2]",2.254645
REGULATION OF TOLL-LIKE RECEPTOR SIGNALING PATHWAY,10,"[TNF, CISH, IFNA1, IL6, AKT1, PIK3CA, CXCL8, C...",2.21919
PI3K-AKT SIGNALING PATHWAY,10,"[IL2RA, INS, IFNA1, IL6, EGFR, AKT1, PIK3CA, E...",2.174442


Result table interpretation: In agreement with the argument made in the article, the top-ranking results "ACE Inhibitor Pathway" and "Metabolism of Angiotensinogen to Angiotensins" are both components of the RAS pathway. 

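The `weighted_score` column sums, over the genes linked to a pathway, the inverse square root of the number of pathway rows each gene has. Genes that appear in many pathways (such as AKT1, with 102) therefore contribute less than pathway-specific genes. A small sketch that reproduces the top row from the counts printed earlier; the helper name `weighted_score` is introduced here for illustration only:

```python
import math

def weighted_score(genes, pathway_counts):
    """Sum 1/sqrt(n_g) over genes, where n_g is the gene's pathway count."""
    return sum(1 / math.sqrt(pathway_counts[g]) for g in genes)

# Pathway counts copied from gene_to_path_count_dict above
counts = {"ACE2": 1, "ACE": 1, "REN": 1, "AGTR1": 5, "AGTR2": 3}

score = weighted_score(["ACE2", "ACE", "REN", "AGTR1", "AGTR2"], counts)
print(round(score, 6))  # 4.024564, matching the ACE INHIBITOR PATHWAY row
```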
### 3.2 Genes -> Biological Processes

Look at biological processes related to the genes, and then display the top biological process occurrences in the results (and which genes are related to each biological process). 


In [24]:
gene_to_bioprocesses = predict_many(v3_gene_inputs, ['BiologicalProcess'])

In [25]:
gene_to_bioprocesses = gene_to_bioprocesses[gene_to_bioprocesses["pred1_api"] == "MyGene.info API"]
gene_to_bioprocesses 

Unnamed: 0,input,input_type,pred1,pred1_source,pred1_api,pred1_pubmed,output_type,output_name,output_id
37,ACE2,Gene,functional_association,entrez,MyGene.info API,15380922,BiologicalProcess,REGULATION OF CYTOKINE PRODUCTION,GO:GO:0001817
38,ACE2,Gene,functional_association,entrez,MyGene.info API,10924499,BiologicalProcess,ANGIOTENSIN MATURATION,GO:GO:0002003
39,ACE2,Gene,functional_association,entrez,MyGene.info API,,BiologicalProcess,ANGIOTENSIN MATURATION,GO:GO:0002003
40,ACE2,Gene,functional_association,entrez,MyGene.info API,18258853,BiologicalProcess,ANGIOTENSIN MEDIATED DRINKING BEHAVIOR,GO:GO:0003051
41,ACE2,Gene,functional_association,entrez,MyGene.info API,10924499,BiologicalProcess,BLOOD PRESSURE REGULATION BY RENIN-ANGIOTENSIN,GO:GO:0003081
...,...,...,...,...,...,...,...,...,...
70,SARS2,Gene,functional_association,entrez,MyGene.info API,,BiologicalProcess,TRNA AMINOACYLATION FOR PROTEIN TRANSLATION,GO:GO:0006418
71,SARS2,Gene,functional_association,entrez,MyGene.info API,21873635,BiologicalProcess,SERYL-TRNA AMINOACYLATION,GO:GO:0006434
72,SARS2,Gene,functional_association,entrez,MyGene.info API,10764807,BiologicalProcess,SERYL-TRNA AMINOACYLATION,GO:GO:0006434
73,SARS2,Gene,functional_association,entrez,MyGene.info API,21873635,BiologicalProcess,MITOCHONDRIAL SERYL-TRNA AMINOACYLATION,GO:GO:0070158


In [26]:
gene_list_4 = list(gene_to_bioprocesses["input"])
gene_to_bp_count_dict = {x:gene_list_4.count(x) for x in gene_list_4}
gene_to_bp_count_dict

{'ACE2': 22,
 'ACE': 38,
 'CRP': 14,
 'TNF': 177,
 'REN': 19,
 'TH': 65,
 'CISH': 9,
 'IL2RA': 19,
 'IL10': 78,
 'AGTR1': 34,
 'INS': 68,
 'CD4': 43,
 'POMC': 14,
 'CA2': 22,
 'CD28': 32,
 'CCL2': 60,
 'IFNA1': 12,
 'F3': 15,
 'IL6': 105,
 'PTH': 36,
 'LEP': 105,
 'ALB': 12,
 'GPT': 5,
 'CD2': 13,
 'ISG20': 12,
 'GRAP2': 5,
 'FAS': 30,
 'CCR5': 23,
 'CAT': 39,
 'EGFR': 109,
 'AKT1': 153,
 'AGTR2': 46,
 'MB': 9,
 'CCL20': 23,
 'ICAM1': 59,
 'PIK3CA': 48,
 'CXCL8': 41,
 'IL17A': 22,
 'DPP4': 16,
 'CD44': 29,
 'IL9': 6,
 'ESR1': 49,
 'EGF': 45,
 'CD80': 18,
 'TP53': 143,
 'TLR4': 95,
 'FOS': 40,
 'ANG': 31,
 'F2': 40,
 'SARS2': 5}

In [27]:
# bioprocesses

gene_to_bioprocess_results = {}
gene_to_bioprocess_genes = list(gene_to_bioprocesses["output_name"]) # list of biological process names (one per result row)
gene_to_bioprocess_genes = list(dict.fromkeys(gene_to_bioprocess_genes))  # remove duplicates, keeping order


for gene in gene_to_bioprocess_genes: 
    gene_to_bioprocess_results[gene] = {
        'bioprocess_count' : 0,
        "genes_related" : [],
        "weighted_score" : []
    }

for index, row in gene_to_bioprocesses.iterrows():
    gene_to_bioprocess_results[row['output_name']]['bioprocess_count'] = gene_to_bioprocess_results[row['output_name']]['bioprocess_count'] + 1
    gene_to_bioprocess_results[row['output_name']]['genes_related'].append(row['input'])

for index, row in gene_to_bioprocesses.iterrows():
    score = 0 
    gene_to_bioprocess_results[row['output_name']]['bioprocess_count'] = len(list(dict.fromkeys(list(gene_to_bioprocess_results[row['output_name']]['genes_related']))))
    for x in  gene_to_bioprocess_results[row['output_name']]['genes_related']:
        score = score + (1/math.sqrt(gene_to_bp_count_dict[x]))
    gene_to_bioprocess_results[row['output_name']]['weighted_score'] = score
    
    
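# NOTE: this loop repeats the first pass above. It runs after scoring, so the scores
# are unaffected, but every gene is appended to 'genes_related' a second time and
# 'bioprocess_count' ends up as (unique genes + number of rows) in the table below.
for index, row in gene_to_bioprocesses.iterrows():
    gene_to_bioprocess_results[row['output_name']]['bioprocess_count'] = gene_to_bioprocess_results[row['output_name']]['bioprocess_count'] + 1
    gene_to_bioprocess_results[row['output_name']]['genes_related'].append(row['input'])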

In [28]:
## extra step needed to analyze biological processes because many are returned as UMLS id instead of name

gene_to_bioprocess_results = dict(sorted(gene_to_bioprocess_results.items(), key = lambda x: x[1]['weighted_score'], reverse = True))
counter = 0 
# iterate over a snapshot of the keys, because entries are renamed inside the loop
# (a renamed entry is re-inserted at the end of the dict)
for key in list(gene_to_bioprocess_results.keys()): 
    if counter < 300: 
        if(('C0' in key) or ('C1' in key)): 
            try: 
                name = ht.query(key)['BiologicalProcess'][0]['name']
                gene_to_bioprocess_results[name] = gene_to_bioprocess_results[key]
                del gene_to_bioprocess_results[key]
            except Exception: 
                pass  # keep the UMLS id if the lookup fails
    counter = counter + 1

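Renaming keys while looping over the same dictionary is fragile, because insertions and deletions can invalidate the iterator, and each re-inserted entry moves to the end and loses its sorted position. A minimal sketch of the renaming step that builds a new dictionary instead and keeps every entry in place; `rename_umls_keys` and the `resolve` callable are hypothetical stand-ins (in the notebook, `resolve` would wrap the `ht.query(...)` lookup), and the toy data is made up:

```python
def rename_umls_keys(results, resolve, limit=300):
    """Return a new dict with UMLS-style keys among the first `limit` entries renamed.

    `resolve` maps an identifier to a name and may raise if the lookup fails;
    unresolved keys are kept unchanged. Entry order is preserved.
    """
    renamed = {}
    for position, (key, value) in enumerate(results.items()):
        new_key = key
        if position < limit and ("C0" in key or "C1" in key):
            try:
                new_key = resolve(key)
            except Exception:
                pass  # keep the identifier when no name is found
        renamed[new_key] = value
    return renamed

# Toy example with a stub resolver instead of a live query
toy = {"C0000001": {"weighted_score": 2.0}, "INFLAMMATION": {"weighted_score": 1.0}}
lookup = {"C0000001": "example process"}
cleaned = rename_umls_keys(toy, lambda k: lookup[k])
print(list(cleaned))  # ['example process', 'INFLAMMATION']
```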
In [29]:
# gene_to_bioprocess_results
pandas.DataFrame.from_dict(gene_to_bioprocess_results, orient='index').iloc[0:50]

Unnamed: 0,bioprocess_count,genes_related,weighted_score
CYTOKINE AND CHEMOKINE MEDIATED SIGNALING PATHWAY,40,"[TNF, IL2RA, IL10, CD4, POMC, CCL2, CCL2, IFNA...",3.457036
INFLAMMATION,34,"[CRP, TNF, IL2RA, AGTR1, CCL2, CCL2, IL6, CCR5...",3.124649
G PROTEIN COUPLED RECEPTOR PROTEIN SIGNALING PATHWAY,30,"[AGTR1, AGTR1, AGTR1, INS, POMC, CCL2, PTH, CC...",3.014645
ACTIVATION OF GLOBAL TRANSCRIPTION FROM RNA POLYMERASE II PROMOTER,32,"[TNF, TNF, TNF, TNF, POMC, CD28, IL6, IL6, PTH...",2.467041
POSITIVE REGULATION OF GENE EXPRESSION,31,"[CRP, TNF, TNF, TNF, TNF, INS, CD28, F3, IL6, ...",2.339937
IMMUNE RESPONSE,20,"[IL2RA, CD4, FAS, CCR5, CCL20, CXCL8, IL17A, I...",2.09744
ACTIVATION OF GENE-SPECIFIC TRANSCRIPTION,28,"[TNF, IL10, IL10, CD4, IL6, IL6, EGFR, AKT1, A...",1.984475
SIGNAL TRANSDUCTION,23,"[CD4, POMC, CCL2, FAS, EGFR, EGFR, AKT1, CCL20...",1.896163
POSITIVE REGULATION OF CELL POPULATION PROLIFERATION,22,"[INS, IL6, IL6, IL6, EGFR, EGFR, EGFR, AKT1, A...",1.895106
CELL SURFACE RECEPTOR LINKED SIGNAL TRANSDUCTION,20,"[IL2RA, CD4, CD28, CCL2, CD2, CCR5, EGFR, AGTR...",1.810696


Result table interpretation: In agreement with the argument made in the article, the top-ranking results "renin activity," "Angiogenic Process," and "ANGIOTENSIN MATURATION" are components of the RAS pathway / bioprocesses. 

### 3.3 Genes -> PhenotypicFeatures

Look at phenotypic features related to the genes, and then display the top phenotypic feature occurrences in the results (and which genes are related to each phenotypic feature).

In [36]:
gene_to_phenotypic_features = predict_many(v3_gene_inputs, ['PhenotypicFeature'])

In [30]:
gene_list_5 = list(gene_to_phenotypic_features["input"])
gene_to_phenotypic_feature_count_dict = {x:gene_list_5.count(x) for x in gene_list_5}
gene_to_phenotypic_feature_count_dict

{'ACE': 10,
 'TNF': 7,
 'REN': 18,
 'TH': 54,
 'IL2RA': 19,
 'IL10': 2,
 'AGTR1': 10,
 'INS': 79,
 'CD4': 4,
 'POMC': 22,
 'CA2': 52,
 'CD28': 1,
 'IFNA1': 3,
 'PTH': 16,
 'LEP': 28,
 'ALB': 22,
 'GPT': 1,
 'FAS': 76,
 'CCR5': 1,
 'CAT': 25,
 'EGFR': 4,
 'AKT1': 209,
 'AGTR2': 37,
 'PIK3CA': 243,
 'DPP4': 1,
 'CD44': 3,
 'ESR1': 26,
 'EGF': 16,
 'TP53': 109,
 'FOS': 77,
 'ANG': 25,
 'F2': 71,
 'SARS2': 21}

gene_to_phenotypic_feature_results = {}
gene_to_phenotypic_feature_genes = list(gene_to_phenotypic_features["output_name"]) # list of phenotypic feature names (one per result row)
gene_to_phenotypic_feature_genes = list(dict.fromkeys(gene_to_phenotypic_feature_genes))  # remove duplicates, keeping order


for gene in gene_to_phenotypic_feature_genes: 
    gene_to_phenotypic_feature_results[gene] = {
        'phenotypic_feature_count' : 0,
        "genes_related" : [],
        "weighted_score" : []
    }

for index, row in gene_to_phenotypic_features.iterrows():
    gene_to_phenotypic_feature_results[row['output_name']]['phenotypic_feature_count'] = gene_to_phenotypic_feature_results[row['output_name']]['phenotypic_feature_count'] + 1
    gene_to_phenotypic_feature_results[row['output_name']]['genes_related'].append(row['input'])

for index, row in gene_to_phenotypic_features.iterrows():
    score = 0 
    gene_to_phenotypic_feature_results[row['output_name']]['phenotypic_feature_count'] = len(list(dict.fromkeys(list(gene_to_phenotypic_feature_results[row['output_name']]['genes_related']))))
    for x in  gene_to_phenotypic_feature_results[row['output_name']]['genes_related']:
        score = score + (1/math.sqrt(gene_to_phenotypic_feature_count_dict[x]))
    gene_to_phenotypic_feature_results[row['output_name']]['weighted_score'] = score
    
    
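# NOTE: this loop repeats the first pass above. It runs after scoring, so the scores
# are unaffected, but every gene is appended to 'genes_related' a second time and
# 'phenotypic_feature_count' ends up as (unique genes + number of rows) in the table below.
for index, row in gene_to_phenotypic_features.iterrows():
    gene_to_phenotypic_feature_results[row['output_name']]['phenotypic_feature_count'] = gene_to_phenotypic_feature_results[row['output_name']]['phenotypic_feature_count'] + 1
    gene_to_phenotypic_feature_results[row['output_name']]['genes_related'].append(row['input'])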

In [35]:
gene_to_phenotypic_feature_results = dict(sorted(gene_to_phenotypic_feature_results.items(), key = lambda x: x[1]['weighted_score'], reverse = True))
pandas.DataFrame.from_dict(gene_to_phenotypic_feature_results, orient='index').iloc[0:20]

Unnamed: 0,phenotypic_feature_count,genes_related,weighted_score
C0034917,31,"[TNF, TNF, TNF, TNF, TNF, INS, CD4, CA2, ALB, ...",7.28204
C0019699,18,"[TNF, TNF, IL10, CD4, IFNA1, IFNA1, LEP, FAS, ...",4.485576
EPILEPSY,18,"[INS, PTH, PTH, FAS, AKT1, AGTR2, PIK3CA, PIK3...",1.43487
PROGRESSIVE RESPIRATORY FAILURE,10,"[ACE, REN, AGTR1, ANG, SARS2, ACE, REN, AGTR1,...",1.286376
ARTERIAL HYPOTENSION,12,"[ACE, REN, AGTR1, ALB, PIK3CA, TP53, ACE, REN,...",1.241291
C0456984,6,"[CA2, DPP4, TP53, CA2, DPP4, TP53]",1.234458
ECZEMA,4,"[IL2RA, CD28, IL2RA, CD28]",1.229416
ABNORMALLY SMALL CRANIUM,10,"[ACE, REN, AGTR1, EGF, TP53, ACE, REN, AGTR1, ...",1.21394
HAVING TOO MUCH BODY FAT,12,"[INS, POMC, LEP, ALB, AGTR2, EGF, INS, POMC, L...",1.142291
LOW LEVELS OF AMNIOTIC FLUID,8,"[ACE, REN, AGTR1, ALB, ACE, REN, AGTR1, ALB]",1.081359

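Several `genes_related` lists in the table above repeat the same genes, for example `[CA2, DPP4, TP53, CA2, DPP4, TP53]`. A minimal sketch of order-preserving de-duplication with `dict.fromkeys`, the same idiom the scoring loop uses when counting unique genes:

```python
# Gene list copied from the C0456984 row above
genes_related = ["CA2", "DPP4", "TP53", "CA2", "DPP4", "TP53"]

# dict.fromkeys keeps the first occurrence of each gene, in order
unique_genes = list(dict.fromkeys(genes_related))

print(unique_genes)       # ['CA2', 'DPP4', 'TP53']
print(len(unique_genes))  # 3
```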

The two top-ranked results are returned as UMLS concept identifiers rather than names:

- C0034917: Reduced Glutathione. See Polonikov, Alexey. "Endogenous Deficiency of Glutathione as the Most Likely Cause of Serious Manifestations and Death in COVID-19 Patients." ACS Infectious Diseases (2020). https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7263077/

- C0019699: HIV Seropositivity



### 4.1 COVID-19 -> Genes <- Hyaluronic Acid Explain Query

In [None]:
## Get Hyaluronic Acid node
HYA = ht.query('hyaluronic acid')['ChemicalSubstance'][0]
HYA

In [None]:
fc3 = FindConnection(input_obj=covid, output_obj=HYA, intermediate_nodes=['Gene'])
fc3.connect(verbose=False)
df3 = fc3.display_table_view()
df3

### 4.2 COVID/HYA Genes -> Pathways

In [None]:
## get related genes and turn them into nodes
genes_related_to_HYA = list(df3["node1_name"])
# get gene inputs through hint module
gene_inputs_2 = []
for gene in genes_related_to_HYA: 
    try: 
        gene_input = ht.query(gene)["Gene"][0]
        gene_inputs_2.append(gene_input)
    except: 
        print(gene + ' Failed')

In [None]:
## Query Genes -> Pathways
HYA_gene_to_pathways = predict_many(gene_inputs_2, ['Pathway'])

In [None]:
# Display Pathway Counts and Genes related to each pathway 
gene_to_pathway_results = {}
gene_to_pathway_genes = list(HYA_gene_to_pathways["output_name"]) # list of pathway names (one per result row)
gene_to_pathway_genes = list(dict.fromkeys(gene_to_pathway_genes))  # remove duplicates, keeping order

for gene in gene_to_pathway_genes: 
    gene_to_pathway_results[gene] = {
        'pathway_count' : 0,
        "genes_related" : []
    }

# count the rows per pathway and record the contributing genes
for index, row in HYA_gene_to_pathways.iterrows():
    gene_to_pathway_results[row['output_name']]['pathway_count'] = gene_to_pathway_results[row['output_name']]['pathway_count'] + 1
    gene_to_pathway_results[row['output_name']]['genes_related'].append(row['input'])
    

gene_to_pathway_results = dict(sorted(gene_to_pathway_results.items(), key = lambda x: x[1]['pathway_count'], reverse = True))

# gene_to_pathway_results
pandas.DataFrame.from_dict(gene_to_pathway_results, orient='index').iloc[0:50]

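The two-step pattern above (pre-create every pathway entry, then fill it) can be collapsed into one pass with `collections.defaultdict`. A minimal sketch on made-up `(gene, pathway)` rows; the pathway names and rows are illustrative only, not query results:

```python
from collections import defaultdict

# Hypothetical (gene, pathway) rows standing in for HYA_gene_to_pathways
rows = [
    ("IL6", "PATHWAY A"), ("TNF", "PATHWAY A"), ("AKT1", "PATHWAY A"),
    ("IL6", "PATHWAY B"), ("CD44", "PATHWAY B"),
]

# Group genes by pathway; missing keys start as empty lists
genes_by_pathway = defaultdict(list)
for gene, pathway in rows:
    genes_by_pathway[pathway].append(gene)

# Rank pathways by how many gene rows point at them
ranked = sorted(genes_by_pathway.items(), key=lambda item: len(item[1]), reverse=True)
for pathway, genes in ranked:
    print(pathway, len(genes), genes)
```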
Interestingly, cytokine signaling is the pathway most frequently linked to the genes that are related to both COVID and HYA. This is consistent with a large amount of research correlating elevated cytokine concentrations with severe COVID cases: 


- Cao, Xuetao. "COVID-19: immunopathology and its implications for therapy." Nature reviews immunology 20.5 (2020): 269-270.

- Mangalmurti, Nilam, and Christopher A. Hunter. "Cytokine storms: understanding COVID-19." Immunity (2020).

- Wu, Dandan, and Xuexian O. Yang. "TH17 responses in cytokine storm of COVID-19: An emerging target of JAK2 inhibitor Fedratinib." Journal of Microbiology, Immunology and Infection (2020).



Additionally, past research has indicated a role for cytokines in hyaluronic acid production / degradation: 

- Sampson, Phyllis M., et al. "Cytokine regulation of human lung fibroblast hyaluronan (hyaluronic acid) production. Evidence for cytokine-regulated hyaluronan (hyaluronic acid) degradation and human lung fibroblast-derived hyaluronidase." The Journal of clinical investigation 90.4 (1992): 1492-1503.


## 5 Summary 
### 5.1 Summary


- The RAS pathway and its corresponding proteins, pathways, processes, and chemicals were highly implicated by the genes derived from the COVID -> Genes <- Vasodilation / Hypotension queries

- Anatomical entities related to the genes were representative of areas where symptoms in COVID patients often occur

- Cytokine pathways may be relevant to COVID symptoms other than those initially proposed (for example, through hyaluronic acid production)


### 5.2 Future Directions 

- Investigate COVID & Hyaluronic Acid connection further
