# TABLE OF CONTENTS

## 1 USE CASE: COVID-19 
###  &emsp; 1.1 What genes are connected to COVID-19?
####  &emsp; &emsp; 1.1.1 COVID-19 -> Genes (determine directly related) 
####  &emsp; &emsp; 1.1.2 COVID-19 -> All intermediate node types -> Genes
###  &emsp; 1.2 What are the symptoms that are related to COVID-19?
####  &emsp; &emsp; 1.2.1 COVID-19 -> Symptoms (PhenotypicFeature, BiologicalProcess)
###  &emsp; 1.3 Which of the genes related to COVID-19 are related to symptoms of COVID-19? 
####  &emsp; &emsp; 1.3.1 Genes (from 1.1) -> Symptoms (from 1.2.1)
####  &emsp; &emsp; 1.3.2 Genes (from 1.1) -> [Drugs, SequenceVariant, Pathways, MolecularActivity] -> Symptoms (from 1.2.1)
###  &emsp; 1.4 What proteins/genes are in pathways of known COVID-19 related genes? Which of these can be related to symptoms? 
####  &emsp; &emsp; 1.4.1 Genes (from 1.1.1) -> Pathways -> Genes
####  &emsp; &emsp; 1.4.2 COVID-19 Symptoms -> Pathways -> Genes
###  &emsp; 1.5 In what way can co-occurrence data from COHD EHR data (conditions, drugs, and procedures) be used to further identify or establish genes associated with COVID-19? 
####  &emsp; &emsp; 1.5.1 Co-occurrence of related conditions (parent diseases, siblings) and drugs
####  &emsp; &emsp; 1.5.2 Co-occurrence of related drugs and related symptoms 

In [1]:
###### CODE SETUP 

## First get all the functions set up
import pandas as pd
import requests
import difflib

# import itables.interactive
# from itables import show
# import itables.options as opt
# opt.maxBytes = 10000000


## Load BTE
from biothings_explorer.user_query_dispatcher import FindConnection
from biothings_explorer.hint import Hint
ht = Hint()

## Functions that will be used
# Run one FindConnection query per intermediate node type and concatenate
# the non-empty result tables (returns None if nothing was found)
def predict_many(input_object, intermediate_node_list, output_type):
    df_list = []
    for inter in intermediate_node_list:
        try:
            print("Intermediate Node type running:")
            print(inter)
            fc = FindConnection(input_obj=input_object, output_obj=output_type, intermediate_nodes=[inter])
            fc.connect(verbose=False)
            df = fc.display_table_view()
            if df is not None and df.shape[0] > 0:
                df_list.append(df)
        except Exception as e:
            # Report which node type failed and why, then continue with the next one
            print("FAILED: " + str(e))
    if len(df_list) > 0:
        return pd.concat(df_list)
    return None

# all intermediate node types

node_type_list = (['Gene', 'SequenceVariant', 'ChemicalSubstance', 'Disease', 
                   'MolecularActivity', 'BiologicalProcess', 'CellularComponent', 
                   'Pathway', 'AnatomicalEntity', 'PhenotypicFeature'])
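
`predict_many` stacks all result tables without recording which intermediate node type produced each row, and it only prints failures. The sketch below is a hypothetical variant (`run_per_type` and `fake_query` are illustrative names, not part of BTE) that keeps the node type as an outer index level and returns the failures as a list. `fake_query` stands in for a `FindConnection` call and returns toy data only.

```python
import pandas as pd


def run_per_type(query_fn, node_types):
    """Hypothetical variant of predict_many: call query_fn once per
    intermediate node type, keep non-empty results, record failures."""
    frames, failed = {}, []
    for node_type in node_types:
        try:
            df = query_fn(node_type)
        except Exception as err:  # keep the error message instead of hiding it
            failed.append((node_type, str(err)))
            continue
        if df is not None and not df.empty:
            frames[node_type] = df
    if not frames:
        return None, failed
    # Passing a dict makes its keys the outer level of the resulting index
    combined = pd.concat(frames, names=["intermediate_type", "row"])
    return combined, failed


def fake_query(node_type):
    """Stand-in for a FindConnection call (toy data only)."""
    if node_type == "ChemicalSubstance":
        raise RuntimeError("API call failed")
    if node_type == "Pathway":
        return pd.DataFrame({"output_name": []})
    return pd.DataFrame({"output_name": ["ACE2", "TNF"]})


combined, failed = run_per_type(fake_query, ["Gene", "ChemicalSubstance", "Pathway"])
print(combined)
print(failed)
```

Keeping the node type in the index makes it easy to ask later which intermediate types contributed most of the paths, e.g. with `combined.groupby(level="intermediate_type").size()`.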

## 1.1 What genes are connected to COVID-19?

### 1.1.1 COVID-19 -> Genes (determine directly related) 

In [2]:
## get COVID-19
covid19 = ht.query("COVID-19")['Disease'][0]
covid19

{'MONDO': 'MONDO:0100096',
 'DOID': 'DOID:0080600',
 'name': 'COVID-19',
 'primary': {'identifier': 'MONDO',
  'cls': 'Disease',
  'value': 'MONDO:0100096'},
 'display': 'MONDO(MONDO:0100096) DOID(DOID:0080600) name(COVID-19)',
 'type': 'Disease'}

In [3]:
fc = FindConnection(input_obj=covid19, output_obj='Gene', intermediate_nodes=None)
fc.connect(verbose=True)
covid19_to_genes = fc.display_table_view()
covid19_to_genes


BTE will find paths that join 'COVID-19' and 'Gene'. Paths will have 0 intermediate node.

==== Step #1: Query path planning ====

Because COVID-19 is of type 'Disease', BTE will query our meta-KG for APIs that can take 'Disease' as input and 'Gene' as output

BTE found 8 apis:

API 1. hetio(1 API call)
API 2. cord_disease(1 API call)
API 3. scibite(1 API call)
API 4. pharos(1 API call)
API 5. biolink(1 API call)
API 6. mgi_gene2phenotype(1 API call)
API 7. DISEASES(1 API call)
API 8. scigraph(1 API call)


==== Step #2: Query path execution ====
NOTE: API requests are dispatched in parallel, so the list of APIs below is ordered by query time.

API 6.1: https://pending.biothings.io/mgigene2phenotype/query?fields=_id&size=300 (POST -d q=DOID:0080600&scopes=mgi.associated_with_disease.doid)
API 7.1: https://pending.biothings.io/DISEASES/query?fields=DISEASES.associatedWith (POST -d q=DOID:0080600&scopes=DISEASES.doid)
API 2.1: https://biothings.ncats.io/cord_disease

| | input | input_type | pred1 | pred1_source | pred1_api | pred1_pubmed | output_type | output_name | output_id |
|---|---|---|---|---|---|---|---|---|---|
| 0 | COVID-19 | Disease | related_to | DISEASE | DISEASES API | | Gene | EID2 | NCBIGene:163126 |
| 1 | COVID-19 | Disease | related_to | DISEASE | DISEASES API | | Gene | ACE2 | NCBIGene:59272 |
| 2 | COVID-19 | Disease | related_to | scigraph | Automat CORD19 Scigraph API | | Gene | ACE2 | NCBIGene:59272 |
| 3 | COVID-19 | Disease | related_to | scigraph | Automat CORD19 Scigraph API | | Gene | CRP | NCBIGene:1401 |
| 4 | COVID-19 | Disease | related_to | scigraph | Automat CORD19 Scigraph API | | Gene | MARS1 | NCBIGene:4141 |
| 5 | COVID-19 | Disease | related_to | scigraph | Automat CORD19 Scigraph API | | Gene | POR | NCBIGene:5447 |
| 6 | COVID-19 | Disease | related_to | scigraph | Automat CORD19 Scigraph API | | Gene | TMPRSS2 | NCBIGene:7113 |
| 7 | COVID-19 | Disease | related_to | scigraph | Automat CORD19 Scigraph API | | Gene | SON | NCBIGene:6651 |
| 8 | COVID-19 | Disease | related_to | scigraph | Automat CORD19 Scigraph API | | Gene | TH | NCBIGene:7054 |
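
ACE2 is the only gene in this table reported by two different APIs. A minimal pandas sketch for summarising source agreement per gene, using a few rows copied from the table above (only the two columns needed):

```python
import pandas as pd

# A few rows copied from the direct COVID-19 -> Gene table
direct = pd.DataFrame({
    "output_name": ["EID2", "ACE2", "ACE2", "CRP", "TMPRSS2"],
    "pred1_api": ["DISEASES API", "DISEASES API", "Automat CORD19 Scigraph API",
                  "Automat CORD19 Scigraph API", "Automat CORD19 Scigraph API"],
})

# Which APIs reported each gene, and how many distinct APIs that is
apis = direct.groupby("output_name")["pred1_api"].unique()
n_apis = direct.groupby("output_name")["pred1_api"].nunique().sort_values(ascending=False)
print(n_apis)
```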


### 1.1.2 COVID-19 -> All intermediate node types -> Genes

In [4]:
covid_allNodes_Genes = predict_many(covid19,node_type_list,'Gene')

Intermediate Node type running:
Gene
Intermediate Node type running:
SequenceVariant
Intermediate Node type running:
ChemicalSubstance
API 3.1 pharos failed
Intermediate Node type running:
Disease
Intermediate Node type running:
MolecularActivity
Intermediate Node type running:
BiologicalProcess
Intermediate Node type running:
CellularComponent
Intermediate Node type running:
Pathway
Intermediate Node type running:
AnatomicalEntity
Intermediate Node type running:
PhenotypicFeature


In [5]:
## Number of result rows (one per path, so genes repeat), not the number of unique genes
len(list(covid_allNodes_Genes["output_name"]))

13562
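
Each row of `covid_allNodes_Genes` is one path, so the same gene can appear many times (TNF appears 43 times below). A small sketch on toy data showing the difference between the row count and the number of distinct genes:

```python
import pandas as pd

# Toy stand-in for covid_allNodes_Genes: each row is one path, so genes repeat
paths = pd.DataFrame({"output_name": ["TNF", "TNF", "IL6", "ACE2", "TNF", "IL6"]})

n_rows = len(paths)                       # number of paths returned
n_genes = paths["output_name"].nunique()  # number of distinct genes
print(n_rows, n_genes)  # 6 3
```

On the real result, `covid_allNodes_Genes["output_name"].nunique()` would give the number of distinct genes behind the 13562 rows.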

In [6]:
from collections import Counter

# Count how many paths end at each output name, then print the 50 most frequent
i = list(covid_allNodes_Genes["output_name"])
d = dict(Counter(i))
sorted_genes_covid_2_allNodes_2_genes = {k: v for k, v in sorted(d.items(), key=lambda item: item[1])}
for x in list(reversed(list(sorted_genes_covid_2_allNodes_2_genes)))[0:50]:
    print(str(x) + ": " + str(sorted_genes_covid_2_allNodes_2_genes[x]))

TNF: 43
CYP3A4: 33
CAT: 32
INS: 26
C0014442: 26
CYP2D6: 25
IL6: 23
C0017337: 23
ABCB1: 22
AKT1: 21
ANG: 20
TP53: 18
HIF1A: 17
SQSTM1: 17
CYP1A2: 17
C0010762: 17
FOS: 17
TLR9: 16
C0164786: 16
AR: 16
ACE2: 16
ACE: 16
SOD1: 15
CYP2C9: 15
PPIG: 15
CD4: 15
C1705556: 15
VEGFA: 15
EGFR: 15
ALB: 15
IL1B: 15
RELA: 15
APP: 15
C0010531: 15
C0030956: 15
SOD2: 14
BAX: 14
CDKN1A: 14
MTOR: 14
CASP3: 14
C1705526: 14
TH: 14
EPO: 14
MPO: 14
IFNA1: 14
C0033634: 14
TLR7: 13
CYP2B6: 13
C0020364: 13
C1142644: 13


In [7]:
## Store the 50 most frequent output names (this includes UMLS CUIs such as C0014442, which are not gene symbols)
top_50_related_genes_covid_2_allNodes_2_genes = list(reversed(list(sorted_genes_covid_2_allNodes_2_genes)))[0:50]
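
The top-50 list above mixes gene symbols with UMLS concept IDs (C0014442, C0017337, ...), and those are exactly the entries that fail in the gene lookups of section 1.3. A sketch, assuming the non-gene entries all follow the UMLS CUI pattern of "C" plus seven digits, that drops them before ranking with `value_counts` (toy data):

```python
import pandas as pd

# Toy output names mixing gene symbols with UMLS-style CUIs
names = pd.Series(["TNF", "C0014442", "TNF", "IL6", "C0014442", "ACE2", "TNF"])

# UMLS CUIs are "C" followed by seven digits; they are not gene symbols
is_cui = names.str.fullmatch(r"C\d{7}")
gene_counts = names[~is_cui].value_counts()
top_genes = gene_counts.head(50).index.tolist()
print(top_genes)
```

The order among genes with equal counts may differ from the dict-based sort used above.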

## 1.2 What are the symptoms that are related to COVID-19?

### COVID-19 -> PhenotypicFeature

In [8]:
fc = FindConnection(input_obj=covid19, output_obj='PhenotypicFeature', intermediate_nodes=None)
fc.connect(verbose=False)
covid19_2_phenotypic_feature = fc.display_table_view()
covid19_2_phenotypic_feature

## No results returned

In [9]:
## Try a broader query: coronavirus infections in general
corona = ht.query("CORONAVINAE INFECTIOUS DISEASE")['Disease'][0]
corona

{'MONDO': 'MONDO:0005719',
 'name': 'Coronavinae infectious disease',
 'MESH': 'D018352',
 'primary': {'identifier': 'MONDO',
  'cls': 'Disease',
  'value': 'MONDO:0005719'},
 'display': 'MONDO(MONDO:0005719) MESH(D018352) name(Coronavinae infectious disease)',
 'type': 'Disease'}

In [10]:
fc = FindConnection(input_obj=corona, output_obj='PhenotypicFeature', intermediate_nodes=None)
fc.connect(verbose=False)
corona_2_phenotypic_feature = fc.display_table_view()
corona_2_phenotypic_feature

## No results returned

### COVID-19 -> BiologicalProcess

In [11]:
fc = FindConnection(input_obj=covid19, output_obj='BiologicalProcess', intermediate_nodes=None)
fc.connect(verbose=False)
covid19_2_biologicalProcess = fc.display_table_view()
covid19_2_biologicalProcess

In [17]:
# Try the broader coronavirus infection class again
fc = FindConnection(input_obj=corona, output_obj='BiologicalProcess', intermediate_nodes=None)
fc.connect(verbose=False)
corona_2_biologicalProcess = fc.display_table_view()
corona_2_biologicalProcess

## Determine COVID-19 symptoms manually from the Diseases Database: http://www.diseasesdatabase.com/relationships.asp?glngUserChoice=60833&bytRel=2&blnBW=0&strBB=LR&blnClassSort=255&Key={A27BEC6F-30C5-4893-BB0F-9FEB5589DEB3}


In [18]:
# Symptoms and signs:

# Cough
#     Coughing
# Diarrhoea
#     Loose stools
#     Diarrhea
# Myalgia
#     Myodynia
#     Muscle pain
# Pyrexia
#     Body temperature increased
#     Febrile
#     Fever
#     Hyperthermia
# Taste disturbance
#     Ageusia
#     Dysgeusia
#     Hypogeusia
#     Parageusia


# Haematological abnormalities:
# Lymphocytopenia
#     Lymphopenia
#     Lymphocyte count low (peripheral blood)

# Biochemical abnormalities:
# Lactate dehydrogenase levels raised (plasma or serum)
#     LDH raised

# Cardiac and vascular conditions:
# Myocarditis

# Inflammatory conditions:
# Pneumonia
#     Pneumonitis
#     Pulmonary inflammation


symptom_and_phenotype_list = ['Cough','Coughing','Diarrhoea','Loose stools','Diarrhea','Myalgia','Myodynia',
                              'Muscle pain','Pyrexia','Body temperature increased','Febrile','Fever','Hyperthermia',
                              'Taste disturbance','Ageusia','Dysgeusia','Hypogeusia','Parageusia','Lymphocytopenia',
                              'Lymphopenia','Lymphocyte count low (peripheral blood)',
                              'Lactate dehydrogenase levels raised (plasma or serum)','LDH raised','Myocarditis',
                              'Pneumonia','Pneumonitis','Pulmonary inflammation']



# Lower-case every term so it can be compared with lower-cased names returned by BTE
symptom_and_phenotype_list = [x.lower() for x in symptom_and_phenotype_list]
symptom_and_phenotype_list

['cough',
 'coughing',
 'diarrhoea',
 'loose stools',
 'diarrhea',
 'myalgia',
 'myodynia',
 'muscle pain',
 'pyrexia',
 'body temperature increased',
 'febrile',
 'fever',
 'hyperthermia',
 'taste disturbance',
 'ageusia',
 'dysgeusia',
 'hypogeusia',
 'parageusia',
 'lymphocytopenia',
 'lymphopenia',
 'lymphocyte count low (peripheral blood)',
 'lactate dehydrogenase levels raised (plasma or serum)',
 'ldh raised',
 'myocarditis',
 'pneumonia',
 'pneumonitis',
 'pulmonary inflammation']
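
The flat list loses the grouping of synonyms shown in the comments (e.g. "fever", "febrile" and "hyperthermia" all describe pyrexia), so a later match cannot be rolled up to one symptom. A hypothetical helper that restores the grouping from those comments and inverts it into a synonym -> symptom lookup:

```python
# Hypothetical grouping, transcribed from the symptom comments above
symptom_groups = {
    "cough": ["cough", "coughing"],
    "diarrhoea": ["diarrhoea", "loose stools", "diarrhea"],
    "myalgia": ["myalgia", "myodynia", "muscle pain"],
    "pyrexia": ["pyrexia", "body temperature increased", "febrile", "fever", "hyperthermia"],
    "taste disturbance": ["taste disturbance", "ageusia", "dysgeusia", "hypogeusia", "parageusia"],
    "lymphocytopenia": ["lymphocytopenia", "lymphopenia", "lymphocyte count low (peripheral blood)"],
    "lactate dehydrogenase levels raised (plasma or serum)": [
        "lactate dehydrogenase levels raised (plasma or serum)", "ldh raised"],
    "myocarditis": ["myocarditis"],
    "pneumonia": ["pneumonia", "pneumonitis", "pulmonary inflammation"],
}

# Invert to synonym -> canonical symptom so any matched term can be rolled up
synonym_to_symptom = {syn: canon for canon, syns in symptom_groups.items() for syn in syns}
print(synonym_to_symptom["fever"])  # pyrexia
```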

## 1.3 Which of the genes related to COVID-19 are related to symptoms of COVID-19?

### 1.3.1 Genes (from 1.1) -> Symptoms (from 1.2.1)

#### 1.3.1.1 Gene -> Phenotype type "symptoms"

In [19]:
# For each top gene, query BTE for directly connected PhenotypicFeature nodes
df_list = []
for x in top_50_related_genes_covid_2_allNodes_2_genes:
    try:
        gene = ht.query(x)["Gene"][0]
        fc = FindConnection(input_obj=gene, output_obj='PhenotypicFeature', intermediate_nodes=None)
        fc.connect(verbose=False)
        df = fc.display_table_view()
        if df.shape[0] > 0:
            df_list.append(df)
    except Exception:
        # Names that do not resolve to a gene (e.g. UMLS CUIs) end up here
        print(str(x) + " FAILED")
if len(df_list) > 0:
    top50gene_2_phenotypicFeature = pd.concat(df_list)


C0014442 FAILED
C0017337 FAILED
C0010762 FAILED
C0164786 FAILED
C1705556 FAILED
C0010531 FAILED
C0030956 FAILED
C1705526 FAILED
C0033634 FAILED
C0020364 FAILED
C1142644 FAILED


In [20]:
top50gene_2_phenotypicFeature.shape

(1058, 9)

In [22]:
## Get names for HP IDs
# Unique HP identifiers among the phenotype outputs (first-seen order kept)
HP_ids = top50gene_2_phenotypicFeature.loc[
    top50gene_2_phenotypicFeature["output_name"].str.contains("HP:", regex=False, na=False), "output_name"
].drop_duplicates().tolist()
print(len(HP_ids))
HP_dict = {}
for x in HP_ids:
    HP_ID = x.split(':')[1]
    r = requests.get('https://biothings.ncats.io/hpo/phenotype/HP%3A' + HP_ID, timeout=30)
    res = r.json()
    # Keep only records that come back with both an ID and a name
    if '_id' in res and 'name' in res:
        HP_dict[res['_id']] = res['name'].lower()
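
Once `HP_dict` exists, the HP identifiers can also be translated for the whole column at once, which makes the phenotype table readable without the per-row lookup inside the matching function. A sketch on a toy lookup in the same shape as `HP_dict` ("HP:unknown" is a deliberately unmapped placeholder):

```python
import pandas as pd

# Toy lookup in the same shape as HP_dict (ID -> lower-case name)
hp_names = {"HP:0001945": "fever", "HP:0002014": "diarrhea", "HP:0003326": "myalgia"}

out = pd.Series(["HP:0001945", "HP:0002014", "HP:unknown", "Asthma"])
# Translate known HP IDs; keep the original value where no name was found
readable = out.map(hp_names).fillna(out).str.lower()
print(readable.tolist())  # ['fever', 'diarrhea', 'hp:unknown', 'asthma']
```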

In [67]:
def get_similar_phen_indices(list1, list2, similarity):
    """Return the positions in list1 whose name (HP IDs translated via HP_dict)
    has a difflib similarity ratio above `similarity` with any term in list2."""
    res = []
    for i, term in enumerate(list1):
        lookup = term.lower()
        # Translate HP identifiers into their human-readable names when known
        if 'HP:' in term and term in HP_dict:
            lookup = HP_dict[term]
        append_i = False
        for j in list2:
            if difflib.SequenceMatcher(None, lookup, j).ratio() > similarity:
                print("Matched similar terms:")
                print(lookup + ' and ' + j)
                append_i = True
        if append_i:
            res.append(i)
    print(len(res))
    return res


In [68]:
phen_indices = get_similar_phen_indices(list(top50gene_2_phenotypicFeature["output_name"]),symptom_and_phenotype_list,0.9)

Matched similar terms:
fever and fever
Matched similar terms:
diarrhea and diarrhoea
Matched similar terms:
diarrhea and diarrhea
Matched similar terms:
myalgia and myalgia
Matched similar terms:
fever and fever
4
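
Five matches were printed but only 4 rows kept, because one "diarrhea" row matched both "diarrhoea" and "diarrhea". The 0.9 cut-off on `difflib.SequenceMatcher.ratio()` (defined as 2·M/T, with M matched characters and T the combined length) essentially admits spelling variants only. A few ratios for terms from the symptom list:

```python
import difflib


def ratio(a, b):
    """Similarity score used by the get_similar_*_indices functions."""
    return difflib.SequenceMatcher(None, a, b).ratio()


for a, b in [("diarrhea", "diarrhoea"), ("cough", "coughing"), ("pneumonia", "pneumonitis")]:
    print(f"{a} vs {b}: {ratio(a, b):.3f}")
# diarrhea vs diarrhoea: 0.941   (passes the 0.9 threshold)
# cough vs coughing: 0.769
# pneumonia vs pneumonitis: 0.800
```

Closely related terms such as "pneumonia" and "pneumonitis" fall below the threshold, so recall depends on listing every synonym explicitly, as the symptom list does.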


In [69]:
top50gene_2_phenotypicFeature.iloc[phen_indices,:]

| | input | input_type | pred1 | pred1_source | pred1_api | pred1_pubmed | output_type | output_name | output_id |
|---|---|---|---|---|---|---|---|---|---|
| 22 | TP53 | Gene | related_to | | BioLink API | | PhenotypicFeature | HP:0001945 | HP:HP:0001945 |
| 83 | TP53 | Gene | related_to | | BioLink API | | PhenotypicFeature | HP:0002014 | HP:HP:0002014 |
| 39 | FOS | Gene | related_to | | BioLink API | | PhenotypicFeature | HP:0003326 | HP:HP:0003326 |
| 19 | TH | Gene | related_to | | BioLink API | 10407773,9732974,0011551,21937992,20430833,252... | PhenotypicFeature | HP:0001945 | HP:HP:0001945 |


#### 1.3.1.2 Gene -> BiologicalProcess type "symptoms"

In [70]:
# For each top gene, query BTE for directly connected BiologicalProcess nodes
df_list = []
for x in top_50_related_genes_covid_2_allNodes_2_genes:
    try:
        gene = ht.query(x)["Gene"][0]
        fc = FindConnection(input_obj=gene, output_obj='BiologicalProcess', intermediate_nodes=None)
        fc.connect(verbose=False)
        df = fc.display_table_view()
        if df.shape[0] > 0:
            df_list.append(df)
    except Exception:
        # Names that do not resolve to a gene (e.g. UMLS CUIs) end up here
        print(str(x) + " FAILED")
if len(df_list) > 0:
    top50gene_2_bioprocesses = pd.concat(df_list)

C0014442 FAILED
C0017337 FAILED
C0010762 FAILED
C0164786 FAILED
C1705556 FAILED
C0010531 FAILED
C0030956 FAILED
C1705526 FAILED
C0033634 FAILED
C0020364 FAILED
C1142644 FAILED


In [71]:
top50gene_2_bioprocesses.shape

(14172, 9)

In [72]:
## Get names for GO IDs
# GO CURIEs are normally written with an upper-case "GO:" prefix, so match it case-insensitively
go_ids = top50gene_2_bioprocesses.loc[
    top50gene_2_bioprocesses["output_name"].str.contains("go:", case=False, regex=False, na=False), "output_name"
].drop_duplicates().tolist()
print(len(go_ids))
go_dict = {}
for x in go_ids:
    go_ID = x.split(':')[1]
    r = requests.get('https://biothings.ncats.io/go_bp/geneset/GO%3A' + go_ID, timeout=30)
    res = r.json()
    # Keep only records that come back with both an ID and a name
    if '_id' in res and 'name' in res:
        go_dict[res['_id']] = res['name'].lower()

In [73]:
def get_similar_bp_indices(list1, list2, similarity):
    """Return the positions in list1 whose name (GO IDs translated via go_dict)
    has a difflib similarity ratio above `similarity` with any term in list2."""
    res = []
    for i, term in enumerate(list1):
        lookup = term.lower()
        # GO CURIEs are normally upper case ("GO:"), so test the prefix case-insensitively
        if 'go:' in term.lower() and term in go_dict:
            lookup = go_dict[term]
        append_i = False
        for j in list2:
            if difflib.SequenceMatcher(None, lookup, j).ratio() > similarity:
                print("Matched similar terms:")
                print(lookup + ' and ' + j)
                append_i = True
        if append_i:
            res.append(i)
    print(len(res))
    return res

In [74]:
bp_indices = get_similar_bp_indices(list(top50gene_2_bioprocesses["output_name"]),symptom_and_phenotype_list,0.9)

0


In [75]:
top50gene_2_bioprocesses.iloc[bp_indices,:]

| | input | input_type | pred1 | pred1_source | pred1_api | pred1_pubmed | output_type | output_name | output_id |
|---|---|---|---|---|---|---|---|---|---|
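
One possible contributor to the zero matches, assuming BTE reports GO terms with the usual upper-case `GO:` prefix, is that a test for the lower-case string `"go:"` never fires, so no GO ID gets translated into a name. The toy comparison below shows how `str.contains` behaves with and without `case=False`:

```python
import pandas as pd

# Toy output names; GO CURIEs are conventionally written with an upper-case prefix
names = pd.Series(["GO:0006954", "inflammatory response", "go:0001660"])

case_sensitive = names.str.contains("go:", regex=False)
case_insensitive = names.str.contains("go:", case=False, regex=False)
print(case_sensitive.tolist())    # [False, False, True]
print(case_insensitive.tolist())  # [True, False, True]
```

Even with translated names, GO biological processes are rarely worded like clinical symptoms, so few matches may be expected regardless.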


#### 1.3.1.3 Gene -> Disease type "symptoms"

In [58]:
# For each top gene, query BTE for directly connected Disease nodes
df_list = []
for x in top_50_related_genes_covid_2_allNodes_2_genes:
    try:
        gene = ht.query(x)["Gene"][0]
        fc = FindConnection(input_obj=gene, output_obj='Disease', intermediate_nodes=None)
        fc.connect(verbose=False)
        df = fc.display_table_view()
        if df.shape[0] > 0:
            df_list.append(df)
    except Exception:
        # Names that do not resolve to a gene (e.g. UMLS CUIs) end up here
        print(str(x) + " FAILED")
if len(df_list) > 0:
    top50gene_2_diseases = pd.concat(df_list)

top50gene_2_diseases.shape



C0014442 FAILED
C0017337 FAILED
C0010762 FAILED
C0164786 FAILED
C1705556 FAILED
C0010531 FAILED
C0030956 FAILED
C1705526 FAILED
C0033634 FAILED
C0020364 FAILED
C1142644 FAILED


(40877, 9)

In [76]:
def get_similar_disease_indices(list1, list2, similarity):
    """Return the positions in list1 whose lower-cased disease name has a
    difflib similarity ratio above `similarity` with any term in list2."""
    res = []
    for i, term in enumerate(list1):
        lookup = term.lower()
        append_i = False
        for j in list2:
            if difflib.SequenceMatcher(None, lookup, j).ratio() > similarity:
                print("Matched similar terms:")
                print(lookup + ' and ' + j)
                append_i = True
        if append_i:
            res.append(i)
    print(len(res))
    return res
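
The three `get_similar_*_indices` functions differ only in which ID-to-name dictionary they consult. A hypothetical single replacement (`get_similar_indices` is an illustrative name) that takes the lookup as an optional argument instead of reading a global, and leaves out the progress printing:

```python
import difflib


def get_similar_indices(names, targets, similarity, id_to_name=None):
    """Hypothetical generic version of the get_similar_*_indices functions:
    translate IDs via id_to_name when possible, then return the positions
    of names similar to at least one target term."""
    id_to_name = id_to_name or {}
    hits = []
    for i, name in enumerate(names):
        lookup = id_to_name.get(name, name).lower()
        if any(difflib.SequenceMatcher(None, lookup, t).ratio() > similarity for t in targets):
            hits.append(i)
    return hits


names = ["FEVER", "ASTHMA", "HP:0002014", "Diarrhoea"]
print(get_similar_indices(names, ["fever", "diarrhea"], 0.9, {"HP:0002014": "diarrhea"}))
# [0, 2, 3]
```

With this helper, the phenotype call would pass `HP_dict`, the biological-process call `go_dict`, and the disease call no dictionary at all.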


In [81]:
disease_indices = get_similar_disease_indices(list(top50gene_2_diseases["output_name"]),symptom_and_phenotype_list,0.9)

Matched similar terms:
fever and fever
Matched similar terms:
fever and fever
Matched similar terms:
fever and fever
Matched similar terms:
fever and fever
Matched similar terms:
pneumonia and pneumonia
Matched similar terms:
pneumonia and pneumonia
Matched similar terms:
pneumonia and pneumonia
Matched similar terms:
pneumonia and pneumonia
Matched similar terms:
pneumonia and pneumonia
Matched similar terms:
pneumonia and pneumonia
Matched similar terms:
pneumonia and pneumonia
Matched similar terms:
pneumonia and pneumonia
Matched similar terms:
pneumonia and pneumonia
Matched similar terms:
coughing and coughing
Matched similar terms:
diarrhea and diarrhoea
Matched similar terms:
diarrhea and diarrhea
Matched similar terms:
diarrhea and diarrhoea
Matched similar terms:
diarrhea and diarrhea
Matched similar terms:
diarrhea and diarrhoea
Matched similar terms:
diarrhea and diarrhea
Matched similar terms:
lymphopenia and lymphopenia
Matched similar terms:
lymphopenia and lymphopenia
...

Matched similar terms:
fever and fever
Matched similar terms:
fever and fever
Matched similar terms:
fever and fever
Matched similar terms:
myocarditis and myocarditis
Matched similar terms:
fever and fever
Matched similar terms:
diarrhea and diarrhoea
Matched similar terms:
diarrhea and diarrhea
Matched similar terms:
pneumonia and pneumonia
Matched similar terms:
pneumonia and pneumonia
Matched similar terms:
lymphopenia and lymphopenia
Matched similar terms:
diarrhea and diarrhoea
Matched similar terms:
diarrhea and diarrhea
Matched similar terms:
diarrhea and diarrhoea
Matched similar terms:
diarrhea and diarrhea
Matched similar terms:
diarrhea and diarrhoea
Matched similar terms:
diarrhea and diarrhea
Matched similar terms:
pneumonia and pneumonia
Matched similar terms:
pneumonia and pneumonia
Matched similar terms:
lymphopenia and lymphopenia
Matched similar terms:
myocarditis and myocarditis
Matched similar terms:
diarrhea and diarrhoea
Matched similar terms:
diarrhea and diarrh

In [85]:
# Keep only the gene -> disease rows whose disease name matched a COVID-19 symptom
relevant_top50gene_2_diseases = top50gene_2_diseases.iloc[disease_indices,:]
relevant_top50gene_2_diseases

| | input | input_type | pred1 | pred1_source | pred1_api | pred1_pubmed | output_type | output_name | output_id |
|---|---|---|---|---|---|---|---|---|---|
| 1539 | TNF | Gene | disrupts | SEMMED | SEMMED Gene API | 144323616300807226736189094446 | Disease | FEVER | MONDO:C0015967 |
| 1540 | TNF | Gene | causes | SEMMED | SEMMED Gene API | 10701765,15373964,16460809,1714101,17374708,17... | Disease | FEVER | MONDO:C0015967 |
| 1541 | TNF | Gene | affects | SEMMED | SEMMED Gene API | 11593333,12879338,15855300,15965498,17967442,1... | Disease | FEVER | MONDO:C0015967 |
| 1542 | TNF | Gene | related_to | disgenet | mydisease.info API | | Disease | FEVER | MONDO:C0015967 |
| 1772 | TNF | Gene | disrupts | SEMMED | SEMMED Gene API | 166782688666420 | Disease | PNEUMONIA | MONDO:MONDO:0005249 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1849 | IFNA1 | Gene | related_to | DISEASE | DISEASES API | | Disease | DIARRHEA | MONDO:MONDO:0001673 |
| 192 | TLR7 | Gene | related_to | SEMMED | SEMMED Gene API | 19445181 | Disease | PNEUMONIA | MONDO:MONDO:0005249 |
| 193 | TLR7 | Gene | related_to | DISEASE | DISEASES API | | Disease | PNEUMONIA | MONDO:MONDO:0005249 |
| 400 | TLR7 | Gene | related_to | DISEASE | DISEASES API | | Disease | DIARRHEA | MONDO:MONDO:0001673 |


In [80]:
from collections import Counter

# Count symptom-matched rows per input gene and print them, highest count first
i = list(top50gene_2_diseases.iloc[disease_indices,:]["input"])
d = dict(Counter(i))
sorted_genes_from_symptoms = {k: v for k, v in sorted(d.items(), key=lambda item: item[1])}
for x in list(reversed(list(sorted_genes_from_symptoms)))[0:50]:
    print(str(x) + ": " + str(sorted_genes_from_symptoms[x]))

TNF: 29
CD4: 20
IFNA1: 14
IL6: 11
IL1B: 10
INS: 10
CAT: 10
VEGFA: 8
EPO: 7
TH: 6
MTOR: 6
ALB: 6
PPIG: 6
ACE: 6
EGFR: 5
AKT1: 5
MPO: 4
BAX: 4
ACE2: 4
AR: 4
FOS: 4
ABCB1: 4
TLR7: 3
CDKN1A: 3
APP: 3
SOD1: 3
TP53: 3
CYP3A4: 3
CASP3: 2
SOD2: 2
RELA: 2
TLR9: 2
SQSTM1: 2
CYP2B6: 1
CYP2C9: 1
CYP1A2: 1
ANG: 1
CYP2D6: 1
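
These counts add together repeated edges for the same symptom (TNF has four separate FEVER rows in the table above) and edges to different symptoms. A sketch with `pd.crosstab` that separates the two, on a toy subset of `relevant_top50gene_2_diseases` (the gene/symptom pairs appear in the tables above, the repetition counts are illustrative):

```python
import pandas as pd

# Toy subset of relevant_top50gene_2_diseases (columns input, output_name)
hits = pd.DataFrame({
    "input": ["TNF", "TNF", "TNF", "IL6", "TLR7", "TLR7"],
    "output_name": ["FEVER", "FEVER", "PNEUMONIA", "PNEUMONIA", "PNEUMONIA", "DIARRHEA"],
})

# Rows = genes, columns = symptoms, cells = number of supporting edges
table = pd.crosstab(hits["input"], hits["output_name"])
# Breadth: number of distinct symptoms each gene is linked to
breadth = (table > 0).sum(axis=1).sort_values(ascending=False)
print(table)
print(breadth)
```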


In [86]:
relevant_top50gene_2_diseases[relevant_top50gene_2_diseases["pred1"] == "causes"]

| | input | input_type | pred1 | pred1_source | pred1_api | pred1_pubmed | output_type | output_name | output_id |
|---|---|---|---|---|---|---|---|---|---|
| 1540 | TNF | Gene | causes | SEMMED | SEMMED Gene API | 10701765,15373964,16460809,1714101,17374708,17... | Disease | FEVER | MONDO:C0015967 |
| 1773 | TNF | Gene | causes | SEMMED | SEMMED Gene API | 1022372412576957246328452482336927350756 | Disease | PNEUMONIA | MONDO:MONDO:0005249 |
| 2150 | TNF | Gene | causes | SEMMED | SEMMED Gene API | 21426732 | Disease | COUGHING | MONDO:C0010200 |
| 2190 | TNF | Gene | causes | SEMMED | SEMMED Gene API | 17016558 | Disease | DIARRHEA | MONDO:MONDO:0001673 |
| 2348 | TNF | Gene | causes | SEMMED | SEMMED Gene API | 2786048 | Disease | LYMPHOPENIA | MONDO:MONDO:0003783 |
| 2388 | TNF | Gene | causes | SEMMED | SEMMED Gene API | 194368349220311 | Disease | MYOCARDITIS | MONDO:MONDO:0004496 |
| 804 | INS | Gene | causes | SEMMED | SEMMED Gene API | 27572546 | Disease | COUGHING | MONDO:C0010200 |
| 816 | INS | Gene | causes | SEMMED | SEMMED Gene API | 19846801 | Disease | FEVER | MONDO:C0015967 |
| 877 | INS | Gene | causes | SEMMED | SEMMED Gene API | 103578841053544617389329 | Disease | LYMPHOPENIA | MONDO:MONDO:0003783 |
| 537 | IL6 | Gene | causes | SEMMED | SEMMED Gene API | 11259234 | Disease | PNEUMONIA | MONDO:MONDO:0005249 |


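**How to interpret the table above**: among the top genes associated with COVID-19, these rows are the ones where SEMMED reports that the gene "causes" a condition listed as a COVID-19 symptom. SEMMED predications are text-mined from PubMed abstracts, so they indicate literature-reported associations rather than confirmed causal mechanisms.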
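
The same result reads more compactly as one line per gene. A pandas sketch using the gene and symptom columns copied from the "causes" table:

```python
import pandas as pd

# Gene and symptom columns copied from the "causes" table
causes = pd.DataFrame({
    "input": ["TNF", "TNF", "TNF", "TNF", "TNF", "TNF", "INS", "INS", "INS", "IL6"],
    "output_name": ["FEVER", "PNEUMONIA", "COUGHING", "DIARRHEA", "LYMPHOPENIA",
                    "MYOCARDITIS", "COUGHING", "FEVER", "LYMPHOPENIA", "PNEUMONIA"],
})

# One line per gene listing the symptoms SEMMED says it "causes"
summary = causes.groupby("input")["output_name"].agg(lambda s: ", ".join(sorted(set(s))))
print(summary)
```

On the notebook data, the same expression applied to `relevant_top50gene_2_diseases[relevant_top50gene_2_diseases["pred1"] == "causes"]` would give this summary directly.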

## 1.4 What proteins/genes are in pathways of known COVID-19 related genes? Which of these can be related to symptoms? 
### 1.4.1 Genes (from 1.1.1) -> Pathways -> Genes


### 1.4.2 COVID-19 Symptoms -> Pathways -> Genes

In [51]:
# Check whether any symptom name resolves to a PhenotypicFeature (one query per term)
for x in symptom_and_phenotype_list:
    hits = ht.query(x)['PhenotypicFeature']
    if hits:
        print(hits)

In [57]:
# Keep Disease hits whose name exactly matches a symptom term (case-insensitive)
disease_symptom_list = []
for x in symptom_and_phenotype_list:
    res = ht.query(x)['Disease']
    if res:
        for y in res:
            if y['name'].lower() == x:
                disease_symptom_list.append(y)
disease_symptom_list

[{'MONDO': 'C0010200',
  'UMLS': 'C0010200',
  'name': 'Coughing',
  'primary': {'identifier': 'MONDO', 'cls': 'Disease', 'value': 'C0010200'},
  'display': 'MONDO(C0010200) UMLS(C0010200) name(Coughing)',
  'type': 'Disease'},
 {'MONDO': 'C0015967',
  'UMLS': 'C0015967',
  'name': 'Fever',
  'primary': {'identifier': 'MONDO', 'cls': 'Disease', 'value': 'C0015967'},
  'display': 'MONDO(C0015967) UMLS(C0015967) name(Fever)',
  'type': 'Disease'},
 {'MONDO': 'MONDO:0004496',
  'DOID': 'DOID:820',
  'UMLS': 'C0027059',
  'name': 'myocarditis',
  'MESH': 'D009205',
  'primary': {'identifier': 'MONDO',
   'cls': 'Disease',
   'value': 'MONDO:0004496'},
  'display': 'MONDO(MONDO:0004496) DOID(DOID:820) UMLS(C0027059) MESH(D009205) name(myocarditis)',
  'type': 'Disease'},
 {'MONDO': 'MONDO:0005249',
  'DOID': 'DOID:552',
  'UMLS': 'C0032285',
  'name': 'pneumonia',
  'MESH': 'D011014',
  'primary': {'identifier': 'MONDO',
   'cls': 'Disease',
   'value': 'MONDO:0005249'},
  'display': 'MONDO