# COVID-19 Drug Repurposing Example
This example shows how to do drug repurposing with DRKG by using the pretrained model directly.

## Collecting COVID-19 related diseases
First we collect the list of coronavirus (COV) diseases in DRKG. Each disease is referenced by the Disease ID that DRKG uses to encode it. Here we take all of the COV diseases as targets.

In [1]:
COV_disease_list = [
'Disease::SARS-CoV2 E',
'Disease::SARS-CoV2 M',
'Disease::SARS-CoV2 N',
'Disease::SARS-CoV2 Spike',
'Disease::SARS-CoV2 nsp1',
'Disease::SARS-CoV2 nsp10',
'Disease::SARS-CoV2 nsp11',
'Disease::SARS-CoV2 nsp12',
'Disease::SARS-CoV2 nsp13',
'Disease::SARS-CoV2 nsp14',
'Disease::SARS-CoV2 nsp15',
'Disease::SARS-CoV2 nsp2',
'Disease::SARS-CoV2 nsp4',
'Disease::SARS-CoV2 nsp5',
'Disease::SARS-CoV2 nsp5_C145A',
'Disease::SARS-CoV2 nsp6',
'Disease::SARS-CoV2 nsp7',
'Disease::SARS-CoV2 nsp8',
'Disease::SARS-CoV2 nsp9',
'Disease::SARS-CoV2 orf10',
'Disease::SARS-CoV2 orf3a',
'Disease::SARS-CoV2 orf3b',
'Disease::SARS-CoV2 orf6',
'Disease::SARS-CoV2 orf7a',
'Disease::SARS-CoV2 orf8',
'Disease::SARS-CoV2 orf9b',
'Disease::SARS-CoV2 orf9c',
'Disease::MESH:D045169',
'Disease::MESH:D045473',
'Disease::MESH:D001351',
'Disease::MESH:D065207',
'Disease::MESH:D028941',
'Disease::MESH:D058957',
'Disease::MESH:D006517'
]

In [2]:
entity_idmap_file = '../data/drkg/embed/entities.tsv'
relation_idmap_file = '../data/drkg/embed/relations.tsv'

## Candidate drugs
Now we take drugs from DrugBank as candidate drugs. Here we use every DrugBank compound in DRKG, i.e. every entity whose name starts with `Compound::DB`.

In [3]:
import csv
drug_list = []
with open(entity_idmap_file, newline='', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile, delimiter='\t', fieldnames=['name','id'])
    for row_val in reader:
        if row_val['name'].startswith('Compound::DB'):
            drug_list.append(row_val['name'])
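
The prefix filter above can be tried on a small in-memory table before running it on the real file. This is only an illustrative sketch: the three rows are made-up stand-ins for `entities.tsv`, and `toy_tsv` and `toy_drug_list` are hypothetical names.

```python
import csv
import io

# Made-up rows standing in for entities.tsv (name <TAB> id); not real DRKG data.
toy_tsv = "Compound::DB00811\t0\nGene::1234\t1\nCompound::CHEMBL25\t2\n"

toy_drug_list = []
reader = csv.DictReader(io.StringIO(toy_tsv), delimiter='\t', fieldnames=['name', 'id'])
for row_val in reader:
    # Keep only compounds whose identifier starts with the DrugBank prefix "DB".
    if row_val['name'].startswith('Compound::DB'):
        toy_drug_list.append(row_val['name'])

print(toy_drug_list)
```

Only the first row survives the filter: the gene entity and the compound with a non-DrugBank identifier are both skipped.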

## Treatment relation

We consider two treatment relations in this context:

In [4]:
treatment = ['Hetionet::CtD::Compound:Disease', 'GNBR::T::Compound:Disease']

## Get pretrained model
We can use the pretrained model directly to do drug repurposing.

In [5]:
import pandas as pd
import numpy as np
import sys
sys.path.insert(1, '../utils')
from utils import download_and_extract
download_and_extract()

## Get embeddings for diseases and drugs

In [6]:
# Get drugname/disease name to entity ID mappings
entity_map = {}
entity_id_map = {}
relation_map = {}
with open(entity_idmap_file, newline='', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile, delimiter='\t', fieldnames=['name','id'])
    for row_val in reader:
        entity_map[row_val['name']] = int(row_val['id'])
        entity_id_map[int(row_val['id'])] = row_val['name']
        
with open(relation_idmap_file, newline='', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile, delimiter='\t', fieldnames=['name','id'])
    for row_val in reader:
        relation_map[row_val['name']] = int(row_val['id'])
        
# handle the ID mapping
drug_ids = []
disease_ids = []
for drug in drug_list:
    drug_ids.append(entity_map[drug])
    
for disease in COV_disease_list:
    disease_ids.append(entity_map[disease])

treatment_rid = [relation_map[treat]  for treat in treatment]
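
The direct dictionary lookups above raise a `KeyError` if a name is absent from the ID map. As a hedged sketch, a small helper could separate resolved IDs from missing names instead. `toy_names_to_ids` and `toy_entity_map` are hypothetical names, and the IDs below are made up.

```python
def toy_names_to_ids(names, name_to_id):
    """Split names into resolved IDs and names absent from the mapping."""
    ids, missing = [], []
    for name in names:
        if name in name_to_id:
            ids.append(name_to_id[name])
        else:
            missing.append(name)
    return ids, missing

# Made-up mapping standing in for entity_map; the IDs are not real DRKG IDs.
toy_entity_map = {'Compound::DB00811': 0, 'Disease::MESH:D045169': 1}
ids, missing = toy_names_to_ids(
    ['Compound::DB00811', 'Disease::SARS-CoV2 E'], toy_entity_map)
print(ids, missing)
```

Reporting the missing names makes it easier to spot a mismatch between the disease list and the entity file than a bare exception would.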

In [7]:
# Load embeddings
import torch as th
entity_emb = np.load('../data/drkg/embed/DRKG_TransE_l2_entity.npy')
rel_emb = np.load('../data/drkg/embed/DRKG_TransE_l2_relation.npy')

drug_ids = th.tensor(drug_ids).long()
disease_ids = th.tensor(disease_ids).long()
treatment_rid = th.tensor(treatment_rid)

drug_emb = th.tensor(entity_emb[drug_ids])
treatment_embs = [th.tensor(rel_emb[rid]) for rid in treatment_rid]

## Drug Repurposing Based on Edge Score
We use the following algorithm to calculate the edge score. Note that we apply logsigmoid, which makes all scores < 0. The larger the score, the more strongly $h$ is predicted to have relation $r$ with $t$.

$\mathbf{d} = \gamma - ||\mathbf{h}+\mathbf{r}-\mathbf{t}||_{2}$

$\mathbf{score} = \log\left(\frac{1}{1+\exp(\mathbf{-d})}\right)$

When doing drug repurposing, we only use the treatment related relations.
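
To make the two formulas concrete, here is a minimal worked sketch in plain Python. The 2-d vectors are made up for illustration, `toy_edge_score` is a hypothetical name, and $\gamma = 12.0$ matches the value used in this example. It is not numerically hardened the way a library logsigmoid is.

```python
import math

def toy_edge_score(head, rel, tail, gamma=12.0):
    """log-sigmoid of gamma - ||h + r - t||_2 for plain Python lists."""
    dist = math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, rel, tail)))
    d = gamma - dist
    # log(1 / (1 + exp(-d))) equals -log(1 + exp(-d))
    return -math.log1p(math.exp(-d))

# Made-up vectors: h + r lands exactly on t_close and far away from t_far.
h, r = [1.0, 0.0], [0.0, 1.0]
t_close, t_far = [1.0, 1.0], [31.0, 41.0]

print(toy_edge_score(h, r, t_close))  # distance 0, so d = 12
print(toy_edge_score(h, r, t_far))    # distance 50, so d = -38
```

Both scores are negative, and the tail that sits close to $h + r$ gets the score nearer to zero, which is why a larger score indicates a more plausible edge.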

In [8]:
import torch.nn.functional as fn

gamma=12.0
def transE_l2(head, rel, tail):
    score = head + rel - tail
    return gamma - th.norm(score, p=2, dim=-1)

scores_per_disease = []
dids = []
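# Score every candidate drug against every COV disease under each treatment relation.
for treatment_emb in treatment_embs:
    for disease_id in disease_ids:
        # Convert the disease row to a tensor so all operands share one type.
        disease_emb = th.tensor(entity_emb[int(disease_id)])
        score = fn.logsigmoid(transE_l2(drug_emb, treatment_emb, disease_emb))
        scores_per_disease.append(score)
        dids.append(drug_ids)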
scores = th.cat(scores_per_disease)
dids = th.cat(dids)


In [9]:
# sort scores in descending order
idx = th.flip(th.argsort(scores), dims=[0])
scores = scores[idx].numpy()
dids = dids[idx].numpy()

### Now we output proposed treatments

In [10]:
_, unique_indices = np.unique(dids, return_index=True)
topk=100
topk_indices = np.sort(unique_indices)[:topk]
proposed_dids = dids[topk_indices]
proposed_scores = scores[topk_indices]
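
Because the scores are already sorted in descending order, the first occurrence of each drug ID is its best-scoring (relation, disease) pair, and that is what `np.unique(..., return_index=True)` picks out. The sketch below shows the same idea in plain Python on made-up data. `toy_top_unique`, `toy_dids` and `toy_scores` are hypothetical names.

```python
# Made-up drug IDs and scores, already sorted by score in descending order.
toy_dids = [7, 3, 7, 5, 3]
toy_scores = [-0.2, -0.5, -0.6, -0.9, -1.1]

def toy_top_unique(ids, scores, k):
    """Keep the first (best) occurrence of each ID, then take the first k."""
    seen = set()
    kept = []
    for drug_id, score in zip(ids, scores):
        if drug_id not in seen:
            seen.add(drug_id)
            kept.append((drug_id, score))
    return kept[:k]

print(toy_top_unique(toy_dids, toy_scores, k=2))
```

Drug 7 appears twice, but only its higher score is kept, so each drug occupies a single slot in the ranking.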

Now we list the proposed drugs with their scores, selecting the top K relevant drugs according to the edge score.

In [11]:
for i in range(topk):
    drug = int(proposed_dids[i])
    score = proposed_scores[i]
    
    print("{}\t{}".format(entity_id_map[drug], score))

Compound::DB00811	-0.21358221769332886
Compound::DB00982	-0.7457163333892822
Compound::DB00928	-0.8047788739204407
Compound::DB01082	-0.80478835105896
Compound::DB00563	-0.8762229681015015
Compound::DB00635	-0.8786484599113464
Compound::DB00853	-0.8897395133972168
Compound::DB01001	-0.89293372631073
Compound::DB00681	-0.9245818853378296
Compound::DB00787	-0.9289862513542175
Compound::DB00290	-0.9402258396148682
Compound::DB00566	-0.9544163942337036
Compound::DB00860	-0.9569648504257202
Compound::DB00249	-0.9634518027305603
Compound::DB01099	-0.9672449827194214
Compound::DB01222	-0.9871338605880737
Compound::DB00993	-0.9912917613983154
Compound::DB00512	-1.0083342790603638
Compound::DB12510	-1.0196998119354248
Compound::DB00624	-1.0209006071090698
Compound::DB01024	-1.0232834815979004
Compound::DB01234	-1.025983214378357
Compound::DB01004	-1.0309269428253174
Compound::DB00688	-1.03281569480896
Compound::DB00091	-1.0585496425628662
Compound::DB00558	-1.0633755922317505
Compound::DB00331	

### Check Clinical Trial Drugs
Seven clinical trial drugs are hit in the top 100. (Note: Ribavirin exists in DRKG as a treatment for SARS.) We provide the clinical trial drug list in COVID19\_clinical\_trial\_drugs.tsv, but you can also refer to https://covid19-trials.com/.

In [12]:
clinical_drugs_file = './COVID19_clinical_trial_drugs.tsv'
clinical_drug_map = {}
with open(clinical_drugs_file, newline='', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile, delimiter='\t', fieldnames=['id', 'drug_name','drug_id'])
    for row_val in reader:
        clinical_drug_map[row_val['drug_id']] = row_val['drug_name']
        
for i in range(topk):
    drug = entity_id_map[int(proposed_dids[i])][10:17]
    if clinical_drug_map.get(drug, None) is not None:
        score = proposed_scores[i]
        print("[{}]{}".format(i, clinical_drug_map[drug]))

[0]Ribavirin
[21]Dexamethasone
[31]Colchicine
[51]Methylprednisolone
[76]Oseltamivir
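
The slice `[10:17]` in the cell above relies on the `Compound::` prefix being 10 characters long and on DrugBank IDs being 7 characters long. As a hedged alternative, the ID can be taken as everything after the prefix. `toy_drugbank_id` is a hypothetical helper, not part of the original notebook.

```python
def toy_drugbank_id(entity_name):
    """Return the part after 'Compound::', or None for non-compound names."""
    prefix = 'Compound::'
    if not entity_name.startswith(prefix):
        return None
    return entity_name[len(prefix):]

# Same result as 'Compound::DB00811'[10:17] for a standard DrugBank ID.
print(toy_drugbank_id('Compound::DB00811'))
print(toy_drugbank_id('Gene::1234'))
```

This form keeps working if an identifier is longer or shorter than seven characters, and it makes the non-compound case explicit.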
