# Using Collaborative Filtering To Repurpose Drugs for COVID-19

*** from https://time.com/5819965/coronavirus-treatments-research/ ***

### Introduction
There are no treatments proven to disable SARS-CoV-2, the virus that causes COVID-19, which means all the options scientists are exploring are still very much in the trial-and-error stage. 

The sense of urgency is pushing researchers at academic institutes as well as pharmaceutical companies to turn to their libraries of thousands of approved drugs or compounds that are in early testing, and to screen them to see if any can disable SARS-CoV-2. Because many of these are already approved and deemed safe for people, if any emerge as possible anti-COVID-19 therapies, companies could begin testing them in people infected with the virus within weeks.

This notebook attempts to accelerate this process of drug repurposing by using a popular and relatively simple technique called collaborative filtering to search for potential drug candidates.


### What is Collaborative Filtering and How Can We Use It for Drug Repurposing?

When a website recommends a product to you, it can do so in one of two ways:
1. Using a content-based approach (you are female, married and aged 30+; this product is bought by married females aged 30+, so it can be recommended to you).
2. Using collaborative filtering, wherein explicit attributes of the user and the product (female, married, aged 30+ etc.) are not used; instead, latent factors are inferred from past purchases. More info here - https://en.wikipedia.org/wiki/Collaborative_filtering

We're using collaborative filtering for drug repurposing by simply replacing users with drugs and products with diseases. So if we have a large enough database of drugs and disease conditions, we could, in theory, predict the efficacy of a drug on a particular disease condition (e.g. COVID-19).

This approach has been tried earlier for diseases such as Alzheimer's and stroke - https://en.wikipedia.org/wiki/Collaborative_filtering
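To make the drugs-as-users, diseases-as-products idea concrete, here is a minimal plain-Python sketch of latent-factor collaborative filtering. It is not the fastai model used later in this notebook: the drug and disease names, the single 0.0 rating, and the helper names `train_factors` and `predict` are all hypothetical and chosen only for illustration.

```python
import random

# Toy (drug, disease, rating) triples. All names and the 0.0 rating are
# hypothetical; the real dataset in this notebook only contains ratings of 1.
observations = [
    ("drug_a", "disease_x", 1.0),
    ("drug_a", "disease_y", 1.0),
    ("drug_b", "disease_x", 1.0),
    ("drug_b", "disease_y", 0.0),
    ("drug_c", "disease_y", 1.0),
]

def train_factors(observations, n_factors=2, epochs=1000, lr=0.05, seed=0):
    """Learn one latent vector per drug and per disease with plain SGD."""
    rng = random.Random(seed)
    drugs = sorted({d for d, _, _ in observations})
    diseases = sorted({s for _, s, _ in observations})
    drug_vecs = {d: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for d in drugs}
    disease_vecs = {s: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for s in diseases}
    for _ in range(epochs):
        for drug, disease, rating in observations:
            p, q = drug_vecs[drug], disease_vecs[disease]
            # Prediction is the dot product of the two latent vectors.
            error = rating - sum(a * b for a, b in zip(p, q))
            for k in range(n_factors):
                p_k, q_k = p[k], q[k]
                p[k] += lr * error * q_k  # gradient step on squared error
                q[k] += lr * error * p_k
    return drug_vecs, disease_vecs

def predict(drug_vecs, disease_vecs, drug, disease):
    """Score a (drug, disease) pair, including pairs never seen in training."""
    return sum(a * b for a, b in zip(drug_vecs[drug], disease_vecs[disease]))

drug_vecs, disease_vecs = train_factors(observations)
# drug_c was never paired with disease_x, yet the learned factors give a score.
print(predict(drug_vecs, disease_vecs, "drug_c", "disease_x"))
```

The score for the unseen pair comes entirely from how `drug_c` and `disease_x` relate to the other rows, which is the same mechanism this notebook relies on to score drugs against COVID-19.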



### Potential Drawbacks of Collaborative Filtering For COVID-19
1. "Cold Start Problem" - since COVID-19 is a new disease and no drug has been conclusively proven to be efficacious against it, we simply do not have enough data to make a reliable prediction. This situation will change rapidly if an existing drug emerges as having efficacy. For the purpose of modeling here, I've simply taken three drugs that have anecdotally shown some efficacy. The drugs used for modeling here are:
    - Chloroquine
    - Ritonavir
    - Lopinavir	
    

2. Popularity bias - drugs with many recorded indications may get recommended more often, so we'll need to watch out for drugs that address many types of cancers.
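One rough way to watch for the second drawback is to count how many indications each drug has in the training data. The sketch below does this with the standard library only, using the five sample rows that `drug_diseases.head()` prints later in this notebook; on the full dataset the same count could be taken over every row. The variable names are illustrative.

```python
from collections import Counter

# (drug_name, ind_name) pairs copied from the first five rows of the dataset.
pairs = [
    ("Lepirudin", "Heparin-induced thrombocytopenia with thrombosis"),
    ("Cetuximab", "Squamous cell carcinoma of mouth"),
    ("Cetuximab", "Squamous cell carcinoma of nose"),
    ("Cetuximab", "Squamous cell carcinoma of pharynx"),
    ("Cetuximab", "Laryngeal Squamous Cell Carcinoma"),
]

# Count indications per drug; drugs with many indications are the ones
# most likely to be over-recommended.
indications_per_drug = Counter(drug for drug, _ in pairs)
for drug, n_indications in indications_per_drug.most_common():
    print(drug, n_indications)
```

Even in this small sample, Cetuximab accounts for four of the five rows, all of them squamous cell carcinomas.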

### Code for Collaborative Filtering

In [7]:
from fastai.collab import *
from fastai.tabular import *

In [13]:
drug_diseases = pd.read_csv('./repoDBV2.csv')

In [14]:
drug_diseases.head()

Unnamed: 0,drug_name,ind_name,rating
0,Lepirudin,Heparin-induced thrombocytopenia with thrombosis,1
1,Cetuximab,Squamous cell carcinoma of mouth,1
2,Cetuximab,Squamous cell carcinoma of nose,1
3,Cetuximab,Squamous cell carcinoma of pharynx,1
4,Cetuximab,Laryngeal Squamous Cell Carcinoma,1


In [15]:
#this dataset has 10,565 rows
#if a drug is used for a disease, we give it a rating of 1
drug_diseases.shape

(10565, 3)

In [17]:
# I've also included 3 records that pertain to 
# drugs currently being used for COVID-19
# We've given arbitrary rating of 1 for all 3 drugs
# even though their efficacy is as yet not fully known
drug_diseases.tail()

Unnamed: 0,drug_name,ind_name,rating
10560,Temozolomide,Brain Diseases,1
10561,Dasatinib,Brain Diseases,1
10562,Chloroquine,COVID-19,1
10563,Ritonavir,COVID-19,1
10564,Lopinavir,COVID-19,1


In [30]:
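# `ratings` is not defined in the cells shown; assuming it refers to drug_diseases (columns read in order as user, item, rating)
data = CollabDataBunch.from_df(drug_diseases, seed=42)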

In [81]:
y_range = [-.999,1.01]
learn = collab_learner(data, n_factors=50, y_range=y_range)
learn.fit_one_cycle(5, 5e-3)

epoch,train_loss,valid_loss,time
0,0.923443,0.841007,00:01
1,0.424292,0.319026,00:01
2,0.149691,0.219889,00:01
3,0.062668,0.196784,00:01
4,0.040038,0.193257,00:01
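The `y_range` argument deserves a short note. As far as I understand fastai v1's default collab model, the raw score (dot product of the drug and disease factors plus biases) is passed through a sigmoid and rescaled to lie between the two `y_range` values. The sketch below reproduces that rescaling in plain Python under that assumption; `scale_to_range` is a hypothetical helper, not a fastai function.

```python
import math

def scale_to_range(raw_score, y_range):
    """Squash a raw score into (y_min, y_max) with a sigmoid.

    Assumption: this mirrors how the collab model applies y_range;
    it is an illustration, not fastai's own code.
    """
    y_min, y_max = y_range
    sigmoid = 1.0 / (1.0 + math.exp(-raw_score))
    return sigmoid * (y_max - y_min) + y_min

y_range = [-.999, 1.01]  # same values as in the cell above
for raw_score in (-5.0, 0.0, 5.0):
    print(raw_score, round(scale_to_range(raw_score, y_range), 4))
```

With this range, a raw score of 0 maps to roughly 0, and the upper bound sits slightly above 1 so that a prediction of exactly 1 is reachable without the sigmoid saturating.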


In [82]:
#sanity check to see if our model is predicting well on the validation data
(drug, disease), efficacy = next(iter(data.valid_dl))
preds = learn.model(drug, disease)
print('Real\tPred\tDifference')
for p in list(zip(efficacy, preds))[:5]:
    print('{}\t{:.1f}\t{:.1f}'.format(p[0],p[1],p[1]-p[0]))

Real	Pred	Difference
1.0	1.0	-0.0
1.0	0.9	-0.1
1.0	0.7	-0.3
1.0	1.0	-0.0
1.0	0.6	-0.4
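The five rows above can be summarized as a single number. This is a small worked example using only the rounded values printed in the table, so it is an approximation of the true error on those rows.

```python
# (real, predicted) pairs copied from the five rows printed above.
rows = [(1.0, 1.0), (1.0, 0.9), (1.0, 0.7), (1.0, 1.0), (1.0, 0.6)]

# Mean absolute error: average size of the Difference column.
mae = sum(abs(pred - real) for real, pred in rows) / len(rows)
print(f"mean absolute error over {len(rows)} rows: {mae:.2f}")
```

This prints a mean absolute error of 0.16. Because every real value in the dataset is 1, this check only shows how close the predictions are to 1 for known drug-disease pairs; it says nothing about how well the model rejects pairs that do not work.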


In [83]:
covid_candidate_drugs= pd.read_csv('./covid_candidate_drugs.csv')

In [84]:
covid_candidate_drugs.shape

(1572, 3)

In [85]:
covid_candidate_drugs.head()

Unnamed: 0,drug_name,ind_name,rating
0,Lepirudin,COVID-19,
1,Cetuximab,COVID-19,
2,Dornase alfa,COVID-19,
3,Denileukin diftitox,COVID-19,
4,Etanercept,COVID-19,


In [87]:
drugs_covid_predictions = pd.DataFrame(
    [covid_candidate_drugs.loc[i][0], learn.predict(covid_candidate_drugs.loc[i])[0]] for i in range(covid_candidate_drugs.shape[0])
)

In [88]:
#drugs_df.to_csv('drug_names.csv')
drugs_covid_predictions.to_csv('drugs_covid_predictions.csv')

In [89]:
drugs_covid_predictions = pd.read_csv('./drugs_covid_predictions.csv')

In [91]:
del drugs_covid_predictions['Unnamed: 0']

In [93]:
drugs_covid_predictions.rename(columns={'0': 'drug_name', '1': 'predicted_efficacy'}, inplace=True)

In [98]:
drugs_covid_predictions.sort_values('predicted_efficacy').tail(10)

Unnamed: 0,drug_name,predicted_efficacy
700,Salicylic acid,0.199436
683,Metronidazole,0.199734
1325,Trametinib,0.200392
67,Bevacizumab,0.200724
752,Doxorubicin,0.207391
496,Amphotericin B,0.209665
939,Dexamethasone,0.211339
793,Benzylpenicillin,0.217711
968,Sunitinib,0.227656
360,Cisplatin,0.230221


### Drugs from our top 10 list that are being investigated for COVID-19:
* Bevacizumab - https://clinicaltrials.gov/ct2/show/NCT04275414 , https://clinicaltrials.gov/ct2/show/NCT04327401
* Metronidazole - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7114714/

### Next Steps
- At the moment, there is insufficient evidence to abandon this approach of Collaborative Filtering. 
- Therefore, possible next steps are: 
    - Try this approach with a larger dataset
    - Try adding improved drug-disease performance ratings (rather than 1 or nothing as it stands now)
    - Try playing around with the hyperparameters (epochs, number of latent factors, learning rate etc.)
    - Set the rating of Ruxolitinib for COVID-19 to 1, retrain and test (scientists have recently started researching this drug)
