Final Semester Exam SCMA801007 - Advanced Computing and Big Data\
Name : Ivana Joice Chandra\
Student ID (NPM) : 2306174955

# Using Collaborative Filtering to Repurpose Drugs for COVID-19

Collaborative filtering is a recommendation-system technique that predicts a user's preferences from the preferences and behavior of similar users. In other words, it collects preferences from many users (the "collaborative" part) and recommends items to a user based on what users with similar tastes have preferred.
\
\
In the context of drug repurposing when only drug and disease data are available, collaborative filtering relies on historical interactions between drugs and diseases to uncover meaningful patterns and potential repurposing opportunities. This approach involves constructing profiles for drugs and diseases based on their past associations, transforming these interactions into multidimensional vectors. The similarity between drugs and diseases is then calculated using metrics like cosine similarity or Euclidean distance, allowing the identification of shared patterns in their historical interactions.
\
\
Collaborative filtering, in this context, serves as a computational tool that guides drug repurposing by identifying candidates for further investigation. Drugs whose historical interactions with diseases are highly similar become candidates for repurposing: if one drug is effective for a particular disease, drugs with a similar interaction profile may be effective for it as well. While collaborative filtering provides valuable insights, experimental validation remains crucial to confirm the predicted repurposing candidates before any new therapeutic use of an existing drug can be considered reliable.
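
To make the similarity idea concrete, here is a minimal sketch in plain Python. The drug names and the tiny drug-by-disease vectors are made up for illustration and do not come from the dataset used later in this notebook.

```python
import math

# Hypothetical toy data: each drug is a vector of 0/1 flags, one per disease
# (disease_a, disease_b, disease_c, disease_d). A 1 means a known association.
drug_vectors = {
    "drug_x": [1, 1, 0, 0],
    "drug_y": [1, 1, 1, 0],
    "drug_z": [0, 0, 0, 1],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

sim_xy = cosine_similarity(drug_vectors["drug_x"], drug_vectors["drug_y"])
sim_xz = cosine_similarity(drug_vectors["drug_x"], drug_vectors["drug_z"])
print(f"drug_x vs drug_y: {sim_xy:.3f}")
print(f"drug_x vs drug_z: {sim_xz:.3f}")
```

In this toy example, drug_x and drug_y share two diseases and score much higher than drug_x and drug_z, which share none. Under the collaborative filtering view, disease_c (treated by drug_y only) would then be a candidate indication to investigate for drug_x.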

## Getting Started

In this assignment, I reproduce code that I found on GitHub. The original source code can be accessed from \
https://github.com/vikram-s-narayan/collaborative-filtering-for-drug-repurposing-COVID-V3. The notebook attempts to accelerate drug repurposing for COVID-19 by using a popular and relatively simple technique, collaborative filtering, to search for potential drug candidates.

First of all, clone the GitHub repository and access the 'collaborative-filtering-for-drug-repurposing-COVID-V3' directory, which contains the original source code and the required dataset.

In [None]:
!git clone https://github.com/vikram-s-narayan/collaborative-filtering-for-drug-repurposing-COVID-V3

Cloning into 'collaborative-filtering-for-drug-repurposing-COVID-V3'...
remote: Enumerating objects: 11, done.
remote: Counting objects: 100% (11/11), done.
remote: Compressing objects: 100% (10/10), done.
remote: Total 11 (delta 1), reused 11 (delta 1), pack-reused 0
Receiving objects: 100% (11/11), 293.87 KiB | 5.25 MiB/s, done.
Resolving deltas: 100% (1/1), done.


In [None]:
%cd collaborative-filtering-for-drug-repurposing-COVID-V3

/content/collaborative-filtering-for-drug-repurposing-COVID-V3


## Import Library

In this tutorial, I use the fastai library to implement collaborative filtering, a technique for building recommendation systems from user-item interactions. fastai is a high-level deep learning library built on top of PyTorch that provides a simple and efficient way to train models for a variety of tasks. For collaborative filtering, it provides dedicated modules and functions that make it easy to build recommendation systems.

In [None]:
from fastai.collab import *
from fastai.tabular.all import *
import pandas as pd

## Load Dataset

The dataset used is available at http://apps.chiragjpgroup.org/repoDB/ (associated paper: Brown AS and Patel CJ. repoDB: A New Standard for Drug Repositioning Validation. Scientific Data. 170029 (2017)).
\
\
The original author has created a simplified dataset from the full drug database and saved it as 'approved_COVID.csv'. The last column ('rating') holds a score of 1 when a drug is used for a particular condition. This dataset is then used to train our model.

In [None]:
approved_covid = pd.read_csv('./approved_COVID.csv')
approved_covid.head()

Unnamed: 0,drug_name,ind_name,rating
0,Lepirudin,Heparin-induced thrombocytopenia with thrombosis,1
1,Cetuximab,Squamous cell carcinoma of mouth,1
2,Cetuximab,Squamous cell carcinoma of nose,1
3,Cetuximab,Squamous cell carcinoma of pharynx,1
4,Cetuximab,Laryngeal Squamous Cell Carcinoma,1


## Train the model

Before training our model, we first need to create collaborative filtering data loaders (data) from a DataFrame (approved_covid) by specifying the columns for users, items, and ratings. Here the drugs play the role of users and the indications play the role of items.

In [None]:
data = CollabDataLoaders.from_df(approved_covid, user_name='drug_name', item_name = 'ind_name',rating_name = 'rating',seed = 42)
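
As a rough picture of the preparation such data loaders involve, the sketch below encodes names as integer ids and holds out part of the rows for validation. It is a plain-Python illustration, not fastai's actual implementation: the helper `build_vocab`, the 20% validation share, and the convention of keeping id 0 free are assumptions of this sketch. The rows are the five shown by `approved_covid.head()` above.

```python
import random

# The first five (drug, indication, rating) rows of approved_COVID.csv
rows = [
    ("Lepirudin", "Heparin-induced thrombocytopenia with thrombosis", 1),
    ("Cetuximab", "Squamous cell carcinoma of mouth", 1),
    ("Cetuximab", "Squamous cell carcinoma of nose", 1),
    ("Cetuximab", "Squamous cell carcinoma of pharynx", 1),
    ("Cetuximab", "Laryngeal Squamous Cell Carcinoma", 1),
]

def build_vocab(values):
    """Map each distinct name to an integer id, starting at 1.

    Id 0 is kept free for names not seen in training (an assumption of this sketch).
    """
    return {name: idx for idx, name in enumerate(sorted(set(values)), start=1)}

drug_to_id = build_vocab(drug for drug, _, _ in rows)
ind_to_id = build_vocab(ind for _, ind, _ in rows)

# Replace names with ids so that each row can index into embedding tables
encoded = [(drug_to_id[d], ind_to_id[i], r) for d, i, r in rows]

# Seeded shuffle, then hold out 20% of the rows for validation
rng = random.Random(42)
shuffled = encoded[:]
rng.shuffle(shuffled)
n_valid = max(1, int(0.2 * len(shuffled)))
valid, train = shuffled[:n_valid], shuffled[n_valid:]
print(len(train), len(valid))
```

The integer ids matter because the model stores one embedding vector per drug and one per indication, and looks them up by position.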



Next, initialize a collaborative filtering learner. Setting y_range constrains the predicted ratings to the specified range. The model, which learns 50 latent factors for each drug and each disease, is then trained for six epochs using the one-cycle learning rate policy with a maximum learning rate of 5e-3.

In [None]:
y_range = [0,1.01]
learn = collab_learner(data, n_factors=50, y_range=y_range)
learn.fit_one_cycle(6, 5e-3)



epoch,train_loss,valid_loss,time
0,0.237234,0.22592,00:01
1,0.17052,0.120502,00:01
2,0.090867,0.080802,00:01
3,0.050083,0.067801,00:01
4,0.032025,0.063632,00:01
5,0.026,0.063028,00:01
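
To illustrate what the latent factors and y_range do, here is a minimal sketch of a dot-product model with a scaled sigmoid. It assumes the learner scores a (drug, disease) pair as the dot product of their factor vectors plus two biases, squashed into y_range; the function names and the 3-factor vectors are hypothetical and chosen only for this example.

```python
import math

def sigmoid_range(x, low, high):
    """Squash any real number into the open interval (low, high) with a scaled sigmoid."""
    return low + (high - low) / (1.0 + math.exp(-x))

def predict_rating(drug_factors, disease_factors, drug_bias, disease_bias, y_range=(0.0, 1.01)):
    """Dot product of the latent factors plus both biases, squashed into y_range."""
    raw = sum(d * s for d, s in zip(drug_factors, disease_factors)) + drug_bias + disease_bias
    return sigmoid_range(raw, *y_range)

# Hypothetical 3-factor embeddings (the notebook uses n_factors=50)
drug_vec = [0.8, -0.2, 0.5]
disease_vec = [0.9, 0.1, 0.4]
score = predict_rating(drug_vec, disease_vec, drug_bias=0.3, disease_bias=0.1)
print(round(score, 3))
```

Because a sigmoid only approaches its upper bound, an upper limit of exactly 1 could never be reached. Setting it slightly higher (1.01) presumably lets the model output values very close to the target rating of 1.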



Then check how well our model predicts on a batch of validation data.

In [None]:
import torch

# Take one batch from the validation DataLoader:
# x holds the (drug, disease) index pairs, efficacy holds the true ratings
x, efficacy = next(iter(data.valid))

# Run the trained model on the batch without tracking gradients
with torch.no_grad():
    preds = learn.model(x)

# Compare the first five true ratings with the model's predictions
print('Real\tPred\tDifference')
for real, pred in list(zip(efficacy, preds))[:5]:
    print(real.float().item(), "   ", round(pred.float().item(), 2), "   ", round((pred - real).float().item(), 2))

Real	Pred	Difference
1.0     0.56     -0.44
1.0     0.96     -0.04
1.0     0.97     -0.03
1.0     0.68     -0.32
1.0     0.85     -0.15
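
The five differences above can be summarized with two standard error measures. This short sketch computes the mean absolute error and mean squared error from the printed values only, so it describes these five rows, not the whole validation set.

```python
# (real, predicted) pairs copied from the five rows printed above
pairs = [(1.0, 0.56), (1.0, 0.96), (1.0, 0.97), (1.0, 0.68), (1.0, 0.85)]

errors = [pred - real for real, pred in pairs]
mae = sum(abs(e) for e in errors) / len(errors)  # mean absolute error
mse = sum(e * e for e in errors) / len(errors)   # mean squared error

print(f"MAE: {mae:.3f}")
print(f"MSE: {mse:.4f}")
```

Assuming the training loss is mean squared error, the MSE on these five rows can be compared with the final valid_loss of about 0.063 in the training table, keeping in mind that five rounded values are a very small sample.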


## Predicting The Efficacy of The Drugs on COVID-19

The original author has created a list of all the drugs in the previous database with the disease set to COVID-19, so that we can ask our model to predict the efficacy of each drug against COVID-19. The list is saved as 'covid_candidate_drugs.csv'.

In [None]:
covid_candidate_drugs= pd.read_csv('./covid_candidate_drugs.csv')

In [None]:
covid_candidate_drugs.shape

(1573, 3)

In [None]:
covid_candidate_drugs.sample(5)

Unnamed: 0,drug_name,ind_name,rating
673,Didanosine,COVID-19,
357,Dextrothyroxine,COVID-19,
747,Oxaprozin,COVID-19,
669,Testolactone,COVID-19,
638,Drostanolone,COVID-19,


The next step is to predict the efficacy of each drug against COVID-19. The original source code used learn.predict() for this. Here I attempted to run the original code obtained from the GitHub repository.

In [None]:
drugs_covid_predictions = pd.DataFrame(
    [covid_candidate_drugs.loc[i][0], learn.predict(covid_candidate_drugs.loc[i])[0]] for i in range(covid_candidate_drugs.shape[0])
)

TypeError: ignored

As we can see, the original code raises a TypeError when executed. I tried several alternatives but could not resolve the error, so I modified the code to use learn.get_preds() instead.
\
\
In fastai, learn.get_preds() is a method that returns predictions from a trained learner. It handles batching and inference automatically, which makes it convenient for obtaining predictions over an entire dataset, whether to evaluate the model on a validation set or to generate predictions for new data.

In [None]:
# Use the test DataLoader with the DataFrame
dl = learn.dls.test_dl(covid_candidate_drugs)
preds, _ = learn.get_preds(dl=dl)

# Extract the predicted ratings from the predictions
predicted_ratings = preds.numpy()

# Add the predicted ratings to the original DataFrame
covid_candidate_drugs['predicted_rating'] = predicted_ratings

# Display the new DataFrame
drugs_covid_predictions = covid_candidate_drugs[['drug_name','predicted_rating']]
drugs_covid_predictions


                    drug_name  ind_name  predicted_rating
0                   Lepirudin  COVID-19          0.618225
1                   Cetuximab  COVID-19          0.647963
2                Dornase alfa  COVID-19          0.661940
3         Denileukin diftitox  COVID-19          0.687635
4                  Etanercept  COVID-19          0.702807
...                       ...       ...               ...
1568          Acetylcarnitine  COVID-19          0.591798
1569  Eslicarbazepine acetate  COVID-19          0.591798
1570             Nitrendipine  COVID-19          0.591798
1571          Succinylcholine  COVID-19          0.591798
1572               Remdesivir  COVID-19          0.715246

[1573 rows x 3 columns]
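
The batching that get_preds() takes care of can be pictured with the following plain-Python sketch. The batch size of 64 and the constant-output `toy_model` are assumptions made for illustration; the real method runs the trained network on each batch.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def toy_model(batch):
    """Hypothetical stand-in for the trained model: one score per input row."""
    return [0.5 for _ in batch]

rows = list(range(1573))  # one entry per candidate drug
preds = []
n_batches = 0
for batch in iter_batches(rows, 64):
    preds.extend(toy_model(batch))  # collect per-batch outputs in order
    n_batches += 1

print(n_batches, len(preds))
```

The point of the sketch is that predictions are produced batch by batch and concatenated in the original row order, which is why they can be assigned directly to a new DataFrame column.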


Then, save the prediction results to a CSV file.

In [None]:
drugs_covid_predictions.to_csv('drugs_covid_predictions.csv')

## Prediction Result

In [None]:
drugs_covid_predictions = pd.read_csv('./drugs_covid_predictions.csv')
drugs_covid_predictions.head()

Unnamed: 0.1,Unnamed: 0,drug_name,predicted_rating
0,0,Lepirudin,0.618225
1,1,Cetuximab,0.647963
2,2,Dornase alfa,0.66194
3,3,Denileukin diftitox,0.687635
4,4,Etanercept,0.702807


In [None]:
del drugs_covid_predictions['Unnamed: 0']
drugs_covid_predictions.rename(columns={'predicted_rating': 'predicted_efficacy'}, inplace=True)
drugs_covid_predictions.sort_values('predicted_efficacy').head()

Unnamed: 0,drug_name,predicted_efficacy
698,Maprotiline,0.561654
207,Amitriptyline,0.562251
85,Folic Acid,0.562754
68,Cyanocobalamin,0.564357
277,Alprazolam,0.566615


Finally, we list the 30 drugs with the highest predicted efficacy. Because the values are sorted in ascending order, the highest-scoring drug appears last.

In [None]:
drugs_covid_predictions.sort_values('predicted_efficacy').tail(30)

Unnamed: 0,drug_name,predicted_efficacy
1068,Hydroxychloroquine,0.80324
156,Doxycycline,0.806526
284,Ampicillin,0.806632
1470,Pramocaine,0.806779
354,Ritonavir,0.806902
455,Prednisone,0.807381
860,Levofloxacin,0.807934
127,Moxifloxacin,0.809608
665,Mechlorethamine,0.811926
920,Clarithromycin,0.813737


Although the code runs successfully, it produces results different from those in the original GitHub repository. The switch from learn.predict() to learn.get_preds() may contribute to this discrepancy. Training is also stochastic: the training and validation rows are randomly sampled and the embeddings are randomly initialized, so two runs can differ unless the same random seeds are used.
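
As a small illustration of how the seed affects which rows end up in training, the sketch below splits row indices with a seeded shuffle. The helper `split_indices` and the numbers used are hypothetical; the sketch only mimics the idea of a seeded random split and is not the library's code.

```python
import random

def split_indices(n_rows, valid_pct, seed):
    """Return (train, valid) index lists from a seeded random permutation."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_valid = int(valid_pct * n_rows)
    return indices[n_valid:], indices[:n_valid]

train_a, valid_a = split_indices(1000, 0.2, seed=42)
train_b, valid_b = split_indices(1000, 0.2, seed=42)
train_c, valid_c = split_indices(1000, 0.2, seed=7)

print(valid_a == valid_b)  # same seed: identical validation rows
print(valid_a == valid_c)  # different seed: a different split
```

Fixing the split seed alone does not make a run fully repeatable, since the initial model weights are random as well.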