<a href="https://colab.research.google.com/github/ersilia-os/event-fund-ai-drug-discovery/blob/main/notebooks/session4_breakout.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Session 4: Generative Models

This notebook explores generative models based on similarity searches: starting from a molecule used as an initial hit, they look for similar molecules in a virtually generated library.
We will use our top 10 hits from Session 2 to explore how these models work.

In [None]:
#@title Mount Google Drive and install the necessary packages

%%capture

from google.colab import drive
drive.mount('/content/drive')


!pip install rdkit
!pip install umap-learn
import sys
sys.path.append("drive/MyDrive/h3d_ersilia_ai_workshop/data/session1/")
from courseFunctions import *

In [None]:
#@title Select the SMILES from the MMV list
smiles = "CC(=O)c1sc(NC(=O)Nc2ccc(C)cc2C)nc1C" #@param {type:"string"}

In [None]:
#@markdown Run this cell to visualise the selected compound using RDKit
# Revisit the Session 2 skills development notebook if you are unsure how to visualise molecules with RDKit
from rdkit import Chem
from rdkit.Chem import Draw

mol = Chem.MolFromSmiles(smiles)
Draw.MolToImage(mol)

## Ersilia Model Hub
First, we need to install Ersilia in this Google Colab notebook.

In [None]:
#@markdown Click on the play button to install Ersilia in this Colab notebook.

%%capture
%env MINICONDA_INSTALLER_SCRIPT=Miniconda3-py37_4.12.0-Linux-x86_64.sh
%env MINICONDA_PREFIX=/usr/local
%env PYTHONPATH={PYTHONPATH}:/usr/local/lib/python3.7/site-packages
%env CONDA_PREFIX=/usr/local
%env CONDA_PREFIX_1=/usr/local
%env CONDA_DIR=/usr/local
%env CONDA_DEFAULT_ENV=base
!wget https://repo.anaconda.com/miniconda/$MINICONDA_INSTALLER_SCRIPT
!chmod +x $MINICONDA_INSTALLER_SCRIPT
!./$MINICONDA_INSTALLER_SCRIPT -b -f -p $MINICONDA_PREFIX
!python -m pip install git+https://github.com/ersilia-os/ersilia.git
!python -m pip install requests --upgrade
import sys
_ = (sys.path.append("/usr/local/lib/python3.7/site-packages"))

### Fetching Similarity Models
We will work with two similarity models:
* eos4b8j: gdbchembl-similarity
* eos7jlv: gdbmedchem-similarity

In short, each of these models searches a virtually generated library of billions of molecules and returns the 100 most similar to the starting molecule. You can read more about them in their respective publications ([gdbchembl](https://www.frontiersin.org/articles/10.3389/fchem.2020.00046/full) and [gdbmedchem](https://onlinelibrary.wiley.com/doi/abs/10.1002/minf.201900031)).

GDBChEMBL contains a collection of 166.4 billion possible molecules of up to 17 atoms, and is browsable [here](http://faerun.gdb.tools/). GDBMedChem is a curated version of GDBChEMBL and restricts the search space to 10 million [molecules](http://gdb.unibe.ch).
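
The search idea described above can be sketched in a few lines of plain Python. This is only an illustration under loud assumptions: the "fingerprint" here is the set of adjacent character pairs in a SMILES string, which is a toy stand-in and not the fingerprint or search method used by the actual models, and the mini-library is hypothetical. The ranking step (score every library molecule against the query with the Tanimoto coefficient, then keep the top k) is the part worth looking at.

```python
def bigrams(smiles):
    """Toy 'fingerprint': the set of adjacent character pairs in a SMILES string.

    Assumption for illustration only; real similarity searches use
    chemically meaningful fingerprints.
    """
    return {smiles[i:i + 2] for i in range(len(smiles) - 1)}


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def most_similar(query, library, k=3):
    """Return the k library entries most similar to the query, best first."""
    query_fp = bigrams(query)
    scored = [(tanimoto(query_fp, bigrams(smi)), smi) for smi in library]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical mini-library; the real libraries are many orders of magnitude larger.
library = ["CCO", "CCN", "CCCO", "c1ccccc1", "CC(=O)O"]
hits = most_similar("CCCO", library, k=2)
for score, smi in hits:
    print(f"{smi}\t{score:.2f}")
```

Note that the toy fingerprint cannot tell `CCO` from `CCCO`, which is one reason real searches rely on proper chemical fingerprints.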

*Disclaimer: both of these models post predictions online. If you are concerned about IP or privacy issues, check the publications for more information on their data policy.*

## Generating 100 molecules from the top hit
Together we will walk through an example of how to generate new candidate molecules from the best molecule we found in the MMV Malaria Box, using the GDBChEMBL similarity search.

In [None]:
#@title GDBChEMBL Similarity
#@markdown Press the play button to run a prediction!
!ersilia fetch eos4b8j
from ersilia import ErsiliaModel

model = ErsiliaModel("eos4b8j")
model.serve()
output = model.predict(input=smiles, output="pandas")
model.close()
output.to_csv("drive/MyDrive/h3d_ersilia_ai_workshop/data/session4/eos4b8j.csv", index=False)

In [None]:
#@title GDBMedChem Similarity
#@markdown Press the play button to run a prediction!
!ersilia fetch eos7jlv
from ersilia import ErsiliaModel

model = ErsiliaModel("eos7jlv")
model.serve()
output = model.predict(input=smiles, output="pandas")
model.close()
output.to_csv("drive/MyDrive/h3d_ersilia_ai_workshop/data/session4/eos7jlv.csv", index=False)

In [None]:
#@title Check the output of ChEMBL eos4b8j
eos4b8j_preds = "drive/MyDrive/h3d_ersilia_ai_workshop/data/session4/eos4b8j.csv" #@param {type:"string"}

import pandas as pd  # explicit import, in case courseFunctions does not expose pd

df1 = pd.read_csv(eos4b8j_preds)
df1.head()

In [None]:
#@title Check the molecules predicted by MedChem - eos7jlv
eos7jlv_preds = "drive/MyDrive/h3d_ersilia_ai_workshop/data/session4/eos7jlv.csv" #@param {type:"string"}

df2 = pd.read_csv(eos7jlv_preds)
df2.head()

Discussion: which datasets do we have now? What can we do with them?
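
One simple thing to do with two sets of generated molecules is to check how much they overlap. The sketch below assumes each set is available as a plain list of SMILES strings; the lists shown are hypothetical placeholders, and `unique_in_order` is a helper name introduced here for illustration.

```python
def unique_in_order(smiles_list):
    """Drop empty entries and duplicates while keeping the original order."""
    seen = set()
    result = []
    for smi in smiles_list:
        if smi and smi not in seen:
            seen.add(smi)
            result.append(smi)
    return result


# Hypothetical placeholders standing in for the two lists of generated SMILES.
chembl_like = ["CCO", "CCN", "CCO", "", "CCCl"]
medchem_like = ["CCN", "CCBr", "CCCl"]

set_a = unique_in_order(chembl_like)
set_b = unique_in_order(medchem_like)
lookup_b = set(set_b)

shared = [smi for smi in set_a if smi in lookup_b]
only_a = [smi for smi in set_a if smi not in lookup_b]

print("shared:", shared)
print("only in first set:", only_a)
```

Comparing raw strings only finds identical SMILES. The same molecule can be written as different SMILES, so for a real comparison the strings should first be converted to a canonical form, for example with RDKit.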

In [None]:
#@title See the molecules generated by eos4b8j - ChEMBL
eos4b8j_smiles = df1[list(df1.columns)[2:]].iloc[0].tolist()
data_smiles = pd.DataFrame()
data_smiles["Smiles"] = eos4b8j_smiles
data_smiles.to_csv("drive/MyDrive/h3d_ersilia_ai_workshop/data/session4/eos4b8j_smiles.csv", index=False)

from rdkit import Chem
from rdkit.Chem import Draw

mols = [Chem.MolFromSmiles(smi) for smi in eos4b8j_smiles]
Draw.MolsToGridImage(mols, molsPerRow=5)

In [None]:
#@title See the molecules generated by eos7jlv - MedChem
eos7jlv_smiles = df2[list(df2.columns)[2:]].iloc[0].tolist()
data_smiles = pd.DataFrame()
data_smiles["Smiles"] = eos7jlv_smiles
data_smiles.to_csv("drive/MyDrive/h3d_ersilia_ai_workshop/data/session4/eos7jlv_smiles.csv", index=False)


from rdkit import Chem
from rdkit.Chem import Draw

mols = [Chem.MolFromSmiles(smi) for smi in eos7jlv_smiles]
Draw.MolsToGridImage(mols, molsPerRow=5)

In [None]:
#@title Dimensionality Reduction
path = "drive/MyDrive/h3d_ersilia_ai_workshop/data/session4/"
file_list = ["eos4b8j_smiles.csv", "eos7jlv_smiles.csv", "input_mol.csv"]
my_plots = plots(path, file_list)

my_plots.plot_pca()
my_plots.plot_umap()
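
The `plots` helper comes from the course's `courseFunctions` module and its internals are not shown in this notebook. As a rough idea of what a PCA projection does, here is a dependency-free sketch that projects a handful of hypothetical bit-vector fingerprints onto their first two principal components using power iteration. It is an illustration of the technique, not the code behind `plot_pca`, and in practice a library such as scikit-learn would normally be used instead.

```python
import random


def center(rows):
    """Subtract the column means so every feature has mean zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of already centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def matvec(matrix, vec):
    """Multiply a square matrix by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, vec)) for row in matrix]


def top_eigenvector(matrix, iterations=500, seed=0):
    """Power iteration: dominant unit eigenvector and eigenvalue of a symmetric matrix."""
    rng = random.Random(seed)
    vec = [rng.random() for _ in range(len(matrix))]
    norm = sum(x * x for x in vec) ** 0.5
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        new = matvec(matrix, vec)
        norm = sum(x * x for x in new) ** 0.5
        if norm == 0:
            break
        vec = [x / norm for x in new]
    value = sum(v * mv for v, mv in zip(vec, matvec(matrix, vec)))
    return vec, value


def pca_2d(rows):
    """Project rows onto their first two principal components."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    pc1, var1 = top_eigenvector(cov)
    # Deflation: remove the first component so the next dominant one is PC2.
    deflated = [[cov[i][j] - var1 * pc1[i] * pc1[j] for j in range(d)]
                for i in range(d)]
    pc2, var2 = top_eigenvector(deflated, seed=1)
    coords = [(sum(r[j] * pc1[j] for j in range(d)),
               sum(r[j] * pc2[j] for j in range(d))) for r in centered]
    return coords, (var1, var2)


# Hypothetical bit-vector fingerprints, one row per molecule.
fingerprints = [
    [1, 0, 1, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 0],
]
coords, variances = pca_2d(fingerprints)
for x, y in coords:
    print(f"{x:+.3f}\t{y:+.3f}")
```

Molecules with similar fingerprints end up close together in the two-dimensional coordinates, which is what makes such plots useful for comparing the generated sets with the input molecule.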

In [None]:
#@title Prediction of antimalarial activity for the generated molecules using the Ersilia Model Hub


!ersilia fetch eos2gth
from ersilia import ErsiliaModel

model = ErsiliaModel("eos2gth")
model.serve()
output = model.predict(input=eos4b8j_smiles, output="pandas")
model.close()
output.to_csv("drive/MyDrive/h3d_ersilia_ai_workshop/data/session4/eos2gth_eos4b8j.csv", index=False)

model = ErsiliaModel("eos2gth")
model.serve()
output = model.predict(input=eos7jlv_smiles, output="pandas")
model.close()
output.to_csv("drive/MyDrive/h3d_ersilia_ai_workshop/data/session4/eos2gth_eos7jlv.csv", index=False)

In [None]:
#@title Check the predicted antimalarial activities of the new compounds
import matplotlib.pyplot as plt

df1 = pd.read_csv("drive/MyDrive/h3d_ersilia_ai_workshop/data/session4/eos2gth_eos4b8j.csv")
df2 = pd.read_csv("drive/MyDrive/h3d_ersilia_ai_workshop/data/session4/eos2gth_eos7jlv.csv")

plt.hist(df1["score"], label="eos4b8j - ChEMBL", alpha=0.5)
plt.hist(df2["score"], label="eos7jlv - MedChem", alpha=0.5)
plt.legend()
plt.show()
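
Alongside the histogram, a few summary numbers make the two sets easier to compare. The sketch below assumes the scores are available as plain lists of floats (for example `df1["score"].tolist()`); the values and the 0.5 cut-off are hypothetical, and how a score should be interpreted depends on the model that produced it.

```python
from statistics import mean, median


def summarise(scores, threshold=0.5):
    """Return count, mean, median and the fraction of scores at or above a threshold."""
    above = sum(1 for s in scores if s >= threshold)
    return {
        "n": len(scores),
        "mean": mean(scores),
        "median": median(scores),
        "fraction_above": above / len(scores),
    }


# Hypothetical scores standing in for the "score" column of each results file.
chembl_scores = [0.10, 0.55, 0.80, 0.35]
medchem_scores = [0.20, 0.25, 0.60, 0.15]

for name, scores in [("ChEMBL", chembl_scores), ("MedChem", medchem_scores)]:
    stats = summarise(scores)
    print(name, stats)
```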

In [16]:
#@title ADMETLab2 prediction for the generated molecules using the Ersilia Model Hub


!ersilia fetch eos2v11
from ersilia import ErsiliaModel

model = ErsiliaModel("eos2v11")
model.serve()
# Keep the model served for both predictions and close it once at the end.
output = model.predict(input=eos4b8j_smiles, output="pandas")
output.to_csv("drive/MyDrive/h3d_ersilia_ai_workshop/data/session2/eos2v11_eos4b8j.csv", index=False)

output = model.predict(input=eos7jlv_smiles, output="pandas")
# Name the file after the ADMETLab2 model (eos2v11), not the antimalarial model (eos2gth).
output.to_csv("drive/MyDrive/h3d_ersilia_ai_workshop/data/session2/eos2v11_eos7jlv.csv", index=False)
model.close()
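
The cells in this notebook follow a serve, predict, close pattern. One way to make sure a model stays served across several predictions and is closed exactly once is a small context manager, sketched below. `served` and `FakeModel` are hypothetical names introduced here: `FakeModel` only imitates the three methods used in this notebook so the example can run on its own, and it is not part of Ersilia.

```python
from contextlib import contextmanager


@contextmanager
def served(model):
    """Serve a model for the duration of a with-block and always close it afterwards."""
    model.serve()
    try:
        yield model
    finally:
        model.close()


class FakeModel:
    """Hypothetical stand-in exposing serve/predict/close like the calls above."""

    def __init__(self):
        self.is_served = False

    def serve(self):
        self.is_served = True

    def close(self):
        self.is_served = False

    def predict(self, input):
        if not self.is_served:
            raise RuntimeError("model is not being served")
        return [len(smi) for smi in input]  # dummy 'prediction'


model = FakeModel()
with served(model) as m:
    first = m.predict(input=["CCO", "CCN"])
    second = m.predict(input=["c1ccccc1"])

print(first, second, model.is_served)
```

In the notebook, the same helper could wrap an `ErsiliaModel` instance, assuming its `serve()` and `close()` methods behave the way they are used in the cells above.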
