<a href="https://colab.research.google.com/github/gmm/RDKit-on-Colab/blob/main/RDKit_Google_Colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Cheminformatics in the Cloud

We can use Google Colab to run Jupyter (IPython) notebooks on Google Cloud infrastructure. This notebook was developed to run on Google Colab.

We start by installing Conda and Mamba, then use them to install the additional Python packages we need, most importantly RDKit.

# 1. Set Up Conda

We need conda before we can install RDKit.

We use CondaColab, developed by [Jaime Rodríguez-Guerra](https://github.com/jaimergp). The approach is discussed here:

* https://inside-machinelearning.com/en/how-to-install-use-conda-on-google-colab/

The install cell below needs to be run at the start of the notebook, because it restarts the kernel.

CondaColab uses [Mamba](https://mamba.readthedocs.io/en/latest/index.html), a "fast, robust, and cross-platform package manager, ... a Python-based CLI conceived as a drop-in replacement for conda, offering higher speed and more reliable environment solutions." Mamba's [core is implemented in C++ and allows parallel downloading of repository data and package files using multi-threading](https://github.com/mamba-org/mamba).

By default, CondaColab installs Mambaforge, but `condacolab.install_anaconda()` [will install the Anaconda 2020.02 distribution, the last version that was built for Python 3.7](https://github.com/jaimergp/condacolab) (which is needed for Google Colab, as of July 2021).

Don't panic when Colab reports that your "session crashed for an unknown reason": the kernel restarts automatically, and Colab reports the restart as an error. The restart is intended, because it lets the notebook use the new `conda` environment.

In [None]:
!pip install -q condacolab
import condacolab
condacolab.install()  # for ML, use `condacolab.install_anaconda()`
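
Because the install cell restarts the kernel, it only makes sense on Colab itself. The following is a minimal sketch of a guard for that cell, assuming that the `google.colab` package can only be found on Colab. The helper name `in_colab` is hypothetical and is not part of CondaColab.

```python
import importlib.util


def in_colab() -> bool:
    """Return True if the (Colab-only) `google.colab` package can be found."""
    try:
        return importlib.util.find_spec("google.colab") is not None
    except ModuleNotFoundError:
        # Raised when the parent package `google` is not installed at all.
        return False


if in_colab():
    print("Running on Colab: install CondaColab in the first cell.")
else:
    print("Not on Colab: use your local conda or mamba installation instead.")
```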

Check the versions of conda and mamba:

In [None]:
!which conda
!conda --version
!which mamba
!mamba --version
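
The same check can be made from Python, which may be convenient if a later cell should stop early when a tool is missing. This is a minimal sketch using only the standard library; the helper name `locate_tools` is hypothetical.

```python
import shutil


def locate_tools(names):
    """Map each executable name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


for tool, path in locate_tools(["conda", "mamba"]).items():
    print(f"{tool}: {path or 'not found on PATH'}")
```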

# 2. Update Conda

In tests, `mamba` seems to update `conda` slightly faster than `conda` updates itself:

In [None]:
#!time conda update -y -n base conda

In [None]:
!time mamba update -y -n base conda

In [None]:
!conda --version
!mamba --version

# 3. Install RDKit

Use mamba to install RDKit.

In [None]:
# Install the latest version of RDKit using mamba
#
#  -- Hint: don't create a separate environment, otherwise you must activate
#           it before Google's Jupyter starts

!mamba install -y -c rdkit rdkit

# Install Py3Dmol

We install py3Dmol so we can view and interact with 3D molecules in IPython and Jupyter.

In [None]:
!mamba install -y -c conda-forge py3dmol

In [None]:
%pwd

In [None]:
%ls

# 4. Import RDKit

There are lots of useful websites explaining how to use RDKit:

* [Getting Started with the RDKit in Python](https://rdkit.readthedocs.io/en/latest/GettingStartedInPython.html)
* [Greg Landrum's RDKit Blog](https://greglandrum.github.io/rdkit-blog)
  * [Generating 3D conformers of molecules](https://greglandrum.github.io/rdkit-blog/conformers/exploration/2021/02/22/etkdg-and-distance-constraints.html)
* https://xinhaoli74.github.io/blog/
  * https://xinhaoli74.github.io/blog/rdkit/2021/01/06/rdkit.html



In [None]:
import rdkit
print(rdkit.__version__)

In [None]:
from rdkit import Chem

In [None]:
from rdkit.Chem.Draw import IPythonConsole # to draw inline in iPython
from rdkit.Chem import Draw # to draw molecules

In [None]:
# Change the default to show molecules in interactive 3D
IPythonConsole.ipython_3d = True

# later on, get a conformer from an RDKit molecule, and use
# IPythonConsole.drawMol3D(m, confId=cids[0])

# See: https://greglandrum.github.io/rdkit-blog/conformers/exploration/2021/02/22/etkdg-and-distance-constraints.html
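
The conformer IDs mentioned in the comment above have to come from somewhere. One way to obtain them is RDKit's ETKDG embedding, sketched below. The molecule, the number of conformers, and the random seed are illustrative choices, not something this notebook prescribes.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Benzamide with explicit hydrogens, which generally gives better 3D geometries
m = Chem.AddHs(Chem.MolFromSmiles('c1ccccc1C(=O)N'))

# Embed a few 3D conformers with ETKDG; the fixed seed makes the run repeatable
params = AllChem.ETKDGv3()
params.randomSeed = 42
cids = list(AllChem.EmbedMultipleConfs(m, 3, params))

print(f"Generated {len(cids)} conformers with IDs {cids}")
# In a notebook: IPythonConsole.drawMol3D(m, confId=cids[0])
```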

In [None]:
from rdkit.Chem import PandasTools # we use this to read SDFs into pandas dataframes

In [None]:
from IPython.display import SVG # to use vectors, not bitmaps, for cleaner lines
from rdkit.Chem import rdDepictor  # to generate 2D depictions of molecules
from rdkit.Chem.Draw import rdMolDraw2D # to draw 2D molecules using vectors

In [None]:
import pandas as pd # we need this to make copies of pandas columns of molecules

In [None]:
from rdkit.Chem import AllChem  # we need this to compute 2D depictions
from copy import deepcopy  # we need deep copies of the molecule to avoid losing 3D coordinates

In [None]:
import py3Dmol # for inline 3D interactive views of molecules

# Let's Make Some Molecules!

We can create molecules by starting from a string that describes the elements, bonds (and their connectivity), stereochemistry, and charge of the molecule, using SMILES.

* [SMILES Examples](https://www.daylight.com/dayhtml_tutorials/languages/smiles/smiles_examples.html)
* [SMILES Tutorial](https://daylight.com/dayhtml_tutorials/languages/smiles/index.html)


In [None]:
mol = Chem.MolFromSmiles('c1ccccc1C(=O)N')
mol
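
`Chem.MolFromSmiles` returns `None` instead of raising an exception when it cannot parse a string, so it can be worth checking the result before using it. This is a minimal sketch; the wrapper name `parse_smiles` is hypothetical.

```python
from rdkit import Chem


def parse_smiles(smiles):
    """Return an RDKit molecule, or raise ValueError if the SMILES is invalid."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    return mol


benzamide = parse_smiles('c1ccccc1C(=O)N')
print(benzamide.GetNumAtoms(), Chem.MolToSmiles(benzamide))
```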

In [None]:
def show_atom_numbers(mol, label):
  """See https://stackoverflow.com/questions/53321453/rdkit-how-to-show-moleculars-atoms-number?answertab=active#tab-top"""
  for atom in mol.GetAtoms():
    atom.SetProp(label, str(atom.GetIdx()+1))
  return mol

In [None]:
mol = Chem.MolFromSmiles('c1ccccc1C(=O)N')
show_atom_numbers(mol, 'atomLabel')

In [None]:
mol = Chem.MolFromSmiles('c1ccccc1C(=O)N')
show_atom_numbers(mol, 'molAtomMapNumber')

In [None]:
mol = Chem.MolFromSmiles('c1ccccc1C(=O)N')
show_atom_numbers(mol, 'atomNote')

# Drawing Sharper Molecules Using Vectors, not Bitmaps

This code uses SVG to depict molecules.

See: https://leedavies.dev/index.php/2018/10/06/rdkit-in-jupyter-notebooks/

In [None]:
# Create mol object from smiles string
mol = Chem.MolFromSmiles('c1cccnc1C(=O)N')

molSize=(450,150)
mc = Chem.Mol(mol.ToBinary())
if not mc.GetNumConformers():
  #Compute 2D coordinates
  rdDepictor.Compute2DCoords(mc)

# init the drawer with the size
drawer = rdMolDraw2D.MolDraw2DSVG(molSize[0],molSize[1])

#draw the molecule
drawer.DrawMolecule(mc)
drawer.FinishDrawing()

# get the SVG string
svg = drawer.GetDrawingText()

# fix the svg string and display it
display(SVG(svg.replace('svg:','')))
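
If several molecules need to be drawn this way, the same steps can be wrapped in a function. The sketch below repeats the steps of the cell above; the function name `mol_to_svg` and its default size are assumptions made for illustration.

```python
from rdkit import Chem
from rdkit.Chem import rdDepictor
from rdkit.Chem.Draw import rdMolDraw2D


def mol_to_svg(mol, size=(450, 150)):
    """Return an SVG string depicting `mol`, leaving the input unchanged."""
    mc = Chem.Mol(mol)  # work on a copy so the caller's molecule is untouched
    if not mc.GetNumConformers():
        rdDepictor.Compute2DCoords(mc)
    drawer = rdMolDraw2D.MolDraw2DSVG(size[0], size[1])
    drawer.DrawMolecule(mc)
    drawer.FinishDrawing()
    return drawer.GetDrawingText().replace('svg:', '')


svg_text = mol_to_svg(Chem.MolFromSmiles('c1cccnc1C(=O)N'))
print(svg_text[:60])
# In a notebook: display(SVG(svg_text))
```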

# Get Drugs from DrugBank

Free access to DrugBank is permitted for students, academics and non-profits. You need to apply for access, then use your username and password to log in or to `wget` files (using `--user` and `--password`).

See: https://go.drugbank.com/releases/latest#structures

In [None]:
### WARNING! NEVER EMBED USERNAME AND PASSWORD!!! ###
### ESPECIALLY BEFORE SAVING OR COMMITTING CHANGES!!! ###

import getpass
user = getpass.getpass("DrugBank username: ")
pwd = getpass.getpass("DrugBank password: ")

#!wget --user $user --password $pwd https://go.drugbank.com/releases/5-1-8/downloads/all-structures
!wget --user $user --password $pwd https://go.drugbank.com/releases/5-1-8/downloads/all-3d-structures
#!wget --user $user --password $pwd https://go.drugbank.com/releases/5-1-8/downloads/all-structure-links
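
For unattended runs, one possible variation is to read the credentials from environment variables and fall back to a hidden prompt, so that nothing is typed into a cell. This is only a sketch; the helper `get_credential` and the variable names `DRUGBANK_USER` and `DRUGBANK_PASSWORD` are hypothetical.

```python
import getpass
import os


def get_credential(env_name, prompt):
    """Return the value of `env_name`, or prompt for it (hidden) if unset."""
    value = os.environ.get(env_name)
    return value if value else getpass.getpass(prompt)


# Hypothetical variable names; set them outside the notebook, never in a cell.
# user = get_credential("DRUGBANK_USER", "DrugBank username: ")
# pwd = get_credential("DRUGBANK_PASSWORD", "DrugBank password: ")
```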

In [None]:
%ls

In [None]:
!unzip all-3d-structures 
#!unzip all-structure-links 
#!unzip all-structures

In [None]:
%ls

In [None]:
!mv 3D\ structures.sdf all-drugbank-3D.sdf
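
The unzip and rename steps can also be done in one go from Python, which avoids having to escape the space in the file name. This is a minimal sketch using the standard library; the helper name `extract_as` is hypothetical, and the file names in the commented call are the ones used in the cells above.

```python
import shutil
import zipfile


def extract_as(archive, member, target):
    """Copy one member of a zip archive to `target`, streaming the data."""
    with zipfile.ZipFile(archive) as zf:
        with zf.open(member) as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return target


# extract_as("all-3d-structures", "3D structures.sdf", "all-drugbank-3D.sdf")
```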

In [None]:
%ls

# Drawing Molecules in a Grid

In [None]:
smiles = [
    'N#CC(OC1OC(COC2OC(CO)C(O)C(O)C2O)C(O)C(O)C1O)c1ccccc1',
    'c1ccc2c(c1)ccc1c2ccc2c3ccccc3ccc21',
    'C=C(C)C1Cc2c(ccc3c2OC2COc4cc(OC)c(OC)cc4C2C3=O)O1',
    'ClC(Cl)=C(c1ccc(Cl)cc1)c1ccc(Cl)cc1'
]

mols = [Chem.MolFromSmiles(smi) for smi in smiles]

In [None]:
Draw.MolsToGridImage(mols, molsPerRow=2, subImgSize=(200, 200))
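
`MolsToGridImage` also accepts a `legends` list, which puts a caption under each molecule. The sketch below uses two small example molecules and their molecular formulae as captions; both choices are illustrative.

```python
from rdkit import Chem
from rdkit.Chem import Draw, rdMolDescriptors

smiles = ['c1ccccc1C(=O)N', 'c1cccnc1C(=O)N']
mols = [Chem.MolFromSmiles(smi) for smi in smiles]

# Use each molecule's formula as its caption
legends = [rdMolDescriptors.CalcMolFormula(m) for m in mols]

grid = Draw.MolsToGridImage(mols, molsPerRow=2, subImgSize=(200, 200),
                            legends=legends, useSVG=True)
print(legends)
```

In a notebook, leaving `grid` as the last expression in the cell displays the image.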

# Get 3D Structures of Drugs from DrugBank

The file includes every category of drug in DrugBank: approved, withdrawn, and illicit drugs, nutraceuticals, and so on.

In [None]:
# Read in the 3D coordinates of all the drugs in DrugBank
# See Susan Leung's Blopig post: https://www.blopig.com/blog/2017/02/using-rdkit-to-load-ligand-sdfs-into-pandas-dataframes/
filename = 'all-drugbank-3D.sdf'
drugbank = PandasTools.LoadSDF(filename)
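
SD-file properties are stored as text, so columns such as `MOLECULAR_WEIGHT` may come back as strings and need converting before any arithmetic. The sketch below shows the conversion on a tiny stand-in file, so that it runs without the DrugBank download. The file name `demo.sdf` and its two molecules are made up for illustration.

```python
import os
import tempfile

import pandas as pd
from rdkit import Chem
from rdkit.Chem import Descriptors, PandasTools

# Write a tiny stand-in SDF with one text property per molecule
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.sdf')
writer = Chem.SDWriter(demo_path)
for smi in ['c1ccccc1C(=O)N', 'c1cccnc1C(=O)N']:
    m = Chem.MolFromSmiles(smi)
    m.SetProp('MOLECULAR_WEIGHT', f'{Descriptors.MolWt(m):.2f}')
    writer.write(m)
writer.close()

demo = PandasTools.LoadSDF(demo_path)

# Convert the property column explicitly before doing arithmetic on it
demo['MOLECULAR_WEIGHT'] = pd.to_numeric(demo['MOLECULAR_WEIGHT'])
print(demo['MOLECULAR_WEIGHT'].describe())
```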

In [None]:
drugbank.info()

In [None]:
drugbank[:3]

In [None]:
drugbank[['MOLECULAR_WEIGHT','ROMol']][:3]

## Make 2D Copies of the Molecules

`Compute2DCoords` modifies a molecule in place. If we don't make deep copies of each molecule first, it will overwrite the original 3D coordinates of the atoms.

In [None]:
# See: https://stackoverflow.com/questions/52708341/make-a-truly-deep-copy-of-a-pandas-series
copy_of_mols = pd.Series(deepcopy(drugbank['ROMol'].to_dict()))
for m in copy_of_mols:
  _ = AllChem.Compute2DCoords(m) # only updates coords of m
drugbank_2D = list(copy_of_mols)

drugbank['ROMol_2D'] = drugbank_2D
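
The effect of the deep copy can be checked on a single molecule. In this minimal sketch, a 3D conformer is embedded for an example molecule, and the 2D coordinates are then computed on a copy only. The molecule and the random seed are illustrative.

```python
from copy import deepcopy

from rdkit import Chem
from rdkit.Chem import AllChem

# Build an example molecule with one 3D conformer
mol_3d = Chem.AddHs(Chem.MolFromSmiles('c1ccccc1C(=O)N'))
AllChem.EmbedMolecule(mol_3d, randomSeed=42)

mol_2d = deepcopy(mol_3d)        # independent copy of the molecule
AllChem.Compute2DCoords(mol_2d)  # replaces the copy's conformer with a 2D one

print("original is 3D:", mol_3d.GetConformer().Is3D())
print("copy is 3D:", mol_2d.GetConformer().Is3D())
```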

In [None]:
drugbank[['MOLECULAR_WEIGHT', 'ROMol', 'ROMol_2D']][:2]

In [None]:
drug = drugbank['ROMol'][0]

In [None]:
drug

In [None]:
IPythonConsole.drawMol3D(drug)

In [None]:
view = py3Dmol.view(query='pdb:1hvr')
view.setStyle({'cartoon':{'color':'spectrum'}})
view

In [None]:
IPythonConsole.ipython_3d = False

In [None]:
drug

In [None]:
IPythonConsole.ipython_3d = True

In [None]:
drug