# Interactive visualization and filtering of small molecule datasets with mols2grid

A (short) tutorial by Cédric Bouysset - RDKit UGM 2021


<a href="https://colab.research.google.com/github/cbouy/UGM_2021/blob/main/Notebooks/Bouysset_mols2grid.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" align="left" alt="Open In Colab"/></a>


<a href="https://www.rdkit.org/"><img src="https://img.shields.io/static/v1?label=Powered%20by&message=RDKit&color=3838ff&style=flat&logo=" align="left"/></a><br>

`mols2grid` is a Python package for 2D molecular visualization, focused on Jupyter notebooks.

💻 **GitHub**: https://github.com/cbouy/mols2grid

👏 **Acknowledgments**:
* Contributors: [@fredrikw](https://github.com/fredrikw), [@JustinChavez](https://github.com/JustinChavez)
* Conda maintainer: [@hadim](https://github.com/hadim)
* Tutorials/code snippets: [@PatWalters](https://practicalcheminformatics.blogspot.com/2021/07/viewing-clustered-chemical-structures.html), [@czodrowskilab](https://github.com/czodrowskilab/5minfame/blob/main/2021_09_02-czodrowski-mols2grid.ipynb), [@dataprofessor](https://www.youtube.com/watch?v=0rqIwSeUImo), [@iwatobipen](https://iwatobipen.wordpress.com/2021/06/13/draw-molecules-on-jupyter-notebook-rdkit-mols2grid/), [@JustinChavez](https://blog.reverielabs.com/building-web-applications-from-python-scripts-with-streamlit/)

This tutorial covers the basics of using mols2grid, followed by some more advanced use cases.  
It assumes beginner-level knowledge of pandas and RDKit. The (optional) advanced features may also call for some basic knowledge of JavaScript, HTML and CSS.

In [2]:
# Install requirements for the tutorial
%pip install rdkit-pypi mols2grid ipywidgets py3Dmol

Note: you may need to restart the kernel to use updated packages.


In [3]:
import mols2grid
import pandas as pd
from rdkit import Chem
from rdkit.Chem import Descriptors, Draw
from ipywidgets import interact, widgets
import urllib.error
import urllib.parse
import urllib.request
from IPython.display import display
import py3Dmol

## The data

The dataset is a list of drugs approved by the FDA and other agencies, downloaded from [DrugCentral](https://drugcentral.org/) and prefiltered to keep only the first 200 compounds with a molecular weight below 600 g/mol. You can get the raw dataset [here](https://unmtid-shinyapps.net/download/DrugCentral/20200516/structures.smiles.tsv).

In [4]:
# read the dataset
df = pd.read_csv("https://raw.githubusercontent.com/cbouy/UGM_2021/main/Notebooks/data/drugcentral_filtered.tsv", sep="\t")
df["mol"] = df["SMILES"].apply(Chem.MolFromSmiles)
# compute some descriptors (note: ExactMolWt is the monoisotopic mass, stored here as "MolWt")
df["MolWt"] = df["mol"].apply(Descriptors.ExactMolWt)
df["LogP"] = df["mol"].apply(Descriptors.MolLogP)
df["NumHDonors"] = df["mol"].apply(Descriptors.NumHDonors)
df["NumHAcceptors"] = df["mol"].apply(Descriptors.NumHAcceptors)
# reformat the dataframe
df.drop(columns=["mol"], inplace=True)
df.rename(columns={"INN": "Name", "CAS_RN": "CAS"}, inplace=True)
print(f"{len(df)} molecules read")
df.head()

200 molecules read


Unnamed: 0,SMILES,InChI,InChIKey,ID,Name,CAS,MolWt,LogP,NumHDonors,NumHAcceptors
0,CCCCN1CCCC[C@H]1C(=O)NC1=C(C)C=CC=C1C,InChI=1S/C18H28N2O/c1-4-5-12-20-13-7-6-11-16(2...,LEBVLXFERQHONN-INIZCTEOSA-N,4,levobupivacaine,27262-47-1,288.220164,3.89654,1,2
1,COC(=O)C1=C(C)NC(C)=C([C@H]1C1=CC(=CC=C1)[N+](...,InChI=1S/C26H29N3O6/c1-17-22(25(30)34-4)24(20-...,ZBBHBTPTTSWHBA-DEOSSOPVSA-N,5,(S)-nicardipine,76093-36-2,479.205636,3.6778,1,8
2,CCOC(=O)C1=C(C)NC(C)=C([C@@H]1C1=CC(=CC=C1)[N+...,InChI=1S/C18H20N2O6/c1-5-26-18(22)15-11(3)19-1...,PVHUJELLJLJGLN-INIZCTEOSA-N,6,(S)-nitrendipine,80873-62-7,360.132136,2.5657,1,7
3,C[C@@H](CCC1=CC=C(O)C=C1)NCCC1=CC=C(O)C(O)=C1,InChI=1S/C18H23NO3/c1-13(2-3-14-4-7-16(20)8-5-...,JRWZLRBJNMZMFE-ZDUSSCGKSA-N,13,levdobutamine,61661-06-1,301.167794,2.9568,4,4
4,NC1=NC2=NC=C(CNC3=CC=C(C=C3)C(=O)N[C@@H](CCC(O...,InChI=1S/C19H20N8O5/c20-15-14-16(27-19(21)26-1...,TVZGACDUOSZQKY-LBPRGKRZSA-N,21,aminopterin,54-62-6,440.155666,0.2441,6,10


## The basics

- The input can be a DataFrame, a list of RDKit molecules, or an SDFile. The other arguments are optional.

In [5]:
mols2grid.display(
    df,
    # set the fields displayed on the grid
    subset=["ID", "img", "CAS"],
    # set the fields displayed on mouse hover
    tooltip=["Name", "MolWt"],
)

MolGridWidget()

- You can run simple text searches using the search bar at the bottom right: try `acid`, for example
- You can also run substructure queries by clicking on 🔎 > SMARTS and searching for `C(=O)-[OH]`
- Next, sort the molecules by molecular weight (click again to reverse the order)
- Finally, select a couple of molecules (tick their checkboxes); you can then export your selection to a SMILES file (clipboard copy is unfortunately blocked on Colab)

The main point of mols2grid is that the widget lets you access your selections from Python afterwards:

In [6]:
mols2grid.get_selection()

{}

In [7]:
# retrieve the corresponding entries in the dataframe
df.iloc[list(mols2grid.get_selection().keys())]

Unnamed: 0,SMILES,InChI,InChIKey,ID,Name,CAS,MolWt,LogP,NumHDonors,NumHAcceptors
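
The lookup above can be tried without clicking anything. In this minimal sketch, a hand-written dict stands in for `mols2grid.get_selection()` (assumed here to map a row position to the SMILES of the selected molecule), and the toy dataframe and its values are made up for illustration.

```python
import pandas as pd

# toy dataframe standing in for `df` (illustrative values only)
toy = pd.DataFrame({
    "SMILES": ["CCO", "CC(=O)O", "c1ccccc1"],
    "Name": ["ethanol", "acetic acid", "benzene"],
})

# hypothetical selection, shaped like the dict assumed in the lead-in
fake_selection = {0: "CCO", 2: "c1ccccc1"}

# same lookup as in the cell above: the keys are used as row positions
selected = toy.iloc[list(fake_selection.keys())]
print(selected["Name"].tolist())
```

Since `iloc` is positional, this lookup relies on the keys matching the row positions of the dataframe that was passed to the grid.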


## Interactive filtering

Let's add more options for filtering the grid!

We'll use ipywidgets to add sliders for the molecular weight and the other molecular descriptors, and define a function that queries the internal dataframe using the slider values.
Every time a slider is moved, the function is called to filter the grid.

In [8]:
grid = mols2grid.MolGrid(df, name="filters")
view = grid.display(
    n_rows=2,
    subset=["ID", "img", "CAS"],
    tooltip=["Name", "MolWt", "LogP", "NumHDonors", "NumHAcceptors"],
)

@interact(
    MolWt=widgets.IntRangeSlider(value=[0, 600], min=0, max=600, step=10),
    LogP=widgets.IntRangeSlider(value=[-10, 10], min=-10, max=10, step=1),
    NumHDonors=widgets.IntRangeSlider(value=[0, 20], min=0, max=20, step=1),
    NumHAcceptors=widgets.IntRangeSlider(value=[0, 20], min=0, max=20, step=1),
)
def filter_grid(MolWt, LogP, NumHDonors, NumHAcceptors):
    results = grid.dataframe.query(
        "@MolWt[0] <= MolWt <= @MolWt[1] and "
        "@LogP[0] <= LogP <= @LogP[1] and "
        "@NumHDonors[0] <= NumHDonors <= @NumHDonors[1] and "
        "@NumHAcceptors[0] <= NumHAcceptors <= @NumHAcceptors[1]"
    )
    return grid.filter_by_index(results.index)

view

MolGridWidget(grid_id='filters')

interactive(children=(IntRangeSlider(value=(0, 600), description='MolWt', max=600, step=10), IntRangeSlider(va…

Using `mols2grid.MolGrid` instead of `mols2grid.display` has another advantage: `grid.get_selection()` returns your selection directly as a DataFrame (equivalent to `df.iloc[list(mols2grid.get_selection().keys())]`).

In [9]:
grid.get_selection()

Unnamed: 0,SMILES,InChI,InChIKey,ID,Name,CAS,MolWt,LogP,NumHDonors,NumHAcceptors


## Callbacks

Callbacks are **functions that are executed when you click on a molecule's image**. They can be written in *JavaScript* or *Python*.

They can be used to display additional information on the molecule, or to run more complex code such as database queries, docking, or machine-learning predictions.

For Python callbacks, you need to declare a function that takes a dictionary as its first argument. This dictionary contains all the data related to the molecule you've just clicked on. For example, the SMILES of the molecule will be available as `data["SMILES"]`.

One limitation to keep in mind for Python callbacks is that using `print` or any other "output" function inside the callback will not display anything by default. You need to use the ipywidgets `Output` widget to capture what the function is trying to display, and then show it.
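
The shape of a Python callback can be sketched without a grid: it is a plain function that receives one molecule's fields as a dictionary. The function name `describe` and the hand-made record below are hypothetical, and the sketch assumes only that `SMILES` is present, with `Name` treated as optional.

```python
def describe(data):
    """Callback-style function: receives one molecule's fields as a dict."""
    # "SMILES" is read directly; "Name" falls back to a placeholder
    name = data.get("Name", "unknown")
    return f"{name} -> {data['SMILES']}"

# hand-made record standing in for what the grid would pass on click
record = {"Name": "ethanol", "SMILES": "CCO"}
message = describe(record)
print(message)
```

Called by hand like this, `print` behaves as usual; the `Output` widget only becomes necessary once the grid itself triggers the function.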

In [10]:
output = widgets.Output()
# the Output widget lets us capture the output generated by the callback function
# it is mandatory if you want to print/display some info with your callback
@output.capture(clear_output=True, wait=True)
def show_3d(data):
    """Query PubChem to download the SDFile with 3D coordinates and
    display the molecule with py3Dmol
    """
    url = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/smiles/{}/SDF?record_type=3d"
    smi = urllib.parse.quote(data["SMILES"])
    try:
        response = urllib.request.urlopen(url.format(smi))
    except urllib.error.HTTPError:
        print("Could not find a corresponding match on PubChem")
        print(data["SMILES"])
    else:
        sdf = response.read().decode()
        view = py3Dmol.view(height=300, width=800)
        view.addModel(sdf, "sdf")
        view.setStyle({'stick': {}})
        view.zoomTo()
        view.show()

## Google Colab requirement
try:
    from google import colab
except ImportError:
    pass
else:
    colab.output.register_callback("show_3d", show_3d)
##

g = grid.display(
    subset=["ID", "img", "Name"],
    tooltip_trigger="hover",
    callback=show_3d,
)
display(g)
output

Output()
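
To see which URL `show_3d` requests for a given molecule, the URL-building step can be run on its own, without any network access. This is a minimal sketch: the helper name `pubchem_3d_url` is hypothetical, while the URL template and the use of `quote` are taken from the cell above.

```python
from urllib.parse import quote

PUBCHEM_3D = (
    "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/smiles/{}/SDF"
    "?record_type=3d"
)

def pubchem_3d_url(smiles):
    """Build the request URL the same way as `show_3d` (no request is sent)."""
    return PUBCHEM_3D.format(quote(smiles))

# parentheses and "=" are percent-encoded
url = pubchem_3d_url("CC(=O)O")
print(url)

# by default, `quote` leaves "/" unescaped, so double-bond stereo marks
# stay as literal slashes in the URL path
stereo = quote("F/C=C/F")
print(stereo)
```

Whether PubChem resolves a path containing such slashes is not checked here; if it matters for your molecules, that part of the callback is worth testing separately.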

You can also use JavaScript callbacks. JS callbacks don't require declaring a function: you can directly access and use the `data` object in your callback script, much like in Python. The callback can be as simple as `callback="console.log(JSON.stringify(data))"`.

To display popup windows on click, a helper function is available: `mols2grid.make_popup_callback`. It requires a `title` as well as some `html` code to format and display the information that you'd like to show. All of the values inside the `data` object can be inserted in the title and html arguments using `${data["field_name"]}`. Additionally, you can execute a prerequisite JavaScript snippet to create variables that are then accessible in the html code.

In the following example, we create an RDKit molecule from the SMILES of the molecule (the `SMILES` field is always present in the data object, no matter your input when creating the grid). We then create an SVG image of the molecule and calculate some descriptors. Finally, we inject these variables into the HTML code. You can also style the popup window through the `style` argument.

You can also define your own JS callback from scratch, depending on your needs.

It is possible to load additional JS libraries by passing `custom_header="<script src=...></script>"` to `mols2grid.display`, and they will then be available in the callback.

In [11]:
callback = mols2grid.make_popup_callback(
    title="${data['Name']}",
    js="""
        var mol = RDKitModule.get_mol(data["SMILES"]);
        var svg = mol.get_svg(400, 300);
        var desc = JSON.parse(mol.get_descriptors());
        mol.delete();
    """,
    html="""
        <div class="row">
          <div class="col">${svg}</div>
          <div class="col">
            <b>Molecular weight</b>: ${desc.amw}<br/>
            <b>HBond Acceptors</b>: ${desc.NumHBA}<br/>
            <b>HBond Donors</b>: ${desc.NumHBD}<br/>
            <b>ClogP</b>: ${desc.CrippenClogP}<br/>
          </div>
        </div>""",
    style="max-width: 80%;",
)

grid.display(
    subset=["ID", "img", "Name"],
    tooltip_trigger="hover",
    callback=callback,
)

In [None]:
print(callback)

## Advanced customization

You have full control over how the molecules and the grid are rendered:

In [None]:
# custom drawing options for molecules:
opts = Draw.MolDrawOptions()
# white carbon and hydrogen atoms
opts.updateAtomPalette({x: (1, 1, 1) for x in [1, 6]})
# lighter blue for nitrogen
opts.updateAtomPalette({7: (.4, .4, 1)})
# transparent background
opts.clearBackground = False
# greg's favorite 🤡
opts.comicMode = True

# put the background of each cell in black with white font
custom_css = """
.cell { 
    background-color: black;
    color: white;
}
"""

def lipinsky(item):
    """Colors cells in dark blue if they don't follow Lipinski's rules"""
    if not (
        (item["MolWt"] < 500) and 
        (item["NumHDonors"] <= 5) and
        (item["NumHAcceptors"] <= 10) and
        (item["LogP"] < 5)
    ):
        return "background-color: navy;"
    return ""

mols2grid.display(
    df.sample(45),
    subset=["ID", "img", "CAS"],
    tooltip=["Name", "CAS", "MolWt", "LogP", "NumHDonors", "NumHAcceptors"],
    size=(180, 180),
    n_columns=4, n_rows=2,
    MolDrawOptions=opts,
    custom_css=custom_css,
    hover_color="#727272",
    border="2px solid #333",
    # modify the style of some fields (MolWt), or of the entire cell (__all__)
    style={
        "MolWt": lambda x: "color: red" if x > 500 else "",
        "__all__": lipinsky,
    },
    # modify some fields (less significant digits in this case)
    transform={
        "MolWt": lambda x: round(x, 1),
        "LogP": lambda x: round(x, 1)
    },
    # hide checkboxes
    selection=False,
    name="customization",
)
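
The `style` and `transform` entries are ordinary Python callables, so they can be checked outside the grid before being passed to `mols2grid.display`. The sketch below re-declares them under the assumption that each per-field callable receives a single value and the `__all__` callable receives a row supporting `item["field"]` access. The name `lipinski_style` and the two rows are made up for illustration.

```python
def lipinski_style(item):
    """Return a CSS string for rows failing the checks used in the cell
    above, and an empty string otherwise."""
    passes = (
        item["MolWt"] < 500
        and item["NumHDonors"] <= 5
        and item["NumHAcceptors"] <= 10
        and item["LogP"] < 5
    )
    return "" if passes else "background-color: navy;"

style = {
    "MolWt": lambda x: "color: red" if x > 500 else "",
    "__all__": lipinski_style,
}
transform = {"MolWt": lambda x: round(x, 1)}

# made-up rows, for illustration only
small = {"MolWt": 288.2, "NumHDonors": 1, "NumHAcceptors": 2, "LogP": 3.9}
heavy = {"MolWt": 540.0, "NumHDonors": 6, "NumHAcceptors": 10, "LogP": 0.2}

cell_styles = [style["__all__"](row) for row in (small, heavy)]
print(cell_styles)
print(style["MolWt"](heavy["MolWt"]), transform["MolWt"](288.220164))
```

Testing the callables on plain dicts first makes it easier to spot a threshold or field-name mistake than inspecting the rendered grid.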