<div style="text-align:center;">
  <img src="https://github.com/MolSSI-Education/iqb-2025/blob/main/images/molssi_main_outline.png?raw=true" style="display: block; margin: 0 auto; max-height:325px;">
</div>


# Molecular Docking using gnina

*This tutorial was written by Jessica Nash (Software Scientist) at The Molecular Sciences Software Institute for the [Cheminformatic-Driven Molecular Docking Workshop](https://pdb101.rcsb.org/news/67d9853eaddf75595bd158f7) held as a Crash Course with the Institute for Quantitative Biomedicine (IQB) and the Protein Data Bank (PDB). Special thanks to Pat Walters and David Koes for feedback, suggestions, and proofreading on this notebook.*

*This notebook is Part 4 of 4 in the notebook series.*

Other notebooks in this series:
1. [Digital Representation of Molecules](https://colab.research.google.com/github/MolSSI-Education/iqb-2025/blob/main/01_Cheminfo_crash_course.ipynb)
2. [Exploring Chemical and Biological Data with BindingDB and the RDKit](https://colab.research.google.com/github/MolSSI-Education/iqb-2025/blob/main/02_Cheminfo_crash_course.ipynb)
3. [Preparing Structures for Docking](https://colab.research.google.com/github/MolSSI-Education/iqb-2025/blob/main/03_Cheminfo_crash_course.ipynb)
4. **Molecular Docking using gnina** (this notebook)

In this notebook, we will perform molecular docking of our original bound ligand along with the ligands we prepared in the last notebook.
Molecular docking is a computational method that predicts the preferred orientation of one molecule (the ligand) when bound to another molecule (the receptor, often a protein).

For docking, we will use a program called [gnina](https://github.com/gnina/gnina). gnina is pronounced `nee-na` (silent g) and is a fork of a software program called smina, which is itself a fork of AutoDock Vina.
For those unfamiliar with software development lingo, a "fork" is a copy of a project. Forks may be modified and diverge from the original project.
AutoDock Vina is the program we used for docking in last year's PDB Crash Course.

smina was created from AutoDock Vina to make docking calculations easier to set up and to allow more customization of the scoring function. smina lets you set the binding site automatically based on distance from a ligand and define your own scoring functions. gnina builds on that by adding rescoring with convolutional neural networks to improve pose prediction. In this notebook, we will use gnina.

If you find gnina useful and use it in your own research, please be sure to cite the appropriate papers:

Citation
========

**GNINA 1.0: Molecular docking with deep learning** (Primary application citation)  
A McNutt, P Francoeur, R Aggarwal, T Masuda, R Meli, M Ragoza, J Sunseri, DR Koes. *J. Cheminformatics*, 2021  
[link](https://jcheminf.biomedcentral.com/articles/10.1186/s13321-021-00522-2) [PubMed](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8191141/) [ChemRxiv](https://chemrxiv.org/articles/preprint/GNINA_1_0_Molecular_Docking_with_Deep_Learning/13578140)

**Protein–Ligand Scoring with Convolutional Neural Networks**  (Primary methods citation)  
M Ragoza, J Hochuli, E Idrobo, J Sunseri, DR Koes. *J. Chem. Inf. Model*, 2017  
[link](http://pubs.acs.org/doi/full/10.1021/acs.jcim.6b00740) [PubMed](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5479431/) [arXiv](https://arxiv.org/abs/1612.02751)  

**Ligand pose optimization with atomic grid-based convolutional neural networks**  
M Ragoza, L Turner, DR Koes. *Machine Learning for Molecules and Materials NIPS 2017 Workshop*, 2017  
[arXiv](https://arxiv.org/abs/1710.07400)  

**Visualizing convolutional neural network protein-ligand scoring**  
J Hochuli, A Helbling, T Skaist, M Ragoza, DR Koes.  *Journal of Molecular Graphics and Modelling*, 2018  
[link](https://www.sciencedirect.com/science/article/pii/S1093326318301670) [PubMed](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6343664/) [arXiv](https://arxiv.org/abs/1803.02398)

**Convolutional neural network scoring and minimization in the D3R 2017 community challenge**  
J Sunseri, JE King, PG Francoeur, DR Koes.  *Journal of computer-aided molecular design*, 2018  
[link](https://link.springer.com/article/10.1007/s10822-018-0133-y) [PubMed](https://www.ncbi.nlm.nih.gov/pubmed/29992528)

**Three-Dimensional Convolutional Neural Networks and a Cross-Docked Data Set for Structure-Based Drug Design**  
PG Francoeur, T Masuda, J Sunseri, A Jia, RB Iovanisci, I Snyder, DR Koes. *J. Chem. Inf. Model*, 2020  
[link](https://pubs.acs.org/doi/abs/10.1021/acs.jcim.0c00411) [PubMed](https://pubmed.ncbi.nlm.nih.gov/32865404/) [Chemrxiv](https://chemrxiv.org/articles/preprint/3D_Convolutional_Neural_Networks_and_a_CrossDocked_Dataset_for_Structure-Based_Drug_Design/11833323/1)

**Virtual Screening with Gnina 1.0**  
J Sunseri, DR Koes. *Molecules*, 2021  
[link](https://www.mdpi.com/1420-3049/26/23/7369) [Preprints](https://www.preprints.org/manuscript/202111.0329/v1)

In [None]:
# @title Overview
%%html
<style>
div.alert {
    color: #0056b3;
    background-color: #d9edf7;
    border-left: 5px solid #31708f;
    padding: 0.5em;
    font-size: 1.25em;
    line-height: 1.5;
}
div.alert ul {
    margin: 0.5em 0;
}
div.alert li {
    margin-bottom: 0.5em;
}
</style>

<div class="alert alert-block alert-info">
    <strong>Questions:</strong>
    <ul>
        <li>How can I perform molecular docking using the gnina software?</li>
        <li>What is redocking and how do I perform it?</li>
        <li>What is crossdocking and how do I perform it?</li>
        <li>How are docking results evaluated using scores and Root Mean Square Deviation (RMSD)?</li>
        <li>How do <code>gnina</code>'s CNN scores differ from traditional Vina scores in practice?</li>
        <li>How can I dock multiple ligands with gnina?</li>
        <li>How can docking results for multiple compounds be compared?</li>
    </ul>

    <strong>Objectives:</strong>
    <ul>
        <li>Use <code>gnina</code> to perform molecular docking for single and multiple ligands.</li>
        <li>Visualize docked poses within the protein binding site.</li>
        <li>Calculate RMSD to evaluate redocking and crossdocking accuracy.</li>
        <li>Interpret the different scoring outputs from <code>gnina</code>.</li>
    </ul>
</div>

## Set Up
The cells in this section set up the software and files we will need for our calculations.




### Install Python Packages  
1. `useful_rdkit_utils` is a Python package written and maintained by Pat Walters that contains useful RDKit functions. We will use its `mcs_rmsd` function (explained later).
2. `py3Dmol` is used for molecular visualization.
3. The RDKit is a popular cheminformatics package we will use for processing molecules.

The cell also installs Open Babel with `apt`.


In [None]:
%%capture
!pip install useful_rdkit_utils py3Dmol rdkit
!apt install openbabel

### Download gnina

We are downloading the pre-compiled binary of gnina. You may also compile gnina yourself by following the directions on the [gnina GitHub repository](https://github.com/gnina/gnina).

In [None]:
# Download gnina
!wget https://github.com/gnina/gnina/releases/download/v1.3/gnina.fix


In [None]:
# Make gnina executable
!mv gnina.fix gnina
!chmod +x gnina

### Get Lesson Files

We have packaged the files created in the last notebook as a zip file hosted on GitHub. This cell downloads that file as well as `util.py`, which contains a custom utility function for visualizing our ligand and protein.



In [None]:
%%capture
!wget https://github.com/MolSSI-Education/iqb-2025/raw/refs/heads/main/data/docking_files.zip
!wget https://raw.githubusercontent.com/MolSSI-Education/iqb-2025/refs/heads/main/util.py

In [None]:
!unzip docking_files.zip

Archive:  docking_files.zip
   creating: docking_files/
   creating: docking_files/protein_structures/
  inflating: docking_files/protein_structures/7L11_aligned_fixed.pdb  
  inflating: docking_files/protein_structures/7L11_aligned.pdb  
  inflating: docking_files/protein_structures/7LME_fixed.pdb  
  inflating: docking_files/protein_structures/7LME.pdb  
  inflating: docking_files/protein_structures/7L11.pdb  
  inflating: docking_files/protein_structures/7LME.pdbqt  
  inflating: docking_files/protein_structures/7L11.pdbqt  
   creating: docking_files/ligand_structures/
  inflating: docking_files/ligand_structures/Y6J_ideal.sdf  
  inflating: docking_files/ligand_structures/XF1_ideal.sdf  
  inflating: docking_files/ligand_structures/ligands_to_dock.sdf  
  inflating: docking_files/ligand_structures/Y6J_corrected_pose.sdf  
  inflating: docking_files/ligand_structures/XF1_corrected_pose.sdf  
  inflating: docking_files/ligand_structures/Y6J_fromPDB.pdb  
  inflating: docking_files/l

In [None]:
from google.colab import files

# Upload a file from local PC to your Colab VM
files.upload("molecular_docking")

# Download a file from your Colab VM to local PC
#files.download('mylocalfile.txt')

Saving 8ENK_A.pdb to molecular_docking/8ENK_A.pdb



## Docking with gnina

Molecular docking involves two main stages:

1. Sampling: The algorithm explores many possible positions and orientations (or "poses") of the ligand within the receptor's active site. In AutoDock Vina, smina, and gnina, conformations are generated using Monte Carlo Sampling.
2. Scoring: Each generated pose is evaluated using a scoring function, which estimates the binding affinity. Poses are then ranked by these scores. For Vina scores, lower (more negative) energies indicate more favorable interactions.
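
The two stages above can be illustrated with a toy example. The sketch below is not gnina's algorithm: it runs a simple Metropolis Monte Carlo search over a single coordinate, using a made-up scoring function (`toy_score`, a hypothetical stand-in for a docking score), to show how random moves are proposed, scored, and accepted or rejected. All names and parameter values are assumptions chosen for illustration.

```python
import math
import random


def toy_score(x):
    """Hypothetical 1D 'docking score': lowest (best) value of -5.0 at x = 2.0."""
    return (x - 2.0) ** 2 - 5.0


def metropolis_search(score_fn, start, n_steps=2000, step_size=0.5,
                      temperature=1.0, seed=0):
    """Toy Metropolis Monte Carlo search that returns the best position found."""
    rng = random.Random(seed)  # fixed seed makes the run reproducible
    current, current_score = start, score_fn(start)
    best, best_score = current, current_score
    for _ in range(n_steps):
        # Sampling: propose a small random move.
        candidate = current + rng.uniform(-step_size, step_size)
        # Scoring: evaluate the proposed position.
        candidate_score = score_fn(candidate)
        delta = candidate_score - current_score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score < best_score:
                best, best_score = current, current_score
    return best, best_score


best_x, best_score = metropolis_search(toy_score, start=-5.0)
print(f"best position: {best_x:.2f}, best score: {best_score:.2f}")
```

A real docking search moves through many more degrees of freedom (ligand position, orientation, and torsions), but the propose, score, and accept-or-reject loop has the same shape.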

In addition to the traditional scoring functions available in Vina and smina, gnina adds convolutional neural networks (CNNs) to scoring. These deep learning models analyze a 3D grid representation of the protein-ligand complex, essentially evaluating a "picture" of the interaction based on atomic densities.

By default, gnina uses results from the CNN for **rescoring**, meaning that poses are initially sampled and scored with the traditional Vina scoring function but re-ranked after sampling using CNN models. You can, however, choose to use the CNN for all scoring, refinement, or not at all (using CNN scoring for refinement or all scoring is more computationally intensive).
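
To make the idea of rescoring concrete, the sketch below ranks the same set of poses twice: once by a Vina-style score (lower is better) and once by a CNN pose score (higher is better). The pose values are invented for illustration and are not gnina output; the field names `vina` and `cnn_score` are likewise hypothetical.

```python
# Hypothetical poses with made-up scores (not real gnina output).
poses = [
    {"name": "pose_a", "vina": -8.2, "cnn_score": 0.35},
    {"name": "pose_b", "vina": -7.4, "cnn_score": 0.90},
    {"name": "pose_c", "vina": -6.9, "cnn_score": 0.60},
]


def rank_by_vina(pose_list):
    """Rank poses by Vina-style score: most negative first."""
    return sorted(pose_list, key=lambda pose: pose["vina"])


def rank_by_cnn(pose_list):
    """Rank poses by CNN pose score: highest first."""
    return sorted(pose_list, key=lambda pose: pose["cnn_score"], reverse=True)


print("Vina order:", [pose["name"] for pose in rank_by_vina(poses)])
print("CNN order: ", [pose["name"] for pose in rank_by_cnn(poses)])
```

The same poses end up in a different order under the two criteria, which is why the top-ranked pose can change when CNN rescoring is turned on or off.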

For more details see the paper on [gnina v1.0](https://jcheminf.biomedcentral.com/articles/10.1186/s13321-021-00522-2) and [gnina v1.3](https://jcheminf.biomedcentral.com/articles/10.1186/s13321-025-00973-x).

## Redocking the Ligand

Redocking (also called "cognate docking") involves docking a ligand back into the receptor structure from which its bound pose was experimentally determined.
Redocking is typically done to evaluate how well a docking program's sampling algorithm and scoring function can reproduce a known experimental binding pose.

We will begin our docking journey with gnina by performing a redock of our ligand.

In [None]:
from util import visualize_poses

v = visualize_poses(
    "7BCS_A_fixed.pdb",
    "multiple_ligands_docked.sdf",
    animate=False
)
v.show()

You may execute the cell below and read the following explanation of the input parameters for gnina. gnina runs from the command line, so the command itself cannot contain in-line comments; the annotated version here is for reading only.

```
./gnina \
  # Specify the receptor structure file (-r).
  # This file (7LME.pdbqt) should be prepared for docking (e.g., with hydrogens added).
  -r docking_files/protein_structures/7LME.pdbqt \
  # Specify the ligand structure file (-l) to be docked.
  # This file (Y6J_ideal.sdf) contains the 3D coordinates of the ligand.
  -l docking_files/ligand_structures/Y6J_ideal.sdf \
  # Define the docking search box automatically (--autobox_ligand).
  # The box will be centered around the coordinates of the ligand in the specified file
  # (Y6J_corrected_pose.sdf), which is the known experimental pose in this redocking example.
  # An optional padding (default 4 Å) is added.
  --autobox_ligand docking_files/ligand_structures/Y6J_corrected_pose.sdf \
  # Specify the output file path (-o) where the resulting docked poses will be saved.
  # The output format will be SDF, containing multiple poses ranked by score.
  -o docking_results/Y6J_docked_7LME.sdf \
  # Set the random number generator seed (--seed) to 0.
  # Using a fixed seed makes the docking calculation reproducible.
  --seed 0 \
  # Set the exhaustiveness level (--exhaustiveness) to 16.
  # This controls the number of Monte Carlo chains for the ligand.
  # The default is 8.
  --exhaustiveness 16
```
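
The idea behind `--autobox_ligand` can be sketched in a few lines: take the bounding box of the reference ligand's atoms and extend it by the padding on each side. This is only an illustration of the concept under that assumption; gnina's actual box construction may differ in its details, and the function name `autobox` and the coordinates below are hypothetical.

```python
def autobox(coords, padding=4.0):
    """Return (center, size) of an axis-aligned box around coords.

    The box is the bounding box of the atoms, extended by `padding`
    (in angstroms) on each of its six sides.
    """
    mins = [min(atom[axis] for atom in coords) for axis in range(3)]
    maxs = [max(atom[axis] for atom in coords) for axis in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple((hi - lo) + 2 * padding for lo, hi in zip(mins, maxs))
    return center, size


# Made-up ligand atom coordinates (x, y, z) in angstroms.
ligand_coords = [(0.0, 0.0, 0.0), (2.0, 1.0, 0.0), (4.0, 2.0, 6.0)]
center, size = autobox(ligand_coords)
print("center:", center)
print("size:  ", size)
```

Because the box follows the reference ligand, a larger reference ligand or a larger padding gives the sampling stage more room to explore, at the cost of a bigger search space.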

  Execute the next cell to run gnina.


In [None]:
# @title While you Wait: Navigating py3DMol visualizations
%%html
<style>
div.purple-box {
    color: #4b0082; /* Indigo for text */
    background-color: #f3e5f5; /* Light lavender background */
    border-left: 5px solid #7b1fa2; /* Medium purple border */
    padding: 0.5em;
    font-size: 1.25em; /* Matches the surrounding text size */
    line-height: 1.5; /* Ensures readability */
    font-family: Arial, sans-serif; /* Clean, modern font */
}
div.purple-box ul {
    margin: 0.5em 0; /* Space around the list */
}
div.purple-box li {
    margin-bottom: 0.5em; /* Space between list items */
}
</style>
<div class="purple-box">
   <p> Execute the next cell below to start running
your docking calculation. Then, come back and try these navigation tips for Py3DMol.</p>
    <strong>Py3DMol Visualization Navigation:</strong>
    <table border="1" style="border-collapse: collapse; width: 100%;">
  <thead>
    <tr>
      <th>Movement</th>
      <th>Mouse Input</th>
      <th>Touch Input</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Rotation</td>
      <td>Primary Mouse Button</td>
      <td>Single touch</td>
    </tr>
    <tr>
      <td>Translation</td>
      <td>Middle Mouse Button or Ctrl+Primary</td>
      <td>Triple touch</td>
    </tr>
    <tr>
      <td>Zoom</td>
      <td>Scroll Wheel or Second Mouse Button or Shift+Primary</td>
      <td>Pinch (double touch)</td>
    </tr>
    <tr>
      <td>Slab</td>
      <td>Ctrl+Second</td>
      <td>Not Available</td>
    </tr>
  </tbody>
</table>
</div>



In [None]:
# make a folder for our results
!mkdir -p docking_results

# use gnina
!./gnina \
  -r docking_files/protein_structures/7LME.pdbqt \
  -l docking_files/ligand_structures/Y6J_ideal.sdf \
  --autobox_ligand docking_files/ligand_structures/Y6J_corrected_pose.sdf \
  -o docking_results/Y6J_docked_7LME.sdf \
  --seed 0 \
  --exhaustiveness 16

              _             
             (_)            
   __ _ _ __  _ _ __   __ _ 
  / _` | '_ \| | '_ \ / _` |
 | (_| | | | | | | | | (_| |
  \__, |_| |_|_|_| |_|\__,_|
   __/ |                    
  |___/                     

gnina  master:25e64da   Built Apr 23 2025.
gnina is based on smina and AutoDock Vina.
Please cite appropriately.

Recommend running with single model (--cnn fast)
or without cnn scoring (--cnn_scoring=none).

Commandline: ./gnina -r docking_files/protein_structures/7LME.pdbqt -l docking_files/ligand_structures/Y6J_ideal.sdf --autobox_ligand docking_files/ligand_structures/Y6J_corrected_pose.sdf -o docking_results/Y6J_docked_7LME.sdf --seed 0 --exhaustiveness 16
Using random seed: 0

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Y6J | pose 0 | initial pose not within box

mode |  affinity  |  intramol  |    CNN     |   CNN
     | (kcal/mol) | (kca

In [None]:
!./gnina \
  -r 7BCS_A.pdbqt \
  -l ligands_to_dock.sdf \
  --autobox_ligand TJ5_corrected_pose.sdf \
  -o multiple_ligands_docked.sdf \
  --seed 0 \
  --exhaustiveness 16

              _             
             (_)            
   __ _ _ __  _ _ __   __ _ 
  / _` | '_ \| | '_ \ / _` |
 | (_| | | | | | | | | (_| |
  \__, |_| |_|_|_| |_|\__,_|
   __/ |                    
  |___/                     

gnina  master:25e64da   Built Apr 23 2025.
gnina is based on smina and AutoDock Vina.
Please cite appropriately.

Recommend running with single model (--cnn fast)
or without cnn scoring (--cnn_scoring=none).

Commandline: ./gnina -r 7BCS_A.pdbqt -l ligands_to_dock.sdf --autobox_ligand TJ5_corrected_pose.sdf -o TJ5_docked.sdf --seed 0 --exhaustiveness 16
Using random seed: 0

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
 | pose 0 | initial pose not within box
 | pose 0 | ligand outside box
 | pose 0 | ligand outside box
 | pose 0 | ligand outside box
 | pose 0 | ligand outside box
 | pose 0 | ligand outside box
 | pose 0 | initial pose not within

### Interpreting the output

When `gnina` finishes a docking run, it prints a summary table for the generated poses. This table is sorted by pose rank, with the poses `gnina` determined to be the best at the top.

The columns are the following:

* `mode`: pose rank
* `affinity (kcal/mol)`: the Vina score
* `intramol (kcal/mol)`: the ligand's internal strain energy according to the Vina function
* `CNN pose score`: the score from the convolutional neural network predicting pose quality, where higher values closer to 1 indicate higher confidence in the pose's geometric accuracy and are used for ranking.
* `CNN affinity`: The CNN's prediction of binding affinity, expressed in pK units (higher values mean stronger binding, e.g., a predicted score of 9 corresponds to nanomolar (nM) affinity).
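
As a quick check on the pK scale used by `CNN affinity`, the sketch below converts a pK value to a concentration using the standard definition pK = -log10(K), with K in molar units. The function names are hypothetical helpers, not part of gnina.

```python
def pk_to_molar(pk):
    """Convert a pK value to a concentration in mol/L (K = 10**-pK)."""
    return 10.0 ** (-pk)


def pk_to_nanomolar(pk):
    """Convert a pK value to a concentration in nM."""
    return pk_to_molar(pk) * 1e9


for pk in (5.0, 7.0, 9.0):
    print(f"pK {pk:.1f} -> {pk_to_nanomolar(pk):.3g} nM")
```

Each unit of pK is a factor of ten in concentration, so small differences in predicted `CNN affinity` correspond to sizeable differences in predicted binding strength.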

Looking at the scores for the redocked Y6J ligand, the table shows the poses ranked by `CNN pose score` (higher is better), with `mode 1` scoring highest (approx. 0.81). Notably, this differs from the ranking by the Vina `affinity` score (lower is better), where `mode 3` is most favorable (-7.51 kcal/mol) but has a much lower `CNN pose score` (approx. 0.49).


### Visualizing the Docked Structures

In the cell below, we use `visualize_poses`, a custom function defined for this workshop that we obtained above when we retrieved `util.py`.
This function allows us to view our generated docked poses along with the ligand in its original, experimentally determined position.

In [None]:
v = visualize_poses(
    "7BCS_A_fixed.pdb",
    #"multiple_ligands_docked.sdf",
    "TJ5_ideal.sdf",
    cognate_file="TJ5_corrected_pose.sdf",
    animate=True,
)  # Change to True to see an animation of all of the poses
v.show()

In [None]:
v = visualize_poses(
    "docking_files/protein_structures/7LME_fixed.pdb",
    "docking_results/Y6J_docked_7LME.sdf",
    cognate_file="docking_files/ligand_structures/Y6J_corrected_pose.sdf",
    animate=False,
)  # Change to True to see an animation of all of the poses
v.show()

### Measuring Root-Mean-Square-Deviation (RMSD)

After generating docked poses, we next need to quantitatively evaluate how close they are to the known reference structure.
The standard metric used for this comparison is the **Root Mean Square Deviation (RMSD)**.
RMSD measures the average distance between corresponding atoms of two molecular structures.
A lower RMSD value indicates greater similarity between the docked pose and the reference structure.
Mathematically, it's calculated as:

$$
RMSD = \sqrt[2]{\frac{1}{N} \sum_{i=1}^{N} \delta_i^2}
$$



where $N$ is the number of corresponding atom pairs being compared, and $\delta_i$ is the Euclidean distance between the $i$-th pair of atoms.
In docking studies, a common threshold for considering a docked pose "successful" or accurate is an RMSD below 2 Å compared to the crystal structure.
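
As a small worked example with made-up numbers, suppose three matched atom pairs are separated by 1.0, 2.0, and 2.0 Å. Then $N = 3$ and

$$
RMSD = \sqrt{\frac{1}{3}\left(1.0^2 + 2.0^2 + 2.0^2\right)} = \sqrt{\frac{9.0}{3}} = \sqrt{3.0} \approx 1.73\ \text{Å}
$$

This hypothetical pose would fall just under the 2 Å threshold. Because the distances are squared before averaging, a single badly placed atom raises the RMSD more than several slightly displaced ones.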

In this notebook, we will use the `mcs_rmsd` function from the `useful_rdkit_utils` package, written by Pat Walters (a co-instructor of this workshop!).
This function calculates the RMSD, but with a useful modification: it first identifies the **Maximum Common Substructure (MCS)** between the two input molecules using RDKit's `FindMCS` functionality.
It then calculates the RMSD using only the corresponding atoms belonging to this shared substructure.
This approach is particularly valuable when comparing molecules that are similar but not identical, as it focuses the RMSD calculation on the parts of the molecules that match.
While for redocking the original ligand the MCS will typically be the entire molecule, this function can be used later when we compare the poses of different (but similar) docked ligands to the original crystal ligand.
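
The idea of restricting the RMSD to matched atoms can be sketched without RDKit. In the example below, the atom mapping (`matched_pairs`) is supplied by hand as a hypothetical stand-in for what an MCS search would produce, and the coordinates are made up. The sketch performs no alignment and is not the implementation used by `mcs_rmsd`.

```python
import math


def rmsd_over_pairs(coords_a, coords_b, matched_pairs):
    """RMSD over (index_in_a, index_in_b) atom pairs; unmatched atoms are ignored."""
    if not matched_pairs:
        raise ValueError("need at least one matched atom pair")
    total = 0.0
    for i, j in matched_pairs:
        # Squared Euclidean distance between the two matched atoms.
        total += sum((a - b) ** 2 for a, b in zip(coords_a[i], coords_b[j]))
    return math.sqrt(total / len(matched_pairs))


# Reference ligand: three atoms. Docked ligand: four atoms, one with no counterpart.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
docked = [(9.0, 9.0, 9.0), (0.0, 1.0, 0.0), (1.5, 1.0, 0.0), (3.0, 1.0, 0.0)]

# Hypothetical mapping: reference atom i corresponds to docked atom j.
matched_pairs = [(0, 1), (1, 2), (2, 3)]

n_match = len(matched_pairs)
rmsd = rmsd_over_pairs(reference, docked, matched_pairs)
print(f"{n_match}\t{rmsd:.2f}")
```

The extra atom in the docked ligand does not contribute to the result, which is what makes this kind of comparison usable for similar but non-identical molecules.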



In [None]:
import useful_rdkit_utils as uru
from rdkit import Chem

cognate = Chem.MolFromMolFile("docking_files/ligand_structures/Y6J_corrected_pose.sdf")
poses = Chem.SDMolSupplier("docking_results/Y6J_docked_7LME.sdf")

for i, pose in enumerate(poses):
    n_match, rmsd = uru.mcs_rmsd(cognate, pose)
    print(f"{n_match}\t{rmsd:.2f}")

In [None]:
# @title Exercise
%%html
<style>
div.orange-alert {
    color: #854f00; /* Darker shade of orange for text */
    background-color: #ffe6cc; /* Light orange background */
    border-left: 5px solid #ff9933; /* Bright orange border */
    padding: 0.5em;
    font-size: 1.25em; /* Matches the surrounding text size */
    line-height: 1.5; /* Ensures readability */
}
div.orange-alert ul {
    margin: 0.5em 0; /* Space around the list */
}
div.orange-alert li {
    margin-bottom: 0.5em; /* Space between list items */
}
</style>

<div class="orange-alert">

<strong>How does docking perform without CNN rescoring?</strong>

<p>You can turn off CNN rescoring with gnina by adding <code>--cnn_scoring none</code> to
  your gnina command. Try doing this, and make sure to save your results in a new file.</p>

<p>How does it affect redocking, particularly the measured RMSD of the docked poses?</p>

</div>

## Cross Docking the Ligand

We've seen that gnina does a great job fitting the ligand back into its original structure. However, this is an "easier" problem than docking a ligand into a structure that was not determined with that ligand bound. When we redock, the binding cavity is already shaped to fit the ligand we are testing.

In cross-docking, you dock the ligand into a different structure of the same protein (here, a main protease structure from a different PDB ID with a different ligand bound, or with no ligand). If our docking is working well, we should recover the same ligand pose regardless of which PDB structure we dock into.


In [None]:
# @title Cross Docking Structure Prep
%%html
<style>
div.purple-box {
    color: #4b0082; /* Indigo for text */
    background-color: #f3e5f5; /* Light lavender background */
    border-left: 5px solid #7b1fa2; /* Medium purple border */
    padding: 0.5em;
    font-size: 1.25em; /* Matches the surrounding text size */
    line-height: 1.5; /* Ensures readability */
    font-family: Arial, sans-serif; /* Clean, modern font */
}
div.purple-box ul {
    margin: 0.5em 0; /* Space around the list */
}
div.purple-box li {
    margin-bottom: 0.5em; /* Space between list items */
}
</style>

<div class="purple-box">
    <strong>Cross Docking Structure Prep:</strong>
    <p>
        If you want to perform cross-docking and measure the RMSD of the docked ligands, you will need your two protein structures to be aligned.
        Although two PDB IDs may have the same protein, they may differ in the translation and orientation of the protein.
        If you do not align the structures, the RMSD you measure will primarily be from the translation and rotation of the structure and will not give you an idea of how close the fit is.
    </p>

    <p>We are going to cross dock with structure <strong>7L11</strong>.
    To prep this file, we loaded it into VMD and aligned its protein backbone with the protein backbone of 7LME using
    VMD's Calculate RMSD Tool (use "Align" to align your structures!). We have not included this prep in the workshop.
    </p>
    <pre>
    </pre>
</div>

In [None]:
v = visualize_poses(
    "docking_files/protein_structures/7LME_fixed.pdb",
    "docking_files/ligand_structures/Y6J_corrected_pose.sdf",
    cognate_file="docking_files/ligand_structures/XF1_corrected_pose.sdf",
    animate=False,
)  # Change to True to see an animation of all of the poses
v.show()

In [None]:
# use gnina
!./gnina \
  -r docking_files/protein_structures/7L11.pdbqt \
  -l docking_files/ligand_structures/Y6J_ideal.sdf \
  --autobox_ligand docking_files/ligand_structures/XF1_corrected_pose.sdf \
  -o docking_results/Y6J_docked_7L11.sdf \
  --seed 0 \
  --exhaustiveness 16

In [None]:
cognate = Chem.MolFromMolFile("docking_files/ligand_structures/Y6J_corrected_pose.sdf")
poses = Chem.SDMolSupplier("docking_results/Y6J_docked_7L11.sdf")

for i, pose in enumerate(poses):
    n_match, rmsd = uru.mcs_rmsd(cognate, pose)
    print(f"{n_match}\t{rmsd:.2f}")

Our cross-docking RMSD values show very good agreement with the native pose and with the redocking results.

In [None]:
v = visualize_poses(
    "docking_files/protein_structures/7LME_fixed.pdb",
    "docking_results/Y6J_docked_7L11.sdf",
    cognate_file="docking_files/ligand_structures/Y6J_corrected_pose.sdf",
    animate=False,
)  # Change to True to see an animation of all of the poses
v.show()

## Docking our Prepared Ligands

In this section, we will dock the ligands we prepared in the previous notebook. gnina can dock multiple ligands in a single run: pass an SDF containing all of your ligands of interest to the `-l` argument.

The command to run gnina with multiple ligands is below:

```
!./gnina \
  -r docking_files/protein_structures/7LME.pdbqt \
  -l docking_files/ligand_structures/ligands_to_dock.sdf \
  --autobox_ligand docking_files/ligand_structures/Y6J_corrected_pose.sdf \
  -o docking_results/multiple_ligands_docked.sdf \
  --seed 0 \
  --exhaustiveness 16
```

This calculation can take a while. In the interest of time, precomputed results from running this command are available for download in a cell further down; if you are short on time, skip the docking cell and retrieve those results instead. If you later return to this tutorial, feel free to run the calculation yourself!

In [None]:
!./gnina \
  -r docking_files/protein_structures/7LME.pdbqt \
  -l docking_files/ligand_structures/ligands_to_dock.sdf \
  --autobox_ligand docking_files/ligand_structures/Y6J_corrected_pose.sdf \
  -o docking_results/multiple_ligands_docked.sdf \
  --seed 0 \
  --exhaustiveness 16

              _             
             (_)            
   __ _ _ __  _ _ __   __ _ 
  / _` | '_ \| | '_ \ / _` |
 | (_| | | | | | | | | (_| |
  \__, |_| |_|_|_| |_|\__,_|
   __/ |                    
  |___/                     

gnina  master:25e64da   Built Apr 23 2025.
gnina is based on smina and AutoDock Vina.
Please cite appropriately.

Recommend running with single model (--cnn fast)
or without cnn scoring (--cnn_scoring=none).

Commandline: ./gnina -r docking_files/protein_structures/7LME.pdbqt -l docking_files/ligand_structures/ligands_to_dock.sdf --autobox_ligand docking_files/ligand_structures/Y6J_corrected_pose.sdf -o docking_results/multiple_ligands_docked.sdf --seed 0 --exhaustiveness 16
Using random seed: 0

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Compound_12_i0 | pose 0 | initial pose not within box

mode |  affinity  |  intramol  |    CNN     |   CNN

In [None]:
!wget https://github.com/MolSSI-Education/iqb-2025/raw/refs/heads/main/data/docking_results.zip
!unzip -o docking_results.zip

--2025-10-08 13:49:56--  https://github.com/MolSSI-Education/iqb-2025/raw/refs/heads/main/data/docking_results.zip
Resolving github.com (github.com)... 140.82.121.4
Connecting to github.com (github.com)|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/MolSSI-Education/iqb-2025/refs/heads/main/data/docking_results.zip [following]
--2025-10-08 13:49:57--  https://raw.githubusercontent.com/MolSSI-Education/iqb-2025/refs/heads/main/data/docking_results.zip
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 42631 (42K) [application/zip]
Saving to: ‘docking_results.zip’


2025-10-08 13:49:57 (10.5 MB/s) - ‘docking_results.zip’ saved [42631/42631]

Archive:  docking_results.zip
  inflating: d

### Extracting the Scores

gnina stores the docked poses and their score information in the SDF written for each docking run.
To analyze and compare the results from all our docking runs (the redocked `Y6J` and all the `Compound_*` ligands), we need to extract this scoring information from the SDF and put it into a structured table.

We can use RDKit PandasTools to read the molecular structures (poses) and their associated properties (scores) from the output SDF. The SDF will contain multiple poses for the docked ligands, and each pose record has the calculated scores (like `minimizedAffinity`, `CNNscore`, `CNNaffinity`, `CNN_VS`, etc.) stored as data fields. The `CNN_VS` score is the product of `CNNscore` and `CNNaffinity`. We would typically want ligands that score highly for both (and thus have a high `CNN_VS` score).
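
The relationship between the three CNN fields is easy to verify by hand. The sketch below multiplies `CNNscore` and `CNNaffinity` for two poses, using values copied from the results table shown later in this notebook, and compares the product with the reported `CNN_VS`. The helper name `cnn_vs` is hypothetical.

```python
# Values copied from the first two rows of the results table in this notebook.
rows = [
    {"CNNscore": 0.606215, "CNNaffinity": 5.000509, "CNN_VS": 3.031385},
    {"CNNscore": 0.600035, "CNNaffinity": 4.688514, "CNN_VS": 2.813273},
]


def cnn_vs(cnn_score, cnn_affinity):
    """Product of the CNN pose score and the CNN predicted affinity."""
    return cnn_score * cnn_affinity


for row in rows:
    computed = cnn_vs(row["CNNscore"], row["CNNaffinity"])
    print(f"computed {computed:.4f}   reported {row['CNN_VS']:.4f}")
```

Small differences in the last digits come from the rounding of the stored values. A pose needs both a confident pose score and a high predicted affinity to obtain a high `CNN_VS`.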

In [None]:
# uncomment to see file
#!cat docking_results/multiple_ligands_results.sdf

In [None]:
from rdkit.Chem import PandasTools
from rdkit.rdBase import BlockLogs
import pandas as pd

score_columns = [
    "minimizedAffinity",
    "CNNscore",
    "CNNaffinity",
    "CNN_VS",
    "CNNaffinity_variance",
]

sdf_paths = [
    #"docking_results/multiple_ligands_docked.sdf",
    "multiple_ligands_docked.sdf",
    #"docking_results/Y6J_docked_7LME.sdf",
    "TJ5_ideal.sdf"
]

df_list = []
for filename in sdf_paths:
    with BlockLogs():
        df_list.append(PandasTools.LoadSDF(filename))

combo_df = pd.concat(df_list)

# PandasTools reads all SDTags as strings, convert score columns to float
for col in score_columns:
    combo_df[col] = combo_df[col].astype(float)

combo_df

Unnamed: 0,ScrubInfo,minimizedAffinity,CNNscore,CNNaffinity,CNN_VS,CNNaffinity_variance,ID,ROMol
0,"{""isomerGroup"": 0, ""isomerId"": 0, ""confId"": 0,...",-3.01779,0.606215,5.000509,3.031385,0.354051,,<rdkit.Chem.rdchem.Mol object at 0x7bbfe41a8f20>
1,"{""isomerGroup"": 0, ""isomerId"": 0, ""confId"": 0,...",-3.42715,0.600035,4.688514,2.813273,0.171718,,<rdkit.Chem.rdchem.Mol object at 0x7bbfe41a8f90>
2,"{""isomerGroup"": 0, ""isomerId"": 0, ""confId"": 0,...",-3.46943,0.506332,4.564616,2.31121,0.199374,,<rdkit.Chem.rdchem.Mol object at 0x7bbfbf96e1f0>
3,"{""isomerGroup"": 0, ""isomerId"": 0, ""confId"": 0,...",-3.24254,0.369545,4.814494,1.779174,0.116433,,<rdkit.Chem.rdchem.Mol object at 0x7bbfbf96db60>
4,"{""isomerGroup"": 0, ""isomerId"": 0, ""confId"": 0,...",-4.30516,0.350376,5.297385,1.856079,0.040828,,<rdkit.Chem.rdchem.Mol object at 0x7bbfbf96e260>
5,"{""isomerGroup"": 0, ""isomerId"": 0, ""confId"": 0,...",-5.6459,0.282341,4.837227,1.365748,0.136485,,<rdkit.Chem.rdchem.Mol object at 0x7bbfbf96e3b0>
6,"{""isomerGroup"": 0, ""isomerId"": 0, ""confId"": 0,...",-6.33385,0.263875,4.299103,1.134425,0.057996,,<rdkit.Chem.rdchem.Mol object at 0x7bbfbf96e420>
7,"{""isomerGroup"": 0, ""isomerId"": 0, ""confId"": 0,...",-5.27217,0.260796,4.486389,1.170031,0.472933,,<rdkit.Chem.rdchem.Mol object at 0x7bbfbf96e490>
8,"{""isomerGroup"": 0, ""isomerId"": 0, ""confId"": 0,...",-3.81963,0.258758,4.420778,1.143912,0.320982,,<rdkit.Chem.rdchem.Mol object at 0x7bbfbf96e500>
9,"{""isomerGroup"": 1, ""isomerId"": 0, ""confId"": 0,...",-3.84866,0.457492,4.127989,1.888521,0.54382,,<rdkit.Chem.rdchem.Mol object at 0x7bbfbf96e570>


In [None]:
#top_poses = combo_df.sort_values(
#    by="minimizedAffinity", ascending=True
#).drop_duplicates("ID")
#top_poses

top_poses = combo_df.sort_values(
    by="minimizedAffinity", ascending=True
)
top_poses

Unnamed: 0,ScrubInfo,minimizedAffinity,CNNscore,CNNaffinity,CNN_VS,CNNaffinity_variance,ID,ROMol
22,"{""isomerGroup"": 2, ""isomerId"": 0, ""confId"": 0,...",-8.1215,0.244233,5.885269,1.437379,1.267879,,<rdkit.Chem.rdchem.Mol object at 0x7bbfbf96eb20>
23,"{""isomerGroup"": 2, ""isomerId"": 0, ""confId"": 0,...",-7.92779,0.230502,5.820847,1.341716,1.128007,,<rdkit.Chem.rdchem.Mol object at 0x7bbfbf96eb90>
18,"{""isomerGroup"": 2, ""isomerId"": 0, ""confId"": 0,...",-7.79564,0.318885,6.097023,1.944252,1.146329,,<rdkit.Chem.rdchem.Mol object at 0x7bbfbf96e960>
19,"{""isomerGroup"": 2, ""isomerId"": 0, ""confId"": 0,...",-7.74988,0.29889,6.246946,1.867152,1.098711,,<rdkit.Chem.rdchem.Mol object at 0x7bbfbf96e9d0>
21,"{""isomerGroup"": 2, ""isomerId"": 0, ""confId"": 0,...",-7.20541,0.245043,6.115548,1.498573,0.90096,,<rdkit.Chem.rdchem.Mol object at 0x7bbfbf96eab0>
25,"{""isomerGroup"": 2, ""isomerId"": 0, ""confId"": 0,...",-6.82137,0.221017,5.161786,1.140842,1.650731,,<rdkit.Chem.rdchem.Mol object at 0x7bbfbf96ec70>
33,"{""isomerGroup"": 3, ""isomerId"": 0, ""confId"": 0,...",-6.81549,0.149864,5.074862,0.76054,0.787367,,<rdkit.Chem.rdchem.Mol object at 0x7bbfbf96eff0>
6,"{""isomerGroup"": 0, ""isomerId"": 0, ""confId"": 0,...",-6.33385,0.263875,4.299103,1.134425,0.057996,,<rdkit.Chem.rdchem.Mol object at 0x7bbfbf96e420>
5,"{""isomerGroup"": 0, ""isomerId"": 0, ""confId"": 0,...",-5.6459,0.282341,4.837227,1.365748,0.136485,,<rdkit.Chem.rdchem.Mol object at 0x7bbfbf96e3b0>
7,"{""isomerGroup"": 0, ""isomerId"": 0, ""confId"": 0,...",-5.27217,0.260796,4.486389,1.170031,0.472933,,<rdkit.Chem.rdchem.Mol object at 0x7bbfbf96e490>
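
The commented-out lines in the cell above keep only the best-scoring pose per ligand by sorting and then dropping duplicate `ID` values. That only works when the `ID` column holds molecule names, and in the table shown here it is empty. The sketch below illustrates the pattern on a toy table; the ligand names and scores are hypothetical placeholders, not docking results.

```python
import pandas as pd

# Toy table (assumption): hypothetical ligand names and placeholder scores
toy = pd.DataFrame(
    {
        "ID": ["lig_A", "lig_A", "lig_B", "lig_B", "lig_B"],
        "minimizedAffinity": [-6.1, -7.3, -5.0, -4.2, -5.8],
        "CNNscore": [0.70, 0.40, 0.55, 0.62, 0.30],
    }
)

# Best pose per ligand by Vina score: sort so the most negative score comes
# first, then keep the first row seen for each ID
best_by_vina = toy.sort_values("minimizedAffinity", ascending=True).drop_duplicates("ID")

# Best pose per ligand by CNNscore: pick the row with the highest CNNscore in each group
best_by_cnn = toy.loc[toy.groupby("ID")["CNNscore"].idxmax()]

print(best_by_vina)
print(best_by_cnn)
```

In this toy example the two criteria select different poses for both ligands, which is a reminder that the choice of ranking score matters.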


In [None]:
#top_poses.to_csv("multiple_ligands_results.csv")
top_poses.to_excel("multiple_ligands_results.xlsx")

In [None]:
# Pull out the Vina scores as a Series and preview the five most negative values
docking_scores = top_poses["minimizedAffinity"]
docking_scores.head()

Unnamed: 0,minimizedAffinity
22,-8.1215
23,-7.92779
18,-7.79564
19,-7.74988
21,-7.20541


In [None]:
# Keep only the poses with a Vina score of -7 or weaker (less negative)
top_poses_top = top_poses.loc[
    top_poses["minimizedAffinity"] >= -7, ["minimizedAffinity", "CNNaffinity"]
]
top_poses_top.head()

Unnamed: 0,minimizedAffinity,CNNaffinity
25,-6.82137,5.161786
33,-6.81549,5.074862
6,-6.33385,4.299103
5,-5.6459,4.837227
7,-5.27217,4.486389


Interestingly, the ordering of the compounds by `minimizedAffinity` (the Vina score) correlates well with the observed `IC50` values from our table.

In [None]:
ligand_data = pd.read_csv(
    "https://raw.githubusercontent.com/MolSSI-Education/iqb-2025/refs/heads/main/data/US20240293380_examples.csv"
)
ligand_data.sort_values(by="IC50 (nM)")
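
One possible way to put a number on how well two orderings agree is a Spearman rank correlation, which compares ranks rather than raw values. The sketch below is written in plain Python under stated assumptions: the helper functions are hypothetical, and the four score and IC50 values are toy placeholders that come from neither the docking run nor the patent table.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy placeholder values (assumption), NOT taken from the results above
toy_scores = [-8.0, -7.0, -6.0, -5.0]
toy_ic50_nm = [10.0, 50.0, 30.0, 400.0]

rho = spearman(toy_scores, toy_ic50_nm)
print(f"Spearman rho = {rho:.2f}")
```

A value near +1 would mean that more negative docking scores tend to go with lower (more potent) IC50 values. Applying this to the real data requires first matching each docked pose to its compound in `ligand_data`.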

We can also use our visualization function to visualize the docked ligands to look at how they interact with the binding site.

In [None]:
v = visualize_poses(
    #"docking_files/protein_structures/7LME_fixed.pdb",
    "7BCS_A_fixed.pdb",
    #"docking_results/multiple_ligands_docked.sdf",
    "multiple_ligands_docked.sdf",
    #cognate_file="docking_files/ligand_structures/Y6J_corrected_pose.sdf",
    cognate_file="TJ5_corrected_pose.sdf",
    animate=True,
)  # Set animate=False to show the poses without the animation
v.show()

If we would like to visualize poses for only one molecule, we can use the RDKit's PandasTools again to write an SDF for just that compound.

In [None]:
compound_name = "Compound_12_i0"

compound_df = combo_df[combo_df["ID"] == compound_name]

PandasTools.WriteSDF(
    compound_df,
    f"docking_results/individual_{compound_name}.sdf",  # Output file path
    molColName="ROMol",  # Name of the column with RDKit molecules
    properties=score_columns,  # List of property columns to include
)


cognate = Chem.MolFromMolFile("docking_files/ligand_structures/Y6J_corrected_pose.sdf")
poses = Chem.SDMolSupplier(f"docking_results/individual_{compound_name}.sdf")

for i, pose in enumerate(poses):
    n_match, rmsd = uru.mcs_rmsd(cognate, pose)
    print(f"{n_match}\t{rmsd:.2f}")
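
For reference, the RMSD printed by the loop above boils down to a root-mean-square distance over matched atom pairs. The sketch below shows that core calculation on hypothetical coordinates. It is not the implementation of `uru.mcs_rmsd`, and it assumes the atoms are already paired up and that both structures share the same coordinate frame (no alignment is performed).

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and the same length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical coordinates (assumption): the pose is the reference shifted 2 units along z
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
pose = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 1.0, 2.0)]

print(f"RMSD = {rmsd(reference, pose):.2f}")
```

Because every atom moves by the same distance in this toy case, the RMSD equals that distance.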

In [None]:
v = visualize_poses(
    "docking_files/protein_structures/7LME_fixed.pdb",
    f"docking_results/individual_{compound_name}.sdf",
    cognate_file=f"docking_results/individual_{compound_name}.sdf",
    animate=False,
)  # Change to True to see an animation of all of the poses
v.show()

In [None]:
# @title More gnina Docking Options
%%html
<style>
div.purple-box {
    color: #4b0082; /* Indigo for text */
    background-color: #f3e5f5; /* Light lavender background */
    border-left: 5px solid #7b1fa2; /* Medium purple border */
    padding: 0.5em;
    font-size: 1.25em; /* Matches the surrounding text size */
    line-height: 1.5; /* Ensures readability */
    font-family: Arial, sans-serif; /* Clean, modern font */
}
div.purple-box ul {
    margin: 0.5em 0; /* Space around the list */
}
div.purple-box li {
    margin-bottom: 0.5em; /* Space between list items */
}
</style>

<div class="purple-box">
    <strong>Flexible & Whole Protein Docking with GNINA</strong>
    <ul>
        <li><b>Flexible Docking:</b> <code>gnina</code> allows specific receptor sidechains to be treated as flexible during docking. This is typically enabled using command-line options like <code>--flexdist_ligand</code> (to specify a reference point, often the ligand) and <code>--flexdist</code> (to set a distance threshold around that point for selecting flexible residues).</li>
        <li><b>Whole Protein Docking:</b> When the binding site isn't known beforehand, <code>gnina</code> can perform whole protein docking. This is usually done by setting the search box to encompass the entire receptor, commonly achieved by providing the receptor file itself as the argument to <code>--autobox_ligand</code>. This allows the ligand to explore the entire protein surface for potential binding pockets. <i>Note: Due to the much larger search space, significantly higher <code>--exhaustiveness</code> settings are strongly recommended for whole protein docking.</i></li>
    </ul>

    However, be careful using these options as they are much more computationally expensive than rigid docking.
</div>
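
As a rough sketch of how the options in the box above fit together, the cell below assembles two `gnina` command lines as Python lists and prints them without running anything. The file names are placeholder assumptions, and the `--flexdist` and `--exhaustiveness` values are arbitrary illustrations rather than recommendations.

```python
import shlex

# Placeholder file names (assumptions): substitute the files prepared earlier
receptor = "receptor.pdb"
ligands = "ligands.sdf"
reference_ligand = "reference_ligand.sdf"

# Flexible docking: sidechains within 3.5 angstroms (arbitrary value) of the
# reference ligand are treated as flexible
flexible_cmd = [
    "gnina", "-r", receptor, "-l", ligands,
    "--autobox_ligand", reference_ligand,
    "--flexdist_ligand", reference_ligand,
    "--flexdist", "3.5",
    "-o", "flexible_docked.sdf",
]

# Whole protein docking: the receptor itself defines the search box, and the
# exhaustiveness is raised (64 is an arbitrary example) for the larger search space
whole_protein_cmd = [
    "gnina", "-r", receptor, "-l", ligands,
    "--autobox_ligand", receptor,
    "--exhaustiveness", "64",
    "-o", "whole_protein_docked.sdf",
]

for cmd in (flexible_cmd, whole_protein_cmd):
    print(shlex.join(cmd))
```

The printed lines can be pasted after a `!` in a notebook cell, or a list can be passed to `subprocess.run(cmd, check=True)` once the real file names are filled in.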

## Flexible Docking

Throughout this tutorial, we've performed docking calculations treating the protein as a rigid receptor. This is a common and computationally efficient approach. However, sometimes the binding site needs to adjust its shape to accommodate a ligand. This is called "induced fit".

Induced fit is observed for our structure according to the [reference](https://pubs.acs.org/doi/10.1021/acs.jmedchem.1c00598). In particular, Gln189 and Met49 have flexible sidechains. We have likely observed good docking results because all of our ligands bind to this site in a similar way and the structures we started from already fit this site well.

In the cell below, we add an argument that highlights these residues to show where they sit relative to our binding site.

In [None]:
# @title Final Exercise
%%html
<style>
div.orange-alert {
    color: #854f00; /* Darker shade of orange for text */
    background-color: #ffe6cc; /* Light orange background */
    border-left: 5px solid #ff9933; /* Bright orange border */
    padding: 0.5em;
    font-size: 1.25em; /* Matches the surrounding text size */
    line-height: 1.5; /* Ensures readability */
}
div.orange-alert ul {
    margin: 0.5em 0; /* Space around the list */
}
div.orange-alert li {
    margin-bottom: 0.5em; /* Space between list items */
}
</style>

<div class="orange-alert">

<strong>Final Challenge</strong>

<p>
<i>Challenge 1: Flexible Docking</i>
</p>

<p>
As stated above, sometimes your docking site may be flexible. <code>gnina</code> allows for flexible docking in a few different ways.
You may either use a cutoff distance from a target with <code>--flexdist_ligand ARG</code> and <code>--flexdist DISTANCE</code>,
or you may specify a set of flexible residues using <code>--flexres</code>. It is recommended to have as few flexible residues as possible
due to the increased computational cost. To specify the residues highlighted above as flexible, you can add <code>--flexres A:49,A:189</code> to your
<code>gnina</code> command. Here "A" refers to chain A.
</p>

<p>Try redocking or crossdocking using Y6J and flexible docking.</p>

<p>
    <i>
        Challenge 2: Cross Docking all ligands
    </i>
</p>
<p>
    Try docking all of your ligands to <code>7L11</code> using rigid docking. Do you observe the same ordering of ligands?
    </p>

</div>
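
As a small aid for Challenge 1, the `--flexres` value is a comma-separated list of `chain:residue` entries. The sketch below builds that string from a list of residues; `format_flexres` is a hypothetical helper written for this illustration, not part of gnina.

```python
def format_flexres(residues):
    """Build a --flexres value from (chain, residue_number) pairs."""
    return ",".join(f"{chain}:{number}" for chain, number in residues)


# The two flexible residues discussed above, both on chain A
flexres = format_flexres([("A", 49), ("A", 189)])
print(f"--flexres {flexres}")  # prints: --flexres A:49,A:189
```

Keeping the residue list in Python makes it easy to try a different set of flexible residues without retyping the option by hand.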

In [None]:
# @title Key Points
%%html
<style>
div.green-note {
    color: #155724; /* Dark green for text */
    background-color: #d4edda; /* Light green background */
    border-left: 5px solid #28a745; /* Bright green border */
    padding: 0.5em;
    font-size: 1.25em; /* Consistent with text size */
    line-height: 1.5; /* Ensures readability */
    font-family: Arial, sans-serif; /* Clean and modern font */
}
div.green-note ul {
    margin: 0.5em 0; /* Space around the list */
}
div.green-note li {
    margin-bottom: 0.5em; /* Space between list items */
}
</style>

<div class="green-note">
    <strong>Key Points:</strong>
    <ul>
        <li><code>gnina</code> is used for molecular docking via the command line, requiring prepared receptor (<code>-r</code>) and ligand (<code>-l</code>) files, defining a search box (e.g., <code>--autobox_ligand</code>), and specifying an output file (<code>-o</code>).</li>
        <li>Docking results are evaluated using scoring functions (<code>minimizedAffinity</code> from Vina, <code>CNNscore</code> and <code>CNNaffinity</code> from the neural network) and Root Mean Square Deviation (RMSD) to measure geometric similarity to a known pose.</li>
        <li><code>gnina</code>'s default <code>CNNscore</code> ranks poses based on predicted geometric accuracy (likelihood of low RMSD) and often differs from rankings based on the Vina affinity score (<code>minimizedAffinity</code>).</li>
        <li>Redocking assesses reproducibility against a known structure, while cross-docking (using a different but related receptor structure) tests robustness, requiring protein alignment for meaningful RMSD comparison.</li>
        <li>Multiple ligands can be docked efficiently by providing a multi-molecule SDF file to <code>gnina</code>, and results can be compiled and analyzed using RDKit and Pandas in Python.</li>
    </ul>
</div>