In [1]:
%%capture
!pip install useful_rdkit_utils py3Dmol rdkit
!apt install openbabel

### Download gnina

We are downloading the pre-compiled binary of gnina. You may also compile gnina yourself by following the directions on the [gnina GitHub repository](https://github.com/gnina/gnina).

In [2]:
# Download gnina
!wget https://github.com/gnina/gnina/releases/download/v1.3/gnina.fix


In [3]:
# Make gnina executable
!mv gnina.fix gnina
!chmod +x gnina

## Docking with gnina

Molecular docking involves two main stages:

1. Sampling: The algorithm explores many possible positions, orientations, and conformations (or "poses") of the ligand within the receptor's active site. In AutoDock Vina, smina, and gnina, poses are generated using Monte Carlo sampling combined with local optimization.
2. Scoring: Each generated pose is evaluated using a scoring function that estimates the binding affinity, and the poses are then ranked by these scores. For Vina scores, lower (more negative) values indicate more favorable interactions.
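
The sample-then-score loop can be illustrated with a toy Metropolis Monte Carlo search. This is only a sketch of the general idea, not gnina's implementation: the "pose" here is a single number, `toy_score` is a made-up scoring function with several local minima, and the step size and temperature are arbitrary assumptions. Real Vina-style searches perturb a full 3D pose and locally optimize each candidate.

```python
import math
import random


def toy_score(x: float) -> float:
    """Hypothetical 1-D 'scoring function' with several local minima (lower is better)."""
    return 0.1 * x**2 + math.sin(3.0 * x)


def monte_carlo_search(start: float, n_steps: int = 2000, step_size: float = 0.5,
                       temperature: float = 1.0, seed: int = 0) -> tuple[float, float]:
    """Metropolis Monte Carlo: always accept improvements, accept worse moves
    with probability exp(-delta / T). Returns the best (x, score) seen."""
    rng = random.Random(seed)
    x, score = start, toy_score(start)
    best_x, best_score = x, score
    for _ in range(n_steps):
        candidate = x + rng.uniform(-step_size, step_size)  # sampling: perturb the "pose"
        cand_score = toy_score(candidate)                    # scoring: evaluate the new "pose"
        delta = cand_score - score
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            x, score = candidate, cand_score
            if score < best_score:
                best_x, best_score = x, score
    return best_x, best_score


best_x, best_score = monte_carlo_search(start=5.0)
print(f"best x = {best_x:.2f}, score = {best_score:.2f}")
```

Accepting some uphill moves is what lets the search escape local minima instead of getting stuck near its starting point.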

In addition to the traditional scoring functions available in Vina and smina, gnina adds convolutional neural network (CNN) scoring. These deep learning models analyze a 3D grid representation of the protein-ligand complex, essentially evaluating a "picture" of the interaction based on atomic densities.

By default, gnina uses the CNN for **rescoring**: poses are sampled and scored with the traditional Vina scoring function, then re-ranked after sampling using the CNN models. Through the `--cnn_scoring` option you can instead use the CNN for refinement, for all scoring, or not at all (CNN refinement and all-CNN scoring are more computationally intensive).
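
The rescoring idea can be sketched as "shortlist by one score, re-rank by another". The `Pose` records below are hypothetical values for illustration only (not gnina output), and the fixed shortlist size `keep` is an assumption; gnina's internal bookkeeping differs in detail.

```python
from dataclasses import dataclass


@dataclass
class Pose:
    name: str
    vina_affinity: float   # kcal/mol, lower is better
    cnn_pose_score: float  # 0..1, higher is better


def rescore(poses: list[Pose], keep: int) -> list[Pose]:
    """Keep the `keep` best poses by Vina affinity, then re-rank them by CNN pose score."""
    shortlisted = sorted(poses, key=lambda p: p.vina_affinity)[:keep]
    return sorted(shortlisted, key=lambda p: p.cnn_pose_score, reverse=True)


# Hypothetical poses for illustration only
candidates = [
    Pose("A", vina_affinity=-7.5, cnn_pose_score=0.45),
    Pose("B", vina_affinity=-7.1, cnn_pose_score=0.82),
    Pose("C", vina_affinity=-6.0, cnn_pose_score=0.90),
    Pose("D", vina_affinity=-6.8, cnn_pose_score=0.30),
]
print([p.name for p in rescore(candidates, keep=3)])
```

Here the best Vina pose ("A") drops to second place after rescoring, and pose "C" never appears because rescoring can only reorder poses that survived sampling.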

For more details see the paper on [gnina v1.0](https://jcheminf.biomedcentral.com/articles/10.1186/s13321-021-00522-2) and [gnina v1.3](https://jcheminf.biomedcentral.com/articles/10.1186/s13321-025-00973-x).

## Redocking the Ligand

Redocking (also called "cognate docking") means docking a ligand back into the receptor structure from which its bound pose was experimentally determined.
Redocking is typically done to evaluate how well a docking program's sampling algorithm and scoring function can reproduce a known experimental binding pose.

We will begin our docking journey with gnina by performing a redock of our ligand.

In [6]:
# # make a folder for our results
# !mkdir -p docking_results

# use gnina
!./gnina \
  -r 7t47_only_protein_chimera.pdb \
  -l MRTX.sdf \
  --autobox_ligand 6IC_from_7T47_before_docking_add_H.sdf \
  -o docking_results.sdf \
  --seed 0 \
  --exhaustiveness 16

              _             
             (_)            
   __ _ _ __  _ _ __   __ _ 
  / _` | '_ \| | '_ \ / _` |
 | (_| | | | | | | | | (_| |
  \__, |_| |_|_|_| |_|\__,_|
   __/ |                    
  |___/                     

gnina  master:25e64da   Built Apr 23 2025.
gnina is based on smina and AutoDock Vina.
Please cite appropriately.

Commandline: ./gnina -r 7t47_only_protein_chimera.pdb -l MRTX.sdf --autobox_ligand 6IC_from_7T47_before_docking_add_H.sdf -o docking_results.sdf --seed 0 --exhaustiveness 16
  Cannot initialize database 'space-groups.txt' which may cause further errors.
Using random seed: 0

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
 | pose 0 | initial pose not within box
 | pose 0 | ligand outside box
 | pose 0 | ligand outside box
 | pose 0 | ligand outside box
 | pose 0 | ligand outside box
 | pose 0 | ligand outside box
 | pose 0 | initial pos

### Interpreting the output

When `gnina` finishes a docking run, it prints a summary table for the generated poses. This table is sorted by pose rank, with the poses `gnina` determined to be the best at the top.

The columns are the following:

* `mode`: pose rank
* `affinity (kcal/mol)`: the Vina score
* `intramol (kcal/mol)`: the ligand's intramolecular (internal) energy according to the Vina scoring function
* `CNN pose score`: the CNN's estimate of pose quality, between 0 and 1; values closer to 1 indicate higher confidence in the pose's geometric accuracy. By default, poses are ranked by this score.
* `CNN affinity`: the CNN's predicted binding affinity in pK units (higher values mean stronger binding; e.g., a predicted value of 9 corresponds to 1 nM affinity).
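
To relate the CNN affinity (pK) to the concentration units and kcal/mol scale used elsewhere, here is a small conversion sketch. It assumes pK = -log10(Kd) with Kd in molar units, a 1 M standard state, and 298.15 K; the conversion itself is standard thermodynamics, but the CNN affinity remains a prediction, and the Vina `affinity` column is an empirical score rather than a measured free energy.

```python
import math

R_KCAL = 1.98720425864083e-3  # gas constant in kcal/(mol*K)


def pk_to_kd_molar(pk: float) -> float:
    """Dissociation constant (M) from pK = -log10(Kd)."""
    return 10.0 ** (-pk)


def pk_to_delta_g(pk: float, temperature: float = 298.15) -> float:
    """Binding free energy (kcal/mol): dG = RT ln(Kd) = -RT ln(10) * pK."""
    return -R_KCAL * temperature * math.log(10.0) * pk


pk = 9.0
print(f"Kd = {pk_to_kd_molar(pk) * 1e9:.1f} nM, dG = {pk_to_delta_g(pk):.2f} kcal/mol")
```

At room temperature each pK unit is worth about 1.36 kcal/mol, so a pK of 9 (1 nM) corresponds to roughly -12.3 kcal/mol.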

Looking at the scores for the redocked Y6J ligand, the table shows the poses ranked by `CNN pose score` (higher is better), with `mode 1` scoring highest (approx. 0.81). Notably, this differs from the ranking by the Vina `affinity` score (lower is better), where `mode 3` is most favorable (-7.51 kcal/mol) but has a much lower `CNN pose score` (approx. 0.49).


### Measuring Root-Mean-Square Deviation (RMSD)

After generating docked poses, we next need to quantify how close they are to the known reference structure.
The standard metric for this comparison is the **Root-Mean-Square Deviation (RMSD)**.
RMSD measures the root-mean-square distance between corresponding atoms of two molecular structures.
A lower RMSD indicates greater similarity between the docked pose and the reference structure.
Mathematically, it is calculated as:

$$
\mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \delta_i^2}
$$



where $N$ is the number of corresponding atom pairs being compared and $\delta_i$ is the Euclidean distance between the $i$-th pair of atoms.
In docking studies, a docked pose is commonly considered "successful" (accurate) if its RMSD to the crystal structure is below 2 Å.
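
The RMSD formula translates directly into code. This minimal sketch assumes the two coordinate lists are already paired by index and performs no superposition, which matches the docking setting: docked poses and the crystal ligand share the receptor's coordinate frame. The coordinates are made up for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) coordinates, paired by index.
    No alignment is performed, so this is an in-place RMSD."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and the same length")
    sq_sum = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(sq_sum / len(coords_a))


# Hypothetical coordinates: every docked atom is shifted 1 Å along z
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (1.5, 1.5, 1.0)]
value = rmsd(reference, docked)
print(f"RMSD = {value:.2f} Å, success (< 2 Å): {value < 2.0}")
```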

In this notebook, we will use the `mcs_rmsd` function from the `useful_rdkit_utils` package, written by Pat Walters (a co-instructor of this workshop!).
This function calculates the RMSD, but with a useful modification: it first identifies the **Maximum Common Substructure (MCS)** between the two input molecules using RDKit's `FindMCS` functionality.
It then calculates the RMSD using only the corresponding atoms belonging to this shared substructure.
This approach is particularly valuable when comparing molecules that are similar but not identical, as it focuses the RMSD calculation on the parts of the molecules that match.
For redocking the original ligand, the MCS will typically be the entire molecule; the function becomes more useful later, when we compare the poses of different (but similar) docked ligands to the original crystal ligand.
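
Once a common-substructure atom correspondence is known, the RMSD is computed over the matched atoms only. The sketch below assumes the mapping has already been found (for example, with RDKit's MCS search followed by substructure matching) and is passed in as `(reference_index, probe_index)` pairs. It illustrates the idea and is not the implementation of `uru.mcs_rmsd`; a more careful version might also consider several symmetry-equivalent matches and keep the lowest RMSD.

```python
import math


def matched_rmsd(ref_coords, probe_coords, atom_map):
    """In-place RMSD over matched atoms only.

    atom_map: list of (ref_index, probe_index) pairs, e.g. an MCS correspondence.
    Atoms not in the map (such as extra substituents) are ignored.
    Returns (number of matched atoms, RMSD)."""
    if not atom_map:
        raise ValueError("atom_map is empty")
    sq = [math.dist(ref_coords[i], probe_coords[j]) ** 2 for i, j in atom_map]
    return len(atom_map), math.sqrt(sum(sq) / len(sq))


# Hypothetical example: the probe lists atoms in a different order and has one extra atom
ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
probe = [(3.0, 0.0, 0.5), (1.5, 0.0, 0.5), (0.0, 0.0, 0.5), (4.0, 1.0, 0.0)]
n_match, value = matched_rmsd(ref, probe, [(0, 2), (1, 1), (2, 0)])
print(f"{n_match}\t{value:.2f}")
```

The returned pair mirrors the `n_match, rmsd` values printed in the cell below.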



In [7]:
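import useful_rdkit_utils as uru
from rdkit import Chem

# Reference (crystal) ligand and the docked poses written by gnina
cognate = Chem.MolFromMolFile("/content/6IC_from_7T47_before_docking_add_H.sdf")
poses = Chem.SDMolSupplier("/content/docking_results.sdf")

# For each pose, print the number of MCS atoms matched and the RMSD (Å) to the crystal ligand
for pose in poses:
    if pose is None:  # skip any record RDKit failed to parse
        continue
    n_match, rmsd = uru.mcs_rmsd(cognate, pose)
    print(f"{n_match}\t{rmsd:.2f}")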

44	0.76
44	5.29
44	2.65
44	3.39
44	9.31
44	11.09
44	10.98
44	10.93
44	10.43
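
The printed values can be summarized against the 2 Å success threshold. This sketch copies the RMSD values from the output above and assumes the SDF lists poses in the same order as gnina's ranked table.

```python
# RMSD values (Å) printed above, assumed to be in gnina's pose-rank order
rmsds = [0.76, 5.29, 2.65, 3.39, 9.31, 11.09, 10.98, 10.93, 10.43]
THRESHOLD = 2.0  # Å, the common success cutoff

top1_success = rmsds[0] < THRESHOLD
n_success = sum(r < THRESHOLD for r in rmsds)
best_rank = min(range(len(rmsds)), key=rmsds.__getitem__) + 1  # 1-based mode number
print(f"top-ranked pose within {THRESHOLD} Å: {top1_success}")
print(f"{n_success} of {len(rmsds)} poses within {THRESHOLD} Å; lowest RMSD at mode {best_rank}")
```

With these values, only the top-ranked pose (0.76 Å) falls under the 2 Å cutoff, so the redock reproduces the crystal pose at rank 1.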




## Docking our Prepared Ligands
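
The four cells below run the same gnina command on different ligand files. As a sketch, the runs could be scripted from Python with `subprocess`, reusing only the flags from those cells. The helper name `gnina_command` and the output naming pattern `ligands_docked_<stem>.sdf` are assumptions; the original cells for compounds 2 and 3 write `ligands_docked_compound_2.sdf` and `ligands_docked_compound_3.sdf` instead.

```python
import shutil
import subprocess

RECEPTOR = "7t47_only_protein_chimera.pdb"
AUTOBOX = "6IC_from_7T47_before_docking_add_H.sdf"
LIGANDS = ["BDBM573509_simple", "compound_1_simple", "compound_2_simple", "compound_3_simple"]


def gnina_command(ligand_stem: str, seed: int = 0, exhaustiveness: int = 16) -> list[str]:
    """Hypothetical helper: build the gnina command line for one ligand file."""
    return [
        "./gnina",
        "-r", RECEPTOR,
        "-l", f"{ligand_stem}.sdf",
        "--autobox_ligand", AUTOBOX,
        "-o", f"ligands_docked_{ligand_stem}.sdf",
        "--seed", str(seed),
        "--exhaustiveness", str(exhaustiveness),
    ]


for stem in LIGANDS:
    cmd = gnina_command(stem)
    if shutil.which(cmd[0]) is None:  # gnina binary not present in this directory
        print("gnina not found; would run:", " ".join(cmd))
        continue
    subprocess.run(cmd, check=True)  # raises CalledProcessError if gnina fails
```

Keeping the receptor, box, seed, and exhaustiveness in one place ensures every ligand is docked under identical settings.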


In [8]:
!./gnina \
  -r 7t47_only_protein_chimera.pdb \
  -l BDBM573509_simple.sdf \
  --autobox_ligand 6IC_from_7T47_before_docking_add_H.sdf \
  -o ligands_docked_BDBM573509_simple.sdf \
  --seed 0 \
  --exhaustiveness 16

Commandline: ./gnina -r 7t47_only_protein_chimera.pdb -l BDBM573509_simple.sdf --autobox_ligand 6IC_from_7T47_before_docking_add_H.sdf -o ligands_docked_BDBM573509_simple.sdf --seed 0 --exhaustiveness 16
  Cannot initialize database 'space-groups.txt' which may cause further errors.
Using random seed: 0

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
 | pose 0 | initial pose not within box

mode |  affinity  |  intramol  |    CNN     |   CNN
     | (kcal/mol) | (kcal/mol) | pose score | affinity
-----+------------+------------+--------

In [10]:
!./gnina \
  -r 7t47_only_protein_chimera.pdb \
  -l compound_1_simple.sdf \
  --autobox_ligand 6IC_from_7T47_before_docking_add_H.sdf \
  -o ligands_docked_compound_1_simple.sdf \
  --seed 0 \
  --exhaustiveness 16

Commandline: ./gnina -r 7t47_only_protein_chimera.pdb -l compound_1_simple.sdf --autobox_ligand 6IC_from_7T47_before_docking_add_H.sdf -o ligands_docked_compound_1_simple.sdf --seed 0 --exhaustiveness 16
  Cannot initialize database 'space-groups.txt' which may cause further errors.
Using random seed: 0

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
 | pose 0 | initial pose not within box
 | pose 0 | ligand outside box

mode |  affinity  |  intramol  |    CNN     |   CNN
     | (kcal/mol) | (kcal/mol) | pose score | affinity
-----+---

In [None]:
!./gnina \
  -r 7t47_only_protein_chimera.pdb \
  -l compound_2_simple.sdf \
  --autobox_ligand 6IC_from_7T47_before_docking_add_H.sdf \
  -o ligands_docked_compound_2.sdf \
  --seed 0 \
  --exhaustiveness 16

Commandline: ./gnina -r 7t47_only_protein_chimera.pdb -l compound_2_simple.sdf --autobox_ligand 6IC_from_7T47_before_docking_add_H.sdf -o ligands_docked_compound_2.sdf --seed 0 --exhaustiveness 16
  Cannot initialize database 'space-groups.txt' which may cause further errors.
Using random seed: 0

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
 | pose 0 | initial pose not within box
 | pose 0 | ligand outside box
 | pose 0 | ligand outside box

mode |  affinity  |  intramol  |    CNN     |   CNN
     | (kcal/mol) | (kcal/mol) | pose sc

In [None]:
!./gnina \
  -r 7t47_only_protein_chimera.pdb \
  -l compound_3_simple.sdf \
  --autobox_ligand 6IC_from_7T47_before_docking_add_H.sdf \
  -o ligands_docked_compound_3.sdf \
  --seed 0 \
  --exhaustiveness 16

Commandline: ./gnina -r 7t47_only_protein_chimera.pdb -l compound_3_simple.sdf --autobox_ligand 6IC_from_7T47_before_docking_add_H.sdf -o ligands_docked_compound_3.sdf --seed 0 --exhaustiveness 16
  Cannot initialize database 'space-groups.txt' which may cause further errors.
Using random seed: 0

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
 | pose 0 | initial pose not within box

mode |  affinity  |  intramol  |    CNN     |   CNN
     | (kcal/mol) | (kcal/mol) | pose score | affinity
-----+------------+------------+------------+--