<a href="https://colab.research.google.com/github/mitsukacke2285/drug_discovery_repo/blob/main/Docking_with_Gnina.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Step 1 Installation of dependencies
Gnina runs in the Linux environment provided by the Google Colab virtual machine. This notebook also relies on three Python packages:

1. `useful_rdkit_utils` is a Python package written and maintained by Pat Walters that contains useful RDKit functions. We will use its `mcs_rmsd` function (explained later).
2. `py3Dmol` is used for molecular visualization.
3. RDKit is a popular cheminformatics package that we will use for processing molecules.


## Step 1.1 Installation of Python packages

In [1]:
%%capture
!pip install useful_rdkit_utils py3Dmol rdkit # If this combined install fails, run the single-package cells below instead
!apt-get install -y openbabel

In [2]:
!pip install py3Dmol



In [3]:
!pip install rdkit



In [4]:
!pip install useful_rdkit_utils
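
Once the install cells above have finished, a quick check that the three packages can be found may save confusion later. The sketch below is an addition, not part of the original workflow; `find_missing_modules` is a hypothetical helper built on the standard-library `importlib.util.find_spec`.

```python
import importlib.util

# Import names of the packages installed in Step 1.1.
REQUIRED_MODULES = ["useful_rdkit_utils", "py3Dmol", "rdkit"]


def find_missing_modules(module_names):
    """Return the names from module_names that cannot be located for import."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


missing = find_missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

If anything is reported as missing, rerunning the matching single-package install cell is usually enough.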



In [5]:
!curl -L -O https://raw.githubusercontent.com/MolSSI-Education/iqb-2025/main/util.py

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2927  100  2927    0     0   9944      0 --:--:-- --:--:-- --:--:--  9955


## Step 1.2 Download gnina

We are downloading the pre-compiled binary of gnina. You may also compile gnina yourself by following the directions on the [gnina GitHub repository](https://github.com/gnina/gnina).

In [6]:
!wget https://github.com/gnina/gnina/releases/download/v1.3/gnina.fix


In [7]:
# Make gnina executable
!mv gnina.fix gnina
!chmod +x gnina
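
Before moving on, it may help to confirm that the download and `chmod` steps actually produced a runnable file. This is a minimal sketch, assuming the binary sits at `./gnina` as in the cell above; `is_executable_file` is a hypothetical helper, not part of gnina.

```python
import os


def is_executable_file(path):
    """Return True if path is a regular file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


gnina_path = "./gnina"  # location used by the cells in this notebook
if is_executable_file(gnina_path):
    print(f"{gnina_path} is ready to run.")
else:
    print(f"{gnina_path} is missing or not executable; rerun the download and chmod cells.")
```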

# Step 2 Prepare folders and files

## Step 2.1 Upload files from protein and ligand preparation

Either drag and drop the files into the Colab file browser or use the next cell to upload them. These files are the inputs for gnina.

In [10]:
from google.colab import files

# Upload {pdb_id}_A.pdbqt, {pdb_id}_A_fixed.pdb and {pdb_id}_A.pdb files from local PC to your Colab VM
files.upload("molecular_docking/protein_files")

# Download a file from your Colab VM to local PC
#files.download('mylocalfile.txt')

Saving 7BCS_A.pdb to molecular_docking/protein_files/7BCS_A.pdb



In [13]:
# Upload {ligand_id}_ideal.sdf, {ligand_id}_corrected_pose.sdf and ligands_to_dock.sdf from the local machine to the Google Colab VM
files.upload("molecular_docking/ligand_structures")

Saving ligands_to_dock.sdf to molecular_docking/ligand_structures/ligands_to_dock.sdf



In [None]:
%%capture
!wget https://github.com/MolSSI-Education/iqb-2025/raw/refs/heads/main/data/docking_files.zip
!wget https://raw.githubusercontent.com/MolSSI-Education/iqb-2025/refs/heads/main/util.py

In [None]:
!unzip docking_files.zip

Archive:  docking_files.zip
   creating: docking_files/
   creating: docking_files/protein_structures/
  inflating: docking_files/protein_structures/7L11_aligned_fixed.pdb  
  inflating: docking_files/protein_structures/7L11_aligned.pdb  
  inflating: docking_files/protein_structures/7LME_fixed.pdb  
  inflating: docking_files/protein_structures/7LME.pdb  
  inflating: docking_files/protein_structures/7L11.pdb  
  inflating: docking_files/protein_structures/7LME.pdbqt  
  inflating: docking_files/protein_structures/7L11.pdbqt  
   creating: docking_files/ligand_structures/
  inflating: docking_files/ligand_structures/Y6J_ideal.sdf  
  inflating: docking_files/ligand_structures/XF1_ideal.sdf  
  inflating: docking_files/ligand_structures/ligands_to_dock.sdf  
  inflating: docking_files/ligand_structures/Y6J_corrected_pose.sdf  
  inflating: docking_files/ligand_structures/XF1_corrected_pose.sdf  
  inflating: docking_files/ligand_structures/Y6J_fromPDB.pdb  
  inflating: docking_files/l

## Step 2.2 Set protein and ligand directory

In [14]:
import os

pdb_id = input("Enter PDB code: ")  # PDB ID of the protein we are docking into
ligand_id = input("Enter ligand code: ")  # ID of the co-crystallized ligand

# Directories and file names used throughout the notebook
protein_directory = "molecular_docking/protein_files"
protein_filename = f"{pdb_id}.pdb"
ligand_directory = "molecular_docking/ligand_structures"
ideal_ligand_filename = f"{ligand_id}_ideal.sdf"  # Ideal ligand structure downloaded from RCSB PDB
docking_results_directory = "molecular_docking/docking_results"

# Create the working directories if they do not exist yet
for directory in (protein_directory, ligand_directory, docking_results_directory):
    os.makedirs(directory, exist_ok=True)


Enter PDB code: 1XTJ
Enter ligand code: ADP
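
Later cells fail with file-not-found errors when the entered codes do not match the uploaded files. The sketch below, which is an addition and not part of the original workflow, lists any expected input that is missing. The file names follow the upload comments in Step 2.1; `find_missing_inputs` is a hypothetical helper name, and the codes in the example are the ones entered above.

```python
from pathlib import Path


def find_missing_inputs(protein_dir, ligand_dir, pdb_id, ligand_id):
    """List expected input files (named as in Step 2.1) that are not on disk."""
    expected = [
        Path(protein_dir) / f"{pdb_id}_A.pdbqt",
        Path(protein_dir) / f"{pdb_id}_A_fixed.pdb",
        Path(protein_dir) / f"{pdb_id}_A.pdb",
        Path(ligand_dir) / f"{ligand_id}_ideal.sdf",
        Path(ligand_dir) / f"{ligand_id}_corrected_pose.sdf",
        Path(ligand_dir) / "ligands_to_dock.sdf",
    ]
    return [str(path) for path in expected if not path.is_file()]


# In the notebook, pass protein_directory, ligand_directory, pdb_id and ligand_id instead.
for path in find_missing_inputs(
    "molecular_docking/protein_files",
    "molecular_docking/ligand_structures",
    "1XTJ",
    "ADP",
):
    print("Missing:", path)
```

An empty result means every file that the docking and analysis cells refer to is in place.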


# Step 3 Docking

Redocking (also called "cognate docking") means docking a ligand back into the receptor structure from which its bound pose was experimentally determined.
Redocking is typically done to evaluate how well a docking program's sampling algorithm and scoring function reproduce a known experimental binding pose.

We will begin our docking journey with gnina by performing a redock of our ligand.

In [16]:
from util import visualize_poses

v = visualize_poses(
    f"{protein_directory}/{pdb_id}_A_fixed.pdb",
    f"{ligand_directory}/{ligand_id}_ideal.sdf"
)
v.show()

In [17]:
from util import visualize_poses

v = visualize_poses(
    f"{protein_directory}/{pdb_id}_A.pdb",
    f"{ligand_directory}/{ligand_id}_corrected_pose.sdf"
)
v.show()

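The annotated command below explains each gnina option used in this notebook. The comment lines are for reading only and must be removed before the command is run in a shell, because a comment placed between backslash-continued lines ends the command early.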

```
./gnina \
  # Specify the receptor structure file (-r).
  # This file (e.g. 7LME.pdbqt) should be prepared for docking (e.g., with hydrogens added).
  -r docking_files/7LME_all_atom.pdbqt \
  # Specify the ligand structure file (-l) to be docked.
  # This file (Y6J_ideal.pdbqt) contains the 3D coordinates of the ligand.
  -l docking_files/Y6J_ideal.pdbqt \
  # Define the docking search box automatically (--autobox_ligand).
  # The box will be centered around the coordinates of the ligand in the specified file
  # (Y6J_corrected_pose.sdf), which is the known experimental pose in this redocking example.
  # An optional padding (default 4Å) is added.
  --autobox_ligand docking_files/Y6J_corrected_pose.sdf \
  # Specify the output file path (-o) where the resulting docked poses will be saved.
  # The output format will be SDF, containing multiple poses ranked by score.
  -o docking_results/Y6J_docked_e12.sdf \
  # Set the random number generator seed (--seed) to 0.
  # Using a fixed seed makes the docking calculation reproducible.
  --seed 0 \
  # Set the exhaustiveness level (--exhaustiveness) to 12.
  # This controls the number of Monte Carlo chains for the ligand.
  # The default is 8.
  --exhaustiveness 12 \
  # Run without Convolutional Neural Network (CNN) scoring.
  --cnn_scoring none
```
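
One way to sidestep shell quoting and line-continuation issues is to assemble the command as a Python list and hand it to `subprocess`. The sketch below is an illustration rather than the notebook's own method: `build_gnina_command` is a hypothetical helper, it only uses flags that appear in the annotated command above, and the file paths are copied from that example.

```python
import shlex


def build_gnina_command(receptor, ligand, autobox_ligand, out,
                        seed=0, exhaustiveness=8, cnn_scoring=None,
                        gnina="./gnina"):
    """Assemble a gnina invocation as an argument list (no shell quoting needed)."""
    cmd = [
        gnina,
        "-r", receptor,
        "-l", ligand,
        "--autobox_ligand", autobox_ligand,
        "-o", out,
        "--seed", str(seed),
        "--exhaustiveness", str(exhaustiveness),
    ]
    # Only pass --cnn_scoring when the caller asks for a non-default mode.
    if cnn_scoring is not None:
        cmd += ["--cnn_scoring", cnn_scoring]
    return cmd


cmd = build_gnina_command(
    "docking_files/7LME_all_atom.pdbqt",
    "docking_files/Y6J_ideal.pdbqt",
    "docking_files/Y6J_corrected_pose.sdf",
    "docking_results/Y6J_docked_e12.sdf",
    exhaustiveness=12,
    cnn_scoring="none",
)
print(shlex.join(cmd))
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```

Printing the joined command first makes it easy to compare against the command line that gnina echoes when it starts.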
Full command list:

```

Input:
  -r [ --receptor ] arg              rigid part of the receptor
  --flex arg                         flexible side chains, if any (PDBQT)
  -l [ --ligand ] arg                ligand(s)
  --flexres arg                      flexible side chains specified by comma
                                     separated list of chain:resid
  --flexdist_ligand arg              Ligand to use for flexdist
  --flexdist arg                     set all side chains within specified
                                     distance to flexdist_ligand to flexible
  --flex_limit arg                   Hard limit for the number of flexible
                                     residues
  --flex_max arg                     Retain at at most the closest flex_max
                                     flexible residues

Search space (required):
  --center_x arg                     X coordinate of the center
  --center_y arg                     Y coordinate of the center
  --center_z arg                     Z coordinate of the center
  --size_x arg                       size in the X dimension (Angstroms)
  --size_y arg                       size in the Y dimension (Angstroms)
  --size_z arg                       size in the Z dimension (Angstroms)
  --autobox_ligand arg               Ligand to use for autobox. A multi-ligand
                                     file still only defines a single box.
  --autobox_add arg                  Amount of buffer space to add to
                                     auto-generated box (default +4 on all six
                                     sides)
  --autobox_extend arg (=1)          Expand the autobox if needed to ensure the
                                     input conformation of the ligand being
                                     docked can freely rotate within the box.
  --no_lig                           no ligand; for sampling/minimizing
                                     flexible residues

Covalent docking:
  --covalent_rec_atom arg            Receptor atom ligand is covalently bound
                                     to.  Can be specified as
                                     chain:resnum:atom_name or as x,y,z
                                     Cartesian coordinates.
  --covalent_lig_atom_pattern arg    SMARTS expression for ligand atom that
                                     will covalently bind protein.
  --covalent_lig_atom_position arg   Optional.  Initial placement of covalently
                                     bonding ligand atom in x,y,z Cartesian
                                     coordinates.  If not specified,
                                     OpenBabel's GetNewBondVector function will
                                     be used to position ligand.
  --covalent_fix_lig_atom_position   If covalent_lig_atom_position is
                                     specified, fix the ligand atom to this
                                     position as opposed to using this position
                                     to define the initial structure.
  --covalent_bond_order arg (=1)     Bond order of covalent bond. Default 1.
  --covalent_optimize_lig            Optimize the covalent complex of ligand
                                     and residue using UFF. This will change
                                     bond angles and lengths of the ligand.

Scoring and minimization options:
  --scoring arg                      specify alternative built-in scoring
                                     function: ad4_scoring default dkoes_fast
                                     dkoes_scoring dkoes_scoring_old vina
                                     vinardo
  --custom_scoring arg               custom scoring function file
  --custom_atoms arg                 custom atom type parameters file
  --score_only                       score provided ligand pose
  --local_only                       local search only using autobox (you
                                     probably want to use --minimize)
  --minimize                         energy minimization
  --randomize_only                   generate random poses, attempting to avoid
                                     clashes
  --num_mc_steps arg                 fixed number of monte carlo steps to take
                                     in each chain
  --max_mc_steps arg                 cap on number of monte carlo steps to take
                                     in each chain
  --num_mc_saved arg                 number of top poses saved in each monte
                                     carlo chain
  --temperature arg                  temperature for metropolis accept
                                     criterion
  --minimize_iters arg (=0)          number iterations of steepest descent;
                                     default scales with rotors and usually
                                     isn't sufficient for convergence
  --accurate_line                    use accurate line search
  --simple_ascent                    use simple gradient ascent
  --minimize_early_term              Stop minimization before convergence
                                     conditions are fully met.
  --minimize_single_full             During docking perform a single full
                                     minimization instead of a truncated
                                     pre-evaluate followed by a full.
  --approximation arg                approximation (linear, spline, or exact)
                                     to use
  --factor arg                       approximation factor: higher results in a
                                     finer-grained approximation
  --force_cap arg                    max allowed force; lower values more
                                     gently minimize clashing structures
  --user_grid arg                    Autodock map file for user grid data based
                                     calculations
  --user_grid_lambda arg (=-1)       Scales user_grid and functional scoring
  --print_terms                      Print all available terms with default
                                     parameterizations
  --print_atom_types                 Print all available atom types

Convolutional neural net (CNN) scoring:
  --cnn_scoring arg (=1)             Amount of CNN scoring: none, rescore
                                     (default), refinement, metrorescore
                                     (metropolis+rescore), metrorefine
                                     (metropolis+refine), all
  --cnn arg                          built-in model to use, specify
                                     PREFIX_ensemble to evaluate an ensemble of
                                     models starting with PREFIX:
                                     all_default_to_default_1_3_1
                                     all_default_to_default_1_3_2
                                     all_default_to_default_1_3_3
                                     crossdock_default2018
                                     crossdock_default2018_1
                                     crossdock_default2018_1_3
                                     crossdock_default2018_1_3_1
                                     crossdock_default2018_1_3_2
                                     crossdock_default2018_1_3_3
                                     crossdock_default2018_1_3_4
                                     crossdock_default2018_2
                                     crossdock_default2018_3
                                     crossdock_default2018_4
                                     crossdock_default2018_KD_1
                                     crossdock_default2018_KD_2
                                     crossdock_default2018_KD_3
                                     crossdock_default2018_KD_4
                                     crossdock_default2018_KD_5 default1.0
                                     default2017 dense dense_1 dense_1_3
                                     dense_1_3_1 dense_1_3_2 dense_1_3_3
                                     dense_1_3_4 dense_1_3_PT_KD
                                     dense_1_3_PT_KD_1 dense_1_3_PT_KD_2
                                     dense_1_3_PT_KD_3 dense_1_3_PT_KD_4
                                     dense_1_3_PT_KD_def2018
                                     dense_1_3_PT_KD_def2018_1
                                     dense_1_3_PT_KD_def2018_2
                                     dense_1_3_PT_KD_def2018_3
                                     dense_1_3_PT_KD_def2018_4 dense_2 dense_3
                                     dense_4 fast general_default2018
                                     general_default2018_1
                                     general_default2018_2
                                     general_default2018_3
                                     general_default2018_4
                                     general_default2018_KD_1
                                     general_default2018_KD_2
                                     general_default2018_KD_3
                                     general_default2018_KD_4
                                     general_default2018_KD_5
                                     redock_default2018 redock_default2018_1
                                     redock_default2018_1_3
                                     redock_default2018_1_3_1
                                     redock_default2018_1_3_2
                                     redock_default2018_1_3_3
                                     redock_default2018_1_3_4
                                     redock_default2018_2 redock_default2018_3
                                     redock_default2018_4 redock_default2018_KD
                                     _1 redock_default2018_KD_2
                                     redock_default2018_KD_3
                                     redock_default2018_KD_4
                                     redock_default2018_KD_5
  --cnn_model arg                    torch cnn model file; if not specified a
                                     default model ensemble will be used
  --cnn_rotation arg (=0)            evaluate multiple rotations of pose (max
                                     24)
  --cnn_mix_emp_force                Merge CNN and empirical minus forces
  --cnn_mix_emp_energy               Merge CNN and empirical energy
  --cnn_empirical_weight arg (=1)    Weight for scaling and merging empirical
                                     force and energy
  --cnn_center_x arg                 X coordinate of the CNN center
  --cnn_center_y arg                 Y coordinate of the CNN center
  --cnn_center_z arg                 Z coordinate of the CNN center
  --cnn_verbose                      Enable verbose output for CNN debugging

Output:
  -o [ --out ] arg                   output file name, format taken from file
                                     extension
  --out_flex arg                     output file for flexible receptor residues
  --log arg                          optionally, write log file
  --atom_terms arg                   optionally write per-atom interaction term
                                     values
  --atom_term_data                   embedded per-atom interaction terms in
                                     output sd data
  --pose_sort_order arg (=0)         How to sort docking results: CNNscore
                                     (default), CNNaffinity, Energy
  --full_flex_output                 Output entire structure for out_flex, not
                                     just flexible residues.

Misc (optional):
  --cpu arg                          the number of CPUs to use (the default is
                                     to try to detect the number of CPUs or,
                                     failing that, use 1)
  --seed arg                         explicit random seed
  --exhaustiveness arg (=8)          exhaustiveness of the global search
                                     (roughly proportional to time)
  --num_modes arg (=9)               maximum number of binding modes to
                                     generate
  --min_rmsd_filter arg (=1)         rmsd value used to filter final poses to
                                     remove redundancy
  -q [ --quiet ]                     Suppress output messages
  --addH arg                         automatically add hydrogens in ligands (on
                                     by default)
  --stripH arg                       remove polar hydrogens from molecule
                                     _after_ performing atom typing for
                                     efficiency (off by default - nonpolar are
                                     always removed)
  --device arg (=0)                  GPU device to use
  --no_gpu                           Disable GPU acceleration, even if
                                     available.

Configuration file (optional):
  --config arg                       the above options can be put here

Information (optional):
  --help                             display usage summary
  --help_hidden                      display usage summary with hidden options
  --version                          display program version


```

Execute the next cell to run gnina.

## Step 3.1 Redocking the extracted ligand

In [19]:
# Run gnina
!mkdir -p molecular_docking/docking_results

ex = int(input("Define exhaustiveness: "))

cmd = f"""./gnina \
  -r {protein_directory}/{pdb_id}_A.pdbqt \
  -l {ligand_directory}/{ligand_id}_ideal.sdf \
  --autobox_ligand {ligand_directory}/{ligand_id}_corrected_pose.sdf \
  -o {docking_results_directory}/{ligand_id}_docked_{pdb_id}.sdf \
  --seed 0 \
  --exhaustiveness {ex}"""

!{cmd}

Define exhaustiveness: 16
              _             
             (_)            
   __ _ _ __  _ _ __   __ _ 
  / _` | '_ \| | '_ \ / _` |
 | (_| | | | | | | | | (_| |
  \__, |_| |_|_|_| |_|\__,_|
   __/ |                    
  |___/                     

gnina  master:25e64da   Built Apr 23 2025.
gnina is based on smina and AutoDock Vina.
Please cite appropriately.

Recommend running with single model (--cnn fast)
or without cnn scoring (--cnn_scoring=none).

Commandline: ./gnina -r molecular_docking/protein_files/1XTJ_A.pdbqt -l molecular_docking/ligand_structures/ADP_ideal.sdf --autobox_ligand molecular_docking/ligand_structures/ADP_corrected_pose.sdf -o molecular_docking/docking_results/ADP_docked_1XTJ.sdf --seed 0 --exhaustiveness 16
Using random seed: 0

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************

## Step 3.2 Docking multiple ligands

In this section, we will dock the ligands we prepared in the previous notebook. gnina can dock several ligands in a single run when the ligand file is a multi-molecule SDF.
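
An SDF file stores one record per molecule, and each record is terminated by a line containing `$$$$`. Counting those lines is a cheap way to check how many ligands gnina will receive. This is a minimal sketch, with `count_sdf_records` and `count_sdf_file` as hypothetical helpers and a toy two-record string standing in for a real file.

```python
def count_sdf_records(sdf_text):
    """Count molecule records in SDF text; each record ends with a '$$$$' line."""
    return sum(1 for line in sdf_text.splitlines() if line.strip() == "$$$$")


def count_sdf_file(path):
    """Count molecule records in the SDF file at path."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return count_sdf_records(handle.read())


# Toy input: the molecule bodies are placeholders, only the delimiters matter here.
toy_sdf = (
    "ligand_1\n  (atom and bond block omitted)\n$$$$\n"
    "ligand_2\n  (atom and bond block omitted)\n$$$$\n"
)
print(count_sdf_records(toy_sdf))  # 2
```

In the notebook, `count_sdf_file(f"{ligand_directory}/ligands_to_dock.sdf")` would report the number of ligands queued for docking.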

In [None]:
# Run gnina on every ligand in ligands_to_dock.sdf
!mkdir -p molecular_docking/docking_results
ex = int(input("Define exhaustiveness: "))

cmd = f"""./gnina \
  -r {protein_directory}/{pdb_id}_A.pdbqt \
  -l {ligand_directory}/ligands_to_dock.sdf \
  --autobox_ligand {ligand_directory}/{ligand_id}_corrected_pose.sdf \
  -o {docking_results_directory}/multiple_ligands_docked.sdf \
  --seed 0 \
  --exhaustiveness {ex}"""

!{cmd}

Define exhaustiveness: 16
              _             
             (_)            
   __ _ _ __  _ _ __   __ _ 
  / _` | '_ \| | '_ \ / _` |
 | (_| | | | | | | | | (_| |
  \__, |_| |_|_|_| |_|\__,_|
   __/ |                    
  |___/                     

gnina  master:25e64da   Built Apr 23 2025.
gnina is based on smina and AutoDock Vina.
Please cite appropriately.

Recommend running with single model (--cnn fast)
or without cnn scoring (--cnn_scoring=none).

Commandline: ./gnina -r molecular_docking/protein_files/1XTJ_A.pdbqt -l molecular_docking/ligand_structures/ligands_to_dock.sdf --autobox_ligand molecular_docking/ligand_structures/ADP_corrected_pose.sdf -o molecular_docking/docking_results/multiple_ligands_docked.sdf --seed 0 --exhaustiveness 16
Using random seed: 0

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
*************************

## Step 3.3 Flexible docking

In [None]:
# Flexible docking with known site/residues
# --flexdist_ligand and --flexdist make every side chain within 3.59 Å of the reference ligand flexible
# The search box is built around {pdb_id}.pdbqt, which must be present in the ligand directory
!mkdir -p molecular_docking/docking_results
ex = int(input("Define exhaustiveness: "))

cmd = f"""./gnina \
  -r {protein_directory}/{pdb_id}_A.pdbqt \
  -l {ligand_directory}/ligands_to_dock.sdf \
  --autobox_ligand {ligand_directory}/{pdb_id}.pdbqt \
  -o {docking_results_directory}/multiple_ligands_docked_flex.sdf \
  --flexdist_ligand {ligand_directory}/{ligand_id}_corrected_pose.sdf \
  --flexdist 3.59 \
  --cnn_scoring none \
  --seed 0 \
  --exhaustiveness {ex}"""

!{cmd}
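
As an alternative to the distance-based `--flexdist` selection used above, gnina's help text lists `--flexres`, which takes a comma-separated list of `chain:resid` entries. The sketch below only formats such a list. The residue numbers are hypothetical placeholders, and `format_flexres` is an illustrative helper, not a gnina function.

```python
def format_flexres(residues):
    """Format (chain, residue_number) pairs as a --flexres value such as 'A:41,A:145'."""
    return ",".join(f"{chain}:{int(resid)}" for chain, resid in residues)


# Hypothetical residue choice for illustration only; pick residues lining your own site.
flexres_value = format_flexres([("A", 41), ("A", 145)])
print(f"--flexres {flexres_value}")
```

Naming residues explicitly keeps the flexible set fixed across runs, whereas a distance cutoff depends on the reference ligand pose.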

## Step 3.4 Whole protein docking

In [None]:
# Docking against the whole protein
# The search box is built around {pdb_id}.pdbqt, which must be present in the ligand directory
# Exhaustiveness is raised to 64 here, compared with gnina's default of 8
!mkdir -p molecular_docking/docking_results

cmd = f"""./gnina \
  -r {protein_directory}/{pdb_id}_A.pdbqt \
  -l {ligand_directory}/ligands_to_dock.sdf \
  --autobox_ligand {ligand_directory}/{pdb_id}.pdbqt \
  -o {docking_results_directory}/{ligand_id}_docked_whole_{pdb_id}.sdf \
  --cnn_scoring none \
  --seed 0 \
  --exhaustiveness 64"""

!{cmd}
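
The search space can also be given explicitly with the `--center_*` and `--size_*` options from the help text instead of `--autobox_ligand`. The sketch below derives such a box from the ATOM/HETATM coordinates of a PDB-format file, assuming the standard fixed-column layout. `bounding_box`, `box_arguments`, and the `padding` parameter are hypothetical names chosen for illustration, and the two demo atoms are made up.

```python
def bounding_box(pdb_lines, padding=4.0):
    """Return ((cx, cy, cz), (sx, sy, sz)) enclosing all ATOM/HETATM records."""
    coords = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            # PDB fixed columns (1-based): x = 31-38, y = 39-46, z = 47-54.
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    if not coords:
        raise ValueError("no ATOM/HETATM records found")
    mins = [min(c[i] for c in coords) for i in range(3)]
    maxs = [max(c[i] for c in coords) for i in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    # Add the padding on both sides of every dimension.
    size = tuple((hi - lo) + 2 * padding for lo, hi in zip(mins, maxs))
    return center, size


def box_arguments(center, size):
    """Turn a center and size into gnina --center_*/--size_* arguments."""
    flags = []
    for axis, c, s in zip("xyz", center, size):
        flags += [f"--center_{axis}", f"{c:.3f}", f"--size_{axis}", f"{s:.3f}"]
    return flags


# Two made-up atoms, written with a format string so the columns line up exactly.
template = "ATOM  {:5d}  CA  ALA A{:4d}    {:8.3f}{:8.3f}{:8.3f}"
demo_lines = [template.format(1, 1, 0.0, 0.0, 0.0), template.format(2, 2, 10.0, 20.0, 30.0)]
center, size = bounding_box(demo_lines)
print(" ".join(box_arguments(center, size)))
```

For a real structure, `bounding_box(open(path))` over the receptor file would give a box that spans the whole protein plus the chosen padding.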

# Step 4 Analysis and processing results

## Step 4.1 Pose analysis via RMSD

In [None]:
v = visualize_poses(
    f"{protein_directory}/{pdb_id}_A_fixed.pdb",
    f"{docking_results_directory}/{ligand_id}_docked_{pdb_id}.sdf",
    cognate_file=f"{ligand_directory}/{ligand_id}_corrected_pose.sdf",
    animate=False,
)  # Change to True to see an animation of all of the poses
v.show()

In [None]:
# Compare the docked poses against the ligand in its corrected (experimental) pose.
# Lower RMSD means better overlap; values below about 2 Å are commonly treated as a successful redock.
import useful_rdkit_utils as uru
from rdkit import Chem

cognate = Chem.MolFromMolFile(f"{ligand_directory}/{ligand_id}_corrected_pose.sdf")
poses = Chem.SDMolSupplier(f"{docking_results_directory}/{ligand_id}_docked_{pdb_id}.sdf")

for i, pose in enumerate(poses):
    n_match, rmsd = uru.mcs_rmsd(cognate, pose)
    print(f"{n_match}\t{rmsd:.2f}")

OSError: File error: Bad input file molecular_docking/docking_results/TJ5_docked_7BCS.sdf
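
The comparison above relies on the root-mean-square deviation (RMSD) between matched atoms of the reference and docked poses. As its name suggests, `mcs_rmsd` handles the atom matching through a maximum common substructure. The sketch below shows only the final arithmetic on coordinates that are already matched and ordered; `rmsd` is a hypothetical helper and the coordinates are made up.

```python
import math


def rmsd(coords_a, coords_b):
    """In-place RMSD between two equally ordered lists of (x, y, z) tuples."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
# Same two atoms, each displaced by 1 Å along z.
shifted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(f"{rmsd(reference, shifted):.2f}")
```

No alignment is performed here, which suits docking: both poses already share the receptor's coordinate frame, so any displacement counts as error.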

In [None]:
#!wget https://github.com/MolSSI-Education/iqb-2025/raw/refs/heads/main/data/docking_results.zip
#!unzip -o docking_results.zip


## Step 4.2 Extracting the Scores

gnina stores the docked poses and their scores in the SDF written for each docking run.
To analyze and compare the results from all our docking runs (the redocked `Y6J` and all the `Compound_*` ligands), we need to extract this scoring information from the SDF and put it into a structured table.

We can use RDKit PandasTools to read the molecular structures (poses) and their associated properties (scores) from the output SDF. The SDF will contain multiple poses for the docked ligands, and each pose record has the calculated scores (like `minimizedAffinity`, `CNNscore`, `CNNaffinity`, `CNN_VS`, etc.) stored as data fields. `CNNscore` is gnina's estimate, between 0 and 1, of how likely the pose is to be close to the true binding pose, and `CNNaffinity` is its predicted binding affinity in pK units. The `CNN_VS` score is the product of `CNNscore` and `CNNaffinity`. We typically want ligands that score highly on both and thus have a high `CNN_VS` score.
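
The product relationship can be checked by hand. The sketch below recomputes `CNN_VS` for two poses, using `CNNscore` and `CNNaffinity` values taken from the results table shown later in this notebook; `cnn_vs` is a hypothetical helper name.

```python
def cnn_vs(cnn_score, cnn_affinity):
    """Combine pose confidence and predicted affinity into one ranking score."""
    return cnn_score * cnn_affinity


# Values copied from two Compound_12_i0 rows of the results table.
poses = [
    {"ID": "Compound_12_i0", "CNNscore": 0.454894, "CNNaffinity": 6.811048},
    {"ID": "Compound_12_i0", "CNNscore": 0.615528, "CNNaffinity": 6.917795},
]
for pose in poses:
    value = cnn_vs(pose["CNNscore"], pose["CNNaffinity"])
    print(f"{pose['ID']}: CNN_VS = {value:.6f}")
```

The results agree with the `CNN_VS` column of the table to within rounding. The second pose ranks higher on `CNN_VS` mainly because its `CNNscore` is higher, while the two affinities are similar.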

In [None]:
# Uncomment to see the file
#!cat molecular_docking/docking_results/multiple_ligands_docked.sdf


In [None]:
from rdkit.Chem import PandasTools
from rdkit.rdBase import BlockLogs
import pandas as pd

score_columns = [
    "minimizedAffinity",
    "CNNscore",
    "CNNaffinity",
    "CNN_VS",
    "CNNaffinity_variance",
]

sdf_paths = [
    f"{docking_results_directory}/multiple_ligands_docked.sdf",
    f"{docking_results_directory}/{ligand_id}_docked_{pdb_id}.sdf",
]

df_list = []
for filename in sdf_paths:
    with BlockLogs():
        df_list.append(PandasTools.LoadSDF(filename))

combo_df = pd.concat(df_list)

# PandasTools reads all SDTags as strings, convert score columns to float
for col in score_columns:
    combo_df[col] = combo_df[col].astype(float)

combo_df

FileNotFoundError: [Errno 2] No such file or directory: 'molecular_docking/docking_results/TJ5_docked_7BCS.sdf'

In [None]:
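# Sort by ascending docking score (more negative minimizedAffinity means stronger predicted binding)

# Uncomment this version to keep only the best-scoring pose per ligand ID
#top_poses = combo_df.sort_values(
#    by="minimizedAffinity", ascending=True
#).drop_duplicates("ID")
#top_poses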

top_poses = combo_df.sort_values(
    by="minimizedAffinity", ascending=True
)
top_poses

Unnamed: 0,ScrubInfo,minimizedAffinity,CNNscore,CNNaffinity,CNN_VS,CNNaffinity_variance,ID,ROMol,model_server_result.job_id,model_server_result.datetime_utc,...,model_server_result.source_id,model_server_result.entry_id,model_server_params.name,model_server_params.value,model_server_stats.io_time_ms,model_server_stats.parse_time_ms,model_server_stats.create_model_time_ms,model_server_stats.query_time_ms,model_server_stats.encode_time_ms,model_server_stats.element_count
5,"{""isomerGroup"": 4, ""isomerId"": 0, ""confId"": 0,...",-9.34666,0.454894,6.811048,3.098305,0.465547,Compound_12_i0,<rdkit.Chem.rdchem.Mol object at 0x792c30e159a0>,,,...,,,,,,,,,,
2,"{""isomerGroup"": 4, ""isomerId"": 0, ""confId"": 0,...",-9.34157,0.615528,6.917795,4.258096,0.597604,Compound_12_i0,<rdkit.Chem.rdchem.Mol object at 0x792c30e15850>,,,...,,,,,,,,,,
18,"{""isomerGroup"": 4, ""isomerId"": 1, ""confId"": 0,...",-9.20305,0.735542,7.275836,5.351683,0.277724,Compound_12_i1,<rdkit.Chem.rdchem.Mol object at 0x792c30e15f50>,,,...,,,,,,,,,,
0,"{""isomerGroup"": 4, ""isomerId"": 0, ""confId"": 0,...",-9.14787,0.764619,7.065590,5.402485,0.458909,Compound_12_i0,<rdkit.Chem.rdchem.Mol object at 0x792c30e157e0>,,,...,,,,,,,,,,
32,"{""isomerGroup"": 2, ""isomerId"": 0, ""confId"": 0,...",-9.14357,0.407603,6.525291,2.659726,0.556016,Compound_1,<rdkit.Chem.rdchem.Mol object at 0x792c30e16570>,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7,,-6.04374,0.188582,5.081180,0.958221,1.192335,Y6J,<rdkit.Chem.rdchem.Mol object at 0x792c30e17370>,00w3jbVGXEL2eAat8Jcs7A,2025-10-23 13:47:30,...,pdb-bcif,7lme,atom_site,"{""label_asym_id"":""C"",""auth_seq_id"":401}",10,7,5,288,0,31
6,,-5.91551,0.214208,5.681234,1.216968,0.087496,Y6J,<rdkit.Chem.rdchem.Mol object at 0x792c30e17300>,00w3jbVGXEL2eAat8Jcs7A,2025-10-23 13:47:30,...,pdb-bcif,7lme,atom_site,"{""label_asym_id"":""C"",""auth_seq_id"":401}",10,7,5,288,0,31
8,,-5.89276,0.182263,5.715957,1.041806,0.253039,Y6J,<rdkit.Chem.rdchem.Mol object at 0x792c30e173e0>,00w3jbVGXEL2eAat8Jcs7A,2025-10-23 13:47:30,...,pdb-bcif,7lme,atom_site,"{""label_asym_id"":""C"",""auth_seq_id"":401}",10,7,5,288,0,31
1,,-5.81137,0.373810,6.059344,2.265046,0.045067,Y6J,<rdkit.Chem.rdchem.Mol object at 0x792c30e170d0>,00w3jbVGXEL2eAat8Jcs7A,2025-10-23 13:47:30,...,pdb-bcif,7lme,atom_site,"{""label_asym_id"":""C"",""auth_seq_id"":401}",10,7,5,288,0,31
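
The table above lists every pose, so a ligand such as `Compound_12_i0` appears several times. The commented-out `drop_duplicates("ID")` variant keeps one row per ligand. The same idea is sketched below in plain Python, using a few `ID` and `minimizedAffinity` values from the table; `best_pose_per_ligand` is a hypothetical helper, not part of RDKit or pandas.

```python
def best_pose_per_ligand(rows, score_key="minimizedAffinity", id_key="ID"):
    """Keep the lowest-scoring (best) row for each ligand ID, sorted best first."""
    best = {}
    for row in rows:
        current = best.get(row[id_key])
        if current is None or row[score_key] < current[score_key]:
            best[row[id_key]] = row
    return sorted(best.values(), key=lambda row: row[score_key])


rows = [
    {"ID": "Compound_12_i0", "minimizedAffinity": -9.34666},
    {"ID": "Compound_12_i0", "minimizedAffinity": -9.34157},
    {"ID": "Compound_12_i1", "minimizedAffinity": -9.20305},
    {"ID": "Compound_12_i0", "minimizedAffinity": -9.14787},
    {"ID": "Compound_1", "minimizedAffinity": -9.14357},
    {"ID": "Y6J", "minimizedAffinity": -6.04374},
    {"ID": "Y6J", "minimizedAffinity": -5.91551},
]
for row in best_pose_per_ligand(rows):
    print(row["ID"], row["minimizedAffinity"])
```

Note that isomers such as `Compound_12_i0` and `Compound_12_i1` carry different IDs and are therefore kept as separate entries.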


In [None]:
# Save the sorted poses as an Excel file
top_poses.to_excel(f"{docking_results_directory}/multiple_ligands_results.xlsx")

In [None]:
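# Keep only poses with a docking score of -7 kcal/mol or better
strong_binders = top_poses["minimizedAffinity"] <= -7
top_poses_top = top_poses.loc[strong_binders, ["minimizedAffinity", "CNNaffinity"]]
top_poses_top.head()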

| | minimizedAffinity | CNNaffinity |
|---:|---:|---:|
| 52 | -6.91704 | 6.508137 |
| 30 | -6.90586 | 6.485485 |
| 2 | -6.87082 | 5.726686 |
| 46 | -6.86254 | 6.501461 |
| 49 | -6.83454 | 6.210299 |


## Step 4.3 Visualize docking results
We can also use our visualization function to inspect how the docked ligands interact with the binding site.

In [None]:
v = visualize_poses(
    f"{protein_directory}/{pdb_id}_A_fixed.pdb",
    #"7BCS_A_fixed.pdb",
    f"{docking_results_directory}/{ligand_id}_docked_{pdb_id}.sdf",
    #"multiple_ligands_docked.sdf",
    cognate_file=f"{ligand_directory}/{ligand_id}_corrected_pose.sdf",
    #cognate_file="TJ5_corrected_pose.sdf",
    animate=True,
)  # Set animate=False to turn off the animation of the poses
v.show()