<a href="https://colab.research.google.com/github/mitsukacke2285/drug_discovery_repo/blob/main/Docking_with_gnina.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Set Up
The cells in this section set up the software and files we will need for our calculations.
gnina will run in the Linux environment provided by a Google Colab virtual machine.




### Install Python Packages  
1. `useful_rdkit_utils` is a Python package written and maintained by Pat Walters that contains useful RDKit functions. We will use its `mcs_rmsd` function (explained later).
2. `py3Dmol` is used for molecular visualization.
3. The RDKit is a popular cheminformatics package we will use for processing molecules.


In [21]:
%%capture
!pip install useful_rdkit_utils py3Dmol rdkit # If this command doesn't work, install each package separately (cells below)
!apt install -y openbabel

In [22]:
!pip install py3Dmol



In [23]:
!pip install rdkit



In [24]:
!pip install useful_rdkit_utils



In [26]:
!curl -L -O https://raw.githubusercontent.com/MolSSI-Education/iqb-2025/main/util.py



### Download gnina

We are downloading the pre-compiled binary of gnina. You may also compile gnina yourself by following the directions on the [gnina GitHub repository](https://github.com/gnina/gnina).

In [2]:
# Download gnina
!wget https://github.com/gnina/gnina/releases/download/v1.3/gnina.fix


In [3]:
# Make gnina executable
!mv gnina.fix gnina
!chmod +x gnina

### Get Lesson Files

The files created in the previous notebook are stored as a zip archive on GitHub. This cell downloads that archive along with `util.py`, which contains a custom utility function for visualizing our ligand and protein.



In [None]:
#%%capture
!wget https://github.com/MolSSI-Education/iqb-2025/raw/refs/heads/main/data/docking_files.zip
!wget https://raw.githubusercontent.com/MolSSI-Education/iqb-2025/refs/heads/main/util.py


In [None]:
!unzip docking_files.zip

Archive:  docking_files.zip
   creating: docking_files/
   creating: docking_files/protein_structures/
  inflating: docking_files/protein_structures/7L11_aligned_fixed.pdb  
  inflating: docking_files/protein_structures/7L11_aligned.pdb  
  inflating: docking_files/protein_structures/7LME_fixed.pdb  
  inflating: docking_files/protein_structures/7LME.pdb  
  inflating: docking_files/protein_structures/7L11.pdb  
  inflating: docking_files/protein_structures/7LME.pdbqt  
  inflating: docking_files/protein_structures/7L11.pdbqt  
   creating: docking_files/ligand_structures/
  inflating: docking_files/ligand_structures/Y6J_ideal.sdf  
  inflating: docking_files/ligand_structures/XF1_ideal.sdf  
  inflating: docking_files/ligand_structures/ligands_to_dock.sdf  
  inflating: docking_files/ligand_structures/Y6J_corrected_pose.sdf  
  inflating: docking_files/ligand_structures/XF1_corrected_pose.sdf  
  inflating: docking_files/ligand_structures/Y6J_fromPDB.pdb  
  inflating: docking_files/l

In [28]:
from google.colab import files

# Upload {pdb_id}_A.pdbqt and {pdb_id}_A_fixed.pdb from your local machine to the Colab VM
files.upload("molecular_docking/protein_files")

# To download a file from the Colab VM to your local machine, uncomment:
#files.download('mylocalfile.txt')

Saving 1XTJ_A_fixed.pdb to molecular_docking/protein_files/1XTJ_A_fixed.pdb



In [8]:
# Upload {ligand_id}_ideal.sdf, {ligand_id}_corrected_pose.sdf, and ligands_to_dock.sdf from your local machine to the Colab VM
files.upload("molecular_docking/ligand_structures")

Saving ligands_to_dock.sdf to molecular_docking/ligand_structures/ligands_to_dock.sdf



## Set protein and ligand directory

In [11]:
import os

pdb_id = input("Enter PDB code: ")  # The PDB ID of the protein we're looking at
ligand_id = input("Enter ligand code: ")  # The ligand ID we're looking at

# Define the directories and file names used to stage our intermediate files
protein_directory = "molecular_docking/protein_files"
protein_filename = f"{pdb_id}.pdb"
ligand_directory = "molecular_docking/ligand_structures"
ideal_ligand_filename = f"{ligand_id}_ideal.sdf"

# Create the working directories if they do not exist yet
os.makedirs(protein_directory, exist_ok=True)
os.makedirs(ligand_directory, exist_ok=True)


Enter PDB code: 1XTJ
Enter ligand code: ADP


## Redocking the Ligand

Redocking (also called "cognate docking") involves docking a ligand back into the receptor structure from which its bound pose was experimentally determined.
Redocking is typically done to evaluate how well a docking program's sampling algorithm and scoring function reproduce a known experimental binding pose.

We will begin our docking journey with gnina by performing a redock of our ligand.

In [27]:
from util import visualize_poses

v = visualize_poses(
    f"{protein_directory}/{pdb_id}_A_fixed.pdb",
    f"{ligand_directory}/{ligand_id}_ideal.sdf",
    animate=False
)
v.show()

FileNotFoundError: [Errno 2] No such file or directory: 'molecular_docking/protein_files/1XTJ_A_fixed.pdb'
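
The `FileNotFoundError` above means the expected files were not in place when the cell ran. A minimal sketch of a pre-flight check is shown below; the helper name `missing_files` is hypothetical and not part of `util.py`, and the example IDs are the ones entered earlier in this notebook.

```python
from pathlib import Path


def missing_files(paths):
    """Return the subset of paths that do not exist on disk (hypothetical helper)."""
    return [str(p) for p in map(Path, paths) if not p.is_file()]


# Example values; in the notebook these come from the directory-setup cell
protein_directory = "molecular_docking/protein_files"
ligand_directory = "molecular_docking/ligand_structures"
pdb_id, ligand_id = "1XTJ", "ADP"

needed = [
    f"{protein_directory}/{pdb_id}_A_fixed.pdb",
    f"{ligand_directory}/{ligand_id}_ideal.sdf",
]

missing = missing_files(needed)
if missing:
    print("Upload these files before visualizing:")
    for path in missing:
        print(f"  {path}")
else:
    print("All input files found.")
```

Running a check like this before `visualize_poses` makes it clear which upload step still needs to be repeated.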

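Read the annotated command below for an explanation of gnina's input parameters, then run the cell that follows it. Because gnina is invoked from the command line with line continuations, the runnable cell cannot contain in-line comments; the annotated version is for reading only.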

```
./gnina \
  # Specify the receptor structure file (-r).
  # This file (7LME.pdbqt) should be prepared for docking (e.g., with hydrogens added).
  -r docking_files/protein_structures/7LME.pdbqt \
  # Specify the ligand structure file (-l) to be docked.
  # This file (Y6J_ideal.sdf) contains the 3D coordinates of the ligand.
  -l docking_files/ligand_structures/Y6J_ideal.sdf \
  # Define the docking search box automatically (--autobox_ligand).
  # The box is built around the coordinates of the ligand in the specified file
  # (Y6J_corrected_pose.sdf), which is the known experimental pose in this redocking example.
  # Padding is added around the ligand (default 4 Å; adjustable with --autobox_add).
  --autobox_ligand docking_files/ligand_structures/Y6J_corrected_pose.sdf \
  # Specify the output file path (-o) where the resulting docked poses will be saved.
  # The output format will be SDF, containing multiple poses ranked by score.
  -o docking_results/Y6J_docked_7LME.sdf \
  # Set the random number generator seed (--seed) to 0.
  # Using a fixed seed makes the docking calculation reproducible.
  --seed 0 \
  # Set the exhaustiveness level (--exhaustiveness) to 16.
  # This controls the number of Monte Carlo chains for the ligand.
  # The default is 8.
  --exhaustiveness 16
```
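
Since shell line continuations cannot carry comments, one workaround is to assemble the command as a Python list, where every argument can be annotated, and pass it to `subprocess`. The sketch below is one possible way to do this and is not part of the original workshop; the helper name `build_gnina_command` is hypothetical, and the paths are the ones used in the docking cell.

```python
import shlex
import subprocess


def build_gnina_command(receptor, ligand, autobox_ligand, output,
                        seed=0, exhaustiveness=16, gnina="./gnina"):
    """Return the gnina call as a list of arguments (hypothetical helper)."""
    return [
        gnina,
        "-r", receptor,                      # receptor structure
        "-l", ligand,                        # ligand to dock
        "--autobox_ligand", autobox_ligand,  # ligand that defines the search box
        "-o", output,                        # where the docked poses are written
        "--seed", str(seed),                 # fixed seed for reproducibility
        "--exhaustiveness", str(exhaustiveness),  # search effort (gnina default: 8)
    ]


cmd = build_gnina_command(
    receptor="docking_files/protein_structures/7LME.pdbqt",
    ligand="docking_files/ligand_structures/Y6J_ideal.sdf",
    autobox_ligand="docking_files/ligand_structures/Y6J_corrected_pose.sdf",
    output="docking_results/Y6J_docked_7LME.sdf",
)

# Print the equivalent shell command for inspection
print(shlex.join(cmd))

# Uncomment to run gnina from Python instead of using the `!` shell escape
# subprocess.run(cmd, check=True)
```

Keeping the arguments in a list also makes it easy to loop over several ligands or exhaustiveness values without editing a long shell command by hand.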

Execute the next cell to run gnina.


In [None]:
# make a folder for our results
!mkdir -p docking_results

# use gnina
!./gnina \
  -r docking_files/protein_structures/7LME.pdbqt \
  -l docking_files/ligand_structures/Y6J_ideal.sdf \
  --autobox_ligand docking_files/ligand_structures/Y6J_corrected_pose.sdf \
  -o docking_results/Y6J_docked_7LME.sdf \
  --seed 0 \
  --exhaustiveness 16

              _             
             (_)            
   __ _ _ __  _ _ __   __ _ 
  / _` | '_ \| | '_ \ / _` |
 | (_| | | | | | | | | (_| |
  \__, |_| |_|_|_| |_|\__,_|
   __/ |                    
  |___/                     

gnina  master:25e64da   Built Apr 23 2025.
gnina is based on smina and AutoDock Vina.
Please cite appropriately.

Recommend running with single model (--cnn fast)
or without cnn scoring (--cnn_scoring=none).

Commandline: ./gnina -r docking_files/protein_structures/7LME.pdbqt -l docking_files/ligand_structures/Y6J_ideal.sdf --autobox_ligand docking_files/ligand_structures/Y6J_corrected_pose.sdf -o docking_results/Y6J_docked_7LME.sdf --seed 0 --exhaustiveness 16
Using random seed: 0

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Y6J | pose 0 | initial pose not within box

mode |  affinity  |  intramol  |    CNN     |   CNN
     | (kcal/mol) | (kca

In [None]:
# Docking without CNN scoring (faster).
# Written to a separate file so the CNN-scored poses from the previous cell are not overwritten.
!./gnina \
  -r docking_files/protein_structures/7LME.pdbqt \
  -l docking_files/ligand_structures/Y6J_ideal.sdf \
  --autobox_ligand docking_files/ligand_structures/Y6J_corrected_pose.sdf \
  -o docking_results/Y6J_docked_7LME_no_cnn.sdf \
  --cnn_scoring none \
  --seed 0 \
  --exhaustiveness 16

^C


### Visualizing the Docked Structures

In the cell below, we use `visualize_poses`, a custom function defined for this workshop that we obtained above when we retrieved `util.py`.
This function lets us view the generated docked poses along with the ligand in its original, experimentally determined position.

In [None]:
v = visualize_poses(
    "docking_files/protein_structures/7LME_fixed.pdb",
    "docking_results/Y6J_docked_7LME.sdf",
    cognate_file="docking_files/ligand_structures/Y6J_corrected_pose.sdf",
    animate=False,
)  # Change to True to see an animation of all of the poses
v.show()

**RMSD Analysis**

In [None]:
import useful_rdkit_utils as uru
from rdkit import Chem

cognate = Chem.MolFromMolFile("docking_files/ligand_structures/Y6J_corrected_pose.sdf")
poses = Chem.SDMolSupplier("docking_results/Y6J_docked_7LME.sdf")

for i, pose in enumerate(poses):
    n_match, rmsd = uru.mcs_rmsd(cognate, pose)
    print(f"{n_match}\t{rmsd:.2f}")

OSError: File error: Invalid input file docking_results/Y6J_docked_7LME.sdf
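
Judging from its name and return values, `mcs_rmsd` matches atoms between the two molecules through their maximum common substructure and reports the number of matched atoms together with the RMSD over them. The sketch below illustrates only the RMSD part on toy coordinates, computed in place (without aligning the pose first), which is what matters when comparing a docked pose to a crystal pose. The function name `in_place_rmsd` and the coordinates are hypothetical.

```python
import math


def in_place_rmsd(ref_coords, pose_coords):
    """RMSD over matched atoms, with no alignment step (hypothetical helper)."""
    if len(ref_coords) != len(pose_coords) or not ref_coords:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(ref_coords, pose_coords))
    return math.sqrt(squared / len(ref_coords))


# Toy coordinates (in Å) for three matched atoms; illustrative values only
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
shifted = [(x + 2.0, y, z) for x, y, z in reference]  # rigid 2 Å shift along x

print(f"{in_place_rmsd(reference, reference):.2f}")  # identical poses -> 0.00
print(f"{in_place_rmsd(reference, shifted):.2f}")    # every atom moved 2 Å -> 2.00
```

A common rule of thumb treats a redocked pose within roughly 2 Å RMSD of the experimental pose as a successful reproduction.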

## Docking our Prepared Ligands

In this section, we will dock the ligands we prepared in our previous notebook. gnina can dock multiple ligands in a single run when given an SDF that contains all of them.

In [None]:
!./gnina \
  -r docking_files/protein_structures/7LME.pdbqt \
  -l docking_files/ligand_structures/ligands_to_dock.sdf \
  --autobox_ligand docking_files/ligand_structures/Y6J_corrected_pose.sdf \
  -o docking_results/multiple_ligands_docked.sdf \
  --cnn_scoring none \
  --seed 0 \
  --exhaustiveness 16

              _             
             (_)            
   __ _ _ __  _ _ __   __ _ 
  / _` | '_ \| | '_ \ / _` |
 | (_| | | | | | | | | (_| |
  \__, |_| |_|_|_| |_|\__,_|
   __/ |                    
  |___/                     

gnina  master:25e64da   Built Apr 23 2025.
gnina is based on smina and AutoDock Vina.
Please cite appropriately.

Recommend running with single model (--cnn fast)
or without cnn scoring (--cnn_scoring=none).

Commandline: ./gnina -r docking_files/protein_structures/7LME.pdbqt -l docking_files/ligand_structures/ligands_to_dock.sdf --autobox_ligand docking_files/ligand_structures/Y6J_corrected_pose.sdf -o docking_results/multiple_ligands_docked.sdf --cnn_scoring none --seed 0 --exhaustiveness 16
Using random seed: 0

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Compound_12_i0 | pose 0 | initial pose not within box

mode |  affinity  |  intramol  |

In [None]:
# Fetch the workshop's docking_results.zip and unpack it, overwriting existing files (-o)
!wget https://github.com/MolSSI-Education/iqb-2025/raw/refs/heads/main/data/docking_results.zip
!unzip -o docking_results.zip


### Extracting the Scores

gnina stores the docked poses and their scores in the SDF written for each docking run.
To analyze and compare the results from all our docking runs (the redocked `Y6J` and all the `Compound_*` ligands), we need to extract this scoring information from the SDF and put it into a structured table.

We can use RDKit PandasTools to read the molecular structures (poses) and their associated properties (scores) from the output SDF. The SDF will contain multiple poses for the docked ligands, and each pose record has the calculated scores (like `minimizedAffinity`, `CNNscore`, `CNNaffinity`, `CNN_VS`, etc.) stored as data fields. The `CNN_VS` score is the product of `CNNscore` and `CNNaffinity`. We typically want ligands that score highly for both (and thus have a high `CNN_VS` score).
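
As a quick check of the `CNN_VS` definition, the sketch below multiplies `CNNscore` by `CNNaffinity` for one pose and compares the result with the reported `CNN_VS`. The three numbers are copied from the first `Compound_12_i0` row of the results table shown later in this notebook; the variable names are ours.

```python
# Scores of the first Compound_12_i0 pose, copied from the results table
cnn_score = 0.764619        # CNNscore: pose-quality score
cnn_affinity = 7.065590     # CNNaffinity: predicted affinity
reported_cnn_vs = 5.402485  # CNN_VS as written by gnina

# CNN_VS is the product of the two CNN outputs
cnn_vs = cnn_score * cnn_affinity

print(f"computed CNN_VS = {cnn_vs:.4f}")
print(f"reported CNN_VS = {reported_cnn_vs:.4f}")
print("match:", abs(cnn_vs - reported_cnn_vs) < 1e-5)
```

Because the product is small whenever either factor is small, a pose needs both a confident pose score and a high predicted affinity to rank well by `CNN_VS`.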

In [None]:
# uncomment to see file
#!cat docking_results/multiple_ligands_docked.sdf


In [None]:
from rdkit.Chem import PandasTools
from rdkit.rdBase import BlockLogs
import pandas as pd

score_columns = [
    "minimizedAffinity",
    "CNNscore",
    "CNNaffinity",
    "CNN_VS",
    "CNNaffinity_variance",
]

sdf_paths = [
    "docking_results/multiple_ligands_docked.sdf",
    "docking_results/Y6J_docked_7LME.sdf",
]

# Load every SDF into its own DataFrame; BlockLogs silences RDKit log messages
df_list = []
for filename in sdf_paths:
    with BlockLogs():
        df_list.append(PandasTools.LoadSDF(filename))

combo_df = pd.concat(df_list)

# PandasTools reads all SDTags as strings, convert score columns to float
for col in score_columns:
    combo_df[col] = combo_df[col].astype(float)

combo_df

| | ScrubInfo | minimizedAffinity | CNNscore | CNNaffinity | CNN_VS | CNNaffinity_variance | ID | ROMol |
|---|---|---|---|---|---|---|---|---|
| 0 | `{"isomerGroup": 4, "isomerId": 0, "confId": 0,...` | -9.14787 | 0.764619 | 7.065590 | 5.402485 | 0.458909 | Compound_12_i0 | `<rdkit.Chem.rdchem.Mol object at 0x7964ad733680>` |
| 1 | `{"isomerGroup": 4, "isomerId": 0, "confId": 0,...` | -8.05619 | 0.735773 | 6.955337 | 5.117549 | 0.098386 | Compound_12_i0 | `<rdkit.Chem.rdchem.Mol object at 0x7964ad78c040>` |
| 2 | `{"isomerGroup": 4, "isomerId": 0, "confId": 0,...` | -9.34157 | 0.615528 | 6.917795 | 4.258096 | 0.597604 | Compound_12_i0 | `<rdkit.Chem.rdchem.Mol object at 0x7964ad78c270>` |
| 3 | `{"isomerGroup": 4, "isomerId": 0, "confId": 0,...` | -7.62329 | 0.613069 | 6.659876 | 4.082964 | 0.090788 | Compound_12_i0 | `<rdkit.Chem.rdchem.Mol object at 0x7964ad78c0b0>` |
| 4 | `{"isomerGroup": 4, "isomerId": 0, "confId": 0,...` | -8.09464 | 0.527292 | 6.999388 | 3.690723 | 0.059682 | Compound_12_i0 | `<rdkit.Chem.rdchem.Mol object at 0x7964ad78c890>` |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4 | | -7.49248 | 0.269265 | 5.728605 | 1.542512 | 0.423906 | Y6J | `<rdkit.Chem.rdchem.Mol object at 0x7964ad78e1f0>` |
| 5 | | -6.17790 | 0.244571 | 5.472425 | 1.338396 | 0.581873 | Y6J | `<rdkit.Chem.rdchem.Mol object at 0x7964ad78e260>` |
| 6 | | -6.74170 | 0.212512 | 5.925318 | 1.259199 | 0.615063 | Y6J | `<rdkit.Chem.rdchem.Mol object at 0x7964ad78e2d0>` |
| 7 | | -6.82403 | 0.203777 | 5.810819 | 1.184113 | 0.254853 | Y6J | `<rdkit.Chem.rdchem.Mol object at 0x7964ad78e340>` |


In [None]:
# Sort all poses by Vina score (most negative, i.e. most favorable, first).
# To keep only the best pose per ligand, append .drop_duplicates("ID").
top_poses = combo_df.sort_values(
    by="minimizedAffinity", ascending=True
)
top_poses

| | ScrubInfo | minimizedAffinity | CNNscore | CNNaffinity | CNN_VS | CNNaffinity_variance | ID | ROMol |
|---|---|---|---|---|---|---|---|---|
| 5 | `{"isomerGroup": 4, "isomerId": 0, "confId": 0,...` | -9.34666 | 0.454894 | 6.811048 | 3.098305 | 0.465547 | Compound_12_i0 | `<rdkit.Chem.rdchem.Mol object at 0x7d9a26d7e340>` |
| 2 | `{"isomerGroup": 4, "isomerId": 0, "confId": 0,...` | -9.34157 | 0.615528 | 6.917795 | 4.258096 | 0.597604 | Compound_12_i0 | `<rdkit.Chem.rdchem.Mol object at 0x7d9a26d7e1f0>` |
| 18 | `{"isomerGroup": 4, "isomerId": 1, "confId": 0,...` | -9.20305 | 0.735542 | 7.275836 | 5.351683 | 0.277724 | Compound_12_i1 | `<rdkit.Chem.rdchem.Mol object at 0x7d9a26d7e8f0>` |
| 0 | `{"isomerGroup": 4, "isomerId": 0, "confId": 0,...` | -9.14787 | 0.764619 | 7.065590 | 5.402485 | 0.458909 | Compound_12_i0 | `<rdkit.Chem.rdchem.Mol object at 0x7d9a26d7e110>` |
| 32 | `{"isomerGroup": 2, "isomerId": 0, "confId": 0,...` | -9.14357 | 0.407603 | 6.525291 | 2.659726 | 0.556016 | Compound_1 | `<rdkit.Chem.rdchem.Mol object at 0x7d9a26d7ef10>` |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 42 | `{"isomerGroup": 1, "isomerId": 0, "confId": 0,...` | -6.73174 | 0.469921 | 6.405369 | 3.010015 | 0.316577 | Compound_4 | `<rdkit.Chem.rdchem.Mol object at 0x7d9a26d7f370>` |
| 2 | | -6.54141 | 0.377921 | 6.075754 | 2.296158 | 0.064094 | Y6J | `<rdkit.Chem.rdchem.Mol object at 0x7d9a26d7fae0>` |
| 34 | `{"isomerGroup": 2, "isomerId": 0, "confId": 0,...` | -6.46708 | 0.370558 | 6.170868 | 2.286664 | 0.173717 | Compound_1 | `<rdkit.Chem.rdchem.Mol object at 0x7d9a26d7eff0>` |
| 51 | `{"isomerGroup": 0, "isomerId": 0, "confId": 0,...` | -6.18458 | 0.380530 | 5.935371 | 2.258587 | 0.246323 | Compound_83 | `<rdkit.Chem.rdchem.Mol object at 0x7d9a26d7f760>` |
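
The sorted table still lists several poses per ligand. To compare ligands, it is often enough to keep only the most favorable (most negative) Vina score for each `ID`, which is what appending `.drop_duplicates("ID")` to the sorted DataFrame would do. The sketch below shows the same logic in plain Python so that each step is visible; the function name `best_pose_per_ligand` is hypothetical, and the rows are the first five entries of the table above.

```python
def best_pose_per_ligand(rows):
    """Keep the lowest (most favorable) affinity seen for each ligand ID."""
    best = {}
    for ligand_id, affinity in rows:
        if ligand_id not in best or affinity < best[ligand_id]:
            best[ligand_id] = affinity
    return best


# (ID, minimizedAffinity) pairs copied from the top of the sorted table
rows = [
    ("Compound_12_i0", -9.34666),
    ("Compound_12_i0", -9.34157),
    ("Compound_12_i1", -9.20305),
    ("Compound_12_i0", -9.14787),
    ("Compound_1", -9.14357),
]

for ligand_id, affinity in best_pose_per_ligand(rows).items():
    print(f"{ligand_id}\t{affinity:.5f}")
```

Note that `Compound_12_i0` and `Compound_12_i1` are kept as separate entries because their `ID` values differ.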


In [None]:
#top_poses.to_csv("multiple_ligands_results.csv")
top_poses.to_excel("multiple_ligands_results.xlsx")

In [None]:
# Pull out the Vina scores (already sorted, most negative first)
docking_scores = top_poses['minimizedAffinity']
docking_scores.head()

| | minimizedAffinity |
|---|---|
| 5 | -9.34666 |
| 2 | -9.34157 |
| 18 | -9.20305 |
| 0 | -9.14787 |
| 32 | -9.14357 |


In [None]:
# Select the poses with a Vina score of -7 kcal/mol or weaker (less negative)
top_poses_top = top_poses.loc[
    top_poses['minimizedAffinity'] >= -7, ['minimizedAffinity', 'CNNaffinity']
]
top_poses_top.head()

| | minimizedAffinity | CNNaffinity |
|---|---|---|
| 8 | -6.958 | 5.608088 |
| 52 | -6.91704 | 6.508137 |
| 30 | -6.90586 | 6.485485 |
| 46 | -6.86254 | 6.501461 |
| 49 | -6.83454 | 6.210299 |


Interestingly, ranking the compounds by `minimizedAffinity` (the Vina score) agrees well with the observed `IC50` values loaded in the next cell.

In [None]:
ligand_data = pd.read_csv(
    "https://raw.githubusercontent.com/MolSSI-Education/iqb-2025/refs/heads/main/data/US20240293380_examples.csv"
)
ligand_data.sort_values(by="IC50 (nM)")

| | SMILES | Name | IC50 (nM) | cluster |
|---|---|---|---|---|
| 0 | `Fc1cc(Cl)cc(CN(C(=O)Cc2cncc3ccccc23)c2ccc(cc2)...` | US20240293380, Compound 12 | 15.0 | 0 |
| 1 | `Clc1cccc(CN(C(=O)Cc2nnc3ccccn23)c2ccc(cc2)-c2c...` | US20240293380, Compound 1 | 365.0 | 1 |
| 2 | `CC(C)(C)NC(=O)C(N(C(=O)Cc1cncc2ccccc12)c1ccc(c...` | US20240293380, Compound 63 | 1027.0 | 2 |
| 4 | `Clc1cccc(CN(C(=O)Cc2cnc3ccccn23)c2ccc(cc2)-c2c...` | US20240293380, Compound 4 | 1146.0 | 4 |
| 3 | `Fc1cncc(CN(C(=O)Cc2cncc(F)c2)c2ccc(cc2)C2CNC2)c1` | US20240293380, Compound 83 | 5225.0 | 3 |
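
IC50 values span several orders of magnitude, so they are usually compared on a logarithmic scale as pIC50 = -log10(IC50 in mol/L), which for nanomolar input is 9 - log10(IC50 in nM). The sketch below converts the IC50 values from the table above; the function name `pic50_from_nm` is ours. As we understand the gnina documentation, `CNNaffinity` is reported in similar pK units, so the two can be placed side by side, although an IC50 is not the same quantity as a binding constant and the comparison is only qualitative.

```python
import math


def pic50_from_nm(ic50_nm):
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    return 9.0 - math.log10(ic50_nm)


# IC50 values (nM) copied from the table above
ic50_nm = {
    "Compound 12": 15.0,
    "Compound 1": 365.0,
    "Compound 63": 1027.0,
    "Compound 4": 1146.0,
    "Compound 83": 5225.0,
}

for name, value in ic50_nm.items():
    print(f"{name}: IC50 = {value:.0f} nM, pIC50 = {pic50_from_nm(value):.2f}")
```

A higher pIC50 means a more potent compound, so the ordering is the reverse of the ordering by IC50.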


We can also use our visualization function to visualize the docked ligands to look at how they interact with the binding site.

In [None]:
v = visualize_poses(
    "docking_files/protein_structures/7LME_fixed.pdb",
    "docking_results/multiple_ligands_docked.sdf",
    cognate_file="docking_files/ligand_structures/Y6J_corrected_pose.sdf",
    animate=True,
)  # Change to False for a static view instead of an animation of all of the poses
v.show()

If we would like to visualize poses for only one molecule, we can use the RDKit's PandasTools again to write an SDF for just that compound.

In [None]:
compound_name = "Compound_12_i0"

compound_df = combo_df[combo_df["ID"] == compound_name]

PandasTools.WriteSDF(
    compound_df,
    f"docking_results/individual_{compound_name}.sdf",  # Output file path
    molColName="ROMol",  # Name of the column with RDKit molecules
    properties=score_columns,  # List of property columns to include
)


cognate = Chem.MolFromMolFile("docking_files/ligand_structures/Y6J_corrected_pose.sdf")
poses = Chem.SDMolSupplier(f"docking_results/individual_{compound_name}.sdf")

for i, pose in enumerate(poses):
    n_match, rmsd = uru.mcs_rmsd(cognate, pose)
    print(f"{n_match}\t{rmsd:.2f}")

17	6.83
17	6.92
17	6.79
17	6.93
17	2.53
17	8.06
17	6.98
17	6.87
17	2.43


In [None]:
v = visualize_poses(
    "docking_files/protein_structures/7LME_fixed.pdb",
    f"docking_results/individual_{compound_name}.sdf",
    cognate_file="docking_files/ligand_structures/Y6J_corrected_pose.sdf",
    animate=False,
)  # Change to True to see an animation of all of the poses
v.show()

## Flexible docking

In [None]:
# Flexible docking with known site/residues
!./gnina \
  -r docking_files/protein_structures/7LME.pdbqt \
  -l docking_files/ligand_structures/ligands_to_dock.sdf \
  --autobox_ligand docking_files/ligand_structures/Y6J_corrected_pose.sdf \
  -o docking_results/multiple_ligands_docked_flex.sdf \
  --flexdist_ligand docking_files/ligand_structures/Y6J_corrected_pose.sdf \
  --flexdist 3.59 \
  --cnn_scoring none \
  --seed 0 \
  --exhaustiveness 16

              _             
             (_)            
   __ _ _ __  _ _ __   __ _ 
  / _` | '_ \| | '_ \ / _` |
 | (_| | | | | | | | | (_| |
  \__, |_| |_|_|_| |_|\__,_|
   __/ |                    
  |___/                     

gnina  master:25e64da   Built Apr 23 2025.
gnina is based on smina and AutoDock Vina.
Please cite appropriately.

Recommend running with single model (--cnn fast)
or without cnn scoring (--cnn_scoring=none).

Commandline: ./gnina -r docking_files/protein_structures/7LME.pdbqt -l docking_files/ligand_structures/ligands_to_dock.sdf --autobox_ligand docking_files/ligand_structures/Y6J_corrected_pose.sdf -o docking_results/multiple_ligands_docked_flex.sdf --flexdist_ligand docking_files/ligand_structures/Y6J_corrected_pose.sdf --flexdist 3.59 --cnn_scoring none --seed 0 --exhaustiveness 16
Flexible residues: A:24 A:25 A:41 A:46 A:49 A:140 A:142 A:145 A:163 A:165 A:166 A:189
Using random seed: 0

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|
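
As we understand the gnina options, `--flexdist_ligand` together with `--flexdist` makes every side chain within the given distance of the reference ligand flexible, which is how the "Flexible residues" list in the output above is chosen. The sketch below illustrates that kind of distance-based selection on toy coordinates. It is a simplification: it does not separate side-chain from backbone atoms, and the function name `residues_within` and all coordinates are hypothetical.

```python
import math


def residues_within(receptor_atoms, ligand_atoms, cutoff):
    """Return sorted (chain, residue number) pairs that have at least one
    atom within `cutoff` Å of any ligand atom."""
    selected = set()
    for chain, resnum, coord in receptor_atoms:
        if any(math.dist(coord, lig) <= cutoff for lig in ligand_atoms):
            selected.add((chain, resnum))
    return sorted(selected)


# Toy receptor atoms as (chain, residue number, xyz in Å); illustrative only
receptor_atoms = [
    ("A", 41, (1.0, 0.0, 0.0)),
    ("A", 41, (6.0, 0.0, 0.0)),
    ("A", 145, (0.0, 3.0, 0.0)),
    ("A", 200, (10.0, 10.0, 10.0)),
]
ligand_atoms = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

flexible = residues_within(receptor_atoms, ligand_atoms, cutoff=3.59)
print("Flexible residues:", " ".join(f"{c}:{r}" for c, r in flexible))
```

Each additional flexible residue adds degrees of freedom to the search, so flexible docking generally takes noticeably longer than rigid-receptor docking.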

## Whole protein docking

In [None]:
# Docking with whole protein
!./gnina \
  -r docking_files/protein_structures/7LME.pdbqt \
  -l docking_files/ligand_structures/ligands_to_dock.sdf \
  --autobox_ligand docking_files/protein_structures/7LME.pdbqt \
  -o docking_results/whole_docked.sdf \
  --cnn_scoring none \
  --seed 0 \
  --exhaustiveness 64
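
The whole-protein run above passes the receptor itself to `--autobox_ligand`, so the search box encloses the entire protein, and it raises `--exhaustiveness` to 64, presumably to compensate for the much larger search volume. The sketch below shows how such a box could be derived from a set of coordinates: the center is the midpoint of the coordinate range, and the size is the range plus padding on each side. It assumes the default 4 Å padding mentioned earlier; the function name `autobox` and the coordinates are hypothetical, and gnina's internal calculation may differ in detail.

```python
def autobox(coords, padding=4.0):
    """Axis-aligned box around coords with `padding` Å added on every side.

    Returns (center, size), each an (x, y, z) tuple.
    """
    axes = list(zip(*coords))  # group all x, all y, and all z values
    center = tuple((max(a) + min(a)) / 2 for a in axes)
    size = tuple((max(a) - min(a)) + 2 * padding for a in axes)
    return center, size


# Two toy atoms spanning the structure; illustrative values only
coords = [(0.0, 0.0, 0.0), (10.0, 20.0, 30.0)]

center, size = autobox(coords)
print("center:", center)
print("size:", size)
```

Comparing the box built from a ligand with the box built from the whole receptor gives a feel for how much more space a blind docking run has to sample.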

## Free energy perturbation

In [None]:
# Ligand files to compare
ligands = ["Y6J.mol2", "compound_12.mol2"]

# Pair each ligand with the next one in the list
pairs = [(ligands[i], ligands[i+1]) for i in range(len(ligands)-1)]

# Write one pair per line to pairs.txt
with open("pairs.txt", "w") as f:
    for a, b in pairs:
        f.write(f"{a} {b}\n")

print("pairs.txt created with the following pairs:")
for a, b in pairs:
    print(f"{a} ↔ {b}")

pairs.txt created with the following pairs:
Y6J.mol2 ↔ compound_12.mol2
