# Basil Docking V0.1 - Docking and Preliminary Analysis
## Purpose

__Target Audience__<br>
Undergraduate chemistry/biochemistry students and, more generally, anyone with little to no knowledge of protein-ligand docking who would like to understand the general process of docking a ligand to a protein receptor.

__Brief Overview__<br>
Molecular docking is a computational method used to predict where molecules can bind to a protein receptor and what interactions exist between the molecule (from now on referred to as the "ligand") and the receptor. It is a popular technique in drug discovery and design: when creating new drugs or testing existing drugs against new receptors, estimating the likelihood of binding before screening makes it possible to eliminate molecules that are unlikely to bind to the receptor. This can significantly reduce the cost and time needed to test the efficacy of a set of possible ligands. <br>

The general steps to perform molecular docking, assuming the ligand and receptor are ready to be docked, are the generation of potential ligand binding poses and the scoring of each generated pose (the score predicts how strongly the ligand binds to the receptor, with a more negative score corresponding to stronger binding). To dock a ligand to a protein, (insert text).<br>
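
As a rough illustration of the scoring step, the sketch below ranks a handful of poses by score and picks the most negative one. The pose names and score values are made up for illustration and do not come from a real docking run.

```python
# Hypothetical (pose name, score in kcal/mol) pairs; values are illustrative only.
poses = [("pose_1", -5.2), ("pose_2", -7.4), ("pose_3", -6.1)]

# A more negative score predicts stronger binding, so sort in ascending order.
ranked = sorted(poses, key=lambda pose: pose[1])
best_name, best_score = ranked[0]

print(f"Best pose: {best_name} ({best_score} kcal/mol)")
```

Docking scores are estimates, so a ranking like this is best read as a way to prioritize ligands rather than as a measurement of binding strength.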

This notebook series encompasses:
1. the preparation needed prior to docking (protein and ligand sanitation, ensuring files are in readable formats, and finding possible binding pockets)
2. __the process of docking ligand/s to a protein receptor using two docking engines (VINA and SMINA) and visualizing/analyzing the outputs__
3. further data collection and manipulation
4. utilizing machine learning to determine key residues (on the protein) and functional groups (on the ligand) responsible for protein-ligand binding

__Stepwise summary for this notebook (docking and preliminary analysis, notebook 2 out of (number))__
- Get docking box sizes from docking-prep notebook
- Dock ligand to protein using either VINA or SMINA
- Visualize different poses of ligands docked to protein
- Visualize protein-ligand interactions of poses

The methods used in this notebook are based on Angel J. Ruiz-Moreno's Jupyter-Dock notebooks, which can be found on their GitHub account, AngelRuizMoreno.

Ruiz-Moreno A.J. Jupyter Dock: Molecular Docking integrated in Jupyter Notebooks. https://doi.org/10.5281/zenodo.5514956

## Table of Libraries Used
### Operations, variable creation, and variable manipulation

| Module (Submodule/s)| Abbreviation | Role | Citation |
| :--- | :--- | :--- | :---|
| numpy | np | add description | Harris, C.R., Millman, K.J., van der Walt, S.J. et al. Array programming with NumPy. Nature 585, 357–362 (2020). DOI: 10.1038/s41586-020-2649-2. |
| pandas | pd | add description |  add source |
| numbers | n/a | add description |  add source |
| re | n/a | add description |  add source |
| os | n/a | add description |  add source |
| sys | n/a | add description |  add source |
| glob |n/a | add description | add source |

### Visualization
| Module (Submodule/s)| Abbreviation | Role | Citation |
| :--- | :--- | :--- | :--- |
| py3Dmol | n/a | apoprotein and protein complex visualization |  Keshavan Seshadri, Peng Liu, and David Ryan Koes. Journal of Chemical Education 2020 97 (10), 3872-3876. https://doi.org/10.1021/acs.jchemed.0c00579. |

### Docking
| Module (Submodule/s)| Abbreviation | Role | Citation |
| :--- | :--- | :--- | :--- |
| vina | n/a | ligand-protein docking |  Eberhardt, J., Santos-Martins, D., Tillack, A.F., Forli, S. (2021). AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings. Journal of Chemical Information and Modeling. |
| --- | --- | --- | Trott, O., & Olson, A. J. (2010). AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. Journal of computational chemistry, 31(2), 455-461. |
| smina | n/a | ligand-protein docking |  Koes, D. R., Baumgartner, M. P., & Camacho, C. J. (2013). Lessons learned in empirical scoring with smina from the CSAR 2011 benchmarking exercise. Journal of chemical information and modeling, 53(8), 1893–1904. https://doi.org/10.1021/ci300604z |
| fpocket | n/a | description | Le Guilloux, V., Schmidtke, P. & Tuffery, P. Fpocket: An open source platform for ligand pocket detection. BMC Bioinformatics 10, 168 (2009). https://doi.org/10.1186/1471-2105-10-168. |

### Data analysis
| Module (Submodule/s)| Abbreviation | Role | Citation |
| :--- | :--- | :--- | :--- |
| MDAnalysis | mda | add description | R. J. Gowers, M. Linke, J. Barnoud, T. J. E. Reddy, M. N. Melo, S. L. Seyler, D. L. Dotson, J. Domanski, S. Buchoux, I. M. Kenney, and O. Beckstein. MDAnalysis: A Python package for the rapid analysis of molecular dynamics simulations. In S. Benthall and S. Rostrup, editors, Proceedings of the 15th Python in Science Conference, pages 98-105, Austin, TX, 2016. SciPy, doi:10.25080/majora-629e541a-00e. |
| --- | --- | --- | N. Michaud-Agrawal, E. J. Denning, T. B. Woolf, and O. Beckstein. MDAnalysis: A Toolkit for the Analysis of Molecular Dynamics Simulations. J. Comput. Chem. 32 (2011), 2319-2327, doi:10.1002/jcc.21787. PMCID:PMC3144279. |
| prolif | plf | add description |  chemosim-lab/ProLIF: v0.3.3 - 2021-06-11. https://doi.org/10.5281/zenodo.4386984. |

### UI
| Module (Submodule/s)| Abbreviation | Role | Citation |
| :--- | :--- | :--- | :--- |
| IPython (ipywidgets)| n/a | allows for widgets to be implemented | Fernando Pérez, Brian E. Granger, IPython: A System for Interactive Scientific Computing, Computing in Science and Engineering, vol. 9, no. 3, pp. 21-29, May/June 2007, doi:10.1109/MCSE.2007.53. URL: https://ipython.org |
| ipywidgets (Layout, Label, Dropdown, Box)| widgets | creates dropdowns for docking engine, pocket number, ligand, and pose selection | Fernando Pérez, Brian E. Granger, IPython: A System for Interactive Scientific Computing, Computing in Science and Engineering, vol. 9, no. 3, pp. 21-29, May/June 2007, doi:10.1109/MCSE.2007.53. URL: https://ipython.org |

In [24]:
import numpy as np
import pandas as pd
import numbers
import re
import sys, os
import glob
sys.path.insert(1, 'utilities/')
from utils import pdbqt_to_sdf

import py3Dmol
import ipywidgets as widgets
from ipywidgets import Layout, Label, Dropdown, Box, HBox, SelectMultiple

from vina import Vina
# smina is a command-line program, not a Python module, so it is not imported here;
# it is invoked from the command line in the SMINA docking section.

from openbabel import pybel
from rdkit import Chem
import MDAnalysis as mda
from MDAnalysis.coordinates import PDB
import prolif as plf
from prolif.plotting.complex3d import Complex3D

import warnings
warnings.filterwarnings("ignore")

## Import values from docking-prep

Prior to docking, the data obtained from the previous notebook needs to be imported. The glob library locates the receptor file (a `.ent` file in the PDB_files folder), and the ligand information is read from the ligand_information.csv file created at the end of the previous notebook. The protein pocket information generated by fpocket is imported from the protein_pockets.csv file.

In [5]:
prot_pockets = pd.read_csv('data/protein_pockets.csv',index_col=[0])
ligand_information = pd.read_csv('data/ligand_information.csv')

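# Locate the receptor file; the slice below assumes a name of the form pdbXXXX.ent
find_pdb = os.path.join('data', 'PDB_files', '*.ent')
prot_file = glob.glob(find_pdb)[0]
prot_file_split = os.path.basename(prot_file)  # works with any path separator
pdb_id = prot_file_split[3:7]  # four characters after the 'pdb' prefix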
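
The slice `[3:7]` appears to rely on the receptor file being named in the form `pdbXXXX.ent`, where `XXXX` is the four-character PDB ID. The sketch below shows the same extraction on a made-up path; the helper name `extract_pdb_id` and the example ID `1abc` are hypothetical.

```python
import os

def extract_pdb_id(path):
    """Return the 4-character PDB ID from a file named like 'pdbXXXX.ent'."""
    filename = os.path.basename(path)   # e.g. 'pdb1abc.ent'
    return filename[3:7]                # skip the 'pdb' prefix, keep 4 characters

example_id = extract_pdb_id(os.path.join("data", "PDB_files", "pdb1abc.ent"))
print(example_id)
```

If the file follows a different naming scheme, the slice would need to be adjusted accordingly.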

In [6]:
# Pull each column of the ligand table into a plain Python list
ligs = ligand_information["ligs"].tolist()
filenames = ligand_information["filenames"].tolist()
filenames_H = ligand_information["filenames_H"].tolist()
filenames_pdbqt = ligand_information["filenames_pdbqt"].tolist()
# One [x, y, z] entry per ligand for the docking box center and size
center = ligand_information[["center_x", "center_y", "center_z"]].astype(float).values.tolist()
size = ligand_information[["size_x", "size_y", "size_z"]].astype(float).values.tolist()

## Docking

This notebook uses two docking engines for molecular docking: VINA and SMINA. VINA is one of the docking engines available in the AutoDock Suite and is widely used due to its relatively quick docking speed and easy-to-use interface compared to the other engines in the suite. SMINA is a fork of VINA that lets users modify scoring terms and adds features that make the engine more convenient (support for multi-ligand files such as .sdf files, improved minimization algorithms, additional term types, and support for multiple ligand file formats).

<div class="alert alert-block alert-info">
<b>Please note:</b> 
VINA only has force field parameters for atoms of the following elements:
<ul> <li>hydrogen</li> <li>carbon</li> <li>oxygen</li> <li>nitrogen</li> <li>phosphorus</li> <li>sulfur</li> <li>calcium</li> <li>manganese</li> <li>iron</li> <li>zinc</li> <li>halogens (fluorine, chlorine, bromine, iodine)</li> </ul>
For ligands containing atoms that are not listed above, it is recommended that users either 1) use the selection widget below to select only ligands made up of supported atoms or 2) only use SMINA as the docking engine. Trying to dock a ligand that contains an unsupported atom using VINA will result in an error.

To select multiple ligands using the selection widget, hold down the control key (PC) or command key (Mac) while clicking on the name of each ligand you would like to dock.
</div>
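
One way to check a ligand against the list above before docking is sketched here. The helper `unsupported_elements` and both example ligands are hypothetical; each ligand is given simply as a list of element symbols, and in practice the symbols would be read from the ligand file.

```python
# Elements with VINA parameters, as listed in the note above.
VINA_SUPPORTED = {"H", "C", "O", "N", "P", "S", "Ca", "Mn", "Fe", "Zn",
                  "F", "Cl", "Br", "I"}

def unsupported_elements(atom_symbols):
    """Return the sorted element symbols that are not in the supported set."""
    return sorted(set(atom_symbols) - VINA_SUPPORTED)

# Hypothetical ligands given as lists of element symbols.
iron_sulfur_cluster = ["Fe", "Fe", "S", "S"]
molybdenum_ligand = ["Mo", "O", "O", "S"]

print(unsupported_elements(iron_sulfur_cluster))  # → []
print(unsupported_elements(molybdenum_ligand))    # → ['Mo']
```

An empty result suggests the ligand can be passed to VINA; a non-empty result points to SMINA instead.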

In [7]:
style = {'description_width': 'initial'}
select_ligs = SelectMultiple(options = ligs, description = 'Select Ligand/s to Dock:', style = style)
select_ligs

SelectMultiple(description='Select Ligand/s to Dock:', options=('FES601', 'FES602', 'FAD606', 'MTE1326', 'MOS1…

### Docking using VINA

Below is a step-by-step (cell-by-cell) guide on how the VINA docking engine is used to generate poses and scores for each pocket and ligand.
- Prior to docking, two new folders are created in the data folder to organize the output data (vina_out and vina_out_2).
- Using the information collected in the docking-prep notebook, each pocket's center and size values are added to their respective lists, pocket_center and pocket_size. In both lists, each entry is a list of the x, y, and z values for one pocket (as a result, pocket_center and pocket_size are nested lists, and the length of both lists is equal to the number of binding pockets).
    - For example, pocket_center may look like this: [[x1, y1, z1], [x2, y2, z2], [x3, y3, z3]]
- Using the pocket size and center lists and the pdbqt files for the receptor and desired ligand, ligand poses are generated for each binding pocket (the number of poses depends on the value of n_poses, which is set to 5 in this notebook). The amount of computational effort spent generating poses for a given pocket and ligand is called the exhaustiveness. As exhaustiveness increases, the results tend to become more reproducible. While the default value of exhaustiveness is 8, this notebook uses an exhaustiveness of 5 due to memory limitations.
- The results of running the VINA docking engine are stored as pdbqt files in the vina_out folder. In order to analyze and visualize the results, the pdbqt files are converted into sdf files using the function pdbqt_to_sdf (created by Angel Ruiz-Moreno); the converted files can be found in the vina_out_2 folder. The file names follow the pattern `(ligand name)_vina_pocket_(pocket number).pdbqt` for the pdbqt files and `(ligand name)_pocket_(pocket number)_(name of folder).sdf` for the sdf files.
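
To make the nested-list layout and the file naming patterns concrete, here is a small sketch using made-up pocket values. The `pockets` dictionary stands in for the prot_pockets dataframe, and the helper names `vina_pdbqt_name` and `sdf_name` are hypothetical, not part of the notebook.

```python
# Made-up pocket data standing in for rows of the prot_pockets dataframe.
pockets = {
    1: {"center": (10.0, 12.5, -3.0), "size": (20.0, 18.0, 22.0)},
    2: {"center": (-4.2, 7.1, 15.8), "size": (16.0, 16.0, 16.0)},
}

# One [x, y, z] entry per pocket, in pocket order.
pocket_center = [list(p["center"]) for p in pockets.values()]
pocket_size = [list(p["size"]) for p in pockets.values()]

def vina_pdbqt_name(ligand, pocket):
    """(ligand name)_vina_pocket_(pocket number).pdbqt"""
    return f"{ligand}_vina_pocket_{pocket}.pdbqt"

def sdf_name(ligand, pocket, folder):
    """(ligand name)_pocket_(pocket number)_(name of folder).sdf"""
    return f"{ligand}_pocket_{pocket}_{folder}.sdf"

print(pocket_center)
print(vina_pdbqt_name("FAD606", 1))
print(sdf_name("FAD606", 1, "vina_out_2"))
```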

In [8]:
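# Create paths for output files
current_dir = os.getcwd()
dataPath = os.path.join(current_dir, "data")
vina_out = os.path.join(current_dir, "data", "vina_out")
vina_out_2 = os.path.join(current_dir, "data", "vina_out_2")
# Make sure the output folders exist before anything is written to them
os.makedirs(vina_out, exist_ok=True)
os.makedirs(vina_out_2, exist_ok=True)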

In [16]:
# Collect one [x, y, z] center and size per pocket, in the order of prot_pockets.index.
# The pocket data does not depend on the ligand, so each pocket is added only once.
pocket_center = []
pocket_size = []
for pocket in prot_pockets.index:
    c_x = prot_pockets.loc[pocket,'center_x']
    c_y = prot_pockets.loc[pocket,'center_y']
    c_z = prot_pockets.loc[pocket,'center_z']
    s_x = prot_pockets.loc[pocket,'size_x']
    s_y = prot_pockets.loc[pocket,'size_y']
    s_z = prot_pockets.loc[pocket,'size_z']
    pocket_center.append([c_x, c_y, c_z])
    pocket_size.append([s_x, s_y, s_z])

In [17]:
def vina_dock(ligand):
    """Dock one ligand into every pocket and write the poses to data/vina_out."""
    v = Vina(sf_name='vina')
    v.set_receptor(f'data/PDBQT_files/{pdb_id}_protein.pdbqt')
    v.set_ligand_from_file(f"data/PDBQT_files/{ligand}_H.pdbqt")
    for pock_num, pocket in enumerate(prot_pockets.index):
        # The affinity maps must be recomputed for each docking box
        v.compute_vina_maps(center = pocket_center[pock_num], box_size = pocket_size[pock_num])
        v.dock(exhaustiveness=5, n_poses=5)
        v.write_poses(f"data/vina_out/{ligand}_vina_pocket_{pocket}.pdbqt", n_poses=5, overwrite=True)

In [22]:
for i in select_ligs.value:
    vina_dock(i)

TypeError: 

PDBQT parsing error: Unknown or inappropriate tag found in flex residue or ligand.
 > ROOT


In [15]:
# Create sdf files from pdbqt
for i in select_ligs.value:
    for pocket in prot_pockets.index:
        pdbqt_to_sdf(pdbqt_file=f"data/vina_out/{i}_vina_pocket_{pocket}.pdbqt",output=f"data/vina_out_2/{i}_pocket_{pocket}_vina_out_2.sdf")

### Docking using SMINA

Below is a step-by-step (cell-by-cell) guide on how the SMINA docking engine is used to generate poses and scores for each pocket and ligand.
- Prior to docking, two new folders are created in the data folder to organize the output data (smina_out and smina_out_2). The path for the smina docking engine executable is also initialized to allow for the docking engine to be used, as it is a local file.
- Using the pdbqt file for the receptor, the mol2 file for the desired ligand, and the pocket center/size values from the prot_pockets dataframe, ligand poses are generated for each binding pocket (the number of poses depends on the value of num_modes, which is set to 5 in this notebook). The amount of computational effort spent generating poses for a given pocket and ligand is called the exhaustiveness. As exhaustiveness increases, the results tend to become more reproducible. While the default value of exhaustiveness is 8, this notebook uses an exhaustiveness of 5 due to memory limitations.
- The results of running the SMINA docking engine are stored as sdf files in the smina_out folder. However, because the output files lack a flag marking them as three-dimensional, the sdf files must be read using SDMolSupplier and re-written using SDWriter to avoid excessive errors. The re-written sdf files can be found in the smina_out_2 folder. The file names follow the pattern `(ligand name)_pocket_(pocket number)_(name of folder).sdf`.
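
The command-line call described above can also be assembled as an argument list, which makes each option easy to inspect before running anything. This is only a sketch: `build_smina_command` is a hypothetical helper, the PDB ID and box values are made up, and the flags are the ones already used in this notebook.

```python
import shlex

def build_smina_command(rec, lig, outfile, center, size,
                        exhaustiveness=5, num_modes=5):
    """Assemble the smina call as an argument list (hypothetical helper)."""
    cmd = ["smina", "-r", rec, "-l", lig, "-o", outfile]
    for axis, value in zip("xyz", center):
        cmd += [f"--center_{axis}", str(value)]
    for axis, value in zip("xyz", size):
        cmd += [f"--size_{axis}", str(value)]
    cmd += ["--exhaustiveness", str(exhaustiveness),
            "--num_modes", str(num_modes)]
    return cmd

cmd = build_smina_command(
    rec="data/PDBQT_files/1abc_protein.pdbqt",   # made-up PDB ID
    lig="data/MOL2_files/FAD606_H.mol2",
    outfile="data/smina_out/FAD606_pocket_1_smina_out.sdf",
    center=(10.0, 12.5, -3.0),                   # made-up box center
    size=(20.0, 18.0, 22.0),                     # made-up box size
)
print(shlex.join(cmd))
```

On a machine where smina is installed, a list like this could be passed to `subprocess.run(cmd, check=True)` instead of using a notebook shell command.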

In [13]:
# Create paths for smina software and output files
smina_out = os.path.join(current_dir, "data", "smina_out")
smina_out_2 = os.path.join(current_dir, "data", "smina_out_2")
os.makedirs(smina_out, exist_ok=True)
os.makedirs(smina_out_2, exist_ok=True)

In [14]:
# Using SMINA to dock ligand/s in docking boxes based on fpocket's identified pockets
rec = f'data/PDBQT_files/{pdb_id}_protein.pdbqt'
for i in select_ligs.value:
    for pocket in prot_pockets.index:
        lig = f'data/MOL2_files/{i}_H.mol2'
        outfile = f'data/smina_out/{i}_pocket_{pocket}_smina_out.sdf'
        center_x = prot_pockets.loc[pocket,'center_x']
        center_y = prot_pockets.loc[pocket,'center_y']
        center_z = prot_pockets.loc[pocket,'center_z']
        size_x = prot_pockets.loc[pocket,'size_x']
        size_y = prot_pockets.loc[pocket,'size_y']
        size_z = prot_pockets.loc[pocket,'size_z']
        # The leading "!" runs smina as a shell command from the notebook
        !smina -r {rec} -l {lig} -o {outfile} --center_x {center_x} --center_y {center_y} --center_z {center_z} --size_x {size_x} --size_y {size_y} --size_z {size_z} --exhaustiveness 5 --num_modes 5

   _______  _______ _________ _        _______ 
  (  ____ \(       )\__   __/( (    /|(  ___  )
  | (    \/| () () |   ) (   |  \  ( || (   ) |
  | (_____ | || || |   | |   |   \ | || (___) |
  (_____  )| |(_)| |   | |   | (\ \) ||  ___  |
        ) || |   | |   | |   | | \   || (   ) |
  /\____) || )   ( |___) (___| )  \  || )   ( |
  \_______)|/     \|\_______/|/    )_)|/     \|


smina is based off AutoDock Vina. Please cite appropriately.

Weights      Terms
-0.035579    gauss(o=0,_w=0.5,_c=8)
-0.005156    gauss(o=3,_w=2,_c=8)
0.840245     repulsion(o=0,_c=8)
-0.035069    hydrophobic(g=0.5,_b=1.5,_c=8)
-0.587439    non_dir_h_bond(g=-0.7,_b=0,_c=8)
1.923        num_tors_div

Using random seed: -1600578584

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------

***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
1       -2.8       0.000      0.000    
2       -2.8       28.359     28.894   
3       -2.8       29.087     29.203   
4       -2.8       28.375     28.900   
5       -2.8       21.447     22.262   
Refine time 0.380
Loop time 3.723

Using random seed: 2091888658

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
1       -2.9       0.000      0.000    
2       -2.9       0.059      2.214    
3       -2.9       0.244      3.144    
4       -2.5       14.358     15.134   
5       -2.4       2.763      3.161    
Refine time 0.426
Loop time 2.519

Using random seed: 489967734

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
1       -2.9       0.000      0.000    
2       -2.9       0.038      3.140    
3       -2.7       33.573     34.327   
4       -2.6       38.573     39.571   
5       -2.5       33.399     33.711   
Refine time 0.354
Loop time 4.202

Using random seed: -512364316

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
1       -2.9       0.000      0.000    
Refine time 101.684
Loop time 111.126

Using random seed: -1801810648

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
1       -8.1       0.000      0.000    
2       -7.4       4.193      6.157    
3       -7.3       1.855      2.770    
4       -7.3       3.373      5.446    
5       -7.0       1.592      1.699    
Refine time 2.538
Loop time 12.990

Using random seed: 1135875135

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
1       -8.1       0.000      0.000    
2       -7.4       47.508     48.751   
3       -7.3       3.371      5.451    
4       -7.0       47.698     48.963   
5       -6.7       3.370      5.091    
Refine time 2.432
Loop time 11.140

Using random seed: -2067788897

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
1       -2.3       0.000      0.000    
2       -2.2       11.404     11.404   
3       -2.1       19.090     19.090   
4       -2.1       12.913     12.913   
5       -2.1       8.611      8.611    
Refine time 0.338
Loop time 4.021

Using random seed: -190917329

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
1       -5.5       0.000      0.000    
2       -5.4       51.831     52.680   
3       -5.4       8.592      9.812    
4       -5.2       28.101     29.097   
5       -5.1       11.177     12.585   
Refine time 0.769
Loop time 7.090

Using random seed: -1640110150

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
1       -4.9       0.000      0.000    
2       -4.9       32.329     33.492   
3       -4.8       16.458     17.270   
4       -4.6       11.076     11.399   
5       -4.6       9.491      10.879   
Refine time 0.756
Loop time 5.672

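Each results table printed above lists one row per pose: the mode number, the predicted affinity in kcal/mol, and the lower and upper bound RMSD relative to the best mode. If the printed output is ever captured as text, a small parser along the following lines could pull the rows out. This is a sketch only; `parse_smina_table` is a hypothetical helper, and the sample rows are copied from one of the runs above.

```python
import re

# One results row: mode number, affinity, RMSD lower bound, RMSD upper bound.
ROW = re.compile(r"^\s*(\d+)\s+(-?\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$")

def parse_smina_table(text):
    """Return (mode, affinity, rmsd_lb, rmsd_ub) tuples found in smina output."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            mode, aff, lb, ub = match.groups()
            rows.append((int(mode), float(aff), float(lb), float(ub)))
    return rows

sample = """\
mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
1       -8.1       0.000      0.000
2       -7.4       4.193      6.157
3       -7.3       1.855      2.770
Refine time 2.538
"""

rows = parse_smina_table(sample)
best = min(rows, key=lambda row: row[1])   # most negative affinity
print(best)
```

The notebook itself does not parse this printed output; it reads the scores from the sdf files in the analysis section.
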
In [17]:
# Rewrite .sdf output files to add 3D tag
# This code will result in warnings. This is normal as long as the warning is
# "Warning: molecule is tagged as 2D, but at least one Z coordinate is not zero. Marking the mol as 3D."
mols_all = []
for i in select_ligs.value:
    for pocket in prot_pockets.index:
        mols = []  # reset for every pocket so each output file holds only that pocket's poses
        with Chem.SDMolSupplier(f'data/smina_out/{i}_pocket_{pocket}_smina_out.sdf') as suppl:
            for mol in suppl:
                if mol is not None:
                    Chem.MolToMolBlock(mol)
                    mols.append(mol)
        with Chem.SDWriter(f"data/smina_out_2/{i}_pocket_{pocket}_smina_out_2.sdf") as w:
            for mol in mols:
                w.write(mol)



[11:35:53] Explicit valence for atom # 0 N, 4, is greater than permitted
[11:35:53] ERROR: Could not sanitize molecule ending on line 52
[11:35:53] ERROR: Explicit valence for atom # 0 N, 4, is greater than permitted
[11:35:53] Explicit valence for atom # 0 N, 4, is greater than permitted
[11:35:53] ERROR: Could not sanitize molecule ending on line 108
[11:35:53] ERROR: Explicit valence for atom # 0 N, 4, is greater than permitted
[11:35:53] Explicit valence for atom # 0 N, 4, is greater than permitted
[11:35:53] ERROR: Could not sanitize molecule ending on line 164
[11:35:53] ERROR: Explicit valence for atom # 0 N, 4, is greater than permitted
[11:35:53] Explicit valence for atom # 0 N, 4, is greater than permitted
[11:35:53] ERROR: Could not sanitize molecule ending on line 220
[11:35:53] ERROR: Explicit valence for atom # 0 N, 4, is greater than permitted
[11:35:53] Explicit valence for atom # 0 N, 4, is greater than permitted
[11:35:53] ERROR: Could not sanitize molecule ending on 

[11:35:53] ERROR: Could not sanitize molecule ending on line 52
[11:35:53] ERROR: Explicit valence for atom # 0 N, 4, is greater than permitted
[11:35:53] Explicit valence for atom # 0 N, 4, is greater than permitted
[11:35:53] ERROR: Could not sanitize molecule ending on line 108
[11:35:53] ERROR: Explicit valence for atom # 0 N, 4, is greater than permitted
[11:35:53] Explicit valence for atom # 0 N, 4, is greater than permitted
[11:35:53] ERROR: Could not sanitize molecule ending on line 164
[11:35:53] ERROR: Explicit valence for atom # 0 N, 4, is greater than permitted
[11:35:53] Explicit valence for atom # 0 N, 4, is greater than permitted
[11:35:53] ERROR: Could not sanitize molecule ending on line 220
[11:35:53] ERROR: Explicit valence for atom # 0 N, 4, is greater than permitted
[11:35:53] Explicit valence for atom # 0 N, 4, is greater than permitted
[11:35:53] ERROR: Could not sanitize molecule ending on line 276
[11:35:53] ERROR: Explicit valence for atom # 0 N, 4, is greater

## Analysis of docking output

Now that we have results from molecular docking, we need to make sense of the information. If you were to open the sdf files in a text editor, you would see x, y, and z coordinates for each atom in the ligand, the bond types between atoms in the ligand, and the score of the ligand pose. While useful, this information is difficult to interpret and visualize. To find the number of interactions, the types of interactions, and the ligand atoms and receptor residues involved in binding, interaction fingerprints (IFPs) can be generated and viewed using the prolif library. The fingerprints can then be used to identify the key ligand atoms and receptor residues involved in protein-ligand complex formation.
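
To show where the score sits in an sdf file, the sketch below reads one data field from SDF text using only the standard library. In the SDF format, a data field starts with a header line such as `> <minimizedAffinity>`, the value follows on the next line, and each record ends with `$$$$`. The helper `read_sdf_property` is hypothetical and the sample text is heavily abbreviated and made up; the notebook itself reads these values with RDKit.

```python
def read_sdf_property(sdf_text, prop_name):
    """Collect one data field per record from SDF text (stdlib only)."""
    values = []
    lines = sdf_text.splitlines()
    for idx, line in enumerate(lines):
        # Data fields are introduced by a header line such as "> <minimizedAffinity>".
        is_header = line.startswith(">") and f"<{prop_name}>" in line
        if is_header and idx + 1 < len(lines):
            values.append(lines[idx + 1].strip())
    return values

# Abbreviated, made-up SDF text: the atom and bond blocks are omitted.
sample_sdf = """\
pose_1
(atom and bond blocks omitted)
M  END
> <minimizedAffinity>
-8.1

$$$$
pose_2
(atom and bond blocks omitted)
M  END
> <minimizedAffinity>
-7.4

$$$$
"""

pose_scores = [float(v) for v in read_sdf_property(sample_sdf, "minimizedAffinity")]
print(pose_scores)
```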

In [16]:
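# load protein
prot = mda.Universe(f"data/PDB_files/{pdb_id}_protein_H.pdb") # MDAnalysis universe
# convert to a prolif molecule; this name is used by the fingerprint cells below
protein_plf = plf.Molecule.from_mda(prot)
protein_plf.n_residues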

In [26]:
style = {'description_width': 'initial'}
select_dock = Dropdown(options = ['smina', 'vina'], description = 'Select docking engine to use:', style = style)
select_dock

Dropdown(description='Select docking engine to use:', options=('smina', 'vina'), style=DescriptionStyle(descri…

In [None]:
all_ligand_plf = []
ligand_plf = []
all_df = []
all_ifps = []
for i in select_ligs.value:
    for pocket in prot_pockets.index:
        # read every pose for this ligand and pocket
        lig_suppl = plf.sdf_supplier(f"data/{select_dock.value}_out_2/{i}_pocket_{pocket}_{select_dock.value}_out_2.sdf")
        # count=True keeps track of every occurrence of an interaction type
        fp = plf.Fingerprint(count=True)
        fp.run_from_iterable(lig_suppl, protein_plf)
        results_df = fp.to_dataframe()
        all_df.append(results_df)
        # keep each pose and its detailed fingerprint (with metadata) for later cells
        for lig in lig_suppl:
            all_ligand_plf.append(lig)
            ifp = fp.generate(lig, protein_plf, metadata = True)
            all_ifps.append(ifp)

In [None]:
all_results = []
scores = [] # get list of scores for each pose
for i in select_ligs.value:
    # collect the poses from every pocket, one ligand at a time
    nested_results = []
    for pocket in prot_pockets.index:
        results = Chem.SDMolSupplier(f"data/{select_dock.value}_out_2/{i}_pocket_{pocket}_{select_dock.value}_out_2.sdf")
        nested_results.append(results)
    # add the poses for this ligand to all_results
    all_results.append(nested_results)

# smina and vina store the score under different property names
score_prop = 'minimizedAffinity' if select_dock.value == "smina" else 'Score'
# get score values for every pose in all_results
for ligand_results in all_results:
    for pocket_results in ligand_results:
        for pose in pocket_results:
            scores.append(float(pose.GetProp(score_prop)))

In [None]:
df = pd.concat([d for d in all_df], axis=0, ignore_index=False, sort=False, keys = prot_pockets.index).reset_index()
df.insert(2, "Score", pd.Series(scores))
df = df.fillna(0)
df

While the dataframe generated using the prolif library has a lot of useful information, we are also going to add the distance between interacting ligand and protein atoms, the indexes of both the ligand and protein atoms involved in the interaction, and the functional group the ligand's atom is a member of if applicable.

In [None]:
col_names_list = []
residues = []
interactions = []
for key in all_ifps:
    for key_new in key:
        for key_2 in key[key_new]:
            residues.append(str(key_new[1]))
            interactions.append(str(key_2))
            lig_name = str(key_new[0])
            res_name = str(key_new[1])
            column_name = (lig_name, res_name, key_2)
            if column_name not in col_names_list:
                col_names_list.append(column_name)
                new_col_0 = (lig_name, res_name, f"Functional group involved ({key_2})")
                new_col_1 = (lig_name, res_name, f"Distance ({key_2})")
                new_col_2 = (lig_name, res_name, f"Index 1 (Ligand) ({key_2})")
                new_col_3 = (lig_name, res_name, f"Index 2 (Ligand) ({key_2})")
                new_col_4 = (lig_name, res_name, f"Index 3 (Protein) ({key_2})")
                new_col_5 = (lig_name, res_name, f"Index 4 (Protein) ({key_2})")
                x = df.columns.get_loc(column_name)
                df.insert(x + 1, new_col_0, pd.Series([0] * df.shape[0]))
                df.insert(x + 2, new_col_1, pd.Series([0] * df.shape[0]))
                df.insert(x + 3, new_col_2, pd.Series([0] * df.shape[0]))
                df.insert(x + 4, new_col_3, pd.Series([0] * df.shape[0]))
                df.insert(x + 5, new_col_4, pd.Series([0] * df.shape[0]))
                df.insert(x + 6, new_col_5, pd.Series([0] * df.shape[0]))

In [None]:
df = df.astype(object)

To get the functional groups in each ligand, a dictionary is created in which each key is the index of an atom found to be in a functional group and the corresponding value is the name of that functional group. Because a dictionary key can hold only one value, an atom that belongs to two or more functional groups will have only one of its functional groups listed.
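
The overwriting behavior can be seen in a minimal sketch that uses made-up substructure matches instead of a real ligand. `mock_matches`, `last_match_wins`, and `keep_all_groups` are hypothetical names for illustration. The second function shows one possible alternative (a list of groups per atom), which is not what this notebook does.

```python
from collections import defaultdict

# Hypothetical substructure matches: group name -> tuples of matched atom indexes.
# Atoms 5 and 6 belong to both the hydroxy match and the carboxylic acid match.
mock_matches = {
    "hydroxy": [(5, 6)],
    "carbox_acid": [(4, 5, 6, 7)],
    "phenyl": [(0, 1, 2, 3, 8, 9)],
}


def last_match_wins(matches):
    """One group per atom: a later match silently replaces an earlier one."""
    atom_to_group = {}
    for group, hits in matches.items():
        for hit in hits:
            for atom_idx in hit:
                atom_to_group[atom_idx] = group
    return atom_to_group


def keep_all_groups(matches):
    """Alternative: keep every group an atom belongs to, in match order."""
    atom_to_groups = defaultdict(list)
    for group, hits in matches.items():
        for hit in hits:
            for atom_idx in hit:
                atom_to_groups[atom_idx].append(group)
    return dict(atom_to_groups)


single = last_match_wins(mock_matches)
multi = keep_all_groups(mock_matches)
print(single[5])  # carbox_acid
print(multi[5])   # ['hydroxy', 'carbox_acid']
```

With a single value per atom, which group is kept depends on the order in which the functional groups are searched.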

In [None]:
groups_to_numbers = {
    'ester' : 1,
    'ether' : 2,
    'hydroxy' : 3,
    'carbox_acid' : 4,
    'aldehyde' : 5,
    'anhydr' : 6,
    'amine' : 7,
    'amine_2' : 8,
    'amine_3': 9,
    'amide' : 10,
    'amide_2' : 11,
    'amide_3' : 12,
    'nitro' : 13,
    'imine' : 14,
    'f_hal' : 15,
    'cl_hal' : 16,
    'br_hal' : 17,
    'i_hal' : 18,
    'alkene' : 19,
    'alkyne' : 20,
    'alkyne_term' : 21,
    'phenyl' : 22,
    'benzyl' : 23,
    'pyrrole' : 24,
    'imidiz' : 25,
    'pyridine' : 26,
    'pyrimidine' : 27
}
def group_idxes_from_mol(lig):
    """Map each atom index that belongs to a functional group to that group's name."""
    match_indexes = {}
    # renumber the atoms into RDKit's canonical order before matching
    mol_neworder = tuple(zip(*sorted([(j, i) for i, j in enumerate(Chem.CanonicalRankAtoms(lig))])))[1]
    mol_renum = Chem.RenumberAtoms(lig, mol_neworder)
    for j in functional_groups:
        k = Chem.MolFromSmarts(j)
        # GetSubstructMatches returns an empty tuple when nothing matches
        for index in mol_renum.GetSubstructMatches(k):
            for subind in index:
                # a later match overwrites an earlier one for the same atom
                match_indexes[subind] = str(functional_groups_dict[j])
    return match_indexes

In [None]:
for number, key in enumerate(all_ifps):
    for key_new in key:
        for key_2 in key[key_new]:
            lig_name = str(key_new[0])
            res_name = str(key_new[1])
            column_name = (lig_name, res_name, key_2)
            new_col_0 = (lig_name, res_name, f"Functional group involved ({key_2})")
            new_col_1 = (lig_name, res_name, f"Distance ({key_2})")
            new_col_2 = (lig_name, res_name, f"Index 1 (Ligand) ({key_2})")
            new_col_3 = (lig_name, res_name, f"Index 2 (Ligand) ({key_2})")
            new_col_4 = (lig_name, res_name, f"Index 3 (Protein) ({key_2})")
            new_col_5 = (lig_name, res_name, f"Index 4 (Protein) ({key_2})")
            get_pose = df["Frame"][number]
            x = key[key_new]
            y = x[key_2]
            df_groups = []
            df_distance = []
            df_ind_1 = []
            df_ind_2 = []
            df_ind_3 = []
            df_ind_4 = []
            for inst_num, instance in enumerate(y):
                distance = instance["distance"]
                df_distance.append(distance)
                parent_index = instance["parent_indices"]
                if len(parent_index["ligand"]) == 2:
                    df_ind_1.append(parent_index["ligand"][0])
                    df_ind_2.append(parent_index["ligand"][1])
                else:
                    df_ind_1.append(parent_index["ligand"][0])
                    df_ind_2.append(0)
                if len(parent_index["protein"]) == 2:
                    df_ind_3.append(parent_index["protein"][0])
                    df_ind_4.append(parent_index["protein"][1])
                else:
                    df_ind_3.append(parent_index["protein"][0])
                    df_ind_4.append(0)
                current = all_ligand_plf[number]
                group_ints = group_idxes_from_mol(current)
                for value in group_ints.keys():
                    if len(parent_index["ligand"]) == 2:
                        # use `or`: the bitwise `|` binds tighter than `==` and gives the wrong test
                        if value == parent_index["ligand"][0] or value == parent_index["ligand"][1]:
                            df_groups.append(int(groups_to_numbers[group_ints[value]]))
                    else:
                        if value == parent_index["ligand"][0]:
                            df_groups.append(int(groups_to_numbers[group_ints[value]]))
                if len(df_groups) == inst_num:
                    df_groups.append(0)
            df.at[number, new_col_0] = df_groups
            df.at[number, new_col_1] = df_distance
            df.at[number, new_col_2] = df_ind_1
            df.at[number, new_col_3] = df_ind_2
            df.at[number, new_col_4] = df_ind_3
            df.at[number, new_col_5] = df_ind_4

In [None]:
df.to_csv('data/docking_information.csv', index = False)
df

Using the dropdowns created by the cell below, up to five ligand poses can be selected for viewing with the receptor. The number of viewers created by py3Dmol depends on the value of the "Number of Poses to View" dropdown. If multiple ligands were docked, all selected poses will be of the same ligand. To view a specific pose, select its pocket number and pose number in the dropdowns, making sure that both belong to the same selection, as noted in the parentheses of each dropdown's label.
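
To make the pairing of selections concrete, the sketch below turns a set of pocket and pose choices into the output file and record position that each viewer would read. The file name pattern is the one used throughout this notebook. `selected_pose_files` is a hypothetical helper, and the pocket and pose numbers are example values rather than real selections.

```python
def selected_pose_files(engine, ligand, pockets, poses, n_view):
    """Pair the first n_view (pocket, pose) selections with an SDF path and a 0-based record index."""
    targets = []
    for pocket, pose in list(zip(pockets, poses))[:n_view]:
        path = f"data/{engine}_out_2/{ligand}_pocket_{pocket}_{engine}_out_2.sdf"
        # pose numbers in the dropdowns start at 1, records in the file start at 0
        targets.append((path, pose - 1))
    return targets


# Example values only: five selections are always present, but only the first three are viewed.
targets = selected_pose_files(
    engine="smina",
    ligand="FSN501",
    pockets=[1, 1, 2, 1, 1],
    poses=[1, 2, 1, 1, 1],
    n_view=3,
)
for path, record in targets:
    print(path, record)
```

Selections beyond the chosen number of poses are simply ignored, which is why each pocket number must be read together with the pose number that carries the same selection label.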

In [12]:
pocket_list = []
for pocket in prot_pockets.index:
    pocket_list.append((str(pocket), int(pocket)))

form_item_layout = Layout(
    display='flex',
    flex_flow='row',
    justify_content='space-between')

ligand_number = Dropdown(options = select_ligs.value)
visual_number = Dropdown(options = [('1', 1), ('2', 2), ('3', 3), ('4', 4), ('5', 5)])

pock_number1 = Dropdown(options = pocket_list)
pose_number1 = Dropdown(options = [('1', 1), ('2', 2), ('3', 3), ('4', 4), ('5', 5)])
pock_number2 = Dropdown(options = pocket_list)
pose_number2 = Dropdown(options = [('1', 1), ('2', 2), ('3', 3), ('4', 4), ('5', 5)])
pock_number3 = Dropdown(options = pocket_list)
pose_number3 = Dropdown(options = [('1', 1), ('2', 2), ('3', 3), ('4', 4), ('5', 5)])
pock_number4 = Dropdown(options = pocket_list)
pose_number4 = Dropdown(options = [('1', 1), ('2', 2), ('3', 3), ('4', 4), ('5', 5)])
pock_number5 = Dropdown(options = pocket_list)
pose_number5 = Dropdown(options = [('1', 1), ('2', 2), ('3', 3), ('4', 4), ('5', 5)])

form_items1 = [Box([Label(value='Ligand'), ligand_number], layout=form_item_layout),
              Box([Label(value='Number of Poses to View'), visual_number], layout=form_item_layout)]

form_items2 = [Box([Label(value='Pocket Number (Selection 1)'), pock_number1], layout=form_item_layout),
               Box([Label(value='Pose Number (Selection 1)'), pose_number1], layout=form_item_layout),
               Box([Label(value='Pocket Number (Selection 2)'), pock_number2], layout=form_item_layout),
               Box([Label(value='Pose Number (Selection 2)'), pose_number2], layout=form_item_layout),
               Box([Label(value='Pocket Number (Selection 3)'), pock_number3], layout=form_item_layout),
               Box([Label(value='Pose Number (Selection 3)'), pose_number3], layout=form_item_layout),
               Box([Label(value='Pocket Number (Selection 4)'), pock_number4], layout=form_item_layout),
               Box([Label(value='Pose Number (Selection 4)'), pose_number4], layout=form_item_layout),
               Box([Label(value='Pocket Number (Selection 5)'), pock_number5], layout=form_item_layout),
               Box([Label(value='Pose Number (Selection 5)'), pose_number5], layout=form_item_layout)
              ]

form1 = Box(form_items1, layout=Layout(
    display='flex',
    flex_flow='column',
    border='solid 2px',
    align_items='stretch',
    width='50%'
))
form2 = Box(form_items2, layout=Layout(
    display='flex',
    flex_flow='column',
    border='solid 2px',
    align_items='stretch',
    width='50%'
))

form = HBox([form1, form2])
form

For each pose selected in the dropdowns made by the cell above, a viewer is created that contains the receptor (including its space-filling surface), the ligand pose generated by the docking engine, and the original ligand pose. All of the generated viewers are linked, so they rotate and move together. The original ligand pose may not be useful for ligands that were added by uploading a local mol2 file or by inputting a SMILES string.

In [80]:
# NEED TO TEST MORE
pocket_selection_list = [int(pock_number1.value), int(pock_number2.value), int(pock_number3.value), int(pock_number4.value), int(pock_number5.value)]
pose_selection_list = [int(pose_number1.value), int(pose_number2.value), int(pose_number3.value), int(pose_number4.value), int(pose_number5.value)]

#initialize py3dmol viewer
view = py3Dmol.view(height = 800, width = 900, viewergrid = (1,int(visual_number.value)), linked = True)
view.removeAllModels()
view.setViewStyle({'style':'outline','color':'black','width':0.1})

# view specified poses
num_sel = 0
while num_sel < int(visual_number.value): 
    # add receptor model to all py3dmol viewers
    view.addModel(open(f"data/PDB_files/{pdb_id}_protein_H.pdb",'r').read(),format='pdb')
    Prot=view.getModel(viewer = (0, num_sel))
    Prot.setStyle({'cartoon':{'arrows':True, 'tubes':True, 'style':'oval', 'color':'white'}}, viewer = (0, num_sel))
    view.addSurface(py3Dmol.VDW,{'opacity':0.6,'color':'white'}, viewer = (0, num_sel))
    
    # add reference model of ligand to py3dmol viewer
    view.addModel(open(f"data/MOL2_files/{ligand_number.value}_H.mol2",'r').read(),format='mol2')
    ref_m = view.getModel(viewer = (0, num_sel))
    ref_m.setStyle({},{'stick':{'colorscheme':'magentaCarbon','radius':0.2}})
    
    # add experimental docking data of a desired pose in a pocket to py3dmol viewer
    selected = Chem.SDMolSupplier(f"data/{select_dock.value}_out_2/{ligand_number.value}_pocket_{pocket_selection_list[num_sel]}_{select_dock.value}_out_2.sdf")
    p=Chem.MolToMolBlock(selected[pose_selection_list[num_sel] - 1],False)
    if select_dock.value == 'smina':
        print('Reference (' + str(ligand_number.value) + '): Magenta | Smina Pose (' + str(ligand_number.value) + '): Cyan')
        print ('Score: {}'.format(selected[pose_selection_list[num_sel] - 1].GetProp('minimizedAffinity')))
    else:
        print('Reference (' + str(ligand_number.value) + '): Magenta | Vina Pose (' + str(ligand_number.value) + '): Cyan')
        print ('Pose: {} | Score: {}'.format(selected[pose_selection_list[num_sel] - 1].GetProp('Pose'), selected[pose_selection_list[num_sel] - 1].GetProp('Score')))
    view.addModel(p,'mol')
    x = view.getModel(viewer = (0, num_sel))
    x.setStyle({},{'stick':{'colorscheme':'cyanCarbon','radius':0.2}})
    num_sel += 1
view.zoomTo()
view.show()

Reference (FSN501): Magenta | Smina Pose (FSN501): Cyan
Score: -8.89021
Reference (FSN501): Magenta | Smina Pose (FSN501): Cyan
Score: -8.70127
Reference (FSN501): Magenta | Smina Pose (FSN501): Cyan
Score: -8.89021


Using an interaction fingerprint (IFP), the interactions between the ligand and the receptor can be visualized with prolif's Complex3D class. Only one pose and its interactions can be viewed at a time.

In [1]:
# display interactions. select which one to view using dropdown
# pose numbers are 1-based, one per row of df
pose_pock_select = list(range(1, df.shape[0] + 1))
style = {'description_width': 'initial'}
select_pose = Dropdown(options = pose_pock_select, description = 'Select Pose to View:', style = style)
select_pose

NameError: name 'visual_number' is not defined

In [83]:
# dropdown values are 1-based, list indexes are 0-based
comp = Complex3D(all_ifps[select_pose.value - 1], all_ligand_plf[select_pose.value - 1], protein_plf)
comp.display()

<py3Dmol.view at 0x158bcfe60>