# Basil Docking V0.1 - Docking and Preliminary Analysis
## Purpose

__Target Audience__<br>
Undergraduate chemistry/biochemistry students and, in general, people that have little to no knowledge of protein-ligand docking and would like to understand the general process of docking a ligand to a protein receptor.

__Brief Overview__<br>
Molecular docking is a computational method used to predict where molecules are able to bind to a protein receptor and what interactions exist between the molecule (from now on, refered to as "ligand") and the receptor. It is a popular technique utilized in drug discovery and design, as when creating new drugs and testing existing drugs aginst new receptors, it is useful to determine the likelihood of binding prior to screening as it can be used to eliminate molecules that are unlikely to bind to the receptor. This significantly reduces the potential cost and time needed to test the efficacy of a set of possible ligands. <br>

The general steps to perform molecular docking, assuming the ligand and receptor are ready to be docked, include the generation of potential ligand binding poses and the scoring of each generated pose (which predicts how strongly the ligand binds to the receptor, with a more negative score corresponding to a stronger bond). To dock a ligand to a protein, (insert text).<br>

This notebook series encompasses 
1. The preparation needed prior to docking (protein and ligand sanitation, ensuring files are in readable formats, and finding possible binding pockets)
2. __The process of docking ligand/s to a protein receptor using two docking engines (VINA and SMINA) and visualizing/analyzing the outputs__
3. Further data collection and manipulation
4. Utilizing machine learning to determine key residues (on the protein) and functional groups (on the ligand) responsible for protein-ligand binding

__Stepwise summary for this notebook (docking and preliminary analysis, notebook 2 out of 4)__
- Get docking box sizes from docking-prep notebook
- Dock ligand to protein using either VINA or SMINA
- Visualize different poses of ligands docked to protein
- Visualize protein-ligand interactions of poses

The methods utilized by this notebook are based off of Angel J. Ruiz-Moreno's Jupyter-Dock notebooks, which can be found on their GitHub account AngelRuizMoreno

Ruiz-Moreno A.J. Jupyter Dock: Molecular Docking integrated in Jupyter Notebooks. https://doi.org/10.5281/zenodo.5514956

## Table of Libraries Used
### Operations, variable creation, and variable manipulation

| Module (Submodule/s)| Abbreviation | Role | Citation |
| :--- | :--- | :--- | :---|
| numpy | np | perform mathematical operations and fix NaN values in dataframe outputs | Harris, C.R., Millman, K.J., van der Walt, S.J. et al. Array programming with NumPy. Nature 585, 357–362 (2020). DOI: 10.1038/s41586-020-2649-2. (Publisher link). |
| pandas | pd | organize data in an easy-to-read format and allow for the exporting of data as a .csv file | The pandas development team. (2024). pandas-dev/pandas: Pandas (v2.2.3). Zenodo. https://doi.org/10.5281/zenodo.13819579 |
| re |n/a| regular expression; find and pull specific strings of characters depending on need, allow for easy naming and variable creation | Van Rossum, G. (2020). The Python Library Reference, release 3.8.2. Python Software Foundation. |
| os | n/a| allow for interaction with computer operating system, including the reading and writing of files |  Van Rossum, G. (2020). The Python Library Reference, release 3.8.2. Python Software Foundation. |
| sys |n/a| manipulate python runtime environment |  Van Rossum, G. (2020). The Python Library Reference, release 3.8.2. Python Software Foundation.|
| glob |n/a| pull files of interest, specifically for blind docking |  Van Rossum, G. (2020). The Python Library Reference, release 3.8.2. Python Software Foundation. |
| warnings | n/a | filter warnings | Van Rossum, G. (2020). The Python Library Reference, release 3.8.2. Python Software Foundation. |

### Visualization
| Module (Submodule/s)| Abbreviation | Role | Citation |
| :--- | :--- | :--- | :--- |
| py3Dmol | n/a | apoprotein and protein complex visualization |  Keshavan Seshadri, Peng Liu, and David Ryan Koes. Journal of Chemical Education 2020 97 (10), 3872-3876. https://doi.org/10.1021/acs.jchemed.0c00579. |

### Docking
| Module (Submodule/s)| Abbreviation | Role | Citation |
| :--- | :--- | :--- | :--- |
| vina | n/a | ligand-protein docking |  Eberhardt, J., Santos-Martins, D., Tillack, A.F., Forli, S. (2021). AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings. Journal of Chemical Information and Modeling. |
| --- | --- | --- | Trott, O., & Olson, A. J. (2010). AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. Journal of computational chemistry, 31(2), 455-461. |
| smina | n/a | ligand-protein docking |  Koes, D. R., Baumgartner, M. P., & Camacho, C. J. (2013). Lessons learned in empirical scoring with smina from the CSAR 2011 benchmarking exercise. Journal of chemical information and modeling, 53(8), 1893–1904. https://doi.org/10.1021/ci300604z |
| fpocket | n/a | find possible binding pockets in protein receptors | Le Guilloux, V., Schmidtke, P. & Tuffery, P. Fpocket: An open source platform for ligand pocket detection. BMC Bioinformatics 10, 168 (2009). https://doi.org/10.1186/1471-2105-10-168. |
|pdbqt_to_sdf | n/a | create sdf files from pdbqt files created from docking with vina | Ruiz-Moreno A.J. Jupyter Dock: Molecular Docking integrated in Jupyter Notebooks. https://doi.org/10.5281/zenodo.5514956 |

### Data analysis
| Module (Submodule/s)| Abbreviation | Role | Citation |
| :--- | :--- | :--- | :--- |
| rdkit (Chem)| n/a | reorder/retrieve ligand atoms, retrieve information from ligand sdf files for visualization and comparison  |  RDKit: Open-source cheminformatics; http://www.rdkit.org |
| MDAnalysis (PDB)| mda | allow for the selection of atoms for separating protein from ligands and ligands from each other | R. J. Gowers, M. Linke, J. Barnoud, T. J. E. Reddy, M. N. Melo, S. L. Seyler, D. L. Dotson, J. Domanski, S. Buchoux, I. M. Kenney, and O. Beckstein. MDAnalysis: A Python package for the rapid analysis of molecular dynamics simulations. In S. Benthall and S. Rostrup, editors, Proceedings of the 15th Python in Science Conference, pages 98-105, Austin, TX, 2016. SciPy, doi:10.25080/majora-629e541a-00e. |
| --- | --- | --- | N. Michaud-Agrawal, E. J. Denning, T. B. Woolf, and O. Beckstein. MDAnalysis: A Toolkit for the Analysis of Molecular Dynamics Simulations. J. Comput. Chem. 32 (2011), 2319-2327, doi:10.1002/jcc.21787. PMCID:PMC3144279. |
| prolif (Complex3D)| plf | calculate, record, and view protein-ligand interactions |  chemosim-lab/ProLIF: v0.3.3 - 2021-06-11.https://doi.org/10.5281/zenodo.4386984. |

### UI
| Module (Submodule/s)| Abbreviation | Role | Citation |
| :--- | :--- | :--- | :--- |
| IPython (ipywidgets)| n/a | allow for widgets to be implemented | Fernando Pérez, Brian E. Granger, IPython: A System for Interactive Scientific Computing, Computing in Science and Engineering, vol. 9, no. 3, pp. 21-29, May/June 2007, doi:10.1109/MCSE.2007.53. URL: https://ipython.org |
| ipywidgets (Layout, Label, Dropdown, Box)| widgets | create dropdowns for docking engine, pocket number, ligand, and pose selection | Fernando Pérez, Brian E. Granger, IPython: A System for Interactive Scientific Computing, Computing in Science and Engineering, vol. 9, no. 3, pp. 21-29, May/June 2007, doi:10.1109/MCSE.2007.53. URL: https://ipython.org |

In [95]:
import numpy as np
import pandas as pd
import numbers
import re
import sys, os
import glob
sys.path.insert(1, 'utilities/')
from utils import pdbqt_to_sdf

import py3Dmol
import ipywidgets as widgets
from ipywidgets import Layout, Label, Dropdown, Box, HBox, SelectMultiple

from vina import Vina

from openbabel import pybel
from rdkit import Chem
import MDAnalysis as mda 
from MDAnalysis.coordinates import PDB
import prolif as plf
from prolif.plotting.complex3d import Complex3D

sys.path.insert(1, 'utilities/ligandsplitter/ligandsplitter')
from ligandanalysis import get_vars, group_idxes_from_mol

import warnings
warnings.filterwarnings("ignore")
warnings.simplefilter(action='ignore', category=pd.errors.PerformanceWarning)

## Import values from docking-prep

Prior to docking, the data obtained from the previous notebook needs to be imported in order to be used. The glob library gets the pdb file containing the receptor, and the ligand information is obtained from the ligand_information.csv file created at the end of the previous notebook. The protein pocket information generated from fpocket is imported from the prot_pockets.csv file.

In [48]:
prot_pockets = pd.read_csv('data/protein_pockets.csv',index_col=[0])
ligand_information = pd.read_csv('data/ligand_information.csv')

pdb_id = ligand_information["pdb_id"][0]

In [18]:
ligs = []
filenames = []
filenames_H = []
filenames_pdbqt = []
center = []
size = []
for r in ligand_information.index:
    ligs.append(ligand_information["ligs"][r])
    filenames.append(ligand_information["filenames"][r])
    filenames_H.append(ligand_information["filenames_H"][r])
    filenames_pdbqt.append(ligand_information["filenames_pdbqt"][r])
    temp = [float(ligand_information["center_x"][r]), float(ligand_information["center_y"][r]), float(ligand_information["center_z"][r])]
    center.append(temp)
    temp1 = [float(ligand_information["size_x"][r]), float(ligand_information["size_y"][r]), float(ligand_information["size_z"][r])]
    size.append(temp1)

## Docking

This notebook utilizes two docking engines for molecular docking: VINA and SMINA. VINA is one of many docking engines available in AutoDock Suite, and is widely used due to its relatively quick docking speed and easy-to-use interface compared to the other docking engines in the suite. SMINA is a fork of VINA, and allows for the modification of scoring terms by users and also adds other functions that make the engine more convenient (allowing multi-ligand files such as .sdf files, improving minimization algorithms, adding additional term types, and allowing for multiple ligand molecular formats). 

<div class="alert alert-block alert-info">
<b>Please note:</b> 
VINA only has force field parameters for atoms of the following elements
<ul> <li>hydrogen</li> <li>carbon</li> <li>oxygen</li> <li>nitrogen</li> <li>phosphorous</li> <li>sulfur</li> <li>calcium</li> <li>maganese</li> <li>iron</li> <li>zinc</li> <li>halogens (fluorine, chlorine, bromine, iodine)</li> </ul>
For ligands containing atoms that are not listed above, it is recommended that users either 1) select all ligands other than those containing atoms that are not supported using the selection widget below or 2) only use SMINA as the docking engine. Trying to dock a ligand with an unsupported ligand using VINA will result in an error.

To select multiple ligands using the selection widget, hold down the control key (PC) or command key (Mac) while clicking on the names of each ligand you would like to dock
</div>

In [19]:
style = {'description_width': 'initial'}
select_ligs = SelectMultiple(options = ligs, description = 'Select Ligand/s to Dock:', style = style)
select_ligs

SelectMultiple(description='Select Ligand/s to Dock:', options=('SIN1', 'FSN501'), style=DescriptionStyle(desc…

Ligand docking can either be site-specific or blind. Site-specific docking uses a location of the receptor where we know the ligand binds, and uses the center and size of the ligand as determined in docking_prep. Blind docking attempts to bind the ligand in multiple potential pockets in the protein (determined using fpocket in docking_prep) and requires more computational energy to perform. The option selected in the dropbox below will determine the method used in this notebook.

In [20]:
style = {'description_width': 'initial'}
select_type = Dropdown(options = ["Site-specific docking","Blind docking"], description = 'Select Docking Type:', style = style)
select_type

Dropdown(description='Select Docking Type:', options=('Site-specific docking', 'Blind docking'), style=Descrip…

### Docking using VINA

Below is a step-by-step (cell-by-cell) guide on how the VINA docking engine is used to generate poses and scores for each pocket and ligand
- Prior to docking, two new folders are created in the data folder to organize the output data (vina_out and vina_out_2). 
- Using the information collected in the docking-prep notebook, each pocket's center values and size values are added to their respective lists, which are called pocket_center and pocket_size. In both lists, each instance is a list of the x, y, and z values corresponding to one pocket's data (as a result, pocket_center and pocket_size are nested lists, and the length of both lists is equal to the number of binding pockets)
    - For example, pocket_center may look like this: [[x1, y1, z1][x2, y2, z2][x3, y3, z3]]
- Using the pocket size and center lists and the pdbqt files for the receptor and desired ligand, ligand poses are generated for each binding pocket (the number of poses depends on the value of n_poses, which is set to 5 in this notebook). The amount of computational effort needed to generate the poses for a given pocket and ligand is called the exhaustiveness. As exhaustiveness increases, the more reproducible the results tend to be. While the default value of exhaustiveness is 8, this notebook uses an exhaustiveness of 5 due to memory limitations.
- The results of running the VINA docking engine are stored as pdbqt files and can be located in the vina_out folder. In order to analyze and vizualize the results, the pdbqt files are converted into sdf files using the function pdbqt_to_sdf (created by Angel Ruiz-Moreno), which can be found in the vina_out_2 folder. The names of each file follows the formula of `(ligand name)_vina_pocket_(pocket number).pdbqt` for the pdbqt files and `(ligand name)_pocket_(pocket number)_(name of folder).sdf` for the sdf files.

In [21]:
# Create paths for vina output files
current_dir = os.getcwd()
dataPath = os.path.join(current_dir, "data")

# create vina_out file path/dir, return error if exists
vina_out = os.path.join(dataPath, "vina_out")
try:
    os.mkdir(vina_out)
except OSError as error:
    print(error)

# create vina_out_2 file path/dir, return error if exists
vina_out_2 = os.path.join(dataPath, "vina_out_2")
try:
    os.mkdir(vina_out_2)
except OSError as error:
    print(error)

[Errno 17] File exists: '/Users/leesch/Desktop/BASIL/data/vina_out'
[Errno 17] File exists: '/Users/leesch/Desktop/BASIL/data/vina_out_2'


In [49]:
pocket_center = []
pocket_size = []
for i in select_ligs.value:
    for pocket in prot_pockets.index:
        c_x = prot_pockets.loc[pocket,'center_x']
        c_y = prot_pockets.loc[pocket,'center_y']
        c_z = prot_pockets.loc[pocket,'center_z']
        s_x = prot_pockets.loc[pocket,'size_x']
        s_y = prot_pockets.loc[pocket,'size_y']
        s_z = prot_pockets.loc[pocket,'size_z']
        pocket_center.append([c_x, c_y, c_z])
        pocket_size.append([s_x, s_y, s_z])

In [50]:
def vina_dock(ligand):
    v = Vina(sf_name='vina')
    v.set_receptor(f'data/PDBQT_files/{pdb_id}_protein.pdbqt')
    v.set_ligand_from_file(f"data/PDBQT_files/{ligand}_H.pdbqt")
    if select_type.value == "Blind docking":
        for pock_num, pocket in enumerate(prot_pockets.index):
            v.compute_vina_maps(center = pocket_center[pock_num], box_size = pocket_size[pock_num])
            v.dock(exhaustiveness=5, n_poses=5)
            v.write_poses("data/vina_out/" + str(ligand) + "_vina_pocket_" + str(pocket) + '.pdbqt', n_poses=5, overwrite=True)
    else:
        v.compute_vina_maps(center = center[ligs.index(ligand)], box_size = size[ligs.index(ligand)])
        v.dock(exhaustiveness=5, n_poses=5)
        v.write_poses("data/vina_out/" + str(ligand) + '.pdbqt', n_poses=5, overwrite=True)

In [51]:
for i in select_ligs.value:
    vina_dock(i)


mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -9.929          0          0
   2       -8.508        3.1      7.059
   3       -8.448      3.027      3.801
   4         -8.3      3.472      7.512
   5       -8.025       2.59      3.379
Computing Vina grid ... done.
Performing docking (random seed: -941108634) ... 
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -3.402          0          0
   2       -3.296      1.316      3.386
   3       -3.174      1.064      1.604
   4       -2.911      7.161      8.344
   5       -2.833      8.628      9.521
Computing Vina grid ... done.




Performing docking (random seed: -941108634) ... 
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -3.857          0          0
   2       -3.727     0.7923      3.478
   3       -3.512      1.096      1.367
   4       -1.914      9.189      9.763
Computing Vina grid ... done.
Performing docking (random seed: -941108634) ... 
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -3.682          0          0
   2       -3.512      1.415       3.93
   3       -3.309      2.316      2.768
   4       -2.746      1.4




mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -3.041          0          0
   2       -2.944      1.222      4.148
   3       -2.712      1.626      3.813
   4       -2.648      2.908      4.056
   5       -2.618      2.464      2.598
Computing Vina grid ... done.
Performing docking (random seed: -941108634) ... 
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -2.459          0          0
   2        -2.41     0.8961      4.262
   3       -2.375      2.498      4.273
   4       -2.285      2.248      3.233
   5        -2.18      1.941      2.524
Computing Vina grid ... done.
Performing docking (random seed: -941108634) ... 
0%   10   20   30   40   50   60   




mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -3.044          0          0
   2       -3.017     0.9377      3.386
   3       -2.793      1.464       1.82
   4       -2.705     0.6798      1.492
   5       -2.526      2.345      4.023
Computing Vina grid ... done.
Performing docking (random seed: -941108634) ... 
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1        -2.42          0          0
   2       -2.361      15.81      16.28
   3       -2.321      2.241      3.359
   4        -1.97      1.795      3.379
   5        82.02      14.86      16.04
Computing Vina grid ... done.
Performing docking (random seed: -941108634) ... 
0%   10   20   30   40   50   60   




mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -3.593          0          0
   2       -3.572      1.629      1.991
   3       -3.461      1.757      3.941
   4       -2.925       2.21      3.032
   5       -2.795      7.686       8.72
Computing Vina grid ... done.
Performing docking (random seed: -941108634) ... 
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -2.554          0          0
   2       -2.494      0.967       3.58
   3       -2.463      1.402      1.803
   4       -2.194      2.205      2.447
Computing Vina grid ... done.
Performing docking (random seed: -941108634) ... 
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----



Performing docking (random seed: -941108634) ... 
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -3.083          0          0
   2       -3.057      2.242      3.218
   3       -2.994      15.08      15.54
   4       -2.932      14.27      15.42
   5       -2.711      14.92      16.16
Computing Vina grid ... done.
Performing docking (random seed: -941108634) ... 
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -2.429          0          0
   2       -2.342      1.592      2.857
   3       -2.288      1.3




mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -2.635          0          0
   2       -2.627      1.111      3.903
   3       -2.557      2.582      3.835
   4       -2.483       1.58      2.033
   5       -2.467      2.667      3.007
Computing Vina grid ... done.
Performing docking (random seed: -941108634) ... 
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -3.418          0          0
   2       -3.358       1.19      1.673
   3       -3.249      1.276       3.54
   4       -2.917      7.318      8.499
Computing Vina grid ... done.
Performing docking (random seed: -941108634) ... 
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----




mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1        -2.72          0          0
   2        -2.71      1.873      2.627
   3       -2.582      7.666      8.716
   4       -2.529      8.348      9.534
   5       -2.524      8.897      9.456
Computing Vina grid ... done.
Performing docking (random seed: 123073671) ... 
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************





mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1          -10          0          0
   2       -8.541       2.03       2.48
   3       -8.527      2.987       3.74
   4       -8.366      3.065      6.991
   5       -8.359      3.488      7.496
Computing Vina grid ... done.
Performing docking (random seed: 123073671) ... 
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1           27          0          0
   2        52.15      3.273      7.386
   3        65.03      8.271      11.73
Computing Vina grid ... done.
Performing docking (random seed: 123073671) ... 
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
**********




mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1        36.01          0          0
   2        64.79      9.435      13.47
   3        73.02       8.72      13.42
   4        73.11      10.21      13.11
   5        77.18      8.051      12.41
Computing Vina grid ... done.
Performing docking (random seed: 123073671) ... 
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -6.988          0          0
   2       -6.471      1.948      2.657
   3       -5.995      3.649      7.754
   4       -5.989       3.48      7.654
   5       -5.877      3.668      6.484
Computing Vina grid ... done.
Performing docking (random seed: 123073671) ... 
0%   10   20   30   40   50   60   70




mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -5.131          0          0
   2       -4.972      4.228       6.25
   3       -4.878      3.113      6.511
   4       -4.736      2.828      6.118
   5       -4.635      3.335      7.092
Computing Vina grid ... done.
Performing docking (random seed: 123073671) ... 
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -6.794          0          0
   2        -6.18       6.66      9.567
   3       -6.146      3.445      5.937
   4       -6.074       4.07      5.477
   5       -5.745      4.506      6.514
Computing Vina grid ... done.
Performing docking (random seed: 123073671) ... 
0%   10   20   30   40   50   60   70



Performing docking (random seed: 123073671) ... 
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -6.816          0          0
   2        -6.43      3.761      6.683
   3       -6.413      12.73      15.97
   4       -6.403      2.653      5.931
   5       -6.335      13.52      16.08
Computing Vina grid ... done.
Performing docking (random seed: 123073671) ... 
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************





mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -5.176          0          0
   2       -5.138      3.397      6.889
   3       -5.023      3.118      7.001
   4       -4.848       2.87      5.184
   5       -4.845      3.066      4.416
Computing Vina grid ... done.
Performing docking (random seed: 123073671) ... 
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -4.266          0          0
   2       -4.226      2.599      3.958
   3       -3.919      3.588      7.159
   4        -3.79       2.05      2.756
   5       -2.122      3.076      6.603
Computing Vina grid ... done.
Performing docking (random seed: 123073671) ... 
0%   10   20   30   40   50   60   70




mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -1.488          0          0
   2       0.4555      2.816      4.208
   3        58.83      4.406      6.716
   4        158.9      4.748      7.396
   5         1249      2.002      2.733
Computing Vina grid ... done.
Performing docking (random seed: 123073671) ... 
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -5.271          0          0
   2       -5.114      2.469      3.176
   3       -4.611      3.603       7.61
   4       -4.372      2.539      3.222
   5       -4.092      3.398      6.991
Computing Vina grid ... done.
Performing docking (random seed: 123073671) ... 
0%   10   20   30   40   50   60   70



done.
Performing docking (random seed: 123073671) ... 
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -7.886          0          0
   2       -7.797      1.528      2.154
   3       -7.712      3.622       7.41
   4       -7.517      3.061      6.801
   5       -7.296      3.005      6.959
Computing Vina grid ... done.
Performing docking (random seed: 123073671) ... 
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************




In [52]:
# Create sdf files from pdbqt
for i in select_ligs.value:
    if select_type.value == "Blind docking":
        for pocket in prot_pockets.index:
            pdbqt_to_sdf(pdbqt_file=f"data/vina_out/{i}_vina_pocket_{pocket}.pdbqt",output=f"data/vina_out_2/{i}_pocket_{pocket}_vina_out_2.sdf")
    else:
        pdbqt_to_sdf(pdbqt_file=f"data/vina_out/{i}.pdbqt",output=f"data/vina_out_2/{i}_vina_out_2.sdf")

### Docking using SMINA

Below is a step-by-step (cell-by-cell) guide on how the SMINA docking engine is used to generate poses and scores for each pocket and ligand
- Prior to docking, two new folders are created in the data folder to organize the output data (smina_out and smina_out_2). The path for the smina docking engine executable is also initialized to allow for the docking engine to be used, as it is a local file.
- Using the the pdbqt file for the receptor, the mol2 file for the desired ligand, and the pocket center/size values from the prot_pockets dataframe, ligand poses are generated for each binding pocket (the number of poses depends on the value of num_modes, which is set to 5 in this notebook). The amount of computational effort needed to generate the poses for a given pocket and ligand is called the exhaustiveness. As exhaustiveness increases, the more reproducible the results tend to be. While the default value of exhaustiveness is 8, this notebook uses an exhaustiveness of 5 due to memory limitations.
- The results of running the SMINA docking engine are stored as sdf files and can be located in the smina_out folder. However, due to the fact that the output files do not have a flag marking it as three dimensional, the sdf files must be read using SDMolSupplier and re-written using SDWriter to avoid excessive errors. The re-written sdf files can be found in the smina_out_2 folder. The names of each file follows the formula of `(ligand name)_pocket_(pocket number)_(name of folder).sdf` for the sdf files.

In [85]:
# Create paths for smina software and output files
current_dir = os.getcwd()
dataPath = os.path.join(current_dir, "data")

# create smina_out file path/dir, return error if exists
smina_out = os.path.join(dataPath, "smina_out")
try:
    os.mkdir(smina_out)
except OSError as error:
    print(error)

# create smina_out_2 file path/dir, return error if exists
smina_out_2 = os.path.join(dataPath, "smina_out_2")
try:
    os.mkdir(smina_out_2)
except OSError as error:
    print(error)

[Errno 17] File exists: '/Users/leesch/Desktop/BASIL/data/smina_out'
[Errno 17] File exists: '/Users/leesch/Desktop/BASIL/data/smina_out_2'


In [None]:
# Using SMINA to dock ligand/s in docking boxes based on fpocket's identified pockets
d = 0
for i in select_ligs.value: 
    if select_type.value == "Blind docking":
        for pock_num, pocket in enumerate(prot_pockets.index):
            rec = f'data/PDBQT_files/{pdb_id}_protein.pdbqt'
            lig = f'data/MOL2_files/{i}_H.mol2'
            outfile = f'data/smina_out/{i}_pocket_{pocket}_smina_out.sdf'
            ! smina -r {rec} -l {lig} -o {outfile} -center_x {pocket_center[pock_num][0]} -center_y {pocket_center[pock_num][1]} -center_z {pocket_center[pock_num][2]} -size_x {pocket_size[pock_num][0]} -size_y {pocket_size[pock_num][1]} -size_z {pocket_size[pock_num][2]} --exhaustiveness 5 --num_modes 5
    else:
        rec = f'data/PDBQT_files/{pdb_id}_protein.pdbqt'
        lig = f'data/MOL2_files/{i}_H.mol2'
        outfile = f'data/smina_out/{i}_smina_out.sdf'
        ! smina -r {rec} -l {lig} -o {outfile} --autobox_ligand {lig} --autobox_add 5 --exhaustiveness 5 --num_modes 5

In [None]:
# Rewrite .sdf output files to add 3D tag
# This code will result in warnings. This is normal as long as the warning is
# "Warning: molecule is tagged as 2D, but at least one Z coordinate is not zero. Marking the mol as 3D."
mols_all = []
for i in select_ligs.value:
    mols = []
    if select_type.value == "Blind docking":
        for pocket in prot_pockets.index:
            with Chem.SDMolSupplier(f'data/smina_out/{i}_pocket_{pocket}_smina_out.sdf') as suppl:
                for mol in suppl:
                    if mol is not None:
                        Chem.MolToMolBlock(mol)
                        mols.append(mol)
            with Chem.SDWriter(f"data/smina_out_2/{i}_pocket_{pocket}_smina_out_2.sdf") as w:
                for mol in mols:
                    w.write(mol)
    else:
        with Chem.SDMolSupplier(f'data/smina_out/{i}_smina_out.sdf') as suppl:
            for mol in suppl:
                if mol is not None:
                    Chem.MolToMolBlock(mol)
                    mols.append(mol)
        with Chem.SDWriter(f"data/smina_out_2/{i}_smina_out_2.sdf") as w:
            for mol in mols:
                w.write(mol)

## Analysis of docking output

Now that we have results from molecular docking, we need to make sense of the information. If you were to open the sdf files in a text editor, you would see x, y, and z coordinates for each atom in the ligand, the bond types between atoms in the ligand, and the score of the ligand pose. While useful, this information is difficult to interpret and visualize. To get information regarding the number of interactions, the types of interaction, and the atoms (ligand) and residues (receptor) involved in binding the ligand to the receptor, interaction fingerprints (IFPs) can be generated and viewed using the prolif library, which can be used to identify key atoms in the ligand and key residues in the receptor involved in protein-ligand complex formation.

In [86]:
# load protein
prot_mol = Chem.MolFromPDBFile("data/PDB_files/" + str(pdb_id) + "_protein_H.pdb")
protein_plf = plf.Molecule.from_rdkit(prot_mol)

In [87]:
style = {'description_width': 'initial'}
select_dock = Dropdown(options = [('smina'), ('vina')], description = 'Select the docking engine that was used:', style = style)
select_dock

Dropdown(description='Select the docking engine that was used:', options=('smina', 'vina'), style=DescriptionS…

In [88]:
all_ligand_plf = []
ligand_plf = []
all_df = []
all_ifps = []
for i in select_ligs.value:
    if select_type.value == "Blind docking":
        for pocket in prot_pockets.index:
            lig_suppl = plf.sdf_supplier(f"data/{select_dock.value}_out_2/{i}_pocket_{pocket}_{select_dock.value}_out_2.sdf")
            fp = plf.Fingerprint(count=True)
            fp.run_from_iterable(lig_suppl, protein_plf)
            results_df = fp.to_dataframe()
            all_df.append(results_df)
            for lig in lig_suppl:
                all_ligand_plf.append(lig)
                ifp = fp.generate(lig, protein_plf, metadata = True)
                all_ifps.append(ifp)
    else:
        lig_suppl = plf.sdf_supplier(f"data/{select_dock.value}_out_2/{i}_{select_dock.value}_out_2.sdf")
        fp = plf.Fingerprint(count=True)
        fp.run_from_iterable(lig_suppl, protein_plf)
        results_df = fp.to_dataframe()
        all_df.append(results_df)
        for lig in lig_suppl:
            all_ligand_plf.append(lig)
            ifp = fp.generate(lig, protein_plf, metadata = True)
            all_ifps.append(ifp)

  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/4 [00:00<?, ?it/s]

  0%|          | 0/4 [00:00<?, ?it/s]

  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/4 [00:00<?, ?it/s]

  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/4 [00:00<?, ?it/s]

  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/4 [00:00<?, ?it/s]

  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/4 [00:00<?, ?it/s]

  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/1 [00:00<?, ?it/s]

  0%|          | 0/1 [00:00<?, ?it/s]

  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/4 [00:00<?, ?it/s]

  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/2 [00:00<?, ?it/s]

  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/4 [00:00<?, ?it/s]

  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/2 [00:00<?, ?it/s]

  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/5 [00:00<?, ?it/s]

In [55]:
all_results = []
scores = [] # get list of scores for each pose
for h, i in enumerate(select_ligs.value):
    # initialize list that contains values all poses in all pockets (for 1 ligand at a time)
    nested_results = []
    # append pose data for each pocket to nested_results
    if select_type.value == "Blind docking":
        for pocket in prot_pockets.index:
            results = Chem.SDMolSupplier(f"data/{select_dock.value}_out_2/{i}_pocket_{pocket}_{select_dock.value}_out_2.sdf")
            nested_results.append(results)
    else:
        results = Chem.SDMolSupplier(f"data/{select_dock.value}_out_2/{i}_{select_dock.value}_out_2.sdf")
        nested_results.append(results) 
    # add all values in nested_results to allResults list
    all_results.append(nested_results)
    
# get score values for every pose in allResults
if select_type.value == "Blind docking":
    for linenum, i in enumerate(all_results):
        for num, pocket in enumerate(i):
            for num2, pose in enumerate(pocket):
                if select_dock.value == "smina":
                    scores.append(float(all_results[linenum][num][num2].GetProp('minimizedAffinity')))
                else:
                    scores.append(float(all_results[linenum][num][num2].GetProp('Score')))
else:
    for linenum, i in enumerate(all_results):
        for num, pose in enumerate(i):
            for num2, item in enumerate(pose):
                if select_dock.value == "smina":
                    scores.append(float(all_results[linenum][num][num2].GetProp('minimizedAffinity')))
                else:
                    scores.append(float(all_results[linenum][num][num2].GetProp('Score')))

In [66]:
# prot pockets not listed - try to fix?
df = pd.concat([d for d in all_df], axis=0, ignore_index=False, sort=False).reset_index()
df.insert(1, "Score", pd.Series(scores))
df = df.fillna(0)

While the dataframe generated using the prolif library has a lot of useful information, we are also going to add the distance between interacting ligand and protein atoms, the indexes of both the ligand and protein atoms involved in the interaction, and the functional group the ligand's atom is a member of if applicable.

In [134]:
# fix for site specific
if select_type.value == "Blind docking":
    #df2 = df[["cav_id", "Frame", "Score"]].copy()
    df2 = df[["Frame", "Score", "UNL1"]].copy()
else:
    df2 = df[["Frame", "Score", "UNL1"]].copy()

In [135]:
df2

Unnamed: 0_level_0,Frame,Score,UNL1,UNL1,UNL1,UNL1,UNL1,UNL1,UNL1,UNL1,UNL1,UNL1,UNL1,UNL1,UNL1,UNL1,UNL1,UNL1,UNL1,UNL1,UNL1
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,ILE24.H,GLY25.H,GLY69.H,LYS70.H,GLU164.H,SER27.H,HIS57.H,HIS57.H,...,SER195.H,VAL213.H,VAL213.H,TYR76.H,ILE82.H,ILE82.H,MET84.H,MET84.H,LYS110.H,PRO111.H
Unnamed: 0_level_2,Unnamed: 1_level_2,Unnamed: 2_level_2,VdWContact,VdWContact,VdWContact,VdWContact,VdWContact,VdWContact,Hydrophobic,PiStacking,...,VdWContact,Hydrophobic,VdWContact,PiStacking,Hydrophobic,VdWContact,Hydrophobic,VdWContact,PiCation,VdWContact
0,0,-3.402,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,1,-3.296,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,2,-3.174,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,3,-2.911,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,4,-2.833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
193,0,-6.095,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,1.0,1.0,5.0,1.0,1.0,0.0
194,1,-5.957,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,6.0,0.0,1.0,0.0
195,2,-5.874,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0
196,3,-5.335,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [136]:
largest_array_column = {}
for col_num, column in enumerate(df):
    largest_array = 0
    if col_num > 1:
        for row in df[column]:
            if int(row) > largest_array:
                largest_array = int(row)
        largest_array_column[column] = largest_array

In [137]:
%%capture
# create new columns for functional group, residue type, distance, and index information
# LEE NOTE TO LEE: fix how columns are added. make pretty
col_names_list = []
residues = []
interactions = []
total_counter = 0
for key in all_ifps:
    for key_new in key:
        for key_2 in key[key_new]:
            print(key_new)
            residues.append(str(key_new[1]))
            interactions.append(str(key_2))
            lig_name = str(key_new[0])
            res_name = str(key_new[1])
            column_name = (lig_name, res_name, key_2)
            if column_name not in col_names_list:
                df2[column_name] = df[column_name]
                number_of_ints = int(largest_array_column[column_name])
                counter_ind = 0
                while counter_ind < number_of_ints:
                    df2[(lig_name, res_name, f"Functional group involved ({key_2}){counter_ind}")] = pd.Series([0] * df.shape[0])
                    df2[(lig_name, res_name, f"Residue type({key_2}){counter_ind}")] = pd.Series([0] * df.shape[0])
                    df2[(lig_name, res_name, f"Distance ({key_2}){counter_ind}")] = pd.Series([0] * df.shape[0])
                    df2[(lig_name, res_name, f"Index 1 (Ligand) ({key_2}){counter_ind}")] = pd.Series([0] * df.shape[0])
                    df2[(lig_name, res_name, f"Index 2 (Ligand) ({key_2}){counter_ind}")] = pd.Series([0] * df.shape[0])
                    df2[(lig_name, res_name, f"Index 3 (Protein) ({key_2}){counter_ind}")] = pd.Series([0] * df.shape[0])
                    df2[(lig_name, res_name, f"Index 4 (Protein) ({key_2}){counter_ind}")] = pd.Series([0] * df.shape[0])
                    counter_ind += 1
            total_counter += 1

In [138]:
df2 = df2.astype(object)

To get the functional groups in each ligand, a dictionary is created where the keys are the indexes of atoms determined to be in a functional group, and the corresponding value is the name of the functional group. Due to keys being unable to be used more than once in a dictionary, atoms that are members of two or more functional groups will only have one of their functional groups listed as the value.

In [139]:
# create amino acid residue and functional group dictionaries
all_func_groups, type_dict, functional_groups, functional_groups_dict, groups_to_numbers, groups_dict = get_vars()

#LEE NOTE TO LEE: changes made above makes this cell non-functional. fix
# find atom indices for ligand and protein, functional groups involved (ligand), residue type (protein), and
# distance between ligand and protein in interaction
total_counter = 0
for number, key in enumerate(all_ifps):
    for key_new in key:
        for key_2 in key[key_new]:
            lig_name = str(key_new[0])
            res_name = str(key_new[1])
            column_name = (lig_name, res_name, key_2)
            get_pose = df2["Frame"][number]
            x = key[key_new]
            y = x[key_2]
            df_groups = [0] * largest_array_column[column_name]
            df_residue = [0] * largest_array_column[column_name]
            df_distance = [0] * largest_array_column[column_name]
            df_ind_1 = [0] * largest_array_column[column_name]
            df_ind_2 = [0] * largest_array_column[column_name]
            df_ind_3 = [0] * largest_array_column[column_name]
            df_ind_4 = [0] * largest_array_column[column_name]
            for inst_num, instance in enumerate(y):
                distance = instance["distance"]
                df_distance[inst_num] = distance
                found_res = res_name[:3]
                df_residue[inst_num] = (type_dict[found_res])
                parent_index = instance["parent_indices"]
                if len(parent_index["ligand"]) == 2:
                    df_ind_1[inst_num] = parent_index["ligand"][0]
                    df_ind_2[inst_num] = parent_index["ligand"][1]
                else:
                    df_ind_1[inst_num] = parent_index["ligand"][0]
                    df_ind_2[inst_num] = 0
                if len(parent_index["protein"]) == 2:
                    df_ind_3[inst_num] = parent_index["protein"][0]
                    df_ind_4[inst_num] = parent_index["protein"][1]
                else:
                    df_ind_3[inst_num] = parent_index["protein"][0]
                    df_ind_4[inst_num] = 0
                current = all_ligand_plf[number]
                group_ints = group_idxes_from_mol(current)
                for value in group_ints.keys():
                    if len(parent_index["ligand"]) == 2:
                        if value == parent_index["ligand"][0] | value == parent_index["ligand"][1]:
                            df_groups[inst_num] = int(groups_to_numbers[group_ints[value]])
                    else:
                        if value == parent_index["ligand"][0]:
                            df_groups[inst_num] = int(groups_to_numbers[group_ints[value]])
            number_of_ints = int(largest_array_column[column_name])
            counter_ind = 0
            while counter < number_of_ints:
                df2.at[number, (lig_name, res_name, f"Functional group involved ({key_2}){counter_ind}")] = pd.array(df_groups, dtype=int)
                df2.at[number, (lig_name, res_name, f"Residue type({key_2}){counter_ind}")] = pd.array(df_residue, dtype=int)
                df2.at[number, (lig_name, res_name, f"Distance ({key_2}){counter_ind}")] = pd.array(df_distance, dtype=int)
                df2.at[number, (lig_name, res_name, f"Index 1 (Ligand) ({key_2}){counter_ind}")] = pd.array(df_ind_1, dtype=int)
                df2.at[number, (lig_name, res_name, f"Index 2 (Ligand) ({key_2}){counter_ind}")] = pd.array(df_ind_2, dtype=int)
                df2.at[number, (lig_name, res_name, f"Index 3 (Protein) ({key_2}){counter_ind}")] = pd.array(df_ind_3, dtype=int)
                df2.at[number, (lig_name, res_name, f"Index 4 (Protein) ({key_2}){counter_ind}")] = pd.array(df_ind_4, dtype=int)
                counter_ind += 1
            total_counter += 1

In [140]:
df2 = df2.convert_dtypes()
df2.to_csv('data/docking_information.csv', index = False)
df2

Unnamed: 0_level_0,Frame,Score,UNL1,UNL1,UNL1,UNL1,UNL1,UNL1,UNL1,UNL1,UNL1,UNL1,UNL1,UNL1,UNL1,UNL1,UNL1,UNL1,UNL1,UNL1,UNL1
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,ILE24.H,GLY25.H,GLY69.H,LYS70.H,GLU164.H,SER27.H,HIS57.H,HIS57.H,...,ILE82.H,ILE82.H,ILE82.H,PRO111.H,PRO111.H,PRO111.H,PRO111.H,PRO111.H,PRO111.H,PRO111.H
Unnamed: 0_level_2,Unnamed: 1_level_2,Unnamed: 2_level_2,VdWContact,VdWContact,VdWContact,VdWContact,VdWContact,VdWContact,Hydrophobic,PiStacking,...,Index 2 (Ligand) (VdWContact)0,Index 3 (Protein) (VdWContact)0,Index 4 (Protein) (VdWContact)0,Functional group involved (VdWContact)0,Residue type(VdWContact)0,Distance (VdWContact)0,Index 1 (Ligand) (VdWContact)0,Index 2 (Ligand) (VdWContact)0,Index 3 (Protein) (VdWContact)0,Index 4 (Protein) (VdWContact)0
0,0,-3.402,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1,-3.296,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,2,-3.174,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,3,-2.911,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,4,-2.833,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
193,0,-6.095,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
194,1,-5.957,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
195,2,-5.874,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
196,3,-5.335,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Using the dropdown created by the cell below, up to five ligand poses can be selected to be viewed with the receptor. The number of viewers created by py3Dmol depends on the value of the "Number of Poses to View" dropdown. If multiple ligands were docked, all poses selected to be viewed will be of the same ligand. To select specific poses to view, the pocket number and the pose number corresponding to the desired pose must be selected in the dropdown, making sure that they belong to the same selection as noted in the parentheses of the dropdown's label.

In [94]:
# fix for site specific
pocket_list = []
for pocket in prot_pockets.index:
    pocket_list.append((str(pocket), int(pocket)))

form_item_layout = Layout(
    display='flex',
    flex_flow='row',
    justify_content='space-between')

ligand_number = Dropdown(options = select_ligs.value)
visual_number = Dropdown(options = [('1', 1), ('2', 2), ('3', 3), ('4', 4), ('5', 5)])

pock_number1 = Dropdown(options = pocket_list)
pose_number1 = Dropdown(options = [('1', 1), ('2', 2), ('3', 3), ('4', 4), ('5', 5)])
pock_number2 = Dropdown(options = pocket_list)
pose_number2 = Dropdown(options = [('1', 1), ('2', 2), ('3', 3), ('4', 4), ('5', 5)])
pock_number3 = Dropdown(options = pocket_list)
pose_number3 = Dropdown(options = [('1', 1), ('2', 2), ('3', 3), ('4', 4), ('5', 5)])
pock_number4 = Dropdown(options = pocket_list)
pose_number4 = Dropdown(options = [('1', 1), ('2', 2), ('3', 3), ('4', 4), ('5', 5)])
pock_number5 = Dropdown(options = pocket_list)
pose_number5 = Dropdown(options = [('1', 1), ('2', 2), ('3', 3), ('4', 4), ('5', 5)])

form_items1 = [Box([Label(value='Ligand'), ligand_number], layout=form_item_layout),
              Box([Label(value='Number of Poses to View'), visual_number], layout=form_item_layout)]

form_items2 = [Box([Label(value='Pocket Number (Selection 1)'), pock_number1], layout=form_item_layout),
               Box([Label(value='Pose Number (Selection 1)'), pose_number1], layout=form_item_layout),
               Box([Label(value='Pocket Number (Selection 2)'), pock_number2], layout=form_item_layout),
               Box([Label(value='Pose Number (Selection 2)'), pose_number2], layout=form_item_layout),
               Box([Label(value='Pocket Number (Selection 3)'), pock_number3], layout=form_item_layout),
               Box([Label(value='Pose Number (Selection 3)'), pose_number3], layout=form_item_layout),
               Box([Label(value='Pocket Number (Selection 4)'), pock_number4], layout=form_item_layout),
               Box([Label(value='Pose Number (Selection 4)'), pose_number4], layout=form_item_layout),
               Box([Label(value='Pocket Number (Selection 5)'), pock_number5], layout=form_item_layout),
               Box([Label(value='Pose Number (Selection 5)'), pose_number5], layout=form_item_layout)
              ]

form1 = Box(form_items1, layout=Layout(
    display='flex',
    flex_flow='column',
    border='solid 2px',
    align_items='stretch',
    width='50%'
))
form2 = Box(form_items2, layout=Layout(
    display='flex',
    flex_flow='column',
    border='solid 2px',
    align_items='stretch',
    width='50%'
))

form = HBox([form1, form2])
form

HBox(children=(Box(children=(Box(children=(Label(value='Ligand'), Dropdown(options=('SIN1', 'FSN501'), value='…

For each pose selected in the dropdown made by the cell above, a viewer containing the receptor (including the space-filling surface of the receptor), the generated ligand pose by the docking engine, and the original ligand pose will be created. All of the generated viewers are linked, so all viewers will rotate and move together. The original ligand pose may not be useful for ligands that were added by uploading a local mol2 file or by inputting a SMILES string.

In [None]:
# NEED TO TEST MORE
# fix for site specific
pocket_selection_list = [int(pock_number1.value), int(pock_number2.value), int(pock_number3.value), int(pock_number4.value), int(pock_number5.value)]
pose_selection_list = [int(pose_number1.value), int(pose_number2.value), int(pose_number3.value), int(pose_number4.value), int(pose_number5.value)]

#initialize py3dmol viewer
view = py3Dmol.view(height = 800, width = 900, viewergrid = (1,int(visual_number.value)), linked = True)
view.removeAllModels()
view.setViewStyle({'style':'outline','color':'black','width':0.1})

# view specified poses
num_sel = 0
while num_sel < int(visual_number.value): 
    # add receptor model to all py3dmol viewers
    view.addModel(open(f"data/PDB_files/{pdb_id}_protein_H.pdb",'r').read(),format='pdb')
    Prot=view.getModel(viewer = (0, num_sel))
    Prot.setStyle({'cartoon':{'arrows':True, 'tubes':True, 'style':'oval', 'color':'white'}}, viewer = (0, num_sel))
    view.addSurface(py3Dmol.VDW,{'opacity':0.6,'color':'white'}, viewer = (0, num_sel))
    
    # add reference model of ligand to py3dmol viewer
    view.addModel(open(f"data/MOL2_files/{ligand_number.value}_H.mol2",'r').read(),format='mol2')
    ref_m = view.getModel(viewer = (0, num_sel))
    ref_m.setStyle({},{'stick':{'colorscheme':'magentaCarbon','radius':0.2}})
    
    # add experimental docking data of a desired pose in a pocket to py3dmol viewer
    selected = Chem.SDMolSupplier(f"data/{select_dock.value}_out_2/{ligand_number.value}_pocket_{pocket_selection_list[num_sel]}_{select_dock.value}_out_2.sdf")
    p=Chem.MolToMolBlock(selected[pose_selection_list[num_sel] - 1],False)
    if dock_engine.value == 'smina':
        print('Reference (' + str(ligand_number.value) + '): Magenta | Smina Pose (' + str(ligand_number.value) + '): Cyan')
        print ('Score: {}'.format(selected[pose_selection_list[num_sel] - 1].GetProp('minimizedAffinity')))
    else:
        print('Reference (' + str(ligand_number.value) + '): Magenta | Vina Pose (' + str(ligand_number.value) + '): Cyan')
        print ('Pose: {} | Score: {}'.format(selected[pose_selection_list[num_sel] - 1].GetProp('Pose'), selected[pose_selection_list[num_sel] - 1].GetProp('Score')))
    view.addModel(p,'mol')
    x = view.getModel(viewer = (0, num_sel))
    x.setStyle({},{'stick':{'colorscheme':'cyanCarbon','radius':0.2}})
    num_sel += 1
view.zoomTo()
view.show()

Using an IFP, the interactions between the ligand and the receptor can be visualized using prolif's Complex3D submodule. Only one pose and its interactions can be viewed at a time.

In [None]:
# display interactions. select which one to view using dropdown
pose_pock_select = []
a = 0
while a < int(df.shape[0]):
    pose_pock_select.append(a + 1)
    a += 1
style = {'description_width': 'initial'}
select_pose = Dropdown(options = pose_pock_select, description = 'Select Pose to View:', style = style)
select_pose

In [None]:
comp = Complex3D(all_ifps[select_pose.value], all_ligand_plf[select_pose.value], protein_plf)
comp.display()