# Docking

This is the third of our four notebooks. As a review, the list below describes each notebook so that we can see what we have done so far and where we are headed.

1. *EC_class_ligands_search.ipynb*. In the first notebook, we used scripts to download ligands that are bound to trypsin structures found in the [RCSB Protein Data Bank](https://rcsb.org).
1. *molecule_manipulation.ipynb*. In the second notebook, we converted one of the ligands into a format called pdbqt that can be used for docking. We also made two modified versions of the same ligand in pdbqt format.
1. *docking.ipynb*. In the third notebook, we will conduct the docking exercise using a Python library called vina, which is based on AutoDock Vina.
1. *binding_site_investigation*. In the fourth notebook, we will visually explore the results of our docking exercise.

In a docking exercise, we ask the computer to predict how well a small molecule, or ligand, will bind to a large molecule such as a protein. In this notebook we will explore the binding of the three small molecules we prepared in the [last notebook](molecule_manipulation.ipynb) to an experimentally determined trypsin structure from *Bos taurus*, the cow. The structure, PDB entry 2ZQ2, can be found in the [RCSB Protein Data Bank](https://rcsb.org).

Learning Goals

* To see how a ligand binds to a protein
* To understand the limitations of a docking study

Learning Objectives

* To import and use the libraries listed below to conduct the docking study
* To understand the role of the functions in each of these libraries in the docking process
* To generate and interpret the quantitative docking results for each of the ligands

Libraries


| Library    | Description     |
| :-----------: | :------------ |
| os | Standard library module for interacting with the operating system; used here to create directories and handle file paths in a way that works on macOS, Windows and Linux |
| Bio | Biopython library that is rich in functions for parsing and manipulating protein sequences and structures |
| MDAnalysis | A Python library for analyzing trajectories in molecular dynamics simulations; used here to select atoms from the structure |
| openbabel | A Python library for converting 3D structures among a variety of formats |
| vina | Python bindings for AutoDock Vina for docking ligands on macromolecules |


In [1]:
import os
from Bio.PDB import PDBList

# make a directory for docking files
os.makedirs("docking", exist_ok=True)

pdb_id = "2zq2"

pdb_list = PDBList() 
pdb_list.retrieve_pdb_file(pdb_id, pdir='docking', file_format='pdb')

Structure exists: 'docking/pdb2zq2.ent' 


'docking/pdb2zq2.ent'

In [2]:
# Use MDAnalysis to isolate the protein

import MDAnalysis as mda

# Load into MDA universe
u = mda.Universe(f"docking/pdb{pdb_id}.ent")

# Select protein atoms
protein = u.select_atoms("protein")

# Write protein to new PDB file
protein.write(f"docking/{pdb_id}_protein.pdb")
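
A later cell selects the bound ligand by its residue name, 13U. One way to discover such names is to scan the HETATM records of the PDB file. The sketch below does this in plain Python, assuming the standard fixed-column PDB layout (residue name in columns 18-20). The helper names `hetero_residue_counts` and `make_hetatm` and the sample records are made up for illustration.

```python
from collections import Counter


def hetero_residue_counts(pdb_lines):
    """Count HETATM atoms per residue name (PDB columns 18-20)."""
    counts = Counter()
    for line in pdb_lines:
        if line.startswith("HETATM"):
            counts[line[17:20].strip()] += 1
    return counts


def make_hetatm(serial, atom, resname, chain, resseq, x, y, z):
    """Build a fixed-column HETATM record for the demonstration below."""
    return (f"HETATM{serial:>5} {atom:<4} {resname:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


# Made-up records: two ligand atoms, one water, one calcium ion
sample = [
    make_hetatm(1, "C1", "13U", "A", 301, 18.0, -8.0, 10.5),
    make_hetatm(2, "N1", "13U", "A", 301, 19.0, -7.5, 11.0),
    make_hetatm(3, "O", "HOH", "A", 401, 5.0, 5.0, 5.0),
    make_hetatm(4, "CA", "CA", "A", 302, 1.0, 2.0, 3.0),
]
counts = hetero_residue_counts(sample)
print(dict(counts))
```

With the downloaded structure, the same function could be called on the open file, for example `hetero_residue_counts(open("docking/pdb2zq2.ent"))`.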




In [3]:
# "Fix" protein with openbabel
from openbabel import pybel

pybel_read = pybel.readfile("pdb", f"docking/{pdb_id}_protein.pdb")

In [4]:
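# pybel.readfile returns a generator; the PDB file holds a single structure
for structure in pybel_read:
    protein = structure

# Adjust protonation states for pH 7.4, then add explicit hydrogens
protein.OBMol.CorrectForPH(7.4)
protein.addh()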

  Failed to kekulize aromatic bonds in OBMol::PerceiveBondOrders (title is docking/2zq2_protein.pdb)



In [5]:
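# Calculate the Gasteiger partial charge for each atom.
# The returned value is not used; the call ensures charges are assigned before writing.
for atom in protein.atoms:
    charge = atom.OBAtom.GetPartialCharge()

# Write protein to new PDBQT file ("r" writes the receptor as a rigid molecule)
protein.write("pdbqt", f"docking/{pdb_id}.pdbqt", overwrite=True, opt={"r": None})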

In [6]:
# Now we will identify the binding site
ligand_mda = u.select_atoms("resname 13U")

In [7]:
# find the center of the ligand
pocket_center = ligand_mda.center_of_geometry()
print(pocket_center)

[18.3068333  -8.06558332 10.638     ]


In [8]:
# compute the size of the ligand along each axis (max minus min coordinates)
ligand_box = ligand_mda.positions.max(axis=0) - ligand_mda.positions.min(axis=0)
ligand_box

array([ 9.161999, 12.002001,  8.903   ], dtype=float32)

In [9]:
# Using the above two steps, we have the center of geometry of our
# starting ligand and the size of the box that can contain it.
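
The box computed above matches the size of the crystallographic ligand exactly. Docking boxes are often made somewhat larger so that poses near the edge of the pocket can still be sampled. A minimal sketch of that idea in plain Python follows. The helper `box_from_coords`, the sample coordinates and the 5 Å padding are illustrative assumptions, not values used in this notebook.

```python
def box_from_coords(coords, padding=0.0):
    """Return (center, size) of an axis-aligned box around 3D points.

    center: mean of the coordinates (the center of geometry)
    size:   max - min along each axis, plus `padding` on both sides
    """
    n = len(coords)
    center = [sum(p[i] for p in coords) / n for i in range(3)]
    size = [
        max(p[i] for p in coords) - min(p[i] for p in coords) + 2 * padding
        for i in range(3)
    ]
    return center, size


# Made-up coordinates standing in for ligand atom positions (in angstroms)
atoms = [(0.0, 0.0, 0.0), (4.0, 2.0, 1.0), (2.0, 4.0, 2.0)]
center, size = box_from_coords(atoms, padding=5.0)
print(center)
print(size)
```

Because the center of geometry is not necessarily the midpoint of the ligand's bounding box, some padding also helps keep the whole ligand inside a box centered there.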

In [10]:
# Before docking, we need to prepare the ligands we will dock.
# We need to make sure they are adjusted for pH and have partial
# charges assigned, just like the protein.

filenames = ["ligands/13U_ideal.mol2", 
             "ligands_modified/13U_modified_N.xyz", 
             "ligands_modified/13U_modified_methyl.xyz"]

for file in filenames:

    file_base = os.path.basename(file)

    starting_name = file_base.split(".")[0]
    file_type = file_base.split(".")[-1]

    ligand = list(pybel.readfile(file_type, file))[0]
    ligand.OBMol.CorrectForPH(7.4)
    ligand.addh()

    for atom in ligand.atoms:
        charge = atom.OBAtom.GetPartialCharge()

    ligand.write("pdbqt", f"docking/{starting_name}.pdbqt", overwrite=True)
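
The loop above derives the base name and file type by splitting on ".". That works for these three files, but it would cut off part of a name that contains an extra dot. As a hedged alternative, `os.path.splitext` splits only at the last dot. The helper name `split_ligand_path` and the third file name below are hypothetical.

```python
import os


def split_ligand_path(path):
    """Return (base name without extension, extension without the dot)."""
    stem, ext = os.path.splitext(os.path.basename(path))
    return stem, ext.lstrip(".")


print(split_ligand_path("ligands/13U_ideal.mol2"))
print(split_ligand_path("ligands_modified/13U_modified_N.xyz"))
# A hypothetical name with an extra dot keeps its full stem
print(split_ligand_path("ligands/13U.v2.mol2"))
```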

In [11]:
# now we are ready for docking

from vina import Vina

In [12]:
v = Vina(sf_name="vina")
v.set_receptor(f"docking/{pdb_id}.pdbqt")
v.set_ligand_from_file(f"docking/13U_ideal.pdbqt")

In [13]:
pocket_center = [ float(x) for x in pocket_center ]
ligand_box = [ float(x) for x in ligand_box ]

In [14]:
v.compute_vina_maps(center=pocket_center, box_size=ligand_box)

Computing Vina grid ... done.


In [15]:
v.dock(exhaustiveness=5, n_poses=5)

Performing docking (random seed: -1110203316) ... 




0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -7.629          0          0
   2       -6.889      1.415      2.551
   3        -6.39       1.33      2.219
   4       -5.668      2.017      2.888
   5        -5.57      2.675      4.017
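
The affinity column is Vina's estimate of the binding free energy in kcal/mol; more negative values suggest stronger predicted binding. To get a rough feel for the scale, the relation ΔG = RT ln(Kd) can turn a score into a dissociation constant. The sketch below assumes a temperature of 298.15 K, and the function name `affinity_to_kd` is made up for illustration. Docking scores are approximate, so the result is at best an order-of-magnitude guide, not a measured or predicted Kd.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def affinity_to_kd(delta_g, temperature=298.15):
    """Convert a binding free energy (kcal/mol) to a dissociation constant
    (mol/L) using delta_g = R * T * ln(Kd)."""
    return math.exp(delta_g / (R_KCAL * temperature))


best_score = -7.629  # mode 1 from the table above
kd = affinity_to_kd(best_score)
print(f"{kd:.1e} M")
```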


In [16]:
v.write_poses('vina_out_original.pdbqt', n_poses=5, overwrite=True)
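
The file written here holds one MODEL block per pose. Assuming each block carries a `REMARK VINA RESULT:` line with the affinity and the two RMSD values (the layout AutoDock Vina typically writes, though it is worth checking against your own file), the scores can be read back later without rerunning the docking. The helper name `read_vina_scores` and the abbreviated example text are illustrative.

```python
def read_vina_scores(pdbqt_text):
    """Return a list of (affinity, rmsd_lb, rmsd_ub) tuples, one per pose."""
    scores = []
    for line in pdbqt_text.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            fields = line.split(":", 1)[1].split()
            scores.append(tuple(float(x) for x in fields[:3]))
    return scores


# Abbreviated, hand-written stand-in for a Vina output file
example = """MODEL 1
REMARK VINA RESULT:    -7.629      0.000      0.000
ENDMDL
MODEL 2
REMARK VINA RESULT:    -6.889      1.415      2.551
ENDMDL
"""
scores = read_vina_scores(example)
print(scores)
```

With the real output, the text could be supplied as `open("vina_out_original.pdbqt").read()`.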

In [17]:
v = Vina(sf_name="vina")
v.set_receptor(f"docking/{pdb_id}.pdbqt")
v.set_ligand_from_file(f"docking/13U_modified_methyl.pdbqt")
v.compute_vina_maps(center=pocket_center, box_size=ligand_box)
v.dock(exhaustiveness=5, n_poses=5)

Computing Vina grid ... done.
Performing docking (random seed: -683413570) ... 




0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -8.084          0          0
   2        -6.52      2.262      7.885
   3        -6.01      2.379      4.233
   4       -5.948      2.315      3.403
   5       -5.067       3.02      6.468


In [18]:
v.write_poses('vina_out_methyl.pdbqt', n_poses=5, overwrite=True)

In [19]:
v = Vina(sf_name="vina")
v.set_receptor(f"docking/{pdb_id}.pdbqt")
v.set_ligand_from_file(f"docking/13U_modified_N.pdbqt")
v.compute_vina_maps(center=pocket_center, box_size=ligand_box)
v.dock(exhaustiveness=5, n_poses=5)

Computing Vina grid ... done.
Performing docking (random seed: 618571335) ... 
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1        -6.23          0          0
   2       -4.949      2.464      6.921
   3       -4.935      2.973      7.578
   4       -4.914      2.708      3.839
   5       -4.565      1.926      2.481




In [20]:
v.write_poses('vina_out_N.pdbqt', n_poses=5, overwrite=True)
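
With all three runs complete, the best (mode 1) scores can be put side by side. The short sketch below copies the values from the outputs above and reports each score relative to the unmodified ligand. Each run used a different random seed and a modest exhaustiveness, and docking scores carry substantial uncertainty, so differences of this size should be read as suggestive, not conclusive.

```python
# Best (mode 1) affinities in kcal/mol, copied from the three runs above
best_scores = {
    "13U_ideal": -7.629,
    "13U_modified_methyl": -8.084,
    "13U_modified_N": -6.23,
}

reference = best_scores["13U_ideal"]

# Rank from most to least favorable predicted score
ranked = sorted(best_scores.items(), key=lambda item: item[1])
for name, score in ranked:
    diff = score - reference
    print(f"{name:22s} {score:8.3f} {diff:+8.3f}")
```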

In [22]:
from openbabel import openbabel

def convert_pdbqt_to_pdb(input_file, output_file):
    obConversion = openbabel.OBConversion()
    obConversion.SetInAndOutFormats("pdbqt", "pdb")

    mol = openbabel.OBMol()
    if not obConversion.ReadFile(mol, input_file):
        raise ValueError(f"Could not read file {input_file}")

    if not obConversion.WriteFile(mol, output_file):
        raise ValueError(f"Could not write file {output_file}")

    print(f"Successfully converted {input_file} to {output_file}")


In [23]:
# Example usage
input_file = "vina_out_original.pdbqt"
output_file = "vina_out_original.pdb"
convert_pdbqt_to_pdb(input_file, output_file)

Successfully converted vina_out_original.pdbqt to vina_out_original.pdb


In [24]:
# Example usage
input_file = "vina_out_methyl.pdbqt"
output_file = "vina_out_methyl.pdb"
convert_pdbqt_to_pdb(input_file, output_file)

Successfully converted vina_out_methyl.pdbqt to vina_out_methyl.pdb
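
The cells above convert two of the three output files one at a time; `vina_out_N.pdbqt` is not converted in the cells shown. A loop is one way to cover all three. The sketch below takes the converter as an argument so that it runs on its own; the names `convert_all` and `fake_converter` are hypothetical, and in the notebook `convert_pdbqt_to_pdb` would be passed instead.

```python
from pathlib import Path


def convert_all(pdbqt_files, converter):
    """Call converter(input, output) for each file, swapping .pdbqt for .pdb."""
    outputs = []
    for name in pdbqt_files:
        out_name = str(Path(name).with_suffix(".pdb"))
        converter(name, out_name)
        outputs.append(out_name)
    return outputs


# Stand-in converter so the sketch runs on its own; in the notebook,
# pass convert_pdbqt_to_pdb instead.
def fake_converter(input_file, output_file):
    print(f"would convert {input_file} to {output_file}")


results = convert_all(
    ["vina_out_original.pdbqt", "vina_out_methyl.pdbqt", "vina_out_N.pdbqt"],
    fake_converter,
)
```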
