<h1>Docking pipeline</h1>
This notebook performs the following tasks:
<ol>
<li>Installing and importing necessary modules and packages, setting environments, and giving permissions to scripts and license. </li>
<li>Read a lit of PDB IDs and download the PDB file to the current directory using a Biopython module.</li>
<li>Ligand extraction from a PDB file, using OpenEye (OE), which requires the conversion to an OE Design Unit before writing out the ligand in PDB format.</li>
<li>Receptor (protein) extraction using a customisable Biopython module, which excludes heteroatoms and water. This writes out the receptor in PDB format.</li>
<li><font color='blue'>(Optional and not currently used)</font> Center of Mass (COM) and Center of Geometry (COG) calculation for the ligand. This is usually used for defining a docking search grid but not required for a score-only vina process using the --autobox flag within vina.</li>
<li>Docking file preparation of the extracted PDB ligand and receptor, for AutoDock Vina, which saves the prepped files in PDBQT format. </li>
<li>Scoring using Autodock Vina, using the --autobox and the --score-only flags. This means that no ligand poses are returned. An output log is saved in OUT format, and the energy is passed onto the next step to save to a CSV file.</li>
<li>The dG score obtained is written to a csv file under the 'computed_dG' column under the PDB ID label. Extra information (ligand SMILES, protein sequence and experimental binding affinity) are also included in the CSV file.</li>
<li>All data are moved to the 'data' directory except the CSV file, which continuously collects data in the current directory, and a new folder is created for each PDB ID.</li>

# Installing and configuring required packages

<b><font color='red'> (only run this if you don't have them already installed. If you git cloned this repo then no installation is required.) </font></b>

In [3]:
# # !conda install -c conda-forge -c openbiosim biosimspace --yes
# !conda install -c conda-forge -c bioconda mgltools openbabel zlib ncurses --yes
# !conda install py3Dmol --yes
# !conda install biopython --yes
# !conda install pdb2pqr --yes
# !conda install MDAnalysis --yes
# # !pip install -q condacolab
# # import condacolab # only needed for running on Google Colab
# # condacolab.install_miniconda()

In [4]:
# avoid using pip install at all costs...
# !pip3 -q install rdkit-pypi

In [5]:
# %%bash
# # we wont be using this script directly, but its function is used in the ExtractLigandFromDU function.
# wget -q https://docs.eyesopen.com/toolkits/python/_downloads/42fee916a9b875e2fc0a6e18e42c8701/extract_ligand_oedu.py
# chmod +x ./extract_ligand_oedu.py

In [6]:
# %%bash
# # this shell script will install ADFRsuite in the directory ADFRsuite, it is used to prepare the receptor and ligand for docking
# mkdir ADFRsuite
# cd ADFRsuite
# wget -q https://ccsb.scripps.edu/adfr/download/1038 -O ADFRsuite_x86_64Linux_1.0.tar.gz
# tar -xzf ADFRsuite_x86_64Linux_1.0.tar.gz
# chmod a+x ADFRsuite_x86_64Linux_1.0
# cd ADFRsuite_x86_64Linux_1.0
# # this will install ADFRsuite in the current directory, the pipeline used will answer y to all questions prompted by the installer
# yes Y | ./install.sh

In [7]:
# %%bash
# # Install AutoDock Vina
# mkdir -p vina
# cd vina
# wget -q https://github.com/ccsb-scripps/AutoDock-Vina/releases/download/v1.2.5/vina_1.2.5_linux_x86_64
# chmod +x ./vina_1.2.5_linux_x86_64
# # AutoDock GPU:
# # wget -q https://github.com/ccsb-scripps/AutoDock-GPU/releases/download/v1.5.3/adgpu_analysis
# # wget -q https://github.com/ccsb-scripps/AutoDock-GPU/releases/download/v1.5.3/adgpu-v1.5.3_linux_ocl_128wi
# cd ..
# alias vina='./vina/vina_1.2.5_linux_x86_64'
# # OpenEye Tools:
# # Protein Prep: 
# mkdir -p OpenEye
# cd OpenEye
# wget -q https://docs.eyesopen.com/toolkits/python/_downloads/99240d9bd18a29490f003efe389f1319/proteinprep.py
# wget -q https://docs.eyesopen.com/toolkits/python/_downloads/42fee916a9b875e2fc0a6e18e42c8701/extract_ligand_oedu.py
# chmod +x ./extract_ligand_oedu.py
# chmod +x ./proteinprep.py
# cd ..
# alias proteinprep='./OpenEye/proteinprep.py'
# alias extract_ligand='./OpenEye/extract_ligand_oedu.py'

We also need the asapdiscovery-modelling module to extract ligands. I've commented out the code as I've already git cloned it into my cwd.

In [8]:
# %%bash
# we will use asap discovery to prepare the ligand for docking
# follow guidance in https://github.com/asapdiscovery/asapdiscovery/tree/main to clone asapdiscovery
# git clone https://github.com/choderalab/asapdiscovery.git
# cd asapdiscovery
# change the platform to the platform you are using
# mamba env create -f devtools/conda-envs/asapdiscovery-{platform}.yml
# conda activate asapdiscovery
# pip install asapdiscovery-data

<h1><b><font color='red'>(Run codes from this point on.) </font></b></h1></br>

You will also need your OpenEye license AND set it up correctly using ```
os.environ['OE_LICENSE'] = '/path/to/license'``` before importing any OpenEye or ASAP Discovery modules, or OpenEye will kill your kernel!<br>

Click <a href=https://docs.eyesopen.com/toolkits/python/quickstart-python/license.html>here</a> to see how to install licence for OpenEye Toolkits.

On a LINUX commandline, you can run: <br>
```export OE_LICENSE=/home/USERNAME/oe_license.txt```<br>
to set your license path, and verify that it's been correctly set up by running:<br>
```echo $OE_LICENSE```<br>
```cat $OE_LICENSE```<br>

These commands should print out the path to the license and the content of the license.

ADFRsuite is used to prepare receptors and ligands for AutoDock Vina, and permission needs to be granted to scripts within `./ADFRsuite/ADFRsuite_x86_64Linux_1.0` 

In [9]:
import os
# giving permissions to run vina and ADFR scripts, change the directories of vina as required
os.chmod('./vina/vina_1.2.5_linux_x86_64', 0o755) # giving permissions to run vina scripts
os.chmod('./vina/vina_split_1.2.5_linux_x86_64', 0o755) # giving permissions to run vina scripts
os.chmod('./extract_ligand.sh', 0o755) # giving permissions to run extract_ligand.sh
os.chmod('./ADFRsuite/ADFRsuite_x86_64Linux_1.0', 0o755) # giving permissions to run ADFRsuite

# make sure to set the OE_LICENSE environment variable, the full path should be included, or else openeye will kill your kernel!
os.environ['OE_LICENSE'] = '/home/ian/oe_license.txt'
os.chmod('/home/ian/oe_license.txt', 0o755)

In [10]:
import os
from Bio.PDB import PDBList, PDBParser, Select, PDBIO
from subprocess import Popen, PIPE
import re
import logging 
from pathlib import Path
import contextlib
import subprocess
from rdkit import Chem
from rdkit.Chem import rdMolTransforms as rdmt
import MDAnalysis as mda
import numpy as np
import pandas as pd
from tqdm import tqdm 
from asapdiscovery.data.backend.openeye import (
    oechem,
    oedocking,
    oegrid,
    oespruce,
    openeye_perceive_residues,
)
from asapdiscovery.modeling.schema import MoleculeComponent, MoleculeFilter
from asapdiscovery.modeling.modeling import split_openeye_mol, make_design_unit
import matplotlib.pyplot as plt
import seaborn as sns
import openeye.oechem as oechem
from pdbfixer import PDBFixer
from openmm.app import PDBFile
from rdkit import Chem
from rdkit.Chem import SDWriter
from ase import Atoms
from ase.io.sdf import read_sdf
from ase.io import read
from iodata import load_one
from iodata.utils import angstrom

from openmm.app.element import zinc, iron, calcium  # Import other metals as needed
import openmm.unit as unit

# Set up the working directory
cwd = os.getcwd()

# We start by defining a function to change the directories temporarily, and another to run the shell commands, within built-in error traps

In [11]:
# Function to change directories
@contextlib.contextmanager
def set_directory(dirname: os.PathLike, mkdir: bool = False):
    pwd = os.getcwd()
    path = Path(dirname).resolve()
    if mkdir:
        path.mkdir(exist_ok=True, parents=True)
    os.chdir(path)
    yield path
    os.chdir(pwd)

# Function to run shell commands
def run_command(cmd, raise_error=True, input=None, timeout=None, **kwargs):
    """Run a shell command and handle possible errors."""
    # Popen is used to run the command
    sub = Popen(cmd, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, **kwargs)
    if input is not None:
        sub.stdin.write(bytes(input, encoding='utf-8'))
    try:
        out, err = sub.communicate(timeout=timeout)
        return_code = sub.poll()
    # if the command times out, kill the process
    except subprocess.TimeoutExpired:
        sub.kill()
        print(f"Command {cmd} timeout after {timeout} seconds")
        return 999, "", ""  # 999 is a special return code for timeout
    out = out.decode('utf-8')
    err = err.decode('utf-8')
    if raise_error and return_code != 0:
        raise CommandExecuteError(f"Command {cmd} failed: \n{err}")
    return return_code, out, err

# Exception for command execution errors
class CommandExecuteError(Exception):
    def __init__(self, msg):
        self.msg = msg
    def __str__(self):
        return self.msg

# Using OpenEye Toolkits to extract the ligand (1st method)

In [12]:
# from https://github.com/choderalab/perses/blob/main/examples/moonshot-mainseries/00-prep-receptor.py#L19-L35
def read_pdb_file(pdb_file):
    print(f'Reading receptor from {pdb_file}...')

    from openeye import oechem
    ifs = oechem.oemolistream()
    #ifs.SetFlavor(oechem.OEFormat_PDB, oechem.OEIFlavor_PDB_Default | oechem.OEIFlavor_PDB_DATA | oechem.OEIFlavor_PDB_ALTLOC)  # Causes extra protons on VAL73 for x1425
    ifs.SetFlavor(oechem.OEFormat_PDB, oechem.OEIFlavor_PDB_Default | oechem.OEIFlavor_PDB_DATA )

    if not ifs.open(pdb_file):
        oechem.OEThrow.Fatal("Unable to open %s for reading." % pdb_file)

    mol = oechem.OEGraphMol()
    if not oechem.OEReadMolecule(ifs, mol):
        oechem.OEThrow.Fatal("Unable to read molecule from %s." % pdb_file)
    ifs.close()

    return (mol)

# script from extract_ligand_oedu.py: https://docs.eyesopen.com/toolkits/python/oechemtk/oebio_examples/oebio_example_extract_ligand.html#section-example-oebio-extract-ligand
def ExtractLigandFromDU(du, ofs):
# @ <SNIPPET-ExtractLigandFromDesignUnit>
    lig = oechem.OEGraphMol()
    if not du.GetLigand(lig):
        oechem.OEThrow.Fatal("Error: Could not extract ligand from the OEDesignUnit.")
    oechem.OEWriteMolecule(ofs,lig) 
# @ </SNIPPET-ExtractLigandFromDesignUnit>

    ofs.close()

# Using OpenEye Toolkits to extract the ligand (2nd method in case the 1st fails)

In [13]:
# Function to read a molecular complex/ligand/protein from a file
def oe_read_molecule(filename):
    ifs = oechem.oemolistream()
    if not ifs.open(filename):
        oechem.OEThrow.Fatal("Unable to open file %s" % filename)
    mol = oechem.OEGraphMol()
    oechem.OEReadMolecule(ifs, mol)
    ifs.close()
    return mol

def oe_split_complex(input_filename, ligand_output_filename, protein_output_filename=None):
    # Read the complex
    complex_molecule = oe_read_molecule(input_filename)

    # Split the complex
    protein = oechem.OEGraphMol()
    ligand = oechem.OEGraphMol()
    water = oechem.OEGraphMol()
    other = oechem.OEGraphMol()

    # Split the complex
    oechem.OESplitMolComplex(ligand, protein, water, other, complex_molecule)

    # Write the components to files
    oechem.OEWriteMolecule(oechem.oemolostream(ligand_output_filename), ligand)
    
    # we dont need the water and other components
    # oechem.OEWriteMolecule(oechem.oemolostream(protein_output_filename), protein)
    # oechem.OEWriteMolecule(oechem.oemolostream(output_basename + "_water.pdb"), water)
    # oechem.OEWriteMolecule(oechem.oemolostream(output_basename + "_other.pdb"), other)

# Example usage
# input_filename = "./Mpro-x0072_0.pdb"  # Replace with your file path
# ligand_output_filename = "./Mpro-x0072_0_ligand.pdb"
# protein_output_filename = "./Mpro-x0072_0_protein.pdb"
# oe_split_complex(input_filename, ligand_output_filename, protein_output_filename)

Define some classes and functions to download, rename and split the ligand and the receptor from a complex.<br>
For ligand, we are using OpenEye to extract, for receptors (proteins), we use a customisable biopython module. (Please do not use biopython to extract ligands, as it does not retain the position of the ligand atoms)

In [14]:
# using Biopython to extract the receptor
class ReceptorSelect(Select):
    '''This class is used to select only the receptor residues from a PDB file, by excluding any HETATM and any water.'''

    def __init__(self, preserve_water=False, preserve_metal=True):
        self.preserve_water = preserve_water
        # List of common heavy metal ions in protein structures
        self.preserve_metal = preserve_metal
        # I deleted the heavy metal ions as they are not a valid AutoDock type! Be cautious here!
        self.heavy_metal_ions = ['ZN', 'FE', 'MG', 'CA', 'MN', 'CO', 'CU', 'NI', 'MO', 'W','YB', 'K', 'NA']  # TODO: Add more metal ions as needed


    def accept_residue(self, residue):
        # Exclude water molecules, assumed to have residue name 'HOH' or 'WAT'
        #TODO: What if there's water in the binding site?
        resname = residue.get_resname()
        # print(f"Processing residue: {resname}")

        if resname in ['HOH', 'WAT']:
            if self.preserve_water:
                # print("Preserving water molecule")
                return True
            else:
                # print("Excluding water molecule")
                return False

        # Include heavy metal ions
        if resname in self.heavy_metal_ions:
            if self.preserve_metal:
                # print("Preserving metal ion")
                return True
            else:
                # print("Excluding metal ion")
                return False
        
        # Check if any atom in the residue is from an ATOM record
        for atom in residue:
            '''in biopython, atom.get_full_id()[3] is accessing the fourth element of the full ID. This element represents the atom name, atom.get_full_id()[3][0] is accessing the first character of the atom name. This character represents the element symbol of the atom. The condition atom.get_full_id()[3][0] == ' ' checks whether the first character of the atom name is a space. If it is a space, then the atom is from an ATOM record, otherwise it is from a HETATM record.'''
            if atom.get_full_id()[3][0] == ' ':
                return True
            
        # print("Excluding residue")
        return False

def download_pdb_file(pdb_id):
    """Download PDB file using PDB ID."""
    pdbl = PDBList()
    filename = pdbl.retrieve_pdb_file(pdb_id, file_format='pdb', pdir='.', overwrite=True)
    return filename

def get_structure(filename):
    """Parse the structure from a PDB file."""
    parser = PDBParser()
    structure = parser.get_structure('structure', filename)
    return structure

def save_receptor(structure, receptor_file, preserve_water=False, preserve_metal=False):
    """Save the receptor part of the PDB file."""
    io = PDBIO()
    io.set_structure(structure)
    io.save(receptor_file, ReceptorSelect(preserve_water=preserve_water,
    preserve_metal = preserve_metal)) 

# Example usage:
# pdb_id = '1abc'
# filename = download_pdb_file(pdb_id)
# structure = get_structure(filename)
# save_receptor(structure, 'receptor.pdb', preserve_water=True)

def fix_receptor_file(input_receptor_file):
    '''This function must be used for complexes without crystal structurers on PDB, as there could be places where things should be fixed before docking. This must be done or else the docking software might report errors. '''

    #TODO: Fix this function - it unwantedly removes metal ions and water from the structure

    # Initialize the PDBFixer with the input file
    fixer = PDBFixer(filename=input_receptor_file)
    
    # Find and fix missing residues, atoms, and hydrogens
    fixer.findMissingResidues()
    fixer.findMissingAtoms()
    fixer.addMissingAtoms()
    fixer.addMissingHydrogens(7.0)
    
    # Define the output file name
    output_receptor_file = input_receptor_file.replace('.pdb', '_fixed.pdb')
    
    # Save the fixed receptor file
    with open(output_receptor_file, 'w') as output_file:
        PDBFile.writeFile(fixer.topology, fixer.positions, output_file)
    
    print(f"Fixed receptor file saved as: {output_receptor_file}")
    
    return output_receptor_file

# def fix_receptor_file(input_receptor_file):
#     '''This function must be used for complexes without crystal structures on PDB, as there could be places where things should be fixed before docking. This must be done or else the docking software might report errors.'''

#     # Initialize the PDBFixer with the input file
#     fixer = PDBFixer(filename=input_receptor_file)
    
#     # Preserve metal ions
#     metals = ['ZN', 'FE','CA']  # Add other metal symbols as needed
#     metal_atoms = []
#     for chain in fixer.topology.chains():
#         for residue in chain.residues():
#             if residue.name in metals:
#                 for atom in residue.atoms():
#                     metal_atoms.append((atom, atom.element, atom.name, residue.id, chain.id))

#     # Find and fix missing residues, atoms, and hydrogens
#     fixer.findMissingResidues()
#     fixer.findMissingAtoms()
#     fixer.addMissingAtoms()
#     fixer.addMissingHydrogens(7.0)

#     # Add metal ions back
#     for atom, element, name, residue_id, chain_id in metal_atoms:
#         chain = next(chain for chain in fixer.topology.chains() if chain.id == chain_id)
#         residue = chain.residue(residue_id)
#         if not any(a.name == name for a in residue.atoms()):
#             new_atom = residue.addAtom(name, element)
#             fixer.positions.append((atom.element.mass*unit.dalton).value_in_unit(unit.angstrom))

#     # Define the output file name
#     output_receptor_file = input_receptor_file.replace('.pdb', '_fixed.pdb')

#     # Save the fixed receptor file
#     with open(output_receptor_file, 'w') as output_file:
#         PDBFile.writeFile(fixer.topology, fixer.positions, output_file)
    
#     print(f"Fixed receptor file saved as: {output_receptor_file}")
    
#     return output_receptor_file

# Example usage:
# fixed_file = fix_receptor_file('Mpro-x0072_0_receptor.pdb')

def rename_file(old_filename, new_filename):
    """Rename the downloaded file to a standard name."""
    if os.path.exists(old_filename) and not os.path.exists(new_filename):
        os.rename(old_filename, new_filename)
    return new_filename

def extract_ligand(filename, ligand_file):
    """Extract the ligand from the complex."""
    print(f'Extracting ligand from {filename}...')
    complex = read_pdb_file(filename)
    success, du = make_design_unit(complex)
    
    if success:
        oeoutfile = oechem.oemolostream(ligand_file)
        ExtractLigandFromDU(du, oeoutfile)
    else:
        print(f'Failed to extract ligand from {filename} using the OpenEye Toolkit,' 
              'trying a different script written using OESplitMolComplex...')
        # Extract ligand using shell script
        complex_id = filename.rsplit('.')[0]

        # script written using OESplitMolComplex:
        oe_split_complex(input_filename = filename, 
                         ligand_output_filename = f"{ligand_file}")
        
        # this is my shell script - not guaranteed to work for everything and hence not used!
        # run_command(f'./extract_ligand.sh {complex_id}') 

def pdb_to_prot_lig(pdb_id, filename, preserve_water=False, preserve_metal=False):
    """Main function to handle the splitting of protein and ligand."""
    structure = get_structure(filename)

    # naming the output files
    receptor_file = f"{pdb_id}_receptor.pdb"
    ligand_file = f"{pdb_id}_ligand.pdb"
    complex_file = rename_file(filename, f"{pdb_id}.pdb")

    # ---old codes (uses Biopython for receptor extraction, and ASAP for ligand extraction - slower)---
    save_receptor(structure, receptor_file, preserve_water, preserve_metal)
    receptor_file = fix_receptor_file(receptor_file)
    extract_ligand(complex_file, ligand_file)

    # ---new codes (uses OE for ligand and receptor extraction)---
    # split the complex into protein and ligand
    # oe_split_complex(input_filename = filename, ligand_output_filename = ligand_file, protein_output_filename = receptor_file)
    
    print(f'The complex file has been saved as {complex_file}')
    print(f'The receptor file has been saved as {receptor_file}')
    print(f'The ligand file has been saved as {ligand_file}')
    return complex_file, receptor_file, ligand_file

# Example usage:
# filename = download_pdb_file(pdb_id) # if starting from a PDB ID
# receptor_file, ligand_file = pdb_to_prot_lig(pdb_id, filename)

Prepare the protein and ligands for Vina, the prepped files end in .pdbqt

In [15]:
class DockingPrepper:
    '''This class is used to prepare the ligand and protein files for docking using AutoDock Vina. It uses the ADFRsuite to convert the ligand and protein files to pdbqt format.
    # Example usage:
    prep = DockingPrepper(folder='path/to/folder', pdb_id = 'pdb_id.pdb', lig_file='ligand.pdb', prot_file='protein.pdb')
    
    # ---- AutoDock Vina Preparation ----
    prep.vina_process()
    
    # Alternatively...
    # ---- OpenEye Preparation ----
    prep.oe_process()
    '''
    def __init__(self, folder, pdb_id, lig_file, prot_file, lig_out_name=None, prot_out_name=None, add_h=True, preserve_water=False, preserve_metal=True):
        self.folder = folder
        self.pdb_id = pdb_id
        self.lig_file = lig_file
        self.prot_file = prot_file
        self.lig_out_name = lig_out_name or f"{lig_file.split('.')[0]}.pdbqt"
        self.prot_out_name = prot_out_name or f"{prot_file.split('.')[0]}.pdbqt"
        self.add_h = add_h
        self.preserve_water = preserve_water
        self.preserve_metal = preserve_metal #TODO/WARMING: This is not implemented yet!

    # ----------------- AutoDock Vina Preparation -----------------
    def vina_prepare_ligand(self):
        """Prepare the ligand file for Vina."""
        if self.add_h:
            cmd = f'./ADFRsuite/ADFRsuite_x86_64Linux_1.0/bin/prepare_ligand -l {self.lig_file} -o {self.lig_out_name} -A hydrogens'
        else:
            cmd = f'./ADFRsuite/ADFRsuite_x86_64Linux_1.0/bin/prepare_ligand -l {self.lig_file} -o {self.lig_out_name}'
        self._run_command(cmd, f'The ligand file has been saved as {self.lig_out_name}')

    def vina_prepare_protein(self):
        """Prepare the protein file for Vina."""
        if self.preserve_water:
            cmd = f'./ADFRsuite/ADFRsuite_x86_64Linux_1.0/bin/prepare_receptor -r {self.prot_file} -o {self.prot_out_name} -A checkhydrogens -U nphs_lps_nonstdres'
            '''        
            [-U]  cleanup type:
             'nphs': merge charges and remove non-polar hydrogens
             'lps': merge charges and remove lone pairs
             'waters': remove water residues
             'nonstdres': remove chains composed entirely of residues of
                      types other than the standard 20 amino acids
             'deleteAltB': remove XX@B atoms and rename XX@A atoms->XX
             (default is 'nphs_lps_waters_nonstdres') 
             '''
        else:
            cmd = f'./ADFRsuite/ADFRsuite_x86_64Linux_1.0/bin/prepare_receptor -r {self.prot_file} -o {self.prot_out_name} -A checkhydrogens'
        self._run_command(cmd, f'The protein file {self.prot_file} has been converted to pdbqt format and saved in {self.prot_out_name}')

    def _run_command(self, cmd, success_message):
        """Run a shell command and handle possible errors."""
        try:
            subprocess.run(cmd, shell=True, check=True, timeout=999)
            print(success_message)
        except subprocess.CalledProcessError as e:
            print(f"Error occurred: {e.output}")
            raise e
        except subprocess.TimeoutExpired:
            print(f"Command '{cmd}' timed out.")
            raise e

    def vina_process(self):
        """Process the ligand and protein files."""
        with set_directory(self.folder):
        # if the lig_file and prot_file are not in pdb format, convert them to pdb format using obabel
            if self.lig_file.split('.')[-1] != 'pdb':
                lig_name = self.lig_file.split('.')[0]
                cmd = f'obabel {self.lig_file} -O {lig_name}.pdb'
                self._run_command(cmd, f'The ligand file has been converted to {lig_name}.pdb')
                self.lig_file = f'{lig_name}.pdb'

            if self.prot_file.split('.')[-1] != 'pdb':
                prot_name = self.prot_file.split('.')[0]
                cmd = f'obabel {self.prot_file} -O {prot_name}.pdb'
                self._run_command(cmd, f'The protein file has been converted to {prot_name}.pdb')
                self.prot_file = f'{prot_name}.pdb'
                
            self.vina_prepare_ligand()
            self.vina_prepare_protein()
    
    # ----------------- OpenEye Preparation -----------------
    def oe_make_design_unit(self):
        ''' 
        This function makes design units from a protein-ligand complex.
        
        Input: input_file - the protein-ligand complex file
        Saved to file: {input_file_basename}_DU_{i}.oedu 
        Output: output_files - a list of the design unit files
        '''
        input_file = f"{self.pdb_id}.pdb"
    
        os.system(f"python ./OpenEye/make_design_units.py {input_file}")
        
        # this is the output pattern for the design unit files
        output_pattern = os.path.basename(input_file)[:-4] + "_DU_{}.oedu"
        # List the files in the current directory and filter for the expected output pattern
        output_files = []
        i = 0
        while True:
            output_file = output_pattern.format(i)
            if os.path.exists(output_file):
                output_files.append(output_file)
                i += 1
            else:
                break

        # output_files now contains the names of the output files
        print(f"Design unit was successfully made for {input_file}, output is saved to {output_files}.")
        return output_files

    def oe_make_receptor(self, input_file):

        oe_output_files = [] 
        for ifile in input_file:
            output_basename = os.path.basename(ifile)[:-5]
            os.system(f"python ./OpenEye/MakeReceptor.py -in {ifile} -out {output_basename}_receptor.oedu") # this will output {ifile}_receptor.oedu

            oe_output_files.append(f"{output_basename}_receptor.oedu")
            print(f'The receptor design unit file has been saved as {output_basename}_receptor.oedu')

        return oe_output_files 
    
    # Example usage:
    # input_file = "6EQ2.pdb"
    # complex_DU = oe_make_design_unit(input_file) # this will output ['6EQ2_DU_0.oedu']
    # receptor_DU = oe_make_receptor(complex_DU) # this will output ['6EQ2_DU_0_receptor.oedu']

    def oe_process(self):
        '''Process the ligand and protein files using OpenEye.'''
        print(f'Preparing the ligand and protein files using OpenEye Toolkits...')
        with set_directory(self.folder):
            complex_DU = self.oe_make_design_unit()
            oe_output_files = self.oe_make_receptor(input_file=complex_DU)
        
        # self.oe_output_files = [f"{self.pdb_id}_DU_0_receptor.oedu"]
        return oe_output_files

# Define a function to get the COM of the ligand file and hence a config file.

In [16]:
def get_COM(file):
        if file.endswith('mol2') or file.endswith('xyz'):
            mol = load_one(file)
            ase_mol = Atoms(numbers=mol.atnums, positions=mol.atcoords / angstrom)
        elif file.endswith('sdf'):
            ase_mol = read_sdf(file)
        elif file.endswith('pdb'):
            ase_mol = read(file)
        else:
            raise NotImplementedError(f"File extension not supported for {file}")

        return ase_mol.get_center_of_mass()
    
def write_config_vina(lig_pdbqt,prot_pdbqt,center, config_fp = "config.txt", weights=None, boxsize=50, exhaustiveness=32, num_modes=1, energy_range=30, **kwargs):
    '''
    Write the config file for AutoDock Vina docking
    :param exhaustiveness: int, the exhaustiveness of the docking
    :param num_modes: int, the number of modes (conformations) to be generated
    :param energy_range: int, the energy range of the docking
    '''

    lines = ["receptor = {}".format(prot_pdbqt),
            "ligand = {}".format(lig_pdbqt),
            "scoring = vina",
            "",
            "center_x = {}".format(center[0]),
            "center_y = {}".format(center[1]),
            "center_z = {}".format(center[2]),
            "",
            "size_x = {}".format(boxsize),
            "size_y = {}".format(boxsize),
            "size_z = {}".format(boxsize),
            "",
            # "exhaustiveness = {}".format(exhaustiveness),
            # "num_modes = {}".format(num_modes),
            # "energy_range = {}".format(energy_range),
            ]
    if weights is not None:
        assert len(weights) == 6, "Autodock vina needs 6 weights"
        # --weight_gauss1 1 --weight_gauss2 0 --weight_repulsion 0  --weight_hydrophobic 0 --weight_hydrogen 0 --weight_rot 0"
        lines.extend([
            f"weight_gauss1 = {weights[0]}",
            f"weight_gauss2 = {weights[1]}",
            f"weight_repulsion = {weights[2]}",
            f"weight_hydrophobic = {weights[3]}",
            f"weight_hydrogen = {weights[4]}",
            f"weight_rot = {weights[5]}",
        ])
    with open(config_fp, "w") as f:
        f.write("\n".join(lines))


# Defining a function to run the AutoDock Vina and obtain the docking score

In [18]:
class DockingScorer:
    '''This class is used to score the ligand using AutoDock Vina. It uses the prepared ligand and protein files to run Vina and extract the docking score.
    Example usage:
    scorer = DockingScorer(folder='path/to/folder', lig_file='ligand.pdbqt', prot_file='protein.pdbqt', save_out_file=True)
    docking_score = scorer.vina_score_ligand()
    print(f"Docking Score: {docking_score}")
    '''
    def __init__(self, 
                 folder, 
                 lig_file, 
                 prot_file,
                 complex_file=None,
                 get_vina_poses=False,
                 weights=None, 
                 save_out_file=True, 
                 protein_sequence=None, 
                 smiles=None, 
                 exp_binding_affinity=None,
                 from_pdb=True,
                 csv_out_file='docking_data_playground_from_pdb.csv',
                 receptor_DU=None):
        self.folder = folder
        self.lig_file = lig_file
        self.prot_file = prot_file
        self.complex_file = complex_file
        self.lig_name = os.path.splitext(lig_file)[0]
        self.prot_name = os.path.splitext(prot_file)[0]
        self.complex_name = os.path.splitext(complex_file)[0]
        self.pdb_id = self.prot_name.split('_')[0]
        self.get_vina_poses = get_vina_poses
        self.weights = weights
        self.save_out_file = save_out_file
        #TODO: more work required here to extract the protein sequence, ligand SMILES string and exp dG value, from an online db (low priority)
        self.protein_sequence = protein_sequence
        self.smiles = smiles
        self.exp_binding_affinity = exp_binding_affinity
        self.from_pdb = from_pdb
        self.csv_out_file = csv_out_file
        self.receptor_DU = receptor_DU

    def check_files(self):
        '''Check and download the necessary files if needed'''
        if not os.path.isfile(f"{self.folder}/{self.prot_name}.pdbqt") or not os.path.isfile(f"{self.folder}/{self.lig_name}.pdbqt"):
            filename = download_pdb_file(self.pdb_id)
            self.complex_file, self.prot_file, self.lig_file = pdb_to_prot_lig(self.pdb_id, filename)
    #TODO: add a functionality to use Vina GPU instead (low priority)
    
    # ----------------- AutoDock Vina Scoring -----------------
    def run_vina(self):
        '''Run Vina with the prepared files and return the output.'''
        
        with set_directory(self.folder):
            
            # extract a reference ligand from the protein-ligand complex
            ref_ligand = f"{self.complex_name}_ref_ligand.pdb"
            extract_ligand(self.complex_file, ref_ligand)

            # get the COM from the reference ligand so we can define a docking grid
            ligand_COM = get_COM(ref_ligand)
            write_config_vina(f'{self.lig_name}.pdbqt', f'{self.prot_name}.pdbqt',ligand_COM,config_fp = f"{self.lig_name}_{self.prot_name}_config.txt", weights=None)
            cmd = f"./vina/vina_1.2.5_linux_x86_64 --config {self.lig_name}_{self.prot_name}_config.txt --score_only" #TODO: make it so that this can also output poses

            if self.get_vina_poses == True:
                cmd = f"./vina/vina_1.2.5_linux_x86_64 --config {self.lig_name}_{self.prot_name}_config.txt"

            try:
                out, err = "", ""  # Initialize out and err
                code, out, err = run_command(cmd, timeout=100)
                if code != 0:
                    raise CommandExecuteError(f"Command failed with return code {code}")
                
                return out
            except CommandExecuteError as e:
                print(f"Error in {self.pdb_id}: {e}")
                print("out: ", out)
                print("err: ", err)
                raise e

    def extract_vina_score(self, out):
        '''Extract the docking score from Vina output.'''
        if not self.get_vina_poses:
            strings = re.split('Estimated Free Energy of Binding   :', out)
            line = strings[1].split('\n')[0]
            energy = float(line.strip().split()[0])
            return energy
        else:
            # TODO: if get_vina_poses is True, return the output and the energy
            energy = []
            for line in out.split('\n'):
                match = re.search(r'^\s*\d+\s+(-\d+\.\d+)', line)
                if match:
                    energy.append(float(match.group(1)))
            return min(energy) if energy else None

    
    # ----------------- OpenEye Toolkits Scoring -----------------
    def oe_clean_then_dock(self):
        ''' This function docks a ligand to a receptor using OpenEye's CleanThenDockMolecules.py script.
        Input: lig_file - the ligand file to be docked
            receptor_DU - the receptor file to dock the ligand to
        Output: output_files - a list of the docked ligand files, which also contains the chemgauss4 score.'''
        
        output_files = [] 

        if self.receptor_DU is None:
            print("No receptor design unit file provided. Please provide a receptor file to dock the ligand.")
            raise ValueError("No receptor design unit file provided.")
        for receptor_du in self.receptor_DU:   
            output_basename = f"{receptor_du.split('.')[0]}_{self.lig_name}"
            os.system(f'python ./OpenEye/CleanThenDockMolecules.py -in {self.lig_file} -out {output_basename}_docked.sdf -receptor {receptor_du}') # output score is Chemgauss4, contained in the sdf file
            
            output_files.append(f"{output_basename}_docked.sdf")
        
        print(f'The docked ligand file has been saved as {output_files}')
        return output_files    
    

    ################## combined
    # # TODO: modify this function to calculate and extract the other scores from the docked ligands
    def extract_chemgauss4_scores(self, docked_ligand):
        ''' This function extracts the Chemgauss4 scores from the docked ligands using RDKit and falls back to regex if needed.'''
        
        # Handle if docked_ligand is passed as a list
        if isinstance(docked_ligand, list):
            docked_ligand = docked_ligand[0]

        chemgauss4_scores = []

        try:
            # Try using RDKit-based approach
            supplier = Chem.SDMolSupplier(docked_ligand)
            for mol in supplier:
                if mol is None:
                    raise ValueError("Error: Invalid molecule found, try the regex-based approach.")
                
                if mol.HasProp('Chemgauss4'):
                    energy = mol.GetProp('Chemgauss4')
                    energy = float(energy)
                    chemgauss4_scores.append(energy)

                    if energy is None or chemgauss4_scores is None:
                        raise ValueError("Energy or chemgauss4_scores is None")

        except Exception as e:
            print(f"RDKit approach failed: {e}, trying regex-based approach...")
            # Fallback to regex-based approach
            chemgauss4_pattern = re.compile(r'> <Chemgauss4>\s+(-?\d+\.\d+)')
            try:
                with open(docked_ligand, 'r') as file:
                    sdf_content = file.read()

                matches = chemgauss4_pattern.findall(sdf_content)
                chemgauss4_scores = [float(score) for score in matches]

            except FileNotFoundError:
                print(f"Error: The file {docked_ligand} was not found.")
            except Exception as e:
                print(f"An error occurred while extracting Chemgauss4 scores: {e}")

        return chemgauss4_scores



    # # Example usage
    # docked_ligand = oe_clean_then_dock(ligand_file, receptor_DU) # this will output ['6EQ2_ligand_docked.sdf'], suppose ligand_file = '6EQ2_ligand.pdb'
    # Extracting Chemgauss4 scores from the docked ligands
    # scores = extract_chemgauss4_scores(docked_ligand)

    # ----------------- Putting things together -----------------
    # make the output into a pandas dataframe
    # add pdb_id, protein sequence, ligand SMILES string, dG, exp_dG as a column in the dataframe
    def extract_data_from_leakypdb(self, df):
        '''Extract the protein sequence, ligand SMILES string, and experimental binding affinity from the specified dataframe (leakypdb_test.csv).'''

        # # Read the DataFrame (must be done so that df is correctly recognised as a DataFrame, not a string.)    
        # df = pd.read_csv(df)

        # Filter the DataFrame for the specific PDB ID
        filtered_df = df[df['pdb_id'] == self.pdb_id] 

        # Check if the specific columns exist in the DataFrame
        required_columns = ['pdb_id','smiles', 'protein_sequence', 'binding_affinity']
        missing_columns = [col for col in required_columns if col not in df.columns]
        if missing_columns:
            return f"Missing columns in the data: {', '.join(missing_columns)}"
        # Extract the needed information
        result = filtered_df[required_columns]
        return result    

    def save_output(self, out, energy):
        '''Save the vina output to a file if required.'''
        if self.save_out_file:
            if self.from_pdb:
                with open(f"{self.pdb_id}.out", 'w') as f:
                    # Convert list to string if 'out' is a list
                    if isinstance(out, list):
                        out = "\n".join(out)
                    f.write(out)
                print(f"Output saved as {self.pdb_id}.out\n")

                df = self.extract_data_from_leakypdb(leakypdb) # define the dataframe to extract data from

                # add data to the dataframe: Assign the energy value to a new column 'computed_dG' for the filtered rows
                df.loc[df['pdb_id'] == self.pdb_id, 'computed_dG'] = energy
                updated_rows = df[df['pdb_id'] == self.pdb_id]

                if not os.path.isfile(f'{self.csv_out_file}'):
                    updated_rows.to_csv(f'{self.csv_out_file}', index=False)
                else: # else it exists so append without writing the header
                    updated_rows.to_csv(f'{self.csv_out_file}', mode='a', header=False, index=False)
                print(f"Data saved to {self.csv_out_file}")
                return updated_rows
            else:
                if isinstance(energy, list): 
                    for i, e in enumerate(energy):
                        with open(f"{self.prot_name}_{self.lig_name}_{i}.out", 'w') as f:
                            # Convert list to string if 'out' is a list
                            if isinstance(out, list):
                                out = "\n".join(out)
                            f.write(out)
                        print(f"Output saved as {self.prot_name}_{self.lig_name}_{i}.out\n")  
                        
                        lig_name_with_index = f'{self.lig_name}_{i}'

                        df = pd.DataFrame({'ligand_name': [lig_name_with_index], 
                                           'protein_name': [self.prot_name], 
                                           'computed_dG': [e],
                                           'error_message': [None]})
                                           
                        if not os.path.isfile(f'{self.csv_out_file}'):
                            df.to_csv(f'{self.csv_out_file}', index=False)
                        else: # else it exists so append without writing the header
                            df.to_csv(f'{self.csv_out_file}', mode='a', header=False, index=False)
                        print(f"Data saved to {self.csv_out_file}")

                else: 
                    with open(f"{self.prot_name}_{self.lig_name}.out", 'w') as f:
                        # Convert list to string if 'out' is a list
                        if isinstance(out, list):
                            out = "\n".join(out)
                        f.write(out)
                    print(f"Output saved as {self.prot_name}_{self.lig_name}.out\n")
                    
                    df = pd.DataFrame({'ligand_name': [self.lig_name], 
                                    'protein_name': [self.prot_name], 
                                    'computed_dG': [energy],
                                    'error_message': [None]})

                    if not os.path.isfile(f'{self.csv_out_file}'):
                        df.to_csv(f'{self.csv_out_file}', index=False)
                    else: # else it exists so append without writing the header
                        df.to_csv(f'{self.csv_out_file}', mode='a', header=False, index=False)
                    print(f"Data saved to {self.csv_out_file}")
                return df
            
    ######################## FINAL FUNCTIONS ########################
    def vina_score_ligand(self):
        """Main method to score the ligand using Vina."""
        if self.from_pdb:
            self.check_files()
        try:
            out = self.run_vina()
            energy = self.extract_vina_score(out)
        except Exception as e:
            print(f"Error in {self.pdb_id}: {e}")
            out = f"Error in {self.pdb_id}: {e}"
            energy = None
            raise e
        
        print(f"{self.pdb_id}: Estimated Free Energy of Binding = {energy} kcal/mol")
        self.save_output(out, energy)
        return energy

    def oe_score_ligand(self):
        '''Main method to score the ligand using OpenEye Toolkits.'''
        # if self.from_pdb:
        #     self.check_files()
        try:
            docked_ligand = self.oe_clean_then_dock()
            # TODO: out should be the output on the commandline.... not the docked_ligand
            out = docked_ligand
            energies = []
            if isinstance(docked_ligand, list) and len(docked_ligand) > 1:
                for i, lig in enumerate(docked_ligand):
                    energy = self.extract_chemgauss4_scores(lig)
                    # debug line
                    # print('debug line:', energy)
                    if energy is not None:
                        self.save_output(out, energy)
                        if isinstance(energy, list): 
                            for j, e in enumerate(energy):
                                print(f"Docked ligand {lig}, molecule {j+1}: Chemgauss4 score = {e:.2f}")
                                # self.save_output(out, e)
                                energies.append(e)
                        else: 
                            print(f"Docked ligand {lig}, molecule {i+1}: Chemgauss4 score = {energy:.2f}")
                            # self.save_output(out, energy)
                            energies.append(energy)
            else:
                if isinstance(docked_ligand, list):
                    docked_ligand = docked_ligand[0]
                    
                energy = self.extract_chemgauss4_scores(docked_ligand)
                if energy is not None:
                    self.save_output(out, energy)
                    if isinstance(energy, list): 
                        for j, e in enumerate(energy):
                            print(f"Docked ligand {docked_ligand}, molecule {j+1}: Chemgauss4 score = {e:.2f}")
                            # self.save_output(out, e)
                            energies.append(e)
                    else: 
                        print(f"Docked ligand {docked_ligand}, molecule {i+1}: Chemgauss4 score = {energy:.2f}")
                        # self.save_output(out, energy)
                        energies.append(energy)
                else:
                    self.save_output(out, energy)
                    energies = None
                    raise ValueError(f"{self.pdb_id}: No valid energy value available.")            
            # debug line
            # print(energies)
        except Exception as e:
            print(f"Error in {self.pdb_id}: {e}")
            out = f"Error in {self.pdb_id}: {e}"
            energies = None
            print(f"{self.pdb_id}: No valid energy value available.")
            # self.save_output(out, energies)
            raise e
    
        return energies

# Full Pipeline: From protein-ligand complex to dG scores 
Using the vina pipeline first:

In [19]:
# download the PDB file using the PDB ID
# filename = download_pdb_file(pdb_id)
# # split the protein and ligand from the PDB file
# receptor_file, ligand_file = pdb_to_prot_lig(pdb_id, filename)
# this is for when lig_files and prot_files are lists...
# for lig, prot in tqdm(zip(lig_files, prot_files)):
#         vina_process_lig_prot(lig, prot)

def vina_process_lig_prot(lig_file, prot_file, complex_file, preserve_water=False, preserve_metal=True, csv_out_file='vina_docking_data.csv'):
    '''Final function to process each protein and ligand pair, prepare them for docking, score the ligand and save the output files.
    Both lig_file, prot_file and complex_file must be strings, not lists. If you want to process multiple ligands and proteins (i.e. LISTS), you should use this function in the following manner: 

    for lig, prot in tqdm(zip(lig_files, prot_files)):
        vina_process_lig_prot(lig, prot, complex_file)
        ...

    The reference complex file needs to be a protein-ligand complex, with a docked ligand in the binding site of the protein of interest.
    '''
    # TODO: handling lists..?
    if isinstance(lig_file, list) or isinstance(prot_file, list) or isinstance(complex_file, list):
        raise ValueError("lig_file and prot_file must be strings, not lists. If you are trying to process multiple ligands and proteins, you should use this function in a loop.")
    
    # check if any of the input is none
    if lig_file is None or prot_file is None or complex_file is None:
        raise ValueError("lig_file, prot_file and complex_file must not be None. You risk deleting everything in your current directory if any of these is None!")

    lig_name = os.path.splitext(lig_file)[0]
    prot_name = os.path.splitext(prot_file)[0]
    complex_name = os.path.splitext(complex_file)[0]

    try:      
        os.environ['OE_LICENSE'] = '/home/ian/oe_license.txt' # change this to your OE_LICENSE path

        # prepare the ligand and receptor for Vina i.e. convert to pdbqt format
        prep = DockingPrepper('.',
                                lig_file=lig_file, 
                                prot_file=prot_file,
                                pdb_id=complex_name,
                                preserve_water=preserve_water,
                                preserve_metal=preserve_metal) # this is not really the pdb_id but more of a basename for the files
        # for vina preparation
        prep.vina_process()
                
        # using the DockingScorer class to get the docking score
        scorer = DockingScorer('.', 
                                lig_file, 
                                prot_file,
                                complex_file=complex_file, # reference complex file, required for the vina process 
                                get_vina_poses=True, # this only works for vina, there's no choice for OE - LOL!
                                save_out_file=True,
                                from_pdb=False,
                                csv_out_file=csv_out_file,
                                receptor_DU=None) # None because this is for Vina, but for OE, this will be the receptor DU (required)

        docking_score = scorer.vina_score_ligand()

    except Exception as e:
        # ------- START OF CODES: save the error message to the .out file, None as the energy score, and write both the energy and the error message to a csv file -------
        # Set up logging
        logging.basicConfig(filename=f'{lig_name}_{prot_name}error_log.txt', level=logging.ERROR, format='%(asctime)s - %(levelname)s - %(message)s')
        logging.error(f"Error processing {lig_file}: {str(e)}")
        print(f"Skipping {lig_file} due to an error: {str(e)}")
        
        # saving the error message to the .txt file and None as the energy score 
        energy = None
        df = pd.DataFrame({'ligand_name': [lig_file], 
                                'protein_name': [prot_file], 
                                'computed_dG': [energy],
                                'error_message' : [str(e)]}
                                )
    
        if not os.path.isfile(csv_out_file):
            df.to_csv(csv_out_file, index=True)
        else: # else it exists so append without writing the header
            df.to_csv(csv_out_file, mode='a', header=False, index=False)   
        print(f"Error message saved to {csv_out_file}") 
        # -------END OF CODES: save the error message to the .out file, None as the energy score, and write both the energy and the error message to a csv file -------
        pass

        
    new_dir = os.path.join(cwd, f'{lig_name}_{prot_name}')
    os.makedirs(new_dir, exist_ok=True)
    
    # copy the files to the new directory, this will overwrite the files if they already exist
    os.system(f'mv -f *{lig_name}* {new_dir}')
    os.system(f'mv -f *{prot_name}* {new_dir}')
    os.system(f'mv -f *{complex_name}* {new_dir}')

    # if the new directory already exists, forcibly remove it so we can overwrite it
    if os.path.exists(f'data/{new_dir}'):
        os.system(f'rm -rf data/{new_dir}')
    # move the output dir to the data storage dir 'data'
    os.system(f'mv -f {new_dir} ./data/')
    print(f"Data saved in ./data/{new_dir}")

### OpenEye Toolkits Pipeline

With this pipeline, you need to provide an additional parameter, a protein-ligand complex, as a reference of where the active site is.

In [20]:
# Use the following code if you need to download the PDB file using the PDB ID
# download the PDB file using the PDB ID
# filename = download_pdb_file(pdb_id)
# # split the protein and ligand from the PDB file
# receptor_file, ligand_file = pdb_to_prot_lig(pdb_id, filename)
# this is for when lig_files and prot_files are lists...
# for lig, prot in tqdm(zip(lig_files, prot_files)):
#         vina_process_lig_prot(lig, prot)

In [21]:
def oe_process_lig_prot(lig_file, prot_file, complex_file, preserve_water=False, preserve_metal=True, csv_out_file='oe_docking_data.csv'):
    '''Final function to process each protein and ligand pair, prepare them for docking, score the ligand and save the output files.
    Both lig_file and prot_file must be strings, not lists. If you want to process multiple ligands and proteins (i.e. LISTS), you should use this function in the following manner: 
    for lig, prot in tqdm(zip(lig_files, prot_files)):
        oe_process_lig_prot(lig, prot)
        ...
    '''

    # TODO: handling lists..?
    if isinstance(lig_file, list) or isinstance(prot_file, list) or isinstance(complex_file, list):
        raise ValueError("lig_file and prot_file must be strings, not lists. If you are trying to process multiple ligands and proteins, you should use this function in a loop.")
    
    # check if any of the input is none
    if lig_file is None or prot_file is None or complex_file is None:
        raise ValueError("lig_file, prot_file and complex_file must not be None. You risk deleting everything in your current directory if any of these is None!")
    
    lig_name = os.path.splitext(lig_file)[0]
    prot_name = os.path.splitext(prot_file)[0]
    complex_name = os.path.splitext(complex_file)[0]

    try:
        os.environ['OE_LICENSE'] = '/home/ian/oe_license.txt' # change this to your OE_LICENSE path
        
        # prepare the ligand and receptor
        prep = DockingPrepper('.',
                                lig_file=lig_file, 
                                prot_file=prot_file,
                                pdb_id=complex_name,
                                preserve_water=preserve_water,
                                preserve_metal=preserve_metal) # this is not really the pdb_id but more of a basename for the files

        # OE preparation
        DU = prep.oe_process()
        
        # using the DockingScorer class to get the docking score
        scorer = DockingScorer('.', 
                            lig_file, 
                            prot_file, 
                            save_out_file=True,
                            from_pdb=False,
                            complex_file=complex_file,
                            csv_out_file=csv_out_file,
                            receptor_DU=DU) # for OE, this will be the receptor DU produced from DockingPrepper.oe_process (required)
                            
        docking_score = scorer.oe_score_ligand()

    except Exception as e:
        # ------- START OF CODES: save the error message to the .out file, None as the energy score, and write both the energy and the error message to a csv file -------
        # Set up logging
        logging.basicConfig(filename=f'{lig_name}_{prot_name}error_log.txt', level=logging.ERROR, format='%(asctime)s - %(levelname)s - %(message)s')
        logging.error(f"Error processing {lig_file}: {str(e)}")
        print(f"Skipping {lig_file} due to an error: {str(e)}")
        
        # saving the error message to the .txt file and None as the energy score 
        energy = None
        df = pd.DataFrame({'ligand_name': [lig_file], 
                                'protein_name': [prot_file], 
                                'Chemgauss4': [energy],
                                'error_message' : [str(e)]}
                                )
    
        if not os.path.isfile(csv_out_file):
            df.to_csv(csv_out_file, index=True)
        else: # else it exists so append without writing the header
            df.to_csv(csv_out_file, mode='a', header=False, index=False)   
        print(f"Error message saved to {csv_out_file}") 
        # -------END OF CODES: save the error message to the .out file, None as the energy score, and write both the energy and the error message to a csv file -------

        pass  # This will skip the current lig and proceed with the next one

    new_dir = os.path.join(cwd, f'{lig_name}_{prot_name}')
    os.makedirs(new_dir, exist_ok=True)
    
    # copy the files to the new directory, this will overwrite the files if they already exist
    os.system(f'mv -f *{lig_name}* {new_dir}')
    os.system(f'mv -f *{prot_name}* {new_dir}')
    os.system(f'mv -f *{complex_name}* {new_dir}')

    # if the new directory already exists, forcibly remove it so we can overwrite it
    if os.path.exists(f'data/{new_dir}'):
        os.system(f'rm -rf data/{new_dir}')
    # move the output dir to the data storage dir 'data'
    os.system(f'mv -f {new_dir} ./data/')
    print(f"Data saved in ./data/{new_dir}")

# If the input SDF file has more than 1 molecules, then it needs to be split. For example, SDF files generated by diffusion model typically have more than 1 molecules

In [22]:
def split_sdf(input_sdf):
    # Load the multi-molecule SDF file
    supplier = Chem.SDMolSupplier(input_sdf)
    output_name = input_sdf.split('.')[0]
    
    # Count the number of molecules
    molecules = [mol for mol in supplier if mol is not None]
    num_molecules = len(molecules)
    
    output = []
    # Check if the SDF file contains more than one molecule
    if num_molecules > 1:
        # Loop over each molecule and write it to a separate file
        for i, mol in enumerate(molecules):
            output_sdf = f"{output_name}_{i+1}.sdf"
            output.append(output_sdf)

            writer = SDWriter(output_sdf)
            writer.write(mol)
            writer.close()
        print(f"{num_molecules} molecules have been successfully separated into individual SDF files.")
    else:
        print(f"The SDF file {input_sdf} contains one or no molecules. No splitting is required.")
        output = str(input_sdf)
    return output

# Replace with the actual path to your SDF file
# input_sdf = "molecules_bx_2024_06_18_154410_1.sdf"
# lig_file = split_sdf(input_sdf)
# print(lig_file)

# Loading the leakyPDB dataset - we use this dataset to test out our pipeline

In [20]:
# loading LeakyPDB dataset
leakypdb_test = pd.read_csv('leakypdb_test.csv')
leakypdb_test.rename(columns={'Unnamed: 0': 'pdb_id',
                          'header': 'protein_family', 
                          'seq':'protein_sequence',
                          'kd/ki':'binding_affinity',
                          'value':'pkd/pki'}, inplace=True)
leakypdb_test['pdb_id'] = leakypdb_test['pdb_id'].str.upper()
leakypdb_test
#TODO: kd/ki column may not be all experimentally determined, some may be predicted, need to check this

FileNotFoundError: [Errno 2] No such file or directory: 'leakypdb_test.csv'

In [21]:
leakypdb_ids=leakypdb_test['pdb_id'].tolist()
leakypdb_ids = [leakypdb_id.upper() for leakypdb_id in leakypdb_ids]
leakypdb_ids

NameError: name 'leakypdb_test' is not defined

# Testing the vina pipeline
<font color='orange'>Testing the vina pipeline without try except</font>

In [23]:
mpro_oe_water = pd.read_csv('results/mpro_oe/with_water/mpro_oe_w_water_data.csv')
mpro_oe_water_error_ligands = mpro_oe_water[mpro_oe_water['error_message'].notnull()]['ligand_name'].tolist()
import glob
all_mpro_ligands = []
for pattern in mpro_oe_water_error_ligands:
    matched_files = glob.glob(f'Updated_Mpro_ligands/{pattern}')
    all_mpro_ligands.extend(matched_files)

for ligand_file in all_mpro_ligands:
    
    os.system(f'cp {ligand_file} .')
    # ligand_file name without path name
    ligand_file = os.path.basename(ligand_file)

    receptor_file = 'Mpro-protein.pdb'
    complex_file = 'Mpro-x0072_0.pdb'
    os.system(f'cp Mpro_complexes/{receptor_file} .')
    os.system(f'cp Mpro_complexes/{complex_file} .')

    # receptor_file = '6YNQ_receptor_fixed.pdb'
    # complex_file = '6YNQ.pdb'             
    # os.system(f'cp Mpro_active_site/{complex_file} .')
    # os.system(f'cp Mpro_active_site/{receptor_file} .')

    

    # oe_process_lig_prot(ligand_file, 
    #                     receptor_file, 
    #                     complex_file, 
    #                     preserve_water=True, 
    #                     preserve_metal=True, 
    #                     csv_out_file='mpro_oe_w_water_data.csv')

# def oe_process_lig_prot(lig_file, prot_file, complex_file, preserve_water=False, preserve_metal=False, csv_out_file='oe_docking_data.csv'):
#     '''Final function to process each protein and ligand pair, prepare them for docking, score the ligand and save the output files.
#     Both lig_file and prot_file must be strings, not lists. If you want to process multiple ligands and proteins (i.e. LISTS), you should use this function in the following manner: 
#     for lig, prot in tqdm(zip(lig_files, prot_files)):
#         oe_process_lig_prot(lig, prot)
#         ...
#     '''
    lig_file = ligand_file
    prot_file = receptor_file
    complex_file = complex_file
    preserve_water=True 
    preserve_metal=True 
    csv_out_file='oe_docking_data.csv'

    # TODO: handling lists..?
    if isinstance(lig_file, list) or isinstance(prot_file, list) or isinstance(complex_file, list):
        raise ValueError("lig_file and prot_file must be strings, not lists. If you are trying to process multiple ligands and proteins, you should use this function in a loop.")
    
    # check if any of the input is none
    if lig_file is None or prot_file is None or complex_file is None:
        raise ValueError("lig_file, prot_file and complex_file must not be None. You risk deleting everything in your current directory if any of these is None!")
    
    lig_name = os.path.splitext(lig_file)[0]
    prot_name = os.path.splitext(prot_file)[0]
    complex_name = os.path.splitext(complex_file)[0]

    # try:
    os.environ['OE_LICENSE'] = '/home/ian/oe_license.txt' # change this to your OE_LICENSE path
    
    # prepare the ligand and receptor
    prep = DockingPrepper('.',
                            lig_file=lig_file, 
                            prot_file=prot_file,
                            pdb_id=complex_name,
                            preserve_water=preserve_water,
                            preserve_metal=preserve_metal) # this is not really the pdb_id but more of a basename for the files

    # OE preparation
    DU = prep.oe_process()
    
    # using the DockingScorer class to get the docking score
    scorer = DockingScorer('.', 
                        lig_file, 
                        prot_file, 
                        save_out_file=True,
                        from_pdb=False,
                        complex_file=complex_file,
                        csv_out_file=csv_out_file,
                        receptor_DU=DU) # for OE, this will be the receptor DU produced from DockingPrepper.oe_process (required)
                        
    docking_score = scorer.oe_score_ligand()

Preparing the ligand and protein files using OpenEye Toolkits...


DPI: 0.12, RFree: 0.23, Resolution: 1.65
Processing BU # 1 with title: ---, chains AB, alt: A
Processing BU # 2 with title: ---, chains AB, alt: B
Skipping redundant DU with alts outside the site of interest, renaming existing to collapse alts
Discarding redundant alt DU with title ---(AB)altB > LIG(A-1101)
Skipping redundant DU with alts outside the site of interest, renaming existing to collapse alts
Discarding redundant alt DU with title ---(AB)altB > LIG(B-1101)
Superposition - RMSD: 0.00, Ref: , Fit: , SeqScore: 3102


Design unit was successfully made for Mpro-x0072_0.pdb, output is saved to ['Mpro-x0072_0_DU_0.oedu', 'Mpro-x0072_0_DU_1.oedu'].
The receptor design unit file has been saved as Mpro-x0072_0_DU_0_receptor.oedu
The receptor design unit file has been saved as Mpro-x0072_0_DU_1_receptor.oedu
cleaning 
docking 
cleaning 
docking 
The docked ligand file has been saved as ['Mpro-x0072_0_DU_0_receptor_Updated_Mpro_data_2044_docked.sdf', 'Mpro-x0072_0_DU_1_receptor_Updated_Mpro_data_2044_docked.sdf']
RDKit approach failed: Error: Invalid molecule found, try the regex-based approach., trying regex-based approach...
Output saved as Mpro-protein_Updated_Mpro_data_2044_0.out

Data saved to oe_docking_data.csv
Docked ligand Mpro-x0072_0_DU_0_receptor_Updated_Mpro_data_2044_docked.sdf, molecule 1: Chemgauss4 score = -1.02
RDKit approach failed: Error: Invalid molecule found, try the regex-based approach., trying regex-based approach...
Output saved as Mpro-protein_Updated_Mpro_data_2044_0.out

Data s

[22:09:36] Explicit valence for atom # 4 N, 4, is greater than permitted
[22:09:36] ERROR: Could not sanitize molecule ending on line 162
[22:09:36] ERROR: Explicit valence for atom # 4 N, 4, is greater than permitted
[22:09:36] Explicit valence for atom # 4 N, 4, is greater than permitted
[22:09:36] ERROR: Could not sanitize molecule ending on line 162
[22:09:36] ERROR: Explicit valence for atom # 4 N, 4, is greater than permitted
DPI: 0.12, RFree: 0.23, Resolution: 1.65
Processing BU # 1 with title: ---, chains AB, alt: A
Processing BU # 2 with title: ---, chains AB, alt: B
Skipping redundant DU with alts outside the site of interest, renaming existing to collapse alts
Discarding redundant alt DU with title ---(AB)altB > LIG(A-1101)
Skipping redundant DU with alts outside the site of interest, renaming existing to collapse alts
Discarding redundant alt DU with title ---(AB)altB > LIG(B-1101)
Superposition - RMSD: 0.00, Ref: , Fit: , SeqScore: 3102


Design unit was successfully made for Mpro-x0072_0.pdb, output is saved to ['Mpro-x0072_0_DU_0.oedu', 'Mpro-x0072_0_DU_1.oedu'].
The receptor design unit file has been saved as Mpro-x0072_0_DU_0_receptor.oedu
The receptor design unit file has been saved as Mpro-x0072_0_DU_1_receptor.oedu
cleaning 
docking 
cleaning 
docking 
The docked ligand file has been saved as ['Mpro-x0072_0_DU_0_receptor_Updated_Mpro_data_1687_docked.sdf', 'Mpro-x0072_0_DU_1_receptor_Updated_Mpro_data_1687_docked.sdf']
RDKit approach failed: Error: Invalid molecule found, try the regex-based approach., trying regex-based approach...
Output saved as Mpro-protein_Updated_Mpro_data_1687_0.out

Data saved to oe_docking_data.csv
Docked ligand Mpro-x0072_0_DU_0_receptor_Updated_Mpro_data_1687_docked.sdf, molecule 1: Chemgauss4 score = -8.06
RDKit approach failed: Error: Invalid molecule found, try the regex-based approach., trying regex-based approach...
Output saved as Mpro-protein_Updated_Mpro_data_1687_0.out

Data s

[22:11:12] Explicit valence for atom # 22 N, 4, is greater than permitted
[22:11:12] ERROR: Could not sanitize molecule ending on line 99
[22:11:12] ERROR: Explicit valence for atom # 22 N, 4, is greater than permitted
[22:11:12] Explicit valence for atom # 22 N, 4, is greater than permitted
[22:11:12] ERROR: Could not sanitize molecule ending on line 99
[22:11:12] ERROR: Explicit valence for atom # 22 N, 4, is greater than permitted
DPI: 0.12, RFree: 0.23, Resolution: 1.65
Processing BU # 1 with title: ---, chains AB, alt: A
Processing BU # 2 with title: ---, chains AB, alt: B


In [52]:
lig_file = 'L3001_0.sdf' 
prot_file = '5S9L_receptor_w_water_metal.pdb'
complex_file = '5S9L.pdb'
csv_out_file='vina_playground_docking_data.csv'

'''Final function to process each protein and ligand pair, prepare them for docking, score the ligand and save the output files.
Both lig_file and prot_file must be strings, not lists. If you want to process multiple ligands and proteins (i.e. LISTS), you should use this function in the following manner: 
for lig, prot in tqdm(zip(lig_files, prot_files)):
    vina_process_lig_prot(lig, prot)
    ...
'''
# TODO: handling lists..?
if isinstance(lig_file, list) or isinstance(prot_file, list):
    raise ValueError("lig_file and prot_file must be strings, not lists.")

destination_dir = f'data/{lig_file.split(".")[0]}_with_{prot_file.split(".")[0]}'
os.makedirs(destination_dir, exist_ok=True)

lig_name = os.path.splitext(lig_file)[0]
prot_name = os.path.splitext(prot_file)[0]
complex_name = os.path.splitext(complex_file)[0]

# try:      
os.environ['OE_LICENSE'] = '/home/ian/oe_license.txt' # change this to your OE_LICENSE path

# prepare the ligand and receptor for Vina i.e. convert to pdbqt format
prep = DockingPrepper('.',
                        lig_file=lig_file, 
                        prot_file=prot_file,
                        pdb_id=complex_name, # this is not really the pdb_id but more of a basename for the files
# for vina preparation
                        preserve_water = True,
                        preserve_metal = True) 
prep.vina_process()
        
# using the DockingScorer class to get the docking score
scorer = DockingScorer('.', 
                        lig_file, 
                        prot_file,
                        complex_file=complex_file, # reference complex file, required for the vina process 
                        get_vina_poses=True, # this only works for vina, there's no choice for OE - HA!
                        save_out_file=True,
                        from_pdb=False,
                        csv_out_file=csv_out_file,
                        receptor_DU=None) # None because this is for Vina, but for OE, this will be the receptor DU (required)

docking_score = scorer.vina_score_ligand()

new_dir_name = f'{lig_name}_{prot_name}'
new_dir = os.path.join(cwd, new_dir_name)
os.makedirs(new_dir, exist_ok=True)

1 molecule converted


The ligand file has been converted to L3001_0.pdb
The ligand file has been saved as L3001_0.pdbqt

Sorry, there are no Gasteiger parameters available for atom 5S9L_receptor_w_water_metal:A: CA901:CA
Sorry, there are no Gasteiger parameters available for atom 5S9L_receptor_w_water_metal:A: ZN907:ZN
Sorry, there are no Gasteiger parameters available for atom 5S9L_receptor_w_water_metal:A: CA909:CA
Sorry, there are no Gasteiger parameters available for atom 5S9L_receptor_w_water_metal:A:HOH1001:O
Sorry, there are no Gasteiger parameters available for atom 5S9L_receptor_w_water_metal:A:HOH1002:O
Sorry, there are no Gasteiger parameters available for atom 5S9L_receptor_w_water_metal:A:HOH1003:O
Sorry, there are no Gasteiger parameters available for atom 5S9L_receptor_w_water_metal:A:HOH1004:O
Sorry, there are no Gasteiger parameters available for atom 5S9L_receptor_w_water_metal:A:HOH1005:O
Sorry, there are no Gasteiger parameters available for atom 5S9L_receptor_w_water_metal:A:HOH1006:O
S

DPI: 0.12, RFree: 0.19, Resolution: 1.90
Processing BU # 1 with title: ISOFORM 2 OF ECTONUCLEOTIDE, chains AB
Found unresolved N-terminal with 23 residues before TRP 51   A 1  , with sequence FTASRIKRAEWDEGPPTVLSDSP
Found 4 residue gap between LYS 463   A 1   and CYS 468   A 1  , with sequence PSGK
Found 2 residue gap between GLU 573   A 1   and ASN 576   A 1  , with sequence PK
Found unresolved C-terminal with 12 residues after GLU 861   A 1  , with sequence IGGRHHHHHHHH
   Falling back to charging protein with OEMMFF94Charges


Error in 5S9L: Command ./vina/vina_1.2.5_linux_x86_64 --config L3001_0_5S9L_receptor_w_water_metal_config.txt failed: 


PDBQT parsing error: Coordinate "7  65.14" is not valid.
 > ATOM   3946  HH1A1ARG A 450       8.039   7.307  65.141  1.00  0.00     0.174 HD

out:  
err:  
Error in 5S9L: Command ./vina/vina_1.2.5_linux_x86_64 --config L3001_0_5S9L_receptor_w_water_metal_config.txt failed: 


PDBQT parsing error: Coordinate "7  65.14" is not valid.
 > ATOM   3946  HH1A1ARG A 450       8.039   7.307  65.141  1.00  0.00     0.174 HD



CommandExecuteError: Command ./vina/vina_1.2.5_linux_x86_64 --config L3001_0_5S9L_receptor_w_water_metal_config.txt failed: 


PDBQT parsing error: Coordinate "7  65.14" is not valid.
 > ATOM   3946  HH1A1ARG A 450       8.039   7.307  65.141  1.00  0.00     0.174 HD


# <font color='orange'>Testing with complexes produced from donwloading PDB ID, then plitting the complex files</font>

Rescoring crystal structures

In [16]:
pdb_ids = ['5S9L',
           '1KLT',
           '1A42',
            '4B3D', # Failed in vina
           '3GVU',
           '6C7Q',
           ]
for pdb_id in tqdm(pdb_ids):
    # download 
    filename = download_pdb_file(pdb_id)
    # split the protein and ligand from the PDB file
    complex_file, receptor_file, ligand_file = pdb_to_prot_lig(pdb_id, filename, preserve_water=True,preserve_metal=True)
    # rest of pipeline
    vina_process_lig_prot(ligand_file, receptor_file, complex_file, preserve_water=True, preserve_metal=True, csv_out_file='AutoDock_w_water_data.csv')

  0%|          | 0/6 [00:00<?, ?it/s]

Downloading PDB structure '5s9l'...




Fixed receptor file saved as: 5S9L_receptor_fixed.pdb
Extracting ligand from 5S9L.pdb...
Reading receptor from 5S9L.pdb...


DPI: 0.12, RFree: 0.19, Resolution: 1.90
Processing BU # 1 with title: ISOFORM 2 OF ECTONUCLEOTIDE, chains AB
Found unresolved N-terminal with 23 residues before TRP 51   A 1  , with sequence FTASRIKRAEWDEGPPTVLSDSP
Found 4 residue gap between LYS 463   A 1   and CYS 468   A 1  , with sequence PSGK
Found 2 residue gap between GLU 573   A 1   and ASN 576   A 1  , with sequence PK
Found unresolved C-terminal with 12 residues after GLU 861   A 1  , with sequence IGGRHHHHHHHH
   Falling back to charging protein with OEMMFF94Charges


The complex file has been saved as 5S9L.pdb
The receptor file has been saved as 5S9L_receptor_fixed.pdb
The ligand file has been saved as 5S9L_ligand.pdb
The ligand file has been saved as 5S9L_ligand.pdbqt
Sorry, there are no Gasteiger parameters available for atom 5S9L_receptor_fixed:A:SER751:OG
Sorry, there are no Gasteiger parameters available for atom 5S9L_receptor_fixed:A: CA806:CA
Sorry, there are no Gasteiger parameters available for atom 5S9L_receptor_fixed:A:  K807:K
Sorry, there are no Gasteiger parameters available for atom 5S9L_receptor_fixed:A: ZN808:ZN
Sorry, there are no Gasteiger parameters available for atom 5S9L_receptor_fixed:A: CA810:CA
Sorry, there are no Gasteiger parameters available for atom 5S9L_receptor_fixed:A:HOH1235:O
The protein file 5S9L_receptor_fixed.pdb has been converted to pdbqt format and saved in 5S9L_receptor_fixed.pdbqt
Extracting ligand from 5S9L.pdb...
Reading receptor from 5S9L.pdb...


DPI: 0.12, RFree: 0.19, Resolution: 1.90
Processing BU # 1 with title: ISOFORM 2 OF ECTONUCLEOTIDE, chains AB
Found unresolved N-terminal with 23 residues before TRP 51   A 1  , with sequence FTASRIKRAEWDEGPPTVLSDSP
Found 4 residue gap between LYS 463   A 1   and CYS 468   A 1  , with sequence PSGK
Found 2 residue gap between GLU 573   A 1   and ASN 576   A 1  , with sequence PK
Found unresolved C-terminal with 12 residues after GLU 861   A 1  , with sequence IGGRHHHHHHHH
   Falling back to charging protein with OEMMFF94Charges
mv: cannot move '5S9L_ligand_5S9L_receptor_fixed' to a subdirectory of itself, '/home/ian/msc_project/tmp_exp_docking/5S9L_ligand_5S9L_receptor_fixed/5S9L_ligand_5S9L_receptor_fixed'
mv: cannot move '5S9L_ligand_5S9L_receptor_fixed' to a subdirectory of itself, '/home/ian/msc_project/tmp_exp_docking/5S9L_ligand_5S9L_receptor_fixed/5S9L_ligand_5S9L_receptor_fixed'
mv: cannot move '5S9L_ligand_5S9L_receptor_fixed' to a subdirectory of itself, '/home/ian/msc_projec

5S9L: Estimated Free Energy of Binding = -11.19 kcal/mol
Output saved as 5S9L_receptor_fixed_5S9L_ligand.out

Data saved to AutoDock_w_water_data.csv
Data saved in ./data//home/ian/msc_project/tmp_exp_docking/5S9L_ligand_5S9L_receptor_fixed
Downloading PDB structure '1klt'...
Fixed receptor file saved as: 1KLT_receptor_fixed.pdb
Extracting ligand from 1KLT.pdb...
Reading receptor from 1KLT.pdb...


DPI: 0.22, RFree: 0.26, Resolution: 1.90
Processing BU # 1 with title: CHYMASE, chains A


The complex file has been saved as 1KLT.pdb
The receptor file has been saved as 1KLT_receptor_fixed.pdb
The ligand file has been saved as 1KLT_ligand.pdb
The ligand file has been saved as 1KLT_ligand.pdbqt
adding gasteiger charges to peptide
Sorry, there are no Gasteiger parameters available for atom 1KLT_receptor_fixed:A:GLU71:OE2
Unable to assign MAP type to atom N
Sorry, there are no Gasteiger parameters available for atom 1KLT_receptor_fixed:A:LYS173B:NZ
Sorry, there are no Gasteiger parameters available for atom 1KLT_receptor_fixed:A:HOH367:O
The protein file 1KLT_receptor_fixed.pdb has been converted to pdbqt format and saved in 1KLT_receptor_fixed.pdbqt
Extracting ligand from 1KLT.pdb...
Reading receptor from 1KLT.pdb...


DPI: 0.22, RFree: 0.26, Resolution: 1.90
Processing BU # 1 with title: CHYMASE, chains A
mv: cannot move '1KLT_ligand_1KLT_receptor_fixed' to a subdirectory of itself, '/home/ian/msc_project/tmp_exp_docking/1KLT_ligand_1KLT_receptor_fixed/1KLT_ligand_1KLT_receptor_fixed'
mv: cannot move '1KLT_ligand_1KLT_receptor_fixed' to a subdirectory of itself, '/home/ian/msc_project/tmp_exp_docking/1KLT_ligand_1KLT_receptor_fixed/1KLT_ligand_1KLT_receptor_fixed'
mv: cannot move '1KLT_ligand_1KLT_receptor_fixed' to a subdirectory of itself, '/home/ian/msc_project/tmp_exp_docking/1KLT_ligand_1KLT_receptor_fixed/1KLT_ligand_1KLT_receptor_fixed'
 33%|███▎      | 2/6 [02:56<05:38, 84.58s/it] 

1KLT: Estimated Free Energy of Binding = -4.271 kcal/mol
Output saved as 1KLT_receptor_fixed_1KLT_ligand.out

Data saved to AutoDock_w_water_data.csv
Data saved in ./data//home/ian/msc_project/tmp_exp_docking/1KLT_ligand_1KLT_receptor_fixed
Downloading PDB structure '1a42'...
Fixed receptor file saved as: 1A42_receptor_fixed.pdb
Extracting ligand from 1A42.pdb...
Reading receptor from 1A42.pdb...


DPI: 0.00, RFree: 0.00, Resolution: 2.25
Processing BU # 1 with title: CARBONIC ANHYDRASE II, chains A
Found unresolved N-terminal with 2 residues before HIS 4   A 1  , with sequence SH
Found unresolved C-terminal with 1 residues after PHE 260   A 1  , with sequence K


The complex file has been saved as 1A42.pdb
The receptor file has been saved as 1A42_receptor_fixed.pdb
The ligand file has been saved as 1A42_ligand.pdb
The ligand file has been saved as 1A42_ligand.pdbqt
Sorry, there are no Gasteiger parameters available for atom 1A42_receptor_fixed:A:PHE256:O
Sorry, there are no Gasteiger parameters available for atom 1A42_receptor_fixed:A: ZN257:ZN
The protein file 1A42_receptor_fixed.pdb has been converted to pdbqt format and saved in 1A42_receptor_fixed.pdbqt
Extracting ligand from 1A42.pdb...
Reading receptor from 1A42.pdb...


DPI: 0.00, RFree: 0.00, Resolution: 2.25
Processing BU # 1 with title: CARBONIC ANHYDRASE II, chains A
Found unresolved N-terminal with 2 residues before HIS 4   A 1  , with sequence SH
Found unresolved C-terminal with 1 residues after PHE 260   A 1  , with sequence K
mv: cannot move '1A42_ligand_1A42_receptor_fixed' to a subdirectory of itself, '/home/ian/msc_project/tmp_exp_docking/1A42_ligand_1A42_receptor_fixed/1A42_ligand_1A42_receptor_fixed'
mv: cannot move '1A42_ligand_1A42_receptor_fixed' to a subdirectory of itself, '/home/ian/msc_project/tmp_exp_docking/1A42_ligand_1A42_receptor_fixed/1A42_ligand_1A42_receptor_fixed'
mv: cannot move '1A42_ligand_1A42_receptor_fixed' to a subdirectory of itself, '/home/ian/msc_project/tmp_exp_docking/1A42_ligand_1A42_receptor_fixed/1A42_ligand_1A42_receptor_fixed'
 50%|█████     | 3/6 [03:48<03:29, 69.83s/it]

1A42: Estimated Free Energy of Binding = -6.809 kcal/mol
Output saved as 1A42_receptor_fixed_1A42_ligand.out

Data saved to AutoDock_w_water_data.csv
Data saved in ./data//home/ian/msc_project/tmp_exp_docking/1A42_ligand_1A42_receptor_fixed
Downloading PDB structure '4b3d'...




Fixed receptor file saved as: 4B3D_receptor_fixed.pdb
Extracting ligand from 4B3D.pdb...
Reading receptor from 4B3D.pdb...


DPI: 0.09, RFree: 0.21, Resolution: 1.59
Processing BU # 1 with title: DNA REPAIR AND RECOMBINATION PROTEIN RADA, chains A
Found unresolved N-terminal with 1 residues before ALA 108   A 1  , with sequence M
Found 11 residue gap between VAL 285   A 1   and ALA 309   A 1  , with sequence QANGGHILAHS
Found 6 residue gap between ASP 329   A 1   and GLY 336   A 1  , with sequence APHLPE
   Falling back to charging protein with OEMMFF94Charges
Processing BU # 2 with title: DNA REPAIR AND RECOMBINATION PROTEIN RADA, chains C
Found unresolved N-terminal with 1 residues before ALA 108   C 2  , with sequence M
Found 10 residue gap between VAL 285   C 2   and SER 308   C 2  , with sequence QANGGHILAH
Found 5 residue gap between ASP 329   C 2   and GLU 335   C 2  , with sequence APHLP
   Falling back to charging protein with OEMMFF94Charges


The complex file has been saved as 4B3D.pdb
The receptor file has been saved as 4B3D_receptor_fixed.pdb
The ligand file has been saved as 4B3D_ligand.pdb
The ligand file has been saved as 4B3D_ligand.pdbqt
adding gasteiger charges to peptide
The protein file 4B3D_receptor_fixed.pdb has been converted to pdbqt format and saved in 4B3D_receptor_fixed.pdbqt
Extracting ligand from 4B3D.pdb...
Reading receptor from 4B3D.pdb...


DPI: 0.09, RFree: 0.21, Resolution: 1.59
Processing BU # 1 with title: DNA REPAIR AND RECOMBINATION PROTEIN RADA, chains A
Found unresolved N-terminal with 1 residues before ALA 108   A 1  , with sequence M
Found 11 residue gap between VAL 285   A 1   and ALA 309   A 1  , with sequence QANGGHILAHS
Found 6 residue gap between ASP 329   A 1   and GLY 336   A 1  , with sequence APHLPE
   Falling back to charging protein with OEMMFF94Charges
Processing BU # 2 with title: DNA REPAIR AND RECOMBINATION PROTEIN RADA, chains C
Found unresolved N-terminal with 1 residues before ALA 108   C 2  , with sequence M
Found 10 residue gap between VAL 285   C 2   and SER 308   C 2  , with sequence QANGGHILAH
Found 5 residue gap between ASP 329   C 2   and GLU 335   C 2  , with sequence APHLP
   Falling back to charging protein with OEMMFF94Charges
mv: cannot move '4B3D_ligand_4B3D_receptor_fixed' to a subdirectory of itself, '/home/ian/msc_project/tmp_exp_docking/4B3D_ligand_4B3D_receptor_fixed/4B3D_liga

4B3D: Estimated Free Energy of Binding = -5.427 kcal/mol
Output saved as 4B3D_receptor_fixed_4B3D_ligand.out

Data saved to AutoDock_w_water_data.csv
Data saved in ./data//home/ian/msc_project/tmp_exp_docking/4B3D_ligand_4B3D_receptor_fixed
Downloading PDB structure '3gvu'...
Fixed receptor file saved as: 3GVU_receptor_fixed.pdb
Extracting ligand from 3GVU.pdb...
Reading receptor from 3GVU.pdb...


DPI: 0.15, RFree: 0.23, Resolution: 2.05
Processing BU # 1 with title: TYROSINE-PROTEIN KINASE ABL2, chains A
   Falling back to charging protein with OEMMFF94Charges


The complex file has been saved as 3GVU.pdb
The receptor file has been saved as 3GVU_receptor_fixed.pdb
The ligand file has been saved as 3GVU_ligand.pdb
The ligand file has been saved as 3GVU_ligand.pdbqt
adding gasteiger charges to peptide
The protein file 3GVU_receptor_fixed.pdb has been converted to pdbqt format and saved in 3GVU_receptor_fixed.pdbqt
Extracting ligand from 3GVU.pdb...
Reading receptor from 3GVU.pdb...


DPI: 0.15, RFree: 0.23, Resolution: 2.05
Processing BU # 1 with title: TYROSINE-PROTEIN KINASE ABL2, chains A
   Falling back to charging protein with OEMMFF94Charges
mv: cannot move '3GVU_ligand_3GVU_receptor_fixed' to a subdirectory of itself, '/home/ian/msc_project/tmp_exp_docking/3GVU_ligand_3GVU_receptor_fixed/3GVU_ligand_3GVU_receptor_fixed'
mv: cannot move '3GVU_ligand_3GVU_receptor_fixed' to a subdirectory of itself, '/home/ian/msc_project/tmp_exp_docking/3GVU_ligand_3GVU_receptor_fixed/3GVU_ligand_3GVU_receptor_fixed'
mv: cannot move '3GVU_ligand_3GVU_receptor_fixed' to a subdirectory of itself, '/home/ian/msc_project/tmp_exp_docking/3GVU_ligand_3GVU_receptor_fixed/3GVU_ligand_3GVU_receptor_fixed'
 83%|████████▎ | 5/6 [06:47<01:24, 84.78s/it]

3GVU: Estimated Free Energy of Binding = -9.571 kcal/mol
Output saved as 3GVU_receptor_fixed_3GVU_ligand.out

Data saved to AutoDock_w_water_data.csv
Data saved in ./data//home/ian/msc_project/tmp_exp_docking/3GVU_ligand_3GVU_receptor_fixed
Downloading PDB structure '6c7q'...
Fixed receptor file saved as: 6C7Q_receptor_fixed.pdb
Extracting ligand from 6C7Q.pdb...
Reading receptor from 6C7Q.pdb...


DPI: 0.08, RFree: 0.23, Resolution: 1.51
Processing BU # 1 with title: BROMODOMAIN-CONTAINING PROTEIN 4, chains A
Found unresolved N-terminal with 19 residues before LYS 349   A 1  , with sequence SNAKDVPDSQQHPAPEKSS


The complex file has been saved as 6C7Q.pdb
The receptor file has been saved as 6C7Q_receptor_fixed.pdb
The ligand file has been saved as 6C7Q_ligand.pdb
The ligand file has been saved as 6C7Q_ligand.pdbqt
adding gasteiger charges to peptide
The protein file 6C7Q_receptor_fixed.pdb has been converted to pdbqt format and saved in 6C7Q_receptor_fixed.pdbqt
Extracting ligand from 6C7Q.pdb...
Reading receptor from 6C7Q.pdb...


DPI: 0.08, RFree: 0.23, Resolution: 1.51
Processing BU # 1 with title: BROMODOMAIN-CONTAINING PROTEIN 4, chains A
Found unresolved N-terminal with 19 residues before LYS 349   A 1  , with sequence SNAKDVPDSQQHPAPEKSS
mv: cannot move '6C7Q_ligand_6C7Q_receptor_fixed' to a subdirectory of itself, '/home/ian/msc_project/tmp_exp_docking/6C7Q_ligand_6C7Q_receptor_fixed/6C7Q_ligand_6C7Q_receptor_fixed'
mv: cannot move '6C7Q_ligand_6C7Q_receptor_fixed' to a subdirectory of itself, '/home/ian/msc_project/tmp_exp_docking/6C7Q_ligand_6C7Q_receptor_fixed/6C7Q_ligand_6C7Q_receptor_fixed'
mv: cannot move '6C7Q_ligand_6C7Q_receptor_fixed' to a subdirectory of itself, '/home/ian/msc_project/tmp_exp_docking/6C7Q_ligand_6C7Q_receptor_fixed/6C7Q_ligand_6C7Q_receptor_fixed'
100%|██████████| 6/6 [08:17<00:00, 82.95s/it]

6C7Q: Estimated Free Energy of Binding = -9.999 kcal/mol
Output saved as 6C7Q_receptor_fixed_6C7Q_ligand.out

Data saved to AutoDock_w_water_data.csv
Data saved in ./data//home/ian/msc_project/tmp_exp_docking/6C7Q_ligand_6C7Q_receptor_fixed





In [22]:
pdb_id = '6YNQ' # Mpro active site with ligand
filename = download_pdb_file(pdb_id)
# split the protein and ligand from the PDB file
complex_file, receptor_file, ligand_file = pdb_to_prot_lig(pdb_id, filename, preserve_water=True,preserve_metal=True)
# rest of pipeline
vina_process_lig_prot(ligand_file, receptor_file, complex_file, preserve_water=True, preserve_metal=True, csv_out_file='mpro_w_water_data.csv')

Downloading PDB structure '6ynq'...
Fixed receptor file saved as: 6YNQ_receptor_fixed.pdb
Extracting ligand from 6YNQ.pdb...
Reading receptor from 6YNQ.pdb...


 CL 403   A 2   and  CL 403   B 2   are overlapping. Overlap: 19.65, Tanimoto: 1.00
    Overlap is complete, marking  CL 403   B 2   for deletion
    Deleted CL   :  CL 403   B 2  
HOH 696   A 2   and HOH 696   B 2   are overlapping. Overlap: 19.65, Tanimoto: 1.00
    Overlap is complete, marking HOH 696   B 2   for deletion
    Deleted  O   : HOH 696   B 2  
DPI: 0.13, RFree: 0.23, Resolution: 1.80
Processing BU # 1 with title: 3C-LIKE PROTEINASE, chains AB


The complex file has been saved as 6YNQ.pdb
The receptor file has been saved as 6YNQ_receptor_fixed.pdb
The ligand file has been saved as 6YNQ_ligand.pdb
Error occurred: None
Skipping 6YNQ_ligand.pdb due to an error: Command './ADFRsuite/ADFRsuite_x86_64Linux_1.0/bin/prepare_ligand -l 6YNQ_ligand.pdb -o 6YNQ_ligand.pdbqt -A hydrogens' returned non-zero exit status 1.
Error message saved to mpro_w_water_data.csv
Data saved in ./data//home/ian/msc_project/tmp_vina_mpro_docking/6YNQ_ligand_6YNQ_receptor_fixed


Traceback (most recent call last):
  File "/home/ian/msc_project/docking/ADFRsuite/ADFRsuite_x86_64Linux_1.0/CCSBpckgs/AutoDockTools/Utilities24/prepare_ligand4.py", line 242, in <module>
    attach_singletons=attach_singletons)
  File "/home/ian/msc_project/docking/ADFRsuite/ADFRsuite_x86_64Linux_1.0/CCSBpckgs/AutoDockTools/MoleculePreparation.py", line 1016, in __init__
    detect_bonds_between_cycles=detect_bonds_between_cycles)
  File "/home/ian/msc_project/docking/ADFRsuite/ADFRsuite_x86_64Linux_1.0/CCSBpckgs/AutoDockTools/MoleculePreparation.py", line 765, in __init__
    delete_single_nonstd_residues=False)
  File "/home/ian/msc_project/docking/ADFRsuite/ADFRsuite_x86_64Linux_1.0/CCSBpckgs/AutoDockTools/MoleculePreparation.py", line 124, in __init__
    self.repairMol(mol, self.repair_type_list)
  File "/home/ian/msc_project/docking/ADFRsuite/ADFRsuite_x86_64Linux_1.0/CCSBpckgs/AutoDockTools/MoleculePreparation.py", line 174, in repairMol
    self.newHs = self.addHydrogens(mol)


In [30]:
receptor_file = '6R8O_receptor_fixed.pdb'
ligand_file = '6R8O_ligand.pdb'
complex_file = '6R8O.pdb'
vina_process_lig_prot(ligand_file, receptor_file, complex_file, csv_out_file='vina_docking_data_playground.csv')

The ligand file has been saved as 6R8O_ligand.pdbqt
adding gasteiger charges to peptide
The protein file 6R8O_receptor_fixed.pdb has been converted to pdbqt format and saved in 6R8O_receptor_fixed.pdbqt
Extracting ligand from 6R8O.pdb...
Reading receptor from 6R8O.pdb...


DPI: 0.05, RFree: 0.18, Resolution: 1.36
Processing BU # 1 with title: PEPTIDYL-PROLYL CIS-TRANS ISOMERASE F, MITOCHONDRIA, chains A


6R8O: Estimated Free Energy of Binding = -10.33 kcal/mol
Output saved as 6R8O_receptor_fixed_6R8O_ligand.out

Data saved to vina_docking_data_playground.csv
Data saved in ./data//home/ian/msc_project/docking/6R8O_ligand_6R8O_receptor_fixed


mv: cannot move '6R8O_ligand_6R8O_receptor_fixed' to a subdirectory of itself, '/home/ian/msc_project/docking/6R8O_ligand_6R8O_receptor_fixed/6R8O_ligand_6R8O_receptor_fixed'
mv: cannot move '6R8O_ligand_6R8O_receptor_fixed' to a subdirectory of itself, '/home/ian/msc_project/docking/6R8O_ligand_6R8O_receptor_fixed/6R8O_ligand_6R8O_receptor_fixed'
mv: cannot move '6R8O_ligand_6R8O_receptor_fixed' to a subdirectory of itself, '/home/ian/msc_project/docking/6R8O_ligand_6R8O_receptor_fixed/6R8O_ligand_6R8O_receptor_fixed'
mv: cannot move '/home/ian/msc_project/docking/6R8O_ligand_6R8O_receptor_fixed' to './data/6R8O_ligand_6R8O_receptor_fixed': Directory not empty


<font color='orange'>Testing with Mpro protein and a ligand</font>

In [26]:
ligand_file = 'Updated_Mpro_data_3.sdf'
# receptor_file = 'Mpro-protein.pdb'
# complex_file = 'Mpro-x0072_0.pdb'
os.system(f'cp Updated_Mpro_ligands/{ligand_file} .')
os.system(f'cp Mpro_complexes/{receptor_file} .')
os.system(f'cp Mpro_complexes/{complex_file} .')

vina_process_lig_prot(ligand_file, 
                      receptor_file, 
                      complex_file, 
                      preserve_water=True, 
                      preserve_metal=True, 
                      csv_out_file='mpro_vina_w_water_data.csv')

1 molecule converted


The ligand file has been converted to Updated_Mpro_data_3.pdb
The ligand file has been saved as Updated_Mpro_data_3.pdbqt
adding gasteiger charges to peptide
The protein file Mpro-protein.pdb has been converted to pdbqt format and saved in Mpro-protein.pdbqt
Extracting ligand from Mpro-x0072_0.pdb...
Reading receptor from Mpro-x0072_0.pdb...
Failed to extract ligand from Mpro-x0072_0.pdb using the OpenEye Toolkit,trying a different script written using OESplitMolComplex...


DPI: 0.12, RFree: 0.23, Resolution: 1.65
Processing BU # 1 with title: ---, chains AB


Mpro-protein: Estimated Free Energy of Binding = -6.421 kcal/mol
Output saved as Mpro-protein_Updated_Mpro_data_3.out

Data saved to mpro_vina_w_water_data.csv
Data saved in ./data//home/ian/msc_project/tmp_vina_mpro_docking/Updated_Mpro_data_3_Mpro-protein


mv: cannot move 'Updated_Mpro_data_3_Mpro-protein' to a subdirectory of itself, '/home/ian/msc_project/tmp_vina_mpro_docking/Updated_Mpro_data_3_Mpro-protein/Updated_Mpro_data_3_Mpro-protein'
mv: cannot move 'Updated_Mpro_data_3_Mpro-protein' to a subdirectory of itself, '/home/ian/msc_project/tmp_vina_mpro_docking/Updated_Mpro_data_3_Mpro-protein/Updated_Mpro_data_3_Mpro-protein'
mv: cannot move '/home/ian/msc_project/tmp_vina_mpro_docking/Updated_Mpro_data_3_Mpro-protein' to './data/Updated_Mpro_data_3_Mpro-protein': Directory not empty


<font color='yellow'>Running the pipeline for lists</font>

In [49]:
ligand_files = ['Mpro-x0161_0_ligand.pdb', 
                'Mpro-x0072_0_ligand.pdb'] 
receptor_files = ['Mpro-x0072_0_receptor_fixed.pdb',
                 'Mpro-x0161_0_receptor_fixed.pdb']
for ligand_file in tqdm(ligand_files):
    for receptor_file in receptor_files:
        vina_process_lig_prot(lig_file=ligand_file, 
                 prot_file=receptor_file,
                 csv_out_file='vina_docking_data_playground.csv')

The ligand file has been saved as Mpro-x0161_0_ligand.pdbqt
adding gasteiger charges to peptide
The protein file Mpro-x0072_0_receptor_fixed.pdb has been converted to pdbqt format and saved in Mpro-x0072_0_receptor_fixed.pdbqt
Mpro-x0072: Estimated Free Energy of Binding = 0.0 kcal/mol
Output saved as Mpro-x0072_0_receptor_fixed_Mpro-x0161_0_ligand.out

Data saved to vina_docking_data_playground.csv
Data saved in data/Mpro-x0161_0_ligand_with_Mpro-x0072_0_receptor_fixed
The ligand file has been saved as Mpro-x0161_0_ligand.pdbqt


cp: -r not specified; omitting directory 'Mpro-x0072_0_receptor_fixed'
mv: cannot move '/home/ian/msc_project/docking/Mpro-x0072_0_receptor_fixed' to 'data/Mpro-x0161_0_ligand_with_Mpro-x0072_0_receptor_fixed/Mpro-x0072_0_receptor_fixed': Directory not empty


adding gasteiger charges to peptide
The protein file Mpro-x0161_0_receptor_fixed.pdb has been converted to pdbqt format and saved in Mpro-x0161_0_receptor_fixed.pdbqt
Mpro-x0161: Estimated Free Energy of Binding = 0.0 kcal/mol
Output saved as Mpro-x0161_0_receptor_fixed_Mpro-x0161_0_ligand.out

Data saved to vina_docking_data_playground.csv
Data saved in data/Mpro-x0161_0_ligand_with_Mpro-x0161_0_receptor_fixed
The ligand file has been saved as Mpro-x0072_0_ligand.pdbqt


cp: -r not specified; omitting directory 'Mpro-x0161_0_receptor_fixed'
mv: cannot move '/home/ian/msc_project/docking/Mpro-x0161_0_receptor_fixed' to 'data/Mpro-x0161_0_ligand_with_Mpro-x0161_0_receptor_fixed/Mpro-x0161_0_receptor_fixed': Directory not empty


adding gasteiger charges to peptide
The protein file Mpro-x0072_0_receptor_fixed.pdb has been converted to pdbqt format and saved in Mpro-x0072_0_receptor_fixed.pdbqt
Mpro-x0072: Estimated Free Energy of Binding = -2.443 kcal/mol
Output saved as Mpro-x0072_0_receptor_fixed_Mpro-x0072_0_ligand.out

Data saved to vina_docking_data_playground.csv
Data saved in data/Mpro-x0072_0_ligand_with_Mpro-x0072_0_receptor_fixed
The ligand file has been saved as Mpro-x0072_0_ligand.pdbqt


cp: -r not specified; omitting directory 'Mpro-x0072_0_receptor_fixed'
mv: cannot move '/home/ian/msc_project/docking/Mpro-x0072_0_receptor_fixed' to 'data/Mpro-x0072_0_ligand_with_Mpro-x0072_0_receptor_fixed/Mpro-x0072_0_receptor_fixed': Directory not empty


adding gasteiger charges to peptide
The protein file Mpro-x0161_0_receptor_fixed.pdb has been converted to pdbqt format and saved in Mpro-x0161_0_receptor_fixed.pdbqt
Mpro-x0161: Estimated Free Energy of Binding = -0.252 kcal/mol
Output saved as Mpro-x0161_0_receptor_fixed_Mpro-x0072_0_ligand.out

Data saved to vina_docking_data_playground.csv
Data saved in data/Mpro-x0072_0_ligand_with_Mpro-x0161_0_receptor_fixed


cp: -r not specified; omitting directory 'Mpro-x0161_0_receptor_fixed'
mv: cannot move '/home/ian/msc_project/docking/Mpro-x0161_0_receptor_fixed' to 'data/Mpro-x0072_0_ligand_with_Mpro-x0161_0_receptor_fixed/Mpro-x0161_0_receptor_fixed': Directory not empty


<font color='yellow'>Trying out with diffusion model generated molecules</font>

For these molecules, the input sdf file likely has more than 1 molecules. If this is the case, then we have to add another step to split the sdf file into several sdf files first. We use the function ``split_sdf``.

In [39]:
lig_file='molecules_bx_2024_06_18_154410.sdf' # this is when the sdf file is just 1 molecule
lig_file = split_sdf(lig_file)
prot_file='Mpro-x0072_0_receptor_fixed.pdb' # replace with a protein file of interest
if isinstance(lig_file,list):
    for lig in lig_file:
        vina_process_lig_prot(lig_file=lig, 
                 prot_file=prot_file,
                 csv_out_file='vina_docking_data_playground.csv')
else:
    vina_process_lig_prot(lig_file=lig_file, 
                prot_file=prot_file,
                csv_out_file='vina_docking_data_playground.csv')

14 molecules have been successfully separated into individual SDF files.
The ligand file has been converted to molecules_bx_2024_06_18_154410_1.pdb


1 molecule converted


The ligand file has been saved as molecules_bx_2024_06_18_154410_1.pdbqt
adding gasteiger charges to peptide
The protein file Mpro-x0072_0_receptor_fixed.pdb has been converted to pdbqt format and saved in Mpro-x0072_0_receptor_fixed.pdbqt
Mpro-x0072: Estimated Free Energy of Binding = 16.739 kcal/mol
Output saved as Mpro-x0072_0_receptor_fixed_molecules_bx_2024_06_18_154410_1.out

Data saved to vina_docking_data_playground.csv
Data saved in data/molecules_bx_2024_06_18_154410_1_with_Mpro-x0072_0_receptor_fixed
The ligand file has been converted to molecules_bx_2024_06_18_154410_2.pdb


cp: -r not specified; omitting directory 'Mpro-x0072_0_receptor_fixed'
mv: cannot move '/home/ian/msc_project/docking/Mpro-x0072_0_receptor_fixed' to 'data/molecules_bx_2024_06_18_154410_1_with_Mpro-x0072_0_receptor_fixed/Mpro-x0072_0_receptor_fixed': Directory not empty
1 molecule converted


The ligand file has been saved as molecules_bx_2024_06_18_154410_2.pdbqt
adding gasteiger charges to peptide
The protein file Mpro-x0072_0_receptor_fixed.pdb has been converted to pdbqt format and saved in Mpro-x0072_0_receptor_fixed.pdbqt
Mpro-x0072: Estimated Free Energy of Binding = 14.988 kcal/mol
Output saved as Mpro-x0072_0_receptor_fixed_molecules_bx_2024_06_18_154410_2.out

Data saved to vina_docking_data_playground.csv
Data saved in data/molecules_bx_2024_06_18_154410_2_with_Mpro-x0072_0_receptor_fixed
The ligand file has been converted to molecules_bx_2024_06_18_154410_3.pdb


cp: -r not specified; omitting directory 'Mpro-x0072_0_receptor_fixed'
mv: cannot move '/home/ian/msc_project/docking/Mpro-x0072_0_receptor_fixed' to 'data/molecules_bx_2024_06_18_154410_2_with_Mpro-x0072_0_receptor_fixed/Mpro-x0072_0_receptor_fixed': Directory not empty
1 molecule converted


The ligand file has been saved as molecules_bx_2024_06_18_154410_3.pdbqt
adding gasteiger charges to peptide
The protein file Mpro-x0072_0_receptor_fixed.pdb has been converted to pdbqt format and saved in Mpro-x0072_0_receptor_fixed.pdbqt
Mpro-x0072: Estimated Free Energy of Binding = 31.436 kcal/mol
Output saved as Mpro-x0072_0_receptor_fixed_molecules_bx_2024_06_18_154410_3.out

Data saved to vina_docking_data_playground.csv
Data saved in data/molecules_bx_2024_06_18_154410_3_with_Mpro-x0072_0_receptor_fixed
The ligand file has been converted to molecules_bx_2024_06_18_154410_4.pdb


cp: -r not specified; omitting directory 'Mpro-x0072_0_receptor_fixed'
mv: cannot move '/home/ian/msc_project/docking/Mpro-x0072_0_receptor_fixed' to 'data/molecules_bx_2024_06_18_154410_3_with_Mpro-x0072_0_receptor_fixed/Mpro-x0072_0_receptor_fixed': Directory not empty
1 molecule converted


The ligand file has been saved as molecules_bx_2024_06_18_154410_4.pdbqt
adding gasteiger charges to peptide
The protein file Mpro-x0072_0_receptor_fixed.pdb has been converted to pdbqt format and saved in Mpro-x0072_0_receptor_fixed.pdbqt
Mpro-x0072: Estimated Free Energy of Binding = 25.424 kcal/mol
Output saved as Mpro-x0072_0_receptor_fixed_molecules_bx_2024_06_18_154410_4.out

Data saved to vina_docking_data_playground.csv
Data saved in data/molecules_bx_2024_06_18_154410_4_with_Mpro-x0072_0_receptor_fixed
The ligand file has been converted to molecules_bx_2024_06_18_154410_5.pdb


cp: -r not specified; omitting directory 'Mpro-x0072_0_receptor_fixed'
mv: cannot move '/home/ian/msc_project/docking/Mpro-x0072_0_receptor_fixed' to 'data/molecules_bx_2024_06_18_154410_4_with_Mpro-x0072_0_receptor_fixed/Mpro-x0072_0_receptor_fixed': Directory not empty
1 molecule converted


The ligand file has been saved as molecules_bx_2024_06_18_154410_5.pdbqt
adding gasteiger charges to peptide
The protein file Mpro-x0072_0_receptor_fixed.pdb has been converted to pdbqt format and saved in Mpro-x0072_0_receptor_fixed.pdbqt
Mpro-x0072: Estimated Free Energy of Binding = 16.036 kcal/mol
Output saved as Mpro-x0072_0_receptor_fixed_molecules_bx_2024_06_18_154410_5.out

Data saved to vina_docking_data_playground.csv
Data saved in data/molecules_bx_2024_06_18_154410_5_with_Mpro-x0072_0_receptor_fixed
The ligand file has been converted to molecules_bx_2024_06_18_154410_6.pdb


cp: -r not specified; omitting directory 'Mpro-x0072_0_receptor_fixed'
mv: cannot move '/home/ian/msc_project/docking/Mpro-x0072_0_receptor_fixed' to 'data/molecules_bx_2024_06_18_154410_5_with_Mpro-x0072_0_receptor_fixed/Mpro-x0072_0_receptor_fixed': Directory not empty
1 molecule converted


The ligand file has been saved as molecules_bx_2024_06_18_154410_6.pdbqt
adding gasteiger charges to peptide
The protein file Mpro-x0072_0_receptor_fixed.pdb has been converted to pdbqt format and saved in Mpro-x0072_0_receptor_fixed.pdbqt
Mpro-x0072: Estimated Free Energy of Binding = 18.963 kcal/mol
Output saved as Mpro-x0072_0_receptor_fixed_molecules_bx_2024_06_18_154410_6.out

Data saved to vina_docking_data_playground.csv
Data saved in data/molecules_bx_2024_06_18_154410_6_with_Mpro-x0072_0_receptor_fixed
The ligand file has been converted to molecules_bx_2024_06_18_154410_7.pdb


cp: -r not specified; omitting directory 'Mpro-x0072_0_receptor_fixed'
mv: cannot move '/home/ian/msc_project/docking/Mpro-x0072_0_receptor_fixed' to 'data/molecules_bx_2024_06_18_154410_6_with_Mpro-x0072_0_receptor_fixed/Mpro-x0072_0_receptor_fixed': Directory not empty
1 molecule converted


The ligand file has been saved as molecules_bx_2024_06_18_154410_7.pdbqt
adding gasteiger charges to peptide
The protein file Mpro-x0072_0_receptor_fixed.pdb has been converted to pdbqt format and saved in Mpro-x0072_0_receptor_fixed.pdbqt
Mpro-x0072: Estimated Free Energy of Binding = 28.215 kcal/mol
Output saved as Mpro-x0072_0_receptor_fixed_molecules_bx_2024_06_18_154410_7.out

Data saved to vina_docking_data_playground.csv
Data saved in data/molecules_bx_2024_06_18_154410_7_with_Mpro-x0072_0_receptor_fixed
The ligand file has been converted to molecules_bx_2024_06_18_154410_8.pdb


cp: -r not specified; omitting directory 'Mpro-x0072_0_receptor_fixed'
mv: cannot move '/home/ian/msc_project/docking/Mpro-x0072_0_receptor_fixed' to 'data/molecules_bx_2024_06_18_154410_7_with_Mpro-x0072_0_receptor_fixed/Mpro-x0072_0_receptor_fixed': Directory not empty
1 molecule converted


The ligand file has been saved as molecules_bx_2024_06_18_154410_8.pdbqt
adding gasteiger charges to peptide
The protein file Mpro-x0072_0_receptor_fixed.pdb has been converted to pdbqt format and saved in Mpro-x0072_0_receptor_fixed.pdbqt
Mpro-x0072: Estimated Free Energy of Binding = 24.821 kcal/mol
Output saved as Mpro-x0072_0_receptor_fixed_molecules_bx_2024_06_18_154410_8.out

Data saved to vina_docking_data_playground.csv
Data saved in data/molecules_bx_2024_06_18_154410_8_with_Mpro-x0072_0_receptor_fixed
The ligand file has been converted to molecules_bx_2024_06_18_154410_9.pdb


cp: -r not specified; omitting directory 'Mpro-x0072_0_receptor_fixed'
mv: cannot move '/home/ian/msc_project/docking/Mpro-x0072_0_receptor_fixed' to 'data/molecules_bx_2024_06_18_154410_8_with_Mpro-x0072_0_receptor_fixed/Mpro-x0072_0_receptor_fixed': Directory not empty
1 molecule converted


The ligand file has been saved as molecules_bx_2024_06_18_154410_9.pdbqt
adding gasteiger charges to peptide
The protein file Mpro-x0072_0_receptor_fixed.pdb has been converted to pdbqt format and saved in Mpro-x0072_0_receptor_fixed.pdbqt
Mpro-x0072: Estimated Free Energy of Binding = 26.111 kcal/mol
Output saved as Mpro-x0072_0_receptor_fixed_molecules_bx_2024_06_18_154410_9.out

Data saved to vina_docking_data_playground.csv
Data saved in data/molecules_bx_2024_06_18_154410_9_with_Mpro-x0072_0_receptor_fixed
The ligand file has been converted to molecules_bx_2024_06_18_154410_10.pdb


cp: -r not specified; omitting directory 'Mpro-x0072_0_receptor_fixed'
mv: cannot move '/home/ian/msc_project/docking/Mpro-x0072_0_receptor_fixed' to 'data/molecules_bx_2024_06_18_154410_9_with_Mpro-x0072_0_receptor_fixed/Mpro-x0072_0_receptor_fixed': Directory not empty
1 molecule converted


The ligand file has been saved as molecules_bx_2024_06_18_154410_10.pdbqt
adding gasteiger charges to peptide
The protein file Mpro-x0072_0_receptor_fixed.pdb has been converted to pdbqt format and saved in Mpro-x0072_0_receptor_fixed.pdbqt
Mpro-x0072: Estimated Free Energy of Binding = 25.968 kcal/mol
Output saved as Mpro-x0072_0_receptor_fixed_molecules_bx_2024_06_18_154410_10.out

Data saved to vina_docking_data_playground.csv
Data saved in data/molecules_bx_2024_06_18_154410_10_with_Mpro-x0072_0_receptor_fixed
The ligand file has been converted to molecules_bx_2024_06_18_154410_11.pdb


cp: -r not specified; omitting directory 'Mpro-x0072_0_receptor_fixed'
mv: cannot move '/home/ian/msc_project/docking/Mpro-x0072_0_receptor_fixed' to 'data/molecules_bx_2024_06_18_154410_10_with_Mpro-x0072_0_receptor_fixed/Mpro-x0072_0_receptor_fixed': Directory not empty
1 molecule converted


The ligand file has been saved as molecules_bx_2024_06_18_154410_11.pdbqt
adding gasteiger charges to peptide
The protein file Mpro-x0072_0_receptor_fixed.pdb has been converted to pdbqt format and saved in Mpro-x0072_0_receptor_fixed.pdbqt
Mpro-x0072: Estimated Free Energy of Binding = 28.271 kcal/mol
Output saved as Mpro-x0072_0_receptor_fixed_molecules_bx_2024_06_18_154410_11.out

Data saved to vina_docking_data_playground.csv
Data saved in data/molecules_bx_2024_06_18_154410_11_with_Mpro-x0072_0_receptor_fixed
The ligand file has been converted to molecules_bx_2024_06_18_154410_12.pdb


cp: -r not specified; omitting directory 'Mpro-x0072_0_receptor_fixed'
mv: cannot move '/home/ian/msc_project/docking/Mpro-x0072_0_receptor_fixed' to 'data/molecules_bx_2024_06_18_154410_11_with_Mpro-x0072_0_receptor_fixed/Mpro-x0072_0_receptor_fixed': Directory not empty
1 molecule converted


The ligand file has been saved as molecules_bx_2024_06_18_154410_12.pdbqt
adding gasteiger charges to peptide
The protein file Mpro-x0072_0_receptor_fixed.pdb has been converted to pdbqt format and saved in Mpro-x0072_0_receptor_fixed.pdbqt
Mpro-x0072: Estimated Free Energy of Binding = 26.477 kcal/mol
Output saved as Mpro-x0072_0_receptor_fixed_molecules_bx_2024_06_18_154410_12.out

Data saved to vina_docking_data_playground.csv
Data saved in data/molecules_bx_2024_06_18_154410_12_with_Mpro-x0072_0_receptor_fixed
The ligand file has been converted to molecules_bx_2024_06_18_154410_13.pdb


cp: -r not specified; omitting directory 'Mpro-x0072_0_receptor_fixed'
mv: cannot move '/home/ian/msc_project/docking/Mpro-x0072_0_receptor_fixed' to 'data/molecules_bx_2024_06_18_154410_12_with_Mpro-x0072_0_receptor_fixed/Mpro-x0072_0_receptor_fixed': Directory not empty
1 molecule converted


The ligand file has been saved as molecules_bx_2024_06_18_154410_13.pdbqt
adding gasteiger charges to peptide
The protein file Mpro-x0072_0_receptor_fixed.pdb has been converted to pdbqt format and saved in Mpro-x0072_0_receptor_fixed.pdbqt
Mpro-x0072: Estimated Free Energy of Binding = 17.754 kcal/mol
Output saved as Mpro-x0072_0_receptor_fixed_molecules_bx_2024_06_18_154410_13.out

Data saved to vina_docking_data_playground.csv
Data saved in data/molecules_bx_2024_06_18_154410_13_with_Mpro-x0072_0_receptor_fixed
The ligand file has been converted to molecules_bx_2024_06_18_154410_14.pdb


cp: -r not specified; omitting directory 'Mpro-x0072_0_receptor_fixed'
mv: cannot move '/home/ian/msc_project/docking/Mpro-x0072_0_receptor_fixed' to 'data/molecules_bx_2024_06_18_154410_13_with_Mpro-x0072_0_receptor_fixed/Mpro-x0072_0_receptor_fixed': Directory not empty
1 molecule converted


The ligand file has been saved as molecules_bx_2024_06_18_154410_14.pdbqt
adding gasteiger charges to peptide
The protein file Mpro-x0072_0_receptor_fixed.pdb has been converted to pdbqt format and saved in Mpro-x0072_0_receptor_fixed.pdbqt
Mpro-x0072: Estimated Free Energy of Binding = 25.664 kcal/mol
Output saved as Mpro-x0072_0_receptor_fixed_molecules_bx_2024_06_18_154410_14.out

Data saved to vina_docking_data_playground.csv
Data saved in data/molecules_bx_2024_06_18_154410_14_with_Mpro-x0072_0_receptor_fixed


cp: -r not specified; omitting directory 'Mpro-x0072_0_receptor_fixed'
mv: cannot move '/home/ian/msc_project/docking/Mpro-x0072_0_receptor_fixed' to 'data/molecules_bx_2024_06_18_154410_14_with_Mpro-x0072_0_receptor_fixed/Mpro-x0072_0_receptor_fixed': Directory not empty


# Testing: OE Pipeline

<font color='yellow'>Testing with complexes produced from splitting PDB IDs</font>

In [35]:
pdb_ids = ['6R8O',
           '5S9L',
           '1KLT',
           '1A42',
           '4B3D', # Failed in vina
           '3GVU',
           '6C7Q',
           '4O61', # Failed ADFR    
           ]
for pdb_id in tqdm(pdb_ids):
    # download 
    filename = download_pdb_file(pdb_id)
    # split the protein and ligand from the PDB file
    complex_file, receptor_file, ligand_file = pdb_to_prot_lig(pdb_id, filename, preserve_water=True,preserve_metal=True)
    # rest of pipeline
    oe_process_lig_prot(ligand_file, receptor_file, complex_file, preserve_water=True, preserve_metal=True, csv_out_file='OE_w_water_data.csv')

  0%|          | 0/8 [00:00<?, ?it/s]

Downloading PDB structure '6r8o'...
Extracting ligand from 6R8O.pdb...
Reading receptor from 6R8O.pdb...


DPI: 0.05, RFree: 0.18, Resolution: 1.36
Processing BU # 1 with title: PEPTIDYL-PROLYL CIS-TRANS ISOMERASE F, MITOCHONDRIA, chains A


The complex file has been saved as 6R8O.pdb
The receptor file has been saved as 6R8O_receptor.pdb
The ligand file has been saved as 6R8O_ligand.pdb
Preparing the ligand and protein files using OpenEye Toolkits...


DPI: 0.05, RFree: 0.18, Resolution: 1.36
Processing BU # 1 with title: PEPTIDYL-PROLYL CIS-TRANS ISOMERASE F, MITOCHONDRIA, chains A, alt: A
   Falling back to charging protein with OEMMFF94Charges
Processing BU # 2 with title: PEPTIDYL-PROLYL CIS-TRANS ISOMERASE F, MITOCHONDRIA, chains A, alt: B
   Falling back to charging protein with OEMMFF94Charges
Skipping redundant DU with alts outside the site of interest, renaming existing to collapse alts
Discarding redundant alt DU with title PEPTIDYL-PROLYL CIS-TRANS ISOMERASE F, MITOCHONDRIA(A)altB > JV2(A-304)


Design unit was successfully made for 6R8O.pdb, output is saved to ['6R8O_DU_0.oedu'].
The receptor design unit file has been saved as 6R8O_DU_0_receptor.oedu
cleaning JV2(A-304)
docking JV2(A-304)


mv: cannot move '6R8O_ligand_6R8O_receptor' to a subdirectory of itself, '/home/ian/msc_project/oe_docking_leakypdb_tmp/6R8O_ligand_6R8O_receptor/6R8O_ligand_6R8O_receptor'
mv: cannot move '6R8O_ligand_6R8O_receptor' to a subdirectory of itself, '/home/ian/msc_project/oe_docking_leakypdb_tmp/6R8O_ligand_6R8O_receptor/6R8O_ligand_6R8O_receptor'
mv: cannot move '6R8O_ligand_6R8O_receptor' to a subdirectory of itself, '/home/ian/msc_project/oe_docking_leakypdb_tmp/6R8O_ligand_6R8O_receptor/6R8O_ligand_6R8O_receptor'
 12%|█▎        | 1/8 [01:55<13:30, 115.75s/it]

The docked ligand file has been saved as ['6R8O_DU_0_receptor_6R8O_ligand_docked.sdf']
Output saved as 6R8O_receptor_6R8O_ligand_0.out

Data saved to OE_w_water_data.csv
Docked ligand 6R8O_DU_0_receptor_6R8O_ligand_docked.sdf, molecule 1: Chemgauss4 score = -15.09
Data saved in ./data//home/ian/msc_project/oe_docking_leakypdb_tmp/6R8O_ligand_6R8O_receptor
Downloading PDB structure '5s9l'...


DPI: 0.12, RFree: 0.19, Resolution: 1.90


Extracting ligand from 5S9L.pdb...
Reading receptor from 5S9L.pdb...


Processing BU # 1 with title: ISOFORM 2 OF ECTONUCLEOTIDE, chains AB
Found unresolved N-terminal with 23 residues before TRP 51   A 1  , with sequence FTASRIKRAEWDEGPPTVLSDSP
Found 4 residue gap between LYS 463   A 1   and CYS 468   A 1  , with sequence PSGK
Found 2 residue gap between GLU 573   A 1   and ASN 576   A 1  , with sequence PK
Found unresolved C-terminal with 12 residues after GLU 861   A 1  , with sequence IGGRHHHHHHHH
   Falling back to charging protein with OEMMFF94Charges


The complex file has been saved as 5S9L.pdb
The receptor file has been saved as 5S9L_receptor.pdb
The ligand file has been saved as 5S9L_ligand.pdb
Preparing the ligand and protein files using OpenEye Toolkits...


DPI: 0.12, RFree: 0.19, Resolution: 1.90
Processing BU # 1 with title: ISOFORM 2 OF ECTONUCLEOTIDE, chains AB, alt: A
Found unresolved N-terminal with 23 residues before TRP 51   A 1  , with sequence FTASRIKRAEWDEGPPTVLSDSP
Found 4 residue gap between LYS 463   A 1   and CYS 468   A 1  , with sequence PSGK
Found 2 residue gap between GLU 573   A 1   and ASN 576   A 1  , with sequence PK
Found unresolved C-terminal with 12 residues after GLU 861   A 1  , with sequence IGGRHHHHHHHH
   Falling back to charging protein with OEMMFF94Charges


Design unit was successfully made for 5S9L.pdb, output is saved to ['5S9L_DU_0.oedu'].
The receptor design unit file has been saved as 5S9L_DU_0_receptor.oedu
cleaning 6ZO(A-906)
docking 6ZO(A-906)


mv: cannot move '5S9L_ligand_5S9L_receptor' to a subdirectory of itself, '/home/ian/msc_project/oe_docking_leakypdb_tmp/5S9L_ligand_5S9L_receptor/5S9L_ligand_5S9L_receptor'
mv: cannot move '5S9L_ligand_5S9L_receptor' to a subdirectory of itself, '/home/ian/msc_project/oe_docking_leakypdb_tmp/5S9L_ligand_5S9L_receptor/5S9L_ligand_5S9L_receptor'
mv: cannot move '5S9L_ligand_5S9L_receptor' to a subdirectory of itself, '/home/ian/msc_project/oe_docking_leakypdb_tmp/5S9L_ligand_5S9L_receptor/5S9L_ligand_5S9L_receptor'
 25%|██▌       | 2/8 [03:24<09:59, 99.88s/it] 

The docked ligand file has been saved as ['5S9L_DU_0_receptor_5S9L_ligand_docked.sdf']
Output saved as 5S9L_receptor_5S9L_ligand_0.out

Data saved to OE_w_water_data.csv
Docked ligand 5S9L_DU_0_receptor_5S9L_ligand_docked.sdf, molecule 1: Chemgauss4 score = -14.36
Data saved in ./data//home/ian/msc_project/oe_docking_leakypdb_tmp/5S9L_ligand_5S9L_receptor
Downloading PDB structure '1klt'...
Extracting ligand from 1KLT.pdb...
Reading receptor from 1KLT.pdb...


DPI: 0.22, RFree: 0.26, Resolution: 1.90
Processing BU # 1 with title: CHYMASE, chains A


The complex file has been saved as 1KLT.pdb
The receptor file has been saved as 1KLT_receptor.pdb
The ligand file has been saved as 1KLT_ligand.pdb
Preparing the ligand and protein files using OpenEye Toolkits...


DPI: 0.22, RFree: 0.26, Resolution: 1.90
Processing BU # 1 with title: CHYMASE, chains A


Design unit was successfully made for 1KLT.pdb, output is saved to ['1KLT_DU_0.oedu'].
The receptor design unit file has been saved as 1KLT_DU_0_receptor.oedu
cleaning PMS(A-400)
docking PMS(A-400)


mv: cannot move '1KLT_ligand_1KLT_receptor' to a subdirectory of itself, '/home/ian/msc_project/oe_docking_leakypdb_tmp/1KLT_ligand_1KLT_receptor/1KLT_ligand_1KLT_receptor'
mv: cannot move '1KLT_ligand_1KLT_receptor' to a subdirectory of itself, '/home/ian/msc_project/oe_docking_leakypdb_tmp/1KLT_ligand_1KLT_receptor/1KLT_ligand_1KLT_receptor'
mv: cannot move '1KLT_ligand_1KLT_receptor' to a subdirectory of itself, '/home/ian/msc_project/oe_docking_leakypdb_tmp/1KLT_ligand_1KLT_receptor/1KLT_ligand_1KLT_receptor'
 38%|███▊      | 3/8 [04:25<06:49, 81.99s/it]

The docked ligand file has been saved as ['1KLT_DU_0_receptor_1KLT_ligand_docked.sdf']
Output saved as 1KLT_receptor_1KLT_ligand_0.out

Data saved to OE_w_water_data.csv
Docked ligand 1KLT_DU_0_receptor_1KLT_ligand_docked.sdf, molecule 1: Chemgauss4 score = -7.52
Data saved in ./data//home/ian/msc_project/oe_docking_leakypdb_tmp/1KLT_ligand_1KLT_receptor
Downloading PDB structure '1a42'...
Extracting ligand from 1A42.pdb...
Reading receptor from 1A42.pdb...


DPI: 0.00, RFree: 0.00, Resolution: 2.25
Processing BU # 1 with title: CARBONIC ANHYDRASE II, chains A
Found unresolved N-terminal with 2 residues before HIS 4   A 1  , with sequence SH
Found unresolved C-terminal with 1 residues after PHE 260   A 1  , with sequence K


The complex file has been saved as 1A42.pdb
The receptor file has been saved as 1A42_receptor.pdb
The ligand file has been saved as 1A42_ligand.pdb
Preparing the ligand and protein files using OpenEye Toolkits...


DPI: 0.00, RFree: 0.00, Resolution: 2.25
Processing BU # 1 with title: CARBONIC ANHYDRASE II, chains A
Found unresolved N-terminal with 2 residues before HIS 4   A 1  , with sequence SH
Found unresolved C-terminal with 1 residues after PHE 260   A 1  , with sequence K


Design unit was successfully made for 1A42.pdb, output is saved to ['1A42_DU_0.oedu'].
The receptor design unit file has been saved as 1A42_DU_0_receptor.oedu
cleaning BZU(A-555)
docking BZU(A-555)


mv: cannot move '1A42_ligand_1A42_receptor' to a subdirectory of itself, '/home/ian/msc_project/oe_docking_leakypdb_tmp/1A42_ligand_1A42_receptor/1A42_ligand_1A42_receptor'
mv: cannot move '1A42_ligand_1A42_receptor' to a subdirectory of itself, '/home/ian/msc_project/oe_docking_leakypdb_tmp/1A42_ligand_1A42_receptor/1A42_ligand_1A42_receptor'
mv: cannot move '1A42_ligand_1A42_receptor' to a subdirectory of itself, '/home/ian/msc_project/oe_docking_leakypdb_tmp/1A42_ligand_1A42_receptor/1A42_ligand_1A42_receptor'
 50%|█████     | 4/8 [05:08<04:27, 66.76s/it]

The docked ligand file has been saved as ['1A42_DU_0_receptor_1A42_ligand_docked.sdf']
Output saved as 1A42_receptor_1A42_ligand_0.out

Data saved to OE_w_water_data.csv
Docked ligand 1A42_DU_0_receptor_1A42_ligand_docked.sdf, molecule 1: Chemgauss4 score = -9.89
Data saved in ./data//home/ian/msc_project/oe_docking_leakypdb_tmp/1A42_ligand_1A42_receptor
Downloading PDB structure '4b3d'...


DPI: 0.09, RFree: 0.21, Resolution: 1.59
Processing BU # 1 with title: DNA REPAIR AND RECOMBINATION PROTEIN RADA, chains A
Found unresolved N-terminal with 1 residues before ALA 108   A 1  , with sequence M
Found 11 residue gap between VAL 285   A 1   and ALA 309   A 1  , with sequence QANGGHILAHS
Found 6 residue gap between ASP 329   A 1   and GLY 336   A 1  , with sequence APHLPE
   Falling back to charging protein with OEMMFF94Charges
Processing BU # 2 with title: DNA REPAIR AND RECOMBINATION PROTEIN RADA, chains C
Found unresolved N-terminal with 1 residues before ALA 108   C 2  , with sequence M
Found 10 residue gap between VAL 285   C 2   and SER 308   C 2  , with sequence QANGGHILAH
Found 5 residue gap between ASP 329   C 2   and GLU 335   C 2  , with sequence APHLP
   Falling back to charging protein with OEMMFF94Charges


Extracting ligand from 4B3D.pdb...
Reading receptor from 4B3D.pdb...
The complex file has been saved as 4B3D.pdb
The receptor file has been saved as 4B3D_receptor.pdb
The ligand file has been saved as 4B3D_ligand.pdb
Preparing the ligand and protein files using OpenEye Toolkits...


DPI: 0.09, RFree: 0.21, Resolution: 1.59
Processing BU # 1 with title: DNA REPAIR AND RECOMBINATION PROTEIN RADA, chains A, alt: A
Found unresolved N-terminal with 1 residues before ALA 108   A 1  , with sequence M
Found 11 residue gap between VAL 285   A 1   and ALA 309   A 1  , with sequence QANGGHILAHS
Found 6 residue gap between ASP 329   A 1   and GLY 336   A 1  , with sequence APHLPE
Cap on GLY 336   A 1   clashed with solvent, removing clashing solvent
  Removing HOH 2338   A 3  
Cap on VAL 285   A 1   clashed with solvent, removing clashing solvent
  Removing HOH 2308   A 3  
Cap on ASP 329   A 1   clashed with solvent, removing clashing solvent
  Removing HOH 2334   A 3  
   Falling back to charging protein with OEMMFF94Charges
Processing BU # 2 with title: DNA REPAIR AND RECOMBINATION PROTEIN RADA, chains A, alt: B
Found unresolved N-terminal with 1 residues before ALA 108   A 1  , with sequence M
Found 11 residue gap between VAL 285   A 1   and ALA 309   A 1  , with sequence

Design unit was successfully made for 4B3D.pdb, output is saved to ['4B3D_DU_0.oedu', '4B3D_DU_1.oedu', '4B3D_DU_2.oedu', '4B3D_DU_3.oedu', '4B3D_DU_4.oedu', '4B3D_DU_5.oedu', '4B3D_DU_6.oedu', '4B3D_DU_7.oedu'].
The receptor design unit file has been saved as 4B3D_DU_0_receptor.oedu
The receptor design unit file has been saved as 4B3D_DU_1_receptor.oedu
The receptor design unit file has been saved as 4B3D_DU_2_receptor.oedu
The receptor design unit file has been saved as 4B3D_DU_3_receptor.oedu
The receptor design unit file has been saved as 4B3D_DU_4_receptor.oedu
The receptor design unit file has been saved as 4B3D_DU_5_receptor.oedu
The receptor design unit file has been saved as 4B3D_DU_6_receptor.oedu
The receptor design unit file has been saved as 4B3D_DU_7_receptor.oedu
cleaning 5MI(A-1351)
docking 5MI(A-1351)
cleaning 5MI(A-1351)
docking 5MI(A-1351)
cleaning 5MI(A-1351)
docking 5MI(A-1351)
cleaning 5MI(A-1351)
docking 5MI(A-1351)
cleaning 5MI(A-1351)
docking 5MI(A-1351)
cleani

mv: cannot move '4B3D_ligand_4B3D_receptor' to a subdirectory of itself, '/home/ian/msc_project/oe_docking_leakypdb_tmp/4B3D_ligand_4B3D_receptor/4B3D_ligand_4B3D_receptor'
mv: cannot move '4B3D_ligand_4B3D_receptor' to a subdirectory of itself, '/home/ian/msc_project/oe_docking_leakypdb_tmp/4B3D_ligand_4B3D_receptor/4B3D_ligand_4B3D_receptor'
mv: cannot move '4B3D_ligand_4B3D_receptor' to a subdirectory of itself, '/home/ian/msc_project/oe_docking_leakypdb_tmp/4B3D_ligand_4B3D_receptor/4B3D_ligand_4B3D_receptor'
 62%|██████▎   | 5/8 [06:56<04:04, 81.50s/it]

The docked ligand file has been saved as ['4B3D_DU_0_receptor_4B3D_ligand_docked.sdf', '4B3D_DU_1_receptor_4B3D_ligand_docked.sdf', '4B3D_DU_2_receptor_4B3D_ligand_docked.sdf', '4B3D_DU_3_receptor_4B3D_ligand_docked.sdf', '4B3D_DU_4_receptor_4B3D_ligand_docked.sdf', '4B3D_DU_5_receptor_4B3D_ligand_docked.sdf', '4B3D_DU_6_receptor_4B3D_ligand_docked.sdf', '4B3D_DU_7_receptor_4B3D_ligand_docked.sdf']
Output saved as 4B3D_receptor_4B3D_ligand_0.out

Data saved to OE_w_water_data.csv
Docked ligand 4B3D_DU_0_receptor_4B3D_ligand_docked.sdf, molecule 1: Chemgauss4 score = -7.92
Output saved as 4B3D_receptor_4B3D_ligand_0.out

Data saved to OE_w_water_data.csv
Docked ligand 4B3D_DU_1_receptor_4B3D_ligand_docked.sdf, molecule 1: Chemgauss4 score = -7.86
Output saved as 4B3D_receptor_4B3D_ligand_0.out

Data saved to OE_w_water_data.csv
Docked ligand 4B3D_DU_2_receptor_4B3D_ligand_docked.sdf, molecule 1: Chemgauss4 score = -8.09
Output saved as 4B3D_receptor_4B3D_ligand_0.out

Data saved to OE_w

DPI: 0.15, RFree: 0.23, Resolution: 2.05
Processing BU # 1 with title: TYROSINE-PROTEIN KINASE ABL2, chains A
   Falling back to charging protein with OEMMFF94Charges


The complex file has been saved as 3GVU.pdb
The receptor file has been saved as 3GVU_receptor.pdb
The ligand file has been saved as 3GVU_ligand.pdb
Preparing the ligand and protein files using OpenEye Toolkits...


DPI: 0.15, RFree: 0.23, Resolution: 2.05
Processing BU # 1 with title: TYROSINE-PROTEIN KINASE ABL2, chains A, alt: A
   Falling back to charging protein with OEMMFF94Charges
Processing BU # 2 with title: TYROSINE-PROTEIN KINASE ABL2, chains A, alt: B
   Falling back to charging protein with OEMMFF94Charges
Skipping redundant DU with alts outside the site of interest, renaming existing to collapse alts
Discarding redundant alt DU with title TYROSINE-PROTEIN KINASE ABL2(A)altB > STI(A-1001)
Skipping redundant DU with alts outside the site of interest, renaming existing to collapse alts
Discarding redundant alt DU with title TYROSINE-PROTEIN KINASE ABL2(A)altB > STI(A-1002)
Superposition - RMSD: 0.00, Ref: , Fit: , SeqScore: 1446


Design unit was successfully made for 3GVU.pdb, output is saved to ['3GVU_DU_0.oedu', '3GVU_DU_1.oedu'].
The receptor design unit file has been saved as 3GVU_DU_0_receptor.oedu
The receptor design unit file has been saved as 3GVU_DU_1_receptor.oedu
cleaning STI(A-1001)
docking STI(A-1001)
cleaning STI(A-1001)
docking STI(A-1001)


mv: cannot move '3GVU_ligand_3GVU_receptor' to a subdirectory of itself, '/home/ian/msc_project/oe_docking_leakypdb_tmp/3GVU_ligand_3GVU_receptor/3GVU_ligand_3GVU_receptor'
mv: cannot move '3GVU_ligand_3GVU_receptor' to a subdirectory of itself, '/home/ian/msc_project/oe_docking_leakypdb_tmp/3GVU_ligand_3GVU_receptor/3GVU_ligand_3GVU_receptor'
mv: cannot move '3GVU_ligand_3GVU_receptor' to a subdirectory of itself, '/home/ian/msc_project/oe_docking_leakypdb_tmp/3GVU_ligand_3GVU_receptor/3GVU_ligand_3GVU_receptor'
 75%|███████▌  | 6/8 [09:13<03:21, 100.55s/it]

The docked ligand file has been saved as ['3GVU_DU_0_receptor_3GVU_ligand_docked.sdf', '3GVU_DU_1_receptor_3GVU_ligand_docked.sdf']
Output saved as 3GVU_receptor_3GVU_ligand_0.out

Data saved to OE_w_water_data.csv
Docked ligand 3GVU_DU_0_receptor_3GVU_ligand_docked.sdf, molecule 1: Chemgauss4 score = -25.99
Output saved as 3GVU_receptor_3GVU_ligand_0.out

Data saved to OE_w_water_data.csv
Docked ligand 3GVU_DU_1_receptor_3GVU_ligand_docked.sdf, molecule 1: Chemgauss4 score = -5.08
Data saved in ./data//home/ian/msc_project/oe_docking_leakypdb_tmp/3GVU_ligand_3GVU_receptor
Downloading PDB structure '6c7q'...
Extracting ligand from 6C7Q.pdb...
Reading receptor from 6C7Q.pdb...


DPI: 0.08, RFree: 0.23, Resolution: 1.51
Processing BU # 1 with title: BROMODOMAIN-CONTAINING PROTEIN 4, chains A
Found unresolved N-terminal with 19 residues before LYS 349   A 1  , with sequence SNAKDVPDSQQHPAPEKSS


The complex file has been saved as 6C7Q.pdb
The receptor file has been saved as 6C7Q_receptor.pdb
The ligand file has been saved as 6C7Q_ligand.pdb
Preparing the ligand and protein files using OpenEye Toolkits...


DPI: 0.08, RFree: 0.23, Resolution: 1.51
Processing BU # 1 with title: BROMODOMAIN-CONTAINING PROTEIN 4, chains A, alt: A
Found unresolved N-terminal with 19 residues before LYS 349   A 1  , with sequence SNAKDVPDSQQHPAPEKSS
Processing BU # 2 with title: BROMODOMAIN-CONTAINING PROTEIN 4, chains A, alt: B
Found unresolved N-terminal with 19 residues before LYS 349   A 1  , with sequence SNAKDVPDSQQHPAPEKSS
Superposition - RMSD: 0.00, Ref: A, Fit: A, SeqScore: 575


Design unit was successfully made for 6C7Q.pdb, output is saved to ['6C7Q_DU_0.oedu', '6C7Q_DU_1.oedu'].
The receptor design unit file has been saved as 6C7Q_DU_0_receptor.oedu
The receptor design unit file has been saved as 6C7Q_DU_1_receptor.oedu
cleaning EO1(A-501)
docking EO1(A-501)
cleaning EO1(A-501)
docking EO1(A-501)


mv: cannot move '6C7Q_ligand_6C7Q_receptor' to a subdirectory of itself, '/home/ian/msc_project/oe_docking_leakypdb_tmp/6C7Q_ligand_6C7Q_receptor/6C7Q_ligand_6C7Q_receptor'
mv: cannot move '6C7Q_ligand_6C7Q_receptor' to a subdirectory of itself, '/home/ian/msc_project/oe_docking_leakypdb_tmp/6C7Q_ligand_6C7Q_receptor/6C7Q_ligand_6C7Q_receptor'
mv: cannot move '6C7Q_ligand_6C7Q_receptor' to a subdirectory of itself, '/home/ian/msc_project/oe_docking_leakypdb_tmp/6C7Q_ligand_6C7Q_receptor/6C7Q_ligand_6C7Q_receptor'
 88%|████████▊ | 7/8 [11:20<01:49, 109.12s/it]

The docked ligand file has been saved as ['6C7Q_DU_0_receptor_6C7Q_ligand_docked.sdf', '6C7Q_DU_1_receptor_6C7Q_ligand_docked.sdf']
Output saved as 6C7Q_receptor_6C7Q_ligand_0.out

Data saved to OE_w_water_data.csv
Docked ligand 6C7Q_DU_0_receptor_6C7Q_ligand_docked.sdf, molecule 1: Chemgauss4 score = -9.55
Output saved as 6C7Q_receptor_6C7Q_ligand_0.out

Data saved to OE_w_water_data.csv
Docked ligand 6C7Q_DU_1_receptor_6C7Q_ligand_docked.sdf, molecule 1: Chemgauss4 score = -9.65
Data saved in ./data//home/ian/msc_project/oe_docking_leakypdb_tmp/6C7Q_ligand_6C7Q_receptor
Downloading PDB structure '4o61'...


DPI: 0.14, RFree: 0.23, Resolution: 1.90
Processing BU # 1 with title: RNA DEMETHYLASE ALKBH5, chains A
Found unresolved N-terminal with 3 residues before LEU 76   A 1  , with sequence GQQ
Found 7 residue gap between TYR 141   A 1   and GLY 149   A 1  , with sequence GAQLQKR
Found unresolved C-terminal with 1 residues after GLU 293   A 1  , with sequence T
   Falling back to charging protein with OEMMFF94Charges

Extracting ligand from 4O61.pdb...
Reading receptor from 4O61.pdb...


es and radii to DesignUnit: RNA DEMETHYLASE ALKBH5(A)__DU__biounit
Processing BU # 2 with title: RNA DEMETHYLASE ALKBH5, chains B
Found unresolved N-terminal with 4 residues before GLN 77   B 2  , with sequence GQQL
Found 8 residue gap between THR 140   B 2   and GLY 149   B 2  , with sequence YGAQLQKR
Found unresolved C-terminal with 1 residues after GLU 293   B 2  , with sequence T
   Falling back to charging protein with OEMMFF94Charges


The complex file has been saved as 4O61.pdb
The receptor file has been saved as 4O61_receptor.pdb
The ligand file has been saved as 4O61_ligand.pdb
Preparing the ligand and protein files using OpenEye Toolkits...


DPI: 0.14, RFree: 0.23, Resolution: 1.90
Processing BU # 1 with title: RNA DEMETHYLASE ALKBH5, chains A, alt: A
Found unresolved N-terminal with 3 residues before LEU 76   A 1  , with sequence GQQ
Found 7 residue gap between TYR 141   A 1   and GLY 149   A 1  , with sequence GAQLQKR
Found unresolved C-terminal with 1 residues after GLU 293   A 1  , with sequence T
   Falling back to charging protein with OEMMFF94Charges
Processing BU # 2 with title: RNA DEMETHYLASE ALKBH5, chains A, alt: B
Found unresolved N-terminal with 3 residues before LEU 76   A 1  , with sequence GQQ
Found 7 residue gap between TYR 141   A 1   and GLY 149   A 1  , with sequence GAQLQKR
Found unresolved C-terminal with 1 residues after GLU 293   A 1  , with sequence T
   Falling back to charging protein with OEMMFF94Charges
Processing BU # 3 with title: RNA DEMETHYLASE ALKBH5, chains B, alt: A
Found unresolved N-terminal with 4 residues before GLN 77   B 2  , with sequence GQQL
Found 8 residue gap between THR 140 

Design unit was successfully made for 4O61.pdb, output is saved to ['4O61_DU_0.oedu', '4O61_DU_1.oedu'].
The receptor design unit file has been saved as 4O61_DU_0_receptor.oedu
The receptor design unit file has been saved as 4O61_DU_1_receptor.oedu
cleaning UNL(A-303)
Molecule is fragmented. Skipping.


mv: cannot move '4O61_ligand_4O61_receptor' to a subdirectory of itself, '/home/ian/msc_project/oe_docking_leakypdb_tmp/4O61_ligand_4O61_receptor/4O61_ligand_4O61_receptor'
mv: cannot move '4O61_ligand_4O61_receptor' to a subdirectory of itself, '/home/ian/msc_project/oe_docking_leakypdb_tmp/4O61_ligand_4O61_receptor/4O61_ligand_4O61_receptor'
mv: cannot move '4O61_ligand_4O61_receptor' to a subdirectory of itself, '/home/ian/msc_project/oe_docking_leakypdb_tmp/4O61_ligand_4O61_receptor/4O61_ligand_4O61_receptor'
100%|██████████| 8/8 [13:14<00:00, 99.35s/it] 

cleaning UNL(A-303)
Molecule is fragmented. Skipping.
The docked ligand file has been saved as ['4O61_DU_0_receptor_4O61_ligand_docked.sdf', '4O61_DU_1_receptor_4O61_ligand_docked.sdf']
Error in 4O61: File error: Invalid input file 4O61_DU_0_receptor_4O61_ligand_docked.sdf
4O61: No valid energy value available.
Skipping 4O61_ligand.pdb due to an error: File error: Invalid input file 4O61_DU_0_receptor_4O61_ligand_docked.sdf
Error message saved to OE_w_water_data.csv
Data saved in ./data//home/ian/msc_project/oe_docking_leakypdb_tmp/4O61_ligand_4O61_receptor





<font color='yellow'>Testing the pipeline using Mpro data</font>

In [37]:
prot_file, lig_file = pdb_to_prot_lig(pdb_id='Mpro-x0072_0', filename='Mpro-x0072_0.pdb')
complex_file='Mpro-x0072_0.pdb'
oe_process_lig_prot(lig_file, prot_file, complex_file, csv_out_file='oe_docking_data_playground.csv')



Fixed receptor file saved as: Mpro-x0072_0_receptor_fixed.pdb
Extracting ligand from Mpro-x0072_0.pdb...
Reading receptor from Mpro-x0072_0.pdb...
Failed to extract ligand from Mpro-x0072_0.pdb using the OpenEye Toolkit,trying a different script written using OESplitMolComplex...


DPI: 0.12, RFree: 0.23, Resolution: 1.65
Processing BU # 1 with title: ---, chains AB


The receptor file has been saved as Mpro-x0072_0_receptor_fixed.pdb
The ligand file has been saved as Mpro-x0072_0_ligand.pdb
Preparing the ligand and protein files using OpenEye Toolkits...


DPI: 0.12, RFree: 0.23, Resolution: 1.65
Processing BU # 1 with title: ---, chains AB, alt: A
Processing BU # 2 with title: ---, chains AB, alt: B
Skipping redundant DU with alts outside the site of interest, renaming existing to collapse alts
Discarding redundant alt DU with title ---(AB)altB > LIG(A-1101)
Skipping redundant DU with alts outside the site of interest, renaming existing to collapse alts
Discarding redundant alt DU with title ---(AB)altB > LIG(B-1101)
Superposition - RMSD: 0.00, Ref: , Fit: , SeqScore: 3102


Design unit was successfully made for Mpro-x0072_0.pdb, output is saved to ['Mpro-x0072_0_DU_0.oedu', 'Mpro-x0072_0_DU_1.oedu'].
The receptor design unit file has been saved as Mpro-x0072_0_DU_0_receptor.oedu
The receptor design unit file has been saved as Mpro-x0072_0_DU_1_receptor.oedu
cleaning LIG
docking LIG
cleaning LIG
docking LIG
The docked ligand file has been saved as ['Mpro-x0072_0_DU_0_ligand_docked.sdf', 'Mpro-x0072_0_DU_1_ligand_docked.sdf']
Output saved as Mpro-x0072_0_receptor_fixed_Mpro-x0072_0_ligand_0.out

Data saved to oe_docking_data_playground.csv
Docked ligand Mpro-x0072_0_DU_0_ligand_docked.sdf, molecule 1: Chemgauss4 score = -6.91
Output saved as Mpro-x0072_0_receptor_fixed_Mpro-x0072_0_ligand_0.out

Data saved to oe_docking_data_playground.csv
Docked ligand Mpro-x0072_0_DU_1_ligand_docked.sdf, molecule 1: Chemgauss4 score = -6.92
Data saved in data/Mpro-x0072_0_ligand_with_Mpro-x0072_0_receptor_fixed


cp: -r not specified; omitting directory 'Mpro-x0072_0_receptor_fixed'
mv: cannot move '/home/ian/msc_project/docking/6R8O' to 'data/Mpro-x0072_0_ligand_with_Mpro-x0072_0_receptor_fixed/6R8O': Directory not empty


<font color='yellow'>Trying out with diffusion model generated molecules</font>

In [42]:
os.system('cp 2024_06_18_154410/molecules_bx_2024_06_18_154410.sdf ./')
os.system('cp Mpro_complexes/Mpro-x0072_0.pdb ./')
# split the protein and ligand from the PDB file
receptor_file, ligand_file = pdb_to_prot_lig(pdb_id ='Mpro-x0072_0', filename = 'Mpro-x0072_0.pdb')
#--------
lig_file='molecules_bx_2024_06_18_154410.sdf' # this contains 14 molecules
prot_file='Mpro-x0072_0_receptor_fixed.pdb' # replace with a protein file of interest
complex_file='Mpro-x0072_0.pdb' # replace with a reference protein-ligand complex
oe_process_lig_prot(lig_file, prot_file, complex_file, csv_out_file='oe_docking_data_playground.csv')



Fixed receptor file saved as: Mpro-x0072_0_receptor_fixed.pdb
Extracting ligand from Mpro-x0072_0.pdb...
Reading receptor from Mpro-x0072_0.pdb...


DPI: 0.12, RFree: 0.23, Resolution: 1.65
Processing BU # 1 with title: ---, chains AB


Failed to extract ligand from Mpro-x0072_0.pdb using the OpenEye Toolkit,trying a different script written using OESplitMolComplex...
The receptor file has been saved as Mpro-x0072_0_receptor_fixed.pdb
The ligand file has been saved as Mpro-x0072_0_ligand.pdb
Preparing the ligand and protein files using OpenEye Toolkits...


DPI: 0.12, RFree: 0.23, Resolution: 1.65
Processing BU # 1 with title: ---, chains AB, alt: A
Processing BU # 2 with title: ---, chains AB, alt: B
Skipping redundant DU with alts outside the site of interest, renaming existing to collapse alts
Discarding redundant alt DU with title ---(AB)altB > LIG(A-1101)
Skipping redundant DU with alts outside the site of interest, renaming existing to collapse alts
Discarding redundant alt DU with title ---(AB)altB > LIG(B-1101)
Superposition - RMSD: 0.00, Ref: , Fit: , SeqScore: 3102


Design unit was successfully made for Mpro-x0072_0.pdb, output is saved to ['Mpro-x0072_0_DU_0.oedu', 'Mpro-x0072_0_DU_1.oedu'].
The receptor design unit file has been saved as Mpro-x0072_0_DU_0_receptor.oedu
The receptor design unit file has been saved as Mpro-x0072_0_DU_1_receptor.oedu
Skipping molecules_bx_2024_06_18_154410.sdf due to an error: expected str, bytes or os.PathLike object, not NoneType
Error message saved to oe_docking_data_playground.csv
Data saved in ./data//home/ian/msc_project/docking/molecules_bx_2024_06_18_154410_Mpro-x0072_0_receptor_fixed


mv: cannot move 'molecules_bx_2024_06_18_154410_Mpro-x0072_0_receptor_fixed' to a subdirectory of itself, '/home/ian/msc_project/docking/molecules_bx_2024_06_18_154410_Mpro-x0072_0_receptor_fixed/molecules_bx_2024_06_18_154410_Mpro-x0072_0_receptor_fixed'
mv: cannot move 'molecules_bx_2024_06_18_154410_Mpro-x0072_0_receptor_fixed' to a subdirectory of itself, '/home/ian/msc_project/docking/molecules_bx_2024_06_18_154410_Mpro-x0072_0_receptor_fixed/molecules_bx_2024_06_18_154410_Mpro-x0072_0_receptor_fixed'
mv: cannot move 'molecules_bx_2024_06_18_154410_Mpro-x0072_0_receptor_fixed' to a subdirectory of itself, '/home/ian/msc_project/docking/molecules_bx_2024_06_18_154410_Mpro-x0072_0_receptor_fixed/molecules_bx_2024_06_18_154410_Mpro-x0072_0_receptor_fixed'
mv: cannot move '/home/ian/msc_project/docking/molecules_bx_2024_06_18_154410_Mpro-x0072_0_receptor_fixed' to './data/molecules_bx_2024_06_18_154410_Mpro-x0072_0_receptor_fixed': Directory not empty


<font color='yellow'>Running the pipeline for lists</font>

In [47]:
ligand_files = ['Mpro-x0161_0_ligand.pdb', 
                'Mpro-x0072_0_ligand.pdb'] 
receptor_files = ['Mpro-x0072_0_receptor_fixed.pdb',
                 'Mpro-x0161_0_receptor_fixed.pdb']
complex_file = 'Mpro-x0072_0.pdb'
for ligand_file in tqdm(ligand_files):
    for receptor_file in receptor_files:
        oe_process_lig_prot(lig_file=ligand_file, 
                 prot_file=receptor_file,
                 complex_file = complex_file,
                 csv_out_file='oe_docking_data_playground.csv')

  0%|          | 0/2 [00:00<?, ?it/s]DPI: 0.12, RFree: 0.23, Resolution: 1.65


Preparing the ligand and protein files using OpenEye Toolkits...


Processing BU # 1 with title: ---, chains AB, alt: A
Processing BU # 2 with title: ---, chains AB, alt: B
Skipping redundant DU with alts outside the site of interest, renaming existing to collapse alts
Discarding redundant alt DU with title ---(AB)altB > LIG(A-1101)
Skipping redundant DU with alts outside the site of interest, renaming existing to collapse alts
Discarding redundant alt DU with title ---(AB)altB > LIG(B-1101)
Superposition - RMSD: 0.00, Ref: , Fit: , SeqScore: 3102


Design unit was successfully made for Mpro-x0072_0.pdb, output is saved to ['Mpro-x0072_0_DU_0.oedu', 'Mpro-x0072_0_DU_1.oedu'].
The receptor design unit file has been saved as Mpro-x0072_0_DU_0_receptor.oedu
The receptor design unit file has been saved as Mpro-x0072_0_DU_1_receptor.oedu
cleaning LIG(B-1101)
docking LIG(B-1101)
cleaning LIG(B-1101)
docking LIG(B-1101)
The docked ligand file has been saved as ['Mpro-x0072_0_DU_0_ligand_docked.sdf', 'Mpro-x0072_0_DU_1_ligand_docked.sdf']
Output saved as Mpro-x0072_0_receptor_fixed_Mpro-x0161_0_ligand_0.out

Data saved to oe_docking_data_playground.csv
Docked ligand Mpro-x0072_0_DU_0_ligand_docked.sdf, molecule 1: Chemgauss4 score = -5.89
Output saved as Mpro-x0072_0_receptor_fixed_Mpro-x0161_0_ligand_0.out

Data saved to oe_docking_data_playground.csv
Docked ligand Mpro-x0072_0_DU_1_ligand_docked.sdf, molecule 1: Chemgauss4 score = -5.97
Data saved in data/Mpro-x0161_0_ligand_with_Mpro-x0072_0_receptor_fixed
Preparing the ligand and prot

mv: cannot move '/home/ian/msc_project/docking/6R8O' to 'data/Mpro-x0161_0_ligand_with_Mpro-x0072_0_receptor_fixed/6R8O': Directory not empty
DPI: 0.12, RFree: 0.23, Resolution: 1.65
Processing BU # 1 with title: ---, chains AB, alt: A
Processing BU # 2 with title: ---, chains AB, alt: B
Skipping redundant DU with alts outside the site of interest, renaming existing to collapse alts
Discarding redundant alt DU with title ---(AB)altB > LIG(A-1101)
Skipping redundant DU with alts outside the site of interest, renaming existing to collapse alts
Discarding redundant alt DU with title ---(AB)altB > LIG(B-1101)
Superposition - RMSD: 0.00, Ref: , Fit: , SeqScore: 3102


Design unit was successfully made for Mpro-x0072_0.pdb, output is saved to ['Mpro-x0072_0_DU_0.oedu', 'Mpro-x0072_0_DU_1.oedu'].
The receptor design unit file has been saved as Mpro-x0072_0_DU_0_receptor.oedu
The receptor design unit file has been saved as Mpro-x0072_0_DU_1_receptor.oedu
cleaning LIG(B-1101)
docking LIG(B-1101)
cleaning LIG(B-1101)
docking LIG(B-1101)


 50%|█████     | 1/2 [02:24<02:24, 144.37s/it]

The docked ligand file has been saved as ['Mpro-x0072_0_DU_0_ligand_docked.sdf', 'Mpro-x0072_0_DU_1_ligand_docked.sdf']
Output saved as Mpro-x0161_0_receptor_fixed_Mpro-x0161_0_ligand_0.out

Data saved to oe_docking_data_playground.csv
Docked ligand Mpro-x0072_0_DU_0_ligand_docked.sdf, molecule 1: Chemgauss4 score = -5.89
Output saved as Mpro-x0161_0_receptor_fixed_Mpro-x0161_0_ligand_0.out

Data saved to oe_docking_data_playground.csv
Docked ligand Mpro-x0072_0_DU_1_ligand_docked.sdf, molecule 1: Chemgauss4 score = -5.97
Data saved in data/Mpro-x0161_0_ligand_with_Mpro-x0161_0_receptor_fixed
Preparing the ligand and protein files using OpenEye Toolkits...


DPI: 0.12, RFree: 0.23, Resolution: 1.65
Processing BU # 1 with title: ---, chains AB, alt: A
Processing BU # 2 with title: ---, chains AB, alt: B
Skipping redundant DU with alts outside the site of interest, renaming existing to collapse alts
Discarding redundant alt DU with title ---(AB)altB > LIG(A-1101)
Skipping redundant DU with alts outside the site of interest, renaming existing to collapse alts
Discarding redundant alt DU with title ---(AB)altB > LIG(B-1101)
Superposition - RMSD: 0.00, Ref: , Fit: , SeqScore: 3102


Design unit was successfully made for Mpro-x0072_0.pdb, output is saved to ['Mpro-x0072_0_DU_0.oedu', 'Mpro-x0072_0_DU_1.oedu'].
The receptor design unit file has been saved as Mpro-x0072_0_DU_0_receptor.oedu
The receptor design unit file has been saved as Mpro-x0072_0_DU_1_receptor.oedu
cleaning LIG
docking LIG
cleaning LIG
docking LIG
The docked ligand file has been saved as ['Mpro-x0072_0_DU_0_ligand_docked.sdf', 'Mpro-x0072_0_DU_1_ligand_docked.sdf']
Output saved as Mpro-x0072_0_receptor_fixed_Mpro-x0072_0_ligand_0.out

Data saved to oe_docking_data_playground.csv
Docked ligand Mpro-x0072_0_DU_0_ligand_docked.sdf, molecule 1: Chemgauss4 score = -6.91
Output saved as Mpro-x0072_0_receptor_fixed_Mpro-x0072_0_ligand_0.out

Data saved to oe_docking_data_playground.csv
Docked ligand Mpro-x0072_0_DU_1_ligand_docked.sdf, molecule 1: Chemgauss4 score = -6.92
Data saved in data/Mpro-x0072_0_ligand_with_Mpro-x0072_0_receptor_fixed
Preparing the ligand and protein files using OpenEye Toolkits

mv: cannot move '/home/ian/msc_project/docking/6R8O' to 'data/Mpro-x0072_0_ligand_with_Mpro-x0072_0_receptor_fixed/6R8O': Directory not empty
DPI: 0.12, RFree: 0.23, Resolution: 1.65
Processing BU # 1 with title: ---, chains AB, alt: A
Processing BU # 2 with title: ---, chains AB, alt: B
Skipping redundant DU with alts outside the site of interest, renaming existing to collapse alts
Discarding redundant alt DU with title ---(AB)altB > LIG(A-1101)
Skipping redundant DU with alts outside the site of interest, renaming existing to collapse alts
Discarding redundant alt DU with title ---(AB)altB > LIG(B-1101)
Superposition - RMSD: 0.00, Ref: , Fit: , SeqScore: 3102


Design unit was successfully made for Mpro-x0072_0.pdb, output is saved to ['Mpro-x0072_0_DU_0.oedu', 'Mpro-x0072_0_DU_1.oedu'].
The receptor design unit file has been saved as Mpro-x0072_0_DU_0_receptor.oedu
The receptor design unit file has been saved as Mpro-x0072_0_DU_1_receptor.oedu
cleaning LIG
docking LIG
cleaning LIG
docking LIG


100%|██████████| 2/2 [04:45<00:00, 142.68s/it]

The docked ligand file has been saved as ['Mpro-x0072_0_DU_0_ligand_docked.sdf', 'Mpro-x0072_0_DU_1_ligand_docked.sdf']
Output saved as Mpro-x0161_0_receptor_fixed_Mpro-x0072_0_ligand_0.out

Data saved to oe_docking_data_playground.csv
Docked ligand Mpro-x0072_0_DU_0_ligand_docked.sdf, molecule 1: Chemgauss4 score = -6.91
Output saved as Mpro-x0161_0_receptor_fixed_Mpro-x0072_0_ligand_0.out

Data saved to oe_docking_data_playground.csv
Docked ligand Mpro-x0072_0_DU_1_ligand_docked.sdf, molecule 1: Chemgauss4 score = -6.92
Data saved in data/Mpro-x0072_0_ligand_with_Mpro-x0161_0_receptor_fixed





# <font color='green'>Playground</font>

In [None]:
import glob
ligand_files = glob.glob('L4001_*.sdf')
receptor_file = 'Mpro-x0072_0_receptor_fixed.pdb'
complex_file = 'Mpro-x0072_0.pdb'
for ligand_file in tqdm(ligand_files):
    oe_process_lig_prot(lig_file=ligand_file, 
                prot_file=receptor_file,
                complex_file = complex_file,
                csv_out_file='oe_docking_data_playground.csv')

In [27]:
os.system('python ./OpenEye/GenerateMultiplePose.py -receptor 1KLT_DU_0_receptor.oedu -numPoses 5 -in 1KLT_DU_0_receptor_L1001_0_docked.sdf -out output_poses.oedu')

Number of conformers: 5
Best Receptor pose flag: True
posing 
Receptor used: CHYMASE(A) >  pose probability: 0.050000


0

In [28]:
from openeye import oechem
# TODO: THIS FUNCTION IS NOT WORKING YET!!!!
def convert_oedu_to_sdf(input_file, output_file):
    ifs = oechem.oemolistream()
    if not ifs.open(input_file):
        oechem.OEThrow.Fatal("Unable to open %s" % input_file)
    
    ofs = oechem.oemolostream()
    if not ofs.open(output_file):
        oechem.OEThrow.Fatal("Unable to open %s" % output_file)
    
    if not ifs.SetFormat(oechem.OEFormat_OEB):
        oechem.OEThrow.Fatal("Unable to set input format to OEB for %s" % input_file)
    
    if not ofs.SetFormat(oechem.OEFormat_SDF):
        oechem.OEThrow.Fatal("Unable to set output format to SDF for %s" % output_file)
    
    for mol in ifs.GetOEGraphMols():
        oechem.OEWriteMolecule(ofs, mol)
    
    ifs.close()
    ofs.close()
    print("Conversion complete. Saved to %s" % output_file)

# Example usage
# input_file = "output_poses.oedu"  # Ensure this file is in the correct format
# output_file = "output_poses.sdf"
# convert_oedu_to_sdf(input_file, output_file)


# # Example usage
# input_file = "1KLT_DU_0_receptor.oedu"
# output_file = "1KLT_DU_0_receptor.pdb"
# convert_oedu_to_sdf(input_file, output_file)

Conversion complete. Saved to output_poses.sdf
Conversion complete. Saved to 1KLT_DU_0_receptor.pdb




In [19]:
# trying out my modified receptor select function to see whether water is preserved when instructed to
pdb_id = '5s9l'
complex_filename = download_pdb_file(pdb_id)
structure = get_structure(complex_filename)
save_receptor(structure, '5S9L_receptor_w_water_metal.pdb', preserve_water=True, preserve_metal=True)

Downloading PDB structure '5s9l'...




In [None]:


#TODO: fix issues with not getting peptide ligands (low priority)
#TODO: fix issues with ligand extraction: some ligands present several conformations
#TODO: determine whether a ligand is small molecule or peptide (low priority)
#TODO: log-scale dG values, checking whether kd/ki values are experimentally determined or predicted (high priority)