<a href="https://colab.research.google.com/github/mirsadra/DPP-4/blob/main/VirtualScreening.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Workflow for Virtual Screening

## 1. Preparation of Protein Structures

**Retrieve Structures**: Download the PDB files for 7Y4F, 7Y4G, 8HAY, and 1X70 from the Protein Data Bank.
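
The retrieval step can also be scripted. The following is a minimal sketch using only the standard library; the helper names `pdb_url` and `download_pdb` are hypothetical, and the URL pattern assumes that RCSB serves legacy PDB-format files for these entries at `files.rcsb.org/download/<ID>.pdb`.

```python
import os
import urllib.request

# Assumption: RCSB exposes PDB-format files under this URL pattern.
RCSB_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"


def pdb_url(pdb_id):
    """Return the assumed RCSB download URL for a four-character PDB ID."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"Not a four-character PDB ID: {pdb_id!r}")
    return RCSB_URL.format(pdb_id=pdb_id)


def download_pdb(pdb_id, out_dir="."):
    """Download one entry to <out_dir>/<id>.pdb and return the local path."""
    os.makedirs(out_dir, exist_ok=True)
    out_path = os.path.join(out_dir, f"{pdb_id.strip().lower()}.pdb")
    urllib.request.urlretrieve(pdb_url(pdb_id), out_path)
    return out_path


# Usage (requires network access):
# for pdb_id in ["7Y4F", "7Y4G", "8HAY", "1X70"]:
#     print(download_pdb(pdb_id, out_dir="DPP4"))
```

The lower-case file names match the paths used later in this notebook (for example `7y4f.pdb`).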

**Protein Preparation**: Clean the structures by removing water molecules and other non-essential heteroatoms (e.g., ions and crystallization additives, unless they are known to be crucial for the binding mechanism). Then standardize the protonation states of the amino acids and optimize the hydrogen-bonding network.

In [None]:
# Install the required libraries (Biopython and RDKit) with pip.
# Note: `rdkit-pypi` is the legacy package name; current releases are published as `rdkit`.
!pip install biopython rdkit

In [7]:
# Step 1: Load and Clean PDB Structures
from Bio.PDB import PDBParser, Select, PDBIO

class NonWaterSelect(Select):
    """Keep every residue except crystallographic water (HOH)."""
    def accept_residue(self, residue):
        return residue.get_resname() != "HOH"

def clean_structure(input_pdb, output_pdb):
    # QUIET=True suppresses parser warnings about irregular records
    parser = PDBParser(QUIET=True)
    structure = parser.get_structure("Protein", input_pdb)

    # Write the structure without water; ions, ligands and other heteroatoms are kept
    io = PDBIO()
    io.set_structure(structure)
    io.save(output_pdb, select=NonWaterSelect())
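
The selector above removes only water, whereas the preparation step also calls for dropping other heteroatoms unless they matter for binding. One way to sketch that is a plain-text filter over the fixed-column PDB records, shown here instead of a Biopython `Select` subclass so that it runs without extra dependencies. The function name `strip_hetero` and the `keep` whitelist are hypothetical, and the sketch assumes standard column positions (residue name in columns 18-20).

```python
# Residue names commonly used for water in PDB files
WATER_NAMES = {"HOH", "WAT", "DOD"}


def strip_hetero(pdb_lines, keep=()):
    """Drop water and HETATM records, except residue names listed in `keep`."""
    keep = {name.upper() for name in keep}
    kept = []
    for line in pdb_lines:
        record = line[:6].strip()
        if record in ("ATOM", "HETATM"):
            resname = line[17:20].strip().upper()
            if resname in WATER_NAMES:
                continue  # always remove water
            if record == "HETATM" and resname not in keep:
                continue  # remove ions, ligands and additives not whitelisted
        kept.append(line)  # keep protein atoms and all other record types
    return kept


# Usage: keep a zinc ion (residue name "ZN") while removing everything else.
# with open("7y4f.pdb") as src:
#     cleaned = strip_hetero(src.readlines(), keep=["ZN"])
# with open("7y4f_cleaned.pdb", "w") as dst:
#     dst.writelines(cleaned)
```

Note that modified amino acids stored as `HETATM` (for example selenomethionine) would also be dropped unless whitelisted, and that `CONECT` or `ANISOU` records referring to removed atoms are left untouched in this sketch.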

In [None]:
# Example usage:
from google.colab import drive
drive.mount('/content/drive')

!ls "/content/drive/My Drive/DPP4"

path_7y4f = '/content/drive/My Drive/DPP4/7y4f.pdb'

clean_structure(path_7y4f, "7y4f_cleaned.pdb")

In [None]:
path_7y4g = '/content/drive/My Drive/DPP4/7y4g.pdb'
path_8hay = '/content/drive/My Drive/DPP4/8hay.pdb'
path_1x70 = '/content/drive/My Drive/DPP4/1x70.pdb'

clean_structure(path_7y4g, '7y4g_cleaned.pdb')
clean_structure(path_8hay, '8hay_cleaned.pdb')
clean_structure(path_1x70, '1x70_cleaned.pdb')

In [10]:
# Step 2: Extract and Prepare Ligands
!pip install rdkit

Collecting rdkit
  Downloading rdkit-2023.9.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (34.9 MB)
Installing collected packages: rdkit
Successfully installed rdkit-2023.9.6


In [18]:
import os

from Bio.PDB import PDBIO, PDBParser, Select
from rdkit import Chem


def extract_and_save_ligand_from_pdb(pdb_file, ligand_resname):
    """Save all residues named `ligand_resname` as <resname>.pdb and <resname>.sdf.

    Files are written next to `pdb_file`. Returns the RDKit molecule,
    or None if the residue is absent or RDKit cannot parse it.
    """
    parser = PDBParser(QUIET=True)
    structure = parser.get_structure("Protein", pdb_file)

    class LigandSelect(Select):
        def accept_residue(self, residue):
            return residue.get_resname().strip() == ligand_resname

    # Check that at least one residue carries the requested name
    ligand_found = any(
        residue.get_resname().strip() == ligand_resname
        for residue in structure.get_residues()
    )
    if not ligand_found:
        print(f"No ligand with the residue name {ligand_resname} found.")
        return None

    # Write only the ligand residues to a separate PDB file
    output_path = os.path.join(os.path.dirname(pdb_file), f"{ligand_resname}.pdb")
    pdb_io = PDBIO()  # named pdb_io to avoid shadowing the standard `io` module
    pdb_io.set_structure(structure)
    pdb_io.save(output_path, LigandSelect())
    print(f"Ligand PDB saved to {output_path}")

    # Convert the PDB file to an RDKit molecule (unsanitized, hydrogens kept)
    mol = Chem.MolFromPDBFile(output_path, sanitize=False, removeHs=False)
    if mol is None:
        print(f"RDKit could not parse {output_path}")
        return None

    sdf_path = os.path.splitext(output_path)[0] + ".sdf"
    writer = Chem.SDWriter(sdf_path)
    writer.write(mol)
    writer.close()
    print(f"Ligand SDF saved to {sdf_path}")
    return mol


In [19]:
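# Extract the ligand by its residue name as it appears in the PDB file.
# "SIT" is a guess for sitagliptin; the name must match the HETATM records exactly.
ligand = extract_and_save_ligand_from_pdb(path_7y4g, "SIT")  # Sitagliptin extraction example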

No ligand with the residue name SIT found.
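
The lookup fails because no residue in the file is named `SIT`; ligand identifiers in PDB entries often do not resemble the drug name. A quick way to find the right identifier is to list the residue names that occur in `HETATM` records. The sketch below does this with plain text parsing; the function name `hetero_residue_names` is hypothetical, and it assumes the standard fixed-column layout (residue name in columns 18-20).

```python
from collections import Counter


def hetero_residue_names(pdb_path):
    """Count atoms per residue name over all HETATM records in a PDB file."""
    counts = Counter()
    with open(pdb_path) as handle:
        for line in handle:
            if line.startswith("HETATM"):
                counts[line[17:20].strip()] += 1
    return counts


# Usage with a path defined earlier in this notebook:
# for name, n_atoms in hetero_residue_names(path_7y4g).most_common():
#     print(name, n_atoms)
```

Water (`HOH`) usually dominates the list; a drug-like ligand typically shows up as a residue name with a few dozen atoms, and that name is the one to pass to `extract_and_save_ligand_from_pdb`.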
