# Workflow for Virtual Screening

## 1. Preparation of Protein Structures

**Retrieve Structures**: Download the PDB files for 7Y4F, 7Y4G, 8HAY, and 1X70 from the Protein Data Bank.
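The retrieval step can be scripted with Biopython's `PDBList` (Biopython is installed in the cell below). This is a minimal sketch that assumes network access and assumes each entry is available in the legacy PDB format. `fetch_structures` and the `structures` output directory are illustrative names and are not part of the original workflow.

```python
from pathlib import Path

from Bio.PDB import PDBList

# The four entries named in this workflow
PDB_IDS = ["7Y4F", "7Y4G", "8HAY", "1X70"]


def fetch_structures(pdb_ids, out_dir="structures"):
    """Download each entry in legacy PDB format; return {pdb_id: local path}.

    Hypothetical helper for illustration; out_dir is an assumed location.
    """
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    downloader = PDBList()
    paths = {}
    for pdb_id in pdb_ids:
        # Biopython names legacy-format files pdb<id>.ent (e.g. pdb7y4f.ent)
        path = downloader.retrieve_pdb_file(pdb_id, pdir=out_dir, file_format="pdb")
        # Fail early here instead of letting the cleaning step hit FileNotFoundError
        if not Path(path).exists():
            raise FileNotFoundError(f"Download of {pdb_id} failed: {path} not found")
        paths[pdb_id] = path
    return paths
```

Calling `fetch_structures(PDB_IDS)` downloads all four files. Entries too large for the legacy PDB format are distributed only as mmCIF. For those entries you would need `file_format="mmCif"` together with `MMCIFParser` instead of `PDBParser`.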

**Protein Preparation**: Clean up the structures by removing water molecules and any other molecules not relevant to binding (e.g., ions, unless they are known to be crucial for the binding mechanism). Assign consistent protonation states to titratable residues and optimize the hydrogen-bonding network.

```python
# Install the required libraries if they are not already available
!pip install biopython rdkit
```

```python
# Step 1: Load and clean PDB structures
from Bio.PDB import PDBIO, PDBParser, Select


class NonWaterSelect(Select):
    """Keep every residue except water (HOH)."""

    def accept_residue(self, residue):
        return residue.get_resname() != "HOH"


def clean_structure(input_pdb, output_pdb):
    """Read input_pdb and write a copy without water molecules to output_pdb."""
    parser = PDBParser(QUIET=True)  # suppress warnings about malformed records
    structure = parser.get_structure("Protein", input_pdb)

    # Remove water only; ions, ligands and other hetero groups are kept
    io = PDBIO()
    io.set_structure(structure)
    io.save(output_pdb, select=NonWaterSelect())
```
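`NonWaterSelect` drops only water. The preparation step described above also removes other hetero groups while keeping ions that matter for binding. The sketch below expresses that rule as a Biopython `Select` subclass. `ProteinAndKeyIonsSelect` is a hypothetical name. `KEEP_IONS = {"ZN"}` is a placeholder assumption and makes no claim about these four structures. The demo uses hand-built `Residue` objects, so it runs without any PDB file.

```python
from Bio.PDB import Select
from Bio.PDB.Residue import Residue

# Hypothetical whitelist of ions assumed to be essential for binding;
# replace it with the ions that actually matter for your target.
KEEP_IONS = {"ZN"}


class ProteinAndKeyIonsSelect(Select):
    """Keep standard residues plus whitelisted ions; drop water and other hetero groups."""

    def __init__(self, keep_ions=()):
        self.keep_ions = {name.strip() for name in keep_ions}

    def accept_residue(self, residue):
        # residue.id[0] is the hetero flag: " " for standard residues,
        # "W" for water, "H_<resname>" for other HETATM groups
        if residue.id[0] == " ":
            return True
        return residue.get_resname().strip() in self.keep_ions


# Quick check on hand-built residues (no PDB file needed)
selector = ProteinAndKeyIonsSelect(KEEP_IONS)
checks = {
    name: bool(selector.accept_residue(Residue(res_id, name, " ")))
    for res_id, name in [
        ((" ", 1, " "), "ALA"),
        (("W", 2, " "), "HOH"),
        (("H_ZN", 3, " "), "ZN"),
        (("H_NAG", 4, " "), "NAG"),
    ]
}
print(checks)
```

To use this selector, pass `select=ProteinAndKeyIonsSelect(KEEP_IONS)` to `io.save` in place of `NonWaterSelect()`. Modified amino acids recorded as HETATM (for example selenomethionine, `MSE`) carry a hetero flag and would also be dropped. Add them to the whitelist if they are part of the chain you want to keep.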

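```python
# Example usage: fetch 7Y4F first (a missing local file raises FileNotFoundError)
from Bio.PDB import PDBList

pdb_path = PDBList().retrieve_pdb_file("7Y4F", pdir=".", file_format="pdb")
clean_structure(pdb_path, "7Y4F_cleaned.pdb")
```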
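You can confirm what the cleaning step removed by counting residues by their hetero flag before and after cleaning. This is a minimal sketch: `summarize_residues` is a hypothetical helper, and the demo uses hand-built `Residue` objects so that it runs offline.

```python
from collections import Counter

from Bio.PDB.Residue import Residue


def summarize_residues(residues):
    """Count residues by hetero flag: standard, water, and other hetero groups."""
    counts = Counter()
    for residue in residues:
        flag = residue.id[0]  # " " standard, "W" water, "H_<resname>" hetero
        if flag == " ":
            counts["standard"] += 1
        elif flag == "W":
            counts["water"] += 1
        else:
            counts["hetero"] += 1
    return counts


# Demo on hand-built residues; on real files pass structure.get_residues()
demo = [
    Residue((" ", 1, " "), "ALA", " "),
    Residue(("W", 2, " "), "HOH", " "),
    Residue(("W", 3, " "), "HOH", " "),
    Residue(("H_ZN", 4, " "), "ZN", " "),
]
print(summarize_residues(demo))
```

Parse the downloaded 7Y4F file and `7Y4F_cleaned.pdb` with `PDBParser`, then compare `summarize_residues(structure.get_residues())` for the two structures. The water count should drop to zero, and the standard and hetero counts should stay the same.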