# Programmatic implementation of AMDock

This notebook tests programmatic high-throughput virtual screening (HTVS) modeled on the `AMDock` workflow. `AMDock` is a nice GUI for AutoDock that works very well. Below is a test run of my `htvs_amdock` module, which tries to replicate the `AMDock` workflow programmatically.

## Preparation

1. Download the test complex [1HSG protein with mk1 ligand](https://www.rcsb.org/structure/1HSG). Separate the protein and ligand PDB data in PyMOL.
2. Do not download the ligand structure from any other source. Use the one in the complex, as it is experimentally verified.
3. To reduce the possibility of false positives, move and rotate the extracted ligand in PyMOL by a random distance and angle, so that docking does not start from the experimentally verified crystal pose.
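
The random displacement in step 3 can also be scripted. Below is a minimal sketch, assuming the ligand coordinates have already been read into a list of `(x, y, z)` tuples in Å. The names `random_rigid_move` and `max_shift` are hypothetical and not part of `htvs_amdock` or PyMOL, and writing the moved coordinates back to a PDB file is left out.

```python
import math
import random


def _random_unit_vector(rng):
    """Draw a random direction by normalising a 3D Gaussian sample."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-8:
            return [c / norm for c in v]


def random_rigid_move(coords, max_shift=5.0, rng=None):
    """Rotate coords about their centroid by a random angle around a random
    axis, then translate them by a random vector no longer than max_shift."""
    rng = rng or random.Random()
    n = len(coords)
    cx, cy, cz = (sum(p[k] for p in coords) / n for k in range(3))

    # Random rotation, parameterised as axis (kx, ky, kz) and angle theta.
    kx, ky, kz = _random_unit_vector(rng)
    theta = rng.uniform(0.0, 2.0 * math.pi)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    # Random translation with length in [0, max_shift].
    shift = rng.uniform(0.0, max_shift)
    tx, ty, tz = (shift * c for c in _random_unit_vector(rng))

    moved = []
    for x, y, z in coords:
        vx, vy, vz = x - cx, y - cy, z - cz
        # Rodrigues' formula: v*cos + (k x v)*sin + k*(k.v)*(1 - cos)
        dot = kx * vx + ky * vy + kz * vz
        crx = ky * vz - kz * vy
        cry = kz * vx - kx * vz
        crz = kx * vy - ky * vx
        rx = vx * cos_t + crx * sin_t + kx * dot * (1.0 - cos_t)
        ry = vy * cos_t + cry * sin_t + ky * dot * (1.0 - cos_t)
        rz = vz * cos_t + crz * sin_t + kz * dot * (1.0 - cos_t)
        moved.append((rx + cx + tx, ry + cy + ty, rz + cz + tz))
    return moved
```

Because the move is rigid, all intramolecular distances are preserved; only the starting position and orientation of the ligand change.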

In [1]:
import importmonkey
import os

# set OMP_NUM_THREADS to number of available CPUs. Best for docking a single complex quickly
cpus = os.cpu_count() or 1
os.environ["OMP_NUM_THREADS"] = str(cpus)
home = os.environ["HOME"]

importmonkey.add_path(os.path.join(home, "gitrepos/gromacs_sims/scripts"))
from htvs_amdock import run_htvs

PROTEIN_PDB = os.path.join(home, "gitrepos/gromacs_sims/htvs/1hsg_mk1/experiment/1hsg_protein.pdb")
LIGAND_PDB = os.path.join(home, "gitrepos/gromacs_sims/htvs/1hsg_mk1/experiment/mk1.pdb")

testjob = [{
            "protein_name": "1hsg_protein",
            "ligand_name": "mk1",
            "protein_pdb": PROTEIN_PDB,
            "ligand_pdb": LIGAND_PDB,
            "working_dir": os.path.join(home, "gitrepos/gromacs_sims/htvs/1hsg_mk1/job_1hsg_mk1_script_20251206"),
        }]

# Uncomment this to rerun the docking
#run_htvs(testjob)

## AutoLigand error (resolved)

======== AutoLigand Site Search (Automatic) FAILED STDOUT ========

======== AutoLigand Site Search (Automatic) FAILED STDERR ========
```python
Traceback (most recent call last):
  File "/usr/local/miniforge3/envs/htvs/bin/AutoLigand", line 7, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/miniforge3/envs/htvs/lib/python3.12/site-packages/AutoDockTools/AutoLigand.py", line 198, in main
    word = string.split(linee)
           ^^^^^^^^^^^^
AttributeError: module 'string' has no attribute 'split'
```

The module-level `string.split()` function from Python 2 no longer exists in Python 3; the same functionality is available as the `str.split()` method (e.g. `linee.split()`). The `AutoLigand` script from [AutoDockTools_py3](https://github.com/Valdes-Tresanco-MS/AutoDockTools_py3) is meant to be a Python 3 port of the original Python 2 `AutoLigand.py`, but this call was missed.
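
The incompatibility can be reproduced in isolation. The snippet below is only an illustration: the variable name `linee` is taken from the traceback, while the sample PDB line is made up and is not the input that triggered the error.

```python
import string

# An arbitrary PDB-style line, used only for illustration.
linee = "ATOM      1  N   PRO A   1      29.361  39.686   5.862"

# Python 2 idiom that fails on Python 3:
#     word = string.split(linee)
print(hasattr(string, "split"))  # the string module has no split() in Python 3

# Python 3 equivalent: call the method on the str object itself.
word = linee.split()
print(word)
```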


## Resolution
I raised the issue on the GitHub repository, forked it, and patched the call in my fork; `AutoLigand` now runs.

## Post-Processing

Tabulate the docking scores.

In [2]:
import os
home = os.environ["HOME"]
results_dir = os.path.join(home, "gitrepos/gromacs_sims/htvs/1hsg_mk1/job_1hsg_mk1_script_20251206")
# Scan every .log file in results_dir and tabulate the best binding affinity per pose/site ID, formatted with the tabulate module.
from tabulate import tabulate
import re
results = []
for filename in os.listdir(results_dir):
    if not filename.endswith(".log"):
        continue
    log_path = os.path.join(results_dir, filename)
    # extract pose id from filename (try common patterns, fallback to filename stem)
    m = re.search(r'pose[_-]?(\d+)', filename, re.IGNORECASE) or \
        re.search(r'site[_-]?(\d+)', filename, re.IGNORECASE) or \
        re.search(r'_(\d+)\.log$', filename)
    pose_id = m.group(1) if m else os.path.splitext(filename)[0]
    with open(log_path, "r") as f:
        lines = f.readlines()
        for i, line in enumerate(lines):
            if line.startswith("-----+"):
                if i + 1 < len(lines):
                    parts = lines[i+1].split()
                    if len(parts) >= 2:
                        try:
                            affinity = float(parts[1])
                        except ValueError:
                            continue
                        results.append((pose_id, affinity))

# sort by affinity in increasing numeric order and format for display
results.sort(key=lambda x: x[1])
table = [(pid, f"{aff:.3f}") for pid, aff in results]
print(tabulate(table, headers=["Pose ID", "Best Binding Affinity (kcal/mol)"]))

  Pose ID    Best Binding Affinity (kcal/mol)
---------  ----------------------------------
        8                             -10.62
        5                             -10.57
        2                             -10.57
        9                             -10.56
       10                              -6.886
        1                              -6.855
        6                              -6.817
        7                              -6.764
        3                              -6.752
        4                              -5.879
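
The parsing cell above depends on the layout of the result table in the docking log: the best mode is the first row after the `-----+` separator, and its second column is the affinity. Below is a minimal sketch of that step as a standalone function, assuming the AutoDock Vina style table shown in the sample. `best_affinity` is a hypothetical helper name and the sample values are invented, not taken from this run.

```python
def best_affinity(log_text):
    """Return the affinity (kcal/mol) of the first mode listed after the
    '-----+' separator, or None if no such row can be parsed."""
    lines = log_text.splitlines()
    for i, line in enumerate(lines[:-1]):
        if line.startswith("-----+"):
            parts = lines[i + 1].split()
            if len(parts) >= 2:
                try:
                    return float(parts[1])
                except ValueError:
                    return None
    return None


# Illustrative log excerpt (values are invented, not from a real run).
SAMPLE_LOG = """\
mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1         -7.5      0.000      0.000
   2         -7.1      1.852      2.447
"""

print(best_affinity(SAMPLE_LOG))
```

Unlike the cell above, this helper returns after the first table it finds, which is sufficient when each log holds a single docking run.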


## Visualize

Use `nglview` to check out the best docked pose.

### TODO
Use `MDAnalysis` directly to compare the best docked ligand pose with the experimental one. The comparison is currently done in PyMOL, but it would be better to compute the RMSD in Jupyter and visualize both poses with `nglview`.
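
As a starting point for this TODO, here is a minimal, dependency-free sketch of an in-place heavy-atom RMSD. It does not use `MDAnalysis`; it reads fixed-column PDB records directly. It assumes that the docked and experimental ligand files use the same atom names, performs no superposition (both poses are taken to be in the receptor frame), and ignores symmetry-equivalent atoms. The function names are hypothetical.

```python
import math


def heavy_atoms(pdb_text):
    """Return {atom_name: (x, y, z)} for non-hydrogen ATOM/HETATM records."""
    atoms = {}
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        name = line[12:16].strip()
        # Prefer the element column (77-78); fall back to the atom name.
        element = line[76:78].strip() or name.lstrip("0123456789")[:1]
        if element.upper() == "H":
            continue
        atoms[name] = (
            float(line[30:38]),
            float(line[38:46]),
            float(line[46:54]),
        )
    return atoms


def in_place_rmsd(ref_text, pose_text):
    """Heavy-atom RMSD over atom names shared by both structures,
    computed without any alignment."""
    ref = heavy_atoms(ref_text)
    pose = heavy_atoms(pose_text)
    common = sorted(set(ref) & set(pose))
    if not common:
        raise ValueError("no matching heavy-atom names between structures")
    sq_sum = sum(math.dist(ref[n], pose[n]) ** 2 for n in common)
    return math.sqrt(sq_sum / len(common))
```

Typical usage would read both files with `open(path).read()` and pass the two strings to `in_place_rmsd`. If the docking pipeline renames or reorders ligand atoms, the name-based matching will need to be replaced with an explicit atom mapping.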

In [3]:
ligand_id = "mk1"
best_pose_id, best_affinity = results[0]
best_pose_ligand_pdb = os.path.join(results_dir, f"{ligand_id}_pose_site{best_pose_id}.pdb")
protein_pdb = testjob[0]["protein_pdb"]
#use nglview to visualize the best pose
import nglview as nv
view = nv.NGLWidget()
view.add_component(protein_pdb, ext="pdb", default_representation="cartoon", name="Protein")
view.add_component(best_pose_ligand_pdb, ext="pdb", default_representation="ball+stick", name="Best Ligand Pose")
view.center()
view.display()
view

NGLWidget()

## Further Testing

The following test set of **10 experimentally determined protein–ligand complexes (all proteins ≤200 residues)** can be used to validate docking workflows; the list and recommendations are drawn from public structural resources (RCSB PDB and curated datasets).

---

### Protein–ligand test set

| **PDB ID** | **Protein (residues)** | **Ligand (type)** | **Method / Resolution** | **Link** |
|---|---:|---|---|---|
| **1HEW** | **Hen egg‑white lysozyme (129)** | **Tri‑N‑acetylchitotriose (saccharide)** | X‑ray 1.75 Å | https://www.rcsb.org/structure/1HEW |
| **1FKB** | **FKBP12 (107)** | **Rapamycin (small molecule)** | X‑ray 1.70 Å | https://www.rcsb.org/structure/1FKB |
| **1HVR** | **HIV‑1 protease (99 per chain)** | **Nonpeptide cyclic urea inhibitor (small molecule)** | X‑ray 1.80 Å | https://www.rcsb.org/structure/1HVR |
| **1STP** | **Streptavidin monomer (159)** | **Biotin (small molecule)** | X‑ray 2.60 Å | https://www.rcsb.org/structure/1STP |
| **1AFK** | **Ribonuclease A (124)** | **Nucleotide analog (small molecule)** | X‑ray 1.70 Å | https://www.rcsb.org/structure/1AFK |
| **2K0F** | **Calmodulin (148)** | **Calmodulin‑binding peptide (peptide ligand)** | NMR (solution) | https://www.rcsb.org/structure/2K0F |
| **1YCR** | **MDM2 (109)** | **p53 transactivation peptide (peptide ligand)** | X‑ray 2.60 Å | https://www.rcsb.org/structure/1YCR |
| **1BRS** | **Barnase (110) / Barstar (89)** | **Protein inhibitor (protein–protein complex)** | X‑ray 2.00 Å | https://www.rcsb.org/structure/1BRS |
| **1STN** | **Staphylococcal nuclease (149)** | **Small‑molecule binding site available** | X‑ray 1.70 Å | https://www.rcsb.org/structure/1STN |
| **1DFJ** | **Ribonuclease A (124)** | **Ribonuclease inhibitor (protein–protein complex; the inhibitor itself is well over 200 residues)** | X‑ray 2.50 Å | https://www.rcsb.org/structure/1DFJ |

### Notes and reasoning

- **Selection criteria:** I prioritized **experimentally validated** entries from public structural repositories and curated datasets, and constrained targets to **≤200 residues** to keep receptors small and tractable for docking and sampling. The RCSB PDB is the primary source for coordinates, experimental method, and resolution. PLBD and curated collections (e.g., PL‑REX) provide complementary lists and affinity/validation metadata useful for benchmarking.
- **Ligand diversity:** The set intentionally spans **small molecules (biotin, rapamycin, inhibitors), nucleotides, saccharides, peptides, and a protein inhibitor** to exercise different docking challenges: rigid small‑molecule fits, flexible sugar recognition, peptide backbone sampling, and protein–protein interfaces.
- **Why these are useful:** **Small proteins reduce conformational search space**, making it easier to isolate docking algorithm performance. Peptide and protein‑protein cases test sampling and scoring for larger, flexible interfaces; small‑molecule cases test pose prediction and scoring accuracy.
- **Practical preparation steps:** Download coordinate files (PDBx/mmCIF) and validation reports from RCSB; extract **chain IDs**, **ligand Chemical Component IDs**, and experimental metadata before preprocessing. Standardize protonation, remove or retain crystallographic waters per your protocol, and handle alternate conformers consistently.
- **Validation metrics:** For **small molecules** compute heavy‑atom RMSD to deposited ligand coordinates; for **peptides/protein interfaces** use backbone RMSD and interface contact recovery (e.g., fraction of native contacts). Where available, use experimental affinity data from PLBD or curated datasets to correlate docking scores with binding strength.

**Key takeaway:** This mixed set gives **broad coverage of docking scenarios** while keeping receptors small (≤200 residues), enabling focused benchmarking of sampling and scoring components using experimentally validated structures.
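
The interface contact recovery metric mentioned in the notes can be sketched as follows. This is a simplified, atom-level version that assumes the native and docked ligands list their atoms in the same order and that coordinates are plain `(x, y, z)` tuples. The 4.5 Å cutoff and the function names are illustrative assumptions, not values taken from any of the datasets above.

```python
import math


def contact_pairs(receptor, ligand, cutoff=4.5):
    """Return the set of (receptor_index, ligand_index) pairs whose atoms
    lie within `cutoff` Å of each other."""
    return {
        (i, j)
        for i, r in enumerate(receptor)
        for j, l in enumerate(ligand)
        if math.dist(r, l) <= cutoff
    }


def fraction_native_contacts(receptor, native_ligand, docked_ligand, cutoff=4.5):
    """Fraction of native receptor-ligand contacts that are also present
    in the docked pose."""
    native = contact_pairs(receptor, native_ligand, cutoff)
    if not native:
        raise ValueError("native pose has no contacts within the cutoff")
    docked = contact_pairs(receptor, docked_ligand, cutoff)
    return len(native & docked) / len(native)


# Toy coordinates, for illustration only.
receptor = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
native = [(3.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
docked = [(3.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(fraction_native_contacts(receptor, native, docked))
```

For peptide and protein–protein cases, contacts are more commonly defined per residue pair than per atom pair; the same set intersection applies once atoms are grouped by residue.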

## Complex 1HEW_NAG


In [None]:
import importmonkey
import os
home = os.environ["HOME"]

importmonkey.add_path(os.path.join(home, "gitrepos/gromacs_sims/scripts"))
from htvs_amdock import run_htvs

PROTEIN_PDB = os.path.join(home, "gitrepos/gromacs_sims/htvs/1hew_nag/experiment/1HEW_expt.pdb")
LIGAND_PDB = os.path.join(home, "gitrepos/gromacs_sims/htvs/1hew_nag/experiment/nag.pdb")

testjob = [{
            "protein_name": "1hew_protein",
            "ligand_name": "nag",
            "protein_pdb": PROTEIN_PDB,
            "ligand_pdb": LIGAND_PDB,
            "working_dir": os.path.join(home, "gitrepos/gromacs_sims/htvs/1hew_nag/job_1hew_nag_script_20251208"),
        }]

# Runs the docking; comment this out to skip rerunning it
run_htvs(testjob, verbose=True)

In [2]:
import os
home = os.environ["HOME"]
results_dir = os.path.join(home, "gitrepos/gromacs_sims/htvs/1hew_nag/job_1hew_nag_script_20251208")
# Scan every .log file in results_dir and tabulate the best binding affinity per pose/site ID, formatted with the tabulate module.
from tabulate import tabulate
import re
results = []
for filename in os.listdir(results_dir):
    if not filename.endswith(".log"):
        continue
    log_path = os.path.join(results_dir, filename)
    # extract pose id from filename (try common patterns, fallback to filename stem)
    m = re.search(r'pose[_-]?(\d+)', filename, re.IGNORECASE) or \
        re.search(r'site[_-]?(\d+)', filename, re.IGNORECASE) or \
        re.search(r'_(\d+)\.log$', filename)
    pose_id = m.group(1) if m else os.path.splitext(filename)[0]
    with open(log_path, "r") as f:
        lines = f.readlines()
        for i, line in enumerate(lines):
            if line.startswith("-----+"):
                if i + 1 < len(lines):
                    parts = lines[i+1].split()
                    if len(parts) >= 2:
                        try:
                            affinity = float(parts[1])
                        except ValueError:
                            continue
                        results.append((pose_id, affinity))

# sort by affinity in increasing numeric order and format for display
results.sort(key=lambda x: x[1])
table = [(pid, f"{aff:.3f}") for pid, aff in results]
print(tabulate(table, headers=["Pose ID", "Best Binding Affinity (kcal/mol)"]))

  Pose ID    Best Binding Affinity (kcal/mol)
---------  ----------------------------------
        7                              -7.044
        3                              -7.016
        2                              -6.982
        1                              -6.973
        5                              -6.964
        6                              -6.886
        4                              -6.861
        8                              -5.453


In [None]:
ligand_id = "nag"
best_pose_id, best_affinity = results[0]
ligand_expt_pose = os.path.join(home, "gitrepos/gromacs_sims/htvs/1hew_nag/experiment/nag_expt.pdb")
best_pose_ligand_pdb = os.path.join(results_dir, f"{ligand_id}_pose_site{best_pose_id}.pdb")
protein_pdb = testjob[0]["protein_pdb"]
#use nglview to visualize the best pose
import nglview as nv
from IPython.display import display, Image
view = nv.NGLWidget()

view.add_component(protein_pdb, ext="pdb", default_representation="cartoon", name="Protein")
view.add_component(ligand_expt_pose, ext="pdb", default_representation="ball+stick", name="Experimental Ligand")
view.add_component(best_pose_ligand_pdb, ext="pdb", default_representation="CPK", name="Best Ligand Pose")
view.center()
view

NGLWidget()