# **Activity 2.2: Molecular docking of immune and skin proteins (using PGA as validation)**

Before you proceed with this workflow, I encourage you to follow the tutorials in the ```00_tutorials/``` folder so that everything is downloaded beforehand.

In this workflow, we dock our ligand, PGA oligomers of 1 to 12 units, against multiple human skin and immune proteins.

Make sure you have already installed the ```requests```, ```biopython```, ```openbabel```, ```MDAnalysis```, ```numpy``` and ```pandas``` libraries. Otherwise, copy the following lines and paste them into the terminal:

1. ```conda activate vina```
2. ```sudo apt install openbabel```
3. ```conda install -c conda-forge requests biopython openbabel mdanalysis numpy pandas```
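A quick way to confirm the installation is to check that Python can find each package. The sketch below assumes the usual import names (```Bio``` for biopython, ```openbabel``` for the Open Babel bindings) and also reports whether the ```obabel``` command-line tool is on the ```PATH```; ```missing_packages``` is a hypothetical helper, not part of ```docking_functions```.

```python
import importlib.util
import shutil

# Package name -> module name used in "import ..." (assumed standard import names)
REQUIRED_MODULES = {
    "requests": "requests",
    "biopython": "Bio",
    "openbabel": "openbabel",
    "MDAnalysis": "MDAnalysis",
    "numpy": "numpy",
    "pandas": "pandas",
}

def missing_packages(modules=REQUIRED_MODULES):
    """Return the package names whose top-level module cannot be found."""
    return [pkg for pkg, mod in modules.items()
            if importlib.util.find_spec(mod) is None]

if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
    print("obabel on PATH:", shutil.which("obabel") is not None)
```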

## 1. Create PDBQT file for Ligand

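Get the structure file (SDF) of the ligand from PubChem: https://pubchem.ncbi.nlm.nih.gov/. The cell below expects one PDB file per oligomer, named ```pga_{n}_unit.pdb``` (n = 1 to 12), in the ```files/``` folder.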
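If you prefer to script the download, PubChem's PUG REST service returns a compound's SDF record from its CID. This is a minimal sketch using only the standard library; the CID is a placeholder you must look up yourself, and ```pubchem_sdf_url``` and ```download_sdf``` are hypothetical helpers.

```python
import urllib.request

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

def pubchem_sdf_url(cid: int, three_d: bool = True) -> str:
    """Build the PUG REST URL for a compound's SDF record."""
    url = f"{PUG_REST}/compound/cid/{cid}/SDF"
    # record_type=3d requests PubChem's computed 3D conformer (not every CID has one)
    return f"{url}?record_type=3d" if three_d else url

def download_sdf(cid: int, out_path: str, three_d: bool = True) -> str:
    """Download the SDF record for `cid` and write it to `out_path`."""
    # urlopen raises urllib.error.HTTPError if PubChem returns an error status
    with urllib.request.urlopen(pubchem_sdf_url(cid, three_d), timeout=30) as response:
        sdf_text = response.read().decode("utf-8")
    with open(out_path, "w") as handle:
        handle.write(sdf_text)
    return out_path

# Hypothetical usage: replace YOUR_CID with the PubChem CID of your ligand
# download_sdf(YOUR_CID, "files/ligand.sdf")
```

The cell below reads PDB files, so a downloaded SDF still needs converting, for example with ```obabel files/ligand.sdf -O files/pga_1_unit.pdb```.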

In [1]:
# Custom Library to prepare Ligand using Openbabel
from docking_functions.PrepareLigand import prepare_ligand
# Library for file handling and export
import os

ligands_pdbqt = {} # Dictionary to store PDBQT files from n-unit oligomers

# Set Directory with PGA files
input_dir = "files/"
# Set Output Directory for Ligands
ligand_dir = os.path.join("output", "ligand_files")
os.makedirs(ligand_dir, exist_ok=True)

for num_units in range(1, 13):  # 1 to 12 units; modify this range as needed
    try:
        # Convert PDB file to PDBQT
        ligand_pdbqt = prepare_ligand(
            input_file = os.path.join(input_dir, f"pga_{num_units}_unit.pdb"),
            output_dir = ligand_dir,
            ph = 7.4,
            charge_method = 'gasteiger'
        )
        # Store PDBQT filename
        ligands_pdbqt[num_units] = ligand_pdbqt
    except Exception as e:
        print(e)

print("\nPDBQT conversion completed.")

1 molecule converted
1 molecule converted
1 molecule converted
1 molecule converted
1 molecule converted
1 molecule converted
1 molecule converted
1 molecule converted
1 molecule converted
1 molecule converted
1 molecule converted
1 molecule converted

Ligand prepared successfully: output/ligand_files/pga_1_unit.pdbqt
Ligand prepared successfully: output/ligand_files/pga_2_unit.pdbqt
Ligand prepared successfully: output/ligand_files/pga_3_unit.pdbqt
Ligand prepared successfully: output/ligand_files/pga_4_unit.pdbqt
Ligand prepared successfully: output/ligand_files/pga_5_unit.pdbqt
Ligand prepared successfully: output/ligand_files/pga_6_unit.pdbqt
Ligand prepared successfully: output/ligand_files/pga_7_unit.pdbqt
Ligand prepared successfully: output/ligand_files/pga_8_unit.pdbqt
Ligand prepared successfully: output/ligand_files/pga_9_unit.pdbqt
Ligand prepared successfully: output/ligand_files/pga_10_unit.pdbqt
Ligand prepared successfully: output/ligand_files/pga_11_unit.pdbqt
Ligand prepared successfully: output/ligand_files/pga_12_unit.pdbqt

PDBQT conversion completed.

## 2. Create PDBQT file from Protein

We already downloaded the protein structures and converted them to PDBQT files in the previous workflow, so here we only check that the expected files exist.

In [2]:
pdb_ids = [
    ("6EC0", "Keratin 1", True), # PDB ID, PDB Protein Name, deduplicate
    ("7CWK", "Collagen type I", True),
    ("4BSO", "R-SPONDIN-1", True),
    ("1RG8", "Heparin-binding growth factor 1", True),
    ("1CSG", "Granulocyte-macrophage colony-stimulating factor", True),
    ("1TGJ", "TRANSFORMING GROWTH FACTOR-BETA 3", True),
    ("1VPF", "VASCULAR ENDOTHELIAL GROWTH FACTOR", True),
    ("2Z80", "Toll-like receptor 2, Variable lymphocyte receptor B", True),
    # Add as needed
]

pdbqt_filenames = [] # List to store converted PDBQT filenames

# Build the expected PDBQT filename for each protein
for pdb_id in pdb_ids:
    pdbqt_filename = f"output/receptor_files/{pdb_id[0]}_{pdb_id[1].replace(' ', '_')}.pdbqt"
    
    # Verify the pdbqt filename is in directory
    if os.path.exists(pdbqt_filename):
        pdbqt_filenames.append(pdbqt_filename)
        print(f"Found: {pdbqt_filename}")
    else:
        print(f"Missing: {pdbqt_filename}")

Found: output/receptor_files/6EC0_Keratin_1.pdbqt
Found: output/receptor_files/7CWK_Collagen_type_I.pdbqt
Found: output/receptor_files/4BSO_R-SPONDIN-1.pdbqt
Found: output/receptor_files/1RG8_Heparin-binding_growth_factor_1.pdbqt
Found: output/receptor_files/1CSG_Granulocyte-macrophage_colony-stimulating_factor.pdbqt
Found: output/receptor_files/1TGJ_TRANSFORMING_GROWTH_FACTOR-BETA_3.pdbqt
Found: output/receptor_files/1VPF_VASCULAR_ENDOTHELIAL_GROWTH_FACTOR.pdbqt
Found: output/receptor_files/2Z80_Toll-like_receptor_2,_Variable_lymphocyte_receptor_B.pdbqt
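The check above only prints ```Missing:``` and moves on, and the docking loop then iterates over ```pdbqt_filenames``` alone, so a missing receptor is easy to overlook. A minimal sketch that rebuilds the same filename convention and separates found from missing files could look like this (```receptor_pdbqt_path``` and ```split_found_missing``` are hypothetical helpers):

```python
import os

def receptor_pdbqt_path(pdb_id, name, receptor_dir="output/receptor_files"):
    """Rebuild the naming convention used above: <PDB ID>_<name, spaces -> underscores>.pdbqt"""
    return os.path.join(receptor_dir, f"{pdb_id}_{name.replace(' ', '_')}.pdbqt")

def split_found_missing(entries, receptor_dir="output/receptor_files"):
    """Split (pdb_id, name, deduplicate) tuples into existing and missing PDBQT paths."""
    found, missing = [], []
    for pdb_id, name, _deduplicate in entries:
        path = receptor_pdbqt_path(pdb_id, name, receptor_dir)
        (found if os.path.exists(path) else missing).append(path)
    return found, missing
```

For example, calling ```found, missing = split_found_missing(pdb_ids)``` and raising an error when ```missing``` is non-empty stops the notebook before any docking time is spent.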


## 3. Run Molecular Docking

The docking cell below loops over all 12 PGA oligomers and every protein in ```pdbqt_filenames```. Copy, paste, and adapt it as needed, depending on how many proteins you want to dock.
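The docking cell builds a blind-docking box that covers the whole receptor from a grid center (```calculate_receptor_center_of_mass```) and a grid size (```compute_bounding_box```). Those custom functions are not shown, so the sketch below illustrates the idea under assumptions rather than reproducing their implementation: it reads atom coordinates from the fixed PDB columns of a PDBQT file and returns the box midpoint and padded edge lengths. It swaps in the bounding-box midpoint for the center of mass, which keeps the padded box symmetric around every atom; the 5 Å padding is an arbitrary choice.

```python
from typing import Iterable, List, Tuple

Coord = Tuple[float, float, float]

def parse_atom_coordinates(lines: Iterable[str]) -> List[Coord]:
    """Read x, y, z from ATOM/HETATM records (PDB fixed columns 31-38, 39-46, 47-54)."""
    coords = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords

def blind_docking_box(coords: List[Coord], padding: float = 5.0):
    """Return (center, size): bounding-box midpoint and edge lengths plus padding on each side."""
    if not coords:
        raise ValueError("no atom coordinates found")
    xs, ys, zs = zip(*coords)
    center = tuple((max(axis) + min(axis)) / 2 for axis in (xs, ys, zs))
    size = tuple(max(axis) - min(axis) + 2 * padding for axis in (xs, ys, zs))
    return center, size
```

Usage: ```with open(receptor_pdbqt) as f: center, size = blind_docking_box(parse_atom_coordinates(f))```. AutoDock Vina prints a warning when the search volume exceeds 27,000 Å³, which whole-protein boxes often do.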

In [3]:
# Store docking results to export it as CSV file
docking_results = []

In [4]:
# Custom Library to run Autodock Vina Blind Docking
from docking_functions.BlindDocking import compute_bounding_box, calculate_receptor_center_of_mass, run_vina_blind_docking
from docking_functions.docking_utils import extract_pose_results
from IPython.display import clear_output

# Set directory structure
ligand_dir = os.path.join("output", "ligand_files")
receptor_dir = os.path.join("output", "receptor_files")
docking_dir = os.path.join("output", "docking")

# Create docking subdirectory
os.makedirs(docking_dir, exist_ok=True)

# 1. Iterate over each n-unit PGA PDBQT file
for num_units in range(1, 13):  # 1 to 12 units

    # 2. Get Ligand file
    pga_pdbqt = os.path.join(ligand_dir, f"pga_{num_units}_unit.pdbqt")

    # 3. Iterate over each protein (PDBQT files)
    for receptor_pdbqt in pdbqt_filenames:
        try:
            print(f"\n{'='*70}")
            print(f"Running docking for: {os.path.basename(receptor_pdbqt)} - {os.path.basename(pga_pdbqt)}")
            print(f"{'='*70}")

            # 4. Calculate center of protein
            grid_center = calculate_receptor_center_of_mass(receptor_pdbqt)
            # 5. Calculate grid box size from the receptor's bounding box
            grid_size = compute_bounding_box(receptor_pdbqt)

            # 6. Extract PDB code for output naming
            pdb_code = os.path.basename(receptor_pdbqt).split("_")[0]
            output_subdir = f"pga_{num_units}_{pdb_code}"

            # 7. Run vina
            vina_results, error, vina_log_file = run_vina_blind_docking(
                # 1. Protein PDBQT filename
                receptor_pdbqt=receptor_pdbqt,
                # 2. Ligand PDBQT filename
                ligand_pdbqt=pga_pdbqt,
                # 3. Align Protein and Ligand inside Grid size
                align_ligand=True,
                # 4. Docking Output Directory
                output_folder=docking_dir,
                # 5. Center of protein
                center=grid_center,
                # 6. Box grid size
                size=grid_size,
                # 7. Is energy minimized ligand?
                ligand_minimized=True,
                # 8. Search exhaustiveness (Vina default = 8; higher is slower but more thorough)
                exhaustiveness=32,
                # 9. Number of conformational ligand poses
                num_modes=8,
                # 10. Number of CPUs available
                cpu=-1,
                # 11. Set seed for reproducible results
                seed=1234,
                # 12. Save all conformational ligand poses
                save_poses="all",
                # 13. Output filenames
                output_filename=output_subdir
            )

            if error:
                print(f"Error: {error}")
            else:
                print(f"Best affinity: {vina_results['best_affinity']:.2f} kcal/mol")
                print(f"Poses saved: {len(vina_results['pose_files'])}")
                
                # Extract all poses results
                result_dicts = extract_pose_results(
                    vina_results,
                    receptor_pdbqt,
                    pga_pdbqt,
                    output_subdir,
                    poses='all'
                )

                if result_dicts:
                    # Handle both single dict and list of dicts
                    if isinstance(result_dicts, list):
                        docking_results.extend(result_dicts)
                        print(f"\t- Extracted {len(result_dicts)} poses for {os.path.basename(receptor_pdbqt)}")
                    else:
                        docking_results.append(result_dicts)
                        print(f"\t- Completed docking for {os.path.basename(receptor_pdbqt)}")
                else:
                    print(f"\t- No results for {os.path.basename(receptor_pdbqt)}")        
        except Exception as e:
            print(f"Error: {e}")

        clear_output(wait=True) # Clears the current output before the next iteration

print(f"\n{'='*70}")
print(f"\t- Docking complete!")
print(f"Total results collected: {len(docking_results)}")
print(f"{'='*70}")


	- Docking complete!
Total results collected: 712
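For reference, most parameters passed to ```run_vina_blind_docking``` correspond to standard AutoDock Vina command-line options. Whether the wrapper calls the CLI or the Python bindings is not shown, so treat this as a hedged sketch; ```vina_command``` is a hypothetical helper. The wrapper's ```cpu=-1``` is not reproduced here: leaving ```--cpu``` out lets Vina pick the number of CPUs itself.

```python
def vina_command(receptor, ligand, center, size, out,
                 exhaustiveness=32, num_modes=8, seed=1234, cpu=None):
    """Build an AutoDock Vina command line for one receptor-ligand pair."""
    cmd = ["vina", "--receptor", receptor, "--ligand", ligand]
    # One center/size pair of options per axis
    for axis, c, s in zip("xyz", center, size):
        cmd += [f"--center_{axis}", f"{c:.3f}", f"--size_{axis}", f"{s:.3f}"]
    cmd += [
        "--exhaustiveness", str(exhaustiveness),
        "--num_modes", str(num_modes),
        "--seed", str(seed),  # fixed seed for reproducible runs
        "--out", out,         # multi-model PDBQT with all poses
    ]
    if cpu is not None:       # omitted -> Vina chooses the CPU count
        cmd += ["--cpu", str(cpu)]
    return cmd
```

The resulting list can be passed to ```subprocess.run(cmd, check=True)```.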


In [5]:
import pandas as pd
from docking_functions.docking_utils import export_docking_results

# DataFrame filename
df_filename = "output/docking_results_pga_summary.csv"

# Create DataFrame from all results
if docking_results:
    df_results = pd.DataFrame(docking_results)
    
    print("\n" + "="*50)
    print("All Docking Results:")
    print("="*50)
    print(df_results)
    
    # Export results using the export function
    export_docking_results(df_results, 
                           df_filename,
                           include_timestamp=False)

    # Optional: Sort by affinity
    df_sorted = df_results.sort_values('affinity_kcal_mol')
    print("\n" + "="*50)
    print("Results sorted by affinity (best to worst):")
    print("="*50)
    print(df_sorted[['protein_file', 'ligand_file', 'affinity_kcal_mol']])

else:
    print("No docking results were collected.")


All Docking Results:
                                          protein_file        ligand_file  \
0                                 6EC0_Keratin_1.pdbqt   pga_1_unit.pdbqt   
1                                 6EC0_Keratin_1.pdbqt   pga_1_unit.pdbqt   
2                                 6EC0_Keratin_1.pdbqt   pga_1_unit.pdbqt   
3                                 6EC0_Keratin_1.pdbqt   pga_1_unit.pdbqt   
4                                 6EC0_Keratin_1.pdbqt   pga_1_unit.pdbqt   
..                                                 ...                ...   
707  2Z80_Toll-like_receptor_2,_Variable_lymphocyte...  pga_12_unit.pdbqt   
708  2Z80_Toll-like_receptor_2,_Variable_lymphocyte...  pga_12_unit.pdbqt   
709  2Z80_Toll-like_receptor_2,_Variable_lymphocyte...  pga_12_unit.pdbqt   
710  2Z80_Toll-like_receptor_2,_Variable_lymphocyte...  pga_12_unit.pdbqt   
711  2Z80_Toll-like_receptor_2,_Variable_lymphocyte...  pga_12_unit.pdbqt   

     output_name  pose_number  affinity_kcal_mol  rms
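With several poses stored per protein-ligand pair, the summary CSV has one row per pose. A short pandas sketch that keeps only the best (most negative) affinity for each pair, using the ```protein_file```, ```ligand_file``` and ```affinity_kcal_mol``` columns shown above (```best_pose_per_pair``` is a hypothetical helper):

```python
import pandas as pd

def best_pose_per_pair(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the lowest (most negative) affinity row for each protein-ligand pair."""
    # idxmin returns the row label of the best pose within each group
    idx = df.groupby(["protein_file", "ligand_file"])["affinity_kcal_mol"].idxmin()
    return df.loc[idx].sort_values("affinity_kcal_mol").reset_index(drop=True)
```

Calling ```best_pose_per_pair(df_results).pivot(index="ligand_file", columns="protein_file", values="affinity_kcal_mol")``` then gives one row per oligomer and one column per protein. The ligand filenames sort as text, so ```pga_10_unit.pdbqt``` comes before ```pga_2_unit.pdbqt``` unless you sort by the number of units.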

## END