# PocketXMol Peptide Design Interface

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pengxingang/PocketXMol/blob/master/notebooks/PXM_PeptideDesign.ipynb)

<!-- **Paper**: [PocketXMol](https://www.biorxiv.org) -->

**GitHub**: [PocketXMol](https://github.com/pengxingang/PocketXMol)

---

### This notebook provides an interface to run PocketXMol's **peptide design** capability. You can:
- #### **Design *de novo*** peptides binding to a protein pocket.
- #### Perform **inverse folding** for peptides binding to a protein pocket.
- #### **Repack side chains** of peptides within a protein pocket.


> The notebook handles:
> 1. 🔧 Setting up the environment with required dependencies
> 2. ⚙️ Configuring for your specific task and protein receptor
> 3. 🔄 Running the sampling to generate peptides
> 4. 📊 Visualizing the results

Let's get started!


---
## **1. Setup**

In [None]:
#@title **Install Conda Colab**
#@markdown This installs Conda and restarts the kernel. The restart is expected, so don't worry.
!pip install -q condacolab
import condacolab
condacolab.install()

In [None]:
#@title **Setup PocketXMol**
#@markdown This automatically downloads the PocketXMol model and sets up the environment (takes several minutes).

import os
%cd /content
![ -d sample_data ] && rm -rf sample_data

# Clone the repository
if not os.path.exists('PocketXMol'):
    !echo Clone PocketXMol from github
    !git clone https://github.com/pengxingang/PocketXMol.git -q
else:
    print("✅ PocketXMol repository already exists")

# Download data
if not os.path.exists('/content/PocketXMol/data/trained_models/pxm/checkpoints/pocketxmol.ckpt'):
    %cd /content/PocketXMol
    # !gdown 1PF4V5kB-RLEFBD38HggVD9eR7RTeq573  # data_test.tar.gz
    !gdown 1Hu6qTkCyNUPPsQLLHL1kBFiwRbKUOFLs   # model_weights.tar.gz
    !tar -zxf model_weights.tar.gz && rm model_weights.tar.gz
else:
    print("✅ PocketXMol model weights already exist")
%cd /content

# Install dependencies
env_name = 'pxm_cu126'
install_path = f'install_{env_name}.sh'
if not os.path.exists(install_path):
  cmd_list = [
    f"mamba create -n {env_name} python=3.11 -y",  # -y: output goes to install.log, so a confirmation prompt would be invisible
    f"source activate {env_name}",
    "pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu126",
    "pip install pytorch-lightning",
    "pip install torch_geometric",
    "pip install torch_scatter torch_sparse torch_cluster -f https://data.pyg.org/whl/torch-2.6.0+cu126.html",
    "pip install biopython==1.83 rdkit==2023.9.3 peptidebuilder==1.1.0",
    "pip install lmdb easydict==1.9 numpy==1.24 pandas==1.5.2",
    "pip install tensorboard",
    "mamba install -c conda-forge openbabel -y"
  ]
  n_cmd = len(cmd_list)
  with open(install_path, 'w+') as f:
    for i_cmd, cmd in enumerate(cmd_list):
      f.write(f"echo \"Running ({i_cmd+1}/{n_cmd})... >{cmd}\"\n")
      f.write(f"{cmd} >> install.log\n")
else:
  print(f"✅ Environment file {install_path} already exists")
print("📦 Installing environment...")
!bash {install_path}
print("✅ PocketXMol environment has been set up")
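Because every install command appends its output to `install.log`, a failed step is easy to miss. A minimal sketch for scanning that log, assuming it sits in `/content` (the working directory when the install script runs) and that pip/mamba failures show up as lines containing `ERROR` or `error:`; the helper `find_install_errors` is hypothetical.

```python
import os

def find_install_errors(log_path="/content/install.log", keywords=("ERROR", "error:")):
    """Return log lines containing any of the keywords (hypothetical helper)."""
    if not os.path.exists(log_path):
        return []
    hits = []
    with open(log_path, "r", errors="replace") as f:
        for line in f:
            # Keep lines that look like pip/mamba failure messages
            if any(k in line for k in keywords):
                hits.append(line.rstrip("\n"))
    return hits

errors = find_install_errors()
if errors:
    print(f"⚠️ Found {len(errors)} suspicious lines in install.log:")
    for line in errors[:20]:
        print("   ", line)
else:
    print("✅ No error lines found in install.log")
```

Keyword matching can produce false positives (e.g., a package named after an error class), so treat hits as pointers into the log rather than definitive failures.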


## **2. Configure Peptide Design Task**

Configure your peptide design task below, including:
1. **Job parameters**: set the job name and the sampling parameters.
2. **Input data**: provide the protein receptor and the peptide information, and define the pocket.
3. **Advanced settings (optional)**: adjust sampling settings such as the initial noise scale, the noise space center, and the number of denoising steps.


In [None]:
# @title **Job parameters** {"run":"auto"}

#@markdown ### 1. Define the **job**
task_name = 'peptide_design_example'  #@param {type:"string"}
output_directory = '/content/outputs'  #@param {type:"string"}

#@markdown ### 2. Configure **sampling** parameters
#@markdown Total number of peptides to generate
num_samples = 10 #@param {type:"integer"}
#@markdown Number of peptides to generate in each batch (reduce if you encounter memory issues)
batch_size = 10 #@param {type:"integer"}
#@markdown Probability of saving the sampling trajectory of each generated peptide
save_traj_prob = 0.05 #@param {type:"slider", min:0.0, max:1.0, step:0.05}
#@markdown Random seed
seed = 2024 #@param {type:"integer"}
#@markdown Device used for sampling (GPU or CPU)
device = "cuda:0" #@param ["cuda:0", "cpu"]
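A quick sanity check of these values: assuming the sampler runs `ceil(num_samples / batch_size)` batches and that `save_traj_prob` is an independent per-peptide probability (an interpretation of the parameter name, not confirmed against the PocketXMol source), the defaults give a single batch and only about 0.5 expected saved trajectories, i.e. roughly a 60% chance that none is saved. The helper `sampling_plan` is hypothetical.

```python
import math

def sampling_plan(num_samples, batch_size, save_traj_prob):
    """Estimate batch count and trajectory saving, assuming each peptide's
    trajectory is saved independently with probability save_traj_prob."""
    n_batches = math.ceil(num_samples / batch_size)
    expected_trajs = num_samples * save_traj_prob
    p_none_saved = (1.0 - save_traj_prob) ** num_samples
    return n_batches, expected_trajs, p_none_saved

# Defaults from the form above: num_samples=10, batch_size=10, save_traj_prob=0.05
n_batches, expected_trajs, p_none = sampling_plan(10, 10, 0.05)
print(f"{n_batches} batch(es), ~{expected_trajs:.1f} trajectories expected, "
      f"P(no trajectory saved) = {p_none:.2f}")
```

If you need at least one trajectory for inspection, raise `save_traj_prob` (for example to `1.0` for a small run).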

In [None]:
# @title **Define input data** {"run":"auto"}

#@markdown ### 1. Provide the file path of **protein receptor**
protein_path = 'PocketXMol/data/examples/peptide/3bik_A.pdb'  #@param {type:"string"}
#@markdown > example: `PocketXMol/data/examples/peptide/3bik_A.pdb`

#@markdown ### 2. Choose the task and provide the **peptide**
task_mode = 'de novo design'  #@param ["de novo design", "inverse folding", "side-chain repacking"]
#@markdown Enter input peptide file path or peptide length (length only for *de novo* design)
peptide_path_or_length = "10"  #@param {type:"string"}
#@markdown > examples:
#@markdown > - `10` to design a *de novo* peptide of length $10$
#@markdown > - `PocketXMol/data/examples/hot136E/PDL1_pephot136E_pep.pdb` to start from an existing peptide

#@markdown ### 3. Specify the **pocket**
#@markdown Two steps to define the pocket:

#@markdown **Step 1**: Specify the radius (Å) around the reference that defines the pocket:
radius = 20 #@param {type:"slider", min:5.0, max:30.0, step:1.0}
#@markdown > example: `20` is a good starting value

#@markdown **Step 2**: Choose one of two ways to define the reference:
#@markdown 1. Provide *a reference molecule* located in the pocket.
#@markdown 2. Provide *a point coordinate* in the pocket as the reference.
define_pocket_by = 'reference_molecule' #@param ["reference_molecule", "pocket_coordinate"]

#@markdown - If you choose `reference_molecule`, provide a reference file, or use the input peptide file itself as the reference
#@markdown (in this case, the input peptide must be a PDB file so that it contains correct coordinate information).
use_ligand_as_ref = False #@param {type:"boolean"}
reference_path = 'PocketXMol/data/examples/peptide/3bik_A_pocket_coord.sdf' #@param {type:"string"}
#@markdown > example: `PocketXMol/data/examples/peptide/3bik_A_pocket_coord.sdf`

#@markdown - If you choose `pocket_coordinate`, provide the XYZ coordinate:
pocket_x = 0.0 #@param {type:"number"}
pocket_y = 0.0 #@param {type:"number"}
pocket_z = 0.0 #@param {type:"number"}
#@markdown > example: `[12.9130, -5.3910, -30.0240]`

#@markdown > 💡 Tips:
#@markdown > 1. Choose one value for `define_pocket_by` and set its corresponding parameters; the parameters of the other option are ignored.
#@markdown > 2. A radius in $[10, 20]$ is recommended for `define_pocket_by=reference_molecule`.
#@markdown > 3. A radius in $[15, 22]$ is recommended for `define_pocket_by=pocket_coordinate`. A single point needs a slightly larger radius to cover enough pocket residues.


# Default advanced settings (used if the next cell is not run)
init_noise_scale = 1
space_center_by = 'pocket_center'
num_steps = 100
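The two-step pocket definition can be pictured as a distance query: residues with at least one atom within `radius` Å of the reference (any atom of a reference molecule, or the single `pocket_coordinate` point) are kept. The sketch below is a generic NumPy illustration of that idea under this assumption, not PocketXMol's own pocket-extraction code; `select_pocket_residues` and its toy inputs are hypothetical.

```python
import numpy as np

def select_pocket_residues(atom_coords, atom_res_ids, ref_coords, radius):
    """Return sorted residue ids with at least one atom within `radius` Å
    of any reference point (hypothetical helper, for illustration only)."""
    atom_coords = np.asarray(atom_coords, dtype=float)                # (N, 3) protein atoms
    ref_coords = np.atleast_2d(np.asarray(ref_coords, dtype=float))   # (M, 3) reference points
    # Pairwise distances between every protein atom and every reference point
    dists = np.linalg.norm(atom_coords[:, None, :] - ref_coords[None, :, :], axis=-1)
    close = (dists <= radius).any(axis=1)   # atom lies near at least one reference point
    return sorted(set(np.asarray(atom_res_ids)[close].tolist()))

# Toy protein: four atoms belonging to three residues along the x-axis
atoms = [[0, 0, 0], [1, 0, 0], [10, 0, 0], [25, 0, 0]]
res_ids = [1, 1, 2, 3]
# A single point (pocket_coordinate style) vs. a two-atom reference molecule
print(select_pocket_residues(atoms, res_ids, [0.0, 0.0, 0.0], radius=12))
print(select_pocket_residues(atoms, res_ids, [[0, 0, 0], [20, 0, 0]], radius=12))
```

A multi-atom reference covers the union of several spheres, while a single point covers only one. This is why the tips above suggest a somewhat larger radius for `pocket_coordinate`.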


### **Advanced settings (optional)**

In [None]:
# @title  {"run":"auto"}



#@markdown ### Initial noise scale
#@markdown - This parameter sets the initial noise scale (in $[0, 1]$); the noise scale then gradually decays to zero during sampling.
#@markdown  - `init_noise_scale=0` means no noise at the initial step (and therefore none at later steps either).
#@markdown  - `init_noise_scale=1` means sampling starts from the noise prior; any input peptide information is ignored.
#@markdown  - `0<init_noise_scale<1` means noise is added to the input peptide at the initial step instead of sampling from the noise prior.
#@markdown - For *de novo* design, set `init_noise_scale=1`; to use information from the input peptide (e.g., to generate similar peptides), set `init_noise_scale<1`.
init_noise_scale = 1 #@param {type:"number"}

#@markdown ### Noise space center
#@markdown - This parameter defines the center of the noise space during sampling.
#@markdown  - `pocket_center` uses the pocket center as the space center.
#@markdown  - `input_ligand_center` uses the input ligand center as the space center. In this case, the input ligand must be an SDF/PDB file with correct coordinate information.
#@markdown  - `specified_coordinate` uses the provided coordinate as the space center.
#@markdown - Generally, this parameter has little impact, so `pocket_center` (default) or `input_ligand_center` is recommended.
space_center_by = 'pocket_center' #@param ["pocket_center", "input_ligand_center", "specified_coordinate"]
#@markdown  - Specify the coordinate only if you choose `specified_coordinate`; otherwise, skip it.
space_x = 0.0 #@param {type:"number"}
space_y = 0.0 #@param {type:"number"}
space_z = 0.0 #@param {type:"number"}


#@markdown ### Denoising steps
#@markdown - This parameter sets the number of denoising steps for sampling.
#@markdown - Default is 100.
num_steps = 100 #@param {type:"integer"}
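For intuition about `init_noise_scale`, the toy sketch below linearly blends the input coordinates with a standard Gaussian sample. It only mirrors the three regimes described above (0 keeps the input, 1 ignores it, values in between mix the two); it is an assumed illustration, not PocketXMol's actual noising scheme, and `noisy_start` is a hypothetical name.

```python
import numpy as np

def noisy_start(x_input, init_noise_scale, rng):
    """Toy initial state: 0 keeps the input, 1 is a pure prior sample (illustration only)."""
    if not 0.0 <= init_noise_scale <= 1.0:
        raise ValueError("init_noise_scale must be in [0, 1]")
    x_input = np.asarray(x_input, dtype=float)
    noise = rng.standard_normal(x_input.shape)   # sample from a standard Gaussian prior
    # Linear blend between the input and the prior sample
    return (1.0 - init_noise_scale) * x_input + init_noise_scale * noise

x = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])  # toy peptide coordinates (Å)
for s in (0.0, 0.5, 1.0):
    x0 = noisy_start(x, s, np.random.default_rng(2024))
    print(f"init_noise_scale={s}: mean |x0 - x| = {np.abs(x0 - x).mean():.3f}")
```

The deviation from the input grows with the scale, which is why values below 1 tend to keep generated peptides closer to the input peptide.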



## **3. Run sampling**

Now run peptide sampling with your configuration.

In [None]:
#@title **Prepare config file**
#@markdown The previous parameters will be saved in the config file.
import os

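print('Preparing config file...')

# Resolve input paths
protein_path = os.path.abspath(protein_path)
peptide_input = peptide_path_or_length.strip()
if os.path.isfile(peptide_input):
  input_ligand = os.path.abspath(peptide_input)
elif peptide_input.isdigit():
  if task_mode != 'de novo design':
    raise ValueError('A peptide length is only supported for de novo design; provide a peptide PDB file instead.')
  input_ligand = 'peplen_' + peptide_input
else:
  raise FileNotFoundError(f'Not an existing file or a valid length: {peptide_path_or_length}')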

# setup config/data/pocket_args
pocket_args = {"radius": radius}
if define_pocket_by == 'reference_molecule':
  if use_ligand_as_ref:
    assert input_ligand.endswith('.pdb'), 'If the input ligand is used as pocket reference, it must be PDB file that contains coordinate information.'
    pocket_args['ref_ligand_path'] = input_ligand
    print('Use input_ligand as pocket reference')
  else:
    reference_path = os.path.abspath(reference_path)
    pocket_args['ref_ligand_path'] = reference_path
    print(f'Use provided reference molecule in {reference_path} as pocket reference')
elif define_pocket_by == 'pocket_coordinate':
  pocket_coord = [pocket_x, pocket_y, pocket_z]
  pocket_args['pocket_coord'] = pocket_coord
  print(f'Use pocket coordinate {pocket_coord} as pocket reference')
else:
  raise ValueError(f'Invalid value for define_pocket_by: {define_pocket_by}.')

# task_mode
model_dict = {'full': 0, 'sc': 0, 'packing': 0}
if task_mode == 'de novo design':
  model_dict['full'] = 1
elif task_mode == 'inverse folding':
  model_dict['sc'] = 1
elif task_mode == 'side-chain repacking':
  model_dict['packing'] = 1
else:
  raise ValueError(f'Invalid value {task_mode} for task_mode')

# setup config/transforms
transforms = {
    "variable_sc_size":{  # distributions of number of side-chain atoms
      "name": "variable_sc_size",
      "applicable_tasks": ['pepdesign'],
      "num_atoms_distri": {
        "mean": 8,
        "std": {
          "coef": 0.3817,
          "bias": 1.8727
        }
      }
    }
}
if space_center_by == 'pocket_center':
  pass
elif space_center_by == 'input_ligand_center':
  transforms['featurizer'] = {'mol_as_pocket_center': True}
elif space_center_by == 'specified_coordinate':
  transforms['featurizer_pocket'] = {'center': [space_x, space_y, space_z]}
else:
  raise ValueError(f'Invalid space_center_by: {space_center_by}.')

data_id = '_'.join(task_name.split())
# Create configuration file
config = {
    "sample": {
        "seed": seed,
        "batch_size": batch_size,
        "num_mols": num_samples,
        "save_traj_prob": save_traj_prob,
    },
    "data": {
        "protein_path": protein_path,
        "input_ligand": input_ligand,
        "is_pep": True,
        "pocket_args": pocket_args,
        "pocmol_args": {
            "data_id": data_id,
            "pdbid": ""
        }
    },
    "transforms": transforms,
    "task": {
        "name": "pepdesign",
        "transform": {
            "name": "pepdesign",
            "settings": {
                "mode": model_dict,
            }
        }
    },
    "noise": {
        "name": "pepdesign",
        "num_steps": num_steps,
        "prior": {
            "bb": "from_train",
            "sc": "from_train",
        },
        "level": {
            "bb": {
                "name": "advance",
                "min": 0.0,
                "max": 1.0,
                "step2level": {
                    "scale_start": 0.99999,
                    "scale_end": 0.00001,
                    "width": 3
                }
            },
            "sc": {
                "name": "advance",
                "min": 0.0,
                "max": 1.0,
                "step2level": {
                    "scale_start": 0.99999,
                    "scale_end": 0.00001,
                    "width": 3
                }
            }
        }
    }
}

# Save configuration to file
import yaml
config_dir = f"/content/PocketXMol/configs/user_defined"
!mkdir -p {config_dir}
config_path = f"{config_dir}/{data_id}.yml"
with open(config_path, 'w') as f:
    yaml.dump(config, f, default_flow_style=False, indent=2)
print(f"✅ Configuration file created at {config_path}\n")


print("Configuration Summary for Peptide Design:")
print(f"- Protein File: {protein_path}")
print(f"- Ligand: {input_ligand}")
print(f"- Task Mode: {task_mode}")
print(f"- Pocket Radius: {radius} Å")
print(f"- Generate {num_samples} peptides with batch size of {batch_size}")
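Before launching the sampler, it can help to reload the YAML file and confirm that the values you set actually landed in it (for example, that `pocket_coord` holds the x, y, z values you entered). A minimal sketch using PyYAML's `yaml.safe_load`; the helper `summarize_config` is hypothetical, and the keys it reads are the ones written by the cell above.

```python
import yaml

def summarize_config(config_path):
    """Reload a saved task config and return the fields most worth double-checking."""
    with open(config_path) as f:
        cfg = yaml.safe_load(f)
    return {
        "input_ligand": cfg["data"]["input_ligand"],
        "pocket_args": cfg["data"]["pocket_args"],
        "mode": cfg["task"]["transform"]["settings"]["mode"],
        "num_mols": cfg["sample"]["num_mols"],
        "num_steps": cfg["noise"]["num_steps"],
    }

# In this notebook: print(summarize_config(config_path))
```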

In [None]:
#@title **Generate peptides**
#@markdown Running the PocketXMol sampling script


output_dir = output_directory

# Create output directory
!mkdir -p {output_dir}

print(f"🚀 Starting sampling...")
print(f"This may take some time depending on your configuration.")

# Run the sampling script inside the PocketXMol conda environment
!cd /content/PocketXMol && source activate {env_name} && \
python scripts/sample_use.py \
    --config_task {config_path} \
    --outdir {output_dir} \
    --device {device}


# get exp_dir
from datetime import datetime
import re
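exp_name_list = [f for f in os.listdir(output_dir) if f.startswith(data_id)]
if not exp_name_list:
    raise FileNotFoundError(f'No output found for {data_id} in {output_dir}; check the sampling log above.')
def extract_timestamp(filename):
    # Output directories end with a YYYYMMDD_HHMMSS timestamp
    match = re.search(r'(\d{8}_\d{6})$', filename)
    if match:
        return datetime.strptime(match.group(1), "%Y%m%d_%H%M%S")
    return datetime.min
exp_name = max(exp_name_list, key=extract_timestamp)  # most recent run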
gen_path = f'{output_dir}/{exp_name}'

print(f"✅ Sampling completed! Results saved to {gen_path}")



📋 The generation directory now contains:
- `{exp_name}_SDF`: the SDF/PDB files of the generated peptides
- `SDF`: the sampling trajectory files (non-empty only if `save_traj_prob>0`)
- `gen_info.csv`: metadata for the generated peptides, including file names, the self-confidence score (`cfd_traj`), and more.
- `log.txt`: generation logs
<!-- - `scripts`,`utils`,`models`: copy of core codes when running. -->
- `{job_name}.yml`: the complete config file.
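For *de novo* design and inverse folding, the sequence of each generated peptide is usually the first thing to check. A dependency-free sketch, assuming the generated peptide PDB files use standard three-letter residue names in `ATOM` records with one Cα (`CA`) atom per residue; `pdb_to_sequence` is a hypothetical helper, and unrecognized residue names are mapped to `X`.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def pdb_to_sequence(pdb_text):
    """Return the one-letter sequence, one residue per CA atom, in file order."""
    seq, seen = [], set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):      # skip HETATM (e.g. Ca2+ ions named CA)
            continue
        if line[12:16].strip() != "CA":      # atom name, PDB columns 13-16
            continue
        res_key = (line[21], line[22:27])    # chain id + residue number + insertion code
        if res_key in seen:                  # skip alternate locations of the same residue
            continue
        seen.add(res_key)
        seq.append(THREE_TO_ONE.get(line[17:20].strip(), "X"))  # residue name, columns 18-20
    return "".join(seq)

# Example usage in this notebook (assumes gen_path and exp_name from the sampling cell):
# import os
# pdb_dir = os.path.join(gen_path, f"{exp_name}_SDF")
# for fn in sorted(os.listdir(pdb_dir)):
#     if fn.endswith(".pdb"):
#         with open(os.path.join(pdb_dir, fn)) as f:
#             print(fn, pdb_to_sequence(f.read()))
```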

## **4. Analyze Results**



In [None]:
#@title **Show top-ranked results**
import pandas as pd
import numpy as np
import shutil
from scipy.special import expit  # sigmoid


# Locate the generation metadata and output files
gen_info = os.path.join(gen_path, "gen_info.csv")
gen_lig_dir = os.path.join(gen_path, f'{os.path.basename(gen_path)}_SDF')
pocket_path = os.path.join(gen_lig_dir, '0_inputs/pocket_block.pdb')
df_gen = pd.read_csv(gen_info)

# Sort by the self-confidence score cfd_traj (higher is better)
df_gen = df_gen.sort_values(by="cfd_traj", ascending=False)
# Map the score to (0, 1) with a sigmoid for easier reading
df_gen['cfd_traj_prob'] = df_gen['cfd_traj'].apply(expit)


# Show top values
n_top = min(10, len(df_gen))  # guard against runs with fewer than 10 samples
lines_top = df_gen.head(n_top).copy()
lines_top.insert(0, 'rank', np.arange(n_top))
# make top subdirectory
top_dir = os.path.join(gen_path, 'top_ranks')
os.makedirs(top_dir, exist_ok=True)
for i_sort, (_, line) in enumerate(lines_top.iterrows()):
  filename = line['filename']
  src_path = os.path.join(gen_lig_dir, filename)
  tgt_path = os.path.join(top_dir, f'rank{i_sort}_{filename}')
  shutil.copy(src_path, tgt_path)
lines_top.to_csv(os.path.join(top_dir, f'gen_info_top{n_top}.csv'), index=False)
shutil.copy(pocket_path, os.path.join(top_dir, 'pocket_block.pdb'))

print(f"Top {n_top} peptides:")
lines_top

In [None]:
# @title **Show 3D structure** {"run":"auto"}
receptor_style = "surface" #@param ["line", "surface", "cartoon", "stick"]
show_receptor = "pocket" #@param ["protein", "pocket", "none"]
ligand_style = "cartoon & stick" #@param ["stick", "cartoon", "cartoon & stick", "surface"]
show_ligand_rank = 0 # @param {type:"slider", min:0, max:10, step:1}


# Install required packages
try:
  import py3Dmol
except ModuleNotFoundError:
  os.system('pip install py3dmol')
  import py3Dmol
try:
  from rdkit import Chem
  from rdkit.Chem import AllChem
except ModuleNotFoundError:
  os.system('pip install rdkit')
  from rdkit import Chem
  from rdkit.Chem import AllChem

import ipywidgets as widgets
from IPython.display import display


# Prepare file
lig_filename_list = df_gen['filename'].values.tolist()


# Visualization function
def show_complex(receptor_paths, ligand_paths):
    viewer = py3Dmol.view(width=800, height=600)

    # Load receptors
    for receptor_path in receptor_paths:
      with open(receptor_path, 'r') as f:
          viewer.addModel(f.read(), 'pdb')
      # Apply style options
      style = {}
      if receptor_style == 'cartoon':
          style = {"cartoon": {"color": "spectrum"}}
          viewer.setStyle(style)
      elif receptor_style == 'stick':
          style = {"stick": {"colorscheme": "greenCarbon"}}
          viewer.setStyle(style)
      elif receptor_style == 'surface':
          viewer.addSurface(py3Dmol.VDW, {'opacity': 0.7, 'color': 'white'}, {'model': 0})

    # Load ligands
    for ligand_path in ligand_paths:
      if ligand_path.endswith('.sdf'):
        if ligand_style == 'cartoon': continue
        suppl = Chem.SDMolSupplier(ligand_path)
        for mol in suppl:
          if mol is None:
            continue
          mol_block = Chem.MolToMolBlock(mol)
          viewer.addModel(mol_block, 'mol')
          if ligand_style == 'surface':
            viewer.addSurface(py3Dmol.VDW, {'opacity': 0.9, 'color': "spectrum"}, {'model': -1})
          else:
            viewer.setStyle({'model': -1}, {"stick": {}})
      else:  # assume PDB
        if ligand_style == 'stick': continue
        with open(ligand_path, 'r') as f:
          viewer.addModel(f.read(), 'pdb')
          viewer.setStyle({'model': -1}, {"cartoon": {"color": "spectrum"}})


    # set viewer
    viewer.zoomTo({'model': -1})
    return viewer


if show_ligand_rank < len(lig_filename_list):
  show_lig_names = [lig_filename_list[show_ligand_rank]]
  show_lig_paths = [os.path.join(gen_lig_dir, name) for name in show_lig_names]
  if show_lig_paths[0].endswith('.pdb'):
    show_lig_paths.append(show_lig_paths[0].replace('.pdb', '_mol.sdf'))
  print('Showing ligand pose path:', show_lig_paths[0])
else:
  raise ValueError(f'`show_ligand_rank`={show_ligand_rank} must be less than the number of generated samples ({len(lig_filename_list)}).')
show_rec_paths = []
if show_receptor == 'protein':
  show_rec_paths.append(protein_path)
  print('Showing receptor (protein):', protein_path)
elif show_receptor ==  'pocket':
  show_rec_paths.append(pocket_path)
  print('Showing receptor (pocket):', pocket_path)

show_complex(show_rec_paths, show_lig_paths).show()

In [None]:
#@title **Download Results**
#@markdown Download the peptide design results.

from google.colab import files
import os

zip_filename = f"PXM_{exp_name}.zip"

# Zip the results directory
!cd {output_dir} && zip -r {zip_filename} {exp_name} -q
# Download the zip file
files.download(os.path.join(output_dir, zip_filename))
print(f"Downloaded {zip_filename} containing the peptide design results")

