Todo:
- Add Chotia_to_hlt app
- Visualize input files (pre-computed BioLib job URL?)
- Update biolib app URLs
- Update URLs to official BioLib Repository
- Add "Master" BioLib app executing all functionality in one app

## De novo generation of antibody binders with RFantibody

In this notebook, you can choose your own antibody framework and target protein structure and design novel antibody binders against the target. This workflow has been shown to generate modest-affinity binders (in the μM to nM range), with experimental success rates of up to ~5-10% for some degree of binding.

---

Antibody therapeutics are a major drug class, with a global market worth hundreds of billions of USD and tremendous potential for treating a wide range of diseases. Traditional approaches to antibody discovery are slow and laborious, typically involving immunizing mice or screening random libraries.

This notebook implements the RFantibody pipeline for structure-based design of de novo antibodies against a chosen target.

It takes two inputs:
- i) An input antibody framework (e.g. `hu-4D5-8_Fv.pdb`: the heavy and light chain variable domains (Fv) of the humanized antibody hu4D5-8, i.e. trastuzumab, which is used in FDA-approved therapies)
- ii) A target protein of interest (e.g. antigenic site III of the respiratory syncytial virus (RSV) F protein), together with its binding site (epitope residues)

And runs the following three methods:
1. **De novo design of an antibody backbone targeting a protein of interest** - using an antibody-finetuned version of RFdiffusion ([Nature paper](https://www.nature.com/articles/s41586-023-06415-8))
2. **Design of the CDR loop residues** - using ProteinMPNN ([Science paper](https://www.science.org/doi/10.1126/science.add2187))
3. **Filtering designs on predicted structure 'self-consistency'** - using an antibody-finetuned version of RoseTTAFold2 ([Preprint](https://www.biorxiv.org/content/10.1101/2023.05.24.542179v1)), shown to correlate with significantly improved experimental success rates.

The RFantibody pipeline itself is described in detail in [this preprint](https://www.biorxiv.org/content/10.1101/2024.03.14.585103v2).

Advantages:
- Designs novel antibodies binding a target protein
- Can target most epitopes of interest (structured regions are preferred)
- Focuses on designing antibody CDR loops (main residues determining binding)
- Designs may be filtered by "self-consistency" of predicted structures, which correlates with experimental success rates

Current limitations:
- Generated antibodies tend to be modest binders at best (binding affinities in the μM to nM range)
- Often low experimental success rates (~5-10% for some degree of binding) - heavily dependent on filtering
- No screening for e.g. human immunogenicity

## Pre-requisites for this workshop

The only pre-requisites for this workshop are the following:
- i) A BioLib account (biolib.com) for running the RFantibody jobs in the cloud (they require a GPU).
- ii) An antibody framework PDB and a target protein PDB. You may use your own, or the examples provided in step ii) below.

#### i) Install pybiolib and login to biolib.com

In [1]:
# Download BioLib
!pip install --upgrade pybiolib

# Log in to BioLib
import biolib
biolib.login()

2025-03-17 18:15:58,776 | INFO : Already signed in


#### ii) Set input antibody and target PDB
First, let's set up our environment and inputs. We need two input files and a list of hotspot residues:

1. **framework_pdb**: `hu-4D5-8_Fv.pdb` - our humanized antibody Fv framework (heavy and light chain variable domains)
2. **target_pdb**: `rsv_site3.pdb` - the target protein of interest (RSV site III)
3. **hotspot_res**: list of hotspot residues defining our epitope (target binding site)

You may also use your own antibody framework and target PDBs, prepared following the RFantibody input-preparation instructions:
https://github.com/RosettaCommons/RFantibody?tab=readme-ov-file#input-preparation

Let's verify that these files exist.

In [19]:
import os
framework_pdb = "hu-4D5-8_Fv.pdb"
target_pdb = "rsv_site3.pdb"
hotspot_res = "[T305,T456]"
print(f"Framework PDB exists: {os.path.exists(framework_pdb)}")
print(f"Target PDB exists: {os.path.exists(target_pdb)}")
print(f"(If these are missing, please download from https://github.com/mhoie/bioit-rfantibody before proceeding)")

Framework PDB exists: False
Target PDB exists: False
(If these are missing, please download from https://github.com/mhoie/bioit-rfantibody before proceeding)
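A typo in `hotspot_res` is easy to miss, so it can help to check the hotspots against the target structure before submitting a job. The sketch below uses hypothetical helpers (not part of RFantibody or pybiolib). It assumes each hotspot is written as a chain ID followed by a residue number, as in `[T305,T456]` above, and reads chain IDs and residue numbers from the fixed-column `ATOM` records of a PDB file, ignoring insertion codes and `HETATM` records.

```python
import re


def parse_hotspots(hotspot_res: str) -> list[tuple[str, int]]:
    """Turn a string like "[T305,T456]" into [("T", 305), ("T", 456)]."""
    parsed = []
    for item in hotspot_res.strip().strip("[]").split(","):
        match = re.fullmatch(r"([A-Za-z])(-?\d+)", item.strip())
        if match is None:
            raise ValueError(f"Unrecognized hotspot residue: {item!r}")
        parsed.append((match.group(1), int(match.group(2))))
    return parsed


def residues_in_pdb(pdb_lines) -> set[tuple[str, int]]:
    """Collect (chain ID, residue number) pairs from fixed-column ATOM records."""
    residues = set()
    for line in pdb_lines:
        if line.startswith("ATOM"):
            # Column 22 holds the chain ID, columns 23-26 the residue number.
            residues.add((line[21], int(line[22:26])))
    return residues


def missing_hotspots(hotspot_res: str, pdb_lines) -> list[str]:
    """Return the hotspot residues that do not appear in the PDB's ATOM records."""
    present = residues_in_pdb(pdb_lines)
    return [f"{chain}{num}" for chain, num in parse_hotspots(hotspot_res)
            if (chain, num) not in present]


# Tiny in-memory example: a single CA atom of THR 305 on chain T.
example_pdb = ["ATOM      1  CA  THR T 305      11.104  13.207   2.100  1.00 20.00           C"]
print(missing_hotspots("[T305,T456]", example_pdb))  # ['T456']
```

Once the files are downloaded, `with open(target_pdb) as fh: print(missing_hotspots(hotspot_res, fh))` should print an empty list if every hotspot is present.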


## Step 1 of 3: Generate antibody-antigen docking poses with [RFdiffusion (antibody-finetuned)](https://biolib.com/BioLibDevelopment/prediction-app/)

This step takes the input antibody framework and target protein and designs the 3D structure of new CDR loops in contact with the target protein (the antibody-antigen docking pose). The CDR loops are generated as backbones only, without amino acid identities; their sequences are designed in the next step.

#### [Input parameters](https://biolib.com/BioLibDevelopment/prediction-app/)

Already set above:
- **antibody.framework_pdb**: Path to the antibody framework we're using (e.g. hu-4D5-8_Fv.pdb)
- **antibody.target_pdb**: Path to the target protein structure (e.g. rsv_site3.pdb)
- **ppi.hotspot_res**: List of hotspot residues defining our epitope (target binding site)

New parameters:
- **antibody.design_loops**: Dictionary mapping each CDR loop to a range of allowed loop lengths
  - L1, L2, L3: Light chain CDR loops
  - H1, H2, H3: Heavy chain CDR loops
  - `L1:8-13` means loop L1 can be 8 to 13 residues long; a single number such as `L2:7` fixes the loop length
  - Example: `[L1:8-13,L2:7,L3:9-11,H1:7,H2:6,H3:5-13]`
- **inference.num_designs**: Number of designs to generate (here: 20)
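To make the `design_loops` syntax concrete, here is a minimal, hypothetical parser. The BioLib app receives the raw string; this helper is only for inspecting a specification locally. It assumes `L1:8-13` is an inclusive range of 8 to 13 residues and that a single number such as `L2:7` is a fixed length.

```python
def parse_design_loops(spec: str) -> dict[str, range]:
    """Parse "[L1:8-13,L2:7,...]" into {loop name: range of allowed lengths}."""
    loops = {}
    for entry in spec.strip().strip("[]").split(","):
        name, lengths = entry.strip().split(":")
        if "-" in lengths:
            low, high = (int(x) for x in lengths.split("-"))
        else:
            low = high = int(lengths)  # single number: fixed loop length
        if low > high:
            raise ValueError(f"Empty length range for {name}: {lengths}")
        loops[name] = range(low, high + 1)  # inclusive upper bound
    return loops


loops = parse_design_loops("[L1:8-13,L2:7,L3:9-11,H1:7,H2:6,H3:5-13]")
for name, lengths in loops.items():
    print(f"{name}: {lengths.start}-{lengths.stop - 1} residues")
```

Representing each loop as a `range` makes membership checks such as `12 in loops["H3"]` straightforward when comparing against the loop lengths of generated designs.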


In [5]:
# Input parameters (antibody framework and target PDBs are set above)
design_loops = "[L1:8-13,L2:7,L3:9-11,H1:7,H2:6,H3:5-13]"
num_designs = 20

# Output directory
outdir_rfdiff = "output/rfdiffusion"

# Run RFdiffusion through BioLib
app_rfdiff = biolib.load('BioLibDevelopment/prediction-app')  # Replace with actual RFantibody app ID
job_rfdiff = app_rfdiff.cli(args=[
    "--antibody.framework_pdb", f"{framework_pdb}",
    "--antibody.target_pdb", f"{target_pdb}",
    "--ppi.hotspot_res", f"{hotspot_res}",
    "--antibody.design_loops", f"{design_loops}",
    "--inference.num_designs", f"{num_designs}",
], blocking=False)

2025-03-17 18:17:34,315 | INFO : Loaded project BioLibDevelopment/prediction-app:0.0.39


#### RFdiffusion output files
The RFdiffusion step generates PDB files containing the antibody framework with designed CDR loops docked to the target protein. At this stage, the CDR loops have backbone structures but no amino acid sequences yet!

Output files:
- example.pdb - Antibody PDB structure lacking the CDR loop residues (which will be designed in the next step)

You can view the job output at the URL printed below and save the output files once the job has completed.

In [7]:
# Try to save job output files
print(f"Job URL: https://biolib.com/results/{job_rfdiff.id}")
if job_rfdiff.get_status() == "completed":
    job_rfdiff.save_files(outdir_rfdiff)
    print(f"Generated {len([f for f in os.listdir(outdir_rfdiff) if f.endswith('.pdb')])} PDB files")
else:
    print(f"Job status is not completed ({job_rfdiff.get_status()}), please wait a moment (or try again if failed)")

Job URL: https://biolib.com/results/58d16a9b-8562-408c-9545-dee80c8bdaff
2025-03-17 18:17:53,519 | INFO : Saving 1 files to output/rfdiffusion...
Generated 0 PDB files
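Because the jobs are submitted with `blocking=False`, the cells above only save files if the job has already finished. A small polling loop avoids re-running the cell by hand. This is a sketch with a hypothetical `wait_for_job` helper: it only needs a zero-argument status function such as `job_rfdiff.get_status`, and the terminal status strings other than `"completed"` are an assumption, not a documented BioLib list.

```python
import time
from typing import Callable

# Assumed terminal statuses; only "completed" appears in this notebook.
TERMINAL_STATUSES = {"completed", "failed", "aborted"}


def wait_for_job(get_status: Callable[[], str], poll_seconds: float = 30.0,
                 timeout_seconds: float = 3600.0) -> str:
    """Poll get_status() until it returns a terminal status, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still {status!r} after {timeout_seconds} s")
        time.sleep(poll_seconds)


# Demo with a fake status sequence instead of a real BioLib job.
fake_statuses = iter(["in_progress", "in_progress", "completed"])
final_status = wait_for_job(lambda: next(fake_statuses), poll_seconds=0.01)
print(final_status)  # completed
```

In the notebook, `status = wait_for_job(job_rfdiff.get_status)` could replace the single status check before calling `job_rfdiff.save_files(outdir_rfdiff)`; the same helper works for `job_mpnn` and `job_rf2`.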


## Step 2 of 3: Design the CDR loop residues with [ProteinMPNN](https://biolib.com/BioLibDevelopment/prediction-app/)

The second step takes the docking poses generated by RFdiffusion and assigns amino acid sequences to the CDR loops using ProteinMPNN. We use the base version of ProteinMPNN (not an antibody-finetuned model) with a wrapper script that redesigns only the CDR loops.
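The effect of restricting design to the CDR loops can be pictured as a per-position mask: CDR positions receive newly designed residues, while all other positions keep their original identities. The sketch below only illustrates that end result on made-up sequences with a hypothetical `merge_cdr_design` function. It is not the RFantibody wrapper's code, and ProteinMPNN itself keeps fixed positions as context while sampling the designable ones rather than merging sequences afterwards.

```python
def merge_cdr_design(framework_seq: str, proposed_seq: str, cdr_positions: set[int]) -> str:
    """Use proposed_seq at CDR positions (0-based) and framework_seq everywhere else."""
    if len(framework_seq) != len(proposed_seq):
        raise ValueError("Sequences must have the same length")
    return "".join(
        proposed if i in cdr_positions else original
        for i, (original, proposed) in enumerate(zip(framework_seq, proposed_seq))
    )


# Made-up example: positions 4-6 stand in for a CDR loop.
framework = "EVQLVESGGGLVQ"
proposal = "A" * len(framework)
print(merge_cdr_design(framework, proposal, cdr_positions={4, 5, 6}))  # EVQLAAAGGGLVQ
```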

#### [Input parameters](https://biolib.com/BioLibDevelopment/prediction-app/)
- **pdbdir**: Directory containing the input PDB files from RFdiffusion
- **num_seq_per_target**: Number of sequences designed per input structure (here: 1)


In [9]:
# Input directory
input_dir = outdir_rfdiff  # Using the output from RFdiffusion

# Output directory
outdir_mpnn = "output/proteinmpnn"

# Run ProteinMPNN
app_mpnn = biolib.load('BioLibDevelopment/prediction-app')  # Replace with actual app ID
job_mpnn = app_mpnn.cli(args=[
    "-pdbdir", f"{input_dir}",
    "-num_seq_per_target", "1"
], blocking=False)

2025-03-17 18:18:27,267 | INFO : Loaded project BioLibDevelopment/prediction-app:0.0.39


#### ProteinMPNN outputs
ProteinMPNN outputs PDB files with the same structures as the input but with amino acid sequences designed for the CDR loops. By default, it provides one sequence per input structure.

Output files:
- example.pdb (antibody structure with predicted CDR residues)


In [12]:
# Try to save job output files
print(f"Job URL: https://biolib.com/results/{job_mpnn.id}")
if job_mpnn.get_status() == "completed":
    job_mpnn.save_files(outdir_mpnn)
    print(f"Generated {len([f for f in os.listdir(outdir_mpnn) if f.endswith('.pdb')])} PDB files")
else:
    print(f"Job status is not completed ({job_mpnn.get_status()}), please wait a moment (or try again if failed)")

Job URL: https://biolib.com/results/717cfc8f-e18f-4a58-bb70-e4f68a2103d0
2025-03-17 18:18:38,722 | INFO : Saving 1 files to output/proteinmpnn...
Generated 0 PDB files
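To look at the designed CDR sequences, the saved PDB files can be converted to one-letter sequences per chain. This is a minimal standard-library sketch with a hypothetical `chain_sequences` helper: it reads one CA atom per residue from fixed-column `ATOM` records, keeps only the first alternate location, maps non-standard residues to `X`, and reports chain IDs exactly as they appear in the file.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequences(pdb_lines) -> dict[str, str]:
    """Return {chain ID: one-letter sequence}, reading one CA atom per residue."""
    sequences: dict[str, list[str]] = {}
    for line in pdb_lines:
        is_ca = line.startswith("ATOM") and line[12:16].strip() == "CA"
        if is_ca and line[16] in (" ", "A"):  # skip alternate locations other than A
            sequences.setdefault(line[21], []).append(THREE_TO_ONE.get(line[17:20], "X"))
    return {chain: "".join(residues) for chain, residues in sequences.items()}


def ca_line(serial: int, resname: str, chain: str, resnum: int) -> str:
    # Minimal fixed-column ATOM record (coordinates omitted), just for the demo.
    return f"ATOM  {serial:>5}  CA  {resname} {chain}{resnum:>4}"


demo = [ca_line(1, "GLU", "H", 1), ca_line(2, "VAL", "H", 2), ca_line(3, "THR", "T", 305)]
print(chain_sequences(demo))  # {'H': 'EV', 'T': 'T'}
```

For a saved design, `with open(path) as fh: print(chain_sequences(fh))` prints each chain's sequence, which makes it easy to compare CDR sequences across designs.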


## Step 3 of 3: Filter designs for predicted-structure self-consistency with [RoseTTAFold2 (antibody-finetuned)](https://biolib.com/BioLibDevelopment/prediction-app/)
The final step uses an antibody-finetuned version of RoseTTAFold2 (RF2) to predict the structure of the designed sequences and assess whether RF2 is confident that the sequence will bind as designed.

The RFantibody protocol recommends filtering on the following metrics, which have been shown to improve experimental success rates by up to 10-fold:
- RF2 predicted alignment error (pAE) < 10
- RMSD between design and RF2 prediction < 2Å
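Applying the two thresholds is a simple filter once the per-design metrics are available. The sketch assumes the pAE and RMSD values have already been extracted from the RF2 outputs into a hypothetical `DesignScore` record (how they are stored in the output files is not covered here), and the numbers in the example are placeholders, not real results. Passing designs are sorted by pAE so the most confident predictions come first.

```python
from dataclasses import dataclass


@dataclass
class DesignScore:
    name: str
    pae: float   # RF2 predicted alignment error
    rmsd: float  # RMSD in Å between the design and the RF2 prediction


def select_designs(scores, max_pae: float = 10.0, max_rmsd: float = 2.0):
    """Keep designs with pAE < max_pae and RMSD < max_rmsd, sorted by pAE (lowest first)."""
    kept = [s for s in scores if s.pae < max_pae and s.rmsd < max_rmsd]
    return sorted(kept, key=lambda s: s.pae)


# Placeholder values for illustration only, not real RF2 results.
scores = [
    DesignScore("design_0", pae=6.2, rmsd=1.1),
    DesignScore("design_1", pae=12.5, rmsd=0.9),  # fails the pAE cutoff
    DesignScore("design_2", pae=8.0, rmsd=2.4),   # fails the RMSD cutoff
    DesignScore("design_3", pae=4.9, rmsd=1.7),
]
print([s.name for s in select_designs(scores)])  # ['design_3', 'design_0']
```

Keeping the cutoffs as parameters makes it easy to tighten them when many designs pass.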

#### [Input parameters](https://biolib.com/BioLibDevelopment/prediction-app/)

- **input.pdb_dir**: Directory containing the PDB files from ProteinMPNN
- **num_recycles**: Number of recycling iterations in the model (default: 10)
- **hotspot**: Fraction of hotspot residues provided to the model (default: 0.1, i.e. 10%)

In [15]:
# Input directory
input_dir = outdir_mpnn  # Using the output from ProteinMPNN

# Output directory
outdir_rf2 = "output/rosettfold2"

# Run RoseTTAFold2 through BioLib
app_rf2 = biolib.load('BioLibDevelopment/prediction-app')  # Replace with actual app ID
job_rf2 = app_rf2.cli(args=[
    "input.pdb_dir", f"{input_dir}",
    "num_recycles", "10",
    "hotspot", "0.1"
], blocking=False)

2025-03-17 18:19:29,310 | INFO : Loaded project BioLibDevelopment/prediction-app:0.0.39


#### RoseTTAFold2 outputs
RoseTTAFold2 predicts the structure of each designed antibody and provides confidence metrics, which we can use to filter for promising designs.

Output files:
- example.pdb - Predicted structure

In [18]:
# Try to save job output files
print(f"Job URL: https://biolib.com/results/{job_rf2.id}")
if job_rf2.get_status() == "completed":
    job_rf2.save_files(outdir_rf2)
    print(f"Generated {len([f for f in os.listdir(outdir_rf2) if f.endswith('.pdb')])} PDB files")
else:
    print(f"Job status is not completed ({job_rf2.get_status()}), please wait a moment (or try again if failed)")

Job URL: https://biolib.com/results/24e7ad8e-7a24-41e9-8a81-5308517bdf56
2025-03-17 18:19:46,309 | INFO : Saving 1 files to output/rosettfold2...
Generated 0 PDB files
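The `Generated ... PDB files` counts above use `os.listdir`, which only looks at the top level of each output directory. If the saved files end up in subdirectories (check the layout of your saved output), a recursive search is more robust. A minimal sketch with a hypothetical `list_pdb_files` helper based on `pathlib.Path.rglob`:

```python
import tempfile
from pathlib import Path


def list_pdb_files(output_dir: str) -> list[Path]:
    """Return every .pdb file under output_dir (including subdirectories), sorted."""
    root = Path(output_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*.pdb") if p.is_file())


# Demo on a throwaway directory with one nested PDB, one top-level PDB and a log file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "designs").mkdir()
    (Path(tmp) / "designs" / "design_0.pdb").write_text("END\n")
    (Path(tmp) / "top.pdb").write_text("END\n")
    (Path(tmp) / "job.log").write_text("log\n")
    found = [p.relative_to(tmp).as_posix() for p in list_pdb_files(tmp)]
print(found)  # ['designs/design_0.pdb', 'top.pdb']
```

`len(list_pdb_files(outdir_rf2))` can then replace the top-level count, and the same call works for `outdir_rfdiff` and `outdir_mpnn`.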


## Conclusion

This notebook has demonstrated the complete RFantibody pipeline for structure-based design of de novo antibodies. The workflow consists of three main steps:

1. **RFdiffusion**: Generating antibody-target docking poses with designed CDR loop structures
2. **ProteinMPNN**: Designing amino acid sequences for the CDR loops
3. **RF2**: Filtering designs on predicted-structure self-consistency (pAE and RMSD to the design)

This computational pipeline can generate designs with an experimental success rate of approximately 5-10% for some degree of binding to the target. Further experimental validation and optimization are likely to be required to improve binding affinity and other pharmaceutical properties.

For larger-scale antibody design campaigns, we recommend generating thousands of designs to increase the chances of finding high-quality binders, as the current filtering metrics are still of limited predictive power.
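To see why large numbers of designs help, consider a back-of-envelope calculation. If each experimentally tested design binds with probability p (here p = 5%, the lower end of the range above) and designs are assumed to be independent, the expected number of binders among n tested designs is n·p and the chance of at least one binder is 1 - (1 - p)^n. Designs against the same epitope are in reality correlated, so treat these numbers as rough estimates.

```python
def expected_binders(n_tested: int, success_rate: float) -> float:
    """Expected number of binders among n_tested designs."""
    return n_tested * success_rate


def prob_at_least_one_binder(n_tested: int, success_rate: float) -> float:
    """Chance of at least one binder, assuming independent designs."""
    return 1.0 - (1.0 - success_rate) ** n_tested


for n in (20, 96, 1000):
    print(f"n={n}: expected binders {expected_binders(n, 0.05):.1f}, "
          f"P(at least one) {prob_at_least_one_binder(n, 0.05):.3f}")
```

With 20 tested designs at p = 5%, this gives about one expected binder and roughly a 64% chance of at least one.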