## De novo generation of antibody binders with RFantibody
[![colab.ipynb](https://img.shields.io/badge/github-%23121011.svg?logo=github)](https://github.com/mhoie/workshop/blob/main/workshop.ipynb)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mhoie/workshop/blob/main/workshop.ipynb)

In this notebook, you may choose your own antibody framework and target protein structure, and design novel antibody binders. This workflow by the [Baker Lab](https://www.bakerlab.org/2025/02/28/designing-antibodies-with-rfdiffusion/) has been shown to generate weak antibody binders in the μM to nM range, with ~2% experimental success rates for some degree of binding after in-silico filtering.


---


Antibody therapeutics represent a substantial market (approximately $550M USD) with tremendous potential for treating various diseases. Traditional approaches to antibody discovery are slow and laborious, typically involving immunizing mice or screening random libraries.

This notebook implements the [RFAntibody pipeline](https://github.com/RosettaCommons/RFantibody/tree/main), described in this [pre-print](https://www.biorxiv.org/content/10.1101/2024.03.14.585103v2), for structure-based design of de novo antibodies against a chosen target.

**Inputs:**
- i) An input antibody framework (e.g. hu-4D5-8_Fv.pdb, a humanized ScFv antibody framework PDB)
- ii) A target protein of interest (e.g. respiratory syncytial virus (RSV) protein)
- iii) Binding site on the target protein (epitope residues)

**Workflow:**
1. **RFDiffusion** generation of a bound antibody-antigen complex (backbone only) - using an antibody-finetuned version of RFdiffusion ([Nature paper](https://www.nature.com/articles/s41586-023-06415-8))
2. **ProteinMPNN** generation of designed CDR sequences, the complementarity determining loops involved in antibody binding ([Science paper](https://www.science.org/doi/10.1126/science.add2187))
3. **RosettaFold2** structure-prediction of designed sequences, to filter out low confidence structures - using an antibody-finetuned version of RoseTTAFold2 ([Preprint](https://www.biorxiv.org/content/10.1101/2023.05.24.542179v1)).

The last step has been shown to dramatically improve experimental success rates, by a factor of 10-20X.

**Output:**
- De novo designed antibody sequences, predicted to bind the target protein

<div>
<img src="https://github.com/mhoie/workshop/blob/main/img/whiteboard4.jpg?raw=1" width="1000"/>
</div>

**Advantages:**
- Designs novel antibodies binding a target protein
- Can target most epitope regions of interest (structured regions are preferred)
- Focuses on designing antibody CDR loops (main residues determining binding)
- Designs may be filtered by "self-consistency" of predicted structures, which correlates with experimental success rates

**Current limitations:**
- Generated antibodies at best tend to be weak binders (low binding affinities in μM to nM range)
- Often low experimental success rates (~2% for some degree of binding) - heavily dependent on filtering
- No screening for e.g. human immunogenicity

**Read more:**
- To read more about this workflow, please refer to the RFAntibody Github page: https://github.com/RosettaCommons/RFantibody/blob/main/README.md
- And the Biorxiv pre-print by Bennett et al 2024: https://www.biorxiv.org/content/10.1101/2024.03.14.585103v2

---

## Pre-requisites for this workshop

The only pre-requisites for this workshop are the following:
- i) Register a BioLib account on [https://biolib.com/sign-up](https://biolib.com/sign-up), for running RFantibody jobs (requires a GPU) in the cloud.
- ii) An antibody PDB and target protein structure PDB. Examples are provided below, but you may also provide your own following the instructions below.

#### i) Install pybiolib and login to biolib.com

In [1]:
# Download BioLib
!pip install --quiet --upgrade pybiolib


In [2]:
# Login with BioLib
import biolib
biolib.login()

Opening authorization page at: https://biolib.com/sign-in/request/notebook/?token=22tx92xXHbeKx6BJ
If your browser does not open automatically, click on the link above.



Successfully signed in!


#### ii) Set input antibody and target PDB
RFAntibody takes three inputs; the first two are PDB files:

1. **framework_pdb**: `hu-4D5-8_Fv.pdb` - A humanized ScFv antibody framework
2. **target_pdb**: `rsv_site3.pdb` - The target protein (RSV site 3) of interest
3. **hotspot_res**: List of hotspot residues defining our epitope (target binding site)

The cell below downloads the two PDB files if they are not already present.

In [3]:
# Download input files if not already present
!wget --no-verbose -nc "https://raw.githubusercontent.com/mhoie/workshop/refs/heads/main/hu-4D5-8_Fv.pdb"
!wget --no-verbose -nc "https://raw.githubusercontent.com/mhoie/workshop/refs/heads/main/rsv_site3.pdb"

2025-04-01 18:04:45 URL:https://raw.githubusercontent.com/mhoie/workshop/refs/heads/main/hu-4D5-8_Fv.pdb [131172/131172] -> "hu-4D5-8_Fv.pdb" [1]
2025-04-01 18:04:45 URL:https://raw.githubusercontent.com/mhoie/workshop/refs/heads/main/rsv_site3.pdb [461944/461944] -> "rsv_site3.pdb" [1]


#### iii) Bring your own target (optional)
You may also choose your own antibody framework and target PDB. Please see the bottom of this notebook, and the RFAntibody guide linked below:
https://github.com/RosettaCommons/RFantibody?tab=readme-ov-file#input-preparation



---

## Step 1 of 3: Generate antibody-antigen docking pose, with [RFDiffusion (antibody-finetuned)](https://biolib.com/BioITWorkshop/RFDiffusionAntibody)

This step takes the input antibody framework and target protein, and designs the 3D structure of new CDR loops in interaction with the target protein (antibody-antigen docking pose). The CDR loops are generated as backbones only (no side chains or residue identities), with the actual residues determined in the next step.

#### [RFDiffusion input parameters](https://biolib.com/BioITWorkshop/RFDiffusionAntibody)

- **framework_pdb**: Path to the antibody framework we're using (e.g. hu-4D5-8_Fv.pdb)
- **target_pdb**: Path to the target protein structure (e.g. rsv_site3.pdb)
- **hotspot_res**: List of hotspot residues (chain + position) defining the target binding site / epitope. The chain must always be T to follow the HLT format (explained at the end of the notebook)
- **design_loops**: Possible range of lengths of the CDR loops.
  - L1, L2, L3: Light chain CDR loops
  - H1, H2, H3: Heavy chain CDR loops
  - Numbers specify length ranges (e.g., L1:8-13 means loop L1 can be 8-13 residues long)
- **num_designs**: Number of 3D designs to generate (set to 1 in this notebook)
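
The bracketed strings expected by `hotspot_res` and `design_loops` are easy to mistype. The sketch below builds them from plain Python values. The helper names `format_hotspots` and `format_design_loops` are hypothetical and not part of RFantibody or pybiolib; they only reproduce the string layout shown in this notebook.

```python
VALID_LOOPS = {"L1", "L2", "L3", "H1", "H2", "H3"}

def format_hotspots(residues, chain="T"):
    """Build a hotspot string such as "[T305,T456]" from residue numbers."""
    return "[" + ",".join(f"{chain}{int(r)}" for r in residues) + "]"

def format_design_loops(loops):
    """Build a design_loops string from {loop name: length or (min, max)}."""
    parts = []
    for name, length in loops.items():
        if name not in VALID_LOOPS:
            raise ValueError(f"Unknown CDR loop: {name}")
        if isinstance(length, tuple):
            shortest, longest = length
            if shortest > longest:
                raise ValueError(f"Invalid length range for {name}")
            parts.append(f"{name}:{shortest}-{longest}")  # variable length
        else:
            parts.append(f"{name}:{length}")  # fixed length
    return "[" + ",".join(parts) + "]"

print(format_hotspots([305, 456]))
print(format_design_loops({"L1": (8, 13), "L2": 7, "L3": (9, 11),
                           "H1": 7, "H2": 6, "H3": (5, 13)}))
```

The two printed strings match the values used for `hotspot_res` and `design_loops` in the next cell.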


In [4]:
# RFDiffusion input
framework_pdb = "hu-4D5-8_Fv.pdb"
target_pdb = "rsv_site3.pdb"
hotspot_res = "[T305,T456]" # E.g. chain T, position 305. Chain must always be T to follow HLT format (see end of notebook)

# RFDiffusion parameters
design_loops = "[L1:8-13,L2:7,L3:9-11,H1:7,H2:6,H3:5-13]" # Possible lengths of LCDR1, LCDR2, LCDR3, HCDR1, HCDR2, HCDR3
num_designs = 1

# Check that both input files are present
import os
print(f"Framework PDB exists: {os.path.exists(framework_pdb)}")
print(f"Target PDB exists: {os.path.exists(target_pdb)}")
print("(If these are missing, please download from https://github.com/mhoie/bioit-rfantibody before proceeding)")

Framework PDB exists: True
Target PDB exists: True
(If these are missing, please download from https://github.com/mhoie/bioit-rfantibody before proceeding)
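
Beyond checking that the files exist, it can be worth confirming that every hotspot residue is actually present on chain T of the target PDB. The following is a minimal sketch that relies only on the fixed-column PDB layout (chain ID in column 22, residue number in columns 23-26). The function names are hypothetical helpers, not part of RFantibody.

```python
def residues_in_chain(pdb_path, chain="T"):
    """Return the set of residue numbers found on one chain of a PDB file."""
    found = set()
    with open(pdb_path) as handle:
        for line in handle:
            # Only ATOM records long enough to hold the residue number
            if line.startswith("ATOM") and len(line) >= 26 and line[21] == chain:
                found.add(int(line[22:26]))
    return found

def missing_hotspots(pdb_path, hotspot_res):
    """List hotspot entries (e.g. "T305") that are absent from the PDB file."""
    entries = [e.strip() for e in hotspot_res.strip("[]").split(",") if e.strip()]
    missing = []
    for entry in entries:
        chain, number = entry[0], int(entry[1:])
        if number not in residues_in_chain(pdb_path, chain):
            missing.append(entry)
    return missing
```

For example, `missing_hotspots(target_pdb, hotspot_res)` should return an empty list if all hotspots can be found in the target file.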


In [None]:
# Run RFdiffusion through BioLib
app_rfdiff = biolib.load('BioITWorkshop/RFDiffusionAntibody')
job_rfdiff = app_rfdiff.run(
    target_pdb=target_pdb,
    framework_pdb=framework_pdb,
    hotspot_res=hotspot_res,
    design_loops=design_loops,
    num_designs=num_designs,
)
job_rfdiff.save_files("output/rfdiffusion")
job_rfdiff.list_output_files()

INFO:biolib:Loaded project BioITWorkshop/RFDiffusionAntibody:1.0.7
INFO:biolib:View the result in your browser at: https://biolib.com/results/098fbbd9-61ca-46cd-ac57-66ad9ef2f145/
INFO:biolib:Cloud: The job has been queued. Please wait...
INFO:biolib:Cloud: Initializing
INFO:biolib:Cloud: Pulling images...
INFO:biolib:Cloud: Computing...


Running RFDiffusion Antibody...
Starting generation of 1 design(s)...
Loading model...
Design 0: timestep 5/50
Design 0: timestep 10/50
Design 0: timestep 15/50
Design 0: timestep 20/50
Design 0: timestep 25/50
Design 0: timestep 30/50


#### Wait for RFdiffusion output files (~4-5 minutes)
The RFdiffusion step generates PDB files containing the antibody framework with designed CDR loops docked to the target protein. At this stage, the CDR loops have backbone structures but no amino acid sequences yet!

Main output file:
- output/rfdiffusion/_0.pdb - Antibody PDB backbone (N, Cα, C, O atoms only); the CDR loop residue identities are not yet assigned and will be designed in the next step

Example output format:
```pdb
ATOM      1  N   GLU H   1      23.793  -8.718 -21.757  1.00  1.00
ATOM      2  CA  GLU H   1      23.755  -8.421 -20.330  1.00  1.00
ATOM      3  C   GLU H   1      23.563  -6.931 -20.082  1.00  1.00
ATOM      4  O   GLU H   1      23.856  -6.105 -20.947  1.00  1.00
ATOM      5  N   VAL H   2      22.855  -6.630 -18.891  1.00  1.00
```
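
To confirm that a downloaded RFdiffusion output really is backbone-only, one option is to scan the atom-name field (columns 13-16) of every ATOM record. This is a small sketch under that assumption; `non_backbone_atoms` is a hypothetical helper name.

```python
BACKBONE_ATOMS = {"N", "CA", "C", "O"}

def non_backbone_atoms(pdb_lines):
    """Return the atom names in ATOM records that are not backbone atoms."""
    extra = set()
    for line in pdb_lines:
        if line.startswith("ATOM"):
            name = line[12:16].strip()  # atom name occupies columns 13-16
            if name not in BACKBONE_ATOMS:
                extra.add(name)
    return extra

example = [
    "ATOM      1  N   GLU H   1      23.793  -8.718 -21.757  1.00  1.00",
    "ATOM      2  CA  GLU H   1      23.755  -8.421 -20.330  1.00  1.00",
]
print(non_backbone_atoms(example))  # an empty set means backbone-only
```

An open file handle can be passed in place of the list, since the function only iterates over lines.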

### RFDiffusion training to predict protein structures
<div>
<img src="https://github.com/mhoie/workshop/blob/main/img/rfdiff.png?raw=1" width="700"/>
</div>

### Example RFDiffusion trajectory:

<div>
<img src="https://github.com/mhoie/workshop/blob/main/img/diffuse.gif?raw=1" width="700"/>
</div>

## Step 2 of 3: Design binding CDR loop residues with [ProteinMPNN](https://biolib.com/BioITWorkshop/ProteinMPNNAb)

The second step takes the docks generated by RFdiffusion and assigns amino acid sequences to the CDR loops using ProteinMPNN. We use the base version of ProteinMPNN (not an antibody-finetuned model) with a wrapper script that focuses on designing just the CDR loops.

#### [ProteinMPNN input parameters](https://biolib.com/BioITWorkshop/ProteinMPNNAb)
- **pdb**: Directory containing the previous RFdiffusion output PDB files, or a single PDB file
- **num_seqs_per_struct**: Number of sequences to design per input structure PDB file

In [None]:
# Input directory
input_dir = "output/rfdiffusion"  # Using the output from RFdiffusion

# Run ProteinMPNN
app_mpnn = biolib.load('BioITWorkshop/ProteinMPNN')
job_mpnn = app_mpnn.run(
    pdb=input_dir,
    num_seqs_per_struct=3
)
job_mpnn.save_files("output/proteinmpnn")
job_mpnn.list_output_files()

#### Wait for ProteinMPNN output files (<1 minute)
ProteinMPNN outputs PDB files with the same structures as the input but with amino acid sequences designed for the CDR loops. By default, it provides one sequence per input structure; the cell above requests three (`num_seqs_per_struct=3`).

Output files:
- ab_design_0_dldesign_0.pdb (antibody structure with predicted CDR residues)
- ab_design_0_dldesign_1.pdb (antibody structure with predicted CDR residues)
- ... etc


Example output:
```pdb
ATOM      1  N   GLU H   1      23.793  -8.718 -21.757  1.00  0.00
ATOM      2  CA  GLU H   1      23.755  -8.421 -20.330  1.00  0.00
ATOM      3  C   GLU H   1      23.563  -6.931 -20.082  1.00  0.00
ATOM      4  O   GLU H   1      23.856  -6.105 -20.947  1.00  0.00
ATOM      5  N   VAL H   2      22.855  -6.630 -18.891  1.00  0.00
ATOM      6  CA  VAL H   2      22.864  -5.216 -18.533  1.00  0.00
```
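
To read off the designed sequences without a structure viewer, the residue names in the ATOM records can be collapsed into one-letter sequences per chain. The sketch below uses only the standard PDB column layout; `chain_sequences` is a hypothetical helper, and non-standard residues are reported as `X`.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def chain_sequences(pdb_lines):
    """Return {chain ID: one-letter sequence} from the ATOM records."""
    sequences = {}
    seen = set()
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        chain = line[21]
        key = (chain, line[22:27])  # residue number plus insertion code
        if key in seen:
            continue  # further atoms of a residue already counted
        seen.add(key)
        letter = THREE_TO_ONE.get(line[17:20], "X")
        sequences[chain] = sequences.get(chain, "") + letter
    return sequences
```

Applied to the example lines above, this gives `{"H": "EV"}`. Comparing the sequences of two designs from the same backbone shows which CDR positions ProteinMPNN changed.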

## Step 3 of 3: Predict structure and filter for self-consistency, with [RosettaFold2 (antibody fine-tuned)](https://biolib.com/BioITWorkshop/RF2Antibody)

The final step uses an antibody-finetuned version of RoseTTAFold2 (RF2) to predict the structure of the designed sequences and assess whether RF2 is confident that the sequence will bind as designed.

### [RosettaFold2 input parameters](https://biolib.com/BioITWorkshop/RF2Antibody)

- **input.pdb_dir**: Directory containing the PDB files from ProteinMPNN (passed as `pdb` in the cell below)
- **num_recycles**: Number of recycling iterations in the model (default: 10). Higher values, up to 10, improve accuracy at the cost of longer compute time; the cell below uses 3

In [None]:
# Input directory
input_dir = "output/proteinmpnn"  # Using the output from ProteinMPNN

# Run RosettaFold2
app_rf2 = biolib.load('BioITWorkshop/RF2Antibody')
job_rf2 = app_rf2.run(
    pdb=input_dir,
    num_recycles=3,
)
job_rf2.save_files("output/rosettafold2")
job_rf2.list_output_files()

#### Wait for RosettaFold2 output files (1-2 minutes)
RosettaFold2 predicts the structure of the designed antibodies and provides confidence metrics. We can use these to filter for promising designs.

Output files:
- scores.csv - Predicted structural quality scores for filtering of designs
- ab_design_0_dldesign_1.pdb - Predicted structure of design 0
- ab_design_0_dldesign_2.pdb - Predicted structure of design 1
- ... etc

Example output scores.csv:
```csv
interaction_pae,pae,    pred_lddt,  target_aligned_antibody_rmsd, ..., framework_aligned_cdr_rmsd, ...
8.07,           8.77,   0.9,        11.53,                        ..., 2.18,                       ...
7.52,           8.19,   0.89,       18.97,                        ..., 2.4,                        ...
8.47,           9.15,   0.9,        10.8,                         ..., 2.35,                       ...
```

The RFantibody protocol recommends filtering on the following metrics, which have been shown to improve experimental success rates by up to 10X.
- RF2 predicted alignment error (pAE) < 10 (pae column)
- RMSD between design and RF2 prediction < 2.00 Å for the CDRs (framework_aligned_cdr_rmsd column)
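
The two thresholds above can be applied to the scores file with the standard `csv` module. This is a minimal sketch: the column names are taken from the example scores shown earlier, while the file path `output/rosettafold2/scores.csv` and the helper names are assumptions. Note that all three example rows shown above would fail the CDR RMSD cutoff.

```python
import csv

def load_scores(path):
    """Read the scores CSV into a list of dicts keyed by column name."""
    with open(path, newline="") as handle:
        # skipinitialspace tolerates padding after the commas
        return list(csv.DictReader(handle, skipinitialspace=True))

def passing_designs(rows, max_pae=10.0, max_cdr_rmsd=2.0):
    """Keep rows with pae < max_pae and framework_aligned_cdr_rmsd < max_cdr_rmsd."""
    return [
        row for row in rows
        if float(row["pae"]) < max_pae
        and float(row["framework_aligned_cdr_rmsd"]) < max_cdr_rmsd
    ]

# Usage (assumed output location from the RosettaFold2 cell):
# rows = load_scores("output/rosettafold2/scores.csv")
# print(f"{len(passing_designs(rows))} of {len(rows)} designs pass both filters")
```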


Lower pAE values correlate strongly with improved accuracy of the designed antibody docks, and with experimental success rate (see [pre-print Extended Data Figure 2B](https://www.biorxiv.org/content/10.1101/2024.03.14.585103v2)).

<div>
<img src="https://github.com/mhoie/workshop/blob/main/img/pae.png?raw=1" width="400"/>
</div>

---

## Conclusion

This notebook has demonstrated the complete RFantibody pipeline for structure-based design of de novo antibodies. The workflow consists of three main steps:

1. **RFdiffusion (antibody fine-tuned)**: Generating antibody-target docking poses with designed CDR loop structures
2. **ProteinMPNN**: Designing amino acid sequences for the CDR loops
3. **RosettaFold2 (antibody fine-tuned)**: Filtering designs based on predicted structure quality

This computational pipeline can generate designs with a success rate of approximately 2% for some degree of binding to the target. Further experimental validation and optimization are likely to be required to improve binding affinity and other pharmaceutical properties.

For larger scale antibody design campaigns, we recommend generating thousands of designs to increase the chances of finding high-quality binders, as the current filtering metrics are still highly limited.
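
As a rough back-of-the-envelope illustration of why scale matters: if each experimentally tested design binds independently with probability p, the chance of at least one binder among n designs is 1 - (1 - p)^n. The sketch below solves this for n. The independence assumption is a simplification, and the 2% figure is the approximate rate quoted above, not a guarantee for any particular target.

```python
import math

def designs_needed(success_rate, confidence=0.95):
    """Smallest n such that 1 - (1 - success_rate)**n >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - success_rate))

print(designs_needed(0.02))  # tested designs needed for a 95% chance of >= 1 binder
```

With a 2% per-design success rate, this gives 149 tested designs for a 95% chance of at least one binder. Since only a fraction of generated designs pass the in-silico filters, the number of designs to generate would be correspondingly larger.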

<div>
<img src="https://github.com/mhoie/workshop/blob/main/img/rfantibody.png?raw=1" width="700"/>
</div>

For practical tips on antibody design, we refer to the Baker Lab's RFAntibody README section on [practical considerations for antibody design](https://github.com/RosettaCommons/RFantibody/blob/main/README.md#practical-considerations-for-antibody-design).

---

## What to try next

(Before proceeding, remember to rename your output folder, e.g. to output_old, or the next steps will re-process old results!)

In [None]:
!mv output/ output_old
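
If you run the workflow several times, a fixed name such as `output_old` can collide with an earlier archive. One alternative is a timestamped rename in Python. This is only a sketch; `archive_output` is a hypothetical helper name.

```python
import os
import shutil
import time

def archive_output(output_dir="output", stamp=None):
    """Rename output_dir to '<output_dir>_<stamp>'; return the new path or None."""
    output_dir = output_dir.rstrip("/")
    if not os.path.isdir(output_dir):
        return None  # nothing to archive
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    archived = f"{output_dir}_{stamp}"
    shutil.move(output_dir, archived)
    return archived

# Usage: archive_output("output") before re-running Step 1
```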

## A) Generate antibodies against a new target protein
In order to provide your own target PDB, you'll need to convert it into an HLT format first ([described here](https://github.com/RosettaCommons/RFantibody/blob/main/README.md#hlt-file-format) ).

- i) Upload your target PDB file to the TargetPDBtoHLT app on BioLib: https://biolib.com/BioITWorkshop/TargetPDBtoHLT/
- ii) Upload the processed PDB to Google colab
- iii) Change the target_pdb variable in Step 1 near the top of the notebook
- iv) Select the target epitope residues



### Example new target workflow

0. Remove anything currently in the output/ folder on Google Colab (or rename this folder, e.g. to output_old)

1. Choose a target from RCSB, e.g. the X-ray structure of 1M47 Human interleukin-2: https://www.rcsb.org/structure/1M47

<img src="https://github.com/mhoie/workshop/blob/main/img/rcsb_IL.png?raw=1" style="width:40%;">

2. Predict likely epitope residues with a tool like DiscoTope-3.0: https://biolib.com/DTU/DiscoTope-3

<img src="https://github.com/mhoie/workshop/blob/main/img/discotope3.png?raw=1" style="width:40%;">

3. Process the PDB file with the TargetPDBtoHLT app: https://biolib.com/BioITWorkshop/TargetPDBtoHLT/

4. Upload the processed PDB to Google Colab, and change the target_pdb variable to e.g. '1m47_T.pdb' (in Step 1)

<img src="https://github.com/mhoie/workshop/blob/main/img/change_params.png?raw=1" style="width:30%;">

5. Change the hotspot_res variable to e.g. '[T41,T111]' based on the predicted epitopes (Note: T is the chain (must always be T), and the number corresponds to the residue position in the PDB)

6. Run through the workflow as normal.

For this target, the known antibody-antigen dock is available at SAbDab for comparison:
https://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/sabdab/structureviewer/?pdb=8sow


## B) Try a nanobody framework

Swap out the heavy + light chain ScFv antibody framework (hu-4D5-8_Fv) for a nanobody framework instead:

```python
framework_pdb = "h-NbBCII10.pdb"
```

h-NbBcII10FGLA is a widely used, humanized nanobody framework. Due to their smaller size and ease of assembly, nanobodies are hugely popular in antibody design. Two FDA-approved nanobody therapeutics already exist on the market (see [pre-print](https://www.biorxiv.org/content/10.1101/2024.03.14.585103v2)).

## C) Optimize the workflow

Vary the following parameters to increase the diversity of the generated designs, aiming for the lowest possible pAE and CDR RMSD.

- i) Antibody framework
- ii) Target PDB
- iii) Epitope residues
- iv) No. 3D backbone designs (RFDiffusion)
- v) CDR loop lengths (RFDiffusion)
- vi) No. designed sequences (ProteinMPNN)
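
One way to organize such an exploration is to enumerate the parameter combinations up front and run the three steps once per combination. The sketch below only builds the settings and does not call BioLib. `parameter_grid` is a hypothetical helper, and the fixed loop lengths are the ones used earlier in this notebook.

```python
from itertools import product

def parameter_grid(h3_ranges, num_seqs_options,
                   base_loops="L1:8-13,L2:7,L3:9-11,H1:7,H2:6"):
    """Yield one settings dict per (H3 length range, sequences per structure) pair."""
    for (shortest, longest), num_seqs in product(h3_ranges, num_seqs_options):
        yield {
            "design_loops": f"[{base_loops},H3:{shortest}-{longest}]",
            "num_seqs_per_struct": num_seqs,
        }

for settings in parameter_grid([(5, 9), (10, 13)], [1, 3]):
    print(settings)
```

Each dict could then supply `design_loops` to the RFdiffusion cell and `num_seqs_per_struct` to the ProteinMPNN cell, with a separate output folder per combination so results are not mixed.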