## De novo generation of antibody binders with RFantibody
[![colab.ipynb](https://img.shields.io/badge/github-%23121011.svg?logo=github)](https://github.com/mhoie/workshop/blob/main/workshop.ipynb)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mhoie/workshop/blob/main/workshop.ipynb)

In this notebook, you may choose your own antibody framework and target protein structure, and design novel antibody binders. This workflow by the [Baker Lab](https://www.bakerlab.org/2025/02/28/designing-antibodies-with-rfdiffusion/) has been shown to generate weak antibody binders in the μM to nM range, with ~2% experimental success rates for some degree of binding after in-silico filtering.


---


Antibody therapeutics represent a large market with tremendous potential for treating a wide range of diseases. Traditional approaches to antibody discovery are slow and laborious, typically involving immunizing mice or screening random libraries.

This notebook implements the [RFAntibody pipeline](https://github.com/RosettaCommons/RFantibody/tree/main) (described in this [preprint](https://www.biorxiv.org/content/10.1101/2024.03.14.585103v2)) for structure-based design of de novo antibodies against a chosen target.

**Inputs:**
- i) An input antibody framework (e.g. hu-4D5-8_Fv.pdb, a humanized scFv antibody framework PDB)
- ii) A target protein of interest (e.g. respiratory syncytial virus (RSV) protein)
- iii) Binding site on the target protein (epitope residues)

**Workflow:**
1. **RFdiffusion** generation of a bound antibody-antigen complex (backbone only), using an antibody-finetuned version of RFdiffusion ([Nature paper](https://www.nature.com/articles/s41586-023-06415-8))
2. **ProteinMPNN** design of sequences for the complementarity-determining regions (CDRs), the loops primarily responsible for antigen binding ([Science paper](https://www.science.org/doi/10.1126/science.add2187))
3. **RoseTTAFold2** structure prediction of the designed sequences to filter out low-confidence designs, using an antibody-finetuned version of RoseTTAFold2 ([Preprint](https://www.biorxiv.org/content/10.1101/2023.05.24.542179v1))

Adding this last filtering step has been shown to dramatically improve experimental success rates, by roughly 10-20x.

**Output:**
- De novo designed antibody sequences, predicted to bind the target protein

<img src="img/whiteboard4.jpg" style="width:60%;">


---


**Advantages:**
- Designs novel antibodies binding a target protein
- Can target most epitopes of interest (structured regions are preferred)
- Focuses on designing the antibody CDR loops (the main residues determining binding)
- Designs may be filtered by the "self-consistency" of their predicted structures, which correlates with experimental success rates

**Current limitations:**
- Generated antibodies tend to be weak binders, with affinities in the μM to nM range at best
- Experimental success rates are often low (~2% for some degree of binding) and depend heavily on filtering
- No screening for properties such as human immunogenicity

**Read more:**
- To read more about this workflow, please refer to the RFAntibody Github page: https://github.com/RosettaCommons/RFantibody/blob/main/README.md
- And the bioRxiv preprint by Bennett et al. (2024): https://www.biorxiv.org/content/10.1101/2024.03.14.585103v2

## Prerequisites for this workshop

The only prerequisites for this workshop are:
- i) Register a BioLib account on [https://biolib.com/sign-up](https://biolib.com/sign-up), for running RFantibody jobs (requires a GPU) in the cloud.
- ii) An antibody PDB and target protein structure PDB. Examples are provided below, but you may also provide your own following the instructions below.

#### i) Install pybiolib and login to biolib.com

```python
# Install the BioLib Python client (pybiolib)
!pip install --quiet --upgrade pybiolib
```

```python
# Log in to BioLib
import biolib
biolib.login()
```

```text
2025-03-31 11:26:18,087 | INFO : Already signed in
```


#### ii) Set input antibody and target PDB
RFAntibody needs two input files and an epitope definition:

1. **framework_pdb**: `hu-4D5-8_Fv.pdb`, a humanized scFv antibody framework
2. **target_pdb**: `rsv_site3.pdb`, the target protein of interest (RSV site 3)
3. **hotspot_res**: a list of hotspot residues defining the epitope (target binding site)

The cell below downloads the two example files if they are not already present.

```python
# Download the example input files (skipped if they already exist locally)
!wget --no-verbose -nc "https://raw.githubusercontent.com/mhoie/workshop/refs/heads/main/hu-4D5-8_Fv.pdb"
!wget --no-verbose -nc "https://raw.githubusercontent.com/mhoie/workshop/refs/heads/main/rsv_site3.pdb"
```

#### iii) Bring your own target (optional)
You may also choose your own antibody framework and target PDB. Please see the bottom of this notebook, and the RFAntibody guide linked below:
https://github.com/RosettaCommons/RFantibody?tab=readme-ov-file#input-preparation



## Step 1 of 4: Generate antibody-antigen docking pose, with [RFDiffusion (antibody-finetuned)](https://biolib.com/BioITWorkshop/RFDiffusionAntibody)
*Estimated time: ~2-3 minutes*

This step takes the input antibody framework and target protein and designs the 3D structure of new CDR loops in contact with the target (an antibody-antigen docking pose). The CDR loops are generated as backbones only, without amino acid identities; their sequences are designed in the next step.

#### [RFDiffusion input parameters](https://biolib.com/BioITWorkshop/RFDiffusionAntibody)

Input files and epitope (set in the cell below):
- **framework_pdb**: Path to the antibody framework we're using (e.g. hu-4D5-8_Fv.pdb)
- **target_pdb**: Path to the target protein structure (e.g. rsv_site3.pdb)
- **hotspot_res**: List of hotspot residues defining our epitope (target binding site)

New parameters:
- **design_loops**: Dictionary mapping each CDR loop to a range of allowed loop lengths
  - L1, L2, L3: Light chain CDR loops
  - H1, H2, H3: Heavy chain CDR loops
  - Numbers give the allowed lengths: a range such as L1:8-13 means loop L1 can be 8 to 13 residues long, while a single number such as L2:7 fixes the length
  - Example: `[L1:8-13,L2:7,L3:9-11,H1:7,H2:6,H3:5-13]`
- **num_designs**: Number of designs to generate (e.g. 20; the cell below uses 1 for a quick demo)
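The `design_loops` string is a compact specification of the allowed CDR lengths. As a hedged illustration of how the format reads, here is a small parser written for this notebook (`parse_design_loops` is a hypothetical helper, not part of RFantibody or BioLib). It assumes every entry has the form `<loop>:<min>-<max>` or `<loop>:<length>`, with loop names H1-H3 and L1-L3.

```python
import re


def parse_design_loops(spec: str) -> dict[str, tuple[int, int]]:
    """Parse a spec like "[L1:8-13,L2:7]" into {loop: (min_len, max_len)}.

    A single number (e.g. "L2:7") is read as a fixed length, i.e. (7, 7).
    Hypothetical helper for illustration only.
    """
    loops: dict[str, tuple[int, int]] = {}
    for item in spec.strip().strip("[]").split(","):
        match = re.fullmatch(r"\s*([HL][123]):(\d+)(?:-(\d+))?\s*", item)
        if match is None:
            raise ValueError(f"Unrecognised loop entry: {item!r}")
        name = match.group(1)
        low = int(match.group(2))
        # No upper bound given means the length is fixed
        high = int(match.group(3)) if match.group(3) is not None else low
        if low > high:
            raise ValueError(f"Empty length range for {name}: {low}-{high}")
        loops[name] = (low, high)
    return loops


spec = "[L1:8-13,L2:7,L3:9-11,H1:7,H2:6,H3:5-13]"
for name, (low, high) in parse_design_loops(spec).items():
    if low == high:
        print(f"{name}: fixed at {low} residues")
    else:
        print(f"{name}: {low}-{high} residues")
```

Running it on the example specification lists each loop with its allowed lengths, which is a cheap way to catch typos before launching a GPU job.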


```python
import os

# RFdiffusion inputs
framework_pdb = "hu-4D5-8_Fv.pdb"
target_pdb = "rsv_site3.pdb"
hotspot_res = "[T305,T456]"  # epitope residues on the target (chain T)

# RFdiffusion parameters
design_loops = "[L1:8-13,L2:7,L3:9-11,H1:7,H2:6,H3:5-13]"
num_designs = 1  # a single design keeps this demo fast

# Check that the input files are present
print(f"Framework PDB exists: {os.path.exists(framework_pdb)}")
print(f"Target PDB exists: {os.path.exists(target_pdb)}")
print("(If these are missing, re-run the download cell above or fetch them from https://github.com/mhoie/workshop)")
```
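Before launching a GPU job, it can also help to confirm that the hotspot residues actually exist in the target structure. The sketch below assumes the HLT convention in which the target is chain `T`, so `T305` means residue 305 on chain `T`, and reads chain and residue number from the fixed PDB columns. The helper names are our own and hypothetical, and the demo runs on a tiny made-up two-residue PDB string rather than `rsv_site3.pdb`.

```python
def parse_hotspots(spec: str) -> list[tuple[str, int]]:
    """Turn "[T305,T456]" into [("T", 305), ("T", 456)]: chain letter + residue number."""
    items = [item.strip() for item in spec.strip().strip("[]").split(",") if item.strip()]
    return [(item[0], int(item[1:])) for item in items]


def residues_in_pdb(pdb_text: str) -> set[tuple[str, int]]:
    """Collect (chain, residue number) pairs from ATOM records (PDB columns 22 and 23-26)."""
    residues = set()
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            residues.add((line[21], int(line[22:26])))
    return residues


def missing_hotspots(spec: str, pdb_text: str) -> list[tuple[str, int]]:
    """Hotspots from `spec` that have no ATOM record in the structure."""
    present = residues_in_pdb(pdb_text)
    return [hotspot for hotspot in parse_hotspots(spec) if hotspot not in present]


# Made-up two-residue target on chain T (not the real rsv_site3.pdb)
TOY_TARGET = """\
ATOM      1  CA  ALA T 305      10.000  10.000  10.000  1.00  0.00
ATOM      2  CA  GLY T 306      13.800  10.000  10.000  1.00  0.00
"""
print(missing_hotspots("[T305,T456]", TOY_TARGET))  # [('T', 456)]

# To check the real target instead (uses the variables from the cell above):
# with open(target_pdb) as fh:
#     print(missing_hotspots(hotspot_res, fh.read()))
```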

```python
# Run antibody-finetuned RFdiffusion through BioLib (runs on a cloud GPU)
app_rfdiff = biolib.load('BioITWorkshop/RFDiffusionAntibody')
job_rfdiff = app_rfdiff.run(
    target_pdb=target_pdb,
    framework_pdb=framework_pdb,
    hotspot_res=hotspot_res,
    design_loops=design_loops,
    num_designs=num_designs,
)
# Download the results to output/rfdiffusion, skipping files that already exist
job_rfdiff.save_files("output/rfdiffusion", skip_file_if_exists=True)
job_rfdiff.list_output_files()
```

```text
INFO:biolib:Loaded project BioITWorkshop/RFDiffusionAntibody:0.0.2
INFO:biolib:View the result in your browser at: https://biolib.com/results/5fb1dfeb-fc93-490e-a5f2-7cd5f683e950/
```


#### Wait for RFdiffusion output files (~2-3 minutes)
The RFdiffusion step generates PDB files containing the antibody framework with designed CDR loops docked to the target protein. At this stage, the CDR loops have backbone structures but no amino acid sequences yet!

Main output file:
- output/rfdiffusion/_0.pdb - Backbone of the docked antibody-target complex (N, CA, C, O atoms only); the designed CDR loops have no amino acid identities yet (these are designed in the next step)

Example output format:
```pdb
ATOM      1  N   GLU H   1      23.793  -8.718 -21.757  1.00  1.00
ATOM      2  CA  GLU H   1      23.755  -8.421 -20.330  1.00  1.00
ATOM      3  C   GLU H   1      23.563  -6.931 -20.082  1.00  1.00
ATOM      4  O   GLU H   1      23.856  -6.105 -20.947  1.00  1.00
ATOM      5  N   VAL H   2      22.855  -6.630 -18.891  1.00  1.00
```
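To confirm that a design really is backbone-only as described above, one can count the atom names in its ATOM records. This is a minimal standard-library sketch (the helper names are hypothetical): it reads the atom name from PDB columns 13-16 and checks that only N, CA, C and O occur. The demo uses the excerpt above, and the guarded loop at the end checks any files in `output/rfdiffusion` if that folder exists.

```python
from collections import Counter
from pathlib import Path

BACKBONE_ATOMS = {"N", "CA", "C", "O"}


def atom_name_counts(pdb_text: str) -> Counter:
    """Count atom names (PDB columns 13-16) over all ATOM records."""
    return Counter(
        line[12:16].strip()
        for line in pdb_text.splitlines()
        if line.startswith("ATOM")
    )


def is_backbone_only(pdb_text: str) -> bool:
    """True if every ATOM record is one of the backbone atoms N, CA, C, O."""
    return set(atom_name_counts(pdb_text)) <= BACKBONE_ATOMS


# The RFdiffusion excerpt shown above
EXAMPLE = """\
ATOM      1  N   GLU H   1      23.793  -8.718 -21.757  1.00  1.00
ATOM      2  CA  GLU H   1      23.755  -8.421 -20.330  1.00  1.00
ATOM      3  C   GLU H   1      23.563  -6.931 -20.082  1.00  1.00
ATOM      4  O   GLU H   1      23.856  -6.105 -20.947  1.00  1.00
ATOM      5  N   VAL H   2      22.855  -6.630 -18.891  1.00  1.00
"""
print(atom_name_counts(EXAMPLE), is_backbone_only(EXAMPLE))

# Check real RFdiffusion outputs, if they have been downloaded
rfdiff_dir = Path("output/rfdiffusion")
if rfdiff_dir.exists():
    for path in sorted(rfdiff_dir.glob("*.pdb")):
        print(path.name, "backbone only:", is_backbone_only(path.read_text()))
```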

## Step 2 of 4: Design binding CDR loop residues with [ProteinMPNN](https://biolib.com/BioITWorkshop/ProteinMPNNAb)
*Estimated time: <1 minute*

The second step takes the docked structures generated by RFdiffusion and assigns amino acid sequences to the CDR loops using ProteinMPNN. We use the base version of ProteinMPNN (not an antibody-finetuned model) with a wrapper script that designs only the CDR loops.

#### [ProteinMPNN input parameters](https://biolib.com/BioITWorkshop/ProteinMPNNAb)
- **pdb**: Directory containing the previous RFdiffusion output PDB files, or a single PDB file
- **num_seqs_per_struct**: Number of sequences to design per input structure PDB file

```python
# Input directory: the output from RFdiffusion
input_dir = "output/rfdiffusion"

# Run ProteinMPNN through BioLib
app_mpnn = biolib.load('BioITWorkshop/ProteinMPNNAb')
job_mpnn = app_mpnn.run(
    pdb=input_dir,
    num_seqs_per_struct=3,  # sequences designed per input backbone
)
job_mpnn.save_files("output/proteinmpnn", skip_file_if_exists=True)
job_mpnn.list_output_files()
```

```text
INFO:biolib:Loaded project BioITWorkshop/ProteinMPNNAb:0.0.3
INFO:biolib:View the result in your browser at: https://biolib.com/results/7870dc6f-0e88-4c28-a602-cfcf9990791f/
```


#### Wait for ProteinMPNN output files (<1 minute)
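ProteinMPNN outputs PDB files with the same structures as the input but with amino acid sequences designed for the CDR loops. By default it designs one sequence per input structure; the cell above requests three (`num_seqs_per_struct=3`), giving one output file per designed sequence.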

Output files:
- _0_dldesign_0.pdb (antibody structure with predicted CDR residues)
- _0_dldesign_1.pdb (antibody structure with predicted CDR residues)
- ... etc


Example output:
```pdb
ATOM      1  N   GLU H   1      23.793  -8.718 -21.757  1.00  0.00
ATOM      2  CA  GLU H   1      23.755  -8.421 -20.330  1.00  0.00
ATOM      3  C   GLU H   1      23.563  -6.931 -20.082  1.00  0.00
ATOM      4  O   GLU H   1      23.856  -6.105 -20.947  1.00  0.00
ATOM      5  N   VAL H   2      22.855  -6.630 -18.891  1.00  0.00
ATOM      6  CA  VAL H   2      22.864  -5.216 -18.533  1.00  0.00
```
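ProteinMPNN stores the designed sequence in the residue names of each output PDB. A minimal standard-library sketch for reading it back, one sequence per chain, follows. It assumes the standard 20 amino acids (anything else becomes `X`) and treats each distinct (chain, residue number, insertion code) as one residue; `chain_sequences` is a hypothetical helper, and comparing its output across the `_dldesign_*` files is one way to see which CDR positions differ between designs.

```python
from pathlib import Path

THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequences(pdb_text: str) -> dict[str, str]:
    """Return {chain_id: one-letter sequence} from the ATOM records of a PDB file."""
    sequences: dict[str, list[str]] = {}
    seen: set[tuple[str, str]] = set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        chain = line[21]
        residue_key = (chain, line[22:27])  # residue number + insertion code
        if residue_key in seen:
            continue  # residue already recorded from an earlier atom
        seen.add(residue_key)
        sequences.setdefault(chain, []).append(THREE_TO_ONE.get(line[17:20], "X"))
    return {chain: "".join(residues) for chain, residues in sequences.items()}


# The ProteinMPNN excerpt shown above
EXAMPLE = """\
ATOM      1  N   GLU H   1      23.793  -8.718 -21.757  1.00  0.00
ATOM      2  CA  GLU H   1      23.755  -8.421 -20.330  1.00  0.00
ATOM      3  C   GLU H   1      23.563  -6.931 -20.082  1.00  0.00
ATOM      4  O   GLU H   1      23.856  -6.105 -20.947  1.00  0.00
ATOM      5  N   VAL H   2      22.855  -6.630 -18.891  1.00  0.00
ATOM      6  CA  VAL H   2      22.864  -5.216 -18.533  1.00  0.00
"""
print(chain_sequences(EXAMPLE))  # {'H': 'EV'}

# Print the sequences of all downloaded ProteinMPNN designs, if present
mpnn_dir = Path("output/proteinmpnn")
if mpnn_dir.exists():
    for path in sorted(mpnn_dir.glob("*.pdb")):
        print(path.name, chain_sequences(path.read_text()))
```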

## Step 3 of 4: Filter designs for predicted structure self-consistency, with [RoseTTAFold2 (antibody-finetuned)](https://biolib.com/BioITWorkshop/RF2Antibody)
*Estimated time: ~1-2 minutes*

This step uses an antibody-finetuned version of RoseTTAFold2 (RF2) to predict the structure of each designed sequence and assess whether RF2 is confident that the sequence will bind as designed.

The RFantibody protocol recommends filtering on the following metrics, which have been shown to improve experimental success rates by up to 10x:
- RF2 predicted alignment error (pAE) < 10
- RMSD between design and RF2 prediction < 2Å for the CDRs
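The CDR RMSD criterion compares the designed model with the RF2 prediction after superimposing the two on the antibody framework. We do not know the exact atom selection RF2Antibody uses for its `framework_aligned_cdr_rmsd` column, so the sketch below only illustrates the general technique under that assumption: a Kabsch superposition on framework coordinates (for example CA atoms), followed by an RMSD over the remaining CDR coordinates. It uses NumPy and synthetic coordinates, constructed so that the CDR atoms are displaced by exactly 1 Å; the function names are hypothetical.

```python
import numpy as np


def kabsch_superpose(mobile: np.ndarray, reference: np.ndarray):
    """Find rotation R and translation t minimising |mobile @ R.T + t - reference| (N x 3 arrays)."""
    mobile_center = mobile.mean(axis=0)
    reference_center = reference.mean(axis=0)
    covariance = (mobile - mobile_center).T @ (reference - reference_center)
    U, _, Vt = np.linalg.svd(covariance)
    # Correct for a possible reflection so that R is a proper rotation
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = reference_center - mobile_center @ R.T
    return R, t


def framework_aligned_cdr_rmsd(design, predicted, framework_mask):
    """Superimpose `predicted` onto `design` using framework atoms only,
    then return the RMSD over the non-framework (CDR) atoms."""
    R, t = kabsch_superpose(predicted[framework_mask], design[framework_mask])
    aligned = predicted @ R.T + t
    cdr = ~framework_mask
    diff = aligned[cdr] - design[cdr]
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Synthetic example: 14 "framework" atoms and 6 "CDR" atoms
rng = np.random.default_rng(0)
design = rng.normal(scale=5.0, size=(20, 3))
framework_mask = np.arange(20) < 14

# "Prediction" = design with the CDRs shifted by 1 Å, then rotated and translated
predicted = design.copy()
predicted[~framework_mask] += np.array([1.0, 0.0, 0.0])
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(q) < 0:
    q[:, 0] *= -1  # make q a proper rotation (det = +1)
predicted = predicted @ q.T + np.array([3.0, -2.0, 7.0])

print(framework_aligned_cdr_rmsd(design, predicted, framework_mask))  # ~1.0
```

Aligning on the framework rather than on all atoms means a CDR loop that is predicted in a different conformation cannot be hidden by the superposition itself, which is why the shifted CDRs come out at 1 Å even though the whole prediction was also rotated and translated.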

#### [RoseTTAFold2 input parameters](https://biolib.com/BioITWorkshop/RF2Antibody)

- **pdb**: Directory containing the PDB files from ProteinMPNN
- **num_recycles**: Number of recycling iterations in the model (default: 10). Higher values (up to 10) improve accuracy at the cost of longer runtime; the cell below uses 3

```python
# Input directory: the output from ProteinMPNN
input_dir = "output/proteinmpnn"

# Run antibody-finetuned RoseTTAFold2 through BioLib
app_rf2 = biolib.load('BioITWorkshop/RF2Antibody')
job_rf2 = app_rf2.run(
    pdb=input_dir,
    num_recycles=3,  # fewer than the default (10) for a faster demo
)
job_rf2.save_files("output/rosettafold2", skip_file_if_exists=True)
job_rf2.list_output_files()
```

```text
INFO:biolib:Loaded project BioITWorkshop/RF2Antibody:0.0.3
INFO:biolib:View the result in your browser at: https://biolib.com/results/a4de66df-5bdd-4ff5-b7e5-3680ece86287/
```


#### Wait for RosettaFold2 output files (1-2 minutes)
RosettaFold2 predicts the structure of the designed antibodies and provides confidence metrics. We can use these to filter for promising designs.

Output files:
- scores.csv - Predicted structural quality scores for filtering of designs
- _0_dldesign_0.pdb - Predicted structure of design 0
- _0_dldesign_1.pdb - Predicted structure of design 1
- ... etc

Example excerpt of scores.csv (some columns omitted as `...`):
```csv
interaction_pae,pae,    pred_lddt,  target_aligned_antibody_rmsd, ..., framework_aligned_cdr_rmsd, ...
8.07,           8.77,   0.9,        11.53,                        ..., 2.18,                       ...
7.52,           8.19,   0.89,       18.97,                        ..., 2.4,                        ...
8.47,           9.15,   0.9,        10.8,                        ...,  2.35,                       ...

```

Of these, our target values are:
- Predicted alignment error (pae) below 10
- Framework aligned CDR rmsd (framework_aligned_cdr_rmsd) below 2.00
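Applying these two thresholds is a simple row filter. The sketch below uses Python's standard `csv` module and assumes the column names shown in the excerpt above (`pae` and `framework_aligned_cdr_rmsd`); check them against the header of your own `scores.csv`. `passing_designs` is a hypothetical helper, and the demo data are the three example rows above with the elided columns dropped.

```python
import csv
import io
from pathlib import Path


def passing_designs(csv_text: str, max_pae: float = 10.0, max_cdr_rmsd: float = 2.0) -> list[dict]:
    """Return the rows whose pae is below max_pae and whose
    framework_aligned_cdr_rmsd is below max_cdr_rmsd."""
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    return [
        row for row in reader
        if float(row["pae"]) < max_pae
        and float(row["framework_aligned_cdr_rmsd"]) < max_cdr_rmsd
    ]


# The example rows shown above, without the elided columns
EXAMPLE_SCORES = """\
interaction_pae,pae,pred_lddt,target_aligned_antibody_rmsd,framework_aligned_cdr_rmsd
8.07,8.77,0.9,11.53,2.18
7.52,8.19,0.89,18.97,2.4
8.47,9.15,0.9,10.8,2.35
"""
print(len(passing_designs(EXAMPLE_SCORES)), "of 3 example designs pass")

# Apply the same filter to the real RF2 output, if it has been downloaded
scores_path = Path("output/rosettafold2/scores.csv")
if scores_path.exists():
    hits = passing_designs(scores_path.read_text())
    print(f"{len(hits)} designs pass both filters")
```

All three example rows meet the pAE cutoff but miss the CDR RMSD cutoff (2.18, 2.4 and 2.35 Å), so the filter returns an empty list for them.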

## Conclusion

This notebook has demonstrated the RFantibody pipeline for structure-based design of de novo antibodies. The pipeline consists of three main steps, which can be followed by a fourth selection step:

1. **RFdiffusion (antibody-finetuned)**: Generating antibody-target docking poses with designed CDR loop structures
2. **ProteinMPNN (base model, CDR loops only)**: Designing amino acid sequences for the CDR loops
3. **RoseTTAFold2 (antibody-finetuned)**: Filtering designs based on predicted structure quality
4. **AntibodyProfiler**: Further selection of designs based on similarity to therapeutically approved antibodies

This computational pipeline can generate designs with a success rate of approximately 2% for some degree of binding to the target. Further experimental validation and optimization are likely to be required to improve binding affinity and other pharmaceutical properties.

For larger-scale antibody design campaigns, we recommend generating thousands of designs to increase the chances of finding high-quality binders, as the current filtering metrics are still highly limited.
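To make the value of large campaigns concrete, here is a back-of-the-envelope calculation. It takes the ~2% success rate quoted above as the probability that a single tested design binds, and assumes, as a simplification, that designs succeed independently of each other. Designs against the same epitope are in reality correlated, so treat the numbers as rough.

```python
def expected_hits(n_tested: int, hit_rate: float = 0.02) -> float:
    """Expected number of binders among n_tested designs."""
    return n_tested * hit_rate


def prob_at_least_one_hit(n_tested: int, hit_rate: float = 0.02) -> float:
    """P(at least one binder), assuming designs succeed independently."""
    return 1.0 - (1.0 - hit_rate) ** n_tested


for n in (10, 48, 96, 384):
    print(f"{n:4d} designs tested: expected {expected_hits(n):.1f} binders, "
          f"P(at least one) = {prob_at_least_one_hit(n):.2f}")
```

For example, at 2% a set of 96 tested designs gives an expected ~1.9 binders and roughly an 86% chance of at least one. Because only a fraction of in-silico designs pass the RF2 filters, many more designs have to be generated than are eventually tested.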

### Bring your own antibody target

To provide your own target PDB, you first need to convert it into the HLT format ([described here](https://github.com/RosettaCommons/RFantibody/blob/main/README.md#hlt-file-format)).


- i) Upload your target PDB file to the TargetPDBtoHLT app on BioLib: https://biolib.com/BioITWorkshop/TargetPDBtoHLT/
- ii) Download the new HLT-formatted PDB file
- iii) Set the `target_pdb` parameter in Step 1 to the new file
- iv) Select the target epitope residues and update `hotspot_res` in Step 1 accordingly
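Before using a converted target, it can be worth checking its chain IDs. In the HLT format, H and L denote the antibody heavy and light chains and T the target, so a converted target-only file would be expected to contain chain `T` alone. The sketch below is a hypothetical helper of our own (not part of the TargetPDBtoHLT app) that lists any other chains it finds; the demo uses a made-up two-line PDB string, and `my_target_HLT.pdb` is a placeholder file name.

```python
from pathlib import Path


def chain_ids(pdb_text: str) -> set[str]:
    """Chain identifiers (PDB column 22) of all ATOM/HETATM records."""
    return {
        line[21]
        for line in pdb_text.splitlines()
        if line.startswith(("ATOM", "HETATM"))
    }


def non_target_chains(pdb_text: str) -> list[str]:
    """Chains other than T, which would suggest the conversion did not relabel everything."""
    return sorted(chain_ids(pdb_text) - {"T"})


# Made-up example: one residue on chain T and one left on chain A
TOY_PDB = """\
ATOM      1  CA  ALA T 305      10.000  10.000  10.000  1.00  0.00
ATOM      2  CA  GLY A  12      11.000  10.000  10.000  1.00  0.00
"""
print(non_target_chains(TOY_PDB))  # ['A']

# Placeholder file name: replace with the file downloaded from the app
converted = Path("my_target_HLT.pdb")
if converted.exists():
    print("Non-target chains:", non_target_chains(converted.read_text()) or "none")
```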




<img src="img/rfantibody.png" style="width:60%;">


For practical tips on antibody design, see the [practical considerations for antibody design](https://github.com/RosettaCommons/RFantibody/blob/main/README.md#practical-considerations-for-antibody-design) section of the Baker Lab's RFAntibody README.