# De Novo Protein Design Workflow using NIMs


This example notebook outlines a workflow for creating de novo protein binders using NVIDIA Inference Microservices (NIMs). This workflow leverages advanced AI models to enable computational biologists to design novel protein structures efficiently.

The input to this workflow is a protein sequence, which is then fed to AlphaFold2 for structural prediction; alternatively, this can be skipped and a precomputed protein structure (in PDB format) can be used as input. Protein backbones are then generated with RFDiffusion, sequences are generated with ProteinMPNN, and finally complex structures are predicted with AlphaFold2-multimer. 

This setup provides a powerful framework for exploring protein design, offering flexibility and precision in generating functional protein binders. For more information, refer to the respective repositories and documentation.

## Getting started with Demo NIMs

This is all performed by the AI-Factory `blueprint`.

The initial setup of the `AlphaFold NIM` data is time-consuming and requires roughly 1.2 TB of disk space.

After setup is complete, check the status of the four running NIMs, e.g. with the commands:

```bash
curl -sS http://127.0.0.1:18081/v1/health/ready
curl -sS http://127.0.0.1:18082/v1/health/ready
curl -sS http://127.0.0.1:18083/v1/health/ready
curl -sS http://127.0.0.1:18084/v1/health/ready
```

In [1]:
!curl -sS http://127.0.0.1:18081/v1/health/ready | jq .

{
  "status": "ready"
}


First, we'll install some prerequisites so our examples work.

In [2]:
! pip install requests



In [3]:
import json
import os
import requests
from enum import StrEnum, Enum
from typing import Tuple, Dict, Any, List
from pathlib import Path
from datetime import datetime

An NGC Personal Key is needed to run the examples below.

In [4]:
NVIDIA_API_KEY = os.getenv("NVIDIA_API_KEY") or input("Paste Run Key: ")

Paste Run Key:  nvapi-<redacted>


In [5]:
HEADERS = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {NVIDIA_API_KEY}",
    "poll-seconds": "900"
}

In [6]:
NIM_HOST_URL_BASE = "http://localhost"

class NIM_PORTS(Enum):
    ALPHAFOLD2_PORT = 18081
    RFDIFFUSION_PORT = 18082
    PROTEINMPNN_PORT = 18083
    AF2_MULTIMER_PORT = 18084


class NIM_ENDPOINTS(StrEnum):
    ALPHAFOLD2 = "protein-structure/alphafold2/predict-structure-from-sequence"
    RFDIFFUSION =  "biology/ipd/rfdiffusion/generate"
    PROTEINMPNN =  "biology/ipd/proteinmpnn/predict"
    AF2_MULTIMER = "protein-structure/alphafold2/multimer/predict-structure-from-sequences"

In [7]:
def query_nim(
            payload: Dict[str, Any],
            nim_endpoint: str,
            headers: Dict[str, str] = HEADERS,
            base_url: str = "http://localhost",
            nim_port: int = 8080,
            echo: bool = False) -> Tuple[int, Dict]:
    function_url = f"{base_url}:{nim_port}/{nim_endpoint}"
    if echo:
        print("*"*80)
        print(f"\tURL: {function_url}")
        print(f"\tPayload: {payload}")
        print("*"*80)
    response = requests.post(function_url,
                            json=payload,
                            headers=headers)
    if response.status_code == 200:
        return response.status_code, response.json()
    else:
        raise Exception(f"Error: response returned code [{response.status_code}], with text: {response.text}")

def check_nim_readiness(nim_port: int,
                        base_url: str = NIM_HOST_URL_BASE,
                        endpoint: str = "v1/health/ready") -> bool:
    """
    Return True if the NIM listening on `nim_port` reports status "ready".
    """
    try:
        response = requests.get(f"{base_url}:{nim_port}/{endpoint}")
        d = response.json()
        if "status" in d:
            if d["status"] == "ready":
                return True
        return False
    except Exception as e:
        print(e)
        return False

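def get_reduced_pdb(pdb_id: str, rcsb_path: str | None = None) -> str:
    """
    Return only the ATOM records of the PDB file at path `pdb_id`.
    If the file is missing and `rcsb_path` (a URL) is given, download it first.
    """
    pdb = Path(pdb_id)
    if not pdb.exists() and rcsb_path is not None:
        pdb.write_text(requests.get(rcsb_path).text)
    lines = filter(lambda line: line.startswith("ATOM"), pdb.read_text().split("\n"))
    return "\n".join(list(lines))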


In [8]:
class ExampleRequestParams:
    def __init__(self,
                target_sequence: str,
                contigs: str, 
                hotspot_res: List[str],
                input_pdb_chains: List[str],
                ca_only: bool,
                use_soluble_model: bool,
                sampling_temp: List[float],
                diffusion_steps: int = 15,
                num_seq_per_target: int = 20):
        self.target_sequence = target_sequence
        self.contigs = contigs
        self.hotspot_res = hotspot_res
        self.input_pdb_chains = input_pdb_chains
        self.ca_only = ca_only
        self.use_soluble_model = use_soluble_model
        self.sampling_temp = sampling_temp
        self.diffusion_steps = diffusion_steps
        self.num_seq_per_target = num_seq_per_target

### Example data
Below, we include three example input sets. Note that these are of varying difficulty and will exhibit different runtimes and resource utilizations.
- Example 1R42 should run on most systems with 4 GPUs with 40GB of VRAM or more.
- Example 5PTN
- Example 6VXX requires 4 GPUs with 80GB of VRAM each.

In [9]:
example_6vxx = ExampleRequestParams(
    target_sequence="MGILPSPGMPALLSLVSLLSVLLMGCVAETGTQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPFNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKVCEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVSQPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRDLPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKCTLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATRFASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFELLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTNTSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGSNVFQTRAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPSGAGSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDKNTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDLLFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDPPEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNTFVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYEQYIKGSGRENLYFQGGGGSGYIPEAPRDGQAYVRKDGEWVLLSTFLGHHHHHHHH",
    contigs="A353-410/0 100-200",
    hotspot_res=["A360","A361","A362","A366"],
    input_pdb_chains=["A"],
    ca_only=False,
    use_soluble_model=False,
    sampling_temp=[0.1],
    diffusion_steps=15,
    num_seq_per_target=20
)
example_5ptn = ExampleRequestParams(
    target_sequence="NITEEFYQSTCSAVSKGYLSALRTGWYTSVITIELSNIKKIKCNGTDAKIKLIKQELDKYKNAVTELQLLMQSTPATNNQARGSGSGRSLGFLLGVGSAIASGVAVSKVLHLEGEVNKIKSALLSTNKAVVSLSNGVSVLTSKVLDLKNYIDKQLLPIVNKQSCSIPNIETVIEFQQKNNRLLEITREFSVNAGVTTPVSTYMLTNSELLSLINDMPITNDQKKLMSNNVQIVRQQSYSIMSIIKEEVLAYVVQLPLYGVIDTPCWKLHTSPLCTTNTKEGSNICLTRTDRGWYCDNAGSVSFFPQAETCKVQSNRVFCDTMNSLTLPSEVNLCNVDIFNPKYDCKIMTSKTDVSSSVITSLGAIVSCYGKTKCTASNKNRGIIKTFSNGCDYVSNKGVDTVSVGNTLYYVNKQEGKSLYVKGEPIINFYDPLVFPSDQFDASISQVNEKINQSLAFIRKSDELLSAIGGYIPEAPRDGQAYVRKDGEWVLLSTFLGGLVPRGSHHHHHH",
    contigs="A1-25/0 70-100",
    hotspot_res=["A14","A15","A17","A18"],
    input_pdb_chains=["A"],
    ca_only=False,
    use_soluble_model=False,
    sampling_temp=[0.1],
    diffusion_steps=15,
    num_seq_per_target=20
)
example_1r42 = ExampleRequestParams(
    target_sequence="STIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQNMNNAGDKWSAFLKEQSTLAQMYPLQEIQNLTVKLQLQALQQNGSSVLSEDKSKRLNTILNTMSTIYSTGKVCNPDNPQECLLLEPGLNEIMANSLDYNERLWAWESWRSEVGKQLRPLYEEYVVLKNEMARANHYEDYGDYWRGDYEVNGVDGYDYSRGQLIEDVEHTFEEIKPLYEHLHAYVRAKLMNAYPSYISPIGCLPAHLLGDMWGRFWTNLYSLTVPFGQKPNIDVTDAMVDQAWDAQRIFKEAEKFFVSVGLPNMTQGFWENSMLTDPGNVQKAVCHPTAWDLGKGDFRILMCTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKSIGLLSPDFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKKWWEMKREIVGVVEPVPHDETYCDPASLFHVSNDYSFIRYYTRTLYQFQFQEALCQAAKHEGPLHKCDISNSTEAGQKLFNMLRLGKSEPWTLALENVVGAKNMNVRPLLNYFEPLFTWLKDQNKNSFVGWSTDWSPYAD",
    contigs="A114-353/0 50-100",
    hotspot_res=["A119","A123","A233","A234","A235"],
    input_pdb_chains=["A"],
    ca_only=False,
    use_soluble_model=False,
    sampling_temp=[0.1],
    diffusion_steps=15,
    num_seq_per_target=20
)

In [10]:
## Set the example here to switch example inputs.
## Note: Example 6vxx requires a GPU with at least 80GB of VRAM.
example = example_5ptn

### Check that the NIM is ready from Python

We can test whether each NIM is up and running using our `check_nim_readiness` function.

In [11]:
status = check_nim_readiness(NIM_PORTS.ALPHAFOLD2_PORT.value)
print(f"AlphaFold2 NIM is ready: {status}")

AlphaFold2 NIM is ready: True


In [12]:
status = check_nim_readiness(NIM_PORTS.PROTEINMPNN_PORT.value)
print(f"ProteinMPNN NIM is ready: {status}")

ProteinMPNN NIM is ready: True


In [13]:
status = check_nim_readiness(NIM_PORTS.RFDIFFUSION_PORT.value)
print(f"RFDiffusion NIM is ready: {status}")

RFDiffusion NIM is ready: True


In [14]:
status = check_nim_readiness(NIM_PORTS.AF2_MULTIMER_PORT.value)
print(f"AlphaFold2-multimer NIM is ready: {status}")

AlphaFold2-multimer NIM is ready: True
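
The four per-NIM checks above can also be folded into a single polling loop. The following is a minimal sketch under stated assumptions: `wait_for_nims` is a hypothetical helper (not part of any NIM client library), and the readiness check is passed in as a function so the example runs without live services. In this notebook one would pass `check_nim_readiness` and the values of `NIM_PORTS`. The timeout and poll interval are arbitrary choices.

```python
import time
from typing import Callable, Dict


def wait_for_nims(ports: Dict[str, int],
                  is_ready: Callable[[int], bool],
                  timeout_s: float = 600.0,
                  poll_s: float = 10.0) -> Dict[str, bool]:
    """Poll every port until all report ready or the timeout expires.

    Hypothetical helper: `is_ready` is any function that maps a port to
    True/False, for example the notebook's check_nim_readiness.
    """
    deadline = time.monotonic() + timeout_s
    status = {name: False for name in ports}
    while True:
        # Only re-check services that have not reported ready yet.
        for name, port in ports.items():
            if not status[name]:
                status[name] = is_ready(port)
        if all(status.values()) or time.monotonic() >= deadline:
            return status
        time.sleep(poll_s)


# Offline demonstration with a stub checker that pretends one NIM is still starting.
ports = {"ALPHAFOLD2": 18081, "RFDIFFUSION": 18082,
         "PROTEINMPNN": 18083, "AF2_MULTIMER": 18084}
status = wait_for_nims(ports, is_ready=lambda port: port != 18084,
                       timeout_s=0.0, poll_s=0.0)
print(status)
```

Injecting the checker keeps the loop testable and leaves the HTTP details in one place.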


### AlphaFold2

AlphaFold2 is a deep learning model for predicting protein structure from amino acid sequence that has achieved state-of-the-art performance. The NVIDIA AlphaFold2 NIM includes GPU-accelerated MMseqs2, which accelerates the MSA portion of the structural prediction pipeline.

**Inputs**:
- `sequence`: An amino acid sequence
- `algorithm`: The algorithm used for Multiple Sequence Alignment (MSA). This can be either `jackhmmer` or `mmseqs2`. MMseqs2 is significantly faster.

**Outputs**:
- A list of predicted structures in PDB format.

In [15]:
## estimated runtime: ~25 minutes for example 1R42 on 1 A6000 GPU
## 12 minutes on H100 for example 1R42
alphafold2_query = {
    "sequence" : example.target_sequence,
    "algorithm" : "mmseqs2",
}
print(f"[{datetime.now():%Y-%m-%d %H:%M:%S}] start predicted structures in PDB format")
rc, alphafold2_response = query_nim(
                                    payload=alphafold2_query,
                                    nim_endpoint=NIM_ENDPOINTS.ALPHAFOLD2.value,
                                    nim_port=NIM_PORTS.ALPHAFOLD2_PORT.value,
                                    echo=True
                                    )
print(f"[{datetime.now():%Y-%m-%d %H:%M:%S}] end predicted structures in PDB format")

[2025-11-05 11:01:40] start predicted structures in PDB format
********************************************************************************
	URL: http://localhost:18081/protein-structure/alphafold2/predict-structure-from-sequence
	Payload: {'sequence': 'NITEEFYQSTCSAVSKGYLSALRTGWYTSVITIELSNIKKIKCNGTDAKIKLIKQELDKYKNAVTELQLLMQSTPATNNQARGSGSGRSLGFLLGVGSAIASGVAVSKVLHLEGEVNKIKSALLSTNKAVVSLSNGVSVLTSKVLDLKNYIDKQLLPIVNKQSCSIPNIETVIEFQQKNNRLLEITREFSVNAGVTTPVSTYMLTNSELLSLINDMPITNDQKKLMSNNVQIVRQQSYSIMSIIKEEVLAYVVQLPLYGVIDTPCWKLHTSPLCTTNTKEGSNICLTRTDRGWYCDNAGSVSFFPQAETCKVQSNRVFCDTMNSLTLPSEVNLCNVDIFNPKYDCKIMTSKTDVSSSVITSLGAIVSCYGKTKCTASNKNRGIIKTFSNGCDYVSNKGVDTVSVGNTLYYVNKQEGKSLYVKGEPIINFYDPLVFPSDQFDASISQVNEKINQSLAFIRKSDELLSAIGGYIPEAPRDGQAYVRKDGEWVLLSTFLGGLVPRGSHHHHHH', 'algorithm': 'mmseqs2'}
********************************************************************************
[2025-11-05 11:09:45] end predicted structures in PDB format


In [16]:
## Print the first 160 characters (about two lines) of the AlphaFold2 response
alphafold2_response[0][0:160]

'ATOM      1  N   ASN A   1       0.953  22.065  -2.606  1.00 90.96           N  \nATOM      2  H   ASN A   1       0.622  22.843  -2.054  1.00 90.96           H '

### RFDiffusion

This section demonstrates how to use RFDiffusion NIM in a *de novo* protein design workflow. Inspired by AI image generation models, RFDiffusion applies generative diffusion techniques to create novel protein structures. It excels in designing complex protein architectures, including binders and symmetric assemblies, by sculpting atomic clouds into functional proteins.

**Inputs**:
- `input_pdb`: The protein target in PDB format.
- `contigs`: RFDiffusion's syntax for specifying which regions to work on. See the official [RFDiffusion repo](https://github.com/RosettaCommons/RFdiffusion?tab=readme-ov-file#running-the-diffusion-script) for a full breakdown. For example, `A20-60/0 50-100` means: generate a binder to residues 20-60 of chain A, where the binder is 50-100 residues long. The `/0` specifies a chain break.
- `hotspot_res`: Hotspot residues on the target (used specifically for binder design).
- `diffusion_steps`: The number of diffusion steps.

**Output**:
- `output_pdb`: The output PDB.
- `protein`: The input PDB.
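
To make the binder-style contig string concrete, here is a minimal sketch that splits it into its parts. `parse_binder_contig` and `BinderContig` are hypothetical names introduced for illustration; they are not part of RFDiffusion or the NIM. The sketch handles only the simple `<chain><start>-<end>/0 <min>-<max>` form described above, not the full contig grammar.

```python
import re
from typing import NamedTuple


class BinderContig(NamedTuple):
    chain: str       # target chain letter, e.g. "A"
    start: int       # first target residue kept
    end: int         # last target residue kept
    binder_min: int  # minimum binder length
    binder_max: int  # maximum binder length


# Matches strings such as "A20-60/0 50-100" (assumed simple binder form only).
_PATTERN = re.compile(r"^([A-Za-z])(\d+)-(\d+)/0\s+(\d+)-(\d+)$")


def parse_binder_contig(contigs: str) -> BinderContig:
    """Parse a simple binder contig string; raise ValueError for anything else."""
    match = _PATTERN.match(contigs.strip())
    if match is None:
        raise ValueError(f"Unsupported contig string: {contigs!r}")
    chain, start, end, bmin, bmax = match.groups()
    parsed = BinderContig(chain, int(start), int(end), int(bmin), int(bmax))
    if parsed.start > parsed.end or parsed.binder_min > parsed.binder_max:
        raise ValueError(f"Ranges must be ascending: {contigs!r}")
    return parsed


print(parse_binder_contig("A20-60/0 50-100"))
```

A check like this can catch malformed ranges before a request is sent. More general contig strings (for example, scaffolding layouts with several segments) are deliberately rejected by this sketch.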

In [17]:
## Expected runtime: ~15 seconds to 1 minute
## H100 runtime: 9 seconds
rfdiffusion_query = {
        "input_pdb" : alphafold2_response[0], ## Take the first structure prediction (of 5) from AlphaFold2
        "contigs" : "51-51/A163-181/60-60", #example.contigs
        # "hotspot_res" : example.hotspot_res,
        "diffusion_steps" : example.diffusion_steps
    }
print(f"[{datetime.now():%Y-%m-%d %H:%M:%S}] start predicted of novel protein structures in PDB format")
rc, rfdiffusion_response = query_nim(
    payload=rfdiffusion_query,
    nim_endpoint=NIM_ENDPOINTS.RFDIFFUSION.value,
    nim_port=NIM_PORTS.RFDIFFUSION_PORT.value
)
print(f"[{datetime.now():%Y-%m-%d %H:%M:%S}] end predicted structures in PDB format")

[2025-11-05 11:09:45] start predicted of novel protein structures in PDB format
[2025-11-05 11:09:47] end predicted structures in PDB format


In [18]:
## Print the first 160 characters of the RFDiffusion PDB output
print(rfdiffusion_response["output_pdb"][0:160])

ATOM      1  N   GLY A   1       0.634 -18.891 -11.516  1.00  0.00
ATOM      2  CA  GLY A   1      -0.091 -19.811 -12.385  1.00  0.00
ATOM      3  C   GLY A   1


### ProteinMPNN
ProteinMPNN (Protein Message Passing Neural Network) is a deep learning-based graph neural network used in *de novo* protein design workflows. It predicts amino acid sequences for given protein backbones, leveraging evolutionary, functional, and structural information to generate sequences that are likely to fold into the desired 3D structures. As a NIM, it slots into workflows that use RFDiffusion for backbone generation and AlphaFold2-Multimer for interaction prediction.

**Inputs**:
- `input_pdb`: The input protein structure for which amino acid sequences are to be predicted.
- `ca_only`: Defaults to false. When true, a CA-only model is used, for design needs that focus on the alpha carbon (CA) atoms.
- `use_soluble_model`: ProteinMPNN offers soluble models for applications requiring high solubility, and non-soluble models for membrane protein studies and industrial applications where solubility is less critical.
- `num_seq_per_target`: How many sequences to generate for a given target protein structure.
- `sampling_temp`: Ranges from 0 to 1 and controls the diversity of design outcomes by adjusting the probability values for the 20 amino acids at each sequence position. Higher values increase diversity.

**Outputs**:
- `ProteinMPNN.fa`: A FASTA file containing the generated sequences for the given structure (read from the `mfasta` field of the response in the code below).
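
The effect of a sampling temperature can be illustrated with a generic temperature-scaled softmax. This is a sketch of the general technique, not ProteinMPNN's internal code; the function name and the scores for three candidate residues are made up for illustration.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate residues
for t in (0.1, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

At a low temperature nearly all probability mass sits on the top-scoring residue, so repeated sampling returns similar sequences. At a higher temperature the alternatives are drawn more often.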

In [19]:
## Expected runtime: < 30 seconds for 20 short sequences
## H100 Runtime: 8 seconds
proteinmpnn_query = {
        "input_pdb" : rfdiffusion_response["output_pdb"],
        "input_pdb_chains" : example.input_pdb_chains,
        "ca_only" : example.ca_only,
        "use_soluble_model" : example.use_soluble_model,
        "num_seq_per_target" : example.num_seq_per_target,
        "sampling_temp" : example.sampling_temp
}

rc, proteinmpnn_response = query_nim(
    payload=proteinmpnn_query,
    nim_endpoint=NIM_ENDPOINTS.PROTEINMPNN.value,
    nim_port=NIM_PORTS.PROTEINMPNN_PORT.value
)

In the next step, we'll extract FASTA sequences from the output FASTA file created by ProteinMPNN. Then, we'll create binder-target pairs that we can feed to AlphaFold2-Multimer to predict the binder-target complex structure.

In [20]:
fasta_sequences = [x.strip() for x in proteinmpnn_response["mfasta"].split("\n") if '>' not in x][2:]

binder_target_pairs = [[binder, example.target_sequence] for binder in fasta_sequences]

print(f"Generated {len(fasta_sequences)} FASTA sequences and {len(binder_target_pairs)} binder-target pairs.")

Generated 20 FASTA sequences and 20 binder-target pairs.
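
The slice-based extraction above depends on the exact line layout of the `mfasta` string. A more defensive sketch pairs each header with its sequence before selecting designs. `parse_fasta` is a hypothetical helper, the sample string is made up, and the assumption that the first record describes the input backbone (with the remaining records being designs) has not been verified against the NIM output.

```python
from typing import List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs; sequences may span several lines."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines, including a trailing one
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up example text, not real ProteinMPNN output.
sample = ">input, score=1.5\nGGGG\n>T=0.1, sample=1\nMKVL\n>T=0.1, sample=2\nMRIL\n"
records = parse_fasta(sample)
designs = [seq for _, seq in records[1:]]  # assumption: record 0 is the input backbone
print(designs)
```

Keeping the headers also makes it possible to inspect any per-design metadata they carry before choosing which sequences to fold.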


### AlphaFold2-Multimer

AlphaFold2-Multimer is a deep learning model that extends the AlphaFold2 pipeline to predict the combined structure of a list of input peptide sequences.

**Inputs**:

- `sequences`: A list of peptide sequences. For this use case, this is a single pair of sequences (one peptide chain from the ProteinMPNN result plus the original protein sequence used as input to this workflow).
- `algorithm`: The algorithm used for Multiple Sequence Alignment (MSA). This can be either `jackhmmer` or `mmseqs2`. MMseqs2 is significantly faster.

**Output**:

- A list of lists of predicted structures in PDB format. A list of five predictions is returned for each input binder-target pair.

In [21]:
## Expected runtime: 20 min per binder-target pair.
## Total runtime: roughly 3 hours
n_processed = 0
multimer_response_codes = [0 for i in binder_target_pairs]
multimer_results = [None for i in binder_target_pairs]

## NOTE: change this value to process more or fewer target-binder pairs.
pairs_to_process = 1

for binder_target_pair in binder_target_pairs:
    multimer_query = {
        "sequences" : binder_target_pair,
        "selected_models" : [1]
    }
    print(f"[{datetime.now():%Y-%m-%d %H:%M:%S}] Processing pair number {n_processed+1} of {len(binder_target_pairs)}")
    rc, multimer_response = query_nim(
        payload=multimer_query,
        nim_endpoint=NIM_ENDPOINTS.AF2_MULTIMER.value,
        nim_port=NIM_PORTS.AF2_MULTIMER_PORT.value
    )
    multimer_response_codes[n_processed] = rc
    multimer_results[n_processed] = multimer_response
    print(f"[{datetime.now():%Y-%m-%d %H:%M:%S}] Finished binder-target pair number {n_processed+1} of {len(binder_target_pairs)}")
    n_processed += 1
    if n_processed >= pairs_to_process:
        break

[2025-11-05 11:09:52] Processing pair number 1 of 20


KeyboardInterrupt: 

In [None]:
## Print just the first 160 characters of the first multimer response
result_idx = 0
prediction_idx = 0
print(multimer_results[result_idx][prediction_idx][0:160])

### Assessing the predicted binders and structures

There are many metrics that can be used to assess the quality of the predicted binder-target structure. The predicted local distance difference test (pLDDT) is a measure of per-residue confidence in the local structure. It has a range of zero to one hundred, with higher scores considered more accurate.

The following snippet ranks the results of the binder-target pair AlphaFold2-Multimer predictions by their pLDDT.

In [None]:
# Function to calculate the average pLDDT over all residues (one CA atom per residue)
def calculate_average_pLDDT(pdb_string):
    total_pLDDT = 0.0
    atom_count = 0
    pdb_lines = pdb_string.splitlines()
    for line in pdb_lines:
        # PDB atom records start with "ATOM"
        if line.startswith("ATOM"):
            atom_name = line[12:16].strip() # Extract atom name
            if atom_name == "CA":  # Only consider atoms with name "CA"
                try:
                    # Extract the B-factor value from columns 61-66 (following PDB format specifications)
                    pLDDT = float(line[60:66].strip())
                    total_pLDDT += pLDDT
                    atom_count += 1
                except ValueError:
                    pass  # Skip lines where B-factor can't be parsed as a float

    if atom_count == 0:
        return 0.0  # Return 0 if no CA atoms were found

    average_pLDDT = total_pLDDT / atom_count
    return average_pLDDT
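
For binder design it can also be useful to look at the confidence of each chain separately, rather than only the average over the whole complex. The following is a minimal sketch, assuming (as the function above does) that pLDDT is stored in the B-factor column and that the chain identifier sits in column 22 of each ATOM record. `per_chain_plddt` and `make_ca_line` are hypothetical helpers, the sample records are synthetic, and which chain letter corresponds to the binder is not specified by this notebook.

```python
from collections import defaultdict
from typing import Dict


def per_chain_plddt(pdb_string: str) -> Dict[str, float]:
    """Average the B-factor (pLDDT) of CA atoms separately for each chain."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for line in pdb_string.splitlines():
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        try:
            value = float(line[60:66])  # B-factor, columns 61-66
        except ValueError:
            continue  # skip records whose B-factor cannot be parsed
        chain = line[21]  # chain identifier, column 22
        totals[chain] += value
        counts[chain] += 1
    return {chain: totals[chain] / counts[chain] for chain in totals}


def make_ca_line(serial: int, chain: str, resseq: int, plddt: float) -> str:
    """Build a synthetic, column-aligned CA ATOM record for the demonstration."""
    return (f"ATOM  {serial:>5}  CA  GLY {chain}{resseq:>4}    "
            f"{0.0:>8.3f}{0.0:>8.3f}{0.0:>8.3f}{1.0:>6.2f}{plddt:>6.2f}")


sample_pdb = "\n".join([
    make_ca_line(1, "A", 1, 90.0),
    make_ca_line(2, "A", 2, 80.0),
    make_ca_line(3, "B", 1, 60.0),
])
print(per_chain_plddt(sample_pdb))
```

The synthetic records are built with fixed-width formatting so that the column slices line up with the PDB layout.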


In [None]:
plddts = []
for idx in range(0, len(multimer_results)):
    if multimer_results[idx] is not None:
        plddts.append(calculate_average_pLDDT(multimer_results[idx][0]))

In [None]:
## Combine the results with their pLDDTs.
## Only processed pairs have a result; they are the first len(plddts) entries, so zip() stays aligned.
binder_target_results = list(zip(binder_target_pairs, multimer_results, plddts))

## Sort the results by pLDDT, highest first
sorted_binder_target_results = sorted(binder_target_results, key=lambda x : x[2], reverse=True)

## Print the top 5 results (or fewer, if fewer pairs were processed)
for i in range(0, min(5, len(sorted_binder_target_results))):
    print("-"*80)
    print(f"rank: {i}")
    print(f"binder: {sorted_binder_target_results[i][0][0]}")
    print(f"target: {sorted_binder_target_results[i][0][1]}")
    print(f"pLDDT: {sorted_binder_target_results[i][2]}")
    print("-"*80)

These are the binder sequences whose predicted binder-target complexes have the highest average pLDDT.