
# Co-folding with Boltz

*Disclaimer: This topic is a very active area of research and thus prone to quick changes. The Notebook was last edited on 30.06.2025*

---
### In this lesson you'll learn:

- how to predict a protein structure from the amino acid sequence.
- how to align two structures in Python and calculate the RMSD between them.
- how to predict the structure of a protein-ligand complex from sequence and SMILES.
- about current limitations of co-folding methods.

---

This notebook is about protein folding, and especially *co-folding*: not just predicting the 3D protein structure from its sequence, but also placing a ligand in the correct binding pocket with the correct conformation.

While protein folding was largely solved by [Alphafold2](https://doi.org/10.1038/s41586-021-03819-2), newer models aim to predict the structure of protein-ligand complexes from the protein sequence and the ligand's SMILES string. Traditionally, the binding conformation of a ligand is found with docking software such as [gnina](https://github.com/gnina/gnina), [GOLD](https://www.ccdc.cam.ac.uk/solutions/software/gold/) or [Glide](https://www.schrodinger.com/platform/products/glide/).

Google's [Alphafold3](https://github.com/google-deepmind/alphafold3) is still one of the top-performing co-folding models, but in this notebook we'll use [Boltz](https://github.com/jwohlwend/boltz), specifically Boltz2. As of writing, it is the newest co-folding model and even offers binding affinity estimation, which we'll explore at the end of the notebook.

---

## Installation, Google Colab and the Command Line

**Optional Boltz Install**: To complete this notebook, it is not necessary to install Boltz, but we give the option to do so. Another great resource to try out Boltz in Google Colab is [this Notebook](https://colab.research.google.com/github/kimjc95/computational-chemistry/blob/main/Boltz_on_Colab.ipynb).

**Local install**: This notebook is designed to run in Google Colab, and the following cells install Boltz and its dependencies. If you want to run Boltz locally (recommended if you have a strong GPU), please follow the instructions [here]() and install it on Linux or in WSL. Then only execute the shell commands (the lines starting with `!`, e.g. `!boltz predict example_file.fasta`). Note that you may need to adjust the file paths.

**Google Colab**: Make sure you are connected to a runtime with GPU support (go to the upper right corner, open the drop-down menu, select "change runtime type" and make sure a Hardware Accelerator other than CPU is selected).

**Command Line**: As Boltz is a command line program, we will use a few Linux commands instead of typical Python code. You can recognize these commands by the leading `!`, which tells the IPython/Jupyter kernel to pass the command to the underlying shell instead of executing it as Python.
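The `!` prefix is IPython/Jupyter syntax and is not valid in a plain Python script. If you later move these steps into a script, a minimal sketch of the same idea with the standard-library `subprocess` module could look like this (`run_command` is our own helper name, not part of Boltz):

```python
import subprocess
import sys


def run_command(cmd: list[str]) -> str:
    """Run an external program and return its standard output.

    check=True raises CalledProcessError on a non-zero exit code,
    so a failed command is not silently ignored.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# Harmless demo: call the current Python interpreter as an external program.
print(run_command([sys.executable, "-c", "print('hello from a subprocess')"]))

# In a script, `!boltz predict peptide.fasta` would become (requires Boltz to be installed):
# run_command(["boltz", "predict", "peptide.fasta"])
```

Passing the command as a list (instead of one string with `shell=True`) avoids shell quoting problems, e.g. with SMILES strings that contain parentheses.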

---

In [None]:
# Install py3Dmol for viewing
!pip install py3Dmol

## (Optional) Run an actual Boltz Prediction

In [None]:
# Installation (for Google Colab only)
# this code was adapted from [Joo-Chan Kim](https://zenodo.org/records/14881401)

import os
import subprocess

print('Installing dependencies (estimate: 2min) ... ', end='')
dependencies = "torch torchvision torchaudio numpy hydra-core pytorch-lightning "
dependencies += "rdkit dm-tree requests pandas types-requests einops einx fairscale "
dependencies += "mashumaro modelcif wandb click pyyaml biopython scipy numba gemmi "
dependencies += "scikit-learn chembl_structure_pipeline "
dependencies += "cuequivariance_ops_cu12 cuequivariance_ops_torch_cu12 cuequivariance_torch"

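# Boltz uses bfloat16 mixed precision by default; the free Colab GPUs (e.g. the T4) likely lack
# native bfloat16 support, so we switch to full 32-bit precision instead
precision = '32-true'

subprocess.run("pip install ipywidgets torch torchvision torchaudio", shell=True)
# download the Boltz source code
subprocess.run("git clone https://github.com/jwohlwend/boltz.git", shell=True)
# patch the precision setting in the Boltz source code
subprocess.run(f"sed -i 's/bf16-mixed/{precision}/g' /content/boltz/src/boltz/main.py", shell=True)
subprocess.run(f"pip install {dependencies}", shell=True)
# install Boltz itself (editable mode) without touching the dependencies installed above
subprocess.run("cd boltz; pip install --no-deps -e .", shell=True)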

print('done.')

Now we'll create a very simple FASTA file using a Linux command:

In [None]:
# `echo` is the Linux equivalent of `print()`, and `>` writes the output to a file
!echo -e ">A|protein|empty\nAAAA\n" > peptide.fasta

In [None]:
# we can have a look at the file with `cat`
!cat peptide.fasta

Now we can use `boltz predict peptide.fasta` to predict the structure of this small peptide.
Note that we used the keyword `empty` in our `.fasta` file, which tells Boltz to skip the multiple sequence alignment (MSA). This leads to much worse predictions, because the model gets no evolutionary information from related sequences.
Normally, we would add the `--use_msa_server` flag to generate the MSA with an external server (or supply our own `.a3m` file).

In [None]:
!boltz predict peptide.fasta

The output is saved in the folder `boltz_results_peptide`, which you can browse with Google Colab's file browser (the folder icon in the left sidebar). The predicted 3D structure can be visualized with py3Dmol.
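Besides the structure, Boltz writes per-model confidence scores into the same folder. A minimal sketch for loading them, assuming the file name pattern `confidence_<name>_model_<n>.json` described in the Boltz prediction docs at the time of writing (check the actual contents of your `predictions` folder; `load_confidence` is a hypothetical helper):

```python
import json
from pathlib import Path


def load_confidence(results_dir: str, name: str, model: int = 0) -> dict:
    """Load the confidence summary of one predicted model.

    Assumed layout: <results_dir>/predictions/<name>/confidence_<name>_model_<model>.json
    """
    path = Path(results_dir) / "predictions" / name / f"confidence_{name}_model_{model}.json"
    with path.open() as fh:
        return json.load(fh)


# Usage after running the prediction above:
# scores = load_confidence("boltz_results_peptide", "peptide")
# print(scores.keys())
# print(scores.get("confidence_score"), scores.get("complex_plddt"))
```

Key names such as `confidence_score` or `complex_plddt` are taken from the Boltz docs; printing `scores.keys()` shows what your Boltz version actually writes.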

In [None]:
import py3Dmol 
fasta_name = 'peptide'

with open(f"boltz_results_{fasta_name}/predictions/{fasta_name}/{fasta_name}_model_0.cif") as ifile:
    system = "".join([x for x in ifile])

view = py3Dmol.view(width=400, height=300)
view.addModelsAsFrames(system)
view.setStyle({'model': -1}, {"cartoon": {'color': 'spectrum'}})
view.zoomTo()
view.show()

There are many [more settings](https://github.com/jwohlwend/boltz/blob/main/docs/prediction.md) for running Boltz; this section is only meant as a starting point for co-folding. You can also try different sequence lengths, but be aware that longer sequences take longer to compute and will eventually exhaust the GPU memory.
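If you call Boltz from Python instead of a `!` cell, it can help to assemble the command as a list. A small sketch, assuming the flag names `--use_msa_server`, `--diffusion_samples`, `--recycling_steps`, `--sampling_steps` and `--out_dir` as listed in the Boltz prediction docs (verify with `boltz predict --help`); options left as `None` are omitted, so Boltz falls back to its own defaults. The helper name is hypothetical:

```python
from typing import Optional


def boltz_predict_command(input_file: str,
                          use_msa_server: bool = True,
                          diffusion_samples: Optional[int] = None,
                          recycling_steps: Optional[int] = None,
                          sampling_steps: Optional[int] = None,
                          out_dir: Optional[str] = None) -> list[str]:
    """Assemble a `boltz predict` command line as a list of arguments."""
    cmd = ["boltz", "predict", input_file]
    if use_msa_server:
        cmd.append("--use_msa_server")
    # only pass options that were explicitly set; dict order keeps the output stable
    options = {
        "--diffusion_samples": diffusion_samples,
        "--recycling_steps": recycling_steps,
        "--sampling_steps": sampling_steps,
        "--out_dir": out_dir,
    }
    for flag, value in options.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


print(" ".join(boltz_predict_command("p31749.fasta", diffusion_samples=5)))
# boltz predict p31749.fasta --use_msa_server --diffusion_samples 5
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`.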

## Comparing crystal structures and predicted structures

While it is possible to run the following examples in Google Colab, the free GPUs are rather slow and may run out of memory for larger sequences. We therefore ran the predictions locally, so you don't have to wait for the models. We still show the code used to generate the structures; feel free to run it yourself (in Google Colab if you have the time, or locally). Alternatively, you can generate structures with the [Alphafold-Server](https://alphafoldserver.com/welcome) or a local install of [Alphafold3](https://github.com/google-deepmind/alphafold3). Note that for these options you will have to consult their documentation on how to prepare the input files.

We'll have a look at the **AKT1 kinase**. Kinases are abundant in the PDB, so (co-)folding models should be able to produce a correct structure easily.

First, let's have a look at the PDB entry. We'll start with [`3MVH`](https://www.rcsb.org/structure/3MVH), which already has an orthosteric inhibitor bound:

In [42]:
import py3Dmol

view = py3Dmol.view(width=400, height=300, query="pdb:3mvh")
view.setStyle({'cartoon':{'color':'spectrum'}})

view.show()


To predict the structure, we first need the protein sequence. We could use the sequence from the PDB structure above, but it might be truncated or mutated (e.g. to ease crystallization), so we'll take the sequence from the [UniProt](https://www.uniprot.org/) database instead (ID: P31749):

```
>sp|P31749|AKT1_HUMAN RAC-alpha serine/threonine-protein kinase OS=Homo sapiens OX=9606 GN=AKT1 PE=1 SV=2
MSDVAIVKEGWLHKRGEYIKTWRPRYFLLKNDGTFIGYKERPQDVDQREAPLNNFSVAQC
QLMKTERPRPNTFIIRCLQWTTVIERTFHVETPEEREEWTTAIQTVADGLKKQEEEEMDF
RSGSPSDNSGAEEMEVSLAKPKHRVTMNEFEYLKLLGKGTFGKVILVKEKATGRYYAMKI
LKKEVIVAKDEVAHTLTENRVLQNSRHPFLTALKYSFQTHDRLCFVMEYANGGELFFHLS
RERVFSEDRARFYGAEIVSALDYLHSEKNVVYRDLKLENLMLDKDGHIKITDFGLCKEGI
KDGATMKTFCGTPEYLAPEVLEDNDYGRAVDWWGLGVVMYEMMCGRLPFYNQDHEKLFEL
ILMEEIRFPRTLGPEAKSLLSGLLKKDPKQRLGGGSEDAKEIMQHRFFAGIVWQHVYEKK
LSPPFKPQVTSETDTRYFDEEFTAQMITITPPDQDDSMECVDSERRPHFPQFSYSASGTA
```
(from https://rest.uniprot.org/uniprotkb/P31749.fasta)

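We'll quickly modify the FASTA header, because Boltz expects headers of the form `>CHAIN_ID|ENTITY_TYPE|MSA_PATH`: the chain ID, the entity type and an optional path to an `.a3m` alignment file (which we leave empty here, since the MSA will be generated with `--use_msa_server`):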
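Writing these headers by hand is error-prone once several chains or ligands are involved. A minimal sketch of a helper that renders entries in the `>CHAIN_ID|ENTITY_TYPE|MSA_PATH` header format (entity types `protein`, `dna`, `rna`, `ccd` and `smiles`, as described in the Boltz prediction docs); `BoltzEntry` and `to_boltz_fasta` are our own hypothetical names:

```python
from typing import NamedTuple, Optional


class BoltzEntry(NamedTuple):
    chain_id: str                   # e.g. "A"
    entity_type: str                # "protein", "dna", "rna", "ccd" or "smiles"
    content: str                    # sequence, CCD code or SMILES string
    msa_path: Optional[str] = None  # protein only: path to an .a3m file, "empty", or None to omit


def to_boltz_fasta(entries: list[BoltzEntry]) -> str:
    """Render the entries as FASTA text with Boltz-style headers."""
    lines = []
    for entry in entries:
        header = f">{entry.chain_id}|{entry.entity_type}"
        if entry.msa_path is not None:
            header += f"|{entry.msa_path}"
        lines += [header, entry.content]
    return "\n".join(lines) + "\n"


print(to_boltz_fasta([
    BoltzEntry("A", "protein", "AAAA", "empty"),  # single-sequence mode, as in peptide.fasta
    BoltzEntry("B", "ccd", "WFE"),                # ligand given by its CCD code
]))
```

Writing the returned string to a file gives the same kind of input as the FASTA cells in this notebook.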


In [30]:
with open('p31749.fasta', 'w') as f:
    f.write(
"""
>A|protein|
MSDVAIVKEGWLHKRGEYIKTWRPRYFLLKNDGTFIGYKERPQDVDQREAPLNNFSVAQC
QLMKTERPRPNTFIIRCLQWTTVIERTFHVETPEEREEWTTAIQTVADGLKKQEEEEMDF
RSGSPSDNSGAEEMEVSLAKPKHRVTMNEFEYLKLLGKGTFGKVILVKEKATGRYYAMKI
LKKEVIVAKDEVAHTLTENRVLQNSRHPFLTALKYSFQTHDRLCFVMEYANGGELFFHLS
RERVFSEDRARFYGAEIVSALDYLHSEKNVVYRDLKLENLMLDKDGHIKITDFGLCKEGI
KDGATMKTFCGTPEYLAPEVLEDNDYGRAVDWWGLGVVMYEMMCGRLPFYNQDHEKLFEL
ILMEEIRFPRTLGPEAKSLLSGLLKKDPKQRLGGGSEDAKEIMQHRFFAGIVWQHVYEKK
LSPPFKPQVTSETDTRYFDEEFTAQMITITPPDQDDSMECVDSERRPHFPQFSYSASGTA
"""
    )


With this, we can run a Boltz prediction. If you want to run the prediction yourself, uncomment the first line and comment out the rest of the next cell.

In [None]:
# !boltz predict p31749.fasta --use_msa_server
!wget -O boltz_results_p31749.zip https://uni-muenster.sciebo.de/s/qyFes2eQQApmroC/download
!unzip -o boltz_results_p31749.zip
!rm boltz_results_p31749.zip

We can have a look at the predicted fold and compare it to the crystal structure:

In [7]:
name = 'p31749'
structure_path = f"boltz_results_{name}/predictions/{name}/{name}_model_0.cif"

with open(structure_path) as ifile:
    system = "".join([x for x in ifile])

view = py3Dmol.view(width=800, height=600, query='pdb:3mvh', linked=True)
view.addModelsAsFrames(system)
view.setStyle({'model': -1}, {"cartoon": {'color': 'red'}}) # our prediction
view.setStyle({'model': -2}, {"cartoon": {'color': 'green'}}) # PDB:3mvh
view.zoomTo()
view.show()

The predicted structure (red) does look like a sensible protein, with some alpha helices and beta sheets. However, since the two structures aren't aligned, we can't really tell how good the prediction is. So that is what we'll tackle next.

### Aligning Structures in Python

We'll use the `biopython` package to align two structures.
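`CEAligner` takes care of two steps for us: it finds which residues correspond (CE stands for combinatorial extension) and then superimposes them. To make the resulting number less of a black box, here is a minimal sketch of the superposition step alone, the Kabsch algorithm, followed by $\mathrm{RMSD} = \sqrt{\tfrac{1}{N}\sum_{i=1}^{N} \lVert R\,p_i + t - q_i \rVert^2}$. It assumes the atom correspondence is already known (rows of `P` and `Q` match one-to-one) and uses NumPy; it illustrates the idea and is not a reimplementation of `CEAligner`:

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition of P onto Q."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # 1. remove the translation by centring both sets on their centroids
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)
    # 2. the SVD of the 3x3 covariance matrix yields the optimal rotation
    H = P_c.T @ Q_c
    U, _, Vt = np.linalg.svd(H)
    # 3. flip the last axis if needed so that R is a proper rotation (det = +1), not a reflection
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # 4. rotate P onto Q and compute the root-mean-square deviation
    diff = P_c @ R.T - Q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Demo: a randomly rotated and shifted copy of a point cloud superimposes perfectly.
rng = np.random.default_rng(0)
Q = rng.normal(size=(20, 3))
angle = 0.8
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle), np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
P = Q @ Rz.T + np.array([5.0, -2.0, 1.0])
print(round(kabsch_rmsd(P, Q), 6))
# 0.0
```

Any remaining RMSD after this step reflects genuine differences in shape, which is what the value printed by `CEAligner` below measures for the matched residues.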

In [None]:
# download the crystal structure for alignment
!curl https://files.rcsb.org/download/3MVH.cif > 3mvh.cif

In [8]:
from Bio.PDB import MMCIFParser, PDBIO
from Bio.PDB.cealign import CEAligner

# Load the structures
parser = MMCIFParser(QUIET=True)
struct_3mvh = parser.get_structure("3mvh", "3mvh.cif")
struct_folded_p31749 = parser.get_structure("p31749_folded", structure_path)

# Align using biopython
aligner = CEAligner()
aligner.set_reference(struct_3mvh)
aligner.align(struct_folded_p31749) # this will change the coordinates

print(f"RMSD: {aligner.rms:.4f}")

# we save the aligned structure to show it easily with py3Dmol
io = PDBIO()
io.set_structure(struct_folded_p31749)
io.save('p31749_aligned.pdb')

RMSD: 5.5372


In [41]:
with open("p31749_aligned.pdb") as f:
    predicted = "".join([x for x in f])

view = py3Dmol.view(width=400, height=300, query='pdb:3mvh', linked=True)
view.addModelsAsFrames(predicted)
view.setStyle({'model': -1}, {"cartoon": {'color': 'red'}}) # our prediction
view.setStyle({'model': -2}, {"cartoon": {'color': 'green'}}) # PDB:3mvh
view.zoomTo()
view.show()

We now see that the fold is largely correct. The RMSD is fairly high, but the two structures cover different parts of the sequence (the crystal structure only contains part of the UniProt sequence), so some deviation is expected. Some of the outer loops of our prediction don't show well-defined secondary structure. Since we don't have a crystal structure to compare these regions against, we can't be sure whether this is an artifact of the prediction or whether these are simply flexible regions that would also be difficult to resolve in a crystal structure. We could experiment with the prediction settings to try to get a better structure (generating more samples, using more sampling steps, etc.), but for this exercise we'll stick with the basic prediction.
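To see which part of the UniProt sequence a crystallized construct actually covers (and whether it carries tags or point mutations), a crude ungapped identity scan is often enough. This is a simplified stand-in for a real pairwise alignment (Biopython's `Bio.Align.PairwiseAligner` would be the proper tool), and all function names are hypothetical:

```python
def read_sequence(path: str) -> str:
    """Concatenate all non-header lines of a single-entry FASTA file."""
    with open(path) as fh:
        return "".join(line.strip() for line in fh if not line.startswith(">"))


def best_ungapped_offset(construct: str, reference: str) -> tuple[int, int]:
    """Slide `construct` along `reference` without gaps.

    Returns (offset, n_identical) for the offset with the most identical residues;
    offset is the 0-based index in `reference` where the construct starts.
    """
    best_offset, best_matches = 0, -1
    for offset in range(-len(construct) + 1, len(reference)):
        matches = sum(
            1
            for i, aa in enumerate(construct)
            if 0 <= offset + i < len(reference) and reference[offset + i] == aa
        )
        if matches > best_matches:
            best_offset, best_matches = offset, matches
    return best_offset, best_matches


def point_differences(construct: str, reference: str, offset: int) -> list[tuple[int, str, str]]:
    """List (1-based reference position, reference residue, construct residue) for each mismatch."""
    return [
        (offset + i + 1, reference[offset + i], aa)
        for i, aa in enumerate(construct)
        if 0 <= offset + i < len(reference) and reference[offset + i] != aa
    ]


# Toy example (not real data): a fragment with one point mutation, E -> D
reference = "MSDVAIVKEGWLHKRGEYIK"
construct = "AIVKDGWLHK"
offset, n_identical = best_ungapped_offset(construct, reference)
print(offset, n_identical, point_differences(construct, reference, offset))
# 4 9 [(9, 'E', 'D')]

# With the FASTA files written in this notebook:
# construct, reference = read_sequence("3mvh.fasta"), read_sequence("p31749.fasta")
# offset, n_identical = best_ungapped_offset(construct, reference)
# print(offset, point_differences(construct, reference, offset))
```

Because the scan allows no gaps, it only gives sensible results when the construct is a contiguous, mostly identical stretch of the reference, as is typical for crystallization constructs.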

We can also predict the structure for the exact sequence used in the crystal structure and align it:

In [None]:
with open("3mvh.fasta", "w") as f: # from https://www.rcsb.org/fasta/entry/3MVH
    f.write("""
>A|protein
GAMDPRVTMNEFEYLKLLGKGTFGKVILVKEKATGRYYAMKILKKEVIVAKDEVAHTLTENRVLQNSRHPFLTALKYSFQ
THDRLCFVMEYANGGELFFHLSRERVFSEDRARFYGAEIVSALDYLHSEKNVVYRDLKLENLMLDKDGHIKITDFGLCKE
GIKDGATMKTFCGTPEYLAPEVLEDNDYGRAVDWWGLGVVMYEMMCGRLPFYNQDHEKLFELILMEEIRFPRTLGPEAKS
LLSGLLKKDPKQRLGGGSEDAKEIMQHRFFAGIVWQHVYEKKLSPPFKPQVTSETDTRYFDEEFTAQMITITPPDQDDSM
ECVDSERRPHFPQFDYSASSTA
""")

# !boltz predict 3mvh.fasta --use_msa_server
!wget -O boltz_results_3mvh.zip https://uni-muenster.sciebo.de/s/WxzfcnrtiKE4HyP/download
!unzip -o boltz_results_3mvh.zip
!rm boltz_results_3mvh.zip

In [40]:
name = '3mvh'
path_3mvh_folded = f"boltz_results_{name}/predictions/{name}/{name}_model_0.cif"

# Load the structures
parser = MMCIFParser(QUIET=True)
struct_3mvh = parser.get_structure("3mvh", "3mvh.cif")
struct_folded_3mvh = parser.get_structure("3mvh_folded", path_3mvh_folded)

# Align using biopython
aligner = CEAligner()
aligner.set_reference(struct_3mvh)
aligner.align(struct_folded_3mvh) # this will change the coordinates

print(f"RMSD: {aligner.rms:.4f}")
io = PDBIO()
io.set_structure(struct_folded_3mvh)
io.save('3mvh_aligned.pdb')

with open("3mvh_aligned.pdb") as f:
    predicted = "".join([x for x in f])

view = py3Dmol.view(width=400, height=300, query='pdb:3mvh', linked=True)
view.addModelsAsFrames(predicted)
view.setStyle({'model': -1}, {"cartoon": {'color': 'red'}}) # our prediction
view.setStyle({'model': -2}, {"cartoon": {'color': 'green'}}) # PDB:3mvh
view.zoomTo()
view.show()

RMSD: 0.6096


Here we see an almost perfect alignment. Since the structure is from 2010, it is most likely part of the model's training data, so good performance is expected.

## Co-folding

Newer developments of these models have led to better DNA/RNA co-folding (which we will not cover here) and to protein-ligand co-folding. The crystal structure we have been working with so far (3MVH) contains a ligand bound to the orthosteric pocket.

In [39]:
view = py3Dmol.view(width=400, height=300, query='pdb:3mvh', linked=True)
view.setStyle({"cartoon": {'color': 'green'}})
view.setStyle({'resn': 'WFE'}, {"stick": {'color': 'red'}})
view.zoomTo()
view.show()

However, so far we have only predicted the protein structure, without any ligand present. Now we'll modify our input `.fasta` file to also include the ligand, again using the longer sequence from the UniProt database. We could provide the ligand as a SMILES string, or simply use its code from the CCD (Chemical Component Dictionary), the short identifier the PDB uses for ligands (here `WFE`).

In [None]:
with open("p31749_ligand.fasta", "w") as f: # protein sequence from UniProt (P31749), ligand given by its CCD code
    f.write("""
>A|protein|
MSDVAIVKEGWLHKRGEYIKTWRPRYFLLKNDGTFIGYKERPQDVDQREAPLNNFSVAQC
QLMKTERPRPNTFIIRCLQWTTVIERTFHVETPEEREEWTTAIQTVADGLKKQEEEEMDF
RSGSPSDNSGAEEMEVSLAKPKHRVTMNEFEYLKLLGKGTFGKVILVKEKATGRYYAMKI
LKKEVIVAKDEVAHTLTENRVLQNSRHPFLTALKYSFQTHDRLCFVMEYANGGELFFHLS
RERVFSEDRARFYGAEIVSALDYLHSEKNVVYRDLKLENLMLDKDGHIKITDFGLCKEGI
KDGATMKTFCGTPEYLAPEVLEDNDYGRAVDWWGLGVVMYEMMCGRLPFYNQDHEKLFEL
ILMEEIRFPRTLGPEAKSLLSGLLKKDPKQRLGGGSEDAKEIMQHRFFAGIVWQHVYEKK
LSPPFKPQVTSETDTRYFDEEFTAQMITITPPDQDDSMECVDSERRPHFPQFSYSASGTA
>B|ccd
WFE
""")

# !boltz predict p31749_ligand.fasta --use_msa_server
!wget -O boltz_results_p31749_ligand.zip https://uni-muenster.sciebo.de/s/3A4q9xnJDb76GjE/download
!unzip -o boltz_results_p31749_ligand.zip
!rm boltz_results_p31749_ligand.zip

In [50]:
name = 'p31749_ligand'
structure_path = f"boltz_results_{name}/predictions/{name}/{name}_model_0.cif"

with open(structure_path) as f:
    predicted = "".join([x for x in f])

view = py3Dmol.view(width=400, height=300, linked=True)
view.addModelsAsFrames(predicted)
view.setStyle({"cartoon": {'color': 'green'}})
view.setStyle({'resn': 'WFE'}, {"stick": {'color': 'red'}})
view.zoomTo()
view.show()

In a real drug design setting (and also when using classical docking programs), a predicted pose like this is followed by visual inspection, in which we check whether the placement and the conformation make sense:

- Are the benzene rings flat?
- Are there unusual angles or other odd geometry?
- Is the ligand too close to, or even inside, the protein chain?
- Does the ligand still have the correct stereochemistry and bonds?

The last two points in particular can be problematic for co-folding models, whereas classical docking programs typically handle them well.
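Part of this inspection can be backed up with numbers. A minimal sketch of a clash check that reports the shortest ligand-protein atom distance and the number of atom pairs below a cutoff; the 2.5 Å default is a rough rule of thumb chosen for illustration, not a value from Boltz, and `clash_report` is a hypothetical helper:

```python
import math
from typing import Iterable, Sequence

Coord = Sequence[float]  # an (x, y, z) triple, e.g. a Biopython Atom.coord


def clash_report(ligand: Iterable[Coord], protein: Iterable[Coord],
                 cutoff: float = 2.5) -> tuple[float, int]:
    """Return (shortest ligand-protein distance, number of atom pairs closer than `cutoff`), in Å."""
    protein = list(protein)  # iterated once per ligand atom, so materialize it
    min_dist = math.inf
    n_close = 0
    for lig_atom in ligand:
        for prot_atom in protein:
            d = math.dist(lig_atom, prot_atom)
            min_dist = min(min_dist, d)
            if d < cutoff:
                n_close += 1
    return min_dist, n_close


# Toy coordinates in Å (not from a real structure)
print(clash_report([(0.0, 0.0, 0.0)], [(3.0, 0.0, 0.0), (0.0, 1.0, 0.0)]))
# (1.0, 1)

# With a structure parsed by Biopython (assumes the ligand residue is named "WFE"):
# atoms = list(struct_p31749_ligand.get_atoms())
# lig = [a.coord for a in atoms if a.get_parent().get_resname() == "WFE"]
# prot = [a.coord for a in atoms if a.get_parent().get_resname() != "WFE"]
# print(clash_report(lig, prot))
```

The brute-force double loop is fine for a single ligand; for many poses, `Bio.PDB.NeighborSearch` avoids comparing every atom pair.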

There is one more aspect: the placement in the pocket. With classical docking programs you usually have to specify where the protein's binding pocket is, whereas co-folding methods do not require this and simply place the ligand where the model predicts it fits best (Boltz does offer optional pocket constraints in its YAML input, but we don't use them here).

Let's check whether the model places the ligand in the correct pocket:

In [65]:
name = 'p31749_ligand'
path_p31749_ligand = f"boltz_results_{name}/predictions/{name}/{name}_model_0.cif"

parser = MMCIFParser(QUIET=True)
struct_3mvh = parser.get_structure("3mvh", "3mvh.cif")
struct_p31749_ligand = parser.get_structure("p31749_ligand", path_p31749_ligand)

aligner = CEAligner()
aligner.set_reference(struct_3mvh)
aligner.align(struct_p31749_ligand) # this will change the coordinates

print(f"RMSD: {aligner.rms:.4f}")
io = PDBIO()
io.set_structure(struct_p31749_ligand)
io.save('p31749_ligand_aligned.pdb')

with open("p31749_ligand_aligned.pdb") as f:
    predicted = "".join([x for x in f])

view = py3Dmol.view(width=400, height=300, query='pdb:3mvh', linked=True)
view.addModelsAsFrames(predicted)
view.setStyle({'model': -1}, {"cartoon": {'color': 'red'}}) # our prediction
view.setStyle({'model': -1, 'resn': 'WFE'}, {"stick": {'color': 'blue'}}) # our prediction
view.setStyle({'model': -2}, {"cartoon": {'color': 'green'}}) # PDB:3mvh
view.setStyle({'model': -2, 'resn': 'WFE'}, {"stick": {'color': 'yellow'}}) # PDB:3mvh
view.zoomTo()
view.show()

RMSD: 4.3682


Nice! The placement looks quite good, not just the position, but also the conformation.
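To back up "quite good" with a number, a common metric is the ligand RMSD computed in place, i.e. after superimposing the proteins but without superimposing the ligands again; in docking benchmarks, poses below 2 Å are commonly counted as correct. A minimal sketch, assuming atoms can be matched by name between the predicted and the crystal ligand (check that this holds for your files; the function name is hypothetical):

```python
import math

XYZ = tuple[float, float, float]


def ligand_rmsd_by_name(pred: dict[str, XYZ], ref: dict[str, XYZ]) -> float:
    """In-place RMSD over the atoms whose names occur in both poses.

    Both poses must already be in the same coordinate frame,
    e.g. after aligning the predicted protein onto the crystal structure.
    """
    shared = sorted(pred.keys() & ref.keys())
    if not shared:
        raise ValueError("the two poses share no atom names")
    squared = sum(math.dist(pred[name], ref[name]) ** 2 for name in shared)
    return math.sqrt(squared / len(shared))


# Toy poses in Å (not real data): every shared atom is shifted by 1 Å along z
pred = {"C1": (0.0, 0.0, 0.0), "C2": (1.0, 0.0, 0.0)}
ref = {"C1": (0.0, 0.0, 1.0), "C2": (1.0, 0.0, 1.0), "N1": (5.0, 5.0, 5.0)}
print(ligand_rmsd_by_name(pred, ref))
# 1.0

# With Biopython residues (hypothetical variable names for the WFE residues):
# pred = {a.get_name(): tuple(a.coord) for a in predicted_wfe_residue}
# ref = {a.get_name(): tuple(a.coord) for a in crystal_wfe_residue}
```

Matching by name ignores molecular symmetry (e.g. a flipped phenyl ring), which can inflate the value; symmetry-aware RMSD tools are preferable for serious evaluations.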

## Task: Co-fold a different ligand

Now it's your turn. Create a `.fasta` file, but instead of the ligand from the crystal structure, use this SMILES: `CC(=O)Nc1cccc(c1)c2ccc3c(n2)n(c(n3)c4cccnc4N)c5ccc(cc5)CNC(=O)c6cccc(c6)F`. Then either predict the structure or download the precomputed prediction.

In [None]:
with open("new_ligand.fasta", "w") as f:
    pass # put the entries to your new fasta file here
# hint: instead of `ccd` you'll now have to use `smiles`

# predict here
# or use the folder at https://uni-muenster.sciebo.de/s/He772EYLa2AHMrw/download

In [None]:
with open("new_ligand.fasta", "w") as f: # protein sequence from UniProt (P31749), ligand given as a SMILES string
    f.write("""
>A|protein|
MSDVAIVKEGWLHKRGEYIKTWRPRYFLLKNDGTFIGYKERPQDVDQREAPLNNFSVAQC
QLMKTERPRPNTFIIRCLQWTTVIERTFHVETPEEREEWTTAIQTVADGLKKQEEEEMDF
RSGSPSDNSGAEEMEVSLAKPKHRVTMNEFEYLKLLGKGTFGKVILVKEKATGRYYAMKI
LKKEVIVAKDEVAHTLTENRVLQNSRHPFLTALKYSFQTHDRLCFVMEYANGGELFFHLS
RERVFSEDRARFYGAEIVSALDYLHSEKNVVYRDLKLENLMLDKDGHIKITDFGLCKEGI
KDGATMKTFCGTPEYLAPEVLEDNDYGRAVDWWGLGVVMYEMMCGRLPFYNQDHEKLFEL
ILMEEIRFPRTLGPEAKSLLSGLLKKDPKQRLGGGSEDAKEIMQHRFFAGIVWQHVYEKK
LSPPFKPQVTSETDTRYFDEEFTAQMITITPPDQDDSMECVDSERRPHFPQFSYSASGTA
>B|smiles
CC(=O)Nc1cccc(c1)c2ccc3c(n2)n(c(n3)c4cccnc4N)c5ccc(cc5)CNC(=O)c6cccc(c6)F
""")

# !boltz predict new_ligand.fasta --use_msa_server
# !wget -O boltz_results_new_ligand.zip https://uni-muenster.sciebo.de/s/He772EYLa2AHMrw/download
# !unzip -o boltz_results_new_ligand.zip
# !rm boltz_results_new_ligand.zip

Now try aligning the two predicted structures. Where is the ligand placed?

In [90]:
name = 'p31749_ligand'
path_p31749_ligand = f"boltz_results_{name}/predictions/{name}/{name}_model_0.cif"
name = 'new_ligand'
path_new_ligand = f"boltz_results_{name}/predictions/{name}/{name}_model_0.cif"

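# Rename the new ligand from `LIG1` to `LIG`: the PDB format only has room for 3-character residue names,
# so `LIG1` would shift the columns of the PDB file saved below and py3Dmol could not display it correctly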
!sed -i 's/LIG1/LIG/g' boltz_results_new_ligand/predictions/new_ligand/new_ligand_model_0.cif

parser = MMCIFParser(QUIET=True)
struct_p31749_ligand = parser.get_structure("p31749_ligand", path_p31749_ligand)
struct_new_ligand = parser.get_structure("new_ligand", path_new_ligand)

aligner = CEAligner()
aligner.set_reference(struct_p31749_ligand)
aligner.align(struct_new_ligand) # this will change the coordinates

print(f"RMSD: {aligner.rms:.4f}")
io = PDBIO()
io.set_structure(struct_new_ligand)
io.save('new_ligand_aligned.pdb')

with open(path_p31749_ligand) as f:
    predicted_first_ligand = "".join([x for x in f])
with open('new_ligand_aligned.pdb') as f:
    predicted_new_ligand = "".join([x for x in f])

view = py3Dmol.view(width=400, height=300, linked=True)
view.addModelsAsFrames(predicted_first_ligand)
view.addModelsAsFrames(predicted_new_ligand)
view.setStyle({'model': -2}, {"cartoon": {'color': 'green'}}) # first prediction
view.setStyle({'model': -2, 'resn': 'WFE'}, {"stick": {'color': 'yellow'}}) # first prediction
view.setStyle({'model': -1}, {"cartoon": {'color': 'red'}}) # new_ligand
view.setStyle({'model': -1, 'resn': 'LIG'}, {"stick": {'color': 'blue'}}) # new_ligand
view.zoomTo()
view.show()

RMSD: 2.4930


So the ligand pose looks sensible, but it is placed in a different pocket. In fact, this ligand is a known allosteric binder of AKT1, so we can compare our prediction with another crystal structure (PDB: `4EJN`):
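To put a number on "a different pocket", one option (our own choice, not part of the original analysis) is to list the residues within a distance cutoff of each ligand and compare the two sets. Since both predictions use the same UniProt sequence, residue numbers are directly comparable and no superposition is needed. A sketch with hypothetical helper names and an assumed 4 Å contact cutoff:

```python
import math
from typing import Iterable, Sequence

Coord = Sequence[float]


def pocket_residues(ligand: Iterable[Coord], residues: dict[int, list[Coord]],
                    cutoff: float = 4.0) -> set[int]:
    """Residue numbers with at least one atom within `cutoff` Å of any ligand atom."""
    ligand = list(ligand)
    return {
        resnum
        for resnum, atoms in residues.items()
        if any(math.dist(atom, lig) <= cutoff for atom in atoms for lig in ligand)
    }


def jaccard(a: set[int], b: set[int]) -> float:
    """Overlap of two residue sets: 1.0 = identical pockets, 0.0 = disjoint pockets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Toy coordinates in Å (not real data)
residues = {1: [(0.0, 0.0, 0.0)], 2: [(10.0, 0.0, 0.0)], 3: [(3.0, 0.0, 0.0)]}
print(pocket_residues([(1.0, 0.0, 0.0)], residues))
# {1, 3}

# With Biopython (ligand residue named "WFE" in the first prediction, "LIG" in the second):
# def residue_atoms(struct, ligand_name):
#     return {res.id[1]: [a.coord for a in res]
#             for res in struct.get_residues() if res.get_resname() != ligand_name}
```

A Jaccard overlap near 0 indicates that the two ligands occupy different pockets, while values near 1 indicate the same pocket.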

In [None]:
!curl https://files.rcsb.org/download/4EJN.cif > 4ejn.cif

In [95]:
name = 'new_ligand'
path_new_ligand = f"boltz_results_{name}/predictions/{name}/{name}_model_0.cif"

parser = MMCIFParser(QUIET=True)
struct_4ejn = parser.get_structure("4ejn", "4ejn.cif")
struct_new_ligand = parser.get_structure("new_ligand", path_new_ligand)

aligner = CEAligner()
aligner.set_reference(struct_4ejn)
aligner.align(struct_new_ligand) # this will change the coordinates

print(f"RMSD: {aligner.rms:.4f}")
io = PDBIO()
io.set_structure(struct_new_ligand)
io.save('new_ligand_aligned_2.pdb')

with open('new_ligand_aligned_2.pdb') as f:
    predicted_new_ligand = "".join([x for x in f])

view = py3Dmol.view(width=400, height=300, query='pdb:4ejn', linked=True)
view.addModelsAsFrames(predicted_new_ligand)
view.setStyle({'model': -2}, {"cartoon": {'color': 'green'}}) # pdb: 4ejn
view.setStyle({'model': -2, 'resn': '0R4'}, {"stick": {'color': 'yellow'}}) # pdb: 4ejn
view.setStyle({'model': -1}, {"cartoon": {'color': 'red'}}) # new_ligand
view.setStyle({'model': -1, 'resn': 'LIG'}, {"stick": {'color': 'blue'}}) # new_ligand
view.zoomTo()
view.show()

RMSD: 2.6608


As we can see, the conformation is slightly different, but the overall placement looks quite good. However, [research](https://www.sciencedirect.com/science/article/pii/S2667318525000121) has shown that this example is in the minority when it comes to allosteric and orthosteric binding pockets. **More often than not, the allosteric ligand will end up in the orthosteric pocket, despite both orthosteric and allosteric protein-ligand complexes being part of the training dataset.** This will be explored more in the next task.

## Bonus Task: Finding an example where the wrong pocket is chosen

Try to apply your newly gained knowledge to `PTK2` (PDB: 3BZ3 (orthosteric), 4EBV (allosteric)). Does the ligand end up in the correct pocket in both cases?

In [None]:
# create the .fasta files

# predict the structures
# or use prepared predictions
# https://uni-muenster.sciebo.de/s/jdnbJBgdLrwrRi8/download [3bz3]
# https://uni-muenster.sciebo.de/s/aFrWACDdJ4kRXGo/download [4ebv]

# align and view output  

## Bonus Task: Affinity Prediction
Boltz2 can also predict protein-ligand binding affinities. For this, the input file has to be in `yaml` format. Try converting one of the `.fasta` files to a `.yaml` file ([this documentation](https://github.com/jwohlwend/boltz/blob/main/docs/prediction.md) will help), then run an affinity prediction by adding the `affinity` property to the YAML file.
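As a possible starting point for this task, here is a sketch that writes the YAML input as a plain string. The field names (`version`, `sequences`, `protein`, `ligand`, `id`, `sequence`, `smiles`, `properties`, `affinity`, `binder`) follow the Boltz prediction docs at the time of writing and may change; `boltz_affinity_yaml` is a hypothetical helper:

```python
def boltz_affinity_yaml(protein_seq: str, ligand_smiles: str,
                        protein_id: str = "A", ligand_id: str = "B") -> str:
    """Return a Boltz-2 style YAML input requesting an affinity prediction for the ligand chain."""
    return (
        "version: 1\n"
        "sequences:\n"
        "  - protein:\n"
        f"      id: {protein_id}\n"
        f"      sequence: {protein_seq}\n"
        "  - ligand:\n"
        f"      id: {ligand_id}\n"
        f"      smiles: '{ligand_smiles}'\n"  # quoted so YAML does not interpret SMILES characters
        "properties:\n"
        "  - affinity:\n"
        f"      binder: {ligand_id}\n"
    )


# Possible usage (the protein sequence is read from the FASTA file written earlier):
# protein_seq = "".join(l.strip() for l in open("p31749.fasta") if not l.startswith(">"))
# smiles = "CC(=O)Nc1cccc(c1)c2ccc3c(n2)n(c(n3)c4cccnc4N)c5ccc(cc5)CNC(=O)c6cccc(c6)F"
# with open("new_ligand.yaml", "w") as f:
#     f.write(boltz_affinity_yaml(protein_seq, smiles))
# then run: !boltz predict new_ligand.yaml --use_msa_server
```

Building the YAML by hand avoids an extra dependency; with PyYAML installed, `yaml.safe_dump` on a nested dictionary would be the more robust choice.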

In [None]:
# your code goes here