
# Boltz2-Notebook: Diffusion-Based Protein–Ligand Structure Prediction & Affinity Analysis


![Python](https://img.shields.io/badge/Python-3.10-blue?logo=python)
![CUDA](https://img.shields.io/badge/CUDA-Enabled-green?logo=nvidia)
![Boltz2](https://img.shields.io/badge/Model-Boltz2-purple)
![Platform](https://img.shields.io/badge/Platform-Colab%20|%20Linux-lightgrey?logo=googlecolab)
![License](https://img.shields.io/badge/License-MIT-orange)
![Status](https://img.shields.io/badge/Status-Active-success)
![Build](https://img.shields.io/badge/Build-Stable-brightgreen)
![Contributions](https://img.shields.io/badge/Contributions-Welcome-blue)
<br>

---

## Boltz2-Notebook Overview

**Boltz2-Notebook** is an **interactive Google Colaboratory notebook** for **diffusion-based protein–ligand structure prediction** and **binding affinity estimation**.  
It wraps the **Boltz2 deep learning model** in a single automated notebook, so no local GPU setup, command-line execution, or hand-written YAML configuration is required.

**Boltz2-Notebook** provides a guided workflow from setup to post-prediction analysis.  
It features a **graphical interface**, **3D molecular visualization**, and **automated confidence & affinity dashboards**, all within Google Colab.

---

### Pipeline Overview
1. **Input**: Provide a protein sequence and, optionally, one or more ligands (as SMILES strings or CCD codes).  
2. **YAML Generation**: The inputs are written to a Boltz2 YAML config.  
3. **MSA Search**: With `--use_msa_server`, Boltz2 retrieves multiple sequence alignments (MSAs) from a remote server.  
4. **Structure Prediction**: The network predicts 3D coordinates using diffusion sampling and recycling steps.  
5. **Output**: Results include 3D models (CIF/PDB), per-residue confidence scores (**pLDDT**), and predicted aligned error (**PAE**) heatmaps.  
6. **Visualization**: The notebook displays the predicted structure and confidence plots.  
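
Steps 1 and 2 can be sketched in a few lines of Python. This is an illustration only: `build_boltz_config` and its parameters are hypothetical names, and the field layout simply mirrors the YAML used in this notebook's own cells (`version`, `sequences`, `protein`, `ligand`, `smiles`, `ccd`).

```python
from typing import Optional

VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def build_boltz_config(sequence: str,
                       ligand_smiles: Optional[str] = None,
                       ligand_ccd: Optional[str] = None) -> dict:
    """Return a config dict for one protein chain (A) and an optional ligand (B)."""
    sequence = sequence.strip().upper()
    unknown = set(sequence) - VALID_RESIDUES
    if not sequence or unknown:
        raise ValueError(f"invalid protein sequence; unexpected characters: {sorted(unknown)}")
    if ligand_smiles and ligand_ccd:
        raise ValueError("give the ligand as SMILES or as a CCD code, not both")

    sequences = [{"protein": {"id": "A", "sequence": sequence}}]
    if ligand_smiles:
        sequences.append({"ligand": {"id": "B", "smiles": ligand_smiles}})
    elif ligand_ccd:
        sequences.append({"ligand": {"id": "B", "ccd": ligand_ccd}})
    return {"version": 1, "sequences": sequences}


cfg = build_boltz_config("MVTPE", ligand_ccd="SAH")
print(cfg)
```

A dict of this shape can be serialized with PyYAML (`yaml.dump(cfg, f, sort_keys=False)`), which is how the notebook writes its own config.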

---

 **Note:** This notebook automates the full Boltz2 workflow, from setup to visualization, with **color-coded status** and **interactive outputs**.  

---

## Credits & Authorship

- **Notebook Developers:** Atharva Tilewale & Dr. Dhaval Patel
- **Affiliation:** Gujarat Biotechnology University | Bioinformatics & Computational Biology  
- **GitHub Repository:** [Boltz2-Notebook](https://github.com/AtharvaTilewale/boltz2-notebook)  
- **Contact:** [LinkedIn](https://www.linkedin.com/in/atharvatilewale) | [GitHub](https://github.com/AtharvaTilewale)  

**Acknowledgements:**  
- **Boltz2 framework**: [Original Boltz repository](https://github.com/jwohlwend/boltz) by J. Wohlwend and collaborators.  
- **Dependencies:** PyTorch, Biopython, NumPy, Matplotlib, Py3Dmol, PyYAML.  
- Special thanks to the **open-source community** for providing tools that make structural bioinformatics more accessible.  

---

## Cite
If you use this notebook, please **cite this repository** and the references below:

[![GitHub Repo](https://img.shields.io/badge/GitHub-Boltz--Notebook-181717?logo=github)](https://github.com/AtharvaTilewale/Boltz-Notebook)

- Passaro, S., Corso, G., Wohlwend, J., Reveiz, M., Thaler, S., Somnath, V. R., Getz, N., Portnoi, T., Roy, J., Stark, H., Kwabi-Addo, D., Beaini, D., Jaakkola, T., & Barzilay, R. (2025).  
  **Boltz-2: Towards Accurate and Efficient Binding Affinity Prediction.** *bioRxiv.*  
    [![bioRxiv Boltz2](https://img.shields.io/badge/bioRxiv-Boltz2-red)](https://doi.org/10.1101/2025.06.14.659707)

- Wohlwend, J., Corso, G., Passaro, S., Getz, N., Reveiz, M., Leidal, K., Swiderski, W., Atkinson, L., Portnoi, T., Chinn, I., Silterra, J., Jaakkola, T., & Barzilay, R. (2024).  
  **Boltz-1: Democratizing Biomolecular Interaction Modeling.** *bioRxiv.*  
    [![bioRxiv Boltz1](https://img.shields.io/badge/bioRxiv-Boltz1-orange)](https://doi.org/10.1101/2024.11.19.624167)

- Mirdita, M., Schütze, K., Moriwaki, Y., Heo, L., Ovchinnikov, S., & Steinegger, M. (2022).  
  **ColabFold: Making protein folding accessible to all.** *Nature Methods.*  
    [![ColabFold](https://img.shields.io/badge/ColabFold-Reference-yellow)](https://doi.org/10.1038/s41592-022-01488-1)


---

In [1]:
# @title Install Dependencies and Boltz2 with CUDA support
import sys
import subprocess
import threading
import time
import os
import shutil
import torch

os.chdir("/content/")

# ANSI color codes for colored output
class Color:
    CYAN = "\033[96m"
    GREEN = "\033[92m"
    YELLOW = "\033[93m"
    RED = "\033[91m"
    RESET = "\033[0m"

# ---------------- GPU CHECK ----------------
print(f"{Color.CYAN}[i] Checking GPU availability...{Color.RESET}")
if not torch.cuda.is_available():
    print(f"{Color.RED}[✘] No GPU detected!{Color.RESET}")
    print(f"{Color.YELLOW}Please change runtime to 'GPU' (T4 or higher).{Color.RESET}")
    print(f"{Color.CYAN}Runtime > Change Runtime Type > Select any available GPU from Hardware Accelerator.{Color.RESET}")
    sys.exit(1)
else:
    gpu_name = torch.cuda.get_device_name(0)
    print(f"{Color.GREEN}[✔] GPU detected:{Color.RESET} {gpu_name}")

# ---------------- INSTALL STEPS ----------------
repo_dirs = ["boltz2-notebook"]

steps = [
    {
        "loader": f"{Color.CYAN}Cloning Notebook Modules...{Color.RESET}",
        "done":   f"[{Color.GREEN}✔{Color.RESET}] Notebook modules cloned successfully.",
        "fail":   f"[{Color.RED}✘{Color.RESET}] Boltz-Notebook clone failed.",
        "cmd": ["git", "clone", "https://github.com/AtharvaTilewale/boltz2-notebook.git"]
    },
]

def loader(msg, stop_event):
    symbols = ["-", "\\", "|", "/"]
    i = 0
    while not stop_event.is_set():
        sys.stdout.write(f"\r[{symbols[i % len(symbols)]}] {msg}   ")
        sys.stdout.flush()
        time.sleep(0.1)
        i += 1
    sys.stdout.write("\r" + " " * (len(msg) + 10) + "\r")

# Step 1: Remove repo if it exists
for repo in repo_dirs:
    if os.path.isdir(repo):
        print(f"{Color.YELLOW}[i] Repository already exists. Removing '{repo}'...{Color.RESET}")
        try:
            shutil.rmtree(repo)
            print(f"[{Color.GREEN}✔{Color.RESET}] Existing repository '{repo}' removed.")
        except Exception as e:
            print(f"[{Color.RED}✘{Color.RESET}] Failed to remove '{repo}': {e}")
            raise

all_success = True

# Main steps
for step in steps:
    stop_event = threading.Event()
    t = threading.Thread(target=loader, args=(step["loader"], stop_event))
    t.start()
    try:
        subprocess.run(step["cmd"], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=True)
        stop_event.set()
        t.join()
        print(step["done"])
    except Exception as e:
        stop_event.set()
        t.join()
        print(f"{step['fail']} {e}")
        all_success = False
        break

# Run setup if clone worked
if all_success:
    %run /content/boltz2-notebook/dist/setup.py
    print(f"{Color.GREEN}All steps completed successfully.{Color.RESET}")


[i] Checking GPU availability...
[✔] GPU detected: Tesla T4
[✔] Notebook modules cloned successfully.
 ===Initialising Setup=== 

[✔] Boltz cloned successfully.
[✔] Dependencies installed successfully.
[✔] Validation complete.
All steps completed successfully.
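
The spinner thread in the cell above has to be stopped and joined by hand on both the success and the failure path. As a hedged alternative, the same behaviour can be wrapped in a context manager so the cleanup happens in one place. The name `spinner` and its `interval` parameter are illustrative and not part of the notebook.

```python
import sys
import threading
import time
from contextlib import contextmanager


@contextmanager
def spinner(msg: str, interval: float = 0.1):
    """Show a rotating cursor next to `msg` until the with-block exits."""
    stop_event = threading.Event()

    def spin():
        symbols = ["-", "\\", "|", "/"]
        i = 0
        while not stop_event.is_set():
            sys.stdout.write(f"\r[{symbols[i % len(symbols)]}] {msg}   ")
            sys.stdout.flush()
            i += 1
            stop_event.wait(interval)  # returns early once the event is set
        # Erase the spinner line before the caller prints its status message.
        sys.stdout.write("\r" + " " * (len(msg) + 10) + "\r")
        sys.stdout.flush()

    thread = threading.Thread(target=spin, daemon=True)
    thread.start()
    try:
        yield
    finally:
        # Runs on success and on exceptions, so the thread never outlives the step.
        stop_event.set()
        thread.join()


with spinner("Working..."):
    time.sleep(0.3)  # stand-in for a long-running step
print("Done.")
```

Because the cleanup sits in `finally`, an exception raised inside the `with` block still stops the spinner before the error is reported.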


In [2]:

# @title Download CCD Dataset and Test Boltz2
import sys
import threading
import time
import os

# ANSI color codes for colored output
class Color:
    CYAN = "\033[96m"
    GREEN = "\033[92m"
    YELLOW = "\033[93m"
    RED = "\033[91m"
    RESET = "\033[0m"

def loader(msg, stop_event):
    symbols = ["-", "\\", "|", "/"]
    i = 0
    while not stop_event.is_set():
        sys.stdout.write(f"\r[{symbols[i % len(symbols)]}] {msg}   ")
        sys.stdout.flush()
        time.sleep(0.1)
        i += 1
    sys.stdout.write("\r" + " " * (len(msg) + 10) + "\r")
    sys.stdout.flush()

# Step 1: Create data directory
os.makedirs("/content/boltz_data", exist_ok=True)
os.chdir("/content/boltz_data/")

# Step 2: Write YAML file
yaml_content = f"""\
version: 1
sequences:
    - protein:
        id: [A]
        sequence: MVTPE
    - ligand:
        id: [B]
        ccd: SAH
"""
with open("/content/boltz_data/test.yaml", "w") as f:
    f.write(yaml_content)

# Step 3: Run boltz predict (silent)
step_msg = f"{Color.YELLOW}Downloading CCD Dataset...{Color.RESET}"
stop_event = threading.Event()
t = threading.Thread(target=loader, args=(step_msg, stop_event))
t.start()
try:
    import subprocess
    subprocess.run(
        ["boltz", "predict", "test.yaml", "--use_msa_server"],
        cwd="/content/boltz_data",
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=True
    )
    stop_event.set()
    t.join()
    print(f"[{Color.GREEN}✔{Color.RESET}] CCD Dataset Downloaded and validated.")
except Exception as e:
    stop_event.set()
    t.join()
    print(f"[{Color.RED}✘{Color.RESET}] CCD Dataset Download or validation failed: {e}")


[✔] CCD Dataset Downloaded and validated.
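
The cell above discards both stdout and stderr, so a failed run reports only the exit status. A minimal sketch of a variant that keeps the run quiet but returns the end of stderr on failure is shown below. `run_step` and `tail_lines` are hypothetical names, and the Python interpreter stands in for the real `boltz predict` call.

```python
import subprocess
import sys


def run_step(cmd, cwd=None, tail_lines=15):
    """Run `cmd` quietly; return (ok, message) with the end of stderr on failure."""
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    if result.returncode == 0:
        return True, ""
    tail = "\n".join(result.stderr.strip().splitlines()[-tail_lines:])
    return False, f"exit code {result.returncode}\n{tail}"


# Demonstration: a child process that fails with a message on stderr.
ok, msg = run_step([sys.executable, "-c", "import sys; sys.exit('simulated failure')"])
print(ok)
print(msg)
```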


In [3]:
import os
import shutil
import yaml
import subprocess
import math

# ========= Configuration =========
base_dir = "/content/boltz_data"
job_name = "fgfr2_ligand_x50"
run_root = os.path.join(base_dir, job_name)

# Number of ligand copies to arrange in the ring, and the ring radius used by the ChimeraX script
n_ligands = 50
radius = 35.0

protein_sequence = (
    "MVSWGRFICLVVVTMATLSLARPSFSLVEDTTLEPEEPPTKYQISQPEVYVAAPGESLEVRCLLKDAAVISWTKDGVHLGPNNRTVLIGEYLQIKGATPRDSGLYACTASRTVDSETWYFMVNVTDAISSGDDEDDTDGAEDFVSENSNNKRAPYWTNTEKMEKRLHAVPAANTVKFRCPAGGNPMPTMRWLKNGKEFKQEHRIGGYKVRNQHWSLIMESVVPSDKGNYTCVVENEYGSINHTYHLDVVERSPHRPILQAGLPANASTVVGGDVEFVCKVYSDAQPHIQWIKHVEKNGSKYGPDGLPYLKVLKAAGVNTTDKEIEVLYIRNVTFEDAGEYTCLAGNSIGISFHSAWLTVLPAPGREKEITASPDYLEIAIYCIGVFLIACMVVTVILCRMKNTTKKPDFSSQPAVHKLTKRIPLRRQVTVSAESSSSMNSNTPLVRITTRLSSTADTPMLAGVSEYELPEDPKWEFPRDKLTLGKPLGEGCFGQVVMAEAVGIDKDKPKEAVTVAVKMLKDDATEKDLSDLVSEMEMMKMIGKHKNIINLLGACTQDGPLYVIVEYASKGNLREYLRARRPPGMEYSYDINRVPEEQMTFKDLVSCTYQLARGMEYLASQKCIHRDLAARNVLVTENNVMKIADFGLARDINNIDYYKKTTNGRLPVKWMAPEALFDRVYTHQSDVWSFGVLMWEIFTLGGSPYPGIPVEELFKLLKEGHRMDKPANCTNELYMMMRDCWHAVPSQRPTFKQLVEDLDRILTLTTNEEYLDLSQPLEQYSPSYPDTRSSCSSGDDSVFSPDPMPYEPCLPQYPHINGSVKT"
)

ligand_smiles = "CCCCCCCCCCCCCCCC(=O)OCC(COP(=O)(O)OCC[N+](C)(C)C)OC(=O)CCCCCCCCCCCCCCC"
pocket_contacts = [["A", 378], ["A", 398]]
max_distance = 6.0

protein_id = "A"
ligand_id = "B"

# ========= Setup =========
shutil.rmtree(run_root, ignore_errors=True)
os.makedirs(run_root, exist_ok=True)

def find_first_cif(folder: str):
    for root, _, files in os.walk(folder):
        for fn in files:
            if fn.endswith(".cif"):
                return os.path.join(root, fn)
    return None


# ========= (1) Boltz: run a single prediction to produce the template structure =========
run_boltz = True

template_cif = os.path.join(run_root, "005_fgfr2_ligand_x50_model.cif")

if run_boltz:
    run_dir = os.path.join(run_root, "lig_1")
    os.makedirs(run_dir, exist_ok=True)

    yaml_path = os.path.join(run_dir, "005_fgfr2_ligand_x50.yaml")

    cfg = {
        "version": 1,
        "sequences": [
            # "msa": "empty" runs this chain in single-sequence mode, so no MSA is used for it.
            {"protein": {"id": protein_id, "msa": "empty", "sequence": protein_sequence}},
            {"ligand":  {"id": ligand_id,  "smiles": ligand_smiles}},
        ],
        "constraints": [
            {"pocket": {
                "binder": ligand_id,
                "contacts": pocket_contacts,
                "max_distance": max_distance,
            }}
        ],
    }

    with open(yaml_path, "w") as f:
        yaml.dump(cfg, f, sort_keys=False)

    result = subprocess.run(
        ["boltz", "predict", yaml_path, "--use_msa_server", "--out_dir", run_dir],
        capture_output=True,
        text=True,
    )

    if result.returncode != 0:
        print("STDERR:\n", result.stderr)
        raise RuntimeError("boltz predict failed")

    cif_path = find_first_cif(run_dir)
    if not cif_path:
        raise RuntimeError(f"No CIF file found in {run_dir}")

    shutil.copy(cif_path, template_cif)

print("Template CIF:", template_cif)

Template CIF: /content/boltz_data/fgfr2_ligand_x50/005_fgfr2_ligand_x50_model.cif
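
Pocket constraints name residues by chain and position, so a quick sanity check before writing the YAML can catch an index that falls outside the sequence. The sketch below assumes residue positions are 1-based. `check_pocket_contacts` is a hypothetical helper, and the short sequence is a stand-in rather than the FGFR2 chain used above.

```python
def check_pocket_contacts(sequence, contacts, chain_id="A"):
    """Raise ValueError if a contact names another chain or an out-of-range residue."""
    for chain, resnum in contacts:
        if chain != chain_id:
            raise ValueError(f"contact chain {chain!r} does not match protein chain {chain_id!r}")
        if not 1 <= resnum <= len(sequence):
            raise ValueError(f"residue {resnum} is outside 1..{len(sequence)}")
    # Report which amino acid sits at each requested position (1-based indexing).
    return [(resnum, sequence[resnum - 1]) for _, resnum in contacts]


toy_sequence = "MVTPEGLKAW"          # stand-in sequence for illustration
toy_contacts = [["A", 3], ["A", 8]]
print(check_pocket_contacts(toy_sequence, toy_contacts))
```

Printing the residue letter next to each position makes it easy to confirm that the constraint points at the intended amino acids.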


In [4]:
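# ========= (2) Write ChimeraX script: separate ECD/ICD, copy the ligand, arrange copies in a ring =========
cxc_path = os.path.join(run_root, "005_fgfr2_ligand_x50.cxc")

# Residue ranges on chain A: extracellular domain, transmembrane segment, intracellular domain
ecd = "1-377"
tm  = "378-398"
icd = "399-821"

# Distances to shift the ECD up and the ICD down along z
ecd_up = 25
icd_down = 25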

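with open(cxc_path, "w") as f:
    f.write("close session\n")
    # Local path on the machine that runs ChimeraX; adjust it to where the CIF was downloaded.
    # The trailing newline keeps the next command on its own line.
    f.write("open C:/Users/SamanthaTu/OneDrive/Desktop/MSc_research/005_fgfr2_ligand_x50/005_fgfr2_ligand_x50_model.cif\n")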

    # Protein display
    f.write("cartoon\n")
    f.write(f"color /A:{ecd} blue\n")
    f.write(f"color /A:{tm} red\n")
    f.write(f"color /A:{icd} green\n")
    f.write("view /A\n\n")

    # Membrane plane
    f.write(f"define plane /A:{tm}@CA radius 45 thickness 2 color #808080\n\n")

    # Split into ECD/TM/ICD submodels and move ECD/ICD apart
    f.write(f"split #1 atoms /A:{ecd} atoms /A:{tm} atoms /A:{icd}\n")
    f.write(f"move z {ecd_up} models #1.1\n")
    f.write(f"move z -{icd_down} models #1.3\n\n")

    # Ligand style
    f.write("show /B\n")
    f.write("style /B stick\n")
    f.write("color /B gold\n\n")

    # ---- Optional: duplicate ligand into many copies and arrange as a ring ----
    # We use combine to copy just the ligand atoms into new models, then move/turn each model.
    f.write(f"# --- Make {n_ligands} ligand copies (display only) ---\n")
    f.write("# The original ligand is /B in model #1\n")
    f.write("hide /A\n")  # optional: hide protein to see ring clearly; comment out if you want protein visible
    f.write("show /B\n\n")

    # The original ligand stays in model #1; copies 2..n_ligands are created here.
    for k in range(2, n_ligands + 1):
        f.write(f"combine /B name lig{k}\n")   # intended to copy the ligand into a new model named lig{k}

    f.write("\n# --- Arrange ligand models in a ring ---\n")
    # Assumes the copies are numbered #2, #3, ... in creation order; confirm this in the ChimeraX
    # Model Panel, because the plane defined above may also take a model number.
    # Each copy is first placed at `radius` along X and then rotated about Z by its angle.
    for idx in range(1, n_ligands + 1):
        model_id = 1 if idx == 1 else idx  # original ligand is still in #1 (as part of the opened structure)
        angle = 360.0 * (idx - 1) / n_ligands
        f.write("move cofr /B\n")  # set center of rotation for selection (safe even if repeated)
        f.write(f"move x {radius:.3f} models #{model_id}\n")
        f.write(f"turn z {angle:.3f} models #{model_id}\n")

    f.write("\n# Show protein back (optional)\n")
    f.write("show /A\n")
    f.write("view /A\n")

print("Wrote ChimeraX script:", cxc_path)
print("Next: download/open this .cxc in ChimeraX.")

Wrote ChimeraX script: /content/boltz_data/fgfr2_ligand_x50/005_fgfr2_ligand_x50.cxc
Next: download/open this .cxc in ChimeraX.
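
The ring layout the script aims for is plain circle geometry: copy `k` of `n` sits at angle `360·k/n` on a circle of the chosen radius. The sketch below computes those target positions and the spacing between neighbouring copies in pure Python, which helps when choosing `n_ligands` and `radius`. The function names are illustrative, and nothing here calls ChimeraX.

```python
import math


def ring_positions(n, radius):
    """Return n (x, y) points evenly spaced on a circle of the given radius."""
    points = []
    for k in range(n):
        theta = 2.0 * math.pi * k / n
        points.append((radius * math.cos(theta), radius * math.sin(theta)))
    return points


def neighbor_spacing(n, radius):
    """Straight-line distance between adjacent points on the ring (chord length)."""
    return 2.0 * radius * math.sin(math.pi / n)


points = ring_positions(50, 35.0)
print(points[0])
print(round(neighbor_spacing(50, 35.0), 2))
```

With 50 copies at radius 35, neighbouring copies end up roughly 4.4 units apart, so a larger radius or fewer copies may be needed if the ligands overlap.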


In [None]:
# @title Generate Parameters
%run /content/boltz_data/dist/param_gen.py

In [None]:
# @title Run Boltz2 Engine
%run /content/boltz_data/dist/Boltz_Run.py

In [None]:
# @title Analyse Results
%run /content/boltz_data/dist/analysis.py

In [None]:
# @title Copy Results to Drive
import shutil, os
from google.colab import drive
from Bio.PDB import MMCIFParser, PDBIO

# ANSI color codes for colored output
class Color:
    CYAN = "\033[96m"
    GREEN = "\033[92m"
    YELLOW = "\033[93m"
    RED = "\033[91m"
    BLUE = "\033[94m"
    MAGENTA = "\033[95m"
    RESET = "\033[0m"

# Mount Google Drive
drive.mount('/content/drive')

# Paths
drive_output_dir = f"/content/drive/MyDrive/Boltz2_Results/{job_name}"
local_output_path = f"/content/boltz_data/{job_name}"

# # Convert CIF to PDB
# cif_file = f"{local_output_path}/boltz_results_{job_name}/predictions/{job_name}/{job_name}_model_0.cif"
# pdb_file = f"{local_output_path}/{job_name}.pdb"

# parser = MMCIFParser(QUIET=True)
# structure = parser.get_structure("prot", cif_file)
# io = PDBIO()
# io.set_structure(structure)
# io.save(pdb_file)

# Remove old folder in Drive if exists
if os.path.exists(drive_output_dir):
    print(f"Removing existing folder {drive_output_dir}")
    shutil.rmtree(drive_output_dir)
    print("Old Drive folder removed.")

# Copy local output folder to Drive
shutil.copytree(local_output_path, drive_output_dir)
print(f"{Color.GREEN}All results copied to Google Drive: {drive_output_dir}{Color.RESET}")
# # Copy PDB file separately (optional, just in case)
# drive_pdb_file = os.path.join(drive_output_dir, os.path.basename(pdb_file))
# shutil.copy(pdb_file, drive_pdb_file)

In [None]:
# @title Download Results (.zip)
from google.colab import files
from Bio.PDB import MMCIFParser, PDBIO
import shutil
import os

# Local output folder you want to download
local_output_path = f"/content/boltz_data/{job_name}"

# # Convert CIF to PDB
# cif_file = f"{local_output_path}/boltz_results_{job_name}/predictions/{job_name}/{job_name}_model_0.cif"
# pdb_file = f"{local_output_path}/{job_name}.pdb"

# # Parse CIF and save as PDB
# parser = MMCIFParser(QUIET=True)
# structure = parser.get_structure("prot", cif_file)
# io = PDBIO()
# io.set_structure(structure)
# io.save(pdb_file)

# Path for the zip file
zip_file = f"/content/{job_name}.zip"

# Remove previous zip if exists
if os.path.exists(zip_file):
    os.remove(zip_file)

# Create zip of the entire folder
shutil.make_archive(base_name=f"/content/{job_name}", format='zip', root_dir=local_output_path)

# Download the zip file
files.download(zip_file)

# Success message
print(f"{Color.GREEN}Download successful! All results from '{job_name}' are saved in '{zip_file}'{Color.RESET}")
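
For reference, `shutil.make_archive` takes a `base_name` without the `.zip` suffix, appends the suffix itself, and returns the full archive path. The sketch below demonstrates this on a throwaway folder; the folder and file names are stand-ins, not real Boltz2 outputs.

```python
import os
import shutil
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    results_dir = os.path.join(tmp, "demo_job")      # stand-in for the job folder
    os.makedirs(os.path.join(results_dir, "predictions"))
    with open(os.path.join(results_dir, "predictions", "model_0.cif"), "w") as fh:
        fh.write("data_demo\n")

    # base_name has no ".zip"; make_archive appends it and returns the full path.
    archive_path = shutil.make_archive(os.path.join(tmp, "demo_job"), "zip", root_dir=results_dir)
    with zipfile.ZipFile(archive_path) as zf:
        members = sorted(zf.namelist())

print(os.path.basename(archive_path))
print(members)
```

Paths inside the archive are relative to `root_dir`, so the job folder name itself does not appear as a top-level directory in the zip.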
