moritztng/tt-bio

████████╗████████╗        ██████╗  ██╗  ██████╗
╚══██╔══╝╚══██╔══╝        ██╔══██╗ ██║ ██╔═══██╗
   ██║      ██║    █████╗ ██████╔╝ ██║ ██║   ██║
   ██║      ██║    ╚════╝ ██╔══██╗ ██║ ██║   ██║
   ██║      ██║           ██████╔╝ ██║ ╚██████╔╝
   ╚═╝      ╚═╝           ╚═════╝  ╚═╝  ╚═════╝

Important

TT-Boltz is now TT-Bio

TT-Bio runs Boltz-2, ESMFold2, and Protenix-v2 structure prediction and BoltzGen binder design on Tenstorrent Blackhole and Wormhole, supporting single-card and multi-card configurations (e.g. QuietBox with 4 cards or Galaxy server with 32 cards). Multiple machines can also be combined into a single prediction run.

Installation

Create a Python virtual environment with Python 3.10 or 3.12, install TT-Bio, then install the matching Tenstorrent system dependencies.

python3.10 -m venv env
source env/bin/activate
pip install "tt-bio @ git+https://github.com/moritztng/tt-bio.git"
tt-bio install-deps

tt-bio install-deps installs the SFPI compiler version that matches the installed ttnn wheel and clears stale TT-Metal kernel cache entries. It may ask for your sudo password.

Advanced Install (editable local clone)

git clone https://github.com/moritztng/tt-bio.git
cd tt-bio
pip install -e .
tt-bio install-deps

Optional: Build TT-Metal / TT-NN from Source

If you need to build from source, follow the Tenstorrent Installation Guide.

Verify Installation

tt-bio --help
tt-bio predict --help
tt-bio msa --help

Basic Usage

Structure Prediction

tt-bio predict examples/prot.yaml --model boltz2 --use_msa_server --override

Every command names its model with --model:

  • boltz2: folds complexes of proteins, DNA, RNA, and ligands and predicts binding affinity. Needs an MSA for each protein chain.
  • esmfold2 / esmfold2-fast: fold a single protein sequence on-device, no MSA required (esmfold2-fast is the lighter, faster checkpoint).
  • protenix-v2: folds a single protein with an optional MSA. It is an AlphaFold3-family model (the Protenix reproduction); an MSA is recommended for best accuracy.

For example:

tt-bio predict seq.fasta --model esmfold2-fast --fast
tt-bio predict seq.fasta --model protenix-v2 --use_msa_server   # or omit the MSA flag to fold single-sequence

ESMFold2 and Protenix-v2 are protein-only, so the ligand, affinity, potential, constraint, template, and energy options below apply to Boltz-2 only. Both can use an MSA: pass --use_msa_server (or a precomputed a3m via the input file / --msa_db_path); with no MSA source they fold single-sequence. The shared options — --fast, --recycling_steps, --sampling_steps, --diffusion_samples, --output_format, the MSA flags, and the multi-card / multi-machine flags — work for every model. Each model downloads its weights automatically on first use.

Boltz-2 needs an MSA (multiple sequence alignment) for each protein chain. --use_msa_server sends sequences to the ColabFold MSA API and downloads the resulting alignments (online MSA).

--fast makes some operations use block-fp8, a lower-precision numeric format that runs faster. Accuracy is typically very close.

predict accepts either a single YAML/FASTA file or a directory containing many input files.
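
If your sequences sit in one multi-record FASTA and you would rather hand predict a directory with one file per target, something like the following could prepare it. This is a minimal sketch: split_fasta and its file-naming rule are hypothetical helpers, not part of tt-bio, and header lines are copied through unchanged because the header format tt-bio expects is not described here.

```python
import re
from pathlib import Path


def split_fasta(text: str, out_dir: Path) -> list[Path]:
    """Write each FASTA record in `text` to its own file under `out_dir`."""
    records: list[tuple[str, list[str]]] = []  # (header line, sequence chunks)
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append((line, []))
        elif records:
            records[-1][1].append(line)

    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, (header, chunks) in enumerate(records):
        # File name: first token of the header, made filesystem-safe.
        tokens = header[1:].split()
        stem = re.sub(r"[^A-Za-z0-9_.-]", "_", tokens[0]) if tokens else f"seq{index}"
        path = out_dir / f"{stem}.fasta"
        # The header is written back verbatim; only line wrapping is removed.
        path.write_text(f"{header}\n{''.join(chunks)}\n")
        paths.append(path)
    return paths
```

Records whose headers share the same first token would overwrite each other, so make the names unique before splitting.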

A live display shows the progress of each protein. On a multi-card machine such as a QuietBox or Galaxy server, every card is used in parallel and labelled in the display (quietbox:tt0, quietbox:tt1, ...). Models load once per card and stay resident, so jobs flow through without per-protein reloads:

tt-bio predict proteins/ --model boltz2 --out_dir results --use_msa_server --fast
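
The "load once per card, then stream jobs" pattern described above can be illustrated with a shared work queue. This is a generic sketch of the idea, not tt-bio's scheduler: run_jobs, load_model, and the device names are hypothetical, and the model is any callable.

```python
import queue
import threading


def run_jobs(jobs, devices, load_model):
    """Run every job on whichever device is free, loading one model per device."""
    todo = queue.Queue()
    for job in jobs:
        todo.put(job)

    results = {}
    lock = threading.Lock()

    def worker(device):
        model = load_model(device)  # loaded once, then reused for every job
        while True:
            try:
                job = todo.get_nowait()
            except queue.Empty:
                return  # queue drained, this device is done
            output = model(job)
            with lock:
                results[job] = (device, output)

    threads = [threading.Thread(target=worker, args=(d,)) for d in devices]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Because each worker pulls the next job as soon as it finishes the previous one, a slow target on one card does not stall the others.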

If you have additional machines with Tenstorrent cards, you can add them to a single run — see Optional: Multi-Machine Prediction.

Offline MSA (Optional)

Use this if you have enough disk and RAM and want to run MSA searches locally. It avoids external MSA server calls and is faster for repeated runs.

tt-bio msa
tt-bio predict examples/prot.yaml --model boltz2 --override

tt-bio msa downloads UniRef30 to ~/.boltz/msa_db (~100GB download, ~500GB on disk after indexing). predict auto-detects this path.

To add EnvDB, download it with --db all and pass --use_envdb to predict. EnvDB can improve MSA coverage when UniRef30 hits are weak, at higher disk and RAM cost.

tt-bio msa --db all
tt-bio predict examples/prot.yaml --model boltz2 --use_envdb --override

Key Options:

  • --override: Re-run from scratch, ignoring cached files
  • --use_msa_server: Generate MSA via ColabFold API
  • --msa_db_path: Use a local database at a custom path (e.g. --msa_db_path /data/colabfold_db)
  • --use_envdb: Include EnvDB in offline MSA (tt-bio msa --db all)
  • --accelerator: Hardware backend: tenstorrent (default), cpu, or gpu
  • --fast: Makes some operations use block-fp8, a lower-precision numeric format that runs faster; accuracy is typically very close
  • --debug: Show all raw output from the hardware and libraries instead of the progress display
  • --debug --log: Same as --debug, but also prints what each device is currently working on

Binding Affinity Prediction (Boltz-2)

Predict binding affinity for protein-ligand complexes:

tt-bio predict examples/affinity.yaml --model boltz2 --use_msa_server --override --affinity_mw_correction

The --affinity_mw_correction flag applies molecular weight correction for more accurate predictions.

Input Format

ESMFold2 takes a plain protein FASTA or a YAML with one or more protein chains. The richer inputs below — ligands, affinity, DNA/RNA, constraints, and templates — are Boltz-2 features.

Create a YAML file describing your complex:

version: 1
sequences:
  - protein:
      id: A
      sequence: MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQFYERLIGLWAPAVMEAAHELGVFAALAEAPADSGELARRLDCDARAMRVLLDALYAYDVIDRIHDTNGFRYLLSAEARECLLPGTLFSLVGKFMHDINVAWPAWRNLAEVVRHGARDTSGAESPNGIAQEDYESLVGGINFWAPPIVTTLSRKLRASGRSGDATASVLDVGCGTGLYSQLLLREFPRWTATGLDVERIATLANAQALRLGVEERFATRAGDFWRGGWGTGYDLVLFANIFHLQTPASAVRLMRHAAACLAPDGLVAVVDQIVDADREPKTPQDRFALLFAASMTNTGGGDAYTFQEYEEWFTAAGLQRIETLDTPMHRILLARRATEPSAVPEGQASENLYFQ
  - ligand:
      id: B
      smiles: 'N[C@@H](Cc1ccc(O)cc1)C(=O)O'
properties:
  - affinity:
      binder: B

Entity Types:

  • Polymers: protein, dna, rna — provide sequence
  • Ligands: ligand — provide smiles or ccd code

Multiple Identical Chains:

- protein:
    id: [A, B]  # Two identical chains
    sequence: ...

Understanding Results

Output Structure

boltz_results_prot/
├── structures/
│   ├── prot.cif                      # Best-ranked predicted structure
│   └── prot_model_1.cif              # Additional samples (if diffusion_samples > 1)
├── results.json                      # One entry per target with confidence/affinity metrics
├── power_profile.csv                 # (optional, --report-energy)
├── power_profile.png                 # (optional, --report-energy)
├── prot_pae.npz                      # (optional, --write_pae)
├── prot_pde.npz                      # (optional, --write_pde)
└── prot_embeddings.npz               # (optional, --write_embeddings)

MSA results are cached in <out_dir>/msa/ (default ./msa/), keyed by sequence hash. The same protein sequence is never searched twice, even across different input files or runs. The MSA search uses all available CPU threads and keeps the database index memory-mapped for maximum speed.
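
To illustrate why keying the cache by sequence hash makes it independent of file names and runs, here is a minimal sketch. The hash function (SHA-256), the normalisation, and the .a3m file naming are all assumptions chosen for illustration; tt-bio's actual scheme may differ.

```python
import hashlib
from pathlib import Path


def msa_cache_path(sequence: str, cache_dir: Path = Path("msa")) -> Path:
    """Map a protein sequence to a cache file that depends only on the sequence."""
    # Assumption: whitespace and letter case are normalised before hashing.
    normalised = "".join(sequence.split()).upper()
    digest = hashlib.sha256(normalised.encode("ascii")).hexdigest()
    return cache_dir / f"{digest}.a3m"
```

Two input files that contain the same chain map to the same path, so the second lookup can reuse the first alignment.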

Confidence Scores

Each target entry in results.json contains confidence metrics. The fields below are Boltz-2's; an ESMFold2 entry instead carries plddt (mean, 0-1), ptm when available, and n_residues / n_chains.

{
    "id": "prot",
    "status": "ok",
    "confidence_score": 0.84,
    "ptm": 0.84,
    "iptm": 0.82,
    "complex_plddt": 0.84,
    "chains_ptm": {
        "0": 0.85,
        "1": 0.83
    },
    "pair_chains_iptm": {
        "0": {"0": 0.85, "1": 0.72},
        "1": {"0": 0.82, "1": 0.83}
    }
}

  • confidence_score: Overall confidence (0-1, higher is better), calculated as 0.8 × complex_plddt + 0.2 × iptm. Models are ranked by this score
  • ptm: Predicted TM-score for complex (0-1)
  • iptm: Interface TM-score (0-1)
  • complex_plddt: Average per-residue confidence (0-1)
  • chains_ptm: Per-chain TM-scores (0-1)
  • pair_chains_iptm: Per-chain-pair interface TM-scores (0-1)
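
The ranking formula can be checked by hand. The sketch below recomputes confidence_score from the two fields it depends on and sorts entries by it; the function names are illustrative, and the entries are passed in as plain dictionaries because the exact top-level layout of results.json is not specified here.

```python
def confidence_score(complex_plddt: float, iptm: float) -> float:
    """0.8 x complex_plddt + 0.2 x iptm, as described above."""
    return 0.8 * complex_plddt + 0.2 * iptm


def rank_entries(entries: list[dict]) -> list[dict]:
    """Sort result entries from most to least confident."""
    return sorted(entries, key=lambda e: e["confidence_score"], reverse=True)
```

For the example entry above, confidence_score(0.84, 0.82) is about 0.836, which matches the reported 0.84 after rounding.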

Affinity Predictions

For affinity targets, the same results.json entry also contains:

{
    "affinity_pred_value": 2.47,
    "affinity_probability_binary": 0.41,
    "affinity_pred_value1": 2.55,
    "affinity_pred_value2": 2.19,
    "affinity_probability_binary1": 0.50,
    "affinity_probability_binary2": 0.42
}

  • affinity_probability_binary: Probability of binding (0-1). Use for hit discovery (higher = more likely to bind)
  • affinity_pred_value: Predicted binding affinity as log10(IC50), with IC50 in μM. Use for ligand optimization (lower = stronger binding). Only compare values between known active molecules
  • affinity_pred_value1, affinity_pred_value2: Individual model predictions for binding affinity
  • affinity_probability_binary1, affinity_probability_binary2: Individual model predictions for binding probability
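
Because affinity_pred_value is a base-10 logarithm of an IC50 in μM, converting it back is simple arithmetic. The helper names below are illustrative; the pIC50 conversion follows from 1 M = 10^6 μM.

```python
def ic50_um(affinity_pred_value: float) -> float:
    """Undo the log10: return the predicted IC50 in micromolar."""
    return 10.0 ** affinity_pred_value


def pic50(affinity_pred_value: float) -> float:
    """pIC50 = -log10(IC50 in molar) = 6 - log10(IC50 in micromolar)."""
    return 6.0 - affinity_pred_value
```

For the example value 2.47, this gives an IC50 of roughly 295 μM and a pIC50 of 3.53, so each unit decrease in affinity_pred_value corresponds to a tenfold lower IC50.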

Advanced Usage

Input Format Details

Proteins with Custom MSA

- protein:
    id: A
    sequence: MVTPEGNVSLVDES...
    msa: ./path/to/msa.a3m

Proteins with Modifications

- protein:
    id: A
    sequence: MVTPEGNVSLVDES...
    modifications:
      - position: 5
        ccd: PTR  # Modified residue code

Ligands

- ligand:
    id: B
    smiles: 'CC1=CC=CC=C1'  # SMILES string
    # OR, instead of smiles:
    # ccd: ATP              # CCD code

Constraints

Pocket Constraints (binding site):

constraints:
  - pocket:
      binder: B              # Ligand chain
      contacts: [[A, 10], [A, 11], [A, 12]]  # Binding site residues
      max_distance: 6.0      # Angstroms (4-20A, default 6A)
      force: false           # Use potential to enforce (default: false)

Contact Constraints:

constraints:
  - contact:
      token1: [A, 10]
      token2: [A, 50]
      max_distance: 8.0
      force: false

Templates

Use experimental structures as templates:

templates:
  - cif: ./template.cif
    chain_id: A
    template_id: A
    force: true              # Enforce template alignment
    threshold: 2.0           # Max deviation in Angstroms

Command-Line Options

Options apply to every model unless tagged (Boltz-2).

Common Options:

| Option | Default | Description |
| --- | --- | --- |
| --model | boltz2 | boltz2, esmfold2, esmfold2-fast (single-sequence ESMFold2), or protenix-v2 (AlphaFold3-family folder) |
| --out_dir | ./ | Output directory |
| --cache | ~/.boltz | (Boltz-2) Model cache directory; ESMFold2 uses the Hugging Face cache |
| --accelerator | tenstorrent | (Boltz-2) tenstorrent, cpu, or gpu; ESMFold2 always runs on Tenstorrent |
| --recycling_steps | 3 | Number of recycling iterations |
| --sampling_steps | 200 | Diffusion sampling steps |
| --diffusion_samples | 1 | Number of structure samples |
| --output_format | cif | cif or pdb |
| --override | False | Re-run from scratch |
| --use_msa_server | False | Use the online ColabFold API for MSAs (Boltz-2 needs an MSA from this server or from a local database; optional for ESMFold2) |
| --use_potentials | False | (Boltz-2) Apply physical constraints |
| --affinity_mw_correction | False | (Boltz-2) Apply MW correction to affinity |
| --num_devices | 0 | Number of TT devices (0 = all available) |
| --device_ids | | Comma-separated TT device IDs (e.g. 0,2) |
| --fast | False | Makes some operations use block-fp8, a lower-precision numeric format that runs faster; accuracy is typically very close |
| --listen | | Accept worker connections from other machines; see Multi-Machine Prediction |
| --report-energy | False | (Boltz-2) Enable energy profiling for one TT device (requires the tt-mgmt add-on); writes power_profile.csv and power_profile.png |
| --energy-metric | both | (Boltz-2) Power channel(s) to report: tdp, input, or both |
| --energy-sample-hz | 20.0 | (Boltz-2) Sampling rate in Hz for both the power_w and input_power_w channels |

Affinity-Specific Options (Boltz-2):

| Option | Default | Description |
| --- | --- | --- |
| --sampling_steps_affinity | 200 | Sampling steps for affinity |
| --diffusion_samples_affinity | 5 | Number of affinity samples |

MSA Options (Boltz-2; used by ESMFold2 only when you opt into an MSA):

| Option | Default | Description |
| --- | --- | --- |
| --msa_db_path | auto-detect | Path to local ColabFold database |
| --use_envdb | False | Also search environmental database |
| --use_msa_server | False | Use ColabFold API for MSA |
| --msa_server_url | https://api.colabfold.com | MSA server URL |
| --msa_pairing_strategy | greedy | greedy or complete |
| --max_msa_seqs | 8192 | Maximum MSA sequences |
| --subsample_msa | False | Subsample MSA |
| --num_subsampled_msa | 1024 | Number of subsampled sequences |

MSA Database Setup Options:

| Option | Default | Description |
| --- | --- | --- |
| --db | uniref30 | uniref30 (~500GB), envdb (~800GB), or all |
| --path | ~/.boltz/msa_db | Where to store the databases |
| --install-tools | True | Auto-install missing mmseqs/colabfold_search |

MSA Server Authentication

For --use_msa_server:

Basic Authentication:

export BOLTZ_MSA_USERNAME=myuser
export BOLTZ_MSA_PASSWORD=mypassword
tt-bio predict ... --model boltz2 --use_msa_server

API Key Authentication:

export MSA_API_KEY_VALUE=your-api-key
tt-bio predict ... --model boltz2 --use_msa_server

Optional: Multi-Machine Prediction

Combine the cards across any mix of Tenstorrent machines — a workstation, one or more QuietBoxes, one or more Galaxy servers — into a single run.

On the machine driving the run:

tt-bio predict ./proteins --model boltz2 --listen 8765 --use_msa_server --fast

On every additional machine, replace HOST with the driving machine's hostname or IP:

tt-bio worker --connect http://HOST:8765
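
Before starting workers, it can help to confirm that the driving machine's port is reachable from each additional machine. The following is a generic TCP check written for illustration, not a tt-bio feature; can_reach is a hypothetical helper, and 8765 is simply the port used in the example above.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Run can_reach("HOST", 8765) on a worker machine after the driving command has started; a False result usually points at a firewall rule or a wrong hostname rather than at tt-bio itself.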

Optional: Energy Measurement (Boltz-2)

Use --report-energy to profile energy during prediction:

tt-bio predict examples/686.yaml --model boltz2 --override --device_ids 0 --report-energy --energy-metric both --energy-sample-hz 5

Behavior:

  • Select metric channel(s) with --energy-metric (tdp, input, both)
  • Uses one sampling rate (--energy-sample-hz, default 20 Hz)
  • Supports only Tenstorrent runs with one selected device
  • Records two power channels when available:
    • power_w: tt-mgmt UMD telemetry power (TDP channel)
    • input_power_w: tt-mgmt UMD telemetry input power
  • Requires optional tt-mgmt installation:
    • git clone --recursive https://github.com/aperezvicente-TT/tt-mgmt.git
    • pip install -e ./tt-mgmt
  • Prints energy summary metrics for selected channels
  • Always writes:
    • power_profile.csv
    • power_profile.png
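
If you want total energy rather than a power trace, the samples in power_profile.csv can be integrated over time. The sketch below uses the trapezoidal rule. The power_w column name comes from the list above, but the timestamp column name time_s is an assumption, so check the header of your own file and adjust it.

```python
import csv
import io


def energy_joules(csv_text: str, power_col: str = "power_w",
                  time_col: str = "time_s") -> float:
    """Integrate power (W) over time (s) with the trapezoidal rule."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    times = [float(row[time_col]) for row in rows]    # time_col is an assumed name
    powers = [float(row[power_col]) for row in rows]
    return sum(
        (times[i + 1] - times[i]) * (powers[i] + powers[i + 1]) / 2.0
        for i in range(len(rows) - 1)
    )
```

Dividing the result by 3600 gives watt-hours. Pass power_col="input_power_w" to integrate the input-power channel instead.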

BoltzGen

BoltzGen designs protein binders against a target. The pipeline runs design → inverse folding → folding → analysis → filtering and writes the top-ranked binders to <output>/final_ranked_designs/.

tt-bio gen run examples/binder.yaml --num_designs 10

This automatically uses every available card (splitting the designs across them and merging the results) and writes to ./binder/. Add --device_ids 0,2 to run on specific cards only.
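
One simple way to picture the split across cards is an even division with the remainder spread over the first few cards. This is only an illustration of the idea: split_designs is hypothetical, and the policy tt-bio actually uses is not documented here.

```python
def split_designs(num_designs: int, num_cards: int) -> list[int]:
    """Divide num_designs as evenly as possible across num_cards."""
    if num_cards < 1:
        raise ValueError("num_cards must be at least 1")
    base, extra = divmod(num_designs, num_cards)
    # The first `extra` cards take one additional design each.
    return [base + (1 if i < extra else 0) for i in range(num_cards)]
```

With --num_designs 10 on a four-card machine, this scheme would assign 3, 3, 2, and 2 designs.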

Input Format

entities:
  - protein:
      id: B
      sequence: 80..120         # designed chain, sampled length per design
  - file:
      path: target.cif          # target structure (path relative to this yaml)
      include:
        - chain:
            id: A

80..120 randomises the binder length per design; a fixed integer pins it. Ligand, DNA, and RNA targets use the same YAML grammar as tt-bio predict. See the BoltzGen examples for binding sites, scaffolds, and residue constraints.
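
The length grammar can be read as "sample an integer from the range, or use the fixed value". A minimal sketch of that interpretation follows; sample_length is a hypothetical helper, and treating both ends of the range as inclusive with uniform sampling is an assumption.

```python
import random


def sample_length(spec, rng: random.Random) -> int:
    """Return a binder length for a spec such as "80..120" or 100."""
    text = str(spec)
    if ".." in text:
        low, high = (int(part) for part in text.split("..", 1))
        return rng.randint(low, high)  # assumption: both ends inclusive
    return int(text)  # a fixed integer pins the length
```

Passing a seeded random.Random makes the sampled lengths reproducible across runs.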

Protocols

--protocol sets defaults appropriate for the binder type.

| Protocol | Use for |
| --- | --- |
| protein-anything (default) | de novo protein binder |
| peptide-anything | peptide binder |
| nanobody-anything | nanobody / VHH |
| antibody-anything | antibody |
| protein-small_molecule | binder against a small-molecule target (adds affinity step) |
| protein-redesign | redesign of existing residues (e.g. symmetric dimers) |

Running a Subset

--steps restricts the pipeline.

tt-bio gen run examples/binder.yaml --steps design --num_designs 10
tt-bio gen run examples/binder.yaml --output existing/ --steps analysis filtering

Command-Line Options

| Option | Default | Description |
| --- | --- | --- |
| --protocol | protein-anything | Protocol; sets defaults appropriate for the binder type |
| --num_designs | 10000 | Number of binders to generate |
| --budget | 30 | Number of top designs kept after filtering |
| --output | ./<basename>/ | Output directory |
| --steps | (all) | Run only specific stages |
| --config STEP key=val | | Override per-stage config (e.g. --config design sampling_steps=200) |
| --device_ids | all cards | Restrict to specific cards (e.g. 0,2) |
| --fast | False | Use block-fp8 for some ops (slightly lower precision, faster) |
| --cache | ~/.boltz/boltzgen | Cache for downloaded weights |
| --debug | False | Disable live display; show raw stage output |
| --debug --log | False | Add per-stage progress markers |

Cite

If you use this code or the models in your research, please cite the following papers:

@article{passaro2025boltz2,
  author = {Passaro, Saro and Corso, Gabriele and Wohlwend, Jeremy and Reveiz, Mateo and Thaler, Stephan and Somnath, Vignesh Ram and Getz, Noah and Portnoi, Tally and Roy, Julien and Stark, Hannes and Kwabi-Addo, David and Beaini, Dominique and Jaakkola, Tommi and Barzilay, Regina},
  title = {Boltz-2: Towards Accurate and Efficient Binding Affinity Prediction},
  year = {2025},
  doi = {10.1101/2025.06.14.659707},
  journal = {bioRxiv}
}

@article{stark2025boltzgen,
  author = {Stark, Hannes and Faltings, Felix and Choi, MinGyu and Xie, Yuxin and Hur, Eunsu and O'Donnell, Timothy John and Bushuiev, Anton and U{\c c}ar, Talip and Passaro, Saro and Mao, Weian and Reveiz, Mateo and Bushuiev, Roman and Pluskal, Tom{\'a}{\v s} and Sivic, Josef and Kreis, Karsten and Vahdat, Arash and Ray, Shamayeeta and Goldstein, Jonathan T. and Savinov, Andrew and Hambalek, Jacob A. and Gupta, Anshika and Taquiri-Diaz, Diego A. and Zhang, Yaotian and Hatstat, A. Katherine and Arada, Angelika and Kim, Nam Hyeong and Tackie-Yarboi, Ethel and Boselli, Dylan and Schnaider, Lee and Liu, Chang C. and Li, Gene-Wei and Hnisz, Denes and Sabatini, David M. and DeGrado, William F. and Wohlwend, Jeremy and Corso, Gabriele and Barzilay, Regina and Jaakkola, Tommi},
  title = {BoltzGen: Toward Universal Binder Design},
  year = {2025},
  doi = {10.1101/2025.11.20.689494},
  journal = {bioRxiv}
}

@article{wohlwend2024boltz1,
  author = {Wohlwend, Jeremy and Corso, Gabriele and Passaro, Saro and Getz, Noah and Reveiz, Mateo and Leidal, Ken and Swiderski, Wojtek and Atkinson, Liam and Portnoi, Tally and Chinn, Itamar and Silterra, Jacob and Jaakkola, Tommi and Barzilay, Regina},
  title = {Boltz-1: Democratizing Biomolecular Interaction Modeling},
  year = {2024},
  doi = {10.1101/2024.11.19.624167},
  journal = {bioRxiv}
}

@misc{candido2026language,
  author = {Candido, Salvatore and Hayes, Thomas and Derry, Alexander and Rao, Roshan and Lin, Zeming and Verkuil, Robert and others},
  title = {Language Modeling Materializes a World Model of Protein Biology},
  year = {2026},
  url = {https://biohub.ai/papers/esm_protein.pdf},
  note = {Preprint; ESMC / ESMFold2}
}

@misc{protenix2025,
  author = {{ByteDance AML AI4Science Team}},
  title = {Protenix: An AlphaFold3 Reproduction for Biomolecular Structure Prediction},
  year = {2025},
  url = {https://github.com/bytedance/Protenix}
}

In addition, if you use the automatic MSA generation, please cite:

@article{mirdita2022colabfold,
  title={ColabFold: making protein folding accessible to all},
  author={Mirdita, Milot and Sch{\"u}tze, Konstantin and Moriwaki, Yoshitaka and Heo, Lim and Ovchinnikov, Sergey and Steinegger, Martin},
  journal={Nature methods},
  year={2022}
}

License

tt-bio is released under the MIT License (see LICENSE) and is built on the MIT-licensed Boltz-2 / Boltz-1 code. It bundles third-party code, each under its upstream license: the ESMFold2 host-side reference under tt_bio/_vendor/ (the esm pipeline, MIT, © Chan Zuckerberg Biohub; and the HuggingFace ESMFold2 model definition, Apache-2.0) and the BoltzGen binder-design source under tt_bio/boltzgen/ (MIT, © Hannes Stärk). Protenix-v2 is an independent ttnn reimplementation — no upstream code is vendored — and its weights download from ByteDance's Hugging Face mirror under Apache-2.0. See NOTICE for sources, versions, and modifications.
