# Boltz-1x with Docker for Local Development and Cloud Deployment

---

This guide shows how to run Boltz-1x, a Python-based protein structure prediction tool, in Docker for both local development and cloud deployment. It covers the following topics:

1. **Introduction to Boltz-1x**: Overview of Boltz-1x and its applications.
2. **Setting Up the Development Environment**: Step-by-step instructions for setting up a local development environment using Docker.
3. **Building and Running the Docker Container**: Instructions for building the Docker image and running the container.
4. **Deploying to the Cloud**: Guidelines for deploying Boltz-1x to a cloud platform using Docker.
5. **Best Practices**: Tips and best practices for working with Boltz-1x and Docker.

## Introduction to Boltz-1x and the Protein-Folding Journey

---

Predicting a protein’s three-dimensional shape from its amino acid sequence has been a “grand challenge” in biology for over half a century. In the early 1960s, Christian Anfinsen demonstrated that the information for folding is encoded in the sequence itself: his chemically denatured ribonuclease spontaneously refolded into its active conformation once the denaturants were removed ([Aklectures][1]). This foundational experiment launched an era of **physics-based** and **statistical** modeling.

In 1994, the biennial **CASP** (Critical Assessment of Structure Prediction) competition was created to benchmark methods objectively through blind prediction: participants model proteins whose experimental structures have not yet been publicly released ([Wikipedia][2]). Early efforts like **Rosetta** (first in FORTRAN, then C++) applied fragment assembly and Monte Carlo sampling for **de novo** prediction, and performed strongly in CASP thanks to carefully designed energy functions and design protocols ([docs.rosettacommons.org][3], [PMC][4]). Over the 2000s, Rosetta expanded into docking, design, and community-driven platforms like Foldit ([Wikipedia][5]).

The deep-learning era arrived in 2020 when **DeepMind’s AlphaFold 2** achieved near-atomic accuracy in CASP14, a result widely regarded as largely solving the prediction problem for most single-chain proteins ([Nature][6], [WIRED][7]). In 2021, the Baker lab released **RoseTTAFold**, making high-accuracy predictions accessible on consumer GPUs in minutes ([Baker Lab][8]). In 2024, the Nobel Prize in Chemistry recognized Demis Hassabis, John Jumper, and David Baker for these complementary breakthroughs in AI-driven folding and design ([Le Monde.fr][9]).

Building on this lineage, **Boltz-1x** is an open-source, diffusion-based structure prediction model for biomolecular complexes that can steer sampling with physics-inspired potentials to improve the physical plausibility of its predictions. This notebook shows you how to:

1. **Containerize** Botlz-1x in Docker for reproducible local experiments
2. **Scale** training and inference via cloud deployment
3. **Integrate** with existing folding pipelines and compare performance

Whether you’re an academic exploring protein design or an industry practitioner deploying at scale, Boltz-1x offers an open, container-friendly option in the post-AlphaFold landscape.


## Notebook Roadmap

---

### Sections
- [Building and Running the Docker Container](#building-and-running-the-docker-container)
- [Using Botlz-1x](#using-botlz-1x)
- [Deploying to the Cloud](#deploying-to-the-cloud)


### Prerequisites

Before you begin, ensure you have the following installed on your local machine:

- Docker: [Install Docker](https://docs.docker.com/get-docker/)
- A CUDA-capable NVIDIA GPU (for Boltz-1x inference)
- NVIDIA drivers and the NVIDIA Container Toolkit (required for `--gpus all`)
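
Before building anything, it can help to confirm that the required command-line tools are actually on your `PATH`. The sketch below is one minimal way to do that in Python; the function name `check_prerequisites` is made up for this example, and finding `nvidia-smi` only suggests that a driver is installed, not that Docker can reach the GPU.

```python
import shutil


def check_prerequisites(tools=("docker", "nvidia-smi")):
    """Return a mapping of tool name -> True if the executable is on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in check_prerequisites().items():
        # A missing tool is reported, not treated as a hard error.
        print(f"{tool}: {'found' if found else 'MISSING'}")
```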



## Building and Running the Docker Container

---

To build and run the Docker container for Boltz-1x, follow these steps:

1. **Clone the Repository**: Clone the MLContainerLab repository, which provides the Boltz-1x Dockerfile, to your local machine.

   ```bash
   git clone https://github.com/gabenavarro/MLContainerLab.git
   cd MLContainerLab
   ```

2. **Build the Docker Image**: Use the provided Dockerfile to build the Docker image.

   ```bash
   # You can choose any tag you want for the image
   # Feel free to play around with the base image, just make sure the host has the same or higher CUDA version
   docker build -f ./assets/build/Dockerfile.boltz1x.cu126cp310 -t boltz1x:126-310 .
   ```

3. **Run the Docker Container**: Start the container with the configuration below. This first example runs the container locally with GPU support, which is the recommended setup during development. To scale up, a second example runs the container in the cloud.

   ```bash
   # Run the container with GPU support
   docker run -dt \
       --gpus all \
       --shm-size=64g \
       -v "$(pwd):/workspace" \
       --name boltz1x \
       --env NVIDIA_VISIBLE_DEVICES=all \
       --env GOOGLE_APPLICATION_CREDENTIALS=/workspace/assets/secrets/gcp-key.json \
       boltz1x:126-310
   ```

   > Note: The `-v "$(pwd):/workspace"` option mounts the current directory at `/workspace` in the container, so your local files are accessible from inside it. The `--env` options set environment variables for GPU visibility and Google Cloud credentials.
   >
   > Note: The `--gpus all` option lets the container use all available GPUs.

4. **Access the Container with IDE**: In this example, we will use Visual Studio Code to access the container. You can use any IDE of your choice.

   ```bash
   # In a scriptable manner
   CONTAINER_NAME=boltz1x
   FOLDER=/workspace
   HEX_CONFIG=$(printf '{"containerName":"/%s"}' "$CONTAINER_NAME" | od -A n -t x1 | tr -d ' \n\t')
   code --folder-uri "vscode-remote://attached-container+$HEX_CONFIG$FOLDER"
   ```

> Note: The `code` command opens Visual Studio Code from the shell. To attach to the container directly, install the Dev Containers extension (formerly Remote - Containers) in VS Code.
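
The attach URI used above is the container configuration, written as compact JSON and hex-encoded, followed by the folder to open. The sketch below builds the same string in Python, which may be easier to reuse from a script. The function name `attached_container_uri` is made up for this example, and the URI layout is taken from the shell snippet above, not from an official specification.

```python
import json


def attached_container_uri(container_name: str, folder: str = "/workspace") -> str:
    """Build a VS Code attached-container URI like the shell snippet does."""
    # Compact JSON (no spaces) so the bytes match the printf output exactly.
    config = json.dumps({"containerName": f"/{container_name}"}, separators=(",", ":"))
    hex_config = config.encode("utf-8").hex()
    return f"vscode-remote://attached-container+{hex_config}{folder}"


if __name__ == "__main__":
    # Pass the printed value to: code --folder-uri "<uri>"
    print(attached_container_uri("boltz1x"))
```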



### Quick Use

The `boltz predict` command accepts the following options:

```text
  --out_dir PATH               The path where to save the predictions.
  --cache PATH                 The directory where to download the data and
                               model. Default is ~/.boltz, or $BOLTZ_CACHE if
                               set.
  --checkpoint PATH            An optional checkpoint, will use the provided
                               Boltz-1 model by default.
  --devices INTEGER            The number of devices to use for prediction.
                               Default is 1.
  --accelerator [gpu|cpu|tpu]  The accelerator to use for prediction. Default
                               is gpu.
  --recycling_steps INTEGER    The number of recycling steps to use for
                               prediction. Default is 3.
  --sampling_steps INTEGER     The number of sampling steps to use for
                               prediction. Default is 200.
  --diffusion_samples INTEGER  The number of diffusion samples to use for
                               prediction. Default is 1.
  --step_scale FLOAT           The step size is related to the temperature at
                               which the diffusion process samples the
                               distribution.The lower the higher the diversity
                               among samples (recommended between 1 and 2).
                               Default is 1.638.
  --write_full_pae             Whether to dump the pae into a npz file.
                               Default is True.
  --write_full_pde             Whether to dump the pde into a npz file.
                               Default is False.
  --output_format [pdb|mmcif]  The output format to use for the predictions.
                               Default is mmcif.
  --num_workers INTEGER        The number of dataloader workers to use for
                               prediction. Default is 2.
  --override                   Whether to override existing found predictions.
                               Default is False.
  --seed INTEGER               Seed to use for random number generator.
                               Default is None (no seeding).
  --use_msa_server             Whether to use the MMSeqs2 server for MSA
                               generation. Default is False.
  --msa_server_url TEXT        MSA server url. Used only if --use_msa_server
                               is set.
  --msa_pairing_strategy TEXT  Pairing strategy to use. Used only if
                               --use_msa_server is set. Options are 'greedy'
                               and 'complete'
  --no_potentials              Whether to not use potentials for steering.
                               Default is False.
```
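
When a run is launched from a script, it can be convenient to assemble the command from a dictionary of options. The sketch below is one way to do this, using only flags from the listing above. The helper `build_predict_command` is hypothetical, and how it treats booleans (`True` emits a bare flag, `False` or `None` omits it) is a convention of this example, not part of the Boltz CLI.

```python
import shlex


def build_predict_command(input_path, **options):
    """Assemble a `boltz predict` argument list from keyword options."""
    cmd = ["boltz", "predict", str(input_path)]
    for name, value in options.items():
        flag = f"--{name}"
        if value is True:
            cmd.append(flag)  # switch-style option such as --use_msa_server
        elif value is False or value is None:
            continue  # leave the option at its default
        else:
            cmd.extend([flag, str(value)])
    return cmd


cmd = build_predict_command(
    "/workspace/assets/test-files/boltz1.fasta",
    out_dir="/workspace/datasets/boltz1x/predict2",
    cache="/workspace/datasets/boltz1x/cache",
    recycling_steps=3,
    diffusion_samples=1,
    output_format="pdb",
    use_msa_server=True,
    override=False,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` inside the container.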


[1]: https://aklectures.com/lecture/structure-of-proteins/anfinsens-experiment-of-protein-folding "Anfinsen's Experiment of Protein Folding - AK Lectures"
[2]: https://en.wikipedia.org/wiki/CASP "CASP - Wikipedia"
[3]: https://docs.rosettacommons.org/docs/latest/meta/Rosetta-Timeline "History of Rosetta"
[4]: https://pmc.ncbi.nlm.nih.gov/articles/PMC7603796 "Macromolecular modeling and design in Rosetta: recent methods ..."
[5]: https://en.wikipedia.org/wiki/Rosetta%40home "Rosetta@home"
[6]: https://www.nature.com/articles/s41586-021-03819-2 "Highly accurate protein structure prediction with AlphaFold - Nature"
[7]: https://www.wired.com/story/deepmind-alphafold-protein-diseases "DeepMind wants to use its AI to cure neglected diseases"
[8]: https://www.bakerlab.org/2021/07/15/accurate-protein-structure-prediction-accessible "Accurate protein structure prediction accessible to all - Baker Lab"
[9]: https://www.lemonde.fr/en/science/article/2024/10/09/nobel-prize-for-chemistry-2024-artificial-intelligence-garners-more-recognition_6728828_10.html "Nobel Prize for Chemistry 2024: Artificial intelligence garners more recognition"


## Using Botlz-1x

Next, we run Boltz-1x on a few different input formats to see how the configurations differ. We start with a FASTA file.

```fasta
>A|protein|./examples/msa/seq1.a3m
MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQFYERLIGLWAPAVMEAAHELGV
>B|protein|./examples/msa/seq1.a3m
MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQFYERLIGLWAPAVMEAAHELGV
>C|ccd
SAH
>D|ccd
SAH
>E|smiles
N[C@@H](Cc1ccc(O)cc1)C(=O)O
>F|smiles
N[C@@H](Cc1ccc(O)cc1)C(=O)O
```

Each header is a list of fields separated by the `|` character. The first field is the chain ID, which must be unique. The second is the entity type: `protein`, `dna`, `rna`, `ccd`, or `smiles`. The third, optional field is the path to a precomputed MSA file; if it is omitted, the MSA can be computed as part of the Boltz run. In a production environment, it makes sense to precompute all MSA files on a CPU- and memory-heavy machine first, then run folding inference on a GPU-heavy machine.
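
The header rules above can be checked before a file is submitted to a long run. The following is a minimal validation sketch based only on the format as described here, not on Boltz's own parser; the names `Chain` and `parse_boltz_fasta` are made up for this example.

```python
from dataclasses import dataclass
from typing import List, Optional

ENTITY_TYPES = {"protein", "dna", "rna", "ccd", "smiles"}


@dataclass
class Chain:
    chain_id: str
    entity_type: str
    msa_path: Optional[str]
    sequence: str


def parse_boltz_fasta(text: str) -> List[Chain]:
    """Parse `>CHAIN_ID|ENTITY_TYPE|MSA_PATH` records and validate the headers."""
    chains: List[Chain] = []
    seen = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split("|")
            if len(parts) not in (2, 3):
                raise ValueError(f"Malformed header: {line!r}")
            chain_id, entity_type = parts[0], parts[1]
            msa_path = parts[2] if len(parts) == 3 and parts[2] else None
            if chain_id in seen:
                raise ValueError(f"Duplicate chain ID: {chain_id!r}")
            if entity_type not in ENTITY_TYPES:
                raise ValueError(f"Unknown entity type: {entity_type!r}")
            seen.add(chain_id)
            chains.append(Chain(chain_id, entity_type, msa_path, ""))
        else:
            if not chains:
                raise ValueError("Sequence data found before the first header")
            chains[-1].sequence += line  # sequences may span several lines
    return chains


example = """>A|protein|./examples/msa/seq1.a3m
MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQFYERLIGLWAPAVMEAAHELGV
>C|ccd
SAH
>E|smiles
N[C@@H](Cc1ccc(O)cc1)C(=O)O
"""

for chain in parse_boltz_fasta(example):
    print(chain.chain_id, chain.entity_type, chain.msa_path, len(chain.sequence))
```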

The exact FASTA file we will use is [boltz1.fasta](../assets/test-files/boltz1.fasta), which has the following content:

```fasta
>A|protein
DEAIHCPPCSEEKLARCRPPVGCEELVREPGCGCCATCALGLGMPCGVYTPRCGSGLRCYPPRGVEKPLHTLMHGQGVCMELAEIEAIQESL
>B|protein
GPETLCGAELVDALQFVCGDRGFYFNKPTGYGSSSRRAPQTGIVDECCFRSCDLRRLEMYCAPLKPAKSA
```

These two chains come from the PDB entry [2DSP](https://www.rcsb.org/structure/2DSP).

![image](../assets/images/2dsp_assembly-1.jpeg)

```python
# Simple example (notebook cell; the leading `!` runs a shell command)
!boltz predict /workspace/assets/test-files/boltz1.fasta \
    --recycling_steps 10 \
    --diffusion_samples 25 \
    --accelerator gpu \
    --out_dir /workspace/datasets/boltz1x/predict2 \
    --cache /workspace/datasets/boltz1x/cache \
    --use_msa_server
```

The output begins as follows (truncated):

```text
Checking input data.
Running predictions for 1 structure
Processing input data.
  0%|                                                     | 0/1 [00:00<?, ?it/s]
Generating MSA for /workspace/assets/test-files/boltz1.fasta with 2 protein entities.
  0%|                                      | 0/300 [elapsed: 00:00 remaining: ?]
SUBMIT:   0%|                              | 0/300 [elapsed: 00:00 remaining: ?]
PENDING:   0%|                             | 0/300 [elapsed: 00:00 remaining: ?]
Sleeping for 5s. Reason: PENDING
RUNNING:   0%|                             | 0/300 [elapsed: 00:06 remaining: ?]
RUNNING:   2%|▍                        | 5/300 [elapsed: 00:06 remaining: 06:02]
Sleeping for 10s. Reason: RUNNING
RUNNING:   2%|▍                        | 5/300 [elapsed: 00:16 remaining: 06:02]
RUNNING:   5%|█▏                      | 15/300 [elapsed: 00:16 remaining: 05:10]
Sleeping for 7s. Reason: RUNNING
RUNNING:   5%|█▏                      | 15/300 [elapsed: 00:23 re
```

The predicted structure is fairly close to the experimental model.

![image](../assets/images/bolt_prediction_igf.png)
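
With `--diffusion_samples 25`, the run produces many candidate structures, so it is useful to rank them before inspecting any by eye. The sketch below assumes, without confirmation from this notebook, that Boltz writes one `confidence_*.json` file per sample somewhere under `--out_dir` and that each file contains a numeric `confidence_score` field. Check the file names and keys in your own output directory first; `rank_predictions` is a name made up for this example.

```python
import json
from pathlib import Path


def rank_predictions(out_dir, score_key="confidence_score"):
    """Return (score, path) pairs for confidence files, best score first.

    Assumption: files named confidence_*.json exist under out_dir and hold
    a numeric `score_key` entry. Files without that key are skipped.
    """
    root = Path(out_dir)
    if not root.is_dir():
        return []
    ranked = []
    for path in root.rglob("confidence_*.json"):
        with path.open() as handle:
            data = json.load(handle)
        if score_key in data:
            ranked.append((float(data[score_key]), path))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


if __name__ == "__main__":
    for score, path in rank_predictions("/workspace/datasets/boltz1x/predict2")[:5]:
        print(f"{score:.3f}  {path}")
```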