# Boltz-1x with Docker for Local Development and Cloud Deployment

---

This guide shows how to run Boltz-1x, a structure prediction model implemented in Python, in Docker for both local development and cloud deployment. It covers the following topics:

1. **Introduction to Boltz-1x**: Overview of Boltz-1x and its applications.
2. **Setting Up the Development Environment**: Step-by-step instructions for setting up a local development environment using Docker.
3. **Building and Running the Docker Container**: Instructions for building the Docker image and running the container.
4. **Deploying to the Cloud**: Guidelines for deploying Boltz-1x to a cloud platform using Docker.
5. **Best Practices**: Tips and best practices for working with Boltz-1x and Docker.

## Introduction to Boltz-1x and the Protein-Folding Journey

---

Predicting a protein’s three-dimensional shape from its amino acid sequence has been a “grand challenge” in biology for over half a century. Christian Anfinsen's refolding experiments, which he summarized in 1973, demonstrated that the information for folding is encoded in the sequence itself: his chemically denatured ribonuclease spontaneously refolded into its active conformation upon removal of denaturants ([Aklectures][1]). This foundational work launched an era of **physics-based** and **statistical** modeling.

In 1994, the biennial **CASP** (Critical Assessment of Structure Prediction) competition was created to benchmark methods objectively, using targets whose structures had not yet been publicly released ([Wikipedia][2]). Early efforts like **Rosetta** (first in FORTRAN, then C++) applied fragment assembly and Monte Carlo sampling for **de novo** prediction, winning CASP tasks via clever energy functions and design protocols ([docs.rosettacommons.org][3], [PMC][4]). Over the 2000s, Rosetta expanded into docking, design, and community-driven platforms like Foldit ([Wikipedia][5]).

The deep-learning era arrived in 2020 when **DeepMind’s AlphaFold 2** achieved atomic-level accuracy in CASP14, effectively solving the prediction problem for most single-chain proteins ([Nature][6], [WIRED][7]). Soon after, the Baker lab released **RoseTTAFold**, democratizing high-accuracy predictions on consumer GPUs in minutes ([Baker Lab][8]). In 2024, the Nobel Prize in Chemistry recognized Demis Hassabis, John Jumper, and David Baker for these complementary breakthroughs in AI-driven folding and design ([Le Monde.fr][9]).

Building on this lineage, **Boltz-1x** leverages a novel **Boltzmann-inspired** architecture that blends state-space models with graph-based potentials to predict structures faster and with fewer resources. This notebook shows you how to:

1. **Containerize** Boltz-1x in Docker for reproducible local experiments
2. **Scale** training and inference via cloud deployment
3. **Integrate** with existing folding pipelines and compare performance

Whether you’re an academic exploring protein design or an industry practitioner deploying at scale, Boltz-1x offers a lightweight, production-ready alternative in the post-AlphaFold landscape.


## Notebook Roadmap

---

### Sections
- [Building and Running the Docker Container](#building-and-running-the-docker-container)
- [Using Botlz-1x](#using-botlz-1x)
- [Small GSK3B Study](#small-gsk3b-study)
- [Deploying to the Cloud](#deploying-to-the-cloud)


### Prerequisites

Before you begin, ensure you have the following installed on your local machine:

- Docker: [Install Docker](https://docs.docker.com/get-docker/)
- A compatible GPU (for Boltz-1x)
- NVIDIA drivers and the NVIDIA Container Toolkit (if using a GPU; the toolkit is what makes `docker run --gpus` work)



## Building and Running the Docker Container

---

To build and run the Docker container for Boltz-1x, follow these steps:

1. **Clone the Repository**: Clone the MLContainerLab repository, which contains the Boltz-1x Dockerfile, to your local machine.

   ```bash
   git clone https://github.com/gabenavarro/MLContainerLab.git
   cd MLContainerLab
   ```

2. **Build the Docker Image**: Use the provided Dockerfile to build the Docker image.

   ```bash
   # You can choose any tag you want for the image
   # Feel free to play around with the base image, just make sure the host has the same or higher CUDA version
   docker build -f ./assets/build/Dockerfile.boltz1x.cu126cp310 -t boltz1x:126-310 .
   ```
3. **Run the Docker Container**: Run the Docker container with the necessary configurations. In the first example, we will run the container locally with GPU support. This is the recommended way to run a container while in development mode. For scaling up, we will use the second example which runs the container in the cloud.

   ```bash
   # Run the container with GPU support
   docker run -dt \
       --gpus all \
       --shm-size=64g \
       -v "$(pwd):/workspace" \
       --name boltz1x \
       --env NVIDIA_VISIBLE_DEVICES=all \
       --env GOOGLE_APPLICATION_CREDENTIALS=/workspace/assets/secrets/gcp-key.json \
       boltz1x:126-310
   ```
> Note: The `-v "$(pwd):/workspace"` option mounts the current directory to `/workspace` in the container, allowing you to access your local files from within the container. The `--env` options set environment variables for GPU visibility and Google Cloud credentials.<br>
> Note: The `--gpus all` option allows the container to use all available GPUs. <br>

4. **Access the Container with IDE**: In this example, we will use Visual Studio Code to access the container. You can use any IDE of your choice.

   ```bash
   # In a scriptable manner
   CONTAINER_NAME=boltz1x
   FOLDER=/workspace
   HEX_CONFIG=$(printf {\"containerName\":\"/$CONTAINER_NAME\"} | od -A n -t x1 | tr -d '[\n\t ]')
   code --folder-uri "vscode-remote://attached-container+$HEX_CONFIG$FOLDER"
   ```

> Note: The `code` command opens Visual Studio Code. Make sure the Remote - Containers extension (now published as Dev Containers) is installed in VS Code so that it can attach to the container directly. <br>
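
The `HEX_CONFIG` line in step 4 hex-encodes a small JSON object that names the container. A rough Python equivalent is sketched below. The function name `attached_container_uri` is made up for illustration, and the URI layout is copied from the shell snippet above rather than taken from official VS Code documentation.

```python
import json


def attached_container_uri(container_name: str, folder: str = "/workspace") -> str:
    """Build the vscode-remote URI that the shell snippet passes to `code --folder-uri`."""
    # Compact JSON (no spaces), matching what the printf call emits in the shell version.
    config = json.dumps({"containerName": f"/{container_name}"}, separators=(",", ":"))
    # Hex-encode the UTF-8 bytes, which is what `od -A n -t x1 | tr -d ...` produces.
    hex_config = config.encode("utf-8").hex()
    return f"vscode-remote://attached-container+{hex_config}{folder}"


if __name__ == "__main__":
    print(attached_container_uri("boltz1x"))
```

The printed string can be passed to `code --folder-uri` in place of the value assembled in the shell.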



### Quick Use

The main options of `boltz predict` are:

```bash
  --out_dir PATH               The path where to save the predictions.
  --cache PATH                 The directory where to download the data and
                               model. Default is ~/.boltz, or $BOLTZ_CACHE if
                               set.
  --checkpoint PATH            An optional checkpoint, will use the provided
                               Boltz-1 model by default.
  --devices INTEGER            The number of devices to use for prediction.
                               Default is 1.
  --accelerator [gpu|cpu|tpu]  The accelerator to use for prediction. Default
                               is gpu.
  --recycling_steps INTEGER    The number of recycling steps to use for
                               prediction. Default is 3.
  --sampling_steps INTEGER     The number of sampling steps to use for
                               prediction. Default is 200.
  --diffusion_samples INTEGER  The number of diffusion samples to use for
                               prediction. Default is 1.
  --step_scale FLOAT           The step size is related to the temperature at
                               which the diffusion process samples the
                               distribution.The lower the higher the diversity
                               among samples (recommended between 1 and 2).
                               Default is 1.638.
  --write_full_pae             Whether to dump the pae into a npz file.
                               Default is True.
  --write_full_pde             Whether to dump the pde into a npz file.
                               Default is False.
  --output_format [pdb|mmcif]  The output format to use for the predictions.
                               Default is mmcif.
  --num_workers INTEGER        The number of dataloader workers to use for
                               prediction. Default is 2.
  --override                   Whether to override existing found predictions.
                               Default is False.
  --seed INTEGER               Seed to use for random number generator.
                               Default is None (no seeding).
  --use_msa_server             Whether to use the MMSeqs2 server for MSA
                               generation. Default is False.
  --msa_server_url TEXT        MSA server url. Used only if --use_msa_server
                               is set.
  --msa_pairing_strategy TEXT  Pairing strategy to use. Used only if
                               --use_msa_server is set. Options are 'greedy'
                               and 'complete'
  --no_potentials              Whether to not use potentials for steering.
                               Default is False.
```
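
When many predictions are launched from a notebook, it can help to assemble the command in one place. The sketch below builds an argument list from a subset of the options listed above. The helper name `build_predict_command` is hypothetical, and the defaults simply mirror the help text; it is not part of the boltz package.

```python
from typing import List, Optional


def build_predict_command(
    input_path: str,
    out_dir: str,
    cache: Optional[str] = None,
    recycling_steps: int = 3,
    sampling_steps: int = 200,
    diffusion_samples: int = 1,
    accelerator: str = "gpu",
    output_format: str = "mmcif",
    seed: Optional[int] = None,
    use_msa_server: bool = False,
) -> List[str]:
    """Return an argument list for `boltz predict`, suitable for subprocess.run."""
    # Reject values that the help text does not list as valid choices.
    if accelerator not in {"gpu", "cpu", "tpu"}:
        raise ValueError(f"unsupported accelerator: {accelerator!r}")
    if output_format not in {"pdb", "mmcif"}:
        raise ValueError(f"unsupported output format: {output_format!r}")

    command = [
        "boltz", "predict", input_path,
        "--out_dir", out_dir,
        "--recycling_steps", str(recycling_steps),
        "--sampling_steps", str(sampling_steps),
        "--diffusion_samples", str(diffusion_samples),
        "--accelerator", accelerator,
        "--output_format", output_format,
    ]
    # Optional arguments are only added when explicitly requested.
    if cache is not None:
        command += ["--cache", cache]
    if seed is not None:
        command += ["--seed", str(seed)]
    if use_msa_server:
        command.append("--use_msa_server")
    return command


if __name__ == "__main__":
    print(" ".join(build_predict_command("input.fasta", "results", use_msa_server=True)))
```

Passing a list to `subprocess.run` avoids shell quoting problems with paths that contain spaces.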


[1]: https://aklectures.com/lecture/structure-of-proteins/anfinsens-experiment-of-protein-folding "Anfinsen's Experiment of Protein Folding - AK Lectures"
[2]: https://en.wikipedia.org/wiki/CASP "CASP - Wikipedia"
[3]: https://docs.rosettacommons.org/docs/latest/meta/Rosetta-Timeline "History of Rosetta"
[4]: https://pmc.ncbi.nlm.nih.gov/articles/PMC7603796 "Macromolecular modeling and design in Rosetta: recent methods ..."
[5]: https://en.wikipedia.org/wiki/Rosetta%40home "Rosetta@home"
[6]: https://www.nature.com/articles/s41586-021-03819-2 "Highly accurate protein structure prediction with AlphaFold - Nature"
[7]: https://www.wired.com/story/deepmind-alphafold-protein-diseases "DeepMind wants to use its AI to cure neglected diseases"
[8]: https://www.bakerlab.org/2021/07/15/accurate-protein-structure-prediction-accessible "Accurate protein structure prediction accessible to all - Baker Lab"
[9]: https://www.lemonde.fr/en/science/article/2024/10/09/nobel-prize-for-chemistry-2024-artificial-intelligence-garners-more-recognition_6728828_10.html "Nobel Prize for Chemistry 2024: Artificial intelligence garners more recognition"


## Using Botlz-1x

Now we will run Boltz-1x with a few different input formats to understand the available configurations. We start with a FASTA file.

```fasta
>A|protein|./examples/msa/seq1.a3m
MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQFYERLIGLWAPAVMEAAHELGV
>B|protein|./examples/msa/seq1.a3m
MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQFYERLIGLWAPAVMEAAHELGV
>C|ccd
SAH
>D|ccd
SAH
>E|smiles
N[C@@H](Cc1ccc(O)cc1)C(=O)O
>F|smiles
N[C@@H](Cc1ccc(O)cc1)C(=O)O
```

The header fields are separated by the `|` character. The first field is the chain ID and must be unique. The second field is the entity type, with options `protein`, `dna`, `rna`, `ccd`, and `smiles`. The last field is the path to a precomputed MSA file; it is optional, because the MSA can also be computed as part of the boltz run. In a production environment, it makes sense to precompute all MSA files on a CPU- and memory-heavy machine first, then run protein folding inference on a GPU-heavy machine.
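
The header rules above can be checked before a run is submitted. The following is a minimal sketch, assuming headers follow exactly the `>ID|TYPE|MSA_PATH` layout described here. `FastaHeader` and `parse_header` are illustrative names, not part of boltz.

```python
from typing import NamedTuple, Optional

# Entity types listed in the description above.
ENTITY_TYPES = {"protein", "dna", "rna", "ccd", "smiles"}


class FastaHeader(NamedTuple):
    chain_id: str
    entity_type: str
    msa_path: Optional[str]


def parse_header(line: str) -> FastaHeader:
    """Split a header such as '>A|protein|./examples/msa/seq1.a3m' into its fields."""
    if not line.startswith(">"):
        raise ValueError(f"not a FASTA header: {line!r}")
    fields = line[1:].strip().split("|")
    if len(fields) not in (2, 3):
        raise ValueError(f"expected 2 or 3 '|'-separated fields, got {len(fields)}")
    chain_id, entity_type = fields[0], fields[1]
    if not chain_id:
        raise ValueError("chain ID must not be empty")
    if entity_type not in ENTITY_TYPES:
        raise ValueError(f"unknown entity type: {entity_type!r}")
    # The MSA path is optional; treat a missing or empty third field as absent.
    msa_path = fields[2] if len(fields) == 3 and fields[2] else None
    return FastaHeader(chain_id, entity_type, msa_path)


def check_unique_chain_ids(headers) -> None:
    """Raise if two headers share a chain ID."""
    seen = set()
    for header in headers:
        if header.chain_id in seen:
            raise ValueError(f"duplicate chain ID: {header.chain_id!r}")
        seen.add(header.chain_id)


if __name__ == "__main__":
    parsed = [parse_header(h) for h in (">A|protein|./examples/msa/seq1.a3m", ">C|ccd")]
    check_unique_chain_ids(parsed)
    print(parsed)
```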

The exact FASTA file we will use is [boltz1.fasta](../assets/test-files/boltz1.fasta). It has the following content:

```fasta
>A|protein
DEAIHCPPCSEEKLARCRPPVGCEELVREPGCGCCATCALGLGMPCGVYTPRCGSGLRCYPPRGVEKPLHTLMHGQGVCMELAEIEAIQESL
>B|protein
GPETLCGAELVDALQFVCGDRGFYFNKPTGYGSSSRRAPQTGIVDECCFRSCDLRRLEMYCAPLKPAKSA
```

These are the sequences from the PDB entry for the inhibition of insulin-like growth factors by IGF binding proteins ([2DSP](https://www.rcsb.org/structure/2DSP)).

![image](../assets/images/2DSP.png)

```python
# Simple example
!boltz predict /workspace/assets/test-files/boltz1.fasta \
    --recycling_steps 10 \
    --diffusion_samples 25 \
    --accelerator gpu \
    --out_dir /workspace/datasets/boltz1x/predict2 \
    --cache /workspace/datasets/boltz1x/cache \
    --use_msa_server
```

```text
Checking input data.
Running predictions for 1 structure
Processing input data.
Generating MSA for /workspace/assets/test-files/boltz1.fasta with 2 protein entities.
SUBMIT:   0%|                              | 0/300 [elapsed: 00:00 remaining: ?]
PENDING:   0%|                             | 0/300 [elapsed: 00:00 remaining: ?]
Sleeping for 5s. Reason: PENDING
RUNNING:   0%|                             | 0/300 [elapsed: 00:06 remaining: ?]
RUNNING:   2%|▍                        | 5/300 [elapsed: 00:06 remaining: 06:02]
Sleeping for 10s. Reason: RUNNING
RUNNING:   2%|▍                        | 5/300 [elapsed: 00:16 remaining: 06:02]
RUNNING:   5%|█▏                      | 15/300 [elapsed: 00:16 remaining: 05:10]
Sleeping for 7s. Reason: RUNNING
RUNNING:   5%|█▏                      | 15/300 [elapsed: 00:23 re
```

The results are fairly close to the experimental model. <br>

![image](../assets/images/boltz_2DSP.png) <br>

Now that we have run the FASTA example, let's run a YAML file, as it gives us more flexibility and control over the prediction. The [boltz documentation](https://github.com/jwohlwend/boltz/blob/main/docs/prediction.md) shares the following template:

```yaml
sequences:
    - ENTITY_TYPE:
        id: CHAIN_ID 
        sequence: SEQUENCE    # only for protein, dna, rna
        smiles: 'SMILES'        # only for ligand, exclusive with ccd
        ccd: CCD              # only for ligand, exclusive with smiles
        msa: MSA_PATH         # only for protein
        modifications:
          - position: RES_IDX   # index of residue, starting from 1
            ccd: CCD            # CCD code of the modified residue
        cyclic: false
     
    - ENTITY_TYPE:
        id: [CHAIN_ID, CHAIN_ID]    # multiple ids in case of multiple identical entities
        ...
constraints:
    - bond:
        atom1: [CHAIN_ID, RES_IDX, ATOM_NAME]
        atom2: [CHAIN_ID, RES_IDX, ATOM_NAME]
    - pocket:
        binder: CHAIN_ID
        contacts: [[CHAIN_ID, RES_IDX], [CHAIN_ID, RES_IDX]]
```
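
The comments in the template imply a few consistency rules, and an indentation slip can silently break them. The sketch below checks a document that has already been loaded into a Python dict (for example with a YAML parser). The function name `check_sequences` is illustrative, and the requirement that chain IDs be unique is carried over from the FASTA description as an assumption; it is not an official boltz validator.

```python
from typing import Any, Dict, List

POLYMER_TYPES = {"protein", "dna", "rna"}


def check_sequences(doc: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems found in the `sequences` section."""
    problems: List[str] = []
    seen_ids = set()
    for index, entry in enumerate(doc.get("sequences", [])):
        # Each list item should be a single-key mapping: {ENTITY_TYPE: {...fields...}}.
        if not isinstance(entry, dict) or len(entry) != 1:
            problems.append(f"entry {index}: expected exactly one entity type key")
            continue
        entity_type, fields = next(iter(entry.items()))
        if not isinstance(fields, dict):
            problems.append(f"entry {index}: '{entity_type}' has no nested fields")
            continue

        # `id` may be a single chain ID or a list of IDs for identical entities.
        ids = fields.get("id")
        for chain_id in ids if isinstance(ids, list) else [ids]:
            if chain_id in seen_ids:
                problems.append(f"entry {index}: duplicate chain id {chain_id!r}")
            seen_ids.add(chain_id)

        if entity_type in POLYMER_TYPES:
            if "sequence" not in fields:
                problems.append(f"entry {index}: {entity_type} needs a sequence")
            if "smiles" in fields or "ccd" in fields:
                problems.append(f"entry {index}: smiles/ccd are only for ligands")
        elif entity_type == "ligand":
            # smiles and ccd are mutually exclusive, and one of them is needed.
            if ("smiles" in fields) == ("ccd" in fields):
                problems.append(f"entry {index}: ligand needs exactly one of smiles or ccd")
        else:
            problems.append(f"entry {index}: unknown entity type {entity_type!r}")

        if "msa" in fields and entity_type != "protein":
            problems.append(f"entry {index}: msa is only valid for protein")
    return problems


if __name__ == "__main__":
    example = {"sequences": [{"protein": {"id": "A", "sequence": "MSG"}},
                             {"ligand": {"id": "B", "smiles": "CCO"}}]}
    print(check_sequences(example))
```

An empty list means none of these particular checks failed; it does not guarantee that boltz will accept the file.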


In our example, we will use a YAML file to recreate the PDB structure of FRAT1 bound to GSK3B ([1GNG](https://www.rcsb.org/structure/1GNG)).

![image](../assets/images/1GNG.png) <br>

```yaml
version: 1
sequences:
    # GSK3B
    - protein:
        id: A 
        sequence: MSGRPRTTSFAESCKPVQQPSAFGSMKVSRDKDGSKVTTVVATPGQGPDRPQEVSYTDTKVIGNGSFGVVYQAKLCDSGELVAIKKVLQDKRFKNRELQIMRKLDHCNIVRLRYFFYSSGEKKDEVYLNLVLDYVPETVYRVARHYSRAKQTLPVIYVKLYMYQLFRSLAYIHSFGICHRDIKPQNLLLDPDTAVLKLCDFGSAKQLVRGEPNVSYICSRYYRAPELIFGATDYTSSIDVWSAGCVLAELLLGQPIFPGDSGVDQLVEIIKVLGTPTREQIREMNPNYTEFKFPQIKAHPWTKVFRPRTPPEAIALCSRLLEYTPTARLTPLEACAHSFFDELRDPNVKLPNGRDTPALFNFTTQELSSNPPLATILIPPHARIQAAASTPTNATAASDANTGDRGQTNNAASASASNST
    # FRAT1
    - protein:
        id: B
        sequence: MPCRREEEEEAGEEAEGEEEEEDSFLLLQQSVALGSSGEVDRLVAQIGETLQLDAAQHSPASPCGPPGAPLRAPGPLAAAVPADKARSPAVPLLLPPALAETVGPAPPGVLRCALGDRGRVRGRAAPYCVAELATGPSALSPLPPQADLDGPPGAGKQGIPQPLSGPCRRGWLRGAAASRRLQQRRGSQPETRTGDDDPHRLLQQLVLSGNLIKEAVRRLHSRRLQLRAKLPQRPLLGPLSAPVHEPPSPRSPRAACSDPGASGRAQLRTGDGVLVPGS
```

```python
!boltz predict /workspace/assets/test-files/boltz1-example2.yaml \
    --recycling_steps 10 \
    --diffusion_samples 25 \
    --accelerator gpu \
    --out_dir /workspace/datasets/boltz1x/predict4 \
    --cache /workspace/datasets/boltz1x/cache \
    --use_msa_server
```

```text
Checking input data.
Running predictions for 1 structure
Processing input data.
Generating MSA for /workspace/assets/test-files/boltz1-example2.yaml with 2 protein entities.
COMPLETE: 100%|██████████████████████| 300/300 [elapsed: 00:01 remaining: 00:00]
COMPLETE: 100%|██████████████████████| 300/300 [elapsed: 00:01 remaining: 00:00]
100%|█████████████████████████████████████████████| 1/1 [00:03<00:00,  3
```

The results are fairly close to the experimental model. <br>

![image](../assets/images/boltz_1GNG.png) <br>
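
Both runs above request 25 diffusion samples, so it is useful to see how the samples compare before picking one to inspect. The sketch below assumes that boltz writes one `confidence_*.json` file per sample somewhere under the output directory and that each file contains a `confidence_score` field. Both the file naming and the field name are assumptions about the output layout and should be checked against the files actually produced. `rank_by_confidence` is an illustrative helper.

```python
import json
from pathlib import Path
from typing import List, Tuple


def rank_by_confidence(out_dir: str, key: str = "confidence_score") -> List[Tuple[float, str]]:
    """Return (score, path) pairs for every confidence file under out_dir, best first."""
    ranked = []
    # Assumed naming pattern: confidence_<input>_model_<n>.json (not verified here).
    for path in Path(out_dir).rglob("confidence_*.json"):
        with open(path) as handle:
            data = json.load(handle)
        # Skip files that do not carry the requested metric.
        if key in data:
            ranked.append((float(data[key]), str(path)))
    return sorted(ranked, reverse=True)
```

For example, `rank_by_confidence("/workspace/datasets/boltz1x/predict4")[:5]` would list the five highest-scoring samples, provided the assumptions above hold.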


## Small GSK3B Study

Now let's move on to a small study. Now that we have a model showing the interactions between GSK3B and FRAT1, let's see whether any known inhibitors bind in the same spot.

![image](../assets/images/boltz_1GNG_zoom_in.png) <br>

First, let's identify the interacting residues.

```python
import sys
sys.path.append("/workspace")
from utils.protein_analysis import analyze_chain_interactions

table = analyze_chain_interactions("/workspace/assets/test-files/boltz1_1GNG_model_0.cif")
table[table["Interaction_Types"]!="None"].head(5)
```

```text
Interaction Summary:
Total residue pairs analyzed: 136
Pairs with interactions: 40
Electrostatic interactions: 10
Hydrogen bond interactions: 8
Hydrophobic interactions: 27
```

| | Chain_A_Residue | Chain_A_Position | Chain_B_Residue | Chain_B_Position | Min_Distance_A | Closest_Atoms | Interaction_Types | Has_Electrostatic | Has_Hydrogen_Bond | Has_Hydrophobic | Chain_A_Type | Chain_B_Type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | LYS | 297 | GLU | 22 | 2.49191 | O-OE1 | Electrostatic;Hydrogen_Bond | True | True | False | Positive | Negative |
| 2 | GLN | 295 | GLU | 18 | 2.505144 | N-OE2 | Hydrogen_Bond | False | True | False | Polar | Negative |
| 8 | ASP | 264 | ARG | 223 | 2.682222 | OD1-NH2 | Electrostatic;Hydrogen_Bond | True | True | False | Negative | Positive |
| 14 | ILE | 281 | ILE | 213 | 3.044317 | CD1-CD1 | Hydrophobic | False | False | True | Hydrophobic | Hydrophobic |
| 15 | PHE | 229 | VAL | 207 | 3.112422 | O-CG1 | Hydrophobic | False | False | True | Aromatic | Hydrophobic |


```python
counts = (
    table
    .loc[table["Interaction_Types"] != "None", "Chain_A_Position"]
    .value_counts()
)
counts = counts[counts > 1]
counts
```

```text
Chain_A_Position
263    4
270    4
297    3
296    3
228    3
229    3
295    2
264    2
266    2
267    2
294    2
292    2
Name: count, dtype: int64
```
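
One possible use of these recurring positions is as the `contacts` list of a `pocket` constraint from the YAML template shown earlier. The sketch below uses plain Python so it runs without pandas. `hotspot_contacts` is an illustrative name, and whether to constrain a ligand to this interface at all is a modeling choice that this notebook does not make.

```python
from collections import Counter
from typing import Iterable, List


def hotspot_contacts(positions: Iterable[int], chain_id: str = "A", min_count: int = 2) -> List[list]:
    """Return [[chain_id, position], ...] for positions seen at least min_count times."""
    counts = Counter(positions)
    # Sort by residue position so the resulting list is stable and easy to read.
    return [[chain_id, pos] for pos, n in sorted(counts.items()) if n >= min_count]


if __name__ == "__main__":
    # Hypothetical input: chain A positions of interacting residue pairs.
    example_positions = [263, 263, 270, 270, 297, 264]
    print(hotspot_contacts(example_positions))
```

With the DataFrame from the cell above, the input would be `table.loc[table["Interaction_Types"] != "None", "Chain_A_Position"].tolist()`.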

```python
import pandas as pd

# Download the CSV file of small-molecule interactions to /workspace/assets/test-files/GSK3B_inhibitor.csv
!wget -O /workspace/assets/test-files/GSK3B_inhibitor.csv "https://www.guidetopharmacology.org/GRAC/SARFileDownload?objectId=2030"
gsk3b_inhibitor_df = pd.read_csv("/workspace/assets/test-files/GSK3B_inhibitor.csv")
gsk3b_inhibitor_df.head(5)
```

```text
--2025-05-23 08:16:22--  https://www.guidetopharmacology.org/GRAC/SARFileDownload?objectId=2030
Resolving www.guidetopharmacology.org (www.guidetopharmacology.org)... 129.215.67.107
Connecting to www.guidetopharmacology.org (www.guidetopharmacology.org)|129.215.67.107|:443... connected.
HTTP request sent, awaiting response... 200 200
Length: unspecified [text/csv]
Saving to: ‘/workspace/assets/test-files/GSK3B_inhibitor.csv’

/workspace/assets/t     [  <=>               ]  16.40K  48.4KB/s    in 0.3s

2025-05-23 08:16:25 (48.4 KB/s) - ‘/workspace/assets/test-files/GSK3B_inhibitor.csv’ saved [16796]
```

| | target | target_id | target_uniprot | target_species | ligand | ligand_id | ligand_species | ligand_pubchem_cid | smiles | inchi | ... | affinity_low | original_affinity_units | original_affinity_low_nm | original_affinity_median_nm | original_affinity_high_nm | original_affinity_relation | assay_description | receptor_site | ligand_context | pubmed_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| glycogen synthase kinase 3 beta | 2030 | P49841 | Human | Li<sup>+</sup> | 5212 | | 3028194 | | | | ... | IC50 | | 2500000.0 | | = | | | | 11162580 | |
| glycogen synthase kinase 3 beta | 2030 | P49841 | Human | alsterpaullone | 5925 | | 5005498 | `O=C1Nc2ccccc2c2c(C1)c1cc(ccc1[nH]2)[N+](=O)[O-]` | InChI=1S/C16H11N3O3/c20-15-8-12-11-7-9(19(21)2... | OLUKILHGKRVDCT-UHFFFAOYSA-N | ... | IC50 | | 4.0 | | = | | | | 10998059 | |
| glycogen synthase kinase 3 beta | 2030 | P49841 | Human | alsterpaullone 2-cyanoethyl | 5926 | | 16760286 | `N#CCCc1ccc2c(c1)c1[nH]c3c(c1CC(=O)N2)cc(cc3)[N...` | InChI=1S/C19H14N4O3/c20-7-1-2-11-3-5-17-15(8-1... | UBLFSMURWWWWMH-UHFFFAOYSA-N | ... | IC50 | | 0.8 | | = | | | | 18077363 | |
| glycogen synthase kinase 3 beta | 2030 | P49841 | Human | Cdk/Crk inhibitor | 5943 | | 135473382 | `OCCOc1ccc(cc1)Cc1nc(=O)c2c(n1)n([nH]c2C(C)C)c1...` | InChI=1S/C23H22Cl2N4O3/c1-13(2)20-19-22(29(28-... | VQNCIRRXQQTXEL-UHFFFAOYSA-N | ... | IC50 | | 754.0 | | = | | | | 18077363 | |
| glycogen synthase kinase 3 beta | 2030 | P49841 | Human | Cdk1/5 inhibitor | 5947 | | 438981 | `Nc1[nH]nc2c1nc1ccccc1n2` | InChI=1S/C9H7N5/c10-8-7-9(14-13-8)12-6-4-2-1-3... | DWHVZCLBMTZRQM-UHFFFAOYSA-N | ... | IC50 | | 1000.0 | | = | | | | 18077363 | |

> Note: The column labels appear to be shifted by one position relative to the data. The first field (the target name) has become the index, most likely because each data row ends with a trailing comma. As a result, `target_uniprot` holds the species, `ligand` holds the ligand ID, and `ligand_pubchem_cid` holds the SMILES string. The cells below rely on these shifted labels. Passing `index_col=False` to `pd.read_csv` would realign the columns, but the column names used below would then have to change accordingly.<br>


Now we will create a YAML file to model the binding of GSK3B to each of the known inhibitors that target Human GSK3B. The general structure of the YAML file is as follows:

```yaml
version: 1
sequences:
    # GSK3B
    - protein:
        id: A 
        sequence: MSGRPRTTSFAESCKPVQQPSAFGSMKVSRDKDGSKVTTVVATPGQGPDRPQEVSYTDTKVIGNGSFGVVYQAKLCDSGELVAIKKVLQDKRFKNRELQIMRKLDHCNIVRLRYFFYSSGEKKDEVYLNLVLDYVPETVYRVARHYSRAKQTLPVIYVKLYMYQLFRSLAYIHSFGICHRDIKPQNLLLDPDTAVLKLCDFGSAKQLVRGEPNVSYICSRYYRAPELIFGATDYTSSIDVWSAGCVLAELLLGQPIFPGDSGVDQLVEIIKVLGTPTREQIREMNPNYTEFKFPQIKAHPWTKVFRPRTPPEAIALCSRLLEYTPTARLTPLEACAHSFFDELRDPNVKLPNGRDTPALFNFTTQELSSNPPLATILIPPHARIQAAASTPTNATAASDANTGDRGQTNNAASASASNST
    # GSK3B inhibitor
    - ligand:
        id: B
        smiles: 'INHIBITOR-SMILES'
```

```python
import os

# The column labels of gsk3b_inhibitor_df appear shifted by one position (see the table above):
#   "target_uniprot" holds the species, "ligand" holds the ligand ID,
#   and "ligand_pubchem_cid" holds the SMILES string.
yaml_dir = "/workspace/assets/test-files/boltz"
os.makedirs(yaml_dir, exist_ok=True)  # make sure the output directory exists

# dropna skips ligands without a SMILES string (for example Li+)
for _, row in gsk3b_inhibitor_df.dropna(subset=["ligand_pubchem_cid"]).iterrows():
    if row["target_uniprot"] == "Human":
        yaml_str = f"""# This is a comment
version: 1
sequences:
    # GSK3B
    - protein:
        id: A 
        sequence: MSGRPRTTSFAESCKPVQQPSAFGSMKVSRDKDGSKVTTVVATPGQGPDRPQEVSYTDTKVIGNGSFGVVYQAKLCDSGELVAIKKVLQDKRFKNRELQIMRKLDHCNIVRLRYFFYSSGEKKDEVYLNLVLDYVPETVYRVARHYSRAKQTLPVIYVKLYMYQLFRSLAYIHSFGICHRDIKPQNLLLDPDTAVLKLCDFGSAKQLVRGEPNVSYICSRYYRAPELIFGATDYTSSIDVWSAGCVLAELLLGQPIFPGDSGVDQLVEIIKVLGTPTREQIREMNPNYTEFKFPQIKAHPWTKVFRPRTPPEAIALCSRLLEYTPTARLTPLEACAHSFFDELRDPNVKLPNGRDTPALFNFTTQELSSNPPLATILIPPHARIQAAASTPTNATAASDANTGDRGQTNNAASASASNST
    # GSK3B inhibitor
    - ligand:
        id: B
        smiles: '{row["ligand_pubchem_cid"]}'
"""
        # One YAML file per ligand, named after the ligand ID
        with open(f"{yaml_dir}/GSK3B_inhibitor_{row['ligand']}.yaml", "w") as f:
            f.write(yaml_str)
        print(f"Generated YAML for {row['ligand']}")
```

```text
Generated YAML for 5925
Generated YAML for 5926
Generated YAML for 5943
Generated YAML for 5947
Generated YAML for 5955
Generated YAML for 5976
Generated YAML for 5977
Generated YAML for 5978
Generated YAML for 5979
Generated YAML for 5980
Generated YAML for 5989
Generated YAML for 5991
Generated YAML for 6000
Generated YAML for 6929
Generated YAML for 7744
Generated YAML for 7819
Generated YAML for 7907
Generated YAML for 7958
Generated YAML for 8006
Generated YAML for 8007
Generated YAML for 8014
Generated YAML for 8015
Generated YAML for 8016
Generated YAML for 8017
Generated YAML for 8018
Generated YAML for 8019
Generated YAML for 8114
Generated YAML for 8115
Generated YAML for 8171
Generated YAML for 8478
Generated YAML for 9198
Generated YAML for 9810
Generated YAML for 9811
Generated YAML for 9923
Generated YAML for 10108
Generated YAML for 10688
Generated YAML for 11407
Generated YAML for 11412
Generated YAML for 11548
Generated YAML for 13774
```


```python
import os
import subprocess

for _, row in gsk3b_inhibitor_df.dropna(subset=["ligand_pubchem_cid"]).iterrows():
    # Only run ligands for which a YAML file was generated in the previous step
    yaml_file = f"/workspace/assets/test-files/boltz/GSK3B_inhibitor_{row['ligand']}.yaml"
    if os.path.exists(yaml_file):
        # One prediction per ligand, each written to its own output directory
        command = [
            "boltz", "predict", yaml_file,
            "--recycling_steps", "10",
            "--diffusion_samples", "25",
            "--accelerator", "gpu",
            "--out_dir", f"/workspace/datasets/boltz1x/predict4/{row['ligand']}",
            "--cache", "/workspace/datasets/boltz1x/cache",
            "--use_msa_server"
        ]
        subprocess.run(command)
```

Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A

Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_5925.yaml with 1 protein entities.



COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:02 remaining: 00:00][A
100%|██████████| 1/1 [00:03<00:00,  3.13s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
You are using a CUDA device ('NVIDIA L40S') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will 

Predicting DataLoader 0: 100%|██████████| 1/1 [22:13<00:00,  0.00it/s]Number of failed examples: 0
Predicting DataLoader 0: 100%|██████████| 1/1 [22:13<00:00,  0.00it/s]
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A

Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_5926.yaml with 1 protein entities.



COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:02 remaining: 00:00][A
100%|██████████| 1/1 [00:03<00:00,  3.35s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
You are using a CUDA device ('NVIDIA L40S') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will 

Predicting DataLoader 0: 100%|██████████| 1/1 [24:02<00:00,  0.00it/s]Number of failed examples: 0
Predicting DataLoader 0: 100%|██████████| 1/1 [24:02<00:00,  0.00it/s]
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A

Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_5943.yaml with 1 protein entities.



COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:01 remaining: 00:00][A
100%|██████████| 1/1 [00:01<00:00,  1.74s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
You are using a CUDA device ('NVIDIA L40S') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will 

Predicting DataLoader 0: 100%|██████████| 1/1 [24:34<00:00,  0.00it/s]Number of failed examples: 0
Predicting DataLoader 0: 100%|██████████| 1/1 [24:34<00:00,  0.00it/s]
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A

Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_5947.yaml with 1 protein entities.



COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:02 remaining: 00:00][A
100%|██████████| 1/1 [00:02<00:00,  2.42s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
You are using a CUDA device ('NVIDIA L40S') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will 

Predicting DataLoader 0: 100%|██████████| 1/1 [23:14<00:00,  0.00it/s]Number of failed examples: 0
Predicting DataLoader 0: 100%|██████████| 1/1 [23:14<00:00,  0.00it/s]
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A

Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_5955.yaml with 1 protein entities.



COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:01 remaining: 00:00][A
100%|██████████| 1/1 [00:01<00:00,  1.84s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
You are using a CUDA device ('NVIDIA L40S') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will 

Predicting DataLoader 0: 100%|██████████| 1/1 [23:49<00:00,  0.00it/s]Number of failed examples: 0
Predicting DataLoader 0: 100%|██████████| 1/1 [23:49<00:00,  0.00it/s]
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A

Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_5976.yaml with 1 protein entities.



COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:01 remaining: 00:00][A
100%|██████████| 1/1 [00:02<00:00,  2.34s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
You are using a CUDA device ('NVIDIA L40S') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will 

Predicting DataLoader 0: 100%|██████████| 1/1 [23:45<00:00,  0.00it/s]
Number of failed examples: 0
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?]
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_5977.yaml with 1 protein entities.
COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:01 remaining: 00:00]
100%|██████████| 1/1 [00:01<00:00,  1.73s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
You are using a CUDA device ('NVIDIA L40S') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will 

Predicting DataLoader 0: 100%|██████████| 1/1 [23:04<00:00,  0.00it/s]
Number of failed examples: 0
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?]
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_5978.yaml with 1 protein entities.
COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:02 remaining: 00:00]
100%|██████████| 1/1 [00:02<00:00,  2.40s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
You are using a CUDA device ('NVIDIA L40S') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will 

Predicting DataLoader 0: 100%|██████████| 1/1 [23:26<00:00,  0.00it/s]
Number of failed examples: 0
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?]
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_5979.yaml with 1 protein entities.
COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:01 remaining: 00:00]
100%|██████████| 1/1 [00:02<00:00,  2.36s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
You are using a CUDA device ('NVIDIA L40S') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will 

Predicting DataLoader 0: 100%|██████████| 1/1 [23:42<00:00,  0.00it/s]
Number of failed examples: 0
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?]
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_5980.yaml with 1 protein entities.
COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:01 remaining: 00:00]
100%|██████████| 1/1 [00:02<00:00,  2.48s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
You are using a CUDA device ('NVIDIA L40S') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will 

Predicting DataLoader 0: 100%|██████████| 1/1 [23:45<00:00,  0.00it/s]
Number of failed examples: 0
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?]
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_5989.yaml with 1 protein entities.
COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:01 remaining: 00:00]
100%|██████████| 1/1 [00:02<00:00,  2.36s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
You are using a CUDA device ('NVIDIA L40S') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will 

Predicting DataLoader 0: 100%|██████████| 1/1 [23:41<00:00,  0.00it/s]
Number of failed examples: 0
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?]
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_5991.yaml with 1 protein entities.
COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:01 remaining: 00:00]
100%|██████████| 1/1 [00:02<00:00,  2.26s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
You are using a CUDA device ('NVIDIA L40S') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will 

Predicting DataLoader 0: 100%|██████████| 1/1 [23:43<00:00,  0.00it/s]
Number of failed examples: 0
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?]
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_6000.yaml with 1 protein entities.
COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:01 remaining: 00:00]
100%|██████████| 1/1 [00:01<00:00,  1.71s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
You are using a CUDA device ('NVIDIA L40S') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will 

Predicting DataLoader 0: 100%|██████████| 1/1 [23:28<00:00,  0.00it/s]
Number of failed examples: 0
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?]
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_6929.yaml with 1 protein entities.
COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:02 remaining: 00:00]
100%|██████████| 1/1 [00:02<00:00,  2.45s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
You are using a CUDA device ('NVIDIA L40S') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will 

Predicting DataLoader 0: 100%|██████████| 1/1 [23:48<00:00,  0.00it/s]
Number of failed examples: 0
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?]
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_7744.yaml with 1 protein entities.
COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:01 remaining: 00:00]
100%|██████████| 1/1 [00:02<00:00,  2.59s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
You are using a CUDA device ('NVIDIA L40S') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will 

Predicting DataLoader 0: 100%|██████████| 1/1 [24:55<00:00,  0.00it/s]
Number of failed examples: 0
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?]
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_7819.yaml with 1 protein entities.
COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:01 remaining: 00:00]
100%|██████████| 1/1 [00:02<00:00,  2.29s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
You are using a CUDA device ('NVIDIA L40S') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will 

Predicting DataLoader 0: 100%|██████████| 1/1 [24:02<00:00,  0.00it/s]
Number of failed examples: 0
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?]
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_7907.yaml with 1 protein entities.
COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:01 remaining: 00:00]
100%|██████████| 1/1 [00:01<00:00,  1.86s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
You are using a CUDA device ('NVIDIA L40S') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will 

Predicting DataLoader 0: 100%|██████████| 1/1 [24:38<00:00,  0.00it/s]
Number of failed examples: 0
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?]
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_7958.yaml with 1 protein entities.
COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:01 remaining: 00:00]
100%|██████████| 1/1 [00:02<00:00,  2.31s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
You are using a CUDA device ('NVIDIA L40S') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will 

Predicting DataLoader 0: 100%|██████████| 1/1 [24:41<00:00,  0.00it/s]
Number of failed examples: 0
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?]
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_8006.yaml with 1 protein entities.
COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:01 remaining: 00:00]
100%|██████████| 1/1 [00:02<00:00,  2.50s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
You are using a CUDA device ('NVIDIA L40S') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will 

Predicting DataLoader 0: 100%|██████████| 1/1 [24:22<00:00,  0.00it/s]
Number of failed examples: 0
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?]
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_8007.yaml with 1 protein entities.
COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:01 remaining: 00:00]
100%|██████████| 1/1 [00:02<00:00,  2.40s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
You are using a CUDA device ('NVIDIA L40S') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will 

Predicting DataLoader 0: 100%|██████████| 1/1 [24:02<00:00,  0.00it/s]
Number of failed examples: 0
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?]
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_8014.yaml with 1 protein entities.
COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:01 remaining: 00:00]
100%|██████████| 1/1 [00:02<00:00,  2.13s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
You are using a CUDA device ('NVIDIA L40S') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will 

Predicting DataLoader 0: 100%|██████████| 1/1 [24:16<00:00,  0.00it/s]
Number of failed examples: 0
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?]
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_8015.yaml with 1 protein entities.
COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:01 remaining: 00:00]
100%|██████████| 1/1 [00:02<00:00,  2.37s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
You are using a CUDA device ('NVIDIA L40S') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will 

Predicting DataLoader 0: 100%|██████████| 1/1 [24:09<00:00,  0.00it/s]
Number of failed examples: 0
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?]
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_8016.yaml with 1 protein entities.
COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:01 remaining: 00:00]
100%|██████████| 1/1 [00:02<00:00,  2.40s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
You are using a CUDA device ('NVIDIA L40S') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will 

Predicting DataLoader 0: 100%|██████████| 1/1 [24:54<00:00,  0.00it/s]
Number of failed examples: 0
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?]
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_8017.yaml with 1 protein entities.
COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:01 remaining: 00:00]
100%|██████████| 1/1 [00:02<00:00,  2.39s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
You are using a CUDA device ('NVIDIA L40S') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will 

Predicting DataLoader 0: 100%|██████████| 1/1 [24:21<00:00,  0.00it/s]
Number of failed examples: 0
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?]
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_8018.yaml with 1 protein entities.
COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:01 remaining: 00:00]
100%|██████████| 1/1 [00:01<00:00,  1.71s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
You are using a CUDA device ('NVIDIA L40S') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will 

Predicting DataLoader 0: 100%|██████████| 1/1 [23:59<00:00,  0.00it/s]
Number of failed examples: 0
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?]
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_8019.yaml with 1 protein entities.
COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:03 remaining: 00:00]
100%|██████████| 1/1 [00:03<00:00,  3.82s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
You are using a CUDA device ('NVIDIA L40S') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will 

Predicting DataLoader 0: 100%|██████████| 1/1 [24:26<00:00,  0.00it/s]
Number of failed examples: 0
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?]
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_8114.yaml with 1 protein entities.
COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:02 remaining: 00:00]
100%|██████████| 1/1 [00:02<00:00,  2.88s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs

Predicting DataLoader 0: 100%|██████████| 1/1 [24:00<00:00,  0.00it/s]Number of failed examples: 0
Predicting DataLoader 0: 100%|██████████| 1/1 [24:00<00:00,  0.00it/s]
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A

Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_8115.yaml with 1 protein entities.



COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:01 remaining: 00:00][A
100%|██████████| 1/1 [00:01<00:00,  1.89s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs

Predicting DataLoader 0: 100%|██████████| 1/1 [24:05<00:00,  0.00it/s]Number of failed examples: 0
Predicting DataLoader 0: 100%|██████████| 1/1 [24:05<00:00,  0.00it/s]
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A

Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_8171.yaml with 1 protein entities.



COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:02 remaining: 00:00][A
100%|██████████| 1/1 [00:02<00:00,  2.55s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs

Predicting DataLoader 0: 100%|██████████| 1/1 [24:11<00:00,  0.00it/s]Number of failed examples: 0
Predicting DataLoader 0: 100%|██████████| 1/1 [24:11<00:00,  0.00it/s]
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A

Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_8478.yaml with 1 protein entities.



COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:01 remaining: 00:00][A
100%|██████████| 1/1 [00:02<00:00,  2.39s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs

Predicting DataLoader 0: 100%|██████████| 1/1 [24:41<00:00,  0.00it/s]Number of failed examples: 0
Predicting DataLoader 0: 100%|██████████| 1/1 [24:41<00:00,  0.00it/s]
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A

Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_9198.yaml with 1 protein entities.



COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:02 remaining: 00:00][A
100%|██████████| 1/1 [00:02<00:00,  2.47s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs

Predicting DataLoader 0: 100%|██████████| 1/1 [23:59<00:00,  0.00it/s]Number of failed examples: 0
Predicting DataLoader 0: 100%|██████████| 1/1 [23:59<00:00,  0.00it/s]
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A

Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_9810.yaml with 1 protein entities.



COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:03 remaining: 00:00][A
100%|██████████| 1/1 [00:03<00:00,  3.59s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs

Predicting DataLoader 0: 100%|██████████| 1/1 [25:10<00:00,  0.00it/s]Number of failed examples: 0
Predicting DataLoader 0: 100%|██████████| 1/1 [25:10<00:00,  0.00it/s]
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A

Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_9811.yaml with 1 protein entities.



COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:02 remaining: 00:00][A
100%|██████████| 1/1 [00:02<00:00,  2.59s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs

Predicting DataLoader 0: 100%|██████████| 1/1 [24:03<00:00,  0.00it/s]Number of failed examples: 0
Predicting DataLoader 0: 100%|██████████| 1/1 [24:03<00:00,  0.00it/s]
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A

Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_9923.yaml with 1 protein entities.



COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:01 remaining: 00:00][A
100%|██████████| 1/1 [00:02<00:00,  2.42s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs

Predicting DataLoader 0: 100%|██████████| 1/1 [24:48<00:00,  0.00it/s]Number of failed examples: 0
Predicting DataLoader 0: 100%|██████████| 1/1 [24:48<00:00,  0.00it/s]
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A

Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_10108.yaml with 1 protein entities.



COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:01 remaining: 00:00][A
100%|██████████| 1/1 [00:01<00:00,  1.80s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs

Predicting DataLoader 0: 100%|██████████| 1/1 [24:08<00:00,  0.00it/s]Number of failed examples: 0
Predicting DataLoader 0: 100%|██████████| 1/1 [24:08<00:00,  0.00it/s]
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A

Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_10688.yaml with 1 protein entities.



COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:02 remaining: 00:00][A
100%|██████████| 1/1 [00:02<00:00,  2.59s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs

Predicting DataLoader 0: 100%|██████████| 1/1 [25:04<00:00,  0.00it/s]Number of failed examples: 0
Predicting DataLoader 0: 100%|██████████| 1/1 [25:04<00:00,  0.00it/s]
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A

Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_11407.yaml with 1 protein entities.



COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:02 remaining: 00:00][A
100%|██████████| 1/1 [00:02<00:00,  2.52s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs

Predicting DataLoader 0: 100%|██████████| 1/1 [24:52<00:00,  0.00it/s]Number of failed examples: 0
Predicting DataLoader 0: 100%|██████████| 1/1 [24:52<00:00,  0.00it/s]
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A

Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_11412.yaml with 1 protein entities.



COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:01 remaining: 00:00][A
100%|██████████| 1/1 [00:01<00:00,  1.91s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs

Predicting DataLoader 0: 100%|██████████| 1/1 [24:47<00:00,  0.00it/s]Number of failed examples: 0
Predicting DataLoader 0: 100%|██████████| 1/1 [24:47<00:00,  0.00it/s]
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A

Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_11548.yaml with 1 protein entities.



COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:02 remaining: 00:00][A
100%|██████████| 1/1 [00:02<00:00,  2.56s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs

Predicting DataLoader 0: 100%|██████████| 1/1 [28:53<00:00,  0.00it/s]Number of failed examples: 0
Predicting DataLoader 0: 100%|██████████| 1/1 [28:53<00:00,  0.00it/s]
Checking input data.
Running predictions for 1 structure
Processing input data.


  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A

Generating MSA for /workspace/assets/test-files/boltz/GSK3B_inhibitor_13774.yaml with 1 protein entities.



COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?][A
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:02 remaining: 00:00][A
100%|██████████| 1/1 [00:02<00:00,  2.59s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs

Predicting DataLoader 0: 100%|██████████| 1/1 [30:41<00:00,  0.00it/s]Number of failed examples: 0
Predicting DataLoader 0: 100%|██████████| 1/1 [30:41<00:00,  0.00it/s]


As an example of the output, let's take a look at GSK3B_inhibitor 5947. 

![image](../assets/images/GSK3B_inhibitor_5947.png)

Let's analyze the interaction between GSK3B and the inhibitor in more detail using the Python function `analyze_protein_ligand_interactions`.

In [2]:
import sys
sys.path.append("/workspace")
from utils.protein_analysis import analyze_protein_ligand_interactions

df = analyze_protein_ligand_interactions('../assets/test-files/boltz/GSK3B_inhibitor_5947_model_0.cif', ligand_name='LIG')
df

Found 1 ligand residue(s): ['LIG']

Protein-Ligand Interaction Summary:
Total interacting residues: 18
Electrostatic interactions: 3
Hydrogen bond interactions: 1
Hydrophobic interactions: 8
Van der Waals contacts: 9
Average interaction distance: 4.21 Å
Closest interaction: 2.61 Å

Top 5 strongest interactions:
ASP133 - LIG: 2.61Å (Electrostatic;Hydrogen_Bond;Van_der_Waals)
VAL135 - LIG: 3.26Å (Hydrophobic;Van_der_Waals)
LEU188 - LIG: 3.40Å (Hydrophobic;Van_der_Waals)
VAL110 - LIG: 3.47Å (Hydrophobic;Van_der_Waals)
ALA83 - LIG: 3.55Å (Hydrophobic;Van_der_Waals)


|  | Protein_Residue | Protein_Position | Protein_ICode | Ligand_Name | Ligand_Chain | Ligand_Position | Ligand_ICode | Min_Distance_A | Closest_Atoms | Interaction_Types | ... | Has_Electrostatic | Has_Hydrogen_Bond | Has_Hydrophobic | Has_VdW | Protein_Type | Electrostatic_Score | HBond_Score | Hydrophobic_Score | Avg_Distance | Contact_Count |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | ASP | 133 |  | LIG | B | 1 |  | 2.61 | O-N20 | Electrostatic;Hydrogen_Bond;Van_der_Waals | ... | True | True | False | True | Negative | 2.83 | 1.74 | 0.0 | 9.26 | 4 |
| 1 | VAL | 135 |  | LIG | B | 1 |  | 3.26 | N-N21 | Hydrophobic;Van_der_Waals | ... | False | False | True | True | Hydrophobic | 0.0 | 0.0 | 0.35 | 7.17 | 10 |
| 2 | LEU | 188 |  | LIG | B | 1 |  | 3.4 | CD1-C15 | Hydrophobic;Van_der_Waals | ... | False | False | True | True | Hydrophobic | 0.0 | 0.0 | 0.32 | 7.05 | 11 |
| 3 | VAL | 110 |  | LIG | B | 1 |  | 3.47 | CG2-N20 | Hydrophobic;Van_der_Waals | ... | False | False | True | True | Hydrophobic | 0.0 | 0.0 | 0.31 | 8.38 | 1 |
| 4 | ALA | 83 |  | LIG | B | 1 |  | 3.55 | CB-N20 | Hydrophobic;Van_der_Waals | ... | False | False | True | True | Hydrophobic | 0.0 | 0.0 | 0.29 | 6.88 | 5 |
| 5 | ASP | 200 |  | LIG | B | 1 |  | 4.06 | CB-C12 | Electrostatic | ... | True | False | False | False | Negative | 1.62 | 0.0 | 0.0 | 7.84 | 4 |
| 6 | PHE | 67 |  | LIG | B | 1 |  | 3.75 | CZ-C11 | Hydrophobic;Van_der_Waals | ... | False | False | True | True | Aromatic | 0.0 | 0.0 | 0.25 | 8.75 | 7 |
| 7 | VAL | 70 |  | LIG | B | 1 |  | 3.87 | CG2-C11 | Hydrophobic;Van_der_Waals | ... | False | False | True | True | Hydrophobic | 0.0 | 0.0 | 0.23 | 6.54 | 10 |
| 8 | LYS | 85 |  | LIG | B | 1 |  | 4.59 | CD-C14 | Electrostatic | ... | True | False | False | False | Positive | 1.17 | 0.0 | 0.0 | 7.94 | 0 |
| 9 | LEU | 132 |  | LIG | B | 1 |  | 4.02 | CB-N20 | Hydrophobic | ... | False | False | True | False | Hydrophobic | 0.0 | 0.0 | 0.2 | 7.72 | 1 |
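
The `Interaction_Types` column packs several labels into one semicolon-separated string, so tallying interaction types across residues takes a split step first. The following is a minimal sketch on a hand-made toy DataFrame that only mimics the column layout shown above; the toy rows are illustrative and are not output from `analyze_protein_ligand_interactions`.

```python
import pandas as pd

# Toy rows (hypothetical) that mimic the column layout of the table above
toy = pd.DataFrame({
    "Protein_Residue": ["ASP", "VAL", "LYS"],
    "Protein_Position": [133, 135, 85],
    "Interaction_Types": [
        "Electrostatic;Hydrogen_Bond;Van_der_Waals",
        "Hydrophobic;Van_der_Waals",
        "Electrostatic",
    ],
})

# Split each semicolon-separated string into a list, give every label its own
# row with explode(), then count how often each label occurs
type_counts = (
    toy["Interaction_Types"]
    .str.split(";")
    .explode()
    .value_counts()
)
print(type_counts)
```

The same chain should work on the real `df` as long as the column keeps the semicolon-separated format shown in the output.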


In [None]:
counts = (
    df
    .loc[df["Interaction_Types"] != "None", "Protein_Position"]
    .value_counts()
)
# Convert to DataFrame
interaction_counts = counts.reset_index()
interaction_counts.columns = ['Position', 'Count']
interaction_counts["ID"] = 5947
interaction_counts.sort_values(by='Count', ascending=False).head(10)

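|  | Position | Count | ID |
| --- | --- | --- | --- |
| 0 | 133 | 1 | 5947 |
| 1 | 135 | 1 | 5947 |
| 2 | 188 | 1 | 5947 |
| 3 | 110 | 1 | 5947 |
| 4 | 83 | 1 | 5947 |
| 5 | 200 | 1 | 5947 |
| 6 | 67 | 1 | 5947 |
| 7 | 70 | 1 | 5947 |
| 8 | 85 | 1 | 5947 |
| 9 | 132 | 1 | 5947 |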


Notice that none of the residues listed above are among those that make contact between GSK3B and FRAT1. This suggests that the inhibitor does not bind at the FRAT1 site. Let's run the same analysis for all the inhibitors in the [GSK3B_inhibitors.csv](../assets/test-files/GSK3B_inhibitors.csv) file and see whether any of them bind at the same site as FRAT1.

In [12]:
import pandas as pd
import sys
sys.path.append("/workspace")
from utils.protein_analysis import analyze_protein_ligand_interactions

# Analyze interactions for all ligands in GSK3B_inhibitor.csv
gsk3b_inhibitor_df = pd.read_csv("/workspace/assets/test-files/GSK3B_inhibitor.csv")
interaction_df = []
for _, row in gsk3b_inhibitor_df.dropna(subset=["ligand_pubchem_cid"]).iterrows():
    # Build the path to the model_0 structure predicted for this ligand
    name = f'GSK3B_inhibitor_{row["ligand"]}'
    cif_path = f'../datasets/boltz1x/predict4/{row["ligand"]}/boltz_results_{name}/predictions/{name}/{name}_model_0.cif'
    df = analyze_protein_ligand_interactions(cif_path, ligand_name="LIG")
    if df is None or df.empty:
        print(f"No interactions found for {row['ligand']}")
        continue
    counts = (
        df
        .loc[df["Interaction_Types"] != "None", "Protein_Position"]
        .value_counts()
    )
    # Convert to DataFrame
    interaction_counts = counts.reset_index()
    interaction_counts.columns = ['Position', 'Count']
    interaction_counts["ID"] = row["ligand"]
    interaction_df.append(interaction_counts)

# Concatenate all DataFrames in interaction_df
interaction_df = pd.concat(interaction_df, ignore_index=True)
# Save to CSV
interaction_df.to_csv('/workspace/assets/test-files/boltz/GSK3B_inhibitor_interactions.csv', index=False)
# Display the top 10 interactions
interaction_df.head(10)

Found 1 ligand residue(s): ['LIG']

Protein-Ligand Interaction Summary:
Total interacting residues: 23
Electrostatic interactions: 6
Hydrogen bond interactions: 3
Hydrophobic interactions: 8
Van der Waals contacts: 13
Average interaction distance: 4.07 Å
Closest interaction: 2.59 Å

Top 5 strongest interactions:
LYS85 - LIG: 3.15Å (Electrostatic;Hydrogen_Bond;Van_der_Waals)
ASP200 - LIG: 3.15Å (Electrostatic;Hydrogen_Bond;Van_der_Waals)
ASP133 - LIG: 3.71Å (Electrostatic;Van_der_Waals)
VAL135 - LIG: 2.59Å (Hydrophobic;Van_der_Waals)
TYR134 - LIG: 3.50Å (Hydrogen_Bond;Van_der_Waals)
Found 1 ligand residue(s): ['LIG']

Protein-Ligand Interaction Summary:
Total interacting residues: 24
Electrostatic interactions: 6
Hydrogen bond interactions: 3
Hydrophobic interactions: 8
Van der Waals contacts: 14
Average interaction distance: 4.09 Å
Closest interaction: 2.69 Å

Top 5 strongest interactions:
LYS85 - LIG: 3.02Å (Electrostatic;Hydrogen_Bond;Van_der_Waals)
ASP200 - LIG: 3.15Å (Electrostatic

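|  | Position | Count | ID |
| --- | --- | --- | --- |
| 0 | 85 | 1 | 5925 |
| 1 | 200 | 1 | 5925 |
| 2 | 133 | 1 | 5925 |
| 3 | 135 | 1 | 5925 |
| 4 | 134 | 1 | 5925 |
| 5 | 132 | 1 | 5925 |
| 6 | 136 | 1 | 5925 |
| 7 | 188 | 1 | 5925 |
| 8 | 141 | 1 | 5925 |
| 9 | 137 | 1 | 5925 |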


Let's see whether any of the inhibitors bind to the same site as FRAT1, using the DataFrame we just built from all the inhibitor interactions.

In [13]:
gsk3b_frat_sites = [263,270,297,296,228,229,295,264,266,267,294,292]
[row["ID"] for i, row in interaction_df.iterrows() if row["Position"] in gsk3b_frat_sites]

[]
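
The row-by-row check above can also be written as a vectorized filter that returns each matching ligand only once. This is a minimal sketch, assuming a DataFrame with the same `Position` and `ID` columns as `interaction_df`; the helper name `ligands_at_site` and the toy rows are illustrative and not part of the original notebook.

```python
import pandas as pd

# FRAT1-contacting residue positions on GSK3B, copied from the cell above
gsk3b_frat_sites = [263, 270, 297, 296, 228, 229, 295, 264, 266, 267, 294, 292]


def ligands_at_site(interactions: pd.DataFrame, site_positions) -> list:
    """Return the sorted, unique ligand IDs with at least one contact in site_positions."""
    mask = interactions["Position"].isin(site_positions)
    return sorted(interactions.loc[mask, "ID"].unique().tolist())


# Toy data (hypothetical): ligand 1 only touches positions outside the FRAT1
# list, while ligand 2 also touches position 264, which is in the list
toy = pd.DataFrame({
    "Position": [133, 135, 200, 264],
    "Count": [1, 1, 1, 1],
    "ID": [1, 1, 2, 2],
})

hits = ligands_at_site(toy, gsk3b_frat_sites)
print(hits)
```

On the real data, `ligands_at_site(interaction_df, gsk3b_frat_sites)` would be expected to return an empty list, matching the result above.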

None of the inhibitors contact the residues that GSK3B uses to bind FRAT1, which suggests that they all bind elsewhere on the protein. It also means that designing a new inhibitor that targets the FRAT1 site would require a different approach.

Let's quickly look at where the inhibitors bind to GSK3B. Below are the residue positions contacted most often across the inhibitors. After that, we will use UMAP to see whether the inhibitors can be grouped by their binding sites.

In [15]:
interaction_df["Position"].value_counts().head(10)

Position
200    40
133    40
188    40
135    40
134    40
132    40
138    40
62     40
83     40
70     40
Name: count, dtype: int64
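
Raw counts are easier to compare when expressed as the fraction of ligands that contact each position. The following is a minimal sketch, assuming the same `Position` and `ID` layout as `interaction_df`; the toy values are made up for illustration.

```python
import pandas as pd

# Toy data (hypothetical): three ligands, position 133 is contacted by all of them
toy = pd.DataFrame({
    "Position": [133, 135, 133, 200, 133],
    "ID": [1, 1, 2, 2, 3],
})

n_ligands = toy["ID"].nunique()

# Distinct ligands per position, divided by the total number of ligands
coverage = (
    toy.groupby("Position")["ID"]
    .nunique()
    .div(n_ligands)
    .sort_values(ascending=False)
)
print(coverage)
```

A coverage of 1.0 means every ligand in the set contacts that residue position.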

In [19]:
# Create a numpy array for each ligand, where index (position - 1) holds
# the interaction count for that residue position (positions are 1-based)
import numpy as np
# The largest observed position sets the length of every array
max_position = interaction_df["Position"].max()
ligand_counts = {}
for _, row in interaction_df.iterrows():
    if row["ID"] not in ligand_counts:
        ligand_counts[row["ID"]] = np.zeros(max_position)
    ligand_counts[row["ID"]][row["Position"]-1] = row["Count"]
# Convert to DataFrame for better visualization
ligand_counts_df = pd.DataFrame.from_dict(ligand_counts, orient='index')
ligand_counts_df.head(10)

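|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 192 | 193 | 194 | 195 | 196 | 197 | 198 | 199 | 200 | 201 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 5925 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 5926 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 5943 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 |
| 5947 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 |
| 5955 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 |
| 5976 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 |
| 5977 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 |
| 5978 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 |
| 5979 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 |
| 5980 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 |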
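
The explicit loop above can also be expressed as a pivot. This is a minimal sketch, assuming the same `Position`, `Count`, and `ID` columns as `interaction_df`; the toy rows are illustrative. Unlike the dense arrays above, the pivot only creates columns for positions that occur at least once, and the columns are labelled with the actual residue position rather than `position - 1`.

```python
import pandas as pd

# Toy data (hypothetical) with the same columns as interaction_df
toy = pd.DataFrame({
    "Position": [133, 135, 133, 200],
    "Count": [1, 1, 1, 1],
    "ID": [5925, 5925, 5947, 5947],
})

# Rows are ligand IDs, columns are residue positions, values are contact counts;
# ligand/position pairs that never occur are filled with 0
matrix = toy.pivot_table(
    index="ID",
    columns="Position",
    values="Count",
    aggfunc="sum",
    fill_value=0,
)
print(matrix)
```

Dropping the all-zero columns this way keeps the matrix compact, at the cost of a column layout that depends on which positions appear in the data.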


In [20]:
# Let's create a UMAP projection; start by installing umap-learn and plotly
!pip install umap-learn plotly
import umap
from plotly.express import scatter
# Create a UMAP model
umap_model = umap.UMAP(n_neighbors=5, min_dist=0.1, metric='euclidean')
# Fit and transform the ligand counts
umap_embeddings = umap_model.fit_transform(ligand_counts_df.values)
# Create a DataFrame for the UMAP embeddings
umap_df = pd.DataFrame(umap_embeddings, columns=['UMAP1', 'UMAP2'])
# Add the ligand IDs to the DataFrame
umap_df['Ligand'] = ligand_counts_df.index
# Plot the UMAP embeddings
scatter(umap_df, x='UMAP1', y='UMAP2', color='Ligand', title='UMAP of Ligand Counts', hover_data=['Ligand'], height=600, width=800)

Collecting umap-learn
  Downloading umap_learn-0.5.7-py3-none-any.whl (88 kB)
Collecting plotly
  Downloading plotly-6.1.1-py3-none-any.whl (16.1 MB)
Collecting scikit-learn>=0.22
  Downloading scikit_learn-1.6.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.5 MB)
Collecting pynndescent>=0.5
  Downloading pynndescent-0.5.13-py3-none-any.whl (56 kB)
Collecting narwhals>=1.15.1
  Downloading narwhals-1.40.0-py3-none-any.whl (357 kB)

## Appendix

```pymol
# PyMOL Commands for Highlighting Inter-chain Interactions
# Load your structure first: load your_structure.pdb

# =====================================================
# 1. ELECTROSTATIC INTERACTIONS
# =====================================================

# Define charged residues
select positive_A, chain A and resn ARG+LYS+HIS
select negative_A, chain A and resn ASP+GLU
select positive_B, chain B and resn ARG+LYS+HIS  
select negative_B, chain B and resn ASP+GLU

# Find electrostatic interactions (typically within 4-6 Å)
select electrostatic_A, (positive_A within 5 of negative_B) or (negative_A within 5 of positive_B)
select electrostatic_B, (positive_B within 5 of negative_A) or (negative_B within 5 of positive_A)

# Combine and highlight electrostatic interactions
select electrostatic_pairs, electrostatic_A or electrostatic_B
color red, electrostatic_pairs
show sticks, electrostatic_pairs

# =====================================================
# 2. HYDROGEN BONDING INTERACTIONS
# =====================================================

# Define polar residues capable of hydrogen bonding
select polar_A, chain A and resn SER+THR+ASN+GLN+TYR+ASP+GLU+ARG+LYS+HIS+TRP
select polar_B, chain B and resn SER+THR+ASN+GLN+TYR+ASP+GLU+ARG+LYS+HIS+TRP

# Find potential hydrogen bonding residues (within 3.5 Å for strong H-bonds)
select hbond_A, polar_A within 3.5 of polar_B
select hbond_B, polar_B within 3.5 of polar_A

# Combine and highlight hydrogen bonding residues
select hbond_pairs, hbond_A or hbond_B
color blue, hbond_pairs
show sticks, hbond_pairs

# Optional: Use PyMOL's built-in hydrogen bond detection
distance hbonds, chain A, chain B, mode=2, cutoff=3.5, label=0
hide labels, hbonds

# =====================================================
# 3. HYDROPHOBIC INTERACTIONS
# =====================================================

# Define hydrophobic residues
select hydrophobic_A, chain A and resn ALA+VAL+LEU+ILE+PHE+TRP+MET+PRO
select hydrophobic_B, chain B and resn ALA+VAL+LEU+ILE+PHE+TRP+MET+PRO

# Find hydrophobic interactions (typically within 4-5 Å)
# byres expands each atom-level hit to its whole residue
select hydrophobic_A_int, byres (hydrophobic_A within 4.5 of hydrophobic_B)
select hydrophobic_B_int, byres (hydrophobic_B within 4.5 of hydrophobic_A)

# Combine and highlight hydrophobic interactions
select hydrophobic_pairs, hydrophobic_A_int or hydrophobic_B_int
color yellow, hydrophobic_pairs
show sticks, hydrophobic_pairs

# =====================================================
# 4. VISUALIZATION ENHANCEMENTS
# =====================================================

# Show the interface region
select interface, (chain A within 5 of chain B) or (chain B within 5 of chain A)
show surface, interface
set transparency, 0.7, interface

# Different representation for each interaction type
show spheres, electrostatic_pairs
show sticks, hbond_pairs  
show sticks, hydrophobic_pairs

# Set sphere scale for electrostatic interactions
set sphere_scale, 0.3, electrostatic_pairs

# =====================================================
# 5. COMPREHENSIVE VIEW
# =====================================================

# Create a selection combining all interaction types
select all_interactions, electrostatic_pairs or hbond_pairs or hydrophobic_pairs

# Label the interacting residues
label all_interactions and name CA, "%s%s" % (resn, resi)

# Zoom to the interface
zoom interface

# Clean up intermediate selections (delete takes one space-separated name pattern)
delete positive_A negative_A positive_B negative_B
delete polar_A polar_B hydrophobic_A hydrophobic_B
delete electrostatic_A electrostatic_B hbond_A hbond_B
delete hydrophobic_A_int hydrophobic_B_int

# =====================================================
# 6. ALTERNATIVE: DISTANCE-BASED ANALYSIS
# =====================================================

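# Salt-bridge analysis using distance objects, restricted to charged side-chain atoms
# (N atoms of ARG/LYS/HIS, carboxylate O atoms of ASP/GLU) so backbone contacts are not counted
distance salt_bridges, (chain A and resn ARG+LYS+HIS and name NZ+NE+NH1+NH2+ND1+NE2), (chain B and resn ASP+GLU and name OD1+OD2+OE1+OE2), cutoff=4.0, label=0
distance salt_bridges2, (chain A and resn ASP+GLU and name OD1+OD2+OE1+OE2), (chain B and resn ARG+LYS+HIS and name NZ+NE+NH1+NH2+ND1+NE2), cutoff=4.0, label=0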

# Hide distance labels but keep the objects for analysis
hide labels, salt_bridges*

# Color distance objects
color red, salt_bridges*
```
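
The electrostatic selection in section 1 of the script can also be checked outside PyMOL. The following is a minimal Python sketch of the same idea, assuming a standard fixed-column PDB file with chains named `A` and `B`. The function names `parse_atoms` and `charged_contacts` are hypothetical helpers introduced here for illustration; they are not part of PyMOL or Botlz-1x. Like the PyMOL selection, the sketch compares every atom of the charged residues, ignores protonation state, and skips `HETATM` records.

```python
import math
from itertools import product

# Residue classes mirror the PyMOL selections above (HIS is only positive when protonated).
POSITIVE = {"ARG", "LYS", "HIS"}
NEGATIVE = {"ASP", "GLU"}


def parse_atoms(pdb_text):
    """Return (chain, resn, resi, (x, y, z)) for each ATOM record, read from fixed PDB columns."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            atoms.append((
                line[21],                      # chain ID (column 22)
                line[17:20].strip(),           # residue name (columns 18-20)
                int(line[22:26]),              # residue number (columns 23-26)
                (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            ))
    return atoms


def charged_contacts(atoms, chain_a="A", chain_b="B", cutoff=5.0):
    """Oppositely charged residue pairs across two chains with any atoms within cutoff (Å)."""
    side_a = [a for a in atoms if a[0] == chain_a]
    side_b = [a for a in atoms if a[0] == chain_b]
    pairs = set()
    for (_, rn_a, ri_a, xyz_a), (_, rn_b, ri_b, xyz_b) in product(side_a, side_b):
        opposite = (rn_a in POSITIVE and rn_b in NEGATIVE) or (
            rn_a in NEGATIVE and rn_b in POSITIVE
        )
        if opposite and math.dist(xyz_a, xyz_b) <= cutoff:
            pairs.add((f"{rn_a}{ri_a}", f"{rn_b}{ri_b}"))
    return sorted(pairs)
```

Calling `charged_contacts(parse_atoms(open("your_structure.pdb").read()))` returns a sorted list of residue-label pairs, which can be compared against the residues PyMOL colors red. The all-pairs loop is quadratic in atom count; that is acceptable for a single interface, but a spatial index would be a better fit for large complexes.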