In [1]:
from pathlib import Path
import os

This tutorial follows the Boltzgen "vanilla protein" example. For details, see https://github.com/HannesStark/boltzgen

First, define the working directory and upload the 1g13.cif file to it.

In [2]:
WORKING_DIR = Path.home() / "protein_design_w_Boltzgen" # created by user


You do not need to edit the cell below. Here we set up the environment and give names to input and output files.

In [3]:
BOLTZGEN_INPUT_FILE = "input.yaml"  # will be created in this notebook
BOLTZGEN_RUN_FILE = "run.sh"  # will be created in this notebook
BOLTZGEN_YAML_PATH = (
    WORKING_DIR / BOLTZGEN_INPUT_FILE
)  # will be created in this notebook
BOLTZGEN_RUN_PATH = (
    WORKING_DIR / BOLTZGEN_RUN_FILE
)  # will be created in this notebook

RESULTS_DIR = (
    WORKING_DIR / "workbench/test_run" 
) # will be created in this notebook


# Check that the working directory exists; if not, print a warning.
if not WORKING_DIR.is_dir():
    print(
        f"Directory {WORKING_DIR} does not exist! Please create this directory."
    )
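
As an optional alternative to the warning above, the directory can be created from the notebook and the uploaded structure file checked in the same step. This is only a sketch: `missing_inputs` is a hypothetical helper name, and it assumes that `1g13.cif` is the only file that has to be uploaded by hand.

```python
from pathlib import Path


def missing_inputs(working_dir: Path, required=("1g13.cif",)) -> list[str]:
    """Create working_dir if needed and return the required files not found in it."""
    # parents=True also creates intermediate directories; exist_ok=True makes
    # the call a no-op when the directory is already there.
    working_dir.mkdir(parents=True, exist_ok=True)
    return [name for name in required if not (working_dir / name).is_file()]
```

Calling `missing_inputs(WORKING_DIR)` returns an empty list once the upload is complete, and the names of the files that are still missing otherwise.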

### Input File
For each run, you need to provide an input YAML file. The structure of these files and examples of how to use them can be found in the [Boltzgen GitHub repository](https://github.com/HannesStark/boltzgen?tab=readme-ov-file#how-to-make-a-design-specification-yaml), with detailed explanations [here](https://github.com/HannesStark/boltzgen/blob/main/example/README.md). Note that larger prediction runs might require larger GPUs, which you can select when starting the Jupyter session on bwVisu.


In [4]:
input_yaml = '''
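entities:
  # Chain to be designed: a protein with 80 to 140 residues.
  - protein:
      id: C
      sequence: 80..140
  # Target structure, read from the uploaded file; only chain A is used.
  - file:
      path: 1g13.cif
      include:
        - chain:
            id: A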
'''
with open(BOLTZGEN_YAML_PATH, "w") as file:   
    print(BOLTZGEN_YAML_PATH)
    file.write(input_yaml)
    print(f"File written to {BOLTZGEN_YAML_PATH}.")

/home/hd/hd_hd/hd_aq354/protein_design/input.yaml
File written to /home/hd/hd_hd/hd_aq354/protein_design/input.yaml.


Next, we write a shell script that passes the input file and the output directory to `boltzgen run`. More information on the available protocols and options can be found in the [Boltzgen GitHub repository](https://github.com/HannesStark/boltzgen?tab=readme-ov-file#all-command-line-arguments).

In [5]:
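# The shebang must be the first line of the script, and the last option
# must not end with a line-continuation backslash.
run_file = f'''#!/bin/bash

boltzgen run {BOLTZGEN_YAML_PATH} \\
  --output {RESULTS_DIR} \\
  --protocol protein-anything \\
  --num_designs 10 \\
  --budget 2
'''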

with open(BOLTZGEN_RUN_PATH, "w") as file:
    file.write(run_file)
    print(f"File written to {BOLTZGEN_RUN_PATH}.")

File written to /home/hd/hd_hd/hd_aq354/protein_design/run.sh.


In [6]:
!which boltzgen

~/.local/bin/boltzgen
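
Finding the executable does not guarantee that its Python dependencies can be imported. The sketch below runs a quick pre-flight check before the long run is started. `preflight` is a hypothetical helper, and the module names are only examples taken from the imports that appear in the traceback further down. It also assumes that the notebook kernel and the `boltzgen` script use the same Python environment, which may not hold on every system.

```python
import importlib.util
import shutil


def preflight(executable="boltzgen", modules=("torch", "typing_extensions")):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # shutil.which mirrors the shell's `which`: it returns None if nothing is on PATH.
    if shutil.which(executable) is None:
        problems.append(f"executable not found on PATH: {executable}")
    for name in modules:
        # find_spec locates a top-level module without importing it.
        if importlib.util.find_spec(name) is None:
            problems.append(f"module not importable: {name}")
    return problems


for problem in preflight():
    print(problem)
```

If a module is reported as missing, installing it into the environment that provides `boltzgen` would be the natural next step.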


### Run the Boltzgen Prediction
Execute the cell below to start the prediction. This will take about 12-15 minutes. Good luck!

In [7]:
os.system(f'echo "Running file {BOLTZGEN_RUN_PATH}"')
os.system(f"bash {BOLTZGEN_RUN_PATH}")
# takes about 15 minutes

Running file /home/hd/hd_hd/hd_aq354/protein_design/run.sh


Traceback (most recent call last):
  File "/home/hd/hd_hd/hd_aq354/.local/bin/boltzgen", line 5, in <module>
    from boltzgen.cli.boltzgen import main
  File "/home/hd/hd_hd/hd_aq354/.local/lib/python3.12/site-packages/boltzgen/cli/boltzgen.py", line 46, in <module>
    import torch
  File "/home/hd/hd_hd/hd_aq354/.local/lib/python3.12/site-packages/torch/__init__.py", line 34, in <module>
    from typing_extensions import ParamSpec as _ParamSpec
ModuleNotFoundError: No module named 'typing_extensions'


256
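
The `256` printed above is the raw status returned by `os.system`, not the exit code itself. On Unix it encodes exit code 1, so the run shown here failed (the traceback reports that `typing_extensions` could not be imported). The sketch below decodes the status and shows one possible way to get the exit code directly with `subprocess`. `run_command` is a hypothetical helper, not part of Boltzgen.

```python
import os
import subprocess
import sys

# 256 is the raw wait status that os.system returned in the cell above.
exit_code = os.waitstatus_to_exitcode(256)  # requires Python 3.9+
print(exit_code)  # 1


def run_command(cmd):
    """Run cmd (a list of strings) and return its exit code, reporting failures."""
    completed = subprocess.run(cmd)
    if completed.returncode != 0:
        print(
            f"Command failed with exit code {completed.returncode}",
            file=sys.stderr,
        )
    return completed.returncode
```

In this notebook the call would be `run_command(["bash", str(BOLTZGEN_RUN_PATH)])`; a return value of 0 indicates that the script finished without error.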

### Next Steps
After a successful run, your output directory contains the directory `final_ranked_designs`, which holds the overview PDF and CSV files for interpretation.
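
To see which result files were written, the output directory can be listed from the notebook. This is a minimal sketch: `list_design_outputs` is a hypothetical helper, and the exact file names inside `final_ranked_designs` are not assumed, only the `.csv` and `.pdf` extensions mentioned above.

```python
from pathlib import Path


def list_design_outputs(results_dir: Path, subdir="final_ranked_designs"):
    """Return the sorted CSV and PDF files below results_dir/subdir."""
    ranked = results_dir / subdir
    if not ranked.is_dir():
        # The directory only exists after a run has completed.
        return []
    return sorted(
        path
        for path in ranked.rglob("*")
        if path.is_file() and path.suffix.lower() in {".csv", ".pdf"}
    )
```

`list_design_outputs(RESULTS_DIR)` returns an empty list if the run did not finish, which is a quick way to notice a failed run.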
