## Preparation

Before we can start an AlphaFold 3 calculation, note that the AlphaFold 3 terms of use require each user to obtain their own copy of the trained model parameters:

1. Fill out the form [https://forms.gle/svvpY4u2jsHEwWYS6](https://forms.gle/svvpY4u2jsHEwWYS6)
2. Once access has been granted, download the model parameters file: `af3.bin.zst`
3. Store the model parameters file in a directory on the cluster, for example in `$HOME/af3-models`

AlphaFold 3 will not run without the model parameters file.

In [None]:
from pathlib import Path

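# Directory containing af3.bin.zst; change this if you stored the file elsewhere,
# for example Path.home() / "af3-models"
ALPHAFOLD_MODEL_DIR = "af3models"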
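
Since the run fails without the parameters file, it can help to check for it before going further. The following is a minimal sketch, not part of the original notebook: the helper name `model_params_present` is hypothetical, and the directory `"af3models"` simply mirrors the value set above.

```python
from pathlib import Path


def model_params_present(model_dir, filename="af3.bin.zst"):
    """Return True if the model parameters file exists in model_dir."""
    return (Path(model_dir).expanduser() / filename).is_file()


# "af3models" mirrors ALPHAFOLD_MODEL_DIR in this notebook; adjust as needed.
if not model_params_present("af3models"):
    print("Model parameters not found; check ALPHAFOLD_MODEL_DIR.")
```

If the message is printed, point `ALPHAFOLD_MODEL_DIR` at the directory where you stored `af3.bin.zst` before continuing.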

Next, we define where AlphaFold 3 finds the input data and where it writes the output files. These files appear in the file browser on the left. If you change these names, remember to change them in the second notebook as well.

In [None]:
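ALPHAFOLD_WORKING_DIR = Path("afold_test")
ALPHAFOLD_RESULTS_DIR_PART1 = ALPHAFOLD_WORKING_DIR / "output"

# Make sure the working directory exists before files are written into it
ALPHAFOLD_WORKING_DIR.mkdir(parents=True, exist_ok=True)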

## Input File
For each run, the protein sequence needs to be supplied in an AlphaFold 3 input file. The format of these files is documented [in the AlphaFold 3 GitHub repository](https://github.com/google-deepmind/alphafold3/blob/main/docs/input.md), with a full example [here](https://github.com/google-deepmind/alphafold3/blob/main/docs/input.md#full-example). Note that larger prediction runs may require different settings when you start the Jupyter session.

Fill in the sequence and the chain IDs, and make a note of the project name you set in "name".

In [None]:
input_json = """
{
"name": "test",
"sequences": [
  {   
    "protein": 
    {
      "id": ["A", "B"],
      "sequence": "GMRESYANENQFGFKTINSDIHKIVIVGGYGKLGGLFARYLRASGYPISILDREDWAVAESILANADVVIVSVPINLTLETIERLKPYLTENMLLADLTSVKREPLAKMLEVHTGAVLGLHPMFGADIASMAKQVVVRCDGRFPERYEWLLEQIQIWGAKIYQTNATEHDHNMTYIQALRHFSTFANGLHLSKQPINLANLLALSSPIYRLELAMIGRLFAQDAELYADIIMDKSENLAVIETLKQTYDEALTFFENNDRQGFIDAFHKVRDWFGDYSEQFLKESRQLLQQANDLKQG"
    }
  }
],
"modelSeeds": [1],
"dialect": "alphafold3",
"version": 1
}

"""

ALPHAFOLD_JSON_PATH = ALPHAFOLD_WORKING_DIR / "input.json"  # input file passed to AlphaFold 3

with open(ALPHAFOLD_JSON_PATH, "w") as file:
    file.write(input_json)
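
A typo in the JSON only shows up once the job is running, so a quick sanity check beforehand can save time. The sketch below is an illustration, not an official validator: the helper `check_input_json` is hypothetical, and it only checks the top-level keys used in the cell above, duplicate chain IDs, and empty protein sequences.

```python
import json

# Top-level keys used in the input file of this notebook.
REQUIRED_KEYS = {"name", "sequences", "modelSeeds", "dialect", "version"}


def check_input_json(text):
    """Return a list of problems found in an AlphaFold 3 style input string."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]

    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - data.keys())]
    seen_ids = set()
    for entry in data.get("sequences", []):
        for kind, body in entry.items():
            ids = body.get("id", [])
            ids = [ids] if isinstance(ids, str) else ids  # "id" may be a string or a list
            for chain_id in ids:
                if chain_id in seen_ids:
                    problems.append(f"duplicate chain id: {chain_id}")
                seen_ids.add(chain_id)
            if kind == "protein" and not body.get("sequence"):
                problems.append(f"empty protein sequence for ids {ids}")
    return problems


example = """
{"name": "test",
 "sequences": [{"protein": {"id": ["A", "B"], "sequence": "GMRESYANENQFG"}}],
 "modelSeeds": [1], "dialect": "alphafold3", "version": 1}
"""
print(check_input_json(example))
```

An empty list means none of these checks failed. Calling `check_input_json(input_json)` applies the same checks to the string defined above.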

Now we combine the information on input and output directories to generate the run file to start the calculation:

In [None]:
run_file = f"""#!/bin/bash
# AlphaFold 3 - Part 1: Alignment (CPU only)

# Load software module 
module load bio/alphafold/3.0.1

# Run with option --norun_inference to generate Multiple Sequence Alignments (MSAs) and templates
python $ALPHAFOLD_BIN_DIR/run_alphafold.py \\
    --json_path={ALPHAFOLD_JSON_PATH} \\
    --db_dir=$ALPHAFOLD_DATABASES \\
    --model_dir={ALPHAFOLD_MODEL_DIR} \\
    --output_dir={ALPHAFOLD_RESULTS_DIR_PART1} \\
    --norun_inference
"""

ALPHAFOLD_RUN_PATH = ALPHAFOLD_WORKING_DIR / "run.sh"  # run script executed in the next step

with open(ALPHAFOLD_RUN_PATH, "w") as file:
    file.write(run_file)
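
The run file is built from a Python f-string, so two kinds of substitution are involved: values in `{...}` are filled in by Python when the cell runs, while `$VARIABLES` are left untouched for bash to expand later. A doubled backslash in the Python source becomes a single line-continuation backslash in the script. The small sketch below illustrates this; `run.py`, `$DATABASES`, and `json_path` are made-up names for the demonstration only.

```python
from pathlib import Path

# Hypothetical path, used only to show how {...} is filled in by Python.
json_path = Path("afold_test") / "input.json"

snippet = f"""python run.py \\
    --json_path={json_path} \\
    --db_dir=$DATABASES
"""

# Printing the text shows exactly what bash will see.
print(snippet)
```

Printing `run_file` in the same way is a quick check that the generated script looks as intended before it is executed.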

## Run the Multiple Sequence Alignment
Execute the cell below to start the alignment job. Good luck!


In [None]:
! bash {ALPHAFOLD_RUN_PATH}

## Next steps
Once the alignment job ends with "Done processing", the next step can be started on a GPU machine.

Keep an eye out for "Output directory ... exists and non-empty, using instead ..." in the cell output above. If this message appears, the results were written to the alternative directory named in the message, and those are the output files we need for the next step!
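
If the cell output is long, these two messages can also be searched for programmatically. The following is a minimal sketch that assumes the output has been saved to a string; the helper `summarize_log` is hypothetical, and the pattern matches the message only loosely, since the exact wording may differ between AlphaFold 3 versions.

```python
import re


def summarize_log(log_text):
    """Return (finished, redirected_to) for a captured alignment log.

    finished is True if "Done processing" occurs in the log.
    redirected_to is the directory named after "using instead", or None.
    """
    finished = "Done processing" in log_text
    match = re.search(r"exists and non-empty, using instead\s+(\S+)", log_text)
    # Strip a trailing full stop in case the message ends the sentence with one.
    redirected_to = match.group(1).rstrip(".") if match else None
    return finished, redirected_to


# Made-up log text for illustration only.
sample = "Output directory out exists and non-empty, using instead out_new.\nDone processing\n"
print(summarize_log(sample))
```

If the second value is not `None`, use that directory as the input location for the next step.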