<a href="https://colab.research.google.com/github/phbradley/alphafold_finetune/blob/main/alphafold_ft_colab_pipeline_v1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# alphafold_finetune colab

This draft Colab notebook has examples of fine-tuning and binder prediction. It shows one route to installing the software, but note that Colab machines already have the GPU libraries (CUDA/cuDNN) set up, which simplifies things. Starting from scratch may require additional GPU-specific installation, depending on the machine.

A planned addition is a "forms" interface that will let users run structure and binding predictions for peptide-MHC targets, starting from the allele and peptide information.

This notebook is based on the AlphaFold Colab notebook (https://colab.research.google.com/github/deepmind/alphafold/blob/main/notebooks/AlphaFold.ipynb). Many thanks to the AlphaFold developers for creating and sharing their code and related content.



## Setup

Start by running the three cells below to set up the code and download the data.

In [None]:
!curl -L https://github.com/conda-forge/miniforge/releases/download/23.3.1-1/Mambaforge-23.3.1-1-Linux-x86_64.sh > /tmp/mamba.sh
!rm -rf /opt/conda
!/bin/bash /tmp/mamba.sh -b -p /opt/conda/

with open('/tmp/requirements.txt', 'w') as f:
  f.write('''
absl-py
biopython
chex
dm-haiku
dm-tree
immutabledict
jax
ml-collections
numpy
optax
pandas
py3dmol
protobuf
scipy
typing-extensions
''')

!/opt/conda/bin/conda install -y -c conda-forge --file /tmp/requirements.txt

In [None]:
# Set environment variables before running any other code.
import os
os.environ['TF_FORCE_UNIFIED_MEMORY'] = '1'
os.environ['XLA_PYTHON_CLIENT_MEM_FRACTION'] = '4.0'

from IPython.utils import io
import subprocess
import tqdm.notebook

TQDM_BAR_FORMAT = '{l_bar}{bar}| {n_fmt}/{total_fmt} [elapsed: {elapsed} remaining: {remaining}]'

try:
  with tqdm.notebook.tqdm(total=100, bar_format=TQDM_BAR_FORMAT) as pbar:
    with io.capture_output() as captured:
      # Install py3dmol.
      %shell pip install py3dmol
      pbar.update(2)

      # Replace /opt/conda with a fresh Miniconda install (this removes the
      # Mambaforge environment created in the first cell).
      %shell rm -rf /opt/conda
      %shell wget -q -P /tmp \
        https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
          && bash /tmp/Miniconda3-latest-Linux-x86_64.sh -b -p /opt/conda \
          && rm /tmp/Miniconda3-latest-Linux-x86_64.sh
      pbar.update(12)

      PATH=%env PATH
      %env PATH=/opt/conda/bin:{PATH}

      # Use Python 3.8 instead of 3.10 for compatibility with local versions.
      %shell conda install -qy conda==4.13.0 python=3.8

      #%shell conda install -qy conda==4.13.0 \
      #    && conda install -qy -c conda-forge \
      #      python=3.8
      pbar.update(86)  # 2 + 12 + 86 = 100, matching total=100.

except subprocess.CalledProcessError:
  print(captured)
  raise

#print(captured)


In [None]:
GIT_REPO = 'https://github.com/phbradley/alphafold_finetune'

PARAMS_URLS = [
    'https://www.dropbox.com/s/e3uz9mwxkmmv35z/params_model_2_ptm.npz',
]

PARAMS_DIR = './alphafold_params/params'

try:
  with tqdm.notebook.tqdm(total=100, bar_format=TQDM_BAR_FORMAT) as pbar:
    with io.capture_output() as captured:
      %shell rm -rf alphafold_finetune
      %shell git clone --branch main {GIT_REPO} alphafold_finetune
      pbar.update(20)
      # Install the required versions of all dependencies.
      %shell pip3 install -r ./alphafold_finetune/requirements_colab_python38_v2.txt
      pbar.update(50)

      # Download the AlphaFold parameter files.
      %shell mkdir --parents "{PARAMS_DIR}"
      for URL in PARAMS_URLS:
        PARAMS_PATH = os.path.join(PARAMS_DIR, os.path.basename(URL))
        %shell wget -O "{PARAMS_PATH}" "{URL}"
      pbar.update(15)

      # Download and unpack the alphafold_finetune dataset.
      %shell wget https://files.ipd.uw.edu/pub/alphafold_finetune_motmaen_pnas_2023/datasets_alphafold_finetune_v2_2023-02-20.tgz
      %shell tar -xzf datasets_alphafold_finetune_v2_2023-02-20.tgz
      %shell mv datasets_alphafold_finetune alphafold_finetune/
      pbar.update(15)

except subprocess.CalledProcessError:
  print(captured)
  raise

#print(captured)

import jax
# Query the first local device once and require a GPU runtime.
device = jax.local_devices()[0]
if device.platform in ('tpu', 'cpu'):
  raise RuntimeError(
      f'Colab {device.platform.upper()} runtime not supported. Change it to GPU '
      'via Runtime -> Change Runtime Type -> Hardware accelerator -> GPU.')
else:
  print(f'Running with {device.device_kind} GPU')

# Make sure everything we need is on the path.
import sys
sys.path.append('/opt/conda/lib/python3.8/site-packages')
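
Before moving on, it can help to confirm that the parameter download produced a readable NumPy archive rather than, say, an error page. The following is a minimal sketch, not part of the original pipeline: `summarize_params` is a hypothetical helper, and the path is assembled from `PARAMS_DIR` and the file name in `PARAMS_URLS` above.

```python
import os
import numpy as np

def summarize_params(path):
    """Return (number of arrays, total number of values) for an .npz file."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f'missing parameter file: {path}')
    # np.load fails on a file that is not a NumPy archive, which would
    # indicate that the download did not return the weights.
    with np.load(path, allow_pickle=False) as params:
        names = list(params.files)
        total = sum(int(params[name].size) for name in names)
    return len(names), total

# Assumed location: PARAMS_DIR joined with the basename of the download URL.
PARAMS_FILE = './alphafold_params/params/params_model_2_ptm.npz'
if os.path.isfile(PARAMS_FILE):
    n_arrays, n_values = summarize_params(PARAMS_FILE)
    print(f'{n_arrays} arrays, {n_values:,} values in {PARAMS_FILE}')
```

If the call raises, re-running the download cell with `#print(captured)` uncommented should show what `wget` actually fetched.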




In [None]:
%cd alphafold_finetune/

## Fine-tuning

This command fine-tunes AlphaFold's parameters for peptide-MHC binding prediction.

In [None]:
%shell python run_finetuning.py \
    --data_dir /content/alphafold_params/ \
    --outprefix testrun1 \
    --binder_intercepts 0.80367635 --binder_intercepts 0.43373787 \
    --freeze_binder  \
    --train_dataset datasets_alphafold_finetune/pmhc_finetune/combo_1and2_train.tsv \
    --valid_dataset datasets_alphafold_finetune/pmhc_finetune/combo_1and2_valid.tsv
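
To see what the training and validation files contain before launching a run, a quick preview can be useful. This is a minimal sketch using only the standard library; `preview_tsv` is a hypothetical helper, and it makes no assumption about the column names, it only reports whatever header the file has. The path is the `--train_dataset` value from the command above, relative to the `alphafold_finetune/` directory.

```python
import csv
import os

def preview_tsv(path, n=3):
    """Return (column names, first n rows as dicts) of a tab-separated file."""
    with open(path, newline='') as f:
        reader = csv.DictReader(f, delimiter='\t')
        # zip stops after n rows without reading the rest of the file.
        rows = [row for _, row in zip(range(n), reader)]
        columns = list(reader.fieldnames or [])
    return columns, rows

TRAIN_TSV = 'datasets_alphafold_finetune/pmhc_finetune/combo_1and2_train.tsv'
if os.path.isfile(TRAIN_TSV):
    columns, rows = preview_tsv(TRAIN_TSV)
    print('columns:', ', '.join(columns))
    for row in rows:
        print(row)
```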


## Prediction

This command makes structure and binding predictions for a set of 10mer targets using the fine-tuned parameters from the paper.

In [None]:
%shell python run_prediction.py --targets examples/pmhc_hcv_polg_10mers/targets.tsv \
    --outfile_prefix polg_test2 --model_names model_2_ptm_ft \
    --model_params_files datasets_alphafold_finetune/params/mixed_mhc_pae_run6_af_mhc_params_20640.pkl \
    --ignore_identities
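
Once the prediction run finishes, the files it wrote can be located by their prefix. This is a minimal sketch under one assumption that the notebook does not state: that output file names begin with the `--outfile_prefix` value (`polg_test2` here). `list_outputs` is a hypothetical helper, not part of the repository.

```python
import glob
import os

def list_outputs(prefix, directory='.'):
    """Return sorted paths in `directory` whose names start with `prefix`."""
    # glob.escape guards against special characters in the prefix itself.
    pattern = os.path.join(directory, glob.escape(prefix) + '*')
    return sorted(glob.glob(pattern))

# Assumption: outputs are written to the current directory with this prefix.
for path in list_outputs('polg_test2'):
    print(f'{os.path.getsize(path):>12,d}  {path}')
```

An empty listing suggests either that the run failed or that the outputs follow a different naming scheme.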