# Prepare environment

In [None]:
import os
import sys

REPO_ADDRESS = "https://github.com/sokrypton/ColabDesign.git"
REPO_DIR = "/home/ma/git/computation/ColabDesign"

if not os.path.exists(REPO_DIR):
    !git clone {REPO_ADDRESS}

%cd {REPO_DIR}

In [None]:
PARAMS_DIR = "/mnt/nas/alphafold/alphafold_params_2022-12-06/"
PARAMS_LOCAL_DIR = "params"

# Create a symbolic link to the params directory
if not os.path.exists(PARAMS_LOCAL_DIR):
    !ln -s {PARAMS_DIR} {PARAMS_LOCAL_DIR}

In [None]:
import os
from colabdesign import mk_afdesign_model, clear_mem
from IPython.display import HTML
import numpy as np

In [None]:
# Download PDB file from RCSB, given a PDB ID
def download_pdb(pdb_id):
    pdb_id = pdb_id.lower()
    pdb_filename = f"{pdb_id}.pdb"
    if not os.path.exists(pdb_filename):
        !wget https://files.rcsb.org/download/{pdb_filename}
    return pdb_filename

In [None]:
download_pdb("6LU7")

The losses being optimised in the design process are defined in the `model.set_opt` function. The following losses are available:

- general losses
  - *pae*       - minimizes the predicted alignment error
  - *plddt*     - maximizes the predicted LDDT (local distance difference test)
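  - both *pae* and *plddt* are reported as losses scaled to lie between 0 and 1, so lower is better for both (minimizing the *plddt* loss corresponds to maximizing the predicted LDDT)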

- fixbb specific losses
  - *dgram_cce* - minimizes the categorical cross-entropy between the predicted distogram and the one extracted from the input PDB structure
  - *fape*      - minimizes the frame aligned point error, i.e. the difference between predicted and target coordinates
  - we find the *dgram_cce* loss to be more stable for design than *fape*

- hallucination specific losses
  - *con*       - maximizes the number of contacts, with `1` contact per position: `model.set_opt("con",num=1)`

- binder specific losses
  - *pae* - minimizes the PAE at the interface and within the binder
  - *con* - maximizes contacts within the binder, with `2` contacts per binder position: `model.set_opt("con",num=2)`
  - *i_con* - maximizes contacts across the interface, with `1` contact per binder position: `model.set_opt("i_con",num=1)`

- partial hallucination specific losses
  - *sc_fape* - sidechain-specific fape
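
As a rough illustration of how several loss terms can feed a single objective, the sketch below combines per-term values with per-term weights. It assumes the total objective is a plain weighted sum, which is what the `weights` dictionary printed later in this notebook suggests but which is not confirmed here. The helper `total_loss` and all numbers are hypothetical.

```python
# Hypothetical per-term loss values; the names mirror the losses listed above,
# but the numbers are made up for illustration.
losses = {"dgram_cce": 2.31, "pae": 0.42, "plddt": 0.35}

# Hypothetical weights: only dgram_cce contributes in this example.
weights = {"dgram_cce": 1.0, "pae": 0.0, "plddt": 0.0}


def total_loss(losses, weights):
    """Weighted sum of the loss terms (assumed form of the objective).

    Terms without an entry in `weights` are treated as having weight 0.
    """
    return sum(weights.get(name, 0.0) * value for name, value in losses.items())


print(total_loss(losses, weights))
```

Under this assumption, raising a weight from 0 makes the corresponding term start to influence the design, and setting it back to 0 switches it off again.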

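The pAE (predicted alignment error) measures, for each pair of residues (i, j), the expected error in the position of residue j when the predicted and true structures are aligned on residue i. Low values indicate that the relative placement of the two residues is predicted with confidence.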

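A distogram is defined as a histogram of inter-residue distances: for every pair of residues, it gives a probability distribution over a set of distance bins.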
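
To make the distogram and the *dgram_cce* loss more concrete, here is a minimal NumPy sketch. It bins the pairwise distances of a set of coordinates into a target distogram and scores predicted logits against it with categorical cross-entropy. The function names, the bin range (2 to 22 angstroms) and the bin count (64) are illustrative assumptions, not values taken from ColabDesign.

```python
import numpy as np


def distance_bins(coords, min_dist=2.0, max_dist=22.0, num_bins=64):
    """Return an (L, L) array of distance-bin indices for coords of shape (L, 3).

    The bin range and count are assumptions chosen for illustration.
    """
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # num_bins - 1 boundaries split the distance axis into num_bins bins.
    edges = np.linspace(min_dist, max_dist, num_bins - 1)
    return np.digitize(dist, edges)  # indices in [0, num_bins - 1]


def dgram_cce(logits, target_bins):
    """Mean categorical cross-entropy between logits (L, L, B) and bins (L, L)."""
    # Numerically stable log-softmax over the bin axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log-probability assigned to the target bin of each pair.
    picked = np.take_along_axis(log_probs, target_bins[..., None], axis=-1)[..., 0]
    return -picked.mean()


rng = np.random.default_rng(0)
coords = rng.normal(scale=10.0, size=(5, 3))  # made-up coordinates
bins = distance_bins(coords)

uniform_logits = np.zeros((5, 5, 64))
print(dgram_cce(uniform_logits, bins))  # equals log(64) for a uniform prediction
```

A prediction that puts all of its probability on the correct bin for every pair drives this loss towards 0, which is the direction the fixed backbone protocol optimizes in.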

In AlphaFold, the `plddt` score is a per-residue estimate of the lDDT-Cα score, which measures how well the local inter-atomic distances of the prediction agree with those of the true structure. AlphaFold reports it on a scale from 0 to 100, where higher values indicate higher confidence. The `rmsd` score measures the root-mean-square deviation between the atoms of the predicted structure and the corresponding atoms of a reference structure after superposition. AlphaFold ranks its monomer predictions by mean pLDDT.
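
The two structure-comparison scores can be sketched in a few lines of NumPy. This is an illustrative implementation of Cα-only lDDT (the quantity that pLDDT estimates) and of RMSD for coordinates that are already superimposed. The function names are hypothetical; the 15 angstrom inclusion radius and the 0.5, 1, 2 and 4 angstrom tolerances follow the usual lDDT definition.

```python
import numpy as np


def lddt_ca(pred, ref, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Per-residue lDDT over C-alpha coordinates of shape (L, 3); values in [0, 1]."""
    d_pred = np.linalg.norm(pred[:, None] - pred[None, :], axis=-1)
    d_ref = np.linalg.norm(ref[:, None] - ref[None, :], axis=-1)
    # Score only pairs that are close in the reference, excluding self-pairs.
    mask = (d_ref < cutoff) & ~np.eye(len(ref), dtype=bool)
    diff = np.abs(d_pred - d_ref)
    # Fraction of tolerance thresholds under which each distance is preserved.
    preserved = np.mean([diff < t for t in thresholds], axis=0)
    # Average over each residue's neighbours; guard against empty neighbourhoods.
    return (preserved * mask).sum(axis=1) / np.maximum(mask.sum(axis=1), 1)


def rmsd(a, b):
    """RMSD between two already-superimposed coordinate sets of shape (L, 3)."""
    return np.sqrt(np.mean(np.sum((a - b) ** 2, axis=-1)))


# Made-up reference: ten points on a line, 3.8 angstroms apart.
ref = np.zeros((10, 3))
ref[:, 0] = np.arange(10) * 3.8
shifted = ref + np.array([1.0, 0.0, 0.0])  # rigid translation by 1 angstrom

print(lddt_ca(shifted, ref))  # all ones: internal distances are unchanged
print(rmsd(shifted, ref))     # 1.0: RMSD depends on the superposition
```

The example shows the main practical difference: lDDT compares internal distances and needs no superposition, whereas RMSD is only meaningful after the two structures have been aligned.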

# Fixed backbone design

## 1TEN

For a given protein backbone, generate/design a new sequence that AlphaFold thinks folds into that conformation. 

In [None]:
clear_mem()
af_model = mk_afdesign_model(protocol="fixbb")
af_model.prep_inputs(pdb_filename=download_pdb("1TEN"), chain="A")

print("length",  af_model._len)
print("weights", af_model.opt["weights"])

In [None]:
af_model.restart()
af_model.design_3stage()

In [None]:
af_model.plot_traj()  

The `plot_traj` function plots the design trajectory, i.e. how the tracked losses and metrics change over the optimization steps.

In [None]:
af_model.save_pdb(f"tenascin_{af_model.protocol}.pdb")

In [None]:
HTML(af_model.animate())

In [None]:
af_model.get_seqs()

# Hallucination