In [None]:
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jmou2/PaviaProteinDesign/blob/main/01_Monday/task_1_af2.ipynb)

# AlphaFolding a protein structure

Our goal in this task is to understand how to make, use, and analyze AlphaFold 2 (AF2) predictions of protein structures. AlphaFold is a deep learning model that predicts the 3D structure of a protein from its amino acid sequence. It returns one or more models, ranked by confidence.

In many biology labs, AF2 is mainly used to predict the structures of proteins that lack an experimentally solved structure. However, numerous publications have now shown that it can also serve as a proxy for the stability of designed proteins. If we fold a sequence without any multiple sequence alignment (MSA) information, a setting known as single-sequence mode, we can gauge how confidently that sequence alone folds into its intended structure. Designs that are confidently predicted to fold into the intended structure with a low RMSD are often more stable, monomeric, and soluble in the lab.
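A short definition may help when reading RMSD values later in this task. It uses the standard form of the root-mean-square deviation over $N$ matched atoms, such as CA atoms. Here $\mathbf{x}_i$ are the predicted coordinates, $\mathbf{y}_i$ are the reference coordinates, and $\mathbf{R}$ and $\mathbf{t}$ are the rotation and translation found by superposition, which minimize the deviation:

$$
\mathrm{RMSD} = \min_{\mathbf{R},\,\mathbf{t}} \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\lVert \mathbf{R}\mathbf{x}_i + \mathbf{t} - \mathbf{y}_i \right\rVert^2}
$$

The value is in Å. Lower values mean that the prediction lies closer to the target structure.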


### Setup
Run the cell below to download and install ProDy.

In [None]:
! pip install prody

import prody as pr

### 1 - Choosing a protein and cleaning it up

Download any structure of your choosing from the [PDB](https://www.rcsb.org/). Load it into a ProDy object and clean it up by:
- removing any heteroatoms such as waters, ligands, or solvent molecules
- deleting extra chains in the asymmetric unit (a multi-chain protein is OK, but it may be harder to fold and align)

Save the cleaned-up protein as a PDB file. 

For simplicity, avoid large proteins (stay under 1000 amino acids). Choose a single-chain protein, or pick 1-2 chains from a multi-chain protein to fold.



In [None]:
# download the protein


# clean up the protein


# save the protein as a pdb file

### 2 - Alphafold your protein

We will use [ColabFold](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb) to fold our protein. 

Use ProDy in the cell below to get the sequence of your cleaned-up protein. Then paste it into the `query_sequence` input of the ColabFold notebook. Make sure `template_mode` is set to `none`. 

Next, change `msa_mode` to `single_sequence`. 

Click "Runtime" --> "Run all". Additional explanations of the settings are at the bottom of the ColabFold notebook.

In [None]:
# get the sequence of your cleaned up protein using ProDy


### 3 - Analyze the Alphafolded structure 

Read your rank 1 predicted structure into a ProDy object. Superpose the predicted structure onto the cleaned-up protein and calculate the RMSD of the CA atoms. Save the superposed prediction, then open both the original and the superposed predicted structures in PyMOL.

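In PyMOL, color the predicted structure by B-factor (for example with `spectrum b`). AF2 writes its per-residue confidence score, pLDDT (0-100), into the B-factor column, so this coloring shows how confident the prediction is along the chain. 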

What do you notice about the confidence of different secondary structure elements? How does the confidence of the structure compare with how well it aligns to the original structure (its RMSD)? 

If only part of your structure is folded properly, you may want to perform the alignment using only a subset of atoms. In this case, use ProDy to select those atoms before calculating the transformation. For instance, you could calculate the transformation using only the CA atoms of residues 10-200. When you apply the resulting transformation to the ProDy object, it still moves all of the object's atoms.


In [None]:
# parse the alphafolded structure 


# superpose the predicted structure 


# calculate the RMSD


# save the superposed prediction

### 4 - Bonus: fold multiple proteins and compare their AlphaFold RMSDs

Pick a set of proteins whose "foldability" you would like to compare. Some options are:
- a protein with thermophilic and non-thermophilic counterparts
- proteins of different folds or sizes
- a rigid versus a flexible protein
- a de novo versus a native protein

In [None]:
# your code here