<a href="https://colab.research.google.com/github/phenix-project/Colabs/blob/main/alphafold2/AlphaFoldWithDensityMap.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### <center> <b> <font color='black'>  AlphaFold with a density map </font></b> </center>

<font color='green'>This notebook integrates Phenix model rebuilding with AlphaFold to improve AlphaFold modeling.  You upload a sequence and a density map (ccp4/mrc format) and it carries out cycles of AlphaFold modeling, rebuilding with the density map, and AlphaFold modeling with the rebuilt model as a template. In each cycle you get a new AlphaFold model and a rebuilt model.

To run this notebook you will need a Google account. Most of the demos can be run with a free account; to run with long protein chains or to run faster you can use a Colab Pro or Pro+ account.

To understand how this all works see the Phenix tutorial video ["AlphaFold changes everything"](https://youtu.be/9IExeA_A8Xs) and the [BioRxiv preprint](https://www.biorxiv.org/content/10.1101/2022.01.07.475350v2) on using AlphaFold with a density map.

You can run a demo of any one of 25 structures by selecting one in the second cell.  You then only need to select the demo and enter the Phenix download password.

This notebook is derived from [ColabFold](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb) and the DeepMind [AlphaFold2 Colab](https://colab.research.google.com/github/deepmind/alphafold/blob/main/notebooks/AlphaFold.ipynb).

-----------------
<center><font color='black'><b>Instructions for a simple run:</b><p><i>Note: If this is the first time you’re running this notebook, be sure to check out the helpful hints at the bottom of the page!</i></p></font></center>


1. Run the first cell to install condacolab and reboot the virtual machine. You need to do this <b><i>before</i></b> using <b><i>Run all</i></b> in step 4.

2. While the virtual machine is rebooting, select the "Basic Inputs" cell, type in a Phenix download password, and either (A) select a demo or (B) paste in a sequence, resolution and jobname. You can also edit the Options in the next cell if you want.

3. If you are not running a demo, open Google Drive in a new browser window, make a new folder called <b><i>ColabInputs</i></b>, and upload your map file (CCP4/MRC) there. Be sure the name of your map file starts with the jobname. Alternatively, you can upload your map file directly by leaving the input_directory field blank in the Options cell.

4. Start your run by going up to the <b><i>Runtime</i></b> pulldown menu and selecting <b><i>Run all</i></b>.

5. Scroll down the page and follow what is going on. If necessary, upload your map file when the Upload button appears below the "Setting up input files" form. If you use Google Drive for your input and output files you will be asked for permission.


-----------------
<b> <font color='black'> <center>Please cite the ColabFold and AlphaFold2 papers if you use this notebook:</center>
</font></b> 

- <font color='green'>[Mirdita, M., Ovchinnikov, S., Steinegger, M. (2021). ColabFold - Making protein folding accessible to all. *bioRxiv*, 2021.08.15.456425](https://www.biorxiv.org/content/10.1101/2021.08.15.456425v2)</font> 

- <font color='green'> [Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021)](https://www.nature.com/articles/s41586-021-03819-2)
</font>
-----------------


In [None]:
#@title 1. Hit the triangle <b>Run</b> button to the left to install condacolab and reboot the virtual machine.  

#@markdown  You can edit the forms below while it is rebooting.

#@markdown  You may get questions about this notebook not being authored by Google and about needing a high-RAM environment.  Just click OK to go on.

#@markdown After about 30 seconds you will see 3 black pop-up messages in the lower left corner of the window about a crash (caused by the reboot).
#@markdown Close the last one and you are ready to go with <b><i>Runtime</i></b> / <b><i>Run all</i></b> once you have entered all your options.

#@markdown <i>Normally leave box below at Standard</i>
custom_update = 'Standard' #@param {type:'string'} ['None','Standard','Latest']
if custom_update == 'None':
  custom_update = None

import os

print("\nINSTALLING BIOPYTHON")
!pip install biopython dm-haiku==0.0.5 ml-collections py3Dmol
print("\nINSTALLING JAX 0.3.15")
!pip install jax==0.3.15
!pip install jaxlib==0.3.15
print("\nDONE INSTALLING JAX 0.3.15")
print("\nPLEASE IGNORE ALL jax ERROR MESSAGES")

# Get the helper python files
import os
os.chdir("/content/")
file_name = 'phenix_colab_utils.py'
if os.path.isfile(file_name):
  os.remove(file_name)
os.environ['file_name'] = file_name
result = os.system("wget -qnc https://raw.githubusercontent.com/phenix-project/Colabs/main/alphafold2/$file_name")

print("About to install condacolab...")
import phenix_colab_utils as cu
cu.get_helper_files(custom_update = custom_update)

if custom_update:
  os.system("touch NEED_UPDATES")
elif os.path.isfile("NEED_UPDATES"):
  os.remove("NEED_UPDATES")
# get all the other helper files
cu.clear_python_caches()
cu.install_condacolab()
!touch STEP_1
print("Ready with condacolab installed...close the last of 3 crash messages in lower left corner when it comes up")

In [None]:
#@title 2. Basic inputs (Required)

import os
if not os.path.isfile("STEP_1"):
  raise AssertionError("Please run step 1 first")

os.chdir("/content")
from phenix_colab_utils import exit, get_map_name, get_demo_info, set_up_demo, make_four_char_name

if not os.path.isfile("STEP_1"):
  exit("Please run step 1 first...")


phenix_download_password='' #@param {type:"string"}

#@markdown <b><i>Demo run</i></b>: Select the structure to predict. <i>(Any text entered in the <b>Normal run</b> section will be ignored if you select a demo</i>).

demo_to_run = 'None (Demos with boxed maps, approx timings are for Colab; Pro is 2x faster, Pro+ 3x)' #@param ['None (Demos with boxed maps, approx timings are for Colab; Pro is 2x faster, Pro+ 3x)', '7mjs (EMDB 23883, 3.03 A, 132 residues, 2 hours, RMSD (A) Start: 3.2 End: 1.0)', '7lx5 (EMDB 23566, 3.44 A, 196 residues, 4 hours, RMSD (A) Start: 2.8 End: 2.2)', '7c2k (EMDB 30275, 2.93 A, 927 residues, 10 hours, RMSD (A) Start: 4.1 End: 1.8 NOTE: requires Colab Pro)', '7ev9 (EMDB 31325, 2.6 A, 382 residues, 5 hours, RMSD (A) Start: 2.0 End: 0.6)', '7kzz (EMDB 23093, 3.42 A, 281 residues, 3 hours, RMSD (A) Start: 2.7 End: 1.6)', '7mlz (EMDB 23914, 3.71 A, 196 residues, 4 hours, RMSD (A) Start: 3.1 End: 2.4)', '7l6u (EMDB 23208, 3.3 A, 311 residues, 10 hours, RMSD (A) Start: 2.5 End: 1.9)', '7mby (EMDB 23750, 2.44 A, 339 residues, 15 hours, RMSD (A) Start: 1.7 End: 0.6)', '7me0 (EMDB 23786, 2.48 A, 347 residues, 3 hours, RMSD (A) Start: 1.3 End: 0.3)', '7ls5 (EMDB 23502, 2.74 A, 243 residues, 10 hours, RMSD (A) Start: 1.0 End: 0.7)', '7n8i (EMDB 24237, 3 A, 106 residues, 1 hour, RMSD (A) Start: 0.4 End: 0.2)', '7lc6 (EMDB 23269, 3.7 A, 557 residues, 2 hours, RMSD (A) Start: 0.8 End: 0.6)', '7brm (EMDB 30160, 3.6 A, 257 residues, 5 hours, RMSD (A) Start: 4.2 End: 3.6)', '7lci (EMDB 23274, 2.9 A, 393 residues, 9 hours, RMSD (A) Start: 4.6 End: 3.0)', '7eda (EMDB 31062, 2.78 A, 334 residues, 4 hours, RMSD (A) Start: 4.0 End: 3.6)', '7l1k (EMDB 23110, 3.16 A, 149 residues, 2 hours, RMSD (A) Start: 0.6 End: 0.5)', '7m9c (EMDB 23723, 4.2 A, 257 residues, 3 hours, RMSD (A) Start: 1.4 End: 1.2)', '7lvr (EMDB 23541, 2.9 A, 441 residues, 12 hours, RMSD (A) Start: 1.1 End: 0.9)', '7rb9 (EMDB 24400, 3.76 A, 372 residues, 2 hours, RMSD (A) Start: 1.5 End: 1.4)', '7m7b (EMDB 23709, 2.95 A, 209 residues, 6 hours, RMSD (A) Start: 6.5 End: 6.2)', '7lsx (EMDB 23508, 3.61 A, 245 residues, 10 hours, RMSD (A) Start: 1.4 End: 1.5)', '7bxt (EMDB 30237, 4.2 A, 103 residues, 3 hours, RMSD (A) Start: 1.4 End: 2.2)', '7ku7 (EMDB 23035, 3.4 A, 269 residues, 6 hours, RMSD (A) Start: 18.5 End: 18.4)', '7lv9 (EMDB 23530, 4.5 A, 97 residues, RMSD (A) Start: 15.5 End: 15.5)', '7msw (EMDB 23970, 3.76 A, 635 residues, RMSD (A) Start: 27.0 End: 27.0)'] {type:"string"}
if demo_to_run.split()[0] != "None":
  jobname, sequence, resolution = set_up_demo(demo_to_run)
  is_demo = True
else:
  is_demo = False

#@markdown <b><i>Normal run</i></b>: enter sequence of chain to predict (at least 20 residues),
#@markdown resolution, name of this job. Upload your map file to a folder named
#@markdown <i>ColabInputs</i> in your Google Drive (or get it ready to upload when a button
#@markdown appears in cell 4).


if (not is_demo):
  sequence = '' #@param {type:"string"}
  resolution = '' #@param {type:"string"}
  jobname = '' #@param {type:"string"}

if resolution:
  try:
    resolution = float(resolution)
  except Exception as e:
    exit("Please supply a number for resolution")
else:
  resolution = None

query_sequence = sequence
password = phenix_download_password



# Check for required inputs
if not password:
  exit("Please supply a Phenix download password")
if not query_sequence and (not is_demo):
  exit("Please supply a demo or a query sequence, resolution and jobname")
if not resolution and (not is_demo):
  exit("Please supply a demo or a query sequence, resolution and jobname")
if not jobname and (not is_demo):
  exit("Please supply a demo or a query sequence, resolution and jobname")

# Make jobname have the expected format 
orig_jobname = jobname
jobname = make_four_char_name(jobname)
if orig_jobname != jobname:
  print("Changing jobname to '%s' to match required format" %(jobname))

if (is_demo):
  print("\nRunning demo for %s" %(jobname))

print("\nJOBNAME:", jobname)
print("RESOLUTION:",resolution)
print("SEQUENCE:",query_sequence)

# Save all parameters in a dictionary
params = {}
for p in ['resolution','jobname', 'password', 'query_sequence']:
  params[p] = locals().get(p,None)
! touch STEP_2
! rm -f STEP_3


In [None]:
#@title 3. Options



import os
if not os.path.isfile("STEP_1"):
  raise AssertionError("Please run step 1 first")
  
from phenix_colab_utils import exit
if not os.path.isfile("STEP_2") or \
    not 'is_demo' in locals().keys():
  exit("Please run step 2 first...")

#@markdown <b> A. Commonly-used options </b>

#@markdown Include templates from the PDB (a good idea if there are similar proteins in the PDB)
include_templates_from_pdb = False #@param {type:"boolean" }


#@markdown Input directory containing your map and any models (usually <b>ColabInputs</b> on your Google Drive). Leave blank to upload directly. Omit leading path components such as /content/ or MyDrive/. Must be set to <b>ColabInputs</b> if a demo is run.
input_directory = "ColabInputs" #@param {type:"string"}
if is_demo and input_directory != "ColabInputs":
  exit("For a demo the input_directory must be ColabInputs")

#@markdown Maximum number of AlphaFold models to create on first cycle (best model chosen by plDDT, stops generating models if all have similar plDDT)
maximum_number_of_models =  50#@param {type:"integer"}
random_seed_iterations = maximum_number_of_models


#@markdown Save outputs to the directory <b>ColabOutputs</b> on Google drive
save_outputs_in_google_drive = True #@param {type:"boolean" }

#@markdown Carry on from where you left off (requires saving outputs in Google drive with checkbox above on initial run)
carry_on = False #@param {type: "boolean"}

#@markdown <b> B. Advanced options</b>



#@markdown Maximum cycles to run (fewer may be run if little change in models between cycles)
maximum_cycles =  10#@param {type:"integer"}

#@markdown Maximum number of templates from PDB to include (takes 1 minute per template):

maximum_templates_from_pdb =  20#@param {type:"integer"}

#@markdown Upload additional templates directly (read from input_directory if specified)
upload_manual_templates = False #@param {type:"boolean" }

#@markdown Upload MSA file (a3m format) directly (read from input_directory if specified)
upload_msa_file = False #@param {type:"boolean" }

#@markdown Specify if uploaded templates are already placed in the map and are to be used only as suggestions for rebuilding and not as AlphaFold templates
uploaded_templates_are_fragment_suggestions = False #@param {type:"boolean" }
uploaded_templates_are_map_to_model = uploaded_templates_are_fragment_suggestions


#@markdown Specify how to use multiple sequence alignment information (you might use it only on first cycle to force AlphaFold to follow rebuilt models)
msa_use = 'Use MSA throughout' #@param ["Use MSA throughout", "Use MSA in first cycle","Skip all MSA"]
# Set actual parameters

#@markdown Version of Phenix to use
phenix_version ='dev-4536' #@param {type:"string"}
version = phenix_version  # rename variable

#@markdown Random seed
random_seed = 581867 #@param {type:"integer"}

#@markdown Specify if you want to run a series of jobs by uploading a file with one jobname, resolution and sequence per line
upload_file_with_jobname_resolution_sequence_lines = False #@param {type:"boolean"}
if is_demo and upload_file_with_jobname_resolution_sequence_lines:
  exit("For a demo upload_file_with_jobname_resolution_sequence_lines must be False")




#@markdown Turn on debugging
debug = False #@param {type:"boolean"}


# We are going to get these from uploaded file...
if upload_file_with_jobname_resolution_sequence_lines:
  params['jobname'] = None
  params['resolution'] = None
  params['sequence'] = None

if msa_use == "Use MSA throughout":
  skip_all_msa = False 
  skip_all_msa_after_first_cycle = False
elif msa_use == "Use MSA in first cycle":
  skip_all_msa = False 
  skip_all_msa_after_first_cycle = True
else:
  skip_all_msa = True 
  skip_all_msa_after_first_cycle = True

upload_maps = True  # This version expects a map
use_msa = (not skip_all_msa)

minimum_random_seed_iterations = int(max(1,random_seed_iterations//20))
data_dir = "/content"
content_dir = "/content"
# Save parameters
for p in ['content_dir','data_dir','save_outputs_in_google_drive',
    'input_directory','working_directory',
    'include_templates_from_pdb','maximum_templates_from_pdb',
    'upload_msa_file',
    'upload_manual_templates','uploaded_templates_are_map_to_model',
    'maximum_cycles','version',
    'upload_file_with_jobname_resolution_sequence_lines',
    'use_msa','skip_all_msa_after_first_cycle',
    'upload_maps','debug','carry_on','random_seed',
    'random_seed_iterations','minimum_random_seed_iterations']:
  params[p] = locals().get(p,None)
!touch STEP_3

In [None]:
#@title 4. Setting up input files...
#@markdown You will be asked for permission to use your Google drive if needed.

#@markdown The upload button will appear below this cell if needed

import os
if not os.path.isfile("STEP_1"):
  raise AssertionError("Please run step 1 first")
  
if not os.path.isfile("STEP_3"):
  from phenix_colab_utils import exit
  exit("Please run steps 2-3 again before rerunning step 4...")


# Set up the inputs using the helper python files
from phenix_alphafold_utils import set_up_input_files
params = set_up_input_files(params, convert_to_params = False)
! touch STEP_4
! rm -f STEP_2 STEP_3


In [None]:
#@title 5. Installing Phenix, Alphafold and utilities...
#@markdown This step takes about 8 minutes

import os
if not os.path.isfile("STEP_1"):
  raise AssertionError("Please run step 1 first")
  
from phenix_colab_utils import exit

if not os.path.isfile("STEP_4"):
  exit("Please run steps 1-4 first...")

import phenix_colab_utils as cu

# Get tensorflow import before installation
if not locals().get('tf'):
  tf = cu.import_tensorflow()

# Install selected software
cu.install_software(
  bioconda = True,
  phenix = True,
  phenix_version = params.get('version'),
  phenix_password = params.get('password'),
  alphafold = True,
  pdb_to_cif = True,
)

if os.path.isdir("updates") and os.path.isfile("NEED_UPDATES"):
  from install_updates import install_updates
  print("Installing updates")
  install_updates(skip_download = True)
!touch STEP_5

In [None]:
#@title 6. Creating AlphaFold models

import os
if not os.path.isfile("STEP_1"):
  raise AssertionError("Please run step 1 first")
  
from phenix_colab_utils import exit

if not os.path.isfile("STEP_4"):
  exit("Please run steps 2-4 again before rerunning this step...")

if not os.path.isfile("STEP_5"):
  exit("Please run step 5 first...")

! rm -f STEP_2 STEP_3 STEP_4

# Convert params from dict to alphafold_with_density_map params
from phenix_alphafold_utils import get_alphafold_with_density_map_params
params = get_alphafold_with_density_map_params(params)

from run_alphafold_with_density_map import run_jobs

# Working directory
os.chdir(params.content_dir)
results = run_jobs(params)



In [None]:
#@title Utilities (skipped unless checked)

# Put whatever utilities you want here. They will be run if checked
clear_caches = False #@param {type:"boolean" }
if clear_caches:
  from phenix_colab_utils import clear_python_caches
  clear_python_caches(modules = ['run_alphafold_with_density_map3','run_job','rebuild_model','install_phenix','run_fix_paths','runsh','mk_mock_template','mk_template','hh_process_seq','run_job','get_template_hit_list','run_alphafold_with_density_map','get_template_hit_list','get_cif_file_list','alphafold_utils','get_msa','get_templates_from_drive','phenix_alphafold_utils','phenix_colab_utils','clear_python_caches'])
  from phenix_colab_utils import clear_python_caches
  clear_python_caches()


crash_deliberately_and_restart = False #@param {type:"boolean" }
if crash_deliberately_and_restart:
  print("Crashing by using all memory.  Results in restart, losing everything")
  [1]*10**10

upload_helper_files = False #@param {type:"boolean" }
def get_helper_files():
  import os
  for file_name in ['phenix_colab_utils.py',
      'alphafold_utils.py','run_alphafold_with_density_map.py','phenix_alphafold_utils.py']:
    if os.path.isfile(file_name):
      os.remove(file_name)
    os.environ['file_name'] = file_name
    result = os.system("wget -qnc https://raw.githubusercontent.com/phenix-project/Colabs/main/alphafold2/$file_name")
if upload_helper_files:
  get_helper_files()

remove_everything_and_restart = False #@param {type:"boolean" }
if remove_everything_and_restart:
  !kill -9 -1

auto_reload = False #@param {type:"boolean" }
if auto_reload:
  %load_ext autoreload
  %autoreload 2

**Helpful hints**

**What this Colab notebook is good for**

* The purpose of this notebook is to generate an AlphaFold model of a single protein chain that is compatible with a density map showing that chain.

* The chain can be of any length between 20 and about 1000 residues. Longer chains could work but may fail due to the GPU memory required.

* The density map should be at a resolution of about 4.5 A or better.  The density map can be your complete map (perhaps showing many chains), or it can be a map boxed to contain just the chain of interest.  If you can box the map that will always make it faster, and it could prevent failure in cases where docking of the predicted model into the map is difficult.

**When this notebook will not work**

* This notebook will fail if the AlphaFold prediction is of very low confidence (i.e., if few of the residues in the structure have plDDT values over 70).

* The notebook will fail if it cannot find the location of the predicted model in the density map, or if the density map differs so much from the model that it cannot find a way to rebuild the model to agree with the map.
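
One way to check the confidence condition on an existing AlphaFold model is to count how many residues have plDDT above 70. The sketch below assumes a PDB-format model in which plDDT is stored in the B-factor column (as AlphaFold writes it) and looks only at CA atoms. The function name `plddt_summary` is illustrative and is not part of this notebook's helper files.

```python
def plddt_summary(pdb_text, cutoff=70.0):
    """Return (mean plDDT, fraction of residues above cutoff) using CA atoms."""
    values = []
    for line in pdb_text.splitlines():
        # Fixed-column PDB format: atom name in columns 13-16, B-factor in 61-66
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    if not values:
        raise ValueError("No CA atoms found")
    mean = sum(values) / len(values)
    fraction = sum(v > cutoff for v in values) / len(values)
    return mean, fraction
```

You could read a prediction such as yyyy_ALPHAFOLD_1.pdb into a string and pass it to this function; a low fraction suggests the run is unlikely to succeed.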

**Password**
* Your Phenix download password is the password you get from <a href = "https://phenix-online.org/download" target="_blank"> phenix-online.org/download </a> and that you (or someone from your institution) used to download Phenix. It is updated weekly, so you may need to request a new one fairly often.

**Saving your results**

* The best way to save your results is to leave the <b>save_outputs_in_google_drive</b> box in <b>Commonly-used options</b> checked.  This way your results are saved in a <b>ColabOutputs</b> folder on your Google Drive as they are generated and you can also <b>Carry on</b> if your notebook crashes or times out.

* When your job is done a zip file with your results is normally downloaded automatically. Sometimes this doesn't work and you need to download it manually using the folder icon on the left side of the screen. All your files are in a folder named after your job name. The zip file is in that folder and also in the main folder.  All your important files are also on your Google Drive in the <b>ColabOutputs</b> folder.

**Running a demo**

* You can run any one of 25 demo structures by selecting it
in the second cell. If you run a demo you only need to select the demo and supply your Phenix download password (no other inputs are necessary and the sequence, resolution and jobname fields are ignored).

* These demos are not selected to give good results...they are the same structures used in the paper describing this procedure. Some of them start out with very good AlphaFold predictions and basically nothing happens. A few start out with really poor AlphaFold predictions and again nothing happens.  Most of the others start out with a moderately good AlphaFold prediction (average plDDT of 70-80) and then the procedure works pretty well and you get an improved plDDT and an improved model.  The ones at the top of the list are likely to work best.  You can see the expected results for any of the demos [here](https://phenix-online.org/phenix_data/terwilliger/alphafold_with_density_2022/Demos).  

* The approximate length of time (on Colab) required to run the demos is listed. These are estimates for the standard Colab.  The run time is shorter on Colab Pro and on Colab Pro+.

* Colab can crash even on the demos...this usually happens if your session does not have enough memory but there may be bugs in our code that cause crashes as well. If you contact us (use the Phenix GUI to do that) we will try to fix any bugs that you find.

* You might want to leave the "save_outputs_in_google_drive" box checked in the third cell so that the results are saved there as you go and so that you can use the "carry_on" box to run again if it times out or crashes (see the next section).

* After you run the demo you can download a .zip file with the results from a directory that starts with the name of your demo (for example the 7c2k demo would be in a directory called 7c2k_30275).  You can get to the directory with the folder button on the left side of the notebook. Then if you navigate to your .zip file you can hover your cursor over the file name and click on the three dots and select download.

* The map file used in the demo will be in the directory "ColabInputs" with a file name starting with the name of your demo.

**Carrying on after a timeout or crash**

* If you save your results in your Google Drive folder <b>ColabOutputs</b> (by leaving the <b>save_outputs_in_google_drive</b> box checked), you can continue on after a crash or timeout.  You set up the inputs just as you did on the initial run, but check the <b>carry_on</b> box. You then (usually) go through the whole process again (reboot the virtual machine, then <b>Run all</b>).  The notebook will look in your <b>ColabOutputs</b> directory for the files that it is going to create. If it finds them there it will use them instead of creating them again.  If you are lucky you may be able to restart without rebooting: try selecting <b>Run all</b> again, and if it runs you are OK.
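
The reuse behavior described above amounts to checking for an expected output file before doing the work. A minimal sketch, assuming a hypothetical helper `reuse_or_create` and a caller-supplied function that writes the file; the notebook's own implementation may differ.

```python
import os

def reuse_or_create(path, create, carry_on=True):
    """Return path, calling create(path) only if the file is missing or carry_on is off."""
    if carry_on and os.path.isfile(path):
        print("Reusing existing file:", path)
        return path
    create(path)  # expensive step, e.g. a prediction or a rebuilding run
    return path
```

With this pattern a restarted run skips every step whose output is already present and resumes at the first missing file.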

**Sequence format**

* Your sequence should contain only the 1-letter code of one protein chain. It can contain spaces if you want.
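
If you want to check a sequence before pasting it in, something like the following sketch could be used. It assumes that only the 20 standard amino-acid letters are acceptable and that lower-case input should be upper-cased; neither detail is stated by this notebook, and `clean_sequence` is an illustrative name. The minimum length of 20 comes from the Basic inputs cell.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: 20 standard amino acids only

def clean_sequence(raw, min_length=20):
    """Remove whitespace, upper-case, and check the residue letters and length."""
    sequence = "".join(raw.split()).upper()
    bad = sorted(set(sequence) - VALID_RESIDUES)
    if bad:
        raise ValueError("Unexpected characters in sequence: %s" % "".join(bad))
    if len(sequence) < min_length:
        raise ValueError("Sequence has %d residues; at least %d are needed"
                         % (len(sequence), min_length))
    return sequence
```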

**File names and jobname must match**
* Your AlphaFold predictions will be named yyyy_ALPHAFOLD_x.pdb
and your rebuilt models yyyy_REBUILT_x.pdb, where yyyy is your jobname and x is the cycle number.

* All model file names must start with 4 characters, optionally followed by "_" and more characters, and must end in ".pdb" or ".cif".  Valid file names are abcd.pdb, abcd.cif, abcd_other.pdb.  Invalid names are abc.pdb, abcde.cif.

* Your jobname must match the beginnings of your map file names and model file names.  If your jobname is joba then your map file name must look like: joba_xxx.mrc or joba_yyy.ccp4.  Your model file name must look like: joba_mymodel.pdb or joba.cif.  This correspondence is used to match map and model files with jobnames.
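
The naming rules above can be checked with a short script before uploading. This is a sketch only: it assumes the 4-character prefix is alphanumeric (the rule says only "4 characters"), and the names `is_valid_model_name` and `matches_jobname` are illustrative rather than functions from the notebook's helper files.

```python
import re

# Assumption: the 4-character prefix is alphanumeric
MODEL_NAME = re.compile(r"^[A-Za-z0-9]{4}(_[^/]+)?\.(pdb|cif)$")

def is_valid_model_name(file_name):
    """True for names like abcd.pdb or abcd_other.cif."""
    return MODEL_NAME.match(file_name) is not None

def matches_jobname(file_name, jobname):
    """True if file_name is the jobname followed by '_' or '.' (joba_xxx.mrc, joba.cif)."""
    return file_name.startswith(jobname + "_") or file_name.startswith(jobname + ".")
```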

**Options for uploading your map file**

* (A) Upload when the Upload button appears below the "Setting up input files" cell after you hit Runtime / Run all in step 4.
* (B) Upload in advance to a unique folder in your Google Drive and specify this directory in the entry form.
* (C) As in B, but upload to a unique new folder in /content/.  Note that C requires using the command-line tool at the bottom left of the page to create a new directory like MyFiles, uploading with the upload button near the top left of the page, and moving the uploaded file from /content/my_file.mrc to /content/MyFiles/my_file.mrc.

**Uploading a file with all your file information**

* To upload
a file with a jobname, resolution, and sequence on each line, 
check ***upload_file_with_jobname_resolution_sequence_lines*** and hit
the ***Run*** button to the left of the first cell.

* If you upload a file with multiple sequences, each line of the file should have exactly one job name, a space, resolution, and a sequence, like this:

7n8i_24237 2.3 VIWMTQSPSSLSASVGDRVTITCQASQDIRFYLNWYQQKPGKAPKLLISDASNMETGVPSRFSGS

7lvr_23541 3 MRECISIHVGQAGVQIGNACWELYCLEHGIQPDGQMPSDKTIGGGDDSFNTFFSETG
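
To check such a file before uploading it, you could parse it with a sketch like the one below. The function name `read_job_lines` and the returned dictionary keys are illustrative; joining any extra fields into the sequence (to tolerate spaces) is an assumption rather than documented behavior of the notebook.

```python
def read_job_lines(text):
    """Parse lines of 'jobname resolution sequence' into a list of dicts."""
    jobs = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 3:
            raise ValueError("Line %d needs a jobname, resolution and sequence"
                             % line_number)
        jobname, resolution = fields[0], float(fields[1])
        sequence = "".join(fields[2:])  # assumption: spaces inside the sequence are allowed
        jobs.append({"jobname": jobname, "resolution": resolution,
                     "query_sequence": sequence})
    return jobs
```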

**Randomized tries on first cycle**

* You can specify how many AlphaFold models to try
and build at the start (50 may be a good number unless you have a big structure). Models are scored by plDDT
and the highest-scoring one is kept.  If all the models
have similar plDDT as they are being created the randomization step is discontinued and the best one found is used.
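
The selection logic can be pictured with the sketch below. It is not the notebook's implementation: the tolerance of 1.0 plDDT units and the function name are assumptions made for illustration. The Options cell does set a minimum number of tries of max(1, maximum_number_of_models // 20), which plays the role of `minimum_tries` here.

```python
def best_model_with_early_stop(plddt_values, minimum_tries=1, tolerance=1.0):
    """Scan mean plDDT scores in order; return (best_index, tries_used)."""
    seen = []
    for score in plddt_values:
        seen.append(score)
        spread = max(seen) - min(seen)
        # Stop once enough models exist and they all score about the same
        # (at least 2 models are needed to measure a spread)
        if len(seen) >= max(2, minimum_tries) and spread <= tolerance:
            break
    best_index = seen.index(max(seen))
    return best_index, len(seen)
```

With scores that vary a lot from model to model, all tries are used; when the first few models agree closely, the loop ends early and the best of those is kept.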

**Try turning off MSAs after first cycle**

* You can encourage AlphaFold to use your rebuilt templates by setting <b>msa_use</b> in the Options cell to "Use MSA in first cycle" (this sets skip_all_msa_after_first_cycle). AlphaFold will then use only your template information and its intrinsic structural information for all cycles except the first.

**Try including templates from the PDB**

* The default is to not include templates from the PDB, but you can often improve your modeling a lot if you do include them.  If you know that your structure is similar to some structures in the PDB it is a good idea to include templates from the PDB.  You can also choose specific chains from the PDB and upload them yourself and check the "upload_manual_templates" box.

**Reproducibility**

* The TensorFlow and AlphaFold2 code will give different results depending on the GPU that is used and the random seed that you choose. You can see what GPU you have by opening a cell with the '+Code' button, typing <b>! nvidia-smi</b> and then running that cell. The GPU type will be listed (like Tesla V100-SXM2). You are likely to get a higher-quality GPU with Colab Pro or Pro+ than with the free version.
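
If you want to record the GPU alongside your results, the same information can be captured from Python. The sketch below uses the standard `nvidia-smi` query options; the function name `gpu_description` is illustrative, and it returns None when no NVIDIA driver is available.

```python
import shutil
import subprocess

def gpu_description():
    """Return the GPU name and total memory reported by nvidia-smi, or None."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return result.stdout.strip()
```

Saving this string together with your random seed makes it easier to explain differences between runs.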

**Running cells in this Colab notebook**
* You can step through this notebook one part at a time
by hitting the ***Run*** buttons to the left one at a time. 

* The cell that is active is indicated by a ***Run*** button that has turned into a black circle with a moving black arc

* When execution is done, the ***Run*** button will go back 
to its original white triangle inside a black circle

* You can stop execution of the active cell by hitting its ***Run*** button. It will turn red to indicate it has stopped.

* You can rerun any cell any time that nothing is running.  That means you can go all the way through, then go back to the first cell and enter another sequence and redo the procedure.

* If something goes wrong, the Colab Notebook will print out
an error message.  Usually this will be something telling you
how to change your inputs.  You enter your new inputs and
hit the ***Run*** button again to carry on.

**Possible problems**

* The automatic download may not always work. Normally the
file download starts when the .zip files are created,
but the actual download happens when all the AlphaFold
models are completed.
You can click on the 
folder icon to the left of the window and download your
jobname.zip file manually.  Open and close the file
browser to show recently-added files.

* Your Colab connection may time out if you go away and leave it, or if you run for a long time (more than an hour). If your connection times out you lose everything that is not yet downloaded, so you might want to download as you go or save your outputs to Google Drive (the <b>save_outputs_in_google_drive</b> option).

* Google Colab assigns different types of GPUs with varying amount of memory. Some might not have enough memory to predict the structure for a long sequence.  

**Result zip file contents**

1. Alphafold prediction for each cycle
2. Rebuilt model for each cycle
3. PAE matrix (.jsn) for each cycle
4. PAE and plDDT figures (.png) for each cycle
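
After downloading, you can inspect the zip file without unpacking it. The sketch below groups the archive's file names by extension using Python's standard `zipfile` module; `list_results` is an illustrative name and the exact file names in your archive depend on your jobname.

```python
import zipfile

def list_results(zip_path):
    """Group the file names in a results zip by extension."""
    groups = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # directory entry
            extension = name.rsplit(".", 1)[-1].lower() if "." in name else ""
            groups.setdefault(extension, []).append(name)
    return groups
```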

**Colab limitations**
* While Colab is free, it is designed for interactive work, and memory and GPU usage are limited. It will time out after a few hours and it may check that you are not a robot at random times.  On a timeout you may lose your work. You can increase your allowed time with Colab Pro or Pro+.

* AlphaFold can crash if it requires too much memory. On a crash you may lose all your work that is not yet downloaded. More memory is available with Colab Pro or Pro+. If you are familiar with Colab scripts you can try this [hack](https://towardsdatascience.com/double-your-google-colab-ram-in-10-seconds-using-these-10-characters-efa636e646ff) with the <b>crash_deliberately_and_restart</b> checkbox in the Utilities section to increase your memory allowance.


**Description of the plots**

*   **Number of sequences per position** - For best performance look for at least 30 sequences per position, and ideally 100.
*   **Predicted lDDT per position** - model confidence (out of 100) at each position. The higher the better.
*   **Predicted Alignment Error** - For homooligomers, this could be a useful metric to assess how confident the model is about the interface. The lower the better.
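
The first plot can be reproduced from an MSA file in a3m format. The sketch below assumes the usual a3m conventions (the first sequence is the query, lower-case letters are insertions relative to the query, and "-" marks a gap) and skips header lines starting with "#". The function name is illustrative.

```python
def sequences_per_position(a3m_text):
    """Count non-gap residues in each query column of an a3m alignment."""
    rows, current = [], []
    for line in a3m_text.splitlines():
        if line.startswith("#"):
            continue  # optional header line
        if line.startswith(">"):
            if current:
                rows.append("".join(current))
            current = []
        elif line.strip():
            current.append(line.strip())
    if current:
        rows.append("".join(current))
    # Lower-case letters are insertions relative to the query; drop them
    aligned = ["".join(c for c in row if not c.islower()) for row in rows]
    if not aligned:
        return []
    length = len(aligned[0])
    counts = [0] * length
    for row in aligned:
        for i, c in enumerate(row[:length]):
            if c != "-":
                counts[i] += 1
    return counts
```

Positions where the count falls well below 30 are the ones where the prediction has the least evolutionary information to work with.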

**Updates**

- <b> <font color='green'>2022-01-25 Includes integrated rebuilding and AlphaFold2 modeling
- <b> <font color='green'>2022-02-18 Includes demos of 25 chains from August 2022 PDB entries and corresponding boxed maps.
- <b> <font color='green'>2022-03-01 Allows any number of starting AlphaFold models.

**Acknowledgments**

- <b> <font color='green'>This notebook is based on the very nice notebook from ColabFold ([Mirdita et al., *bioRxiv*, 2021](https://www.biorxiv.org/content/10.1101/2021.08.15.456425v1), https://github.com/sokrypton/ColabFold)</font></b> 

- <b><font color='green'>ColabFold is based on AlphaFold2 [(Jumper et al. 2021)](https://www.nature.com/articles/s41586-021-03819-2)
</font></b>