
In [None]:
#################
# WARNING
#################
# This notebook disables several parts of the full RoseTTAFold pipeline (templates, trFold, PyRosetta).
# It is intended as a quick demo for predicting structures of de novo designed proteins (for which no MSA or templates exist).
# For prediction of natural proteins, use the full pipeline: https://github.com/RosettaCommons/RoseTTAFold

# That said, if you do have a custom MSA, skip to the end of the notebook!

#################
# EXTRA
#################
# Also check out this AlphaFold notebook:
# https://colab.research.google.com/drive/1qWO6ArwDMeba1Nl57kk_cQ8aorJ76N6x

In [1]:
%%bash
# download the RoseTTAFold code and move the network scripts into the working directory
git clone https://github.com/RosettaCommons/RoseTTAFold.git
mv RoseTTAFold/network/* .

Cloning into 'RoseTTAFold'...


In [2]:
%%bash
# download model params
wget -qnc https://files.ipd.uw.edu/pub/RoseTTAFold/weights.tar.gz
tar -xf weights.tar.gz

In [3]:
%%bash
pip install -q dgl-cu102
pip install -q torch-scatter -f https://pytorch-geometric.com/whl/torch-1.9.0+cu102.html
pip install -q torch-sparse -f https://pytorch-geometric.com/whl/torch-1.9.0+cu102.html
pip install -q torch-geometric
pip install -q py3Dmol

In [4]:
import predict_e2e
import py3Dmol

DGL backend not selected or invalid.  Assuming PyTorch for now.


Setting the default backend to "pytorch". You can change it in the ~/.dgl/config.json file or export the DGLBACKEND environment variable.  Valid options are: pytorch, mxnet, tensorflow (all lowercase)


Using backend: pytorch


In [5]:
# load model
pred = predict_e2e.Predictor(model_dir="weights")

# Single sequence input (no MSA, no templates)

In [6]:
# CHANGE THIS LINE TO YOUR FAVE SEQUENCE
query_sequence = "ALKARSAAKAVRWPKKAIKQASKKVAKYALKLLRKKKAASKLWLQLHWPRW"
# write the sequence as a single-record FASTA file
with open("test.fas", "w") as fasta:
    fasta.write(f">test\n{query_sequence}\n")
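A small helper can catch typos in a pasted sequence before it reaches the model. This is a sketch, not part of RoseTTAFold: `write_query_fasta` is a hypothetical name, and restricting input to the 20 standard amino acids is an assumption. It uppercases the input because lowercase letters in A3M-style alignments are conventionally read as insertions, so a lowercase query could lose residues.

```python
import re

# Assumption: accept only the 20 standard one-letter amino-acid codes.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")

def write_query_fasta(sequence: str, path: str, name: str = "test") -> str:
    """Clean and validate a protein sequence, then write it as a one-record FASTA file.

    Hypothetical helper, not part of RoseTTAFold. Returns the cleaned sequence.
    """
    # Drop whitespace/newlines from pasted input and uppercase it, since lowercase
    # letters are conventionally read as insertions in A3M-style alignments.
    seq = re.sub(r"\s+", "", sequence).upper()
    if not seq:
        raise ValueError("empty sequence")
    bad = sorted(set(seq) - STANDARD_AA)
    if bad:
        raise ValueError(f"non-standard residue codes: {''.join(bad)}")
    with open(path, "w") as fasta:
        fasta.write(f">{name}\n{seq}\n")
    return seq

# Usage: same sequence and file name as the cell above.
query_sequence = write_query_fasta(
    "ALKARSAAKAVRWPKKAIKQASKKVAKYALKLLRKKKAASKLWLQLHWPRW", "test.fas"
)
```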

In [7]:
# make prediction using model
pred.predict("test.fas","test")

SE(3) iteration 0 [0.469  0.4702]
SE(3) iteration 1 [0.4895 0.4998]
SE(3) iteration 2 [0.4712 0.4924]
SE(3) iteration 3 [0.497 0.548]
SE(3) iteration 4 [0.502  0.5557]
SE(3) iteration 5 [0.5107 0.5605]
SE(3) iteration 6 [0.5083 0.5625]
SE(3) iteration 7 [0.5146 0.566 ]
SE(3) iteration 8 [0.527  0.5664]
SE(3) iteration 9 [0.533 0.568]
SE(3) iteration 10 [0.538 0.569]
SE(3) iteration 11 [0.542 0.569]
SE(3) iteration 12 [0.5474 0.571 ]
SE(3) iteration 13 [0.552  0.5713]
SE(3) iteration 14 [0.5557 0.5713]
SE(3) iteration 15 [0.559 0.572]
SE(3) iteration 16 [0.562 0.572]
SE(3) iteration 17 [0.5645 0.5723]
SE(3) iteration 18 [0.5664 0.5728]
SE(3) iteration 19 [0.567  0.5737]
SE(3) iteration 20 [0.569  0.5737]
SE(3) iteration 21 [0.57  0.575]
SE(3) iteration 22 [0.569  0.5757]
SE(3) iteration 23 [0.5684 0.5767]
SE(3) iteration 24 [0.569 0.577]
SE(3) iteration 25 [0.569 0.577]
SE(3) iteration 26 [0.5703 0.578 ]
SE(3) iteration 27 [0.5693 0.5786]
SE(3) iteration 28 [0.568 0.579]
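The prediction prints one `SE(3) iteration` line per refinement step. To inspect or plot those values, you can capture the printed text and parse it. A minimal sketch, assuming the lines are written with Python's `print` (so `contextlib.redirect_stdout` sees them); `run_and_capture` and `parse_se3_log` are hypothetical helpers, and since the notebook does not say what the two bracketed numbers mean, the parser keeps them as unnamed values.

```python
import contextlib
import io
import re

# Matches lines such as "SE(3) iteration 12 [0.5474 0.571 ]". The meaning of the
# two bracketed numbers is not documented here, so they are kept as-is.
SE3_LINE = re.compile(r"SE\(3\) iteration (\d+) \[\s*([\d.]+)\s+([\d.]+)\s*\]")

def run_and_capture(predict_fn, *args):
    """Call predict_fn(*args) and return whatever it printed to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        predict_fn(*args)
    return buffer.getvalue()

def parse_se3_log(text):
    """Return (iteration, first_value, second_value) tuples found in the log text."""
    return [
        (int(m.group(1)), float(m.group(2)), float(m.group(3)))
        for m in SE3_LINE.finditer(text)
    ]

# Usage in this notebook (assumes `pred` from the cells above):
# log_text = run_and_capture(pred.predict, "test.fas", "test")
# for iteration, first, second in parse_se3_log(log_text):
#     print(iteration, first, second)
```

Redirecting stdout hides the live progress lines while the cell runs; incomplete lines (such as a truncated final line) simply do not match the pattern and are skipped.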

In [8]:
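# display the predicted structure as a cartoon, colored by residue position ('spectrum')
p = py3Dmol.view(js='https://3dmol.org/build/3Dmol.js')
with open("test.pdb", 'r') as pdb_file:
    p.addModel(pdb_file.read(), 'pdb')
p.setStyle({'cartoon': {'color': 'spectrum'}})
p.zoomTo()
p.show()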
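To confirm that `test.pdb` covers the whole query, you can read the sequence back from the model's CA atoms. A sketch using the fixed-column PDB format (atom name in columns 13-16, residue name in columns 18-20); `sequence_from_pdb` is a hypothetical helper, and unrecognized residue names map to `X`.

```python
# Standard three-letter to one-letter residue codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def sequence_from_pdb(pdb_text):
    """Return the one-letter sequence from the CA atoms of ATOM records.

    Hypothetical helper relying on fixed PDB columns: atom name in columns 13-16
    and residue name in columns 18-20 (1-based). Unknown residues become 'X'.
    """
    residues = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            residues.append(THREE_TO_ONE.get(line[17:20], "X"))
    return "".join(residues)

# Usage (assumes test.pdb and query_sequence from the cells above):
# with open("test.pdb") as pdb_file:
#     predicted = sequence_from_pdb(pdb_file.read())
# print(len(predicted), predicted == query_sequence)
```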

# But what if I do have an MSA?
(no templates)

In [None]:
%%bash
# download an example MSA (UniProt P0A8I3) from the GREMLIN server
wget -qnc https://gremlin2.bakerlab.org/db/ECOLI/fasta/P0A8I3.fas
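Before predicting, it can help to check how deep the downloaded alignment is and whether every row lines up with the query. A sketch assuming the file is A3M or aligned FASTA, with the query as the first record and lowercase letters marking insertions relative to the query (the A3M convention); `summarize_msa` and `parse_a3m_text` are hypothetical helpers.

```python
import string

# Lowercase letters in A3M mark insertions relative to the query; deleting them
# should leave every aligned row with the query's length.
_DROP_LOWER = str.maketrans("", "", string.ascii_lowercase)

def parse_a3m_text(text):
    """Return aligned sequences (insertions removed), joining wrapped lines."""
    records, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        if line.startswith(">"):
            current = []
            records.append(current)
        elif current is not None:
            current.append(line)
    return ["".join(parts).translate(_DROP_LOWER) for parts in records]

def summarize_msa(text):
    """Report depth, query length, and whether all rows match the query length."""
    seqs = parse_a3m_text(text)
    if not seqs:
        raise ValueError("no sequences found")
    query_len = len(seqs[0])
    return {
        "num_sequences": len(seqs),
        "query_length": query_len,
        "all_rows_match_query_length": all(len(s) == query_len for s in seqs),
    }

# Usage (assumes the file downloaded above):
# with open("P0A8I3.fas") as fh:
#     print(summarize_msa(fh.read()))
```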

In [None]:
pred.predict("P0A8I3.fas","yaaa")

SE(3) iteration 0 [0.6646 0.646 ]
SE(3) iteration 1 [0.727 0.757]
SE(3) iteration 2 [0.759  0.7915]
SE(3) iteration 3 [0.763 0.796]
SE(3) iteration 4 [0.7715 0.819 ]
SE(3) iteration 5 [0.774  0.8315]
SE(3) iteration 6 [0.775 0.831]
SE(3) iteration 7 [0.7764 0.8315]
SE(3) iteration 8 [0.7773 0.831 ]
SE(3) iteration 9 [0.779  0.8315]
SE(3) iteration 10 [0.777  0.8315]
SE(3) iteration 11 [0.777 0.832]
SE(3) iteration 12 [0.777 0.832]
SE(3) iteration 13 [0.777 0.832]
SE(3) iteration 14 [0.778  0.8315]
SE(3) iteration 15 [0.7764 0.8315]
SE(3) iteration 16 [0.778  0.8315]
SE(3) iteration 17 [0.7803 0.8315]
SE(3) iteration 18 [0.7793 0.8315]
SE(3) iteration 19 [0.779 0.832]
SE(3) iteration 20 [0.778  0.8315]
SE(3) iteration 21 [0.779 0.832]
SE(3) iteration 22 [0.7803 0.8315]
SE(3) iteration 23 [0.782 0.832]
SE(3) iteration 24 [0.7827 0.832 ]
SE(3) iteration 25 [0.7866 0.832 ]
SE(3) iteration 26 [0.787 0.832]
SE(3) iteration 27 [0.787  0.8315]
SE(3) iteration 28 [0.788 0.832]

In [None]:
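# display the predicted structure as a cartoon, colored by residue position ('spectrum')
p = py3Dmol.view(js='https://3dmol.org/build/3Dmol.js')
with open("yaaa.pdb", 'r') as pdb_file:
    p.addModel(pdb_file.read(), 'pdb')
p.setStyle({'cartoon': {'color': 'spectrum'}})
p.zoomTo()
p.show()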

In [None]:
##############################
# Where do I get an MSA?
##############################
# For any "serious" use, I would recommend using the RoseTTAFold pipeline to make the MSAs,
# since that is what the model was trained on.

# That said, part of the MSA generation pipeline (specifically, searching the UniProt database with HHblits)
# can be done here: https://toolkit.tuebingen.mpg.de/tools/hhblits
# Download the A3M file and upload it to the notebook:

# hit "submit" -> wait... -> "Query Template MSA" -> "Download Full A3M"
# upload the file to Google Colab (left side, click the "folder" icon, then the "upload" icon)
# then point the prediction at your file, e.g.:
# pred.predict("YOUR_A3M_FILE.a3m", "my_prediction")
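Files from other tools may wrap long sequences over several lines or include `#` comment lines. As a precaution, assuming the predictor's MSA reader expects exactly one line per sequence (we have not confirmed this against the cloned code), you can rewrite the uploaded file into that shape first. `normalize_a3m` and the file names `my_msa.a3m` / `my_prediction` are hypothetical.

```python
def normalize_a3m(in_path, out_path):
    """Rewrite an A3M file with one header line and one sequence line per record.

    Hypothetical helper: blank lines and '#' comment lines are dropped and wrapped
    sequence lines are joined. Lowercase insertion letters are kept, so the output
    is still A3M. Returns the number of records written.
    """
    records = []  # list of [header, [sequence lines]]
    with open(in_path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            if line.startswith(">"):
                records.append([line, []])
            elif records:
                records[-1][1].append(line)
    with open(out_path, "w") as out:
        for header, parts in records:
            out.write(f"{header}\n{''.join(parts)}\n")
    return len(records)

# Usage (hypothetical file names; `pred` from the cells above):
# normalize_a3m("YOUR_A3M_FILE.a3m", "my_msa.a3m")
# pred.predict("my_msa.a3m", "my_prediction")
```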