<a href="https://colab.research.google.com/github/kiharalab/CryoREAD/blob/main/CryoREAD.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CryoREAD: De novo structure modeling for nucleic acids in cryo-EM maps using deep learning


<a href="https://github.com/kiharalab/CryoREAD">
   <img src="https://img.shields.io/badge/cryo_READ-v1.0.0-green">
   <img src="https://img.shields.io/badge/platform-Linux%20%7C%20Mac%20-green">
   <img src="https://img.shields.io/badge/Language-python3-green">
   <img src="https://img.shields.io/badge/dependencies-tested-green">
   <img src="https://img.shields.io/badge/licence-GNU-green">
</a>  

CryoREAD is a computational tool that uses deep learning to automatically build full DNA/RNA atomic structures from cryo-EM maps.  

Copyright (C) 2022 Xiao Wang, Genki Terashi, Daisuke Kihara, and Purdue University.

License: GPL v3. (If you are interested in a different license, for example, for commercial use, please contact us.)

Contact: Daisuke Kihara (dkihara@purdue.edu).

For technical problems or questions, please contact Xiao Wang (wang3702@purdue.edu).

**We strongly recommend using Google Chrome for the CryoREAD Colab version. Other browsers, such as Safari, may raise errors when uploading or downloading files.**

If you are using other browsers, disabling tracking protection may help resolve the errors when uploading or downloading files.

For more details, see the **<a href="#Instructions">Instructions</a>** in this notebook and check out the **[CryoREAD GitHub](https://github.com/kiharalab/CryoREAD)**. If you use CryoREAD, please cite it: **<a href="#Citation">Citation</a>**.

# Overall Protocol
1) Structure detection by the CryoREAD deep neural networks;   
2) Backbone tracing based on the detections;   
3) Fragment-based nucleotide assignment;  
4) Full atomic structure modeling.   


<p align="center">
  <img src="https://user-images.githubusercontent.com/50850224/199084130-34b35a89-3c0c-4647-b693-82fbcc10c820.jpg" alt="cryo-READ framework" width="70%">
</p>

# Instructions <a name="Instructions"></a>
## Tutorial slides: [PPT](https://github.com/kiharalab/CryoREAD/blob/main/CryoREAD_tutorial.pptx)
## Steps
1. Connect to a GPU machine by clicking the **"connect"** button at the top right of the notebook, so that CryoREAD can run with GPU support.
2. Click the run button on the left of <a href="#Dependency">Install Dependencies</a> to install the dependencies.
3. Upload your cryo-EM map in mrc/map format by clicking the run button on the left of <a href="#Map">Input cryo-EM map</a>. If you want to use our example, check the **use_author_example** box. <br>
We suggest uploading a cryo-EM map with **spacing 1** to reduce the running time.<br>
Here are simple instructions for resampling a map with [ChimeraX](https://www.rbvi.ucsf.edu/chimerax/): <a name="ChimeraX"></a>
```
1. Open your map in ChimeraX.
2. In the command line at the bottom, type: vol resample #1 spacing 1.0
3. Click "Save", choose "MRC density map(*.mrc)" in "Files of type", select the resampled map in "Map", and finally specify the file name and path.
4. Upload the resampled map.
```
4. (Optional) Upload a FASTA file with your sequence information by clicking the run button on the left of <a href="#Fasta">Input Fasta File</a>. <br>
We suggest the following format:
```
>[chain_id1]
sequence_info1
>[chain_id2]
sequence_info2
```
For example:
```
>A
CUGACAUACUUGUUCCACUCUAGCAGCACGUAAAUAUUGGCGUAGUGAAAUAUAUAUUAAACACCAAUAUUACUGUGCUGCUUUAGUGUGACAGGGAUACAGCAA
```
The standard FASTA format used by the PDB database should also be supported.

5. Specify the parameters in <a href="#Param">Parameters</a>. Whether or not you modify them, click the run button on the left to set them.

6. Run CryoREAD by clicking the run button on the left of <a href="#Running">Run CryoREAD</a>.

7. (Optional) Click the run button on the left of <a href="#Download">Download</a> to download the zip file. If you generated structures, with or without the sequence, the output is saved in PDB format. You can open it in **COOT** for further refinement based on your expertise, or use PyMOL for simple visualization. If you only chose to generate CryoREAD detections, you can check the predicted probabilities by opening the downloaded detection maps (.mrc) in ChimeraX.

8. Visualize the structure online by clicking the run button on the left of <a href="#Visualization">Visualization</a>.
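Step 3 recommends uploading a map with spacing 1. As a minimal sketch (the helpers `read_voxel_size` and `needs_resampling` are hypothetical, not part of CryoREAD), you can read a map's voxel size from its MRC2014 header with the Python standard library before deciding whether to resample it in ChimeraX. The voxel size on each axis is the cell length (CELLA) divided by the sampling (MX, MY, MZ).

```python
import gzip
import struct

def read_voxel_size(path):
    """Return the (x, y, z) voxel size in Angstrom from an MRC/MAP header.

    Hypothetical helper (not part of CryoREAD): reads only the 1024-byte
    MRC2014 header and supports plain or gzip-compressed files.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        header = f.read(1024)
    if len(header) < 1024:
        raise ValueError("file is too short to contain an MRC header")
    # Machine stamp (bytes 212-215): first byte 0x11 means big-endian, 0x44 little-endian
    endian = ">" if header[212] == 0x11 else "<"
    # Words 1-10 are int32 (NX, NY, NZ, MODE, NXSTART, NYSTART, NZSTART, MX, MY, MZ);
    # words 11-13 are float32 cell dimensions (CELLA) in Angstrom
    values = struct.unpack(endian + "10i3f", header[:52])
    sampling = values[7:10]
    cella = values[10:13]
    return tuple(c / m for c, m in zip(cella, sampling))

def needs_resampling(path, target=1.0, tol=1e-3):
    """True if the spacing on any axis differs from `target` by more than `tol`."""
    return any(abs(s - target) > tol for s in read_voxel_size(path))
```

For example, `needs_resampling("emd_21051.map.gz")` returns `True` when the map should be resampled before upload. The `mrcfile` package installed by the notebook exposes the same information through the `voxel_size` attribute of an opened map.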

**Results in the zip file**
1. A PDB file with the final structure.
2. A PDB file with the unrefined CryoREAD structure.
3. Detection maps with the predicted probabilities of each class: phosphate, sugar, base, protein; base-A, base-U/T, base-C, base-G.
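To quickly inspect a PDB file from the results before opening it in COOT or PyMOL, here is a minimal standard-library sketch. The `summarize_pdb` helper is hypothetical and not part of CryoREAD; it lists the nucleotides built for each chain using the fixed-column PDB format (residue name in columns 18-20, chain ID in column 22, residue number in columns 23-26, insertion code in column 27).

```python
def summarize_pdb(path):
    """Return {chain_id: [residue names in file order]} for ATOM/HETATM records.

    Hypothetical helper (not part of CryoREAD) based on fixed PDB columns.
    """
    chains = {}
    with open(path) as f:
        for line in f:
            if not line.startswith(("ATOM", "HETATM")):
                continue
            chain_id = line[21]
            # Residue number plus insertion code identifies one residue
            res_key = (line[22:26].strip(), line[26:27])
            chains.setdefault(chain_id, {})[res_key] = line[17:20].strip()
    return {cid: list(res.values()) for cid, res in chains.items()}

# Example usage (replace the path with a PDB file from the downloaded results):
# for chain, residues in summarize_pdb("path/to/result.pdb").items():
#     print(chain, len(residues), "".join(residues))
```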



# Run CryoREAD Online

#@title Install dependencies <a name="Dependency"></a>
#@markdown Please make sure the notebook is already connected to a **GPU**; CryoREAD needs GPU support to run.<br>
#@markdown Click the **"connect"** button at the top right, and the notebook will automatically connect to a GPU machine.

%cd /content
!pip install biopython ortools==9.4.1874
!pip install mrcfile==1.2.0
# Quote version specifiers so the shell does not treat ">" as output redirection
!pip install "numpy>=1.19.4"
!pip install "numba>=0.52.0"
!pip install "torch>=1.6.0"
!pip install "scipy>=1.6.0"
!pip install tqdm
!pip install progress
!pip install numba-progress
!pip install py3Dmol
!rm -rf CryoREAD
!git clone https://github.com/kiharalab/CryoREAD --quiet
%cd CryoREAD
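Because CryoREAD needs a GPU, it can help to confirm that the runtime actually sees one before uploading data. This is a small sketch; the `gpu_available` helper is hypothetical and simply asks PyTorch (installed above) whether CUDA is available.

```python
def gpu_available():
    """Return True if PyTorch is installed and can see a CUDA GPU (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        # Without PyTorch the CryoREAD networks cannot run at all
        return False
    return torch.cuda.is_available()

if not gpu_available():
    print("No GPU detected: choose a GPU accelerator under Runtime > Change runtime type, then reconnect.")
```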

#@title Input cryo-EM map <a name="Map"></a>
#@markdown **Please make sure the input is a 3D cryo-EM map in the same format as maps deposited in EMDB.**
#@markdown <br>We suggest uploading a cryo-EM map with **spacing 1** to reduce the running time. Detailed instructions for resampling with ChimeraX are in <a href="#ChimeraX">ChimeraX resampling</a>.
#@markdown <br> **Supported file formats: .mrc, .mrc.gz, .map, .map.gz**
from google.colab import files
import os
import os.path
import re
import hashlib
import random
import string

rand_letters = string.ascii_lowercase
rand_letters = ''.join(random.choice(rand_letters) for i in range(20))
#@markdown Instead of uploading, you can also specify the link here to automatically download maps from EMDB and other servers.
#@markdown Example: https://files.wwpdb.org/pub/emdb/structures/EMD-21051/map/emd_21051.map.gz
download_link = '' #@param {type:"string"}
#@markdown ```If you want to use the author's example, just select the following box.```
if download_link!='':
  root_dir = os.getcwd()
  upload_dir = os.path.join(root_dir,rand_letters)
  if not os.path.exists(upload_dir):
    os.mkdir(upload_dir)
  os.chdir(upload_dir)
  os.system("wget %s"%download_link)
  parse_link=download_link.split("/")[-1]
  map_input_path = os.path.join(upload_dir,parse_link)
  os.chdir(root_dir)
  fasta_input_path = None
  use_author_example = False  # define it here too, so the FASTA cell does not raise a NameError
else:
  use_author_example = True #@param {type:"boolean"}
  if not use_author_example:
    os.chdir("/content/CryoREAD")
    root_dir = os.getcwd()
    upload_dir = os.path.join(root_dir,rand_letters)
    if not os.path.exists(upload_dir):
      os.mkdir(upload_dir)
    os.chdir(upload_dir)
    map_input = files.upload()
    for fn in map_input.keys():
      print('User uploaded file "{name}" with length {length} bytes'.format(
        name=fn, length=len(map_input[fn])))
      map_input_path = os.path.abspath(fn)
      print("Map saved to %s"%map_input_path)
    os.chdir(root_dir)
    fasta_input_path = None
  else:
    map_input_path = os.path.join(os.getcwd(),"example")
    map_input_path = os.path.join(map_input_path,"21051.mrc")
    fasta_input_path = os.path.join(os.getcwd(),"example")
    fasta_input_path = os.path.join(fasta_input_path,"21051.fasta")
    print("Author example is selected!",map_input_path)
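Before pasting a link into `download_link`, you can sanity-check that it points to one of the supported map formats. This is a minimal sketch with a hypothetical helper, `map_filename_from_url`, which takes the last path component of the URL (ignoring any query string) and checks its extension.

```python
import posixpath
from urllib.parse import urlparse

# Formats listed as supported by the map input cell
SUPPORTED_SUFFIXES = (".mrc", ".mrc.gz", ".map", ".map.gz")

def map_filename_from_url(url):
    """Return the map file name at the end of a download URL (hypothetical helper).

    Raises ValueError when the name does not end in a supported map format.
    """
    name = posixpath.basename(urlparse(url).path)
    if not name.lower().endswith(SUPPORTED_SUFFIXES):
        raise ValueError("unsupported map format: %r" % name)
    return name
```

With the EMDB example link from the cell above, the helper returns `emd_21051.map.gz`.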

#@title (Optional) Input Fasta File <a name="Fasta"></a>
#@markdown If you chose to use the author's example, please **skip** this step. <br>
#@markdown If your **sequence length** is longer than **500**, please skip this step and use the no-sequence mode, since the free version of Colab can only run for 2-3 hours per day with 2 CPUs.
#@markdown <br>Otherwise, please upload your FASTA file.
#@markdown <br> **Supported file format: .fasta**


from google.colab import files
import os
import os.path
import re
import hashlib
import random
import string

rand_letters = string.ascii_lowercase
rand_letters = ''.join(random.choice(rand_letters) for i in range(20))
if use_author_example:
  print("You have chosen to use the author's example, so you cannot upload a FASTA file.")
else:
  root_dir = os.getcwd()
  upload_dir = os.path.join(root_dir,rand_letters)
  if not os.path.exists(upload_dir):
    os.mkdir(upload_dir)
  os.chdir(upload_dir)
  fasta_input = files.upload()
  for fn in fasta_input.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(fasta_input[fn])))
    fasta_input_path = os.path.abspath(fn)
    print("Fasta file saved to %s"%fasta_input_path)
  os.chdir(root_dir)
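The notebook advises against sequence mode on free Colab when the sequence is longer than 500. As a minimal sketch, the hypothetical helpers below parse FASTA text in the `>[chain_id]` style suggested in the Instructions and turn the total length into a value for the `use_sequence` parameter. Summing the length over all chains is an assumption about what "sequence length" means here.

```python
def parse_fasta(text):
    """Parse FASTA text into an ordered {chain_id: sequence} dict (hypothetical helper).

    The chain ID is the first whitespace-separated token after '>';
    multi-line sequences are concatenated and upper-cased.
    """
    records = {}
    chain_id = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            chain_id = tokens[0] if tokens else ""
            if chain_id in records:
                raise ValueError("duplicate chain ID: %r" % chain_id)
            records[chain_id] = ""
        elif chain_id is None:
            raise ValueError("sequence found before the first '>' header")
        else:
            records[chain_id] += line.upper()
    return records

def total_length(records):
    return sum(len(seq) for seq in records.values())

def suggest_use_sequence(records, limit=500):
    # Assumption: compare the total length over all chains with the 500 limit
    return 1 if total_length(records) <= limit else 0
```

For example, `parse_fasta(open(fasta_input_path).read())` reads the uploaded file, and `suggest_use_sequence` returns 0 or 1 in the same convention as `use_sequence`.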


#@title Specify Parameters <a name="Param"></a>
contour = 0.6 #@param {type:"number"}
#@markdown ```Contour level recommended by the authors (in EMDB) for the input map. The contour level does not affect the result, but it can reduce computation time by skipping regions outside the contour. ```
#@markdown <br>```If you are not sure about the contour level, just use 0.```
#@markdown  <br>```default: 0. Suggested range: [0, author_contour]```
stride = 32  #@param {type:"number"}
#@markdown Detailed explanation can be seen: [stride_definition](https://deepai.org/machine-learning-glossary-and-terms/stride)<br>
#@markdown ```Stride for scanning the cryo-EM map with a box size of 64. Increasing the stride can reduce the computation time but may lead to less reliable results. ```<br>``` Default stride: 16 (integer). Suggested values: [16, 32, 48].```
#@markdown <br>**If your job hits Colab's disk space limit, please increase the stride to 32 or 48 to save time.**

detection_only = 0 #@param {type:"number"}
#@markdown ```If you only want the predictions (detection maps) from CryoREAD, change it to 1. Otherwise, leave it as 0.```
#@markdown ```You can also set it to 1 if the whole process cannot finish within Colab's time limit.```
use_sequence = 0 #@param {type:"number"}
#@markdown ```Whether to use sequence information to further refine base assignment. Default: 0. Because fragment-based assignment takes a long time to finish, it is disabled by default. However, for sequences shorter than 500 nucleotides, it may be possible to finish within Colab's 2-3 hour daily limit. If you are confident your job can finish in Colab, set it to 1 for better base assignment. ```
# resolution =3.7 #@param {type:"number"}
# #@markdown ```Resolution of Cryo-EM maps. Required for last step refinement. If you specify 0 here, we will skip refinement step.```
resolution = 0
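The stride trades reliability for speed because it controls how many 64-voxel boxes are scanned. The rough sketch below counts box positions for a sliding-window scan. It assumes each axis starts at 0 and the last box is clamped to the map edge; CryoREAD's actual padding and boundary handling may differ, and `scan_positions` is a hypothetical helper.

```python
import math

def scan_positions(map_shape, box=64, stride=32):
    """Approximate number of boxes a sliding-window scan visits (hypothetical helper).

    Assumes windows start at 0 on each axis and the last window is
    clamped to the map edge, so each axis needs ceil((n - box) / stride) + 1 boxes.
    """
    total = 1
    for n in map_shape:
        per_axis = 1 if n <= box else math.ceil((n - box) / stride) + 1
        total *= per_axis
    return total
```

Under these assumptions, a 200×200×200 map (200 Å at spacing 1) needs 1000, 216, or 64 boxes for strides 16, 32, and 48. Resampling to spacing 1 also shrinks the voxel count of finely sampled maps, which reduces the number of boxes further.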

#@title Run CryoREAD <a name="Running"></a>
#@markdown Please allow 5 minutes to 2 hours to get the output, since 3D input processing and inference take some time.
#@markdown <br>The running time scales with the size of the map.
#@markdown <br>If your map is too big, please run CryoREAD locally with our GitHub code. If you don't have GPU resources, please contact us and we will be happy to run it for you.
%cd /content/CryoREAD
!git pull origin main #make sure up to date
if detection_only:
  command_line = "python main.py --mode=0 -F=%s -M=best_model --contour=%f --gpu=0 --batch_size=8 --prediction_only --stride=%d"%(map_input_path,contour*0.5,stride)
elif use_sequence:
  if resolution==0:
    command_line = "python main.py --mode=0 -F=%s -M=best_model --contour=%f --gpu=0 --batch_size=8  -P=%s --rule_soft=0 --stride=%d"%(map_input_path,contour*0.5,fasta_input_path,stride)
  else:
    command_line = "python main.py --mode=0 -F=%s -M=best_model --contour=%f --gpu=0 --batch_size=8  -P=%s --rule_soft=0 --resolution=%f  --refine --colab --stride=%d"%(map_input_path,contour*0.5,fasta_input_path,resolution,stride)
else:
  if resolution==0:
    command_line = "python main.py --mode=0 -F=%s -M=best_model --contour=%f --gpu=0 --batch_size=8  --no_seqinfo --stride=%d"%(map_input_path,contour*0.5,stride)
  else:
    command_line = "python main.py --mode=0 -F=%s -M=best_model --contour=%f --gpu=0 --batch_size=8  --no_seqinfo --resolution=%f  --refine --colab --stride=%d"%(map_input_path,contour*0.5,resolution,stride)
!echo $command_line
!$command_line
!echo "INFO : CryoREAD Done"
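The cell above assembles the `main.py` command as one string and runs it through the shell, which is why the notebook strips parentheses from map names elsewhere. As a sketch, the same branching can build an argument list instead. `build_cryoread_command` is a hypothetical helper; it uses only the flags that appear in the cell above and halves the contour exactly as the notebook does. One deliberate difference: sequence mode is selected by passing `fasta_path` rather than a separate `use_sequence` flag.

```python
def build_cryoread_command(map_path, contour, stride, fasta_path=None,
                           detection_only=False, resolution=0):
    """Return the main.py argument list for one CryoREAD run (hypothetical helper).

    Mirrors the notebook's branches: detection only, with sequence
    (fasta_path given), or without sequence (--no_seqinfo).
    """
    cmd = ["python", "main.py", "--mode=0", "-F=%s" % map_path, "-M=best_model",
           "--contour=%f" % (contour * 0.5), "--gpu=0", "--batch_size=8"]
    if detection_only:
        cmd.append("--prediction_only")
    else:
        if fasta_path:
            cmd += ["-P=%s" % fasta_path, "--rule_soft=0"]
        else:
            cmd.append("--no_seqinfo")
        if resolution:
            # Refinement flags are only added when a resolution is given
            cmd += ["--resolution=%f" % resolution, "--refine", "--colab"]
    cmd.append("--stride=%d" % stride)
    return cmd
```

Passing the list to `subprocess.run(cmd, check=True, cwd="/content/CryoREAD")` avoids shell interpretation of spaces or parentheses in file names, and `shlex.join(cmd)` gives a printable version for logging.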

#@title Download Output <a name="Download"></a>
#@markdown The PDB file of the predicted structure and the CryoREAD detection probability maps will be compressed and downloaded. You can visualize your structure in PyMOL and further refine it in COOT.
from google.colab import files
import os, tarfile
import shutil
import zipfile

zip_format = True #@param {type:"boolean"}
#@markdown To download a tar.gz file instead, uncheck the **zip_format** box.
map_name = os.path.split(map_input_path)[1].replace(".mrc", "")
map_name = map_name.replace(".map", "")
map_name = map_name.replace(".gz", "")
map_name = map_name.replace("(","").replace(")","")
download_path = os.path.join(os.getcwd(),"Predict_Result")
user_download_path = os.path.join(download_path,map_name)
detection_download_path = os.path.join(user_download_path,"2nd_stage_detection")
tmp_download_dir = os.path.join(os.getcwd(),"tmp")
if not os.path.exists(tmp_download_dir):
  os.mkdir(tmp_download_dir)
os.system("rm "+str(tmp_download_dir)+"/*")
#get detection maps
for item in os.listdir(detection_download_path):
  if ".mrc" in item and "chain" in item:
    shutil.copy(os.path.join(detection_download_path,item),os.path.join(tmp_download_dir,item))
if not detection_only:
  #get structures
  structure_download_path = user_download_path #os.path.join(user_download_path,"Output")
  # if use_sequence:
  #   structure_download_path = os.path.join(structure_download_path,"Output_Structure")
  # else:
  #   structure_download_path = os.path.join(structure_download_path,"Output_Structure_noseq")
  for item in os.listdir(structure_download_path):
    if ".pdb" in item:
      shutil.copy(os.path.join(structure_download_path,item),os.path.join(tmp_download_dir,item))
if zip_format:
  tar_path = os.path.join(download_path,map_name+"_cryoread.zip")
else:
  tar_path = os.path.join(download_path,map_name+"_cryoread.tar.gz")
def zip_file(tar_path,src_dir):
    zip_name = tar_path
    z = zipfile.ZipFile(zip_name,'w',zipfile.ZIP_DEFLATED)
    for dirpath, dirnames, filenames in os.walk(src_dir):
        fpath = dirpath.replace(src_dir,'')
        fpath = fpath and fpath + os.sep or ''
        for filename in filenames:
            z.write(os.path.join(dirpath, filename),fpath+filename)
            print ('==Compress Success!==',filename)
    z.close()

def make_targz(output_filename, source_dir):
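    """
    Create a gzip-compressed tar archive of source_dir.
    :param output_filename: path of the .tar.gz file to write
    :param source_dir: directory whose contents are archived
    :return: bool, True on success and False if an error occurred
    """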
    try:
        with tarfile.open(output_filename, "w:gz") as tar:
            tar.add(source_dir, arcname=os.path.basename(source_dir))

        return True
    except Exception as e:
        print(e)
        return False
if zip_format:
  zip_file(tar_path,tmp_download_dir)
else:
  make_targz(tar_path,tmp_download_dir)
files.download(tar_path)
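As an alternative sketch (not what the notebook runs), the standard library's `shutil.make_archive` can produce either archive type in one call, replacing the separate `zip_file` and `make_targz` helpers in the cell above. `archive_results` is a hypothetical wrapper.

```python
import shutil

def archive_results(src_dir, base_name, use_zip=True):
    """Compress the contents of src_dir into base_name.zip or base_name.tar.gz.

    Hypothetical wrapper around shutil.make_archive; returns the archive path.
    """
    fmt = "zip" if use_zip else "gztar"
    return shutil.make_archive(base_name, fmt, root_dir=src_dir)
```

In the notebook's terms, this would be `files.download(archive_results(tmp_download_dir, os.path.join(download_path, map_name + "_cryoread"), zip_format))`.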



#@title CryoREAD Predicted Structure Visualization (3D) <a name="Visualization"></a>
#@markdown Limited by redistribution constraints, the structure here is not refined and may include atom clashes. If you want better structures, please use our server for full services: https://em.kiharalab.org/algorithm/CryoREAD.
#@markdown <br>Please **skip** this step if you choose detection_only.
#@markdown <br>To check how the structure fits the map, download the structure and visualize it in COOT, Chimera, or PyMOL together with the input map.
map_name = os.path.split(map_input_path)[1].replace(".mrc", "")
map_name = map_name.replace(".map", "")
map_name = map_name.replace(".gz", "")
map_name = map_name.replace("(","").replace(")","")
download_path = os.path.join(os.getcwd(),"Predict_Result")
user_download_path = os.path.join(download_path,map_name)
# structure_download_path = os.path.join(user_download_path,"graph_atomic_modeling")
# if use_sequence:
#   structure_download_path = os.path.join(structure_download_path,"Output_Structure")
# else:
#   structure_download_path = os.path.join(structure_download_path,"Output_Structure_noseq")
# listfiles=[]
# for item in os.listdir(structure_download_path):
#     if ".pdb" in item:
#       listfiles.append(os.path.join(structure_download_path,item))
# final_pdb_path=None
# if len(listfiles)==0:
#   print("no pdb detected in the prediction directory",structure_download_path)
# elif len(listfiles)==1:
#   final_pdb_path=listfiles[0]
# else:
#   for x in listfiles:
#     if "Refine" in x:
#       final_pdb_path=x
#   if final_pdb_path is None:
#     final_pdb_path=listfiles[0]
structure_download_path = os.path.join(user_download_path,"Output")
# Prefer the most refined model (cycle 3, 2, 1), then the unrefined CryoREAD model
candidates = [os.path.join(structure_download_path,"Refine_cycle%d.pdb"%i) for i in (3,2,1)]
candidates.append(os.path.join(user_download_path,"CryoREAD_norefine.pdb"))
final_pdb_path = next((p for p in candidates if os.path.exists(p)), None)
if final_pdb_path is None and os.path.isdir(structure_download_path):
  # Last resort: any PDB file in the Output directory
  listfiles = sorted(x for x in os.listdir(structure_download_path) if x.endswith(".pdb"))
  if listfiles:
    final_pdb_path = os.path.join(structure_download_path,listfiles[0])
if final_pdb_path is None:
  print("we do not find any pdb output in %s"%user_download_path)
else:
  print("visualize %s"%final_pdb_path)
import py3Dmol
def show_pdb(output_pdb_path):
  # Render the model as sticks in an embedded 3Dmol.js viewer
  view = py3Dmol.view(js='https://3dmol.org/build/3Dmol.js')
  with open(output_pdb_path,'r') as f:
    view.addModel(f.read(),'pdb')
  view.setStyle({'stick':{}})
  # Alternative style: view.setStyle({'cartoon': {'spectrum': {'prop':'b','min':-1,'max':1}}})
  view.zoomTo()
  return view
if final_pdb_path is not None:
  show_pdb(final_pdb_path).show()
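Both the download and visualization cells look for results in `Predict_Result/<map_name>`, where `<map_name>` is derived from the input file name. If a cell cannot find your results, this small sketch shows which folder name to expect. `result_dir_name` is a hypothetical helper that applies the same replacements as the notebook code.

```python
import os

def result_dir_name(map_path):
    """Folder name under Predict_Result for a given input map (hypothetical helper).

    Like the notebook cells, this removes the substrings '.mrc', '.map' and
    '.gz' (anywhere in the name, not only as suffixes) and drops parentheses.
    """
    name = os.path.basename(map_path)
    for token in (".mrc", ".map", ".gz", "(", ")"):
        name = name.replace(token, "")
    return name
```

For the author's example `example/21051.mrc`, this gives `21051`; a downloaded `emd_21051.map.gz` gives `emd_21051`.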




# Citation: <a name="Citation"></a>

Xiao Wang, Genki Terashi & Daisuke Kihara. De novo structure modeling for nucleic acids in cryo-EM maps using deep learning. Nature Methods, 2023.
<a href="https://www.nature.com/articles/s41592-023-02032-5">Paper</a>
```
@article{xiao2022CryoREAD,   
  title={De novo structure modeling for nucleic acids in cryo-EM maps using deep learning},   
  author={Xiao Wang and Genki Terashi and Daisuke Kihara},    
  journal={Nature Methods},    
  year={2023}    
}   
```