[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/XYSSDFFSDFS)  

# Running NucleicNet on Colab with GPU

This notebook illustrates a basic benchmark on Google Colab. In general, a free Google Colab user has access to the following computing resources:

* CPU. 2-core Intel(R) Xeon(R) @ 2.20GHz Family 6
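* RAM. 13G (the figure reported by `free` in the session logged below, which also had about 38G of free disk space)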
* GPU. Tesla K80 with 13G memory, if you get one; other models may be assigned (the session logged below ran on a Tesla T4). We only support sessions with a GPU.


Please make sure the GPU is turned on by clicking `Runtime > Change runtime type > Hardware accelerator: GPU > Save`. To run the notebook, click `Runtime > Run all`.

# Cannot connect to GPU backend

We only support sessions with a GPU enabled. The message `Cannot connect to GPU backend` means that no GPU session is available to you; please review the suggestions by [Google Research](https://research.google.com/colaboratory/faq.html#usage-limits). Free users generally do not have priority access to GPUs and may hit usage limits. The FAQ explains:

> Colab does not publish these limits, in part because they can (and sometimes do) vary quickly. GPUs and TPUs are sometimes prioritized for users who use Colab interactively rather than for long-running computations, or for users who have recently used less resources in Colab. As a result, users who use Colab for long-running computations, or users who have recently used more resources in Colab, are more likely to run into usage limits and have their access to GPUs and TPUs temporarily restricted.

We take no responsibility for the cost of any paid Google Colab session, so please do not send us invoices. If no GPU is available, try again another day.
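
A quick way to tell whether the session has a GPU before running anything heavy is to query `nvidia-smi -L`, the same command the install cell of this notebook calls. The sketch below is an illustration and not part of NucleicNet: the helper names `parse_gpu_names` and `visible_gpus` are made up here, and the line format it expects is the one seen in the output logged in this notebook.

```python
import re
import shutil
import subprocess

# Matches lines such as "GPU 0: Tesla T4 (UUID: GPU-...)" printed by `nvidia-smi -L`.
_GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: ")


def parse_gpu_names(listing):
    """Return the GPU names found in the text printed by `nvidia-smi -L`."""
    names = []
    for line in listing.splitlines():
        match = _GPU_LINE.match(line.strip())
        if match:
            names.append(match.group(2))
    return names


def visible_gpus():
    """Return the names of visible GPUs, or an empty list if none can be queried."""
    if shutil.which("nvidia-smi") is None:
        return []  # no driver tools on PATH, so treat the session as CPU-only
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    if result.returncode != 0:
        return []
    return parse_gpu_names(result.stdout)


gpus = visible_gpus()
print(gpus if gpus else "No GPU visible; check Runtime > Change runtime type.")
```

An empty list only says that no GPU could be queried from this session; it does not say why Colab declined to assign one.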


# Acknowledgement

We would like to acknowledge [AlphaFold](https://colab.research.google.com/github/deepmind/alphafold/blob/main/notebooks/AlphaFold.ipynb#scrollTo=VzJ5iMjTtoZw) and [ColabFold](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/AlphaFold2_advanced.ipynb#scrollTo=woIxeCPygt7K) for formalising the routines used to set up Colab.

# Upload a PDB file
Make sure that the PDB file you submit contains no non-protein parts and that it is the protein you intend to analyse. As a basic sanitization procedure, our script retains only the following 20 canonical protein residues:
* `"ALA","CYS","ASP","GLU","PHE","GLY", "HIS","ILE","LYS","LEU","MET","ASN", "PRO","GLN","ARG","SER","THR","VAL", "TRP","TYR"`

Non-canonical amino acids are not replaced; if present, they are left as holes in the structure.
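
To preview what such a residue filter keeps, the sketch below applies the same 20-residue list to the `ATOM` records of a PDB file. It is a minimal illustration under the assumption of fixed-column PDB format (residue name in columns 18-20); it is not the NucleicNet sanitization script, the function name `keep_canonical_atoms` is made up, and the three sample records are invented for the demonstration.

```python
CANONICAL_RESIDUES = {
    "ALA", "CYS", "ASP", "GLU", "PHE", "GLY", "HIS", "ILE", "LYS", "LEU",
    "MET", "ASN", "PRO", "GLN", "ARG", "SER", "THR", "VAL", "TRP", "TYR",
}


def keep_canonical_atoms(pdb_text):
    """Keep ATOM records whose residue name (columns 18-20) is canonical."""
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[17:20] in CANONICAL_RESIDUES:
            kept.append(line)
    return "\n".join(kept)


# Invented sample: one alanine atom, one selenomethionine HETATM, one RNA atom.
sample = "\n".join([
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N",
    "HETATM    2  N   MSE A   2      12.560   6.300  -6.100  1.00  0.00           N",
    "ATOM      3  P     U B   1      20.000   1.000   2.000  1.00  0.00           P",
])
kept = keep_canonical_atoms(sample)
print(kept)  # only the ALA record survives
```

The dropped selenomethionine is exactly the kind of hole described above: the residue disappears and nothing is put in its place.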

Also, do not submit a C-alpha-only structure, or a structure with no sidechains or heavily stubbed sidechains (a particularly common practice in cryoEM!). For simplicity, we also do not build the biological assembly, so users should look out for RNA binding at the interfaces between the assembled protein units.
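
A structure can be screened for missing sidechains before upload. The sketch below is one possible check, not something NucleicNet provides: it reports the fraction of non-glycine residues that have at least one heavy atom beyond the backbone. The function name `sidechain_fraction` and the backbone atom set are assumptions, fixed-column PDB format is assumed, and no particular cut-off is implied.

```python
from collections import defaultdict

BACKBONE_ATOMS = {"N", "CA", "C", "O", "OXT"}


def sidechain_fraction(pdb_text):
    """Fraction of non-glycine residues that carry at least one sidechain heavy atom."""
    atoms_by_residue = defaultdict(set)
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 26:
            continue
        atom_name = line[12:16].strip()
        # Skip hydrogens, whose names start with "H" or with a digit.
        if not atom_name or atom_name[0] in "H0123456789":
            continue
        if line[17:20] == "GLY":
            continue  # glycine has no sidechain heavy atoms
        residue_key = (line[21], line[22:27])  # chain ID, residue number + insertion code
        atoms_by_residue[residue_key].add(atom_name)
    if not atoms_by_residue:
        return 0.0
    with_sidechain = sum(
        1 for atoms in atoms_by_residue.values() if atoms - BACKBONE_ATOMS
    )
    return with_sidechain / len(atoms_by_residue)


ca_only = "\n".join([
    "ATOM      1  CA  ALA A   1",
    "ATOM      2  CA  LEU A   2",
])
print(sidechain_fraction(ca_only))  # 0.0 for a C-alpha-only input
```

A value well below 1.0 suggests a stubbed or backbone-only model that is worth inspecting by eye before submission.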

The PDB files have to be placed in `../GoogleColab/` if you are using this notebook. The output is stored in a user-designated folder, e.g. `../GoogleColab/ServerOutputV1p1/`.

In [1]:
import shutil

from google.colab import files

uploaded = files.upload() #@markdown Upload a pdb file of your choice
# Use the first uploaded file (assumes a single file was uploaded)
uploaded_filename = list(uploaded.keys())[0]
print(uploaded_filename)

# Rename the upload to upload.<suffix>, keeping the original extension
uploaded_suffix = uploaded_filename.split(".")[-1]
shutil.move(uploaded_filename, "upload.%s" %(uploaded_suffix))





# Install Conda and Dependencies

This step regularly takes about 1 h. A tiny green arrow next to a line number shows which line is currently running, or stuck.
* TODO Specify all dependencies in a `{}.yaml` file so that resolving them does not take long
- [x] Make a NucleicNetLite repo without data to reduce the time to git clone
- [ ] dssp
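
One way to approach the first TODO item is to write the version pins from the install cell of this notebook into an environment file once, so they live in one place. The sketch below only renders the file text; the function name `render_environment_yaml` is hypothetical, the environment name `NucleicNet` is an assumption, and whether conda resolves faster from such a file has not been tested here.

```python
def render_environment_yaml(name, channels, pins):
    """Render a conda environment file from a dict of package -> version."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {package}={version}" for package, version in pins.items()]
    return "\n".join(lines) + "\n"


# Pins copied from the conda install command in the install cell.
torch_pins = {
    "pytorch-lightning": "1.5.7",
    "pytorch": "1.10.1",
    "torchvision": "0.11.2",
    "torchaudio": "0.10.1",
    "torchmetrics": "0.6.2",
    "cudatoolkit": "11.3",
}
text = render_environment_yaml("NucleicNet", ["pytorch", "conda-forge"], torch_pins)
print(text)
```

The printed text could be saved as a `.yml` file and passed to `conda env create -f`, which is the route the commented-out lines elsewhere in this notebook point to.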

In [1]:


#@markdown Please execute this cell by pressing the _Play_ button 
#@markdown on the left to download and import third-party software 
#@markdown in this Colab notebook. 

#@markdown **Note**: This installs the software on the Colab 
#@markdown notebook in the cloud and not on your computer.


from IPython.utils import io
import os
import subprocess
import tqdm.notebook

# This is for your safety: the installation below only runs inside Colab
try:
  from google.colab import files
  IN_COLAB = True
except ImportError:
  IN_COLAB = False

import jax
if jax.local_devices()[0].platform == 'tpu':
  raise RuntimeError('Colab TPU runtime not supported. Change it to GPU via Runtime -> Change Runtime Type -> Hardware accelerator -> GPU.')
elif jax.local_devices()[0].platform == 'cpu':
  raise RuntimeError('Colab CPU runtime not supported. Change it to GPU via Runtime -> Change Runtime Type -> Hardware accelerator -> GPU.')
else:
  print(f'Running with {jax.local_devices()[0].device_kind} GPU')


# GPU count and name. To enable a GPU, go to 'Runtime > Change runtime type > Hardware accelerator > GPU'
if IN_COLAB:


  # =================================
  # Hardware
  # =================================
  #hard disk space that we can use
  !echo -e 'HARDDISK Space Remaining'
  !df -h / | awk '{print $4}'
  #memory that we can use
  !echo -e 'RAM Space Remaining'
  !free -h --si | awk  '/Mem:/{print $2}'

  !echo -e 'NVIDIA Info'
  !nvidia-smi -L
  #!nvidia-smi  -q -i 0 -d CLOCK
  #!nvidia-smi  -q -i 0 -d SUPPORTED_CLOCKS
  !nvidia-smi --auto-boost-default=ENABLED -i 0
  !nvidia-smi -pm ENABLED -i 0
  !nvidia-smi -ac 2505,875 -i 0




  # ========================
  # Get python
  # =========================
  %shell pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib
  %shell eval "$(conda shell.bash hook)" # copy conda command to shell
  %shell rm -rf /opt/conda # NOTE Dangerous outside Colab: this deletes any existing conda installation in /opt/conda.

  # NOTE This downloads and installs a fresh Miniconda into /opt/conda
  %shell wget -q -P /tmp \
    https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
      && bash /tmp/Miniconda3-latest-Linux-x86_64.sh -b -p /opt/conda \
      && rm /tmp/Miniconda3-latest-Linux-x86_64.sh


  PATH=%env PATH
  %env PATH=/opt/conda/bin:{PATH}
  %shell conda update -y conda
  %shell conda init bash
  #"""
  %shell conda install -y pytorch-lightning=1.5.7 pytorch=1.10.1 torchvision=0.11.2 torchaudio=0.10.1 torchmetrics=0.6.2 cudatoolkit=11.3 -c pytorch -c conda-forge
  %shell conda install -y -c conda-forge nbformat ipywidgets psutil=5.9.0 tqdm  numpy=1.21.2 scipy=1.7.3 pandas=1.3.5 networkx=2.6.3 pygraphviz=1.7 scikit-learn=1.0.2 plotly=5.5.0 seaborn=0.11.2 matplotlib=3.5.1 biopandas=0.2.9
  # Quote the version specs; an unquoted ">" would be treated by the shell as output redirection
  %shell conda install -y -c anaconda "ipywidgets>=7.0.0" "nbformat>=4.2.0"
  #"""



  # ========================
  # Download NN
  # ========================
  %shell which python
  %shell rm -rf NucleicNetLite/
  from google.colab import drive
  drive.mount("/content/drive")
  %shell git clone https://github.com/jhmlam/NucleicNetLite.git
  #%shell conda-env create -n NucleicNet -f ./NucleicNetLite/NucleicNetLite.yml



  # ==========================
  # Download pymol
  # ==========================
  %shell apt-get install -y pymol
  %shell pymol -cq

  # =========================
  # Finish installation
  # =========================


  # Create a ramdisk to store a database chunk to make Jackhmmer run fast.
  #%shell sudo mkdir -m 777 --parents /tmp/ramdisk
  #%shell sudo mount -t tmpfs -o size=9G ramdisk /tmp/ramdisk
# ========================
# Finish Installation
# ==========================


Running with Tesla T4 GPU
HARDDISK Space Remaining
Avail
38G
RAM Space Remaining
13G
NVIDIA Info
GPU 0: Tesla T4 (UUID: GPU-f79d4019-5b25-5261-8337-97d923dbd04c)
Enabling/disabling default auto boosted clocks is not supported for GPU: 00000000:00:04.0.
All done.
Enabled persistence mode for GPU 00000000:00:04.0.
All done.
Specified clock combination "(MEM 2505, SM 875)" is not supported for GPU 00000000:00:04.0. Run 'nvidia-smi -q -d SUPPORTED_CLOCKS' to see list of supported clock combinations
All done.
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting google-api-python-client
  Downloading google_api_python_client-2.55.0-py2.py3-none-any.whl (8.8 MB)
Collecting google-auth-httplib2
  Downloading google_auth_httplib2-0.1.0-py2.py3-none-any.whl (9.3 kB)
Collecting google-auth-oauthlib
  Downloading google_auth_oauthlib-0.5.2-py2.py3-none-any.whl (19 kB)
Installing col

In [2]:
# 1. Export the base environment to a yml file:
#%shell conda-env export -n base > NucleicNetLite.yml
#%shell cat NucleicNetLite.yml
# 2. Commit the yml file, git clone the repo onto the target OS, and create a conda environment from it as follows:
#conda env create -f NucleicNetLite.yml

name: base
channels:
  - pytorch
  - anaconda
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=5.1=1_gnu
  - absl-py=1.2.0=pyhd8ed1ab_0
  - aiohttp=3.8.1=py39hb9d737c_1
  - aiosignal=1.2.0=pyhd8ed1ab_0
  - argon2-cffi=21.3.0=pyhd8ed1ab_0
  - argon2-cffi-bindings=21.2.0=py39hb9d737c_2
  - asttokens=2.0.5=pyhd8ed1ab_0
  - async-timeout=4.0.2=pyhd8ed1ab_0
  - atk-1.0=2.36.0=h516909a_2
  - attrs=22.1.0=pyh71513ae_0
  - backcall=0.2.0=pyh9f0ad1d_0
  - backports=1.0=py_2
  - backports.functools_lru_cache=1.6.4=pyhd8ed1ab_0
  - beautifulsoup4=4.11.1=pyha770c72_0
  - biopandas=0.2.9=pyhd8ed1ab_0
  - blas=1.0=mkl
  - bleach=5.0.1=pyhd8ed1ab_0
  - blinker=1.4=py_1
  - bottleneck=1.3.4=py39hd257fcd_1
  - brotli=1.0.9=h166bdaf_7
  - brotli-bin=1.0.9=h166bdaf_7
  - brotlipy=0.7.0=py39h27cfd23_1003
  - bzip2=1.0.8=h7f98852_4
  - c-ares=1.18.1=h7f98852_0
  - ca-certificates=2022.4.26=h06a4308_0
  - cachetools=5.0.0=pyhd8ed1ab_0
  - cairo=1.16.0=h18b612c_1001
  -



# Install NucleicNet with git

In [2]:
%%bash
ls NucleicNetLite/GoogleColab/
cd NucleicNetLite/NucleicNet/util/ && tar -zxvf feature-3.1.0.tar.gz && cd ../../../
chmod -R 777 NucleicNetLite/NucleicNet/util/feature-3.1.0/
chmod -R 777 NucleicNetLite/

# Test that featurize and dssp can be executed. Called without arguments, each prints an error or its usage message.
NucleicNetLite/NucleicNet/util/feature-3.1.0/bin/featurize
NucleicNetLite/NucleicNet/util/dssp




cd ../../


Colab00_NucleicNetApplication (4).ipynb
Colab00_NucleicNetApplication (9).ipynb
command_FinishInstallColabTorch.py
command_TestColab.py
README.md
feature-3.1.0/
feature-3.1.0/tools/
feature-3.1.0/tools/bin/
feature-3.1.0/tools/bin/lisp2model.pl
feature-3.1.0/tools/bin/hits2tab
feature-3.1.0/tools/bin/viewpdb
feature-3.1.0/tools/bin/featurestoarff.py
feature-3.1.0/tools/bin/stat2xml
feature-3.1.0/tools/bin/stat2score
feature-3.1.0/tools/bin/pointfilter.py
feature-3.1.0/tools/bin/ploteval
feature-3.1.0/tools/bin/hits2xml
feature-3.1.0/tools/bin/briefscore.pl
feature-3.1.0/tools/bin/hitfinder.py
feature-3.1.0/tools/bin/mygetsequence.py
feature-3.1.0/tools/bin/pickrandom.py
feature-3.1.0/tools/bin/protein_amber.py
feature-3.1.0/tools/bin/convert_files.py
feature-3.1.0/tools/bin/makeAmberParmsFile.py
feature-3.1.0/tools/bin/atomselector.py
feature-3.1.0/tools/bin/getpdbnr
feature-3.1.0/tools/bin/score2tab
feature-3.1.0/tools/bin/tab2site
feature-3.1.0/tools/bin/viewpdbinfo
feature-3.1.0/too

Error: ERROR: No PDB IDs requested
DSSP 2.0.4 options:
  -h [ --help ]         Display help message
  -i [ --input ] arg    Input file
  -o [ --output ] arg   Output file, use 'stdout' to output to screen
  -v [ --verbose ]      Verbose output
  --version             Print version
  -d [ --debug ] arg    Debug level (for even more verbose output)


Examples: 

To calculate the secondary structure for the file 1crn.pdb and
write the result to a file called 1crn.dssp, you type:

  dssp.exe -i 1crn.pdb -o 1crn.dssp
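
The same smoke test can be done from Python by checking that both tools exist and carry execute permission, without reading through their usage messages. This is a minimal sketch: the function name `missing_or_not_executable` is made up, and the two relative paths are the ones used in the bash cell above, so the current directory is assumed to be the one containing `NucleicNetLite/`.

```python
import os


def missing_or_not_executable(paths):
    """Return the paths that are absent or lack execute permission."""
    return [p for p in paths if not (os.path.isfile(p) and os.access(p, os.X_OK))]


tools = [
    "NucleicNetLite/NucleicNet/util/feature-3.1.0/bin/featurize",
    "NucleicNetLite/NucleicNet/util/dssp",
]
problems = missing_or_not_executable(tools)
print(problems if problems else "featurize and dssp are both executable")
```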



In [3]:
%%bash
set -e
cd NucleicNetLite/GoogleColab/
echo -e 'Testing Installation A'
#sed -i 's/import scikit-learn/import sklearn/g' command_FinishInstallColabTorch.py
#conda init bash
#conda activate base
/opt/conda/bin/python command_FinishInstallColabTorch.py



echo -e 'Testing Installation B'
sed 's#GoogleColab#InstallationTest#g' command_TestColab.py > command_FinishInstallTestColab.py
sed -i 's#ServerOutputV1p1#ServerOutputColab#g'  command_FinishInstallTestColab.py

/opt/conda/bin/python command_FinishInstallTestColab.py
# TODO get a test folder

cd ../..

Testing Installation A
Torch Test succeeded
Testing Installation B
True


In [43]:
%%bash
set -e
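# NOTE This cell fails: the Python interpreter picked up here does not have
# torch installed (see the traceback below). Call /opt/conda/bin/python
# directly instead, as in the previous cell.
##%shell eval "$(conda shell.bash hook)" # copy conda command to shell
#%shell which python
source activate python ./NucleicNetLite/GoogleColab/command_FinishInstallColabTorch.py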

Traceback (most recent call last):
  File "/content/./NucleicNetLite/GoogleColab/command_FinishInstallColabTorch.py", line 1, in <module>
    import torch
ModuleNotFoundError: No module named 'torch'


In [None]:
# Make sure everything we need is on the path.
import sys
sys.path.append('/content/NucleicNet')

In [None]:

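import sys
import sysconfig

# Print the site-packages directory of the running interpreter.
# (sysconfig replaces distutils, which is deprecated and removed in Python 3.12.)
print(sysconfig.get_paths()["purelib"])
sys.path.append('/content/NucleicNet')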
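
Appending a fixed path succeeds silently even when the directory does not exist, and the later import error is then harder to trace. The sketch below wraps the append in a check; the helper name `add_to_sys_path` is made up for illustration, and the path is the one used in the cells above.

```python
import os
import sys


def add_to_sys_path(directory):
    """Append directory to sys.path once; return False if it does not exist."""
    directory = os.path.abspath(directory)
    if not os.path.isdir(directory):
        return False
    if directory not in sys.path:
        sys.path.append(directory)
    return True


if not add_to_sys_path("/content/NucleicNet"):
    print("Directory not found; check where the repository was cloned.")
```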



# FAQ