[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/XYSSDFFSDFS)  

# Running NucleicNet on Colab with GPU

This notebook illustrates a basic benchmark on Google Colab. In general, a free Google Colab user has access to the following computing resources:

* CPU. 2-core Intel(R) Xeon(R) @ 2.20GHz Family 6
* RAM. 37G
* GPU. Tesla K80 with 13G memory if you get one. We only support sessions with GPU.


Please make sure the GPU is turned on by clicking `Runtime > Change runtime type > Hardware accelerator:GPU > Save`. To run the notebook, click `Runtime > Run all`. 

# Cannot connect to GPU backend

We only support sessions with GPU enabled. If you receive the message `Cannot connect to GPU backend`, no GPU session is available to you; please review the suggestions from [Google Research](https://research.google.com/colaboratory/faq.html#usage-limits). Free users generally do not have priority for GPUs and may hit usage limits. As the FAQ puts it: `Colab does not publish these limits, in part because they can (and sometimes do) vary quickly. GPUs and TPUs are sometimes prioritized for users who use Colab interactively rather than for long-running computations, or for users who have recently used less resources in Colab. As a result, users who use Colab for long-running computations, or users who have recently used more resources in Colab, are more likely to run into usage limits and have their access to GPUs and TPUs temporarily restricted.` We bear no responsibility for financing any paid Google Colab sessions, so please do not send us invoices. If no GPU is available, try again another day.
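Before starting the long installation step, it can help to confirm that a GPU is actually attached to the session. The following is a minimal sketch, not part of NucleicNet: it assumes that `nvidia-smi` is on the `PATH` whenever a GPU runtime is active, and the helper name `list_gpus` is hypothetical.

```python
import shutil
import subprocess


def list_gpus():
    """Return the GPU lines reported by `nvidia-smi -L`, or [] if none are found."""
    # Assumption: nvidia-smi is missing on CPU-only runtimes, so check for it first.
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    if result.returncode != 0:
        return []
    # Each attached GPU is listed on a line that starts with "GPU <index>:".
    return [line.strip() for line in result.stdout.splitlines() if line.startswith("GPU ")]


gpus = list_gpus()
if gpus:
    print("GPU runtime detected:", gpus[0])
else:
    print("No GPU detected; switch the runtime type to GPU before running the notebook.")
```

If the list comes back empty, change the runtime type as described above or try again later.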


# Acknowledgement

We would like to acknowledge [AlphaFold](https://colab.research.google.com/github/deepmind/alphafold/blob/main/notebooks/AlphaFold.ipynb#scrollTo=VzJ5iMjTtoZw) and [ColabFold](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/AlphaFold2_advanced.ipynb#scrollTo=woIxeCPygt7K) for formalising the routines for setting up Colab.

# Upload a PDB file
Make sure that the submitted PDB file contains no non-protein parts and that it is the protein you intend to analyse. As a basic sanitization procedure, our script retains only the following 20 canonical protein residues:
* `"ALA","CYS","ASP","GLU","PHE","GLY", "HIS","ILE","LYS","LEU","MET","ASN", "PRO","GLN","ARG","SER","THR","VAL", "TRP","TYR"` 

We do not replace non-canonical amino acids; if present, they are left as holes in the structure.
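To illustrate what retaining only canonical residues means in practice, here is a minimal sketch. It is not the script shipped with NucleicNet; it assumes standard fixed-column PDB formatting (residue name in columns 18-20), and the function name `keep_canonical_atoms` and the example records are hypothetical.

```python
CANONICAL_RESIDUES = {
    "ALA", "CYS", "ASP", "GLU", "PHE", "GLY", "HIS", "ILE", "LYS", "LEU",
    "MET", "ASN", "PRO", "GLN", "ARG", "SER", "THR", "VAL", "TRP", "TYR",
}


def keep_canonical_atoms(pdb_lines):
    """Keep ATOM records whose residue name (columns 18-20) is canonical."""
    kept = []
    for line in pdb_lines:
        # HETATM records (ligands, waters, modified residues) are dropped,
        # so a non-canonical residue simply leaves a hole in the chain.
        if line.startswith("ATOM") and line[17:20].strip() in CANONICAL_RESIDUES:
            kept.append(line)
    return kept


# Made-up records: two alanine atoms, one modified residue, one nucleotide.
example = [
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N",
    "ATOM      2  CA  ALA A   1      11.639   6.071  -5.147  1.00  0.00           C",
    "HETATM    3  N   MSE A   2      12.000   7.000  -4.000  1.00  0.00           N",
    "ATOM      4  P     G B   1      13.000   8.000  -3.000  1.00  0.00           P",
]
print(len(keep_canonical_atoms(example)))
```

Only the two alanine records survive the filter; the modified residue and the nucleotide are discarded.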

Also, do not submit a C-alpha-only structure or a structure with missing or heavily stubbed side chains (a particularly common practice in cryo-EM). For simplicity, we also do not build the biological assembly, so users should look out for RNA binding at the interfaces between assembled protein units.
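A quick pre-check can flag structures without side chains before they are uploaded. The sketch below is an assumption-laden illustration, not part of NucleicNet: it relies on standard fixed-column PDB formatting, treats a residue as bare when it has no heavy atom beyond the backbone, and uses the hypothetical helper name `fraction_without_sidechain`. It does not detect partially stubbed side chains.

```python
from collections import defaultdict

BACKBONE_ATOMS = {"N", "CA", "C", "O", "OXT"}


def fraction_without_sidechain(pdb_lines):
    """Fraction of non-glycine residues that have no heavy atom beyond the backbone."""
    atoms_by_residue = defaultdict(set)
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        if line[17:20].strip() == "GLY":
            continue  # glycine has no heavy side-chain atoms
        atom_name = line[12:16].strip()
        if atom_name.lstrip("0123456789").startswith("H"):
            continue  # ignore hydrogens
        # Key on chain ID (column 22) plus residue number and insertion code (columns 23-27).
        atoms_by_residue[(line[21], line[22:27])].add(atom_name)
    if not atoms_by_residue:
        return 0.0
    bare = sum(1 for names in atoms_by_residue.values() if names <= BACKBONE_ATOMS)
    return bare / len(atoms_by_residue)


# Made-up C-alpha-only records for two residues.
ca_only = [
    "ATOM      1  CA  ALA A   1      11.639   6.071  -5.147  1.00  0.00           C",
    "ATOM      2  CA  LYS A   2      13.000   8.000  -3.000  1.00  0.00           C",
]
print(fraction_without_sidechain(ca_only))
```

A value close to 1 suggests a C-alpha-only or backbone-only model, which should not be submitted.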

The PDB files have to be placed in `../GoogleColab/` if you are using this notebook. The output is stored in a user-designated folder, e.g. `../GoogleColab/ServerOutputV1p1/`.

In [1]:
from google.colab import files
import os
import shutil

uploaded = files.upload() #@markdown Upload a pdb file of your choice
uploaded_filename = list(uploaded.keys())[0]
print(uploaded_filename)

# Rename the upload to upload.<extension> so that later cells can rely on a fixed name.
uploaded_suffix = uploaded_filename.split(".")[-1]
shutil.move(uploaded_filename, "upload.%s" %(uploaded_suffix))




Saving 1aud.pdb to 1aud.pdb
1aud.pdb


'upload.pdb'

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  libnvidia-common-460
Use 'apt autoremove' to remove it.
The following additional packages will be installed:
  apbs libglew2.0 libmaloc1 pymol-data python-opengl python-pmw
Suggested packages:
  glew-utils libgle3 python-pmw-doc
The following NEW packages will be installed:
  apbs libglew2.0 libmaloc1 pymol pymol-data python-opengl python-pmw
0 upgraded, 7 newly installed, 0 to remove and 49 not upgraded.
Need to get 5,912 kB of archives.
After this operation, 26.3 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 libmaloc1 amd64 0.2-3.1 [48.3 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic/universe amd64 apbs amd64 1.4-1build1 [218 kB]
Get:3 http://archive.ubuntu.com/ubuntu bionic/universe amd64 libglew2.0 amd64 2.0.0-5 [140 kB]
Get:4 http://archive.ubuntu.co



In [3]:
ls sample_data/

[0m[01;32manscombe.json[0m*                mnist_test.csv
california_housing_test.csv   mnist_train_small.csv
california_housing_train.csv  [01;32mREADME.md[0m*


# Install Conda and Dependencies

This step regularly takes about 1 h. A tiny green arrow next to the line number indicates which line the cell is stuck at.
* TODO: Specify all dependencies in a `{}.yaml` file so that dependency resolution does not take long.
* TODO: Make a NucleicNetLite repo without data to reduce the time needed for `git clone`.
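As a rough sketch of the first TODO, the pinned versions from the install commands in this notebook could be collected into a single conda environment file. The environment name `nucleicnet-colab` and the helper `render_environment_yaml` are hypothetical, and whether this actually shortens dependency resolution on Colab is untested.

```python
# Version pins copied from the conda install commands in this notebook.
CHANNELS = ["pytorch", "conda-forge"]
DEPENDENCIES = [
    "pytorch=1.10.1",
    "pytorch-lightning=1.5.7",
    "psutil=5.9.0",
    "numpy=1.21.2",
    "scipy=1.7.3",
    "pandas=1.3.5",
    "scikit-learn=1.0.2",
    "plotly=5.5.0",
    "seaborn=0.11.2",
    "matplotlib=3.5.1",
    "biopandas=0.2.9",
]


def render_environment_yaml(name, channels, dependencies):
    """Render a minimal conda environment file as text."""
    lines = ["name: " + name, "channels:"]
    lines += ["  - " + channel for channel in channels]
    lines.append("dependencies:")
    lines += ["  - " + dep for dep in dependencies]
    return "\n".join(lines) + "\n"


# "nucleicnet-colab" is a hypothetical environment name.
environment_yaml = render_environment_yaml("nucleicnet-colab", CHANNELS, DEPENDENCIES)
print(environment_yaml)
```

The printed text can be saved as `environment.yml` and passed to `conda env create -f environment.yml`, so that all packages are resolved in one pass instead of across several `conda install` calls.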


In [2]:


#@markdown Please execute this cell by pressing the _Play_ button 
#@markdown on the left to download and import third-party software 
#@markdown in this Colab notebook. 

#@markdown **Note**: This installs the software on the Colab 
#@markdown notebook in the cloud and not on your computer.


from IPython.utils import io
import os
import subprocess
import tqdm.notebook

# Detect whether we are running inside Google Colab; the setup below only runs there.
try:
  from google.colab import files
  IN_COLAB = True
except ImportError:
  IN_COLAB = False

import jax
if jax.local_devices()[0].platform == 'tpu':
  raise RuntimeError('Colab TPU runtime not supported. Change it to GPU via Runtime -> Change Runtime Type -> Hardware accelerator -> GPU.')
elif jax.local_devices()[0].platform == 'cpu':
  raise RuntimeError('Colab CPU runtime not supported. Change it to GPU via Runtime -> Change Runtime Type -> Hardware accelerator -> GPU.')
else:
  print(f'Running with {jax.local_devices()[0].device_kind} GPU')


# Report GPU count and name. To enable the GPU, go to 'Runtime > Change runtime type > Hardware accelerator > GPU'.
if IN_COLAB:

  #hard disk space that we can use
  !df -h / | awk '{print $4}'
  #memory that we can use
  !free -h --si | awk  '/Mem:/{print $2}'
  !nvidia-smi -L
  #!nvidia-smi  -q -i 0 -d CLOCK
  #!nvidia-smi  -q -i 0 -d SUPPORTED_CLOCKS
  !nvidia-smi --auto-boost-default=ENABLED -i 0
  !nvidia-smi -pm ENABLED -i 0
  !nvidia-smi -ac 2505,875 -i 0


  %shell pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib
  %shell eval "$(conda shell.bash hook)" # copy conda command to shell
  %shell rm -rf /opt/conda # NOTE Dangerous: this deletes any existing conda installation under /opt/conda.

  # NOTE This installs a fresh Miniconda under /opt/conda
  %shell wget -q -P /tmp \
    https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
      && bash /tmp/Miniconda3-latest-Linux-x86_64.sh -b -p /opt/conda \
      && rm /tmp/Miniconda3-latest-Linux-x86_64.sh


  PATH=%env PATH
  %env PATH=/opt/conda/bin:{PATH}
  %shell conda update -y conda
  %shell conda init bash
  %shell conda install -y pytorch-lightning=1.5.7 pytorch=1.10.1 -c pytorch -c conda-forge
  %shell conda install -y -c conda-forge nbformat ipywidgets psutil=5.9.0 tqdm  numpy=1.21.2 scipy=1.7.3 pandas=1.3.5 scikit-learn=1.0.2 plotly=5.5.0 seaborn=0.11.2 matplotlib=3.5.1 biopandas=0.2.9
  # Quote the version specifiers; otherwise the shell treats '>' as an output redirection.
  %shell conda install -y -c anaconda "ipywidgets>=7.0.0" "nbformat>=4.2.0"



  
  %shell which python
  # Remove any previous clone so that git clone below does not fail on a rerun.
  %shell rm -rf NucleicNetLite
  from google.colab import drive
  drive.mount("/content/drive")
  %shell git clone https://github.com/jhmlam/NucleicNetLite.git


  %shell apt-get install pymol




  # Create a ramdisk to store a database chunk to make Jackhmmer run fast.
  #%shell sudo mkdir -m 777 --parents /tmp/ramdisk
  #%shell sudo mount -t tmpfs -o size=9G ramdisk /tmp/ramdisk
# ========================
# Finish Installation
# ==========================


Running with Tesla T4 GPU
Avail
30G
13G
GPU 0: Tesla T4 (UUID: GPU-674bb64b-950c-a081-346f-ad567217d792)
Enabling/disabling default auto boosted clocks is not supported for GPU: 00000000:00:04.0.
All done.
Persistence mode is already Enabled for GPU 00000000:00:04.0.
All done.
Specified clock combination "(MEM 2505, SM 875)" is not supported for GPU 00000000:00:04.0. Run 'nvidia-smi -q -d SUPPORTED_CLOCKS' to see list of supported clock combinations
All done.
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting google-api-python-client
  Using cached google_api_python_client-2.55.0-py2.py3-none-any.whl (8.8 MB)
Collecting google-auth-httplib2
  Using cached google_auth_httplib2-0.1.0-py2.py3-none-any.whl (9.3 kB)
Collecting google-auth-oauthlib
  Using cached google_auth_oauthlib-0.5.2-py2.py3-none-any.whl (19 kB)
Collecting httplib2<1dev,>=0.15.0
  Downloading httplib2-0.20.4-py3-none-any.whl (96 kB)
[2K     [90m━━━━━━━━━━━━━━

# Install NucleicNet with git

In [10]:
%%bash
ls NucleicNetLite/GoogleColab/
source activate base
cd NucleicNetLite/NucleicNet/util/ && tar -zxvf feature-3.1.0.tar.gz && cd ../../../
chmod -R 777 NucleicNetLite/NucleicNet/util/feature-3.1.0/

cd NucleicNetLite/GoogleColab/
echo -e 'Testing Installation A'
sed -i 's/import scikit-learn/import sklearn/g' command_FinishInstallColabTorch.py
python command_FinishInstallColabTorch.py
echo -e 'Testing Installation B'
# TODO get a test folder
cd ../../


command_FinishInstallColabTorch.py
README.md
feature-3.1.0/
feature-3.1.0/tools/
feature-3.1.0/tools/bin/
feature-3.1.0/tools/bin/lisp2model.pl
feature-3.1.0/tools/bin/hits2tab
feature-3.1.0/tools/bin/viewpdb
feature-3.1.0/tools/bin/featurestoarff.py
feature-3.1.0/tools/bin/stat2xml
feature-3.1.0/tools/bin/stat2score
feature-3.1.0/tools/bin/pointfilter.py
feature-3.1.0/tools/bin/ploteval
feature-3.1.0/tools/bin/hits2xml
feature-3.1.0/tools/bin/briefscore.pl
feature-3.1.0/tools/bin/hitfinder.py
feature-3.1.0/tools/bin/mygetsequence.py
feature-3.1.0/tools/bin/pickrandom.py
feature-3.1.0/tools/bin/protein_amber.py
feature-3.1.0/tools/bin/convert_files.py
feature-3.1.0/tools/bin/makeAmberParmsFile.py
feature-3.1.0/tools/bin/atomselector.py
feature-3.1.0/tools/bin/getpdbnr
feature-3.1.0/tools/bin/score2tab
feature-3.1.0/tools/bin/tab2site
feature-3.1.0/tools/bin/viewpdbinfo
feature-3.1.0/tools/bin/normalize.py
feature-3.1.0/tools/bin/nonsitechains.py
feature-3.1.0/tools/bin/site2xml
feature

In [None]:
# Make sure everything we need is on the path.
import sys
sys.path.append('/content/NucleicNet')

In [None]:

from distutils.sysconfig import get_python_lib
print(get_python_lib())
import sys
sys.path.append('/content/NucleicNet')



# FAQ