# Installations
This notebook outlines the installation process. It also decompresses the dataset volumes stored in `../Database-PDB/`.





## Hardware Requirement
Below is the minimum hardware specification we have tested; it will be updated. Only Linux machines are supported.
* System. Ubuntu 20.04.3 LTS
* Processor. AMD Opteron(tm) processor 6376 × 32
* Storage. More than 500 GB storage
* RAM. 32 GB 
* Nvidia GPU. On par with a GeForce GTX 1070
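
As a quick way to compare a machine against the list above, the following is a minimal sketch that reports the operating system, free disk space and physical memory. The function name `check_hardware` and the two threshold constants are illustrative and not part of NucleicNet. The sketch also assumes that the 500 GB figure refers to free space on the filesystem holding the repository, which the list does not state explicitly.

```python
import os
import platform
import shutil

# Thresholds copied from the list above; they describe a tested spec, not hard limits.
MIN_DISK_GB = 500
MIN_RAM_GB = 32


def check_hardware(path="."):
    """Return basic facts about this machine for comparison with the tested spec."""
    report = {"system": platform.system()}
    # Free space on the filesystem that holds `path` (assumption: free, not total, space matters).
    report["free_disk_gb"] = shutil.disk_usage(path).free / 1e9
    # Total physical memory; these sysconf names are available on Linux.
    try:
        report["ram_gb"] = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9
    except (ValueError, OSError, AttributeError):
        report["ram_gb"] = None  # could not be determined on this platform
    report["is_linux"] = report["system"] == "Linux"
    report["disk_ok"] = report["free_disk_gb"] >= MIN_DISK_GB
    report["ram_ok"] = report["ram_gb"] is not None and report["ram_gb"] >= MIN_RAM_GB
    return report


if __name__ == "__main__":
    for key, value in check_hardware().items():
        print(f"{key}: {value}")
```

The GPU is not checked here because that is easiest to do through PyTorch once it is installed.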


## Software Requirement
A few dependencies are bundled in `NucleicNet/util/`:
* FEATURE 3.1.0 depends on GNU GCC, GNU make, and zlib. Follow the INSTALL document there. On Ubuntu you may need `sudo apt-get install libz-dev`.
* dssp. This is precompiled.
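
Before building FEATURE it can help to confirm that the build tools named above are on `PATH`. The sketch below does that with the standard library; the helper name `missing_tools` is illustrative. It checks only for the `gcc` and `make` executables and does not check for the zlib development headers, which are a library rather than a command.

```python
import shutil

# Build tools named above for FEATURE 3.1.0; zlib is a library, so it is not checked here.
REQUIRED_TOOLS = ("gcc", "make")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("missing:", missing if missing else "none")
```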

## Optional Software
* Gephi. Only needed if graphs are to be visualised in Gephi.
> > sudo apt install openjdk-8-jdk 
> > export JAVA_HOME=/usr/lib/jvm/java-8-openjdk








# Dataset
For users who are only interested in the labels and the corresponding sanitised coordinates, we also provide a way to retrieve them. In brief,
* `halo/{pdbid}{conformationid}.haloxyz` stores the coordinates where predictions are made. See Notebook 02 for the definition of `halo`. The file is in standard [XYZ format](https://openbabel.org/wiki/XYZ_(format)).
* `typi/{pdbid}{conformationid}.typi` stores the class label at each of the coordinates in `halo`. See Notebook 02 for the definitions of the classes.
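
To illustrate what reading a standard XYZ file involves, here is a minimal sketch of a parser for one XYZ frame: a count line, a comment line, then one `label x y z` record per point. The function name `parse_xyz` and the sample text are made up for illustration. Whether `.haloxyz` files carry extra columns or particular labels is not stated here, so this should be checked against a real file, and the `.typi` layout is not assumed or parsed at all.

```python
def parse_xyz(text):
    """Parse one frame of standard XYZ text into (labels, coords)."""
    lines = text.splitlines()
    count = int(lines[0].split()[0])  # line 1: number of records
    # line 2 is a free-form comment and is skipped
    labels, coords = [], []
    for line in lines[2:2 + count]:
        fields = line.split()
        labels.append(fields[0])
        coords.append(tuple(float(value) for value in fields[1:4]))
    if len(coords) != count:
        raise ValueError(f"expected {count} records, found {len(coords)}")
    return labels, coords


if __name__ == "__main__":
    # Made-up sample data, not taken from the dataset.
    sample = "2\nillustrative frame\nX 1.0 2.0 3.0\nX -1.5 0.0 4.25\n"
    labels, coords = parse_xyz(sample)
    print(labels, coords)
```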

However, if the Altman features are desired, users have to generate them themselves because they take up more than 180 GB. Follow Notebook 03 for instructions on how to generate them. After decompression, expect the following storage usage in `../Database-PDB/`:

* 4.6G	./apo
* 59G	./halo
* 1.1G	./dssp
* 938M	./typi
* 16G	./cleansed
* 185G	./feature    (Not included, but we provide code to generate it.)
* 2.3G	./landmark
* 2.2G  ./3CvFoldReference_SXPR_*
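
After decompression, the sizes above can be compared with what is actually on disk. The sketch below sums file sizes per subdirectory; the function names are illustrative and the default subdirectory list simply mirrors the entries above. The figures above look like `du`-style output, which counts allocated blocks in binary units, so the numbers printed here (apparent file sizes in units of 10^9 bytes) are expected to differ somewhat.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of all regular files under `root` (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def report_sizes(database_root,
                 subdirs=("apo", "halo", "dssp", "typi", "cleansed", "landmark")):
    """Map each existing subdirectory of `database_root` to its size in GB (10**9 bytes)."""
    sizes = {}
    for sub in subdirs:
        path = os.path.join(database_root, sub)
        if os.path.isdir(path):
            sizes[sub] = dir_size_bytes(path) / 1e9
    return sizes


if __name__ == "__main__":
    for name, size_gb in report_sizes("../Database-PDB").items():
        print(f"{size_gb:8.2f} GB  ./{name}")
```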

Remark for maintainers. The archives were created with, for example, `tar cvzf - halo/ | split --bytes=24MB - halo.tar.gz.` or `tar cvzf - 3CvFoldReference_SXPR_BC* | split --bytes=24MB - 3CvFoldReference_SXPR_BC.tar.gz.`.
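
Because `split` names its pieces with suffixes in lexicographic order (`aa`, `ab`, ...), the original archive is recovered by concatenating the pieces in sorted order. The following is a minimal Python sketch of that idea, as an alternative to `cat ... | tar zxvf -`. The names `ChainedReader` and `extract_split_archive` are illustrative and not part of NucleicNet. Unlike the shell commands in this notebook, it does not strip the leading path component, and it should only be used on archives you trust.

```python
import glob
import tarfile


class ChainedReader:
    """File-like object that reads several files back to back, as `cat` would."""

    def __init__(self, paths):
        self._paths = list(paths)
        self._handle = None

    def read(self, size=-1):
        while True:
            if self._handle is None:
                if not self._paths:
                    return b""  # all parts consumed
                self._handle = open(self._paths.pop(0), "rb")
            data = self._handle.read(size)
            if data:
                return data
            # Current part is exhausted; move on to the next one.
            self._handle.close()
            self._handle = None


def extract_split_archive(prefix, dest):
    """Extract the .tar.gz formed by the sorted files matching `prefix*` into `dest`."""
    parts = sorted(glob.glob(prefix + "*"))
    if not parts:
        raise FileNotFoundError(f"no parts match {prefix}*")
    # Stream mode ("r|gz") only needs read(), so the parts are never joined on disk.
    with tarfile.open(fileobj=ChainedReader(parts), mode="r|gz") as archive:
        archive.extractall(dest)
    return parts
```

For example, `extract_split_archive("halo.tar.gz.", "extracted/")` would read `halo.tar.gz.aa`, `halo.tar.gz.ab` and so on in order. Streaming keeps memory use small, which matters for the larger volumes.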

In [None]:
import subprocess
# NOTE Install
# Each subprocess.call starts a fresh shell, so `conda activate` would not carry over
# to later calls. The environment is targeted by name with `-n Nucl` instead.
subprocess.call("conda create -y --name Nucl python=3.8.10", shell = True)
subprocess.call("conda install -y -n Nucl pytorch-lightning=1.5.7 pytorch=1.10.1 torchvision=0.11.2 torchaudio=0.10.1 torchmetrics=0.6.2 cudatoolkit=11.3 -c pytorch -c conda-forge", shell = True)
subprocess.call("conda install -y -n Nucl -c conda-forge nbformat ipywidgets psutil=5.9.0 tqdm  numpy=1.21.2 scipy=1.7.3 pandas=1.3.5 networkx=2.6.3 pygraphviz=1.7 scikit-learn=1.0.2 plotly=5.5.0 seaborn=0.11.2 matplotlib=3.5.1 fpocket=4.0.0 biopandas=0.2.9", shell = True)
# Version specifiers are quoted so the shell does not treat `>` as a redirection.
subprocess.call("conda install -y -n Nucl -c anaconda 'ipywidgets>=7.0.0' 'nbformat>=4.2.0'", shell = True)


# NOTE Decompress halo and pdbs
# shell=True is required because each command is a shell pipeline. Every call runs in
# its own shell, so the `cd` does not change the notebook's working directory.
for name in ["halo", "typi", "cleansed", "landmark", "apo", "dssp"]:
    subprocess.call(f"cd ../Database-PDB/{name}/ && cat {name}.tar.gz.* | tar zxvf - --strip-components 1", shell = True)


# NOTE Decompress halo indexings
# shell=True is required because each command is a shell pipeline.
subprocess.call("cd ../Database-PDB/DerivedData/ && cat 3CvFoldReference_SXPR_BC.tar.gz.* | tar zxvf -", shell = True)
subprocess.call("cd ../Database-PDB/DerivedData/ && cat 3CvFoldReference_SXPR_Mmseq.tar.gz.* | tar zxvf -", shell = True)


# NOTE Unpack FEATURE and set executable permissions
# shell=True is required because the commands use `cd` and `&&`.
subprocess.call("cd ../NucleicNet/util/ && tar -zxvf feature-3.1.0.tar.gz", shell = True)
subprocess.call("cd ../ && chmod -R +x ./NucleicNet/util/dssp", shell = True)
subprocess.call("cd ../ && chmod -R +x ./NucleicNet/util/feature-3.1.0/", shell = True)
subprocess.call("cd ../ && chmod -R +x ./Database-PDB/", shell = True)
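
To confirm that the permission step took effect, one can list any files that are still not executable by the current user. This is a minimal sketch using `os.access`; the helper name `non_executable_files` is illustrative, and the paths in the usage section are the ones used by the cell above.

```python
import os


def non_executable_files(root):
    """List regular files at or under `root` that the current user cannot execute."""
    if os.path.isfile(root):
        return [] if os.access(root, os.X_OK) else [root]
    blocked = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.access(path, os.X_OK):
                blocked.append(path)
    return blocked


if __name__ == "__main__":
    for target in ("../NucleicNet/util/dssp", "../NucleicNet/util/feature-3.1.0/"):
        blocked = non_executable_files(target)
        print(target, "->", f"{len(blocked)} file(s) not executable")
```

An empty list for each target suggests the `chmod -R +x` calls succeeded. A path that does not exist also yields an empty list, so check separately that the extraction step produced the directory.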

## Test Your GPU
This cell checks whether the installed PyTorch can detect your GPU. A GPU is required.
 

In [4]:
import torch

print(torch.__version__)
print(torch.cuda.device_count())



1.10.1
1
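
If the device count is `0`, a slightly more detailed report can help narrow down the cause. The following is a minimal sketch that also reports whether CUDA is usable and the name of each visible device. The function name `describe_gpus` is illustrative; the `torch.cuda` calls it uses are standard PyTorch.

```python
def describe_gpus():
    """Summarise the CUDA devices PyTorch can see, or note that PyTorch is missing."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False, "cuda_available": False, "devices": []}
    available = torch.cuda.is_available()
    count = torch.cuda.device_count() if available else 0
    devices = [torch.cuda.get_device_name(index) for index in range(count)]
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_available": available,
        "devices": devices,
    }


if __name__ == "__main__":
    for key, value in describe_gpus().items():
        print(f"{key}: {value}")
```

If `cuda_available` is `False` while a GPU is physically present, likely suspects are the GPU driver or a CPU-only PyTorch build.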
