<br></br>
<font size=20><center>**High Performance Computing**
<img src="./imgs/ua_logo.png" width="500"></img></center></font>
<br></br>
## A Quick Introduction
<p>Nicholas Schiraldi, PhD<br>
13 Feb 2020
</p>

<br></br>
<font size=20>Why should I be interested in High Performance Computing (HPC)?</font>

- Research problems that use computing can outgrow the desktop or laptop computer where they started

- Genomics research and other bioinformatics work may need only a modest amount of memory (tens of GBs) <!-- <center>![](./imgs/200px-laptop-openclipartorg-aoguerrero.png)</center> -->
... or a *ton* of memory (hundreds of GBs)
 <!--<center><img src="./imgs/servers-openclipartorg-ericlemerdy.png" width="100"></img></center>-->
- Some examples
  - Ongoing work with the Belfort Lab: ~350 GB of memory
  - Ongoing work with the Turner Lab: 500 GB to 1 TB of memory
  
*Images courtesy of https://hpc-carpentry.github.io/hpc-intro/


<br></br>
<font size=20>Why should I be interested in High Performance Computing (HPC)?</font>

<center><img src="./imgs/biz-niz-man-make-it-rain-meme.jpg" width=500></img></center>

- Much of the job growth in the sciences is in data science, and HPC is one of its foundations
- If you are proficient both in the lab and in data analytics, you're much more marketable to the private sector

<font size=20>Why should I be interested in High Performance Computing (HPC)?</font>

<center><img src="./imgs/paper_capture.PNG" width=800></img></center>

- HPC enables cross-disciplinary publications, and as someone familiar with HPC you can contribute to efforts outside of your discipline

<font size=20>HPC at UAlbany</font>
- What resources are available to you?
  - 22 nodes (expanding to 37 soon) of varying architecture
  - 2 NVIDIA K80 GPUs
  - RStudio, JupyterHub, etc.
  - Other software delivered via module files or Singularity containers
  - GitLab for source control and private repositories
  - 10 TB of shared space in your advisor's research directory
  

<font size=20>HPC at UAlbany</font>
<center><img src="./imgs/cluster_snapshot.PNG"></img></center>

<font size=20>HPC at UAlbany</font>
- Access is provided via SSH to a headnode (see tutorials at rit.albany.edu)
- Allocations are governed by SLURM, an open-source resource manager
  - You can either spawn an interactive shell or submit a batch job
- The maximum wall time is 14 days, but we can make accommodations when necessary
- The cluster can also be used for classes; we have ways to guarantee availability when needed

<font size=20>HPC at UAlbany - Things to Remember</font>
- Always work in your shared lab directory; your home directory is only 10 GB!!!
- NEVER, never, *never*, **never**, **execute code on the headnode**
- Be a good HPC citizen: use only what you need, and only for as long as you need it
- You will never receive sudo permissions, and 99% of the time you do not need them (more later)
- If you are unsure about how to schedule a job or what you are doing, email RTS@albany.edu


<font size=20>HPC at UAlbany - Software</font>
- Much of the software is precompiled under /network/rit/misc/software
  - The most up-to-date versions are available via module files:
    - <code> module avail </code> 
    - <code> module load gromacs/2019.4-gpu </code>
    - <code> module load openmpi/3.1.4 </code>
    - <code> module load abyss/2.2.3</code>
- You can install any additional software you need in your lab directory and add it to your `PATH` in `~/.bash_profile` or `~/.bashrc`
  - If you think the software will be widely used, we are happy to create a modulefile for it!
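As a minimal sketch of the `PATH` step above, lines like these could go at the end of `~/.bashrc`. The directory `/path/to/your/lab/dir/software` is a hypothetical placeholder for wherever you install tools inside your lab directory, and `add_to_path` is a hypothetical helper name, not a cluster-provided command.

```shell
#!/bin/bash
# Append to ~/.bashrc (or ~/.bash_profile).
# Hypothetical placeholder: replace with your actual lab directory.
LAB_SOFTWARE="/path/to/your/lab/dir/software"

# Hypothetical helper: prepend a directory to PATH only if it is not already
# there, so re-sourcing ~/.bashrc does not keep growing PATH.
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;           # already on PATH: nothing to do
    *) PATH="$1:$PATH" ;;  # prepend so your copy is found before system copies
  esac
}

add_to_path "$LAB_SOFTWARE/bin"
export PATH
```

After editing, run `source ~/.bashrc` (or log in again) and confirm the shell finds your program with `which <program>`.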

<font size=20>HPC at UAlbany - Software - A note on Anaconda </font>
- Anaconda (and its package manager, `conda`) is a fantastic tool that is widely used in Python development
- Anaconda allows you to create environments for your code which **encourage reproducibility**
- But, **Anaconda is more than that**
<center><img src="./imgs/bioconda.PNG" width=600></img></center>
- Many biology software packages are available prebuilt through Anaconda (e.g., the Bioconda channel), whether or not they use Python!
  - If you're not sure how to build a CMake project or unpack a .tar.gz, check for a prebuilt version in Anaconda first!
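One way to put the reproducibility point into practice is to record the environment in a file that anyone can rebuild. The sketch below writes a conda `environment.yml` that pulls ABySS from the Bioconda channel. The environment name `abyss-env` is a hypothetical choice, and you should check Bioconda for the exact package name and version you need.

```shell
#!/bin/bash
# Write a conda environment file (the env name "abyss-env" is hypothetical).
# Channel order sets priority: conda-forge first, then bioconda, as Bioconda recommends.
cat > environment.yml <<'EOF'
name: abyss-env
channels:
  - conda-forge
  - bioconda
dependencies:
  - abyss
EOF

# On a compute node (not the headnode), build and activate it:
#   conda env create -f environment.yml
#   conda activate abyss-env
# To capture the exact versions you ended up with, for your paper or repo:
#   conda env export > environment.lock.yml
```

Checking `environment.yml` into your repository lets collaborators recreate the same environment with one command.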


<font size=20>HPC at UAlbany - Scheduling a Job</font>
- Follow the guide on how to connect to the headnode (https://wiki.albany.edu/pages/viewpage.action?pageId=77894732)
- To schedule an interactive shell
  - read our guide on how to use screen/tmux to preserve interactive sessions (https://wiki.albany.edu/display/rit/How-to%3A+Using+screen+or+tmux+to+preserve+a+Linux+terminal+session)
  - `srun -p batch -n 1 -N 1 --cpus-per-task=40 --mem=60G --constraint="avx512|avx" --x11 --pty $SHELL -i`
    - The quotes around `avx512|avx` matter: unquoted, the shell treats `|` as a pipe
- To submit a job for execution
```bash
#!/bin/bash
# ~/myjob.sbatch
#SBATCH -N 1                    # One node
#SBATCH -n 1                    # One task on that node
#SBATCH -t 03-00                # Time limit: 3 days (days-hours)
#SBATCH --cpus-per-task=40      # 40 CPUs for that task
#SBATCH --mem=60G               # 60 GB of memory
#SBATCH --constraint=avx512|avx # Run on nodes with either the avx512 or the avx CPU feature
#SBATCH -o myjob.%j.log         # Output file; %j is replaced by the job ID
#SBATCH --mail-type=ALL         # Email on all job events (begin, end, fail, ...)

# Start code below (sbatch ignores any #SBATCH lines after this point)
module load openmpi/3.1.4
module load abyss/2.2.3

ABYSS-P [args]
```

`sbatch myjob.sbatch`
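Two easy mistakes break job scripts: `sbatch` rejects a script whose first line is not a `#!` interpreter line (an indented shebang counts), and it silently ignores any `#SBATCH` directive that appears after the first command. The hypothetical helper below (not a Slurm tool) is a minimal sketch that checks both rules before you submit.

```shell
#!/bin/bash
# Hypothetical helper: sanity-check a Slurm job script before running sbatch on it.
# Usage: check_sbatch myjob.sbatch && sbatch myjob.sbatch
check_sbatch() {
  local script="$1" lineno=0 seen_cmd=0 status=0 line trimmed
  while IFS= read -r line || [[ -n "$line" ]]; do
    lineno=$((lineno + 1))
    # Rule 1: the very first line must be an interpreter line such as #!/bin/bash.
    if (( lineno == 1 )) && [[ "$line" != '#!'* ]]; then
      echo "line 1: script must start with #! (e.g. #!/bin/bash)"
      status=1
    fi
    # Strip leading whitespace to tell comments apart from commands.
    trimmed="${line#"${line%%[![:space:]]*}"}"
    if [[ "$line" == '#SBATCH'* ]]; then
      # Rule 2: directives after the first command are ignored by sbatch.
      if (( seen_cmd )); then
        echo "line $lineno: #SBATCH after the first command will be ignored"
        status=1
      fi
    elif [[ -n "$trimmed" && "$trimmed" != '#'* ]]; then
      seen_cmd=1   # first non-blank, non-comment line is the first command
    fi
  done < "$script"
  return "$status"
}
```

This only covers those two rules. For a fuller check, `sbatch --test-only myjob.sbatch` asks Slurm itself to validate the request and estimate a start time without actually submitting the job.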

<font size=20>HPC at UAlbany - Research Reproducibility</font>
- Plan for how to make your data accessible: UAlbany's Scholars Archive is a great place to let data live (although there are limits)!
- Plan for how to make your code accessible
  - Check your code into GitLab (if you need free private repositories) or GitHub (if you want to reach a broader audience). Seriously, it's not as scary as it seems!
    - `git clone [repo]`
    - `git init`
    - `git add [args]`
    - `git commit -m "[statement]"`
    - `git push -u origin master`
    - `git pull`
- Spend time inserting comments and documenting workflows
- Python? Use conda environments
- R? Use packrat
- Something else? Use GitHub, or Singularity + GitHub!
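The git commands above can be strung together into a typical first commit. This is a sketch with hypothetical names: the directory `my-analysis`, the identity, and the remote URL are placeholders. The push is left commented out because it needs a repository you have already created on GitLab or GitHub.

```shell
#!/bin/bash
set -euo pipefail
# Hypothetical project directory for an analysis you want to share.
mkdir -p my-analysis
cd my-analysis
git init -q
# Placeholder identity; usually set once with `git config --global ...`.
git config user.name "Your Name"
git config user.email "you@example.com"
# Document how to run the workflow, then record it.
printf '# My analysis\n\nRun with: sbatch myjob.sbatch\n' > README.md
git add README.md
git commit -q -m "Add README describing how to run the analysis"
# Connect to a remote repository (URL is a placeholder) and publish:
#   git remote add origin git@gitlab.example.com:mylab/my-analysis.git
#   git push -u origin HEAD
```

`git push -u origin HEAD` pushes whatever your current branch is called (`master` or `main`) and sets it to track the remote, so later `git pull` and `git push` need no arguments.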


<font size=20>HPC at UAlbany - A note on singularity</font>
- Singularity is a container platform similar to Docker (we can't allow access to Docker on our systems)
- It lets you build a "container" (think of it as a computer within a computer) that can be shared with anyone
- Imagine a world where bioinformaticians let folks run their code with one command
  - Singularity makes this possible!
- Has someone published a Docker container you need to run? Singularity makes it possible!
- Want to create a uniform image for your research group, so everyone uses the same computing environment?
  - Singularity makes this possible!
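As a sketch of the shared-image idea, a group could keep a Singularity definition file in its repository. The script below writes one. The file name `mygroup.def`, the base image `ubuntu:22.04`, and the packages are illustrative assumptions. Building an image typically needs root, so it is done on a machine where you have root (or with `--fakeroot`, if your installation allows it), and only the resulting `.sif` image is run on the cluster.

```shell
#!/bin/bash
# Write a Singularity definition file (name, base image, and packages are illustrative).
cat > mygroup.def <<'EOF'
Bootstrap: docker
From: ubuntu:22.04

%post
    export DEBIAN_FRONTEND=noninteractive
    apt-get update
    apt-get install -y --no-install-recommends python3
    rm -rf /var/lib/apt/lists/*

%runscript
    exec python3 "$@"
EOF

# Build where you have root (sudo is not available on the cluster):
#   sudo singularity build mygroup.sif mygroup.def
# Run on the cluster, inside a Slurm job:
#   singularity exec mygroup.sif python3 --version
# Or convert an image someone published on Docker Hub:
#   singularity pull docker://ubuntu:22.04
```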