# HPC Workshop

## Getting Started: Log into Greene HPC

### With NYU VPN on (Recommended)
1. Install NYU VPN if you haven't already:  
   Mac: https://nyu.service-now.com/sp?sys_kb_id=a6be768b1c8dd504bbcf4dc2835ec355&id=kb_article_view&sysparm_rank=5&sysparm_tsqueryId=205010939313de10e8637d4efaba1002  
   Windows: https://nyu.service-now.com/sp?sys_kb_id=6177d7031c811904bbcf4dc2835ec340&id=kb_article_view&sysparm_rank=4&sysparm_tsqueryId=7660d0939313de10e8637d4efaba10de
2. Turn on NYU VPN
3. Open a terminal window and log in with the following command:  
    `ssh <NetID>@greene.hpc.nyu.edu`
4. Enter your password and press Enter. If you see `Permission Denied`, send us your NetID so we can grant you access.

### Without NYU VPN
1. Open a terminal window and log into the gateway first with the following command:  
   `ssh <NetID>@gw.hpc.nyu.edu`
2. Then inside the gateway, log into Greene with:  
   `ssh <NetID>@greene.hpc.nyu.edu`
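
The two hops above can usually be combined into one command with OpenSSH's `-J` (ProxyJump) option. The sketch below only builds and prints that command so you can review it first. It assumes your local OpenSSH is version 7.3 or newer and that the gateway permits jumping through it; the NetID `ab1234` is a made-up placeholder.

```shell
# Hypothetical NetID; replace with your own.
NETID="ab1234"

# -J (ProxyJump) asks OpenSSH to hop through the gateway in a single command.
jump_cmd="ssh -J ${NETID}@gw.hpc.nyu.edu ${NETID}@greene.hpc.nyu.edu"

# Print the command so you can review it before running it yourself.
echo "${jump_cmd}"
```

If the one-step form does not work for you, the two separate `ssh` commands above remain the reference procedure.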

### Graphical interface (Open OnDemand)
To access Open OnDemand (OOD), visit https://ood.hpc.nyu.edu (VPN required). You must log into Greene from a terminal at least once beforehand so that your home directory is initialized; OOD will not work otherwise.

## Modules and Slurm

### Modules
Greene has a lot of software preinstalled, which you can load and manage with `module` commands.  
**Modules basic commands**
* `module load <module-name>`: load a module
* `module unload <module-name>`: unload a module
* `module show <module-name>`: see exactly what effect loading the module will have
* `module purge`: remove all loaded modules from your environment
* `module list`: check which modules are currently loaded in your environment
* `module avail`: check what software packages are available
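
A typical session strings these commands together: start from a clean environment, look up a package, load it, and confirm. The sketch below uses the `anaconda3/2024.02` module from later in this workshop. The `type module` guard and the `module_status` variable are additions for illustration, so that the snippet exits quietly on a machine that has no `module` command.

```shell
# Typical session: start clean, find a package, load it, confirm.
if type module >/dev/null 2>&1; then
    module purge                    # drop any modules already loaded
    module avail anaconda3          # list the anaconda3 versions on offer
    module load anaconda3/2024.02   # version used later in this workshop
    module list                     # confirm what is loaded now
    module_status="loaded anaconda3/2024.02"
else
    module_status="module command not found; run this on a Greene node"
fi
echo "${module_status}"
```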

### Slurm
Slurm is a resource manager and job scheduler: it allocates cluster resources and schedules jobs.  
**Slurm basic commands**
* `sbatch <job-script>`: submit a job
* `squeue --me`: check jobs you submitted
* `scancel <job-id>`: cancel a job
* `srun <arguments>`: start an interactive job
* `sinfo`: check cluster status
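
When scripting around these commands, it is handy to capture the job ID that `sbatch` reports, since `scancel` needs it. The sketch below parses the usual `Submitted batch job <id>` reply. The helper name `extract_job_id`, the file name `my_job.sbatch`, and the job ID `12345` are all hypothetical, and the resource values in the `srun` line are only an example.

```shell
# sbatch replies with "Submitted batch job <id>"; keep only the trailing id.
extract_job_id() {
    printf '%s\n' "${1##* }"
}

# Demo with a made-up sbatch reply (12345 is a hypothetical job id).
job_id=$(extract_job_id "Submitted batch job 12345")
echo "${job_id}"

# On Greene the pieces would fit together roughly like this:
#   job_id=$(extract_job_id "$(sbatch my_job.sbatch)")   # submit a job
#   squeue --me                                          # check your jobs
#   scancel "${job_id}"                                  # cancel if needed
#   srun --cpus-per-task=1 --mem=4GB --time=01:00:00 --pty /bin/bash   # interactive shell
```

`sbatch --parsable` prints the job ID directly (followed by `;<cluster>` on multi-cluster setups), which avoids the parsing step.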


## Create a conda environment with Jupyter on HPC
1. Load anaconda with `module load anaconda3/2024.02`
2. Create a symbolic link
    * Create a symbolic link so that conda downloads the package files it installs into scratch rather than your home directory:
    * `mkdir -p /home/<NetID>/.conda`
    * `mkdir -p /scratch/<NetID>/conda_pkgs`
    * `ln -s /scratch/<NetID>/conda_pkgs /home/<NetID>/.conda/pkgs`
3. Create your conda environment:  
   `conda create -p /scratch/<NetID>/<env_name> python=3.9 jupyter tqdm pandas scikit-learn matplotlib`
4. Submit a job that activates the environment you just built, prints the port-forwarding instructions, and starts a Jupyter session. See a sample job script below:

```bash
#!/bin/bash
#SBATCH --job-name jupyter-notebook
#SBATCH --output jupyter-notebook-%J.log
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=8GB
#SBATCH --time=48:00:00
#SBATCH --gres=gpu:1

# activate your environment
module purge
module load anaconda3/2024.02
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
source /share/apps/anaconda3/2024.02/etc/profile.d/conda.sh
# NOTE: replace the path in the next two lines with your own environment path
conda activate /scratch/hz1975/dl4med_25
export PATH=/scratch/hz1975/dl4med_25/bin:$PATH

# get tunneling info
XDG_RUNTIME_DIR=""
port=$(shuf -i8000-9999 -n1)
node=$(hostname -s)
user=$(whoami)

# print tunneling instructions to the job log
echo -e "
Terminal command to create your ssh tunnel
ssh -N -L ${port}:${node}:${port} ${user}@greene.hpc.nyu.edu
Windows MobaXterm info
Forwarded port:same as remote port
Remote server: ${node}
Remote port: ${port}
SSH server: greene.hpc.nyu.edu
SSH login: $user
SSH port: 22
Use a Browser on your local machine to go to the tokenized URL that Jupyter
prints further down in this log, which starts with http://127.0.0.1:${port}/
"

jupyter notebook --no-browser --port=${port} --ip=${node}
```
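
If the job fails because the environment is missing packages, it is worth rechecking the symbolic link from step 2. The sketch below wraps that step in a function that can be re-run safely and reports where the link points. The function name `link_conda_pkgs` and its two arguments are hypothetical; it refuses to touch an existing real `pkgs` directory rather than guess what to do with it.

```shell
# Point <home>/.conda/pkgs at <scratch>/conda_pkgs; safe to run repeatedly.
link_conda_pkgs() {
    local home_dir="$1" scratch_dir="$2"
    local link="${home_dir}/.conda/pkgs"

    mkdir -p "${home_dir}/.conda" "${scratch_dir}/conda_pkgs"

    # A real directory here may already hold packages, so leave it alone.
    if [ -d "${link}" ] && [ ! -L "${link}" ]; then
        echo "refusing to replace real directory ${link}" >&2
        return 1
    fi

    # -f replaces an old link; -n stops ln from descending into it.
    ln -sfn "${scratch_dir}/conda_pkgs" "${link}"

    # Print the link target so the result can be checked by eye.
    readlink "${link}"
}
```

On Greene this would be called as `link_conda_pkgs "/home/<NetID>" "/scratch/<NetID>"`, with your own NetID filled in.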


---

Once the job runs, Slurm writes a log file named `jupyter-notebook-%J.log`, with `%J` replaced by the job's identifier. This script can also be adapted to run your model training code. 
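
The two pieces of information you need from the log, the `ssh` tunnel command and the Jupyter link, can also be pulled out with `grep` instead of scrolling. This is a minimal sketch: the helper names are hypothetical, and the demo runs on a made-up log (node `gr001`, port `8765`, NetID `ab1234`, and the token are invented). It assumes the link is printed with the `http://127.0.0.1:<port>/...token=<token>` shape.

```shell
# Pull the ssh tunnel command that the job script echoed into the log.
tunnel_cmd_from_log() {
    grep -m1 '^ssh -N -L ' "$1"
}

# Pull the first tokenized local URL that Jupyter printed.
jupyter_url_from_log() {
    grep -o 'http://127\.0\.0\.1:[0-9]*/[^ ]*token=[A-Za-z0-9]*' "$1" | head -n 1
}

# Demo on a made-up log file (all values below are hypothetical).
demo_log=$(mktemp)
cat > "${demo_log}" <<'EOF'
Terminal command to create your ssh tunnel
ssh -N -L 8765:gr001:8765 ab1234@greene.hpc.nyu.edu
    http://127.0.0.1:8765/tree?token=0123abcd
EOF
tunnel_cmd_from_log "${demo_log}"
jupyter_url_from_log "${demo_log}"
rm -f "${demo_log}"
```

On Greene, pass the actual log file written by your job instead of the demo file.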

5. Open the log file, copy the `ssh` tunnel command, paste it into a new terminal window on your local machine, and run it.
6. Scroll down the log file to find the link that opens Jupyter. It looks like:
   `http://127.0.0.1:<port>/tree?token=<token>`
7. Open that link in your browser; you should now see the Jupyter page. Next, install PyTorch inside the environment. To do so, open a terminal session within Jupyter by clicking New -> Terminal.
8. Install PyTorch with `pip3 install torch torchvision torchaudio`
9. Verify that torch works by running `import torch; torch.cuda.is_available()` in Python. On a GPU node it should return `True`.
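
The check in step 9 can also be run in one go from the Jupyter terminal. The sketch below wraps it in a hypothetical `check_torch` function that reports a missing interpreter or a missing `torch` install instead of failing with a traceback. It assumes `python3` on your `PATH` is the one from your conda environment.

```shell
check_torch() {
    # Bail out politely if no Python interpreter is on PATH.
    if ! command -v python3 >/dev/null 2>&1; then
        echo "python3 not found"
        return 0
    fi
    # Report the torch version and whether a CUDA device is visible.
    python3 - <<'EOF'
try:
    import torch
except ImportError:
    print("torch is not installed in this environment")
else:
    print("torch", torch.__version__, "CUDA available:", torch.cuda.is_available())
EOF
}

check_torch
```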

Alternatively, you can set up your environments in a container using Singularity. Follow the tutorial here: https://sites.google.com/nyu.edu/nyu-hpc/hpc-systems/greene/software/singularity-with-miniconda?authuser=0

## Resources:
NYU Greene HPC: https://sites.google.com/nyu.edu/nyu-hpc/home?authuser=0  
Getting started: https://sites.google.com/nyu.edu/nyu-hpc/hpc-systems/greene/getting-started?authuser=0  
Slurm tutorial: https://sites.google.com/nyu.edu/nyu-hpc/training-support/tutorials/slurm-tutorial?authuser=0  
Slurm commands: https://sites.google.com/nyu.edu/nyu-hpc/training-support/general-hpc-topics/slurm-main-commands?authuser=0  
Conda environments: https://sites.google.com/nyu.edu/nyu-hpc/hpc-systems/greene/software/conda-environments-python-r?authuser=0  
Singularity + Conda: https://sites.google.com/nyu.edu/nyu-hpc/hpc-systems/greene/software/singularity-with-miniconda?authuser=0  
Using containers on HPC: https://docs.google.com/presentation/d/1BG5JaMdwUkcSn887Q-cSf7M-5nQ9a7Dv3NQ0q6EKyqY/present?slide=id.g10f3178a1ff_0_63  
