## SLURM Jobs and GPU Usage

In this workshop, we'll leverage the power of Quest GPU nodes to run our open-source LLMs. To do so, please use the temporary Quest allocation: <font color='purple'>__e32337__</font>.

Afterwards, you can request your own Quest allocation here: https://www.it.northwestern.edu/departments/it-services-support/research/computing/quest/general-access-allocation-types.html.   

### <font color='purple'>_Understanding Parallel Computing_</font>

The main reason to run our LLMs on GPU nodes is to speed up processing. To understand why, it helps to know three terms you'll often hear us use: <font color='purple'>__CPUs__</font>, <font color='purple'>__GPUs__</font>, and <font color='purple'>__CUDA__</font>. This section breaks down these terms.

#### <font color='purple'>__CPUs__</font>
Much like your own computer, some of our KLC and Quest nodes are equipped with both processors and graphics cards. A processor, or <font color='purple'>__central processing unit (CPU)__</font>, is responsible for the general-purpose mathematical and logical calculations on a node. In a nutshell, it runs code. While CPUs are extremely powerful and complete most tasks in a tiny fraction of a second, a single CPU core can only handle one task at a time and runs things __sequentially__.

<img src="images/cpu_sequential.png" width="500" height="175" style="display: block; margin-left: auto; margin-right: auto;">


One way to speed up processing is through <font color='purple'>_parallel computing_</font>. Parallel computing is a method of solving a single problem by breaking it down into smaller chunks that run __simultaneously__. In other words, the task is split up and distributed over multiple CPU cores.



<img src="images/cpu_parallel.png" width="350" height="300" style="display: block; margin-left: auto; margin-right: auto;">
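
As a minimal sketch of this idea (not part of the workshop files), the example below splits a list into chunks, computes the sum of squares of each chunk in a separate worker, and then combines the partial results. The function names `split_into_chunks`, `sum_of_squares`, and `parallel_sum_of_squares` are made up for illustration. It uses threads from Python's standard library so that it runs anywhere; this shows the split-and-combine pattern rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split a list into n_chunks contiguous pieces of nearly equal size."""
    size, remainder = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < remainder else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """The work that each worker performs on its own chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(numbers, n_workers=4, executor_cls=ThreadPoolExecutor):
    """Hand one chunk to each worker, then combine the partial sums."""
    chunks = split_into_chunks(numbers, n_workers)
    with executor_cls(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(sum_of_squares, chunks))
    return sum(partial_sums)


numbers = list(range(1_000))
print(parallel_sum_of_squares(numbers))
```

For CPU-heavy Python code, you would typically pass `executor_cls=ProcessPoolExecutor` (also from `concurrent.futures`) and call the function under an `if __name__ == "__main__":` guard, so that the chunks actually run on separate cores.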

#### <font color='purple'>__GPUs__</font>
A graphics card, or <font color='purple'>__graphics processing unit (GPU)__</font>, is a specialized hardware component that can efficiently handle parallel mathematical operations. In comparison to the 24 cores you can use on KLC, an A100 GPU contains 6,912 <font color='purple'>CUDA</font> cores. While a CUDA core is less powerful than an individual CPU core, the sheer number of them makes a GPU ideal for handling large amounts of computation in parallel.

<img src="images/gpu.png" width="350" height="300" style="display: block; margin-left: auto; margin-right: auto;">

#### <font color='purple'>__Limitations of Parallel Computing__</font>
If GPUs are so much better at parallelization than CPUs, why aren't all tasks given to GPUs?  While parallelizing tasks often makes them run more quickly, it is important to note that some tasks actually take longer if parallelized.  For instance, searching a sorted list for a specific value might run faster without parallelization.  If you take a list and divide it among multiple CPUs, each CPU will search a portion of that list for the given value. Once each individual CPU search is complete, the different cores then have to communicate and coordinate with one another to determine the final answer.  If the list is relatively small, the coordination across cores might actually take longer than having a single CPU core complete the task alone. 
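
To make the sorted-list example concrete, here is a small sketch (an illustration, not workshop code) that counts how many steps a single core needs to find a value with binary search. The helper name `binary_search_steps` is made up for this example. Because each step halves the remaining range, even a list of one million values needs at most about 20 steps, which leaves very little work to share across cores.

```python
def binary_search_steps(sorted_values, target):
    """Return (index or None, number of loop iterations used)."""
    low, high = 0, len(sorted_values) - 1
    steps = 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if sorted_values[mid] == target:
            return mid, steps
        if sorted_values[mid] < target:
            low = mid + 1   # target is in the upper half
        else:
            high = mid - 1  # target is in the lower half
    return None, steps


values = list(range(1_000_000))
index, steps = binary_search_steps(values, 765_432)
print(f"Found at index {index} after {steps} steps")
```

With so few steps on one core, the time spent splitting the list and collecting answers from several cores can easily exceed the time saved.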

#### <font color='purple'>__CUDA__</font>
The potential inefficiency of parallelization raises a question: how does work end up on CPUs or on GPUs? This is where <font color='purple'>__CUDA__</font> comes in. <font color='purple'>__CUDA (compute unified device architecture)__</font> is NVIDIA's parallel computing platform and programming model, which lets programs run computations on the GPU. On the GPU nodes, we use it to solve performance-intensive problems: libraries such as PyTorch rely on CUDA to send the heavy, parallelizable work to the GPU, while the rest of the program keeps running on the CPU.

The image below illustrates tasks being delegated to GPUs or to CPUs.

<img src="images/giffy_gif.gif" style="display: block; margin-left: auto; margin-right: auto;">


### <font color='purple'>_Sample GPU Python Code_</font>

To get started with the GPU nodes, here is a sample Python script. The code below tests whether GPUs are available on a node and then runs a small tensor operation on the selected device. This file is located in <font color='purple'>__/kellogg/admin/gpu_test_files/pytorch_gpu_test.py__</font>.

```python
# pytorch_gpu_test.py

import torch

# Check if CUDA is available
if torch.cuda.is_available():
    print("CUDA is available!")
    print("Number of GPUs available:", torch.cuda.device_count())
    print("GPU:", torch.cuda.get_device_name(0))
else:
    print("CUDA is not available.")

# Check if CUDA is available and set the device accordingly
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Print whether a GPU or CPU is being used
if device.type == 'cuda':
    print("Using GPU")
else:
    print("Using CPU")

# Create two random tensors on the selected device
tensor1 = torch.randn(1000, 1000, device=device)
tensor2 = torch.randn(1000, 1000, device=device)

# Add the two tensors; the operation runs on the GPU if one is available
result = tensor1 + tensor2

print(result)
```

### <font color='purple'>_SLURM Script to Access GPU Nodes_</font>

For this workshop, we'll submit jobs to the Quest GPU nodes through a <font color='purple'>SLURM</font> (scheduler) script. You can launch the sample Python code using this script.

```bash
#!/bin/bash

#SBATCH --account=e32337
#SBATCH --partition=gengpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:a100:1
#SBATCH --time=0:30:00
#SBATCH --mem=40G

module purge
module load python
source activate /kellogg/software/envs/gpu-pytorch
python pytorch_gpu_test.py
```

Breaking down this script: 

- `--account=e32337` is the Quest allocation you are given.
- `--partition=gengpu` directs your job to the general access GPU nodes on Quest.
- `--nodes=1` specifies that the job will run on 1 node of the cluster.
- `--ntasks-per-node=1` specifies how many tasks to run on the node; by default, each task gets one CPU core. Setting `--ntasks-per-node=2` requests two cores. Only adjust this parameter if your code is parallelizable; otherwise the extra cores will not make it run any faster.
- `--gres=gpu:a100:1` GRES stands for "Generic Resources". This line specifies that the job requires 1 GPU of type "a100". If you omit this line, your job will be restricted to CPU resources.
- `--time=0:30:00` indicates that this job will be allowed to run for up to 30 minutes.
- `--mem=40G` specifies how much memory you are requesting, here 40 GB.
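
To confirm what the scheduler actually granted, you could print a few environment variables from inside the job. The sketch below is an illustration rather than part of the workshop files; the helper name `describe_allocation` is made up, and the exact set of variables that are populated can vary with how the cluster is configured.

```python
import os

# Environment variables that SLURM (and its GPU plugin) typically set inside a job.
SLURM_VARIABLES = {
    "SLURM_JOB_ID": "job ID",
    "SLURM_JOB_PARTITION": "partition",
    "SLURM_JOB_NUM_NODES": "number of nodes",
    "SLURM_CPUS_ON_NODE": "CPU cores on this node",
    "SLURM_MEM_PER_NODE": "memory per node (MB)",
    "CUDA_VISIBLE_DEVICES": "GPU indices visible to the job",
}


def describe_allocation(environ=os.environ):
    """Return {description: value} for each variable, or 'not set' if absent."""
    return {
        label: environ.get(name, "not set")
        for name, label in SLURM_VARIABLES.items()
    }


for label, value in describe_allocation().items():
    print(f"{label}: {value}")
```

Run outside of a SLURM job, every entry prints as "not set", which is a quick way to tell whether your code is really running on a compute node.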

After accessing the GPU node, the script loads Python and activates the <font color='purple'>__gpu-pytorch__</font> environment. Finally, it launches the Python code. You can submit this script to the scheduler with:

```bash
sbatch gpu_pytorch_test.sh
```

After submitting your job, you will receive a job ID, like: 

<font color='purple'>__Submitted batch job 9428806__</font> 

Once the script runs to completion, the results will be saved to a __slurm-[job_ID].out__ file (like __slurm-9428806.out__) in your current directory.
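
If you submit jobs from a Python script, the job ID and the output file name can be derived from the submission message. The sketch below is a hedged illustration: the helper names `parse_job_id`, `default_output_file`, and `submit` are made up, and `submit` assumes the `sbatch` command is available, so it is only defined here and not called.

```python
import re
import subprocess


def parse_job_id(sbatch_output):
    """Extract the numeric job ID from a 'Submitted batch job N' message."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError(f"No job ID found in: {sbatch_output!r}")
    return match.group(1)


def default_output_file(job_id):
    """Name of the file SLURM writes to when no --output option is given."""
    return f"slurm-{job_id}.out"


def submit(script_path):
    """Submit a script with sbatch and return its job ID (requires SLURM)."""
    completed = subprocess.run(
        ["sbatch", script_path], capture_output=True, text=True, check=True
    )
    return parse_job_id(completed.stdout)


job_id = parse_job_id("Submitted batch job 9428806")
print(default_output_file(job_id))  # slurm-9428806.out
```

On a login node, `submit("gpu_pytorch_test.sh")` would return the new job's ID, which you can then use to locate the matching output file.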

### <font color='purple'>_Reference Sources_</font>

- [CUDA Simply Explained](https://www.youtube.com/watch?v=r9IqwpMR9TE)
- [Understanding Parallel Computing](https://blog.paperspace.com/demystifying-parallel-computing-gpu-vs-cpu-explained-simply-with-cuda/)
