## SLURM Jobs and GPU Usage

In this workshop, we'll leverage the power of Quest GPU nodes to run our open-source LLMs. To do so, please use the temporary Quest allocation: <font color='purple'>__pXXXXXX__</font>.

Afterwards, you can request your own Quest allocation here: https://www.it.northwestern.edu/departments/it-services-support/research/computing/quest/general-access-allocation-types.html.   

### <font color='purple'>1.) Understanding Parallel Computing</font>

The purpose of running our LLMs on GPU nodes is primarily to speed up calculations and processing.  In order to understand this, you'll often hear us talk about <font color='purple'>CPUs</font>, <font color='purple'>GPUs</font>, and <font color='purple'>CUDA</font>.  This section breaks down these terms.

Much like your own computer, some of our KLC and Quest nodes are equipped with both processors and graphics cards. A processor, or <font color='purple'>central processing unit (CPU)</font>, is responsible for the mathematical and logical calculations on a node. In a nutshell, it runs code. While CPUs are extremely powerful and complete most tasks in a tiny fraction of a second, it is important to note that a single CPU core handles only one task at a time and runs its instructions sequentially.

<img src="cpu.png" width="500" height="200" style="display: block; margin-left: auto; margin-right: auto;">




One way to speed up processing is through parallel computing. Parallel computing is a method of solving a single problem by breaking it down into smaller chunks that can run simultaneously.  This is essentially taking a task and distributing it over multiple CPUs.  
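As a minimal sketch of this idea, the snippet below breaks one problem (a sum of squares) into four chunks, hands each chunk to a separate worker, and combines the partial results. The helper names `split` and `sum_of_squares` are made up for illustration. It uses threads from Python's standard library only to keep the example short; in standard CPython, a real speedup on CPU-bound work would typically need separate processes (for example `ProcessPoolExecutor`) rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(chunk):
    """Work done on one chunk: the 'smaller problem'."""
    return sum(x * x for x in chunk)


def split(data, n_chunks):
    """Split data into n_chunks roughly equal, contiguous pieces."""
    size = -(-len(data) // n_chunks)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


data = list(range(100_000))
chunks = split(data, n_chunks=4)

# Each chunk is handed to a separate worker.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(sum_of_squares, chunks))

# The partial results are combined into the final answer.
total = sum(partial_sums)
print(total == sum_of_squares(data))  # same answer as the sequential version
```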

By contrast, a graphics card, or <font color='purple'>graphics processing unit (GPU)</font>, is a specialized hardware component that can efficiently handle parallel mathematical operations. In comparison to the 24 cores you are entitled to use on KLC, an A100 GPU contains 6,912 <font color='purple'>CUDA</font> cores.  While a CUDA core is less powerful than an individual CPU core, the sheer number of them makes a GPU ideal for handling large amounts of computation in parallel. If GPUs are so much better at parallelization than CPUs, why aren't all tasks given to GPUs?  

While parallelizing tasks often makes them run more quickly, it is important to note that some tasks actually take longer if parallelized.  For instance, consider searching a sorted list for a specific value.  If you take a list and divide it among multiple CPUs, each CPU will search a portion of that list for the given value. Once each individual CPU search is complete, the different cores then have to communicate and coordinate with one another to determine the final answer.  If the list is relatively small, the coordination across cores might actually take longer than having a single CPU core complete the task alone. 
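The sorted-list example can be sketched roughly as follows. The function names are hypothetical, and the single-core version uses binary search from Python's `bisect` module, which is our choice for illustration rather than something the text above prescribes. The chunked version has to split the list, start workers, and merge their answers, and on a list this small that coordination is pure overhead.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor


def search_chunk(args):
    """Scan one chunk; return the global index of target, or -1 if absent."""
    chunk, offset, target = args
    for i, value in enumerate(chunk):
        if value == target:
            return offset + i
    return -1


def chunked_search(values, target, n_workers=4):
    """Divide the list among workers, then coordinate their answers."""
    size = -(-len(values) // n_workers)  # ceiling division
    jobs = [(values[i:i + size], i, target) for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(search_chunk, jobs))
    # Coordination step: combine the per-worker answers into one result.
    hits = [r for r in results if r != -1]
    return hits[0] if hits else -1


def single_core_search(values, target):
    """One core, no coordination: binary search on the sorted list."""
    i = bisect_left(values, target)
    return i if i < len(values) and values[i] == target else -1


values = list(range(0, 2000, 2))  # small sorted list of even numbers
print(chunked_search(values, 1234), single_core_search(values, 1234))  # 617 617
```

Both functions return the same index; timing them (for example with `timeit`) on lists of different sizes is one way to see where the coordination cost stops dominating.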

The potential inefficiency of parallelization raises the question: how does your system know whether to send a task to the CPUs or to the GPUs? This is where CUDA comes in.  <font color='purple'>CUDA (Compute Unified Device Architecture)</font> is a powerful platform that helps computer programs run faster. We use it to solve performance-intensive problems. CUDA is a software layer, but it is also built into the hardware in the form of the GPU's CUDA cores. CUDA allows you to choose, for a given task, between CPU processing and GPU processing. 

<img src="cuda.png" width="500" height="250" style="display: block; margin-left: auto; margin-right: auto;">


### <font color='purple'>1.) Sample Python Code</font>

To get started with the GPU nodes, here is a sample Python script. The code below tests whether GPUs are available and runs a small tensor operation. This file is located at <font color='purple'>__/kellogg/admin/gpu_test_files/pytorch_gpu_test.py__</font>.

```python
# pytorch_gpu_test.py

import torch

# Check if CUDA is available
if torch.cuda.is_available():
    print("CUDA is available!")
    print("Number of GPUs available:", torch.cuda.device_count())
    print("GPU:", torch.cuda.get_device_name(0))
else:
    print("CUDA is not available.")

# Check if CUDA is available and set the device accordingly
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Print whether a GPU or CPU is being used
if device.type == 'cuda':
    print("Using GPU")
else:
    print("Using CPU")

# Create two random tensors
tensor1 = torch.randn(1000, 1000, device=device)
tensor2 = torch.randn(1000, 1000, device=device)

# Add the two tensors; the operation will be performed on the GPU if available
result = tensor1 + tensor2

print(result)
```

### <font color='purple'>2.) SLURM Script</font>

Today, we'll submit jobs to the Quest GPU nodes through a <font color='purple'>SLURM</font> (scheduler) script. As long as you have a Quest allocation, you can launch the sample Python code using this SLURM script. 

```bash
#!/bin/bash

#SBATCH -A your_quest_allocation_account
#SBATCH -p gengpu
#SBATCH --gres=gpu:a100:1
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -t 0:30:00
#SBATCH --mem=40G

module purge
module load python
source activate /kellogg/software/envs/gpu-pytorch
python pytorch_gpu_test.py
```


Breaking down this script: 

- The `#SBATCH` lines are directives that the Slurm scheduler reads when you submit the script as a batch job with `sbatch`.
- `-A` (`--account`) must reference the Quest allocation you are given.
- `-p gengpu` (`--partition=gengpu`) directs the job to the general access GPU nodes on Quest.
- `--gres=gpu:a100:1` GRES stands for "Generic Resources". This line specifies that the job requires 1 GPU of type "a100".  If you omit this line, your job will be restricted to CPU resources.
- `-N 1` (`--nodes=1`) specifies that the job will be run on 1 node of the cluster.  Since there is only 1 GPU node available, this cannot be adjusted.
- `-n 1` (`--ntasks=1`) specifies how many tasks the job runs; with the default of one core per task, this is also how many cores of the node you will use. Setting `-n 2` requests two cores. Only increase this parameter if your code is parallelizable; otherwise the extra cores will not speed it up. 
- `-t 0:30:00` (`--time=00:30:00`) indicates that this job will be allowed to run for up to 30 minutes.
- `--mem=40G` specifies how much memory you are requesting, here 40 GB. 
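The script writes several options in their short form (such as `-p`), while the long form (such as `--partition`) is often easier to read. As a small sketch, the hypothetical helper `expand_directive` below rewrites the short options used in this script into their long equivalents. The mapping covers only the five short options that appear above and is not a complete list of `sbatch` options.

```python
# Short-to-long names for the options used in the script above.
SHORT_TO_LONG = {
    "-A": "--account",
    "-p": "--partition",
    "-N": "--nodes",
    "-n": "--ntasks",
    "-t": "--time",
}


def expand_directive(line):
    """Rewrite '#SBATCH -p gengpu' as '#SBATCH --partition=gengpu'."""
    parts = line.split()
    if len(parts) == 3 and parts[0] == "#SBATCH" and parts[1] in SHORT_TO_LONG:
        return f"#SBATCH {SHORT_TO_LONG[parts[1]]}={parts[2]}"
    return line  # already in long form, or not a directive we recognize


header = """#SBATCH -A your_quest_allocation_account
#SBATCH -p gengpu
#SBATCH --gres=gpu:a100:1
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -t 0:30:00
#SBATCH --mem=40G"""

for line in header.splitlines():
    print(expand_directive(line))
```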

After accessing the GPU node, the script clears any previously loaded modules, loads Python, and activates the <font color='purple'>__gpu-pytorch__</font> environment.  Finally, it launches the Python code.

After submitting your job, you will receive a job ID, like: 

<font color='purple'>__Submitted batch job 9428806__</font> 

Once the script runs to completion, the results will be saved to a __slurm-[job_ID].out__ file (like __slurm-9428806.out__) in your current directory. 
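If you want to locate the output file programmatically, a minimal sketch is to pull the job ID out of the submission message and build the file name from it. The helper `slurm_output_path` is hypothetical, and it assumes the default `slurm-<job ID>.out` naming described above (that is, no `--output` option in the script).

```python
import re
from pathlib import Path


def slurm_output_path(sbatch_stdout, directory="."):
    """Return the expected output file for a 'Submitted batch job <id>' message."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_stdout)
    if match is None:
        raise ValueError(f"No job ID found in: {sbatch_stdout!r}")
    return Path(directory) / f"slurm-{match.group(1)}.out"


path = slurm_output_path("Submitted batch job 9428806")
print(path)  # slurm-9428806.out
```

Once the job has finished, `path.read_text()` returns everything the Python script printed.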