# High Performance Computing (HPC)

## 1. What is High Performance Computing (HPC)?
"Hardware and software resources used for computational science that are beyond what is commonly found on a desktop computer."

---

## 2. Do you need HPC?

If you are not sure you need HPC, you probably don't. Your desktop computer today is phenomenally powerful. We saw this in the previous notebook.

---

## 3. CPU vs GPU: Two Very Different Kinds of Speed

A useful way to think about the difference between a **CPU** and a **GPU** is with a transportation analogy. A **CPU is like a jet aircraft**: it is extremely fast, highly flexible, and excellent at handling complex, branching tasks—but it can only carry a relatively small payload at a time. A **GPU is like a giant cargo ship**: it is slower per individual operation, but it can carry an enormous amount of work in parallel. Both are “fast,” but they are fast in fundamentally different ways.

A modern **CPU** typically has:
- **8–64 powerful cores**
- Very high clock speeds
- Heavy emphasis on **latency** (doing one thing as fast as possible)
- Sophisticated control logic for branching, caching, and task switching

This makes CPUs ideal for:
- Program control flow  
- I/O handling  
- Decision-heavy code  
- Serial or weakly-parallel algorithms  

A modern **GPU**, by contrast, typically has:
- **Thousands to tens of thousands of simple cores**
- Lower clock speeds per core
- Emphasis on **throughput** (doing many similar things at once)
- A programming model optimized for massive data parallelism

This makes GPUs ideal for:
- FFTs and spectral methods  
- Dense linear algebra  
- Image and signal processing  
- Machine learning  
- Large stencil and convolution operations  

In terms of parallelism: a CPU core is a “heavyweight worker” that can do almost anything, but you only get a few dozen of them. A GPU core is a “lightweight worker” that can do only simple operations, but you get them by the thousands. If your problem can be broken into many identical operations on large arrays, the GPU wins by brute-force parallelism. If your problem involves lots of logic, branching, or irregular memory access, the CPU usually wins.

In practice, modern scientific computing uses both: the **CPU orchestrates the calculation**, manages I/O and control flow, and launches kernels, while the **GPU acts as a massive numerical engine** that grinds through the arithmetic. Understanding *which parts of an algorithm belong on the CPU and which belong on the GPU* is now one of the central skills in high-performance scientific computing.
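As a rough illustration of the two workload shapes, here is a minimal pure-Python sketch. It does not use a GPU at all; it only shows the dependency structure that decides whether a GPU could help. The function names and the choice of the logistic map as the serial example are illustrative assumptions, not part of any library.

```python
def scale_and_offset(values, gain=2.0, offset=1.0):
    """Data-parallel shape: every output depends only on its own input.

    Each element could be handed to a different worker (or GPU thread),
    because no element needs the result of any other element.
    """
    return [gain * v + offset for v in values]


def logistic_map(x0, r, steps):
    """Serial shape: each step needs the result of the previous step.

    Extra cores do not speed up a single trajectory, because step n+1
    cannot start until step n has finished.
    """
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x = r * x * (1.0 - x)
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    print(scale_and_offset([0.0, 1.0, 2.0]))
    print(logistic_map(0.5, 2.0, 3))
```

The first function is the kind of work that maps well onto thousands of lightweight GPU cores. The second is one example of work that stays on the CPU; heavily branching code is another.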

---

## 4. HPC at USF: CIRCE
USF Research Computing (RC) hosts CIRCE, the Central Instructional and Research Computing Environment. It is an HPC cluster mainly intended for "embarrassingly parallel" computing: running the same code on many independent pieces of data, with no need for communication between CPUs. Features:
- 340 nodes with 10,000 cores running Red Hat Linux
- 62 TB shared RAM
- 171 GPUs
- 5 PB of storage

Link: https://wiki.rc.usf.edu/index.php/CIRCE_Hardware
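To show what "embarrassingly parallel" means in practice, here is a minimal sketch using Python's standard `concurrent.futures` module on a single machine. The names `process_one` and `run_all`, and the sum-of-squares workload, are hypothetical placeholders. On a cluster the same pattern would be spread over many nodes by the scheduler rather than by a local process pool.

```python
from concurrent.futures import ProcessPoolExecutor


def process_one(dataset_id):
    """Stand-in for real work on one independent piece of data.

    Here it just sums squares; in practice this might be one file
    or one parameter set.
    """
    return dataset_id, sum(i * i for i in range(dataset_id * 1000))


def run_all(dataset_ids, max_workers=2, executor_cls=ProcessPoolExecutor):
    """Apply process_one to every input, with no communication between tasks."""
    with executor_cls(max_workers=max_workers) as pool:
        # map() returns results in input order, even if tasks finish out of order.
        return dict(pool.map(process_one, dataset_ids))


if __name__ == "__main__":
    results = run_all(range(1, 9))
    for dataset_id, total in results.items():
        print(dataset_id, total)
```

Because no task needs anything from any other task, doubling the number of workers can roughly halve the wall-clock time, up to the number of available cores.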

Anyone can use CIRCE, but you have to submit your jobs to a queue unless you have your own nodes ($5-10k each). Dr. Sylvain Charbonnier has 4 nodes purchased in 2022. They provide about 40 times the CPU speed of a Mac mini M4. The full CIRCE cluster has about 2000 times the CPU speed and about 150 times the GPU power. However, the GPU on the Mac mini M4 could still be faster than all the CPUs on CIRCE combined (for certain types of processing)!
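For a feel of what these numbers mean per node, the following back-of-envelope sketch divides the quoted totals by the node count. It assumes the "40 times" figure refers to the four 2022 nodes combined, and it treats all 340 nodes as identical, which real clusters are not, so the results are averages only.

```python
# Figures quoted in the text above.
NODES = 340
CORES = 10_000
RAM_TB = 62
GPUS = 171

# CPU speed relative to one Mac mini M4, as quoted in the text.
OWNED_NODES = 4
OWNED_SPEEDUP = 40        # assumption: the four nodes combined
CLUSTER_SPEEDUP = 2000


def per_node_averages():
    """Return cluster-wide averages per node (real nodes differ from these)."""
    return {
        "cores_per_node": CORES / NODES,
        "ram_gb_per_node": RAM_TB * 1000 / NODES,  # decimal units: 1 TB = 1000 GB
        "gpus_per_node": GPUS / NODES,
        "speedup_per_owned_node": OWNED_SPEEDUP / OWNED_NODES,
        "speedup_per_average_node": CLUSTER_SPEEDUP / NODES,
    }


if __name__ == "__main__":
    for name, value in per_node_averages().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the average node has roughly 29 cores and 180 GB of RAM. Each 2022 node is worth about 10 Mac minis of CPU speed, against about 6 for the average node, which would be consistent with newer nodes being faster than the cluster average.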

---

## 5. Logging on to CIRCE



## 6. Is it worth it?

Writing and running jobs on CIRCE is quite involved (as you will see!).

It is now far easier to exploit a powerful GPU on a personal desktop than on a traditional HPC cluster. On a Mac mini M4, the GPU is available immediately with no scheduler, no modules, no queue, and no explicit device management for many common scientific and machine-learning libraries: your code can use the GPU simply by selecting a backend such as Metal or MPS. This makes rapid development, prototyping, and medium-scale production runs extraordinarily efficient.

By contrast, running GPU jobs on a shared cluster like CIRCE requires remote access, resource requests through a scheduler, careful management of drivers and CUDA versions, and longer debug cycles. For many compute-bound problems that fit in memory (e.g., FFTs, spectrograms, dense linear algebra, and ML-based classification), a well-written desktop GPU program can outperform the entire CPU side of a cluster for a single job. Clusters remain essential, however, when problems require distributed memory, massive I/O, or thousands of concurrent jobs.
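As one possible illustration of "selecting a backend", here is a minimal sketch that assumes PyTorch is the library in use (the text does not name a specific one). The helper `pick_device` is a hypothetical name. It returns the best available device and falls back to the CPU if PyTorch is not installed or no GPU is found.

```python
def pick_device():
    """Return "cuda", "mps", or "cpu" depending on what is available.

    Falls back to "cpu" if PyTorch is not installed.
    """
    try:
        import torch
    except ImportError:
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"  # NVIDIA GPU, e.g. on a cluster GPU node
    # Older PyTorch builds may not have the mps backend module at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"   # Apple-silicon GPU via Metal Performance Shaders
    return "cpu"


if __name__ == "__main__":
    print("Using device:", pick_device())
```

With PyTorch installed, the returned string can be passed straight to tensor constructors, for example `torch.randn(1024, 1024, device=pick_device())`, so the same script can run unchanged on a laptop, a Mac mini, or a cluster GPU node.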

---

## Links

- https://wiki.rc.usf.edu/index.php/Main_Page
- https://wiki.rc.usf.edu/index.php/CIRCE
- https://wiki.rc.usf.edu/index.php/Connecting_To_CIRCE
- https://wiki.rc.usf.edu/index.php/CIRCE_Data_Access



## Request access
We will try to get access to CIRCE for this class. Draft an email like the one below, but fill in your details, and those of your MS/PhD advisor. It is likely that we will be directed to accounts on the 

Email Template:

```

To: rc-help@usf.edu
Subject: CIRCE Access Request

Dear RC Help Team,

I am writing to request access to the CIRCE research cluster for my ongoing research project. Please find my information below:

**Student Information:**
- Name: Jacob Krier
- NETID: jakrier@usf.edu
- College/Department: CAS/Geosciences

**Faculty Sponsor:**
- Name: Glenn Thompson
- Email: thompsong@usf.edu

I am conducting research with my PhD Advisor (see above) and would appreciate access to CIRCE resources for my project. Please let me know if you need any additional information.

Thank you for your assistance.

Best regards,
Jacob Krier
jakrier@usf.edu

```

