# Introduction to High Performance Computing

* [What is High Performance Computing (HPC)?](#What-is-High-Performance-Computing?)
* [Why is it important?](#Why-is-it-important?)
* [Where is it used?](#Where-is-it-used?)
* [What are the components of HPC?](#What-are-the-components-of-HPC?)
 - [Master Node](#Master-Node)
 - [Compute Node](#Compute-Node)
 - [Shared Storage and File Systems](#Shared-Storage-and-File-Systems)
 - [Scheduler](#Scheduler)


## What is High Performance Computing?

High Performance Computing most generally refers to the practice of aggregating computing power in a way that delivers much higher performance than one could get out of a typical desktop computer or workstation in order to solve large problems in science, engineering, or business.

**RocketML enables an easy transition from desktop to HPC through the Data Science Workbench (DSW).**



## Why is it important?

- Solve problems up to 100X faster, cutting run times from days to minutes
- Solve intractable problems
- Gain a significant competitive advantage: your competition might already be using HPC


## Where is it used?
- Computer aided engineering (CAE): Automotive design and testing, transportation, structural, mechanical design 
- Chemical engineering: Process and molecular design 
- Digital content creation (DCC) and distribution: Computer aided graphics in film and media 
- Economics/financial: Wall Street risk analysis, portfolio management, automated trading 
- Electronic design and automation (EDA): Electronic component design and verification 
- Geosciences and geo-engineering: Oil and gas exploration and reservoir modeling 
- Mechanical design and drafting: 2D and 3D design and verification, mechanical modeling 
- Defense and energy: Nuclear stewardship, basic and applied research 
- Government labs: Basic and applied research 
- University/academic: Basic and applied research 
- Weather forecasting: Near term and climate/earth modeling

Ref: HPC for Dummies by Douglas Eadline, PhD


## What are the components of HPC?

![cluster](images/Beowulf.png)

### Master Node

Application programs never see the compute nodes (also called slave computers); they interact only with the "Master", a specific computer that handles the scheduling and management of the compute nodes. In a typical implementation, the Master has two network interfaces: one that communicates with the private Beowulf network for the slaves, and one for the general-purpose network of the organization.

### Compute Node

![compute-node](images/node_diagram.png)

The processor contains multiple compute cores (usually shortened to cores), four in the diagram above. Each core contains a floating point unit (FPU), which is responsible for actually performing the computations on the data, and various fast memory caches, which hold the data that is currently being worked on. The compute power of a processor generally depends on three things:

- The speed of the processor (2-3 GHz is common on modern processors)
- The power of the floating point unit (generally, the more modern the processor, the more powerful the FPU)
- The number of cores available (12-16 cores are typical on modern processors)

Often, HPC nodes have multiple processors (usually 2 processors per node), so the number of cores available on a node is doubled (i.e. 24-32 cores per node, rather than 12-16 cores per node). This configuration can have implications for performance.
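
The arithmetic behind these three factors can be sketched in a few lines of Python. This is a rough illustration rather than a benchmark: the function names are hypothetical, and `flops_per_cycle` (how many floating point operations one core's FPU completes per clock cycle) is an assumed value that varies between processor generations.

```python
def cores_per_node(sockets: int, cores_per_socket: int) -> int:
    """Total cores on a node with several identical processors."""
    return sockets * cores_per_socket


def peak_gflops(sockets: int, cores_per_socket: int,
                clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOP/s: cores x clock speed x FPU throughput."""
    return cores_per_node(sockets, cores_per_socket) * clock_ghz * flops_per_cycle


# Two 16-core processors per node, as in the upper end of the range above.
print(cores_per_node(2, 16))  # 32

# Assumed values: 2.5 GHz clock and 16 floating point operations per cycle.
print(peak_gflops(2, 16, 2.5, 16))  # 1280.0
```

Real applications rarely reach this theoretical peak, since memory access and communication between nodes also limit performance.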

Each node also has a certain amount of memory available (also referred to as RAM or DRAM) in addition to the processor memory caches. Modern compute nodes typically have in the range of 64-256 GB of memory per node.

Finally, each node also has access to storage (also called disk or file system) for persistent storage of data. As we shall see later, this storage is often shared across all nodes and there are often multiple different types of storage connected to a node.

[Reference](https://epcced.github.io/hpc-intro/010-hpc-concepts/)

### Shared Storage and File Systems
The kind of computing that people do on HPC systems often involves very large files, many files, or both. Further, the files have to be accessible from all of the front-end and compute nodes on the system, so most HPC systems have specialized file systems designed to meet these needs. Frequently, these specialized file systems are intended only for short- or medium-term storage, not permanent storage. As a consequence, most HPC systems have several different file systems available, for example home and scratch file systems. It can be very important to select the right file system to get the results you want (performance versus permanence is the typical trade-off). [Reference](https://epcced.github.io/hpc-intro/010-hpc-concepts/)

**On DSW, a folder named `/shared/` is accessible from both the master and compute nodes.**
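
A quick way to see what shared storage means in practice is to write a file on one node and read it on another. The sketch below assumes a writable `/shared` directory as described above; the helper names and the `hpc_intro_demo` subfolder are hypothetical, and the script falls back to a temporary directory when `/shared` is not available, so it also runs on a desktop.

```python
import os
import socket
import tempfile
from pathlib import Path


def write_marker(shared_dir: Path, name: str = "hello.txt") -> Path:
    """Write a small file recording which host created it."""
    shared_dir.mkdir(parents=True, exist_ok=True)
    marker = shared_dir / name
    marker.write_text(f"written by {socket.gethostname()}\n")
    return marker


def read_marker(marker: Path) -> str:
    """Read the file back; on a cluster this could run on a different node."""
    return marker.read_text()


if __name__ == "__main__":
    shared = Path("/shared")
    # Fall back to a local temporary directory when /shared is not usable.
    if not (shared.is_dir() and os.access(shared, os.W_OK)):
        shared = Path(tempfile.mkdtemp())
    marker = write_marker(shared / "hpc_intro_demo")
    print(read_marker(marker))
```

If `write_marker` runs on the master node and `read_marker` runs on a compute node, the host name printed should be the master's, which shows that both nodes see the same files.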


### Scheduler
In order to share these large systems among many users, it is common to allocate subsets of the compute nodes to tasks (or jobs), based on requests from users. These jobs may take a long time to complete, so they come and go in time. To manage the sharing of the compute nodes among all of the jobs, HPC systems use a batch system or scheduler. The batch system usually has commands for submitting jobs, inquiring about their status, and modifying them.
[Reference](https://epcced.github.io/hpc-intro/010-hpc-concepts/)

**On DSW, the _Slurm_ scheduler is installed and set up. Data scientists don't need to spend time configuring the scheduler.**
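
As an illustration of what submitting a job looks like, the sketch below builds a small Slurm batch script and hands it to `sbatch`, which reads the script from standard input when no file name is given. The function names, job name, resource values, and the `hostname` command are assumptions chosen for the example; the right settings depend on the cluster.

```python
import shutil
import subprocess
from typing import Optional


def build_batch_script(job_name: str, nodes: int, tasks_per_node: int,
                       time_limit: str, command: str) -> str:
    """Return the text of a Slurm batch script for the given request."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --time={time_limit}",
        "#SBATCH --output=%x-%j.out",  # %x = job name, %j = job id
        "",
        f"srun {command}",
    ]
    return "\n".join(lines) + "\n"


def submit(script: str) -> Optional[str]:
    """Submit the script with sbatch; return None if Slurm is not installed."""
    if shutil.which("sbatch") is None:
        return None
    result = subprocess.run(["sbatch"], input=script, text=True,
                            capture_output=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # Hypothetical request: 2 nodes, 4 tasks each, 10 minute limit.
    print(build_batch_script("hello-hpc", 2, 4, "00:10:00", "hostname"))
```

Calling `submit(script)` on a system with Slurm queues the job. From there, `squeue` reports job status and `scancel` removes a job from the queue.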