# Fundamentals of accelerated computing

To use GPUs effectively with Fortran, we first need a basic understanding of how accelerated computing works. The following sections introduce the core concepts of working with HIP and accelerators.

## A brief history of GPUs for scientific computing

Graphics Processing Units (**GPU**s) originated with the need to quickly perform math operations for rendering a 3D scene for display on a screen, such as in a game. Rendering pixels is a readily parallelizable operation, so the computation can be spread across the available hardware units. Originally these units used specialized silicon to perform the rendering calculations in parallel; however, as the complexity of algorithms increased, the hardware units became more generalised and programmable. Demand for the best frame rates in games drove performance, and this resulted in vendors providing GPUs with ever higher compute performance and memory bandwidth.

In 2006 the graphics card company ATI launched "Close To Metal", the first commercial solution for performing scientific calculations in parallel over General Purpose Graphics Processing Units (GPGPUs). This was followed by NVIDIA's CUDA in 2007, Apple and Khronos' OpenCL in 2009, Apple's Metal in 2014, and AMD's HIP in 2016. Frameworks such as these enabled scientific calculations to be performed on the GPU at a rate that is often much faster than on CPUs. GPUs were typically packaged as discrete devices, separate from the CPU and connected to the host over a connection such as PCI Express.

Accelerating training and inference in artificial intelligence is now the primary economic driver for GPU compute performance. Recent designs such as AMD's MI300 and NVIDIA's Grace Hopper integrate both CPUs and GPUs in the same package along with high-bandwidth memory.

## Introduction to HIP

HIP stands for the Heterogeneous-computing Interface for Portability. HIP is part of ROCm, AMD's competitor to CUDA, and aims to make GPUs accessible by providing a subset of the capability formed by fusing the driver and runtime APIs in CUDA. HIP calls have their own prefix (e.g. **hipMalloc** instead of **cudaMalloc**) and can serve as a **very thin** layer over a vendor's GPU library calls when using that vendor's backend. This design currently allows HIP programs to use an AMD, NVIDIA, or even an Intel accelerator as the compute device, while still allowing the use of vendor-specific debugging and performance tools. HIP has a number of benefits, including:

* A single source for programs and kernels.
* The ability to use Intel, NVIDIA, or AMD compute devices at full performance.
* An easy-to-use API that is similar in many ways to CUDA, so knowledge from the established literature on GPU computing with CUDA carries over.
* Tools available to port code from CUDA to HIP.

There are also some challenges to weigh when considering HIP for your project.

* The number of officially supported devices and operating systems is quite small. See this [page](https://rocm.docs.amd.com/en/latest/release/gpu_os_support.html) for devices that are officially supported. Other recent AMD devices do generally work with ROCm, but with varying levels of support.
* Only one type of compute device (limited to a single vendor) is accessible to a HIP program at runtime. In order to change vendors or compute devices the program must be recompiled with a different backend.
* The HIP API is still maturing, so expect some bugs and features that do not work properly.
* Not every CUDA API call is supported in HIP.

## Introduction to hipfort

Hipfort is the Fortran interface to HIP. It gives a Fortran program access to the compute power of a GPU, along with the HIP and ROCm compute libraries. It is designed to provide access to HIP and ROCm library calls at roughly the level of ROCm 4.5. Supported libraries include:

* HIP
* hipBLAS and rocBLAS (Basic Linear Algebra)
* hipFFT and rocFFT (Fast Fourier Transforms)
* hipRAND and rocRAND (Random number generation)
* hipSOLVER and rocSOLVER (Linear algebra solvers, AMD backends only at this point)
* hipSPARSE and rocSPARSE (Implementations and tools for working with sparse matrices)

The hip**\*** libraries can use multiple backends, while the roc**\*** libraries are specific to AMD.

## GPU computing hardware

### Compute
### Memory

## Anatomy of an accelerated application

<address>
Written by Dr. Toby Potter of <a href="https://www.pelagos-consulting.com">Pelagos Consulting and Education</a> and Dr. Joe Schoonover of <a href="https://www.fluidnumerics.com">Fluid Numerics</a>. All trademarks mentioned in this page are the property of their respective owners.
</address> 