# GPU Memory Management

## Introduction


Host and device memory on GPU

Types of host memory
 - pageable
 - pinned

* Constant memory
* Shared memory
* Pinned memory
* Managed memory (host and device copies managed for you)


In this section, we start with the basics of allocating data on both the host and the device, beginning with pageable memory on the host (using Fortran's built-in `allocate`). Host memory and device memory refer to memory residing in RAM on the host side and on the device side, respectively. On the host side, we explore the differences between pageable and pinned memory and walk through benchmark examples on a few different GPU models to motivate the benefits of pinned memory. While you can explicitly manage host and device pointers in your application, HIP and HIPFort also offer managed memory, which lets you use a single variable to reference both host and device data. This hands control of memory movement between the host and device to a unified memory system. There are additional types of memory on the GPU that are

## Managing Host and Device Memory

Fortran pointer for host data

For device data, use either a Fortran pointer or a C pointer (`type(c_ptr)`). What are the trade-offs?


### Using type(c_ptr) for device data

#### Allocate

#### Deallocate

#### Memory Copy

### Using Fortran pointer for device data

#### Allocate

#### Deallocate

#### Memory Copy

## Pageable v. Pinned Memory

### Pageable
Pageable memory is what you typically get from `malloc` or `new` in a C++ application, or from Fortran's built-in `allocate`. It lives on “pages” (blocks of memory) that the operating system can migrate to other memory storage. Examples include migrating memory between CPU sockets on a motherboard, or a system that runs out of space in RAM and starts moving pages of RAM into the swap partition of your hard drive.

Pageable memory refers to a concept in computer memory management where portions of a program's or system's memory can be temporarily transferred between the main memory (RAM) and secondary storage (usually a hard disk or SSD). This transfer is typically done in the form of pages, which are fixed-size blocks of memory.

In pageable memory systems, not all pages of a program or system need to reside in the physical RAM at all times. Instead, the operating system can swap pages in and out of RAM as needed. This allows more programs to be loaded into memory than can fit simultaneously, and the operating system can prioritize which pages are actively used and keep them in RAM while swapping less frequently used pages to secondary storage.

This paging mechanism helps optimize the use of available memory, allowing for efficient multitasking and the execution of larger programs that might not fit entirely in RAM. However, the process of swapping pages in and out of RAM can introduce some overhead, as data must be transferred between the faster RAM and the slower secondary storage.

Overall, pageable memory is a key aspect of virtual memory systems, enabling more flexible and efficient use of computer memory resources.
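To make the page granularity described above concrete, here is a minimal Python sketch that queries the host's page size and counts how many pages an array of a given size would span. It is an illustration only: the helper name `pages_spanned` and the 1 GiB example size are assumptions made for this sketch and are not part of HIP or HIPFort.

```python
import mmap


def pages_spanned(nbytes, page_size=mmap.PAGESIZE):
    """Number of fixed-size pages needed to hold nbytes (rounded up)."""
    return (nbytes + page_size - 1) // page_size


if __name__ == "__main__":
    nbytes = 1024**3  # hypothetical 1 GiB host array
    print(f"page size: {mmap.PAGESIZE} bytes")
    print(f"a {nbytes}-byte array spans {pages_spanned(nbytes)} pages")
```

Each of those pages is a unit that the operating system may swap out or migrate independently, which is why a large pageable array has no guaranteed fixed location in physical RAM.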

### Pinned
Pinned memory (or page-locked memory, or non-pageable memory) is host memory that is mapped into the address space of all GPUs, meaning that the pointer can be used on both host and device. 

Pinned memory is a concept in computer memory management where a specific region of memory is prevented from being swapped or moved by the operating system. Unlike pageable memory, which can be moved between RAM and secondary storage, pinned memory remains fixed in physical memory (RAM).

Pinning memory is often used in situations where the continuous and predictable access to specific memory locations is critical, and the overhead associated with paging or swapping can be detrimental to performance. Some common use cases for pinned memory include:

* Real-Time Systems: In real-time applications, where precise timing and responsiveness are crucial, certain data structures or buffers may be pinned in memory to avoid the latency introduced by paging.

* DMA (Direct Memory Access): Pinned memory is sometimes used in conjunction with DMA operations. When data needs to be transferred directly between devices (such as a network card or GPU) and memory without involving the CPU, pinning the memory can ensure a stable and predictable location for the data.

* High-Performance Computing: In scientific computing or other high-performance applications, where low-latency access to data is essential, pinning memory can be employed to avoid interruptions caused by paging.

It's important to note that pinning memory usually comes with trade-offs. While it provides more predictable and faster access to certain regions of memory, it also reduces the flexibility of memory management. Pinned memory is typically managed explicitly by the programmer or the application, as it requires special handling to allocate, deallocate, and ensure proper usage.

With HIP and HIPfort, this requires using...

### When to use pinned v. pageable memory

In GPU programming, using pinned memory can be beneficial in certain scenarios to optimize data transfers between the CPU and GPU. Pinned memory, also known as "pinned host memory" or "pinned host arrays," remains fixed in physical memory and can offer advantages in terms of data transfer bandwidth and latency. Here are some situations where you might consider using pinned memory in GPU programming:

* Frequent Data Transfers Between CPU and GPU: If your application involves frequent data transfers between the CPU and GPU, using pinned memory can reduce the overhead associated with pageable memory, as pinned memory is not subject to paging and is more suitable for high-throughput data transfer.

* DMA Transfers and Asynchronous Operations: Pinned memory is often recommended when using asynchronous data transfers or Direct Memory Access (DMA) operations. Asynchronous transfers allow the CPU and GPU to perform computations concurrently, and because pinned pages cannot move, the data can be transferred directly without first being staged through an intermediate buffer.

* Streamlined Data Movement for HIP or OpenCL Kernels: When working with GPU programming frameworks like HIP or OpenCL, pinned memory can help streamline data movement. It allows you to pass pointers directly to GPU kernels without the need for additional data copying, which can result in better performance.

* Optimizing Memory Bandwidth: Pinned memory can provide better memory bandwidth compared to pageable memory in scenarios where optimizing memory throughput is crucial. This is particularly relevant when dealing with large datasets or when maximizing data transfer rates is a priority.

* Reducing Latency in Interactive Applications: In interactive applications or simulations where low latency is critical, pinned memory can help minimize the time required for data transfers between the CPU and GPU. This is important for applications that demand real-time responsiveness.


It's important to note that while pinned memory can offer performance benefits in certain situations, it comes with trade-offs. Pinned memory requires explicit management, and the total amount of pinned memory available may be limited by the system. Therefore, it's recommended to use pinned memory judiciously based on the specific requirements and characteristics of your GPU-accelerated application.

In practice, on Linux you can find the limit on page-locked (pinned) memory by running `ulimit -a` and looking for `max locked memory` or `mlock`.
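The same per-process limit can also be read programmatically. The following is a minimal Python sketch, assuming a Linux host, that uses the standard-library `resource` module to query `RLIMIT_MEMLOCK`. The helper names `locked_memory_limits` and `format_limit` are hypothetical and exist only for this illustration. Note that `getrlimit` reports the limit in bytes, whereas bash's `ulimit -l` reports it in units of 1024 bytes.

```python
import resource


def format_limit(value):
    """Render an rlimit value, which getrlimit reports in bytes."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value} bytes ({value // 1024} KiB)"


def locked_memory_limits():
    """Return the (soft, hard) RLIMIT_MEMLOCK limits for this process."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


if __name__ == "__main__":
    soft, hard = locked_memory_limits()
    print("soft limit:", format_limit(soft))
    print("hard limit:", format_limit(hard))
```

The soft limit is the one currently enforced for the process; it can be raised up to the hard limit without elevated privileges.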


### Allocating Pinned Memory for host data


> To do : show example of pinned v. pageable on MI250X (Setonix), MI210 (Noether), MI50 (Oram), V100 (GCP), and A100 (GCP) GPUs.

## Managed Memory

Managed memory refers to universally addressable, or unified, memory available on the MI200 series of GPUs. Much like pinned memory, managed memory shares a pointer between host and device and (by default) supports fine-grained coherence; however, managed memory can also automatically migrate pages between host and device. The allocation is managed by the AMD GPU driver using the Linux HMM (Heterogeneous Memory Management) mechanism.


### XNACK - Migratable pages with managed memory
https://rocm.docs.amd.com/en/latest/conceptual/gpu-memory.html#xnack


## Optimizing Memory Transfers between host and device (AMD CDNA2)

SDMA engines
https://rocm.docs.amd.com/en/latest/conceptual/gpu-memory.html#system-direct-memory-access

## Other types of Device Memory


https://rocm.docs.amd.com/en/latest/conceptual/gpu-memory.html
https://docs.amd.com/projects/HIP/en/latest/doxygen/html/group___memory_m.html