# <span style="color:green"> Objective </span>

- To learn the important concepts involved in copying (transferring) data between host and device
    - System Interconnect
    - Direct Memory Access
    - Pinned memory

<hr style="height:2px">

# <span style="color:green"> CPU-GPU Data Transfer using DMA </span>

- ```cudaMemcpy()``` uses DMA (Direct Memory Access) hardware for better efficiency
    - Frees the CPU for other tasks
    - A hardware unit specialized to transfer the number of bytes requested by the OS
    - Transfers between regions of the physical memory address space (some can be memory-mapped I/O locations)
    - Uses the system interconnect, typically PCIe in today’s systems

![alt tag](img/3.png)


<hr style="height:2px">

# <span style="color:green"> Virtual Memory Management </span>

- Modern computers use virtual memory management
    - Many virtual memory spaces are mapped into a single physical memory
    - Virtual addresses (pointer values) are translated into physical addresses
- Not all variables and data structures are always in the physical memory
    - Each virtual address space is divided into pages that are mapped into the physical memory
    - Memory pages can be paged out to make room
    - Whether or not a variable is in the physical memory is checked at address translation time
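
The page-based mapping described above can be pictured with a little address arithmetic. The sketch below is illustrative only: it assumes a 4 KiB page size (common, but not universal), and the helper names `pageNumber`, `pageOffset`, and `pagesSpanned` are hypothetical. It splits a virtual address into a page number and an offset, and counts how many pages a buffer touches, each of which must be resident in physical memory before it can be accessed.

```cpp
#include <cstdint>

// Assumption for illustration: 4 KiB pages. Real systems may use other sizes.
constexpr std::uint64_t kPageSize = 4096;

// Virtual page number: which page the address falls in.
constexpr std::uint64_t pageNumber(std::uint64_t vaddr) {
    return vaddr / kPageSize;
}

// Offset of the address within its page; translation leaves this part unchanged.
constexpr std::uint64_t pageOffset(std::uint64_t vaddr) {
    return vaddr % kPageSize;
}

// Number of pages touched by a buffer of `bytes` bytes starting at `vaddr`.
constexpr std::uint64_t pagesSpanned(std::uint64_t vaddr, std::uint64_t bytes) {
    return bytes == 0
               ? 0
               : pageNumber(vaddr + bytes - 1) - pageNumber(vaddr) + 1;
}
```

For example, under the 4 KiB assumption a two-byte buffer that starts at address 4095 straddles two pages, so both pages would have to be present for the access to succeed.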

<hr style="height:2px">

# <span style="color:green"> Data Transfer and Virtual Memory </span>

- DMA uses physical addresses
    - When ```cudaMemcpy()``` copies an array, it is implemented as one or more DMA transfers
    - Address is translated and page presence is checked for the entire source and destination regions at the beginning of each DMA transfer
    - No address translation is performed for the rest of the same DMA transfer, so high efficiency can be achieved
- The OS could accidentally page out the data that is being read or written by a DMA transfer and page in another virtual page at the same physical location
    - The DMA transfer would then read or overwrite the wrong data

<hr style="height:2px">

# <span style="color:green"> Pinned Memory and DMA Data Transfer </span>

- Pinned memory consists of virtual memory pages that are specially marked so that they cannot be paged out
- Allocated with a special system API function call
- a.k.a. Page Locked Memory, Locked Pages, etc.
- CPU memory that serves as the source or destination of a DMA transfer must be allocated as pinned memory

<hr style="height:2px">

# <span style="color:green"> CUDA Data Transfer Uses Pinned Memory </span>

- ```cudaMemcpy()``` assumes that any source or destination in the host memory is allocated as pinned memory
- If the host-memory source or destination of a ```cudaMemcpy()``` is not allocated as pinned memory, the data must first be copied to pinned memory, which adds extra overhead
- ```cudaMemcpy()``` is faster if the host memory source or destination is allocated in pinned memory since no extra copy is needed

<hr style="height:2px">

# <span style="color:green"> Allocate/Free Pinned Memory </span>

- ```cudaHostAlloc()```, three parameters
    - Address of pointer to the allocated memory
    - Size of the allocated memory in bytes
    - Option flags; use ```cudaHostAllocDefault``` for now
    
- ```cudaFreeHost()```, one parameter
    - Pointer to the memory to be freed
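
The two calls above can be exercised in a few lines. This is a minimal sketch, assuming a working CUDA toolkit and device; the `check` helper and the element count are illustrative choices, not part of the CUDA API or of the original slides.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): abort with a message if a runtime call fails.
static void check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

int main() {
    const size_t n = 1 << 20;  // illustrative element count
    float *h_buf = nullptr;

    // Parameter 1: address of the pointer; 2: size in bytes; 3: option flags.
    check(cudaHostAlloc((void **)&h_buf, n * sizeof(float), cudaHostAllocDefault),
          "cudaHostAlloc");

    // The pinned buffer is read and written like any ordinary host array.
    for (size_t i = 0; i < n; ++i) {
        h_buf[i] = 1.0f;
    }

    // Pinned memory is released with cudaFreeHost(), not free().
    check(cudaFreeHost(h_buf), "cudaFreeHost");
    return 0;
}
```

Checking the returned `cudaError_t` is worthwhile here because a pinned allocation can fail when the requested size cannot be page-locked.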

<hr style="height:2px">

# <span style="color:green"> Using Pinned Memory in CUDA </span>

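- Use the allocated pinned memory and its pointer the same way as those returned by ```malloc()```, but release it with ```cudaFreeHost()``` rather than ```free()```
- The key difference is that the allocated memory cannot be paged out by the OS
    - ```cudaMemcpy()``` can be about 2X faster with pinned memory; the actual gain depends on the system and the transfer size
- Pinned memory is a limited resource
    - Over-subscription leaves less physical memory for the OS and other processes and can have serious consequences for system performance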
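
One way to see the effect on a particular machine is to time the same host-to-device copy from a pageable buffer and from a pinned buffer. The sketch below is an assumption-laden illustration rather than a rigorous benchmark: the `timeH2D` helper, the 64 MiB transfer size, and the single warm-up copy are arbitrary choices, and error handling is kept minimal.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: time one host-to-device copy of `bytes` bytes, in milliseconds.
static float timeH2D(void *d_dst, const void *h_src, size_t bytes) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    cudaMemcpy(d_dst, h_src, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const size_t bytes = 64u << 20;  // 64 MiB, an arbitrary test size

    void *d_buf = nullptr;
    void *h_pageable = std::malloc(bytes);  // ordinary pageable host memory
    void *h_pinned = nullptr;               // page-locked host memory

    if (h_pageable == nullptr ||
        cudaMalloc(&d_buf, bytes) != cudaSuccess ||
        cudaHostAlloc(&h_pinned, bytes, cudaHostAllocDefault) != cudaSuccess) {
        std::fprintf(stderr, "allocation failed\n");
        return 1;
    }

    // Touch both buffers so their pages are populated before timing.
    std::memset(h_pageable, 0, bytes);
    std::memset(h_pinned, 0, bytes);

    // Warm-up copy so one-time initialization costs are not measured.
    cudaMemcpy(d_buf, h_pinned, bytes, cudaMemcpyHostToDevice);

    const float msPageable = timeH2D(d_buf, h_pageable, bytes);
    const float msPinned = timeH2D(d_buf, h_pinned, bytes);
    std::printf("pageable: %.3f ms, pinned: %.3f ms\n", msPageable, msPinned);

    cudaFreeHost(h_pinned);
    std::free(h_pageable);
    cudaFree(d_buf);
    return 0;
}
```

The measured ratio varies with the interconnect, the driver, and the transfer size, so the figure quoted above is best read as a rough guide.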

<hr style="height:2px">

# <span style="color:green"> Putting It Together - Vector Addition Host Code Example </span>

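```cpp
#include <cuda_runtime.h>

int main()
{
    float *h_A, *h_B, *h_C;
    ...
    // Allocate pinned (page-locked) host buffers instead of using malloc()
    cudaHostAlloc((void **) &h_A, N * sizeof(float), cudaHostAllocDefault);
    cudaHostAlloc((void **) &h_B, N * sizeof(float), cudaHostAllocDefault);
    cudaHostAlloc((void **) &h_C, N * sizeof(float), cudaHostAllocDefault);
    ...
    // The pinned pointers are used exactly like pointers returned by malloc()
    vecAdd(h_A, h_B, h_C, N);

    // Pinned memory must be released with cudaFreeHost(), not free()
    cudaFreeHost(h_A);
    cudaFreeHost(h_B);
    cudaFreeHost(h_C);
    return 0;
}
```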


<hr style="height:2px">

<footer>
<cite> GPU NVIDIA Teaching Kit - University of Illinois </cite>
</footer>