# Managing Device Data (C/C++)

#### Sections
- [Learning Objectives](#Learning-Objectives)
- [Data Offload](#Data-Offload)
- [Map Clause](#Map-Clause)
- _Code:_ [Lab Exercise: Map Clause](#Lab-Exercise:-Map-Clause)
- [Dynamically Allocated Data and Length Specification](#Dynamically-Allocated-Data-and-Length-Specification)
- [Target Data Region](#Target-Data-Region)
- _Code:_ [Lab Exercise: Target Data Region](#Lab-Exercise:-Target-Data-Region)
- [Mapping Global Variables to Device](#Mapping-Global-Variables-to-Device)

## Learning Objectives
* Use OpenMP* constructs to effectively manage data transfers to and from the device 
* Be able to create a device data environment and map data to the device data environment
* Map global variables to OpenMP devices

### Prerequisites
A basic understanding of OpenMP constructs is assumed for this module. You should also have already gone through the [Introduction to OpenMP Offload module](../intro/intro.ipynb), which covers the basics of using the Jupyter notebooks with the Intel® DevCloud and introduces the OpenMP `target` construct.

***
## Data Offload
The host and devices have separate memory spaces, so when parts of the code are offloaded, data needs to be mapped to the target device in order to be accessed inside the target region.

By default, variables accessed inside the target region are treated as follows:

|Type | Behavior |
|:----:|:----|
|Scalars | Treated as `firstprivate` |
|Static arrays | Copied to the device on entry and from the device to the host on exit |
|Dynamic arrays | Same as above, length must be specified |

In the following example, the compiler will identify all variables that are used in the target region (a, x, and y), and data will be transferred to the device based on the above rules.

```c
void saxpy() {
    float a, x[ARRAY_SIZE], y[ARRAY_SIZE];
    #pragma omp target
    // On entry of target region, a, x, and y copied from host to the device
    for (int i=0; i< ARRAY_SIZE; i++) {
        y[i] = a * x[i] + y[i];
    }
    // Upon exit of the target region, both x and y are copied back to the host, 
    // even though x was not changed.
}
```
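
As a rough illustration of the default rules in the table, the sketch below spells out the implicit behavior with explicit clauses: the scalar `a` as `firstprivate`, and the static arrays `x` and `y` as `tofrom` (the `map` clause itself is covered in the next section). The function name `saxpy_explicit_defaults`, the small `ARRAY_SIZE`, and the initial values are assumptions made for this example only.

```c
#define ARRAY_SIZE 8  /* small size chosen only for this sketch */

/* Hypothetical helper: runs saxpy with the default data-sharing rules
   written out explicitly, then returns the last element of y. */
float saxpy_explicit_defaults(void) {
    float a = 2.0f, x[ARRAY_SIZE], y[ARRAY_SIZE];
    for (int i = 0; i < ARRAY_SIZE; i++) {
        x[i] = (float)i;
        y[i] = 1.0f;
    }

    /* Same data movement as a bare "#pragma omp target" would apply here:
       a is firstprivate; x and y are copied in on entry and back on exit. */
    #pragma omp target firstprivate(a) map(tofrom: x, y)
    for (int i = 0; i < ARRAY_SIZE; i++) {
        y[i] = a * x[i] + y[i];
    }

    return y[ARRAY_SIZE - 1];
}
```

If no offload device is available, or the code is built without OpenMP support, the loop simply runs on the host and produces the same values.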

## Map Clause

To eliminate unnecessary data copies, use the `map` clause of the `target` directive to manually map variables to the device data environment.
```c
#pragma omp target map (map-type: list)
```
Available *map-type*s are
* `alloc`: Allocate storage for variable on target device, values not copied
* `to`: Allocate storage on target device and assign value **from original host variable to device** on target region entry
* `from`: Allocate storage on target device and assign value **from device to original host variable** on target region exit
* `tofrom`: default, both `to` and `from`

<img src="Assets/mapclause.jpg">

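The following minimal sketch shows three of the map-types working together, under assumed names (`map_types_demo`, `src`, `tmp`, `dst`) that do not come from the lab files. The input is mapped with `to` because the device only reads it, a scratch buffer is mapped with `alloc` because its values never need to travel in either direction, and the result is mapped with `from` because the host only needs it after the region ends.

```c
#define N 8  /* small size chosen only for this sketch */

/* Hypothetical helper: squares src into a device-only scratch buffer,
   adds 1 to each element, and returns the sum of the results. */
float map_types_demo(void) {
    float src[N], tmp[N], dst[N];
    for (int i = 0; i < N; i++) src[i] = (float)i;

    #pragma omp target map(to: src) map(alloc: tmp) map(from: dst)
    {
        /* tmp is allocated on the device but never copied */
        for (int i = 0; i < N; i++) tmp[i] = src[i] * src[i];
        /* dst is written on the device and copied back on exit */
        for (int i = 0; i < N; i++) dst[i] = tmp[i] + 1.0f;
    }

    float sum = 0.0f;
    for (int i = 0; i < N; i++) sum += dst[i];
    return sum;
}
```
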
## Lab Exercise: Map Clause

In this exercise you will add a map clause to the saxpy ($y=a*x+y$) operation. The primary source file, main.cpp, is written for you. It includes saxpy_func.cpp that you will complete and write out to file in this Jupyter notebook. If you would like to see the contents of main.cpp, execute the following cell.


In [None]:
#Optional, see the contents of main.cpp
%pycat main.cpp

In the cell below, add the map clause that would map the x array to the target so that it won't be unnecessarily copied back. Also, add the clause `map(from:is_cpu)` so we'll know whether the code was executed on the GPU.

In [None]:
%%writefile lab/saxpy_func.cpp
// Add the target pragma with the map clauses here

{
  is_cpu = omp_is_initial_device();
  for (i = 0; i < ARRAY_SIZE; i++) {
    y[i] = a * x[i] + y[i];
  }
}

### Compile the Code
Next, compile the code using *compile_c.sh*. If you would like to see the contents of compile_c.sh, execute the following cell.

In [None]:
# Optional: Run this cell to see the contents of compile_c.sh
%pycat compile_c.sh

Execute the following cell to perform the compilation

In [None]:
!chmod 755 compile_c.sh; ./compile_c.sh;

### Execute the code
Next, run the code using the script *run.sh*.

In [None]:
# Optional: Run this cell to see the contents of run.sh
%pycat run.sh

Execute the following cell to execute main.cpp. Look for the PASSED! message.

_If the Jupyter cells are not responsive or if they error out when you compile the samples, please restart the Kernel and compile the samples again_

In [None]:
! chmod 755 q; chmod 755 run.sh;if [ -x "$(command -v qsub)" ]; then ./q run.sh; else ./run.sh; fi

Execute the following cell to see the solution.

In [None]:
%pycat saxpy_func_solution.cpp

## Dynamically Allocated Data and Length Specification

For dynamically allocated arrays, the number of elements to be mapped must be specified explicitly in the `map` clause. Partial arrays may also be specified.
```c
#pragma omp target map(to:array[start:length])
```
In the Map Clause lab exercise above, x and y are static arrays, so length specification is optional. If you wish, you may go back to that exercise and specify the size of the array to map. Alternatively, you may run the following cell to see the solution.

In [None]:
%pycat saxpy_func_solution_length.cpp

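As a sketch of the array-section syntax with heap-allocated data, the function below maps two `malloc`ed arrays using `[0:n]` sections. The name `saxpy_dynamic`, its parameters, and the initial values are assumptions for illustration, not part of the lab sources. The `malloc` results are cast so the snippet also compiles as C++.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates x and y on the heap, runs saxpy on the
   device, and returns the last element of y (or -1.0f on allocation failure). */
float saxpy_dynamic(int n, float a) {
    float *x = (float *)malloc(n * sizeof *x);
    float *y = (float *)malloc(n * sizeof *y);
    if (!x || !y) {
        free(x);
        free(y);
        return -1.0f;
    }
    for (int i = 0; i < n; i++) {
        x[i] = (float)i;
        y[i] = 1.0f;
    }

    /* The compiler cannot know how long a pointed-to array is,
       so each section states its start index and element count. */
    #pragma omp target map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }

    float last = y[n - 1];
    free(x);
    free(y);
    return last;
}
```
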
***
## Target Data Region
When there is more than one target region, it's often useful to create a larger target **data** region that encompasses all of them, which minimizes data copies across target regions. There are two ways to create a target data region: using `target data`, or using `target enter data` and `target exit data`.
### Target Data
The `target data` construct creates a scoped data environment and maps data to and from the device. When using this construct, the `alloc`, `to`, `from`, and `tofrom` map-types are available. 

Note: `target data` does not create a target region that offloads execution. `target` constructs inside the data environment are needed to accomplish that.
```c
#pragma omp target data map(tofrom: x)
// Device data environment created; x stays on the device throughout the two target regions
{
    #pragma omp target map(to: y)
    {
        // First target region
    }
    host_update(y); // y must be mapped at each target region because it's being updated by the host
    #pragma omp target map(to: y)
    {
        // Second target region
    }
}
```
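
A compact, self-contained variant of this pattern might look like the sketch below. The function name `target_data_demo` and the arithmetic are assumptions chosen only so the result is easy to check by hand; `x` is mapped once by `target data`, while `y` is re-sent with `map(to: y)` on each region because the host changes it in between.

```c
#define N 8  /* small size chosen only for this sketch */

/* Hypothetical helper: keeps x resident on the device across two target
   regions and returns the last element of x after both have run. */
float target_data_demo(void) {
    float x[N], y[N];
    for (int i = 0; i < N; i++) {
        x[i] = (float)i;
        y[i] = 1.0f;
    }

    #pragma omp target data map(tofrom: x)
    {
        /* First region: x is already present, y is copied in */
        #pragma omp target map(to: y)
        for (int i = 0; i < N; i++) x[i] += y[i];

        /* Host update of y between the two regions */
        for (int i = 0; i < N; i++) y[i] = 10.0f;

        /* Second region: the new y values are copied in again */
        #pragma omp target map(to: y)
        for (int i = 0; i < N; i++) x[i] *= y[i];
    }
    /* x is copied back to the host when the data region ends */
    return x[N - 1];
}
```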
### Target Enter/Exit Data and Update
`target enter/exit data` constructs can be used to explicitly mark the beginning and ending of the target data environment.

When using the `target enter data` construct, only the map-types of `alloc` and `to` are available. When using the `target exit data` construct, the `from`, `release`, and `delete` map-types are available. 

The `target update` construct is used to issue data transfers to or from the existing device data environment.

Note: The `target enter data`, `target exit data`, and `target update` constructs are not scoped and do not offload execution of code. `target` constructs are needed between the enter and exit of the data environment to accomplish that.

Example:
```c
#pragma omp target enter data map(to:y) map(alloc: x)
#pragma omp target
{ //First target region, device operations on x and y
}

#pragma omp target update from (y)
host_update(y);
#pragma omp target update to (y)

#pragma omp target
{ //2nd target region, device operations on x and y
}
#pragma omp target exit data map(from:x) map(release:y)
```
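
The sketch below fills in the same structure with concrete work so the data movement can be followed step by step. The function name `enter_exit_demo` and the values are assumptions for illustration. Because `x` is mapped with `alloc`, the first region has to write it before anything reads it.

```c
#define N 8  /* small size chosen only for this sketch */

/* Hypothetical helper: uses unscoped enter/exit data with target update
   in both directions, and returns the first element of x. */
float enter_exit_demo(void) {
    float x[N], y[N];
    for (int i = 0; i < N; i++) y[i] = 1.0f;

    /* x gets device storage only; y is copied to the device */
    #pragma omp target enter data map(alloc: x) map(to: y)

    /* First region: initializes x and modifies y on the device */
    #pragma omp target
    for (int i = 0; i < N; i++) {
        x[i] = 2.0f * y[i];
        y[i] += 1.0f;
    }

    /* Bring the device's y back, change it on the host, send it again */
    #pragma omp target update from(y)
    for (int i = 0; i < N; i++) y[i] *= 3.0f;
    #pragma omp target update to(y)

    /* Second region: sees the host-updated y */
    #pragma omp target
    for (int i = 0; i < N; i++) x[i] += y[i];

    /* Copy x back and drop the device copy of y */
    #pragma omp target exit data map(from: x) map(release: y)
    return x[0];
}
```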

## Lab Exercise: Target Data Region
In this exercise, we have two target regions. x and y are static arrays of size ARRAY_SIZE, and they are used in the target regions. In addition, the value of y is updated by the host between the regions. For this program, *main_data_region.cpp* contains main and includes *target_data_region.cpp*, which is the file you will override.

Create a target data environment that encompasses both target regions. Ensure `x` stays on the device across both regions, and make sure `y` is updated on the device after the host `init2` call. Test your code, and ensure the PASSED message is displayed.

There are two ways to solve this problem. You may choose to use either `target data` or `target enter/update/exit data`. Solutions are provided for both.

In [None]:
#Examine main_data_region.cpp if you wish.
%pycat main_data_region.cpp

In [None]:
%%writefile lab/target_data_region.cpp


#pragma omp target
  {
    for (int i = 0; i < ARRAY_SIZE; i++) x[i] += y[i];
  }

  init2(y, ARRAY_SIZE);

#pragma omp target
  {
    for (int i = 0; i < ARRAY_SIZE; i++) x[i] += y[i];
  }


### Compile the Code

In [None]:
# Optional: Examine the compile script if you choose
%pycat compile_data_c.sh

In [None]:
#Execute this cell to compile the program; ensure your program compiles correctly
! chmod 755 compile_data_c.sh; ./compile_data_c.sh;

### Execute the Code

In [None]:
# Optional: Examine the run script if you choose
%pycat run_data.sh

In [None]:
#Execute the program, if you see the "FAILED" message, go back and debug your code
! chmod 755 q; chmod 755 run_data.sh;if [ -x "$(command -v qsub)" ]; then ./q run_data.sh; else ./run_data.sh; fi

_If the Jupyter cells are not responsive or if they error out when you compile the samples, please restart the Kernel and compile the samples again_

In [None]:
#Examine both solutions
%pycat target_data_region_solution.cpp

## Mapping Global Variables to Device
With OpenMP, you also have the option to map a variable to the device for the duration of the program. Use the `declare target` directive to specify that variables and functions are mapped to a device. Here's an example.
```c
#pragma omp declare target
int a[N];
#pragma omp end declare target

...
    
//Host Code
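init(a);
// The device copy of a is not refreshed automatically; push the host values
#pragma omp target update to(a)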

#pragma omp target
for (int i=0; i<N; i++) {
    result[i]=process(a[i]);
}
```
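
A self-contained sketch of this idea follows, using assumed names (`table`, `square`, `declare_target_demo`) rather than the `a`, `process`, and `result` of the snippet above. Since a `declare target` variable already lives on the device for the whole program, a `target` region does not re-copy it; the sketch therefore uses `target update to` after the host fills in the values.

```c
#define N 8  /* small size chosen only for this sketch */

/* Both the global array and the function get device versions */
#pragma omp declare target
int table[N];
int square(int v) { return v * v; }
#pragma omp end declare target

/* Hypothetical helper: fills table on the host, pushes it to the device,
   and returns the last element computed by the target region. */
int declare_target_demo(void) {
    int result[N];

    /* Host-side initialization of the global */
    for (int i = 0; i < N; i++) table[i] = i;

    /* Make the device copy of table match the host copy */
    #pragma omp target update to(table)

    /* table needs no map clause here; result is copied back on exit */
    #pragma omp target map(from: result)
    for (int i = 0; i < N; i++) {
        result[i] = square(table[i]);
    }

    return result[N - 1];
}
```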

# Summary
In this module, you have learned the following:
* How OpenMP handles data transfers to the device by default
* How to explicitly specify data mapping in the `#pragma omp target` construct with the `map` clause
* How to declare a target data region with the `target data` and `target enter/exit data` constructs
* How to explicitly issue data transfers using the `target update` directive
* How to map global variables to the target device

***

@Intel Corporation | [\*Trademark](https://www.intel.com/content/www/us/en/legal/trademarks.html)