# OpenMP Cheat Sheet

## Directive syntax

<img alt="OpenMP Directive Syntax" src="../../pictures/directive_omp.png" style="float:none" width="30%"/>

Broken down, a directive consists of the following elements:

- The sentinel is a special marker for the compiler (`#pragma omp` in C/C++, `!$omp` in Fortran). It tells the compiler that what follows must be interpreted as an OpenMP directive.
- The directive is the action to perform. In the example, _target_ opens a region that is offloaded to the GPU (the device). On its own, _target_ does not create parallelism: the region runs on a single device thread until teams and threads are created.
- The clauses are "options" of the directive. In the example, we want to copy some data from the GPU.
- The clause arguments give more details for the clause. In the example, they name the variables to be copied.
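A minimal sketch of a complete directive, assuming a C compiler with OpenMP offloading support (for example, one invoked with `-fopenmp`). The function `fill_squares` and the array `out` are hypothetical; the comments map each part of the `#pragma` line onto the elements listed above.

```c
/* Hypothetical example: compute squares on the GPU and copy them back to the host. */
void fill_squares(int *out, int n)
{
    /* sentinel:  #pragma omp
     * directive: target
     * clause:    map(from: ...)  -> copy data back from the GPU
     * argument:  out[0:n]        -> the array section to copy */
    #pragma omp target map(from: out[0:n])
    for (int i = 0; i < n; ++i)
        out[i] = i * i;
}
```

Because no teams or threads are created, this loop runs on a single device thread. If the code is compiled without OpenMP, the pragma is ignored and the loop simply runs on the CPU.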

## Creating kernels

To launch a kernel on the GPU, combine the `omp target` directive with directives that create threads.

### Creating threads

In OpenMP, creating threads is the developer's job. The standard defines three levels of parallelism:

- `omp teams`: several teams (groups of threads) are created, but only the initial thread of each team is active.
- `omp parallel`: the other threads of each team are activated.
- `omp simd`: the iterations are vectorized over SIMD lanes.

### Work Sharing

Creating threads is not enough to exploit the full power of the GPU; the work must also be shared among them:

- `omp teams distribute`: distribute loop iterations among the teams
- `omp parallel for` (C/C++) or `omp parallel do` (Fortran): distribute loop iterations among the threads of a team
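As a sketch of how these levels nest, assuming a row-major matrix and a hypothetical function `scale_rows`: the rows are distributed among the teams, and the columns of each row are shared among the threads of the team that owns it.

```c
/* Hypothetical example: multiply every element of a rows x cols
   row-major matrix by factor. */
void scale_rows(double *m, int rows, int cols, double factor)
{
    #pragma omp target map(tofrom: m[0:rows*cols])
    #pragma omp teams distribute           /* rows -> teams */
    for (int r = 0; r < rows; ++r) {
        #pragma omp parallel for           /* columns -> threads of one team */
        for (int c = 0; c < cols; ++c)
            m[r * cols + c] *= factor;
    }
}
```

Writing the levels as separate, nested directives is useful when the loops do not map one-to-one onto a single combined construct.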

### _omp target_ Clauses

| Clause                  | Effect                                                                              |
|-------------------------|-------------------------------------------------------------------------------------|
| private(vars, ...)      | Make _vars_ private to the target region                                            |
| firstprivate(vars, ...) | Make _vars_ private to the target region and initialize them with their host value |
| device(dev\_num)        | Set the device on which to run the kernel                                           |

Other clauses might be available. Check the specification and the compiler documentation for the full list.
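A minimal sketch of `firstprivate` on `omp target`, with the hypothetical function `add_offset`. Scalars are already firstprivate by default in a target region; the clause is spelled out here only to make the behavior visible. No `device` clause is given, so the default device is used.

```c
/* Hypothetical example: add a host-side offset to every element on the device. */
void add_offset(int *v, int n, int offset)
{
    /* firstprivate(offset): the device copy of offset starts with its host value. */
    #pragma omp target firstprivate(offset) map(tofrom: v[0:n])
    for (int i = 0; i < n; ++i)
        v[i] += offset;
}
```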

### _omp teams_ Clauses

| Clause                  | Effect                                                                                 |
|-------------------------|----------------------------------------------------------------------------------------|
| num\_teams(#teams)      | Set the (maximum) number of teams for the target region                                |
| thread\_limit(#threads) | Set the maximum number of threads inside a team                                        |
| private(vars, ...)      | Make _vars_ private to each team                                                       |
| firstprivate(vars, ...) | Make _vars_ private to each team and initialize them with their value before the region |
| reduction(op:vars, ...) | Perform a reduction of the variables _vars_ across all teams with operation _op_       |

Other clauses might be available. Check the specification and the compiler documentation for the full list.
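A sketch of the teams clauses on a sum, assuming the hypothetical function `sum_array`. The values 64 and 128 are arbitrary illustrations, not recommendations; they are upper bounds and the runtime may use fewer teams or threads.

```c
/* Hypothetical example: sum an array on the GPU with an explicit launch configuration. */
long sum_array(const int *a, int n)
{
    long sum = 0;
    /* num_teams(64):     at most 64 teams
       thread_limit(128): at most 128 threads per team
       reduction(+:sum):  partial sums of all teams and threads are combined into sum */
    #pragma omp target teams distribute parallel for num_teams(64) thread_limit(128) \
        reduction(+:sum) map(to: a[0:n]) map(tofrom: sum)
    for (int i = 0; i < n; ++i)
        sum += a[i];
    return sum;
}
```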

### _omp parallel_ Clauses

| Clause                  | Effect                                                                                    |
|-------------------------|-------------------------------------------------------------------------------------------|
| private(vars, ...)      | Make _vars_ private to each thread                                                        |
| firstprivate(vars, ...) | Make _vars_ private to each thread and initialize them with their value before the region |
| reduction(op:vars, ...) | Perform a reduction of the variables _vars_ across the threads with operation _op_        |

Other clauses might be available. Check the specification and the compiler documentation for the full list.
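A sketch of `private` at the thread level, with the hypothetical function `square_plus`. The temporary `tmp` is declared outside the loop, so without `private(tmp)` all threads would share it and race on it.

```c
/* Hypothetical example: y[i] = x[i]^2 + x[i], using a per-thread temporary. */
void square_plus(const double *x, double *y, int n)
{
    double tmp; /* declared outside the loop, so it must be made private */
    #pragma omp target teams distribute parallel for private(tmp) \
        map(to: x[0:n]) map(from: y[0:n])
    for (int i = 0; i < n; ++i) {
        tmp = x[i] * x[i];
        y[i] = tmp + x[i];
    }
}
```

Declaring `tmp` inside the loop body would make it private automatically; the clause matters when the variable must live outside the loop.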

### _omp simd_ Clauses

| Clause                  | Effect                                                                                                 |
|-------------------------|--------------------------------------------------------------------------------------------------------|
| private(vars, ...)      | Make _vars_ private to each SIMD lane                                                                  |
| lastprivate(vars, ...)  | Make _vars_ private to each SIMD lane and copy the value from the last iteration back to the original _vars_ |
| reduction(op:vars, ...) | Perform a reduction of the variables _vars_ across the SIMD lanes with operation _op_                  |
| simdlen(vector\_size)   | Hint the preferred vector length (number of SIMD lanes)                                                |

Other clauses might be available. Check the specification and the compiler documentation for the full list.
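A sketch of the SIMD clauses on a dot product, with the hypothetical function `dot`. `simdlen(8)` is an arbitrary illustrative hint; the compiler is free to choose another vector length.

```c
/* Hypothetical example: dot product with the reduction extended to the SIMD level. */
double dot(const double *a, const double *b, int n)
{
    double s = 0.0;
    /* simdlen(8): preferred vector length (a hint only)
       reduction(+:s): partial sums of teams, threads and SIMD lanes are combined */
    #pragma omp target teams distribute parallel for simd simdlen(8) \
        reduction(+:s) map(to: a[0:n], b[0:n]) map(tofrom: s)
    for (int i = 0; i < n; ++i)
        s += a[i] * b[i];
    return s;
}
```

Floating-point reductions may sum in a different order than a serial loop, so results can differ in the last bits.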

## Combined constructs for loops

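It is possible to combine the directives above into a single directive. For loops, the most common combination is `omp target teams distribute parallel for simd` (C/C++) or `omp target teams distribute parallel do simd` (Fortran), which offloads the loop and shares its iterations among teams, threads and SIMD lanes.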
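A minimal sketch of the fully combined construct on a SAXPY loop (`y = a*x + y`); the function name `saxpy` is only illustrative.

```c
/* Hypothetical example: y = a*x + y with every level of parallelism in one directive. */
void saxpy(int n, float a, const float *x, float *y)
{
    #pragma omp target teams distribute parallel for simd \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

The combined form is shorter than nesting `target`, `teams distribute`, `parallel for` and `simd` by hand, and leaves the split of iterations across the levels to the implementation.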

## Managing data

### Data regions

| Region           | Directive                                        |
|------------------|--------------------------------------------------|
| Program lifetime | `omp target enter data` & `omp target exit data` |
| Structured       | `omp target data`                                |
| Kernels          | `omp target map(...)`                            |
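A sketch of the structured and unstructured data regions, assuming two hypothetical functions that both add 2 to every element. In both cases `v` stays on the GPU between the two kernels; the `map` clauses on the kernels find the data already present and do not copy it again.

```c
/* Structured region: v is copied to the GPU at '{' and back at '}'. */
void add_two_structured(int *v, int n)
{
    #pragma omp target data map(tofrom: v[0:n])
    {
        /* v is already present: these map clauses do not copy it again. */
        #pragma omp target teams distribute parallel for map(tofrom: v[0:n])
        for (int i = 0; i < n; ++i)
            v[i] += 1;

        #pragma omp target teams distribute parallel for map(tofrom: v[0:n])
        for (int i = 0; i < n; ++i)
            v[i] += 1;
    }
}

/* Unstructured region: enter and exit could also live in different
   functions (e.g. initialization and finalization of a program). */
void add_two_unstructured(int *v, int n)
{
    #pragma omp target enter data map(to: v[0:n])

    for (int k = 0; k < 2; ++k) {
        #pragma omp target teams distribute parallel for map(tofrom: v[0:n])
        for (int i = 0; i < n; ++i)
            v[i] += 1;
    }

    #pragma omp target exit data map(from: v[0:n])
}
```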

### Data clauses

To choose the right data clause you need to answer the following questions:

- Does the kernel need the values computed on the host (CPU) beforehand? (Before)
- Are the values computed inside the kernel needed on the host (CPU) afterwards? (After)

|                    | Needed after          | Not needed after     |
|--------------------|-----------------------|----------------------|
| Needed before      | map(tofrom:var1, ...) | map(to:var2, ...)    |
| Not needed before  | map(from:var3, ...)   | map(alloc:var4, ...) |

<img alt="Data clauses in OpenMP" src="../../pictures/data_clauses_omp.png" style="float:none" width="45%"/>
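A sketch that uses all four cases of the table in one kernel, with the hypothetical function `accumulate`. The caller still provides a host buffer for `tmp`, because `map(alloc: ...)` maps a host address range, but its contents are never transferred in either direction.

```c
/* Hypothetical example:
   x   needed before, not after   -> to
   acc needed before and after    -> tofrom
   sq  not needed before, after   -> from
   tmp needed neither             -> alloc (scratch memory on the GPU) */
void accumulate(const int *x, int *acc, int *sq, int *tmp, int n)
{
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: acc[0:n]) map(from: sq[0:n]) map(alloc: tmp[0:n])
    for (int i = 0; i < n; ++i) {
        tmp[i] = x[i] * x[i];   /* scratch value, never copied back */
        sq[i] = tmp[i];
        acc[i] += x[i];
    }
}
```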

### Updating data already present on the GPU

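Data clauses do not transfer data that is already present on the GPU (for example, inside an enclosing data region): they only copy data when it is first mapped or finally unmapped.
To synchronize the host and GPU copies in the meantime, you need to use `omp target update`.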

#### `omp target update` Clauses

- To update the CPU with data computed on the GPU: `omp target update from(data, ...)`
- To update the GPU with data computed on the CPU: `omp target update to(data, ...)`
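A sketch of both directions inside a single data region, with the hypothetical function `update_demo`. Without the `update from`, the host would modify stale values; without the `update to`, the second kernel would not see the host change.

```c
/* Hypothetical example: change one value on the host in the middle of a data region. */
void update_demo(int *v, int n)
{
    #pragma omp target data map(tofrom: v[0:n])
    {
        #pragma omp target teams distribute parallel for map(tofrom: v[0:n])
        for (int i = 0; i < n; ++i)
            v[i] *= 2;                         /* on a GPU, only the device copy changes */

        #pragma omp target update from(v[0:n]) /* GPU -> CPU */
        v[0] = -1;                             /* host-side modification */
        #pragma omp target update to(v[0:n])   /* CPU -> GPU */

        #pragma omp target teams distribute parallel for map(tofrom: v[0:n])
        for (int i = 0; i < n; ++i)
            v[i] += 1;
    }   /* end of region: v is copied back to the host */
}
```

Starting from `{0, 1, 2}`, the host ends with `{0, 3, 5}`.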

## GPU routines

A routine called from a kernel must also be compiled for the GPU. To do so, place it inside a `declare target` region:
```c
#pragma omp declare target
void my_function(void)
{
        /* ... */
}
#pragma omp end declare target
```
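A complete sketch, assuming a hypothetical helper `clamp` called from a kernel in the hypothetical function `clamp_all`. Without the `declare target` region, the compiler would not generate a device version of `clamp`.

```c
/* Hypothetical device-callable helper. */
#pragma omp declare target
static int clamp(int x, int lo, int hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}
#pragma omp end declare target

/* Clamp every element of v to [lo, hi] on the GPU. */
void clamp_all(int *v, int n, int lo, int hi)
{
    #pragma omp target teams distribute parallel for map(tofrom: v[0:n])
    for (int i = 0; i < n; ++i)
        v[i] = clamp(v[i], lo, hi);
}
```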

## Using data on the GPU with GPU aware libraries

To get the device address of a variable (for example, to pass it to a GPU-aware library), use:

- `omp target data use_device_ptr(var, ...)` for pointers
- `omp target data use_device_addr(var, ...)` for other variables, such as arrays or Fortran allocatables
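A sketch of `use_device_ptr`, assuming a hypothetical routine `device_fill` that stands in for a GPU-aware library call (in real code, something like a cuBLAS or CUDA-aware MPI function). The stand-in is written with OpenMP itself: `is_device_ptr` tells it that the pointer already holds a device address and must not be mapped.

```c
/* Hypothetical stand-in for a GPU-aware library routine:
   d_v must be a device pointer. */
static void device_fill(int *d_v, int n, int value)
{
    #pragma omp target teams distribute parallel for is_device_ptr(d_v)
    for (int i = 0; i < n; ++i)
        d_v[i] = value;
}

void fill_through_device_ptr(int *v, int n, int value)
{
    /* Allocate v on the GPU and copy it back at the end of the region. */
    #pragma omp target data map(from: v[0:n])
    {
        /* Inside this region, v holds the device address of the mapped array. */
        #pragma omp target data use_device_ptr(v)
        {
            device_fill(v, n, value);
        }
    }
}
```

The two regions are nested so that `v` is already mapped when `use_device_ptr` converts it to a device address.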