# 31) ISPC, OpenMP target, OpenACC, and all that

Last time:

- Parallel reductions with CUDA.jl
- Different optimization strategies on the GPU

Today: 

1. ISPC  
2. OpenMP target offload  
  2.1 Terminology
3. OpenACC  

| Architecture | Directives | SIMD | SPMD |
|---------|-----------|------|-----|
| Intel AVX+ (SIMD) | `#pragma omp simd` | [intrinsics](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#) | [ISPC](https://ispc.github.io/ispc.html) |
| CUDA (SIMT) | `#pragma omp target` | C++ templates and other high-level APIs | CUDA |
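
To make the "Directives" column concrete, here is a minimal sketch of a loop annotated with `#pragma omp simd`. The function name `axpy_simd` is hypothetical and chosen only for illustration. With GCC or Clang the directive takes effect when compiling with `-fopenmp` or `-fopenmp-simd`; without those flags the pragma is ignored and the loop is compiled as ordinary scalar code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example kernel: y[i] = a * x[i] + y[i].
   `restrict` promises the compiler that x and y do not overlap,
   which makes vectorization easier to justify. */
void axpy_simd(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(x != NULL && y != NULL);

    /* Ask the compiler to vectorize the loop that follows. */
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

The loop body is still written one element at a time; the directive only gives the compiler permission to execute several iterations per instruction.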

## 1. [ISPC: Intel Implicit SPMD Program Compiler](https://ispc.github.io/ispc.html)

The Intel **Implicit SPMD Program Compiler (ISPC)** is a compiler for writing **single program multiple data (SPMD)** programs to run on the CPU and GPU. 

The SPMD programming approach is similar to approaches used in computer graphics and general-purpose-GPU programming; it is used for GPU shaders and CUDA and OpenCL kernels, for example.

- The main idea behind SPMD is that one writes programs as if they were operating on a single data element (a pixel for a pixel shader, for example), but then the underlying hardware and runtime system executes multiple invocations of the program in parallel with different inputs (the values for different pixels, for example).

- In summary, we can program **SIMT** devices (e.g., CUDA GPUs) using directives, and we can also program **SIMD** hardware (e.g., Intel CPUs with AVX) using an **SPMD** programming model. Recall that SPMD is the CUDA-like model: the acronym says "single program" rather than "single instruction".
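
The SPMD idea described above can be sketched in plain C. This is only an emulation, not how ISPC is implemented: `square_plus_one` plays the role of the per-element program, and `run_spmd` stands in for the compiler and runtime that launch a gang of program instances at once. Both names and the gang width of 8 (chosen to mirror an 8-wide AVX2 target) are assumptions made for illustration.

```c
#include <assert.h>

#define GANG_SIZE 8  /* assumed gang width: 8 lanes of 32-bit values */

/* The "program": written as if it handles a single data element. */
static float square_plus_one(float v)
{
    return v * v + 1.0f;
}

/* Stand-in for the SPMD compiler/runtime: each outer step executes
   GANG_SIZE program instances. A real SPMD compiler would map the
   inner loop onto the lanes of one SIMD register. */
void run_spmd(const float *in, float *out, int count)
{
    assert(count >= 0);

    for (int base = 0; base < count; base += GANG_SIZE) {
        for (int lane = 0; lane < GANG_SIZE; lane++) {
            int i = base + lane;
            if (i < count)  /* mask off lanes that fall past the end */
                out[i] = square_plus_one(in[i]);
        }
    }
}
```

The programmer only writes `square_plus_one`; the gang loop, the lane index, and the masking of the ragged last gang are what the SPMD toolchain supplies.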


```{literalinclude} ../c_codes/module9-1/simple-ispc.ispc
:language: c
:linenos: true
```

This function is callable from native C code. Example:

```{literalinclude} ../c_codes/module9-1/simple.c
:language: c
:linenos: true
```


! gcc -O3 -march=native -o simple.o -c ../c_codes/module9-1/simple.c && ispc -O3 --target=avx2-i32x8 ../c_codes/module9-1/simple-ispc.ispc -o simple-ispc.o && gcc simple.o simple-ispc.o  -lm -o simple  && ./simple