# OpenMP: Open Multi-Processing
* http://openmp.org/
* [Specifications](https://www.openmp.org/specifications/)
  * [OpenMP 4.5 Reference Guide - C/C++](https://www.openmp.org/wp-content/uploads/OpenMP-4.5-1115-CPP-web.pdf): directives and constructs, runtime library routines, clauses, environment variables, and the environment variable values for ICVs (Internal Control Variables)
* [OpenMP Compilers & Tools](https://www.openmp.org/resources/openmp-compilers-tools/)
  * [GNU Offloading and Multi-Processing Project (GOMP)](https://gcc.gnu.org/projects/gomp/): options `-fopenmp`, `-fopenmp-simd`


More:

* [GCC Wiki - OpenMP](https://gcc.gnu.org/wiki/openmp)
* [ACENET workshop on OpenMP](https://acenet-arc.github.io/ACENET_Summer_School_OpenMP/)

In [1]:
# Check the supported OpenMP version
# ref https://www.openmp.org/specifications/
# _OPENMP 201511 corresponds to OpenMP 4.5
!echo | cpp -fopenmp -dM | grep -i openmp

#define _OPENMP 201511


In [2]:
# Another way to check the version
!gcc -fopenmp -dM -E - < /dev/null | grep -i open

#define _OPENMP 201511


# Hello World
* [OpenMP/aipp_5_1.c](./OpenMP/aipp_5_1.c): `parallel`, `_OPENMP`
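
The linked source file is not reproduced here, so the following is only a minimal sketch of what a `parallel` Hello World guarded by `_OPENMP` might look like. The function name `hello` and its return value are assumptions made for illustration; they are not taken from `aipp_5_1.c`.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: prints one greeting per thread and returns
 * how many greetings were printed. */
int hello(int thread_count) {
    int greetings = 0;
    #pragma omp parallel num_threads(thread_count)
    {
#ifdef _OPENMP
        int my_rank = omp_get_thread_num();
        int team_size = omp_get_num_threads();
#else
        /* Built without OpenMP support: behave like a single thread. */
        int my_rank = 0;
        int team_size = 1;
#endif
        printf("Hello from thread %d of %d\n", my_rank, team_size);
        #pragma omp atomic
        greetings++;
    }
    return greetings;
}
```

Calling `hello(4)` from `main` should print one line per thread in a nondeterministic order when the file is built with `-fopenmp`. Without that option the pragmas are ignored and the `_OPENMP` guard keeps the runtime calls out of the build, so a single greeting is printed.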

In [2]:
%cd OpenMP

/mnt/d/GoogleDrive/wiki/jupyter-notebooks/Concurrency and Parallel/OpenMP



In [4]:
# Compile
!gcc -g -Wall -fopenmp -o aipp_5_1 aipp_5_1.c

In [5]:
# Run
!./aipp_5_1 4

Hello from thread 3 of 4
Hello from thread 1 of 4
Hello from thread 2 of 4
Hello from thread 0 of 4


In [6]:
!./aipp_5_1 1

Hello from thread 0 of 1


In [7]:
# Clean up
!rm -f aipp_5_1

In [6]:
# Using the _OPENMP preprocessor macro: lets the source build without the -fopenmp option
filename = 'aipp_5_1'
!gcc -g -Wall -fopenmp -o {filename} {filename}.c
!time ./{filename} 4
!rm -f {filename}

Hello from thread 0 of 4
Hello from thread 2 of 4
Hello from thread 1 of 4
Hello from thread 3 of 4
./aipp_5_1 4  0.00s user 0.00s system 79% cpu 0.004 total


# Numerical Integration with the Trapezoidal Rule
* [OpenMP/aipp_5_2.c](./OpenMP/aipp_5_2.c): `critical`
* [OpenMP/aipp_5_2-2.c](./OpenMP/aipp_5_2-2.c): `reduction`
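
As a rough illustration of the two variants listed above, the sketch below accumulates per-thread partial results first with `critical` and then with a `reduction` clause. The integrand `f(x) = x * x` matches the NumPy check in this section, but the function names and the way trapezoids are split among threads are assumptions, not the contents of the linked files.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

static double f(double x) { return x * x; }

/* Trapezoidal estimate over the calling thread's share of the n trapezoids. */
static double local_trap(double a, double b, int n) {
#ifdef _OPENMP
    int my_rank = omp_get_thread_num();
    int team_size = omp_get_num_threads();
#else
    int my_rank = 0, team_size = 1;
#endif
    double h = (b - a) / n;
    /* Block partition that also works when n is not divisible by team_size. */
    int first = (int)((long long)n * my_rank / team_size);
    int last = (int)((long long)n * (my_rank + 1) / team_size);
    double sum = 0.0;
    for (int i = first; i < last; i++) {
        double x0 = a + i * h;
        sum += (f(x0) + f(x0 + h)) / 2.0;
    }
    return sum * h;
}

/* Variant 1: each thread adds its partial result inside a critical section. */
double trap_critical(double a, double b, int n, int thread_count) {
    double global_result = 0.0;
    #pragma omp parallel num_threads(thread_count)
    {
        /* Compute outside the critical section so only the addition is serialized. */
        double my_result = local_trap(a, b, n);
        #pragma omp critical
        global_result += my_result;
    }
    return global_result;
}

/* Variant 2: the reduction clause gives each thread a private copy and
 * combines the copies when the parallel region ends. */
double trap_reduction(double a, double b, int n, int thread_count) {
    double global_result = 0.0;
    #pragma omp parallel num_threads(thread_count) reduction(+ : global_result)
    global_result += local_trap(a, b, n);
    return global_result;
}
```

Both functions should return roughly 9.000004 for `a = 0.0`, `b = 3.0`, `n = 1024`, in line with the NumPy value in this section. The last digits can vary with the number of threads because the partial sums are combined in a different order.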

In [9]:
import numpy as np

x = np.linspace(0.0, 3.0, 1024)
y = x * x
np.trapezoid(y, x=x)

np.float64(9.000004299928621)

In [5]:
filename = 'aipp_5_2'
!gcc -g -Wall -fopenmp -o {filename} {filename}.c
!echo 0.0 3.0 1024 | time ./{filename} 4
!rm -f {filename}

With n=1024 trapezoids, estimated integral from 0.000000 to 3.000000 = 9.000004291534424e+00
./aipp_5_2 4  0.00s user 0.00s system 15% cpu 0.006 total


In [4]:
# Use the reduction clause
filename = 'aipp_5_2-2'
!gcc -g -Wall -fopenmp -o {filename} {filename}.c
!echo 0.0 3.0 1024 | time ./{filename} 4
!rm -f {filename}

With n=1024 trapezoids, estimated integral from 0.000000 to 3.000000 = 9.000004291534424e+00
./aipp_5_2-2 4  0.00s user 0.00s system 35% cpu 0.005 total


# `parallel for`
* [OpenMP/ex_parallel_for.c](./OpenMP/ex_parallel_for.c)
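
The sketch below shows one way the trapezoidal rule could be written with `parallel for`, letting OpenMP split the loop iterations instead of partitioning them by hand. It assumes the same integrand `f(x) = x * x` used earlier; the function name `trap_parallel_for` is hypothetical and not taken from `ex_parallel_for.c`.

```c
static double f(double x) { return x * x; }

/* Trapezoidal rule where the loop iterations are divided among threads. */
double trap_parallel_for(double a, double b, int n, int thread_count) {
    double h = (b - a) / n;
    double approx = (f(a) + f(b)) / 2.0;
    /* The loop variable i is private by default; approx is combined
     * across threads by the reduction clause. */
    #pragma omp parallel for num_threads(thread_count) reduction(+ : approx)
    for (int i = 1; i <= n - 1; i++) {
        approx += f(a + i * h);
    }
    return h * approx;
}
```

This form only works because each iteration is independent of the others. The loop must also be in the canonical form OpenMP expects, with a trip count known before the loop starts.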

In [11]:
filename = 'ex_parallel_for'
!gcc -g -Wall -fopenmp -o {filename} {filename}.c
!echo 0.0 3.0 1024 | time ./{filename} 4
!rm -f {filename}

With n=1024 trapezoids, estimated integral from 0.000000 to 3.000000 = 9.000004291534424e+00
./ex_parallel_for 4  0.00s user 0.00s system 17% cpu 0.005 total


## Data Dependences
* [OpenMP/pi.c](./OpenMP/pi.c): `private`

Estimating $\pi$:

$$
\pi = 4 \sum_{k=0}^{\infty} \frac{(-1)^{k}}{2k + 1}
$$
$$
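
A minimal sketch of the series above follows, assuming the sign is recomputed from `k` in every iteration so that no value is carried from one iteration to the next. The function name `estimate_pi` is hypothetical. The `private(factor)` clause gives each thread its own copy of the temporary.

```c
/* Estimates pi from the first n terms of the alternating series. */
double estimate_pi(long n, int thread_count) {
    double factor;
    double sum = 0.0;
    #pragma omp parallel for num_threads(thread_count) \
        reduction(+ : sum) private(factor)
    for (long k = 0; k < n; k++) {
        /* Derive the sign from k instead of flipping the previous value,
         * which would create a loop-carried dependence. */
        factor = (k % 2 == 0) ? 1.0 : -1.0;
        sum += factor / (2 * k + 1);
    }
    return 4.0 * sum;
}
```

With `n = 1000` the estimate should be close to 3.14059, in line with the output shown in this section. If `factor` were left shared, threads could overwrite each other's sign between the assignment and the addition.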



In [15]:
filename = 'pi'
!gcc -g -Wall -fopenmp -o {filename} {filename}.c
!time ./{filename} 1000 4
!rm -f {filename}

With n=1000 terms and 4 threads, estimated of pi = 3.140592653839791e+00
./pi 1000 4  0.00s user 0.00s system 20% cpu 0.006 total


## Sorting
* [OpenMP/odd_even_sort.c](./OpenMP/odd_even_sort.c): odd-even transposition sort, `for`
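
The following sketch shows one common way to parallelize odd-even transposition sort: a single `parallel` region wraps the phase loop, and each phase uses a `for` directive so the same team of threads is reused. It is an illustration under that assumption, and the function signature is hypothetical rather than copied from `odd_even_sort.c`.

```c
/* Sorts a[0..n-1] in ascending order with n phases of compare-and-swap. */
void odd_even_sort(int a[], int n, int thread_count) {
    int phase, i, tmp;
    #pragma omp parallel num_threads(thread_count) \
        default(none) shared(a, n) private(i, tmp, phase)
    for (phase = 0; phase < n; phase++) {
        if (phase % 2 == 0) {
            /* Even phase: compare pairs (0,1), (2,3), ... */
            #pragma omp for
            for (i = 1; i < n; i += 2) {
                if (a[i - 1] > a[i]) {
                    tmp = a[i - 1];
                    a[i - 1] = a[i];
                    a[i] = tmp;
                }
            }
        } else {
            /* Odd phase: compare pairs (1,2), (3,4), ... */
            #pragma omp for
            for (i = 1; i < n - 1; i += 2) {
                if (a[i] > a[i + 1]) {
                    tmp = a[i + 1];
                    a[i + 1] = a[i];
                    a[i] = tmp;
                }
            }
        }
    }
}
```

The implicit barrier at the end of each `for` construct keeps the phases in step, so no thread starts phase `p + 1` before every swap of phase `p` has finished. Within a phase the compared pairs do not overlap, which is why the inner loops can run in parallel.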

In [18]:
filename = 'odd_even_sort'
!gcc -g -Wall -fopenmp -o {filename} {filename}.c
!time ./{filename} 4
!rm -f {filename}

With 4 threads, result = 6 7 8 9 
./odd_even_sort 4  0.00s user 0.00s system 36% cpu 0.003 total


In [19]:
# Use one parallel region with inner for directives so the thread team is reused across phases
filename = 'odd_even_sort'
!gcc -g -Wall -fopenmp -o {filename} {filename}.c
!time ./{filename} 4
!rm -f {filename}

With 4 threads, result = 6 7 8 9 
./odd_even_sort 4  0.00s user 0.00s system 24% cpu 0.004 total


## Loop Scheduling
* [OpenMP/ex_schedule.c](./OpenMP/ex_schedule.c): `schedule`
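
To make the effect of the `schedule` clause concrete, the sketch below uses a workload whose cost grows with the iteration index, so a plain block partition would leave the last thread with most of the work. The helper `work` and the function `scheduled_sum` are hypothetical stand-ins. The linked file links against `-lm` and presumably uses a different function.

```c
/* Hypothetical workload: the cost of iteration i is proportional to i. */
static double work(int i) {
    double acc = 0.0;
    for (int j = 1; j <= i; j++) {
        acc += 1.0 / ((double)j * j);
    }
    return acc;
}

/* Sums work(0..n) with iterations dealt out round-robin, one at a time. */
double scheduled_sum(int n, int thread_count) {
    double sum = 0.0;
    #pragma omp parallel for num_threads(thread_count) \
        reduction(+ : sum) schedule(static, 1)
    for (int i = 0; i <= n; i++) {
        sum += work(i);
    }
    return sum;
}
```

With `schedule(static, 1)` each thread receives a mix of cheap and expensive iterations, which tends to balance the load for this kind of triangular workload. The two sums reported in this section differ in their last digits, which is consistent with floating-point additions being grouped differently under each schedule.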

In [26]:
filename = 'ex_schedule'
!gcc -g -Wall -fopenmp -o {filename} {filename}.c -lm
!time ./{filename} 1000 4
!rm -f {filename}

With 4 threads, sum = 1.133390068763746e+00
./ex_schedule 1000 4  0.00s user 0.00s system 68% cpu 0.005 total


In [27]:
# schedule(static, 1) 
filename = 'ex_schedule'
!gcc -g -Wall -fopenmp -o {filename} {filename}.c -lm
!time ./{filename} 1000 4
!rm -f {filename}

With 4 threads, sum = 1.133390068763715e+00
./ex_schedule 1000 4  0.00s user 0.00s system 54% cpu 0.004 total


# Producer-Consumer
* Queue

Constructs:

```c
// queue tail pointer: enqueue, dequeue
#pragma omp critical

// explicit barrier
#pragma omp barrier

// protect critical sections that consist of a single C assignment statement
#pragma omp atomic

// name a critical section
#pragma omp critical(name)

// locks: simple, nested
omp_lock_t
omp_init_lock()
omp_set_lock()
omp_unset_lock()
omp_test_lock()
omp_destroy_lock()

omp_nest_lock_t
omp_init_nest_lock()
omp_set_nest_lock()
omp_unset_nest_lock()
omp_test_nest_lock()
omp_destroy_nest_lock()
```
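
A small sketch of how some of these constructs might fit together in a shared queue is shown below. Everything in it is hypothetical: the fixed-capacity array queue, the names `queue_t`, `enqueue`, `try_dequeue` and `produce_and_consume`, and the choice of a named `critical` section instead of the `omp_lock_t` routines.

```c
#define QUEUE_CAPACITY 1024

typedef struct {
    int items[QUEUE_CAPACITY];
    int head; /* index of the next item to dequeue */
    int tail; /* index of the next free slot */
} queue_t;

/* Appends value; returns 0 if the queue is full. */
static int enqueue(queue_t *q, int value) {
    int ok = 0;
    #pragma omp critical(queue_access)
    {
        if (q->tail < QUEUE_CAPACITY) {
            q->items[q->tail++] = value;
            ok = 1;
        }
    }
    return ok;
}

/* Removes the oldest item into *value; returns 0 if the queue is empty. */
static int try_dequeue(queue_t *q, int *value) {
    int ok = 0;
    #pragma omp critical(queue_access)
    {
        if (q->head < q->tail) {
            *value = q->items[q->head++];
            ok = 1;
        }
    }
    return ok;
}

/* Every thread enqueues 1..items_per_thread, then all threads drain the
 * queue together. Returns the sum of all dequeued values. */
long produce_and_consume(int thread_count, int items_per_thread) {
    queue_t q = {.head = 0, .tail = 0};
    long total = 0;
    #pragma omp parallel num_threads(thread_count)
    {
        for (int i = 1; i <= items_per_thread; i++) {
            enqueue(&q, i);
        }
        /* Wait until every producer is done, so an empty queue means finished. */
        #pragma omp barrier
        int value;
        while (try_dequeue(&q, &value)) {
            #pragma omp atomic
            total += value;
        }
    }
    return total;
}
```

Both queue operations share the name `queue_access`, so they exclude each other but not unrelated critical sections elsewhere in a program. A per-queue `omp_lock_t` would be the natural next step when several queues exist, since one named critical section would serialize access to all of them.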

# Tasking
* [OpenMP/ex_task.c](./OpenMP/ex_task.c): `task`, `taskwait`
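
The sketch below shows a recursive Fibonacci built on `task` and `taskwait`, with an `if (n > 20)` clause like the one mentioned in the notebook comments. It is an assumption about the general shape of such a program, not a copy of `ex_task.c`, and the names `fib_task` and `fib` are illustrative.

```c
/* Recursive Fibonacci where each recursive call may become a task. */
static long fib_task(int n) {
    if (n < 2) {
        return n;
    }
    long x, y;
    /* x and y must be shared so the child tasks write to this frame's variables.
     * When the if clause is false the task is executed immediately by the
     * encountering thread instead of being deferred. */
    #pragma omp task shared(x) if (n > 20)
    x = fib_task(n - 1);
    #pragma omp task shared(y) if (n > 20)
    y = fib_task(n - 2);
    /* Wait for both child tasks before reading x and y. */
    #pragma omp taskwait
    return x + y;
}

/* Starts the recursion from a single thread inside a parallel region. */
long fib(int n, int thread_count) {
    long result = 0;
    #pragma omp parallel num_threads(thread_count)
    {
        #pragma omp single
        result = fib_task(n);
    }
    return result;
}
```

The `single` construct ensures that only one thread starts the recursion while the rest of the team picks up the generated tasks. Each task here does very little work, so the cost of creating and scheduling tasks can outweigh the gain from parallelism, which may be part of why the timings in this section favor fewer threads.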

In [41]:
filename = 'ex_task'
!gcc -g -Wall -fopenmp -o {filename} {filename}.c
!time ./{filename} 35 4
!rm -f {filename}

fib(35) = 9227465
./ex_task 35 4  29.83s user 4.68s system 268% cpu 12.859 total


In [40]:
# task if (n > 20): reduce task creation
filename = 'ex_task'
!gcc -g -Wall -fopenmp -o {filename} {filename}.c
!time ./{filename} 35 4
!rm -f {filename}

fib(35) = 9227465
./ex_task 35 4  13.61s user 1.22s system 268% cpu 5.520 total


In [39]:
# NOTE: Why does a single thread run faster?
# Can we find another example that shows the benefits of the tasking API?
filename = 'ex_task'
!gcc -g -Wall -fopenmp -o {filename} {filename}.c
!time ./{filename} 35 2
!rm -f {filename}

filename = 'ex_task'
!gcc -g -Wall -fopenmp -o {filename} {filename}.c
!time ./{filename} 35 1
!rm -f {filename}

fib(35) = 9227465
./ex_task 35 2  5.08s user 0.56s system 173% cpu 3.258 total
fib(35) = 9227465
./ex_task 35 1  1.09s user 0.00s system 99% cpu 1.097 total
