# Shared Memory Parallelism using OpenMP

We want to use OpenMP to run our code in parallel. If several workers (threads) share the same job, it can finish sooner.

### Example

Computing $\pi$ using the Leibniz formula:
$$1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - \cdots = \sum_{k=0}^{\infty} \frac{(-1)^k}{2k+1} = \frac{\pi}{4}$$
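For reference, a minimal serial sketch of this sum in C++. The function name `leibniz_pi` is our own, and `omp_examples/01-demo.cpp` may be written differently:

```cpp
// Approximate pi with the first n terms of the Leibniz series:
// pi/4 = sum_{k=0}^{n-1} (-1)^k / (2k + 1)
double leibniz_pi(long n) {
    double sum = 0.0;
    for (long k = 0; k < n; ++k) {
        // Even k contributes +1/(2k+1), odd k contributes -1/(2k+1)
        double term = 1.0 / (2.0 * k + 1.0);
        sum += (k % 2 == 0) ? term : -term;
    }
    return 4.0 * sum;
}
```

The series converges slowly: after n terms the error in the estimate of $\pi$ is roughly $1/n$, so many iterations are needed, which makes this loop a convenient target for parallelization.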

In [None]:
pygmentize omp_examples/01-demo.cpp

We first compile this C++ code into an executable named `serial`.

In [None]:
g++ omp_examples/01-demo.cpp -o serial

Now we can run the generated executable.

In [None]:
./serial

The result is a reasonable approximation of $\pi$.

Now if we have 12 cores available on our CPU, what is the easiest way to parallelize this?

In [None]:
pygmentize omp_examples/01-demoparallel.cpp
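A common one-line way to parallelize the Leibniz loop is a `parallel for` with a `reduction` clause: every thread gets a private partial sum, and OpenMP adds the partial sums together when the loop ends. This is a sketch under that assumption (function name our own); the pragma in `01-demoparallel.cpp` may differ:

```cpp
// Same series as the serial version; the only change is the pragma.
double leibniz_pi_parallel(long n) {
    double sum = 0.0;
    // Each thread accumulates into a private copy of `sum`;
    // the copies are combined with + at the end of the loop.
    #pragma omp parallel for reduction(+ : sum)
    for (long k = 0; k < n; ++k) {
        double term = 1.0 / (2.0 * k + 1.0);
        sum += (k % 2 == 0) ? term : -term;
    }
    return 4.0 * sum;
}
```

Because floating-point addition is not associative, the parallel result may differ from the serial one in the last few digits and can change with the number of threads.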

We need to tell the compiler to process the OpenMP pragmas with the `-fopenmp` option; without it, GCC ignores the pragmas and builds a serial program.

In [None]:
g++ -fopenmp omp_examples/01-demoparallel.cpp -o parallel
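Besides enabling the pragmas, `-fopenmp` defines the preprocessor macro `_OPENMP`, whose value encodes the supported OpenMP version as a `yyyymm` date. A small sketch (function names are hypothetical) that lets a program report how it was built:

```cpp
#include <cstdio>

// Returns the value of _OPENMP (e.g. 201511 for OpenMP 4.5),
// or 0 if the code was compiled without OpenMP support.
int openmp_version() {
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}

// Print a one-line summary of the build mode.
void report_build() {
    if (openmp_version() == 0)
        std::printf("compiled without OpenMP: pragmas are ignored\n");
    else
        std::printf("compiled with OpenMP %d\n", openmp_version());
}
```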

We set the number of threads with the `OMP_NUM_THREADS` environment variable. Threads are not the same as cores, but one thread per core is a natural starting point.
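Inside a program, the OpenMP runtime exposes this setting: `omp_get_num_threads()` returns the size of the current thread team. A sketch (function name our own) that reports how many threads actually run a parallel region; the `_OPENMP` guard keeps it buildable without `-fopenmp`:

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Number of threads executing a parallel region without a num_threads
// clause, i.e. as chosen by OMP_NUM_THREADS or the runtime default.
int team_size() {
    int n = 1;  // serial fallback when OpenMP is disabled
#ifdef _OPENMP
    #pragma omp parallel
    {
        // Only one thread writes the shared variable.
        #pragma omp single
        n = omp_get_num_threads();
    }
#endif
    return n;
}
```

Run as `OMP_NUM_THREADS=12 ./parallel`, a call to `team_size()` would typically return 12.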

In [None]:
OMP_NUM_THREADS=12 ./parallel

Let's check the performance impact of this one-line pragma change by running each version ten times:

In [None]:
time for i in {1..10}; do ./serial; done
time for i in {1..10}; do OMP_NUM_THREADS=12 ./parallel; done

And a bit of cleanup

In [None]:
rm serial parallel

# Amdahl's law

We look at the performance of the simple code above for different numbers of threads (slightly changed for better output readability):
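The plots below read `out.txt` as two columns: the thread count and the runtime. A sketch of a timing harness that produces this format, assuming the thread count is passed in and the runtime is measured with `std::chrono`; the function names and details are our guesses, not taken from `02-timing.cpp`:

```cpp
#include <chrono>
#include <cstdio>

// Time one evaluation of the Leibniz sum with a fixed number of threads.
// Returns the wall-clock runtime in seconds; `pi_out` receives the result.
double time_leibniz(int threads, long n, double& pi_out) {
    auto start = std::chrono::steady_clock::now();
    double sum = 0.0;
    // num_threads overrides OMP_NUM_THREADS for this region only
    #pragma omp parallel for num_threads(threads) reduction(+ : sum)
    for (long k = 0; k < n; ++k) {
        double term = 1.0 / (2.0 * k + 1.0);
        sum += (k % 2 == 0) ? term : -term;
    }
    auto stop = std::chrono::steady_clock::now();
    pi_out = 4.0 * sum;
    return std::chrono::duration<double>(stop - start).count();
}

// Print "<threads> <seconds>", the two-column format the plots read.
void print_timing(int threads, long n) {
    double pi = 0.0;
    double seconds = time_leibniz(threads, n, pi);
    std::printf("%d %f\n", threads, seconds);
}
```

A `main` would then parse the thread count from `argv[1]` (for example with `std::atoi`) and call `print_timing`.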

In [None]:
pygmentize omp_examples/02-timing.cpp

In [None]:
g++ -fopenmp omp_examples/02-timing.cpp -o timing

In [None]:
for n in {1..12}; do ./timing $n; done > out.txt

In [None]:
gnuplot -e "\
set terminal png; \
set style fill solid; \
set yrange[0:0.1]; \
set xlabel '# cores'; \
set ylabel 'runtime [s]'; \
plot 'out.txt' using 2: xtic(1) title 'runtime' with histogram \
" | display

In [None]:
base=`head -1 out.txt | awk '{print $2}'`
gnuplot -e "\
set terminal png; \
set style fill solid; \
set yrange[0:14]; \
set xlabel '# cores'; \
set ylabel 'speedup (relative to 1 core)'; \
plot 'out.txt' using ($base/\$2): xtic(1) title 'speedup' with histogram, \
'out.txt' using 0:(\$1) title 'linear' with lines\
" | display
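The measured speedup typically stays below the linear line. Amdahl's law explains part of this: if a fraction $p$ of the runtime can be parallelized and the rest stays serial, the best possible speedup on $n$ threads is $S(n) = \frac{1}{(1-p) + p/n}$, which approaches $1/(1-p)$ as $n$ grows. A small helper (name our own) for comparing against the plot:

```cpp
// Amdahl's law: ideal speedup on n workers when a fraction p
// (0 <= p <= 1) of the serial runtime is perfectly parallelizable.
double amdahl_speedup(double p, int n) {
    return 1.0 / ((1.0 - p) + p / n);
}
```

For example, with $p = 0.95$ and 12 threads the bound is $1/(0.05 + 0.95/12) \approx 7.7$, well short of 12. Overheads such as starting threads can push the measured speedup lower still.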

And a bit of cleanup

In [None]:
rm out.txt timing

# Race conditions

Since the threads share memory, they can interfere with each other, so we need to be careful.

In [None]:
pygmentize omp_examples/03-race.cpp

What happens if we write to the same memory location with more than one thread?
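A minimal sketch of the problem (not necessarily what `03-race.cpp` does): every thread increments the same counter without synchronization. `counter++` is a read-modify-write, so two threads can read the same old value and one increment is lost. Formally this is a data race, which is undefined behavior in C++; in practice the total usually comes out too small with several threads. The `reduction` variant avoids the race (function names are our own):

```cpp
// Unsynchronized increments of a shared variable: a data race.
// Shown only to illustrate the bug; the result is unpredictable.
long count_racy(long n) {
    long counter = 0;
    #pragma omp parallel for
    for (long i = 0; i < n; ++i)
        counter++;  // read-modify-write by many threads at once
    return counter;
}

// Correct version: each thread counts privately, totals are combined at the end.
long count_reduction(long n) {
    long counter = 0;
    #pragma omp parallel for reduction(+ : counter)
    for (long i = 0; i < n; ++i)
        counter++;
    return counter;
}
```

With optimizations enabled, the compiler may keep the counter in a register, which can hide the lost updates; the effect is easiest to see with the default (unoptimized) build used here.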

In [None]:
g++ -fopenmp omp_examples/03-race.cpp -o test

In [None]:
OMP_NUM_THREADS=10 ./test

Races are not limited to variables defined outside the parallel region; they can show up in less obvious places too:
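One example of a less obvious race (not necessarily the one in `03-race2.cpp`) is hidden library state: the C standard does not require `rand()` to avoid data races, since it updates an internal seed shared by all callers. A sketch of one way around it, giving every iteration its own generator so the total does not depend on the thread count (the function name and seeding scheme are our own choices):

```cpp
#include <random>

// Sum n pseudo-random integers in [0, 99] without shared generator state.
// Each iteration seeds its own minstd_rand from (seed + i), so the draws,
// and therefore the total, are the same for any number of threads.
long long sum_random_draws(long n, unsigned seed, bool parallel) {
    long long total = 0;
    #pragma omp parallel for reduction(+ : total) if (parallel)
    for (long i = 0; i < n; ++i) {
        std::minstd_rand gen(seed + static_cast<unsigned>(i));
        std::uniform_int_distribution<int> dist(0, 99);
        total += dist(gen);
    }
    return total;
}
```

Seeding a fresh engine per iteration is slow; one engine per thread is faster, but then the drawn values depend on which thread runs which iteration.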

In [None]:
pygmentize omp_examples/03-race2.cpp

In [None]:
g++ -fopenmp omp_examples/03-race2.cpp -o output

In [None]:
./output

And a bit of cleanup

In [None]:
rm -f test output

# Synchronization

Options to prevent race conditions are:
- Ensure only one thread is in the critical region at once
- Make writes atomic

### Ensure only one thread is in the critical region

In [None]:
pygmentize omp_examples/04-critical.cpp

In [None]:
g++ -fopenmp omp_examples/04-critical.cpp -o test
./test

In [None]:
pygmentize omp_examples/04-ordered.cpp
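A sketch of the `ordered` construct (not necessarily identical to `04-ordered.cpp`; function name our own): the block marked `ordered` runs one iteration at a time in sequential order, while the rest of the loop body still runs in parallel. The loop itself must carry the `ordered` clause:

```cpp
#include <vector>

// Square 0..n-1 in parallel, but append the results in iteration order.
std::vector<long> ordered_squares(long n) {
    std::vector<long> out;
    out.reserve(n);
    // static,1 hands out iterations round-robin, so threads can overlap
    // the unordered part; `ordered` permits the ordered region below.
    #pragma omp parallel for ordered schedule(static, 1)
    for (long i = 0; i < n; ++i) {
        long sq = i * i;  // computed in parallel, in any order
        // Executed one iteration at a time, in order i = 0, 1, 2, ...
        #pragma omp ordered
        out.push_back(sq);
    }
    return out;
}
```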

In [None]:
g++ -fopenmp omp_examples/04-ordered.cpp -o test
./test

In [None]:
pygmentize omp_examples/04-flush.cpp

In [None]:
g++ -fopenmp omp_examples/04-flush.cpp -o test
./test

And a bit of cleanup

In [None]:
rm -f test

## Caching

Here we look at the implications of caching in a multithreaded environment:
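A common caching effect in multithreaded code is false sharing: when threads update different variables that sit on the same cache line, the line bounces between cores even though no data is actually shared. A sketch comparing per-thread counters packed next to each other with counters padded to separate cache lines. We assume 64-byte lines (typical on x86-64), the names are our own, and `05-caching.cpp` may measure something else:

```cpp
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Adjacent slots: several threads' counters share one cache line.
struct Packed {
    long value = 0;
};

// One slot per (assumed) 64-byte cache line.
struct alignas(64) Padded {
    long value = 0;
};

// Count n iterations using one slot per thread, then sum the slots.
template <typename Slot>
long count_per_thread(long n, int threads) {
    std::vector<Slot> slots(threads);
    #pragma omp parallel num_threads(threads)
    {
        int id = 0;
#ifdef _OPENMP
        id = omp_get_thread_num();
#endif
        #pragma omp for
        for (long i = 0; i < n; ++i)
            slots[id].value++;  // each thread writes only its own slot
    }
    long total = 0;
    for (const Slot& s : slots) total += s.value;
    return total;
}
```

Timing `count_per_thread<Packed>` against `count_per_thread<Padded>` with several threads, the padded version is typically faster in an unoptimized build; an optimizing compiler may keep the counter in a register and hide the effect. A `std::vector` of an over-aligned type needs C++17.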

In [None]:
pygmentize omp_examples/05-caching.cpp

In [None]:
g++ -fopenmp omp_examples/05-caching.cpp -o timing

In [None]:
for n in 1 2 4 8 12 13 20 24; do ./timing $n; done > caching.txt

In [None]:
gnuplot -e "\
set terminal png; \
set style fill solid; \
set yrange[0:0.1]; \
plot 'caching.txt' using 2: xtic(1) with histogram \
" | display

And a bit of cleanup

In [None]:
rm -f timing caching.txt