### Loop Parallelism

Loop parallelism is a form of parallelism and _programming pattern_ that derives parallel tasks from the iterations of loops.

* Most common use and programming pattern for OpenMP
  * add a `parallel for` directive to a `for` loop
  * OpenMP divides the loop's iterations into _chunks_ assigned to threads
* Merits of loop parallelism
  * __Sequential equivalence__: parallel program is equivalent to a serial program (easy to write and maintain, good tools)
  * __Refactoring__: Incremental conversion of a serial program to a parallel program (easy to test and debug)
* Drawbacks of loop parallelism
  * __Memory utilization__: if loop access patterns don’t match the cache hierarchy, programs often require massive restructuring
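
To make sequential equivalence concrete, here is a minimal sketch of a serial loop and its parallel counterpart. The function names `square_serial` and `square_parallel` are hypothetical and chosen only for illustration. The only difference between the two is the directive, and compiling without OpenMP support simply ignores it.

```cpp
#include <vector>

// Serial version: square each element in place.
void square_serial(std::vector<double>& v) {
  const int n = static_cast<int>(v.size());
  for (int i = 0; i < n; i++) {
    v[i] = v[i] * v[i];
  }
}

// Parallel version: the same loop with one added directive.
// Each iteration touches only v[i], so iterations are independent.
void square_parallel(std::vector<double>& v) {
  const int n = static_cast<int>(v.size());
  #pragma omp parallel for
  for (int i = 0; i < n; i++) {
    v[i] = v[i] * v[i];
  }
}
```

Calling both functions on copies of the same vector should give identical results, which is what makes incremental refactoring easy to test.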
  
### #pragma omp parallel for

In [1]:
.L omp

In [None]:
#include <iostream>
#include <omp.h>

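{
  // Split the 100 iterations among the threads in the team.
  #pragma omp parallel for
  for ( int i=0; i<100; i++ )
  {
    // Threads write to std::cout concurrently, so output lines may interleave.
    std::cout << "OMP Thread# " << omp_get_thread_num() << " loop variable " << i << "\n";
  }
}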

OpenMP divides the iterations of the loop into contiguous _chunks_ assigned to threads
  * the number of threads is derived from the environment (e.g. the `OMP_NUM_THREADS` variable)
  * chunks are (by default, in most implementations) contiguous runs of iterations: leads to _coalesced_ and _sequential_ memory access within each thread
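
The chunking can be observed without relying on interleaved console output. The sketch below records which thread ran each iteration. The helper name `iteration_owners` is hypothetical, and `schedule(static)` is requested explicitly because the default schedule is implementation-defined.

```cpp
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: owner[i] is the id of the thread that ran iteration i.
std::vector<int> iteration_owners(int n) {
  std::vector<int> owner(n, -1);
  // schedule(static) with no chunk size: at most one contiguous chunk per thread.
  #pragma omp parallel for schedule(static)
  for (int i = 0; i < n; i++) {
#ifdef _OPENMP
    owner[i] = omp_get_thread_num();
#else
    owner[i] = 0;  // built without OpenMP: a single thread runs every iteration
#endif
  }
  return owner;
}
```

Printing the result of `iteration_owners(100)` should show each thread id as one unbroken run of iterations, which is the contiguous chunk assigned to that thread.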
 
  
### Loop Scheduling

The full loop directive can include a `schedule` clause that specifies a scheduling kind and an optional chunk size
```c
#pragma omp parallel for schedule(kind[, chunk_size])
```
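in which `kind` can be one of:
* `static`: divide the loop into chunks before it runs; without a chunk size, each thread gets one chunk of roughly equal size, otherwise chunks of `chunk_size` iterations are assigned round-robin
* `dynamic`: build an internal work queue and dispatch `chunk_size` iterations at a time (default 1) to threads as they become idle
* `guided`: dynamic scheduling with decreasing chunk size for load balance
* `auto`: the compiler or runtime system chooses the schedule
* `runtime`: the schedule is chosen at run time, e.g. from the `OMP_SCHEDULE` environment variable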
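
As a sketch of when the schedule matters, consider a loop whose iterations do unequal amounts of work. The function `triangular_sums` below is a hypothetical workload in which iteration `i` performs `i + 1` additions, so a plain static split would leave the thread holding the last chunk with the most work. The chunk size of 4 is an arbitrary illustrative choice.

```cpp
#include <vector>

// Hypothetical workload: out[i] = 0 + 1 + ... + i, so later iterations cost more.
std::vector<long long> triangular_sums(int n) {
  std::vector<long long> out(n, 0);
  // Idle threads pull the next 4 iterations from the work queue.
  #pragma omp parallel for schedule(dynamic, 4)
  for (int i = 0; i < n; i++) {
    long long sum = 0;
    for (int j = 0; j <= i; j++) {
      sum += j;
    }
    out[i] = sum;  // each iteration writes only its own element
  }
  return out;
}
```

The result does not depend on the schedule, only the distribution of work does. Replacing the clause with `schedule(runtime)` would let the kind and chunk size be selected through `OMP_SCHEDULE` without recompiling.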

__Notebook__: <a href="openmp/loops.ipynb">Loop Parallelism</a>
