# Lab2: Parallelism Generation in Multithreading

The objective of this lab is to better understand parallelism generation in OpenMP. Parallelism generation is the ability to create workers. Some of these concepts were briefly introduced in [lab1](../Lab1/lab1.ipynb). In this lab we describe them in more depth.

## Parallel regions
Since its origin, OpenMP has focused on parallel regions. Parallel regions are created with the `#pragma omp parallel` directive, whose purpose is to generate a collection of software threads, each entering the same region of code.

The `parallel` construct generates a `team of threads` containing one or more threads. Several factors influence how many threads are created, and the user has some control over them. One factor that we cannot control as users is the system limitation. OpenMP is portable across systems; however, each system may impose limits on how many threads can be generated. For example, an embedded system with a lightweight operating system may not allow thread oversubscription (see lab 1 for details on this), a batch scheduler may restrict the number of threads in an application, or an OpenMP compiler implementation may cap the number of threads it can handle.

There are two rules of thumb:
1. The number of threads used by the application is not guaranteed to equal the number of threads requested. The actual number may be lower, and in some cases it is implementation defined. Applications should not rely on a specific number of threads, as this limits portability. (Use discretion here, as there are good reasons to break this rule.)
2. Oversubscribing the system is not recommended when all the threads are constantly busy, as this has been shown to reduce application performance. Still, some applications oversubscribe on purpose when some of their threads do little work.

Elements that affect the number of threads:
* The `OMP_NUM_THREADS` environment variable
* The `omp_set_num_threads()` API call
* The `num_threads()` clause of the `parallel` construct
* The `if()` clause of the `parallel` construct
* The `OMP_DYNAMIC` environment variable
* The `omp_set_dynamic()` API call
* The `thread_limit()` clause in the `teams` directive
* The `OMP_THREAD_LIMIT` and `OMP_TEAMS_THREAD_LIMIT` environment variables
* The `OMP_NESTED` and `OMP_MAX_ACTIVE_LEVELS` environment variables

In this section we will try to cover most of these. For the examples in this section, we assume a system that can supply at least as many threads as we request.

### OMP_NUM_THREADS environment variable
This variable sets the default number of threads to be used by `parallel` regions inside the OpenMP code.

Consider the following hello world code:

```C
#pragma omp parallel
{
    printf("Hello from %d out of %d\n", omp_get_thread_num(), omp_get_num_threads());
}
```

The number of threads can be changed at run time through the `OMP_NUM_THREADS` environment variable, without the need to recompile.

In [None]:
# First, let's compile it
!clang -fopenmp C/hello_world.c -o C/hello_world.exe

In [None]:
# Now let's run it with different values

#Running with default num threads. Implementation defined
!C/./hello_world.exe

In [None]:
#Running with different num threads modifying OMP_NUM_THREADS
!OMP_NUM_THREADS=3 C/./hello_world.exe

In [None]:
#Running with different num threads modifying OMP_NUM_THREADS
!OMP_NUM_THREADS=5 C/./hello_world.exe

Note:
```
How environment variables are set depends on the shell you're using. Here I am assuming bash and I am using the inline format. In bash you can also use the `export` command before running the program. Use your favorite shell, but change the commands accordingly.
```

You can check this code in [hello_world.c](C/hello_world.c).

### OMP_THREAD_LIMIT environment variable

This environment variable imposes an upper limit on the number of threads that are used, regardless of the number of threads requested. Using the same example as above, we can set both environment variables to demonstrate the precedence order.

In [None]:
#Building the hello world program
!clang -fopenmp C/hello_world.c -o C/hello_world.exe

In [None]:
# Running it with OMP_THREAD_LIMIT
!OMP_THREAD_LIMIT=4 OMP_NUM_THREADS=1000 C/./hello_world.exe

## Programs relying on number of threads
Notice that it is possible to create programs that deadlock or fail when they receive fewer threads than they expect. Take for example the following code:

```C
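int a_var = 0;
#pragma omp parallel shared(a_var)
{
    if (omp_get_thread_num() == 0) {
        // Thread 0 spins until some other thread increments a_var.
        // With a team of one thread, this loop never ends.
        int seen = 0;
        while (seen == 0) {
            #pragma omp atomic read
            seen = a_var;
        }
    } else {
        #pragma omp atomic
        a_var++;
    }
}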
```

Really important note:
```
The example above is terrible code, meant only to demonstrate that it is possible to write programs that rely on a given number of threads; in this case, there must be more than one. Please use caution when using this code, and don't blame me.
```

