# OpenMP and MPI Interactive Tutorial

## Part 1: OpenMP

### 2. Basics of OpenMP

#### What is OpenMP?
OpenMP is a parallel programming model for shared-memory systems. Programmers annotate existing code with compiler directives (`#pragma omp ...` in C and C++), so a serial program can be parallelized incrementally. Most modern C, C++, and Fortran compilers support OpenMP.

#### OpenMP Execution Model (Fork-Join)
- The OpenMP execution model follows a **fork-join** pattern:
  1. The program starts as a single **master thread**.
  2. When a parallel region is encountered, the master thread **forks** a team of threads and becomes thread 0 of that team.
  3. The threads in the team execute the region concurrently.
  4. At the end of the parallel region, the threads synchronize and **join**; only the master thread continues.

#### Writing a Simple "Hello, World!" Program in OpenMP
Let's write a basic OpenMP program to print messages from multiple threads:

```c
#include <omp.h>
#include <stdio.h>

int main() {
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        printf("Hello from thread %d\n", id);
    }
    return 0;
}
```

#### Explanation:
- `#pragma omp parallel` creates a parallel region where multiple threads execute the enclosed block.
- `omp_get_thread_num()` returns the calling thread's number within the current team, an integer from 0 (the master thread) to the team size minus 1.

**Note**: Executing the following cell creates the file `hello_world.c`.

In [1]:
%%file hello_world.c
#include <omp.h>
#include <stdio.h>

int main() {
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        printf("Hello from thread %d\n", id);
    }
    return 0;
}

Writing hello_world.c


#### Running the Code
To compile and run an OpenMP program, use the following commands in a terminal. Running the notebook cell below achieves the same result:
```bash
$ gcc -fopenmp hello_world.c -o hello_world
$ ./hello_world
```

In [2]:
!gcc -fopenmp hello_world.c -o hello_world
!./hello_world

Hello from thread 0
Hello from thread 1


#### Exercise:
Modify the Hello World program to print the **total number of threads** using `omp_get_num_threads()`. The cell below contains the unmodified program as a starting point.

In [1]:
%%file thread_count.c
#include <omp.h>
#include <stdio.h>

int main() {
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        printf("Hello from thread %d\n", id);
    }
    return 0;
}

Writing thread_count.c


In [2]:
!gcc -fopenmp thread_count.c -o thread_count
!./thread_count

Hello from thread 0
Hello from thread 2
Hello from thread 1
Hello from thread 3


### 3. OpenMP Directives and Constructs

OpenMP provides various directives to control parallel execution efficiently. These directives are used to manage threads, workload distribution, synchronization, and data-sharing attributes.

#### 3.1 Parallel Directive
The `#pragma omp parallel` directive is the fundamental construct in OpenMP, which creates a parallel region where multiple threads execute the enclosed code block.

Example:
```c
#pragma omp parallel
{
    int id = omp_get_thread_num();
    printf("Executing in parallel%d\n", id);
}
```

**Note**: The order of the output lines can vary from run to run, because it depends on how the threads are scheduled.

In [3]:
%%file parallel_directive.c
#include <omp.h>
#include <stdio.h>

int main() {
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        printf("Executing in parallel%d\n", id);
    }
    return 0;
}

Writing parallel_directive.c


In [4]:
!gcc -fopenmp parallel_directive.c -o parallel_directive
!./parallel_directive

Executing in parallel1
Executing in parallel0
Executing in parallel3
Executing in parallel2


#### 3.2 Work Sharing Constructs
OpenMP provides multiple constructs to divide tasks among threads:
- `#pragma omp for`: Distributes loop iterations across threads.
- `#pragma omp sections`: Assigns independent code blocks to different threads.
- `#pragma omp single`: Ensures that a block of code is executed by only one thread.
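- `#pragma omp master`: Restricts a block to the master thread (thread 0). Strictly speaking this is not a worksharing construct, and unlike `single` it has no implied barrier.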

Example:
```c
#pragma omp parallel
{
    #pragma omp for
    for (int i = 0; i < 10; i++) {
        printf("Iteration %d handled by thread %d\n", i, omp_get_thread_num());
    }
}
```

In [9]:
%%file work_sharing.c
#include <omp.h>
#include <stdio.h>

int main() {
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < 10; i++) {
            printf("Iteration %d handled by thread %d\n", i, omp_get_thread_num());
        }
    }
}

Writing work_sharing.c


In [10]:
!gcc -fopenmp work_sharing.c -o work_sharing
!./work_sharing

Iteration 0 handled by thread 0
Iteration 1 handled by thread 0
Iteration 2 handled by thread 0
Iteration 5 handled by thread 1
Iteration 6 handled by thread 1
Iteration 7 handled by thread 1
Iteration 8 handled by thread 1
Iteration 9 handled by thread 1
Iteration 3 handled by thread 0
Iteration 4 handled by thread 0


#### 3.3 Synchronization Constructs
To prevent race conditions and ensure safe access to shared resources, OpenMP provides synchronization directives:
- `#pragma omp critical`: Ensures that a section of code is executed by only one thread at a time.
- `#pragma omp atomic`: Ensures that a specific operation is performed atomically.
- `#pragma omp barrier`: Synchronizes all threads at a specific point.

Example:
```c
int sum = 0;
#pragma omp parallel for
for (int i = 0; i < 10; i++) {
    #pragma omp critical
    {
        sum += i;
    }
}
```

In [11]:
%%file synchronization_constructs.c
#include <omp.h>
#include <stdio.h>

int main() {
    int sum = 0;
    #pragma omp parallel for
    for (int i = 0; i < 10; i++) {
        #pragma omp critical
        {
            sum += i;
            printf("Iteration %d handled by thread %d, sum is: %d\n", i, omp_get_thread_num(), sum);
        }
    }
}

Writing synchronization_constructs.c


In [12]:
!gcc -fopenmp synchronization_constructs.c -o synchronization_constructs
!./synchronization_constructs

Iteration 0 handled by thread 0, sum is: 0
Iteration 1 handled by thread 0, sum is: 1
Iteration 2 handled by thread 0, sum is: 3
Iteration 3 handled by thread 0, sum is: 6
Iteration 4 handled by thread 0, sum is: 10
Iteration 5 handled by thread 1, sum is: 15
Iteration 6 handled by thread 1, sum is: 21
Iteration 7 handled by thread 1, sum is: 28
Iteration 8 handled by thread 1, sum is: 36
Iteration 9 handled by thread 1, sum is: 45


#### 3.4 Data Sharing Attributes
OpenMP allows controlling variable scope across threads:
- `shared`: Variables are accessible by all threads.
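- `private`: Each thread has its own copy of the variable; the copy is uninitialized on entry to the region.
- `firstprivate`: Each thread gets a private copy initialized with the original value.
- `lastprivate`: After the construct, the original variable receives the value from the sequentially last loop iteration (or the lexically last `section`), regardless of which thread executed it.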

Example:
```c
int x = 10;
#pragma omp parallel private(x)
{
    x = omp_get_thread_num();
    printf("Thread %d, x = %d\n", omp_get_thread_num(), x);
}
```

In [13]:
%%file sharing_attributes.c
#include <omp.h>
#include <stdio.h>

int main() {
    int x = 10;
    #pragma omp parallel private(x)
    {
        x = omp_get_thread_num();
        printf("Thread %d, x = %d\n", omp_get_thread_num(), x);
    }   
}

Writing sharing_attributes.c


In [14]:
!gcc -fopenmp sharing_attributes.c -o sharing_attributes
!./sharing_attributes

Thread 0, x = 0
Thread 1, x = 1


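#### 3.5 Reduction Clause
The `reduction` clause gives each thread a private copy of the variable, initialized to the operator's identity value (0 for `+`), and combines the copies into the original variable at the end of the construct.

Example: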
```c
int sum = 0;
#pragma omp parallel for reduction(+:sum)
for (int i = 0; i < 10; i++) {
    sum += i;
}
```

These directives and constructs allow efficient parallel execution while maintaining data consistency. Next, we will look at synchronization in more detail.

In [15]:
%%file reduction_directive.c
#include <omp.h>
#include <stdio.h>

int main() {
    int sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < 10; i++) {
        sum += i;
        printf("Thread %d, i = %d, sum = %d\n", omp_get_thread_num(), i, sum);
    }
    printf("Sum = %d\n", sum);
}

Writing reduction_directive.c


In [16]:
!gcc -fopenmp reduction_directive.c -o reduction_directive
!./reduction_directive

Thread 0, i = 0, sum = 0
Thread 0, i = 1, sum = 1
Thread 0, i = 2, sum = 3
Thread 0, i = 3, sum = 6
Thread 0, i = 4, sum = 10
Thread 1, i = 5, sum = 5
Thread 1, i = 6, sum = 11
Thread 1, i = 7, sum = 18
Thread 1, i = 8, sum = 26
Thread 1, i = 9, sum = 35
Sum = 45


### 4. Synchronization in OpenMP

Synchronization is crucial in OpenMP to ensure that multiple threads can work together without data races or inconsistencies. OpenMP provides various synchronization constructs to manage access to shared resources.

#### 4.1 Critical Sections
The `#pragma omp critical` directive ensures that only one thread at a time executes the enclosed code block, preventing race conditions.

Example:
```c
#include <omp.h>
#include <stdio.h>

int main() {
    int sum = 0;
    #pragma omp parallel for
    for (int i = 0; i < 10; i++) {
        #pragma omp critical
        {
            sum += i;
            printf("Thread %d added %d, total sum = %d\n", omp_get_thread_num(), i, sum);
        }
    }
    return 0;
}
```



In [17]:
%%file critical_section.c
#include <omp.h>
#include <stdio.h>

int main() {
    int sum = 0;
    #pragma omp parallel for
    for (int i = 0; i < 10; i++) {
        #pragma omp critical
        {
            sum += i;
            printf("Thread %d added %d, total sum = %d\n", omp_get_thread_num(), i, sum);
        }
    }
    return 0;
}

Writing critical_section.c


In [18]:
!gcc -fopenmp critical_section.c -o critical_section
!./critical_section

Thread 0 added 0, total sum = 0
Thread 0 added 1, total sum = 1
Thread 0 added 2, total sum = 3
Thread 0 added 3, total sum = 6
Thread 0 added 4, total sum = 10
Thread 1 added 5, total sum = 15
Thread 1 added 6, total sum = 21
Thread 1 added 7, total sum = 28
Thread 1 added 8, total sum = 36
Thread 1 added 9, total sum = 45


#### 4.2 Atomic Operations
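The `#pragma omp atomic` directive ensures that a single memory update, such as `sum += i`, is performed atomically. It typically has lower overhead than a critical section, but it protects only that one statement. In the cell below, the `printf` call reads `sum` outside the atomic update, so the totals it prints can already include additions made by other threads.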

Example:
```c
int sum = 0;
#pragma omp parallel for
for (int i = 0; i < 10; i++) {
    #pragma omp atomic
    sum += i;
}
```



In [19]:
%%file atomic_operation.c
#include <omp.h>
#include <stdio.h>

int main() {
    int sum = 0;
    #pragma omp parallel for
    for (int i = 0; i < 10; i++) {
        #pragma omp atomic
        sum += i;
        printf("Thread %d added %d, total sum = %d\n", omp_get_thread_num(), i, sum);
    }
    return 0;
}

Writing atomic_operation.c


In [20]:
!gcc -fopenmp atomic_operation.c -o atomic_operation
!./atomic_operation

Thread 0 added 0, total sum = 0
Thread 0 added 1, total sum = 6
Thread 0 added 2, total sum = 8
Thread 0 added 3, total sum = 11
Thread 0 added 4, total sum = 15
Thread 1 added 5, total sum = 5
Thread 1 added 6, total sum = 21
Thread 1 added 7, total sum = 28
Thread 1 added 8, total sum = 36
Thread 1 added 9, total sum = 45


#### 4.3 Barriers
The `#pragma omp barrier` directive forces all threads to wait at a synchronization point before proceeding.

Example:
```c
#pragma omp parallel
{
    printf("Thread %d reached barrier.\n", omp_get_thread_num());
    #pragma omp barrier
    printf("Thread %d passed barrier.\n", omp_get_thread_num());
}
```



In [21]:
%%file barriers.c
#include <omp.h>
#include <stdio.h>

int main() {
    #pragma omp parallel
    {
        printf("Thread %d reached barrier.\n", omp_get_thread_num());
        #pragma omp barrier
        printf("Thread %d passed barrier.\n", omp_get_thread_num());
    }
    return 0;
}

Writing barriers.c


In [22]:
!gcc -fopenmp barriers.c -o barriers
!./barriers

Thread 0 reached barrier.
Thread 1 reached barrier.
Thread 1 passed barrier.
Thread 0 passed barrier.


#### 4.4 Ordered Execution
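The `#pragma omp ordered` directive ensures that the enclosed block is executed in the same order as the iterations of a sequential loop. The enclosing loop directive must carry the `ordered` clause, and code outside the ordered block can still run in parallel.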

Example:
```c
#pragma omp parallel for ordered
for (int i = 0; i < 10; i++) {
    #pragma omp ordered
    printf("Iteration %d executed by thread %d\n", i, omp_get_thread_num());
}
```



In [23]:
%%file ordered_execution.c
#include <omp.h>
#include <stdio.h>

int main() {
    #pragma omp parallel for ordered
    for (int i = 0; i < 10; i++) {
        #pragma omp ordered
        printf("Iteration %d executed by thread %d\n", i, omp_get_thread_num());
    }
    return 0;
}

Writing ordered_execution.c


In [24]:
!gcc -fopenmp ordered_execution.c -o ordered_execution
!./ordered_execution

Iteration 0 executed by thread 0
Iteration 1 executed by thread 0
Iteration 2 executed by thread 0
Iteration 3 executed by thread 0
Iteration 4 executed by thread 0
Iteration 5 executed by thread 1
Iteration 6 executed by thread 1
Iteration 7 executed by thread 1
Iteration 8 executed by thread 1
Iteration 9 executed by thread 1


#### 4.5 Locks
OpenMP provides explicit locking mechanisms for fine-grained control using `omp_lock_t`.

Example:
```c
omp_lock_t lock;
omp_init_lock(&lock);
#pragma omp parallel
{
    omp_set_lock(&lock);
    printf("Thread %d has acquired the lock.\n", omp_get_thread_num());
    omp_unset_lock(&lock);
}
omp_destroy_lock(&lock);
```

These synchronization techniques ensure safe and efficient parallel execution in OpenMP programs. Next, we will explore **Work Sharing Constructs** in OpenMP!


In [25]:
%%file locks.c
#include <omp.h>
#include <stdio.h>

int main() {
    omp_lock_t lock;
    omp_init_lock(&lock);
    #pragma omp parallel
    {
        omp_set_lock(&lock);
        printf("Thread %d has acquired the lock.\n", omp_get_thread_num());
        omp_unset_lock(&lock);
    }
    omp_destroy_lock(&lock);
    return 0;
}

Writing locks.c


In [26]:
!gcc -fopenmp locks.c -o locks
!./locks

Thread 0 has acquired the lock.
Thread 1 has acquired the lock.


### 5. Work Sharing Constructs

Work sharing constructs in OpenMP distribute work among the threads of a team. This section covers loop parallelization, sections, and single-thread execution, along with the related `master` and `task` constructs, which are not worksharing constructs in the strict sense but are used in similar situations.

#### 5.1 `#pragma omp for`
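The `#pragma omp for` directive distributes loop iterations among the threads of the enclosing parallel region. The example uses the combined form `#pragma omp parallel for`, which is shorthand for a parallel region that contains a single `for` construct.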

Example:
```c
#include <omp.h>
#include <stdio.h>

int main() {
    #pragma omp parallel for
    for (int i = 0; i < 10; i++) {
        printf("Thread %d processing iteration %d\n", omp_get_thread_num(), i);
    }
    return 0;
}
```

In [27]:
%%file omp_for.c
#include <omp.h>
#include <stdio.h>

int main() {
    #pragma omp parallel for
    for (int i = 0; i < 10; i++) {
        printf("Thread %d processing iteration %d\n", omp_get_thread_num(), i);
    }
    return 0;
}

Writing omp_for.c


In [28]:
!gcc -fopenmp omp_for.c -o omp_for
!./omp_for

Thread 0 processing iteration 0
Thread 0 processing iteration 1
Thread 0 processing iteration 2
Thread 0 processing iteration 3
Thread 0 processing iteration 4
Thread 1 processing iteration 5
Thread 1 processing iteration 6
Thread 1 processing iteration 7
Thread 1 processing iteration 8
Thread 1 processing iteration 9


#### 5.2 `#pragma omp sections`
The `#pragma omp sections` directive allows independent code blocks to be executed by different threads.

Example:
```c
#pragma omp parallel
{
    #pragma omp sections
    {
        #pragma omp section
        printf("Thread %d executing section A\n", omp_get_thread_num());

        #pragma omp section
        printf("Thread %d executing section B\n", omp_get_thread_num());
    }
}
```

In [29]:
%%file omp_sections.c
#include <omp.h>
#include <stdio.h>

int main() {
    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            printf("Thread %d executing section A\n", omp_get_thread_num());

            #pragma omp section
            printf("Thread %d executing section B\n", omp_get_thread_num());
        }
    }
}

Writing omp_sections.c


In [30]:
!gcc -fopenmp omp_sections.c -o omp_sections
!./omp_sections

Thread 0 executing section A
Thread 1 executing section B


#### 5.3 `#pragma omp single`
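The `#pragma omp single` directive ensures that only one thread (not necessarily the master) executes a block of code. The other threads wait at the implicit barrier at the end of the block, unless a `nowait` clause is given.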

Example:
```c
#pragma omp parallel
{
    #pragma omp single
    {
        printf("Thread %d is executing the single block.\n", omp_get_thread_num());
    }
}
```

In [31]:
%%file omp_single.c
#include <omp.h>
#include <stdio.h>

int main() {
    #pragma omp parallel
    {
        #pragma omp single
        {
            printf("Thread %d is executing the single block.\n", omp_get_thread_num());
        }
    }
}

Writing omp_single.c


In [32]:
!gcc -fopenmp omp_single.c -o omp_single
!./omp_single

Thread 0 is executing the single block.


#### 5.4 `#pragma omp master`
The `#pragma omp master` directive restricts a block of code to the master thread (thread 0). Unlike `single`, it has no implied barrier, so the other threads skip the block and continue without waiting.

Example:
```c
#pragma omp parallel
{
    #pragma omp master
    {
        printf("Master thread executing this block.\n");
    }
}
```

In [33]:
%%file omp_master.c
#include <omp.h>
#include <stdio.h>

int main() {
    #pragma omp parallel
    {
        #pragma omp master
        {
            printf("Master thread executing this block.\n");
        }
    }
}

Writing omp_master.c


In [34]:
!gcc -fopenmp omp_master.c -o omp_master
!./omp_master

Master thread executing this block.


#### 5.5 `#pragma omp task`
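The `#pragma omp task` directive creates a unit of work dynamically. Any thread in the team may execute the task, either immediately or at a later point. Tasks are usually created inside a `single` block so that each task is generated only once.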

Example:
```c
#pragma omp parallel
{
    #pragma omp single
    {
        #pragma omp task
        printf("Thread %d executing task 1\n", omp_get_thread_num());
        
        #pragma omp task
        printf("Thread %d executing task 2\n", omp_get_thread_num());
    }
}
```

These constructs enable efficient parallelism by distributing computation among threads in a controlled manner. Next, we will work through **hands-on exercises**!


In [35]:
%%file omp_task.c
#include <omp.h>
#include <stdio.h>

int main() {
    #pragma omp parallel
    {
        #pragma omp single
        {
            #pragma omp task
            printf("Thread %d executing task 1\n", omp_get_thread_num());
        
            #pragma omp task
            printf("Thread %d executing task 2\n", omp_get_thread_num());
        }
    }
}

Writing omp_task.c


In [36]:
!gcc -fopenmp omp_task.c -o omp_task
!./omp_task

Thread 1 executing task 1
Thread 0 executing task 2


### 6. Hands-on Exercises for OpenMP

To reinforce the concepts learned in OpenMP, we will go through several hands-on exercises that demonstrate different parallelization techniques. These exercises will help develop practical skills in writing and optimizing OpenMP programs.

#### Exercise 1: Parallelizing a Simple Loop
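**Task:** Parallelize a simple loop using OpenMP. The solution below adds `#pragma omp parallel for` to the sequential loop; removing that line gives the original sequential version.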

```c
#include <stdio.h>
#include <omp.h>

int main() {
    int i;
    #pragma omp parallel for
    for (i = 0; i < 10; i++) {
        printf("Thread %d processing iteration %d\n", omp_get_thread_num(), i);
    }
    return 0;
}
```


In [37]:
%%file exercise_1.c
#include <stdio.h>
#include <omp.h>

int main() {
    int i;
    #pragma omp parallel for
    for (i = 0; i < 10; i++) {
        printf("Thread %d processing iteration %d\n", omp_get_thread_num(), i);
    }
    return 0;
}

Writing exercise_1.c


In [38]:
!gcc -fopenmp exercise_1.c -o exercise_1
!./exercise_1

Thread 0 processing iteration 0
Thread 0 processing iteration 1
Thread 0 processing iteration 2
Thread 0 processing iteration 3
Thread 0 processing iteration 4
Thread 1 processing iteration 5
Thread 1 processing iteration 6
Thread 1 processing iteration 7
Thread 1 processing iteration 8
Thread 1 processing iteration 9


#### Exercise 2: Computing the Sum of an Array
**Task:** Use OpenMP to parallelize an array summation.

```c
#include <stdio.h>
#include <omp.h>

#define N 1000

int main() {
    int i, sum = 0;
    int arr[N];
    for (i = 0; i < N; i++) arr[i] = i;

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < N; i++) {
        sum += arr[i];
    }
    
    printf("Total sum: %d\n", sum);
    return 0;
}
```

In [39]:
%%file exercise_2.c
#include <stdio.h>
#include <omp.h>

#define N 1000

int main() {
    int i, sum = 0;
    int arr[N];
    for (i = 0; i < N; i++) arr[i] = i;

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < N; i++) {
        sum += arr[i];
    }
    
    printf("Total sum: %d\n", sum);
    return 0;
}

Writing exercise_2.c


In [40]:
!gcc -fopenmp exercise_2.c -o exercise_2
!./exercise_2

Total sum: 499500


#### Exercise 3: Parallelizing Matrix Multiplication
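**Task:** Parallelize a matrix multiplication using OpenMP. The solution below distributes the outer loop over the threads; removing the `#pragma omp parallel for` line gives the sequential version.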

```c
#include <stdio.h>
#include <omp.h>

#define N 500

int main() {
    int i, j, k;
    double A[N][N], B[N][N], C[N][N];

    // Initialize matrices
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            A[i][j] = i + j;
            B[i][j] = i - j;
            C[i][j] = 0;
        }

    #pragma omp parallel for private(i, j, k)
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            for (k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];

    printf("Matrix multiplication completed.\n");
    return 0;
}
```

In [41]:
%%file exercise_3.c
#include <stdio.h>
#include <omp.h>

#define N 500

int main() {
    int i, j, k;
    double A[N][N], B[N][N], C[N][N];

    // Initialize matrices
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            A[i][j] = i + j;
            B[i][j] = i - j;
            C[i][j] = 0;
        }

    #pragma omp parallel for private(i, j, k)
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            for (k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];

    printf("Matrix multiplication completed.\n");
    return 0;
}

Writing exercise_3.c


In [42]:
!gcc -fopenmp exercise_3.c -o exercise_3
!./exercise_3

Matrix multiplication completed.


#### Exercise 4: Implementing a Parallel Reduction for Finding Maximum
**Task:** Find the maximum value in an array using OpenMP's `max` reduction, which is available for C and C++ since OpenMP 3.1. The solution below adds `reduction(max:max_val)` to the loop.

```c
#include <stdio.h>
#include <omp.h>

#define N 1000

int main() {
    int i, max_val = 0;
    int arr[N];
    for (i = 0; i < N; i++) arr[i] = i;

    #pragma omp parallel for reduction(max:max_val)
    for (i = 0; i < N; i++) {
        if (arr[i] > max_val)
            max_val = arr[i];
    }
    
    printf("Maximum value: %d\n", max_val);
    return 0;
}
```

These hands-on exercises will help in developing OpenMP programming skills. Next, we will explore **MPI Basics**!

In [43]:
%%file exercise_4.c
#include <stdio.h>
#include <omp.h>

#define N 1000

int main() {
    int i, max_val = 0;
    int arr[N];
    for (i = 0; i < N; i++) arr[i] = i;

    #pragma omp parallel for reduction(max:max_val)
    for (i = 0; i < N; i++) {
        if (arr[i] > max_val)
            max_val = arr[i];
    }
    
    printf("Maximum value: %d\n", max_val);
    return 0;
}

Writing exercise_4.c


In [44]:
!gcc -fopenmp exercise_4.c -o exercise_4
!./exercise_4

Maximum value: 999
