# HW 2: Shared Memory Parallelism

This homework primarily requires writing code, so turn in a **SINGLE** zip/tar file to Courseworks as your homework.

## 1 - Buggy OpenMP Code

[30] Below is code that computes a simple sum of numbers.  Unfortunately, neither of the examples works.  Turn in the changes needed to make the code work.  Note that we are also introducing a new clause here, `schedule(dynamic, 10)`, for the `for` and `do` `omp` directives. It gives us more control over how the threads break up the problem: with dynamic scheduling, chunks of 10 iterations are handed out to threads as they become free.

```c
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char *argv[]) 
{
    int num_threads, i, thread_ID;
    double total;

    #pragma omp parallel
    {
        thread_ID = omp_get_thread_num();
        num_threads = omp_get_num_threads();
  
        if (thread_ID == 0)
        {
            printf("Number of threads = %d\n", num_threads);
        }

        printf("Thread %d is starting...\n", thread_ID);
        #pragma omp barrier

        total = 0.0;
        #pragma omp for schedule(dynamic,10)
        for (i = 0; i < 1000000; i++) 
            total = total + (double)i;

        printf ("Thread %d is done! Total= %f\n", thread_ID, total);

    }
}
```
```fortran
program problem1

    use omp_lib

    implicit none

    integer :: num_threads, i, thread_ID
    double precision :: total

    !$omp parallel

    thread_ID = omp_get_thread_num()
    num_threads = omp_get_num_threads()

    if (thread_ID == 0) then
        print *, "Number of threads = ", num_threads
    end if

    print *, "Thread ", thread_ID, " is starting..."
    !$omp barrier

    total = 0.d0
    !$omp do schedule(dynamic, 10)
    do i = 1, 999999
        total = total + real(i, kind=8)
    end do

    print *, "Thread ", thread_ID, " is done! Total= ", total

    !$omp end parallel

end program problem1
```
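
To see what `schedule(dynamic, 10)` does in the code above, the following minimal sketch records which thread ran each iteration. The helper name `record_owners` and the `owner` array are hypothetical and not part of the assignment; the `_OPENMP` guard is an assumption that lets the sketch also compile serially.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: store in owner[i] the ID of the thread that ran
// iteration i. Returns the team size (1 when compiled without OpenMP).
int record_owners(int n, int owner[])
{
    int team_size = 1;
    assert(n >= 0);

    #pragma omp parallel
    {
        int id = 0;  // declared inside the region, so private to each thread
#ifdef _OPENMP
        id = omp_get_thread_num();
        // One thread records the team size; single ends with a barrier
        #pragma omp single
        team_size = omp_get_num_threads();
#endif
        // Chunks of 10 consecutive iterations go to whichever thread is free
        #pragma omp for schedule(dynamic, 10)
        for (int i = 0; i < n; i++)
            owner[i] = id;
    }
    return team_size;
}
```

Printing `owner[0]`, `owner[10]`, `owner[20]`, and so on after a call shows how the chunks were distributed; the assignment typically changes from run to run because chunks are claimed on demand. Build with an OpenMP-enabled compiler, for example `gcc -fopenmp`.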


Solution:

```c
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char *argv[])
{
    int num_threads, i, thread_ID;
    double total, local_sum;

    // Initialize the shared total once, before the parallel region starts
    total = 0.0;

    // thread_ID and local_sum must be private; total is shared
    #pragma omp parallel private(thread_ID, local_sum) shared(total, num_threads)
    {
        thread_ID = omp_get_thread_num();

        if (thread_ID == 0)
        {
            // Only one thread writes the shared num_threads
            num_threads = omp_get_num_threads();
            printf("Number of threads = %d\n", num_threads);
        }

        printf("Thread %d is starting...\n", thread_ID);
        #pragma omp barrier

        // Each thread accumulates its own partial sum
        local_sum = 0.0;
        #pragma omp for schedule(dynamic,10)
        for (i = 0; i < 1000000; i++)
            local_sum = local_sum + (double)i;

        // Combine the partial sums one thread at a time
        #pragma omp critical
        total = total + local_sum;

        printf ("Thread %d is done! Partial sum= %f\n", thread_ID, local_sum);

    }
    printf ("True Total= %f \n", total);
}
```
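
The solution above combines per-thread partial sums inside a critical section. As an alternative, OpenMP's `reduction` clause does the same bookkeeping automatically. The sketch below is one possible variant, not the required answer; the function name `parallel_sum` is hypothetical.

```c
#include <assert.h>

// Hypothetical helper: sum 0 + 1 + ... + (n - 1) in double precision
double parallel_sum(int n)
{
    double total = 0.0;
    assert(n >= 0);

    // Each thread gets a private copy of total initialized to 0;
    // the copies are added into the shared total when the loop ends
    #pragma omp parallel for schedule(dynamic, 10) reduction(+:total)
    for (int i = 0; i < n; i++)
        total += (double)i;

    return total;
}
```

Calling `parallel_sum(1000000)` computes the same sum as the homework code. Without OpenMP enabled the pragma is ignored and the loop simply runs serially.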

## 2 - Outer-Vector Product

Compute $x \otimes y = A$.

[35] Using shared memory parallelism write a program that takes two vectors and computes the outer-vector product that produces the resulting matrix.  You can start with the following stub code if you would like.
```c
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char *argv[]) 
{
    int const N = 10;
    double A[N][N], x[N], y[N];

    // Fill x and y
    for (int i = 0; i < N; ++i)
    {
        x[i] = (double) i;
        y[i] = 2.0 * i;
    }

    // Perform the outer product
    for (int j = 0; j < N; ++j)
        for (int i = 0; i < N; ++i)
            A[i][j] = x[i] * y[j];
}

```
```fortran
program problem2

    implicit none

    integer, parameter :: N = 1000
    double precision :: A(N, N), x(N), y(N)
    integer :: i, j

    do i=1, N
        x(i) = real(i, kind=8)
        y(i) = 2.d0 * real(i, kind=8)
    end do

    do i=1, N
        do j=1, N
            A(i, j) = x(i) * y(j)
        end do
    end do

end program problem2
```

Solution:

```c
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char *argv[])
{
    int const N = 10;
    double A[N][N], x[N], y[N];

    #pragma omp parallel
    {
        // Fill in x and y
        #pragma omp for schedule(static)
        for (int i = 0; i < N; ++i)
        {
            x[i] = (double) i;
            y[i] = 2.0 * i;
            printf("y = %f \n", y[i]);
        }
        // The implicit barrier at the end of the loop above guarantees
        // that x and y are complete before any thread reads them

        // Perform the outer product; each thread handles a set of columns j
        #pragma omp for schedule(static)
        for (int j = 0; j < N; ++j)
            for (int i = 0; i < N; ++i)
                A[i][j] = x[i] * y[j];
    }
    printf("A[8][8] = %f \n", A[8][8]);
}
```
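
The same computation can be packaged as a reusable function. The sketch below assumes a flat row-major array for `A` and parallelizes over rows so that each thread writes contiguous memory; the name `outer_product` and the storage layout are illustrative choices, not part of the assignment.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: A[i*n + j] = x[i] * y[j], with A stored row-major
void outer_product(int n, const double x[], const double y[], double A[])
{
    assert(n >= 0);

    // Rows are independent, so no two threads ever write the same entry
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            A[(size_t)i * n + j] = x[i] * y[j];
}
```

With `x[i] = i` and `y[j] = 2j` as in the stub, entry `(i, j)` holds `2 * i * j`. No synchronization is needed inside the loops because every matrix entry is written exactly once.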

## 3 - Race Conditions

[35] Write code that demonstrates a race condition in shared memory parallelism.  Explain in the comments why it happens and also provide the corrected code as another source file in your solution.

Problem 1 demonstrates a race condition. Two threads can both read the shared `total` variable before either of them has written its update back. Each thread adds its current number to the value of `total` that it read and then writes the result. Whichever thread writes second overwrites the other's result, so one of the additions (the first update) is lost.
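
The lost update described above can be reproduced with a shared counter. The following is a minimal sketch, assuming a simple increment loop; the function names `racy_count` and `safe_count` are hypothetical, and in a submission the two functions would live in separate source files.

```c
#include <assert.h>

// Racy version: every thread performs an unprotected read-modify-write
// on the shared counter, so two threads can read the same old value and
// one of the two increments is overwritten.
long racy_count(long n)
{
    long counter = 0;
    assert(n >= 0);

    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        counter = counter + 1;  // read, add, write: not atomic

    return counter;  // with several threads this is often less than n
}

// Corrected version: the atomic directive makes each update indivisible,
// so no increment can be lost.
long safe_count(long n)
{
    long counter = 0;
    assert(n >= 0);

    #pragma omp parallel for
    for (long i = 0; i < n; i++)
    {
        #pragma omp atomic
        counter += 1;
    }

    return counter;
}
```

When built with OpenMP and run on several threads, `racy_count(n)` may return a different value on each run, while `safe_count(n)` always returns `n`. A `reduction(+:counter)` clause would be another way to correct the loop.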