# Hands-On 3: Parallelization with MPI
*   Enzo Bacelar Conte Gebauer
*   Luiz Guilherme Guerreiro
*   Maria Eduarda Lopes de Morais Brito


| Session   | Code               | File          |
| --------- | ------------------ | ------------- |
| Session 1 | Basic Operations   | operations.c  |
| Session 2 | Algebraic Function | function.c    |
| Session 3 | Tridiagonal Matrix | tridiagonal.c |


## `Basic Operations`

The algorithm below computes the product, the sum, and the subtraction of the elements of a vector of integers. The variable `array` is the vector on which the operations are performed. The task is to modify the program so that it runs in parallel using MPI and to present the primitives used, which here are `MPI_Init`, `MPI_Comm_size`, `MPI_Comm_rank`, `MPI_Send`, `MPI_Recv`, and `MPI_Finalize`. The MPI version runs with exactly $4$ processes. Three of them each perform one operation: $1$ adds, $1$ subtracts, and $1$ multiplies. The remaining process tells each of the other $3$ which operation to perform and prints the results once they finish.

In [31]:
%%writefile operations.c
#include <stdio.h>
#define SIZE 12
#include <mpi.h>

int main(int argc, char **argv)
{
  int i, sum = 0, subtraction = 0, mult = 1;
  int array[SIZE];

  char operations[] = {'+', '-', '*'};
  char operationsRec;
  int numberOfProcessors, id, to, from, tag = 1000;
  int result, value;

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numberOfProcessors);
  MPI_Comm_rank(MPI_COMM_WORLD, &id);
  MPI_Status status;

  switch (id)
  {
  case 0:
    for (i = 0; i < SIZE; i++)
    {
      array[i] = i + 1;
      printf("%d\t%d\t", i, array[i]);
    }
    printf("\n");
    for (to = 1; to < numberOfProcessors; to++)
    {
      MPI_Send(&array, SIZE, MPI_INT, to, tag, MPI_COMM_WORLD);
      MPI_Send(&operations[to - 1], 1, MPI_CHAR, to, tag, MPI_COMM_WORLD);
    }
    break;
  default:
    MPI_Recv(&array, SIZE, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
    MPI_Recv(&operationsRec, 1, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &status);
    switch (operationsRec)
    {
    case '+':
      value = 0;
      for (i = 0; i < SIZE; i++)
        value += array[i];
      break;
    case '-':
      value = 0;
      for (i = 0; i < SIZE; i++)
        value -= array[i];
      break;
    case '*':
      value = 1;
      for (i = 0; i < SIZE; i++)
        value *= array[i];
      break;
    }
  }

  if (id != 0)
  {
    // Each worker returns its result and operation symbol to process 0
    MPI_Send(&value, 1, MPI_INT, 0, tag, MPI_COMM_WORLD);
    MPI_Send(&operationsRec, 1, MPI_CHAR, 0, tag, MPI_COMM_WORLD);
  }
  else
  {
    // Process 0 collects and prints the result of each worker
    for (from = 1; from < numberOfProcessors; from++)
    {
      MPI_Recv(&result, 1, MPI_INT, from, tag, MPI_COMM_WORLD, &status);
      MPI_Recv(&operationsRec, 1, MPI_CHAR, from, tag, MPI_COMM_WORLD, &status);
      printf("(%c) = %d\n", operationsRec, result);
    }
  }

  // Sequential check, run only by process 0 for comparison with the MPI results
  if (id == 0)
  {
    for (i = 0; i < SIZE; i++)
      array[i] = i + 1;

    for (i = 0; i < SIZE; i++)
      printf("array[%d] = %d\n", i, array[i]);

    for (i = 0; i < SIZE; i++)
    {
      sum = sum + array[i];
      subtraction = subtraction - array[i];
      mult = mult * array[i];
    }
    printf("SEQUENTIAL CHECK:\n");
    printf("Sum = %d\n", sum);
    printf("Subtraction = %d\n", subtraction);
    printf("Multiply = %d\n", mult);
  }
  MPI_Finalize();

  return 0;
}

Overwriting operations.c


### Run the Code

In [32]:
!mpicc operations.c -o operations

In [33]:
!mpirun --allow-run-as-root -np 4 ./operations

--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 4
slots that were requested by the application:

  ./operations

Either request fewer slots for your application, or make more slots
available for use.

A "slot" is the Open MPI term for an allocatable unit where we can
launch a process.  The number of slots available are defined by the
environment in which Open MPI processes are run:

  1. Hostfile, via "slots=N" clauses (N defaults to number of
     processor cores if not provided)
  2. The --host command line parameter, via a ":N" suffix on the
     hostname (N defaults to 1 if not provided)
  3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
  4. If none of a hostfile, the --host command line parameter, or an
     RM is present, Open MPI defaults to the number of processor cores

In all the above cases, if you want Open MPI to default to the number
of hardware threads instead of the num

## `Algebraic Function`

The idea of this hands-on is to write an algorithm that uses the
`MPI_Recv` and `MPI_Send` routines in the master-worker paradigm to
parallelize a sequential code. The MPI version is:

In [42]:
%%writefile function.c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
  double coef[4], result[4] = {0}, total = 0, x = 10, received_result;
  int numberOfProcessors, id, index, i, to, from, tag = 1000, received_index;

  // Initialize coef values
  for (i = 1; i <= 4; i++)
  {
    coef[i - 1] = i;
  }

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numberOfProcessors);
  MPI_Comm_rank(MPI_COMM_WORLD, &id);
  MPI_Status status;

  switch (id)
  {
  case 0: // Master
    // Send x to all workers
    for (to = 1; to < numberOfProcessors; to++)
    {
      MPI_Send(&x, 1, MPI_DOUBLE, to, tag, MPI_COMM_WORLD);
    }
    break;

  default: // Workers
    // Receive x from master
    MPI_Recv(&x, 1, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD, &status);
    index = 3 - id;  // Calculate indexed value based on id
    result[index] = coef[index];  // Initialize result with corresponding coef value

    // Multiply result by x id times
    for (i = 1; i <= id; i++)
    {
      result[index] *= x;
    }

    // If id == 1, add coef[3]
    if (id == 1)
    {
      result[index] += coef[3];
    }

    break;
  }

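  // Only the Workers send their result and index back to the Master;
  // on the Master, index is never set and no matching receive exists
  if (id != 0)
  {
    MPI_Send(&result[index], 1, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD);
    MPI_Send(&index, 1, MPI_INT, 0, tag, MPI_COMM_WORLD);
  }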

  if (id == 0)
  {
    // Master receives results from Workers
    for (from = 1; from < numberOfProcessors; from++)
    {
      MPI_Recv(&received_result, 1, MPI_DOUBLE, from, tag, MPI_COMM_WORLD, &status);
      MPI_Recv(&received_index, 1, MPI_INT, from, tag, MPI_COMM_WORLD, &status);
      result[received_index] = received_result;
      printf("(%d) = %lf\n", received_index, result[received_index]);
      total += received_result;
    }

    // Print total sum of results
    if (total > 0)
      printf("Total: %.5lf\n", total);
  }

  MPI_Finalize();
  return 0;
}


Overwriting function.c


### Run the Code

In [43]:
!mpicc function.c -o function

In [44]:
!mpirun --allow-run-as-root -np 4 ./function

(2) = 34.000000
(1) = 200.000000
(0) = 1000.000000
Total: 1234.00000



## `Tridiagonal Matrix`

In [90]:
%%writefile tridiagonal.c
#include <stdio.h>
#define ORDER 4
#include <mpi.h>

void printMatrix(int m[][ORDER])
{
  int i, j;
  for (i = 0; i < ORDER; i++)
  {
    printf("| ");
    for (j = 0; j < ORDER; j++)
    {
      printf("%3d ", m[i][j]);
    }
    printf("|\n");
  }
  printf("\n");
}

int main(int argc, char **argv)
{
  int k[3] = {100, 200, 300};
  int matrix[ORDER][ORDER] = {0}, received_matrix[ORDER][ORDER], i, j;
  int numberOfProcessors, id, to, from, tag = 1000;
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numberOfProcessors);
  MPI_Comm_rank(MPI_COMM_WORLD, &id);
  MPI_Status status;

  if (id == 0)
  {
    // Process 0 sends the initial matrix to the other processes
    for (to = 1; to < numberOfProcessors; to++)
    {
      MPI_Send(&matrix, ORDER * ORDER, MPI_INT, to, tag, MPI_COMM_WORLD);
    }

    // Process 0 receives the submatrices from the other processes
    for (from = 1; from < numberOfProcessors; from++)
    {
      MPI_Recv(&received_matrix, ORDER * ORDER, MPI_INT, from, tag, MPI_COMM_WORLD, &status);

      // Combine the received submatrix into the final matrix
      for (i = 0; i < ORDER; i++)
      {
        for (j = 0; j < ORDER; j++)
        {
          if (received_matrix[i][j] != 0) // Copy only nonzero entries so zeros from one worker do not erase values from another
          {
            matrix[i][j] = received_matrix[i][j];
          }
        }
      }
    }

    // Print the final matrix in process 0
    printf("Final matrix assembled by process 0:\n");
    printMatrix(matrix);
  }
  else
  {
    // Processes 1, 2, and 3 receive the initial matrix and perform their tasks
    MPI_Recv(&matrix, ORDER * ORDER, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);

    switch (id)
    {
    case 1:
      for (i = 0; i < ORDER; i++)
        for (j = 0; j < ORDER; j++)
          if (i == j)
            matrix[i][j] = i + j + 1 + k[0]; // Main diagonal
      break;
    case 2:
      for (i = 0; i < ORDER; i++)
        for (j = 0; j < ORDER; j++)
          if (i == (j + 1))
          {
            matrix[i][j] = i + j + 1 + k[1];    // Subdiagonal
            matrix[j][i] = matrix[i][j] + k[2]; // Superdiagonal
          }
      break;
    case 3:
      for (i = 0; i < ORDER; i++)
      {
        for (j = 0; j < ORDER; j++)
        {
          if (i < j - 1 || i > j + 1)
          {
            matrix[i][j] = 0; // Zero outside the super and subdiagonal
          }
        }
      }
      break;
    }
    printf("Process %d - Matrix after the task:\n", id);
    printMatrix(matrix);

    // Processes 1, 2, and 3 send the submatrices back to process 0
    MPI_Send(&matrix, ORDER * ORDER, MPI_INT, 0, tag, MPI_COMM_WORLD);
  }

  MPI_Finalize();
  return 0;
}

Overwriting tridiagonal.c


### Run the Code

In [91]:
!mpicc tridiagonal.c -o tridiagonal

In [92]:
!mpirun --allow-run-as-root -np 4 ./tridiagonal

Process 3 - Matrix after the task:
|   0   0   0   0 |
|   0   0   0   0 |
|   0   0   0   0 |
|   0   0   0   0 |

Final matrix assembled by process 0:
| 101 502   0   0 |
| 202 103 504   0 |
|   0 204 105 506 |
|   0   0 206 107 |

Process 1 - Matrix after the task:
| 101   0   0   0 |
|   0 103   0   0 |
|   0   0 105   0 |
|   0   0   0 107 |

Process 2 - Matrix after the task:
|   0 502   0   0 |
| 202   0 504   0 |
|   0 204   0 506 |
|   0   0 206   0 |



## References

M. Boratto. Hands-On Supercomputing with Parallel Computing. Available: https://github.com/muriloboratto/Hands-On-Supercomputing-with-Parallel-Computing. 2022.

B. Chapman, G. Jost and R. van der Pas. Using OpenMP: Portable Shared Memory Parallel Programming. The MIT Press, 2007, USA.