# Usage differences between MPI and OpenMP

## Why is OpenMP tightly integrated with the C/C++ compiler?

Because the `#pragma`s that we insert require the compiler to generate something different from the code as written (if we read it as plain C). For example:

```C
#pragma omp parallel for
for (i = 0; i < N; i++) {
  loop_body(i);
}
```

The compiler transforms this loop into code roughly like the following:

```C

/* declared outside the original function */
void omp_partial_loop_function (int start, int end, ... /* other variables from the environment used in the loop */)
{
  for (int i = start; i < end; i++) {
    loop_body(i);
  }
}

... /* back in the original function */

omp_fork_threads(); /* not an actual function call, this is illustrative */
idx = omp_get_thread_num(); /* this thread's index among the forked threads */
start = omp_schedule_static_start(N, idx); /* not real, illustrative */
end = omp_schedule_static_end(N, idx); /* ditto */
omp_partial_loop_function(start, end, ...);
omp_join_threads(); /* not real, illustrative */
```
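
The illustrative `omp_schedule_static_start` and `omp_schedule_static_end` calls above stand for the runtime dividing the `N` iterations among the threads. As a minimal sketch of one way that division could work, the hypothetical helper `static_chunk` below (not part of OpenMP) gives each thread one contiguous block, with the first `n % nthreads` threads taking one extra iteration. Real OpenMP runtimes may divide the iterations differently.

```c
#include <assert.h>

/* Hypothetical helper, not an OpenMP function: compute the half-open range
   [*start, *end) of iterations handled by thread `tid` out of `nthreads`
   when `n` iterations are split into contiguous, nearly equal blocks. */
void static_chunk(int n, int nthreads, int tid, int *start, int *end)
{
  assert(n >= 0 && nthreads > 0 && tid >= 0 && tid < nthreads);
  int base  = n / nthreads; /* every thread gets at least this many */
  int extra = n % nthreads; /* the first `extra` threads get one more */
  *start = tid * base + (tid < extra ? tid : extra);
  *end   = *start + base + (tid < extra ? 1 : 0);
}
```

With `n = 10` and 4 threads, the four ranges are `[0, 3)`, `[3, 6)`, `[6, 8)`, and `[8, 10)`.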

## Why isn't MPI tightly integrated with the compiler?

MPI is a C library interface with multiple implementations, and is used the same way that other C libraries are used:

### 1. include the header file

```c
#include <mpi.h>
```
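
### 2. use the functions and data types in code

```c
#include <mpi.h> /* repeated from step 1 so that this compiles on its own */

int main(int argc, char **argv)
{
  MPI_Comm comm;
  int      rank;
  int      err;

  err = MPI_Init(&argc, &argv);     /* must come before most other MPI calls */
  comm = MPI_COMM_WORLD;            /* the communicator containing every process */
  err = MPI_Comm_rank(comm, &rank); /* this process's index within comm */
  err = MPI_Finalize();             /* no MPI calls after this */
  return (err == MPI_SUCCESS) ? 0 : 1;
}
```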

### 3. tell the compiler where to find the header files at compile time

```
cc -I/path/to/mpi/include -c mycode.c
```

### 4. tell the compiler where to find the libraries at link time

```
cc mycode.o -L/path/to/mpi/library -lmpi -o myprog
```



## Wait, if MPI isn't tightly integrated with the compiler, why is there `mpicc`?  That seems even more tightly integrated than OpenMP.

`mpicc` is essentially a wrapper around a compiler that automatically does steps 3 and 4 for you.

## Why does MPI have a launcher and OpenMP doesn't?

MPI code is written from the perspective of a single thread of computation working independently, which is actually a separate _process_ from the OS perspective. OpenMP code, in contrast, is mostly written from the perspective of the master thread delegating to forked threads.  `mpirun` and `mpiexec` help the OS set up the independent processes, and do some of the things that environment variables handle for OpenMP (e.g. `OMP_NUM_THREADS=8` becomes `mpiexec -n 8` or `mpirun -np 8`).
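
To make the launcher's job concrete, here is a minimal sketch of a toy launcher for a single Linux/POSIX machine. The names `launch_workers` and `report` are hypothetical, and real launchers such as `mpiexec` do considerably more (starting the program's executable, possibly on other machines, and setting up communication between the processes). The sketch only shows the core idea: create `n` independent processes that each know their own rank.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical toy launcher: start `n` processes on this machine, give each
   a distinct rank, and wait for all of them. Returns the number of workers
   that exited successfully, or -1 if some process could not be started. */
int launch_workers(int n, void (*work)(int rank, int size))
{
  assert(n > 0);
  pid_t *pids = malloc((size_t)n * sizeof *pids);
  if (pids == NULL) return -1;

  fflush(stdout); /* keep buffered output from being duplicated in children */
  int started = 0;
  for (int rank = 0; rank < n; rank++) {
    pid_t pid = fork();
    if (pid < 0) break;      /* could not create this process */
    if (pid == 0) {          /* child: do the work for this rank, then exit */
      work(rank, n);
      _exit(EXIT_SUCCESS);
    }
    pids[started++] = pid;   /* parent: remember the child and keep going */
  }

  int ok = 0;
  for (int k = 0; k < started; k++) {
    int status;
    if (waitpid(pids[k], &status, 0) == pids[k] &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0) {
      ok++;
    }
  }
  free(pids);
  return (started == n) ? ok : -1;
}

/* Example worker: each process reports its rank and OS process id. */
void report(int rank, int size)
{
  printf("process %d of %d (pid %d)\n", rank, size, (int)getpid());
  fflush(stdout);
}
```

Calling `launch_workers(8, report)` plays the role of `mpiexec -n 8`: eight processes run `report`, each with a different `rank`, and their lines may print in any order.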

## Isn't it more overhead to have 8 independent OS processes than to have 1 OS process with 8 threads?

Not as much as you'd think: in Linux, both are created through the same system call, `clone`, with different semantics for how memory is passed from the cloner to the clonee:

- With threads, the clonee sees and shares the memory addresses of the cloner (which is why variables in OpenMP declared outside of a parallel scope are **shared by default**).
- With processes, the clonee gets a copy of the cloner's memory, but the copy is implemented by **copy-on-write**.  This means that a large portion of read-only memory (like the code from libraries that the program links to) doesn't actually get duplicated in physical memory: the clonee just gets _new virtual memory addresses_ to the _same physical memory_.  We will see in more advanced MPI that MPI processes can in fact share writeable memory, but you have to ask for it: MPI memory is **private by default**.
- Thus the one real overhead of processes over threads is in the part of the memory hierarchy of a chip that translates virtual addresses into physical addresses (the page tables and the TLB), because each process needs its own translations.  If address translation is not the bottleneck, then processes will be about as efficient as threads.
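
The "private by default" behaviour of processes can be observed directly. The following is a minimal sketch for Linux/POSIX using `fork`; the function name `parent_value_after_child_write` is hypothetical. The child process overwrites a variable, and the parent then checks its own copy.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the parent's view of `value` after a child process has overwritten
   its own copy, or -1 if fork or waitpid fails. */
int parent_value_after_child_write(void)
{
  int value = 1;
  pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) {
    value = 2; /* the write lands in the child's private copy of the page */
    _exit(value == 2 ? 0 : 1);
  }
  int status;
  if (waitpid(pid, &status, 0) != pid) return -1;
  assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);
  return value; /* the parent's copy is unchanged */
}
```

The function returns `1`: the child's write never reaches the parent. Had the child been a thread instead, both would have been reading and writing the same `value`, which is the shared-by-default behaviour described for OpenMP above.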