### Exercise 1: Hello World
1. Write an MPI program that prints the message "Hello World".
2. Modify your program so that each process prints out both its rank and the total number of processes P that the code is running on, i.e. the size of `MPI_COMM_WORLD`.
3. Modify your program so that only a single controller process (e.g. rank 0) prints out a message (very useful when you run with hundreds of processes).
4. What happens if you omit the final MPI procedure call in your program?

In [None]:
# part1
from mpi4py import MPI
COMM = MPI.COMM_WORLD
RANK = COMM.Get_rank()
SIZE = COMM.Get_size()
print("hello", RANK, SIZE)

# Run from a shell with: mpirun --oversubscribe -n 8 python3 ex1.py

#part2
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

print("Process", rank, "of", size, "says Hello World! :)")

MPI.Finalize()

#part3
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    print("Hello World from controller process")

MPI.Finalize()

### Exercise 2: Sharing Data
Create a program that obtains an integer input from the terminal and distributes it to all the MPI processes.
Each process must display its rank and the received value. 
Keep reading values until a negative integer is entered.

**Output Example**
```shell
10
Process 0 got 10
Process 1 got 10
```
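
The expected behaviour can be checked without MPI. The sketch below is a serial stand-in, not an MPI program: `expected_output` is a hypothetical helper that lists the lines `size` processes would print for a sequence of typed values, stopping at the first negative one.

```python
def expected_output(values, size):
    """Return the lines `size` processes would print for the typed values.

    Serial stand-in for the MPI program (hypothetical helper, no MPI calls):
    every value is "broadcast" to all ranks until a negative one arrives.
    """
    lines = []
    for value in values:
        if value < 0:
            break  # a negative integer ends the loop on every rank
        for rank in range(size):
            lines.append(f"Process {rank} got {value}")
    return lines


for line in expected_output([10, -1], size=2):
    print(line)
```

In a real run the lines from different ranks may appear in any order, because each process prints independently.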

In [1]:
from mpi4py import MPI

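COMM = MPI.COMM_WORLD
RANK = COMM.Get_rank()

while True:
    # Only rank 0 reads from the terminal; the other ranks pass a placeholder.
    value = int(input("Enter an integer: ")) if RANK == 0 else None
    # bcast returns the root's value on every rank.
    value = COMM.bcast(value, root=0)
    if value < 0:
        break  # a negative integer stops every process
    print("Process", RANK, "got", value)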


### Exercise 3: Sending in a ring (broadcast by ring)

Write a program that takes data from process zero and sends it to all of the other processes by passing it around a ring. That is, process i should receive the data, add its own rank to it, and then send it to process i+1, until the last process is reached.
Assume that the data consists of a single integer. Process zero reads the data from the user.
Each process prints its rank and the value it received.


![ring](../data/ring.gif)

You may want to use these MPI routines in your solution:
`Send` `Recv`
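
To know what output to expect, the ring can be modelled serially. The sketch below uses no MPI calls; `ring_values` is a hypothetical helper that returns the value each rank receives, assuming rank 0 reads `x` and every rank adds its own rank before forwarding.

```python
def ring_values(x, size):
    """Return the value each rank receives when x travels around the ring.

    Serial model (hypothetical helper, no MPI calls): rank 0 reads x, and
    each rank adds its own rank to the value before passing it on.
    """
    received = []
    value = x
    for rank in range(size):
        received.append(value)  # what this rank receives (or reads, for rank 0)
        value += rank           # what it forwards to rank + 1
    return received


print(ring_values(10, 4))  # [10, 10, 11, 13]
```

Rank 0 adds 0, so rank 1 sees the original value; the increments only become visible from rank 2 onwards.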

In [None]:
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

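while True:
    if rank == 0:
        # Process 0 reads the value from the user.
        x = int(input("Enter x: "))
    else:
        # Every other process receives from its left neighbour.
        x = comm.recv(source=rank - 1)
    if rank < size - 1:
        # Forward a negative value unchanged so that it stops every process;
        # otherwise add this rank before passing the value on.
        comm.send(x if x < 0 else x + rank, dest=rank + 1)
    if x < 0:
        break
    print("rank:", rank, ", data:", x)
MPI.Finalize()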

### Exercise 4: Scattering Matrix
1. Create an n by m matrix A on processor 0.
2. Use MPI_Scatterv to send parts of the matrix to the other processors.
3. Processor 1 receives A(i,j) for i=0 to (n/2)-1 and j=m/2 to m-1.
4. Processor 2 receives A(i,j) for i=n/2 to n-1 and j=0 to (m/2)-1.
5. Processor 3 receives A(i,j) for i=n/2 to n-1 and j=m/2 to m-1.

**Example:** using n=m=8 for simplicity.

![N2utM.png](attachment:N2utM.png)
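
Scatterv sends each rank one contiguous run of the send buffer, described by a count and a displacement, whereas a quadrant of a row-major matrix is not contiguous. One way around this (an assumption of this sketch; a derived datatype is another) is to pack the quadrants into a flat buffer first. The helper `pack_quadrants` is hypothetical and works on plain lists, with no MPI calls.

```python
def pack_quadrants(A):
    """Pack three quadrants of A into one flat buffer for a Scatterv-style send.

    A is a list of rows with even dimensions (an assumption of this sketch).
    Returns (buffer, counts, displs) indexed by rank; rank 0 is sent nothing.
    """
    n, m = len(A), len(A[0])
    h, w = n // 2, m // 2
    quadrants = {
        1: [row[w:] for row in A[:h]],  # top-right
        2: [row[:w] for row in A[h:]],  # bottom-left
        3: [row[w:] for row in A[h:]],  # bottom-right
    }
    buffer, counts, displs = [], [0], [0]
    for rank in (1, 2, 3):
        flat = [value for row in quadrants[rank] for value in row]
        displs.append(len(buffer))  # where this rank's data starts
        counts.append(len(flat))    # how many elements it gets
        buffer.extend(flat)
    return buffer, counts, displs


A = [[4 * i + j for j in range(4)] for i in range(4)]
buffer, counts, displs = pack_quadrants(A)
print(counts, displs)                           # [0, 4, 4, 4] [0, 0, 4, 8]
print(buffer[displs[1]:displs[1] + counts[1]])  # [2, 3, 6, 7]
```

The returned counts and displacements have the same meaning as the `sendcounts` and `displs` arguments of Scatterv.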

In [None]:
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()
n = 8
m = 8  # n and m are assumed to be even; run with at least 4 processes
if rank == 0:
    A = np.random.rand(n, m)
    print("Original matrix on processor 0:")
    print(A)
    # Scatterv sends contiguous runs of the send buffer, so each 2-D block
    # is first copied into its own contiguous slice of a 1-D buffer.
    blocks = {1: A[:n // 2, m // 2:],   # top-right
              2: A[n // 2:, :m // 2],   # bottom-left
              3: A[n // 2:, m // 2:]}   # bottom-right
    sendbuf = np.concatenate([blocks[r].ravel() for r in (1, 2, 3)])
    sendcounts = np.zeros(size, dtype=int)
    displs = np.zeros(size, dtype=int)
    for r in (1, 2, 3):
        sendcounts[r] = blocks[r].size
        displs[r] = displs[r - 1] + sendcounts[r - 1]
else:
    sendbuf = None
    sendcounts = None
    displs = None
# Ranks 1, 2 and 3 receive one block each; every other rank receives nothing.
recvA = np.zeros((n // 2, m // 2) if rank in (1, 2, 3) else 0)
comm.Scatterv([sendbuf, sendcounts, displs, MPI.DOUBLE], recvA, root=0)
if rank in (1, 2, 3):
    print(f"Received matrix on processor {rank}:")
    print(recvA)

### Exercise 5: Matrix vector product

1. Use the `MatrixVectorMult.py` file to implement the MPI version of matrix vector multiplication.
2. Process 0 compares the result with the `dot` product.
3. Plot the scalability of your implementation. 
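
A row-block decomposition is one natural way to parallelize the product: each process multiplies its own block of rows by the full vector, and the pieces are concatenated in rank order. The sketch below models this serially with plain lists and no MPI; `blocked_matvec` is a hypothetical helper and is not taken from `MatrixVectorMult.py`.

```python
def matvec(rows, b):
    """Multiply a list of matrix rows by the vector b."""
    return [sum(a_ij * b_j for a_ij, b_j in zip(row, b)) for row in rows]


def blocked_matvec(A, b, size):
    """Serial model of the row-block scheme (hypothetical helper, no MPI).

    Assumes len(A) is divisible by size, as an equal-block scatter requires.
    """
    block = len(A) // size
    result = []
    for rank in range(size):
        local_rows = A[rank * block:(rank + 1) * block]  # this rank's block
        result.extend(matvec(local_rows, b))             # "gather" in rank order
    return result


A = [[1, 2], [3, 4], [5, 6], [7, 8]]
b = [1, 1]
print(blocked_matvec(A, b, size=2))  # [3, 7, 11, 15]
```

Because every row is handled by exactly one rank, the blocked result matches the unblocked product, which is what the comparison in step 2 checks.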

**Output Example**
```shell
CPU time of parallel multiplication using 2 processes is  174.923446
The error comparing to the dot product is : 1.4210854715202004e-14
```

In [None]:
import numpy as np
from scipy.sparse import lil_matrix
from numpy.random import rand, seed
from numba import njit
from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

seed(42)
@njit
def matrix_vector_mult(A, b, x):
    row, col = A.shape
    for i in range(row):
        a = A[i]
        for j in range(col):
            x[i] += a[j] * b[j]

    return x

matrix_size = 1000
# Scatter sends equal blocks, so matrix_size is assumed to be divisible by size.
block_size = matrix_size // size

if rank == 0:
    A = lil_matrix((matrix_size, matrix_size))
    A[0, :100] = rand(100)
    A[1, 100:200] = A[0, :100]
    A.setdiag(rand(matrix_size))
    A = A.toarray()
    b = rand(matrix_size)
else:
    A = None
    b = None

matrix = np.zeros((block_size, matrix_size))
comm.Scatter(A, matrix, root=0)

b = comm.bcast(b, root=0)

block_result = np.zeros(block_size)

start_time = MPI.Wtime()
matrix_vector_mult(matrix, b, block_result)
stop_time = MPI.Wtime()

send_counts = np.array(comm.gather(len(block_result), root=0))

if rank == 0:
    result = np.zeros(sum(send_counts), dtype=np.double)
else:
    result = None

comm.Gatherv(block_result, recvbuf=(result, send_counts, MPI.DOUBLE), root=0)

if rank == 0:
    dot_product_result = A.dot(b)
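    # Wtime returns seconds; the factor 1000 converts the elapsed time to milliseconds.
    print("CPU time of parallel multiplication using", size, "processes is", (stop_time - start_time) * 1000, "ms")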
    print("The error comparing to the dot product is:", np.max(np.abs(dot_product_result - result)))
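
For the scalability plot in step 3, one common summary (an assumption here, since the exercise does not say which quantity to plot) is the speedup T(1)/T(p) and the parallel efficiency T(1)/(p·T(p)). The hypothetical helper below derives both from timings that you measure yourself; it contains no measurements of its own.

```python
def speedup_and_efficiency(times):
    """Map {process count: wall time} to {process count: (speedup, efficiency)}.

    `times` must contain an entry for 1 process, which is used as the baseline.
    """
    t1 = times[1]
    return {p: (t1 / t, t1 / (t * p)) for p, t in sorted(times.items())}
```

Calling it with a dictionary of measured times, keyed by the number of processes, gives the values to plot against the process count; ideal scaling corresponds to an efficiency of 1.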

### Exercise 6: Pi calculation
An approximation to the value π can be obtained from the following expression

$$
\frac{\pi}{4}=\int_0^1 \frac{d x}{1+x^2} \approx \frac{1}{N} \sum_{i=1}^N \frac{1}{1+\left(\frac{i-\frac{1}{2}}{N}\right)^2}
$$

where the answer becomes more accurate with increasing N. Iterations over i are independent so the
calculation can be parallelized.

For the following exercises you should set N = 840. This number is divisible by 2, 3, 4, 5, 6, 7 and 8
which is convenient when you parallelize the calculation!
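
As a serial reference for the parallel versions, the sum can be evaluated directly. This is a minimal sketch of the formula above; the function name `pi_midpoint` is hypothetical.

```python
import math


def pi_midpoint(N):
    """Approximate pi with the midpoint sum from the formula above."""
    total = 0.0
    for i in range(1, N + 1):
        x = (i - 0.5) / N  # midpoint of the i-th subinterval
        total += 1.0 / (1.0 + x * x)
    return 4.0 * total / N


approx = pi_midpoint(840)
print(approx, abs(approx - math.pi))
```

Every parallel variant should reproduce this value up to floating-point rounding, since only the order of the additions changes.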

1. Create a program where each process independently computes the value of `π` and prints it to the screen. Check that the values are correct (each process should print the same value).
2. Now arrange for different processes to do the computation for different ranges of i. For example, on two processes: rank 0 would do i = 1, 2, . . . , N/2; rank 1 would do i = N/2 + 1, N/2 + 2, . . . , N.
Print the partial sums to the screen and check the values are correct by adding them up by hand.
3. Now we want to accumulate these partial sums by sending them to the controller (e.g. rank 0) to add up:
   - all processes (except the controller) send their partial sum to the controller
   - the controller receives the values from all the other processes, adding them to its own partial sum
4. Use the function `MPI_Wtime` (see below) to record the time it takes to perform the calculation. For a given value of N, does the time decrease as you increase the number of processes? Note that to ensure that the calculation takes a sensible amount of time (e.g. more than a second) you will probably have to perform the calculation of `π` several thousand times.
5. Ensure your program works correctly if N is not an exact multiple of the number of processes P.
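
When N is not a multiple of P, integer division can still assign every index to exactly one rank. The sketch below is a serial model with no MPI calls; `block_range` and `partial_sum` are hypothetical helpers, and P = 9 is chosen only because it does not divide 840.

```python
def block_range(rank, size, N):
    """Return the 1-based inclusive range (first, last) of i owned by rank."""
    first = rank * N // size + 1
    last = (rank + 1) * N // size
    return first, last


def partial_sum(first, last, N):
    """Sum the integrand at the midpoints i = first..last."""
    return sum(1.0 / (1.0 + ((i - 0.5) / N) ** 2) for i in range(first, last + 1))


N, size = 840, 9  # 840 is not a multiple of 9
ranges = [block_range(rank, size, N) for rank in range(size)]
pi = 4.0 / N * sum(partial_sum(first, last, N) for first, last in ranges)
print(ranges[0], ranges[-1], pi)
```

Each range starts right after the previous one ends, so the block sizes differ by at most one and no index is skipped or counted twice.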


In [None]:
#part1
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

N = 840
dx = 1.0/N

# Compute the partial sum for each process over i = start, ..., end (inclusive)
start = rank*N//size + 1
end = (rank+1)*N//size
# np.arange excludes its stop value, hence end + 1
partial_sum = np.sum(1.0/(1.0 + ((np.arange(start, end + 1) - 0.5)/N)**2))

# Reduce the partial sums to compute the final result
pi = comm.reduce(partial_sum, op=MPI.SUM, root=0)

if rank == 0:
    pi = 4.0*dx*pi
    print(f"Final result: {pi:.10f}")
    
#part2
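from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

N = 840
# Zero-based index range [start_i, end_i) owned by this rank; the integer
# division also covers the case where N is not a multiple of size.
start_i = rank * N // size
end_i = (rank + 1) * N // size

pi_part = 0.0
for i in range(start_i, end_i):
    # i is zero-based here, so the midpoint is (i + 1/2)/N
    pi_part += 1.0/(1.0 + ((i + 0.5)/N)**2)
pi_part *= 4.0/N

print("Process", rank, "partial sum:", pi_part)

if rank == 0:
    # The controller adds the partial sums of all other ranks to its own.
    pi = pi_part
    for source in range(1, size):
        pi += comm.recv(source=source)
    print("Computed pi:", pi)
else:
    comm.send(pi_part, dest=0)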