**Installing MPI Library**

In [1]:
%pip install mpi4py

Note: you may need to restart the kernel to use updated packages.





**Creating the Success Script**

In [2]:
%%writefile mpi_success.py
from mpi4py import MPI

# Initialize MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

operations = ["*", "/", "+", "-"]

if rank == 0:
    # --- MASTER PROCESS ---
    print(f"\n{' WORKER ID ':<15} | {' OPERATION ':<20} | {' OUTPUT ':<15}")
    print(f"{'':=<56}")

    for i in range(1, size):
        data = comm.recv(source=i)

        p_id = f"Process {data['rank']}"
        task = data['task']
        res = f"{data['result']:.2f}" if isinstance(data['result'], float) else data['result']

        print(f"{p_id:<15} | {task:<20} | {res:<15}")

    print(f"{'':=<56}")
    print(f"Master: Received data from {size-1} processes.\n")

else:
    # --- WORKER PROCESS ---
    op = operations[rank % len(operations)]
    val_a = rank * 100
    val_b = 5

    if op == "+": result = val_a + val_b
    elif op == "-": result = val_a - val_b
    elif op == "/": result = val_a / val_b
    else: result = val_a * val_b

    # Send result package
    payload = {"rank": rank, "task": f"{val_a} {op} {val_b}", "result": result}
    comm.send(payload, dest=0)

Overwriting mpi_success.py
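
The worker branch of `mpi_success.py` can be checked without launching MPI. The following is a minimal sketch, not the notebook's own code: `build_payload` is a hypothetical helper that reproduces the worker's arithmetic for a given rank, and the `if`/`elif` chain is swapped for a lookup table built on the standard `operator` module.

```python
import operator

# Same ordering as the `operations` list in mpi_success.py.
OPERATIONS = ["*", "/", "+", "-"]

# Lookup table replacing the if/elif chain (an illustrative choice).
APPLY = {
    "+": operator.add,
    "-": operator.sub,
    "/": operator.truediv,
    "*": operator.mul,
}


def build_payload(rank, val_b=5):
    """Hypothetical helper: return the dict a worker with this rank would send."""
    op = OPERATIONS[rank % len(OPERATIONS)]
    val_a = rank * 100
    result = APPLY[op](val_a, val_b)
    return {"rank": rank, "task": f"{val_a} {op} {val_b}", "result": result}


# Build the payloads for ranks 1 through 4 without any MPI calls.
payloads = [build_payload(rank) for rank in range(1, 5)]
for payload in payloads:
    print(payload)
```

Keeping the arithmetic in a plain function like this would let it be tested on its own, with the MPI send and receive calls left as a thin layer around it.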


**Creating the Fail Process Script**

In [3]:
%%writefile mpi_fail.py
from mpi4py import MPI
import sys

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

operations = ["+", "-", "/", "*"]

if rank == 0:
    # --- MASTER PROCESS LOGIC ---
    print(f"\n{' PROCESS ID ':#^20} | {' ASSIGNED TASK ':#^20} | {' RESULT ':#^15}")
    print(f"{'':-^61}")

    for i in range(1, size):
        print(f"Master: Waiting for data from Worker {i}...")
        
        # BLOCKING RECV: The Master will hang here if a worker fails
        data = comm.recv(source=i)

        p_id = f"Worker {data['rank']}"
        task = data['task']
        res = f"{data['result']:.2f}" if isinstance(data['result'], float) else data['result']
        print(f"{p_id:<20} | {task:^20} | {res:>15}")

    print("Master: Operation Complete.")

else:
    # --- WORKER PROCESS LOGIC ---
    # FAIL PROCESS: Worker 2 crashes before sending data
    if rank == 2:
        print("DEBUG: Worker 2 has encountered a fatal error and is exiting...")
        sys.exit(1)

    op = operations[rank % len(operations)]
    val_a, val_b = rank * 10, 2
    
    if op == "+": result = val_a + val_b
    elif op == "-": result = val_a - val_b
    elif op == "/": result = val_a / val_b
    else: result = val_a * val_b

    comm.send({"rank": rank, "task": f"{val_a} {op} {val_b}", "result": result}, dest=0)

Overwriting mpi_fail.py


**Configuration & Path Setup**

In [4]:
import os

mpi_cmd = r"D:\coding\Bin\mpiexec.exe"

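# Open MPI-specific settings that permit running as root.
# They are not needed for other MPI implementations.
os.environ["OMPI_ALLOW_RUN_AS_ROOT"] = "1"
os.environ["OMPI_ALLOW_RUN_AS_ROOT_CONFIRM"] = "1"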

print(f"MPI Configuration Set. Using executable at: {mpi_cmd}")

MPI Configuration Set. Using executable at: D:\coding\Bin\mpiexec.exe
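
The hard-coded launcher path only works on the machine it was written for. As a hedged alternative, the sketch below looks the launcher up at run time. `find_mpiexec` is a hypothetical helper, and the `MPIEXEC` environment variable is a convention assumed for this example only, not something MPI itself defines; `shutil.which` is the standard-library PATH lookup.

```python
import os
import shutil


def find_mpiexec(fallback=None):
    """Return a path to the MPI launcher, or raise if none is found.

    Search order: the MPIEXEC environment variable (a hypothetical
    convention for this sketch), then the system PATH, then `fallback`.
    """
    candidates = [
        os.environ.get("MPIEXEC"),
        shutil.which("mpiexec"),
        fallback,
    ]
    for candidate in candidates:
        if candidate:
            return candidate
    raise FileNotFoundError("mpiexec not found; set MPIEXEC or pass a fallback path")


# The fallback mirrors the hard-coded path used in this notebook.
mpi_cmd = find_mpiexec(fallback=r"D:\coding\Bin\mpiexec.exe")
print(f"Using MPI launcher: {mpi_cmd}")
```

The fallback is returned without checking that the file exists, so a wrong path would still surface only when the launcher is invoked.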


**Executing the Distributed Programs**

Normal Execution

In [5]:
! "{mpi_cmd}" -n 10 python mpi_success.py


 WORKER ID      |  OPERATION           |  OUTPUT        
Process 1       | 100 / 5              | 20.00          
Process 2       | 200 + 5              | 205            
Process 3       | 300 - 5              | 295            
Process 4       | 400 * 5              | 2000           
Process 5       | 500 / 5              | 100.00         
Process 6       | 600 + 5              | 605            
Process 7       | 700 - 5              | 695            
Process 8       | 800 * 5              | 4000           
Process 9       | 900 / 5              | 180.00         
Master: Received data from 9 processes.



Failure Simulation (The Hang)

In [8]:
import subprocess
import time

# Path to the MPI launcher (same value as in the configuration cell)
mpi_cmd = r"D:\coding\Bin\mpiexec.exe"

# Pass the command as a list so mpiexec is started directly rather than
# through a shell (as the `!` magic does); process.kill() then targets mpiexec itself.
command = [mpi_cmd, "-n", "4", "python", "mpi_fail.py"]

print("Executing command...")
print("Limit: 5 seconds before force-kill...")
print("-" * 50)

# Start the process directly
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)

try:
    # Wait for up to 5 seconds for it to finish naturally
    output, errors = process.communicate(timeout=5)
    print(output)

except subprocess.TimeoutExpired:
    # After 5 seconds, forcibly terminate the mpiexec launcher process
    process.kill()
    
    # Grab whatever it managed to print before we killed it
    output, errors = process.communicate()
    
    print(output)
    print("\n" + "="*55)
    print("[SUCCESS] PROOF OF HANG:")
    print("The system froze waiting for Process 2.")
    print("The execution was forcefully terminated after 5 seconds.")
    print("="*55)

Executing command...
Limit: 5 seconds before force-kill...
--------------------------------------------------

#### PROCESS ID #### | ## ASSIGNED TASK ### | ### RESULT ####
-------------------------------------------------------------
Master: Waiting for data from Worker 1...
Worker 1             |        10 - 2        |               8
Master: Waiting for data from Worker 2...
DEBUG: Worker 2 has encountered a fatal error and is exiting...


[SUCCESS] PROOF OF HANG:
The system froze waiting for Process 2.
The execution was forcefully terminated after 5 seconds.


**Why is message passing required in distributed systems?**
- In a distributed system, each process runs in its own isolated memory space, so one process cannot read or modify another's variables. Message passing is the basic mechanism for moving data across that boundary: one process explicitly sends a value and another explicitly receives it. Without it, the processes would run independently and could not cooperate on a single problem.
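
The isolation described here can be observed with ordinary operating-system processes. The sketch below is an analogue rather than MPI: it assumes a child Python interpreter started with the standard `subprocess` module, and it uses a JSON line on the child's standard output as the "message". The variable names are illustrative.

```python
import json
import subprocess
import sys

counter = 0  # exists only in the parent process's memory

# Source for a child process. It has its own `counter`; changing it
# cannot affect the parent's variable.
child_code = """
import json
counter = 0
counter += 42
print(json.dumps({"counter": counter}))  # the "send": write a message to the pipe
"""

# Run the child in a separate interpreter and capture what it writes.
completed = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
    check=True,
)

# The "receive": decode the message that crossed the process boundary.
message = json.loads(completed.stdout)

print("parent counter:", counter)
print("child reported:", message["counter"])
```

The parent's `counter` stays at 0 even though the child changed its own copy; the new value reaches the parent only because the child wrote it out and the parent read it.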

**What happens if one process fails?**
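- If a process crashes before sending its data, the application can hang. In `mpi_fail.py`, the master calls the blocking `comm.recv(source=i)`, which does not return until a message from that worker arrives. Because Worker 2 exits without sending, the master waits indefinitely and never collects the result from Worker 3; the run above ended only because it was force-killed after 5 seconds. Some MPI launchers instead abort the entire job when a rank exits with an error, so the exact behavior depends on the implementation.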
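
One common way to avoid waiting forever is to poll a non-blocking receive against a deadline. The sketch below shows that pattern under stated assumptions: `recv_with_timeout` is a hypothetical helper, and it assumes a request object with `test()` returning a `(done, message)` pair and a `cancel()` method, which is the shape expected from mpi4py's `comm.irecv()`. `FakeRequest` is a stand-in so the example runs without MPI.

```python
import time


def recv_with_timeout(request, timeout=5.0, poll_interval=0.05):
    """Poll a non-blocking receive; give up after `timeout` seconds.

    `request` is assumed to expose test() -> (done, message) and cancel().
    Returns (True, message) on success, (False, None) on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        done, message = request.test()
        if done:
            return True, message
        if time.monotonic() >= deadline:
            request.cancel()  # abandon the receive that never completed
            return False, None
        time.sleep(poll_interval)


class FakeRequest:
    """Hypothetical stand-in for an MPI request so the sketch runs without MPI."""

    def __init__(self, message=None, polls_needed=None):
        self.message = message
        self.polls_needed = polls_needed  # None means "never completes"
        self.cancelled = False

    def test(self):
        if self.polls_needed is None:
            return False, None
        self.polls_needed -= 1
        if self.polls_needed <= 0:
            return True, self.message
        return False, None

    def cancel(self):
        self.cancelled = True


# A healthy worker: the message becomes available on the third poll.
healthy = FakeRequest({"rank": 1, "result": 8}, polls_needed=3)
ok, msg = recv_with_timeout(healthy, timeout=1.0, poll_interval=0.01)
print(ok, msg)

# A crashed worker: the message never arrives, so the call times out.
crashed = FakeRequest(polls_needed=None)
dead_ok, dead_msg = recv_with_timeout(crashed, timeout=0.1, poll_interval=0.01)
print(dead_ok, dead_msg, crashed.cancelled)
```

In `mpi_fail.py`, the master could presumably pass `comm.irecv(source=i)` to such a helper in place of calling `comm.recv(source=i)`, then report and skip any worker that times out. A timeout only keeps the master responsive; it does not recover the failed worker's result.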

**How does this model differ from shared-memory programming?**
- In shared-memory programming, all threads read and write the same variables directly, which avoids copying data but can cause race conditions when two threads modify the same data without synchronization. In the message-passing model, processes cannot see each other's memory and must communicate through explicit send and receive calls. This rules out accidental sharing of data, at the cost of explicit communication code and the overhead of copying data between processes.
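
The two styles can be put side by side in a small sketch. It uses Python threads for both halves, so the second half only imitates message passing (a `queue.Queue` still lives in shared memory); the point is the programming style, not the transport. The worker counts and function names are illustrative.

```python
import queue
import threading

N_WORKERS = 4
INCREMENTS = 1000

# --- Shared-memory style: every thread updates the same object ---
shared = {"total": 0}
lock = threading.Lock()


def shared_worker():
    for _ in range(INCREMENTS):
        with lock:  # without the lock, concurrent updates could be lost
            shared["total"] += 1


threads = [threading.Thread(target=shared_worker) for _ in range(N_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# --- Message-passing style: workers keep private state and send results ---
inbox = queue.Queue()


def messaging_worker(worker_id):
    local_total = 0  # private to this worker
    for _ in range(INCREMENTS):
        local_total += 1
    inbox.put({"worker": worker_id, "partial": local_total})  # the "send"


threads = [
    threading.Thread(target=messaging_worker, args=(i,)) for i in range(N_WORKERS)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The "receive": collect one message per worker and combine them.
message_total = sum(inbox.get()["partial"] for _ in range(N_WORKERS))

print(shared["total"], message_total)
```

Both halves arrive at the same total. The first needs a lock around every update, while the second needs no lock in the worker but has explicit send and receive steps, which mirrors the trade-off described above.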