# Multiprocessing 

Learning about multiprocessing in Python is a great way to leverage multiple CPU cores and enhance the performance of your programs, especially for tasks that can be parallelized. I'll guide you through the basics of multiprocessing in Python, from fundamental concepts to more advanced topics.

**1. Introduction to Multiprocessing:**

`multiprocessing` is a standard-library module in Python that lets you create and manage multiple processes running at the same time. It is particularly useful for CPU-bound tasks, where spreading the work across multiple cores can improve the program's performance.

**2. Basic Concepts:**

- **Process Creation:** The `multiprocessing` module provides a `Process` class that allows you to create new processes. Each process runs in its own memory space and has its own Python interpreter.

- **Process Communication:** Processes can communicate with each other through various mechanisms, such as pipes, queues, and shared memory.

- **Pool of Workers:** The `Pool` class allows you to create a pool of worker processes that can execute a function concurrently.
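To make "its own memory space" concrete, here is a minimal sketch (the names `counter` and `increment` are hypothetical). A child process modifies a module-level variable, but the parent's copy is unchanged, because each process works on its own copy.

```python
import multiprocessing

counter = 0  # module-level variable; every process has its own copy

def increment():
    global counter
    counter += 100
    print("In child:", counter)  # prints: In child: 100

if __name__ == "__main__":
    process = multiprocessing.Process(target=increment)
    process.start()
    process.join()
    print("In parent:", counter)  # prints: In parent: 0
```

To move data between processes you need one of the communication mechanisms above, not ordinary variables.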

**3. Creating Processes:**

You can create processes using the `Process` class from the `multiprocessing` module:

```python
import multiprocessing

def worker_function():
    print("Worker process")

if __name__ == "__main__":
    process = multiprocessing.Process(target=worker_function)
    process.start()
    process.join()  # Wait for the process to finish
```
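`Process` also accepts `args` (and `kwargs`) for the target function. A common pattern, sketched below with a hypothetical CPU-bound workload, is to start every process before joining any of them; calling `join()` right after each `start()` would make the processes run one at a time.

```python
import multiprocessing
import os

def worker_function(worker_id, count):
    # Each process receives its own arguments through `args`
    total = sum(range(count))
    print(f"Worker {worker_id} (pid {os.getpid()}): sum = {total}")

if __name__ == "__main__":
    processes = [
        multiprocessing.Process(target=worker_function, args=(i, 1_000_000))
        for i in range(4)
    ]
    for p in processes:
        p.start()  # start all processes first so they run in parallel
    for p in processes:
        p.join()   # then wait for each one to finish
    print("Exit codes:", [p.exitcode for p in processes])  # Exit codes: [0, 0, 0, 0]
```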

**4. Process Communication:**

Processes can communicate using various methods such as `Queue` and `Pipe`. Here's an example using a `Queue`:

```python
import multiprocessing

def worker_function(queue):
    queue.put("Message from worker")

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    process = multiprocessing.Process(target=worker_function, args=(queue,))
    process.start()
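    # Read from the queue before join(): a child that still has queued data
    # waiting to be flushed may not exit, so joining first can deadlock.
    message = queue.get()
    process.join()
    print("Parent received:", message)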
```
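`Pipe()` returns a pair of connected `Connection` objects (duplex by default), which suits communication between exactly two processes. A minimal request/reply sketch:

```python
import multiprocessing

def worker_function(conn):
    request = conn.recv()  # wait for a message from the parent
    conn.send(f"Worker got: {request}")
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = multiprocessing.Pipe()
    process = multiprocessing.Process(target=worker_function, args=(child_conn,))
    process.start()
    parent_conn.send("hello")
    reply = parent_conn.recv()
    process.join()
    print("Parent received:", reply)  # Parent received: Worker got: hello
```

A `Queue` is the safer default when several producers or consumers share a channel; a `Pipe` is lighter when only two endpoints are involved.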

**5. Using Pool of Workers:**

The `Pool` class provides a convenient way to manage a pool of worker processes:

```python
import multiprocessing

def worker_function(x):
    return x*x

if __name__ == "__main__":
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(worker_function, range(10))
    print(results)
```
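`map()` passes one argument per call. For functions that take several arguments, `Pool.starmap()` unpacks argument tuples, and `apply_async()` submits a single call without blocking. A short sketch with a hypothetical `power` function:

```python
import multiprocessing

def power(base, exponent):
    return base ** exponent

if __name__ == "__main__":
    with multiprocessing.Pool(processes=4) as pool:
        # starmap unpacks each tuple into positional arguments
        powers = pool.starmap(power, [(2, 3), (3, 2), (10, 0)])
        # apply_async returns an AsyncResult immediately
        async_result = pool.apply_async(power, (5, 2))
        single = async_result.get(timeout=10)  # block until the result is ready
    print(powers)  # [8, 9, 1]
    print(single)  # 25
```

The results are collected inside the `with` block on purpose: leaving the block calls `pool.terminate()`, not `close()` and `join()`, so asynchronous work still pending at that point is discarded.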

**6. Shared Memory:**

The `multiprocessing` module also provides tools for sharing memory between processes, like `Value` and `Array`:

```python
import multiprocessing

def worker_function(shared_value):
    shared_value.value += 1

if __name__ == "__main__":
    shared_value = multiprocessing.Value("i", 0)
    process = multiprocessing.Process(target=worker_function, args=(shared_value,))
    process.start()
    process.join()
    print("Shared value:", shared_value.value)
```
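With a single worker the example above is safe, but `shared_value.value += 1` is a read, an add, and a write, so several processes updating the same `Value` can lose increments. A `Value` created with the default `lock=True` exposes its lock through `get_lock()`. The sketch below (helper names are hypothetical) uses that lock, and also fills an `Array` from a child process:

```python
import multiprocessing

def add_many(counter, n):
    for _ in range(n):
        # Hold the lock for the whole read-modify-write
        with counter.get_lock():
            counter.value += 1

def fill_squares(arr):
    for i in range(len(arr)):
        arr[i] = i * i

if __name__ == "__main__":
    counter = multiprocessing.Value("i", 0)
    workers = [multiprocessing.Process(target=add_many, args=(counter, 1000)) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print("Counter:", counter.value)  # Counter: 4000

    arr = multiprocessing.Array("i", 5)  # five C ints, initialized to 0
    p = multiprocessing.Process(target=fill_squares, args=(arr,))
    p.start()
    p.join()
    print("Array:", arr[:])  # Array: [0, 1, 4, 9, 16]
```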

**7. Advanced Concepts:**

For more advanced use cases, you can explore topics like synchronization using `Lock`, using the `Manager` class for managing shared objects, and handling exceptions in multiprocessing code.
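Two of those topics in one minimal sketch (function names are hypothetical): a `Manager` runs a server process that hosts a shared `dict`, which several processes update through proxies, and an exception raised inside a pool worker is re-raised in the parent when `AsyncResult.get()` is called.

```python
import multiprocessing

def record(shared_dict, key):
    shared_dict[key] = key * 10  # the assignment is forwarded to the manager process

def might_fail(x):
    if x == 3:
        raise ValueError(f"bad input: {x}")
    return x

if __name__ == "__main__":
    with multiprocessing.Manager() as manager:
        shared_dict = manager.dict()
        workers = [multiprocessing.Process(target=record, args=(shared_dict, k)) for k in range(3)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        # Copy the data out before the manager shuts down
        snapshot = dict(sorted(shared_dict.items()))
    print(snapshot)  # {0: 0, 1: 10, 2: 20}

    error_message = None
    with multiprocessing.Pool(processes=2) as pool:
        result = pool.apply_async(might_fail, (3,))
        try:
            result.get(timeout=10)  # re-raises the worker's ValueError here
        except ValueError as exc:
            error_message = str(exc)
    print("Caught:", error_message)  # Caught: bad input: 3
```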

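Remember that in CPython the Global Interpreter Lock (GIL) allows only one thread at a time to execute Python bytecode, so threads cannot speed up CPU-bound Python code. Each process has its own interpreter and its own GIL, which is why multiprocessing is most effective for CPU-bound tasks. For I/O-bound tasks, where most of the time is spent waiting, threads (for example `concurrent.futures.ThreadPoolExecutor`) or asynchronous programming (`asyncio`) are usually lighter-weight options.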
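A sketch contrasting the two executors in `concurrent.futures` (`fake_io` and `cpu_work` are hypothetical stand-ins): threads are enough when tasks mostly wait, while `ProcessPoolExecutor` offers the same interface backed by worker processes for CPU-bound work.

```python
import concurrent.futures
import time

def fake_io(task_id):
    time.sleep(0.5)  # stands in for a network or disk wait; sleeping releases the GIL
    return task_id

def cpu_work(n):
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # I/O-bound: the sleeps overlap, so this should take roughly 0.5 s rather than 4 s
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
        io_results = list(executor.map(fake_io, range(8)))
    print(f"8 simulated I/O tasks took {time.perf_counter() - start:.2f} s")

    # CPU-bound: each worker process runs Python code on its own core
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
        cpu_results = list(executor.map(cpu_work, [200_000] * 4))
    print(len(cpu_results), "CPU-bound results")
```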

As you delve deeper into multiprocessing, make sure to understand process synchronization, avoiding deadlocks, and managing resources effectively. Always refer to the official Python documentation for the most accurate and updated information.
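Avoiding deadlocks often comes down to making sure every blocking call eventually has something to return. A minimal sketch of a common pattern, assuming a hypothetical `None` sentinel as the "no more work" signal: each consumer loops on `Queue.get()` until it sees the sentinel, and the parent drains the result queue before joining, because a process that still has buffered queue data may not terminate until that data is consumed.

```python
import multiprocessing

SENTINEL = None  # hypothetical "no more work" marker

def consumer(task_queue, result_queue):
    while True:
        item = task_queue.get()
        if item is SENTINEL:
            break  # clean shutdown instead of blocking forever on get()
        result_queue.put(item * 2)

if __name__ == "__main__":
    task_queue = multiprocessing.Queue()
    result_queue = multiprocessing.Queue()
    n_workers = 2
    workers = [
        multiprocessing.Process(target=consumer, args=(task_queue, result_queue))
        for _ in range(n_workers)
    ]
    for w in workers:
        w.start()

    items = list(range(6))
    for item in items:
        task_queue.put(item)
    for _ in range(n_workers):
        task_queue.put(SENTINEL)  # one sentinel per worker

    # Drain all results before join()
    results = sorted(result_queue.get() for _ in items)
    for w in workers:
        w.join()
    print(results)  # [0, 2, 4, 6, 8, 10]
```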

Running the `Queue` example from section 4 prints:

```
Parent received: Message from worker
```


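The `Pool.map()` example from section 5, rewritten with explicit `close()` and `join()` calls instead of a `with` block:

```python
import multiprocessing

def worker_function(x):
    return x * x

if __name__ == "__main__":
    numbers = list(range(10))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    pool = multiprocessing.Pool(processes=4)
    results = pool.map(worker_function, numbers)
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for the worker processes to exit
    print(results)
```

Output:

```
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

In a notebook, this works when the start method is `fork`. Under `spawn` (the default on Windows and macOS), worker functions generally need to be defined in an importable module rather than in a notebook cell.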

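What happens if `close()` and `join()` are left out? `pool.map()` blocks until every result is ready, so the results are still printed correctly:

```python
import multiprocessing
import time

def worker_function(x):
    time.sleep(1)  # simulate one second of work
    return x * x

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=4)
    results = pool.map(worker_function, range(10))  # blocks until all results are ready
    # pool.close() and pool.join() are deliberately omitted here
    print(results)
```

Output:

```
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

The worker processes, however, are never shut down explicitly; they are only cleaned up when the pool object is garbage-collected or the interpreter exits.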


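Printing from inside the workers shows how the four processes share the ten tasks:

```python
import multiprocessing
import time

def worker_function(x):
    print(f"Worker {x} started.   ")
    time.sleep(2)
    print(f"Worker {x} finished.  ")

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=4)
    results = pool.map(worker_function, range(10))  # blocks until every task has finished
    # pool.close() and pool.join() are deliberately omitted here
    print("All tasks submitted")
```

Output (prints from different processes interleave, so the exact order varies between runs):

```
Worker 0 started.   Worker 1 started.   Worker 2 started.   Worker 3 started.
Worker 1 finished.  Worker 0 finished.  Worker 3 finished.
Worker 4 started.
Worker 2 finished.
Worker 5 started.
Worker 7 started.   Worker 6 started.
Worker 4 finished.
Worker 8started.
Worker 5 finished.  Worker 7 finished.  Worker 6 finished.
Worker 9 started.
Worker 8 finished.
Worker 9 finished.
All tasks submitted
```

Because `pool.map()` blocks, "All tasks submitted" is printed only after every task has finished. Note also that the number after "Worker" is the task argument `x`, not the ID of a worker process.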
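To actually see "All tasks submitted" before the work finishes, a non-blocking variant can use `map_async()` and then close and join the pool. A sketch (the shorter sleep is an arbitrary choice):

```python
import multiprocessing
import time

def worker_function(x):
    time.sleep(0.5)
    return x * x

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=4)
    async_result = pool.map_async(worker_function, range(10))  # returns immediately
    print("All tasks submitted")  # printed before the workers finish
    pool.close()  # no further submissions
    pool.join()   # wait for the workers to drain the task queue and exit
    results = async_result.get()
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```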


**Why is `pool.close()` called before `pool.join()`?**

```python
pool.close()
pool.join()
```


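`pool.join()` waits for the pool's worker processes to exit, and the workers only exit once the pool has been told that no more work is coming. `pool.close()` sends that signal, so it has to come first: calling `join()` on a pool that is still running raises an error (`ValueError: Pool is still running` on recent Python versions) instead of waiting.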

Here's the rationale behind this order:

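1. **Submitting Tasks:** Methods like `pool.map()` and `pool.apply_async()` submit tasks to the pool for the worker processes to execute. `map()` blocks until all of its results are ready, while `apply_async()` and `map_async()` return immediately with an `AsyncResult`.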

2. **Closing the Pool:** After you've submitted all the tasks you want to run, call `pool.close()`. This prevents any additional tasks from being submitted (further submissions raise an error), but the pool keeps working: the workers finish the tasks already queued and then exit.

3. **Waiting for Completion:** After the pool is closed, call `pool.join()`. It blocks the main program until every worker process has exited, which, because the pool is closed, happens only after all submitted tasks have completed. This ensures your program doesn't proceed, or exit, while parallel work is still running.

In summary, `pool.close()` tells the pool that no more tasks are coming so its workers can finish and exit, and `pool.join()` then waits for them to do so. Reversing the order would mean waiting on workers that were never told to stop.
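A small sketch of both behaviors, assuming Python 3.8 or later (the `square` helper is hypothetical): `join()` on a running pool raises a `ValueError`, while the `close()`-then-`join()` order waits for all asynchronous tasks.

```python
import multiprocessing

def square(x):
    return x * x

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=4)
    # apply_async() returns an AsyncResult immediately; the work runs in the background
    async_results = [pool.apply_async(square, (i,)) for i in range(10)]

    join_error = None
    try:
        pool.join()  # the pool is still running, so join() refuses to wait
    except ValueError as exc:
        join_error = exc
    print("join() before close():", join_error)

    pool.close()  # no more submissions; workers exit once the queue is drained
    pool.join()   # now join() can wait for the workers to exit
    squares = [r.get() for r in async_results]
    print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```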