Multiprocessing in Python refers to creating and managing multiple processes that execute tasks concurrently. Each process runs its own Python interpreter in its own memory space, so processes can run in parallel on separate CPU cores. Threads in CPython, by contrast, are constrained by the Global Interpreter Lock (GIL), which allows only one thread at a time to execute Python bytecode.

Here's why multiprocessing is useful:

    True Parallelism: Multiprocessing allows Python programs to take advantage of multiple CPU cores, enabling true parallel execution of tasks. Each process runs in its own memory space, avoiding the limitations of the GIL in Python's multithreading.

    Improved Performance: By distributing tasks across multiple processes, multiprocessing can significantly speed up the execution of CPU-bound tasks, such as heavy computations or data processing.

    Isolation: Each process has its own memory space, preventing interference between processes. This isolation makes multiprocessing ideal for tasks that require independent and isolated execution.

    Fault Tolerance: Since each process runs independently, an unhandled error or crash in one process does not bring down the others. The parent still has to detect the failure, for example by checking the child's exit code, but the isolation improves the stability of multiprocessing applications.

    Resource Utilization: Multiprocessing allows better utilization of available system resources, particularly on multi-core systems, by distributing tasks across multiple processes.

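    Scaling: Multiprocessing provides a scalable solution for handling computationally intensive tasks. As the number of CPU cores increases, more work can run in parallel, although the speedup is rarely perfectly proportional: process startup, inter-process communication, and any serial portion of the program limit the gain.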
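
The fault-tolerance point above can be illustrated with a minimal sketch. The function names (failing_task, healthy_task, run_isolation_demo) are hypothetical and chosen only for this example. One child raises an unhandled exception while another finishes normally, and the parent inspects both exit codes afterwards.

```python
import multiprocessing


def failing_task():
    """Hypothetical task that always fails with an unhandled exception."""
    raise ValueError("simulated failure in the child process")


def healthy_task():
    """Hypothetical task that completes normally."""
    sum(range(1000))


def run_isolation_demo():
    bad = multiprocessing.Process(target=failing_task)
    good = multiprocessing.Process(target=healthy_task)

    bad.start()
    good.start()
    bad.join()
    good.join()

    # The failing child prints its traceback to stderr and exits with a
    # non-zero code; the parent and the other child are unaffected.
    return bad.exitcode, good.exitcode


if __name__ == "__main__":
    print("Exit codes (failing, healthy):", run_isolation_demo())
```

The parent keeps running after the child fails, but it only learns about the failure because it checks exitcode explicitly.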

Multiprocessing and multithreading are both techniques used to achieve concurrency and parallelism in Python, but they differ in several key aspects:

    Execution Model:
        Multiprocessing: In multiprocessing, multiple processes run concurrently, each with its own memory space and resources. Processes are independent of each other and do not share memory unless explicitly specified. Multiprocessing achieves true parallelism by utilizing multiple CPU cores.
        Multithreading: In multithreading, multiple threads run concurrently within the same process, sharing the same memory space and resources. Threads within the same process can share data easily but must coordinate access to shared resources to avoid data corruption. Multithreading can achieve concurrency but is limited by the Global Interpreter Lock (GIL) in CPython, preventing true parallelism.

    Concurrency vs. Parallelism:
        Concurrency: Both multiprocessing and multithreading allow multiple tasks to be executed concurrently, where tasks can start, run, and complete in overlapping time periods. Concurrency is useful for tasks that involve I/O-bound operations, where threads or processes can overlap their execution while waiting for I/O operations to complete.
        Parallelism: Multiprocessing achieves true parallelism by executing tasks simultaneously across multiple CPU cores. Each process runs independently and can execute tasks in parallel without interference. Multithreading can achieve concurrency but not true parallelism due to the GIL, which restricts the execution of Python bytecode to a single thread at a time.

    Memory and Communication:
        Multiprocessing: Processes have separate memory spaces and communicate through inter-process communication (IPC) mechanisms like pipes, queues, shared memory, or sockets. Communication between processes is more complex but safer, as processes are isolated from each other.
        Multithreading: Threads share the same memory space, allowing them to access shared data directly. However, this requires careful synchronization to avoid race conditions and data corruption. Communication between threads is simpler but can be prone to synchronization issues.

    Resource Overhead:
        Multiprocessing: Creating and managing processes typically incurs higher overhead in terms of memory and system resources compared to threads. Each process has its own memory space and system resources, which can lead to higher memory usage.
        Multithreading: Threads within the same process share memory and resources, resulting in lower overhead compared to processes. However, threads must contend for shared resources, which can lead to synchronization overhead and potential performance bottlenecks.

    Fault Isolation:
        Multiprocessing: Since each process runs independently, errors or crashes in one process do not affect others. Fault isolation is stronger in multiprocessing, making it more suitable for tasks requiring robustness and fault tolerance.
        Multithreading: Errors in one thread can potentially affect the entire process, as threads share the same memory space. Debugging and isolating errors in multithreaded applications can be more challenging.
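
The memory difference described above can be sketched as follows. The names shared_items, append_item, and run_memory_demo are assumptions made for this illustration. The same function runs once in a thread and once in a separate process, and the parent checks its own copy of a module-level list after each run.

```python
import multiprocessing
import threading

shared_items = []  # module-level list, used only for this illustration


def append_item():
    """Append to the module-level list in whichever thread or process runs this."""
    shared_items.append("written by worker")


def run_memory_demo():
    shared_items.clear()

    # A thread runs inside the parent process, so its write is visible here.
    thread = threading.Thread(target=append_item)
    thread.start()
    thread.join()
    after_thread = len(shared_items)

    # A process appends to its own copy of the list; the parent's copy
    # does not change.
    process = multiprocessing.Process(target=append_item)
    process.start()
    process.join()
    after_process = len(shared_items)

    return after_thread, after_process


if __name__ == "__main__":
    print(run_memory_demo())
```

The list length in the parent grows after the thread runs and stays the same after the process runs, which is why processes need an explicit IPC mechanism to pass data back.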

In [2]:
import multiprocessing
import os

# Function to print process ID
def print_process_id():
    print("Process ID:", os.getpid())

if __name__ == "__main__":
    # Create a multiprocessing process
    process = multiprocessing.Process(target=print_process_id)

    # Start the process
    process.start()

    # Wait for the process to finish
    process.join()

    print("Main process ID:", os.getpid())


Main process ID: 1848


In this code:

    We import the multiprocessing module and the os module, which allows us to access process-related functions.
    We define a function print_process_id() that simply prints the process ID (os.getpid()).
    We use the multiprocessing.Process() constructor to create a multiprocessing process. We pass the target function (print_process_id) to be executed by the process.
    We start the process using the start() method.
    We wait for the process to finish using the join() method. This ensures that the main process waits for the child process to complete its execution.
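    Finally, we print the process ID of the main process. The recorded output above shows only this line; in some notebook environments the child's output does not appear in the cell, whereas running the code as a script prints both process IDs.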
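
The start() and join() steps described above can also be observed through the Process object's own attributes. This is a small sketch, assuming a hypothetical short_task function that just sleeps briefly. It records pid, is_alive(), and exitcode before start(), while the child is running, and after join().

```python
import multiprocessing
import time


def short_task():
    """Hypothetical task that keeps the child alive for a moment."""
    time.sleep(0.5)


def run_lifecycle_demo():
    process = multiprocessing.Process(target=short_task, name="demo-worker")

    # Before start(): no OS process exists yet.
    before_start = (process.pid, process.is_alive(), process.exitcode)

    process.start()
    # After start(): the child is running and has no exit code yet.
    while_running = (process.is_alive(), process.exitcode)

    process.join()
    # After join(): the child has finished; exit code 0 means success.
    after_join = (process.is_alive(), process.exitcode)

    return before_start, while_running, after_join


if __name__ == "__main__":
    for state in run_lifecycle_demo():
        print(state)
```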

A multiprocessing pool in Python is a mechanism provided by the multiprocessing module to manage a pool of worker processes. It is used to parallelize the execution of a function across multiple processes by distributing the workload among the available CPU cores. The pool manages the creation and management of worker processes, making it easier to perform parallel computations and handle multiple tasks concurrently.

Here's how a multiprocessing pool works:

    Creation of Worker Processes:
        When you create a multiprocessing pool, you can specify the number of worker processes through the processes argument.
        If you omit it, the pool defaults to the number of CPUs available on the system (os.cpu_count() in most Python versions).

    Distribution of Tasks:
        You can submit tasks (functions) to the pool for execution using various methods such as map(), apply(), or apply_async().
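        The pool distributes these tasks among the worker processes (map() groups them into chunks first). When there are more tasks than workers, each worker handles several tasks one after another, so tasks run concurrently only up to the number of workers.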

    Execution of Tasks:
        Each worker process executes the assigned task independently and concurrently with other processes.
        The results of the tasks are collected and returned to the main process.

    Management of Worker Processes:
        The pool manages the lifecycle of worker processes, including creation, execution of tasks, and termination.
        Worker processes are reused for subsequent tasks, reducing the overhead of process creation and termination.
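
The steps above can be sketched in a few lines. The helper names (report_pid, square, run_pool_demo) are hypothetical. Eight tasks are sent to a pool of two workers; each task reports the PID of the worker that ran it, which shows that workers are reused rather than created per task. A single apply_async() call is included for comparison with map().

```python
import multiprocessing
import os


def report_pid(task_number):
    """Return the task number together with the PID of the worker that ran it."""
    return task_number, os.getpid()


def square(x):
    """Square a number; used for the apply_async() call."""
    return x * x


def run_pool_demo():
    pool = multiprocessing.Pool(processes=2)
    try:
        # map() blocks until every task has finished and preserves input order.
        pairs = pool.map(report_pid, range(8))
        # apply_async() returns immediately; get() waits for the result.
        async_result = pool.apply_async(square, (7,))
        squared = async_result.get(timeout=10)
    finally:
        pool.close()
        pool.join()

    task_numbers = [task for task, _ in pairs]
    worker_pids = {pid for _, pid in pairs}
    return task_numbers, worker_pids, squared


if __name__ == "__main__":
    tasks, pids, squared = run_pool_demo()
    print("Tasks:", tasks)
    print("Distinct worker PIDs:", len(pids))
    print("apply_async result:", squared)
```

At most two distinct PIDs appear for the eight tasks, because the pool never has more than two workers.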

Multiprocessing pools are useful for several reasons:

    Parallel Execution: Multiprocessing pools enable parallel execution of tasks across multiple CPU cores, allowing you to utilize the full computational power of the system.

    Improved Performance: By distributing tasks among multiple processes, multiprocessing pools can significantly speed up the execution of CPU-bound tasks, such as heavy computations or data processing.

    Simplified Parallelism: The pool abstracts away the complexities of process management, making it easier to parallelize tasks without having to manage individual processes manually.

    Scalability: Multiprocessing pools provide a scalable solution for handling computationally intensive tasks. Adding CPU cores (and workers) generally increases throughput, though not perfectly proportionally, because pickling arguments and results, process startup, and any serial work add overhead.

In [None]:
import multiprocessing

def worker_function(argument):
    """Function to be executed by worker processes."""
    # Process the argument here
    result = argument ** 2
    return result

if __name__ == "__main__":
    # Create a multiprocessing pool with 4 worker processes
    pool = multiprocessing.Pool(processes=4)

    # Define a list of arguments
    arguments = [1, 2, 3, 4, 5]

    # Apply the worker function to each argument using the pool
    results = pool.map(worker_function, arguments)

    # Close the pool (no more tasks can be submitted)
    pool.close()

    # Wait for all processes in the pool to complete
    pool.join()

    # Print the results
    print("Results:", results)


Let's break down the code:

    We import the multiprocessing module.

    We define a function worker_function that represents the task to be performed by each worker process. In this case, it squares the given argument.

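    Inside the if __name__ == "__main__": block, we create a pool of 4 worker processes with multiprocessing.Pool(processes=4). The guard ensures this code runs only when the script is executed directly, not when it is imported. This matters for multiprocessing because, under the spawn start method (the default on Windows and macOS), each child process imports the main module, and unguarded pool creation would be re-executed in every child.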

    We define a list of arguments (arguments) for the worker function.

    We use the pool.map() method to apply the worker_function to each argument in the arguments list. This distributes the tasks among the worker processes in the pool.

    We close the pool using pool.close() to indicate that no more tasks will be submitted to the pool.

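    We use pool.join() to wait for the worker processes to exit. Because pool.map() blocks until all results are ready, the results are already available at this point; join() only ensures that the workers have shut down cleanly.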

    Finally, we print the results.
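
The same computation can also be written with the pool as a context manager. This is an alternative sketch, not a replacement for the code above; the names square and square_all are assumptions for this example.

```python
import multiprocessing


def square(number):
    """Square a single number in a worker process."""
    return number ** 2


def square_all(numbers, workers=2):
    # Leaving the with-block calls terminate() on the pool. That is safe
    # here because map() has already returned every result.
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(square, numbers)


if __name__ == "__main__":
    print("Results:", square_all([1, 2, 3, 4, 5]))
```

The with-statement removes the need for explicit close() and join() calls, but because it terminates the workers on exit, any results from apply_async() should be retrieved inside the block.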

In [None]:
import multiprocessing

def print_number(number):
    """Function to print the given number."""
    print("Process ID:", multiprocessing.current_process().pid, "- Number:", number)

if __name__ == "__main__":
    # Define a list of numbers
    numbers = [1, 2, 3, 4]

    # Create a list to store the processes
    processes = []

    # Create a process for each number
    for num in numbers:
        process = multiprocessing.Process(target=print_number, args=(num,))
        processes.append(process)
        process.start()

    # Wait for all processes to finish
    for process in processes:
        process.join()


In this program:

    We import the multiprocessing module.

    We define a function print_number that prints the given number along with the process ID.

    Inside the if __name__ == "__main__": block, we define a list of numbers.

    We create a list processes to store the processes.

    We loop through each number in the numbers list and create a process for it using multiprocessing.Process, passing print_number as the target function and the number as the argument. We append each process to the processes list and then start it.

    We wait for all processes to finish by using process.join() for each process in the processes list.
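
The processes in this program only print their results. If the parent needs the values back, one option is a multiprocessing.Queue. The sketch below is an assumption-laden variant of the program above: square_into_queue and collect_squares are hypothetical names, and each process puts a (number, square) pair on the queue for the parent to collect.

```python
import multiprocessing


def square_into_queue(number, results):
    """Compute a square and send it back to the parent through the queue."""
    results.put((number, number ** 2))


def collect_squares(numbers):
    results = multiprocessing.Queue()
    processes = [
        multiprocessing.Process(target=square_into_queue, args=(num, results))
        for num in numbers
    ]

    for process in processes:
        process.start()

    # Read one item per process before joining, so that no child is left
    # blocked while flushing its data to the queue.
    collected = dict(results.get(timeout=10) for _ in processes)

    for process in processes:
        process.join()

    return collected


if __name__ == "__main__":
    print(collect_squares([1, 2, 3, 4]))
```

Results arrive in whatever order the processes finish, which is why each item carries its input number and the pairs are gathered into a dictionary.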