# Chapter 27: Shared State and Inter-Process Communication

Since each process has its own memory space, sharing data between processes requires
explicit mechanisms. The `multiprocessing` module provides several IPC (Inter-Process
Communication) primitives: `Queue`, `Pipe`, `Value`, `Array`, `Manager`, and
synchronization tools like `Lock`.

## Topics Covered
- **Queue**: Thread/process-safe FIFO queue
- **Pipe**: Bidirectional communication channel
- **Value**: Shared single value in shared memory
- **Array**: Shared array in shared memory
- **Lock**: Synchronize access to shared resources
- **Manager**: Proxy-based shared objects (dicts, lists, etc.)
- **Practical**: Producer-consumer pattern with Queue

## Queue: Safe Inter-Process Communication

`multiprocessing.Queue` is a process-safe FIFO queue built on top of pipes and locks.
It is the most common way to pass data between processes.

Key methods:
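- `put(item)` -- add an item (blocks only if the queue was created with a `maxsize` and is full)
- `get()` -- remove and return an item (blocks if empty)
- `empty()` -- approximate check if the queue is empty; unreliable while other processes are still putting items
- `qsize()` -- approximate number of items (may raise `NotImplementedError` on platforms such as macOS)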

In [None]:
import multiprocessing

# Basic Queue usage (within a single process for demonstration)
q: multiprocessing.Queue = multiprocessing.Queue()

# Put items into the queue
q.put("hello")
q.put("world")
q.put(42)

# Get items in FIFO order
print(f"First:  {q.get()}")
print(f"Second: {q.get()}")
print(f"Third:  {q.get()}")
print(f"Empty:  {q.empty()}")
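
The blocking behavior described above can be bounded with `maxsize`, `timeout`, and the non-blocking variants. The following is a minimal single-process sketch; the queue size and the timeout values are arbitrary choices made for illustration.

```python
import multiprocessing
import queue  # multiprocessing.Queue raises queue.Full and queue.Empty

# A bounded queue: put() can only block once maxsize items are waiting
bounded: multiprocessing.Queue = multiprocessing.Queue(maxsize=2)
bounded.put("a")
bounded.put("b")

# put_nowait() raises queue.Full instead of blocking
try:
    bounded.put_nowait("c")
    was_full = False
except queue.Full:
    was_full = True

# get() with a timeout waits for an item instead of blocking forever
first = bounded.get(timeout=1)
second = bounded.get(timeout=1)

# On an empty queue the timeout expires and queue.Empty is raised
try:
    bounded.get(timeout=0.1)
    timed_out = False
except queue.Empty:
    timed_out = True

print(f"Full on third put: {was_full}")
print(f"Items: {first}, {second}")
print(f"Timed out on empty queue: {timed_out}")
```

Using a timeout on `get()` is usually a more dependable way to detect "no more data" than polling `empty()`.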

In [None]:
import multiprocessing
import os


def queue_worker(q: multiprocessing.Queue, values: list[int]) -> None:
    """Put computed results into a queue."""
    pid: int = os.getpid()
    for v in values:
        q.put((pid, v, v * v))


# Use a Queue to collect results from multiple worker processes
result_queue: multiprocessing.Queue = multiprocessing.Queue()

p1 = multiprocessing.Process(target=queue_worker, args=(result_queue, [1, 2, 3]))
p2 = multiprocessing.Process(target=queue_worker, args=(result_queue, [4, 5, 6]))

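p1.start()
p2.start()

# Read the known number of results *before* joining: empty() is unreliable,
# and a process that has put items on a queue may not exit until they are read
print("Results from workers:")
for _ in range(6):  # 2 workers x 3 values each
    pid, value, squared = result_queue.get()
    print(f"  PID {pid}: {value}^2 = {squared}")

p1.join()
p2.join()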

## Pipe: Bidirectional Communication

`multiprocessing.Pipe()` creates a pair of connected `Connection` objects. By default,
the pipe is bidirectional -- each end can both `send()` and `recv()`.

Pipes are faster than queues for simple two-process communication. Each end should be
used by only one process (or thread) at a time: data in a pipe may become corrupted if
two processes read from or write to the same end simultaneously.

In [None]:
import multiprocessing

# Bidirectional pipe (within a single process for clarity)
parent_conn, child_conn = multiprocessing.Pipe()

# Send from parent end, receive at child end
parent_conn.send("ping")
print(f"Child received: {child_conn.recv()}")

# Send from child end, receive at parent end
child_conn.send("pong")
print(f"Parent received: {parent_conn.recv()}")

# Pipes can send any picklable object
parent_conn.send({"type": "data", "values": [1, 2, 3]})
msg: dict = child_conn.recv()
print(f"Dict received: {msg}")

parent_conn.close()
child_conn.close()
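
A pipe can also be created one-way, and `poll()` lets a receiver check for data before committing to a blocking `recv()`. The sketch below stays within a single process, like the cell above; the variable names and timeout values are illustrative assumptions.

```python
import multiprocessing

# duplex=False returns (receive-only end, send-only end)
recv_end, send_end = multiprocessing.Pipe(duplex=False)

# poll() reports whether data is ready without blocking on recv()
ready_before = recv_end.poll(0.1)  # nothing has been sent yet

send_end.send([1, 2, 3])
ready_after = recv_end.poll(1.0)  # waits up to 1 second for data
payload = recv_end.recv()

# Once the sending end is closed and the pipe is drained, recv() raises EOFError
send_end.close()
try:
    recv_end.recv()
    saw_eof = False
except EOFError:
    saw_eof = True
recv_end.close()

print(f"Ready before send: {ready_before}")
print(f"Ready after send:  {ready_after}")
print(f"Payload: {payload}")
print(f"EOF after close: {saw_eof}")
```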

In [None]:
import multiprocessing
from multiprocessing.connection import Connection


def pipe_child(conn: Connection) -> None:
    """Child process that communicates via a Pipe."""
    msg: str = conn.recv()
    conn.send(f"echo: {msg}")
    conn.close()


# Two-process communication with Pipe
parent_conn, child_conn = multiprocessing.Pipe()

p = multiprocessing.Process(target=pipe_child, args=(child_conn,))
p.start()

# Parent sends a message and waits for the reply
parent_conn.send("hello from parent")
reply: str = parent_conn.recv()
print(f"Parent got reply: {reply}")

p.join()
parent_conn.close()

## Value and Array: Shared Memory

`multiprocessing.Value` and `multiprocessing.Array` create shared memory objects that
can be accessed by multiple processes. Their first argument is either a type code of the
kind used by the `array` module or a `ctypes` type:

| Type code | C type | Python type |
|-----------|--------|:------------|
| `'i'` | `int` | `int` |
| `'d'` | `double` | `float` |
| `'f'` | `float` | `float` |
| `'c'` | `char` | `bytes` |

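These objects live in **shared memory** (not copied), so changes are visible to all processes that share them. Pass them to child processes as `Process` arguments (or through inheritance); they cannot be sent through a `Queue` or `Pipe`.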

In [None]:
import multiprocessing

# Value: a single shared value
counter = multiprocessing.Value("i", 0)  # 'i' = signed int, initial value = 0
print(f"Initial value: {counter.value}")

counter.value = 42
print(f"After assignment: {counter.value}")

# Value with a float
temperature = multiprocessing.Value("d", 98.6)  # 'd' = double
print(f"Temperature: {temperature.value}")

# Array: shared array of fixed type
arr = multiprocessing.Array("d", [1.0, 2.0, 3.0, 4.0, 5.0])
print(f"\nArray contents: {list(arr)}")

arr[0] = 10.0
arr[4] = 50.0
print(f"After modification: {list(arr)}")
print(f"Array length: {len(arr)}")
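
By default `Value` and `Array` return a synchronized wrapper that carries its own lock; passing `lock=False` returns the raw ctypes object instead. The following single-process sketch illustrates that difference along with slicing and the `'c'` type code. The variable names are made up for the example.

```python
import multiprocessing

# Default: a synchronized wrapper that carries its own lock
wrapped = multiprocessing.Value("i", 7)
wrapped_has_lock = hasattr(wrapped, "get_lock")

# lock=False: the raw ctypes object, with no lock attached
raw = multiprocessing.Value("i", 7, lock=False)
raw_has_lock = hasattr(raw, "get_lock")

# Passing an integer instead of a sequence creates a zero-filled array
zeros = multiprocessing.Array("i", 4)
zeros[1:3] = [10, 20]  # slice assignment must keep the length unchanged
snapshot = zeros[:]  # slicing returns a plain list copy

# A 'c' array exposes its contents as bytes through .value
text = multiprocessing.Array("c", b"hello")

print(f"Wrapped has lock: {wrapped_has_lock}, raw has lock: {raw_has_lock}")
print(f"Array snapshot: {snapshot}")
print(f"Char array value: {text.value}")
```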

In [None]:
import multiprocessing


def increment_counter(
    counter: multiprocessing.Value,
    times: int,
) -> None:
    """Increment a shared counter multiple times."""
    for _ in range(times):
        with counter.get_lock():  # Acquire the built-in lock
            counter.value += 1


# Shared counter accessed by multiple processes
shared_counter = multiprocessing.Value("i", 0)

processes: list[multiprocessing.Process] = [
    multiprocessing.Process(target=increment_counter, args=(shared_counter, 1000))
    for _ in range(4)
]

for p in processes:
    p.start()
for p in processes:
    p.join()

# With proper locking, result should be exactly 4000
print(f"Final counter value: {shared_counter.value}")
print(f"Expected: {4 * 1000}")
print(f"Correct: {shared_counter.value == 4000}")

## Lock: Synchronizing Shared Access

`multiprocessing.Lock` prevents multiple processes from accessing a shared resource
simultaneously. It is a close analog of `threading.Lock` that works across processes.

- `lock.acquire()` / `lock.release()` -- manual lock management
- `with lock:` -- context manager (preferred) for automatic release

Without a lock, concurrent modifications to shared state can produce incorrect results
due to **race conditions**.

In [None]:
import multiprocessing


def unsafe_increment(
    counter: multiprocessing.Value,
    times: int,
) -> None:
    """Increment WITHOUT a lock -- prone to race conditions."""
    for _ in range(times):
        counter.value += 1  # NOT atomic: read, add, and write are separate steps


def safe_increment(
    counter: multiprocessing.Value,
    lock: multiprocessing.Lock,
    times: int,
) -> None:
    """Increment WITH a lock -- safe from race conditions."""
    for _ in range(times):
        with lock:
            counter.value += 1


# Unsafe version (may produce incorrect results)
unsafe_counter = multiprocessing.Value("i", 0)
procs = [
    multiprocessing.Process(target=unsafe_increment, args=(unsafe_counter, 10000))
    for _ in range(4)
]
for p in procs:
    p.start()
for p in procs:
    p.join()
print(f"Unsafe counter: {unsafe_counter.value} (expected 40000)")

# Safe version with explicit Lock
safe_counter = multiprocessing.Value("i", 0)
lock = multiprocessing.Lock()
procs = [
    multiprocessing.Process(target=safe_increment, args=(safe_counter, lock, 10000))
    for _ in range(4)
]
for p in procs:
    p.start()
for p in procs:
    p.join()
print(f"Safe counter:   {safe_counter.value} (expected 40000)")
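
The manual `acquire()` / `release()` form also accepts `block` and `timeout` arguments, which is useful when a process should not wait indefinitely. A minimal single-process sketch follows; the timeout value is arbitrary.

```python
import multiprocessing

lock = multiprocessing.Lock()

# Manual management: pair every acquire() with a release() in try/finally,
# which is what `with lock:` does for you
lock.acquire()
try:
    # The lock is not reentrant: a second non-blocking attempt fails
    got_it_again = lock.acquire(block=False)
finally:
    lock.release()

# A timeout bounds how long acquire() waits; here the lock is free again
got_it_with_timeout = lock.acquire(timeout=0.5)
if got_it_with_timeout:
    lock.release()

print(f"Second acquire while held: {got_it_again}")
print(f"Acquire with timeout after release: {got_it_with_timeout}")
```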

## Manager: Proxy-Based Shared Objects

`multiprocessing.Manager()` creates a server process that hosts shared Python objects.
Other processes access these objects through **proxies**. This is more flexible than
`Value`/`Array` because it supports standard Python types:

- `manager.list()` -- shared list
- `manager.dict()` -- shared dictionary
- `manager.Value(typecode, value)` -- shared value
- `manager.Queue()` -- shared queue
- `manager.Lock()` -- shared lock

The tradeoff: Managers are **slower** than `Value`/`Array` because they use inter-process
communication (proxies) under the hood, but they are more convenient for complex data.

In [None]:
import multiprocessing
from multiprocessing.managers import DictProxy, ListProxy


def add_results(
    shared_dict: DictProxy,
    shared_list: ListProxy,
    key: str,
    values: list[int],
) -> None:
    """Add results to shared dict and list."""
    total: int = sum(values)
    shared_dict[key] = total
    shared_list.append(f"{key}={total}")


with multiprocessing.Manager() as manager:
    # Create shared objects through the manager
    shared_dict = manager.dict()
    shared_list = manager.list()

    # Launch processes that modify shared objects
    processes = [
        multiprocessing.Process(
            target=add_results,
            args=(shared_dict, shared_list, f"task_{i}", list(range(i * 10))),
        )
        for i in range(1, 5)
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()

    print("Shared dict:")
    for key, value in sorted(shared_dict.items()):
        print(f"  {key}: {value}")

    print(f"\nShared list: {list(shared_list)}")
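
One consequence of the proxy design is worth illustrating: a plain mutable object stored inside a managed dict or list is returned as a local copy, so changing it in place does not update the shared object. The sketch below shows the effect and the usual workaround of reassigning the key; `nested_update_demo` is a hypothetical helper written for this example.

```python
import multiprocessing


def nested_update_demo() -> tuple[list[int], list[int]]:
    """Show that in-place changes to a nested plain list are not propagated."""
    with multiprocessing.Manager() as manager:
        shared = manager.dict()
        shared["scores"] = [1, 2]

        # shared["scores"] returns a local copy; appending changes only the copy
        shared["scores"].append(3)
        after_in_place = list(shared["scores"])

        # Reassigning the key sends the updated value to the manager process
        scores = shared["scores"]
        scores.append(3)
        shared["scores"] = scores
        after_reassign = list(shared["scores"])

    return after_in_place, after_reassign


if __name__ == "__main__":
    in_place, reassigned = nested_update_demo()
    print(f"After in-place append: {in_place}")
    print(f"After reassignment:    {reassigned}")
```

If nested containers need to be mutated in place, creating them with `manager.list()` or `manager.dict()` and storing those proxies is an alternative.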

## Choosing the Right IPC Mechanism

| Mechanism | Use Case | Speed | Flexibility |
|-----------|----------|:------|:------------|
| **Queue** | Multiple producers/consumers, message passing | Medium | High |
| **Pipe** | Two-process communication, request/response | Fast | Low |
| **Value** | Single shared number (int, float) | Fast | Low |
| **Array** | Shared fixed-size array of numbers | Fast | Low |
| **Manager** | Shared dicts, lists, complex objects | Slow | High |
| **Lock** | Protect shared resources from race conditions | -- | -- |

## Practical: Producer-Consumer with Queue

The **producer-consumer** pattern is a classic concurrency design pattern. Producers
generate data and place it on a queue; consumers take data from the queue and process it.
A special **sentinel** value (e.g., `None`) signals consumers to stop.

In [None]:
import multiprocessing
import os
import time


def producer(
    queue: multiprocessing.Queue,
    items: list[int],
    name: str,
) -> None:
    """Produce items and put them on the queue."""
    for item in items:
        queue.put((name, item))
        time.sleep(0.05)  # Simulate work
    print(f"  Producer {name} (PID {os.getpid()}) finished")


def consumer(
    queue: multiprocessing.Queue,
    result_queue: multiprocessing.Queue,
    name: str,
) -> None:
    """Consume items from the queue until sentinel is received."""
    processed: int = 0
    while True:
        item = queue.get()
        if item is None:  # Sentinel value
            break
        producer_name, value = item
        result_queue.put((name, producer_name, value, value * value))
        processed += 1
    print(f"  Consumer {name} (PID {os.getpid()}) processed {processed} items")


# Set up the queues
work_queue: multiprocessing.Queue = multiprocessing.Queue()
result_queue: multiprocessing.Queue = multiprocessing.Queue()

# Start 2 producers and 2 consumers
producers = [
    multiprocessing.Process(target=producer, args=(work_queue, [1, 2, 3, 4], "P1")),
    multiprocessing.Process(target=producer, args=(work_queue, [5, 6, 7, 8], "P2")),
]
consumers = [
    multiprocessing.Process(target=consumer, args=(work_queue, result_queue, "C1")),
    multiprocessing.Process(target=consumer, args=(work_queue, result_queue, "C2")),
]

for p in producers + consumers:
    p.start()

# Wait for producers to finish
for p in producers:
    p.join()

# Send sentinel values (one per consumer)
for _ in consumers:
    work_queue.put(None)

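# Collect results before joining the consumers: empty() is unreliable, and a
# process that has put items on a queue may not exit until they are read
print("\nResults:")
for _ in range(8):  # 2 producers x 4 items each
    consumer_name, producer_name, value, squared = result_queue.get()
    print(f"  {consumer_name} processed {producer_name}'s item: {value}^2 = {squared}")

# Wait for consumers to finish
for c in consumers:
    c.join()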

## Shared Array with Worker Processes

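When multiple processes need to write results into a shared data structure, `Array`
combined with index-based partitioning avoids the need for explicit locking. Each process
writes to its own slice of the array, so no two processes touch the same element. Note
that a default `Array` still wraps every access in its built-in lock; pass `lock=False`
to get a raw array without it.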

In [None]:
import multiprocessing


def fill_array_slice(
    shared_arr: multiprocessing.Array,
    start: int,
    end: int,
    multiplier: float,
) -> None:
    """Fill a slice of a shared array with computed values."""
    for i in range(start, end):
        shared_arr[i] = float(i) * multiplier


# Create a shared array of 12 doubles
size: int = 12
shared: multiprocessing.Array = multiprocessing.Array("d", size)

# Each process handles a different slice (no lock needed)
chunk: int = size // 3
procs = [
    multiprocessing.Process(target=fill_array_slice, args=(shared, 0, chunk, 1.0)),
    multiprocessing.Process(target=fill_array_slice, args=(shared, chunk, 2 * chunk, 2.0)),
    multiprocessing.Process(target=fill_array_slice, args=(shared, 2 * chunk, size, 3.0)),
]

for p in procs:
    p.start()
for p in procs:
    p.join()

print("Shared array contents:")
for i, val in enumerate(shared):
    print(f"  [{i}] = {val}")
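
The slice boundaries can be computed rather than written out by hand, which also handles sizes that do not divide evenly. In the sketch below, `slice_bounds` is a hypothetical helper, and the loop merely stands in for the worker processes to show that the slices cover every index exactly once.

```python
import multiprocessing


def slice_bounds(size: int, workers: int) -> list[tuple[int, int]]:
    """Split range(size) into `workers` contiguous, non-overlapping slices."""
    base, extra = divmod(size, workers)
    bounds: list[tuple[int, int]] = []
    start = 0
    for w in range(workers):
        # The first `extra` slices take one additional element each
        end = start + base + (1 if w < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


# lock=False returns a raw shared array with no built-in lock
raw_shared = multiprocessing.Array("d", 10, lock=False)

# In a real program each (start, end) pair would be passed to its own Process;
# here one loop plays the role of all workers
for worker_id, (start, end) in enumerate(slice_bounds(len(raw_shared), 3)):
    for i in range(start, end):
        raw_shared[i] = float(worker_id)

print(slice_bounds(10, 3))
print(list(raw_shared))
```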

## Summary

### Key Takeaways

| Concept | API | Purpose |
|---------|-----|:--------|
| **Queue** | `multiprocessing.Queue` | FIFO message passing between processes |
| **Pipe** | `multiprocessing.Pipe()` | Fast two-process bidirectional channel |
| **Value** | `multiprocessing.Value('i', 0)` | Single shared int/float in shared memory |
| **Array** | `multiprocessing.Array('d', [...])` | Fixed-size shared array in shared memory |
| **Lock** | `multiprocessing.Lock()` | Prevent race conditions on shared state |
| **Manager** | `multiprocessing.Manager()` | Proxy-based shared dicts, lists, etc. |

### Best Practices
- Use `Queue` for general-purpose message passing between any number of processes
- Use `Pipe` for fast, simple two-process communication
- Use `Value` and `Array` for high-performance shared memory with numeric data
- Hold a lock around read-modify-write operations (such as `+=`) when multiple processes update the same `Value` or `Array` element
- Use `Manager` when you need to share complex Python objects (dicts, lists)
- Prefer partitioning work (each process writes its own slice) over shared locks
- Use sentinel values to signal consumers when all work is done