# Process
- A running instance of a computer program

## 1. Processes vs Threads
- Process (pros): sidesteps the GIL, less need for synchronization, can be paused and terminated, more resilient (a crash in one process does not take down the others)
- Process (cons, compared with threads): higher memory footprint, expensive context switches
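
The "less need for synchronization" point follows from memory isolation: a thread mutates the parent's objects directly, while a child process works on its own copy. A minimal sketch of that difference, where `append_item`, `run_in_thread`, and `run_in_process` are hypothetical helper names used only for illustration:

```python
import multiprocessing
import threading


def append_item(items):
    # Mutates whatever list object it is given
    items.append("changed")


def run_in_thread():
    items = []
    t = threading.Thread(target=append_item, args=(items,))
    t.start()
    t.join()
    return items  # The thread shares the parent's memory


def run_in_process():
    items = []
    p = multiprocessing.Process(target=append_item, args=(items,))
    p.start()
    p.join()
    return items  # The child process changed only its own copy


if __name__ == "__main__":
    print("After thread: ", run_in_thread())
    print("After process:", run_in_process())
```

The list comes back modified from the thread and untouched from the process, which is why processes need explicit channels (pipes, queues, shared values, managers) to exchange data.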

## 2. Simple multiprocessing pattern

In [1]:
import multiprocessing
import time

def do_some_work(val):
    print("Doing some work in process")
    time.sleep(1)
    print("echo: {}".format(val))

if __name__ == "__main__":
    val = "text"
    p = multiprocessing.Process(target=do_some_work, args=(val,))
    p.start()
    print("Start process, process alive: {}".format(p.is_alive()))
    # p.terminate()  # Uncomment to terminate the process early
    p.join()  # Wait for the child process to finish
    print("End process")

Doing some work in process
Start process, process alive: True
echo: text
End process


## 3. Terms 
- Pickling: The process whereby a Python object hierarchy is converted into a byte stream. "Unpickling" is the inverse operation.
- Daemon process: A child process that does not prevent its parent process from exiting; it is terminated automatically when the parent exits
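
Pickling matters for multiprocessing because objects sent through a `Queue` or `Pipe` are pickled on one side and unpickled on the other. A small round-trip sketch using the standard `pickle` module; the `record` dictionary is made-up sample data:

```python
import pickle

record = {"name": "job-1", "inputs": [1, 2, 3]}  # Hypothetical sample data

payload = pickle.dumps(record)    # Pickling: object hierarchy -> byte stream
restored = pickle.loads(payload)  # Unpickling: byte stream -> new object

print(type(payload).__name__)     # bytes
print(restored == record)         # Equal content
print(restored is record)         # But a distinct object
```

The receiving side always gets a copy, never the original object.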

## 4. Other operations
- p.is_alive(): Check whether the process is still running
- p.terminate(): Terminate the process
- multiprocessing.cpu_count(): Return the number of CPUs in the system
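
A rough sketch of how these calls fit together, assuming a child that would otherwise run for a long time. The names `sleep_for_a_while` and `terminate_demo` are hypothetical:

```python
import multiprocessing
import time


def sleep_for_a_while():
    time.sleep(30)  # Stands in for long-running work


def terminate_demo():
    p = multiprocessing.Process(target=sleep_for_a_while)
    p.start()
    alive_before = p.is_alive()
    p.terminate()  # Ask the child to stop
    p.join()       # Wait until it has actually exited
    alive_after = p.is_alive()
    return alive_before, alive_after


if __name__ == "__main__":
    print("CPUs:", multiprocessing.cpu_count())
    print("Alive before/after terminate:", terminate_demo())
```

Calling `join()` after `terminate()` gives the process time to exit, so the second `is_alive()` check reflects the final state.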

## 5. Process pool
- A process pool object which controls a pool of worker processes to which jobs can be submitted. It supports asynchronous results with timeouts and callbacks and has a parallel map implementation.

### 5.0. Pool example

In [2]:
def do_work(data):
    time.sleep(1)
    return data**2

def start_process():
    print("Start", multiprocessing.current_process().name)
    
if __name__ == "__main__":
    pool_size = multiprocessing.cpu_count() * 2
    pool = multiprocessing.Pool(processes=pool_size, initializer=start_process)
    inputs = list(range(10))
    # map(): blocks until all results are ready
    # map_async(): returns an AsyncResult immediately without blocking;
    #              call .get() on it to wait for and fetch the results
    outputs = pool.map(do_work, inputs)
    pool.close()  # No more tasks accepted
    pool.join()  # Wait for the worker processes to exit
    print("Outputs:", outputs)    

Start ForkPoolWorker-2
Start ForkPoolWorker-3
Start ForkPoolWorker-4
Start ForkPoolWorker-5
Start ForkPoolWorker-6
Start ForkPoolWorker-7
Start ForkPoolWorker-8
Start ForkPoolWorker-9
Outputs: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
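
The pool description mentions asynchronous results with timeouts and callbacks. One possible sketch using `map_async()`, where `square` and `collect_async` are hypothetical names and the pool size of 2 is an arbitrary choice:

```python
import multiprocessing


def square(x):
    return x ** 2


def collect_async(inputs, timeout=10):
    seen = []  # Filled by the callback, which runs in the parent process
    with multiprocessing.Pool(processes=2) as pool:
        # Returns an AsyncResult immediately instead of blocking
        async_result = pool.map_async(square, inputs, callback=seen.append)
        # Block for at most `timeout` seconds; raises
        # multiprocessing.TimeoutError if the results are not ready in time
        outputs = async_result.get(timeout=timeout)
    return outputs, seen


if __name__ == "__main__":
    outputs, seen = collect_async(range(4))
    print("Outputs:", outputs)
    print("Callback received:", seen)
```

The callback receives the complete result list once, so it is suited to lightweight follow-up work such as logging; anything slow in the callback delays result handling for the whole pool.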


## 6. Inter-process Communication
### 6.0. Pipe
- A two-way (duplex) channel between two processes; each end can both send and receive

In [3]:
import random
from multiprocessing import Pipe, Process
import time

def make_tuple(conn):
    num = random.randint(1, 9)
    conn.send(("Hi", num))
    print(conn.recv())
    
    
def make_string(conn):
    tup = conn.recv()
    result = ""
    substr, num = tup
    for _ in range(num):
        result += substr
    print(result)
    conn.send(result)
    
if __name__ == "__main__":
    conn1, conn2 = Pipe(duplex=True)
    p1 = Process(target=make_tuple, args=(conn1,))
    p2 = Process(target=make_string, args=(conn2,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    print("Done")

HiHiHiHi
HiHiHiHi
Done


### 6.1. Queue
- Pipe can only have two endpoints
- Queue can have multiple producers and consumers

In [4]:
""" Data flow
1. make_tuple -> ("Hi", num) -> make_string
2. sleep for 1 second
3. make_string -> result -> make_tuple
4. make_tuple prints the result using queue.get()
"""
from multiprocessing import Queue

def make_tuple(queue):
    num = random.randint(1, 9)
    queue.put(("Hi", num))
    time.sleep(1)
    print(queue.get()) # Get from 'make_string'
        
def make_string(queue):
    tup = queue.get()
    result = ""
    substr, num = tup
    for _ in range(num):
        result += substr
    queue.put(result)
    
if __name__ == "__main__":
    queue = Queue()
    p1 = Process(target=make_tuple, args=(queue,))
    p2 = Process(target=make_string, args=(queue,))
    p1.start()
    p2.start()
    p1.join()  # Wait for both processes so the output is complete
    p2.join()

HiHiHiHiHiHiHiHiHi
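
To illustrate the point that a queue can serve several producers, here is a minimal sketch in which three producer processes write to one queue and the parent acts as the single consumer. The names `producer` and `run_producers` and the counts are assumptions made for the example:

```python
from multiprocessing import Process, Queue


def producer(queue, name, count):
    for i in range(count):
        queue.put((name, i))


def run_producers(num_producers=3, items_each=2):
    queue = Queue()
    producers = [
        Process(target=producer, args=(queue, "producer-{}".format(n), items_each))
        for n in range(num_producers)
    ]
    for p in producers:
        p.start()
    # Drain the queue before joining so no producer is left blocked
    # while flushing its buffered items
    total = num_producers * items_each
    received = [queue.get(timeout=10) for _ in range(total)]
    for p in producers:
        p.join()
    return received


if __name__ == "__main__":
    for item in run_producers():
        print(item)
```

Items from different producers can arrive interleaved in any order, so the consumer should not rely on a particular sequence.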


## 7. Sharing State Between Processes
### 7.0. Value

In [5]:
from multiprocessing import Value
import multiprocessing
import ctypes

counter = Value('i')  # shared object of type int, defaults to 0
# shared object of type boolean, defaulting to False, unsynchronized
is_running = Value(ctypes.c_bool, False, lock=False)  

my_lock = multiprocessing.Lock()
# Shared object of type long, with a lock specified
size_counter = Value('l', 0, lock=my_lock)
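
The declarations above only create the shared objects. A short sketch of actually updating one from several processes, assuming a synchronized `Value` (the default) and the hypothetical helpers `add_many` and `count_in_parallel`:

```python
from multiprocessing import Process, Value


def add_many(counter, times):
    for _ in range(times):
        # `+=` is a read followed by a write, so hold the lock around it
        with counter.get_lock():
            counter.value += 1


def count_in_parallel(workers=4, times=1000):
    counter = Value('i', 0)  # Shared int with its own lock
    procs = [
        Process(target=add_many, args=(counter, times)) for _ in range(workers)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print("Final count:", count_in_parallel())
```

Without the `get_lock()` block, concurrent increments can overwrite each other and the final count may come out lower than `workers * times`.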

### 7.1. Manager
- A manager holds shared Python objects (e.g. dict, list) in a server process; other processes read and modify them through proxies

In [6]:
import multiprocessing
from multiprocessing import Process

def do_work(dictionary, item):
    dictionary[item] = item ** 2
    
if __name__ == "__main__":
    mgr = multiprocessing.Manager()
    d = mgr.dict()  # Shared dict
    # Multiple processes work on same shared-dict
    jobs = [
        Process(target=do_work, args=(d, i)) for i in range(8)
    ]

    for j in jobs:
        j.start()

    for j in jobs:
        j.join()

    print("Results:", d)

Results: {0: 0, 1: 1, 2: 4, 4: 16, 3: 9, 6: 36, 5: 25, 7: 49}
