# I36 : Use subprocess to Manage Child Processes

In [2]:
import subprocess

proc = subprocess.Popen(
    ['echo', 'Hello from the child!'],
    stdout=subprocess.PIPE)
out, err = proc.communicate()
print(out.decode('utf-8'))

Hello from the child!



In [3]:
proc = subprocess.Popen(['sleep', '0.3'])
while proc.poll() is None:
    print('Working')
    

Working
Working
Working
...


In [10]:
def run_sleep(period):
    proc = subprocess.Popen(['sleep', str(period)])
    return proc


from time import time 

start = time()
procs = []
for _ in range(10):
    proc = run_sleep(0.1)
    procs.append(proc)
    
for proc in procs:
    proc.communicate()
end = time()
print('Finished in %.3f seconds' % (end - start))

Finished in 0.151 seconds


In [11]:
procs

[<subprocess.Popen at 0x7ff4701760f0>,
 <subprocess.Popen at 0x7ff470176240>,
 <subprocess.Popen at 0x7ff4701762b0>,
 <subprocess.Popen at 0x7ff470176278>,
 <subprocess.Popen at 0x7ff470176320>,
 <subprocess.Popen at 0x7ff470176400>,
 <subprocess.Popen at 0x7ff470176470>,
 <subprocess.Popen at 0x7ff4701764e0>,
 <subprocess.Popen at 0x7ff470176550>,
 <subprocess.Popen at 0x7ff4701765c0>]

In [12]:
import os

def run_openssl(data):
    env = os.environ.copy()
    env['password'] = b'\xe24U\n\xd0Ql3S\x11'
    proc = subprocess.Popen(
        ['openssl', 'enc', '-des3', '-pass', 'env:password'],
        env=env,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE
    )
    proc.stdin.write(data)
    proc.stdin.flush()
    return proc
    

In [14]:
import os

procs = []
for _ in range(3):
    data = os.urandom(10)
    proc = run_openssl(data)
    procs.append(proc)

In [15]:
for proc in procs:
    out, err = proc.communicate()
    print(out[-10:])

b'F\x1a\xaa\xfb\xea\x0f\xd3x\xf7{'
b'|Io\xf8\xac\xf9\x0e\x19>\xf6'
b'\xe7\x98\x90\xac\xad\xb7\n\xbd\xcd\xe0'


In [16]:
def run_md5(input_stdin):
    # 'md5' is the macOS command name; on most Linux systems the
    # equivalent tool is called 'md5sum'.
    proc = subprocess.Popen(
        ['md5'],
        stdin=input_stdin,
        stdout=subprocess.PIPE)
    return proc

In [17]:
input_procs = []
hash_procs = []
for _ in range(3):
    data = os.urandom(10)
    proc = run_openssl(data)
    input_procs.append(proc)
    hash_proc = run_md5(proc.stdout)
    hash_procs.append(hash_proc)

FileNotFoundError: [Errno 2] No such file or directory: 'md5'

In [18]:
for proc in input_procs:
    proc.communicate()
for proc in hash_procs:
    out, err = proc.communicate()
    print(out.strip())
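
Because the `md5` executable is not installed everywhere, the chaining idea can also be sketched with two Python child processes. This is only an illustration under assumptions: both children are launched through `sys.executable`, and the inline `-c` scripts are hypothetical stand-ins for `openssl` and `md5`.

```python
import hashlib
import subprocess
import sys

# Producer child: writes raw bytes to its stdout (stand-in for openssl).
producer = subprocess.Popen(
    [sys.executable, '-c',
     "import sys; sys.stdout.buffer.write(b'hello')"],
    stdout=subprocess.PIPE)

# Consumer child: hashes whatever arrives on its stdin (stand-in for md5).
consumer = subprocess.Popen(
    [sys.executable, '-c',
     "import sys, hashlib; "
     "print(hashlib.md5(sys.stdin.buffer.read()).hexdigest())"],
    stdin=producer.stdout,
    stdout=subprocess.PIPE)

# Close the parent's copy of the pipe so the consumer sees end-of-file
# as soon as the producer exits.
producer.stdout.close()

out, _ = consumer.communicate()
producer.wait()
digest = out.decode('utf-8').strip()
print(digest)
```

The parent never touches the data itself; it only wires one child's output to the other child's input.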

In [19]:
proc = run_sleep(10)
try:
    proc.communicate(timeout=0.1)
except subprocess.TimeoutExpired:
    # The child did not finish in time: stop it and reap it.
    proc.terminate()
    proc.wait()

## Things to Remember
- Use the subprocess module to run child processes and manage their input and output streams.
- Child processes run in parallel with the Python interpreter, enabling you to maximize your CPU usage.
- Use the timeout parameter with communicate to avoid deadlocks and hanging child processes.
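
As a compact illustration of these points, here is a minimal sketch that uses `subprocess.run`, the higher-level helper built on `Popen`, to feed input to a child, capture its output, and bound its runtime. The child command (an inline script run through `sys.executable`) and the 5-second timeout are assumptions chosen for portability, not part of the examples above.

```python
import subprocess
import sys

# Hypothetical child: reads stdin and echoes it back in upper case.
child_code = 'import sys; sys.stdout.write(sys.stdin.read().upper())'

result = subprocess.run(
    [sys.executable, '-c', child_code],
    input='hello from the parent',  # sent to the child's stdin
    capture_output=True,            # collect stdout and stderr
    text=True,                      # work with str instead of bytes
    timeout=5,                      # raises TimeoutExpired if exceeded
    check=True)                     # raises CalledProcessError on failure

print(result.stdout)
```

`run` waits for the child to finish, so it fits one-shot commands; `Popen` remains the tool when several children must run at the same time.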

# I37 : Use Threads for Blocking I/O, Avoid for Parallelism

- The standard implementation of Python is called CPython. CPython runs a Python program in two steps. First, it parses and compiles the source text into bytecode. Then, it runs the bytecode using a stack-based interpreter. The bytecode interpreter has state that must be maintained and coherent while the Python program executes. Python enforces coherence with a mechanism called the *global interpreter lock (GIL)*.

- Essentially, the GIL is a mutual-exclusion lock (mutex) that prevents CPython from being affected by preemptive multithreading, where one thread takes control of a program by interrupting another thread. Such an interruption could corrupt the interpreter state if it comes at an unexpected time. The GIL prevents these interruptions and ensures that every bytecode instruction works correctly with the CPython implementation and its C-extension modules.

- The GIL has an important negative side effect. With programs written in languages like C++ or Java, having multiple threads of execution means your program could utilize multiple CPU cores at the same time. Although Python supports multiple threads of execution, the GIL causes only one of them to make forward progress at a time. This means that when you reach for threads to do parallel computation and speed up your Python programs, you will be sorely disappointed.

- For example, say you want to do something computationally intensive with Python. I'll use a naive number factorization algorithm as a proxy.

In [20]:
def factorize(number):
    for i in range(1, number + 1):
        if number % i == 0:
            yield i

In [21]:
numbers = [2139079, 1214759, 1516637, 1852285]
start = time()
for number in numbers:
    list(factorize(number))
end = time()
print('Took %.3f seconds' % (end - start))

Took 0.449 seconds


In [24]:
from threading import Thread

class FactorizerThread(Thread):
    def __init__(self, number):
        super().__init__()
        self.number = number

    def run(self):
        self.factors = list(factorize(self.number))

In [25]:
start = time()
threads = []
for number in numbers:
    thread = FactorizerThread(number)
    thread.start()
    threads.append(thread)

In [26]:
for thread in threads:
    thread.join()
end = time()
print('Took %.3f seconds' % (end - start))

Took 38.569 seconds
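
To compare the two approaches fairly, both measurements can be taken inside a single block so that no time passes between starting the clock and joining the threads. The sketch below assumes smaller, made-up inputs so it finishes quickly; the helper `work` and the `results` list are hypothetical names. On a standard CPython build with the GIL, the threaded time is usually no better than the serial time.

```python
from threading import Thread
from time import time

def factorize(number):
    for i in range(1, number + 1):
        if number % i == 0:
            yield i

# Smaller, illustrative inputs (not the ones used above).
numbers = [213907, 121475, 151663, 185228]

# Serial version.
start = time()
serial = [list(factorize(n)) for n in numbers]
serial_time = time() - start

# Threaded version: one thread per number.
results = [None] * len(numbers)

def work(index, number):
    results[index] = list(factorize(number))

start = time()
threads = [Thread(target=work, args=(i, n))
           for i, n in enumerate(numbers)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
threaded_time = time() - start

print('Serial took %.3f seconds' % serial_time)
print('Threads took %.3f seconds' % threaded_time)
```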


- Given these limits, why does Python support threads at all? There are two good reasons. First, multiple threads make it easy for your program to seem like it's doing multiple things at the same time. Managing the juggling act of simultaneous tasks is difficult to implement yourself. With threads, you can leave it to Python to run your functions seemingly in parallel. This works because CPython ensures a level of fairness between Python threads of execution, even though only one of them makes forward progress at a time due to the GIL.

- The second reason Python supports threads is to deal with blocking I/O, which happens when Python does certain types of system calls. System calls are how your Python program asks your computer's operating system to interact with the external environment on your behalf. Blocking I/O includes things like reading and writing files, interacting with networks, communicating with devices like displays, etc. Threads help you handle blocking I/O by insulating your program from the time it takes for the operating system to respond to your requests.

- For example, say you want to send a signal to a remote-controlled helicopter through a serial port. I'll use a slow system call (select) as a proxy for this activity. This function asks the operating system to block for 0.1 second and then return control to my program, similar to what would happen when using a synchronous serial port.

In [27]:
import select

def slow_systemcall():
    select.select([], [], [], 0.1)

In [28]:
start = time()
for _ in range(5):
    slow_systemcall()
end = time()
print('Took %.3f seconds' % (end - start))

Took 0.507 seconds


- The problem is that while the slow\_systemcall function is running, my program can't make any other progress. My program's main thread of execution is blocked on the select system call. This situation is awful in practice. You need to be able to compute your helicopter's next move while you're sending it a signal, otherwise it'll crash. When you find yourself needing to do blocking I/O and computation simultaneously, it's time to consider moving your system calls to threads.

- Here, I run multiple invocations of the slow\_systemcall function in separate threads. This would allow you to communicate with multiple serial ports at the same time, while leaving the main thread to do whatever computation is required.

In [30]:
start = time()
threads = []
for _ in range(5):
    thread = Thread(target=slow_systemcall)
    thread.start()
    threads.append(thread)

In [32]:
def compute_helicopter_location(index):
    # ...
    pass
    
for i in range(5):
    compute_helicopter_location(i)
for thread in threads:
    thread.join()
end = time()
print('Took %.3f seconds' % (end - start))

Took 158.923 seconds
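
When the timing code is split across notebook cells, the elapsed time also includes the pause between running them. A minimal single-block sketch avoids that. It assumes `time.sleep(0.1)` as a portable stand-in for the `select` call, since both block without holding the GIL. With five threads waiting concurrently, the total should be close to 0.1 second rather than 0.5.

```python
import time
from threading import Thread

def slow_systemcall():
    # Stand-in for blocking I/O; the GIL is released while sleeping.
    time.sleep(0.1)

start = time.time()
threads = [Thread(target=slow_systemcall) for _ in range(5)]
for thread in threads:
    thread.start()

# The main thread stays free for computation while the others block.
total = sum(i * i for i in range(1000))

for thread in threads:
    thread.join()
elapsed = time.time() - start
print('Took %.3f seconds' % elapsed)
```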


## Things to Remember

- Python threads can't run bytecode in parallel on multiple CPU cores because of the global interpreter lock (GIL).
- Python threads are still useful despite the GIL because they provide an easy way to do multiple things at seemingly the same time.
- Use Python threads to make multiple system calls in parallel. This allows you to do blocking I/O at the same time as computation.

# I38 : Use Lock to Prevent Data Races in Threads

- After learning about the global interpreter lock (GIL), many new Python programmers assume they can forgo using mutual-exclusion locks (mutexes) in their code altogether. If the GIL is already preventing Python threads from running on multiple CPU cores in parallel, it must also act as a lock for a program's data structures, right? Some testing on types like lists and dictionaries may even show that this assumption appears to hold.

- But beware, this is truly not the case. The GIL will not protect you. Although only one Python thread runs at a time, a thread's operations on data structures can be interrupted between any two bytecode instructions in the Python interpreter. This is dangerous if you access the same objects from multiple threads simultaneously. The invariants of your data structures could be violated at practically any time because of these interruptions, leaving your program in a corrupted state.
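
One way to see those instruction boundaries is to disassemble a small method with the standard `dis` module. The sketch below assumes a tiny hypothetical `Counter` class. The exact instruction names vary between CPython versions, so no output is shown, but the attribute load and the attribute store appear as separate instructions, and a thread switch can occur between them.

```python
import dis

class Counter:
    def __init__(self):
        self.count = 0

    def increment(self, offset):
        self.count += offset

# Collect the names of the bytecode instructions in increment().
opnames = [instruction.opname
           for instruction in dis.get_instructions(Counter.increment)]
print(opnames)
```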

- For example, say you want to write a program that counts many things in parallel, like sampling light levels from a whole network of sensors. If you want to determine the total number of light samples over time, you can aggregate them with a new class.

In [33]:
class Counter(object):
    def __init__(self):
        self.count = 0

    def increment(self, offset):
        self.count += offset

- Imagine that each sensor has its own worker thread because reading from the sensor requires blocking I/O. After each sensor measurement, the worker thread increments the counter up to a maximum number of desired readings.

In [36]:
def worker(sensor_index, how_many, counter):
    for _ in range(how_many):
        # Read from the sensor
        #...
        counter.increment(1)

In [37]:
def run_threads(func, how_many, counter):
    threads = []
    for i in range(5):
        args = (i, how_many, counter)
        thread = Thread(target=func, args=args)
        threads.append(thread)
        thread.start()
    for thread in threads:
        thread.join()

In [38]:
how_many = 10**5
counter = Counter()
run_threads(worker, how_many, counter)
print('Counter should be %d, found %d' %
     (5 * how_many, counter.count))

Counter should be 500000, found 361076


- The Counter object's increment method looks simple.

In [None]:
counter.count += offset

- But the += operation used on an object attribute actually instructs Python to do three separate operations behind the scenes.

In [40]:
offset = 1  # example value; inside increment() this is the argument
value = getattr(counter, 'count')
result = value + offset
setattr(counter, 'count', result)

- Python threads incrementing the counter can be suspended between any two of these operations. This is problematic if the way the operations interleave causes old versions of value to be assigned to the counter. Here's an example of bad interaction between two threads, A and B:

In [None]:
# Running in Thread A
value_a = getattr(counter, 'count')
# Context switch to Thread B
value_b = getattr(counter, 'count')
result_b = value_b + 1
setattr(counter, 'count', result_b)
# Context switch back to Thread A
result_a = value_a + 1
setattr(counter, 'count', result_a)

- Thread A stomped on thread B, erasing all of its progress incrementing the counter. This is exactly what happened in the light sensor example above.
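
The interleaving can be replayed deterministically in a single thread to confirm the lost update. This is a sketch with a fresh, hypothetical `Counter` instance: two increments are attempted, but only one survives.

```python
class Counter:
    def __init__(self):
        self.count = 0

counter = Counter()

# Thread A reads the current value, then is suspended.
value_a = getattr(counter, 'count')

# Thread B runs its whole increment.
value_b = getattr(counter, 'count')
setattr(counter, 'count', value_b + 1)

# Thread A resumes and writes a result based on its stale read.
setattr(counter, 'count', value_a + 1)

print(counter.count)  # prints 1, although two increments ran
```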

- To prevent data races like these and other forms of data structure corruption, Python includes a robust set of tools in the threading built-in module. The simplest and most useful of them is the Lock class, a mutual-exclusion lock (mutex).

- By using a lock, I can have the Counter class protect its current value against simultaneous access from multiple threads. Only one thread will be able to acquire the lock at a time. Here, I use a with statement to acquire and release the lock; this makes it easier to see which code is executing while the lock is held.

In [49]:
from threading import Lock

class LockingCounter(object):
    def __init__(self):
        self.lock = Lock()
        self.count = 0
        
    def increment(self, offset):
        with self.lock:
            self.count += offset

In [50]:
counter = LockingCounter()
run_threads(worker, how_many, counter)
print('Counter should be %d, found %d' %
      (5 * how_many, counter.count))

Counter should be 500000, found 500000
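
For comparison, the `with` statement is roughly equivalent to calling `acquire` and `release` by hand inside `try`/`finally`. The sketch below uses a hypothetical `ExplicitLockingCounter` and a smaller iteration count so it runs quickly. It is meant to show what the `with` form does, not to replace it.

```python
from threading import Lock, Thread

class ExplicitLockingCounter:
    def __init__(self):
        self.lock = Lock()
        self.count = 0

    def increment(self, offset):
        self.lock.acquire()
        try:
            self.count += offset
        finally:
            # Always release, even if the update raises.
            self.lock.release()

def worker(how_many, counter):
    for _ in range(how_many):
        counter.increment(1)

how_many = 10**4
counter = ExplicitLockingCounter()
threads = [Thread(target=worker, args=(how_many, counter))
           for _ in range(5)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print('Counter should be %d, found %d' %
      (5 * how_many, counter.count))
```

The `with` form is shorter and cannot forget the release, which is why it is usually preferred.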


## Things to Remember

- Even though Python has a global interpreter lock, you're still responsible for protecting against data races between the threads in your programs.
- Your programs will corrupt their data structures if you allow multiple threads to modify the same objects without locks.
- The Lock class in the threading built-in module is Python's standard mutual-exclusion lock implementation.