## Use subprocess to manage child processes
Child processes started by Python can run in parallel, so a single Python program can keep all of the machine's CPU cores busy.

In [1]:
import subprocess

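# echo is a cmd.exe built-in on Windows, hence shell=True. On POSIX, drop
# shell=True: with a list, only the first item would be run as the command.
proc = subprocess.Popen(["echo", "Hello from the child!"], stdout=subprocess.PIPE, shell=True)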
out, err = proc.communicate()
print(out.decode("utf-8"))

"Hello from the child!"



Child processes can run independently of their parent process, the Python interpreter. Their status can be polled periodically:

In [2]:
import time

proc = subprocess.Popen(["sleep", "0.3"], shell=True)
while proc.poll() is None:
    print("Working...")
    time.sleep(0.05)
    
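print("Exit status %d" % proc.poll())

Working...
Working...
Working...
Working...
Working...
Exit status 1

An exit status of 1 means the child failed instead of sleeping. `sleep` is a POSIX utility rather than a standard Windows command, and on POSIX a list combined with `shell=True` passes only its first item to the shell as the command, so `sleep` would run without its argument and fail there too. On a POSIX system, `subprocess.Popen(["sleep", "0.3"])` without `shell=True` runs as intended and exits with status 0.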


Because the children run independently of the parent, several can be started at once and then waited on together:

In [8]:
# https://docs.python.org/2/library/subprocess.html
import subprocess
import time

def run_sleep(period):
    proc = subprocess.Popen(["sleep", str(period)], shell=True)
    return proc
    
start = time.time()
procs = []

# start all of the processes
for _ in range(10):
    proc = run_sleep(0.1)
    procs.append(proc)

for proc in procs:
    proc.communicate()

statusList = [proc.poll() for proc in procs]
print(statusList)

end = time.time()
print("Finished in %.3f seconds" % (end - start))

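[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Finished in 0.669 seconds

Every exit status is 1, so none of the children actually slept, and the elapsed time measures only the cost of starting ten processes. When each child really sleeps for 0.1 seconds, a parallel run takes roughly one sleep plus process start-up, compared with about 1 second for ten sleeps run one after another.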
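
A portable variant of the same idea can be sketched as follows. It assumes a child Python interpreter (`sys.executable`) as a stand-in for the `sleep` command, so that it behaves the same on Windows and POSIX; the one-line child program is made up for illustration and is not part of the original notebook.

```python
import subprocess
import sys
import time

def run_sleep(period):
    # Hypothetical stand-in for `sleep`: a child interpreter that sleeps.
    code = "import time; time.sleep(%s)" % period
    # No shell=True: the list is handed to the child as its argument vector.
    return subprocess.Popen([sys.executable, "-c", code])

start = time.time()
procs = [run_sleep(0.1) for _ in range(10)]  # start all ten before waiting on any
for proc in procs:
    proc.wait()
statuses = [proc.returncode for proc in procs]
elapsed = time.time() - start

print(statuses)
print("Finished in %.3f seconds" % elapsed)
```

Each status should be 0 here. The elapsed time depends on how long the interpreter takes to start on the machine at hand, so no figure is quoted.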


In [17]:
# from https://pymotw.com/2/subprocess/
import subprocess

# Simple command
subprocess.call(["dir"], shell=True) # running on Windows

0

Setting the shell argument to a true value causes subprocess to spawn an intermediate shell process, and tell it to run the command. The default is to run the command directly. Using an intermediate shell means that variables, glob patterns, and other special shell features in the command string are processed before the command is run.

In [18]:
import subprocess

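# Command with shell expansion. $HOME is POSIX shell syntax; cmd.exe on Windows
# uses %VAR% syntax (e.g. %USERPROFILE%) and echoes "$HOME" unchanged.
subprocess.call("echo $HOME", shell=True)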

0
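
For contrast, here is a minimal sketch of the default behaviour, where no intermediate shell is involved. The child is a small Python one-liner (an assumption made for portability, not something from the original notebook) that prints the first argument it receives.

```python
import subprocess
import sys

# Hypothetical child program: print the first argument exactly as received.
child = [sys.executable, "-c", "import sys; print(sys.argv[1])"]

# Without a shell, variables and glob patterns are not expanded.
literal = subprocess.check_output(child + ["$HOME *.py"]).decode().strip()
print(literal)  # $HOME *.py
```

Because nothing in the argument is interpreted, the list form without `shell=True` is the safer choice whenever part of the command comes from user input.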

In [19]:
# capturing output
import subprocess

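output = subprocess.check_output(["dir"], shell=True) # running on Windows; dir is a cmd.exe built-in
print('Have %d bytes in output' % len(output))
print(output.decode(errors='replace')) # check_output returns bytes; decode them for display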

Have 805 bytes in output
 Volume in drive U is New Volume
 Volume Serial Number is AAD0-7DCA

 Directory of U:\git\Effective-Python

09/05/2018  06:07 PM    <DIR>          .
09/05/2018  06:07 PM    <DIR>          ..
23/04/2018  05:29 PM             1,258 .gitignore
07/05/2018  09:22 AM    <DIR>          .ipynb_checkpoints
23/04/2018  05:16 PM            19,867 1 Pythonic Thinking.ipynb
02/05/2018  07:01 PM            21,706 2 Functions.ipynb
03/05/2018  06:27 PM            26,085 3 Classes and Inheritance.ipynb
07/05/2018  09:20 AM            35,868 4 Metaclasses and Attributes.ipynb
09/05/2018  06:07 PM             6,225 5 Concurrency and Parallelism.ipynb
23/04/2018  05:29 PM               241 README.md
               7 File(s)        115,346 bytes
               3 Dir(s)  315,015,225,344 bytes free
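
`check_output` also checks the exit status: if the child exits with a nonzero status, it raises `subprocess.CalledProcessError` instead of returning. The sketch below shows one way to handle that. The failing child is a made-up Python one-liner used only for illustration.

```python
import subprocess
import sys

# Hypothetical failing child: writes one line, then exits with status 3.
cmd = [sys.executable, "-c", "import sys; print('partial output'); sys.exit(3)"]

try:
    subprocess.check_output(cmd)
    status = 0
except subprocess.CalledProcessError as exc:
    status = exc.returncode         # the child's exit status
    captured = exc.output.decode()  # what it wrote to stdout before failing
    print("Child failed with status %d" % status)
```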



In [21]:
# or running a batch file
import subprocess

filepath = "myBatch.bat"
p = subprocess.Popen(filepath, shell=True, stdout=subprocess.PIPE)

stdout, stderr = p.communicate() # stderr is None because it was not piped
print(p.returncode) # is 0 if success
print(stdout.decode(errors="replace"))

0

[r] U:\git\Effective-Python>echo Message in a batch file! 
Message in a batch file!



## Use Threads for Blocking I/O, Avoid for Parallelism

In [22]:
import time

def factorize(number):
    for i in range(1, number + 1):
        if number % i == 0:
            yield i
            
numbers = [2139079, 1214759, 1516637, 1852285]
start = time.time()
for number in numbers:
    list(factorize(number))
end = time.time()
print("Took %0.3f seconds" % (end - start))

Took 0.575 seconds


The same task with one thread per number:

In [23]:
import threading

class FactorizeThread(threading.Thread):
    def __init__(self, number):
        super(FactorizeThread, self).__init__()
        self._number = number
        
    def run(self):
        self.factors = list(factorize(self._number))


In [24]:
start = time.time()
threads = []
for number in numbers:
    thread = FactorizeThread(number)
    thread.start()
    threads.append(thread)
    
for thread in threads:
    thread.join()
    
end = time.time()
print("Took %.3f seconds" % (end - start))

Took 0.820 seconds


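The threaded version is slower, not faster. In CPython the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so the four threads take turns on a single core and pay extra for thread creation and switching.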
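
Threads do pay off when the work is blocking I/O rather than computation. The sketch below uses `time.sleep` as a stand-in for a blocking call (an assumption for illustration), since it releases the GIL while it waits, much as blocking system calls do.

```python
import threading
import time

def blocking_call(seconds):
    # Stand-in for blocking I/O: the GIL is released while sleeping.
    time.sleep(seconds)

start = time.time()
threads = [threading.Thread(target=blocking_call, args=(0.1,)) for _ in range(5)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
elapsed = time.time() - start

print("Took %.3f seconds" % elapsed)
```

Run one after another, the five calls would need about 0.5 seconds. With threads the waits overlap, so the total should be close to a single 0.1 second wait.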

## Use Lock to prevent data races in threads
Here two threads both add to a shared counter. If each counts to 1000, we might expect the total to be 2000, but that is not always the case.

In [25]:
import threading
import time

class SomeThread(threading.Thread):
    def __init__(self):
        super(SomeThread, self).__init__()
        self.daemon = True
        self.start()
        
    def run(self):
        for i in range(1000):
            global count
            count = count + 1

for i in range(5):
    count = 0
    threads = [SomeThread(), SomeThread()]
    for thread in threads:
        thread.join()
    print(count)

2000
2000
2000
1592
2000


Suppose thread 1 reads count as 1500 and computes 1501. Before it writes that value back, thread 2 also reads count as 1500 and computes 1501. Both threads then store 1501, so one increment is lost and the final total falls short of 2000.

This is where locks come in. Locks are the most fundamental synchronization mechanism provided by the threading module, and they are typically used to synchronize access to a shared resource. For each shared resource, create a Lock object. When you need to access the resource, call acquire to hold the lock (this waits for the lock to be released, if necessary), and call release when you are done. If a thread tries to acquire a lock that is already held by another thread, it is suspended until the lock is released. The acquire method takes an optional blocking flag; passing False makes it return immediately instead of waiting when the lock is held by someone else. The example below uses the default, blocking behaviour:

In [26]:
import threading
import time

lock = threading.Lock() # create a lock per shared resource

class SomeThread(threading.Thread):
    def __init__(self):
        super(SomeThread, self).__init__()
        self.daemon = True
        self.start()
        
    def run(self):
        for i in range(1000):
            global count
            lock.acquire(True) # blocking=True: wait until no other thread holds the lock
            count = count + 1
            lock.release()     # always release the lock so that other threads can proceed

for i in range(5):
    count = 0
    threads = [SomeThread(), SomeThread()]
    for thread in threads:
        thread.join()
    print(count)

2000
2000
2000
2000
2000
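
The same protection can be written with the lock as a context manager, which releases it even if the protected code raises. The sketch below also shows the non-blocking form of `acquire`. The function name `add` is made up for illustration.

```python
import threading

lock = threading.Lock()
count = 0

def add(n):
    global count
    for _ in range(n):
        with lock:  # acquired on entry, released on exit, even on error
            count = count + 1

threads = [threading.Thread(target=add, args=(1000,)) for _ in range(2)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(count)  # 2000

# Non-blocking attempt: returns False at once instead of waiting.
lock.acquire()                # hold the lock ...
got_it = lock.acquire(False)  # ... so a second, non-blocking attempt fails
lock.release()
print(got_it)  # False
```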


A plain Lock is not reentrant: a thread that already holds the lock and calls lock.acquire again blocks, waiting for a release that only it could perform:

In [None]:
import threading
import time

lock = threading.Lock()

class SomeThread(threading.Thread):
    def __init__(self):
        super(SomeThread, self).__init__()
        self.daemon = True
        self.start()
        
    def run(self):
        for i in range(1000):
            global count
            lock.acquire(True)
            lock.acquire(True) # blocks forever: this thread already holds the lock
            count = count + 1
            lock.release()

count = 0
thread = SomeThread()
thread.join()
print(count)

Notice that the thread gets stuck. It acquires the lock and then calls lock.acquire(True) a second time, which waits for the lock to be released. Since the only holder is the waiting thread itself, that release never happens. Reentrant locks (RLock objects) avoid this problem: the thread that owns an RLock can acquire it again without blocking:

In [1]:
import threading
import time

lock = threading.RLock()

class SomeThread(threading.Thread):
    def __init__(self):
        super(SomeThread, self).__init__()
        self.daemon = True
        self.start()
        
    def run(self):
        for i in range(1000):
            global count
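            lock.acquire(True)
            lock.acquire(True) # the owning thread may re-acquire an RLock
            count = count + 1
            lock.release()
            lock.release()     # release once per acquire so other threads can take the lock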

count = 0
thread = SomeThread()
thread.join()
print(count)

1000
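
Reentrancy matters most when one locked function calls another that takes the same lock. The following is a minimal sketch under that assumption. The names `balance`, `deposit`, and `deposit_twice` are hypothetical and chosen only for illustration.

```python
import threading

lock = threading.RLock()
balance = 0  # hypothetical shared resource

def deposit(amount):
    global balance
    with lock:
        balance += amount

def deposit_twice(amount):
    with lock:           # outer acquire
        deposit(amount)  # inner acquire by the same thread: allowed by RLock
        deposit(amount)

threads = [threading.Thread(target=deposit_twice, args=(5,)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(balance)  # 40
```

With a plain `Lock`, the inner `with lock:` in `deposit` would block forever when called from `deposit_twice`.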


## Use Queue to coordinate work between threads