# Why?

SPEED.

You want to spread the processing load to take advantage of modern systems' multiple cores.

Two main approaches:

- multiple processes
- multiple threads


# Terminology

- multi-processing: an application utilizing multiple OS-level processes
- multi-threading: an application with multiple threads running within a process
- lock: synchronization mechanism for enforcing access limits (ex: serialized access among processes / threads)
- Global Interpreter Lock (GIL):
    - "The mechanism used by the CPython interpreter to assure that only one thread executes Python bytecode at a time. This simplifies the CPython implementation by making the object model (including critical built-in types such as dict) implicitly safe against concurrent access. Locking the entire interpreter makes it easier for the interpreter to be multi-threaded, at the expense of much of the parallelism afforded by multi-processor machines." ([Ref.](http://docs.python.org/3/glossary.html#term-global-interpreter-lock))
    - See also [here](http://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock) for lower-level details
- futures:
    - AKA promises
    - a value that represents an asynchronous computation
    - the actual computation may take place in a separate thread OR separate process
    - convenient, high-level library, often simpler than using `threading` or `multiprocessing`


# Libraries

## subprocess
- spawn individual subprocesses
    - `subprocess.call` (execute subprocess, return its exit status, AKA return code)
    - `subprocess.check_output` (execute subprocess, return its output; raises `CalledProcessError` on a non-zero exit status)
    - argument `shell=True` allows full command strings (optionally with pipes) to be passed directly to these functions
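
As a rough sketch of the calls listed above: the commands are placeholders picked for illustration, `sys.executable` is used only because it is a command guaranteed to exist, and the pipeline assumes a POSIX shell with `tr` available.

```python
import subprocess
import sys

# subprocess.call runs the command and returns its exit status (0 means success).
# The child here deliberately exits with status 3.
status = subprocess.call([sys.executable, "-c", "import sys; sys.exit(3)"])
print("return code:", status)

# subprocess.check_output returns whatever the command wrote to stdout, as bytes.
raw = subprocess.check_output([sys.executable, "-c", "print('hello')"])
print("captured:", raw.decode().strip())

# With shell=True the whole string is handed to the shell, so pipes work.
# Only do this with trusted strings: the shell interprets every metacharacter.
piped = subprocess.check_output("echo hello | tr a-z A-Z", shell=True)
print("pipeline:", piped.decode().strip())
```

Passing a list of arguments (the first two calls) avoids the shell entirely, which is usually the safer default when no pipeline is needed.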

### Exercise
1. Launch any subprocess from a script and examine its return code in the Python interpreter
2. Capture the output of a pipeline (ex: `date | tr a-z A-Z | tr 1 Q`) in a variable

## threading
- [docs](http://docs.python.org/3/library/threading.html)
- examples: [here](http://pymotw.com/2/threading/index.html#module-threading)
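
A minimal sketch of `threading` together with a lock, assuming a shared counter as the protected resource (the names `counter`, `counter_lock` and `add_many` are made up for this illustration):

```python
import threading

counter = 0
counter_lock = threading.Lock()

def add_many(times):
    """Increment the shared counter `times` times, one locked step at a time."""
    global counter
    for _ in range(times):
        # The lock serializes the read-modify-write so no update is lost.
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every thread to finish

print(counter)
```

Because every increment happens while the lock is held, the final value is 4 x 10000 = 40000 regardless of how the threads are scheduled.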

## multiprocessing
- [docs](http://docs.python.org/3/library/multiprocessing.html)
- examples: [here](http://pymotw.com/2/multiprocessing/basics.html)
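
A small sketch of `multiprocessing`, assuming a trivial `square` function as a stand-in for CPU-bound work (both `square` and `main` are hypothetical names used only here):

```python
import multiprocessing

def square(n):
    """CPU-bound stand-in; defined at module level so it can be pickled."""
    return n * n

def main():
    # Pool() starts one worker process per CPU by default.
    with multiprocessing.Pool() as pool:
        results = pool.map(square, range(10))
    print(results)

# The guard keeps worker processes from re-running main() when they
# import this module (needed with the "spawn" start method).
if __name__ == "__main__":
    main()
```

Each worker is a separate OS-level process with its own interpreter, so arguments and results are copied between processes rather than shared.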

## concurrent.futures
- proposed in [PEP-3148](http://www.python.org/dev/peps/pep-3148/)
- available in standard library since Python 3.2 (February 2011, AFTER Programming in Python 3, 2nd edition was published!)
- [docs](http://docs.python.org/3/library/concurrent.futures.html)

    

In [5]:
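# What are futures?

    from concurrent.futures import ThreadPoolExecutor

    def my_long_computation():
        return sum(range(10**6))  # placeholder for real work

    with ThreadPoolExecutor(max_workers=1) as executor:
        # wrap the computation in a future; it starts running in the background
        future_val = executor.submit(my_long_computation)
        ...  # do other work
        # now we need it, so block until it's ready
        print(future_val.result())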



* A Future is a value that represents an asynchronous computation                                                                                              
* The actual computation may take place in a separate thread or separate process                                                                               


# Why care about Futures?                                                                                                                                      
- "The Free Lunch Is Over" [(Sutter, 2005)](http://www.gotw.ca/publications/concurrency-ddj.htm)                                                                                                                      
- Futures encapsulate the low-level details of parallel programming (launching threads, IPC, etc.)                                                             
- Great for launching several long or blocking tasks in parallel (Web Requests, File I/O, Computation)                                                         
- Python ships Futures in its standard library (`concurrent.futures`), so no third-party package is needed
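
To make the "several blocking tasks in parallel" point concrete, here is a sketch that assumes `time.sleep` as a stand-in for a web request or file read. The `fetch` function and the job list are invented for the illustration.

```python
import concurrent.futures
import time

def fetch(name, delay):
    """Hypothetical stand-in for a blocking call such as a web request."""
    time.sleep(delay)
    return "{} done".format(name)

# (name, seconds) pairs are made up for this sketch.
jobs = [("a", 0.3), ("b", 0.1), ("c", 0.2)]

start = time.perf_counter()
results = []
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(fetch, name, delay) for name, delay in jobs]
    # as_completed yields each future as soon as its result is ready.
    for future in concurrent.futures.as_completed(futures):
        results.append(future.result())
elapsed = time.perf_counter() - start

print(results)
print("elapsed: {:.2f}s".format(elapsed))
```

The three waits overlap, so the total should land near the longest single delay rather than the sum of all three.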


In [1]:
# Futures: Example                                                                                                                                             
                                                                                                                                                                                                                                                                                                                 
    # futures.py
    # New in Python 3.2
    import concurrent.futures
    import math

    def is_prime(n):
        """Trial division by odd numbers up to sqrt(n)."""
        if n < 2:
            return False
        if n == 2:
            return True
        if n % 2 == 0:
            return False

        sqrt_n = int(math.floor(math.sqrt(n)))
        for i in range(3, sqrt_n + 1, 2):
            if n % i == 0:
                return False
        return True

    PRIMES = [112272535095293, 1099726899285419, 112582705942171]

    def main():
        # computations will be done in separate processes
        with concurrent.futures.ProcessPoolExecutor() as executor:
            for number, prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
                print('{:d} is prime: {}'.format(number, prime))

    # the guard is required on platforms that start workers by importing this module
    if __name__ == '__main__':
        main()



# Futures: Caveats                                                                                                                                             
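- `max_workers` is optional for both executors on current Python versions
    - ThreadPoolExecutor required it before Python 3.5; since Python 3.8 the default is `min(32, os.cpu_count() + 4)`
    - ProcessPoolExecutor defaults to the number of processors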
- ProcessPoolExecutor uses multiprocessing                                                                                                                     
    - It safely bypasses the Global Interpreter Lock (this is good for parallelism!)
    - BUT can only execute/return picklable objects (i.e. no sockets, database connections, etc.)
        - cf. the [asyncio](http://docs.python.org/3.4/library/asyncio.html) module (in the standard library since Python 3.4, March 2014)
            - nodejs-style event-loop based concurrency, without callback-style code (originally built on [yield from](http://www.python.org/dev/peps/pep-0380/); `async`/`await` syntax since Python 3.5)
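
The picklability caveat can be explored with a small sketch. It uses plain `pickle.dumps` as an approximation of what a process pool does when it ships arguments and results between processes; `is_picklable` is a hypothetical helper written for this illustration, not part of any library.

```python
import pickle
import threading

def is_picklable(obj):
    """Return True if pickle.dumps accepts obj (hypothetical helper)."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True

print(is_picklable(42))                # plain data
print(is_picklable(len))               # built-in function, pickled by name
print(is_picklable(lambda x: x))       # lambdas have no importable name
print(is_picklable(threading.Lock()))  # wraps an OS-level resource
```

The first two checks succeed and the last two fail, which is why functions submitted to a `ProcessPoolExecutor` are normally defined at module level and take plain data as arguments.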


In [None]:
# Futures: Deadlocks Example 1 (Live)                                                                                                                          
                                                                                                                                                                                                                                                                                                                 
    from concurrent.futures import ThreadPoolExecutor
    import time                                                                                                                                                
    def wait_on_bob():                                                                                                                                         
        time.sleep(5)                                                                                                                                          
        print(bob.result()) # bob will never complete because it is waiting on alice.                                                                          
        return 5                                                                                                                                               
                                                                                                                                                               
    def wait_on_alice():                                                                                                                                       
        time.sleep(5)                                                                                                                                          
        print(alice.result()) # alice will never complete because it is waiting on bob.                                                                        
        return 6                                                                                                                                               
                                                                                                                                                               
    executor = ThreadPoolExecutor(max_workers=2)                                                                                                               
    alice = executor.submit(wait_on_bob)                                                                                                                       
    bob = executor.submit(wait_on_alice)                                                                                                                       
                                                                                                                                                               
* Deadlock occurs because alice is waiting on bob and bob is waiting on alice
* Simple Solution: Don't have Futures wait on other Futures                                                                                                    
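
One way to apply that advice, sketched under the assumption that the two tasks do not really need each other's results: let only the submitting thread wait. The function names and the short sleeps are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def compute_five():
    time.sleep(0.1)  # placeholder for real work
    return 5

def compute_six():
    time.sleep(0.1)  # placeholder for real work
    return 6

with ThreadPoolExecutor(max_workers=2) as executor:
    alice = executor.submit(compute_five)
    bob = executor.submit(compute_six)
    # Only the submitting thread waits, so neither task can block the other.
    total = alice.result() + bob.result()

print(total)
```

Both tasks finish independently and the main thread combines the results, so the script prints 11 instead of hanging.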


In [None]:
# Futures: Deadlocks Example 2                                                                                                                                 
                                                                                                                                                               
    from concurrent.futures import ThreadPoolExecutor                                                                                                                                                   
   
    def wait_on_future():                                                                                                                                      
        f = executor.submit(pow, 5, 2)                                                                                                                         
        # This will never complete because there is only one worker                                                                                            
        # thread and it is executing this function!!!                                                                                                          
        print(f.result())                                                                                                                                      
                                                                                                                                                               
    executor = ThreadPoolExecutor(max_workers=1)                                                                                                               
    executor.submit(wait_on_future)                                                                                                                            
                                                                                                                                                               
* Deadlock occurs due to lack of worker threads                                                                                                                
* Solution: we need more threads                                                                                                                               
* Simpler solution: Futures should not wait on Futures and probably shouldn't launch them either                                                               
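
A sketch of the simpler solution, assuming the work can be split into two steps that the caller chains together. `report` is a hypothetical second step; the single worker is kept on purpose to show that no extra threads are needed.

```python
from concurrent.futures import ThreadPoolExecutor

def report(value):
    """Second step; receives a finished value instead of waiting on a future."""
    return "5 ** 2 = {}".format(value)

with ThreadPoolExecutor(max_workers=1) as executor:
    # Step 1 runs on the single worker; the main thread waits for it.
    squared = executor.submit(pow, 5, 2).result()
    # Step 2 is submitted only after step 1 has freed the worker.
    message = executor.submit(report, squared).result()

print(message)
```

The worker never blocks on another future, so one thread is enough and the script prints `5 ** 2 = 25`.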


# Exercise: Futures                                                                                                                                            
* Run each script 3 times, calculating the average execution time using the time command: `time ./myscript.py`                                                                                                                      
* Compare average execution time among the different scripts                                                                                                          


# Exercise: Benchmarking CPU- or I/O-bound tasks

Using the `ProcessPoolExecutor` OR the `ThreadPoolExecutor` example from the `concurrent.futures` documentation as a starting point:

- Change the task to run (ex: use different URLs, different numbers, or different tasks altogether if you're feeling ambitious)
- Benchmark at least 3 runs of:
    - single worker execution (`max_workers=1`), and
    - multi-worker execution (`max_workers > 1`; for example, leave `max_workers` unset to accept the default value)
- Review your benchmark data and see whether it aligns with your expectations
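
A possible starting harness for the timing part, offered as a sketch rather than a reference solution. It assumes a simulated I/O-bound task (`time.sleep`) and the hypothetical helpers `io_task` and `benchmark`; swap in your own task and executor.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def io_task(seconds):
    """Simulated I/O-bound task: sleeping releases the GIL like real blocking I/O."""
    time.sleep(seconds)
    return seconds

def benchmark(max_workers, runs=3, tasks=8, seconds=0.05):
    """Return the average wall-clock time over `runs` runs."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            list(executor.map(io_task, [seconds] * tasks))
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)

single = benchmark(max_workers=1)
multi = benchmark(max_workers=4)
print("1 worker:  {:.3f}s".format(single))
print("4 workers: {:.3f}s".format(multi))
```

With these made-up parameters the single-worker run has to wait for all eight sleeps in sequence, while four workers overlap them. A CPU-bound task under `ThreadPoolExecutor` would not show the same gain because of the GIL.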