In [2]:
%run ../00_AdvancedPythonConcepts/talktools.py

# Parallelism

Python Computing for Data Science (AY250)

## Outline for Today

- Motivation

- Single-machine
    - threading
    - multiprocessing
    - joblib
    - dask
- Multi-machine
    - ipython cluster

## Motivation

Generally, the goal of your computing task is to finish as quickly as possible. The **speed of your processor** at executing instructions and the **speed at which data can be read from disk and from RAM** are major contributors to the execution time. Obviously the **choice of algorithm(s)** is critical too. Choosing an $N \log N$ algorithm over an $N^2$ one that gets the same answer is almost always preferred for any sizeable $N$.

### Types of Bottlenecks

If you think of your run-time program as a stream of data and computation on that data, it should be clear that **bottlenecks are inevitable**. Your job (as you begin to optimize for execution time) is to understand where those bottlenecks are and to use the tools we have in Python to minimize them. (Ultimately, it's a never-ending game of whack-a-mole.)

#### I/O Bound

*"a condition in which the time it takes to complete a computation is determined principally by the period spent waiting for input/output operations to be completed."* -- wikipedia

This can be because we're waiting for a response from an external source (e.g. loading a webpage) or because data needs to be moved around on your bus and we're waiting for it to show up in the right place to compute on. If you have very fast CPUs, you're more likely to be I/O bound.

#### CPU Bound

*"when the time for it to complete a task is determined principally by the speed of the central processor: processor utilization is high, perhaps at 100% usage for many seconds or minutes."* -- wikipedia

If you're doing algorithmic computations where the amount of input data is small and the amount of output data is also small (e.g. a Fourier transform), you'll typically be CPU bound. Slower CPUs lead to more CPU-bound bottlenecks. If you have a lot of data ("big"), you're moving data around between disk, RAM, and cache, and you're likely I/O bound.
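
One rough way to tell the two regimes apart is to compare wall-clock time with CPU time for the same task. The sketch below is only an illustration under stated assumptions: `time.sleep` stands in for waiting on I/O, and the names `profile`, `io_like`, and `cpu_like` are made up for this example.

```python
import time

def profile(func):
    """Return (wall_seconds, cpu_seconds) spent running func()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu

def io_like():
    # stand-in for waiting on a disk or a network: the CPU is idle
    time.sleep(0.2)

def cpu_like():
    # pure computation: the CPU is busy for the whole call
    sum(i * i for i in range(500_000))

for task in (io_like, cpu_like):
    wall, cpu = profile(task)
    print(f"{task.__name__}: wall={wall:.3f} s, cpu={cpu:.3f} s")
```

When the wall time is much larger than the CPU time, the task spent most of its time waiting, which suggests it is I/O bound. When the two are close, it is likely CPU bound.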

#### (Memory Bound)

*"time to complete a given computational problem is decided primarily by the amount of memory required to hold data"* -- wikipedia

<img src="https://www.evernote.com/l/AUUzntxvU9BHWJMZSH_CL3S7YRUjThJTrPEB/image.png">

Source: http://www.slideshare.net/ManojitNandi/parallel-programming-in-python-speeding-up-your-analysis

# Processes & Threads

Each Python interpreter runs in a `process`, containing the program code, stack, and its current activity.

In [3]:
import os
os.getpid()

1288

Within a process one can create a set of `threads`, which share everything with the process in which they were spawned (memory, data, state). Most generally, they are little programs (each with its own stack) that execute `concurrently` (independently of each other). Since threads share memory, the programmer must "lock" any shared state that more than one thread might modify at the same time. The way we make many threads in Python is using the `threading` module.

<div class="alert alert-info">The Global Interpreter Lock (GIL) in Python stops threads from truly running in parallel. That is, the interpreter can only execute one thread at a time. This is an implementation detail of how CPython was programmed. Many things you use push work down into the C layer and "avoid the GIL". </div>

You can also make many processes, which are copies of the original parent process (memory, data, state) and act independently of each other. To share data between them, you have to pass it explicitly between the processes. The Pythonic way we do multiprocessing (creation of new processes, communication between processes) is with `multiprocessing`.

The goal of computing with `threading` and `multiprocessing` is to not wait around: the CPU should not be idle if it doesn't have to be. And since we almost always have multiple cores, we should be able to let the work we want to do happen in parallel over those cores.
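
To see the effect of the GIL on CPU-bound work, one can time the same pure-Python loop run twice in sequence and then in two threads. This is a minimal sketch, not a benchmark: `count_down` and `N` are illustrative choices, and the exact timings depend on the machine and the interpreter build.

```python
import threading
import time

def count_down(n):
    """Pure-Python busy loop: CPU bound, no I/O and no sleeping."""
    while n > 0:
        n -= 1

N = 2_000_000

# run the work twice, one call after the other
start = time.perf_counter()
count_down(N)
count_down(N)
sequential = time.perf_counter() - start

# run the same two calls in two threads
start = time.perf_counter()
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f} s, two threads: {threaded:.2f} s")
```

On a standard CPython build with the GIL, the two timings are typically similar, since only one thread executes Python bytecode at a time.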

### Threading

`threading.Thread(target=f, args=(...))` is the basic way to use function `f` with arguments in a thread.

`.start()`: Calls the `.run()` of a thread object. This method will raise a `RuntimeError` if called more than once on the same thread object.
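
The "only once" rule for `.start()` can be checked directly. In this small sketch, `greet` and `second_start_error` are names chosen for illustration.

```python
import threading

def greet():
    print("hello from a thread")

t = threading.Thread(target=greet)
t.start()
t.join()

second_start_error = None
try:
    t.start()  # a thread object can only be started once
except RuntimeError as err:
    second_start_error = err
    print("second start failed:", err)
```

To run the same function again, create a new `Thread` object rather than reusing the finished one.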

In [23]:
import threading

def worker(num):
    """thread worker function"""
    print('Worker: %s' % num)
    return

threads = []
for i in range(5):
    t = threading.Thread(target=worker, args=(i,))
    threads.append(t)
    t.start()
threads

Worker: 0
Worker: 1
Worker: 2
Worker: 3
Worker: 4


[<Thread(Thread-51, stopped 123145330724864)>,
 <Thread(Thread-52, stopped 123145330724864)>,
 <Thread(Thread-53, stopped 123145330724864)>,
 <Thread(Thread-54, stopped 123145330724864)>,
 <Thread(Thread-55, stopped 123145330724864)>]

In [24]:
%%time
threads = []
for i in range(5):
    t = threading.Thread(target=worker, args=(i,))
    threads.append(t)
    t.start()

Worker: 0
Worker: 1
Worker: 2
Worker: 3
Worker: 4
CPU times: user 1.46 ms, sys: 1.38 ms, total: 2.84 ms
Wall time: 3.65 ms


Despite the GIL, threads won't get in each other's way if they are idle.

In [9]:
import IPython

In [10]:
IPython.__version__

'5.1.0'

In [28]:
#%%time
import logging
import random
import time

root = logging.getLogger()
root.handlers = []
logging.basicConfig(level=logging.DEBUG,
                    format='(%(threadName)-9s) %(message)s',)

import threading

def worker(num):
    """thread worker function"""
    
    sleep_time = random.randint(1,5)
    logging.debug('worker: {0} sleeping for {1} s, name: {2}'
                   .format(num,sleep_time,threading.current_thread().getName()))
    time.sleep(sleep_time)
    logging.debug('done')
    return

threads = []
for i in range(5):
    t = threading.Thread(target=worker, args=(i,))
    threads.append(t)
    t.start()

(Thread-67) worker: 0 sleeping for 4 s, name: Thread-67
(Thread-68) worker: 1 sleeping for 2 s, name: Thread-68
(Thread-69) worker: 2 sleeping for 1 s, name: Thread-69
(Thread-70) worker: 3 sleeping for 5 s, name: Thread-70
(Thread-71) worker: 4 sleeping for 3 s, name: Thread-71
(Thread-69) done
(Thread-68) done
(Thread-71) done
(Thread-67) done
(Thread-70) done


`.join()`: Blocks the calling thread until the thread whose `.join()` method is called has finished.

In [15]:
t.is_alive()

False

In [18]:
%%time
# not very parallel ... 
threads = []
for i in range(2):
    t = threading.Thread(target=worker, args=(i,))
    threads.append(t)
    t.start()
    t.join() # this waits for the thread to finish

(Thread-41) worker: 0 sleeping for 4 s, name: Thread-41
(Thread-41) done
(Thread-42) worker: 1 sleeping for 5 s, name: Thread-42
(Thread-42) done


CPU times: user 6.36 ms, sys: 2.94 ms, total: 9.3 ms
Wall time: 9.01 s


In [29]:
%%time
threads = []
for i in range(2):
    t = threading.Thread(target=worker, args=(i,))
    threads.append(t)

print("waiting around a bit, then starting threads",flush=True)
# time.sleep(2)

# you don't have to start threads immediately after creating them
for t in threads:
    t.start() 

for t in threads:
    t.join() # this waits for the thread to finish

print("I'm really done with all the threads.")

waiting around a bit, then starting threads


(Thread-72) worker: 0 sleeping for 5 s, name: Thread-72
(Thread-73) worker: 1 sleeping for 2 s, name: Thread-73
(Thread-73) done
(Thread-72) done


I'm really done with all the threads.
CPU times: user 6.36 ms, sys: 2.53 ms, total: 8.9 ms
Wall time: 5.01 s


A few things:

- `logging` is "thread-safe" -- so different threads can write to the log file without causing issues
- you can always get a handle to the current thread with `threading.current_thread()`
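
As a small illustration of `threading.current_thread()`, the sketch below records the name of whichever thread runs a function. The function `report`, the list `seen`, and the thread name `my-worker` are made up for this example.

```python
import threading

seen = []

def report():
    # current_thread() returns the Thread object that is running this code
    seen.append(threading.current_thread().name)

# the name= argument gives the thread a readable name
t = threading.Thread(target=report, name="my-worker")
t.start()
t.join()

report()  # called directly, so it runs in the calling thread

print(seen)
```

The first entry is the name given to the worker. The second is the name of the thread that called `report()` directly, which is `MainThread` when the code runs as a plain script.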

You can delay the start of the execution of a thread with `Timer`

```python
threading.Timer(interval, function, args=None, kwargs=None)
```

In [31]:
threads = []
for i in range(2):
    r = random.randint(1,5)
    t = threading.Timer(r, worker, args=(i,))
    threads.append(t)
    logging.debug("starting {0} with delay {1}"
                  .format(t.getName(),r))
    threads[-1].start()

(MainThread) starting Thread-76 with delay 5
(MainThread) starting Thread-77 with delay 3
(Thread-77) worker: 1 sleeping for 5 s, name: Thread-77
(Thread-76) worker: 0 sleeping for 4 s, name: Thread-76
(Thread-77) done
(Thread-76) done
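
A `Timer` that has not fired yet can also be stopped with `.cancel()`. The sketch below assumes two illustrative timers, `fast` and `slow`, and a helper `remember` that only records which timers actually ran.

```python
import threading
import time

fired = []

def remember(label):
    fired.append(label)

fast = threading.Timer(0.1, remember, args=("fast",))
slow = threading.Timer(5.0, remember, args=("slow",))
fast.start()
slow.start()

time.sleep(0.5)   # long enough for the fast timer, not for the slow one
slow.cancel()     # stops the timer while it is still waiting
fast.join()
slow.join()       # returns promptly because the timer was cancelled

print(fired)      # ['fast']
```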


You can share variables (safely) between threads with a `queue`:

In [34]:
from queue import Queue

q = Queue()

def worker2(num):
    sleep_time = random.randint(1,5)
    
    logging.debug('worker: {0} sleeping for {1} s, name: {2}'
                   .format(num,sleep_time,threading.current_thread().getName()))
    time.sleep(sleep_time)

    if q.empty():
        q.put(sleep_time)
        logging.debug("initiated q = {0}".format(sleep_time))
    else:
        var = q.get()
        logging.debug("var {0}".format(var))
        q.put(sleep_time + var)
        logging.debug("added {0} to the q".format(sleep_time))
        
    logging.debug('done')
    return

threads = []
for i in range(2):
    t = threading.Thread(target=worker2, args=(i,))
    threads.append(t)
    t.start()

(Thread-80) worker: 0 sleeping for 3 s, name: Thread-80
(Thread-81) worker: 1 sleeping for 1 s, name: Thread-81
(Thread-81) initiated q = 1
(Thread-81) done
(Thread-80) var 1
(Thread-80) added 3 to the q
(Thread-80) done


In [35]:
q.get()

4
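
Note that in `worker2` the `q.empty()` check and the following `q.put()` are separate steps, so two threads that finish at nearly the same moment could both see an empty queue. One way to sidestep this, sketched here under the assumption that the workers only need to report a value, is to let each worker `put` its own result and do the summing in the main thread after all workers have been joined. `worker3` and `results` are hypothetical names.

```python
import threading
from queue import Queue

results = Queue()

def worker3(num):
    # hypothetical variant of worker2: report one value and nothing else
    results.put(num * 10)

threads = [threading.Thread(target=worker3, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = 0
while not results.empty():   # safe here: all producers have finished
    total += results.get()

print(total)                 # 100
```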

Threads can also signal each other with `Event`, and a `Barrier` can hold threads back until a set number of them have reached the same point. There are also low-level primitives (pushed to the UNIX \_pthreads level) called `locks` and `semaphores` that we'll not bother with here.
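
A minimal sketch of both ideas follows. The names `ready`, `barrier`, `waiter`, and `log` are illustrative. The `Event` holds every thread back until the main thread sets it, and the `Barrier` makes each thread wait until all three have arrived.

```python
import threading

ready = threading.Event()
barrier = threading.Barrier(3)
log = []

def waiter(num):
    ready.wait()            # block until the main thread sets the event
    log.append(("go", num))
    barrier.wait()          # block until all 3 threads have reached this line
    log.append(("past barrier", num))

threads = [threading.Thread(target=waiter, args=(i,)) for i in range(3)]
for t in threads:
    t.start()

ready.set()                 # release all waiting threads at once
for t in threads:
    t.join()

# every "go" entry is logged before any "past barrier" entry
print([kind for kind, _ in log])
```

The order of the thread numbers within each group can vary from run to run, but the barrier guarantees that no thread gets past it before all three have logged "go".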

Threading can be done with objects. You can subclass `threading.Thread` and create your own threads that know how to run.
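
Before the network example, here is a smaller sketch of the subclassing pattern. `SquareThread` is a hypothetical class that overrides `run()` and keeps its result as an attribute, since `run()` cannot return a value to the caller.

```python
import threading

class SquareThread(threading.Thread):
    """Hypothetical example: a thread that remembers the result of its work."""

    def __init__(self, value):
        super().__init__()
        self.value = value
        self.result = None

    def run(self):
        # run() is what .start() executes in the new thread
        self.result = self.value ** 2

threads = [SquareThread(v) for v in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print([t.result for t in threads])   # [0, 1, 4, 9]
```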

In [37]:
os.popen("ping -q -c2 google.com","r").readlines()

['PING google.com (216.58.192.14): 56 data bytes\n',
 '\n',
 '--- google.com ping statistics ---\n',
 '2 packets transmitted, 2 packets received, 0.0% packet loss\n',
 'round-trip min/avg/max/stddev = 3.812/3.827/3.841/0.014 ms\n']

In [45]:
# adapted from http://www.python-course.eu/threads.php
import os, re, threading

# mac-style summary line: "2 packets transmitted, 2 packets received, ..."
# capture the count right before "received", not the "transmitted" count
received_packages = re.compile(r"(\d+)\s+(?:packets\s+)?received")


class ip_check(threading.Thread):

    def __init__(self, ip):
        threading.Thread.__init__(self)
        self.ip = ip
        self.__successful_pings = -1

    def run(self):
        ping_out = os.popen("ping -q -c2 " + self.ip, "r")
        lines = ping_out.readlines()
        # the summary is on the 4th line of the quiet (-q) output
        if len(lines) < 4:
            return
        n_received = re.findall(received_packages, lines[3])
        if n_received:
            self.__successful_pings = int(n_received[0])

    def status(self):
        if self.__successful_pings == 0:
            return "has no response"
        elif self.__successful_pings == 1:
            return "is alive, but 50 % packet loss"
        elif self.__successful_pings == 2:
            return "is alive"
        else:
            return "not reachable"

check_results = []
for ip in ["google.com", "slashdot.com", "berkeley.edu", "google.org"]:
    current = ip_check(ip)
    check_results.append(current)
    current.start()

for el in check_results:
    el.join()
    print("Status of", el.ip, el.status())

Status of google.com is alive
Status of slashdot.com is alive
Status of berkeley.edu is alive
Status of google.org is alive


### Breakout

Using threading, grab the titles of 10 random Wikipedia pages using https://en.wikipedia.org/wiki/Special:Random. Count the total number of characters returned over all 10 pages.

Aside: for asynchronous I/O tasks you might consider using an event loop. Use the built-in `asyncio` and, for gathering webpages, `aiohttp` (http://aiohttp.readthedocs.io/en/stable/).

In [None]:
!pip install aiohttp

In [None]:
import asyncio
from aiohttp import ClientSession
from bs4 import BeautifulSoup

async def fetch(url, session):
    async with session.get(url) as response:
        return await response.read()

async def run(loop,  r):
    url = "https://en.wikipedia.org/wiki/Special:Random"
    tasks = []

    # Fetch all responses within one Client session,
    # keep connection alive for all requests.
    async with ClientSession() as session:
        for i in range(r):
            task = asyncio.ensure_future(fetch(url, session))
            tasks.append(task)

        responses = await asyncio.gather(*tasks)
        # you now have all response bodies in this variable
    
    for resp in responses:
        print("title=",BeautifulSoup(resp, 'html.parser')
              .title.string.split("- Wikipedia")[0],"len=",len(resp))

loop = asyncio.get_event_loop()
future = asyncio.ensure_future(run(loop, 4))
loop.run_until_complete(future)
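
The same event-loop idea can be tried without any network access or third-party packages. In this sketch, `asyncio.sleep` stands in for waiting on a response, and `fake_fetch` and `main` are hypothetical names. It uses `asyncio.run`, which assumes Python 3.7 or later and no event loop already running.

```python
import asyncio
import time

async def fake_fetch(i):
    # asyncio.sleep stands in for waiting on a network response
    await asyncio.sleep(0.2)
    return i * 2

async def main():
    # schedule all coroutines and wait for them together
    return await asyncio.gather(*(fake_fetch(i) for i in range(5)))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start

print(results, f"{elapsed:.2f} s")
```

The five waits overlap, so the total should be close to 0.2 s rather than 1 s. Inside a notebook whose kernel already runs an event loop, `asyncio.run` raises an error; there, `await main()` in a cell is the usual substitute.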