# Concurrent futures

- [A Quick Introduction to the concurrent.futures Module](http://masnun.com/2016/03/29/python-a-quick-introduction-to-the-concurrent-futures-module.html)
- [RealPython - Speed Up Your Python Program With Concurrency](https://realpython.com/python-concurrency/)
- [Asynchronous processing in Python 3: a few notes on thread, process, and concurrent.futures](https://medium.com/@felixie/python3-%E9%9D%9E%E5%90%8C%E6%AD%A5%E8%99%95%E7%90%86-thread-process-%E8%88%87concurrent-futures%E5%85%A9%E4%B8%89%E4%BA%8B-f9d61fc7ccbf)

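A quick review of the operating-system concepts of program, process, and thread: when a program runs, it is loaded into memory and exists there as one or more processes.

- A process is a container for threads. Threads in the same process share its resources, while the resources of separate processes are isolated from one another. In CPython, threads in the same process therefore also share a single GIL, so the way around the GIL is to use multiple processes.
- The biggest advantage of multiprocessing over multithreading is that it is not limited by the GIL, so CPU-bound tasks can run in parallel on several cores. The biggest drawback is that processes do not share memory, so communication between them is restricted: data has to be passed between processes explicitly.
- Both multithreading and multiprocessing spend resources on creating threads or processes, and creating a process costs more than creating a thread.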
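
The difference between the two kinds of worker can be observed directly. The following is a minimal sketch that is not part of the original notes; `worker_pid` and `pids_seen` are hypothetical helper names. Each task reports the ID of the process it runs in: pool threads should report the parent's PID, while pool processes report PIDs of their own.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def worker_pid(_):
    """Return the ID of the process that executed this task."""
    return os.getpid()


def pids_seen(executor_cls, n_tasks=4):
    """Run n_tasks trivial tasks and collect the distinct process IDs."""
    with executor_cls(max_workers=2) as executor:
        return set(executor.map(worker_pid, range(n_tasks)))


if __name__ == "__main__":
    parent = os.getpid()
    # Threads live inside the parent process, so they all report its PID.
    print("threads run in the parent process:",
          pids_seen(ThreadPoolExecutor) == {parent})
    # Worker processes are separate processes with their own PIDs.
    print("processes run outside the parent:",
          parent not in pids_seen(ProcessPoolExecutor))
```

The `if __name__ == "__main__":` guard matters for the process pool: on platforms that start workers by launching a fresh interpreter, each worker re-imports the main module, and the guard keeps that import from starting pools of its own.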

## Executors

This module features the `Executor` class, an abstract class that cannot be used directly. It has two very useful concrete subclasses: `ThreadPoolExecutor` and `ProcessPoolExecutor`. As their names suggest, one uses multithreading and the other uses multiprocessing. In both cases we get a pool of threads or processes and submit tasks to it. The pool assigns tasks to the available workers (threads or processes) and schedules them to run.
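
Because both pools implement the same `Executor` interface, code can be written against the abstract class and handed either pool. A minimal sketch of that idea follows; `square` and `run_squares` are hypothetical names introduced for illustration.

```python
from concurrent.futures import Executor, ThreadPoolExecutor


def square(n):
    return n * n


def run_squares(executor: Executor, numbers):
    """Submit one task per number and collect results in submission order."""
    futures = [executor.submit(square, n) for n in numbers]
    return [future.result() for future in futures]


# The with statement shuts the pool down once the block exits.
with ThreadPoolExecutor(max_workers=2) as executor:
    results = run_squares(executor, [1, 2, 3])

print(results)  # [1, 4, 9]
```

`run_squares` only relies on `submit()`, so a `ProcessPoolExecutor` could be passed in the same way, provided the functions it runs are defined at module level so they can be pickled.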

### ThreadPoolExecutor

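Calling `submit()` schedules a function to run in the pool and immediately returns a [**Future**](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Future), an object that represents the pending result.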

In [5]:
from concurrent.futures import ThreadPoolExecutor
from time import sleep
 
def return_after_5_secs(message):
    sleep(5)
    return message
 
pool = ThreadPoolExecutor(3)
 
future = pool.submit(return_after_5_secs, "hello")
print(future.done())
sleep(5)
print(future.done())
print(future.result())

# result() returns the value returned by the call.
# If the call hasn't completed yet, it waits up to `timeout` seconds
# (or indefinitely when no timeout is given).

False
True
hello
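
Besides `done()` and `result()`, a `Future` also exposes the timeout and any exception raised by the call. The sketch below is an illustration with hypothetical helpers (`return_after`, `fail`) and shortened delays; it shows `result(timeout=...)` giving up early and `exception()` returning the error instead of raising it.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeoutError
from time import sleep


def return_after(delay, message):
    sleep(delay)
    return message


def fail():
    raise ValueError("boom")


with ThreadPoolExecutor(max_workers=2) as pool:
    slow = pool.submit(return_after, 0.5, "hello")
    broken = pool.submit(fail)

    try:
        slow.result(timeout=0.1)  # the task needs 0.5 s, so this gives up
        timed_out = False
    except FutureTimeoutError:
        timed_out = True

    value = slow.result()       # no timeout: block until the task finishes
    error = broken.exception()  # the exception raised by the call, or None

print(timed_out, value, repr(error))  # True hello ValueError('boom')
```

Calling `broken.result()` instead of `broken.exception()` would re-raise the `ValueError` in the calling thread.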


### ProcessPoolExecutor

In [6]:
from concurrent.futures import ProcessPoolExecutor
from time import sleep

def return_after_5_secs(message):
    sleep(5)
    return message

pool = ProcessPoolExecutor(3)

future = pool.submit(return_after_5_secs, "hello")
print(future.done())
sleep(5)
print(future.done())
print("Result: " + future.result())

False
False
Result: hello
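
In the run above the second `done()` printed `False`. A likely explanation (an inference, not something the output proves) is that the worker process needs some time to start before its own 5-second sleep begins, so the task was still running when the parent woke up. The sketch below avoids racing against the worker by blocking on `result()`; `return_after` is a hypothetical variant of the function above with a configurable delay.

```python
from concurrent.futures import ProcessPoolExecutor
from time import sleep


def return_after(delay, message):
    """Sleep for `delay` seconds, then hand the message back."""
    sleep(delay)
    return message


def main():
    with ProcessPoolExecutor(max_workers=3) as pool:
        future = pool.submit(return_after, 0.2, "hello")
        # result() blocks until the worker has finished, so there is no
        # need to guess how long process start-up takes.
        print("Result: " + future.result())
        print(future.done())


if __name__ == "__main__":
    main()
```

The task function is defined at module level because a process pool sends work to its workers by pickling, and functions are pickled by reference to their module-level name.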


### Executor.map()

Both executors share a common method, `map()`. Like the built-in function, it calls the provided function once for each item of an iterable (or once per group of items when several iterables are given), except that the calls run concurrently. With `ProcessPoolExecutor`, the iterable is split into chunks and each chunk is submitted to the pool as a separate task. The chunk size is controlled by the `chunksize` keyword argument, which defaults to 1 and has no effect on `ThreadPoolExecutor`.
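
A minimal sketch of `map()` follows, with hypothetical helpers `slow_square` and `square`. The thread-pool part shows that results come back in input order even when later items finish first, and that several iterables are consumed in parallel like the built-in `map`. The process-pool part passes `chunksize` so items are sent to the workers in batches.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from time import sleep


def slow_square(n):
    # Smaller inputs sleep longer, so later items finish first.
    sleep(0.05 * (3 - n))
    return n * n


def square(n):
    return n * n


with ThreadPoolExecutor(max_workers=3) as executor:
    # Results are returned in input order, not completion order.
    squares = list(executor.map(slow_square, [0, 1, 2]))
    # Several iterables are zipped: pow(2, 5), pow(3, 2), pow(10, 3).
    powers = list(executor.map(pow, [2, 3, 10], [5, 2, 3]))

print(squares)  # [0, 1, 4]
print(powers)   # [32, 9, 1000]

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2) as executor:
        # 1000 items reach the workers as 10 batches of 100
        # instead of 1000 single-item tasks.
        total = sum(executor.map(square, range(1000), chunksize=100))
    print(total)
```

Larger chunks reduce the per-item cost of passing data between processes, which tends to matter when each call is cheap and the iterable is long.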

In [9]:
import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and return its contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()


# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))

'http://www.foxnews.com/' page is 243520 bytes
'http://some-made-up-domain.com/' page is 475 bytes
'http://europe.wsj.com/' page is 954801 bytes
'http://www.bbc.co.uk/' page is 327624 bytes
'http://www.cnn.com/' page is 993570 bytes


In [10]:
import concurrent.futures
import math

PRIMES = [
    112272535095293,
    112582705942171,
    112272535095293,
    115280095190773,
    115797848077099,
    1099726899285419]

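def is_prime(n):
    # Handle the cases the odd-divisor loop below does not cover.
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False

    # Only odd divisors up to the square root need to be checked.
    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
    return True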

def main():
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for number, prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
            print('%d is prime: %s' % (number, prime))


if __name__ == '__main__':
    main()

112272535095293 is prime: True
112582705942171 is prime: True
112272535095293 is prime: True
115280095190773 is prime: True
115797848077099 is prime: True
1099726899285419 is prime: False


### as_completed() & wait()

The `concurrent.futures` module has two functions for dealing with the futures returned by the executors. One is `as_completed()` and the other one is `wait()`.

The `as_completed()` function takes an *iterable* of `Future` objects and yields each future as soon as it finishes. The main difference from the `map()` method described earlier is that `map()` returns results *in the order* of the input iterable: the first result from `map()` is the result for the first item. The first future yielded by `as_completed()`, by contrast, is whichever one completed first.

In [11]:
from concurrent.futures import ThreadPoolExecutor, as_completed
from time import sleep
from random import randint

def return_after_5_secs(num):
    sleep(randint(1, 5))
    return "Return of {}".format(num)

pool = ThreadPoolExecutor(5)
futures = []
for x in range(5):
    futures.append(pool.submit(return_after_5_secs, x))

for x in as_completed(futures):
    print(x.result())

Return of 0
Return of 4
Return of 1
Return of 2
Return of 3
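
The run above uses random delays, so its order changes from run to run. For a side-by-side comparison with `map()`, here is a sketch with fixed delays; `wait_then_return` and the delay values are hypothetical choices made so that the first task is the slowest and the last the fastest.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from time import sleep


def wait_then_return(index, delay):
    sleep(delay)
    return index


# Task 0 is the slowest and task 2 the fastest.
delays = [0.3, 0.2, 0.1]

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(wait_then_return, i, d) for i, d in enumerate(delays)]
    # as_completed() yields futures in the order they finish.
    completion_order = [future.result() for future in as_completed(futures)]
    # map() yields results in the order of the inputs.
    input_order = list(pool.map(wait_then_return, range(3), delays))

print(completion_order)  # [2, 1, 0]
print(input_order)       # [0, 1, 2]
```

With `as_completed()` the caller can start handling the fast results right away, at the cost of having to track which future belongs to which input.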


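The `wait()` function returns a **named tuple** containing two sets: `done`, the futures that **completed** (finished with a result or an exception, or were cancelled), and `not_done`, the ones that **didn't complete**. By default it returns only when all futures have finished; the `return_when` argument (`FIRST_COMPLETED`, `FIRST_EXCEPTION`, or `ALL_COMPLETED`) changes that.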

In [15]:
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED
from time import sleep
from random import randint

def return_after_5_secs(num):
    sleep(randint(1, 5))
    return "Return of {}".format(num)

pool = ThreadPoolExecutor(5)
futures = []
for x in range(5):
    futures.append(pool.submit(return_after_5_secs, x))

print(wait(futures, return_when=FIRST_COMPLETED))

DoneAndNotDoneFutures(done={<Future at 0x108d038d0 state=finished returned str>}, not_done={<Future at 0x10af48a20 state=running>, <Future at 0x10af485c0 state=running>, <Future at 0x10af48dd8 state=running>, <Future at 0x10af48278 state=running>})
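
To make the two sets easier to inspect, the sketch below uses one fast and one slow task with fixed, hypothetical delays (`return_after` is an illustrative helper). It calls `wait()` once with `FIRST_COMPLETED` and once with the default `ALL_COMPLETED`.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED
from time import sleep


def return_after(delay, label):
    sleep(delay)
    return label


with ThreadPoolExecutor(max_workers=2) as pool:
    fast = pool.submit(return_after, 0.05, "fast")
    slow = pool.submit(return_after, 0.5, "slow")

    # Returns as soon as any one future has finished.
    first = wait([fast, slow], return_when=FIRST_COMPLETED)
    first_done = {future.result() for future in first.done}
    still_pending = slow in first.not_done

    # Default return_when=ALL_COMPLETED: block until every future is done.
    final = wait([fast, slow])

print(first_done)                            # {'fast'}
print(still_pending)                         # True
print(len(final.done), len(final.not_done))  # 2 0
```

`wait()` also accepts a `timeout` in seconds; when it expires, whatever has not finished yet is simply reported in `not_done`.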
