## Intro

With threads, 99% of the use cases an application programmer is likely to run into follow the simple pattern of "spawning a bunch of independent threads and collecting the results in a queue". This chapter focuses on the `concurrent.futures.Executor` classes that encapsulate this pattern.

We will also learn about *futures*: objects representing the asynchronous execution of an operation. Futures are foundational to the `asyncio` package and are similar to JavaScript's promises.

## Concurrent Web Downloads

<span style="color:skyblue">***Concurrency is essential for efficient network I/O: instead of idly waiting for remote machines, the application should do something else until a response comes back***</span>.

We will study three scripts that download 20 flags from the web:
- `flags.py`: run downloads sequentially
- `flags_threadpool.py`: make concurrent downloads using `concurrent.futures`
- `flags_asyncio.py`: make concurrent downloads using `asyncio`

The sequential script takes the longest, while the other two perform similarly and finish much faster. The order in which the flags finish downloading also differs between the scripts.

### A sequential download script

In [1]:
import time
from pathlib import Path
from typing import Callable

import httpx

POP20_CC = ('CN IN US ID BR PK NG BD RU JP '
            'MX PH VN ET EG DE IR TR CD FR').split()  # country codes for the 20 most populous countries

BASE_URL = 'https://www.fluentpython.com/data/flags'  # The directory with the flag images
DEST_DIR = Path('downloaded')                         # Local directory where the images are saved.

def save_flag(img: bytes, filename: str) -> None:
    """
    Save the img bytes to filename in the DEST_DIR
    """
    (DEST_DIR / filename).write_bytes(img)

def get_flag(cc: str) -> bytes:
    """
    Given a country code, build the URL and download the image, 
    returning the binary contents of the response.
    """
    url = f'{BASE_URL}/{cc}/{cc}.gif'.lower()
    resp = httpx.get(url, timeout=6.1,  # adding time out to avoid blocking for too long
                    follow_redirects=True)  # HTTPX does not follow redirects by default
    resp.raise_for_status()  # raises an exception if the HTTP status is not in the 2XX range
    return resp.content

def download_many(cc_list: list[str]) -> int:
    """
    Key function to compare with the concurrent implementations
    """
    for cc in sorted(cc_list): # Loop over the list of country codes in alphabetical order
        image = get_flag(cc)
        save_flag(image, f'{cc}.gif')
        print(cc, end=' ', flush=True)  # Display the country code after the image is downloaded
    return len(cc_list)

def main(downloader: Callable[[list[str]], int]) -> None:
    """
    main must be called with the function that will make the downloads; 
    that way, we can use main as a library function with other 
    implementations of download_many in the threadpool and asyncio examples
    """
    DEST_DIR.mkdir(exist_ok=True)
    t0 = time.perf_counter()  # Start the timer; elapsed time is computed after the downloader returns
    count = downloader(POP20_CC)
    elapsed = time.perf_counter() - t0
    print(f'\n{count} downloads in {elapsed:.2f}s')

main(download_many)

BD BR CD CN DE EG ET FR ID IN IR JP MX NG PH PK RU TR US VN 
20 downloads in 15.12s


🗒️ Note: Crucially, `HTTPX` provides synchronous and asynchronous APIs, so we can use it in all HTTP client examples in this chapter and the next. Python's standard library provides the `urllib.request` module, but its API is synchronous only, and is not user friendly.

### Downloading with `concurrent.futures`

The main features of the `concurrent.futures` package are the `ThreadPoolExecutor` and `ProcessPoolExecutor` classes, which implement an API to submit callables for execution in different threads or processes, respectively. The classes transparently manage a pool of worker threads or processes, and queues to distribute jobs and collect results.

In [14]:
from concurrent import futures

# Use the `save_flag`, `get_flag`, `main` functions from the sequential code above

def download_one(cc: str):
    """
    Function to download a single image; this is what 
    each worker will execute.
    """
    image = get_flag(cc)
    save_flag(image, f'{cc}.gif')
    print(cc, end=' ', flush=True)
    return cc

def download_many_thread_pool(cc_list: list[str]) -> int:
    # Below, we instantiate the ThreadPoolExecutor as a context manager; 
    # the `executor.__exit__` method will call `executor.shutdown(wait=True)`, 
    # which will block until all threads are done.
    with futures.ThreadPoolExecutor() as executor:
        # The `map` method is similar to the built-in `map`,
        # except that the `download_one` function will be called 
        # concurrently from multiple threads; 
        # it returns a generator that you can iterate to retrieve 
        # the value returned by each function call—in this case,
        # each call to download_one will return a country code.
        res = executor.map(download_one, sorted(cc_list))

    # Finally, we return the number of results obtained.
    # If any of the threaded calls raises an exception, 
    # that exception is raised here when the implicit `next()` call 
    # inside the list constructor tries to retrieve the corresponding 
    # return value from the iterator returned by `executor.map`.
    return len(list(res))

main(download_many_thread_pool)

CD CN EG ID FR ET BD MX NG BR JP IN PK PH IR RU TR US VN DE 
20 downloads in 0.42s


🗒️ Note: The most important argument of `ThreadPoolExecutor` is `max_workers`, the maximum number of worker threads in the pool. Since Python 3.8, the default is `max_workers = min(32, os.cpu_count() + 4)`.
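The effect of `max_workers` can be observed with a small sketch. The `slow_square` and `run` helpers are hypothetical names invented for this illustration, and the sleep stands in for network latency; the default formula is the one quoted in the note above, and newer Python releases may count CPUs slightly differently.

```python
import os
import threading
import time
from concurrent import futures


def slow_square(n: int) -> tuple[int, str]:
    """Hypothetical task: pretend to wait on I/O, then report the worker thread."""
    time.sleep(0.05)
    return n * n, threading.current_thread().name


def run(max_workers: int) -> tuple[list[int], int]:
    """Return the squares and the number of distinct worker threads used."""
    with futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(slow_square, range(8)))
    squares = [square for square, _ in results]
    thread_names = {name for _, name in results}
    return squares, len(thread_names)


# The default described in the note; `or 1` guards against cpu_count() returning None.
documented_default = min(32, (os.cpu_count() or 1) + 4)
print(f'documented default max_workers: {documented_default}')

squares, n_threads = run(max_workers=2)
print(squares, f'computed by {n_threads} thread(s)')
```

With eight tasks and `max_workers=2`, the pool never uses more than two threads, no matter how many tasks are waiting.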

### Where Are the `Future`s?

Futures are core components of `concurrent.futures` and of `asyncio`, but as users of these libraries we sometimes don’t see them (they work behind the scenes).

Since Python 3.4, there are two classes named `Future` in the standard library: `concurrent.futures.Future` and `asyncio.Future`. They serve the same purpose:
<span style="color:skyblue">***an instance of either `Future` class represents a deferred computation that may or may not have completed. `Future`s encapsulate pending operations so that we can put them in queues, check whether they are done, and retrieve results (or exceptions) when they become available***</span>. This is similar to the `Deferred` class in Twisted, the `Future` class in Tornado, and `Promise` in modern JavaScript.

🗒️ Note: <span style="color:orange">*Programmers should not create or change the state of a `Future`: they are meant to be instantiated exclusively by the concurrency framework, be it `concurrent.futures` or `asyncio`*</span>. Here is why: a `Future` represents something that will eventually run, therefore it must be scheduled to run, and that’s the job of the framework. In particular, `concurrent.futures.Future` instances are created only as the result of submitting a callable for execution with a `concurrent.futures.Executor` subclass. For example, the `Executor.submit()` method takes a callable, schedules it to run, and returns a `Future`.
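As a minimal sketch of that life cycle, the snippet below submits two callables and inspects the futures that `submit` returns. The `slow_double` and `fail` functions are hypothetical stand-ins, not part of the flags examples.

```python
import time
from concurrent import futures


def slow_double(n: int) -> int:
    """Hypothetical task that takes a moment to finish."""
    time.sleep(0.1)
    return n * 2


def fail() -> None:
    """Hypothetical task that always raises."""
    raise ValueError('simulated failure')


with futures.ThreadPoolExecutor(max_workers=2) as executor:
    ok = executor.submit(slow_double, 21)  # returns a Future immediately
    bad = executor.submit(fail)
    print('done right after submit?', ok.done())

    result = ok.result()     # blocks until slow_double returns
    error = bad.exception()  # blocks, then returns the exception instead of raising it

print('result:', result)
print('error:', repr(error))

try:
    bad.result()  # result() re-raises the exception raised inside the worker
except ValueError as exc:
    print('re-raised:', exc)
```

Note that the code only reads the state of each future; creating the futures and setting their results is left to the executor.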

Let's look more closely at `Future` by replacing `executor.map` with `executor.submit` and `futures.as_completed` in the `download_many` function.

In [13]:
def download_many_futures(cc_list: list[str]) -> int:
    cc_list = cc_list[:5] # use only the top five most populous countries
    with futures.ThreadPoolExecutor(max_workers=3) as executor:  # Set max_workers to 3 
                                                                 # so we can see pending futures in the output
        to_do: list[futures.Future] = []
        for cc in sorted(cc_list):  # Iterate over country codes alphabetically, 
                                    # to make it clear that results will arrive out of order.
            future = executor.submit(download_one, cc)  # executor.submit schedules the 
                                                        # callable to be executed, and returns a future representing this pending operation.
            to_do.append(future)  # Store each future so we can later retrieve them with as_completed
            print(f'Scheduled for {cc}: {future}')  # Display a message with the country code and the respective future
        
        for count, future in enumerate(futures.as_completed(to_do), 1):  # as_completed yields futures as they are completed
            res: str = future.result()  # Get the result of this future
            print(f'{future} result: {res!r}') # Display the future and its result.
    
    return count

main(download_many_futures)

Scheduled for BR: <Future at 0x7110b7994f10 state=running>
Scheduled for CN: <Future at 0x7110b684b4d0 state=running>
Scheduled for ID: <Future at 0x7110b684a150 state=running>
Scheduled for IN: <Future at 0x7110b6dd2810 state=pending>
Scheduled for US: <Future at 0x7110b68788d0 state=pending>
BR CN <Future at 0x7110b7994f10 state=finished returned str> result: 'BR'
<Future at 0x7110b684b4d0 state=finished returned str> result: 'CN'
ID <Future at 0x7110b684a150 state=finished returned str> result: 'ID'
IN <Future at 0x7110b6dd2810 state=finished returned str> result: 'IN'
US <Future at 0x7110b68788d0 state=finished returned str> result: 'US'

5 downloads in 0.39s


From the output of `download_many_futures`, we can see that:
- The futures are scheduled in alphabetical order. The `repr()` of a future shows its state: the first three are running, because there are three worker threads.
- The last two futures are pending, waiting for worker threads.
- Results arrive as downloads finish, so running the function several times will give different output orders.
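A common companion idiom, sketched here under stated assumptions, is to keep a dict that maps each future back to the argument that produced it, so the caller can tell which job just finished. The `DELAYS` table and `fake_download` function are hypothetical and replace real network calls with fixed sleeps so that the completion order is predictable.

```python
import time
from concurrent import futures

# Hypothetical latencies in seconds, chosen so completion order differs from submit order.
DELAYS = {'BR': 0.45, 'CN': 0.05, 'ID': 0.25}


def fake_download(cc: str) -> str:
    """Stand-in for a download: sleep, then return a fake filename."""
    time.sleep(DELAYS[cc])
    return f'{cc.lower()}.gif'


def completion_order(codes: list[str]) -> list[str]:
    """Submit one job per code and list the codes in the order they finish."""
    finished = []
    with futures.ThreadPoolExecutor(max_workers=3) as executor:
        # Map each future to its country code so results can be labeled later.
        to_do = {executor.submit(fake_download, cc): cc for cc in sorted(codes)}
        for future in futures.as_completed(to_do):
            cc = to_do[future]
            print(f'{cc} -> {future.result()}')
            finished.append(cc)
    return finished


order = completion_order(list(DELAYS))
print('finished in this order:', order)
```

The jobs are submitted alphabetically, but `as_completed` hands them back in the order the simulated downloads finish.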

## Launching Processes with `concurrent.futures`

The `concurrent.futures` package enables parallel computation on multicore machines because it supports distributing work among multiple Python processes using the `ProcessPoolExecutor` class. Both `ProcessPoolExecutor` and `ThreadPoolExecutor` implement the `Executor` interface, so it’s easy to switch from a thread-based to a process-based solution using `concurrent.futures`.

In [16]:
def download_many_process_pool(cc_list: list[str]) -> int:
    # below we use the ProcessPoolExecutor instead of ThreadPoolExecutor
    with futures.ProcessPoolExecutor() as executor:
        res = executor.map(download_one, sorted(cc_list))

    return len(list(res))

main(download_many_process_pool)

ID BR FRDE  EG ETIN  BD CN JP IR CD MX NG PK US RU PH VN TR 
20 downloads in 0.44s
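Because both classes implement the same `Executor` interface, the choice of executor can be reduced to a parameter. The sketch below assumes a hypothetical CPU-bound `cube` function and a hypothetical `run_with` helper; neither comes from the flags examples.

```python
from concurrent import futures
from typing import Callable


def cube(n: int) -> int:
    """Hypothetical task; defined at module level so worker processes can import it."""
    return n ** 3


def run_with(executor_cls: Callable[..., futures.Executor],
             numbers: list[int]) -> list[int]:
    """Run cube over numbers using whichever executor class is passed in."""
    with executor_cls(max_workers=2) as executor:
        return list(executor.map(cube, numbers))


if __name__ == '__main__':
    print('threads:  ', run_with(futures.ThreadPoolExecutor, [1, 2, 3]))
    print('processes:', run_with(futures.ProcessPoolExecutor, [1, 2, 3]))
```

The `if __name__ == '__main__':` guard matters for the process-based call when this runs as a script: on platforms that start workers by spawning a fresh interpreter, the main module is imported again in each worker, and unguarded code would run there too.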


### Multicore Prime Checker Redux
In chapter 19, we wrote a program that checked the primality of some large numbers using `multiprocessing` (`procs.py`). Let's do the same thing using `ProcessPoolExecutor`.

In [18]:
from concurrent import futures # No need to import multiprocessing, SimpleQueue... 
                            # since `concurrent.futures` hides all that
from time import perf_counter
from typing import NamedTuple

import math


PRIME_FIXTURE = [
    (2, True),
    (142702110479723, True),
    (299593572317531, True),
    (3333333333333301, True),
    (3333333333333333, False),
    (3333335652092209, False),
    (4444444444444423, True),
    (4444444444444444, False),
    (4444444488888889, False),
    (5555553133149889, False),
    (5555555555555503, True),
    (5555555555555555, False),
    (6666666666666666, False),
    (6666666666666719, True),
    (6666667141414921, False),
    (7777777536340681, False),
    (7777777777777753, True),
    (7777777777777777, False),
    (9999999999999917, True),
    (9999999999999999, False),
]

NUMBERS = [n for n, _ in PRIME_FIXTURE]

def is_prime(n: int) -> bool:
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False

    root = math.isqrt(n)
    for i in range(3, root + 1, 2):
        if n % i == 0:
            return False
    return True

class PrimeResult(NamedTuple):
    n: int
    flag: bool
    elapsed: float

def check(n: int) -> PrimeResult:
    t0 = perf_counter()
    res = is_prime(n)
    return PrimeResult(n, res, perf_counter() - t0)

def main() -> None:
    executor = futures.ProcessPoolExecutor()
    actual_workers = executor._max_workers  # used to show the number of workers used

    print(f'Checking {len(NUMBERS)} numbers with {actual_workers} processes:')

    t0 = perf_counter()

    numbers = sorted(NUMBERS, reverse=True)  # Sort the numbers to be checked in descending order
    with executor:  # Use the executor as a context manager
        # below, the `executor.map` call returns the `PrimeResult` instances returned 
        # by `check` in the same order as the `numbers` arguments
        for n, prime, elapsed in executor.map(check, numbers):
            label = 'P' if prime else ' '
            print(f'{n:16}  {label} {elapsed:9.6f}s')

    time = perf_counter() - t0
    print(f'Total time: {time:.2f}s')

main()

Checking 20 numbers with 12 processes:
9999999999999999     0.000035s
9999999999999917  P  4.447117s
7777777777777777     0.000018s
7777777777777753  P  4.263446s
7777777536340681     3.885129s
6666667141414921     4.045834s
6666666666666719  P  3.815436s
6666666666666666     0.000006s
5555555555555555     0.000019s
5555555555555503  P  3.729342s
5555553133149889     3.923309s
4444444488888889     3.494794s
4444444444444444     0.000001s
4444444444444423  P  3.373819s
3333335652092209     3.317096s
3333333333333333     0.000003s
3333333333333301  P  3.020736s
 299593572317531  P  1.151811s
 142702110479723  P  0.707913s
               2  P  0.000002s
Total time: 4.50s


🗒️ Note: The output order of `procs.py`, which uses `multiprocessing` in chapter 19, is heavily influenced by how hard it is to check whether each number is prime. In contrast, the results from `ProcessPoolExecutor` above appear in strict descending order. The reason is that `executor.map(check, numbers)` returns the results in the same order as the `numbers` are given. While the process checking `9999999999999917` takes a long time, the other processes keep checking other numbers. When the worker in charge of `9999999999999917` finally determines that it is prime, all the other processes have completed their last jobs, so the remaining results appear immediately after.
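The same blocking behavior can be reproduced in a fraction of a second with threads and sleeps. This is a sketch only: `nap` and `timed_map` are hypothetical helpers, and the delays are arbitrary.

```python
import time
from concurrent import futures


def nap(seconds: float) -> float:
    """Hypothetical task: sleep for the given time and return it."""
    time.sleep(seconds)
    return seconds


def timed_map(delays: list[float]) -> list[tuple[float, float]]:
    """Return (delay, seconds elapsed when executor.map yielded it) pairs."""
    t0 = time.perf_counter()
    with futures.ThreadPoolExecutor(max_workers=len(delays)) as executor:
        return [(delay, time.perf_counter() - t0)
                for delay in executor.map(nap, delays)]


# The slowest task goes first, so it holds back the results that finished earlier.
for delay, seen_at in timed_map([0.3, 0.1, 0.2]):
    print(f'nap({delay}) yielded at {seen_at:.2f}s')
```

All three results are yielded only after the first, slowest call finishes, even though the other two completed sooner.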

## Experimenting with `Executor.map`

To learn more about how concurrent programs behave, let's study another example of the `map` method of `ThreadPoolExecutor`, with three workers running five callables that output timestamped messages.

In [24]:
from time import sleep, strftime
from concurrent import futures

def display(*args):
    """
    Simply prints all the arguments it gets, preceded by a timestamp
    """
    print(strftime('[%H:%M:%S]'), end=' ')
    print(*args)

def loiter(n):
    """
    display a message when it starts, sleep for n seconds,
    then display a message when it ends
    """
    msg = '{}loiter({}): doing nothing for {}s...'
    display(msg.format('\t'*n, n, n))
    sleep(n)
    msg = '{}loiter({}): done.'
    display(msg.format('\t'*n, n))
    return n * 10  # loiter returns n * 10 so we can see how to collect results

def main():
    display('Script starting.')
    executor = futures.ThreadPoolExecutor(max_workers=3)  # three threads running five callables
    results = executor.map(loiter, range(5))  # Submit five tasks to the executor. Since 
                                            # there are only three threads, only three
                                            # of those tasks will start immediately: 
                                            # the calls loiter(0), loiter(1), and loiter(2)
                                            # this is a nonblocking call.
    display('results:', results)  # Immediately display the results of 
                                # invoking `executor.map`: it’s a generator
    display('Waiting for individual results:')
    for i, result in enumerate(results):  # The enumerate call in the `for` loop will implicitly invoke `next(results)`, which
                                        # in turn will invoke `_f.result()` on the (internal) `_f` future representing the first
                                        # call, `loiter(0)`. The `result` method will block until the future is done, therefore
                                        # each iteration in this loop will have to wait for the next `result` to be ready.
        display(f'result {i}: {result}')

main()

[16:19:25] Script starting.
[16:19:25] loiter(0): doing nothing for 0s...
[16:19:25] loiter(0): done.
[16:19:25] 	loiter(1): doing nothing for 1s...
[16:19:25] 		loiter(2): doing nothing for 2s...
[16:19:25] 			loiter(3): doing nothing for 3s...
[16:19:25] results: <generator object Executor.map.<locals>.result_iterator at 0x7110b68f9340>
[16:19:25] Waiting for individual results:
[16:19:25] result 0: 0
[16:19:26] 	loiter(1): done.
[16:19:26] 				loiter(4): doing nothing for 4s...
[16:19:26] result 1: 10
[16:19:27] 		loiter(2): done.
[16:19:27] result 2: 20
[16:19:28] 			loiter(3): done.
[16:19:28] result 3: 30
[16:19:30] 				loiter(4): done.
[16:19:30] result 4: 40


The `Executor.map` function is easy to use, but often it’s preferable to get the results as they are ready, regardless of the order in which they were submitted. To do that, we need a combination of the `Executor.submit` method and the `futures.as_completed` function. This combination is more flexible than `executor.map` because:
- you can submit different callables and arguments, while `executor.map` is designed to run the same callable on different arguments.
- the set of futures you pass to `futures.as_completed` may come from more than one executor: perhaps some were created by a `ThreadPoolExecutor` instance, while others are from a `ProcessPoolExecutor`.
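Both points can be sketched in a few lines. To keep the example runnable anywhere, it uses two `ThreadPoolExecutor` instances rather than mixing in a `ProcessPoolExecutor`. The `shout`, `total`, and `gather` functions are hypothetical.

```python
import time
from concurrent import futures


def shout(text: str) -> str:
    """Hypothetical task operating on a string."""
    time.sleep(0.05)
    return text.upper()


def total(numbers: list[int]) -> int:
    """Hypothetical task operating on a list of numbers."""
    time.sleep(0.05)
    return sum(numbers)


def gather() -> list[object]:
    """Submit different callables to two executors and collect results as they finish."""
    with futures.ThreadPoolExecutor(max_workers=1) as pool_a, \
         futures.ThreadPoolExecutor(max_workers=1) as pool_b:
        to_do = [
            pool_a.submit(shout, 'flags'),    # one callable, one argument
            pool_b.submit(total, [1, 2, 3]),  # a different callable on another executor
            pool_a.submit(pow, 2, 10),        # a built-in with two arguments
        ]
        return [future.result() for future in futures.as_completed(to_do)]


results = gather()
print(results)
```

The order of `results` depends on which future finishes first, so the code that consumes it should not rely on position.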

## Downloads with Progress Display and Error Handling

Let's rewrite the flag download programs with error handling, structured to be easy to read and to contrast the three approaches: *sequential*, *threaded*, and *asynchronous*. The code is in `flags2/`.

- `flags2_common.py`: This module contains common functions and settings used by all flags2 examples, including a main function, which takes care of command-line parsing, timing, and reporting results.
- `flags2_sequential.py`: A sequential HTTP client with proper error handling and progress bar display. Its `download_one` function is also used by `flags2_threadpool.py`.
- `flags2_threadpool.py`: Concurrent HTTP client based on `futures.ThreadPoolExecutor` to demonstrate error handling and integration of the progress bar.
- `flags2_asyncio.py`: Same functionality as the previous example, but implemented with `asyncio` and `httpx`.
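The general shape of per-download error handling with `as_completed` can be sketched as follows. This is not the code in `flags2/`: the `Status` enum, the `KNOWN` set, and `fake_download_one` are hypothetical, and a plain progress line stands in for the progress bar.

```python
from collections import Counter
from concurrent import futures
from enum import Enum


class Status(Enum):
    OK = 'ok'
    NOT_FOUND = 'not found'
    ERROR = 'error'


KNOWN = {'BR', 'CN', 'IN'}  # hypothetical set of codes the fake server has flags for


def fake_download_one(cc: str) -> Status:
    """Stand-in for a download: report missing flags, raise on malformed codes."""
    if not cc.isalpha():
        raise ValueError(f'malformed country code: {cc!r}')
    return Status.OK if cc in KNOWN else Status.NOT_FOUND


def download_many(codes: list[str], max_workers: int = 3) -> Counter:
    """Run all downloads concurrently and tally one status per country code."""
    counter: Counter = Counter()
    with futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        to_do = {executor.submit(fake_download_one, cc): cc for cc in codes}
        for done, future in enumerate(futures.as_completed(to_do), 1):
            cc = to_do[future]
            try:
                status = future.result()  # re-raises any exception from the worker
            except ValueError as exc:
                status = Status.ERROR
                print(f'{cc!r} failed: {exc}')
            counter[status] += 1
            print(f'progress: {done}/{len(to_do)}')
    return counter


tally = download_many(['BR', 'CN', 'XX', '1?'])
print({status.value: n for status, n in tally.items()})
```

Handling the exception around `future.result()` keeps one failed download from aborting the whole batch, and the dict from future to country code lets the error message name the failing job.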

In [25]:
!python3 flags2/flags2_threadpool.py -h

usage: flags2_threadpool.py [-h] [-a] [-e] [-l N] [-m CONCURRENT] [-s LABEL]
                            [-v]
                            [CC ...]

Download flags for country codes. Default: top 20 countries by population.

positional arguments:
  CC                    country code or 1st letter (eg. B for BA...BZ)

options:
  -h, --help            show this help message and exit
  -a, --all             get all available flags (AD to ZW)
  -e, --every           get flags for every possible code (AA...ZZ)
  -l N, --limit N       limit to N first codes
  -m CONCURRENT, --max_req CONCURRENT
                        maximum concurrent requests (default=30)
  -s LABEL, --server LABEL
                        Server to hit; one of DELAY, ERROR, LOCAL, REMOTE
                        (default=LOCAL)
  -v, --verbose         output detailed progress info


In [33]:
!python3 flags2/flags2_sequential.py -s REMOTE

REMOTE site: https://www.fluentpython.com/data/flags
Searching for 20 flags: from BD to VN
1 connection will be used.
100%|███████████████████████████████████████████| 20/20 [00:09<00:00,  2.09it/s]
--------------------
 20 flags downloaded.
Elapsed time: 9.57s


In [39]:
!python flags2/flags2_threadpool.py -s REMOTE -m 5

REMOTE site: https://www.fluentpython.com/data/flags
Searching for 20 flags: from BD to VN
5 concurrent connections will be used.
100%|███████████████████████████████████████████| 20/20 [00:05<00:00,  3.85it/s]
--------------------
 20 flags downloaded.
Elapsed time: 5.23s


In [40]:
!python flags2/flags2_asyncio.py -s REMOTE -m 10

REMOTE site: https://www.fluentpython.com/data/flags
Searching for 20 flags: from BD to VN
10 concurrent connections will be used.
100%|███████████████████████████████████████████| 20/20 [00:02<00:00,  9.35it/s]
--------------------
 20 flags downloaded.
Elapsed time: 2.15s


## Summary

## Further Reading