# Launching parallel tasks with concurrent futures

https://docs.python.org/3/library/concurrent.futures.html

The Python standard library also houses a module called `concurrent.futures`. This module was added in Python 3.2 to **give developers a high-level interface for launching asynchronous tasks**. It’s a generalized abstraction layer on top of the `threading` and `multiprocessing` modules that provides an interface for running tasks concurrently using pools of threads or processes. It’s the perfect tool when you just want to run a piece of eligible code concurrently and **don’t need the finer-grained control that the threading and multiprocessing APIs expose**.

The `concurrent.futures` module provides a high-level interface for asynchronously executing callables.

The asynchronous execution can be performed with threads, using `ThreadPoolExecutor`, or separate processes, using `ProcessPoolExecutor`. Both implement the same interface, which is defined by the abstract Executor class.

Internally, **these two classes interact with the pools and manage the workers**. Futures are used for managing results computed by the workers. To use a pool of workers:
1. An application creates an instance of the appropriate executor class and then submits tasks for it to run.
2. When each task is submitted, a `Future` instance is returned.
    - When the result of the task is needed, the application can use the `Future` object to block until the result is available.
   
Various APIs are provided to make it convenient to wait for tasks to complete, so that the Future objects do not need to be managed directly.

## Executors

This module features the `Executor` class, which is abstract and cannot be used directly. However, it has two very useful concrete subclasses: `ThreadPoolExecutor` and `ProcessPoolExecutor`. As their names suggest, one uses multithreading and the other uses multiprocessing. In both cases, we get a pool of threads or processes and we can submit tasks to this pool. The pool assigns tasks to the available resources (threads or processes) and schedules them to run.

Since both ThreadPoolExecutor and ProcessPoolExecutor share the same interface, I’ll primarily talk about the two methods they provide. Their descriptions are taken verbatim from the official docs.

### submit(fn, *args, **kwargs)

> `submit(fn, /, *args, **kwargs)`: Schedules the callable, fn, to be executed as fn(*args, **kwargs) and returns a Future object representing the execution of the callable.

In [8]:
# 01_example.py
from concurrent.futures import ThreadPoolExecutor
from time import sleep

def return_after_5_secs(message):
    sleep(5)
    return message

pool = ThreadPoolExecutor(3)

future1 = pool.submit(return_after_5_secs, "hello1")
future2 = pool.submit(return_after_5_secs, "hello2")

print(future1.done())
sleep(6)
print(future1.done())

print(f"Result 1: {future1.result()}")
print(f"Result 2: {future2.result()}")
pool.shutdown()

False
True
Result 1: hello1
Result 2: hello2


I hope the code is pretty self-explanatory. We first construct a ThreadPoolExecutor with the number of threads we want in the pool. If `max_workers` is omitted, the default depends on the Python version and the machine (in Python 3.8 it became `min(32, os.cpu_count() + 4)`), but we chose to use 3 just because we can ;-). Then we submitted two tasks to the thread pool executor; each waits 5 seconds before returning the message it gets as its first argument.

**When we `submit()` a task, we get back a `Future`**

As we can see in the docs, the `Future` object has a method, `done()`, which tells us if the future has resolved, that is, whether a result or an exception has been set for that particular future object. When a task finishes (returns a value or is interrupted by an exception), the thread pool executor sets the outcome on the future object.

In our example, the tasks don’t complete until 5 seconds have passed, so the first call to `done()` returns `False`. We take a short nap for 6 seconds, and by then the first task is done. We can get the result of a future by calling the `result()` method on it.

> `done()`: Return True if the call was successfully cancelled or finished running.

> `result(timeout=None)`: Return the value returned by the call. If the call hasn’t yet completed then this method will wait up to timeout seconds. If the call hasn’t completed in timeout seconds, then a `concurrent.futures.TimeoutError` will be raised. timeout can be an int or float. If timeout is not specified or None, there is no limit to the wait time. If the future is cancelled before completing then CancelledError will be raised. If the call raised an exception, this method will raise the same exception.

> `shutdown(wait=True, *, cancel_futures=False)`: Signal the executor that it should free any resources that it is using when the currently pending futures are done executing. Calls to Executor.submit() and Executor.map() made after shutdown will raise RuntimeError.
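
The three methods quoted above can be exercised together in a small sketch. The helper names `slow_echo` and `fail` are made up for this illustration, and the sleep durations are arbitrary; the only library calls used are `submit()`, `result()` and `shutdown()`.

```python
import time
from concurrent import futures
from concurrent.futures import ThreadPoolExecutor


def slow_echo(message, delay):
    """Hypothetical task: wait `delay` seconds, then return the message."""
    time.sleep(delay)
    return message


def fail():
    """Hypothetical task that always raises."""
    raise ValueError("task failed")


pool = ThreadPoolExecutor(max_workers=2)
slow = pool.submit(slow_echo, "hello", 0.5)
broken = pool.submit(fail)

# result(timeout=...) gives up waiting if the task is still running.
try:
    slow.result(timeout=0.1)
    timed_out = False
except futures.TimeoutError:
    timed_out = True

# An exception raised inside the task is re-raised by result().
try:
    broken.result()
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Without a timeout, result() blocks until the value is available.
value = slow.result()

# shutdown() waits for pending work; submitting afterwards is an error.
pool.shutdown(wait=True)
try:
    pool.submit(slow_echo, "too late", 0)
    rejected = False
except RuntimeError:
    rejected = True

print(timed_out, error_message, value, rejected)
```

Note that a timed-out `result()` call does not cancel the task: the same future can be asked again later, as the second call to `slow.result()` shows.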

### map(func, *iterables, timeout=None, chunksize=1)

Similar to map(func, *iterables) except:
- the iterables are collected immediately rather than lazily;
- func is executed asynchronously and several calls to func may be made concurrently.

The returned iterator raises a `concurrent.futures.TimeoutError` if `__next__()` is called and the result isn’t available after timeout seconds from the original call to `Executor.map()`. Timeout can be an int or a float. If timeout is not specified or None, there is no limit to the wait time.

If a func call raises an exception, then that exception will be raised when its value is retrieved from the iterator.

> When using ProcessPoolExecutor, this method chops iterables into a number of chunks which it submits to the pool as separate tasks. The (approximate) size of these chunks can be specified by setting chunksize to a positive integer. For **very long iterables, using a large value for chunksize can significantly improve performance** compared to the default size of 1. **With `ThreadPoolExecutor`, chunksize has no effect**.

In [5]:
# 02_example.py
from concurrent.futures import ThreadPoolExecutor
from time import sleep

def return_after_5_secs(message):
    sleep(5)
    return message

with ThreadPoolExecutor(max_workers=3) as executor:
    results = executor.map(return_after_5_secs, ("hello1", "hello2"))
    for result in results:
        print(result)
    


hello1
hello2


Both executors have a common method, `map()`. Like the built-in function, the **map method allows multiple calls to a provided function, passing each of the items in an iterable to that function**. Except, in this case, the functions are called concurrently. For multiprocessing, this iterable is broken into chunks and each chunk is submitted to the pool as a separate task. We can control the chunk size with the keyword argument `chunksize`. By default the chunk size is 1.
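
To picture what the chunk size changes, here is a rough illustration of grouping an iterable into chunks. `split_into_chunks` is a hypothetical helper written for this note; it is not part of `concurrent.futures`, and the library's internal chunking may differ in detail. The point is only that 10 items with a chunk size of 4 become 3 tasks instead of 10, so fewer messages have to travel between processes.

```python
from itertools import islice


def split_into_chunks(iterable, chunksize):
    """Yield successive tuples holding at most `chunksize` items each."""
    iterator = iter(iterable)
    while True:
        chunk = tuple(islice(iterator, chunksize))
        if not chunk:
            return
        yield chunk


chunks = list(split_into_chunks(range(10), 4))
print(chunks)  # [(0, 1, 2, 3), (4, 5, 6, 7), (8, 9)]
```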

## Generic Workflows for Running Tasks Concurrently

A lot of scripts contain some variant of the following:

In [None]:
for task in get_tasks():
    perform(task)

Here, `get_tasks` returns an iterable that contains the target tasks or arguments to which a particular task function needs to be applied. **Tasks are usually blocking callables and they run one after another**, with only one task running at a time. The logic is simple to reason about because of its sequential execution flow. This is fine when the number of tasks is small or the individual tasks are quick and simple. However, this can quickly get out of hand when the number of tasks is huge or the individual tasks are time consuming.

A general rule of thumb:
- Use `ThreadPoolExecutor` when the tasks are primarily I/O bound, such as sending HTTP requests to many URLs or saving a large number of files to disk.
- Use `ProcessPoolExecutor` when the tasks are primarily CPU bound, such as running computation-heavy callables, applying pre-processing methods over a large number of images, or manipulating many text files at once.
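
As a rough sketch of the difference for I/O-bound work, the following compares the sequential loop with a thread pool. The `perform` function here only sleeps to imitate a blocking call, and the task count and delay are assumptions chosen to keep the run short.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def get_tasks():
    return list(range(8))


def perform(task):
    """Stand-in for a blocking I/O call (assumed for illustration)."""
    time.sleep(0.1)
    return task * 2


def run_sequentially():
    return [perform(task) for task in get_tasks()]


def run_with_threads():
    with ThreadPoolExecutor(max_workers=8) as executor:
        return list(executor.map(perform, get_tasks()))


start = time.perf_counter()
sequential_results = run_sequentially()
sequential_time = time.perf_counter() - start

start = time.perf_counter()
threaded_results = run_with_threads()
threaded_time = time.perf_counter() - start

print(f"sequential: {sequential_time:.2f}s, threaded: {threaded_time:.2f}s")
```

Both versions return the same list; only the waiting overlaps in the threaded one. For CPU-bound work the threads would mostly take turns instead, which is why the rule of thumb points to processes there.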

### Running Tasks with Executor.submit

When you have a number of tasks, you can schedule them in one go and wait for them all to complete and then you can collect the results.

In [15]:
# 03_example.py
import time
from concurrent import futures
from concurrent.futures import ThreadPoolExecutor

def get_tasks():
    return list(range(20))
    

def perform(number):
    time.sleep(1)
    return number * number
    
    

with ThreadPoolExecutor(max_workers=10) as executor:
    futures_set = {executor.submit(perform, task) for task in get_tasks()}

    for fut in futures.as_completed(futures_set):
        print(f"The power is {fut.result()}")
        # as we can see, the results arrive in no particular order

The power is 0
The power is 1
The power is 4
The power is 9
The power is 25
The power is 36
The power is 49
The power is 16
The power is 64
The power is 81
The power is 100
The power is 121
The power is 144
The power is 169
The power is 196
The power is 289
The power is 256
The power is 225
The power is 324
The power is 361


Here you start by creating an Executor, which manages all the tasks that are running – either in separate processes or threads. **Using the with statement creates a context manager, which ensures any stray threads or processes get cleaned up via calling** the `executor.shutdown()` method implicitly when you’re done.

Then a **set comprehension is used to start all the tasks**. The `executor.submit()` method schedules each task. **This creates a Future object, which represents the task to be done.** Once all the tasks have been scheduled, the function `concurrent.futures.as_completed()` is called, which **yields the futures as they’re done**, that is, as each task completes. The `fut.result()` method **gives you the return value of perform(task)**, or raises the exception in case of failure.

The `executor.submit()` method **schedules the tasks asynchronously and doesn’t keep any context about the original tasks**. So if you want to map the results back to the original tasks, you need to track those yourself.

In [16]:
# 04_example.py
import time
from concurrent import futures
from concurrent.futures import ThreadPoolExecutor


def get_tasks():
    return list(range(20))
    

def perform(number):
    time.sleep(1)
    return number * number
    
with ThreadPoolExecutor(max_workers=10) as executor:
    futures_set = {executor.submit(perform, task): task for task in get_tasks()}

    for fut in futures.as_completed(futures_set):
        original_task = futures_set[fut]
        print(f"The power of {original_task} is {fut.result()}")

The power of 0 is 0
The power of 1 is 1
The power of 2 is 4
The power of 3 is 9
The power of 4 is 16
The power of 5 is 25
The power of 6 is 36
The power of 7 is 49
The power of 8 is 64
The power of 9 is 81
The power of 10 is 100
The power of 11 is 121
The power of 13 is 169
The power of 12 is 144
The power of 14 is 196
The power of 15 is 225
The power of 16 is 256
The power of 17 is 289
The power of 18 is 324
The power of 19 is 361


Notice the variable `futures_set`, where the futures are mapped to their original tasks using a dictionary.

### Running Tasks with Executor.map

Another way to collect the results, this time in the same order they’re scheduled, is the `executor.map()` method.

In [17]:
# 05_example.py
from concurrent.futures import ThreadPoolExecutor
import time

def get_tasks():
    return list(range(20))
    

def perform(number):
    time.sleep(1)
    return number * number

with ThreadPoolExecutor(max_workers=10) as executor:
    for arg, res in zip(get_tasks(), executor.map(perform, get_tasks())):
        print(f"The power of {arg} is {res}")

The power of 0 is 0
The power of 1 is 1
The power of 2 is 4
The power of 3 is 9
The power of 4 is 16
The power of 5 is 25
The power of 6 is 36
The power of 7 is 49
The power of 8 is 64
The power of 9 is 81
The power of 10 is 100
The power of 11 is 121
The power of 12 is 144
The power of 13 is 169
The power of 14 is 196
The power of 15 is 225
The power of 16 is 256
The power of 17 is 289
The power of 18 is 324
The power of 19 is 361


Notice how the map function takes the entire iterable at once. It **collects the input immediately rather than lazily and yields the results in the same order they’re scheduled**, each one as soon as it is available. If a call raises an unhandled exception, it is re-raised when that result is retrieved from the iterator, and the iteration won’t go any further.
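
A small sketch can make the ordering concrete: the tasks below finish in a different order than they were submitted, yet `map()` still hands the results back in submission order. The function name and the delays are assumptions chosen for the illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

completion_order = []


def sleep_then_return(delay):
    time.sleep(delay)
    completion_order.append(delay)  # record when this task actually finished
    return delay


delays = [0.3, 0.1, 0.2]
with ThreadPoolExecutor(max_workers=3) as executor:
    ordered = list(executor.map(sleep_then_return, delays))

print(ordered)           # [0.3, 0.1, 0.2], the submission order
print(completion_order)  # [0.1, 0.2, 0.3], the order in which tasks finished
```

The first result cannot be yielded until the slowest task (0.3 s) has finished, even though the other two are ready earlier. When that matters, `as_completed()` is the better fit.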

In Python 3.5+, `executor.map()` receives an optional argument: `chunksize`. While using `ProcessPoolExecutor`, for very long iterables, using a large value for chunksize can significantly improve performance compared to the default size of 1. With `ThreadPoolExecutor`, chunksize has no effect.

### wait()

The `wait()` function returns a named tuple which contains two sets: one contains the futures which completed (either got a result or raised an exception) and the other contains the ones which didn’t complete.

> `concurrent.futures.wait(fs, timeout=None, return_when=ALL_COMPLETED)`: Wait for the Future instances (possibly created by different Executor instances) given by fs to complete. Duplicate futures given to fs are removed and will be returned only once. Returns a named 2-tuple of sets. The first set, named done, contains the futures that completed (finished or cancelled futures) before the wait completed. The second set, named not_done, contains the futures that did not complete (pending or running futures).



In [30]:
from concurrent.futures import ThreadPoolExecutor, wait
from time import sleep
from random import randint

def return_after_5_secs(num):
    sleep(randint(1, 5))
    if num == 3:
        raise ValueError
    return f"Return of {num}"

pool = ThreadPoolExecutor(5)
futures = []

for x in range(5):
    futures.append(pool.submit(return_after_5_secs, x))

done_futures = wait(futures)

In [44]:
print(done_futures.done)

{<Future at 0x7fc1f63a6a00 state=finished returned str>, <Future at 0x7fc1f63a6c70 state=finished returned str>, <Future at 0x7fc1f6fc1e80 state=finished returned str>, <Future at 0x7fc1f6550550 state=finished returned str>, <Future at 0x7fc1f63a23a0 state=finished raised ValueError>}


In [50]:
for df in done_futures.done:
    try:
        print(df.result())
    except ValueError as e:
        print("Error")

Return of 4
Return of 1
Return of 2
Return of 0
Error


We can control the behavior of the `wait` function by defining when it should return. We can pass one of these values to the `return_when` param of the function: `FIRST_COMPLETED`, `FIRST_EXCEPTION` and `ALL_COMPLETED`. By default, it’s set to `ALL_COMPLETED`, so the wait function returns only when all futures complete. But using that parameter, we can choose to return when the first future completes or when the first exception is raised.
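
Here is a minimal sketch of `return_when=FIRST_COMPLETED`. To keep the outcome predictable, one task is held back with a `threading.Event`; the task names and the event are assumptions made for this example.

```python
import threading
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

release = threading.Event()


def fast():
    return "fast"


def blocked():
    # Waits until the main thread sets the event.
    release.wait()
    return "blocked"


with ThreadPoolExecutor(max_workers=2) as executor:
    fast_future = executor.submit(fast)
    blocked_future = executor.submit(blocked)

    # Returns as soon as at least one future has finished.
    done, not_done = wait([fast_future, blocked_future], return_when=FIRST_COMPLETED)
    first_results = [f.result() for f in done]
    still_pending = len(not_done)

    # Let the second task finish so the pool can shut down.
    release.set()
    wait([blocked_future])

print(first_results, still_pending)  # ['fast'] 1
```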

## Download & Save Files from URLs with Multi-threading

Before proceeding with the examples, let’s write a small decorator that’ll be helpful to measure and compare the execution time between concurrent and sequential code.

In [18]:
# 06_example.py
import time
from functools import wraps


def timeit(method):
    @wraps(method)
    def wrapper(*args, **kwargs):
        start_time = time.time()
        result = method(*args, **kwargs)
        end_time = time.time()
        print(f"{method.__name__} => {(end_time-start_time)*1000} ms")

        return result

    return wrapper

In [22]:
@timeit
def func(n):
    return list(range(n))

a = func(5)

func => 0.03600120544433594 ms


This will print out the name of the method and how long it took to execute it.

First, let’s download some pdf files from a bunch of URLs and save them to the disk. This is presumably an I/O bound task and we’ll be using the ThreadPoolExecutor class to carry out the operation.

In [None]:
# 06_example.py
from pathlib import Path
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def download_one(url):
    """
    Downloads the specified URL and saves it to disk
    """

    req = urllib.request.urlopen(url)
    fullpath = Path(url)
    fname = fullpath.name
    fname_path = Path(__file__).parent.joinpath(fname)
    ext = fullpath.suffix

    if not ext:
        raise RuntimeError("URL does not contain an extension")

    with open(fname_path, "wb") as handle:
        while True:
            chunk = req.read(1024)
            if not chunk:
                break
            handle.write(chunk)

    msg = f"Finished downloading {fname}"
    return msg


@timeit
def download_all(urls):
    """
    Create a thread pool and download specified urls
    """
    with ThreadPoolExecutor(max_workers=13) as executor:
        return executor.map(download_one, urls, timeout=60)



if __name__ == "__main__":
    urls = (
        "http://www.irs.gov/pub/irs-pdf/f1040.pdf",
        "http://www.irs.gov/pub/irs-pdf/f1040a.pdf",
        "http://www.irs.gov/pub/irs-pdf/f1040ez.pdf",
        "http://www.irs.gov/pub/irs-pdf/f1040es.pdf",
        "http://www.irs.gov/pub/irs-pdf/f1040sb.pdf",
    )

    results = download_all(urls)
    for result in results:
        print(result)

In the above code snippet, I have primarily defined two functions. The `download_one` function downloads a PDF file from a given URL and saves it to the disk. It checks whether the file name in the URL has an extension and, in the absence of one, raises `RuntimeError`. If an extension is found, it downloads the file chunk by chunk and saves it to the disk.

Notice that in this concurrent version, the `download_all` function wraps the `executor.map()` call in a `ThreadPoolExecutor` context manager. The `download_one` function is passed into the map along with the iterable containing the URLs. The `timeout` parameter limits how long the returned iterator will wait for a result, counted from the original call to `executor.map()`; once that deadline passes, retrieving a result raises a timeout error. The `max_workers` parameter sets the maximum number of threads in the pool. A general rule of thumb is using `2 * multiprocessing.cpu_count() + 1`. My machine has 6 physical cores with 12 threads. So 13 is the value I chose.
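
As the documentation quoted earlier puts it, the `timeout` of `map()` is measured from the original call to `Executor.map()`. The following sketch shows what happens when the deadline is shorter than the tasks; `slow_task` and the durations are assumptions made for the illustration, with no network involved.

```python
import time
from concurrent import futures
from concurrent.futures import ThreadPoolExecutor


def slow_task(n):
    time.sleep(0.5)
    return n


with ThreadPoolExecutor(max_workers=2) as executor:
    results = executor.map(slow_task, [1, 2], timeout=0.1)
    try:
        collected = list(results)
    except futures.TimeoutError:
        # The deadline is 0.1 s after the map() call, not 0.1 s per task.
        collected = None

print(collected)  # None
```

The tasks that are already running are not interrupted by the timeout; leaving the `with` block still waits for them to finish.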

> Note: You can also try running the above functions with ProcessPoolExecutor via the same interface and notice that the threaded version performs slightly better due to the nature of the task.

There is one small problem with the example above. The `executor.map()` method **returns a generator which lets you iterate through the results as they become ready**. That means **if any error occurs inside map, it’s not possible to handle it and resume the generator** after the exception occurs. From PEP 255:

> If an unhandled exception– including, but not limited to, StopIteration –is raised by, or passes through, a generator function, then the exception is passed on to the caller in the usual way, and subsequent attempts to resume the generator function raise StopIteration. In other words, **an unhandled exception terminates a generator’s useful life**.

> First, we demonstrate a broken URL without error handling.
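
The same behaviour can be reproduced without any network access. In this sketch, `divide_ten_by` is a made-up task that fails for one of its inputs; once the exception has come out of the iterator, the remaining result can no longer be retrieved from it.

```python
from concurrent.futures import ThreadPoolExecutor


def divide_ten_by(x):
    return 10 // x


with ThreadPoolExecutor(max_workers=3) as executor:
    results = executor.map(divide_ten_by, [1, 0, 2])

    first = next(results)  # 10
    try:
        next(results)
        error = None
    except ZeroDivisionError as exc:
        error = type(exc).__name__

    # The generator is finished now, so the result for 2 is unreachable.
    remaining = list(results)

print(first, error, remaining)  # 10 ZeroDivisionError []
```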

To get around that, you can use the `executor.submit()` method to create futures, accumulate the futures in a list, iterate through them and handle the exceptions manually. See the following example:

In [None]:
# 07_example.py
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from functools import wraps
from pathlib import Path


def timeit(method):
    @wraps(method)
    def wrapper(*args, **kwargs):
        start_time = time.time()
        result = method(*args, **kwargs)
        end_time = time.time()
        print(f"{method.__name__} => {(end_time-start_time)*1000} ms")

        return result

    return wrapper


def download_one(url):
    """
    Downloads the specified URL and saves it to disk
    """

    req = urllib.request.urlopen(url)
    fullpath = Path(url)
    fname = fullpath.name
    fname_path = Path(__file__).parent.joinpath(fname)
    ext = fullpath.suffix

    if not ext:
        raise RuntimeError("URL does not contain an extension")

    with open(fname_path, "wb") as handle:
        while True:
            chunk = req.read(1024)
            if not chunk:
                break
            handle.write(chunk)

    msg = f"Finished downloading {fname}"
    return msg


@timeit
def download_all(urls):
    """
    Create a thread pool and download specified urls
    """

    futures_list = []
    results = []

    with ThreadPoolExecutor(max_workers=13) as executor:
        for url in urls:
            future = executor.submit(download_one, url)
            futures_list.append(future)

        for future in futures_list:
            try:
                result = future.result(timeout=60)
                results.append(result)
            except Exception as exc:
                results.append(None)
                print(exc)
    return results


if __name__ == "__main__":
    urls = (
        "http://www.irs.gov/pub/irs-pdf/f1040.pdf",
        "http://www.irs.gov/pub/irs-pdf/f1040",
        "http://www.irs.gov/pub/irs-pdf/f1040ez.pdf",
        "http://www.irs.gov/pub/irs-pdf/f1040es.pdf",
        "http://www.irs.gov/pub/irs-pdf/f1040sb.pdf",
    )

    results = download_all(urls)
    for result in results:
        print(result)

## Running Multiple CPU Bound Subroutines with Multi-processing

The following example shows a CPU bound hashing function. The primary function will sequentially run a compute intensive hash algorithm multiple times. Then another function will again run the primary function multiple times. 

In [28]:
import hashlib
from concurrent.futures import ProcessPoolExecutor
from functools import wraps
import time

def timeit(method):
    @wraps(method)
    def wrapper(*args, **kwargs):
        start_time = time.time()
        result = method(*args, **kwargs)
        end_time = time.time()
        print(f"{method.__name__} => {(end_time-start_time)*1000} ms")

        return result

    return wrapper

def hash_one(n):
    """A somewhat CPU-intensive task."""

    for i in range(1, n):
        hashlib.pbkdf2_hmac("sha256", b"password", b"salt", i * 10000)

    return f"done {n}"


@timeit
def hash_all(n):
    
    with ProcessPoolExecutor(max_workers=10) as executor:
        for arg, res in zip(range(n), executor.map(hash_one, range(n), chunksize=2)):
            print(arg, res)

    return "done"


if __name__ == "__main__":
    hash_all(10)

0 done 0
1 done 1
2 done 2
3 done 3
4 done 4
5 done 5
6 done 6
7 done 7
8 done 8
9 done 9
hash_all => 1151.2749195098877 ms


If you analyze the hash_one and hash_all functions, you can see that together, they are actually running two compute intensive nested for loops. 

If you look closely, even in the concurrent version, the for loop in hash_one function is running sequentially. However, the other for loop in the hash_all function is being executed through multiple processes. Here, I have used 10 workers and a chunksize of 2. The number of workers and chunksize were adjusted to achieve maximum performance.

However, **we cannot use any object that is not picklable**. So we need to carefully choose what we use/return inside the callable passed to the process pool executor.

> Things that are usually not picklable are, for example, sockets, file(handler)s, database connections, and so on. Everything that's built up (recursively) from basic Python types (dicts, lists, primitives, objects, object references, even circular) can be pickled by default.
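
A quick way to check an object before handing it to a process pool is to try pickling it yourself. `is_picklable` below is a hypothetical helper written for this note, not something the standard library provides.

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if `obj` survives pickle.dumps(), False otherwise."""
    try:
        pickle.dumps(obj)
    except Exception:
        # Depending on the object, pickle raises PicklingError,
        # TypeError or AttributeError; all of them mean "no".
        return False
    return True


print(is_picklable({"numbers": [1, 2, 3]}))  # True
print(is_picklable(threading.Lock()))        # False
print(is_picklable(lambda x: x * 2))         # False
```

The lambda case is worth remembering: functions are pickled by reference to their name, so the callable passed to a process pool should be a regular function defined at module level.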

## Deadlock

If your task at hand requires queuing or spawning multiple threads from multiple processes, then you will still need to resort to the lower-level threading and multiprocessing modules.

Another pitfall of using concurrency is **deadlock situations** that might occur while using ThreadPoolExecutor. When a callable associated with a Future waits on the results of another Future, they might never release their control of the threads and cause deadlock. Let’s see a slightly modified example from the official docs.

In [None]:
import time
from concurrent.futures import ThreadPoolExecutor


def wait_on_b():
    time.sleep(5)
    print(b.result())  # b will never complete because it is waiting on a.
    return 5


def wait_on_a():
    time.sleep(5)
    print(a.result())  # a will never complete because it is waiting on b.
    return 6


with ThreadPoolExecutor(max_workers=2) as executor:
    # here, the future from a depends on the future from b
    # and vice versa
    # so this is never going to be completed
    a = executor.submit(wait_on_b)
    b = executor.submit(wait_on_a)

    print("Result from wait_on_b", a.result())
    print("Result from wait_on_a", b.result())

In the above example, function wait_on_b depends on the result (result of the Future object) of function wait_on_a and at the same time the latter function’s result depends on that of the former. So the code block in the context manager will never finish because of these interdependencies. This creates the deadlock situation.
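
One hedge against this kind of hang is to pass a timeout to `result()` whenever a task waits on another future, so that a stuck wait surfaces as an exception instead of blocking forever. The sketch below uses a simpler variant of the problem, a pool with a single worker whose task waits on a second task that can never be scheduled. The function names and the 0.2 second timeout are assumptions made for the example.

```python
from concurrent import futures
from concurrent.futures import ThreadPoolExecutor


def inner():
    return "inner done"


def outer(executor):
    # The only worker thread is busy running outer(), so inner() stays queued.
    inner_future = executor.submit(inner)
    try:
        return inner_future.result(timeout=0.2)
    except futures.TimeoutError:
        return "timed out waiting for inner"


with ThreadPoolExecutor(max_workers=1) as executor:
    outer_future = executor.submit(outer, executor)
    outcome = outer_future.result()

print(outcome)  # timed out waiting for inner
```

A timeout only makes the problem visible. The more robust fix is to avoid having tasks wait on futures from the same pool at all, for example by collecting results in the caller.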

## Combining Asyncio with Multiprocessing

What if I need to combine many I/O operations with heavy calculations?

We can do that too. Say you need to scrape 100 web pages for a specific piece of information, and then you need to save that piece of info in a file for later. We can separate the compute power across each of our computer's cores by making each process scrape a fraction of the pages.

For this script, let's install Beautiful Soup to help us easily scrape our pages: pip install beautifulsoup4. The script also uses the third-party packages aiohttp and aiofiles, so install those as well. This time we actually have quite a few imports. Here they are:

In [None]:
# asyncio_mulitproc.py
import asyncio
import concurrent.futures
import time
from math import floor
from multiprocessing import cpu_count
from pathlib import Path

import aiofiles
import aiohttp
from bs4 import BeautifulSoup

First, we're going to create an async function that makes a request to Wikipedia to get back random pages. We'll scrape each page we get back for its title using BeautifulSoup, and then we'll append it to a given file; we'll separate each title with a tab. The function will take two arguments:
- num_pages - Number of pages to request and scrape for titles
- output_file - The file to append our titles to

In [None]:
async def get_and_scrape_pages(num_pages: int, output_file: str):
    """
    Makes {{ num_pages }} requests to Wikipedia to receive {{ num_pages }} random
    articles, then scrapes each page for its title and appends it to {{ output_file }},
    separating each title with a tab: "\\t"
    #### Arguments
    ---
    num_pages: int -
        Number of random Wikipedia pages to request and scrape
    output_file: str -
        File to append titles to
    """
    async with \
    aiohttp.ClientSession() as client, \
    aiofiles.open(output_file, "a+", encoding="utf-8") as f:

        for _ in range(num_pages):
            async with client.get("https://en.wikipedia.org/wiki/Special:Random") as response:
                if response.status > 399:
                    # I was getting a 429 Too Many Requests at a higher volume of requests
                    response.raise_for_status()

                page = await response.text()
                soup = BeautifulSoup(page, features="html.parser")
                title = soup.find("h1").text

                await f.write(title + "\t")

        await f.write("\n")

We asynchronously open both an aiohttp `ClientSession` and our output file. The mode, a+, means append to the file and create it if it doesn't already exist. Encoding our strings as utf-8 ensures we don't get an error if our titles contain international characters. If we get an error response, we'll raise it instead of continuing (at high request volumes I was getting a 429 Too Many Requests). We asynchronously get the text from our response, then we parse the title and asynchronously append it to our file. After we append all of our titles, we append a new line: "\n".

Our next function is the function we'll start with each new process to allow running it asynchronously:

In [None]:
def start_scraping(num_pages: int, output_file: str, i: int):
    """ Starts an async process for requesting and scraping Wikipedia pages """
    print(f"Process {i} starting...")
    asyncio.run(get_and_scrape_pages(num_pages, output_file))
    print(f"Process {i} finished.")
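
The pattern used here, a plain function that wraps `asyncio.run()` so that an executor can call it, can be tried out without any network access. In the sketch below, `fetch_fake_page`, `gather_titles` and `start_worker` are made-up names, `asyncio.sleep` stands in for the HTTP request, and `asyncio.gather` is used to overlap the simulated requests, which differs from the one-request-at-a-time loop in the scraping function above.

```python
import asyncio


async def fetch_fake_page(page_number):
    """Hypothetical stand-in for a network request."""
    await asyncio.sleep(0.05)
    return f"title {page_number}"


async def gather_titles(num_pages):
    # Run all the simulated requests concurrently on one event loop.
    return await asyncio.gather(*(fetch_fake_page(n) for n in range(num_pages)))


def start_worker(num_pages):
    """Synchronous entry point: asyncio.run() creates an event loop,
    runs the coroutine to completion and closes the loop again."""
    return asyncio.run(gather_titles(num_pages))


titles = start_worker(3)
print(titles)  # ['title 0', 'title 1', 'title 2']
```

Because `start_worker` is an ordinary module-level function, it is the kind of callable that can be submitted to a `ProcessPoolExecutor`, with each process then running its own event loop.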

Now for our main function. Let's start with some constants (and our function declaration):

In [None]:
def main():
    NUM_PAGES = 50  # Number of pages to scrape altogether
    NUM_CORES = cpu_count()  # Our number of CPU cores (including logical cores)
    OUTPUT_FILE = str(
        Path(__file__).parent.joinpath("wiki_titles.tsv")
    )  # File to append our scraped titles to

    PAGES_PER_CORE = floor(NUM_PAGES / NUM_CORES)
    PAGES_FOR_FINAL_CORE = (
        PAGES_PER_CORE + NUM_PAGES % NUM_CORES
    )  # For our final core

    futures = []

    with concurrent.futures.ProcessPoolExecutor(NUM_CORES) as executor:
        for i in range(NUM_CORES - 1):
            new_future = executor.submit(
                start_scraping,  # Function to perform
                # v Arguments v
                num_pages=PAGES_PER_CORE,
                output_file=OUTPUT_FILE,
                i=i,
            )
            futures.append(new_future)

        futures.append(
            executor.submit(
                start_scraping, PAGES_FOR_FINAL_CORE, OUTPUT_FILE, NUM_CORES - 1
            )
        )

    concurrent.futures.wait(futures)

We create a list to store our futures, then we create a ProcessPoolExecutor, setting its max_workers equal to our number of cores. We iterate over a range equal to our number of cores minus 1, submitting our start_scraping function to the pool each time, and append each resulting future to our futures list. Our final core will potentially have extra work to do as it will scrape a number of pages equal to each of our other cores, but will additionally scrape a number of pages equal to the remainder that we got when dividing our total number of pages to scrape by our total number of cpu cores.
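
The page arithmetic described above can be checked in isolation. `split_pages` is a hypothetical helper written for this note; it gives every worker the same share and adds the remainder to the last one, so the counts always add up to the total.

```python
def split_pages(num_pages, num_workers):
    """Return how many pages each worker should scrape."""
    pages_per_worker = num_pages // num_workers
    remainder = num_pages % num_workers
    counts = [pages_per_worker] * num_workers
    counts[-1] += remainder  # the final worker picks up the leftover pages
    return counts


print(split_pages(50, 12))  # [4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 6]
print(split_pages(50, 8))   # [6, 6, 6, 6, 6, 6, 6, 8]
```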

Make sure to actually run your main function:

In [None]:
if __name__ == "__main__":
    print("Starting: Please wait (This may take a while)....")
    start = time.time()
    main()
    print(f"Time to complete: {round(time.time() - start, 2)} seconds.")