Source: [Speed Up Your Python Program with Concurrency](http://www.pybloggers.com/2019/01/speed-up-your-python-program-with-concurrency/)

#### Synchronous Version

In [1]:
import requests
import time

def download_site(url, session):
    with session.get(url) as response:
        print(f"Read {len(response.content)} from {url}")
        
def download_all_sites(sites):
    with requests.session() as session:
        for url in sites:
            download_site(url, session)
            
if __name__ == "__main__":
    sites = [
        "http://www.jython.org",
        "http://olympus.realpython.org/dice",
    ] * 80

start_time = time.time()
download_all_sites(sites)
duration = time.time() - start_time
print(f"Downloaded {len(sites)} in {duration} seconds")

Read 19210 from http://www.jython.org
Read 275 from http://olympus.realpython.org/dice
Read 19210 from http://www.jython.org
Read 275 from http://olympus.realpython.org/dice
Read 19210 from http://www.jython.org
Read 275 from http://olympus.realpython.org/dice
Read 19210 from http://www.jython.org
Read 275 from http://olympus.realpython.org/dice
Read 19210 from http://www.jython.org
Read 275 from http://olympus.realpython.org/dice
Read 19210 from http://www.jython.org
Read 275 from http://olympus.realpython.org/dice
Read 19210 from http://www.jython.org
Read 275 from http://olympus.realpython.org/dice
Read 19210 from http://www.jython.org
Read 275 from http://olympus.realpython.org/dice
Read 19210 from http://www.jython.org
Read 275 from http://olympus.realpython.org/dice
Read 19210 from http://www.jython.org
Read 275 from http://olympus.realpython.org/dice
Read 19210 from http://www.jython.org
Read 275 from http://olympus.realpython.org/dice
Read 19210 from http://www.jython.org
Read 

In [5]:
import concurrent.futures
import requests
import threading
import time

thread_local =  threading.local()

def get_session():
    if not getattr(thread_loacal, "session", None):
        thread.local.session = requests.Session()
    return thread_local.session

def download_site(url):
    session = get_session()
    with session.get(url) as response:
        print(f"Read {len(response.content)} from {url}")
        
def download_all_sites(sites):
    with concurrent.futures.ThreadPoolExecutor(max_workers =5) as executor:
        executor.map(download_site, sites)
        
if __name__ == "__main__":
     sites = [
        "http://www.jython.org",
        "http://olympus.realpython.org/dice",
    ] * 80

start_time = time.time()
download_all_sites(sites)
duration = time.time() - start_time
print(f"Downloaded {len(sites)} in {duration} seconds")

Downloaded 160 in 0.04680013656616211 seconds


#### `threading` Version

When you add threading, the overall structure is the same and you only needed to make a few changes. `download_all_sites()` changed from calling the function once per site to a more complex structure.

In this version, you’re creating a `ThreadPoolExecutor`, which seems like a complicated thing. Let’s break that down:      
    
    ThreadPoolExecutor = Thread + Pool + Executor.
    
You already know about the `Thread part`. The `Pool` portion is where it starts to get interesting. This object is going to create a pool of threads, each of which can run concurrently. Finally, the `Executor` is the part that’s going to control how and when each of the threads in the pool will run. It will execute the request in the pool. 

Helpfully, the standard library implements `ThreadPoolExecutor` as a context manager so you can use the `with` syntax to manage creating and freeing the pool of `Threads`.

Once you have a `ThreadPoolExecutor`, you can use its handy `.map()` method. This method runs the passed-in function on each of the sites in the list. The great part is that it automatically runs them concurrently using the pool of threads it is managing.

The other interesting change in our example is that each thread needs to create its own `requests.Session()` object. When you’re looking at the documentation for `requests`, it’s not necessarily easy to tell, but reading one issue (link omitted), it seems fairly clear that you need a separate `Session` for each thread.

This is one of the interesting and difficult issues with threading. Because the operating system is in control of when your task gets interrupted and another task starts, any data that is shared between the threads needs to be protected, or thread-safe. Unfortunately `requests.Session()` is not thread-safe.

One strategy to use here is something called thread local storage. `Threading.local()` creates an object that look like a global but is specific to each individual thread. In your example, this is done with `threadLocal` and `get_session()`.

`ThreadLocal` is in the `threading` module to specifically solve this problem. It looks a little odd, but you only want to create one of these objects, not one for each thread. The object itself takes care of separating accesses from different threads to different data.

When `get_session()` is called, the session it looks up is specific to the particular thread on which it’s running. So each thread will create a single session the first time it calls `get_session()` and then will simply use that session on each subsequent call throughout its lifetime.

### `asyncio` Version ###

**`asyncio` Basics**

This will be a simplified version of `asycio`. There are many details that are glossed over here, but it still conveys the idea of how it works.

The general concept of `asyncio` is that a single Python object, called the **event loop**, controls how and when each task gets run. The event loop is aware of each task and knows what state it’s in. In reality, there are many states that tasks could be in, but for now let’s imagine a simplified event loop that just has two states.

The ready state will indicate that a task has work to do and is ready to be run, and the waiting state means that the task is waiting for some external thing to finish, such as a network operation.

Your simplified event loop maintains two lists of tasks, one for each of these states. It selects one of the ready tasks and starts it back to running. That task is in complete control until it cooperatively hands the control back to the event loop.

When the running task gives control back to the event loop, the event loop places that task into either the ready or waiting list and then goes through each of the tasks in the waiting list to see if it has become ready by an I/O operation completing. It knows that the tasks in the ready list are still ready because it knows they haven’t run yet.

Once all of the tasks have been sorted into the right list again, the event loop picks the next task to run, and the process repeats. Your simplified event loop picks the task that has been waiting the longest and runs that. This process repeats until the event loop is finished.

An important point of `asyncio` is that the tasks never give up control without intentionally doing so. They never get interrupted in the middle of an operation. This allows us to share resources a bit more easily in `asyncio` than in `threading`. You don’t have to worry about making your code thread-safe.

That’s a high-level view of what’s happening with `asyncio`. If you want more detail, this [StackOverflow answer](https://stackoverflow.com/questions/49005651/how-does-asyncio-actually-work/51116910#51116910) provides some good details if you want to dig deeper.[See the last section of this notebook where SO answer is reproduced].

**`async` and `await`**

Now let’s talk about two new keywords that were added to Python: `async` and `await`. In light of the discussion above, you can view `await` as the magic that allows the task to hand control back to the event loop. When your code awaits a function call, it’s a signal that the call is likely to be something that takes a while and that the task should give up control.

It’s easiest to think of `async` as a flag to Python telling it that the function about to be defined uses `await`. There are some cases where this is not strictly true, like asynchronous generators, but it holds for many cases and gives you a simple model while you’re getting started.

One exception to this that you’ll see in the next code is the `async with` statement, which creates a context manager from an object you would normally `await`. While the semantics are a little different, the idea is the same: to flag this context manager as something that can get swapped out.

As I’m sure you can imagine, there’s some complexity in managing the interaction between the event loop and the tasks. For developers starting out with `asyncio`, these details aren’t important, but you do need to remember that any function that calls `await` needs to be marked with `async`. You’ll get a syntax error otherwise.


The below code is running when executed as a script in command prompt.

In [2]:
import asyncio
import time
import aiohttp

async def download_site(session, url):
    async with session.get(url) as response:
        print("Read {0} from {1}".format(response.content_length, url))
        
async def download_all_sites(sites):
    async with aiohttp.ClientSession() as session:
        tasks = []
        for  url in sites:
            task = asyncio.ensure_future(download_site(session, url))
            tasks.append(task)
        await asyncio.gather(*tasks,return_exceptions = True)
        
if __name__ == "__main__":
    sites = ["http://www.jython.org","http://olympus.realpython.org/dice"] * 80
    start_time = time.time()
    asyncio.get_event_loop().run_until_complete(download_all_sites(sites))
    duration = time.time() - start_time
    print(f"Downloaded {len(sites)} in {duration} seconds")        

RuntimeError: This event loop is already running

Read 19210 from http://www.jython.org
Read 19210 from http://www.jython.org
Read 19210 from http://www.jython.org
Read 19210 from http://www.jython.org
Read 19210 from http://www.jython.org
Read 19210 from http://www.jython.org
Read 19210 from http://www.jython.org
Read 19210 from http://www.jython.org
Read 19210 from http://www.jython.org
Read 19210 from http://www.jython.org
Read 19210 from http://www.jython.org
Read 19210 from http://www.jython.org
Read 19210 from http://www.jython.org
Read 19210 from http://www.jython.org
Read 19210 from http://www.jython.org
Read 19210 from http://www.jython.org
Read 19210 from http://www.jython.org
Read 19210 from http://www.jython.org
Read 19210 from http://www.jython.org
Read 19210 from http://www.jython.org
Read 19210 from http://www.jython.org
Read 19210 from http://www.jython.org
Read 19210 from http://www.jython.org
Read 19210 from http://www.jython.org
Read 19210 from http://www.jython.org
Read 19210 from http://www.jython.org
Read 19210 f

#### `download_site()`

`download_site()` at the top is almost identical to the `threading` version with the exception of the `async` keyword on the function definition line and the `await` keyword when you actually call `session.get()`. You’ll see later why Session can be passed in here rather than using thread-local storage.

#### `download_site_from_list()`

The next function, `download_site_from_list()`, is fairly straight-forward. While there are still items in the list of sites, it pops one from the list and processes it by calling `download_site()`.

Since this function is run concurrently in many tasks, you might be wondering if calling `pop()` on a List is thread-safe. It is not. But all of our `asyncio` tasks are actually running in the same thread so you don’t need to worry about thread safety.

Tasks can’t be swapped out in the middle of a statement, which means that you don’t need to worry about a task getting interrupted and leaving the list in a bad state. Because there is no `await` keyword, you know that this statement will not hand control back to the event loop.

The next statement, `await download_site(...)`, will hand control back to the event loop, but by that point you know the sites list will be in a good state.

Note that, similar to our threading example, if you actually need things like queues in your design, the `asyncio` module provides classes and methods that do those operations but work with an event loop.

A slightly more subtle difference is that the `session` object does not need to be created in each task like it was in each thread. Each task uses the same session that was created earlier and passed in. You can get away with this because, again, all tasks are running in the same thread, so you don’t have to worry about thread safety.

#### `download_all_sites()`

`download_all_sites()` is where you will see the biggest change from the threading example.

You can share the session across all tasks, so the session is created here as a context manager.

Inside that context manager, it creates a list of tasks using `asyncio.ensure_future()`, which also takes care of starting them. Once all the tasks are created, this function uses `asyncio.wait()` to keep the session context alive until all of the tasks have completed.

The threading code does something similar to this, but the details are conveniently handled in the `ThreadPoolExecutor`. There currently is not an `AsyncioPoolExecutor` class.

There is one small but important change buried in the details here, however. Remember how we talked about the number of threads to create? It wasn’t obvious in the threading example what the optimal number of threads was.

One of the cool advantages of asyncio is that it scales far better than threading. Each task takes far fewer resources and less time to create than a thread, so creating and running more of them works well. This example just creates a separate task for each site to download, which works out quite well.

#### `__main__`

Finally, the nature of `asyncio` means that you have to start up the event loop and tell it which tasks to run. The `__main__` section at the bottom of the file contains the code to `get_event_loop(`) and then `run_until_complete()`. If nothing else, they’ve done an excellent job in naming those functions.

If you’ve updated to Python 3.7, the Python core developers simplified this syntax for you. Instead of the `asyncio.get_event_loop().run_until_complete()` tongue-twister, you can just use `asyncio.run()`.

### `multiprocessing` Version ###


In [None]:
import requests
import multiprocessing
import time

session = None

def set_global_session():
    global session
    if not session:
        session = requests.Session()
        
def download_site(url):
    with session.get(url) as response:
        name = multiprocessing.current_process().name
        print(f"{name}: Read {len(response.content)} from {url}")
        
def download_all_sites(sites):
    with multiprocessing.Pool(initializer=set_global_session) as pool:
        pool.map(download_site, sites)
        
if __name__ == "__main__":
    sites = [
        "http://www.jython.org",
        "http://olympus.realpython.org/dice",
    ] * 80

    start_time = time.time()
    download_all_sites(sites)
    duration = time.time() - start_time
    print(f"Downloaded {len(sites)} in {duration} seconds")        

#### `multiprocessing` in a Nutshell

Up until this point, all of the examples of concurrency in this article run only on a single CPU or core in your computer. The reasons for this have to do with the current design of CPython and something called the **Global Interpreter Lock**, or **GIL**.

This article won’t dive into the hows and whys of the GIL. It’s enough for now to know that the synchronous, threading, and asyncio versions of this example all run on a single CPU.

`multiprocessing` in the standard library was designed to break down that barrier and run your code across multiple CPUs. At a high level, it does this by creating a new instance of the Python interpreter to run on each CPU and then farming out part of your program to run on it.

As you can imagine, bringing up a separate Python interpreter is not as fast as starting a new thread in the current Python interpreter. It’s a heavyweight operation and comes with some restrictions and difficulties, but for the correct problem, it can make a huge difference.

#### `multiprocessing` Code

The code has a few small changes from our synchronous version. The first one is in `download_all_sites()`. Instead of simply calling `download_site()` repeatedly, it creates a `multiprocessing.Pool` object and has it map `download_site` to the iterable `sites`. This should look familiar from the `threading` example.

What happens here is that the Pool creates a number of separate Python interpreter processes and has each one run the specified function on some of the items in the iterable, which in our case is the list of sites. The communication between the main process and the other processes is handled by the `multiprocessing` module for you.

The line that creates Pool is worth your attention. First off, it does not specify how many processes to create in the Pool, although that is an optional parameter. By default, `multiprocessing.Pool()` will determine the number of CPUs in your computer and match that. This is frequently the best answer, and it is in our case.

For this problem, increasing the number of processes did not make things faster. It actually slowed things down because the cost for setting up and tearing down all those processes was larger than the benefit of doing the I/O requests in parallel.

Next we have the `initializer=set_global_session` part of that call. Remember that each process in our Pool has its own memory space. That means that they cannot share things like a `Session` object. You don’t want to create a new Session each time the function is called, you want to create one for each process.

The initializer function parameter is built for just this case. There is not a way to pass a return value back from the initializer to the function called by the process `download_site()`, but you can initialize a global session variable to hold the single session for each process. Because each process has its own memory space, the global for each one will be different.

That’s really all there is to it. The rest of the code is quite similar to what you’ve seen before.

#### How to Speed Up a CPU-Bound Program

Let’s shift gears here a little bit. The examples so far have all dealt with an I/O-bound problem. Now, you’ll look into a CPU-bound problem. As you saw, an I/O-bound problem spends most of its time waiting for external operations, like a network call, to complete. A CPU-bound problem, on the other hand, does few I/O operations, and its overall execution time is a factor of how fast it can process the required data.

For the purposes of our example, we’ll use a somewhat silly function to create something that takes a long time to run on the CPU. This function computes the square of each number from 1 to the passed-in value:

    def cpu_bound(number):
        return sum(i * i for i in range(number))

You’ll be passing in large numbers, so this will take a while. Remember, this is just a placeholder for your code that actually does something useful and requires significant processing time, like computing the roots of equations or sorting a large data structure.


### CPU-Bound  Synchronous Version###

In [1]:
import time

def cpu_bound(number):
    return sum(i*i for i in range(number))

def find_sums(numbers):
    for number in numbers:
        cpu_bound(number)
        
if __name__ == '__main__':
    numbers = [5_00_000 + x for x in range(20)]
    
    start_time = time.time()
    find_sums(numbers)
    duration = time.time() - start_time
    print(f"Duration {duration} seconds")        
        

Duration 5.865610361099243 seconds


This code calls `cpu_bound()` 20 times with a different large number each time. It does all of this on a single thread in a single process on a single CPU. The execution timing diagram looks like this:

Timing Diagram of an CPU Bound Program

### CPU-Bound `multiprocessing` Version ###

In [None]:
import multiprocessing
import time

def cpu_bound(number):
    return sum(i*i for i in range(number))

def find_sums(numbers):
    with multiprocessing.Pool() as pool:
        pool.map(cpu_bound, numbers)
        
if __name__ == '__main__':
    numbers = [5_00_000 + x for x in range(20)]
    
    start_time = time.time()
    find_sums(numbers)
    duration = time.time() - start_time
    print(f"Duration {duration} seconds")        
        

#### A SO POST: "How does `asyncio` actually work?"

Following is from a StackOverflow post "How does `asyncio` actually work?"[Link](https://stackoverflow.com/questions/49005651/how-does-asyncio-actually-work/51116910#51116910)

Before answering this question we need to understand a few base terms.

##### Generators

Generators are objects that allow us to suspend the execution of a Python function. User curated generators are implemented using th keyword `yield`. By creating a normal function containing the `yield` keyword, we turn that function into a generator:


In [1]:
def test():
    yield 1
    yield 2
    
gen = test()
next(gen)`

1

In [2]:
next(gen)

2

In [3]:
next(gen)

StopIteration: 

As you can see, calling `next()` on the generator causes the interpreter to load test's frame, and return the `yield`ed value. Calling `next()` again, cause the frame to load again into the interpreter stack, and continue on `yield`ing another value.

By the third time `next()` is called, our generator was finished, and `StopIteration` was thrown.

##### Communicating with a generator

A less known feature of generators, is the fact that you can communicate with them using two methods: `send()` and `throw()`.

In [4]:
def test():
    val = yield 1
    print(val)
    yield 2
    yield 3
    
gen =  test()
next(gen)

1

In [5]:
gen.send('abc')

abc


2

In [6]:
gen.throw(Exception())

Exception: 

Upon calling `gen.send()`, the value is passed as a return value from the `yield` keyword.

`gen.throw()` on the other hand allows throwing Exceptions inside generators, with the exception raised at the same spot `yield` was called.

##### Returning values from generators

Returning a value from a generator, results in the value being put inside the `StopIteration` exception, we can later on recover the value from the exception and use it to our need.

In [8]:
def test():
    yield 1
    return 'abc'

gen = test()
next(gen)

1

In [9]:
try:
    next(gen)
except StopIteration as exc:
    print(exc.value)

abc


##### Behold, a new keyword: `yield from`

Python 3.4 came with the addition of a new keyword: `yield from`. What that keyword allows us to do, is pass on any `next()`, `send()` and `throw()` into an inner-most nested generator. If the inner generators return a value, it is also the return value of `yield from`:

In [10]:
def inner():
    print((yield 2))
    return 3

def outer():
    yield 1
    val = yield from inner()
    print(val)
    yield 4
    
gen = outer()
next(gen)

1

In [11]:
next(gen)

2

In [12]:
gen.send('abc')

abc
3


4

#### Putting it all together

Upon introducing the new keyword `yield from` in Python 3.4, we were now able to create generators inside generators that just like a tunnel, pass the data back and forth from the inner-most to the outer-most generators. This has spawned a new meaning for generators - *coroutines*

**Coroutines** are functions that can be stopped and resumed while being run. In Python, they are defind using the **`async def`** keyword. Much like generators, they too use their own form of `yield from` which is **`await`**. Before `async` and `await` were introduced in Python 3.5, we created coroutines in the exact same way generators were created (with `yield from` instead of `await`).

```python
async def inner():
    return 1
    
async def outer():
    await inner()
```

Like every iterator or generator that implement the `__iter__()` method, coroutines implement `__await__()` which allows them to continue on every time `await coro` is called.

There is a nice [sequence diagram](image to be inserted) inside the [Python docs](url to be inserted) that you should check out. 

In asyncio, apart from coroutine functions, we have 2 important objects: **tasks** and **futures**.
    

#### Futures

Futures are objects that have th `__await__()` method implemented, and their job is to hold a certain stage and result. The state can be one of the following:

1- PENDING - future does not have any result or exception set.

2- CANCELLED - future was cancelled using `fut.cancel()`

3-FINISHED - future was finished, either by a result set using `fut.set_result()` or by `fut.set_exception()`

Another important feature of `future` objects, is that contain a method called `add_done_callback()`. This method allows functions to be called as soon as the task is done - whether it raised an exception or finished.

#### Tasks

Task objects are special futures, which wrap around coroutines, and communicate with the inner-most and outer-most coroutines. Every time a coroutine `await`s a future, the future is passed all the way back to the task(just like in `yield from`), and the task receives it.

Next, the task binds itself to the future. It does so by calling `add_done_callback()` on the future. From now on, if the future will ever be done, by either being cancelled, passed an exception or Passed a Python object as a result, the task's callback will be called, and it will rise back up to existence.

### Asyncio

The final burning question we must answer is - how is the IO implemented?

Deep inside asyncio, we have an event loop. An event loop of tasks. The event loop's job is to call tasks every time they are ready and coordinate all that effort into one single working machine.

The IO part of the event loop is built upon a single crucial function called `select`, which is a blocking function, implemented by the OS underneath, that allows waiting on sockets for incoming or outgoing data. Upon data being received it wakes up, and returns the sockets which received data,  or the sockets whom are ready for writing. 

When you try to receive or send data over a socket through asyncio, what actually happens below is that the socket is first checked if it has any data that can be immediately read or sent. It its `.send()` buffer is full, or the `.recv()` buffer is empty, the socket is registered to the `select` function (by simply adding it to one of the lists, `rlist` for `recv` and `wlist` for `send`) and the appropriate function `await`s a newly created `future` object, tied to that socket. 

When all available tasks are waiting for futures, the event loop calls `select` and waits. When the one of the sockets has incoming data, or its `send` buffer drained up, asyncio checks for the future object tied to that socket, and sets it done.

Now all the magic happens. The future is set to done, the task that added itself before with `add_done_callback()` rises back to life, and calls `send()` on the coroutine which resumes the innermost coroutine (because of th `await` chain) and you read the newly received data from a nearby buffer it was spilled unto.

#### Method chain again, in case of `recv()`:

1 - `select.select` waits.

2 - A ready socket, with data is returned. 

3 -  Data from the socket is moved into a buffer.

4 - `future.set_result()` is called.

5 - Task that added itself with `add_done_callback()` is now woken up.

6 - Task calls `.send()` on the coroutine which goes all the way into the innermost coroutine and wakes it up. 

7 - Data is being read from the buffer and returned to our humble user. 

In summary, asyncio uses generator capabilities, that allows pausing and resuming functions. It uses `yield from` capabilities that allow passing data back and forth from the innermost generator to the outermost. It uses all of those in order to halt function execution while it is waiting for IO to complete (by using the OS `select` function).

And the best of all? While one function is paused, another may run and interleave with delicate fabric, which is asyncio.

In [None]:
 caa