# Prerequisites
- Experience with the specific topic: Novice
- Professional experience: No industry experience required

Knowledge of general concurrent programming is not required, but the reader should be familiar with basic Python programming, as well as the basic theories behind web scraping.

    pip install tornado==4.5.3



# Introduction to Asynchronous Programming

This tutorial will provide an overview of asynchronous programming. It will cover conceptual elements of asynchronous programming, the basics of Python's async APIs, and an example implementation of an asynchronous web scraper.

Synchronous programs are straightforward: start a task, wait for it to finish, repeat until all tasks have executed; however, waiting wastes valuable CPU cycles. Asynchronous programs remedy this problem by using context-switching to interleave tasks in order to minimize processor idle-time, minimize total program execution time, and improve program responsiveness.


## Simultaneous Chess Analogy

To understand the benefit of asynchronous programming, consider the following scenario:

Ten opponents have each challenged you to a game of chess. Since you are very good at chess, you take three minutes per move. Each of your opponents takes five minutes. Each full game takes twenty moves per player, or forty moves in total.

The time it takes to play one full game is:

    num_moves * my_move_time + num_moves * opponent_move_time
    = 20 * 3 + 20 * 5
    = 160 minutes

Playing through each of the 10 games sequentially would take about 27 hours. 

Notice, however, that each time you make a move, you then spend 5 minutes idle while you wait for the other player to respond. For each 160 minute game, you spend 100 minutes waiting -- that's 62.5% of each game, and 1000 minutes across all ten games. 

Instead of waiting, suppose that, as soon as you finish making your move in a given game, you immediately progress to the next opponent. Starting with opponent 1, you take 3 minutes to make a move. Then you progress on to opponent `2, 3, ..., 10`. When you've finished your move with the 10th opponent, you return to the game with opponent 1. Let's call this process a _round_.

While you're progressing through opponents 2...10, opponent 1 has `(num_opponents - 1) * my_move_time = (10-1) * 3 = 27 minutes > 5 minutes` to think and make their next move -- therefore, when you come back to opponent 1 at the start of the next round, you can immediately make your move without waiting for them to finish up. 

In general, as long as each opponent takes less than `(num_opponents - 1) * my_move_time` to make their move, you will spend no time waiting. Each round will take `num_opponents*my_move_time = 10*3 = 30 minutes`. Since we're assuming each game consists of 40 moves and we complete 2 moves per game in each round (your move and the opponent's move), it will take 20 rounds to complete all the games. So all 10 games will take `30 minutes * 20 rounds = 600 minutes = 10 hours`, compared with 1600 minutes for the sequential approach: roughly 2.7 times faster overall.
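The arithmetic of the analogy can be checked with a few lines of Python. This is only a sketch of the scenario described above; the variable names are chosen here for illustration.

```python
# Chess analogy arithmetic; all values come from the scenario above.
num_opponents = 10
moves_per_player = 20
my_move_time = 3        # minutes
opponent_move_time = 5  # minutes

# Sequential play: finish one game before starting the next.
game_time = moves_per_player * (my_move_time + opponent_move_time)
sequential_total = num_opponents * game_time

# Simultaneous play: one round is one move against every opponent.
round_time = num_opponents * my_move_time
simultaneous_total = moves_per_player * round_time

# No waiting is needed as long as each opponent replies within this window.
thinking_window = (num_opponents - 1) * my_move_time
assert opponent_move_time <= thinking_window

print(sequential_total, simultaneous_total)
print(round(sequential_total / simultaneous_total, 2))
```

The ratio printed on the last line is the factor by which the simultaneous schedule beats the sequential one.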

<table class="image">
    <caption align="bottom" style="text-align: center">Simultaneous Chess by Samuel Reshevsky</caption>
    <tr><td><img width=500 src="https://image.ibb.co/h2oizp/Samuel_Reshevsky_age_8_defeating_several_chess_masters_at_once_in_France_1920.jpg"></img></td></tr>
</table>


Another advantage of asynchronous programming is improved responsiveness. Say your program has an ordered list of independent tasks to execute, and the first task is a relatively long-running (heavy-weight) one. If the program were executed sequentially, it would have to finish that long-running task before moving on to the other faster, lighter tasks. An asynchronous program can instead switch away from the long task at intervals, so the lighter tasks can finish without waiting for it.


# Asynchronous Programming in Python

In this section we will discuss the specifics of implementing an asynchronous program in Python. First we will go over the general structure of an asynchronous program and its main elements, then we will learn about the specific APIs that Python provides to facilitate asynchronous programming, and finally we will try our hand at a first asynchronous programming problem.

## Coroutines, Event Loops, and Futures

Coroutines, event loops, and futures are the essential elements of an asynchronous program.

- The **event loop** handles the task-switching aspect, or execution flow, of the program. It keeps track of all the tasks that are to be run asynchronously and decides which of those should be executed at a given moment.
- A **coroutine** is a special type of function that wraps a specific task so that it can be executed asynchronously. A coroutine specifies where in the function task switching may take place; this is when the execution flow is returned from the function back to the event loop. Coroutines are typically wrapped in tasks, which the event loop schedules and stores internally in a task queue.
- A **future** is a placeholder for a result returned from a coroutine. _As soon as the event loop initiates a coroutine, a corresponding future is created that stores the result of the coroutine, or an exception if one was thrown during the coroutine's execution._
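To make the idea of a future as a placeholder more concrete, here is a minimal sketch that creates a bare future by hand and lets the event loop fill it in a moment later. It assumes Python 3.7+ run as a plain script (inside a notebook, `asyncio.run()` would need to be replaced by `await main()`); the 0.05 second delay and the `'hello'` value are arbitrary.

```python
import asyncio

async def main():
    loop = asyncio.get_running_loop()
    # A bare future: a placeholder that has no result yet.
    future = loop.create_future()
    before = future.done()

    # Ask the event loop to fill in the result shortly.
    loop.call_later(0.05, future.set_result, 'hello')

    # Awaiting suspends main() until the result has been set.
    result = await future
    return before, future.done(), result

outcome = asyncio.run(main())
print(outcome)  # (False, True, 'hello')
```

In everyday code futures are rarely created by hand like this; they usually appear indirectly, as the tasks that wrap coroutines.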

To illustrate these concepts, let's start with a simple synchronous countdown program:

In [None]:
import time

def seq_countdown(name, delay):
    # padding space for formatting
    indents = (ord(name) - ord('a')) * '\t'
    n = 3
    while n:
        time.sleep(delay)
        # elapsed time from the beginning
        duration = time.perf_counter() - start
        print('-' * 40)
        # message of the form: duration letter = countdown
        print(f'{duration:.4f} \t{indents}{name} = {n}')
        n -= 1
        
start = time.perf_counter()

seq_countdown('a', 1)
seq_countdown('b', 0.8)
seq_countdown('c', 0.5)

print('-' * 40)
print('Finished.')

Currently, the execution time of this program is `3a + 3b + 3c = 3(1) + 3(0.8) + 3(0.5) = 6.9 seconds`, where `a`, `b`, and `c` are the three delays. However, if we run these countdowns asynchronously, the total execution time should become `max(3a, 3b, 3c) = 3 seconds`, which is roughly 2.3 times faster. How can we rewrite a synchronous program as an asynchronous one?

## Python: `async`, `await`, and `asyncio`

Python 3.5 ([PEP 492](https://www.python.org/dev/peps/pep-0492/)) added dedicated syntax for asynchronous programming built around two new keywords: `async` and `await`. These work together with the standard-library module `asyncio`, which was introduced in Python 3.4.

The `async` keyword indicates to the interpreter that the function definition that follows should be flagged as a coroutine. For example:

In [None]:
async def my_coroutine(name):
    # coroutine implementation
    return 'hello: ' + name

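The `await` keyword suspends the enclosing coroutine until the awaited coroutine (or future) has finished, and then evaluates to its result. It is normally only valid inside an `async def` function; recent versions of Jupyter/IPython also accept it at the top level of a cell, as here:

In [None]:
result = await my_coroutine('foo')
print(result)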
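Outside a notebook, `await` has to live inside a coroutine that is handed to an event loop. The following sketch assumes Python 3.7+ and uses `asyncio.run()` as the entry point; `greet()` is a hypothetical stand-in for `my_coroutine()` above.

```python
import asyncio

async def greet(name):
    # hypothetical stand-in for my_coroutine() above
    await asyncio.sleep(0.01)
    return 'hello: ' + name

async def main():
    # Calling greet() only creates a coroutine object; nothing runs yet.
    coro = greet('foo')
    kind = type(coro).__name__
    # await runs it to completion and evaluates to its return value.
    result = await coro
    return kind, result

kind, result = asyncio.run(main())
print(kind, '->', result)
```

Note the two distinct steps: calling a coroutine function produces a coroutine object, and awaiting that object is what actually executes the body.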

The `asyncio` module provides high-level routines for creating event loops (among other things):

In [None]:
import asyncio

# create an event loop (an instance of asyncio.AbstractEventLoop)
event_loop = asyncio.get_event_loop()

# populate the event loop's task queue
tasks = [event_loop.create_task(my_coroutine('foo')),
        event_loop.create_task(my_coroutine('bar'))]

# blocks until all futures are complete
results = event_loop.run_until_complete(asyncio.wait(tasks))
# asyncio.wait() returns a (done, pending) tuple of task sets, not the return values
print(results)
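Since `asyncio.wait()` hands back sets of tasks rather than return values, the values have to be read from the tasks themselves. A minimal sketch, assuming Python 3.7+ run as a script; `shout()` is a hypothetical coroutine used only for this illustration.

```python
import asyncio

async def shout(word):
    # hypothetical coroutine used only for this illustration
    await asyncio.sleep(0.01)
    return word.upper()

async def main():
    tasks = [asyncio.create_task(shout(w)) for w in ('foo', 'bar')]
    # wait() returns two sets of tasks, not the results themselves
    done, pending = await asyncio.wait(tasks)
    # sets are unordered, so read the results from the original list
    return [t.result() for t in tasks], len(pending)

values, n_pending = asyncio.run(main())
print(values, n_pending)
```

When only the return values are needed, `asyncio.gather()` is often more convenient, because it returns them directly and in the order the awaitables were passed.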

An event loop, coroutines, and their corresponding futures are the core elements of an asynchronous program. First, the event loop is started and creates a task queue. The coroutine for the first task is then executed and its corresponding future is created. When that coroutine reaches a task-switching point (an `await` on an operation that has to wait, e.g. input/output or sleeping), it is suspended and the execution flow is released back to the event loop, which then executes the next item in the task queue.

This process repeats for every item in the task queue; once the coroutine for the last task reaches its task-switching point, the execution flow returns to the coroutine of the first task. As each task finishes, it is removed from the task queue, its coroutine terminates, and the corresponding future records the result returned by that coroutine. The process continues until all tasks in the queue have either completed or been terminated.
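The scheduling cycle described above can be imitated in a few lines with plain generators. This is a deliberately simplified toy, not how `asyncio` is implemented: each `yield` plays the role of a task-switching point, and `run_round_robin()` is a hypothetical stand-in for the event loop.

```python
from collections import deque

def countdown(name, n, log):
    # A generator stands in for a coroutine; each yield is a switch point.
    while n:
        log.append(f'{name}={n}')
        n -= 1
        yield

def run_round_robin(tasks):
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)          # run until the next switch point
        except StopIteration:
            continue            # finished: drop it from the queue
        queue.append(task)      # not finished: back of the queue

log = []
run_round_robin([countdown('a', 2, log), countdown('b', 3, log)])
print(log)  # ['a=2', 'b=3', 'a=1', 'b=2', 'b=1']
```

The two countdowns interleave until the shorter one is exhausted, after which the remaining task runs alone. A real event loop differs mainly in that it resumes a task only once the operation it is waiting on is ready.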

The following diagram further illustrates the general structure of an asynchronous program described above:

<table class="image">
    <caption align="bottom" style="text-align: center">
        Asynchronous Programming Structure<br>
        <a href="https://medium.freecodecamp.org/a-guide-to-asynchronous-programming-in-python-with-asyncio-232e2afa44f6">Source</a>
    </caption>
    <tr><td><img width=400 src="https://image.ibb.co/iJ2PvU/0_s1_GH0_YO9_ZNd_EEDxo.jpg"></img></td></tr>
</table>

## Asynchronous Countdowns

Let's turn our synchronous countdown program into an asynchronous one. To do this, we need to make the countdown function into a coroutine and specify a point inside the function to be a task switching event. We'll add the keyword `async` in front of the function, and instead of the `time.sleep()` function, we will be using `asyncio.sleep()` together with the `await` keyword. The rest of the function should remain the same. Our countdown coroutine should now be:

In [1]:
import asyncio
import time  # needed for time.perf_counter() below

async def async_countdown(name, delay):
    # padding space for formatting
    indents = (ord(name) - ord('a')) * '\t'
    
    n = 3
    while n:
        await asyncio.sleep(delay)
        
        # elapsed time from the beginning
        duration = time.perf_counter() - start
        print('-' * 40)
        # message of the form: duration letter = countdown
        print(f'{duration:.4f} \t{indents}{name} = {n}')
        
        n -= 1

We now need to initialize and manage an event loop. Specifically, we'll obtain an event loop with the `asyncio.get_event_loop()` function, add all three countdown tasks to the task queue with `create_task()`, and finally start running the event loop with `run_until_complete()`:

In [2]:
loop = asyncio.get_event_loop() # creating the event loop
# adding tasks to the task queue
tasks = [
    loop.create_task(async_countdown('a', 1)),
    loop.create_task(async_countdown('b', 0.8)),
    loop.create_task(async_countdown('c', 0.5))
]

start = time.perf_counter()
# run the event loop until all tasks are complete
loop.run_until_complete(asyncio.wait(tasks))

print('-' * 40)
print('Done.')

----------------------------------------
0.5012 			c = 3
----------------------------------------
0.8015 		b = 3
----------------------------------------
1.0012 	a = 3
----------------------------------------
1.0026 			c = 2
----------------------------------------
1.5036 			c = 1
----------------------------------------
1.6022 		b = 2
----------------------------------------
2.0031 	a = 2
----------------------------------------
2.4039 		b = 1
----------------------------------------
3.0048 	a = 1
----------------------------------------
Done.


Now we can see how asynchronous programming can improve the execution time and responsiveness of our programs. Instead of executing individual tasks sequentially, our program now switches between different countdowns and overlaps their waiting time.

At the beginning of the program, instead of waiting for the whole first second to print out the first message "a = 3", the program switches to the next task in the task queue, which starts its own 0.8-second wait for the letter "b", and then to the task for "c". Once 0.5 seconds have passed, "c = 3" is printed out, and 0.3 seconds later (at time 0.8 seconds), "b = 3" is printed out. This all happens before "a = 3" is printed out.

This task-switching property of our asynchronous program makes it significantly more responsive. Instead of "hanging" for one second before the first message is printed, the program now only takes 0.5 seconds (the shortest waiting period) to print out its first message. As for execution time, we see that this time it only takes three seconds in total to execute the whole program (instead of 6.9 seconds). This corresponds to what we speculated: the execution time is right around the time it takes to execute the longest task.
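The timing claim can also be checked programmatically. The sketch below is a scaled-down variant under stated assumptions: Python 3.7+ run as a script, delays ten times shorter than in the countdown example so it finishes quickly, and a hypothetical `nap()` coroutine that sleeps without printing. It uses `asyncio.gather()` and `asyncio.run()` instead of managing the loop by hand.

```python
import asyncio
import time

async def nap(name, delay, repeats=3):
    # hypothetical, quieter stand-in for async_countdown()
    for _ in range(repeats):
        await asyncio.sleep(delay)
    return name

async def main():
    # scaled-down delays for a, b, c so the sketch runs quickly
    return await asyncio.gather(nap('a', 0.10), nap('b', 0.08), nap('c', 0.05))

t0 = time.perf_counter()
names = asyncio.run(main())
elapsed = time.perf_counter() - t0

sequential_estimate = 3 * (0.10 + 0.08 + 0.05)  # 0.69 s if run one by one
print(names, f'{elapsed:.2f}s vs ~{sequential_estimate:.2f}s sequentially')
```

The measured time should land near the longest single task (about 0.3 seconds here) rather than near the sum of all three.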

## Asynchronous Fibonacci

Let us try applying asynchronous programming to another problem. In this example we will consider the task of calculating different numbers in the Fibonacci sequence. Our starting function will look like:

In [None]:
def seq_fib(n):
    # base cases: fib(0) = 0 and fib(1) = 1
    if n in [0, 1]:
        print(f'{n}: {n}')
        print(f'Took {time.perf_counter() - start:.2f} seconds.')
        return  # skip the loop so the result is not printed twice
    
    # iteratively calculating fib(n), starting from fib(1) = fib(2) = 1
    a, b = 1, 1
    i = 1
    while i < n:
        a, b = b, a + b
        i += 1
    
    # printing only the last 20 digits, since the result can be very large
    print(f'{n}: {a % (10 ** 20)}')
    # printing the time elapsed from the beginning
    print(f'Took {time.perf_counter() - start:.2f} seconds.')

Now we will use this function to generate different numbers in the sequence:

In [None]:
start = time.perf_counter()

seq_fib(1000000)
seq_fib(1000)
seq_fib(20)

As discussed above, this is one of the situations in which the first task in the program is so large that the execution appears to be hanging while processing this task. This is undesirable in terms of application responsiveness. Furthermore, the other two tasks were light-weight enough that it took almost no time to execute them after the first task.

Now, if we were to implement asynchronous programming for this program, we would want to address this problem so that the last two tasks, `seq_fib(1000)` and `seq_fib(20)`, finish before the first task, `seq_fib(1000000)`. To do this we need to convert the `seq_fib()` function into a coroutine and specify a task-switching point inside the function. Below we specify inside the while loop that whenever the loop counter `i` is divisible by 50,000, execution switches to the next task:

In [None]:
async def async_fib(n):
    # base cases: fib(0) = 0 and fib(1) = 1
    if n in [0, 1]:
        print(f'{n}: {n}')
        print(f'Took {time.perf_counter() - start:.2f} seconds.')
        return  # skip the loop so the result is not printed twice
    
    # iteratively calculating fib(n), starting from fib(1) = fib(2) = 1
    a, b = 1, 1
    i = 1
    while i < n:
        a, b = b, a + b
        
        if i % 50000 == 0:
            await asyncio.sleep(0) # switches task every 50,000 iterations
        
        i += 1
    
    # printing only the last 20 digits, since the result can be very large
    print(f'{n}: {a % (10 ** 20)}')
    # printing the time elapsed from the beginning
    print(f'Took {time.perf_counter() - start:.2f} seconds.')

With the coroutine implemented, we now need to set up the event loop and add the tasks to its task queue:

In [None]:
loop = asyncio.get_event_loop() # creating the event loop
# adding tasks to the task queue
tasks = [
    loop.create_task(async_fib(1000000)),
    loop.create_task(async_fib(1000)),
    loop.create_task(async_fib(20))
]

start = time.perf_counter()
# run the event loop until all tasks are complete
loop.run_until_complete(asyncio.wait(tasks))

print('Done.')

We see that our problem of responsiveness has been successfully addressed: after running the program, the more light-weight tasks were executed and completed first, and the longest-running task returned last. Again, with asynchronous programming the light-weight tasks can finish first wherever they sit in the task queue, which makes the program more responsive.
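The completion order can be observed directly with a smaller experiment. The sketch below assumes Python 3.7+ run as a script; `busy()` is a hypothetical CPU-bound loop (a running sum rather than Fibonacci) that yields control with `asyncio.sleep(0)` every `yield_every` iterations and records its label when it finishes.

```python
import asyncio

async def busy(label, n, finished, yield_every=1000):
    # CPU-bound loop that hands control back every `yield_every` iterations
    total = 0
    for i in range(1, n + 1):
        total += i
        if i % yield_every == 0:
            await asyncio.sleep(0)   # cooperative switch point
    finished.append(label)
    return total

async def main():
    finished = []
    # The largest task is submitted first, as in the Fibonacci example.
    await asyncio.gather(
        busy('large', 50_000, finished),
        busy('medium', 5_000, finished),
        busy('small', 20, finished),
    )
    return finished

order = asyncio.run(main())
print(order)  # ['small', 'medium', 'large']
```

How often a long task yields is a trade-off: yielding more often makes the other tasks more responsive, but adds scheduling overhead to the long task.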

## CPU-bound and Blocking Tasks

Asynchronous programming is said to provide us more than just better responsiveness; it provides even better execution time as we saw in the counting-down example. However, when analyzing the execution time obtained from the two versions of our Fibonacci-finding program, it was actually the sequential version that resulted in better speed. So does that mean asynchronous programming cannot really provide better speed for our programs?

The answer is that it still can, but more steps are needed, specifically for the Fibonacci-finding program. In the `async_fib()` coroutine of that program, most instructions are CPU-bound. This means that the program is busy computing on the CPU rather than waiting, so there is no waiting time that could be overlapped with other work. While task switching still takes place in our program, which improves its responsiveness, no instructions overlap each other and no additional speed can thus be gained. In fact, since there is considerable overhead in setting up and switching between asynchronous elements, our asynchronous program might, and did, take even longer to execute than the original synchronous program. The same goes for blocking tasks, which also prevent multiple tasks from being overlapped with each other.

This, however, does not mean that if your program contains heavy CPU-bound or blocking tasks, asynchronous programming is out of the question. All execution in an asynchronous program, if not specified otherwise, occurs entirely in the same thread and process, and CPU-bound or blocking tasks can thus prevent program instructions from overlapping each other. However, this is not the case if the tasks are distributed to separate threads/processes. In other words, threading and multiprocessing can help asynchronous programs with blocking instructions achieve better execution time.

## The concurrent.futures Module

Luckily, Python provides an option for this: the `concurrent.futures` module, which is designed to be a high-level interface for running tasks asynchronously. Specifically, `concurrent.futures` works with the `asyncio` module, and in addition provides an abstract class called `Executor`, which defines the interface of the two main classes that implement asynchronous threading and multiprocessing respectively (as suggested by their names): `ThreadPoolExecutor` and `ProcessPoolExecutor`.

Before we jump into the API from `concurrent.futures`, let us discuss the theoretical basics of asynchronous threading/multiprocessing, and how it plays into the framework of asynchronous programming that asyncio provides.

As a reminder, we have three major elements in our ecosystem of asynchronous programming: the event loop, coroutines and their corresponding futures. We still need the event loop while utilizing threading/multiprocessing to coordinate the tasks and handle their returned results (futures), so these elements typically remain consistent with single-thread asynchronous programming.

As for the coroutines, since the idea of combining asynchronous programming with threading and multiprocessing involves avoiding blocking tasks in the coroutines by executing them in separate threads and processes, the coroutines do not necessarily have to be interpreted as actual coroutines by Python anymore. Instead, they can simply be traditional Python functions.

One new element we will need to implement is the executor that facilitates threading or multiprocessing; this can be an instance of the `ThreadPoolExecutor` class or the `ProcessPoolExecutor` class. Now, every time we add a task to our task queue in the event loop, we will also need to reference this executor so that separate tasks will be executed in separate threads/processes. This is done through the `AbstractEventLoop.run_in_executor()` method, which takes an executor, a function to call (again, it does not have to be an actual coroutine), and the arguments for that function, and runs the call in a separate thread or process.

Let us try applying this solution to our Fibonacci-finding program:

In [None]:
from concurrent.futures import ProcessPoolExecutor

async def main():
    tasks = [
        loop.run_in_executor(executor, seq_fib, 1000000),
        loop.run_in_executor(executor, seq_fib, 1000),
        loop.run_in_executor(executor, seq_fib, 20)
    ]
    
    await asyncio.gather(*tasks)

start = time.perf_counter()

executor = ProcessPoolExecutor(max_workers=3)
loop = asyncio.get_event_loop()
loop.run_until_complete(main())

We can see that by combining multiprocessing with asynchronous programming, we get "the best of both worlds": the consistent responsiveness from asynchronous programming, and the improvement in speed from multiprocessing.
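The same pattern works with threads, which are usually the better fit when the blocking work is I/O rather than computation (in standard CPython, CPU-bound code such as the Fibonacci loop generally needs processes to run in parallel). The following is a minimal sketch assuming Python 3.7+ run as a script; `blocking_io()` is a hypothetical function whose `time.sleep()` stands in for a slow blocking call.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_io(label, delay):
    # ordinary blocking function (hypothetical stand-in for a slow call)
    time.sleep(delay)
    return label

async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [
            loop.run_in_executor(pool, blocking_io, 'a', 0.2),
            loop.run_in_executor(pool, blocking_io, 'b', 0.2),
            loop.run_in_executor(pool, blocking_io, 'c', 0.2),
        ]
        # gather() returns the results in submission order
        return await asyncio.gather(*futures)

t0 = time.perf_counter()
labels = asyncio.run(main())
elapsed = time.perf_counter() - t0
print(labels, f'{elapsed:.2f}s')
```

Because the three blocking calls run in separate threads, the total time should be close to one 0.2 second wait rather than the 0.6 seconds a sequential run would need.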

# Asynchronous Web-Scraping

One of the most common applications of asynchronous programming is data collection via web-scraping. This is the process of automating HTTP requests to various websites and extracting specific information from their HTML source code. While interacting with multiple web servers simultaneously, it is possible that some servers will take longer to process our requests than others. This is the perfect opportunity to implement asynchronous programming, as we can overlap the time spent waiting on the servers that take longer to respond with the time spent processing the responses that have already been returned by other servers.

The following diagram further illustrates the concept of asynchronous HTTP server-client communication processes:

<table class="image">
    <caption align="bottom" style="text-align: center">
        Asynchronous Programming Structure<br>
        <a href="https://en.kodedu.com/2013/06/jax-rs-2-and-longpolling-based-chat-application/">Source</a>
    </caption>
    <tr><td><img width=500 src="https://preview.ibb.co/hDuuNK/jax_rs_async.png"></img></td></tr>
</table>
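Before bringing in a real HTTP client, the idea of handling responses as they arrive can be sketched with simulated requests. Everything here is an assumption made for illustration: the host names and latencies are invented, `fake_fetch()` merely sleeps instead of contacting a server, and the code expects Python 3.7+ run as a script.

```python
import asyncio

async def fake_fetch(url, latency):
    # simulated request: the sleep stands in for network latency
    await asyncio.sleep(latency)
    return url

async def main():
    # hypothetical hosts and latencies, for illustration only
    requests = [
        fake_fetch('slow.example', 0.15),
        fake_fetch('fast.example', 0.05),
        fake_fetch('medium.example', 0.10),
    ]
    arrived = []
    # as_completed yields awaitables in the order they finish
    for next_done in asyncio.as_completed(requests):
        arrived.append(await next_done)
    return arrived

arrival_order = asyncio.run(main())
print(arrival_order)  # ['fast.example', 'medium.example', 'slow.example']
```

The responses are processed in order of arrival, not in order of submission, so a slow server does not hold up work on the faster ones.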

In this section we will also be introduced to another module that supports asynchronous programming: `aiohttp` (which stands for Asynchronous I/O HTTP). This module provides high-level functionality that streamlines HTTP communication, and it works well with the `asyncio` module.

## Fetching a Website's HTML Code

Let us first look at how to make a request and obtain the HTML source code from a single website with `aiohttp`. Note that even with only one task (website), our application remains asynchronous, and the structure of an asynchronous program still needs to be implemented. The general structure of the program is as follows:

In [None]:
import aiohttp

async def get_html(session, url):
    async with session.get(url, ssl=False) as res:
        return await res.text()

async def main():
    async with aiohttp.ClientSession() as session:
        html = await get_html(session, 'http://datascience.com')
        print(html[: 1000]) # first 1000 characters

loop = asyncio.get_event_loop()
loop.run_until_complete(main())

Let's consider the `main()` coroutine first. We create an instance of the `aiohttp.ClientSession` class within a context manager; note that we place the `async` keyword in front of the `with` statement as well, since entering and leaving the session are themselves asynchronous operations. Inside this block we call the `get_html()` coroutine and wait for it to return.

Turning our attention to the `get_html()` coroutine, we see that it takes a session object and the URL of the website whose HTML source code we want to extract. Inside this function we use another asynchronous context manager, which makes a GET request and stores the server's response in the variable `res`. Finally, we return the HTML source code stored in the response; since reading the response body is an I/O operation, `text()` is a coroutine, and we therefore need the `await` keyword when we call it.

Finally, we are printing out the first 1,000 characters of the HTML code returned from the website we specified - in this case it is the homepage of [DataScience.com](https://www.datascience.com).

## Writing Files Asynchronously

Most of the time, we would like to collect data by making requests to multiple websites, and simply printing out the response HTML is impractical; instead, we'd like to write the returned HTML to output files. This process is, in essence, asynchronous downloading, which is also used in popular download managers. To do this we will use the `aiofiles` module, which facilitates asynchronous file writing, in combination with `aiohttp` and `asyncio`.

To do this, we first need to modify our `get_html()` coroutine, which originally simply returns the HTML code in the server's response, so that it can now write that HTML code to file in an asynchronous manner.

In [None]:
import aiofiles
import os

async def download_html(session, url):
    async with session.get(url, ssl=False) as res:
        filename = f'output_{os.path.basename(url)}.html'
        
        async with aiofiles.open(filename, 'wb') as f:
            while True:
                chunk = await res.content.read(1024)
                if not chunk:
                    break
                await f.write(chunk)
            
        return await res.release()

Instead of returning the HTML code from the GET request so that it can be printed, we now write it to a file using the `aiofiles` module. To facilitate asynchronous file writing, we use the asynchronous `open()` function from `aiofiles` to open the output file for writing in a context manager. Furthermore, we read the returned HTML in chunks asynchronously using the `read()` function of the response object's `content` attribute; this means that after reading up to 1024 bytes of the current response, the execution flow can be released back to the event loop so that task switching can take place.
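The chunked read loop can be tried without any network access. The sketch below is an approximation under stated assumptions: it uses the standard library's `asyncio.StreamReader`, instantiated directly and pre-filled with 2500 bytes purely for illustration, as a stand-in for the response's `content` stream, and an in-memory `io.BytesIO` buffer in place of an output file. It is not `aiohttp`'s own stream class.

```python
import asyncio
import io

async def copy_in_chunks(reader, sink, chunk_size=1024):
    sizes = []
    while True:
        chunk = await reader.read(chunk_size)  # up to chunk_size bytes
        if not chunk:                          # b'' signals end of stream
            break
        sink.write(chunk)
        sizes.append(len(chunk))
    return sizes

async def main():
    # Pre-filled in-memory stream standing in for a response body.
    reader = asyncio.StreamReader()
    reader.feed_data(b'x' * 2500)
    reader.feed_eof()
    sink = io.BytesIO()
    sizes = await copy_in_chunks(reader, sink)
    return sizes, len(sink.getvalue())

sizes, total = asyncio.run(main())
print(sizes, total)  # [1024, 1024, 452] 2500
```

Reading in fixed-size chunks keeps memory use bounded regardless of how large the response is, at the cost of a few more loop iterations.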

Now we can implement the main part of our program:

In [None]:
async def main(url):
    async with aiohttp.ClientSession() as session:
        await download_html(session, url)
        
urls = [...] # a big list of websites to scrape

loop = asyncio.get_event_loop()
loop.run_until_complete(
    asyncio.gather(*(main(url) for url in urls))
)

The `main()` coroutine takes a URL and passes it to the `download_html()` coroutine, together with an `aiohttp.ClientSession` instance. Finally, in our main program we create an event loop and pass each item in a specified list of URLs to the `main()` coroutine.

Try this program with your favorite websites to see the improvements that can be achieved via asynchronous programming. A note on the legality of web scraping: it is important to know and understand the data-usage policies of the websites you would like to scrape. Even though cooperative web scraping is generally acceptable, some websites do not allow automated tools to collect their data, and scraping them may violate their terms of service or applicable law.

# Conclusion

In this tutorial, we have learned about the foundational ideas of asynchronous programming and the main elements of any asynchronous program. We have seen how to implement asynchronous programs in Python through various examples, and we finally designed a simple asynchronous web-scraping engine. Along the way, we have also seen the advantages of asynchronous programming over traditional sequential programming, such as better speed and improved responsiveness.