# Implementing Concurrency

**`Asynchronous Programming`** is a **programming paradigm** that helps to deal with slow and unpredictable resources (such as `users`) and is widely used to build responsive services and user interfaces. The main techniques used to implement it are **`coroutines`** and **`reactive programming`**.

## Asynchronous Programming
Asynchronous programming is a way of dealing with slow and unpredictable resources. Rather than waiting idly for resources to become available, asynchronous programs can handle multiple resources concurrently and efficiently.

Programming in an asynchronous way can be *challenging* because it is necessary to *deal with external requests that can arrive in any order, may take a variable amount of time, or may fail unpredictably.*


### Waiting for Input/Output
A modern computer employs different kinds of memory to store data and perform operations. In general, a computer combines a small amount of expensive memory that operates very quickly with cheaper, more abundant memory that is slower and is used to store large amounts of data.

**The memory hierarchy:**

<img src="./assets/memory-hierarchy.png">

**`Registers:`** At the top of the memory hierarchy are the CPU **registers**. These are integrated with the CPU and are used to store and execute machine instructions. Accessing data in a register generally takes one clock cycle. This means that if the CPU operates at 3 gigahertz (GHz), that is, 3 billion cycles per second, the time it takes to access one element in a register is on the order of 0.3 nanoseconds.
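
The relationship between clock frequency and cycle time quoted above is a one-line calculation. The sketch below uses a hypothetical helper name, `cycle_time_ns`, purely for illustration.

```python
def cycle_time_ns(frequency_ghz: float) -> float:
    """Return the duration of one clock cycle in nanoseconds."""
    # 1 GHz is one billion cycles per second, so one cycle lasts 1 / f nanoseconds.
    return 1.0 / frequency_ghz


print(f"{cycle_time_ns(3.0):.2f} ns per cycle at 3 GHz")
```

At 3 GHz this prints a cycle time of 0.33 ns, which matches the figure of roughly 0.3 nanoseconds.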

**`CPU Cache:`** Below the registers layer, you can find the CPU cache memory, which comprises multiple levels and is integrated with the processor. The cache operates at a slightly slower speed than the registers but within the same **order of magnitude (OOM)**.

**`Main Memory (RAM):`** Holds much more data but is slower than the cache memory. Fetching an item from memory can take a few hundred clock cycles.

**`Secondary Memory (Persistent Storage):`** At the bottom layer, you can find persistent storage, such as rotating disks (hard disk drives (HDDs)) and solid-state drives (SSDs). These devices hold the most data and are orders of magnitude (OOMs) slower than main memory (RAM). An HDD may take a few milliseconds to seek and retrieve an item, while an SSD is substantially faster and takes only a fraction of a millisecond.

To put the relative speed of each memory type into perspective, if you were to have a CPU with a clock cycle of about 1 second, a register access would be equivalent to picking up a pen from a table. A cache access would be equivalent to picking up a book from a shelf. Moving further down the hierarchy, a RAM access would be equivalent to loading up the laundry (about 20 times slower than the cache). When we move to persistent storage, things are quite different. Retrieving an element from an SSD would be equivalent to going on a 4-day road trip, while retrieving an element from an HDD can take up to 6 months! The duration can stretch even further if we move on to access resources over the network.
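
The analogy can be reproduced with a little arithmetic: stretch one clock cycle to one second and scale every latency by the same factor. The latencies below are rough, assumed orders of magnitude chosen for illustration, not measurements, and the helper name `human_scale_seconds` is hypothetical.

```python
CYCLE_NS = 1 / 3  # one clock cycle at 3 GHz, in nanoseconds

# Assumed, order-of-magnitude latencies in nanoseconds (illustrative only).
ASSUMED_LATENCY_NS = {
    "register": 1 / 3,
    "RAM": 100.0,
    "SSD": 100_000.0,
    "HDD": 5_000_000.0,
}


def human_scale_seconds(latency_ns: float) -> float:
    """Scale a latency so that one clock cycle lasts one second."""
    return latency_ns / CYCLE_NS


for name, latency in ASSUMED_LATENCY_NS.items():
    days = human_scale_seconds(latency) / 86_400  # seconds per day
    print(f"{name:>8}: {days:10.4f} days")
```

With these assumed figures, the SSD access lands at a few days and the HDD access at several months, in line with the analogy.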

Accessing data from **storage** and other **input/output (I/O)** devices is therefore much slower than the CPU, so it is very important to handle those resources in a way that never leaves the CPU stuck waiting aimlessly. This can be accomplished by carefully designed software capable of managing multiple ongoing requests at the same time. This is the idea of **concurrency** or **concurrent programming**.

## Concurrency

**Concurrency** is a way to implement a system that can *deal with multiple requests at the same time*. The idea is that we can move on and start handling other resources while we wait for a resource to become available.

Concurrency works by splitting a task into smaller subtasks that can be executed out of order so that multiple tasks can be partially advanced without waiting for the previous tasks to finish.

For example, let's say we have a web service that takes the square of a number, and the time between our request and the response is approximately 1 second. We can implement a `network_request` function that takes a number and returns a dictionary that contains information about the success of the operation and the result.

In [2]:
import time


def network_request(number: int) -> dict[str, int | bool]:
    time.sleep(1)
    return {
        "success": True,
        "result": number**2,
    }

Let's also write some additional code that performs the request, verifies that the request was successful, and prints the result. The following `fetch_square` function calculates the square of a number using a call to the `network_request` function.

In [3]:
def fetch_square(number: int) -> None:
    response: dict[str, int | bool] = network_request(number=number)
    if response["success"]:
        # Single quotes inside the f-string keep this valid before Python 3.12
        print(f"Result is {response['result']}")

In [4]:
fetch_square(number=2)

Result is 4


Fetching a number from the network takes 1 second because of the slow network. *What if we want to calculate the square of multiple numbers?* We can call `fetch_square` repeatedly, and each call will start its network request as soon as the previous one is done. For example:

In [5]:
fetch_square(number=2)
fetch_square(number=3)
fetch_square(number=4)

Result is 4
Result is 9
Result is 16


As is, this code takes roughly 3 seconds to run, but that's not the best we can do. Notice that the calculations of the squares of 2, 3, and 4 are independent of each other. As such, waiting for a previous result before moving on to the next number is unnecessary if we can submit multiple requests and wait for them at the same time.

In the following diagram, the 3 tasks are represented as boxes. The time spent by the *CPU processing and submitting* the request is in **orange**, while *waiting times* are in **blue**. You can clearly see how most of the time is spent waiting for the resources while our machine sits idle without doing anything else.

<img src="./assets/execution-time-of-independent-calculation.png"/>

Ideally, we would like to start another new task while we are waiting for the already submitted tasks to finish. In the following diagram, you can see that as soon as we submit our network request in `fetch_square(2)`, we can start preparing for `fetch_square(3)`, and so on. This allows us to reduce the CPU waiting time and to start processing the results as soon as they become available.

<img src="./assets/efficient-way-of-performing-independent-calculations.png"/>

Again, this strategy is made possible by the fact that the 3 network requests made through `fetch_square` are completely independent, and we don't need to wait for the completion of a previous task to start the next one.

Also, note how a single CPU can comfortably handle this scenario. While distributing the work on multiple CPUs can further speed up the execution, the speedup will be minimal if the wait time is large compared to the processing time. To implement concurrency, it is necessary to think about our programs and their design differently.

## Callbacks
> **A callback is simply a function that you `pass as an argument` to another function,** with the understanding that the receiving function will `"call back"` the passed-in function at a certain time. Callbacks are often used when you have some task that will complete in the future, such as a network request, reading a file, or another asynchronous operation, and you want to specify what should happen once that task is finished without blocking the entire program while waiting.

The code we have seen so far blocks the execution of the program until the resource is available. The call responsible for the waiting time is `time.sleep`. To make the code start working on other tasks, we need to find a way to avoid blocking the program flow so that the rest of the program can move on to those other tasks. One of the simplest ways to accomplish this behaviour is through **callbacks**.
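
Stripped of any timing, a callback is just a function handed to another function. A minimal sketch, using the hypothetical names `compute_square`, `on_done`, and `report`:

```python
from collections.abc import Callable


def compute_square(number: int, on_done: Callable[[int], None]) -> None:
    result = number**2
    # The caller decides what happens with the result by supplying `on_done`.
    on_done(result)


def report(result: int) -> None:
    print(f"Result is {result}")


compute_square(3, on_done=report)
```

Here the callback runs immediately. In non-blocking code the receiving function instead holds on to the callback and invokes it later, once the slow resource is ready.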

The **callback strategy** is quite similar to what we do when we request a cab. Imagine that you are at a restaurant and you've had a few drinks. It's raining outside, and you'd rather not take the bus, so you request a taxi and ask the driver to call when they're outside so that you don't have to wait in the rain. What you did in this case is request a taxi (that is, the slow resource), but instead of waiting outside until the taxi arrives, you provided your number and instructions (the callback) so that you can come outside when they're ready and go home.

Let's see *how this mechanism can work in code*. We will compare the blocking code of `time.sleep` with the equivalent non-blocking code of `threading.Timer`. The following `wait_and_print` function will block the program execution for 1 second and then print a message.

In [10]:
def wait_and_print(message: str) -> None:
    time.sleep(1)
    print(message)

If we want to write the same function in a **non-blocking** way, we can use the `threading.Timer` class. We can initialize the class instance by passing the amount of time we want to wait and a **Callback**. A **Callback** is *simply a function* that will be called when the timer expires.

In [17]:
import threading


def wait_and_print_async(message: str) -> None:
    def callback() -> None:
        print(message)

    # After 1.0 second delay the timer will call the function:callback
    timer = threading.Timer(interval=1.0, function=callback)
    # Starts the timer to run in a separate thread instead of the Main Thread
    timer.start()

An important feature of the `wait_and_print_async` function is that none of its statements block the execution flow of the program. This technique of registering callbacks for execution in response to certain events is called the Hollywood principle. The name comes from the fact that, after auditioning for a movie or TV role in Hollywood, you may be told *"Don't call us, we'll call you"*, meaning that they won't tell you immediately whether they chose you for the role, but they'll call you if they do.

**Analogy to Understand Callbacks**

Imagine you have a friend who offers to remind you about something after a certain amount of time—say, to water your plants after 1 hour. You give your friend a note with instructions: *"Please tell me 'Remember to water your plants' after 1 hour."*

- **You**: The main program that wants something done after a delay.
- **The Note (Your Instructions)**: This is like your callback function—it's your predefined action (what to say to remind you).
- **Your Friend (The Timer)**: Your friend will not remind you immediately. Instead, they will hold onto your instructions (the callback) and, after 1 hour has passed, they will execute those instructions—i.e., they will read them out loud to remind you.

You’re not standing around waiting for your friend to remind you. You go about your day. The friend takes care of the timing and "calls back" to you with the message. In this analogy, the callback is the set of instructions you left with your friend. The friend (timer) decides when to execute them.

**How is `threading.Timer` capable of waiting without blocking?**

The strategy used by `threading.Timer` involves starting a new thread. That thread does the waiting and then calls the callback, while the main thread continues running the rest of the program.
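
The idea can be sketched with a small thread subclass that waits on its own thread and then invokes the callback. This is a simplified illustration under the hypothetical name `SimpleTimer`, not the standard library's actual implementation.

```python
import threading
from collections.abc import Callable


class SimpleTimer(threading.Thread):
    """Hypothetical, simplified stand-in for threading.Timer."""

    def __init__(self, interval: float, function: Callable[[], None]) -> None:
        super().__init__()
        self.interval = interval
        self.function = function
        self.cancelled = threading.Event()

    def cancel(self) -> None:
        # Wake the waiting thread early and prevent the callback from running.
        self.cancelled.set()

    def run(self) -> None:
        # Only this worker thread blocks here; the caller's thread stays free.
        self.cancelled.wait(self.interval)
        if not self.cancelled.is_set():
            self.function()


timer = SimpleTimer(0.1, lambda: print("Timer fired"))
timer.start()  # returns immediately
print("Main thread keeps going")
timer.join()  # only needed here so the script waits for the callback
```

The call to `start()` returns right away, so the main thread is free to do other work while the worker thread sits out the interval.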


To highlight the difference between the blocking and non-blocking versions of `wait_and_print`, we can test and compare the execution of the two versions.

In [18]:
# Synchronous
wait_and_print(message="First Call")
wait_and_print(message="Second Call")
print("After Call")

First Call
Second Call
After Call


In [22]:
# Asynchronous
wait_and_print_async(message="First call async")
wait_and_print_async(message="Second call async")
print("After submission")

After submission


Second call async
