## Task 4: Asynchronous Concurrency with Trio
**Goal**: We will learn about asynchronous operations using a Python library called Trio.

**Learning Outcomes**: Gain a basic understanding of parallelized I/O and learn how to use Trio to implement it.

**Prerequisites**: Basic understanding of Python.

### Part 1: Introduction to Parallelization
Parallelization allows you to do multiple things at the same time. This is particularly important for tasks that involve waiting, such as web scraping. Strictly speaking, Trio gives you concurrency rather than true parallelism: tasks take turns on a single thread, and one task runs while the others are waiting.

For example, if you want to fetch a lot of webpages in parallel, your program needs to manage lots of connections at the same time.

First, we will introduce you to async functions in Python. Here is how you would define a regular function compared to an async function:
```python
# A regular function
def regular_double(x):
    return 2 * x

# An async function
async def async_double(x):
    return 2 * x
```

To call an async function, use the ```await``` keyword. Note that ```await``` is only valid inside another async function.
```python
async def print_double(x):
    print(await async_double(x)) 
```
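One detail that is easy to miss: calling an async function without ```await``` does not run its body. It only creates a coroutine object. The following is a minimal sketch using only the standard library; the variable names ```coro``` and ```is_coroutine``` are illustrative and not part of the original task.

```python
import inspect


async def async_double(x):
    return 2 * x


# Calling the async function does NOT execute its body.
# It only builds a coroutine object that still has to be awaited or run.
coro = async_double(3)
is_coroutine = inspect.iscoroutine(coro)
print(is_coroutine)

# Close the unused coroutine so Python does not warn that it was never awaited.
coro.close()
```

If you ever see a "coroutine ... was never awaited" warning, a forgotten ```await``` is the most likely cause.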

### Part 2: Trio, an Async Library

To run an async function from regular (synchronous) code, all you have to do is call ```trio.run```, for example:
```python
import trio

async def async_double(x):
    return 2 * x

trio.run(async_double, 3)  # returns 6
```

Here the second parameter is the input to the ```async_double``` function. 

To really see async in action, we need to observe multiple functions running concurrently. Below is an example. When you run it, you can see that child 2 finishes before child 1, even though child 1 was spawned first, because both functions run concurrently and child 2 sleeps for less time. The synchronous version takes about 5 seconds (3 + 2), while the async version takes about 3 seconds, the duration of the longest sleep. Here is an image to help you understand:

![Image](img/trio-image.jpg)

In [7]:
import trio
import time

### SYNCHRONOUS EXECUTION
def synchronous_child1():
    print("  child1: started! sleeping now...")
    time.sleep(3)
    print("  child1: woke up! exiting!")


def synchronous_child2():
    print("  child2: started! sleeping now...")
    time.sleep(2)
    print("  child2: woke up! exiting!")


def synchronous_parent():
    start_time = time.time()
    print("parent: started!")
    
    print("parent: spawning child1...")
    synchronous_child1()

    print("parent: spawning child2...")
    synchronous_child2()
    end_time = time.time()
    print(f"parent: all done! {end_time - start_time} seconds to complete")

### ASYNCHRONOUS EXECUTION
async def child1():
    print("  child1: started! sleeping now...")
    await trio.sleep(3)
    print("  child1: woke up! exiting!")


async def child2():
    print("  child2: started! sleeping now...")
    await trio.sleep(2)
    print("  child2: woke up! exiting!")


async def parent():
    start_time = time.time()
    print("parent: started!")
    async with trio.open_nursery() as nursery:
        print("parent: spawning child1...")
        nursery.start_soon(child1)

        print("parent: spawning child2...")
        nursery.start_soon(child2)

        print("parent: waiting for children to finish...")
        # -- we exit the nursery block here --
    end_time = time.time()
    print(f"parent: all done! {end_time - start_time} seconds to complete")

print("running synchronous parent")
synchronous_parent()
print("--------------------------------")
print("running async parent")
trio.run(parent)

running synchronous parent
parent: started!
parent: spawning child1...
  child1: started! sleeping now...
  child1: woke up! exiting!
parent: spawning child2...
  child2: started! sleeping now...
  child2: woke up! exiting!
parent: all done! 5.001297235488892 seconds to complete
--------------------------------
running async parent
parent: started!
parent: spawning child1...
parent: spawning child2...
parent: waiting for children to finish...
  child2: started! sleeping now...
  child1: started! sleeping now...
  child2: woke up! exiting!
  child1: woke up! exiting!
parent: all done! 3.004193067550659 seconds to complete


### Part 3: Scraping from the Web Concurrently
Now that you know how to use async functions in Trio, your task is to scrape three books from the internet. In the code snippet below, you are given three files to download. To download these files, you can create an ```httpx.AsyncClient()``` and call its ```get``` method to download each file.

```python
client = httpx.AsyncClient()
await client.get(url)
```

If your code is correct, *War and Peace* should be the last download to finish, since it is the largest file.

In [22]:
import trio
import httpx
import os
import time
from urllib.parse import urlparse

# List of files to download
FILES_TO_DOWNLOAD = {
    "War and Peace": "https://www.gutenberg.org/files/2600/2600-0.txt",
    "Pride and Prejudice": "https://www.gutenberg.org/cache/epub/1342/pg1342.txt",
    "The Adventures of Sherlock Holmes": "https://www.gutenberg.org/files/1661/1661-0.txt",
}

# configuration
GLOBAL_TIMEOUT_SECONDS = 60
DOWNLOAD_DIR = "downloads"
CHUNK_SIZE = 4096  # 4KB chunks
os.makedirs(DOWNLOAD_DIR, exist_ok=True)

async def download_file(url, client, name, download_dir):
    """Download a file and track progress."""
    ### YOUR CODE STARTS HERE
    response = await client.get(url)
    with open(os.path.join(download_dir, f'{name}.txt'), "wb") as f:
        f.write(response.content)
    ### YOUR CODE ENDS HERE
    print(f"Downloaded {name} from {url}")
    print(response.content[:100])


async def main():
    """Main function that orchestrates the downloads."""
    async with httpx.AsyncClient() as client:
        ### YOUR CODE STARTS HERE
        async with trio.open_nursery() as nursery:
            for name, url in FILES_TO_DOWNLOAD.items():
                print(f"Downloading {name}...")
                nursery.start_soon(download_file, url, client, name, DOWNLOAD_DIR)
        ### YOUR CODE ENDS HERE

start_time = time.time()
trio.run(main)
end_time = time.time()
print(f"Total time taken: {end_time - start_time} seconds")


Downloading War and Peace...
Downloading Pride and Prejudice...
Downloading The Adventures of Sherlock Holmes...
Downloaded Pride and Prejudice from https://www.gutenberg.org/cache/epub/1342/pg1342.txt
b'\xef\xbb\xbfThe Project Gutenberg eBook of Pride and Prejudice\r\n    \r\nThis ebook is for the use of anyone any'
Downloaded The Adventures of Sherlock Holmes from https://www.gutenberg.org/files/1661/1661-0.txt
b'\xef\xbb\xbfThe Project Gutenberg eBook of The Adventures of Sherlock Holmes,\r\nby Arthur Conan Doyle\r\n\r\nThis '
Downloaded War and Peace from https://www.gutenberg.org/files/2600/2600-0.txt
b'\xef\xbb\xbfThe Project Gutenberg eBook of War and Peace, by Leo Tolstoy\r\n\r\nThis eBook is for the use of anyo'
Total time taken: 2.6476690769195557 seconds
