## Task 4p: Asynchronous Concurrency with Trio
**Goal**: Learn about asynchronous operations using a Python library called Trio.

**Learning Outcomes**: Gain a basic understanding of parallelized I/O and learn how to use Trio to implement it.

**Prerequisites**: Basic understanding of Python.

### Part 1: Introduction to Parallelization
Parallelization allows you to do multiple things at the same time. This is particularly important for tasks that involve waiting, like web scraping. Strictly speaking, Trio provides concurrency rather than true parallelism: tasks take turns on a single thread and switch whenever one of them is waiting.

For example, if you want to fetch a lot of webpages in parallel, your program needs to manage lots of connections at the same time.

First, we will introduce you to async functions in Python. Here is how you would define a regular function compared to an async function:
```python
# A regular function
def regular_double(x):
    return 2 * x

# An async function
async def async_double(x):
    return 2 * x
```

To call an async function and get its result, use the ```await``` keyword from inside another async function.
```python
async def print_double(x):
    print(await async_double(x)) 
```
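To see why ```await``` is needed, here is a minimal sketch using only the standard library. Calling an async function without ```await``` does not run its body; it only creates a coroutine object. The variable names ```coro``` and ```is_coroutine``` are illustrative.

```python
import inspect


async def async_double(x):
    return 2 * x


# Calling without `await` does not execute the body; it builds a coroutine object.
coro = async_double(3)
is_coroutine = inspect.iscoroutine(coro)
print(is_coroutine)  # True

# Close the coroutine explicitly so Python does not warn that it was never awaited.
coro.close()
```

The coroutine only produces its value once something awaits it or an event loop such as Trio's runs it.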

In [6]:
from IPython.display import Image

### Part 2: Trio, an async library

To run an async function from regular (synchronous) code, pass it to ```trio.run```, for example:
```python
import trio

async def async_double(x):
    return 2 * x

trio.run(async_double, 3)  # returns 6
```

Here, the second argument to ```trio.run``` is passed as the input to the ```async_double``` function.

To really see async in action, we need to observe multiple functions running concurrently. Below is an example. When you run it, you can see that child 2 finishes before child 1, even though child 1 was spawned first, because both functions run at the same time and child 2 sleeps for less time. Here is an image to help you understand:

![Image](../img/trio-image.jpg)

In [7]:
import trio
import time

### SYNCHRONOUS EXECUTION
def synchronous_child1():
    print("  child1: started! sleeping now...")
    time.sleep(3)
    print("  child1: woke up! exiting!")


def synchronous_child2():
    print("  child2: started! sleeping now...")
    time.sleep(2)
    print("  child2: woke up! exiting!")


def synchronous_parent():
    start_time = time.time()
    print("parent: started!")
    
    print("parent: spawning child1...")
    synchronous_child1()

    print("parent: spawning child2...")
    synchronous_child2()
    end_time = time.time()
    print(f"parent: all done! {end_time - start_time} seconds to complete")

### ASYNCHRONOUS EXECUTION
async def child1():
    print("  child1: started! sleeping now...")
    await trio.sleep(3)
    print("  child1: woke up! exiting!")


async def child2():
    print("  child2: started! sleeping now...")
    await trio.sleep(2)
    print("  child2: woke up! exiting!")


async def parent():
    start_time = time.time()
    print("parent: started!")
    async with trio.open_nursery() as nursery:
        print("parent: spawning child1...")
        nursery.start_soon(child1)

        print("parent: spawning child2...")
        nursery.start_soon(child2)

        print("parent: waiting for children to finish...")
        # -- we exit the nursery block here --
    end_time = time.time()
    print(f"parent: all done! {end_time - start_time} seconds to complete")

print("running synchronous parent")
synchronous_parent()
print("--------------------------------")
print("running async parent")
trio.run(parent)

running synchronous parent
parent: started!
parent: spawning child1...
  child1: started! sleeping now...
  child1: woke up! exiting!
parent: spawning child2...
  child2: started! sleeping now...
  child2: woke up! exiting!
parent: all done! 5.001297235488892 seconds to complete
--------------------------------
running async parent
parent: started!
parent: spawning child1...
parent: spawning child2...
parent: waiting for children to finish...
  child2: started! sleeping now...
  child1: started! sleeping now...
  child2: woke up! exiting!
  child1: woke up! exiting!
parent: all done! 3.004193067550659 seconds to complete


### Part 3: Scraping from the Web Concurrently
Now that you know how to use async functions in Trio, your task is to download three books from the internet. The code cell below gives you three files to download. To download them, start an ```httpx.AsyncClient()``` and call its ```get``` method with each file's URL.

```python
client = httpx.AsyncClient()
await client.get(url)
```

If your code is correct, *War and Peace* should usually be the last download to finish, since it is the largest file.

In [None]:
## TASK ONLY
import trio
import httpx
import os
import time
from urllib.parse import urlparse

# List of files to download
FILES_TO_DOWNLOAD = {
    "War and Peace": "https://www.gutenberg.org/files/2600/2600-0.txt",
    "Pride and Prejudice": "https://www.gutenberg.org/cache/epub/1342/pg1342.txt",
    "The Adventures of Sherlock Holmes": "https://www.gutenberg.org/files/1661/1661-0.txt",
}

# configuration
GLOBAL_TIMEOUT_SECONDS = 60
DOWNLOAD_DIR = "downloads"
CHUNK_SIZE = 4096  # 4KB chunks
os.makedirs(DOWNLOAD_DIR, exist_ok=True)

async def download_file(url, client, name, download_dir):
    """Download a file and track progress."""
    ### YOUR CODE STARTS HERE

    ### YOUR CODE ENDS HERE
    print(f"Downloaded {name} from {url}")
    print(response.content[:100])


async def main():
    """Main function that orchestrates the downloads."""
    async with httpx.AsyncClient() as client:
        ### YOUR CODE STARTS HERE
        pass 
        ### YOUR CODE ENDS HERE

start_time = time.time()
trio.run(main)
end_time = time.time()
print(f"Total time taken: {end_time - start_time} seconds")

In [2]:
# CLAUDE ANSWER more complicated than my own answer
import trio
import httpx
import os
import time
from urllib.parse import urlparse

# List of files to download
FILES_TO_DOWNLOAD = {
    "War and Peace": "https://www.gutenberg.org/files/2600/2600-0.txt",
    "Pride and Prejudice": "https://www.gutenberg.org/cache/epub/1342/pg1342.txt",
    "The Adventures of Sherlock Holmes": "https://www.gutenberg.org/files/1661/1661-0.txt",
}

# configuration
GLOBAL_TIMEOUT_SECONDS = 60
DOWNLOAD_DIR = "downloads"
CHUNK_SIZE = 4096  # 4KB chunks
os.makedirs(DOWNLOAD_DIR, exist_ok=True)

async def download_file(url, client, name, download_dir):
    """Download a file and track progress."""
    file_path = os.path.join(download_dir, f"{name}.txt")
    
    async with client.stream('GET', url) as response:
        total_size = int(response.headers.get('content-length', 0))
        bytes_downloaded = 0
        
        with open(file_path, 'wb') as f:
            async for chunk in response.aiter_bytes(chunk_size=CHUNK_SIZE):
                f.write(chunk)
                bytes_downloaded += len(chunk)
                progress = (bytes_downloaded / total_size) * 100 if total_size > 0 else 0
                print(f"\rDownloading {name}: {progress:.2f}%", end="", flush=True)
    
    print(f"\nDownloaded {name} from {url}")
    with open(file_path, 'r', encoding='utf-8') as f:
        print(f.read(100))

async def main():
    """Main function that orchestrates the downloads."""
    async with httpx.AsyncClient() as client:
        async with trio.open_nursery() as nursery:
            for name, url in FILES_TO_DOWNLOAD.items():
                nursery.start_soon(download_file, url, client, name, DOWNLOAD_DIR)

start_time = time.time()
trio.run(main)
end_time = time.time()
print(f"Total time taken: {end_time - start_time} seconds")

Downloading The Adventures of Sherlock Holmes: 100.00%
Downloaded The Adventures of Sherlock Holmes from https://www.gutenberg.org/files/1661/1661-0.txt
﻿The Project Gutenberg eBook of The Adventures of Sherlock Holmes,
by Arthur Conan Doyle

This eBook
Downloading Pride and Prejudice: 100.00%
Downloaded Pride and Prejudice from https://www.gutenberg.org/cache/epub/1342/pg1342.txt
﻿The Project Gutenberg eBook of Pride and Prejudice
    
This ebook is for the use of anyone anywher
Downloading War and Peace: 100.00%
Downloaded War and Peace from https://www.gutenberg.org/files/2600/2600-0.txt
﻿The Project Gutenberg eBook of War and Peace, by Leo Tolstoy

This eBook is for the use of anyone a
Total time taken: 1.2005910873413086 seconds


In [22]:
# MY ANSWER
import trio
import httpx
import os
import time
from urllib.parse import urlparse

# List of files to download
FILES_TO_DOWNLOAD = {
    "War and Peace": "https://www.gutenberg.org/files/2600/2600-0.txt",
    "Pride and Prejudice": "https://www.gutenberg.org/cache/epub/1342/pg1342.txt",
    "The Adventures of Sherlock Holmes": "https://www.gutenberg.org/files/1661/1661-0.txt",
}

# configuration
GLOBAL_TIMEOUT_SECONDS = 60
DOWNLOAD_DIR = "downloads"
CHUNK_SIZE = 4096  # 4KB chunks
os.makedirs(DOWNLOAD_DIR, exist_ok=True)

async def download_file(url, client, name, download_dir):
    """Download a file and track progress."""
    ### YOUR CODE STARTS HERE
    response = await client.get(url)
    with open(os.path.join(download_dir, f'{name}.txt'), "wb") as f:
        f.write(response.content)
    ### YOUR CODE ENDS HERE
    print(f"Downloaded {name} from {url}")
    print(response.content[:100])


async def main():
    """Main function that orchestrates the downloads."""
    async with httpx.AsyncClient() as client:
        ### YOUR CODE STARTS HERE
        async with trio.open_nursery() as nursery:
            for name, url in FILES_TO_DOWNLOAD.items():
                print(f"Downloading {name}...")
                nursery.start_soon(download_file, url, client, name, DOWNLOAD_DIR)
        ### YOUR CODE ENDS HERE

start_time = time.time()
trio.run(main)
end_time = time.time()
print(f"Total time taken: {end_time - start_time} seconds")


Downloading War and Peace...
Downloading Pride and Prejudice...
Downloading The Adventures of Sherlock Holmes...
Downloaded Pride and Prejudice from https://www.gutenberg.org/cache/epub/1342/pg1342.txt
b'\xef\xbb\xbfThe Project Gutenberg eBook of Pride and Prejudice\r\n    \r\nThis ebook is for the use of anyone any'
Downloaded The Adventures of Sherlock Holmes from https://www.gutenberg.org/files/1661/1661-0.txt
b'\xef\xbb\xbfThe Project Gutenberg eBook of The Adventures of Sherlock Holmes,\r\nby Arthur Conan Doyle\r\n\r\nThis '
Downloaded War and Peace from https://www.gutenberg.org/files/2600/2600-0.txt
b'\xef\xbb\xbfThe Project Gutenberg eBook of War and Peace, by Leo Tolstoy\r\n\r\nThis eBook is for the use of anyo'
Total time taken: 2.6476690769195557 seconds


In [25]:
import json
async with httpx.AsyncClient() as client:
    result = await client.get("https://api.worldbank.org/v2/country/br/indicator/NY.GDP.MKTP.CD?date=2006:2020&format=json")
    print(result.json())


[{'page': 1, 'pages': 1, 'per_page': 50, 'total': 15, 'sourceid': '2', 'lastupdated': '2025-03-24'}, [{'indicator': {'id': 'NY.GDP.MKTP.CD', 'value': 'GDP (current US$)'}, 'country': {'id': 'BR', 'value': 'Brazil'}, 'countryiso3code': 'BRA', 'date': '2020', 'value': 1476107231194.11, 'unit': '', 'obs_status': '', 'decimal': 0}, {'indicator': {'id': 'NY.GDP.MKTP.CD', 'value': 'GDP (current US$)'}, 'country': {'id': 'BR', 'value': 'Brazil'}, 'countryiso3code': 'BRA', 'date': '2019', 'value': 1873288205186.45, 'unit': '', 'obs_status': '', 'decimal': 0}, {'indicator': {'id': 'NY.GDP.MKTP.CD', 'value': 'GDP (current US$)'}, 'country': {'id': 'BR', 'value': 'Brazil'}, 'countryiso3code': 'BRA', 'date': '2018', 'value': 1916933898038.36, 'unit': '', 'obs_status': '', 'decimal': 0}, {'indicator': {'id': 'NY.GDP.MKTP.CD', 'value': 'GDP (current US$)'}, 'country': {'id': 'BR', 'value': 'Brazil'}, 'countryiso3code': 'BRA', 'date': '2017', 'value': 2063514977334.32, 'unit': '', 'obs_status': '', '

In [26]:
import pandas as pd
pd.DataFrame.from_dict(result.json()[1])

Unnamed: 0,indicator,country,countryiso3code,date,value,unit,obs_status,decimal
0,"{'id': 'NY.GDP.MKTP.CD', 'value': 'GDP (curren...","{'id': 'BR', 'value': 'Brazil'}",BRA,2020,1476107000000.0,,,0
1,"{'id': 'NY.GDP.MKTP.CD', 'value': 'GDP (curren...","{'id': 'BR', 'value': 'Brazil'}",BRA,2019,1873288000000.0,,,0
2,"{'id': 'NY.GDP.MKTP.CD', 'value': 'GDP (curren...","{'id': 'BR', 'value': 'Brazil'}",BRA,2018,1916934000000.0,,,0
3,"{'id': 'NY.GDP.MKTP.CD', 'value': 'GDP (curren...","{'id': 'BR', 'value': 'Brazil'}",BRA,2017,2063515000000.0,,,0
4,"{'id': 'NY.GDP.MKTP.CD', 'value': 'GDP (curren...","{'id': 'BR', 'value': 'Brazil'}",BRA,2016,1795693000000.0,,,0
5,"{'id': 'NY.GDP.MKTP.CD', 'value': 'GDP (curren...","{'id': 'BR', 'value': 'Brazil'}",BRA,2015,1802212000000.0,,,0
6,"{'id': 'NY.GDP.MKTP.CD', 'value': 'GDP (curren...","{'id': 'BR', 'value': 'Brazil'}",BRA,2014,2456044000000.0,,,0
7,"{'id': 'NY.GDP.MKTP.CD', 'value': 'GDP (curren...","{'id': 'BR', 'value': 'Brazil'}",BRA,2013,2472820000000.0,,,0
8,"{'id': 'NY.GDP.MKTP.CD', 'value': 'GDP (curren...","{'id': 'BR', 'value': 'Brazil'}",BRA,2012,2465228000000.0,,,0
9,"{'id': 'NY.GDP.MKTP.CD', 'value': 'GDP (curren...","{'id': 'BR', 'value': 'Brazil'}",BRA,2011,2616156000000.0,,,0
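In the table above, the ```indicator``` and ```country``` columns still hold nested dictionaries. If flat columns are easier to work with, ```pd.json_normalize``` can expand them. The sketch below uses two records copied from the response shown earlier, trimmed to a few fields, and the variable names are illustrative.

```python
import pandas as pd

# Two records taken from the World Bank response above, trimmed to a few fields.
records = [
    {
        "indicator": {"id": "NY.GDP.MKTP.CD", "value": "GDP (current US$)"},
        "country": {"id": "BR", "value": "Brazil"},
        "countryiso3code": "BRA",
        "date": "2020",
        "value": 1476107231194.11,
    },
    {
        "indicator": {"id": "NY.GDP.MKTP.CD", "value": "GDP (current US$)"},
        "country": {"id": "BR", "value": "Brazil"},
        "countryiso3code": "BRA",
        "date": "2019",
        "value": 1873288205186.45,
    },
]

# Nested dictionaries become dotted column names such as "country.value".
flat = pd.json_normalize(records)
print(sorted(flat.columns))
```

With the live data, the same call would be ```pd.json_normalize(result.json()[1])```.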


In [27]:
import requests
all_countries_url = "https://api.worldbank.org/v2/country?format=json"

response = requests.get(all_countries_url)
all_countries = pd.DataFrame.from_dict(response.json()[1]) 

In [28]:
all_countries

Unnamed: 0,id,iso2Code,name,region,adminregion,incomeLevel,lendingType,capitalCity,longitude,latitude
0,ABW,AW,Aruba,"{'id': 'LCN', 'iso2code': 'ZJ', 'value': 'Lati...","{'id': '', 'iso2code': '', 'value': ''}","{'id': 'HIC', 'iso2code': 'XD', 'value': 'High...","{'id': 'LNX', 'iso2code': 'XX', 'value': 'Not ...",Oranjestad,-70.0167,12.5167
1,AFE,ZH,Africa Eastern and Southern,"{'id': 'NA', 'iso2code': 'NA', 'value': 'Aggre...","{'id': '', 'iso2code': '', 'value': ''}","{'id': 'NA', 'iso2code': 'NA', 'value': 'Aggre...","{'id': '', 'iso2code': '', 'value': 'Aggregates'}",,,
2,AFG,AF,Afghanistan,"{'id': 'SAS', 'iso2code': '8S', 'value': 'Sout...","{'id': 'SAS', 'iso2code': '8S', 'value': 'Sout...","{'id': 'LIC', 'iso2code': 'XM', 'value': 'Low ...","{'id': 'IDX', 'iso2code': 'XI', 'value': 'IDA'}",Kabul,69.1761,34.5228
3,AFR,A9,Africa,"{'id': 'NA', 'iso2code': 'NA', 'value': 'Aggre...","{'id': '', 'iso2code': '', 'value': ''}","{'id': 'NA', 'iso2code': 'NA', 'value': 'Aggre...","{'id': '', 'iso2code': '', 'value': 'Aggregates'}",,,
4,AFW,ZI,Africa Western and Central,"{'id': 'NA', 'iso2code': 'NA', 'value': 'Aggre...","{'id': '', 'iso2code': '', 'value': ''}","{'id': 'NA', 'iso2code': 'NA', 'value': 'Aggre...","{'id': '', 'iso2code': '', 'value': 'Aggregates'}",,,
5,AGO,AO,Angola,"{'id': 'SSF', 'iso2code': 'ZG', 'value': 'Sub-...","{'id': 'SSA', 'iso2code': 'ZF', 'value': 'Sub-...","{'id': 'LMC', 'iso2code': 'XN', 'value': 'Lowe...","{'id': 'IBD', 'iso2code': 'XF', 'value': 'IBRD'}",Luanda,13.242,-8.81155
6,ALB,AL,Albania,"{'id': 'ECS', 'iso2code': 'Z7', 'value': 'Euro...","{'id': 'ECA', 'iso2code': '7E', 'value': 'Euro...","{'id': 'UMC', 'iso2code': 'XT', 'value': 'Uppe...","{'id': 'IBD', 'iso2code': 'XF', 'value': 'IBRD'}",Tirane,19.8172,41.3317
7,AND,AD,Andorra,"{'id': 'ECS', 'iso2code': 'Z7', 'value': 'Euro...","{'id': '', 'iso2code': '', 'value': ''}","{'id': 'HIC', 'iso2code': 'XD', 'value': 'High...","{'id': 'LNX', 'iso2code': 'XX', 'value': 'Not ...",Andorra la Vella,1.5218,42.5075
8,ARB,1A,Arab World,"{'id': 'NA', 'iso2code': 'NA', 'value': 'Aggre...","{'id': '', 'iso2code': '', 'value': ''}","{'id': 'NA', 'iso2code': 'NA', 'value': 'Aggre...","{'id': '', 'iso2code': '', 'value': 'Aggregates'}",,,
9,ARE,AE,United Arab Emirates,"{'id': 'MEA', 'iso2code': 'ZQ', 'value': 'Midd...","{'id': '', 'iso2code': '', 'value': ''}","{'id': 'HIC', 'iso2code': 'XD', 'value': 'High...","{'id': 'LNX', 'iso2code': 'XX', 'value': 'Not ...",Abu Dhabi,54.3705,24.4764
