# Asyncio
* Asyncio is related to multiprocessing and threading
* Threading
    * Achieves concurrent computing through the creation of multiple threads
    * Restricted by the GIL (Global Interpreter Lock)
    * In CPython, the GIL lets only one thread execute Python bytecode at a time, so threads cannot run Python code in parallel
    * Useful for I/O-bound tasks
* Multiprocessing
    * Achieves concurrent computing through multiple operating system processes
    * Operating system scheduling of processes on CPUs allows multiprocessing to achieve parallelism
    * Useful for CPU-bound tasks
* Asyncio
    * Achieves concurrent computing through the use of "coroutines"
    * Built around the idea of cooperative multitasking
    * Code can be scheduled / paused, allowing other coroutines to execute
    * By avoiding blocking, tasks run when needed and yield when waiting
    * Feels like concurrent code, but runs in a single thread
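
A minimal sketch of the cooperative, single-threaded behaviour described above. The worker coroutine and its names are hypothetical, and the sketch is written for plain Python; in a notebook, replace asyncio.run(main()) with await main().

```python
import asyncio
import threading

async def worker(name, steps, log):
    for step in range(steps):
        # Record which thread executes this step
        log.append((name, step, threading.get_ident()))
        # Yield control to the event loop so the other coroutine can run
        await asyncio.sleep(0)

async def main():
    log = []
    await asyncio.gather(worker('A', 3, log), worker('B', 3, log))
    return log

log = asyncio.run(main())
order = [name for name, _, _ in log]
thread_ids = {ident for _, _, ident in log}
print(order)            # the two workers take turns
print(len(thread_ids))  # every step ran on the same thread
```

Each await is a point where a coroutine voluntarily hands control back, which is what makes the multitasking cooperative rather than preemptive.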

# Setup

In [13]:
#conda install aiohttp --yes

In [35]:
import asyncio
import aiohttp
import random
import requests
import time
import queue
import concurrent.futures
from contextlib import asynccontextmanager

In [15]:
import logging
import sys
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logger = logging.getLogger()
#logger.setLevel(logging.DEBUG)
logger.debug('Hello')

DEBUG:root:Hello


# Async, Await, Coroutine & Event Loops
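* Defining a function with async def creates a coroutine function; calling it returns a coroutine object
* A coroutine is a schedulable block of code that can pause at each await and resume later (similar in spirit to a generator)
* To run a coroutine and get its result, you await it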
* An event loop manages the execution of coroutines
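
As a small illustration of the points above, the following sketch (plain Python, not notebook-specific) shows that calling a coroutine function only creates a coroutine object; nothing runs until an event loop drives it. Here asyncio.run supplies the loop, whereas the notebook cells below use top-level await because Jupyter already has one running.

```python
import asyncio
import inspect

async def work():
    await asyncio.sleep(0.1)
    return 'done'

coro = work()                        # creates a coroutine object; the body has not started
is_coro = inspect.iscoroutine(coro)  # True
result = asyncio.run(coro)           # the event loop runs the coroutine to completion
print(is_coro, result)
```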

In [16]:
async def work():
    logger.debug('Work Started')
    await asyncio.sleep(1)
    logger.debug('Work Finished')
    
await work()

DEBUG:root:Work Started
DEBUG:root:Work Finished


## Return
* If we want the results, we can assign the return value

In [17]:
async def work():
    logger.debug('Work Started')
    await asyncio.sleep(1)
    logger.debug('Work Finished')
    return 1
    
reply = await work()
print(reply)

DEBUG:root:Work Started
DEBUG:root:Work Finished
1


## Gather
* For multiple responses, we can gather them

In [18]:
async def work():
    logger.debug('Work Started')
    await asyncio.sleep(1)
    logger.debug('Work Finished')
    return 1
    
reply = await asyncio.gather(*[work(), work()])
print(reply)

DEBUG:root:Work Started
DEBUG:root:Work Started
DEBUG:root:Work Finished
DEBUG:root:Work Finished
[1, 1]
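
By default, gather propagates the first exception raised by any of the awaitables. A hedged sketch of the return_exceptions=True option, which collects exceptions in the result list instead (the failing task is contrived for illustration; written for plain Python):

```python
import asyncio

async def work(n):
    await asyncio.sleep(0.1)
    if n == 2:
        raise ValueError(f'task {n} failed')
    return n

async def main():
    # Results come back in the order the awaitables were passed in
    return await asyncio.gather(*(work(n) for n in range(4)),
                                return_exceptions=True)

results = asyncio.run(main())
for item in results:
    print(repr(item))
```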


# Example
* We have some work to do
* We log starting and completion
* To simulate the work, we sleep for 2 seconds
* We have 5 work tasks to complete

In [19]:
async def work(task):
    logger.debug(f'  Starting Work {task}')
    await asyncio.sleep(2)
    logger.debug(f'  Finished Work {task}')
  
async def schedule(tasks):
    replies = await asyncio.gather(*(work(task) for task in range(tasks)))
    return replies

# Call if running in plain python
#if __name__ == '__main__':
#    asyncio.run(schedule(5))

reply = await schedule(5)

DEBUG:root:  Starting Work 0
DEBUG:root:  Starting Work 1
DEBUG:root:  Starting Work 2
DEBUG:root:  Starting Work 3
DEBUG:root:  Starting Work 4
DEBUG:root:  Finished Work 0
DEBUG:root:  Finished Work 1
DEBUG:root:  Finished Work 2
DEBUG:root:  Finished Work 3
DEBUG:root:  Finished Work 4


# Patterns
* In this next section, we look at several common patterns & uses
* Chained Jobs - Piping coroutines
* Callback - Getting a callback when complete
* Queue - Passing data between coroutines
* Polling - Periodically checking on a value
* Fire & Forget - Run a task in the background
* Context Manager - Processing a web page with context managers 

# Chained Jobs
* Chaining coroutines is similar to piping commands in a shell
* In this trivial example we have:
    * Job1 --> Loops num times
    * Job1 --> Calls Job2
    * Job2 --> Doubles the value
    * Job2 --> Calls Job3
    * Job3 --> Adds one

In [20]:
async def job3(y):
    # Add 1 to value
    return y + 1

async def job2(x):
    # Double
    x = x * 2
    response = await job3(x)
    return response

async def job1(num):
    # Run num of times
    for value in range(num):
        response = await job2(value)
        logger.debug(f'{value} --> {response}')
        
j = await job1(10)

DEBUG:root:0 --> 1
DEBUG:root:1 --> 3
DEBUG:root:2 --> 5
DEBUG:root:3 --> 7
DEBUG:root:4 --> 9
DEBUG:root:5 --> 11
DEBUG:root:6 --> 13
DEBUG:root:7 --> 15
DEBUG:root:8 --> 17
DEBUG:root:9 --> 19


# Callback
* We have a long / slow task to perform
* When the task finishes we would like to perform an action
* A callback works well for this use case
* In the example, we'll fire a task and a future for the slow work

In [31]:
async def slow_task(name):
    logger.debug(f'  Slow Task {name} Started')
    result = random.randint(1,5)
    await asyncio.sleep(result)
    logger.debug(f'  Slow Task {name} Completed')
    logger.debug(f'  Slow Task {name} Result {result}')
    return name, result
    
def handle(future):
    logger.debug(f'Callback:  Received notice of completion')
    logger.debug(f'Callback:  Value : {future.result()}')

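# Processing the Task
logger.debug('Create Task')
task = asyncio.create_task(slow_task('Task'))
task.add_done_callback(handle)

# Processing the Future (ensure_future wraps the coroutine in a Task, which is a Future)
logger.debug('Create Future')
future = asyncio.ensure_future(slow_task('Future'))
future.add_done_callback(handle)

await asyncio.sleep(5)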

DEBUG:root:Create Task
DEBUG:root:Create Future
DEBUG:root:  Slow Task Task Started
DEBUG:root:  Slow Task Future Started
DEBUG:root:  Slow Task Future Completed
DEBUG:root:  Slow Task Future Result 1
DEBUG:root:Callback:  Received notice of completion
DEBUG:root:Callback:  Value : ('Future', 1)
DEBUG:root:  Slow Task Task Completed
DEBUG:root:  Slow Task Task Result 3
DEBUG:root:Callback:  Received notice of completion
DEBUG:root:Callback:  Value : ('Task', 3)
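
Calling result() inside a callback re-raises any exception the task hit, and raises CancelledError if the task was cancelled. A sketch of a more defensive callback, assuming a hypothetical fail flag on slow_task to force an error (written for plain Python):

```python
import asyncio

outcomes = []

async def slow_task(name, fail=False):
    await asyncio.sleep(0.1)
    if fail:
        raise RuntimeError(f'{name} failed')
    return name

def handle(task):
    # Check cancellation first: exception() raises if the task was cancelled
    if task.cancelled():
        outcomes.append('cancelled')
    elif task.exception() is not None:
        outcomes.append(f'error: {task.exception()}')
    else:
        outcomes.append(f'ok: {task.result()}')

async def main():
    good = asyncio.create_task(slow_task('good'))
    bad = asyncio.create_task(slow_task('bad', fail=True))
    for task in (good, bad):
        task.add_done_callback(handle)
    await asyncio.gather(good, bad, return_exceptions=True)
    await asyncio.sleep(0)  # let any remaining scheduled callbacks run

asyncio.run(main())
print(outcomes)
```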


# Queues

In [32]:
async def consume(q):
    while True:
        job = await q.get()
        logger.debug(f'<--  Consumer Handling {job}')
        q.task_done()
        await asyncio.sleep(.5)

async def produce(q):
    job_counter = 1
    while True:
        jobs = random.randint(1,3)
        for j in range(jobs):
            logger.debug(f'-->  Producer Creating {job_counter}')
            await q.put(str(job_counter))
            job_counter += 1
        await asyncio.sleep(random.random())
    
logger.debug('Main Creating Queue, Producer, Consumer')
q = asyncio.Queue()
consumer = asyncio.create_task(consume(q))
producer = asyncio.create_task(produce(q))

await asyncio.sleep(5)

logger.debug('Main Stopping Producer')
producer.cancel()

logger.debug('Draining Queue & Stopping Consumer')
await q.join()
consumer.cancel()

logger.debug('Completed')

DEBUG:root:Main Creating Queue, Producer, Consumer
DEBUG:root:-->  Producer Creating 1
DEBUG:root:-->  Producer Creating 2
DEBUG:root:<--  Consumer Handling 1
DEBUG:root:-->  Producer Creating 3
DEBUG:root:<--  Consumer Handling 2
DEBUG:root:-->  Producer Creating 4
DEBUG:root:-->  Producer Creating 5
DEBUG:root:-->  Producer Creating 6
DEBUG:root:-->  Producer Creating 7
DEBUG:root:-->  Producer Creating 8
DEBUG:root:<--  Consumer Handling 3
DEBUG:root:-->  Producer Creating 9
DEBUG:root:-->  Producer Creating 10
DEBUG:root:-->  Producer Creating 11
DEBUG:root:-->  Producer Creating 12
DEBUG:root:-->  Producer Creating 13
DEBUG:root:<--  Consumer Handling 4
DEBUG:root:<--  Consumer Handling 5
DEBUG:root:-->  Producer Creating 14
DEBUG:root:<--  Consumer Handling 6
DEBUG:root:<--  Consumer Handling 7
DEBUG:root:-->  Producer Creating 15
DEBUG:root:-->  Producer Creating 16
DEBUG:root:-->  Producer Creating 17
DEBUG:root:<--  Consumer Handling 8
DEBUG:root:-->  Producer Creating 18
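
A bounded variant of the producer / consumer pattern, sketched for plain Python. The maxsize value and the handled list are assumptions for illustration. A full queue makes put wait (back-pressure), and task_done is called only after the job has been handled so that join waits for the real work.

```python
import asyncio

async def produce(q, count):
    for job in range(1, count + 1):
        await q.put(job)          # waits while the queue is full

async def consume(q, handled):
    while True:
        job = await q.get()
        try:
            await asyncio.sleep(0.01)   # simulate handling the job
            handled.append(job)
        finally:
            q.task_done()         # mark the job done once handling ends

async def main():
    q = asyncio.Queue(maxsize=3)
    handled = []
    consumer = asyncio.create_task(consume(q, handled))
    await produce(q, 10)
    await q.join()                # wait until every job is marked done
    consumer.cancel()
    await asyncio.gather(consumer, return_exceptions=True)
    return handled

handled = asyncio.run(main())
print(handled)
```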

# Polling
* In this example, we create a poll task
* That task runs in a while loop, periodically sleeping
* When we want to quit, we cancel the task

In [23]:
async def poll():
    while True:
        logger.debug('  Poller Running')
        await asyncio.sleep(.5)

async def main():
    logger.debug('Main Starting Poller')
    loop = asyncio.get_running_loop()
    task = loop.create_task(poll())
    await asyncio.sleep(5)
    logger.debug('Main Cancelling Poller')
    task.cancel()
    
await main()

DEBUG:root:Main Starting Poller
DEBUG:root:  Poller Running
DEBUG:root:  Poller Running
DEBUG:root:  Poller Running
DEBUG:root:  Poller Running
DEBUG:root:  Poller Running
DEBUG:root:  Poller Running
DEBUG:root:  Poller Running
DEBUG:root:  Poller Running
DEBUG:root:  Poller Running
DEBUG:root:  Poller Running
DEBUG:root:Main Cancelling Poller
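
cancel() only requests cancellation; the task stops at its next await. A sketch of awaiting the cancelled task so the poller can clean up before main returns (the ticks list is a hypothetical stand-in for the log output; written for plain Python):

```python
import asyncio

async def poll(ticks):
    try:
        while True:
            ticks.append('tick')
            await asyncio.sleep(0.05)
    except asyncio.CancelledError:
        ticks.append('cleanup')   # release resources here
        raise                     # re-raise so the task is marked cancelled

async def main():
    ticks = []
    task = asyncio.create_task(poll(ticks))
    await asyncio.sleep(0.12)
    task.cancel()
    try:
        await task                # wait for the cancellation to take effect
    except asyncio.CancelledError:
        pass
    return ticks, task.cancelled()

ticks, was_cancelled = asyncio.run(main())
print(ticks, was_cancelled)
```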


# Fire & Forget
* In this pattern we want to set up a task to operate in the background
* We won't be concerned about completion

In [33]:
async def background_task(type):
    logger.debug(f'  Starting Background Task : {type}')
    await asyncio.sleep(5)
    logger.debug(f'  Completed Task : {type}')
    
logger.debug('Fire off Background Task')
asyncio.create_task(background_task('Cleanup')) 
logger.debug('Continuing On')

DEBUG:root:Fire off Background Task
DEBUG:root:Continuing On
DEBUG:root:  Starting Background Task : Cleanup
DEBUG:root:  Completed Task : Cleanup
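
One caveat: the asyncio documentation notes that the event loop keeps only weak references to tasks, so a task with no other reference may be garbage collected before it finishes. A sketch of the commonly recommended fix, using a hypothetical fire_and_forget helper and background_tasks set (written for plain Python):

```python
import asyncio

background_tasks = set()
completed = []

async def background_task(kind):
    await asyncio.sleep(0.05)
    completed.append(kind)

def fire_and_forget(coro):
    task = asyncio.create_task(coro)
    background_tasks.add(task)                        # hold a strong reference
    task.add_done_callback(background_tasks.discard)  # drop it when finished
    return task

async def main():
    fire_and_forget(background_task('Cleanup'))
    await asyncio.sleep(0.1)   # main carries on with its own work

asyncio.run(main())
print(completed, len(background_tasks))
```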


# Context Manager
* We'll use a decorator (asynccontextmanager) wrapping the process function, which will handle:
    * dunder aenter
    * dunder aexit
* Process will be responsible for:
    * Pulling the web page (with requests)
    * Yielding the page data
* Caller will be responsible for:
    * Invoking process in an async with block
    * Calling parse on the response

In [44]:
@asynccontextmanager  
async def process(url):  
    logger.debug(f'Processing {url}')
    page = pull(url)  
    yield page  
    
def pull(url):
    logger.debug(f'Pulling {url}')
    r = requests.get(url)
    return r.text
    
def parse(page):
    logger.debug(f'Handling Response')

async with process('http://www.microsoft.com') as page:
    parse(page)

DEBUG:root:Processing http://www.microsoft.com
DEBUG:root:Pulling http://www.microsoft.com
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): www.microsoft.com:80
DEBUG:urllib3.connectionpool:http://www.microsoft.com:80 "GET / HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.microsoft.com:443
DEBUG:urllib3.connectionpool:https://www.microsoft.com:443 "GET /en-us/ HTTP/1.1" 200 39438
DEBUG:root:Handling Response
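
Two refinements are worth sketching, under stated assumptions. First, requests.get blocks the event loop, so here the blocking pull is handed to a thread with asyncio.to_thread (Python 3.9+). Second, a try / finally around the yield makes sure the exit step runs even if the caller's block raises. The pull function below is a stand-in that sleeps instead of hitting the network, and the events list is hypothetical.

```python
import asyncio
import time
from contextlib import asynccontextmanager

events = []

def pull(url):
    # Stand-in for a blocking call such as requests.get(url).text
    time.sleep(0.05)
    return f'<html>{url}</html>'

@asynccontextmanager
async def process(url):
    events.append('enter')
    page = await asyncio.to_thread(pull, url)   # run blocking code off the loop
    try:
        yield page
    finally:
        events.append('exit')                   # runs even if the body raises

async def main():
    async with process('http://example.com') as page:
        events.append('body')
        return page

page = asyncio.run(main())
print(events)
```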


## Jupyter Environment
* We're running a notebook and asyncio.get_event_loop returns an active loop
* That prevents us from calling run_until_complete
* We have two options:
    * Add a task to the Jupyter event loop (task is a future that wraps completion of a coroutine)
    * Use run_coroutine_threadsafe (returns a future)
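
For context, run_coroutine_threadsafe is designed for submitting a coroutine from a different thread than the one running the loop. A sketch for plain Python, where the loop runs in a helper thread. Note that calling future.result() from the loop's own thread would block that loop, so this waiting style only suits the calling thread shown here.

```python
import asyncio
import threading

async def background_task(kind):
    await asyncio.sleep(0.05)
    return f'Completed {kind}'

# Run an event loop in a separate thread
loop = asyncio.new_event_loop()
thread = threading.Thread(target=loop.run_forever, daemon=True)
thread.start()

# Submit the coroutine from the main thread; returns a concurrent.futures.Future
future = asyncio.run_coroutine_threadsafe(background_task('Threadsafe'), loop)
result = future.result(timeout=5)   # block the main thread until it finishes
print(result)

# Shut the loop down cleanly
loop.call_soon_threadsafe(loop.stop)
thread.join()
loop.close()
```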

In [25]:
# Exception
try:
    loop = asyncio.get_event_loop()
    loop.run_until_complete(background_task('Jupyter'))
except Exception as fail:
    print(f'Exception : {fail}')

Exception : This event loop is already running


In [26]:
loop = asyncio.get_event_loop()
loop.create_task(background_task('Jupyter'))

<Task pending coro=<background_task() running at <ipython-input-24-3515b96dc3aa>:1>>

In [27]:
loop = asyncio.get_event_loop()
asyncio.run_coroutine_threadsafe(background_task('Jupyter'), loop)

<Future at 0x10d809090 state=pending>

# Worker Pool
* Rather than running the blocking requests calls in the main loop
* We'll create an executor pool and run the blocking pull function in the pool's threads
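
The idea can be sketched without network access. Here blocking_pull is a hypothetical stand-in that sleeps instead of calling requests.get, and the timing shows the blocking calls overlapping in the pool's threads (written for plain Python):

```python
import asyncio
import concurrent.futures
import time

def blocking_pull(n):
    time.sleep(0.2)      # simulate a blocking HTTP request
    return n * 2

async def main():
    loop = asyncio.get_running_loop()
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        futures = [loop.run_in_executor(executor, blocking_pull, n)
                   for n in range(5)]
        results = await asyncio.gather(*futures)
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
print(results, f'{elapsed:.2f} seconds')
```

With five workers and five calls, the total should be close to a single call's duration rather than the sum of all five.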

In [28]:
def pull(url):
    logger.debug('Pulling')
    r = requests.get(url)
    return r

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    loop = asyncio.get_event_loop()   
    futures = [
        loop.run_in_executor(executor, pull, 'http://www.microsoft.com') 
            for _ in range(10)
    ]
    await asyncio.gather(*futures)

DEBUG:root:  Starting Background Task : Cleanup
DEBUG:root:  Starting Background Task : Jupyter
DEBUG:root:  Starting Background Task : Jupyter
DEBUG:root:Pulling
DEBUG:root:Pulling
DEBUG:root:Pulling
DEBUG:root:Pulling
DEBUG:root:Pulling
DEBUG:root:  Starting Background Task : Jupyter
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): www.microsoft.com:80
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): www.microsoft.com:80
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): www.microsoft.com:80
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): www.microsoft.com:80
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): www.microsoft.com:80
DEBUG:urllib3.connectionpool:http://www.microsoft.com:80 "GET / HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.microsoft.com:443
DEBUG:urllib3.connectionpool:http://www.microsoft.com:80 "GET / HTTP/1.1" 302 0

# Web Crawling - Scraping 

## Synchronous Approach
* Create "n" synchronous tasks that pull a url
* Each task uses requests' get method on the url
* We time each HTTP request & the overall time

In [29]:
def pull_sync(url):
    return requests.get(url)

def run_sync(url, num=5):
    for counter in range(num):
        start = time.time()
        pull_sync(url)
        end   = time.time()
        logger.debug(f'  Synchronous {end - start:.2f} seconds')    
   
logger.debug('Running Synchronous Pulls')
start = time.time()
run_sync('http://www.microsoft.com')
end = time.time()
logger.debug(f'Completed in {end - start:.2f} seconds')

DEBUG:root:Running Synchronous Pulls
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): www.microsoft.com:80
DEBUG:urllib3.connectionpool:http://www.microsoft.com:80 "GET / HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.microsoft.com:443
DEBUG:urllib3.connectionpool:https://www.microsoft.com:443 "GET /en-us/ HTTP/1.1" 200 39437
DEBUG:root:  Synchronous 0.75 seconds
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): www.microsoft.com:80
DEBUG:urllib3.connectionpool:http://www.microsoft.com:80 "GET / HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.microsoft.com:443
DEBUG:urllib3.connectionpool:https://www.microsoft.com:443 "GET /en-us/ HTTP/1.1" 200 39440
DEBUG:root:  Synchronous 0.47 seconds
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): www.microsoft.com:80
DEBUG:urllib3.connectionpool:http://www.microsoft.com:80 "GET / HTTP/1.1" 302 0

## Asynchronous Approach
* Create "n" asynchronous tasks that pull a url
* Each task uses aiohttp's ClientSession to perform a get on the url
* We time each HTTP request & the overall time

In [30]:
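async def pull_aiohttp(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as r:
            # Return the status rather than the response object,
            # which is released once the async with block exits
            return r.status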
        
async def pull_async(url):
    start = time.time()
    await pull_aiohttp(url)
    end   = time.time()
    logger.debug(f'  Asynchronous {end - start:.2f} seconds') 
        
async def run_async(url, num=5):
    pulls = [asyncio.create_task(pull_async(url)) for counter in range(num)]
    await asyncio.wait(pulls)          
               
logger.debug('Running Asynchronous Pulls')
start = time.time()
await run_async('http://www.microsoft.com')
end = time.time()
logger.debug(f'Completed in {end - start:.2f} seconds')

DEBUG:root:Running Asynchronous Pulls
DEBUG:root:  Asynchronous 0.45 seconds
DEBUG:root:  Asynchronous 0.42 seconds
DEBUG:root:  Asynchronous 0.43 seconds
DEBUG:root:  Asynchronous 0.43 seconds
DEBUG:root:  Asynchronous 0.44 seconds
DEBUG:root:Completed in 0.48 seconds
DEBUG:root:  Completed Task : Cleanup
DEBUG:root:  Completed Task : Jupyter
DEBUG:root:  Completed Task : Jupyter
DEBUG:root:  Completed Task : Jupyter
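
The timing difference can be reproduced without a network, as a rough sketch. Here fake_pull is a hypothetical stand-in that awaits asyncio.sleep in place of an HTTP request; awaiting the pulls one by one costs the sum of the delays, while gather overlaps them (written for plain Python).

```python
import asyncio
import time

async def fake_pull(url, delay=0.1):
    await asyncio.sleep(delay)   # simulate network latency
    return url

async def run_sequential(urls):
    # Each pull finishes before the next one starts
    return [await fake_pull(url) for url in urls]

async def run_concurrent(urls):
    # All pulls wait on their "network" at the same time
    return await asyncio.gather(*(fake_pull(url) for url in urls))

async def timed(coro):
    start = time.perf_counter()
    result = await coro
    return result, time.perf_counter() - start

async def main():
    urls = [f'http://example.com/{n}' for n in range(5)]
    _, seq = await timed(run_sequential(urls))
    _, con = await timed(run_concurrent(urls))
    return seq, con

seq_time, con_time = asyncio.run(main())
print(f'sequential {seq_time:.2f}s, concurrent {con_time:.2f}s')
```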
