# making sync funcs async

Trying out how to use async with I/O bound functions:

In [1]:
import asyncio
import requests

Here I have a simple, slow function that uses a non-async library, `requests`, to grab a url and return the length of the response text.

In [2]:
urls = ["google.com", "bing.com", "amazon.com", "google.com", "microsoft.com"]*100

def get_url_size(url="google.com"):
    r = requests.get("http://"+ url)
    return len(r.text)

%time get_url_size()

CPU times: user 21 ms, sys: 2 ms, total: 23 ms
Wall time: 763 ms


10689

Almost a second! I've cut down the number of urls to 20 so it doesn't take forever:

In [3]:
%time results = [get_url_size(url) for url in urls[:20]]
print(f"got {len(results)} first 10 results:", results[:10])

CPU times: user 860 ms, sys: 82 ms, total: 942 ms
Wall time: 37.3 s
got 20 first 10 results: [10708, 107607, 2671, 10701, 160820, 10672, 107607, 471175, 10696, 160815]


This is taking forever: 37 seconds to fetch 20 urls, because each request waits for the previous one to finish. With true async code we should be looking at a fraction of this time, depending on how many threads are running.

Now, I should be able to rewrite `get_url_size` like so:

```
async def get_url_size(url="google.com"):
    r = await requests.get("http://"+ url)
    return len(r.text)
```

But this fails, because the function being called, `requests.get` in this case, needs to be async-enabled: it returns a plain response object, not something `await` can wait on. Since it's old-school blocking code, it holds on to the thread for dear life and doesn't let go until it returns.
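
To see the failure without touching the network, here is a minimal sketch. `slow_sync_call` is a hypothetical stand-in for `requests.get` (it just sleeps and returns a string), and the script is meant to be run as a plain Python file rather than inside a notebook cell. The exact error message varies by Python version, so the sketch only reports the exception type.

```python
import asyncio
import time


def slow_sync_call():
    """Hypothetical stand-in for requests.get: blocks, then returns a plain value."""
    time.sleep(0.1)  # blocks the calling thread, and with it the event loop
    return "response body"


async def broken():
    # The blocking call runs to completion first; only then does `await`
    # receive a plain str, which is not awaitable, so a TypeError is raised.
    return await slow_sync_call()


def demo():
    try:
        asyncio.run(broken())
    except TypeError as exc:
        return f"TypeError: {exc}"
    return "no error"


print(demo())
```

Putting `async` in front of a function doesn't make the calls inside it non-blocking; whatever is awaited has to cooperate with the event loop.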

So we need to use threads to run the calls concurrently:

Below, I use the default thread pool executor to spawn multiple threads that run the non-async function above:

In [4]:
%%time

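async def main():

    loop = asyncio.get_event_loop()
    # executor=None means "use the loop's default thread pool executor"
    futures = [loop.run_in_executor(None, get_url_size, url) for url in urls[:20]]

    # gather waits for every future and returns the results in submission order
    results = []

    for response in await asyncio.gather(*futures):
        results.append(response)

    return results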

loop = asyncio.get_event_loop()
results = loop.run_until_complete(main())
print(f"got {len(results)} first 10 results:", results[:10])

got 20 first 10 results: [10691, 107607, 397539, 10666, 160810, 10675, 107607, 2671, 10720, 160810]
CPU times: user 926 ms, sys: 120 ms, total: 1.05 s
Wall time: 4.03 s


So this is heaps faster: it took 4 seconds to get 20 urls. Since grabbing one url takes just under a second, theoretically this whole op could be as fast as 1 second + overhead time.
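
The same pattern can be tried offline. In this sketch `fake_fetch` is a hypothetical stand-in for `get_url_size` that sleeps instead of making a request, and the url strings are made up. With four tasks and a default pool of at least five workers, the total should land near one task's delay rather than four times that. It is written as a plain script; inside a notebook with an already running loop you would `await fetch_all(...)` instead of calling `asyncio.run`.

```python
import asyncio
import time


def fake_fetch(url, delay=0.2):
    """Hypothetical stand-in for get_url_size: blocks for `delay` seconds."""
    time.sleep(delay)
    return len(url)


async def fetch_all(urls):
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default ThreadPoolExecutor.
    futures = [loop.run_in_executor(None, fake_fetch, url) for url in urls]
    # gather returns the results in the same order as the futures.
    return await asyncio.gather(*futures)


def timed_run(urls):
    start = time.perf_counter()
    results = asyncio.run(fetch_all(urls))
    return results, time.perf_counter() - start


results, elapsed = timed_run(["a.com", "bb.com", "ccc.com", "dddd.com"])
print(results, f"{elapsed:.2f}s")
```

Run sequentially, the four calls would take about 0.8 seconds; in the thread pool they overlap, which is the effect seen with the real urls above.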

The reason it's fast but not super fast is the [default executor](https://docs.python.org/3/library/asyncio-eventloop.html#executor), which caps the number of worker threads:

>  If max_workers is None or not given, it will default to the number of processors on the machine, multiplied by 5, assuming that ThreadPoolExecutor is often used to overlap I/O instead of CPU work and the number of workers should be higher than the number of workers for ProcessPoolExecutor.
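
As a rough sketch, the quoted rule can be worked out for the current machine. The first helper follows the passage quoted above; the second reflects the default documented for Python 3.8 and later, `min(32, cpu_count + 4)`, which is not part of the quote and is included only for comparison. Both helper names are made up for this example.

```python
import os


def quoted_default_workers(cpus):
    """Default from the passage quoted above: processors * 5."""
    return cpus * 5


def newer_default_workers(cpus):
    """Default documented for Python 3.8 and later: min(32, cpus + 4)."""
    return min(32, cpus + 4)


cpus = os.cpu_count() or 1  # os.cpu_count() can return None
print(cpus, quoted_default_workers(cpus), newer_default_workers(cpus))
```

On a 4-core machine that gives 20 workers under the quoted rule and 8 under the newer one, so depending on the Python version, 20 urls may or may not all be in flight at once.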

Now, I'm doing I/O bound ops, so the internet says:

- For CPU-bound workloads: use ProcessPoolExecutor
- For I/O-bound workloads: use ThreadPoolExecutor

So to go faster we can specify max threads for ThreadPoolExecutor:

In [5]:
%%time
import concurrent.futures

async def main_2(num_threads=100):

    # A dedicated pool lets us pick the thread count instead of taking the default.
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as executor:

        loop = asyncio.get_event_loop()
        # Note: this fetches every url in the list (500), not just the first 20.
        futures = [loop.run_in_executor(executor, get_url_size, url) for url in urls]

        results = []

        for response in await asyncio.gather(*futures):
            results.append(response)

    return results

loop = asyncio.get_event_loop()
results = loop.run_until_complete(main_2())
print(f"got {len(results)} first 10 results:", results[:10])

Without `import concurrent.futures` this cell dies with `NameError: name 'concurrent' is not defined`, so the import has to be there. With it in place, the second version is faster, though in the same ballpark. Let's see what tinkering with the number of threads does:

In [140]:
%%time
loop = asyncio.get_event_loop()
results = loop.run_until_complete(main_2(200))
print(f"got {len(results)} first 10 results:", results[:10])

got 500 first 10 results: [2470, 105873, 2671, 2470, 146685, 2470, 105245, 2671, 2470, 146581]
CPU times: user 16.4 s, sys: 1.83 s, total: 18.2 s
Wall time: 24.8 s


Surprisingly, doubling the threads from 100 to 200 doesn't change much. So really, most of the time the default executor is fine for I/O bound ops.
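
A back-of-the-envelope bound shows what the extra threads could buy at best. This sketch assumes every fetch takes the same 763 ms as the single timed request at the top and that threads cost nothing, neither of which is guaranteed; `ideal_wall_time` is a made-up helper for the arithmetic.

```python
import math


def ideal_wall_time(n_tasks, workers, latency):
    """Lower bound if every fetch takes `latency` seconds and threads are free."""
    rounds = math.ceil(n_tasks / workers)  # how many "waves" of fetches are needed
    return rounds * latency


LATENCY = 0.763  # wall time of the single fetch measured earlier, in seconds

for workers in (100, 200):
    print(workers, round(ideal_wall_time(500, workers, LATENCY), 2))
```

Under those assumptions 500 urls would need roughly 3.8 seconds with 100 threads and 2.3 seconds with 200, both far below the 24.8 seconds measured. The 18.2 seconds of CPU time in that run suggests that something other than waiting on the network dominates, which would explain why more threads don't help much.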