## ASYNCIO Theory

- <span style='font-size:16px;'> Coroutines are defined with the `async def` syntax. </span> 
```python
import asyncio

async def foo():
    print('hello')
    await asyncio.sleep(1)
    print('world')

asyncio.run(foo())  # in a script or console
# await foo()       # in Jupyter: it already runs its own event loop, and loops cannot
#                   # be nested, so asyncio.run() raises RuntimeError there
```
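- <span style='font-size:16px;'> `await` hands control back to the event loop until the awaited operation completes. </span>
- <span style='font-size:16px;'> Coroutines can await other coroutines (nested coroutines). Awaited one after another, they run sequentially: </span>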

```python 
import asyncio
import time

async def say_after(delay, what):
    await asyncio.sleep(delay)
    print(what)

async def main():
    print(f"started at {time.strftime('%X')}")

    await say_after(1, 'hello')
    await say_after(2, 'world')

    print(f"finished at {time.strftime('%X')}")

asyncio.run(main())  # in Jupyter use: await main()
```
- <span style='font-size:16px;'> `asyncio.create_task()` wraps a coroutine in a Task and schedules it, so several coroutines can run concurrently (a high-level shortcut, as far as I understand). </span>

```python 
async def main():
    task1 = asyncio.create_task(
        say_after(1, 'hello'))

    task2 = asyncio.create_task(
        say_after(2, 'world'))

    print(f"started at {time.strftime('%X')}")

    # Wait until both tasks are completed (should take
    # around 2 seconds.)
    await task1
    await task2

    print(f"finished at {time.strftime('%X')}")
await main()  # in Jupyter; in a script use asyncio.run(main())
```
- <span style='font-size:16px'> Awaitable object: any object that can be used in an `await` expression. The three main kinds are: </span>
    - coroutines
    - tasks 
    - futures
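
A quick check of the three kinds listed above, as a minimal sketch using `inspect.isawaitable` from the standard library. The names `answer` and `main` are made up for this illustration.

```python
import asyncio
import inspect

async def answer():
    return 42

async def main():
    coro = answer()                                    # coroutine object
    task = asyncio.create_task(answer())               # Task wrapping a second coroutine
    fut = asyncio.get_running_loop().create_future()   # bare low-level Future
    fut.set_result(42)                                 # resolve it by hand so it can be awaited

    kinds = [inspect.isawaitable(x) for x in (coro, task, fut)]
    values = [await coro, await task, await fut]
    return kinds, values

kinds, values = asyncio.run(main())  # in Jupyter: kinds, values = await main()
print(kinds, values)
```

All three objects pass the check and can be used with `await` in the same way.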

### Coroutines:

---
**A coroutine function is defined with `async def`; calling it returns a coroutine object. The word "coroutine" is used for both.**

---
```python
import asyncio

async def nested():
    return 42

async def main():
    # Nothing happens if we just call "nested()".
    # A coroutine object is created but not awaited,
    # so it *won't run at all*.
    nested()

    # Let's do it differently now and await it:
    print(await nested())  # will print "42".

asyncio.run(main())
```
### Tasks 

---
**Tasks are used to schedule coroutines concurrently.**

---
```python
import asyncio

async def nested():
    return 42

async def main():
    # Schedule nested() to run soon concurrently
    # with "main()".
    task = asyncio.create_task(nested())

    # "task" can now be used to cancel "nested()", or
    # can simply be awaited to wait until it is complete:
    await task

asyncio.run(main())
```
NOTE: the task example prints nothing only because the result is never printed. `await task` still returns the value of `nested()` (42), just like awaiting the coroutine directly.
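
The return value of a task is not lost: awaiting the task yields it. A minimal sketch reusing `nested()` from the example above:

```python
import asyncio

async def nested():
    return 42

async def main():
    task = asyncio.create_task(nested())
    value = await task            # awaiting a Task returns the coroutine's result
    return value, task.result()   # a finished Task also keeps the result

value, stored = asyncio.run(main())  # in Jupyter: value, stored = await main()
print(value, stored)
```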

### Futures

---
**A Future is a special low-level awaitable object that represents an eventual result of an asynchronous operation.** <br>
When a Future object is awaited it means that the coroutine will wait until the Future is resolved in some other place. <br>
Future objects in asyncio are needed to allow callback-based code to be used with async/await.

**Normally there is no need to create Future objects at the application level code.**

---

```python 
async def main():
    await function_that_returns_a_future_object()

    # this is also valid:
    await asyncio.gather(
        function_that_returns_a_future_object(),
        some_python_coroutine()
    )

```
NOTE: in practice this is rarely needed; application code normally works with coroutines and tasks.
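
To make the "callback-based code" remark concrete, here is a small sketch of a Future bridging a callback into `await`. The function `legacy_api` is hypothetical and exists only for this illustration; it calls its callback a moment later, the way a callback-style library might.

```python
import asyncio

def legacy_api(on_done):
    """Hypothetical callback-style API: calls on_done('payload') 0.1 s later."""
    loop = asyncio.get_running_loop()
    loop.call_later(0.1, on_done, 'payload')

async def main():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    # The callback resolves the Future; this coroutine is suspended until then.
    legacy_api(fut.set_result)
    return await fut

result = asyncio.run(main())  # in Jupyter: result = await main()
print(result)
```

The Future is resolved "in some other place" (inside the callback), and the coroutine awaiting it resumes with the value passed to `set_result`.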

In [24]:
## small example 

async def foo():
    await asyncio.sleep(1)
    return 4

async def boo():
    await asyncio.sleep(2)
    return 5

async def main():
    task1 = asyncio.create_task(foo())
    task2 = asyncio.create_task(boo())
    
    return await asyncio.gather(task1, task2)

f = await main()
f

[4, 5]

## Problem

---
Choose an appropriate proxy for requests.

In [20]:
from bs4 import BeautifulSoup
import requests 
import pandas as pd 
import asyncio
from aiohttp import ClientSession
import time
import operator as op
import random

Pipeline: 
- Get a list of proxies. 
- The function `my_ip` calls an API to check the IP address seen through a proxy. 
- Iterate over all candidate proxies to find the ones that are available and fast.
- Sort by response time, keep the 10-15 fastest, pick one at random and put it into the proxy settings.
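
The last step of the pipeline could look roughly like the sketch below. It assumes each check returns either `None` (proxy failed) or an `(ip, seconds)` pair, one per proxy address; the function name `pick_proxy`, its parameters, and the sample data are all made up for illustration.

```python
import operator as op
import random

def pick_proxy(targets, results, top_n=10):
    """Pick a random proxy among the top_n fastest responders.

    targets: proxy addresses, e.g. 'host:port' strings (assumed format).
    results: one entry per target, either None (failed) or (ip, seconds).
    """
    timed = [(t, r[1]) for t, r in zip(targets, results) if r is not None]
    if not timed:
        return None  # no working proxy at all
    fastest = sorted(timed, key=op.itemgetter(1))[:top_n]
    return random.choice(fastest)[0]

# Made-up measurements, for illustration only.
targets = ['10.0.0.1:8080', '10.0.0.2:3128', '10.0.0.3:80', '10.0.0.4:8118']
results = [('10.0.0.1', 0.9), None, ('10.0.0.3', 0.2), ('10.0.0.4', 0.5)]
choice = pick_proxy(targets, results, top_n=2)
print(choice)
```

Picking at random among the fastest few, rather than always taking the single fastest, spreads requests over several proxies.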

In [11]:
from io import StringIO

def getProxies():
    '''
    Return a pandas DataFrame of proxies scraped from https://free-proxy-list.net/.
    NOTE: this is probably not the best source of proxies. TODO: add other sources.
    '''
    r = requests.get('https://free-proxy-list.net/')  # , headers=header
    soup = BeautifulSoup(r.text, 'html.parser')
    t = soup.find_all('table', {'id': 'proxylisttable'})
    # Recent pandas versions expect a file-like object rather than a raw HTML string.
    a = pd.read_html(StringIO(str(t[0])), flavor='bs4', skiprows=0)[0]
    a = a.dropna()
    # Cast through int so that a float-parsed port such as 8080.0 becomes '8080'.
    a['Port'] = a['Port'].astype(int).astype(str)
    a['IP Address'] = a['IP Address'].astype(str)
    # 'host:port' string used later as the proxy address.
    a['target'] = a['IP Address'] + ':' + a['Port']
    return a

In [15]:
p = getProxies()

In [16]:
p.head()

| | IP Address | Port | Code | Country | Anonymity | Google | Https | Last Checked | target |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 103.78.80.194 | 33442 | ID | Indonesia | elite proxy | no | no | 1 minute ago | 103.78.80.194:33442 |
| 1 | 115.87.110.143 | 8118 | TH | Thailand | anonymous | no | yes | 1 minute ago | 115.87.110.143:8118 |
| 2 | 5.167.255.227 | 8080 | RU | Russian Federation | elite proxy | no | no | 1 minute ago | 5.167.255.227:8080 |
| 3 | 139.5.71.46 | 23500 | NP | Nepal | elite proxy | no | yes | 1 minute ago | 139.5.71.46:23500 |
| 4 | 103.94.121.27 | 60109 | ID | Indonesia | elite proxy | no | no | 1 minute ago | 103.94.121.27:60109 |


## AIOHTTP version
NOTE: aiohttp does not handle HTTPS proxies (at least in the version used here), so only `http://` proxy URLs are passed.

In [21]:
async def fetch(url, session, proxy=None, timeout=None):
    '''Make a GET request through aiohttp.ClientSession.
    This replaces requests.get() and is the native way to do HTTP with asyncio.
    Returns the decoded response body and the time spent waiting for it.
    '''
    start = time.perf_counter()  # wall-clock timer; process_time() would count CPU time only
    async with session.get(url, proxy=proxy, timeout=timeout) as response:
        res = await response.read()  # yields to the event loop while the body arrives
        return res.decode('utf-8'), time.perf_counter() - start


async def my_ip(proxies=None, timeout=None):
    '''
    Check that a proxy works by asking a remote service for our IP address.
    Both the session and the response are managed with 'async with',
    so connections are released even when a request fails.
    Returns (ip, elapsed_seconds), or None if the proxy did not respond.
    '''
    async with ClientSession() as session:
        await asyncio.sleep(1)
        try:
            # await passes control to the event loop while the request is in flight
            res = await fetch('http://myip.ru/index_small.php', session=session, proxy=proxies, timeout=timeout)
            soup = BeautifulSoup(res[0], 'html.parser')
            return soup.find('table', {'class':'network-info'}).find_all('tr')[1].find('td').get_text(), res[1]
        except Exception as e: 
            print(e)
            return None

start = time.time() 
async def main():
    tasks = []  # list of Task objects (a Task is a Future subclass)
    for i in p['target'][:10]: 
        # proxies = {'http':i, 'https':i}  # requests-style dict; aiohttp takes a single proxy URL
        proxies = f'http://{i}'
        # create_task wraps the coroutine in a Task and schedules it right away.
        # Creating low-level Futures by hand is not recommended.
        task = asyncio.create_task(my_ip(proxies, 5))
        tasks.append(task)
    return await asyncio.gather(*tasks)
f = await main()
# f = asyncio.run(main())  # console version; Jupyter already runs its own event loop
print(f'execution time {time.time()-start}')







execution time 6.133014917373657
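
When ranking proxies by how long a response was waited for, the timer has to measure wall-clock time. `time.process_time()` counts only the CPU time of the process, so time spent suspended in `await` hardly registers. A minimal sketch of the difference, where `asyncio.sleep` stands in for waiting on a network response and `timed_sleep` is a made-up name:

```python
import asyncio
import time

async def timed_sleep(delay):
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    await asyncio.sleep(delay)   # stands in for waiting on a network response
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu

wall, cpu = asyncio.run(timed_sleep(0.3))  # in Jupyter: wall, cpu = await timed_sleep(0.3)
print(f'wall-clock {wall:.3f}s, CPU {cpu:.3f}s')
```

The wall-clock figure is close to the requested delay, while the CPU figure stays far below it.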


## Requests version

In [18]:
async def my_ip(proxies=None, timeout=None):
    await asyncio.sleep(1)
    try:
        # requests.get() is a blocking call: it does not yield to the event loop,
        # so tasks built from this coroutine make their requests one after another.
        soup = BeautifulSoup(requests.get('http://myip.ru/index_small.php', proxies=proxies, timeout=timeout).text, 'html.parser')
        return soup.find('table', {'class':'network-info'}).find_all('tr')[1].find('td').get_text()
    except Exception as e: 
        print(e)
        return None
async def main():
    return await my_ip()

In [19]:
start = time.time()
tasks = []
for i in p['target'][:10]: 
    proxies = {'http':i, 'https':i}
    task = asyncio.create_task(my_ip(proxies, 5))
    tasks.append(task)
async def main():
    return await asyncio.gather(*tasks)
f = await main()
print(f'execution time {time.time()-start}')

HTTPConnectionPool(host='103.78.80.194', port=33442): Read timed out. (read timeout=5)
HTTPConnectionPool(host='115.87.110.143', port=8118): Max retries exceeded with url: http://myip.ru/index_small.php (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f256405a3d0>, 'Connection to 115.87.110.143 timed out. (connect timeout=5)'))
HTTPConnectionPool(host='5.167.255.227', port=8080): Max retries exceeded with url: http://myip.ru/index_small.php (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f256405b510>, 'Connection to 5.167.255.227 timed out. (connect timeout=5)'))
HTTPConnectionPool(host='139.5.71.46', port=23500): Max retries exceeded with url: http://myip.ru/index_small.php (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f256405c450>, 'Connection to 139.5.71.46 timed out. (connect timeout=5)'))
HTTPConnectionPool(host='103.94.121.27', port=60109): Max retries exceeded with url: http://myip.ru/i
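
`requests.get` is a blocking call, so a coroutine that calls it directly holds the event loop until the request finishes. One common way around this, sketched here as an assumption rather than something this notebook does, is to push the blocking call to a worker thread with `asyncio.to_thread` (Python 3.9+). `blocking_check` is a hypothetical stand-in that sleeps instead of making a real request, and the proxy strings are placeholders.

```python
import asyncio
import time

def blocking_check(proxy):
    """Hypothetical stand-in for a blocking requests.get call."""
    time.sleep(0.3)
    return proxy

async def main(proxies):
    # Each blocking call runs in a worker thread, so the waits overlap.
    jobs = [asyncio.to_thread(blocking_check, p) for p in proxies]
    return await asyncio.gather(*jobs)

start = time.perf_counter()
checked = asyncio.run(main(['a:1', 'b:2', 'c:3']))  # in Jupyter: await main([...])
elapsed = time.perf_counter() - start
print(checked, f'{elapsed:.2f}s')
```

Three 0.3 s calls finish in well under the 0.9 s they would take back to back, and `gather` returns the results in the order the jobs were passed in.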

## Sync version

In [23]:
def my_ip_sync(proxies=None, timeout=None):
    time.sleep(1)  # for a fair comparison with the asyncio.sleep(1) in the async versions
    try:
        soup = BeautifulSoup(requests.get('http://myip.ru/index_small.php', proxies=proxies, timeout=timeout).text, 'html.parser')
        return soup.find('table', {'class':'network-info'}).find_all('tr')[1].find('td').get_text()
    except Exception as e: 
        print(e)
        return None

start = time.time()
f2 = []  
for i in p['target'][:10]:
    proxies = {'http':i, 'https':i}
    f2.append(my_ip_sync(proxies, 5))
print(f'execution time {time.time() - start}')

HTTPConnectionPool(host='103.78.80.194', port=33442): Read timed out. (read timeout=5)
HTTPConnectionPool(host='115.87.110.143', port=8118): Max retries exceeded with url: http://myip.ru/index_small.php (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f2564055250>, 'Connection to 115.87.110.143 timed out. (connect timeout=5)'))
HTTPConnectionPool(host='5.167.255.227', port=8080): Max retries exceeded with url: http://myip.ru/index_small.php (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f256405bf10>, 'Connection to 5.167.255.227 timed out. (connect timeout=5)'))
HTTPConnectionPool(host='139.5.71.46', port=23500): Max retries exceeded with url: http://myip.ru/index_small.php (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f2564055250>, 'Connection to 139.5.71.46 timed out. (connect timeout=5)'))
HTTPConnectionPool(host='79.111.13.155', port=50625): Max retries exceeded with url: http://myip.ru/i