The examples are based on this page:

https://medium.com/analytics-vidhya/asyncio-threading-and-multiprocessing-in-python-4f5ff6ca75e8

### Sending messages

Example of running two tasks sequentially.

In [1]:
# logging: Python utility that timestamps every message automatically
import logging
import time

# Logger configuration

logger_format = '%(asctime)s:%(threadName)s:%(message)s'
logging.basicConfig(format=logger_format, level=logging.INFO, datefmt="%H:%M:%S")


num_word_mapping = {1: 'ONE', 2: 'TWO', 3: "THREE", 4: "FOUR", 5: "FIVE", 6: "SIX", 7: "SEVEN", 8: "EIGHT",
                   9: "NINE", 10: "TEN"}

# Basic task: send a message after a delay

def delay_message(delay, message):
    logging.info(f"{message} received")
    time.sleep(delay)
    logging.info(f"Printing {message}")

def main():
    logging.info("Main started")
    delay_message(2, num_word_mapping[2])
    delay_message(3, num_word_mapping[3])
    logging.info("Main Ended")

main()

23:03:33:MainThread:Main started
23:03:33:MainThread:TWO received
23:03:35:MainThread:Printing TWO
23:03:35:MainThread:THREE received
23:03:38:MainThread:Printing THREE
23:03:38:MainThread:Main Ended


### Concurrency with threads

Concurrent execution, with a memory space shared by all threads.
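A minimal sketch of what "shared memory space" means, using a hypothetical `record` helper (not part of the original notebook): every thread appends to one list owned by the main thread, and the main thread sees all of the values once the threads are joined.

```python
import threading

# All threads of a process share the same objects: here, a single list
results = []

def record(word):
    # In CPython a single list.append call is thread-safe, so no lock is needed here
    results.append(word)

threads = [threading.Thread(target=record, args=(w,)) for w in ["TWO", "THREE", "FOUR"]]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The main thread sees every value appended by the worker threads
print(sorted(results))  # ['FOUR', 'THREE', 'TWO']
```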

In [8]:
import threading

def main():
    logging.info("Main started")
    threads = [threading.Thread(target=delay_message, args=(delay, message)) for delay, message in zip([2, 3, 4, 5, 6], 
                                                                            [num_word_mapping[2], num_word_mapping[3], num_word_mapping[4], num_word_mapping[5], num_word_mapping[6]])]
    
    # Start the threads
    
    for thread in threads:
        thread.start()
        
    # Wait for the threads to finish (synchronization)
    for thread in threads:
        thread.join() # waits for thread to complete its task
    logging.info("Main Ended")
main()

23:13:19:MainThread:Main started
23:13:19:Thread-24 (delay_message):TWO received
23:13:19:Thread-25 (delay_message):THREE received
23:13:19:Thread-26 (delay_message):FOUR received
23:13:19:Thread-27 (delay_message):FIVE received
23:13:19:Thread-28 (delay_message):SIX received
23:13:21:Thread-24 (delay_message):Printing TWO
23:13:22:Thread-25 (delay_message):Printing THREE
23:13:23:Thread-26 (delay_message):Printing FOUR
23:13:24:Thread-27 (delay_message):Printing FIVE
23:13:25:Thread-28 (delay_message):Printing SIX
23:13:25:MainThread:Main Ended
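The timestamps above show that the threaded run takes about as long as the longest delay, while the sequential version takes the sum of all delays. A small sketch that measures this with `time.perf_counter`, assuming a simplified `delay_message` without logging and delays shortened to 0.1 s so it runs quickly:

```python
import threading
import time

def delay_message(delay, message):
    # Simplified copy of the notebook's task: just wait
    time.sleep(delay)

delays = [0.1, 0.1, 0.1]

# Sequential: each call waits for the previous one
start = time.perf_counter()
for d in delays:
    delay_message(d, "msg")
sequential = time.perf_counter() - start

# Threaded: the waits overlap
start = time.perf_counter()
threads = [threading.Thread(target=delay_message, args=(d, "msg")) for d in delays]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```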


In [7]:
import threading

def main():
    logging.info("Main started")
    threads = [threading.Thread(target=delay_message, args=(delay, message)) for delay, message in zip([2, 3, 4, 5, 6], 
                                                                            [num_word_mapping[2], num_word_mapping[3], num_word_mapping[4], num_word_mapping[5], num_word_mapping[6]])]
    
    # Start the threads
    
    for thread in threads:
        thread.start()
        
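    # No join() here: the main thread does not wait, so "Main Ended" is logged before the threads finish
    logging.info("Main Ended")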
main()

23:13:09:MainThread:Main started
23:13:09:Thread-19 (delay_message):TWO received
23:13:09:Thread-20 (delay_message):THREE received
23:13:09:Thread-21 (delay_message):FOUR received
23:13:09:Thread-22 (delay_message):FIVE received
23:13:09:Thread-23 (delay_message):SIX received
23:13:09:MainThread:Main Ended
23:13:11:Thread-19 (delay_message):Printing TWO
23:13:12:Thread-20 (delay_message):Printing THREE
23:13:13:Thread-21 (delay_message):Printing FOUR
23:13:14:Thread-22 (delay_message):Printing FIVE
23:13:15:Thread-23 (delay_message):Printing SIX
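Without `join()`, the main thread logs "Main Ended" while the workers are still sleeping. A minimal sketch of what `join()` changes, using `Thread.is_alive()` to check the worker's state before and after waiting (the 0.2 s sleep is an arbitrary choice for illustration):

```python
import threading
import time

t = threading.Thread(target=time.sleep, args=(0.2,))
t.start()
alive_before_join = t.is_alive()  # the worker is still sleeping
t.join()                          # block the calling thread until the worker finishes
alive_after_join = t.is_alive()

print(alive_before_join, alive_after_join)  # True False
```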


### Reusing threads

Using a thread pool (`concurrent.futures.ThreadPoolExecutor`) to manage and reuse the threads.

In [5]:
import concurrent.futures as cf
import logging
import time

logger_format = '%(asctime)s:%(threadName)s:%(message)s'
logging.basicConfig(format=logger_format, level=logging.INFO, datefmt="%H:%M:%S")

num_word_mapping = {1: 'ONE', 2: 'TWO', 3: "THREE", 4: "FOUR", 5: "FIVE", 6: "SIX", 7: "SEVEN", 8: "EIGHT",
                    9: "NINE", 10: "TEN"}
    
def delay_message(delay, message):
    logging.info(f"{message} received")
    time.sleep(delay)
    logging.info(f"Printing {message}")
    return message


if __name__ == '__main__':
    with cf.ThreadPoolExecutor(max_workers=2) as executor:
        future_to_mapping = {executor.submit(delay_message, i, num_word_mapping[i]): num_word_mapping[i] for i in
                             range(2, 7)}
        for future in cf.as_completed(future_to_mapping):
            logging.info(f"{future.result()} Done")

23:10:13:ThreadPoolExecutor-1_0:TWO received
23:10:13:ThreadPoolExecutor-1_1:THREE received
23:10:15:ThreadPoolExecutor-1_0:Printing TWO
23:10:15:ThreadPoolExecutor-1_0:FOUR received
23:10:15:MainThread:TWO Done
23:10:16:ThreadPoolExecutor-1_1:Printing THREE
23:10:16:ThreadPoolExecutor-1_1:FIVE received
23:10:16:MainThread:THREE Done
23:10:19:ThreadPoolExecutor-1_0:Printing FOUR
23:10:19:ThreadPoolExecutor-1_0:SIX received
23:10:19:MainThread:FOUR Done
23:10:21:ThreadPoolExecutor-1_1:Printing FIVE
23:10:21:MainThread:FIVE Done
23:10:25:ThreadPoolExecutor-1_0:Printing SIX
23:10:25:MainThread:SIX Done
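The cell above collects results with `as_completed`, which yields futures in the order they finish. When results are wanted in the order of the inputs, `Executor.map` is a simpler alternative. A sketch, assuming a `delay_message` that only sleeps and returns its message, with short delays chosen so that the finishing order differs from the input order:

```python
import concurrent.futures as cf
import time

def delay_message(delay, message):
    time.sleep(delay)
    return message

delays = [0.3, 0.1, 0.2]
words = ["THREE", "ONE", "TWO"]

with cf.ThreadPoolExecutor(max_workers=3) as executor:
    # map yields results in the order of the inputs, not the order in which they finish
    results = list(executor.map(delay_message, delays, words))

print(results)  # ['THREE', 'ONE', 'TWO']
```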


### Concurrency with the AsyncIO library

* Coroutine: unlike a conventional function, which has a single exit point, a coroutine can pause and resume its execution. Defining one is as simple as writing the `async` keyword before `def`; calling such a function returns a coroutine object.

* Event loop (or coordinator): the object that schedules and runs coroutines and tasks. You can think of it as a scheduler or master.

* Coroutines, Tasks and Futures are awaitable objects, and a coroutine can `await` any of them. While a coroutine is awaiting, its execution is temporarily suspended, and it resumes once the awaited object (for example, a Future) has completed.
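A small sketch of these three ideas. It is written as a standalone script with `asyncio.run`, which creates the event loop; inside Jupyter, where a loop is already running, `await` would be used instead. The `future_demo` helper is hypothetical and only shows that awaiting a Future suspends the coroutine until someone sets the Future's result.

```python
import asyncio

async def delay_message(delay, message):
    await asyncio.sleep(delay)
    return message

# Calling an async function does NOT run its body; it only creates a coroutine object
coro = delay_message(0.01, "TWO")
is_coro = asyncio.iscoroutine(coro)
print(is_coro)  # True

# asyncio.run creates an event loop, runs the coroutine to completion and closes the loop
result = asyncio.run(coro)
print(result)  # TWO

async def future_demo():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    # Schedule the result to be set 10 ms from now
    loop.call_later(0.01, fut.set_result, "done")
    # The coroutine is suspended here until the Future has a result
    return await fut

future_result = asyncio.run(future_demo())
print(future_result)  # done
```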

In [9]:
import asyncio
import logging
import time

logger_format = '%(asctime)s:%(threadName)s:%(message)s'
logging.basicConfig(format=logger_format, level=logging.INFO, datefmt="%H:%M:%S")

num_word_mapping = {1: 'ONE', 2: 'TWO', 3: "THREE", 4: "FOUR", 5: "FIVE", 6: "SIX", 7: "SEVEN", 8: "EIGHT",
                   9: "NINE", 10: "TEN"}

async def delay_message(delay, message):
    logging.info(f"{message} received")
    await asyncio.sleep(delay)  # time.sleep is a blocking call and cannot be awaited, so asyncio.sleep is used instead
    logging.info(f"Printing {message}")
    
async def main():
    logging.info("Main started")
    logging.info(f'Current registered tasks: {len(asyncio.all_tasks())}')
    logging.info("Creating tasks")
    task_1 = asyncio.create_task(delay_message(2, num_word_mapping[2])) 
    task_2 = asyncio.create_task(delay_message(3, num_word_mapping[3]))
    logging.info(f'Current registered tasks: {len(asyncio.all_tasks())}')
    await task_1 # suspends execution of coroutine and gives control back to event loop while awaiting task completion.
    await task_2
    logging.info("Main Ended")

await main()  # top-level await works in Jupyter/IPython; in a plain script use asyncio.run(main())

23:14:38:MainThread:Main started
23:14:38:MainThread:Current registered tasks: 2
23:14:38:MainThread:Creating tasks
23:14:38:MainThread:Current registered tasks: 4
23:14:38:MainThread:TWO received
23:14:38:MainThread:THREE received
23:14:40:MainThread:Printing TWO
23:14:41:MainThread:Printing THREE
23:14:41:MainThread:Main Ended
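`create_task` matters here: the tasks start as soon as the current coroutine yields, so their sleeps overlap. Awaiting the coroutines directly, without wrapping them in tasks, would run them one after the other. A sketch that records the finishing order in both cases, assuming a modified `delay_message` that appends to a list instead of logging (standalone script; in Jupyter use `await`):

```python
import asyncio

async def delay_message(delay, message, done):
    await asyncio.sleep(delay)
    done.append(message)

async def awaited_directly():
    done = []
    await delay_message(0.05, "SLOW", done)  # runs to completion before the next line starts
    await delay_message(0.01, "FAST", done)
    return done

async def with_tasks():
    done = []
    t1 = asyncio.create_task(delay_message(0.05, "SLOW", done))  # scheduled immediately
    t2 = asyncio.create_task(delay_message(0.01, "FAST", done))
    await t1
    await t2
    return done

direct_order = asyncio.run(awaited_directly())
task_order = asyncio.run(with_tasks())
print(direct_order)  # ['SLOW', 'FAST']
print(task_order)    # ['FAST', 'SLOW']
```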


### Another way to create AsyncIO tasks

Using `asyncio.gather` to create several tasks at once.
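Besides scheduling all the coroutines at once, `asyncio.gather` also returns their results as a list in the order the awaitables were passed, regardless of which one finishes first. A sketch, assuming a `delay_message` variant that returns its message and records when it finishes (standalone script; in Jupyter use `await main()`):

```python
import asyncio

async def delay_message(delay, message, finished):
    await asyncio.sleep(delay)
    finished.append(message)
    return message

async def main():
    finished = []
    results = await asyncio.gather(
        delay_message(0.05, "ONE", finished),
        delay_message(0.01, "TWO", finished),
        delay_message(0.03, "THREE", finished),
    )
    return results, finished

results, finished = asyncio.run(main())
print(results)   # ['ONE', 'TWO', 'THREE']  (argument order)
print(finished)  # ['TWO', 'THREE', 'ONE']  (completion order)
```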

In [10]:
import asyncio
import logging
import time

logger_format = '%(asctime)s:%(threadName)s:%(message)s'
logging.basicConfig(format=logger_format, level=logging.INFO, datefmt="%H:%M:%S")

num_word_mapping = {1: 'ONE', 2: 'TWO', 3: "THREE", 4: "FOUR", 5: "FIVE", 6: "SIX", 7: "SEVEN", 8: "EIGHT",
                   9: "NINE", 10: "TEN"}

async def delay_message(delay, message):
    logging.info(f"{message} received")
    await asyncio.sleep(delay)  # time.sleep is a blocking call and cannot be awaited, so asyncio.sleep is used instead
    logging.info(f"Printing {message}")
    
async def main():
    logging.info("Main started")
    logging.info("Creating multiple tasks with asyncio.gather")
    await asyncio.gather(*[delay_message(i+1, num_word_mapping[i+1]) for i in range(5)]) # awaits completion of all tasks
    logging.info("Main Ended")


await main()

23:15:49:MainThread:Main started
23:15:49:MainThread:Creating multiple tasks with asyncio.gather
23:15:49:MainThread:ONE received
23:15:49:MainThread:TWO received
23:15:49:MainThread:THREE received
23:15:49:MainThread:FOUR received
23:15:49:MainThread:FIVE received
23:15:50:MainThread:Printing ONE
23:15:51:MainThread:Printing TWO
23:15:52:MainThread:Printing THREE
23:15:53:MainThread:Printing FOUR
23:15:54:MainThread:Printing FIVE
23:15:54:MainThread:Main Ended


### Caution about Blocking Calls in AsyncIO Tasks

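An asyncio task keeps exclusive use of the event loop (and therefore of the CPU) until it voluntarily gives up control at an `await`. If a blocking call sneaks into a task by mistake, it stalls the progress of the whole program: in the example below, the `time.sleep` in the `THREE` task holds up every other task until it returns.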
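When a blocking call cannot be avoided, one option is to run it in a worker thread so the event loop stays free. A sketch using `asyncio.to_thread` (available since Python 3.9; `loop.run_in_executor(None, ...)` is the older equivalent); the `blocking_job` and `quick_job` names are hypothetical. It is written as a standalone script with `asyncio.run`.

```python
import asyncio
import time

async def blocking_job(order):
    # time.sleep here directly would freeze the event loop;
    # to_thread runs it in a separate thread and awaits the result
    await asyncio.to_thread(time.sleep, 0.2)
    order.append("blocking")

async def quick_job(order):
    await asyncio.sleep(0.05)
    order.append("quick")

async def main():
    order = []
    await asyncio.gather(blocking_job(order), quick_job(order))
    return order

order = asyncio.run(main())
print(order)  # ['quick', 'blocking']
```

If `blocking_job` called `time.sleep(0.2)` directly, it would finish first and `quick_job` would not even start until the sleep returned.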

In [22]:
import asyncio
import logging
import time

logger_format = '%(asctime)s:%(threadName)s:%(message)s'
logging.basicConfig(format=logger_format, level=logging.INFO, datefmt="%H:%M:%S")

num_word_mapping = {1: 'ONE', 2: 'TWO', 3: "THREE", 4: "FOUR", 5: "FIVE", 6: "SIX", 7: "SEVEN", 8: "EIGHT",
                   9: "NINE", 10: "TEN"}

async def delay_message(delay, message):
    logging.info(f"{message} received")
    if message != 'THREE':
        await asyncio.sleep(delay)  # non-blocking: hands control back to the event loop
    else:
        time.sleep(delay)  # blocking: freezes the whole event loop for `delay` seconds
    logging.info(f"Printing {message}")
    
async def main():
    logging.info("Main started")
    logging.info("Creating multiple tasks with asyncio.gather")
    await asyncio.gather(*[delay_message(i+1, num_word_mapping[i+1]) for i in range(5)]) # awaits completion of all tasks
    logging.info("Main Ended")

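if __name__ == '__main__':
    # Top-level await only works in Jupyter/IPython, whose event loop is already running;
    # in a plain script use asyncio.run(main()), which creates the event loop
    await main()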

14:34:46:MainThread:Main started
14:34:46:MainThread:Creating multiple tasks with asyncio.gather
14:34:46:MainThread:ONE received
14:34:46:MainThread:TWO received
14:34:46:MainThread:THREE received
14:34:49:MainThread:Printing THREE
14:34:49:MainThread:FOUR received
14:34:49:MainThread:FIVE received
14:34:49:MainThread:Printing ONE
14:34:49:MainThread:Printing TWO
14:34:53:MainThread:Printing FOUR
14:34:54:MainThread:Printing FIVE
14:34:54:MainThread:Main Ended


### Race Conditions

Multithreaded code can quickly fall apart if it does not account for race conditions. It gets especially tricky with external libraries, because we need to verify that they are thread-safe. For example, the session object of the popular `requests` module is not thread-safe, so parallelizing network requests through a single shared session can produce unintended results.
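One common way to avoid sharing a non-thread-safe object is to give each thread its own instance with `threading.local`. A sketch of that pattern; `FakeSession` is a hypothetical stand-in so the example runs without `requests`, and `get_session` is an illustrative helper, not a library API.

```python
import threading

class FakeSession:
    """Hypothetical stand-in for a non-thread-safe client object."""

_local = threading.local()
created = []                  # keeps every session alive (and its id unique) for inspection
created_lock = threading.Lock()

def get_session():
    # Attributes set on a threading.local object are visible only to the thread that set them
    if not hasattr(_local, "session"):
        _local.session = FakeSession()
        with created_lock:
            created.append(_local.session)
    return _local.session

same_within_thread = []

def worker():
    # Within one thread, repeated calls return the same session
    same_within_thread.append(get_session() is get_session())

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(created), all(same_within_thread))  # 3 True
```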

In [24]:
import concurrent.futures as cf
import logging
import time

logger_format = '%(asctime)s:%(threadName)s:%(message)s'
logging.basicConfig(format=logger_format, level=logging.INFO, datefmt="%H:%M:%S")

class DbUpdate:
    def __init__(self):
        self.value = 0

    def update(self):
        logging.info("Update Started")
        logging.info("Sleeping")
        time.sleep(2) # thread gets switched
        logging.info("Reading Value From Db")
        tmp = self.value**2 + 1
        logging.info("Updating Value")
        self.value = tmp
        logging.info("Update Finished")
        
db = DbUpdate()
with cf.ThreadPoolExecutor(max_workers=5) as executor:
    updates = [executor.submit(db.update) for _ in range(10)]
logging.info(f"Final value is {db.value}")

14:37:05:ThreadPoolExecutor-3_0:Update Started
14:37:05:ThreadPoolExecutor-3_1:Update Started
14:37:05:ThreadPoolExecutor-3_2:Update Started
14:37:05:ThreadPoolExecutor-3_3:Update Started
14:37:05:ThreadPoolExecutor-3_4:Update Started
14:37:05:ThreadPoolExecutor-3_0:Sleeping
14:37:05:ThreadPoolExecutor-3_1:Sleeping
14:37:05:ThreadPoolExecutor-3_2:Sleeping
14:37:05:ThreadPoolExecutor-3_3:Sleeping
14:37:05:ThreadPoolExecutor-3_4:Sleeping
14:37:07:ThreadPoolExecutor-3_0:Reading Value From Db
14:37:07:ThreadPoolExecutor-3_2:Reading Value From Db
14:37:07:ThreadPoolExecutor-3_1:Reading Value From Db
14:37:07:ThreadPoolExecutor-3_3:Reading Value From Db
14:37:07:ThreadPoolExecutor-3_0:Updating Value
14:37:07:ThreadPoolExecutor-3_2:Updating Value
14:37:07:ThreadPoolExecutor-3_1:Updating Value
14:37:07:ThreadPoolExecutor-3_3:Updating Value
14:37:07:ThreadPoolExecutor-3_4:Reading Value From Db
14:37:07:ThreadPoolExecutor-3_0:Update Finished
14:37:07:ThreadPoolExecutor-3_2:Update Finished
14:37:

In [25]:
import concurrent.futures as cf
import logging
import time
import threading

LOCK = threading.Lock()

logger_format = '%(asctime)s:%(threadName)s:%(message)s'
logging.basicConfig(format=logger_format, level=logging.INFO, datefmt="%H:%M:%S")

class DbUpdate:
    def __init__(self):
        self.value = 0

    def update(self):
        logging.info("Update Started")
        logging.info("Sleeping")
        time.sleep(2) # thread gets switched
        with LOCK:
            logging.info("Reading Value From Db")
            tmp = self.value**2 + 1
            logging.info("Updating Value")
            self.value = tmp
            logging.info("Update Finished")
        
db = DbUpdate()
with cf.ThreadPoolExecutor(max_workers=5) as executor:
    updates = [executor.submit(db.update) for _ in range(2)]
logging.info(f"Final value is {db.value}")

14:38:41:ThreadPoolExecutor-4_0:Update Started
14:38:41:ThreadPoolExecutor-4_1:Update Started
14:38:41:ThreadPoolExecutor-4_0:Sleeping
14:38:41:ThreadPoolExecutor-4_1:Sleeping
14:38:43:ThreadPoolExecutor-4_0:Reading Value From Db
14:38:43:ThreadPoolExecutor-4_0:Updating Value
14:38:43:ThreadPoolExecutor-4_0:Update Finished
14:38:43:ThreadPoolExecutor-4_1:Reading Value From Db
14:38:43:ThreadPoolExecutor-4_1:Updating Value
14:38:43:ThreadPoolExecutor-4_1:Update Finished
14:38:43:MainThread:Final value is 2


### Race Conditions are Rare with AsyncIO

Since a task decides exactly when to suspend execution (only at an `await`), race conditions are rare with asyncio.
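Rare is not impossible, though: if an `await` sits between reading a shared value and writing it back, other tasks can run in between. A sketch with a hypothetical `Counter` class, where `asyncio.sleep(0)` is used as the suspension point and `asyncio.Lock` serializes the read-modify-write (standalone script with `asyncio.run`):

```python
import asyncio

class Counter:
    def __init__(self):
        self.value = 0

async def unsafe_increment(counter):
    tmp = counter.value      # read
    await asyncio.sleep(0)   # suspension point between the read and the write
    counter.value = tmp + 1  # writes back a value that may be stale

async def safe_increment(counter, lock):
    async with lock:         # only one task at a time runs this block
        tmp = counter.value
        await asyncio.sleep(0)
        counter.value = tmp + 1

async def main():
    unsafe, safe = Counter(), Counter()
    await asyncio.gather(*[unsafe_increment(unsafe) for _ in range(10)])
    lock = asyncio.Lock()
    await asyncio.gather(*[safe_increment(safe, lock) for _ in range(10)])
    return unsafe.value, safe.value

values = asyncio.run(main())
print(values)  # (1, 10)
```

Without the lock, all ten tasks read `0` before any of them writes, so nine increments are lost.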

In [29]:
import asyncio
import logging
import time

logger_format = '%(asctime)s:%(threadName)s:%(message)s'
logging.basicConfig(format=logger_format, level=logging.INFO, datefmt="%H:%M:%S")

class DbUpdate:
    def __init__(self):
        self.value = 0

    async def update(self):
        logging.info("Update Started")
        logging.info("Sleeping")
        await asyncio.sleep(1)
        logging.info("Reading Value From Db")
        tmp = self.value**2 + 1
        logging.info("Updating Value")
        self.value = tmp
        logging.info("Update Finished")
        
async def main():
    db = DbUpdate()
    await asyncio.gather(*[db.update() for _ in range(10)])
    logging.info(f"Final value is {db.value}")
    
await main()

14:42:17:MainThread:Update Started
14:42:17:MainThread:Sleeping
14:42:17:MainThread:Update Started
14:42:17:MainThread:Sleeping
14:42:17:MainThread:Update Started
14:42:17:MainThread:Sleeping
14:42:17:MainThread:Update Started
14:42:17:MainThread:Sleeping
14:42:17:MainThread:Update Started
14:42:17:MainThread:Sleeping
14:42:17:MainThread:Update Started
14:42:17:MainThread:Sleeping
14:42:17:MainThread:Update Started
14:42:17:MainThread:Sleeping
14:42:17:MainThread:Update Started
14:42:17:MainThread:Sleeping
14:42:17:MainThread:Update Started
14:42:17:MainThread:Sleeping
14:42:17:MainThread:Update Started
14:42:17:MainThread:Sleeping
14:42:18:MainThread:Reading Value From Db
14:42:18:MainThread:Updating Value
14:42:18:MainThread:Update Finished
14:42:18:MainThread:Reading Value From Db
14:42:18:MainThread:Updating Value
14:42:18:MainThread:Update Finished
14:42:18:MainThread:Reading Value From Db
14:42:18:MainThread:Updating Value
14:42:18:MainThread:Update Finished
14:42:18:MainThread:R

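As you can see, once a task resumed after sleeping, it did not give up control until it had finished running the coroutine. With threading, it is not obvious when the interpreter switches threads, but with asyncio we control exactly where a coroutine's execution can be suspended. Nonetheless, things can still go wrong, for example when two coroutines depend on each other cyclically. In the cell below, `foo` and `boo` await each other endlessly; because awaiting a coroutine directly runs it on the same call stack, this ends in a `RecursionError` rather than a silent hang.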

In [30]:
import asyncio 

async def foo():
    await boo()
    
async def boo():
    await foo()
    
async def main():
    await asyncio.gather(*[foo(), boo()])
    
await main()

RecursionError: maximum recursion depth exceeded
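A true deadlock between coroutines usually involves locks taken in opposite orders. A sketch that reproduces one with two `asyncio.Lock` objects and detects it with `asyncio.wait_for`; the `worker` name and the 0.1 s timeout are illustrative choices (standalone script with `asyncio.run`):

```python
import asyncio

async def worker(first, second):
    async with first:
        await asyncio.sleep(0)  # yield so the other task can take its first lock
        async with second:      # each task now waits for the lock the other one holds
            return "finished"

async def main(timeout=0.1):
    # Locks are created inside the running event loop
    lock_a, lock_b = asyncio.Lock(), asyncio.Lock()
    try:
        await asyncio.wait_for(
            asyncio.gather(worker(lock_a, lock_b), worker(lock_b, lock_a)),
            timeout=timeout,
        )
        return "finished"
    except asyncio.TimeoutError:
        # wait_for cancels the stuck tasks; their `async with` blocks release the locks
        return "deadlock detected"

outcome = asyncio.run(main())
print(outcome)  # deadlock detected
```

Acquiring the locks in the same order in both workers (`lock_a`, then `lock_b`) removes the deadlock.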

### Multiprocessing

Multiprocessing is especially useful for CPU-intensive programs: each process has its own Python interpreter (and its own GIL), so the work can run in parallel on several cores. The code below merge-sorts 1000 lists of 30000 elements each. Bear with me if the merge sort implementation below is a bit clumsy.
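For reference, a tidier sketch of the same algorithm; it is not the implementation timed below, and the names `left` and `right` are my own. It uses a two-pointer merge and copies whatever remains of either half at the end, and its base case also handles an empty list.

```python
def merge(left, right):
    out = []
    i = j = 0
    # Repeatedly take the smaller head element; <= keeps the sort stable
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    # One half is exhausted; the rest of the other is already sorted
    out.extend(left[i:])
    out.extend(right[j:])
    return out

def merge_sort(items):
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))

print(merge_sort([5, 2, 9, 1, 5]))  # [1, 2, 5, 5, 9]
```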


#### Synchronous version

In [31]:
import concurrent.futures as cf
import logging
import math
import numpy as np
import time
import threading

logger_format = '%(asctime)s:%(threadName)s:%(message)s'
logging.basicConfig(format=logger_format, level=logging.INFO, datefmt="%H:%M:%S")

r_lists = [[np.random.randint(500000) for _ in range(30000)] for _ in range(1000)]

def merge(l_1, l_2):
    out = []
    key_1 = 0
    key_2 = 0
    for i in range(len(l_1) + len(l_2)):
        if l_1[key_1] < l_2[key_2]:
            out.append(l_1[key_1])
            key_1 += 1
            if key_1 == len(l_1):
                out = out + l_2[key_2:]
                break
        else:
            out.append(l_2[key_2])
            key_2 += 1
            if key_2 == len(l_2):
                out = out + l_1[key_1:]
                break
    return out

def merge_sort(l):
    if len(l) <= 1:  # base case; also avoids infinite recursion on an empty list
        return l
    mid_point = math.floor((len(l) + 1) / 2)
    l_1, l_2 = merge_sort(l[:mid_point]), merge_sort(l[mid_point:])
    out = merge(l_1, l_2)
    del l_1, l_2
    return out

if __name__ == '__main__':
    logging.info("Starting Sorting")
    for r_list in r_lists:
        _ = merge_sort(r_list)
    logging.info("Sorting Completed")

14:46:30:MainThread:Starting Sorting
14:47:32:MainThread:Sorting Completed


#### Parallel version (multiprocessing)

In [None]:
import concurrent.futures as cf
import logging
import math
import numpy as np
import time
import threading

logger_format = '%(asctime)s:%(threadName)s:%(message)s'
logging.basicConfig(format=logger_format, level=logging.INFO, datefmt="%H:%M:%S")

r_lists = [[np.random.randint(500000) for _ in range(30000)] for _ in range(1000)]

def merge(l_1, l_2):
    out = []
    key_1 = 0
    key_2 = 0
    for i in range(len(l_1) + len(l_2)):
        if l_1[key_1] < l_2[key_2]:
            out.append(l_1[key_1])
            key_1 += 1
            if key_1 == len(l_1):
                out = out + l_2[key_2:]
                break
        else:
            out.append(l_2[key_2])
            key_2 += 1
            if key_2 == len(l_2):
                out = out + l_1[key_1:]
                break
    return out

def merge_sort(l):
    if len(l) <= 1:  # base case; also avoids infinite recursion on an empty list
        return l
    mid_point = math.floor((len(l) + 1) / 2)
    l_1, l_2 = merge_sort(l[:mid_point]), merge_sort(l[mid_point:])
    out = merge(l_1, l_2)
    del l_1, l_2
    return out

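if __name__ == '__main__':
    # Note: with the 'spawn' start method (the default on Windows and macOS) the worker
    # processes must import merge_sort, so this may fail inside a notebook; run it as a .py script
    logging.info("Starting Sorting")
    with cf.ProcessPoolExecutor() as executor:
        sorted_lists_futures = [executor.submit(merge_sort, r_list) for r_list in r_lists]
        # result() waits for each list and re-raises any exception raised in the worker
        sorted_lists = [future.result() for future in sorted_lists_futures]
    logging.info("Sorting Completed")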