# Concurrency ⚡️

### Threads / async / multiprocessing

### Asynchrony vs parallelism

Asynchronous code pays off when the program spends its time waiting on operations such as:

* sending/receiving data over the network
* reading the contents of a file from within our program
* writing data from our program to disk
* waiting for an operation on a remote API to finish
* waiting for a database operation to finish
* etc.

### What can be parallelized and what cannot

When we get down to the details, only `multiprocessing` actually runs these lines of execution at literally the same time.

`threading` and `asyncio` both run in a single process and therefore (in standard CPython, because of the GIL) execute only one thing at a time. They just find ways of taking turns that speed up the overall process. We still call this concurrency.

* threading: pre-emptive multitasking (the OS decides when to switch tasks.) https://es.wikipedia.org/wiki/Multitarea_apropiativa  
* asyncio: cooperative multitasking (you decide: each task yields control.) https://es.wikipedia.org/wiki/Multitarea_cooperativa
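
The difference shows up in where a task can be interrupted. As a minimal sketch of cooperative multitasking (the names `worker` and `log` are made up for illustration), each coroutine below runs until it reaches an `await` and only then lets the other one proceed:

```python
import asyncio


async def worker(name, log):
    for step in range(2):
        log.append(f"{name}{step}")
        # Explicitly hand control back to the event loop.
        await asyncio.sleep(0)


async def main():
    log = []
    await asyncio.gather(worker("a", log), worker("b", log))
    return log


# In a notebook, use `order = await main()` instead of asyncio.run().
order = asyncio.run(main())
print(order)
```

The two workers alternate because each one gives up control voluntarily; with threads, the switch could happen at a point the code did not choose.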

### Shared memory vs replication

Things that are slower than the CPU: I/O-bound or network-bound work.

![I/O - networking](https://files.realpython.com/media/IOBound.4810a888b457.png)

Not tied to I/O, lots of computation: CPU-bound work.

![](https://files.realpython.com/media/CPUBound.d2d32cb2626c.png)
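
A rough way to see the I/O-bound case without touching the network, assuming `time.sleep` is an acceptable stand-in for waiting on a slow device (`fake_io` is a hypothetical helper):

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay):
    """Pretend to wait on a slow device; the CPU is idle while sleeping."""
    time.sleep(delay)
    return delay


delays = [0.1] * 5

# One wait after another.
start = time.perf_counter()
for d in delays:
    fake_io(d)
sequential = time.perf_counter() - start

# All five waits overlap in a thread pool.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as executor:
    list(executor.map(fake_io, delays))
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

The waits overlap, so the threaded run takes roughly as long as a single wait. A CPU-bound function would generally not see the same gain from threads in standard CPython.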

In [1]:
sites = [
    "https://www.yahoo.com/",
    "http://www.cnn.com",
    "http://www.python.org",
    "http://www.jython.org",
    "http://www.pypy.org",
    "http://www.perl.org",
    "http://www.cisco.com",
    "http://www.facebook.com",
    "http://www.twitter.com",
    "http://www.macrumors.com/",
    "http://arstechnica.com/",
    "http://www.reuters.com/",
    "http://abcnews.go.com/",
    "http://www.cnbc.com/",
    "http://olympus.realpython.org/dice",
    "https://realpython.com/",
]

In [2]:
import requests
import time


def download_site(url):
    response = requests.get(url)
    print(f"Read {len(response.content)} from {url}")
    return len(response.content)


start_time = time.time()

for url in sites:
    download_site(url)

duration = time.time() - start_time

print(duration)

Read 244295 from https://www.yahoo.com/
Read 1142002 from http://www.cnn.com
Read 49886 from http://www.python.org
Read 10394 from http://www.jython.org
Read 6832 from http://www.pypy.org
Read 12861 from http://www.perl.org
Read 96543 from http://www.cisco.com
Read 209853 from http://www.facebook.com
Read 42523 from http://www.twitter.com
Read 336513 from http://www.macrumors.com/
Read 90569 from http://arstechnica.com/
Read 203818 from http://www.reuters.com/
Read 212963 from http://abcnews.go.com/
Read 971874 from http://www.cnbc.com/
Read 276 from http://olympus.realpython.org/dice
Read 39789 from https://realpython.com/
12.013489007949829


### Threading

In [3]:
from concurrent.futures import ThreadPoolExecutor
import requests
import time


def download_site(url):
    response = requests.get(url)
    print(f"Read {len(response.content)} from {url}")
    return len(response.content)


start_time = time.time()

with ThreadPoolExecutor(max_workers=5) as executor:
    todos = list(executor.map(download_site, sites))


duration = time.time() - start_time
print(duration)

Read 10394 from http://www.jython.org
Read 49886 from http://www.python.org
Read 1142002 from http://www.cnn.com
Read 6832 from http://www.pypy.org
Read 96543 from http://www.cisco.com
Read 12861 from http://www.perl.org
Read 336467 from http://www.macrumors.com/
Read 206758 from http://www.facebook.com
Read 204144 from http://www.reuters.com/
Read 245526 from https://www.yahoo.com/
Read 90569 from http://arstechnica.com/
Read 42523 from http://www.twitter.com
Read 276 from http://olympus.realpython.org/dice
Read 970282 from http://www.cnbc.com/
Read 212963 from http://abcnews.go.com/
Read 39789 from https://realpython.com/
2.04518723487854


Let's break down the name `ThreadPoolExecutor`. The `Thread` part is just the thread of execution mentioned earlier. `Pool` is where it starts to get interesting: this object creates a pool of threads, each of which can run concurrently. Finally, the `Executor` is the part that controls how and when each thread in the pool runs.

![](https://files.realpython.com/media/Threading.3eef48da829e.png)
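
To see the pool at work, the sketch below (with a hypothetical `which_thread` function) records which thread handles each item. With `max_workers=3`, at most three distinct worker threads ever appear, however many items are submitted:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def which_thread(item):
    """Return the name of the worker thread that processed this item."""
    time.sleep(0.01)  # small pause so that work spreads across the pool
    return threading.current_thread().name


with ThreadPoolExecutor(max_workers=3) as executor:
    names = list(executor.map(which_thread, range(12)))

distinct = set(names)
print(len(names), "items handled by", len(distinct), "threads")
```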

In [47]:
import concurrent.futures
from threading import Lock


counter = 0

lock = Lock()


def increment_counter(fake_value):
    global counter
    # `with lock` acquires the lock and releases it on exit,
    # the same as calling lock.acquire() and lock.release() by hand.
    with lock:
        for _ in range(100):
            counter += 1


fake_data = list(range(5000))

with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
    executor.map(increment_counter, fake_data)

In [4]:
5000 * 100

500000

In [28]:
print(counter)

500000
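
A lock is not the only way to keep the count correct. One alternative, sketched here with a hypothetical `count_chunk` function, is to avoid shared state entirely: each task returns its own partial count and the main thread adds them up.

```python
from concurrent.futures import ThreadPoolExecutor


def count_chunk(fake_value):
    """Count locally; nothing outside the function is modified."""
    local = 0
    for _ in range(100):
        local += 1
    return local


with ThreadPoolExecutor(max_workers=50) as executor:
    # The results are combined in the main thread, so no lock is needed.
    total = sum(executor.map(count_chunk, range(5000)))

print(total)
```

Whether this fits depends on the problem; when tasks really must update the same object, a lock like the one above is still the usual tool.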


### Async

The general concept of asyncio is that a single Python object, called the event loop, controls how and when each task gets run. The event loop is aware of each task and knows what state it’s in. In reality, there are many states that tasks could be in, but for now let’s imagine a simplified event loop that just has two states.

The ready state will indicate that a task has work to do and is ready to be run, and the waiting state means that the task is waiting for some external thing to finish, such as a network operation.

Your simplified event loop maintains two lists of tasks, one for each of these states. It selects one of the ready tasks and starts it back to running. That task is in complete control until it cooperatively hands the control back to the event loop.

When the running task gives control back to the event loop, the event loop places that task into either the ready or waiting list and then goes through each of the tasks in the waiting list to see if it has become ready by an I/O operation completing. It knows that the tasks in the ready list are still ready because it knows they haven’t run yet.

Once all of the tasks have been sorted into the right list again, the event loop picks the next task to run, and the process repeats. Your simplified event loop picks the task that has been waiting the longest and runs that. This process repeats until the event loop is finished.

An important point of asyncio is that the tasks never give up control without intentionally doing so. They never get interrupted in the middle of an operation. This allows us to share resources a bit more easily in asyncio than in threading. You don’t have to worry about making your code thread-safe.
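
The simplified event loop described above can be sketched with plain generators. This is only a toy under stated assumptions: "I/O" is simulated by counting ticks, and `task` and `run_loop` are made-up names; the real asyncio event loop is far more elaborate.

```python
from collections import deque


def task(name, waits, log):
    """A task is a generator: each `yield n` means 'I wait n ticks on I/O'."""
    for n in waits:
        log.append(f"{name} waits {n}")
        yield n
    log.append(f"{name} done")


def run_loop(tasks):
    """Toy event loop with a ready queue and a waiting list."""
    ready = deque(tasks)  # tasks that have work to do
    waiting = []          # [remaining_ticks, task] pairs
    while ready or waiting:
        if ready:
            current = ready.popleft()  # the task that has been ready longest
            try:
                ticks = next(current)  # runs until the task yields control
                waiting.append([ticks, current])
            except StopIteration:
                pass  # the task finished
        # One tick passes: tasks whose simulated I/O completed become ready.
        still_waiting = []
        for entry in waiting:
            entry[0] -= 1
            if entry[0] <= 0:
                ready.append(entry[1])
            else:
                still_waiting.append(entry)
        waiting = still_waiting


log = []
run_loop([task("a", [3], log), task("b", [1], log)])
print(log)
```

Task `b` finishes first even though `a` started first, because `a` is still in the waiting list when `b` becomes ready again.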

**Any function that calls await needs to be marked with async. You’ll get a syntax error otherwise.**
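
A quick way to check this rule without running any network code is to compile two small snippets; `fetch` and `something` are placeholder names that never get executed:

```python
source_without_async = "def fetch():\n    await something()\n"
source_with_async = "async def fetch():\n    await something()\n"


def compiles(source):
    """Return True if the source compiles, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


print(compiles(source_without_async))
print(compiles(source_with_async))
```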

There is no need to worry about how many threads to create.

https://markhneedham.com/blog/2019/05/10/jupyter-runtimeerror-this-event-loop-is-already-running/

In [51]:
# In a script (outside Jupyter), the coroutine would be run like this:
# if __name__ == "__main__":
#     asyncio.run(download_all_sites(sites))

In [53]:
import asyncio
import time
import aiohttp


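async def download_site(session, url):
    async with session.get(url) as response:
        # content_length comes from the Content-Length header; it is None when
        # the server does not send that header, hence the "Read None" lines below.
        print("Read {0} from {1}".format(response.content_length, url))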


async def download_all_sites(sites):
    # You can share the session across all tasks, so the session is created here as a context manager.
    # The tasks can share the session because they are all running on the same thread.
    async with aiohttp.ClientSession() as session:
        tasks = []
        for url in sites:
            # ensure_future also takes care of starting the tasks
            task = asyncio.ensure_future(download_site(session, url))
            tasks.append(task)
        # print(tasks)
        # Once all the tasks are created, this function uses asyncio.gather()
        # to keep the session context alive until all of the tasks have completed.
        await asyncio.gather(*tasks, return_exceptions=True)


start_time = time.time()

await download_all_sites(sites)

duration = time.time() - start_time
print(f"Downloaded {len(sites)} sites in {duration} seconds")

Read 45718 from http://www.reuters.com/
Read 12861 from http://www.perl.org
Read 49886 from http://www.python.org
Read 3586 from http://www.jython.org
Read 153581 from http://www.cnbc.com/
Read 20418 from http://www.cisco.com
Read 276 from http://olympus.realpython.org/dice
Read None from http://www.macrumors.com/
Read None from https://realpython.com/
Read 157341 from http://www.cnn.com
Read None from http://www.facebook.com
Read 6832 from http://www.pypy.org
Read 50102 from http://abcnews.go.com/
Read None from http://arstechnica.com/
Read None from http://www.twitter.com
Read None from https://www.yahoo.com/
Downloaded 16 sites in 1.0771641731262207 seconds


![](https://files.realpython.com/media/Asyncio.31182d3731cf.png)
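
One consequence of `return_exceptions=True` in the download cell is that a failed request does not raise; the exception object is returned in place of a result. A small offline sketch (`might_fail` is a hypothetical coroutine standing in for a request):

```python
import asyncio


async def might_fail(n):
    await asyncio.sleep(0)  # stand-in for a network call
    if n == 1:
        raise ValueError(f"request {n} failed")
    return n * 10


async def main():
    return await asyncio.gather(
        *(might_fail(n) for n in range(3)), return_exceptions=True
    )


# In a notebook, use `results = await main()` instead of asyncio.run().
results = asyncio.run(main())
failures = [r for r in results if isinstance(r, Exception)]
print(results)
print(len(failures), "failed")
```

Inspecting the returned list like this is one way to notice downloads that silently failed.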

**WARNING**

The following cells show how to download a set of random images from the [unsplash](https://unsplash.com/) API. One cell writes them to an `images/` folder, the other to `images2/`.

Both folders sit, empty, in the same folder as this notebook. Still, check everything carefully so that nothing you want to keep gets overwritten.

The cells are here to demonstrate how to download many images asynchronously, using [aiohttp](https://docs.aiohttp.org/en/stable/) in the first case and [httpx](https://github.com/encode/httpx) in the second.

In [6]:
import asyncio
import time
from random import choice
from string import ascii_lowercase

import aiofiles
import aiohttp


def random_string(string_length=15):
    """Generate a random lowercase string of fixed length."""
    return "".join(choice(ascii_lowercase) for _ in range(string_length))


async def download_site(session, url):
    async with session.get(url) as response:
        c = await response.read()
        await write_file(c)


async def write_file(content):

    filename = "images/" + random_string() + ".jpg"
    async with aiofiles.open(filename, mode="wb") as f:
        await f.write(content)


urls = [
    "https://source.unsplash.com/1600x900/?nature,water," + random_string(6)
    for _ in range(50)
]


async def dl():
    async with aiohttp.ClientSession() as session:
        tasks = []
        for url in urls:
            task = asyncio.ensure_future(download_site(session, url))
            tasks.append(task)

        await asyncio.gather(*tasks, return_exceptions=True)


start = time.perf_counter()
await dl()
print(f"total = {time.perf_counter() - start}")

total = 4.9483451249998325


In [54]:
import asyncio
import time
from random import choice
from string import ascii_lowercase

import aiofiles
import httpx


def random_string(string_length=15):
    """Generate a random lowercase string of fixed length."""
    return "".join(choice(ascii_lowercase) for _ in range(string_length))


async def download_site(client, url):
    r = await client.get(url)
    await write_file(r.content)


async def write_file(content):

    filename = "images2/" + random_string() + ".jpg"
    async with aiofiles.open(filename, mode="wb") as f:
        await f.write(content)


urls = [
    "https://source.unsplash.com/1600x900/?nature,water," + random_string(10)
    for _ in range(50)
]


async def dl():
    async with httpx.AsyncClient() as client:
        tasks = []
        for url in urls:
            task = asyncio.ensure_future(download_site(client, url))
            tasks.append(task)

        await asyncio.gather(*tasks, return_exceptions=True)


start = time.perf_counter()
await dl()
print(f"total = {time.perf_counter() - start}")

total = 6.267024283999945


Optional exercise: read and understand this code: https://pybay.com/site_media/slides/raymond2017-keynote/async_examples.html

In [5]:
# adapted from https://gist.github.com/bradmontgomery/81d71e415b0ff693f00408388590acb9

import hashlib
import sys

from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from time import sleep, time


def t1(n):
    """Silly function whose run time grows linearly with n."""
    for i in range(n):
        if i % 2 == 0:
            sleep(0.5)


def t2(n):
    """A somewhat CPU-intensive task."""
    for i in range(n):
        hashlib.pbkdf2_hmac("sha256", b"password", b"salt", 100000)


def do_work(n):
    """Function that does t1 and t2 in serial."""
    start = time()
    t1(n)
    t2(n)
    end = time()
    print("Work for {} finished in {}s".format(n, round(end - start, 2)))

In [6]:
def serial():

    start = time()
    for x in range(10):
        do_work(x)
    end = time()
    print("All work finished in {}s".format(round(end - start, 2)))

In [7]:
def parallel():
    start = time()
    with ThreadPoolExecutor(max_workers=4) as executor:
        # Consume the iterator so every task finishes and any error surfaces here.
        list(executor.map(do_work, range(10)))
    end = time()
    print("All work finished in {}s".format(round(end - start, 2)))

In [27]:
serial()

Work for 0 finished in 0.0s
Work for 1 finished in 0.59s
Work for 2 finished in 0.66s
Work for 3 finished in 1.26s
Work for 4 finished in 1.4s
Work for 5 finished in 1.93s
Work for 6 finished in 2.05s
Work for 7 finished in 2.58s
Work for 8 finished in 2.69s
Work for 9 finished in 3.25s
All work finished in 16.42s


In [8]:
parallel()

Work for 0 finished in 0.0s
Work for 1 finished in 0.63s
Work for 2 finished in 0.78s
Work for 3 finished in 1.32s
Work for 4 finished in 1.4s
Work for 5 finished in 1.9s
Work for 6 finished in 2.36s
Work for 7 finished in 2.75s
Work for 8 finished in 2.79s
Work for 9 finished in 3.22s
All work finished in 5.75s


### Number of CPUs

In [9]:
import multiprocessing as mp

print("Number of processors: ", mp.cpu_count())

Number of processors:  4


In [10]:
import os

os.cpu_count()

4
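
The CPU count is a common starting point for sizing a pool. As a sketch, the heuristic below mirrors the default documented for `ThreadPoolExecutor` in Python 3.8 and later, `min(32, os.cpu_count() + 4)`; the variable names are made up, and the numbers are something to measure against, not a rule:

```python
import os
from concurrent.futures import ThreadPoolExecutor

cpus = os.cpu_count() or 1  # cpu_count() can return None

io_workers = min(32, cpus + 4)  # I/O-bound: more threads than cores is fine
cpu_workers = cpus              # CPU-bound: roughly one worker per core

with ThreadPoolExecutor(max_workers=io_workers) as executor:
    squares = list(executor.map(lambda x: x * x, range(5)))

print(io_workers, cpu_workers, squares)
```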

### Exercise

* Write a script that finds all the images in a folder tree.
* The result should be a list with the file paths of all the images.
* Create a function that converts an image to 128x128. Use the **Pillow** library; it probably already ships with your Anaconda distribution.

```python
from PIL import Image
```

* After conversion, all the images must be saved in the same folder. For example, at the end there will be a folder called "miniaturas" containing all the converted images.
* Each image must be converted to a thumbnail (128x128) and saved in that same folder.
* When saving an image, keep its original name and append "_thumbnail".
    For example `imagen.jpg` -> `imagen_thumbnail.jpg`

Try to use an f-string for the path, e.g. `f"carpeta/{nombre}_thumbnail.jpg"`, where `nombre` is a placeholder for your own variable.

* **Important**: once we have the list with all our file paths, the images must be converted using parallel processing, for example with a ThreadPoolExecutor or a ProcessPoolExecutor.

**Extra**:

Python's `functools` module provides `partial`, which lets us create so-called partial functions. Given a function that accepts, say, 3 arguments, creating a partial function amounts to *"duplicating"* that function with one of its parameters fixed; the result is a new function. For example:

* I have a function: `convertir_miniatura(resolucion, ruta)`
* I can do `miniatura128 = partial(convertir_miniatura, 128)`.
* This returns another function, which I can now call directly: `miniatura128("/Users/r/.../imagen.jpg")`. We get a new function that behaves like the original, but with one of its parameters fixed.
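
As a sketch of how `partial` combines with an executor, the stand-in `convertir_miniatura` below only builds a description string instead of touching any image, so it can be run anywhere; the file names are invented:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def convertir_miniatura(resolucion, ruta):
    """Stand-in for the real conversion: just describe what would happen."""
    return f"{ruta} -> {resolucion}x{resolucion}"


miniatura128 = partial(convertir_miniatura, 128)  # resolucion is now fixed

rutas = ["a.jpg", "b.jpg", "c.jpg"]
with ThreadPoolExecutor(max_workers=2) as executor:
    # executor.map passes one item per call, which is why partial helps here.
    resultados = list(executor.map(miniatura128, rutas))

print(resultados)
```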


`functools.partial` + executors + Pillow + paths (download images)




In [None]:
import requests
import zipfile
import os

with open("archivo_ejercicio.zip", "wb") as f:
    f.write(
        requests.get(
            "https://github.com/polyrand/teach/raw/master/11_concurrencia_paralelismo/archivo_ejercicio.zip"
        ).content
    )

with zipfile.ZipFile("archivo_ejercicio.zip", "r") as zip_ref:
    zip_ref.extractall()

### Exercise hints

In [46]:
import os

filelist = []

for _, _, _ in os.walk("tree"):
    # os.walk yields 3 values on each iteration; which are they?
    # do something
    pass
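
As one possible way to fill in the loop above, the sketch below walks a small throwaway tree created with `tempfile` (so it does not depend on the `tree` folder) and keeps only files whose extension is in `IMAGE_SUFFIXES`, an assumed set of image extensions:

```python
import os
import tempfile

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumption: what counts as an image

with tempfile.TemporaryDirectory() as root:
    # Build a tiny tree: root/a.jpg, root/sub/b.png, root/sub/notes.txt
    os.makedirs(os.path.join(root, "sub"))
    for rel in ("a.jpg", os.path.join("sub", "b.png"), os.path.join("sub", "notes.txt")):
        open(os.path.join(root, rel), "w").close()

    filelist = []
    # os.walk yields (current folder, its subfolders, its files) for every folder.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in IMAGE_SUFFIXES:
                filelist.append(os.path.join(dirpath, name))

    found = sorted(os.path.relpath(p, root) for p in filelist)

print(found)
```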

In [11]:
from pathlib import Path

In [14]:
p = Path("tree/fdjtoupvvurxgrd.jpg")

In [15]:
p.absolute()  # <<-- it is a method, so the () are required

PosixPath('/Users/r/Projects/teach/11_concurrencia_paralelismo/tree/fdjtoupvvurxgrd.jpg')

In [16]:
p.name  # it is a property, so NO () here

'fdjtoupvvurxgrd.jpg'

In [17]:
p.stem  # it is a property, so NO () here

'fdjtoupvvurxgrd'

In [18]:
p.suffix

'.jpg'

In [19]:
nuevo_nombre = p.stem + "_thumbnail" + p.suffix

In [21]:
miniaturas = Path("miniaturas/")

In [22]:
miniaturas/nuevo_nombre

PosixPath('miniaturas/fdjtoupvvurxgrd_thumbnail.jpg')

In [None]:
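from pathlib import Path

from PIL import Image


def miniaturizar(path: str):
    size = (128, 128)  # maximum size; thumbnail() keeps the aspect ratio
    p = Path(path).absolute()
    nuevo_nombre = p.stem + "_thumbnail" + p.suffix
    miniaturas = Path("miniaturas/").absolute()
    miniaturas.mkdir(exist_ok=True)  # make sure the output folder exists
    save = miniaturas / nuevo_nombre
    image = Image.open(p)
    image.thumbnail(size)  # resizes the image in place
    image.save(save)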

[Donald Knuth](https://en.wikipedia.org/wiki/Donald_Knuth): “Premature optimization is the root of all evil (or at least most of it) in programming.”

More info and sources:

* https://realpython.com/python-concurrency/ 👈🏼
* https://realpython.com/intro-to-python-threading/
* https://www.youtube.com/watch?v=9zinZmE3Ogk
* https://pybay.com/site_media/slides/raymond2017-keynote/index.html
* https://realpython.com/async-io-python/  👈🏼 recommended reading; async is a complex topic and has a learning curve
* https://www.toptal.com/python/beginners-guide-to-concurrency-and-parallelism-in-python
* https://stackoverflow.com/questions/49005651/how-does-asyncio-actually-work/51116910#51116910
* https://www.blog.pythonlibrary.org/2016/07/26/python-3-an-intro-to-asyncio/
* https://stackabuse.com/python-async-await-tutorial/