### Ejercicio

* Escribe un script que identifique todas las imágenes de un árbol de carpetas.
* Debemos obtener una lista con todas las rutas de archivo de las imagénes.
* Crear una función que covierta una imagen a 128x128. Usar la librería **Pillow**, ya viene instalada en vuestra distribución de Anaconda creo.

```python
import PIL
````

* Tras convertir una imagen, todas deben estar guardadaes en una misma carpeta. Por ejemplo al final habrá una carpeta que se llame "miniaturas" que contendrá todas las imágenes convertidas.
* Cada imagen debe convertirla en un thumbnail (128x128) y guardarlas en una misma carpeta.
* Cuando guardemos la imagen debemos guardarla con su nombre original añadiendo "_thumbnail".
    Por ejemplo `imagen.jpg` -> `imagen_thumbnail.jpg`

Intentar usar un f-string para el path `(f"carpeta/{}_{}.jpg")`.

* **Importante**: una vez tengamos la lista con todas nuestras rutas de archivo. Hay que usar procesamiento en paralelo para convertir las imágenes. Por ejemplo un ThreadPoolExecutor o ProcessPoolExecutor.

**Extra**:

En el módulo `functools` de Python existe una cosa que se llama `partial`. Esta función nos permite crea lo que se llaman funciones parciales. Si tenemos una función que por ejemplo acepta 3 argumentos, crear una función parcial sería *"duplicar"* está función pero haciendo que uno de estos parámetros sea fijo. Y obtendríamos una función. Por ejemplo:

* Tengo una función: `convertir_miniatura(resolucion, ruta)`
* Puedo hacer `miniatura128 = partial(convertir_miniatura, 128)`.
* Esto último me devolvería otra función, que ahora puedo utilizar directamente con: `miniatura128("/Users/r/.../imagen.jpg")`. Tendremos a nuestra disposición una nueva función que es igual que la original pero como si uno de sus parámetros estuviera fijo.


`functools.partial` + executors Pillow + paths (download images)




Ejercicio adaptado de: https://www.toptal.com/python/beginners-guide-to-concurrency-and-parallelism-in-python

In [4]:
import os
from pathlib import Path

In [5]:
lista_de_paths = []
for root, dirs, files in os.walk("tree"):
    lista_de_paths = lista_de_paths + [os.path.join(root, x) for x in files if x.endswith((".jpg"))]

In [6]:
[str(ruta.absolute()) for ruta in Path("tree").rglob("*.jpg")]

['/Users/r/Projects/teach/11_concurrencia_paralelismo/tree/jkrlyxsbonbrdjb.jpg',
 '/Users/r/Projects/teach/11_concurrencia_paralelismo/tree/qynjddmvbflfmfl.jpg',
 '/Users/r/Projects/teach/11_concurrencia_paralelismo/tree/pjeyfssssqovicw.jpg',
 '/Users/r/Projects/teach/11_concurrencia_paralelismo/tree/rvqjyhrsggnkgmf.jpg',
 '/Users/r/Projects/teach/11_concurrencia_paralelismo/tree/mtshvhafwuveczn.jpg',
 '/Users/r/Projects/teach/11_concurrencia_paralelismo/tree/hriejwwwxnlslnr.jpg',
 '/Users/r/Projects/teach/11_concurrencia_paralelismo/tree/ufufkvakojndran.jpg',
 '/Users/r/Projects/teach/11_concurrencia_paralelismo/tree/rxvjgoudtzlgska.jpg',
 '/Users/r/Projects/teach/11_concurrencia_paralelismo/tree/fdjtoupvvurxgrd.jpg',
 '/Users/r/Projects/teach/11_concurrencia_paralelismo/tree/tjbcluxsujxgcnw.jpg',
 '/Users/r/Projects/teach/11_concurrencia_paralelismo/tree/msqfostmedkgorc.jpg',
 '/Users/r/Projects/teach/11_concurrencia_paralelismo/tree/E5E2Q06R/ogdottndbujeyrq.jpg',
 '/Users/r/Projects

In [7]:
filelist = lista_de_paths.copy()

In [10]:
from pathlib import Path
import PIL
from PIL import Image
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

filelist = [str(ruta.absolute()) for ruta in Path("tree/").rglob("*jpg")]


def miniaturizar(path: str):
    size = (128, 128)  # 128x128
    p = Path(path).absolute()
    nuevo_nombre = p.stem + "_thumbnail" + p.suffix
    miniaturas = Path("miniaturas/").absolute()
    save = miniaturas / nuevo_nombre
    image = Image.open(p)
    image.thumbnail(size)
    image.save(save)



import os
from joblib import Parallel, delayed

max_nucleos = os.cpu_count()

In [19]:
%%time
Parallel(n_jobs=max_nucleos, verbose=10)(delayed(miniaturizar)(p) for p in filelist)

[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done   5 tasks      | elapsed:    2.1s
[Parallel(n_jobs=4)]: Done  10 tasks      | elapsed:    2.2s
[Parallel(n_jobs=4)]: Done  17 tasks      | elapsed:    2.3s
[Parallel(n_jobs=4)]: Batch computation too fast (0.1837s.) Setting batch_size=2.
[Parallel(n_jobs=4)]: Done  24 tasks      | elapsed:    2.4s
[Parallel(n_jobs=4)]: Done  34 tasks      | elapsed:    2.7s
[Parallel(n_jobs=4)]: Batch computation too fast (0.1563s.) Setting batch_size=4.
[Parallel(n_jobs=4)]: Done  56 tasks      | elapsed:    3.1s


CPU times: user 121 ms, sys: 55.7 ms, total: 177 ms
Wall time: 3.94 s


[Parallel(n_jobs=4)]: Done  94 tasks      | elapsed:    3.8s
[Parallel(n_jobs=4)]: Done  97 out of 104 | elapsed:    3.9s remaining:    0.3s
[Parallel(n_jobs=4)]: Done 104 out of 104 | elapsed:    3.9s finished


[None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None]

In [17]:
%%time
with ThreadPoolExecutor(max_workers=max_nucleos) as executor:
    executor.map(miniaturizar, filelist)

CPU times: user 1.14 s, sys: 131 ms, total: 1.27 s
Wall time: 809 ms


In [16]:
%%time
for ruta in filelist:
    miniaturizar(ruta)

CPU times: user 927 ms, sys: 77.5 ms, total: 1 s
Wall time: 1.31 s


In [34]:
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import os

In [35]:
%time


# with ProcessPoolExecutor(max_workers=os.cpu_count()) as executor:
# nº maximo de cpus lo calcula solo ^^^^^^^^^^^^^
with ThreadPoolExecutor(max_workers=4) as executor:
    executor.map(miniaturizar, filelist)

CPU times: user 3 µs, sys: 1e+03 ns, total: 4 µs
Wall time: 4.77 µs


### Alternativa

Usando una [función parcial](https://docs.python.org/3/library/functools.html#functools.partial)

In [2]:
import os

In [3]:
filelist = []
for root, dirs, files in os.walk("res"):
    filelist = filelist + [os.path.join(root, x) for x in files if x.endswith((".jpg"))]

In [36]:
import logging
from pathlib import Path
from time import time
from functools import partial

from concurrent.futures import ProcessPoolExecutor

from PIL import Image

logging.basicConfig(
    level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)

logger = logging.getLogger(__name__)


def create_thumbnail(img, size, save):
    """
    Creates a thumbnail of an image with the same name as image but with
    _thumbnail appended before the extension. E.g.:

    >>> create_thumbnail((128, 128), 'image.jpg')

    A new thumbnail image is created with the name image_thumbnail.jpg

    :param size: A tuple of the width and height of the image
    :param path: The path to the image file
    :return: None
    """
    path = img
    print("Current image:", path)
    print("Save path:", save)
    print("Size:", size)
    path = Path(path)
    name = path.stem + "_thumbnail" + path.suffix
    thumbnail_path = Path(save) / name
    image = Image.open(path)
    image.thumbnail(size)
    image.save(thumbnail_path)

In [37]:
thumbnail_128 = partial(create_thumbnail, size=(128, 128), save="save")

In [None]:
# aplicado a una image
thumbnail_128(img=filelist[0])

In [41]:
ts = time()
# Partially apply the create_thumbnail method, setting the size to 128x128
# and returning a function of a single argument.


# Create the executor in a with block so shutdown is called when the block
# is exited.
with ProcessPoolExecutor() as executor:
    executor.map(thumbnail_128, filelist)


logging.info("Took %s", time() - ts)

2020-10-27 14:02:46,893 - root - INFO - Took 0.5612080097198486
