`mputil` -- Utility functions for Python's multiprocessing standard library module

- Author: Sebastian Raschka <mail@sebastianraschka.com>
- License: MIT
- Code Repository: https://github.com/rasbt/mputil

# `lazy_map` and `lazy_imap` examples

`lazy_map` and `lazy_imap` are wrappers around the `map` function in Python's [`multiprocessing`](https://docs.python.org/3.6/library/multiprocessing.html) module. Unlike `map` and `imap`, these wrappers evaluate the input "iterator" lazily, which can be desirable if the iterator or generator yields objects with large memory footprints. Note that the syntax and usage of `lazy_map` and `lazy_imap` do not exactly mirror those of their `map` and `imap` counterparts.
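To make the difference between eager and lazy evaluation concrete, here is a minimal standard-library sketch (not part of mputil). `counting_generator` is a hypothetical helper that logs each item at the moment it is produced. Draining it with `list()` materializes every item at once, whereas `itertools.islice` pulls only the next few items.

```python
from itertools import islice


def counting_generator(n, log):
    """Yield 0..n-1 and append each value to `log` at the moment it is produced."""
    for i in range(n):
        log.append(i)
        yield i


# Eager consumption: list() drains the generator up front,
# so every item is held in memory at the same time.
eager_log = []
all_items = list(counting_generator(20, eager_log))
print(len(eager_log))  # 20

# Lazy consumption: islice() pulls only the next 4 items,
# so the rest of the generator has not been evaluated yet.
lazy_log = []
gen = counting_generator(20, lazy_log)
first_chunk = list(islice(gen, 4))
print(first_chunk, len(lazy_log))  # [0, 1, 2, 3] 4
```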

## `lazy_map`

The `lazy_map` function takes two main inputs: a `data_processor` function and a `data_generator`. The `data_processor` performs the desired computation on each element yielded by the `data_generator`, an iterator that is typically a Python generator yielding arbitrary objects.

```python
def my_data_processor(x):
    # some expensive computation
    return x

def my_data_generator():
    for i in range(20):
        yield i

# think of `list(my_data_generator())`
# as too large to fit into memory, which is why
# we don't want to use map or imap
```

The `lazy_map` function then applies the `data_processor` function to each element of the generator and returns a list of the values returned by the `data_processor`, in the same order as the corresponding inputs, as shown in the example below:

```python
from mputil import lazy_map

gen = my_data_generator()
print(lazy_map(data_processor=my_data_processor,
               data_generator=gen,
               n_cpus=0))
```

```
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
```
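One way a lazy map of this kind could be built is sketched below, assuming the input is consumed in fixed-size slices. `sketch_lazy_map` and its parameters are hypothetical, and this is not necessarily how mputil implements `lazy_map`. Each slice of at most `stepsize` items is handed to a `map`-like callable (for example `pool.map` of a `multiprocessing.Pool`), and the per-slice results are concatenated in input order.

```python
import multiprocessing
from itertools import islice


def sketch_lazy_map(func, iterable, stepsize, mapper):
    """Apply `func` to every item of `iterable`, fetching at most `stepsize`
    items at a time, and return all results as one list.

    Hypothetical helper for illustration; not mputil's implementation.
    `mapper` is any callable with the signature of `map`, e.g. `pool.map`.
    """
    iterator = iter(iterable)
    results = []
    while True:
        # Pull only the next `stepsize` items from the (possibly huge) iterator.
        chunk = list(islice(iterator, stepsize))
        if not chunk:
            break
        # Chunks are processed one after another and Pool.map keeps the order
        # within a chunk, so the overall result order matches the input order.
        results.extend(mapper(func, chunk))
    return results


if __name__ == "__main__":
    # The built-in `abs` is used because worker processes can unpickle it
    # regardless of the multiprocessing start method.
    with multiprocessing.Pool(processes=2) as pool:
        print(sketch_lazy_map(abs, (-i for i in range(10)), 4, pool.map))
        # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Passing the `map`-like callable in as a parameter keeps the sketch easy to try: the built-in `map` runs it serially in one process, while `pool.map` distributes each slice across worker processes.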


In the example above, `n_cpus` specifies the number of CPUs (worker processes) to use:

- If `n_cpus` > 0, the specified number of CPUs will be used.
- If `n_cpus=0`, all available CPUs will be used.
- If `n_cpus` < 0, the number of available CPUs minus `n_cpus` will be used.
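The rules above can be sketched as a small helper that turns an `n_cpus` setting into a process count for `multiprocessing.Pool(processes=...)`. `resolve_n_cpus` is hypothetical, and since the negative case is somewhat ambiguous, the sketch assumes that a negative value means "leave `abs(n_cpus)` CPUs unused"; it also clamps the result to at least one process. mputil's exact handling of negative values may differ.

```python
import multiprocessing


def resolve_n_cpus(n_cpus, available=None):
    """Translate an `n_cpus` setting into a worker-process count.

    Hypothetical helper. Assumes a negative value means "leave abs(n_cpus)
    CPUs unused"; mputil's exact handling of negative values may differ.
    """
    if available is None:
        available = multiprocessing.cpu_count()
    if n_cpus > 0:
        return n_cpus                   # use exactly the requested number
    if n_cpus == 0:
        return available                # use every available CPU
    return max(1, available + n_cpus)   # e.g. -1 on 4 CPUs -> 3 workers


print(resolve_n_cpus(2, available=4))   # 2
print(resolve_n_cpus(0, available=4))   # 4
print(resolve_n_cpus(-1, available=4))  # 3
```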

## `lazy_imap`

The `lazy_imap` generator is similar to the `lazy_map` function, but it returns the results in "chunks" (in input order), which can be useful if the result list itself is too large to fit into memory. As in `lazy_map`, the input iterator (here: `data_generator`) is evaluated lazily. The example below demonstrates the use of `lazy_imap`:

```python
from mputil import lazy_imap

def my_data_processor(x):
    # some expensive computation
    return x

def my_data_generator():
    for i in range(22):
        yield i

gen = my_data_generator()

for chunk in lazy_imap(data_processor=my_data_processor,
                       data_generator=gen,
                       n_cpus=0):
    print(chunk)
```

```
[0, 1, 2, 3]
[4, 5, 6, 7]
[8, 9, 10, 11]
[12, 13, 14, 15]
[16, 17, 18, 19]
[20, 21]
```


Note that, by default, the number of elements in each result list equals the number of CPUs being used. (The example above was run on a machine with 4 CPUs, so each list contains 4 elements, except the last one, which holds the 2 remaining values.)
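A chunked, lazy `imap`-style generator can be sketched along the same lines, assuming one result list is yielded per fixed-size slice of the input and the slice size defaults to the number of worker processes. `sketch_lazy_imap` and its parameters are hypothetical; this is not necessarily how mputil implements `lazy_imap`.

```python
import multiprocessing
from itertools import islice


def sketch_lazy_imap(func, iterable, stepsize, mapper):
    """Yield one list of results per `stepsize` items fetched from `iterable`.

    Hypothetical helper for illustration; not mputil's implementation.
    `mapper` is any callable with the signature of `map`, e.g. `pool.map`.
    """
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, stepsize))  # fetch the next slice lazily
        if not chunk:
            return
        yield list(mapper(func, chunk))  # only this chunk's results are in memory


if __name__ == "__main__":
    n_workers = 4
    with multiprocessing.Pool(processes=n_workers) as pool:
        # Mimic the default behavior described above: chunk size == number of workers.
        for result_chunk in sketch_lazy_imap(abs, (-i for i in range(10)),
                                             n_workers, pool.map):
            print(result_chunk)
```

Because the caller iterates over the generator inside the `with` block, the pool stays alive until the last chunk has been produced.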

We can increase or decrease the number of elements in each result list using the `stepsize` parameter, which determines how many values are fetched from the `data_generator` and evaluated in one `lazy_imap` iteration. If the number of objects that can be fetched from `data_generator` is not evenly divisible by `stepsize`, the last result list contains fewer than `stepsize` elements, as shown in the example below:

```python
gen = my_data_generator()

for chunk in lazy_imap(data_processor=my_data_processor,
                       data_generator=gen,
                       stepsize=6,
                       n_cpus=0):
    print(chunk)
```

```
[0, 1, 2, 3, 4, 5]
[6, 7, 8, 9, 10, 11]
[12, 13, 14, 15, 16, 17]
[18, 19, 20, 21]
```
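The chunk lengths follow from integer division: with 22 items and `stepsize=6` there are `22 // 6 = 3` full chunks and a final chunk of `22 % 6 = 4` items. A small sketch using a hypothetical helper, `chunk_sizes`, reproduces the chunk lengths of both `lazy_imap` examples above without any multiprocessing:

```python
from itertools import islice


def chunk_sizes(n_items, stepsize):
    """Return the length of each chunk when `n_items` values are fetched `stepsize` at a time."""
    iterator = iter(range(n_items))
    sizes = []
    while True:
        chunk = list(islice(iterator, stepsize))
        if not chunk:
            return sizes
        sizes.append(len(chunk))


print(chunk_sizes(22, 6))  # [6, 6, 6, 4]
print(chunk_sizes(22, 4))  # [4, 4, 4, 4, 4, 2]
# Closed form: number of full chunks and size of the leftover chunk
print(divmod(22, 6))       # (3, 4)
```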
