# Functional Programming
*With toolz*

I'm assuming you are well aware of the builtin `functools` module. You may have come across [toolz](https://github.com/pytoolz/toolz).

Until recently, I used it almost exclusively for [get-in](https://toolz.readthedocs.io/en/latest/api.html#toolz.dicttoolz.get_in)
and [groupby](https://toolz.readthedocs.io/en/latest/api.html#toolz.itertoolz.groupby)

I really rediscovered it for functional programming. Here are some examples of how it can be useful.

## Toolz

### Installation
```bash
pip install toolz
```

There is also `cytoolz` which is an optimized version of `toolz`. It's basically a drop-in replacement, except Pycharm seems to have trouble with the typing.

### Data Pipeline

For this example, I'll use the task of extracting data from HTML, a.k.a. <keyword>Web Scraping</keyword>

I'm using [Scrape This Site](https://scrapethissite.com/pages/simple/) as an example

As you can see, we have 250 countries that we'd like to extract. If we think about our general steps it should look something like:

- Select a Country Element
    - Select Country Name Element
        - Get Country Name Text

    - Select Country Info Element
        - Select Capital Element
            - Get Capital Text

    - etc...

These are simple operations - selecting or getting. Here's how we could define them with `toolz` and `BeautifulSoup`

In [1]:
from toolz import curry, excepts, compose_left


@curry
def select(element, sel, method):
    if method == 'one':
        return element.select_one(sel)
    return element.select(sel)


@curry
def get_text(element):
    return element.text.strip()


@curry
def cast_to(x, to_type):
    return to_type(x)


to_int = cast_to(to_type=int)
to_float = cast_to(to_type=float)


@curry
def for_each(coll, func):
    return [func(c) for c in coll]


select_all = select(method='all')
select_one = select(method='one')
select_countries = select(sel="div.country")
get_country_name = compose_left(select_one(sel="h3.country-name"), get_text)
get_country_info = select_one(sel="div.country-info")
get_country_capital = compose_left(get_country_info, select_one(sel="span.country-capital"), get_text)
get_country_pop = compose_left(get_country_info, select_one(sel="span.country-population"), get_text, to_int)
get_country_area = compose_left(get_country_info, select_one(sel="span.country-area"), get_text, to_float)

I find this very easy to read. We keep our functions short, and it's just a matter of defining steps.

So how do we handle exceptions? What if an element does not exist?

In [2]:
# Apply excepts where ever it makes sense.

@curry
def in_case(ex, func, handler=lambda _: None):
    return excepts(ex, func, handler)


get_country_name = in_case(AttributeError, compose_left(select_one(sel="h3.country-name"), get_text))
get_country_info = select_one(sel="div.country-info")
get_country_capital = in_case(AttributeError,
                              compose_left(get_country_info, select_one(sel="span.country-capital"), get_text))
get_country_pop = in_case(AttributeError,
                          compose_left(get_country_info, select_one(sel="span.country-population"), get_text, to_int))
get_country_area = in_case(AttributeError,
                           compose_left(get_country_info, select_one(sel="span.country-area"), get_text, to_float))

### Toolz vs Cytoolz

As mentioned earlier, [Cytoolz](https://github.com/pytoolz/cytoolz) claims to be "2x-5x faster than toolz"

Let's profile it with a basic operation: snake_case to Title Case

In [3]:
# String Processing - Toolz
@curry
def str_split(s, split_on):
    return s.split(split_on)

@curry
def str_join(coll, join_on):
    return join_on.join(coll)

@curry
def for_each(coll, func):
    return [func(c) for c in coll]

split_words = str_split(split_on="_")
join_words = str_join(join_on=" ")
title_case = for_each(func=lambda s: s.title())
toolz_op = compose_left(split_words, title_case, join_words)

In [4]:
%timeit toolz_op("profiling_string_operations")

2.78 µs ± 230 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [5]:
from cytoolz import curry as c_curry, compose_left as c_compose_left
# String Processing - Cytoolz

@c_curry
def c_str_split(s, split_on):
    return s.split(split_on)

@c_curry
def c_str_join(coll, join_on):
    return join_on.join(coll)

@c_curry
def c_for_each(coll, func):
    return [func(c) for c in coll]

c_split_words = c_str_split(split_on="_")
c_join_words = c_str_join(join_on=" ")
c_title_case = c_for_each(func=lambda s: s.title())
cytoolz_op = c_compose_left(split_words, title_case, join_words)

In [6]:
%timeit cytoolz_op("profiling_string_operations")

2.46 µs ± 29.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


Nearly equal...

Let's try using a larger string

In [7]:
import secrets
very_long_token = "_".join([secrets.token_urlsafe(8) for _ in range(100_000)])

In [8]:
%timeit toolz_op(very_long_token)

39.3 ms ± 1.48 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [9]:
%timeit cytoolz_op(very_long_token)

41.1 ms ± 1.39 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
