# Part 3: The Five Kinds of Python Functions

## Agenda

1. Function Definitions 

2. Higher-Order Functions

3. Function Wrappers (around methods)

4. Lambdas

5. Callable Objects

6. Generator Functions and Yield

Yes. That's six. But. Higher-Order Functions aren't magical.

## 1. Function Definitions

I've hinted at these.

Several Elements:

-  Name
-  Zero or more parameters.
-  One or more lines of code in the body.  Use `pass` as a degenerate case.
-  Optional `return` statement to return a value. Otherwise it returns `None`.

In [1]:
def transform_1a(n: float) -> float:
    return n * 0.7142

In [2]:
transform_1a(1100)

785.6199999999999

## Some Features

- Optional parameters (i.e., there's a default value)

- Keyword argument values when calling the function

- Variable numbers of positional or keyword parameters

In [3]:
def transform_1b(n: float, factor: float = 0.7142) -> float:
    return n * factor

In [4]:
transform_1b(1100)

785.6199999999999

In [5]:
transform_1b(1000, 0.72)

720.0

In [6]:
transform_1b(factor=0.72, n=1100)

792.0

## Variable Positional Arguments

In [7]:
def transform_1c(*args: float) -> list[float]:
    return [a * 0.7142 for a in args]

In [8]:
transform_1c(1000, 1100, 1250, 1300)

[714.1999999999999, 785.6199999999999, 892.7499999999999, 928.4599999999999]

In [9]:
def transform_1d(*args: float) -> list[float]:
    results = []
    for a in args:
        results.append(a * 0.7142)
    return results

In [10]:
transform_1d(1000, 1100, 1250, 1300)

[714.1999999999999, 785.6199999999999, 892.7499999999999, 928.4599999999999]

## Variable Keyword Arguments

In [11]:
def transform_1e(**kwargs: float) -> float:
    n = kwargs["n"]
    factor = kwargs.get("factor", 0.7142)
    return n * factor

In [12]:
transform_1e(factor=0.72, n=1100)

792.0

In [13]:
transform_1e(n=1100)

785.6199999999999

## Important

You rarely need *all* of these features in a single function.

- If you think you do, reconsider your design, please.

You'll often need default values to create optional arguments.

Sometimes you'll want keyword args.

## 2. Higher-Order Functions

A function that takes a function as an argument.

And/or returns a function as a result.

In some languages, this is **Rocket Science**.

Given \\(g(x)\\) is some function,

\\[
    f(g; a) = g(a \times 0.7142)
\\]

In [14]:
from typing import Callable, Optional

In [15]:
def transform_2a(n: float, g: Optional[Callable[[float], float]] = None) -> float:
    if g is None:
        g = round
    return g(n*0.7142)

In [16]:
transform_2a(n=1100)

786

In [17]:
transform_2a(1100, int)

785

In [18]:
def round100(n: float) -> int:
    return int(round(n, -2))

In [19]:
transform_2a(1100, round100)

800

Try each cleanup function in turn:

    for cleanup in int, round, round100:
        print(transform_2a(1100, cleanup))

This makes a lot of functional programming kind of fun. And not much work.

In [20]:
for r in range(1200, 2400, 200):
    for cleanup in int, round, round100:
        print(r, transform_2a(r, cleanup))

1200 857
1200 857
1200 900
1400 999
1400 1000
1400 1000
1600 1142
1600 1143
1600 1100
1800 1285
1800 1286
1800 1300
2000 1428
2000 1428
2000 1400
2200 1571
2200 1571
2200 1600
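The examples so far pass a function *in*. The other half of the definition is returning a function as a result. Here's a minimal sketch; `make_transform()` is a hypothetical name, not part of the original notebook. Each call builds and returns a new function that remembers its own `factor`:

```python
from typing import Callable

def make_transform(factor: float) -> Callable[[float], float]:
    """Return a new function that scales its argument by ``factor``."""
    def transform(n: float) -> float:
        # ``factor`` is captured from the enclosing call (a closure).
        return n * factor
    return transform

transform_72 = make_transform(0.72)
print(transform_72(1100))   # 792.0
```

Because the inner function captures `factor` from the enclosing call, `make_transform(0.72)` and `make_transform(0.7142)` are independent functions.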


## 3. Function Wrappers

These are functions that are "syntactic sugar" on methods.

Example::

    len(x)
    
This is really::

    x.__len__()
    
The `len()` function invokes a similarly-named `__len__()` method of the object.

Some folks will try to say the `len()` syntax means Python is not "purely" object-oriented.

Because it uses the class' `__len__()` method, I submit it is purely OO with some easier-to-read functional syntax.

These are a special case because there's a nearly trivial mapping between function and method. Very little additional processing happens; `len()` mostly just checks that `__len__()` returned a non-negative integer.
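`len()` does add one small thing on top of `__len__()`: it rejects a result that isn't a non-negative integer. A sketch; the class `Broken` is made up for illustration, and the last few lines compare other wrapper functions with the special methods they call:

```python
class Broken:
    """A deliberately bad container: __len__() returns a negative number."""
    def __len__(self) -> int:
        return -1

rejected = False
try:
    len(Broken())
except ValueError as ex:
    rejected = True
    print("len() rejected it:", ex)

# A few other wrapper functions and the special methods behind them.
x = [1, 2, 3]
print(len(x) == x.__len__())      # True
print(repr(x) == x.__repr__())    # True
print(abs(-7) == (-7).__abs__())  # True
```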

## Not the best idea, but it works

In [21]:
from math import isclose
class NonZero(list[float]):
    def __len__(self) -> int:
        return sum(1 for x in self if not isclose(x, 0.0))

In [22]:
x = NonZero([1, 2, 3, 0, 4])

In [23]:
len(x)

4

## 4. Lambda Objects

Functions without a name and a trivial body.

Function definitions via the `def` statement have a name, parameters, and a multi-statement body.

Function definition via a `lambda` has parameters and a single expression, nothing more. In effect it's the expression from a `return` statement.

In [24]:
transform_2a(1100, lambda n: int(round(n, -2)))

800

In [25]:
transform_2a(1100, lambda n: int(round(n, -1)))

790

Commonly used with functions like `sorted()`, `max()`, `min()`, `map()`, `filter()`, `reduce()`.

This works well when we have collections of data.

In [26]:
import csv
from pathlib import Path

In [27]:
source = Path("series_2.csv")
with source.open() as source_file:
    reader = csv.DictReader(source_file)
    data = list(reader)

In [28]:
data

[{'x': '10.0', 'y': '9.14'},
 {'x': '8.0', 'y': '8.14'},
 {'x': '13.0', 'y': '8.74'},
 {'x': '9.0', 'y': '8.77'},
 {'x': '11.0', 'y': '9.26'},
 {'x': '14.0', 'y': '8.10'},
 {'x': '6.0', 'y': '6.13'},
 {'x': '4.0', 'y': '3.10'},
 {'x': '12.0', 'y': '9.13'},
 {'x': '7.0', 'y': '7.26'},
 {'x': '5.0', 'y': '4.74'}]

In [29]:
ascending_y = sorted(data, key=lambda row: float(row['y']))

In [30]:
ascending_y

[{'x': '4.0', 'y': '3.10'},
 {'x': '5.0', 'y': '4.74'},
 {'x': '6.0', 'y': '6.13'},
 {'x': '7.0', 'y': '7.26'},
 {'x': '14.0', 'y': '8.10'},
 {'x': '8.0', 'y': '8.14'},
 {'x': '13.0', 'y': '8.74'},
 {'x': '9.0', 'y': '8.77'},
 {'x': '12.0', 'y': '9.13'},
 {'x': '10.0', 'y': '9.14'},
 {'x': '11.0', 'y': '9.26'}]

In [31]:
ascending_diff = sorted(data, key=lambda row: abs(float(row['y'])-float(row['x'])))

In [32]:
ascending_diff

[{'x': '6.0', 'y': '6.13'},
 {'x': '8.0', 'y': '8.14'},
 {'x': '9.0', 'y': '8.77'},
 {'x': '7.0', 'y': '7.26'},
 {'x': '5.0', 'y': '4.74'},
 {'x': '10.0', 'y': '9.14'},
 {'x': '4.0', 'y': '3.10'},
 {'x': '11.0', 'y': '9.26'},
 {'x': '12.0', 'y': '9.13'},
 {'x': '13.0', 'y': '8.74'},
 {'x': '14.0', 'y': '8.10'}]

In [33]:
max(data, key=lambda row: float(row['x']))

{'x': '14.0', 'y': '8.10'}

## Map, Filter, and Reduce

Two built-in functions and a library function.

In [34]:
map(
    lambda row: {"x": float(row["x"]), "y": float(row["y"])}, 
    data
)

<map at 0x7f9ca10da760>

This is a teaser for a future section. A `map` object is a lazy iterator from which we can build a list.
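That laziness has a consequence worth seeing early: a `map` object can be consumed only once. A minimal sketch with made-up readings:

```python
readings = ["1100", "1200", "1300"]
as_floats = map(float, readings)

first_pass = list(as_floats)    # consumes every value
second_pass = list(as_floats)   # the iterator is already exhausted

print(first_pass)   # [1100.0, 1200.0, 1300.0]
print(second_pass)  # []
```

If you need the values more than once, build a `list` from the `map` and reuse the list.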

In [35]:
list(
    map(
        lambda row: {"x": float(row["x"]), "y": float(row["y"])}, 
        data
    )
)

[{'x': 10.0, 'y': 9.14},
 {'x': 8.0, 'y': 8.14},
 {'x': 13.0, 'y': 8.74},
 {'x': 9.0, 'y': 8.77},
 {'x': 11.0, 'y': 9.26},
 {'x': 14.0, 'y': 8.1},
 {'x': 6.0, 'y': 6.13},
 {'x': 4.0, 'y': 3.1},
 {'x': 12.0, 'y': 9.13},
 {'x': 7.0, 'y': 7.26},
 {'x': 5.0, 'y': 4.74}]

Lambdas are designed to make it easy to work with higher-order functions.

They save us from having to write a one-time-only function definition when the function is a single expression.

In [36]:
list(
    filter(
        lambda row: float(row["x"]) > 10, 
        data
    )
)

[{'x': '13.0', 'y': '8.74'},
 {'x': '11.0', 'y': '9.26'},
 {'x': '14.0', 'y': '8.10'},
 {'x': '12.0', 'y': '9.13'}]

Note that we still need `float()` in the `filter()` lambda above. The earlier `map()` operation did not change the underlying `data` object; it's still text. We might want to combine the two steps.

In [37]:
list(
    filter(
        lambda row: row["x"] > 10,
        map(
            lambda row: {"x": float(row["x"]), "y": float(row["y"])}, 
            data
        )
    )
)

[{'x': 13.0, 'y': 8.74},
 {'x': 11.0, 'y': 9.26},
 {'x': 14.0, 'y': 8.1},
 {'x': 12.0, 'y': 9.13}]

We'll expand on this idea of multi-step higher-order functions later.

In [38]:
from functools import reduce

In [39]:
total_x = reduce(lambda total, row: float(row["x"]) + total, data, 0)
total_x

99.0

In [40]:
total_y = reduce(lambda total, row: float(row["y"]) + total, data, 0)
total_y

82.51

And yes, this is what the built-in `sum()` function does.

Want to build your own `prod()` function? It's a reduce using multiply. (Since Python 3.8, `math.prod()` is also available.)

In [41]:
def prod(*args: float) -> float:
    return reduce(lambda a, b: a*b, args, 1)

In [42]:
def factorial(n: int) -> int:
    return prod(*range(1, n+1))

In [43]:
factorial(5) == 5*4*3*2*1

True

Built-in logical functions `all()` and `any()` can be viewed as if they're something like `reduce(lambda a, b: a and b, ...)` and `reduce(lambda a, b: a or b, ...)`. 

They're actually slightly smarter than a simple reduction. They short-circuit and stop evaluating when the answer is known. 
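Here's a small sketch of that difference. The helper `noisy()` is hypothetical: it records each value as it's pulled from the iterator, so we can see how far each approach gets:

```python
from functools import reduce
from typing import Iterator

seen: list[bool] = []  # records every value that gets pulled from the iterator

def noisy(values: list[bool]) -> Iterator[bool]:
    """Hypothetical helper: yield each value, logging it in ``seen`` first."""
    for v in values:
        seen.append(v)
        yield v

flags = [True, False, True, True]

reduce(lambda a, b: a and b, noisy(flags), True)
reduce_pulled = len(seen)   # reduce() consumes the whole iterator: 4 values

seen.clear()
all(noisy(flags))
all_pulled = len(seen)      # all() stops at the first False: 2 values

print(reduce_pulled, all_pulled)  # 4 2
```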

## Sketchy, but valid

You can assign lambdas to variables.

In [44]:
transform_4a = lambda n: n*0.7142

In [45]:
transform_4a(1100)

785.6199999999999

The debate on assigning lambdas to variables:

- It's simple. When you only have one expression, why write out the whole `def` thing?

- It's too simple. You can't write a docstring to provide useful documentation.
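One more point for the "too simple" side: a lambda assigned to a variable doesn't learn that variable's name, and the generic `<lambda>` name is what shows up in tracebacks. PEP 8 recommends a `def` statement over binding a lambda to a name for this reason. A sketch; `transform_4a_def` is a hypothetical twin for comparison:

```python
transform_4a = lambda n: n * 0.7142

# A hypothetical def-based twin, for comparison.
def transform_4a_def(n: float) -> float:
    return n * 0.7142

print(transform_4a.__name__)       # <lambda>
print(transform_4a_def.__name__)   # transform_4a_def
```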

I haven't emphasized docstrings. Here's an example of how we might describe a function. Note the use of the `>>>` prompt example. This can be validated by the **doctest** tool.

Jupyter Notebook uses IPython, which has a different prompt, `In [n]:`, that more clearly shows the history of your interactions with the interpreter.

Doctest uses the native Python prompt of `>>>` (and `...` for the continuation lines of a statement that spans several lines).

In [46]:
def transform_4b(n: float) -> float:
    """
    Transforms tachometer readings to actual engine RPM's. 
    From this we can estimate horsepower and fuel consumption.
    
    :param n: tachometer reading
    :returns: estimated engine RPM's
    
    >>> round(transform_4b(1100), -1)
    790.0
    """
    return n * 0.7142

In [47]:
round(transform_4b(1100), -1)

790.0
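To have **doctest** actually check the `>>>` example, one option is `doctest.run_docstring_examples()`. A sketch, with `transform_4b` repeated (docstring shortened) so the cell stands alone:

```python
import doctest

def transform_4b(n: float) -> float:
    """
    Transforms tachometer readings to actual engine RPM's.

    >>> round(transform_4b(1100), -1)
    790.0
    """
    return n * 0.7142

# Runs every >>> example in the docstring; prints nothing when they all pass.
doctest.run_docstring_examples(transform_4b, {"transform_4b": transform_4b})
```

From a shell, `python -m doctest -v yourmodule.py` does the same for every docstring in a file (`yourmodule.py` is a placeholder name).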

## 5. Callable Objects

We'll do some remedial object-oriented programming as a kind of side-bar before creating callable objects.

Note the `__init__()` special method. We pronounce the double underscore "dunder", as in "dunder init" when talking about the code.

This is called implicitly as part of object creation. After the new, empty object is built, the `__init__()` method is called to create the initial state.

In this case, the initial state is the `data` attribute in the object's namespace. The object itself is provided as the `self` parameter to each method.

In [48]:
class Totaller:
    def __init__(self, data: list[dict[str, str]]) -> None:
        self.data = list(
            map(
                lambda row: {"x": float(row["x"]), "y": float(row["y"])}, 
                data
            )
        )
        
    def sum(self, name: str) -> float:
        return sum(row[name] for row in self.data)
    
    def sum_2(self, name: str) -> float:
        return sum(row[name]**2 for row in self.data)

We can create an instance of the object. Since the `__init__()` requires the `data` parameter, we must provide the value when initializing.

As with many of the examples, I've included type annotations. They're optional in Python. JupyterLab doesn't check them. A tool called **mypy** can be used to check them.

In [49]:
t = Totaller(data)

In [50]:
t.sum("x")

99.0

In [51]:
t.sum("y")

82.51

## What's a "callable object?"

A class with a `__call__()` special method.

(Seems absurdly simple, but Python is designed to be simple.)

We call the presence of a particular method, like `__call__()` a "protocol".  We say that a class with `__call__()` follows the Callable protocol.

In [52]:
class Transformer:
    def __call__(self, n: float) -> float:
        return n*0.7142

In [53]:
transform_5a = Transformer()

In [54]:
round(transform_5a(1100), -1)

790.0
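Python can check this protocol for us: the built-in `callable()` looks for `__call__()`, and `collections.abc.Callable` works with `isinstance()`. A sketch; `Transformer` is repeated from the cell above so the example stands alone, and `NotCallable` is a made-up counterexample:

```python
from collections.abc import Callable

class Transformer:
    def __call__(self, n: float) -> float:
        return n * 0.7142

class NotCallable:
    """A hypothetical class with no __call__() method."""

print(callable(Transformer()))              # True
print(callable(NotCallable()))              # False
print(isinstance(Transformer(), Callable))  # True
print(callable(len), callable(Transformer)) # True True: functions and classes are callable too
```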

## Why create a callable object?

This example wraps a simple expression into a method of an object. That seems like a lot of machinery for a multiplication and a constant.

We might want to do this so we can leverage other aspects of object construction, like initialization to provide default values.

In [55]:
from typing import Optional

In [56]:
class Transformer_B:
    FACTOR = 0.7142
    def __init__(self, factor: Optional[float] = None) -> None:
        self.factor = self.FACTOR if factor is None else factor 
    def __call__(self, n: float) -> float:
        return n*self.factor
    
transform_5b = Transformer_B()

In [57]:
round(transform_5b(1100), -1)

790.0

In [58]:
transform_5b_x = Transformer_B(0.80)
round(transform_5b_x(1100), -1)

880.0

There's a lot going on in this simple class. First, there's a class-level variable, `FACTOR`, shared by all instances of the class.

The `__init__()` has an optional value where the default is the special `None` value.

We can test for an omitted argument value with the expression `factor is None`. (We don't use `is` for much, but we **always** use it when comparing against the `None` object.)

An instance of the class will have a state defined by the `factor` attribute that we can inspect.

In [59]:
transform_5b.factor

0.7142

In [60]:
transform_5b_x.factor

0.8

## Why Not A Function?

Three reasons:

-  We use callable objects when a plain function is too simple a construct. 
   For example: a **very** complex computation. We can encapsulate the whole computation as methods within a single class definition.

-  We use callable objects to break the rule that functions are deterministic. Normally, a function returns the same result when called with the same arguments. Counterexamples are random number generators and registries (like loggers or codecs).

-  Caching

In [61]:
from random import Random
class D6(Random):
    def __call__(self):
        return self.randint(1, 6)
d6 = D6()

In [62]:
[d6() for _ in range(10)]

[5, 2, 3, 6, 3, 1, 5, 3, 2, 3]

We get different results from each `d6()` call because it's a callable object with internal state: the random number generator's state changes after every roll.
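Because `D6` inherits from `random.Random`, each instance carries its own generator state. Seeding two instances the same way gives reproducible rolls, which is handy for testing. A sketch; the seed `42` is arbitrary:

```python
from random import Random

class D6(Random):
    def __call__(self) -> int:
        return self.randint(1, 6)

die_a = D6(42)   # Random(x) seeds the generator with x
die_b = D6(42)
rolls_a = [die_a() for _ in range(10)]
rolls_b = [die_b() for _ in range(10)]
print(rolls_a == rolls_b)   # True: same seed, same state, same rolls
```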

## Caching

A cache means the object is stateful -- the cache evolves.

It's still deterministic, though, if we do things properly: the same argument always gives the same result.

In [63]:
class Transformer_C:
    FACTOR = 0.7142
    def __init__(self) -> None:
        self.cache: dict[float, float] = {}
    def __call__(self, n: float) -> float:
        if n not in self.cache:
            self.cache[n] = self.transform(n)
        return self.cache[n]
    def transform(self, n: float) -> float:
        return n * self.FACTOR
    
transform_5c = Transformer_C()

In [64]:
round(transform_5c(1100), -1)

790.0

With a `transform()` method so simple, the effect of caching is hard to measure. 

If the body of `transform()` were a **lot** more complex, though, you'd see some impact.
Let's add some fake workload.

In [65]:
import time
class Transformer_C_slow:
    FACTOR = 0.7142
    def __init__(self) -> None:
        self.cache: dict[float, float] = {}
    def __call__(self, n: float) -> float:
        if n not in self.cache:
            self.cache[n] = self.transform(n)
        return self.cache[n]
    def transform(self, n: float) -> float:
        time.sleep(1)  # Mocking a LOT of work here.
        return n * self.FACTOR
    
transform_5c_slow = Transformer_C_slow()

In [66]:
%%time
transform_5c_slow(1100)

CPU times: user 514 µs, sys: 709 µs, total: 1.22 ms
Wall time: 1 s


785.6199999999999

In [67]:
%%time
transform_5c_slow(1100)

CPU times: user 5 µs, sys: 1 µs, total: 6 µs
Wall time: 7.87 µs


785.6199999999999
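The standard library packages this pattern as the `functools.lru_cache` decorator (and, since Python 3.9, `functools.cache`, which is `lru_cache(maxsize=None)`). A sketch of the `Transformer_C` idea done that way; the name `transform_cached` and the `calls` list are hypothetical, with the list standing in for the slow work so we can count how often the body runs:

```python
from functools import lru_cache

calls: list[float] = []  # records each time the body really runs

@lru_cache(maxsize=None)
def transform_cached(n: float) -> float:
    calls.append(n)      # stands in for a LOT of expensive work
    return n * 0.7142

transform_cached(1100)
transform_cached(1100)   # answered from the cache; the body doesn't run again

print(len(calls))                     # 1
print(transform_cached.cache_info())  # CacheInfo(hits=1, misses=1, maxsize=None, currsize=1)
```

Like `Transformer_C`, it looks arguments up in a dictionary, so it has the same float-key weakness discussed next.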

## Cache Complication

This design suffers from a serious problem.

`n` is a `float`.

Back in section 2, I pointed out that float is **not** a good candidate for exact match retrieval. Computed values may differ by tiny, tiny amounts, and we can wind up with a cache full of nearly equal values.

Dictionary lookups rely on a two-step comparison.

- Hash must be the same. There's a built-in function, `hash()` which does this. This is a wrapper for the `__hash__()` special method.

- Values must be equal. This uses the `__eq__()` special method.
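A quick way to see both steps at work: `1` and `1.0` are different types, but their hashes match and they compare equal, so either one finds the same dictionary entry. A float that's off by a rounding error fails the equality step, so the lookup misses:

```python
cache = {1: "one"}

# Step 1 (hash) and step 2 (==) both succeed for 1 and 1.0,
# so the float finds the entry stored under the int.
print(hash(1) == hash(1.0), 1 == 1.0)   # True True
print(cache[1.0])                        # one

# A tiny rounding difference makes == fail, so the lookup misses.
x = 0.1 + 0.2
print(x == 0.3)            # False
print(0.3 in {x: "sum"})   # False
```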

In [68]:
a = ((100 + 0.001) - 99.0 - 0.001) * 1100
b = ((100 + 0.003) - 99.0 - 0.003) * 1100

In [69]:
a == b

False

In [70]:
a, b

(1100.0000000000055, 1100.0000000000002)

In [71]:
from math import isclose
isclose(a, b, abs_tol = 1E-14)

True

In [72]:
hash(a), hash(b)

(12584012, 525388)

We can't use a simple dictionary lookup. But. We have a bunch of choices. Here are three:

1. Use a binary search with `isclose()` instead of a simple dictionary.

2. Change the dictionary's key-matching rules. These are implemented by the `__contains__()` and `__getitem__()` special methods. You can read about them in the `collections.abc` module documentation. And yes, extending built-in classes to add new behavior is solid Python. A bit too advanced for this course, though.

3. Consider converting the RPM value from `float` to `int` (or `fractions.Fraction`). We can then suppress "noise" bits in the `float` approximation.

The `bisect` module provides a binary search over a sorted list. We can use it to keep the cache in sorted order.

In [73]:
import bisect

In [74]:
class Transformer_D:
    FACTOR = 0.7142
    def __init__(self) -> None:
        self.keys: list[float] = [] 
        self.values: list[float] = []
        
    def __call__(self, n: float) -> float:
        index = bisect.bisect(self.keys, n)
        # Is it in the cache? A close match could sit on either side of
        # the insertion point, so check both neighbors (when they exist).
        for i in (index - 1, index):
            if 0 <= i < len(self.keys) and isclose(self.keys[i], n, abs_tol=1E-5):
                return self.values[i]
        # Not in the cache, or the cache is empty.
        result = self.transform(n)
        # Insert it into the cache, maintaining sort order.
        self.keys.insert(index, n)
        self.values.insert(index, result)
        return result
    
    def transform(self, n: float) -> float:
        return n * self.FACTOR
    
transform_5d = Transformer_D()

In [75]:
a, b

(1100.0000000000055, 1100.0000000000002)

In [76]:
a == b

False

In [77]:
transform_5d(a)

785.6200000000039

In [78]:
transform_5d(b)

785.6200000000039

In [79]:
transform_5d.keys

[1100.0000000000055]

It turns out there's still a hidden cost. `list.insert()` has to shift every later item over by one position, so it gets expensive when the list is very large.

We might try rounding or truncating the RPM values to an integer.

## Quantize the RPM's

This is a better way to deal with caching for this particular application.

In [80]:
class Transformer_E:
    FACTOR = 0.7142
    def __init__(self) -> None:
        self.cache: dict[int, float] = {}
    def __call__(self, n: float) -> float:
        int_n = int(round(n, -1))  # Nearest 10
        if int_n not in self.cache:
            self.cache[int_n] = self.transform(n)
        return self.cache[int_n]
    def transform(self, n: float) -> float:
        return n * self.FACTOR
    
transform_5e = Transformer_E()

In [81]:
transform_5e(a)

785.6200000000039

In [82]:
transform_5e(b)

785.6200000000039

## Brief Review of callables

1. Ordinary `def` functions.

2. Higher-Order Functions. These have functions as parameters or return values. They're not fundamentally unique in Python.

3. Function Wrappers (around methods). This is syntactic sugar so we don't have to use object methods all the time. `len(x)` seems better than `x.len()` for these special cases. Some more examples: `str(x)`, `repr(x)`, `hash(x)`.

4. Lambdas. Anonymous functions with a single expression. 

5. Callable Objects. Classes that follow the `Callable` protocol; i.e., have a `__call__()` method.

## Why so many?

Functions, Lambdas, callable objects are all slightly different in the features they offer.

Functions are (generally) stateless. You can use the `global` statement to make a stateful function. Please don't.

Lambdas are designed to be simpler than a function.

The callable object is the most general set of features, able to do anything.

## 6. Generator Functions and Yield

This is the best part of Python. Seriously.

Let's start with a "comprehension".  

- List comprehension: generator in []'s

- Set comprehension: generator in {}'s

- Dictionary comprehension: a generator with :'s in {}'s

In [83]:
[2*x+1 for x in range(10)]

[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]

In [84]:
{x % 3 for x in range(999)}

{0, 1, 2}

In [85]:
{n: f"{2*n+1=}" for n in range(10)}

{0: '2*n+1=1',
 1: '2*n+1=3',
 2: '2*n+1=5',
 3: '2*n+1=7',
 4: '2*n+1=9',
 5: '2*n+1=11',
 6: '2*n+1=13',
 7: '2*n+1=15',
 8: '2*n+1=17',
 9: '2*n+1=19'}

Alternative forms:

In [86]:
list(2*x+1 for x in range(10))

[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]

In [87]:
set(x % 3 for x in range(999)) 

{0, 1, 2}

In [88]:
dict( (n, f"{2*n+1=}") for n in range(10) )

{0: '2*n+1=1',
 1: '2*n+1=3',
 2: '2*n+1=5',
 3: '2*n+1=7',
 4: '2*n+1=9',
 5: '2*n+1=11',
 6: '2*n+1=13',
 7: '2*n+1=15',
 8: '2*n+1=17',
 9: '2*n+1=19'}

Yes, the dictionary syntax had to change. We have to provide a sequence of tuples to the dict function because the `k: v` syntax is a special case and only works inside `{}`'s.

## Some formalities

The classes `list`, `set`, and `dict` have initializer methods that accept an `Iterable[Any]`. (For `dict`, each item must be a key-value pair.)

A list object, `[1, 2, 3]`, is an `Iterable[Any]`. 

The `Iterable` type is a protocol and it requires a method, `__iter__()`, that will produce a sequence of values. 

The output of `__iter__()` is `Iterator[Any]`. An iterator.
    
The generator expression `x % 3 for x in range(999)` is an `Iterator[Any]`.
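The protocol is easy to drive by hand with the built-ins `iter()` and `next()`. An exhausted iterator raises `StopIteration`, which is how a `for` statement knows to stop. A minimal sketch:

```python
numbers = range(3)   # an iterable
it = iter(numbers)   # calls numbers.__iter__(), which returns an iterator

first_three = [next(it), next(it), next(it)]   # [0, 1, 2]

exhausted = False
try:
    next(it)         # nothing left
except StopIteration:
    exhausted = True  # this is the signal a for statement uses to stop

print(first_three, exhausted)   # [0, 1, 2] True
print(iter(it) is it)           # True: an iterator's __iter__() returns itself
```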

## Some More Examples

`range()` is a lazy, iterable sequence of values. (Strictly, it isn't an iterator itself; `iter(range(10))` returns one.)

`map()` and `filter()` require an iterable argument. They return iterators.

In [89]:
[2*n+1 for n in range(10)]

[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]

In [90]:
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [91]:
map(lambda n: 2*n+1, range(10))

<map at 0x7f9ca1227790>

In [92]:
list(
    map(lambda n: 2*n+1, range(10))
)

[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]

In [93]:
list(
    2*n+1 for n in range(10)
)

[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]

This is always true, in the sense that both produce the same sequence of values:

    map(f, D) == (f(x) for x in D)

They're equivalent by design. Some folks like `map(f, D)`. Some folks like `(f(x) for x in D)`.
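The `==` above is shorthand. The two objects are different kinds of iterators and won't compare equal themselves; what matches is the sequence of values they produce. A quick check, with a small `f` and `D` made up for the purpose:

```python
def f(x: int) -> int:
    return 2 * x + 1

D = range(10)

print(map(f, D) == (f(x) for x in D))              # False: two distinct objects
print(list(map(f, D)) == list(f(x) for x in D))    # True: same values
```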

## Continuing this Example

In [94]:
from math import sqrt
def prime(n: int) -> bool:
    if n < 2: return False
    if n == 2: return True
    if n % 2 == 0: return False
    for k in range(3, int(sqrt(n))+2, 2):
        if n % k == 0: return False
    return True

In [95]:
list(filter(lambda n: prime(n), range(100)))

[2,
 3,
 5,
 7,
 11,
 13,
 17,
 19,
 23,
 29,
 31,
 37,
 41,
 43,
 47,
 53,
 59,
 61,
 67,
 71,
 73,
 79,
 83,
 89,
 97]

In [96]:
list(
    n for n in range(100) if prime(n)
)

[2,
 3,
 5,
 7,
 11,
 13,
 17,
 19,
 23,
 29,
 31,
 37,
 41,
 43,
 47,
 53,
 59,
 61,
 67,
 71,
 73,
 79,
 83,
 89,
 97]

This is always true, in the sense that both produce the same sequence of values:

    filter(f, D) == (x for x in D if f(x))

They're equivalent by design. Some folks like `filter(f, D)`. Some folks like `(x for x in D if f(x))`.
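The two equivalences compose: a `filter()` nested inside a `map()` is the same as a single generator expression with both a computation and an `if` clause. A sketch, with `f`, `g`, and `D` made up for the purpose:

```python
def f(x: int) -> int:
    return 2 * x + 1

def g(x: int) -> bool:
    """A made-up predicate: keep multiples of 3."""
    return x % 3 == 0

D = range(20)

nested = list(map(f, filter(g, D)))
single = list(f(x) for x in D if g(x))
print(nested)             # [1, 7, 13, 19, 25, 31, 37]
print(nested == single)   # True
```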

## The pattern is `f(D: Iterable[T]) -> Iterator[T]:`

It works like this.

Use a `for` statement to consume the source domain of values.

Use a `yield` statement to provide one value for the Iterator.

In [97]:
from typing import Iterator
def some_floats() -> Iterator[float]:
    yield 2.7182818
    yield 3.1415926

In [98]:
list(some_floats())

[2.7182818, 3.1415926]

In [99]:
def odd_pairs(limit: int) -> Iterator[tuple[int, int]]:
    for n in range(limit):
        yield n, 2*n+1

In [100]:
list(odd_pairs(5))

[(0, 1), (1, 3), (2, 5), (3, 7), (4, 9)]

In [101]:
from typing import Iterator, Iterable

def odd_pairs_iter(source: Iterable[int]) -> Iterator[tuple[int, int]]:
    for n in source:
        yield n, 2*n+1

In [102]:
list(odd_pairs_iter(range(5)))

[(0, 1), (1, 3), (2, 5), (3, 7), (4, 9)]

## A transformer that's an Iterator

In [103]:
from typing import Iterator, Iterable

def transform_6(values: Iterable[float]) -> Iterator[float]:
    for v in values:
        yield v*0.7142

In [104]:
list(transform_6(range(1100, 2000, 100)))

[785.6199999999999,
 857.04,
 928.4599999999999,
 999.8799999999999,
 1071.3,
 1142.7199999999998,
 1214.1399999999999,
 1285.56,
 1356.9799999999998]

In [105]:
list(
    map(
        round100, 
        transform_6(
            range(1100, 2000, 100)
        )
    )
)

[800, 900, 900, 1000, 1100, 1100, 1200, 1300, 1400]

Important:

The iterable does not have to be a big data structure lying around in memory.

It can be a data source, like a file, that's read one row at a time.

## Case Study

New alternator on an old engine.

The tachometer display reflects the ratio of drive wheels on a serpentine belt.

Smaller wheels on the new alternator means the tachometer display is now wrong.

Preliminary measurements made it look like the new wheel is $\tfrac{5}{7}$ the original wheel size. In principle this means scaling tachometer readings by $\tfrac{5}{7} \approx 0.7142$.

Later, detailed samples were gathered with an optical device in the engine room, recorded in a spreadsheet, and saved as a CSV file, `engine.csv`.

In [106]:
from csv import DictReader
from pathlib import Path

In [107]:
source_path = Path("engine.csv")
with source_path.open() as source_file:
    reader = DictReader(source_file)
    tach_readings = (float(row["Tach"]) for row in reader)
    for result in transform_6(tach_readings):
        print(result)

714.1999999999999
1071.3
1071.3
1142.7199999999998
1249.85
1428.3999999999999
1428.3999999999999


Let's work through this carefully.

We create a `DictReader`. This is an iterable of `dict[str, str]` -- each dictionary is a row from the CSV file.

`tach_readings` is a generator expression over the rows in the reader.  It performs a computation on each row.
Input is `Iterable[dict[str, str]]`. Output is `Iterator[float]`.

`transform_6()` is a generator function. Its parameter is an `Iterable[float]`. Its output is an `Iterator[float]`.

Note the obvious matchup between `tach_readings` type and `transform_6()` parameter type. Tools like **mypy** love this.

## Important: Generators are Lazy!

The `for` statement consumes one row from `transform_6()` to print the result.

`transform_6()` is a lazy generator. It consumes one row from `tach_readings`.

`tach_readings` is a lazy generator. It consumes one row from `reader`.

`reader` is just as lazy as everyone else. It consumes one row from the file.

Eventually, the `reader` runs out of rows and each of these iterators stops consuming data.

But. 

There's only one row of source data in memory at a time.
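To watch the laziness happen, here's a sketch that logs each step. The generator `source()` and the `log` list are hypothetical: `source()` stands in for the file and `reader`, and `transform_6()` is repeated so the example stands alone. Notice how the log entries interleave instead of all the reads happening first:

```python
from typing import Iterable, Iterator

log: list[str] = []

def source() -> Iterator[str]:
    """Stands in for a file: produces one text row at a time."""
    for text in ["1000", "1500", "2000"]:
        log.append(f"read {text}")
        yield text

def transform_6(values: Iterable[float]) -> Iterator[float]:
    for v in values:
        log.append(f"transform {v}")
        yield v * 0.7142

tach_readings = (float(text) for text in source())
for result in transform_6(tach_readings):
    log.append(f"print {result:.1f}")

print(log[:3])   # ['read 1000', 'transform 1000.0', 'print 714.2']
```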

## Hmmm. I want more than one value

Instead of `Iterable[float]` I really want `Iterable[dict[str, float]]` with some additional values in the dictionary.

Here's what's in the dictionary now.  The `fieldnames` attribute will tell me what the headers in the CSV file really are.

In [108]:
source_path = Path("engine.csv")
with source_path.open() as source_file:
    reader = DictReader(source_file)
    print(reader.fieldnames)

['Sample', 'Tach', 'Engine']


I'd like to add a new column, `Transformed`, to compare the actual `Engine` value against the transformed `Tach` value.

Why? I'm trying to model something and I need to compare model against actual.

Two choices:

- Update each row's dictionary in place. It's mutable.

- Change from a mutable dictionary to something a little smarter.

    - A NamedTuple or dataclass with a computed property. We'll look at this first.
    
    - A sequence of NamedTuples with increasingly sophisticated data.

In [109]:
from typing import NamedTuple
class SampleWithModel(NamedTuple):
    tach: float
    engine: float
    @property
    def transformed(self) -> float:
        return self.tach * 0.7142

In [110]:
source_path = Path("engine.csv")
with source_path.open() as source_file:
    reader = DictReader(source_file)
    samples = (
        SampleWithModel(float(row["Tach"]), float(row["Engine"]))
        for row in reader)
    for swm in samples:
        print(f"{swm.tach}, {swm.engine}, {swm.transformed}")

1000.0, 883.0, 714.1999999999999
1500.0, 1242.0, 1071.3
1500.0, 1217.0, 1071.3
1600.0, 1306.0, 1142.7199999999998
1750.0, 1534.0, 1249.85
2000.0, 1805.0, 1428.3999999999999
2000.0, 1720.0, 1428.3999999999999


## Not a fan

The model is buried inside a class as the `transformed` property. It's awkward to modify as we understand more about the data.

Let's try an alternative approach with two separate named tuples and a separate function for the model. We'll use `transform_5b` because it works as well as all the others.

In [111]:
class Sample(NamedTuple):
    tach: float
    engine: float

class SamplePlus(NamedTuple):
    tach: float
    engine: float
    transformed: float

In [112]:
source_path = Path("engine.csv")
model = transform_5b
with source_path.open() as source_file:
    reader = DictReader(source_file)
    raw_data = (
        Sample(float(row["Tach"]), float(row["Engine"]))
        for row in reader)
    transformed = (
        SamplePlus(row.tach, row.engine, model(row.tach))
        for row in raw_data)
    for swm in transformed:
        print(f"{swm.tach}, {swm.engine}, {swm.transformed}")

1000.0, 883.0, 714.1999999999999
1500.0, 1242.0, 1071.3
1500.0, 1217.0, 1071.3
1600.0, 1306.0, 1142.7199999999998
1750.0, 1534.0, 1249.85
2000.0, 1805.0, 1428.3999999999999
2000.0, 1720.0, 1428.3999999999999


## Refactor into a higher-order function

The data reading and computation can be a single function to provide an iterable source of objects.

In [113]:
def alternator_mapping(source_path: Path, model: Callable[[float], float]) -> Iterator[SamplePlus]:
    with source_path.open() as source_file:
        reader = DictReader(source_file)
        raw_data = (
            Sample(float(row["Tach"]), float(row["Engine"]))
            for row in reader)
        transformed = (
            SamplePlus(row.tach, row.engine, model(row.tach))
            for row in raw_data)
        yield from transformed
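The last line, `yield from transformed`, hands each item of `transformed` to the caller one at a time. For our purposes it's shorthand for a `for` loop containing a `yield` (it also forwards `send()` and `throw()` calls, which don't matter here). A sketch with hypothetical function names:

```python
from typing import Iterable, Iterator

def with_loop(items: Iterable[int]) -> Iterator[int]:
    for item in items:
        yield item

def with_yield_from(items: Iterable[int]) -> Iterator[int]:
    yield from items   # same values, same order, still lazy

print(list(with_loop(range(3))))        # [0, 1, 2]
print(list(with_yield_from(range(3))))  # [0, 1, 2]
```

Because `alternator_mapping()` is itself a generator, its `with` statement keeps the CSV file open until the caller finishes the iteration (or the generator is closed).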

Here's how we apply the `transform_5a` model to the engine data:

In [114]:
list(alternator_mapping(source_path, transform_5a))

[SamplePlus(tach=1000.0, engine=883.0, transformed=714.1999999999999),
 SamplePlus(tach=1500.0, engine=1242.0, transformed=1071.3),
 SamplePlus(tach=1500.0, engine=1217.0, transformed=1071.3),
 SamplePlus(tach=1600.0, engine=1306.0, transformed=1142.7199999999998),
 SamplePlus(tach=1750.0, engine=1534.0, transformed=1249.85),
 SamplePlus(tach=2000.0, engine=1805.0, transformed=1428.3999999999999),
 SamplePlus(tach=2000.0, engine=1720.0, transformed=1428.3999999999999)]

## Corrected Model

Skipping details of the linear regression.

It involves a lot of `sum(tach)` and `sum_2(tach)` and that kind of thing.

\\[ r_{xy} = \frac{\sum x y - n \bar x \bar y}{(n-1)\sigma_x\sigma_y} \\]


\\[
b = \frac{\sum {x}{y} - n  \bar x \bar y}{(n-1){\sigma_x} ^ 2} = r_{xy} \frac {\sigma_y}{\sigma_x}
\\]

\\[
a = \bar y - b \bar x
\\]
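The formulas for \\(b\\) and \\(a\\) translate almost directly into Python. A sketch, assuming the `statistics` module's `mean()` and `stdev()` (sample standard deviation, matching the \\(n-1\\) denominators); `fit_line()` is a hypothetical name, not the code behind `new_model()`:

```python
from statistics import mean, stdev
from typing import Sequence

def fit_line(xs: Sequence[float], ys: Sequence[float]) -> tuple[float, float]:
    """Least-squares line y = a + b*x, returned as (a, b). Needs 2+ distinct xs."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sigma_x = stdev(xs)                       # sample std. dev. (n-1 denominator)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    b = (sum_xy - n * x_bar * y_bar) / ((n - 1) * sigma_x ** 2)
    a = y_bar - b * x_bar
    return a, b

# Data that lies exactly on y = 0.9x - 90, to check the arithmetic.
xs = [1000.0, 1500.0, 1600.0, 1750.0, 2000.0]
ys = [0.9 * x - 90 for x in xs]
a, b = fit_line(xs, ys)
print(round(a, 6), round(b, 6))   # -90.0 0.9
```

Applied to the `tach` and `engine` values from `engine.csv`, it returns the intercept \\(a\\) and slope \\(b\\) for a model of the same shape as `new_model()`. On Python 3.10 or later, `statistics.linear_regression(xs, ys)` computes the same slope and intercept.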

In [115]:
def new_model(t: float) -> float:
    return 0.9*t - 90

list(alternator_mapping(source_path, new_model))

[SamplePlus(tach=1000.0, engine=883.0, transformed=810.0),
 SamplePlus(tach=1500.0, engine=1242.0, transformed=1260.0),
 SamplePlus(tach=1500.0, engine=1217.0, transformed=1260.0),
 SamplePlus(tach=1600.0, engine=1306.0, transformed=1350.0),
 SamplePlus(tach=1750.0, engine=1534.0, transformed=1485.0),
 SamplePlus(tach=2000.0, engine=1805.0, transformed=1710.0),
 SamplePlus(tach=2000.0, engine=1720.0, transformed=1710.0)]

## Wrap-up

1. Function Definitions 

2. Higher-Order Functions

3. Function Wrappers (around methods)

4. Lambdas

5. Callable Objects

6. Generator Functions and Yield

Yes. That's six. But. Higher-Order Functions aren't magical.

## Questions?

We'll start again with part 4, **Tooling and Good Practices**