# Programming Tools II

In this lesson we look at some modules that support functional and *lazy* styles of programming.  In this context, "lazy" has the precise meaning of deferring operations until they are actually needed.  For some tasks, such as processing infinite streams, this style is required; for many others it is merely useful.

These tools are somewhat more advanced than many we look at in this course, and only an introduction is given.  We look at the modules `operator`, `functools`, and `itertools` in this lesson.

In [1]:
import operator as op
from collections import namedtuple
from functools import *
from itertools import *
from dataclasses import dataclass

## Module: operator

The `operator` module simply contains functions corresponding to the standard syntactic operators in Python, which are themselves defined by "magic methods" on classes.  A quick example lets us see this.

In [2]:
a, b = 4, 5
a < b, a + b   # Syntactic versions

(True, 9)

In [3]:
a.__lt__(b), a.__add__(b)  # Methods on objects

(True, 9)

In [4]:
int.__lt__(a, b), int.__add__(a, b)  # Methods on type

(True, 9)

In [5]:
op.lt(a, b), op.add(a, b)  # Operator for any type

(True, 9)
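
Because the functions in `operator` simply dispatch to the magic methods, they work equally well with classes you write yourself.  The following is a minimal sketch; the `Money` class and its `cents` attribute are hypothetical names invented only for this illustration.

```python
import operator as op

class Money:
    """Hypothetical class used only for this illustration."""
    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        return Money(self.cents + other.cents)

    def __lt__(self, other):
        return self.cents < other.cents

coffee, bagel = Money(350), Money(275)
total = op.add(coffee, bagel)    # dispatches to Money.__add__
cheaper = op.lt(bagel, coffee)   # dispatches to Money.__lt__
print(total.cents, cheaper)      # 625 True
```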

The obvious question at this point is "Why on Earth would you want to avoid the syntax?"  In "regular" code you clearly do not want to, but in some special contexts you have to.  Let us create some objects from the last lesson to illustrate.

In [6]:
RGB = namedtuple("Color", "red green blue")
salmon = RGB(250, 128, 114)
lavender = RGB(230, 230, 250)
seagreen = RGB(46, 139, 87)
indianred = RGB(205, 92, 92)
colorlist = [salmon, lavender, seagreen, indianred]

Let's say we want to arrange these colors by the brightness of the blue channel.  We can do that with an operator, or alternately fall back to a (slower and more verbose) `lambda` form.

In [7]:
sorted(colorlist, key=op.attrgetter('blue'))

[Color(red=46, green=139, blue=87),
 Color(red=205, green=92, blue=92),
 Color(red=250, green=128, blue=114),
 Color(red=230, green=230, blue=250)]

In [8]:
sorted(colorlist, key=lambda color: color.blue)

[Color(red=46, green=139, blue=87),
 Color(red=205, green=92, blue=92),
 Color(red=250, green=128, blue=114),
 Color(red=230, green=230, blue=250)]

However, the lambda is also less flexible.  For example, maybe the choice of color channel to sort on is made at runtime.  There is no straightforward way to do this with lambda alone (the `getattr()` built-in can work in a lambda, but it is no less obscure).

In [9]:
whichcolor = "green"
sorted(colorlist, key=op.attrgetter(whichcolor))

[Color(red=205, green=92, blue=92),
 Color(red=250, green=128, blue=114),
 Color(red=46, green=139, blue=87),
 Color(red=230, green=230, blue=250)]
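
For comparison, here is a sketch of the `getattr()` variant mentioned above, next to `attrgetter()`.  The color data is repeated so the block stands alone.  It also shows that `attrgetter()` accepts several names at once, in which case it returns a tuple that can serve as a compound sort key.

```python
import operator as op
from collections import namedtuple

RGB = namedtuple("Color", "red green blue")
colorlist = [RGB(250, 128, 114), RGB(230, 230, 250),
             RGB(46, 139, 87), RGB(205, 92, 92)]

whichcolor = "green"   # imagine this was chosen at runtime
by_getter = sorted(colorlist, key=op.attrgetter(whichcolor))
by_lambda = sorted(colorlist, key=lambda color: getattr(color, whichcolor))
print(by_getter == by_lambda)    # True

# Several names at once: the result is a tuple (blue, red)
blue_red = op.attrgetter("blue", "red")
print(blue_red(colorlist[0]))    # (114, 250)
```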

### Familial abstractions

A similar abstraction is available with `itemgetter()` and `methodcaller()`, for getting an item from a sequence by index and calling a method on an object, respectively.  In addition, all the syntactic operators have named counterparts, e.g. `floordiv()`, `not_()`, `pow()`, `xor()`, and many others.

The key function we show above is an example of using these abstracted operators, but we will see a few others in the context of the other modules in this lesson.
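
As a brief sketch of these relatives in use, the made-up data below is sorted with `itemgetter()`, transformed with `methodcaller()`, and multiplied together by passing `op.mul` to `functools.reduce()`.

```python
import operator as op
from functools import reduce

pairs = [("pear", 3), ("fig", 7), ("apple", 5)]   # illustrative data

# itemgetter(1) plays the role of `lambda pair: pair[1]`
by_count = sorted(pairs, key=op.itemgetter(1))
print(by_count)    # [('pear', 3), ('apple', 5), ('fig', 7)]

# methodcaller("upper") plays the role of `lambda s: s.upper()`
shout = op.methodcaller("upper")
names = [shout(name) for name, _ in by_count]
print(names)       # ['PEAR', 'APPLE', 'FIG']

# A named operator can be handed to a higher-order function
product = reduce(op.mul, [1, 2, 3, 4, 5])
print(product)     # 120
```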

## Module: functools

The `functools` module contains some advanced capabilities that are not discussed in this introductory course.  A few of its features are important even for beginners to know about.

### `@lru_cache`

The "least-recently used cache" decorator allows the results of a potentially costly computation to be stored in a dictionary, and hence the work of the function to be avoided on repeated calls.  Of course, if the results are **expected** to be different across calls, this is wholly inappropriate. The number of results cached defaults to 128, but may be set to something else.

A notoriously bad function to implement naively recursively in Python is a Fibonacci calculation, because an uncached version leads to explosive repetition of the same calculations.  Let us do that with and without a cache.

In [10]:
def fib(n: int) -> int:
    "Fibonacci sequence: 1 1 2 3 5 8 13 21 ..."
    if n < 3:
        return 1
    return fib(n-1) + fib(n-2)

In [11]:
%time fib(42)

CPU times: user 47.2 s, sys: 463 µs, total: 47.2 s
Wall time: 47.2 s


267914296

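Think about, e.g., how many times `fib(30)` was independently calculated.  Every branch descending from the top eventually reaches that calculation.  For `fib(42)` that is 233 separate times, and the count grows exponentially with the top-level argument $N$ (roughly as $\varphi^{(N-30)}$, with $\varphi \approx 1.618$)!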
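
The repetition can be measured directly.  The sketch below uses a smaller argument so that it finishes quickly; `counted_fib` and the `calls` tally are hypothetical helpers added for this illustration, not part of the lesson's own code.

```python
from collections import Counter

calls = Counter()   # hypothetical tally: argument -> number of evaluations

def counted_fib(n: int) -> int:
    "Same naive recursion, but recording every call"
    calls[n] += 1
    if n < 3:
        return 1
    return counted_fib(n-1) + counted_fib(n-2)

result = counted_fib(25)
# value, evaluations of counted_fib(13), total number of calls
print(result, calls[13], sum(calls.values()))   # 75025 233 150049
```

Twelve levels below the top, the same value is already recomputed 233 times, and about 150 thousand calls are needed to produce a single number.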

Merely caching is not the optimal approach, but it's notably better.

In [12]:
@lru_cache(maxsize=250)
def fib(n: int) -> int:
    return 1 if n < 3 else fib(n-1) + fib(n-2)

In [13]:
%time fib(42)

CPU times: user 23 µs, sys: 0 ns, total: 23 µs
Wall time: 26 µs


267914296

In [14]:
%time fib(200)

CPU times: user 118 µs, sys: 1 µs, total: 119 µs
Wall time: 121 µs


280571172992510140037611932413038677189525

### Slow resources

This mathematical sequence is a very artificial example, but the `@lru_cache` decorator is widely useful for real purposes too.

In [15]:
@lru_cache
def fetch_page(url):
    from urllib.request import urlopen
    with urlopen(url) as page:  # close the connection once the body is read
        return page.read()

In [16]:
pat = "https://raw.githubusercontent.com/python/peps/master/pep-NUMBER.txt"
urls = [pat.replace('NUMBER', n) 
        for n in "0001 0002 0003 0004 0005 0006 0007 0008".split()]

Running the first time pulls down the actual data.

In [17]:
%%time
for url in urls:
    print(url, len(fetch_page(url)))

https://raw.githubusercontent.com/python/peps/master/pep-0001.txt 36499
https://raw.githubusercontent.com/python/peps/master/pep-0002.txt 8214
https://raw.githubusercontent.com/python/peps/master/pep-0003.txt 2229
https://raw.githubusercontent.com/python/peps/master/pep-0004.txt 12015
https://raw.githubusercontent.com/python/peps/master/pep-0005.txt 3043
https://raw.githubusercontent.com/python/peps/master/pep-0006.txt 8174
https://raw.githubusercontent.com/python/peps/master/pep-0007.txt 7727
https://raw.githubusercontent.com/python/peps/master/pep-0008.txt 51552
CPU times: user 220 ms, sys: 8.01 ms, total: 228 ms
Wall time: 3.77 s


Running subsequently merely reads the page contents from the cache.

In [18]:
%%time
for url in urls:
    print(url, len(fetch_page(url)))

https://raw.githubusercontent.com/python/peps/master/pep-0001.txt 36499
https://raw.githubusercontent.com/python/peps/master/pep-0002.txt 8214
https://raw.githubusercontent.com/python/peps/master/pep-0003.txt 2229
https://raw.githubusercontent.com/python/peps/master/pep-0004.txt 12015
https://raw.githubusercontent.com/python/peps/master/pep-0005.txt 3043
https://raw.githubusercontent.com/python/peps/master/pep-0006.txt 8174
https://raw.githubusercontent.com/python/peps/master/pep-0007.txt 7727
https://raw.githubusercontent.com/python/peps/master/pep-0008.txt 51552
CPU times: user 1.73 ms, sys: 0 ns, total: 1.73 ms
Wall time: 1.34 ms


In [19]:
fetch_page.cache_info()

CacheInfo(hits=8, misses=8, maxsize=128, currsize=8)

In [20]:
fetch_page.cache_clear()
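
The `maxsize` limit matters once more distinct arguments arrive than the cache can hold: the least recently used entry is then discarded.  The sketch below makes this visible with a deliberately tiny cache; `square` is a hypothetical stand-in for a costly function.

```python
from functools import lru_cache

@lru_cache(maxsize=2)
def square(n):
    return n * n

for n in [1, 2, 1, 3, 2]:
    square(n)

# 1: miss, 2: miss, 1: hit, 3: miss (evicts 2), 2: miss (evicts 1)
info = square.cache_info()
print(info)   # CacheInfo(hits=1, misses=4, maxsize=2, currsize=2)
```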

### `@total_ordering`

One of the magic things you can do with Python classes is make them respond to operators in custom ways.  What it means to add two `Foo` instances is up to you.  Likewise, what it means for one instance of `Foo` to be "less than" another instance.

A difficulty with defining inequality relations among objects, however, is that there are **lots** of methods to define.  To allow objects to respond appropriately to all the comparisons, you need `.__lt__()`, `.__le__()`, `.__eq__()`, `.__ne__()`, `.__ge__()`, `.__gt__()`.  While in principle, each of those can do something completely different, **usually** you just want them to be consistent with each other.

By adding the decorator `@total_ordering`, the remaining comparison methods are filled in consistently, as long as you define `.__eq__()` and at least one of `.__lt__()`, `.__le__()`, `.__gt__()`, or `.__ge__()`.  We can use the decorator with any class, but let's make a Data Class, continuing the last lesson.

In [21]:
@dataclass
@total_ordering
class HowBlue:
    # @dataclass generates __eq__ implementation
    Red: int = 0
    Green: int = 0
    Blue: int = 0
    
    def __lt__(self, other):
        return self.Blue < other.Blue

In [22]:
Salmon = HowBlue(*salmon)
Lavender = HowBlue(*lavender)
Seagreen = HowBlue(*seagreen)
Indianred = HowBlue(*indianred)
Seagreen

HowBlue(Red=46, Green=139, Blue=87)

In [23]:
Salmon <= Lavender, Seagreen > Indianred, Seagreen != Salmon

(True, False, True)

In [24]:
sorted([Salmon, Lavender, Seagreen, Indianred])

[HowBlue(Red=46, Green=139, Blue=87),
 HowBlue(Red=205, Green=92, Blue=92),
 HowBlue(Red=250, Green=128, Blue=114),
 HowBlue(Red=230, Green=230, Blue=250)]
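
The decorator works the same way on an ordinary class, where `.__eq__()` must be written by hand.  The following is a minimal sketch; the `Version` class and its `_key()` helper are hypothetical names chosen for illustration.

```python
from functools import total_ordering

@total_ordering
class Version:
    def __init__(self, major, minor):
        self.major, self.minor = major, minor

    def _key(self):
        return (self.major, self.minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() < other._key()

releases = [Version(3, 9), Version(3, 12), Version(2, 7)]
oldest, newest = min(releases), max(releases)

# __ge__ was never written; @total_ordering derived it from __lt__
print(oldest.minor, newest.minor, Version(3, 9) >= Version(2, 7))   # 7 12 True
```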

### `partial`

The function `partial()` modifies other functions by "filling in" some of their arguments in advance.  This can both avoid repetition and make it possible to use functions in more abstract contexts. 

I give an example of a toy logger below, but for genuine logging capabilities, the Python standard library `logging` module is far more robust.

In [25]:
import sys
from datetime import datetime as dt
debug = partial(print, "DEBUG", sep="|", flush=True, file=sys.stdout)
error = partial(print, "ERROR", sep="|", flush=True, file=sys.stderr)

Using our engineered functions is just like using any other function.

In [26]:
debug(dt.isoformat(dt.now()), "Stuff happened")
debug(dt.isoformat(dt.now()), "Different stuff")
error(dt.isoformat(dt.now()), "More crucial event")
debug(dt.isoformat(dt.now()), "Regular event")
error(dt.isoformat(dt.now()), "Something went wrong")

DEBUG|2020-09-06T12:06:42.914318|Stuff happened
DEBUG|2020-09-06T12:06:42.915448|Different stuff


ERROR|2020-09-06T12:06:42.916314|More crucial event


DEBUG|2020-09-06T12:06:42.917130|Regular event


ERROR|2020-09-06T12:06:42.917887|Something went wrong


We might want to use a "partialized" function within some context needing a function object.  This is similar to what we saw with functions from the `operator` module.

In [27]:
base7 = partial(int, base=7)
nums = "23 30 52 44 12 6 56".split()

base10 = map(base7, nums)
print(*base10)

17 21 37 32 9 6 41
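
A partial object is not opaque.  As a small sketch, the wrapped function and the pre-filled arguments can be inspected, and a pre-filled keyword argument can still be overridden at call time.

```python
from functools import partial

base7 = partial(int, base=7)

# The pieces that were "filled in" remain available for inspection
print(base7.func is int, base7.args, base7.keywords)   # True () {'base': 7}

# "10" read in base 7, then with the keyword overridden to base 2
print(base7("10"), base7("10", base=2))                # 7 2
```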


## Module: itertools

The module `itertools` allows for combinations of "lazy" iterators in useful ways.  When the streams produced by iterators are infinite, or even simply very large or slow to produce their next item, an "algebra" of their combination can be powerful.

The built-in functions `map()`, `filter()`, and `enumerate()` are not part of `itertools` because they are always available; however, they function very much the same way as the functions in `itertools`, and are often combined with them.
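
To give a first taste of how built-ins and `itertools` functions combine, here is a short sketch using `count()`, `chain()`, and `islice()`; the values are arbitrary and chosen only for illustration.

```python
from itertools import chain, count, islice

# count(1) is infinite; map() stays lazy, so nothing is computed yet
squares = map(lambda n: n * n, count(1))

# Splice a finite slice of the infinite stream between other iterables
combined = list(chain("ab", islice(squares, 3), ["end"]))
print(combined)    # ['a', 'b', 1, 4, 9, 'end']

# zip() stops at the shortest input, so an infinite counter is safe here
labelled = list(zip(count(100, 10), "xyz"))
print(labelled)    # [(100, 'x'), (110, 'y'), (120, 'z')]
```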

We can create an infinite sequence to work with.  Here is an efficient version of the Fibonacci sequence as an infinite iterator.

In [28]:
def make_fib():
    a, b = 0, 1
    while True:
        a, b = b, a+b
        yield a

In [29]:
# As many iterator generators as we like
fibs1, fibs2 = make_fib(), make_fib()
next(fibs1), next(fibs1), next(fibs1), next(fibs1)

(1, 1, 2, 3)

Let's get a feel for working with these infinite iterators.  For example, we can find only the even Fibonacci numbers using `filter()`.  That still leaves infinitely many of them, so we can use `itertools.islice()` to select the 5th through 15th of those.  Of those few even Fibonacci numbers, let us also divide them all by two.

In [30]:
is_even = lambda n: not n % 2
half = lambda n: n // 2

even_fibs = filter(is_even, fibs2)
print("even_fibs is:", even_fibs)

first_even_fibs = islice(even_fibs, 4, 15) # zero-based start
print("first_even_fibs:", first_even_fibs)

half_evens = map(half, first_even_fibs)
print("half_evens:", half_evens)

even_fibs is: <filter object at 0x7fb650410df0>
first_even_fibs: <itertools.islice object at 0x7fb6503bb270>
half_evens: <map object at 0x7fb650410370>


So far we have described some computation, but we have not yet *performed* it.  Let us now concretize the elements.  Only when we do this do we consume elements from the original `fibs2` iterator.  You can verify the specific numbers yourself: each is half of one of the 5th through 15th even Fibonacci numbers.

In [31]:
for n in half_evens:
    print(n)

305
1292
5473
23184
98209
416020
1762289
7465176
31622993
133957148
567451585


In [32]:
# The underlying sequence is partially consumed
next(fibs2)

1836311903
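
The same laziness can be observed on a small scale by recording when items are actually pulled from the source.  In this sketch, `noisy()` and `log` are hypothetical helpers introduced only to expose the order of events.

```python
def noisy(items, log):
    "Yield each item, noting in `log` the moment it is requested"
    for item in items:
        log.append(item)
        yield item

log = []
odds = filter(lambda n: n % 2, noisy(range(1, 8), log))
pipeline = map(lambda n: n * 10, odds)
before = list(log)        # still empty: nothing has been requested

first = next(pipeline)    # pulls 1 from the source
second = next(pipeline)   # pulls 2 (rejected by the filter), then 3
print(before, first, second, log)   # [] 10 30 [1, 2, 3]
```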

These abstract mathematical sequences are fun for illustration, but they are rarely your daily work.  More common are iterators over things like files or sockets.  For example, the _Plague of Pythons_ short story downloaded from Project Gutenberg and used in the last lesson is a text file where we can iterate over lines.

In [33]:
def not_start_line(line):
    return "*** START" not in line

for line in takewhile(not_start_line, open('pg51804.txt')):
    print(line, end='')

The Project Gutenberg EBook of Plague of Pythons, by Frederik Pohl

This eBook is for the use of anyone anywhere at no cost and with
almost no restrictions whatsoever.  You may copy it, give it away or
re-use it under the terms of the Project Gutenberg License included
with this eBook or online at www.gutenberg.org/license

Title: Plague of Pythons
Author: Frederik Pohl
Release Date: April 19, 2016 [EBook #51804]
Language: English



In the story, a main character is named "Chandler."  We would like to skip the PG header, then find the lines that mention that character but do not contain a full stop (period).  Moreover, throughout this, we add and keep line numbers, counted from the start marker rather than from the top of the file.

In [34]:
def has_chandler(line):
    return "Chandler" in line[1]

def has_period(line):
    return '.' in line[1]

story = open('pg51804.txt')
no_header = dropwhile(not_start_line, story)
line_nums = enumerate(no_header)
with_character = filter(has_chandler, line_nums)
no_full_stop = filterfalse(has_period, with_character)

print(story, no_header, line_nums, with_character, no_full_stop, sep='\n')

<_io.TextIOWrapper name='pg51804.txt' mode='r' encoding='UTF-8'>
<itertools.dropwhile object at 0x7fb6503cda40>
<enumerate object at 0x7fb6503cda80>
<filter object at 0x7fb6503ae880>
<itertools.filterfalse object at 0x7fb6503ae850>


We can print off the first few of these to show the steps have been followed.

In [35]:
for line in islice(no_full_stop, 8):
    print(line)

(23, "Because of the crowd they held Chandler's trial in the all-purpose room\n")
(32, 'Chandler got to his feet and leaned on the table while the bailiff\n')
(72, 'The bailiff ordered Chandler to stand and informed him that he was\n')
(138, 'Chandler, for the first time, allowed himself to meet the eyes of the\n')
(181, 'day of June last?" prompted the prosecutor, and Chandler\'s attorney\n')
(204, 'Chandler; a doctor described in chaste medical words the derangements\n')
(206, "question from Chandler's lawyer--and, for that matter, nothing to\n")
(217, 'the guards took Chandler back to the detention cell in the basement of\n')
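
If the story file is not at hand, the same chain of steps can be tried on an in-memory file.  This is a sketch under that assumption: the six lines of text are invented, and `io.StringIO` stands in for `open()`.

```python
import io
from itertools import dropwhile, filterfalse

story = io.StringIO(
    "Header line\n"
    "*** START OF STORY ***\n"
    "Chandler arrived late\n"
    "Nobody noticed.\n"
    "Chandler sat down.\n"
    "Then Chandler spoke\n"
)

no_header = dropwhile(lambda line: "*** START" not in line, story)
line_nums = enumerate(no_header)   # the start marker itself is line 0
with_character = filter(lambda pair: "Chandler" in pair[1], line_nums)
no_full_stop = filterfalse(lambda pair: "." in pair[1], with_character)

result = list(no_full_stop)
print(result)   # [(1, 'Chandler arrived late\n'), (4, 'Then Chandler spoke\n')]
```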


There is much more you can do to build up these combinations and filters and selections from iterators.  For one more example, let's select certain elements from the Fibonacci numbers.  Our "mask" will be 7 long, and will take the elements at indices 2 and 5 within each run of 7.  This is artificial, but shows the power of the approach.

In [36]:
# The numbers 2 and 5 are just reminders of the index positions
# ... any truthy value would work the same
two_and_five = cycle([0, 0, 2, 0, 0, 5, 0])

# Apply an (infinite) mask to the infinite sequence
selected = compress(make_fib(), two_and_five)

# Display the first 10 that qualify
list(islice(selected, 10))

[2, 8, 55, 233, 1597, 6765, 46368, 196418, 1346269, 5702887]
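
On a finite input the masking is easier to follow by eye.  This sketch applies a repeating mask of length 3 to ten letters; for such a regular pattern, a stepped `islice()` gives the same selection.

```python
from itertools import compress, cycle, islice

letters = "ABCDEFGHIJ"

# Keep the item at index 1 of every run of three
picked = list(compress(letters, cycle([0, 1, 0])))
print(picked)   # ['B', 'E', 'H']

# Equivalent for this regular mask: start at index 1, step by 3
same = list(islice(letters, 1, None, 3))
print(same)     # ['B', 'E', 'H']
```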