##### Diving In

Rules for English-language pluralization:


* If a word ends in S, X, or Z, add ES. Bass becomes basses, fax becomes faxes, and waltz becomes waltzes.
* If a word ends in a noisy H, add ES; if it ends in a silent H, just add S. What’s a noisy H? One that gets combined with other letters to make a sound that you can hear. So coach becomes coaches and rash becomes rashes, because you can hear the CH and SH sounds when you say them. But cheetah becomes cheetahs, because the H is silent.
* If a word ends in Y that sounds like I, change the Y to IES; if the Y is combined with a vowel to sound like something else, just add S. So vacancy becomes vacancies, but day becomes days.
* If all else fails, just add S and hope for the best.

Our goal is to design a library that pluralizes English nouns.

##### Step 1: Use regex

We can use regular expressions to pluralize nouns.

In [130]:
import re
re.search('[abc]', 'Mark')

<re.Match object; span=(1, 2), match='a'>

In [131]:
re.sub('[abc]', 'o', 'Mark')

'Mork'

In [132]:
re.sub('[abc]', 'o', 'rock')

'rook'

In [133]:
re.sub('[abc]', 'o', 'caps')

'oops'

In the above `re.sub` (substitute) examples, every `a`, `b` or `c` found in the string is replaced with `o`.

The last example shows that all occurrences are replaced, not just the first. This regex turns `caps` into `oops` because both the `c` and the `a` are replaced by an `o`.
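If only some of the matches should be replaced, `re.sub` accepts an optional `count` argument. The short sketch below contrasts the default behaviour with `count=1`; the variable names are only illustrative.

```python
import re

# By default, re.sub replaces every match in the string.
all_replaced = re.sub('[abc]', 'o', 'caps')

# The optional count argument caps the number of replacements.
first_only = re.sub('[abc]', 'o', 'caps', count=1)

print(all_replaced)  # oops
print(first_only)    # oaps
```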

In [134]:
import re

def plural(noun):
    if (re.search("[sxz]$", noun)):
        return re.sub("$", "es", noun)
    elif (re.search("[^aeioudgkprt]h$", noun)):
        return re.sub("$", "es", noun)
    elif (re.search("[^aeiou]y$", noun)):
        return re.sub("y$", "ies", noun)
    else:
        return re.sub("$", "s", noun)

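In the code above, the square brackets mean "match _exactly_ one of these characters". So `[sxz]` means any of `s`, `x` or `z`.

A `^` inside a bracket means _any single character except_, for example `[^sxz]` means any character _except_ `s`, `x` or `z`.

A `$` matches the end of the string, so `[sxz]$` only matches a word that ends in `s`, `x` or `z`, and `re.sub("$", "es", noun)` appends `es` to the noun.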
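As a quick check, the patterns used in `plural` can be tried against the example words from the rules at the top. This is only a sketch; the `checks` list and the `matches` dictionary are names invented for the illustration.

```python
import re

# (pattern, word) pairs; the words come from the rules listed earlier.
checks = [
    ("[sxz]$", "fax"),
    ("[sxz]$", "coach"),
    ("[^aeioudgkprt]h$", "coach"),
    ("[^aeioudgkprt]h$", "cheetah"),
    ("[^aeiou]y$", "vacancy"),
    ("[^aeiou]y$", "day"),
]

# re.search returns a match object or None; convert that to True/False.
matches = {
    (pattern, word): re.search(pattern, word) is not None
    for pattern, word in checks
}

for (pattern, word), matched in matches.items():
    print(f"{pattern!r:22} {word:8} {matched}")

# "$" matches the empty string at the end, so substituting it appends text.
appended = re.sub("$", "es", "fax")
print(appended)  # faxes
```

`coach` matches the H rule because the letter before the `h` is a `c`, while `cheetah` does not because the letter before the `h` is a vowel.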


##### A List of Functions

Let's temporarily complicate part of the problem to simplify another part of the solution.

In [135]:
import re

def match_sxz(noun: str) -> bool:
    return re.search("[sxz]$", noun) is not None

def apply_sxz(noun: str) -> str:
    return re.sub("$", "es", noun)

def match_h(noun: str) -> bool:
    return re.search("[^aeioudgkprt]h$", noun) is not None

def apply_h(noun: str) -> str:
    # Append "es"; substituting the whole match would eat the last two letters.
    return re.sub("$", "es", noun)

def match_y(noun: str) -> bool:
    return re.search("[^aeiou]y$", noun) is not None

def apply_y(noun: str) -> str:
    # Replace only the trailing "y" so the consonant before it is kept.
    return re.sub("y$", "ies", noun)

def match_default(noun: str) -> bool:
    return True

def apply_default(noun: str) -> str:
    return re.sub("$", "s", noun)

We are going to use the function definitions above to "simplify" the `plural` function.

In [136]:
from typing import Callable, List, Tuple


def plural2(noun: str) -> str:
    # Each rule is a (match function, apply function) pair, tried in order.
    rules: List[Tuple[Callable, Callable]] = [(match_sxz, apply_sxz),
                                              (match_h, apply_h),
                                              (match_y, apply_y),
                                              (match_default, apply_default)]

    for match, apply in rules:
        if match(noun):
            return apply(noun)

In [137]:
print(plural2("vacancy"))
print(plural2("boy"))
print(plural2("apple"))

vacancies
boys
apples


The reason this technique works is that functions are **first-class** in Python. This means functions can be passed around as data.

Python achieves this by treating **everything as an object**.

The above `plural2` function can be simplified further in multiple ways. We'll first look at a way that uses only a single `match` function and a single `apply` function.
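A minimal sketch of what "first-class" means in practice follows. The functions `shout`, `whisper` and `apply_all` are made up for this illustration and are not part of the pluralization code.

```python
def shout(text: str) -> str:
    return text.upper() + "!"

def whisper(text: str) -> str:
    return text.lower() + "..."

# Functions are ordinary objects, so they can be stored in a list ...
styles = [shout, whisper]

# ... and passed as arguments to other functions.
def apply_all(funcs, text):
    return [func(text) for func in funcs]

results = apply_all(styles, "Hello")
print(results)                    # ['HELLO!', 'hello...']
print(shout.__name__)             # shout
print(isinstance(shout, object))  # True
```

This is the same idea `plural2` relies on when it stores `(match, apply)` pairs in a list and calls them inside the loop.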

In [138]:
import re
from typing import List, Tuple

def match(rule: str, noun: str) -> bool:
    # re.search returns a match object or None; convert it to a real bool.
    return re.search(rule, noun) is not None

def apply(rule: str, sub: str, noun: str) -> str:
    return re.sub(rule, sub, noun)


def plural3(noun: str) -> str:
    rules : List[Tuple[str, str, str]] = [("[sxz]$", "$", "es"),
                                        ("[^aeioudgkprt]h$", "$", "es"),
                                        ("[^aeiou]y$", "y$", "ies"),
                                        (".*", "$", "s")]
    
    for rule, sub_rule, sub_str in rules:
        if match(rule, noun):
            return apply(sub_rule, sub_str, noun)

In [139]:
print(plural3("vacancy"))
print(plural3("boy"))
print(plural3("apple"))

vacancies
boys
apples


This is an improvement: the number of functions drops from twice the number of rules to *exactly 2*. In my opinion, this is the best solution.

However, we are here to demonstrate closures and generators. Let's rewrite `rules` so that it has the same shape as in `plural2` (a list of match/apply function pairs) while keeping the improvements from `plural3`.

In [140]:
import re

from typing import List, Tuple, Callable, Optional

Rules = Tuple[Callable[[str], Optional[re.Match]], Callable[[str], str]]

def make_rule(mat_pat: str, sub_pat: str, sub_str: str) -> Rules:
    
    def match(noun: str) -> Optional[re.Match]:
        return re.search(mat_pat, noun)
    
    def apply(noun: str) -> str:
        return re.sub(sub_pat, sub_str, noun)
    
    return match, apply



def plural4(noun: str) -> str:
    rules : List[Rules] = [make_rule("[sxz]$", "$", "es"),
         make_rule("[^aeioudgkprt]h$", "$", "es"),
         make_rule("[^aeiou]y$", "y$", "ies"),
         make_rule(".*", "$", "s")]
    
    for match, apply in rules:
        if match(noun):
            return apply(noun)

In [141]:
print(plural4("vacancy"))
print(plural4("boy"))
print(plural4("apple"))

vacancies
boys
apples


The `make_rule` function builds and returns two functions, `match` and `apply`. Neither takes the patterns as parameters. Instead, they use `mat_pat`, `sub_pat` and `sub_str`, which are defined in the enclosing scope of `make_rule`.

A function that remembers values from its enclosing scope like this is called a `closure`. `match` and `apply` are `closed` over the arguments of `make_rule`, and they keep access to those values even after `make_rule` has returned.
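A stripped-down closure, separate from the pluralization rules, may make this easier to see. `make_suffixer` and `add_suffix` are hypothetical names used only for this sketch.

```python
from typing import Callable

def make_suffixer(suffix: str) -> Callable[[str], str]:
    # add_suffix uses `suffix`, which lives in make_suffixer's scope.
    def add_suffix(noun: str) -> str:
        return noun + suffix
    return add_suffix

add_es = make_suffixer("es")
add_s = make_suffixer("s")

# Each returned function remembers the value it was created with.
print(add_es("fax"))   # faxes
print(add_s("apple"))  # apples

# The captured value is stored on the function object itself.
captured = add_es.__closure__[0].cell_contents
print(captured)        # es
```

Each call to `make_suffixer` produces a new function with its own captured `suffix`, just as each call to `make_rule` produces a `match`/`apply` pair bound to its own patterns.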

##### A File Of Patterns

Our next step is to *externalize* the rules needed to pluralize. This will allow us to add additional rules without modifying code.

In [142]:
with open('examples/plural4-rules.txt') as pat_file:
    for line in pat_file:
        print(line)

[sxz]$			$		es

[^aeioudgkprt]h$	$		es

[^aeiou]y$		y$		ies

$			$		s



In [143]:
import re

Rules = Tuple[Callable[[str], Optional[re.Match]], Callable[[str], str]]

def make_rule(mat_pat: str, sub_pat: str, sub_str: str) -> Rules:
    
    def match(noun: str) -> Optional[re.Match]:
        return re.search(mat_pat, noun)
    
    def apply(noun: str) -> str:
        return re.sub(sub_pat, sub_str, noun)
    
    return match, apply

def plural5(noun: str) -> str:
    
    rules : List[Rules] = []
        
    with open('examples/plural4-rules.txt') as pf:
        for line in pf:
            parts = line.split(None, 3)
            rules.append(make_rule(parts[0], parts[1], parts[2]))
    
    for match, apply in rules:
        if match(noun):
            return apply(noun)

In [144]:
print(plural5("vacancy"))
print(plural5("boy"))
print(plural5("apple"))

vacancies
boys
apples


We have now externalized the rules.

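Note: In `line.split(None, 3)` the argument `None` means "split on any run of whitespace" (spaces, tabs etc., it makes no difference). The argument `3` is the maximum number of splits to perform, so the result has at most 4 parts. Each line of the rules file has exactly three fields, so here we get 3 parts.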
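The behaviour of `split` with these arguments can be checked on a single rule line. The string `rule_line` below mimics one line of the rules file, and the second example is made up to show what happens when a line has more fields than splits.

```python
# One line as it would be read from the rules file (tabs plus a newline).
rule_line = "[sxz]$\t\t\t$\t\tes\n"

# Runs of whitespace count as one separator; the trailing newline is dropped.
parts = rule_line.split(None, 3)
print(parts)  # ['[sxz]$', '$', 'es']

# With more than four fields, everything after the third split stays together.
extra = "a b c d e".split(None, 3)
print(extra)  # ['a', 'b', 'c', 'd e']
```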

The major drawback of this method is that every call to `plural5` reads and parses the entire rules file, and builds every rule, before the first rule is even tested.

Code is code, data is data and life is good.

##### Generators

Wouldn't it be grand if we didn't need to parse the entire file up front? We would like a generic `plural()` function that parses the rules file lazily: it gets a single rule, checks for a match, applies the transformation if there is one, and otherwise moves on to the next rule.

In [145]:
import re
from typing import Callable, Iterator, Optional, Tuple

Rules = Tuple[Callable[[str], Optional[re.Match]], Callable[[str], str]]

def rules(rulefile: str) -> Iterator[Rules]:
    
    with open(rulefile) as pf:
        for line in pf:
            parts = line.split(None, 3)
            yield make_rule(parts[0], parts[1], parts[2])
    
def make_rule(mat_pat: str, sub_pat: str, sub_str: str) -> Rules:

    def match(noun: str) -> Optional[re.Match]:
        return re.search(mat_pat, noun)

    def apply(noun: str) -> str:
        return re.sub(sub_pat, sub_str, noun)

    return match, apply

def plural6(noun: str) -> str:
    for match, apply in rules("examples/plural4-rules.txt"):
        if match(noun):
            return apply(noun)

In [146]:
print(plural6("vacancy"))
print(plural6("boy"))
print(plural6("apple"))

vacancies
boys
apples


The presence of the `yield` keyword means the `rules` function is not a normal function. It is a special function that **generates** values one at a time.

You can think of it as a resumable function. Calling it will return a *generator* that can be used to generate successive values.
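The "one at a time" behaviour can be made visible with a small experiment. The sketch below is a simplified variant of `plural6`, not the code above: it reads the rules from an in-memory string via `io.StringIO` instead of a file, yields plain tuples instead of closures, and records each parsed pattern in a list named `parsed`. All of these are assumptions made for the illustration.

```python
import io
import re

RULE_TEXT = (
    "[sxz]$ $ es\n"
    "[^aeioudgkprt]h$ $ es\n"
    "[^aeiou]y$ y$ ies\n"
    "$ $ s\n"
)

parsed = []  # records which match patterns were actually parsed

def rules(lines):
    for line in lines:
        mat_pat, sub_pat, sub_str = line.split(None, 3)
        parsed.append(mat_pat)
        yield mat_pat, sub_pat, sub_str

def plural(noun):
    for mat_pat, sub_pat, sub_str in rules(io.StringIO(RULE_TEXT)):
        if re.search(mat_pat, noun):
            return re.sub(sub_pat, sub_str, noun)

fax_plural = plural("fax")
fax_parsed = list(parsed)
print(fax_plural)  # faxes
print(fax_parsed)  # ['[sxz]$']

parsed.clear()
vacancy_plural = plural("vacancy")
vacancy_parsed = list(parsed)
print(vacancy_plural)  # vacancies
print(vacancy_parsed)  # ['[sxz]$', '[^aeioudgkprt]h$', '[^aeiou]y$']
```

For `fax` only the first line is ever parsed, because the function returns as soon as a rule matches and the generator is never asked for another value.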

In [147]:
def fib(max):
    a, b = 0, 1
    while a < max:
        yield a
        a, b = b, a + b

In [148]:
print(type(fib(10)))
print(type(fib))

<class 'generator'>
<class 'function'>


You can use a **generator**, such as the one returned by calling `fib`, directly in a `for` loop. The `for` loop will automatically call the `next()` function to get values from the `fib()` generator and assign them to the `for` loop variable.

In [149]:
for n in fib(1000):
    print(n, end=' ')

0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 
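What the `for` loop does behind the scenes can also be done by hand. The sketch below repeats the `fib` definition so that it runs on its own, then pulls values out with `next()` until the generator signals that it is exhausted by raising `StopIteration`.

```python
def fib(max):
    a, b = 0, 1
    while a < max:
        yield a
        a, b = b, a + b

gen = fib(5)

# Each call to next() resumes the function until the next yield.
first = next(gen)
second = next(gen)
print(first, second)  # 0 1

# When the function body finishes, next() raises StopIteration.
rest = []
while True:
    try:
        rest.append(next(gen))
    except StopIteration:
        break
print(rest)  # [1, 2, 3]
```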

Passing a generator to the `list()` function will iterate through the entire generator (just like the `for` loop in the previous example) and return a list of all the values.

In [150]:
list(fib(1000))

[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987]
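Because values are produced on demand, a generator does not even need to be finite. The sketch below uses a hypothetical `fib_forever` generator with no upper bound. Passing it straight to `list()` would never finish, but `itertools.islice` takes only as many values as requested.

```python
from itertools import islice

def fib_forever():
    a, b = 0, 1
    while True:  # never stops on its own
        yield a
        a, b = b, a + b

# islice stops asking for values after the first ten.
first_ten = list(islice(fib_forever(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```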