# Cryptarithmetic

An example of cryptarithmetic is the equation

```text
 ODD +
 ODD =
EVEN
```

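where each letter represents a distinct digit and the equation holds true. First consideration: if the numbers are in base 10, ODD must be at least 500, because the sum has four digits. It cannot be exactly 500, because this would mean O=5 and D=0, and the sum 1000 would have the pattern XDDD (we don't yet know X) rather than EVEN, whose first and third digits are equal. We also realize that E = 1, because ODD must be <= 999 (it cannot be 999, since the digits are different), so the sum must be strictly lower than 2000. So the first and third digits of the sum must be 1. N must be an even number, since it is the units digit of 2*D. The tens digit of the sum is E = 1, which is odd, so the units column must produce a carry, hence D >= 5. With that carry, the tens column gives 2*D + 1, which ends in 1 only when 2*D ends in 0, so D = 5 and N = 0. The hundreds column then gives V = 2*O + 1 - 10 with O > 5, and requiring distinct digits leaves two solutions: 655 + 655 = 1310 and 855 + 855 = 1710.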
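
The reasoning above can be double-checked by exhaustive search. The following is a minimal sketch, not part of the original notes; the helper name `odd_even_solutions` is hypothetical. It tries every assignment of distinct digits to O, D, E, V, N, rejects leading zeros, and keeps the assignments for which the sum holds.

```python
from itertools import permutations

def odd_even_solutions():
    """Return all (ODD, EVEN) pairs with distinct digits per letter and no leading zeros."""
    found = []
    for o, d, e, v, n in permutations(range(10), 5):
        if o == 0 or e == 0:                      # no leading zeros
            continue
        odd = 100 * o + 10 * d + d                # digits O, D, D
        even = 1000 * e + 100 * v + 10 * e + n    # digits E, V, E, N
        if odd + odd == even:
            found.append((odd, even))
    return sorted(found)

print(odd_even_solutions())   # [(655, 1310), (855, 1710)]
```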

## Brute Force Solution

One possible approach could be implementing all the rules of arithmetic (carry digits, odd/even). This is a challenging task, even just for addition. We would like to find a shortcut. Another possibility is trying all possibilities. With 10 digits there are at most $10! \approx 3.6\times10^6$ assignments of digits to letters, which seems feasible, although not fast. We replace the letters with the digits of each candidate assignment and keep iterating until the equation holds. Some edge cases may appear, and we will deal with them below.
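
To get a feeling for the size of the search space, the sketch below counts the candidates. It assumes Python 3.8 or later for `math.perm`. The count for a specific puzzle depends only on the number of distinct letters, so it is usually far below the $10!$ upper bound.

```python
import math

# Upper bound: every ordering of the ten digits.
upper_bound = math.factorial(10)
print(upper_bound)        # 3628800

# 'ODD + ODD == EVEN' has 5 distinct letters (O, D, E, V, N),
# so only ordered selections of 5 digits out of 10 are needed.
odd_even_candidates = math.perm(10, 5)
print(odd_even_candidates)   # 30240
```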

## Inventory of Concepts

We are dealing with the following concepts:

1. Equations - two types of them:
    - The initial ones where digits are represented by uppercase letters.
    - The "filled-in" ones, where we replace letters with digits.
2. Letters.
3. Digits.
4. Assignment of a letter to a digit.
5. Evaluation or validation that the equation is correct.

How do we represent these concepts? Norvig suggests this approach:

- The original equation can be expressed as a string.
- The filled-in equation can also be represented as a string.
- The letters are single characters (one-character strings).
- The digits are also represented as single characters.
- We need a mapping that associates letters to numbers. We can leverage `str.translate()`.
- For the evaluation we can use `eval()`, which takes a string and evaluates it as an expression.

The main advantages of using strings to represent both types of equations are:

1. The possibility of using `str.maketrans()` and `str.translate()`, which is much easier than doing some complicated string replacements.
2. The possibility of parsing and evaluating the filled-in equation via `eval()`.

### `eval`

`eval()` takes a string that represents a valid Python expression, parses it (i.e., it builds a parse tree) and evaluates it.

In [1]:
eval('2 + 2')

4

In [2]:
eval('2 + 2 == 3')

False

### Translation Tables

We can create a translation table with the function `str.maketrans()`. If, for example, we want to translate `'ABC'` to `'123'` we can write:

In [3]:
table = str.maketrans('ABC', '123')
f = 'A + B == C'
f.translate(table)

'1 + 2 == 3'

In [4]:
eval(f.translate(table))

True

Let's define a function `valid()` that returns `True` if the formula is valid and `False` otherwise. It should also return `False` if the formula contains an operation like `1/0` that raises an error. The one below is my initial version.

In [5]:
def valid(f):
    "Formula f is valid iff it evals true; a division by zero counts as invalid."
    try:
        return eval(f)
    except ZeroDivisionError:
        return False

def test_valid():
    assert valid('1 + 3 == 4') is True
    assert valid('6/2 == 3') is True
    assert valid('1/0') is False
    print('All tests pass')

test_valid()

All tests pass


This is Norvig's version, which also rejects numbers with a leading zero and catches the more general `ArithmeticError`, a superclass of `ZeroDivisionError`. It also requires the result of `eval(f)` to be exactly `True`.

In [6]:
import itertools
import re
import string

def valid(f):
    """Formula is valid iff it has no numbers with leading zeros and evals True"""
    try:
        return not re.search(r'\b0[0-9]', f) and eval(f) is True
    except ArithmeticError:
        return False

test_valid()

All tests pass


### Explanation of Norvig's implementation

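In the case of `ODD + ODD = EVEN`, `EVEN` = 3435 would be a valid number, but `EVEN` = 0435 would not. We want to avoid cases where numbers have a leading zero. This could appear anywhere in a formula. The regular expression `r'\b0[0-9]'` looks for a 0 that appears at a *word boundary*, marked by `\b`, and is followed by another digit.

There is a more profound reason for avoiding numbers with leading zeros: Python does not read them as plain decimal numbers. In Python 2 a literal like 012 is interpreted as octal, i.e., decimal 10. In Python 3 it is a `SyntaxError`, which `valid()` would not catch because `SyntaxError` is not an `ArithmeticError`, so the regular expression check has to run before `eval()`.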
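
The behavior of the regular expression can be tried in isolation. This is a small sketch, assuming Python 3; the helper name `has_leading_zero` is hypothetical and only wraps the same pattern used in `valid()`.

```python
import re

def has_leading_zero(formula):
    """True if some number in the filled-in formula starts with 0 followed by a digit."""
    return re.search(r'\b0[0-9]', formula) is not None

print(has_leading_zero('655 + 655 == 1310'))   # False
print(has_leading_zero('418 + 418 == 0836'))   # True
print(has_leading_zero('10 + 0 == 10'))        # False: a lone 0 is fine

# In Python 3 a leading-zero literal is rejected by the parser itself.
try:
    eval('012 == 10')
    outcome = 'evaluated'
except SyntaxError:
    outcome = 'SyntaxError'
print(outcome)                                 # SyntaxError
```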

## `solve()`

We build the solution step by step. The `solve()` function is very short: it takes a formula in which digits are replaced by letters, generates every possible replacement of the letters with digits, and returns the first filled-in formula that is valid. By "valid" we mean that the formula has no numbers with leading zeros, does not raise an `ArithmeticError`, and evaluates to `True`. The function is just a loop over `fill_in(formula)`, which we haven't implemented yet.

In [7]:
def solve(formula):
    """Given a formula like 'ODD + ODD == EVEN', fill in digits to solve it.
    Input formula is a string; output is a digit-filled-in string or None."""
    for f in fill_in(formula):
        if valid(f):
            return f

## `fill_in()`

Let's consider a strategy to create the `fill_in(formula)` function that takes an unfilled formula and generates all possible filled-in formulas. Let's consider a simpler example: 'I + I = ME'. First note that there are multiple solutions (I = 5, 6, 7, 8, 9 are all possible). In this example we have 3 letters: I, M, E, so we consider all possible permutations of 3 digits taken from the set $\{0, \ldots, 9\}$.
Note also that this solution covers the case where the formula mixes letters and numbers, like 'BC2 + DE3 = AC**F'. One of the examples in the course was like this. String translation is a great choice for this purpose, as it only replaces what needs to be replaced.
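
The claim about 'I + I = ME' is easy to verify directly. The sketch below is only an illustration with a hypothetical variable name `solutions`; it does not use `fill_in()` and simply checks each digit for I.

```python
solutions = []
for i in range(10):
    total = 2 * i
    m, e = divmod(total, 10)          # tens and units digit of I + I
    # ME must have two digits (M != 0) and I, M, E must all differ.
    if m != 0 and len({i, m, e}) == 3:
        solutions.append(f'{i} + {i} == {total}')

print(solutions)
# ['5 + 5 == 10', '6 + 6 == 12', '7 + 7 == 14', '8 + 8 == 16', '9 + 9 == 18']
```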

Norvig proposes this implementation of `fill_in()`.

In [8]:
def fill_in(formula):
    "Generate all possible fillings-in of letters in formula with digits."
    # As usual my solution was more complicated. This is nice and neat.
    letters = ''.join(set(re.findall('[A-Z]', formula)))
    for digits in itertools.permutations('1234567890', len(letters)):
        table = str.maketrans(letters, ''.join(digits))
        yield formula.translate(table)

We start by finding all the uppercase letters in the formula and putting them in a set. This tells us how many unique letters, and therefore digits, we need to consider. I initially considered a set comprehension like

```python
''.join({letter for letter in formula if letter.isupper()})
```

but the one based on `re.findall()`, which returns a list with all occurrences, is probably faster.
`itertools.permutations(iterable, r)` returns all the permutations of length `r` of the elements of `iterable`. We then make a translation table based on the chosen digits (which are characters, not ints) and yield the translated formula. Note that `fill_in()` is a generator function, as we don't want to compute all possible replacements at once, but rather return one at a time.
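
The two ingredients, permutations of characters and lazy generation, can be seen in a small experiment. The function `fill_in_sorted` below is a hypothetical variation of `fill_in()` that sorts the letters, an assumption made only so that the output order is reproducible (the iteration order of a set of strings is not guaranteed).

```python
import itertools
import re

def fill_in_sorted(formula):
    """Like fill_in(), but with sorted letters so the order of results is reproducible."""
    letters = ''.join(sorted(set(re.findall('[A-Z]', formula))))
    for digits in itertools.permutations('1234567890', len(letters)):
        yield formula.translate(str.maketrans(letters, ''.join(digits)))

# permutations() yields tuples of single-character strings.
pairs = list(itertools.permutations('123', 2))
print(pairs)

# The generator does no work until asked; islice pulls only the first three.
first_three = list(itertools.islice(fill_in_sorted('A + B == C'), 3))
print(first_three)   # ['1 + 2 == 3', '1 + 2 == 4', '1 + 2 == 5']
```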

We are not done yet, as we have to verify that the result is correct. Norvig has these clever examples to go through:

In [9]:
%%time

examples = """TWO + TWO == FOUR
A**2 + B**2 == C**2
A**2 + BE**2 == BY**2
X/X == X
A**N + B**N == C**N and N > 1
ATOM**0.5 == A + TO + M
GLITTERS is not GOLD
ONE < TWO and FOUR < FIVE
ONE < TWO < THREE
RAMN == R**3 + RM**3 == N**3 + RX**3
sum(range(AA)) == BB
sum(range(POP)) == BOBO
ODD + ODD == EVEN
PLUTO not in set([PLANETS])""".splitlines()

def test_crypto(examples):
    for n, example in enumerate(examples):
        print(f'{n}: {example} ==> {solve(example)}')

test_crypto(examples)

0: TWO + TWO == FOUR ==> 928 + 928 == 1856
1: A**2 + B**2 == C**2 ==> 3**2 + 4**2 == 5**2
2: A**2 + BE**2 == BY**2 ==> 9**2 + 40**2 == 41**2
3: X/X == X ==> 1/1 == 1
4: A**N + B**N == C**N and N > 1 ==> 3**2 + 4**2 == 5**2 and 2 > 1
5: ATOM**0.5 == A + TO + M ==> 1296**0.5 == 1 + 29 + 6
6: GLITTERS is not GOLD ==> 54377186 is not 5942
7: ONE < TWO and FOUR < FIVE ==> 513 < 925 and 8540 < 8673
8: ONE < TWO < THREE ==> 513 < 625 < 64733
9: RAMN == R**3 + RM**3 == N**3 + RX**3 ==> 1729 == 1**3 + 12**3 == 9**3 + 10**3
10: sum(range(AA)) == BB ==> sum(range(11)) == 55
11: sum(range(POP)) == BOBO ==> sum(range(101)) == 5050
12: ODD + ODD == EVEN ==> 655 + 655 == 1310
13: PLUTO not in set([PLANETS]) ==> 36289 not in set([3651487])
CPU times: user 499 ms, sys: 0 ns, total: 499 ms
Wall time: 498 ms


By the way, this one below rings a bell, doesn't it?

In [10]:
print(solve('A**N + B**N == C**N and N > 2'))

None


## Profiling

If we save the code above in a file, say, `crypto.py`, we can get profiling information by typing

```shell
python -m cProfile crypto.py
```

You can also profile your code from inside the Python prompt by importing the `cProfile` module and passing the code to be profiled, as a string, to `cProfile.run()`. For more information about the profiling modules, check the [Python documentation](https://docs.python.org/3/library/profile.html).
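
Profiling can also be driven programmatically. The sketch below uses the standard `cProfile.Profile` and `pstats.Stats` classes and collects the report into a string; `busy_work` is a hypothetical stand-in for the code to be profiled, chosen here because it calls `eval()` many times.

```python
import cProfile
import io
import pstats

def busy_work(n=10_000):
    """Hypothetical stand-in for the code being profiled: many small eval() calls."""
    return sum(eval('i * i', {'i': i}) for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
busy_work()
profiler.disable()

# Sort by cumulative time and print only the top five entries.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats('cumulative').print_stats(5)
print(report.getvalue())
```

The timings depend on the machine, so no expected output is shown.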

In Jupyter there are the `%prun` and `%%prun` magics. From the tables below, we can see that most of the time is spent in `solve()`, which calls `valid()`, which calls `eval()`.

In [11]:
%prun solve('ATOM**0.5 == A + TO + M')

 

         471 function calls in 0.001 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       46    0.000    0.000    0.000    0.000 {built-in method builtins.eval}
       47    0.000    0.000    0.000    0.000 559079801.py:1(fill_in)
       46    0.000    0.000    0.000    0.000 {method 'translate' of 'str' objects}
       46    0.000    0.000    0.000    0.000 1507065022.py:5(valid)
       47    0.000    0.000    0.000    0.000 __init__.py:272(_compile)
       46    0.000    0.000    0.000    0.000 {method 'search' of 're.Pattern' objects}
        1    0.000    0.000    0.001    0.001 2524925880.py:1(solve)
       46    0.000    0.000    0.000    0.000 __init__.py:173(search)
        1    0.000    0.000    0.001    0.001 {built-in method builtins.exec}
       47    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
       47    0.000    0.000    0.000    0.000 {method 'join' of 'str' objects}
       46  

A more complete picture may be obtained by running `%prun` on `test_crypto()`. We can see that `solve()` is called 14 times, once per example. Look at the cumulative times associated with `valid()` and `fill_in()`. It seems that speeding up `valid()` may bring the largest benefits (Law of Diminishing Returns).

In [12]:
%prun test_crypto(examples)

0: TWO + TWO == FOUR ==> 928 + 928 == 1856
1: A**2 + B**2 == C**2 ==> 3**2 + 4**2 == 5**2
2: A**2 + BE**2 == BY**2 ==> 9**2 + 40**2 == 41**2
3: X/X == X ==> 1/1 == 1
4: A**N + B**N == C**N and N > 1 ==> 3**2 + 4**2 == 5**2 and 2 > 1
5: ATOM**0.5 == A + TO + M ==> 1296**0.5 == 1 + 29 + 6
6: GLITTERS is not GOLD ==> 54377186 is not 5942
7: ONE < TWO and FOUR < FIVE ==> 513 < 925 and 8540 < 8673
8: ONE < TWO < THREE ==> 513 < 625 < 64733
9: RAMN == R**3 + RM**3 == N**3 + RX**3 ==> 1729 == 1**3 + 12**3 == 9**3 + 10**3
10: sum(range(AA)) == BB ==> sum(range(11)) == 55
11: sum(range(POP)) == BOBO ==> sum(range(101)) == 5050
12: ODD + ODD == EVEN ==> 655 + 655 == 1310
13: PLUTO not in set([PLANETS]) ==> 36289 not in set([3651487])
 

         741478 function calls in 0.744 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    61699    0.443    0.000    0.454    0.000 {built-in method builtins.eval}
    75492    0.054    0.000    0.128    0.000 559079801.py:1(fill_in)
    75478    0.043    0.000    0.043    0.000 {method 'translate' of 'str' objects}
    75478    0.036    0.000    0.600    0.000 1507065022.py:5(valid)
    75478    0.034    0.000    0.034    0.000 {method 'search' of 're.Pattern' objects}
    75478    0.032    0.000    0.110    0.000 __init__.py:173(search)
    75492    0.032    0.000    0.045    0.000 __init__.py:272(_compile)
       14    0.026    0.002    0.754    0.054 2524925880.py:1(solve)
    75478    0.019    0.000    0.019    0.000 {built-in method maketrans}
    75521    0.013    0.000    0.013    0.000 {built-in method builtins.isinstance}
    75492    0.012    0.000    0.012    0.000 {method 'join' of 'str' objects}
       29   

`valid()` calls `eval()`, which is what takes most of the time. The problem is that `eval()` is a builtin function, and we cannot modify it directly. If we cannot operate directly on `eval()`, we can try to do two things:

1. We can make *fewer* calls to `eval()`.
2. We can make *easier* calls to `eval()`.

How can we make it easier? One way is to break the filled-in equation into two parts, say the lhs and the rhs. This is a form of *divide and conquer*. However it seems difficult to find a good way to break the problem into subproblems that are easier and faster to handle so, in general, this approach is probably not going to work.

The other possibility is making *fewer* calls, but how? One way could be to evaluate the formula only once and call it as a function with parameters. We would still need to call the function many times, but we would call `eval()` only once. How can we do this? In order to understand, we need to dig deeper into how `eval()` works.

If applied to a (filled-in) equation like `'123 == 45**2'`, `eval()` first parses the string and builds a parse tree roughly like the one below.

```text
     ==
   /   \
 Num  Expr
 /    / | \
123 Num | Num
     |  **  \
    45       2
```

Parsing is followed by a *code generation* step which, as the name suggests, generates the code that will be executed by the final *execution* phase.
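
The three phases can be made visible with the standard `ast` module and the builtin `compile()`. This is only an illustration of the idea: the tree that `ast.dump()` prints is Python's own representation, whose exact form varies between versions, and the filename `'<formula>'` is an arbitrary label.

```python
import ast

source = '123 == 45**2'

# Phase 1: parsing. mode='eval' means "a single expression".
tree = ast.parse(source, mode='eval')
print(ast.dump(tree.body))

# Phase 2: code generation, turning the tree into a code object.
code = compile(tree, '<formula>', 'eval')

# Phase 3: execution of the already compiled code object.
result = eval(code)
print(result)   # False, since 45**2 is 2025
```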

Each time we go through a new permutation, the parsing phase is repeated, but the only things that change are the numbers, not the structure of the parse tree. The same is true for the code generation step. We would like to run these two steps only once, and then pass the specific numbers at each iteration. The `eval()` function doesn't accept a statement (e.g., a function definition with `def`) as input, only an expression. In other words, if we can write a function as an expression, without binding it to a name as a `def` statement does, this might work. This is exactly what `lambda` does: it creates a function as an expression. To make things clearer, if the input formula is something like 'YOU == ME**2', we would like to have a function like

```python
def foo(Y,O,U,M,E):
    return 100*Y + 10*O + U == (M*10 + E)**2
```

which takes the digits assigned to the letters in the formula and returns whether the equation holds. However, we cannot pass a `def` statement such as the definition of `foo()` to `eval()`; we pass the equivalent `lambda` expression instead.
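
As a minimal sketch of this idea, the string below is written by hand (the name `you_is_me_squared` is hypothetical). `eval()` runs once and returns a function object, which can then be called as many times as needed without any further parsing.

```python
# eval() is called once; the result is an ordinary callable.
you_is_me_squared = eval('lambda Y, O, U, M, E: 100*Y + 10*O + U == (10*M + E)**2')

print(you_is_me_squared(5, 7, 6, 2, 4))   # True:  576 == 24**2
print(you_is_me_squared(1, 2, 3, 4, 5))   # False: 123 != 45**2
```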

## `compile_word()`

The `compile_word()` function takes a word like 'YOU' and returns a string like `'(1*U + 10*O + 100*Y)'`. If the word contains non-uppercase letters, i.e., lowercase letters or other symbols, they are left unchanged. Note that we assume that a single word cannot contain a combination of uppercase and non-uppercase letters.

In [13]:
def compile_word(word):
    """Compile a word of uppercase letters as numeric digits.
    E.g., compile_word('YOU') => '(1*U+10*O+100*Y)'
    Non-uppercase words unchanged: compile_word('+') => '+'"""
    if word.isupper():
        return '(' + '+'.join([str(10**n) + '*' + c
                               for n, c in enumerate(reversed(word))]) + ')'
    else:
        return word

print(compile_word('YOU'))

(1*U+10*O+100*Y)
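
The compiled string is an ordinary Python expression in which the letters are free variables. As a quick check, and only as an illustration, we can evaluate it with an explicit binding for each letter; the dictionary passed to `eval()` plays the role that the lambda parameters will play later.

```python
expression = '(1*U+10*O+100*Y)'   # the string compile_word('YOU') returns

# Supply the letter values through eval()'s locals mapping.
value = eval(expression, {}, {'Y': 5, 'O': 7, 'U': 6})
print(value)   # 576
```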


Now let's see the whole program, centered on the `faster_solve()` function. This function takes a formula and pre-compiles it so that it is evaluated only once. We compile the formula via `compile_formula()`, which returns the function that represents it and the letters. We then generate all the permutations of `len(letters)` digits, and note that now we are dealing with actual integers, not with characters.

We apply the function to the digits, passed via tuple unpacking. Only if the function returns `True` do we perform the string translation as before to produce the final result; notice that we do this translation only once, on the specific permutation that we know works.

In [14]:
def faster_solve(formula):
    """Given a formula like 'ODD + ODD == EVEN', fill in digits to solve it.
    Input formula is a string; output is a digit-filled-in string or None.
    This version precompiles the formula; only one eval per formula."""
    f, letters = compile_formula(formula)
    for digits in itertools.permutations(
            (1, 2, 3, 4, 5, 6, 7, 8, 9, 0), len(letters)):
        try:
            if f(*digits) is True:
                table = str.maketrans(letters, ''.join(map(str, digits)))
                return formula.translate(table)
        except ArithmeticError:
            pass

## `compile_formula()`

`compile_formula()` takes a formula expressed as a string and returns a function. It also returns the letters found in the formula as a string (`letters`); the same letters, joined into a comma-separated string and stored in the variable `parms`, become the parameters of the function. We obtain the pieces of the function body (variable `tokens`) by applying `compile_word` to each element of the list returned by `re.split('([A-Z]+)', formula)`. This regular expression splits the string using any sequence of one or more uppercase letters as a separator. Wrapping the separator pattern in parentheses guarantees that the separators themselves are also returned. When we apply `compile_word` to each element we obtain either the substring unchanged, when it contains characters that are not uppercase letters, or the word compiled into an arithmetic expression.

In [15]:
formula = 'YOU == ME**2'
letters = ''.join(set(re.findall('[A-Z]', formula)))
print(letters)
parms = ', '.join(letters)
print(parms)
re.split(pattern='([A-Z]+)', string=formula)

EYUMO
E, Y, U, M, O


['', 'YOU', ' == ', 'ME', '**2']
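
The effect of the capturing parentheses is easiest to see by comparing the two variants side by side. The variable names below are hypothetical.

```python
import re

formula = 'YOU == ME**2'

without_group = re.split('[A-Z]+', formula)     # separators are dropped
with_group = re.split('([A-Z]+)', formula)      # separators are kept

print(without_group)   # ['', ' == ', '**2']
print(with_group)      # ['', 'YOU', ' == ', 'ME', '**2']
```

Without the group, the words would be lost and the formula could not be reassembled after compiling each word.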

In [16]:
tokens = map(compile_word, re.split(pattern='([A-Z]+)', string=formula))
# ''.join(tokens) gives '(1*U+10*O+100*Y) == (1*E+10*M)**2'

`body` concatenates the tokens in a string. This is the body of the function we want to execute.

In [17]:
body = ''.join(tokens)
body

'(1*U+10*O+100*Y) == (1*E+10*M)**2'

We create a string with the function to be evaluated as follows:

In [18]:
f = f'lambda {parms}: {body}'
f

'lambda E, Y, U, M, O: (1*U+10*O+100*Y) == (1*E+10*M)**2'

The string is then passed to `eval()` and the resulting function is returned together with `letters`.

Putting everything together, this is what the `compile_formula()` function looks like.

In [19]:
def compile_formula(formula, verbose=False):
    """Compile formula into a function. Also return letters found, as a str,
    in same order as parms of function. The first digit of a multi-digit 
    number can't be 0. So if YOU is a word in the formula, and the function
    is called with Y equal to 0, the function should return False."""

    # modify the code in this function.

    letters = ''.join(set(re.findall('[A-Z]', formula)))
    parms = ', '.join(letters)
    tokens = map(compile_word, re.split('([A-Z]+)', formula))
    body = ''.join(tokens)
    f = 'lambda %s: %s' % (parms, body)
    if verbose:
        print(f)
    return eval(f), letters

Let's see how this function works on the 'YOU == ME**2' example.

In [20]:
formula = 'YOU == ME**2'
f, letters = compile_formula(formula, verbose=True)
type(f)

lambda E, Y, U, M, O: (1*U+10*O+100*Y) == (1*E+10*M)**2


function

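So, `compile_formula()` takes an unfilled equation, builds a string representing a callable (the lambda), and evaluates it once to return the callable, which is then bound to a name in `faster_solve()` and called repeatedly. This is what makes it possible to call `eval()` only once per formula. Note that this version does not yet reject numbers with leading zeros, as some of the results below show (e.g., `0836`).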

In [21]:
def test_faster_crypto(examples):
    for n, example in enumerate(examples):
        print(f'{n}: {example} ==> {faster_solve(example)}')

In [22]:
%prun test_faster_crypto(examples)

0: TWO + TWO == FOUR ==> 418 + 418 == 0836
1: A**2 + B**2 == C**2 ==> 3**2 + 4**2 == 5**2
2: A**2 + BE**2 == BY**2 ==> 9**2 + 40**2 == 41**2
3: X/X == X ==> 1/1 == 1
4: A**N + B**N == C**N and N > 1 ==> 3**2 + 4**2 == 5**2 and 2 > 1
5: ATOM**0.5 == A + TO + M ==> 1296**0.5 == 1 + 29 + 6
6: GLITTERS is not GOLD ==> 54377186 is not 5942
7: ONE < TWO and FOUR < FIVE ==> 013 < 820 and 7049 < 7563
8: ONE < TWO < THREE ==> 013 < 520 < 54633
9: RAMN == R**3 + RM**3 == N**3 + RX**3 ==> 1729 == 1**3 + 12**3 == 9**3 + 10**3
10: sum(range(AA)) == BB ==> sum(range(11)) == 55
11: sum(range(POP)) == BOBO ==> sum(range(101)) == 5050
12: ODD + ODD == EVEN ==> 655 + 655 == 1310
13: PLUTO not in set([PLANETS]) ==> 36289 not in set([3651487])
 

         28229 function calls (28183 primitive calls) in 0.025 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       14    0.013    0.001    0.041    0.003 1868669819.py:1(faster_solve)
    27351    0.010    0.000    0.010    0.000 <string>:1(<lambda>)
       14    0.001    0.000    0.001    0.000 {built-in method builtins.eval}
      106    0.000    0.000    0.000    0.000 1580709322.py:1(compile_word)
   102/56    0.000    0.000    0.000    0.000 {method 'join' of 'str' objects}
       28    0.000    0.000    0.000    0.000 iostream.py:610(write)
       46    0.000    0.000    0.000    0.000 1580709322.py:6(<listcomp>)
       14    0.000    0.000    0.001    0.000 4106518789.py:1(compile_formula)
       72    0.000    0.000    0.000    0.000 {built-in method builtins.sum}
       14    0.000    0.000    0.000    0.000 {method 'findall' of 're.Pattern' objects}
        1    0.000    0.000    0.042    0.042 2766598025.py:1(

## Recap

In this lesson we saw:

- List comprehensions.
- Generator expressions.
- Generator functions -> useful because you don't have to compute all possible values upfront, as is the case for a list comprehension.
- We handled different types (polymorphism). We saw an example in `timedcalls()` where `n` could be an int or a float.
- We saw the `eval()` function that can evaluate to an object or to a function.
- We did timing with the `time` builtin module and with our own function.
- We used function attributes (without saying so explicitly; find more information on this).

## Problem Set 1 - No Leading Zeros Solutions

The problem asks to modify `compile_formula()` such that it does not allow the first digit of a multi-digit number to be zero. In the `src` folder there is my solution, but below I am showing Norvig's, which is cleaner, if still based on the old string formatting style. I used a simple list comprehension but, again, the approach based on REs is simpler and probably faster.

In [29]:
def compile_formula(formula, verbose=False):
    """Compile formula into a function. Also return letters found, as a str,
    in same order as parms of function. The first digit of a multi-digit 
    number can't be 0. So if YOU is a word in the formula, and the function
    is called with Y equal to 0, the function should return False."""

    letters = ''.join(set(re.findall('[A-Z]', formula)))
    firstletters = set(re.findall(r'\b([A-Z])[A-Z]', formula))
    parms = ', '.join(letters)
    tokens = map(compile_word, re.split('([A-Z]+)', formula))
    body = ''.join(tokens)
    if firstletters:
        tests = ' and '.join(fl + '!=0' for fl in firstletters)
        body = '%s and (%s)' % (tests, body)
    f = 'lambda %s: %s' % (parms, body)
    if verbose:
        print(f)
    return eval(f), letters

compile_formula(formula, verbose=True)
print(faster_solve(formula))
assert faster_solve('X / X == X') == '1 / 1 == 1'

lambda E, Y, U, M, O: Y!=0 and M!=0 and ((1*U+10*O+100*Y) == (1*E+10*M)**2)
576 == 24**2
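
The regular expression that finds the first letters can be tested on its own. In this small sketch the constant name `FIRST_LETTER` is hypothetical; the pattern is the one used in `compile_formula()` above. It captures an uppercase letter that starts a word and is followed by at least one more uppercase letter, so single-letter words are ignored.

```python
import re

FIRST_LETTER = r'\b([A-Z])[A-Z]'

you_me = re.findall(FIRST_LETTER, 'YOU == ME**2')
single = re.findall(FIRST_LETTER, 'X / X == X')
odd_even = sorted(set(re.findall(FIRST_LETTER, 'ODD + ODD == EVEN')))

print(you_me)     # ['Y', 'M']
print(single)     # [] because one-letter numbers may be 0
print(odd_even)   # ['E', 'O']
```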
