# Cryptarithmetic

An example of cryptarithmetic is the equation

```text
 ODD +
 ODD =
EVEN
```

where each letter represents a distinct digit and the equation holds true. First consideration: if the numbers are in base 10, ODD must be at least 500, because the sum has four digits. It cannot be exactly 500, because this would mean O=5 and D=0, and the sum 1000 does not match the pattern EVEN. We also realize that E = 1: ODD is at most 999 (and it cannot be 999, since the digits are different), so the sum must be strictly lower than 2000. So, the first and third digits of the sum must be 1. N must be an even number, since it is the units digit of 2*D. The tens digit of the sum is E = 1, which is odd, so the units column must produce a carry. This means D >= 5, and since 2*D + 1 must end in 1, we get D = 5 and N = 0. Trying the remaining values of O, we come up with two solutions: 655 + 655 = 1310 or 855 + 855 = 1710.
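The inferences above can be double-checked by enumerating every candidate value of ODD. The sketch below is only a sanity check for this one puzzle; the helper name `odd_plus_odd_solutions` is made up for illustration.

```python
def odd_plus_odd_solutions():
    """Return every (ODD, EVEN) pair that satisfies ODD + ODD == EVEN."""
    solutions = []
    for o in range(1, 10):              # O is a leading digit, so it cannot be 0
        for d in range(10):
            if d == o:                  # different letters, different digits
                continue
            odd = 100 * o + 11 * d      # the three-digit number O D D
            even = 2 * odd
            if even < 1000:             # EVEN must have four digits
                continue
            e, v, e2, n = (int(ch) for ch in str(even))
            # The first and third digits must both be E, and all five letters differ.
            if e == e2 and len({o, d, e, v, n}) == 5:
                solutions.append((odd, even))
    return solutions


print(odd_plus_odd_solutions())  # [(655, 1310), (855, 1710)]
```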

## Brute Force Solution

One possible approach would be to implement all the rules of arithmetic (carry digits, odd/even). This is a challenging task, even just for addition, so we would like to find a shortcut. Another possibility is to try every assignment. With 10 digits there are at most $10! \approx 3.6\times10^6$ of them, which seems feasible, although not fast. For a formula with $k$ distinct letters, we consider every ordered selection of $k$ of the 10 digits, replace the letters with those digits, and keep iterating until the equation holds. Some edge cases may appear, and we will deal with them below.
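To get a feeling for the size of the search space, the counts can be computed directly. This is a small illustration, assuming Python 3.8 or later for `math.perm`; the variable names are arbitrary.

```python
import math

# Upper bound: every ordering of the ten digits.
all_orderings = math.factorial(10)
print(all_orderings)  # 3628800

# A formula with k distinct letters needs only ordered selections of k digits.
for k in (3, 5, 8):
    print(k, math.perm(10, k))
# 3 720
# 5 30240
# 8 1814400
```

For `ODD + ODD == EVEN`, which has five distinct letters, the brute force therefore tries at most 30240 assignments.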

## Inventory of Concepts

We are dealing with the following concepts:

1. Equations. Two types: the original ones (with letters), and the "filled-in" (with digits).
2. Letters.
3. Digits.
4. Assignment of a letter to a digit.
5. Evaluation or validation that the equation is correct.

How do we represent these concepts? Norvig suggests this approach:

- The original equation can be expressed as a string.
- The filled-in equation can also be represented as a string.
- The letters are single characters (one-character strings).
- The numbers are also represented as single characters.
- We need a mapping that associates letters to numbers. We can leverage `str.translate()`.
- For the evaluation we can use `eval()`, which takes a string and evaluates it as a Python expression.

### `eval`

In [1]:
eval('2 + 2')

4

In [2]:
eval('2 + 2 == 3')

False

### Translation Tables

We can create a translation table with the function `str.maketrans()`. If, for example, we want to translate `'ABC'` to `'123'` we can write:

In [3]:
table = str.maketrans('ABC', '123')
f = 'A + B == C'
f.translate(table)

'1 + 2 == 3'

In [4]:
eval(f.translate(table))

True

Let's define a function `valid()` that returns `True` if the formula is valid and `False` otherwise. It should also return `False` if evaluating the formula raises an error, as with `1/0`. The one below is my version.

In [5]:
def valid(f):
    "Formula f is valid iff it evals true; a division by zero counts as invalid."
    try:
        return eval(f)
    except ZeroDivisionError:
        return False

def test_valid():
    assert valid('1 + 3 == 4') is True
    assert valid('6/2 == 3') is True
    assert valid('1/0') is False

test_valid()

This is Norvig's version. It also rejects numbers with leading zeros, requires the result to be exactly `True`, and catches the more general `ArithmeticError`, which is the superclass of `ZeroDivisionError`. The same cell defines `solve()`.

In [6]:
import itertools
import re
import string

def valid(f):
    """Formula is valid iff it has no numbers with leading zeros and evals True"""
    try:
        return not re.search(r'\b0[0-9]', f) and eval(f) is True
    except ArithmeticError:
        return False

def solve(formula):
    """Given a formula like 'ODD + ODD == EVEN', fill in digits to solve it.
    Input formula is a string; output is a digit-filled-in string or None."""
    for f in fill_in(formula):
        if valid(f):
            return f

### Explanation of Norvig's implementation

In the case of `ODD + ODD == EVEN`, `EVEN` = 3435 would be a valid number, but `EVEN` = 0435 would not. We want to avoid cases where numbers have a leading zero, and such a number could appear anywhere in a formula. The regular expression `r'\b0[0-9]'` looks for a 0 that appears at a *word boundary*, marked by `\b`, and is followed by another digit, so a lone `0` is still accepted.

There is a more profound reason for avoiding numbers with leading zeros. In Python 2 they were interpreted as octal, therefore a number like 012 was read as decimal 10. In Python 3 a literal like 012 is a `SyntaxError`, which `valid()` does not catch, so the regular expression has to reject it before `eval()` sees it.
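The behaviour of the leading-zero pattern is easy to check in isolation. The snippet below is a small sketch that assumes Python 3; the helper name `has_leading_zero` is introduced only for this illustration.

```python
import re

LEADING_ZERO = re.compile(r'\b0[0-9]')


def has_leading_zero(formula):
    """True if some number in the formula starts with 0 and has more digits."""
    return LEADING_ZERO.search(formula) is not None


print(has_leading_zero('655 + 655 == 1310'))  # False: the 0 is not at a word boundary
print(has_leading_zero('0435 == 435'))        # True
print(has_leading_zero('0 + 1 == 1'))         # False: a lone 0 is fine

# In Python 3 a literal with a leading zero does not even parse.
try:
    eval('012 + 1')
except SyntaxError:
    print('SyntaxError')
```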

## `fill_in()`

Let's consider a strategy to create the `fill_in(formula)` function, which takes an unfilled formula and returns all possible filled-in formulas. Let's consider a simpler example: `'I + I == ME'`. Note that one of the examples mixed letters and numbers in the same formula, like `'BC2 + DE3 == AC**F'`. String translation is a great choice for this purpose, as it only replaces what needs to be replaced.
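A quick check shows that translation leaves digits and operators alone. The formula is written with `==` so that it is a valid Python expression, and the digit assignment is arbitrary, chosen only for the illustration.

```python
formula = 'BC2 + DE3 == AC**F'
table = str.maketrans('ABCDEF', '123456')  # A->1, B->2, ..., F->6

filled = formula.translate(table)
print(filled)  # 232 + 453 == 13**6
```

Only the six uppercase letters are replaced; the literal `2` and `3`, the spaces and the operators pass through unchanged.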

Ultimately we came up with a design split into three parts: `solve()`, `fill_in()`, and `valid()`.

In [21]:
def fill_in(formula):
    "Generate all possible fillings-in of letters in formula with digits."
    # As usual my solution was more complicated. This is nice and neat.
    letters = ''.join(set(re.findall('[A-Z]', formula)))
    for digits in itertools.permutations('1234567890', len(letters)):
        table = str.maketrans(letters, ''.join(digits))
        yield formula.translate(table)

We are not done yet, as we have to verify that the result is correct. Norvig has these clever examples to go through:

In [29]:
examples = """TWO + TWO == FOUR
A**2 + B**2 == C**2
A**2 + BE**2 == BY**2
X/X == X
A**N + B**N == C**N and N > 1
ATOM**0.5 == A + TO + M
GLITTERS is not GOLD
ONE < TWO and FOUR < FIVE
ONE < TWO < THREE
RAMN == R**3 + RM**3 == N**3 + RX**3
sum(range(AA)) == BB
sum(range(POP)) == BOBO
ODD + ODD == EVEN
PLUTO not in set([PLANETS])""".splitlines()

In [37]:
%%time
def test_crypto(examples):
    for n, example in enumerate(examples):
        print(f'{n}: {example} ==> {solve(example)}')

test_crypto(examples)

0: TWO + TWO == FOUR ==> 765 + 765 == 1530
1: A**2 + B**2 == C**2 ==> 3**2 + 4**2 == 5**2
2: A**2 + BE**2 == BY**2 ==> 5**2 + 12**2 == 13**2
3: X/X == X ==> 1/1 == 1
4: A**N + B**N == C**N and N > 1 ==> 3**2 + 4**2 == 5**2 and 2 > 1
5: ATOM**0.5 == A + TO + M ==> 6724**0.5 == 6 + 72 + 4
6: GLITTERS is not GOLD ==> 58922637 is not 5481
7: ONE < TWO and FOUR < FIVE ==> 187 < 261 and 5134 < 5907
8: ONE < TWO < THREE ==> 165 < 241 < 27355
9: RAMN == R**3 + RM**3 == N**3 + RX**3 ==> 1729 == 1**3 + 12**3 == 9**3 + 10**3
10: sum(range(AA)) == BB ==> sum(range(11)) == 55
11: sum(range(POP)) == BOBO ==> sum(range(101)) == 5050
12: ODD + ODD == EVEN ==> 655 + 655 == 1310
13: PLUTO not in set([PLANETS]) ==> 49325 not in set([4918627])
CPU times: user 4.17 s, sys: 0 ns, total: 4.17 s
Wall time: 4.17 s


By the way, the one below rings a bell, doesn't it? It is Fermat's Last Theorem restricted to single digits, so no solution is found.

In [35]:
print(solve('A**N + B**N == C**N and N > 2'))

None


## Profiling

If we save the code above in a file, say, `crypto.py`, we can get profiling information by typing

```shell
python -m cProfile crypto.py
```

You can also profile your code from inside the Python prompt by importing the `cProfile` module and calling `cProfile.run('code_to_run')`, where the string contains the code to be profiled. In Jupyter there are the `%prun` and `%%prun` magics. From the table below, we can see that most of the time is spent in `valid()`, and within it in the built-in `eval()`, which we cannot modify.
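Outside Jupyter, the same kind of table can be produced programmatically with `cProfile` and `pstats`. The snippet below is a minimal sketch; `slow_sum` is a stand-in workload invented for the example, not part of the solver.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Stand-in workload: sum of squares computed with a generator."""
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
slow_sum(10_000)
profiler.disable()

# Collect the report in a string, ordered by internal time like %prun does.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats('tottime').print_stats()
report = stream.getvalue()
print(report)
```

The `ncalls`, `tottime` and `cumtime` columns in the report have the same meaning as in the `%prun` output.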

In [45]:
%prun solve('ATOM**0.5 == A + TO + M')

 

         17307 function calls in 0.022 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1402    0.012    0.000    0.013    0.000 {built-in method builtins.eval}
     1767    0.002    0.000    0.004    0.000 559079801.py:1(fill_in)
     1766    0.002    0.000    0.002    0.000 {method 'translate' of 'str' objects}
     1766    0.001    0.000    0.017    0.000 1569592752.py:5(valid)
     1767    0.001    0.000    0.001    0.000 __init__.py:272(_compile)
     1766    0.001    0.000    0.001    0.000 {method 'search' of 're.Pattern' objects}
     1766    0.001    0.000    0.003    0.000 __init__.py:173(search)
        1    0.001    0.001    0.022    0.022 1569592752.py:12(solve)
     1766    0.000    0.000    0.000    0.000 {built-in method maketrans}
     1767    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
     1767    0.000    0.000    0.000    0.000 {method 'join' of 'str' objects}

A more complete picture may be obtained by running `%prun` on `test_crypto()`. We can see that `solve()` is called 14 times, once per example, and that on average a call takes 0.474 seconds of cumulative time, of which only 0.017 seconds are spent in `solve()` itself. The cumulative time associated with `valid()` is 5.130 seconds, while the corresponding time for `fill_in()` is 1.271. It seems that speeding up `valid()` may bring the largest benefits; by the law of diminishing returns, work on the smaller contributors would pay off less.

`eval()` is taking 3.77 of the total 6.545 seconds of runtime. The problem is that `eval()` is a builtin function, and we cannot modify it directly. If we cannot operate directly on `eval()` we can do two things:

1. We can make *fewer* calls to `eval()`.
2. We can make *easier* calls to `eval()`.

How can we make the calls easier? One way is to break the filled-in equation into two parts, say the left-hand side and the right-hand side. This is a form of *divide and conquer*. However, it seems difficult to find a general way to break a formula into parts that are easier and faster to handle, so this approach is probably not going to work.

In [46]:
%prun test_crypto(examples)

0: TWO + TWO == FOUR ==> 765 + 765 == 1530
1: A**2 + B**2 == C**2 ==> 3**2 + 4**2 == 5**2
2: A**2 + BE**2 == BY**2 ==> 5**2 + 12**2 == 13**2
3: X/X == X ==> 1/1 == 1
4: A**N + B**N == C**N and N > 1 ==> 3**2 + 4**2 == 5**2 and 2 > 1
5: ATOM**0.5 == A + TO + M ==> 6724**0.5 == 6 + 72 + 4
6: GLITTERS is not GOLD ==> 58922637 is not 5481
7: ONE < TWO and FOUR < FIVE ==> 187 < 261 and 5134 < 5907
8: ONE < TWO < THREE ==> 165 < 241 < 27355
9: RAMN == R**3 + RM**3 == N**3 + RX**3 ==> 1729 == 1**3 + 12**3 == 9**3 + 10**3
10: sum(range(AA)) == BB ==> sum(range(11)) == 55
11: sum(range(POP)) == BOBO ==> sum(range(101)) == 5050
12: ODD + ODD == EVEN ==> 655 + 655 == 1310
13: PLUTO not in set([PLANETS]) ==> 49325 not in set([4918627])
 

         5980465 function calls in 6.545 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   493399    3.770    0.000    3.864    0.000 {built-in method builtins.eval}
   609582    0.498    0.000    1.271    0.000 559079801.py:1(fill_in)
   609568    0.434    0.000    0.434    0.000 {method 'translate' of 'str' objects}
   609568    0.307    0.000    5.130    0.000 1569592752.py:5(valid)
   609568    0.302    0.000    0.302    0.000 {method 'search' of 're.Pattern' objects}
   609582    0.268    0.000    0.391    0.000 __init__.py:272(_compile)
   609568    0.267    0.000    0.959    0.000 __init__.py:173(search)
   609568    0.238    0.000    0.238    0.000 {built-in method maketrans}
       14    0.234    0.017    6.635    0.474 1569592752.py:12(solve)
   609611    0.123    0.000    0.123    0.000 {built-in method builtins.isinstance}
   609582    0.101    0.000    0.101    0.000 {method 'join' of 'str' objects}

The other possibility is making *fewer* calls, but how? One way could be to evaluate the formula only once and call it as a function with parameters. We would still need to call the function many times, but we would call `eval()` only once. How can we do this? In order to understand, we need to dig a bit deeper into how `eval()` works.

If applied to a (filled-in) equation like `'123 == 45**2'`, `eval()` first parses the string and builds a parse tree roughly like the one below.

```text
     ==
   /   \
 Num  Expr
 /    / | \
123 Num | Num
     |  **  \
    45       2
```

Parsing is followed by *code generation* which, as the name suggests, generates the code that will be run in the final *execution* phase.
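The three phases can be made visible with the standard `ast` module and the built-in `compile()`. This is only an illustration of what `eval()` does internally when given a string, not something the solver needs; the file name `'<formula>'` is an arbitrary label.

```python
import ast

source = '123 == 45**2'

tree = ast.parse(source, mode='eval')      # 1. parsing: string -> syntax tree
code = compile(tree, '<formula>', 'eval')  # 2. code generation: tree -> code object
result = eval(code)                        # 3. execution: run the code object

comparison = tree.body
print(type(comparison).__name__)                    # Compare
print(type(comparison.ops[0]).__name__)             # Eq
print(type(comparison.comparators[0].op).__name__)  # Pow
print(result)                                       # False, since 45**2 is 2025
```

The node names correspond to the `==` at the root of the sketch above and to the `**` in its right branch.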

Each time we go through a new permutation, the parsing phase will be repeated, but the only thing that changes is the numbers, not the structure of the parse tree. The same is true for the code generation part. We would like to be able to run these two steps only once, and then be able to pass the specific numbers at each iteration. `eval()` takes an expression, and there is a way in Python to create a function as an expression (without binding it to a name), and this is what `lambda`s do. So, if the input formula is something like 'YOU == ME**2', we would like to have a function like

```python
def foo(Y,O,U,M,E):
    return 100*Y + 10*O + U == (M*10 + E)**2
```

but as a lambda function.
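As a small illustration of the idea, the lambda can be written as a string by hand and passed to `eval()` once; the resulting function is then called with as many digit assignments as we like. The name `you_equals_me_squared` is chosen just for this example.

```python
source = 'lambda Y, O, U, M, E: 100*Y + 10*O + U == (10*M + E)**2'

# Parsing and code generation happen only here.
you_equals_me_squared = eval(source)

# Each call below only executes the already compiled code.
print(you_equals_me_squared(2, 8, 9, 1, 7))  # True:  289 == 17**2
print(you_equals_me_squared(1, 2, 3, 4, 5))  # False: 123 != 45**2
```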

In [50]:
def compile_word(word):
    """Compile a word of uppercase letters as numeric digits.
    E.g., compile_word('YOU') => '(1*U+10*O+100*Y)'
    Non-uppercase words unchanged: compile_word('+') => '+'"""
    # str.isupper() is False for '' and for strings without letters, so the
    # separators and empty strings produced by re.split pass through unchanged.
    if word.isupper():
        terms = [str(10**n) + '*' + c for n, c in enumerate(reversed(word))]
        # The parentheses keep the word a single operand, e.g. in 'ME**2'.
        return '(' + '+'.join(terms) + ')'
    else:
        return word

In order to split a formula like `'YOU == ME**2'` into its building blocks we can use a regular expression like the one below. By putting the splitting pattern in parentheses we are asking the RE engine to return both the separated substrings and the separators.

In [48]:
re.split(pattern='([A-Z]+)', string='YOU == ME**2')

['', 'YOU', ' == ', 'ME', '**2']

We can then map `compile_word()` over the output of the call above.

In [52]:
list(map(compile_word, re.split(pattern='([A-Z]+)', string='YOU == ME**2')))

['', '(1*U+10*O+100*Y)', ' == ', '(1*E+10*M)', '**2']

**TO BE COMPLETED** - FIND THE MISSING CODE THAT ACHIEVES THE SPEEDUP
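One way the idea could be completed is sketched below. It is a reconstruction under assumptions, not the missing code itself: the names `compile_formula()` and `faster_solve()` are introduced here for illustration, `compile_word()` is repeated (with parentheses around each word) so that the block runs on its own, and leading zeros are excluded by adding tests such as `E != 0` to the body of the lambda.

```python
import itertools
import re


def compile_word(word):
    """'YOU' -> '(1*U+10*O+100*Y)'; anything else is returned unchanged."""
    if word.isupper():
        terms = [f'{10**i}*{c}' for i, c in enumerate(reversed(word))]
        return '(' + '+'.join(terms) + ')'
    return word


def compile_formula(formula):
    """Return (function, letters); the function takes one digit per letter."""
    letters = ''.join(sorted(set(re.findall('[A-Z]', formula))))
    # Letters that start a multi-letter word must not be zero.
    first_letters = sorted(set(re.findall(r'\b([A-Z])[A-Z]', formula)))
    body = ''.join(map(compile_word, re.split('([A-Z]+)', formula)))
    if first_letters:
        tests = ' and '.join(f'{c} != 0' for c in first_letters)
        body = f'{tests} and ({body})'
    source = f'lambda {", ".join(letters)}: {body}'
    return eval(source), letters  # the only call to eval() per formula


def faster_solve(formula):
    """Like solve(), but evaluates the formula once and then calls a function."""
    f, letters = compile_formula(formula)
    for digits in itertools.permutations(range(10), len(letters)):
        try:
            if f(*digits) is True:
                table = str.maketrans(letters, ''.join(map(str, digits)))
                return formula.translate(table)
        except ArithmeticError:
            pass
    return None


print(faster_solve('ODD + ODD == EVEN'))
```

With this design `eval()` runs once per formula instead of once per permutation, so parsing and code generation are no longer repeated inside the loop. Note that the sketch assumes words are made of letters only; a formula that mixes letters and digits inside one number, such as `BC2`, would need extra handling.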

## Recap

In this lesson we saw:

- List comprehensions.
- Generator expressions.
- Generator functions -> useful because you don't have to compute all possible values upfront, as is the case for a list comprehension.
- We handled different types (polymorphism). We saw an example in `timedcalls()` where `n` could be an int or a float.
- We saw the `eval()` function that can evaluate to an object or to a function.
- We did timing with the `time` builtin module and with our own function.
- We used function attributes (without saying it explicitly. Find more information on this).