# Day 5: Pattern matching

Part 1, at least, is a typical pattern matching exercise: repeatedly remove adjacent lowercase/UPPERCASE pairs of the same letter (in either order) until the length of the string no longer changes.

For string pattern matching the obvious tool is the [`re` module](https://docs.python.org/3/library/re.html) (see the [regex HOWTO](https://docs.python.org/3/howto/regex.html) as well), but regex doesn't have syntax to spell 'uppercase version of a matched letter' or 'lowercase version of a matched letter'. Not in the Python `re` module syntax at any rate, and not in the much more advanced Python [`regex` project](https://pypi.org/project/regex/) either.

But that doesn't stop us from just generating all possible combinations from [`string.ascii_uppercase`](https://docs.python.org/3/library/string.html#string.ascii_uppercase) and [`string.ascii_lowercase`](https://docs.python.org/3/library/string.html#string.ascii_lowercase):

In [1]:
import re
import string
from functools import partial

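# One alternative per letter, in both orders: 'aA|Aa|bB|Bb|...|zZ|Zz'
patterns = '|'.join([f'{l}{u}|{u}{l}' for l, u in zip(string.ascii_lowercase, string.ascii_uppercase)])
# replace(s) removes every non-overlapping match from s in a single pass
replace = partial(re.compile(patterns).sub, '')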

def polymer_reactions(s):
    l = len(s)
    while True:
        s = replace(s)
        if len(s) == l:
            break
        l = len(s)
    return s
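
The `while` loop is needed because `sub()` only removes the matches it finds in one left-to-right scan, so pairs that only become adjacent after a removal survive until the next pass. A minimal sketch using the `abBA` test case (the names `pattern`, `first_pass` and `second_pass` are just for illustration):

```python
import re
import string

# Same alternation as above, rebuilt here so the snippet stands alone
pattern = re.compile(
    '|'.join(f'{l}{u}|{u}{l}' for l, u in zip(string.ascii_lowercase, string.ascii_uppercase))
)

first_pass = pattern.sub('', 'abBA')       # only 'bB' is adjacent at this point
second_pass = pattern.sub('', first_pass)  # now 'aA' is adjacent and reacts too

print(repr(first_pass))   # 'aA'
print(repr(second_pass))  # ''
```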

In [2]:
tests = {
    'aA': '',
    'abBA': '',
    'abAB': 'abAB',
    'aabAAB': 'aabAAB',
    'dabAcCaCBAcCcaDA': 'dabCBAcaDA',
}

for t, expected in tests.items():
    assert polymer_reactions(t) == expected

In [3]:
import aocd

data = aocd.get_data(day=5, year=2018)
print('Base length:', len(data))

Base length: 50000


In [4]:
print('Part 1:', len(polymer_reactions(data)))

Part 1: 10132


## Part 2

This is just a loop over `string.ascii_lowercase`: for each letter, remove all instances of it case-insensitively with `re.sub()` and the `re.I` flag (or loop over `zip(string.ascii_lowercase, string.ascii_uppercase)` and use `str.translate()`, perhaps), react what remains, and take the length of the shortest result.
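
For reference, the `str.translate()` variant mentioned above might look roughly like this. It is only a sketch of the removal step, and `remove_unit` is a hypothetical helper name, not something used elsewhere in this notebook:

```python
import string

def remove_unit(s, lower, upper):
    # The third argument to str.maketrans() lists characters to delete
    return s.translate(str.maketrans('', '', lower + upper))

# One candidate string per letter, with both cases of that letter removed
candidates = {
    lower: remove_unit('dabAcCaCBAcCcaDA', lower, upper)
    for lower, upper in zip(string.ascii_lowercase, string.ascii_uppercase)
}

print(candidates['a'])  # dbcCCBcCcD
```

Each candidate would still have to go through `polymer_reactions()` before its length is compared.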

We can do this all in a generator expression passed to `min()` to produce the answer, if we chain everything together.

No, this is not going to be *fast*. The base input string is 50k characters, and reducing it to the part 1 answer already takes 2+ seconds. Doing this 26 times is not going to improve matters:

In [5]:
%timeit polymer_reactions(data)

2.17 s ± 125 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
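
The regex loop rescans the whole string on every pass. As a point of comparison, and not the approach used in this notebook, a single-pass reduction with a stack could look like the sketch below. `react_with_stack` is a hypothetical name, and the sketch has not been timed against the real puzzle input:

```python
def react_with_stack(s):
    stack = []
    for unit in s:
        # A unit reacts with the previous surviving unit when they are the
        # same letter in opposite case; swapcase() maps 'a' <-> 'A'.
        if stack and stack[-1] != unit and stack[-1] == unit.swapcase():
            stack.pop()  # both units annihilate
        else:
            stack.append(unit)
    return ''.join(stack)

print(react_with_stack('dabAcCaCBAcCcaDA'))  # dabCBAcaDA
```

Because each unit is pushed and popped at most once, the work grows linearly with the input length rather than with the number of passes.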


In [6]:
def shortest_fixed_polymer(s):
    return min(  # smallest
        len( # length
            polymer_reactions( # of a polymer reaction result
                re.sub(c, '', s, flags=re.I) # with one letter removed, case-insensitively
            )
        )
        for c in string.ascii_lowercase # for every ASCII letter
    )

In [7]:
assert shortest_fixed_polymer('dabAcCaCBAcCcaDA') == 4

In [8]:
print('Part 2:', shortest_fixed_polymer(data))

Part 2: 4572
